This post was co-authored with Prof. Du Chunmei (Lingnan University, Department of History).
History can feel like a silent discipline, a collection of names, dates, and events confined to the pages of textbooks. The contentious and deeply human dialogues that shaped our world easily disappear behind official accounts and polished narratives. Can we bridge this divide and invite students not just to learn about the past, but to engage with it? What if we could ask Sun Yat-sen about his Three Principles of the People, or debate the merits of the 1911 Revolution with a skeptical Gu Hongming, as though they were alive again, ready for a chat?
At Lingnan University, we explored this question through the project “Innovating History Education Through AI Applications,” funded by a Teaching Development Grant (102734). The resulting tool, HistFig, is a custom-built AI system for interactive dialogues with key figures from Chinese history, deployed in both an undergraduate course (HST 2203: History of Twentieth-century China) and a graduate course (DHG 502: Digital Approaches in Historical Research). Rather than simply asking a generic chatbot for facts, students posed questions to AI-simulated personas of Li Hongzhang, Lu Xun, Sun Yat-sen, Qiu Jin, Gu Hongming, Mao Zedong, Deng Xiaoping, and others, receiving answers grounded in the figures’ own writings and other historical documents.

Figure 1. HistFig chat panel.
The core technology driving this experiment is Retrieval-Augmented Generation (RAG). Unlike standard Large Language Models (LLMs) that can “hallucinate” or invent information, RAG systems are designed to ground their responses in specific data. In our case, when a student asks the AI-simulated Sun Yat-sen about the 1911 Revolution, the model first retrieves relevant passages from his speeches, essays, and letters before generating an answer. This process anchors the conversation in historical evidence, transforming the AI from a creative storyteller into a history-infused conversational partner.
In this post, we explore the technical framework of our system and reflect on its initial impact on historical research, counterfactual imagination, and student curiosity, drawing on anonymized reflections from participating students. We also bring up a few points to consider regarding the technology itself.
Building the “Time Machine”: A Look Under the Hood
RAG is an area of active research, and there are many ways to improve the quality and performance of retrieval built on it. While commercial RAG products have recently attracted multimillion-dollar investments, effective systems can be built with modest resources and open-source components. Our system is a web application built on Flask, featuring a hybrid search engine that combines semantic and lexical retrieval to find the most relevant document excerpts for any given question:
- Vector Search (Semantic Meaning): We use a locally hosted ChromaDB as a vector database and an open-source embedding model (Qwen3-Embedding-0.6B by default) to convert text chunks into 1024-dimensional geometric representations (vectors) that capture their meaning. This allows the system to measure cosine similarity between the user’s query and the documents, finding texts that are semantically related even if they don’t share the exact same keywords.
- BM25 Lexical Search (Keyword Matching): In parallel, we use the BM25 algorithm to perform a traditional keyword search. This ensures that queries containing specific historical terms, like “Cultural Revolution” or “Three Principles of the People,” match documents containing those exact phrases. For multilingual support, particularly with Chinese texts, we use the jieba library for word segmentation and NLTK for English lemmatization, utilizing both unigrams and bigrams.
- Reciprocal Rank Fusion (RRF): The results from both search methods (semantic and keyword) are combined using RRF, a technique that merges the two ranked lists. The top N documents are then “retrieved” from the database to support the query.
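The fusion step above is simpler than it may sound. The sketch below illustrates the two ingredients in pure Python: cosine similarity (the measure used on the vector side) and RRF merging two ranked lists. The document names and the toy rankings are invented for illustration; they are not HistFig’s actual data, and a production system would of course delegate scoring to ChromaDB and a BM25 library.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    over every ranked list it appears in (ranks start at 1); k=60 is the
    constant from the original RRF paper."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: ranked lists from the semantic and lexical channels.
semantic_ranking = ["doc_sun_1911", "doc_sun_letters", "doc_qiu_poems"]
bm25_ranking = ["doc_sun_letters", "doc_sun_1911", "doc_lu_essays"]
fused = rrf_fuse([semantic_ranking, bm25_ranking])
top_n = fused[:2]  # the "top N" chunks passed on to the prompt
```

Documents that rank well in both lists accumulate the highest fused score, which is why RRF rewards agreement between the semantic and keyword channels without needing to normalize their incompatible raw scores.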
We also employ query augmentation: when a user sends a message, the system enriches it through an internal API call to broaden the search. This augmented query is used only to generate better contextual representation (embedding) and more keywords for the purpose of document retrieval; the original message is sent to the chatbot to maintain a realistic dialogue. The most relevant text chunks are then injected into the prompt alongside the personality instructions and conversation history. The large language model (accessed, in our case, via the Poe platform) generates a response, which is streamed back to the user. We use GPT‑5 via an API and GPT‑OSS for local inference.
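The final prompt-assembly step can be sketched as follows. This is an illustrative reconstruction, not HistFig’s actual code: the function and field names (`build_prompt`, `personality_prompt`, the `[Source n]` labels) are assumptions, and the real system additionally performs the query-augmentation API call described above.

```python
def build_prompt(personality_prompt, history, retrieved_chunks, user_message):
    """Assemble the prompt sent to the LLM: persona instructions,
    grounding passages, prior turns, and the user's original message."""
    sources = "\n\n".join(
        f"[Source {i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    turns = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        f"{personality_prompt}\n\n"
        f"Ground your answer in the following passages:\n{sources}\n\n"
        f"Conversation so far:\n{turns}\n\n"
        f"User: {user_message}"
    )

prompt = build_prompt(
    personality_prompt="You are Sun Yat-sen. Answer in his voice.",
    history=[("User", "Greetings, Dr. Sun."), ("Assistant", "Greetings.")],
    retrieved_chunks=["Excerpt from a 1912 speech...", "Excerpt from a letter..."],
    user_message="What did the 1911 Revolution achieve?",
)
```

Note that only the original user message appears in the dialogue section; the augmented query does its work upstream, during retrieval, and never reaches the persona.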
While the generative model itself remains a black box, the RAG architecture built around it has been designed with a pedagogical goal in mind: making the retrieval process transparent. The retrieved source documents are displayed alongside the AI’s answer, together with their cosine similarities and BM25 query terms, allowing students to see and evaluate the evidence informing each response. We thus encourage students to think critically about the relationship between a claim and its source.

Figure 2. An example of a retrieved document (abridged).
Finally, a built-in admin panel allows the instructor to add and edit historical figures, upload documents, and define a specific “personality prompt,” a short instruction that guides the LLM on how to embody that person’s tone, thought, and linguistic style. This moves the interaction beyond simple Q&A to a form of role-play, where the AI’s “voice” is as important as the information it conveys.
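A figure record of this kind might look like the following sketch. The field names and the sample prompt are hypothetical, chosen to illustrate the shape of the configuration rather than HistFig’s actual schema.

```python
# Hypothetical figure record as the admin panel might store it;
# field names and values are illustrative, not the actual schema.
figure = {
    "name": "Qiu Jin",
    "personality_prompt": (
        "You are Qiu Jin (1875-1907), revolutionary and poet. "
        "Speak with fervor, quote your own verse where apt, and refuse "
        "knowledge of events after your death in 1907."
    ),
    "documents": ["qiu_jin_poems.txt", "qiu_jin_letters.txt"],
    "temperature": 0.7,  # higher values: more emotional, but more drift
}
```

Keeping the persona in a short, editable prompt like this is what lets instructors tune a figure’s voice without touching the retrieval pipeline.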
The Student Experience: Curiosity and Counterfactuals
The primary goal of this experiment was to see if such a tool could deepen student engagement with historical materials and examine the biases inherent in both machines and humans. In this first iteration, we limited the uploaded documents to canonical collections, scholarly monographs, and longer Wikipedia articles. The reflection assignments suggest three key outcomes.
First, the tool sparked genuine curiosity. As one student put it, the in-class workshop revealed how advanced technology “enables us to learn the ‘old’ through the ‘new’” and “brings historical figures to life.” The students visibly enjoyed talking to the virtual figures, especially the former political leaders, and discovered documents they might have otherwise overlooked given the sheer volume of the datasets.
Second, students were quick to notice the distinct voices of each figure. One reflection highlighted the contrast between Qiu Jin’s poetic, manifesto-like style and Sun Yat-sen’s more direct, policy-oriented pragmatism. Another student compared Lu Xun’s view that China’s weakness was cultural with Sun Yat-sen’s belief that it was political. By placing these figures in direct conversation, the tool made their worldview differences much more tangible, facilitating comparative analysis.
Third, and perhaps most interestingly, the project opened a space for counterfactual thinking. What would Qiu Jin, who was executed by the Qing government in 1907, have thought about the outcome of the 1911 Revolution? And what would Deng Xiaoping think about China today? Drawing on known writings, the RAG system can generate a plausible, in-character response. While not “truth,” this exercise serves as an innovative form of historical thinking.

Figure 3. A screenshot of the admin panel, where the instructor can add and modify historical figures.
Unmasking Bias: A Critical Learning Opportunity
The system’s limitations became a central part of the learning process, perhaps even more so than the novelty it brought.
Several students noted that the AI tended to simplify complex figures into stereotypical images, such as “the perpetually cynical Lu Xun.” This observation led to a discussion about how AI models, trained on finite data and steered through system prompts, can flatten the contradictions and evolutions present in a real human life. Participants also noted a “Western-centric” or “modernist” bias in the responses. For instance, the fake Qiu Jin sometimes employed contemporary feminist terminology (e.g., “gender equality laws”) that, while spiritually aligned with her life, felt anachronistic in vocabulary. Similarly, the Mao Zedong bot was described as being “too consistent” and “self-aware,” lacking the chaotic nature of his actual historical evolution. This highlights a limitation of the underlying LLMs: even when grounded in historical documents, the models are pre-trained on vast, largely Western, modern datasets, which can bleed into the simulation.
Students also recognized their own role in shaping the output. As one reflected, asking the simulated Mao about “capitalist roaders” (走資派) might have “made the AI tilt to the extreme revolutionary line.” Several students admitted that when the AI gave an answer they expected, they accepted it, but when it challenged them, they scrutinized it. This highlighted the danger of using AI to confirm pre-existing stereotypes (e.g., Gu Hongming as only a conservative, or Lu Xun as only a cynic). The system configuration itself also played an important role in shaping the generated messages. For example, a higher “temperature” parameter made the figures more emotional and engaging but led to “hallucinations” or exaggerated responses that drifted from the facts. At times, it was difficult to control the length of the responses; a simple “how are you?” could elicit long historical reflections on China’s past.
Finally, the AI’s worldview was constrained by the uploaded documents. For instance, the Gu Hongming bot could not discuss his interpersonal relationships (like his friendship with Cai Yuanpei) because those specific letters hadn’t been uploaded, leading the AI to rely on generic assumptions rather than biographical detail. A related challenge was language: while all analyzed figures were Chinese speakers, both academic courses were conducted in English; some students conducted the dialogues in English, but when the retrieved materials contained Chinese characters, the bot would toggle between languages, breaking the illusion.
A Tool for Asking (Better) Questions?
Our experiment with RAG in the history classroom suggests that AI can be far more than a search engine, though important limitations remain. Preliminary findings suggest that RAG might actually work better in research environments for tasks such as “talking to your sources” rather than simulating historical personas. While gamifying history helps provoke student participation, the opacity of the model, its prompt sensitivity, the impact of pre-training data, and the multiplicity of configuration variables can easily distort the historical record without proper oversight. As such, applications like HistFig might be a better choice for professional historians and advanced graduate students who are well read in primary materials.
Another risk is the dilution of historiographical focus. A key question that should be discussed upfront is “who owns the prompt”: the model designers, through alignment techniques, the teacher, through system-level instructions, or students, through their questions. At the same time, students must be reminded that they are chatting with probabilistic machines with a softmax function under the hood, not real historical personas. Yet foregrounding all these technicalities can easily turn a history course into an NLP one; balancing these two aims is not straightforward. In my previous post, I emphasized the infrastructural bent of East Asian DH, with scholars focusing on the digitization of materials and the creation of databases. It remains to be seen whether a sophisticated tool like RAG becomes a genuine source of new perspectives or whether it is just the next step in the “platformization” of the humanities, creating ever more advanced systems that have little impact on the questions we ask of the past.
Despite all these issues, our attempt to spark students’ curiosity and encourage in-class engagement has shown great promise. The feedback, including suggestions for future courses to include group debates between AI figures (imagine Mao and Qin Shi Huang talking to each other), to generate real-time deepfake videos, or to add more underrepresented voices, shows a clear appetite for this new mode of learning.
The authors would like to thank Dong Jiahui (PhD Candidate in History, Lingnan University) for his contribution to the project.
Cover Image: ByteDance, Seedream-4.0, generated on Dec 18, 2025. Prompt: “Create a Cyberpunk-inspired image to accompany the following article. It should depict someone seated in front of a computer screen, with the historical figures mentioned in the article appearing behind it.”
