Natural Language Processing (NLP) techniques can identify named entities in a text, that is, words or phrases referring to specific people, places, organizations, and other entities. In Digital Humanities, named entity recognition helps select the data needed for specific projects, such as text analyses, social network analyses, and mapping projects. While named entities can be recognized and tagged by a range of tools, including GATE and various Python libraries, MARKUS, a platform designed especially for classical Chinese, improves efficiency and accuracy for scholars working with such texts. MARKUS is a web-based platform for annotating and visualizing texts written in classical Chinese; it centers on marking up named entities and offers a range of supporting tools. As noted by the principal developer, historian Hilde De Weerdt, MARKUS was created in response to the fact that “data discovery, visualization or text analysis tools” were largely unavailable for digitized Chinese texts (De Weerdt 2018).
Led by De Weerdt, an avid practitioner of digital scholarship who understands users’ needs well, the MARKUS team has simplified the task of marking up named entities in classical Chinese. Once users upload their text to MARKUS, they can choose from three markup options (fig. 1). “Automated markup” employs built-in models for personal names, place names, temporal references, and official titles, which are linked to a wide range of authoritative databases and yield impressively accurate results. Users can also opt for the “manual markup” function, which gives them more freedom in identifying named entities. Additionally, advanced users may explore the “keyword markup” option, which lets them supply a list of keywords or regular expressions tailored to their specific needs. Although each project begins with one of these options, users can easily switch between modes via the hidden menu on the left-hand side of the project page (fig. 2).
Fig 1. Three markup functions offered by MARKUS
Fig 2. Switching between markup modes on a working page
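To give a sense of what a regular expression for keyword markup might look like, here is a minimal Python sketch. It is not MARKUS's own code or export format: the era-name list, the `tag_matches` helper, the `<span>` tag, and the sample sentence are all illustrative assumptions. The pattern looks for a reign-era name followed by a year written in Chinese numerals.

```python
import re

# Hypothetical keyword list of reign-era names; a real project would supply its own.
ERA_NAMES = ["貞觀", "開元", "元豐"]

# An era name, then a year in Chinese numerals (元 marks the first year), then 年.
DATE_PATTERN = re.compile("(?:" + "|".join(ERA_NAMES) + ")[元一二三四五六七八九十]+年")


def tag_matches(text, pattern, label):
    """Wrap every regex match in an illustrative inline tag (not MARKUS's format)."""
    return pattern.sub(
        lambda m: f'<span class="{label}">{m.group(0)}</span>', text
    )


# Made-up sample sentence used only to exercise the pattern.
sample = "貞觀三年,太宗謂侍臣曰。開元十五年,又置之。"
tagged = tag_matches(sample, DATE_PATTERN, "date")
print(tagged)
```

A pattern like this could be tested locally on a small excerpt before being pasted into the keyword markup field, which makes it easier to spot over-matching or missed variants early.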
MARKUS results can be explored further in whatever way the user wishes. After logging into an account, users can also export MARKUS files directly to several other platforms for further exploration, such as COMPARATIVUS, DOCUSKY, Palladio, and PLATIN. The comparison platform COMPARATIVUS, for example, compares two texts to identify meaningful overlaps. The identified passages can be displayed within the two texts, as a comparison table, or as a pie chart. This approach provides a direct, statistical summary of the similarities between the two texts, automatically generating visualizations and figures such as the location of the overlapping passages and the similarity ratio of each occurrence (fig. 3). This comparative approach is particularly well suited to the pre-modern Chinese hermeneutic tradition, in which scholars frequently cited existing works, and it suggests potential methods and perspectives for future projects.
Fig 3. Overlapped passages displayed as a pie chart in COMPARATIVUS
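The idea of locating overlapping passages and reporting a similarity ratio can be illustrated with Python's standard `difflib` module. This is only a rough sketch of the general technique and should not be read as the algorithm COMPARATIVUS actually uses; the `shared_passages` helper, the `min_length` threshold, and the second sample text (an invented commentary line) are assumptions made for the example.

```python
from difflib import SequenceMatcher


def shared_passages(text_a, text_b, min_length=4):
    """Return (position in A, position in B, passage) for each shared run of
    at least `min_length` characters, plus difflib's overall similarity ratio."""
    # autojunk=False disables difflib's heuristic that ignores very frequent
    # characters in longer texts.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    passages = [
        (block.a, block.b, text_a[block.a:block.a + block.size])
        for block in matcher.get_matching_blocks()
        if block.size >= min_length
    ]
    return passages, matcher.ratio()


# text_a is the opening of the Analects; text_b is an invented line quoting it.
text_a = "子曰學而時習之不亦說乎"
text_b = "朱子云學而時習之乃為學之始"

passages, ratio = shared_passages(text_a, text_b)
for pos_a, pos_b, passage in passages:
    print(f"A[{pos_a}] / B[{pos_b}]: {passage}")
print(f"similarity ratio: {ratio:.2f}")
```

Character-level matching is used here because classical Chinese is written without word boundaries; the `min_length` threshold filters out single-character coincidences that would otherwise count as overlaps.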
MARKUS is an excellent choice for beginners thanks to its user-friendly features. The platform has an intuitive interface that is easy to navigate and comes with clear instructions. The MARKUS homepage directs new users to tutorial videos and to a forum where users share their projects and experiences. Additional online resources are also readily available. On the one hand, the developers’ efforts to promote the platform have given the audience both a vision and technical guidance in the form of articles, presentations, and interviews (De Weerdt 2020). On the other hand, scholars have recognized the excellence of MARKUS in recent years, publishing numerous articles, introductions, and reports based on their experience with the platform and thereby creating a helpful pool of resources (Chen 2016; Yu and Li 2022).
MARKUS creates an inclusive environment that is accessible to scholars with varying technical backgrounds, and it does not require specialized coding or programming knowledge. It makes daily work with digitized texts more convenient, which benefits a wider, general audience. With MARKUS, users can read and explore texts, add comments to passages (which requires logging into an account), or consult reference dictionaries. In this sense, MARKUS can also be treated as a convenient platform that increases productivity. For example, linking the name of a historical figure to CBDB (the China Biographical Database) facilitates researchers’ work (fig. 4).
Fig 4. Example of biographical search with CBDB
While the sustainability and development of any platform are constrained by funding and resources, there are certain areas in which users hope to see MARKUS grow. One noticeable limitation is that the platform does not work with browsers other than Google Chrome. Additionally, the platform struggles to recognize named entities that are not included in its pre-built models, which sometimes requires additional work from the user. Finally, MARKUS currently performs well for the projects of individual researchers, but support for collaborative work would make it a more convenient platform for group projects.
Overall, MARKUS offers an impressive and valuable set of tools for scholars who work with digitized classical Chinese texts. Its ability to recognize and tag named entities, combined with its customizable analysis tools and ease of use, makes it an excellent platform for scholars of all technical backgrounds. Users are strongly encouraged to create a free account to access its full range of functions, such as saving projects and commenting on passages. The recent update, which includes a Korean plug-in, enables translingual digital research, a development that will be explored in an upcoming post by Elizabeth Lee at The Digital Orientalist. Stay tuned for this exciting update!
Chen, Song. “Why Humanists Should Fall in Love with ‘Big Data,’ and How?” Humanities Visualization, March 20, 2016. http://humnviz.blogs.bucknell.edu/welcome/why-humanists-should-fall-in-love-with-big-data-and-how/
De Weerdt, Hilde. “The Uses of Digital Philology in Tang-Song History – Part 2.” MARKUS forum, Nov. 4, 2018. https://app.chinese-empires.eu/forum/topic/31/the-uses-of-digital-philology-in-tang-song-history-part-2
De Weerdt, Hilde. “Creating, Linking, and Analyzing Chinese and Korean Datasets: Digital Text Annotation in MARKUS and COMPARATIVUS.” Journal of Chinese History 中國歷史學刊, Volume 4, Special Issue 2: Digital Humanities, July 2020: 519-527.
Yu, Yaxiu and Xin Li. “Research on Text Annotation Method of Ancient Works from the Perspective of Digital Humanities: A Case Study on MARKUS.” Big Data Research, 2022, 8 (6): 15-25.