The Magic of Philology and Indexing

Tracking Drug Names Across Language, Time, Space and Knowledge Domains to Produce New Visions of Traditional Medicine

This is a guest post authored by Michael Stanley-Baker, Christopher S.G. Khoo and Faizah Zakaria. Find their information at the end of this piece.

Digital Humanities is akin to “critical thinking,” so vaunted in the humanities, because it allows us to reinterpret existing primary materials in new ways, according to researchers’ critical interests. These unanticipated new discoveries are somewhat akin to “discovering” existing archaeological materials “in the basement” of museums and archives—already there in the record, but previously unnoticed, quietly waiting to be brought to the fore. 

One fundamental way DH does this is by allowing us to re-index old materials in entirely new ways. Mining old texts and organising them according to critical interest is much more powerful than simple “comprehensive” or arithmetic summaries that reiterate old assumptions. It offers the potential to re-discover the past, to re-organise materials and explore them in different ways, making new connections, even without generating “new” information.

The Polyglot Asian Medicine project investigates the history of Asian drugs using a philological orientation, by transforming print and manuscript publications into machine-actionable data. In this way it develops new ways to interact with the ancient past, connect it with the lived present, and possibly shape the future development of heritage medicines.

In this post I describe how we modelled the interconnections between different domains of knowledge using tabular data initially, and produced a knowledge graph which allows users to search, navigate, and explore them to make novel discoveries. The digital medium is far more effective than print for reproducing the philological sophistication of local knowledge systems, while also allowing for links to rigorous, valid, modern scientific data. Through modelling and interconnecting different knowledge styles, we can begin to unpack the problems of the current state of ethnopharmacology – the lack of simplistic standardisation of these systems is not a bug, it is a feature.  The power of interlinked data and digital multi-media is that they allow us to connect these knowledge scenes without degradation of indigenous knowledge styles.

Verified and Updated Species

The project's drug name synonymy is one example of how digital media can help us better map traditional and modern knowledge. A multilingual synonymy culled from existing print and online dictionaries of Malay and Chinese drug names, it contains:

  • 42,000 Chinese drug names
  • 3,600 Malay names
  • 8,300 botanical species

We worked with Kew Gardens’ Medicinal Plant Names Services (MPNS) to verify and update the plant species named in the dictionaries according to the most recent botanical standards. MPNS cites the project in turn.

The synonymy thus not only collected existing data but also normalised it, correcting spellings and replacing the older botanical synonyms found in early dictionaries with up-to-date scientific names.

We also collected the regions where the plants grow, the parts of the plants used, and the textual provenance of first mentions in the Chinese tradition. This involved converting discursive dictionary entries into regularised, computable data using Python, regular expressions and human checking.
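As a rough illustration of that extraction step, the sketch below parses one entry into a regularised record and flags incomplete records for human checking. The bracketed entry layout, the field names and the `parse_entry` helper are all invented for this example, and the regions are placeholders; the real dictionary entries are far less regular than this.

```python
import re

# Hypothetical entry layout, invented for illustration only.
# "Region A" and "Region B" are placeholders, not real data.
RAW_ENTRY = (
    "丁香 [species] Syzygium aromaticum (L.) Merr. & L.M.Perry "
    "[part] flower [regions] Region A; Region B [provenance] 開寶本草"
)

# A field is a bracketed label followed by everything up to the next bracket.
FIELD_PATTERN = re.compile(r"\[(\w+)\]\s*([^\[]+)")

EXPECTED_FIELDS = {"species", "part", "regions", "provenance"}


def parse_entry(raw):
    """Turn one discursive entry into a regularised record (a dict)."""
    head, _, _ = raw.partition("[")
    record = {"name": head.strip()}
    for field, value in FIELD_PATTERN.findall(raw):
        record[field] = value.strip()
    # Regions are multi-valued, so split them into a list.
    if "regions" in record:
        record["regions"] = [
            r.strip() for r in record["regions"].split(";") if r.strip()
        ]
    # Anything the pattern could not fill in is routed to a human checker.
    record["needs_review"] = not EXPECTED_FIELDS.issubset(record)
    return record


record = parse_entry(RAW_ENTRY)
print(record["name"], record["species"], record["regions"])
```

The `needs_review` flag stands in for the human-checking pass: the regular expression does the bulk work, and a person looks at whatever it could not regularise.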

Accounting for Pluralism with Critical Philology

The complexity of Chinese pharmacology beggars the imagination of print dictionaries, which figure a standardised, binary, one-to-one relationship between name and object. The periodicity, unconscious reduplication, and differentiation of “primary” (zhuming 主名) and “alternate” names (bieming 別名), not to mention the regionality of the authors and local names, make for a much more complex ontology. The Polyglot Synonymy differs from the normative approach used in herbal manuals and WHO national pharmacopoeias, which often privilege a single species for practical use and regulation. The Synonymy is descriptive and philological in concept, and seeks to represent the diversity of lived practice over historical time. Without making truth claims about fixed identities, it describes changes in name over time and across sources. It does so using a critical digital philology approach (Stanley-Baker and Chong 2019, 2023), documenting which sources make which claims.

One way the synonymy applies critical digital philology is through the “footnote icon”, which reveals the source of any claim when hovered over. Researchers using the site are thus offered the multiple botanical species that different sources have identified with the ethnonym, and at the same time can call out which sources make which claims. This is also visualised in the knowledge graph as different nodes. Clicking on any of the species nodes provides citation data in the sidebar.

Fig. 1 Bai Bu Botanical Names and Citation Call-out
Fig. 2 Bai Bu Primary Name and Scientific Name nodes

Reconstructing Multiple Ontologies with a Knowledge Graph

The synonymy recreates the knowledge structure of Chinese pharmacology as a digital ontology. This was a complex and intellectually demanding task because, as Lena Springer (2022) writes, Chinese pharmacology, or bencao 本草 culture in historical and modern China, is a composite of many different thought styles (Fleck 1935/1981) compiled over longue-durée editorial processes, and it incorporates the knowledge of many different actor types, classes and levels of vernacularity. It encompasses the literary, the practical, the clinical, the economic, the geographic and the botanical.

The botanical names serve as an indexical link that connects across different languages. Where Chinese and Malay drugs are both attributed to the same species, these links appear in the Knowledge Graph. These links were not knowable before we updated and standardised the species names, following Kew’s guidance. Now we are able to connect across the different languages and traditions. In the example seen here, the scientific name for cloves, Syzygium aromaticum (L.) Merr. & L.M.Perry, connects to seven different Chinese names and two Malay names, Buah chengkeh and cengkih.
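The linking logic can be sketched in a few lines. In the sketch below, the lookup table stands in for the verification step carried out with Kew, and `index_by_species` is a hypothetical helper. The drug names and the accepted species name come from this post; which botanical name each source dictionary printed is an assumption made purely to show why normalisation matters.

```python
from collections import defaultdict

ACCEPTED = "Syzygium aromaticum (L.) Merr. & L.M.Perry"
OLDER_SYNONYM = "Eugenia caryophyllata Thunb."

# Stand-in for the verified synonym table: printed name -> accepted name.
SYNONYM_TO_ACCEPTED = {OLDER_SYNONYM: ACCEPTED, ACCEPTED: ACCEPTED}

# (language, drug name, botanical name as printed in the source dictionary).
# The third column is assumed for illustration.
ROWS = [
    ("Chinese", "丁香", ACCEPTED),
    ("Chinese", "母丁香", OLDER_SYNONYM),
    ("Malay", "Buah chengkeh", OLDER_SYNONYM),
    ("Malay", "cengkih", ACCEPTED),
]


def index_by_species(rows, synonyms):
    """Group drug names by accepted species, then by language."""
    index = defaultdict(lambda: defaultdict(list))
    for language, name, printed in rows:
        accepted = synonyms.get(printed, printed)
        index[accepted][language].append(name)
    return index


linked = index_by_species(ROWS, SYNONYM_TO_ACCEPTED)
unlinked = index_by_species(ROWS, {})  # no normalisation applied

print(len(unlinked), "species keys before normalisation")
print(len(linked), "species key after normalisation")
print(dict(linked[ACCEPTED]))
```

Without the synonym table the same plant is split across two keys, and the Chinese and Malay names under the older synonym never meet those under the accepted name.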

The botanical terms also serve as a gateway to external resources such as the Global Biodiversity Information Facility (GBIF) and Plants of the World Online (POWO), which contain maps and images of the plants; the Biodiversity Heritage Library (BHL), which contains early editions of early modern scientific literature; and MPNS, which displays names for the species in dozens of other languages. This led to over 25 different data tables, organised as below.

To remain true to the complexity and flexibility of this data, and to link across and represent different epistemologies, we adopted a visual exploration approach. This led us to develop the Polyglot Medicine Knowledge Graph, implemented in a Neo4j graph database management system and visualised on a Web interface using Cytoscape.js. It is titled “Knowledge Graph” because it uses a graph (i.e., network) representation, in which nodes represent entities or concepts and links (or edges) represent types of relationships. It is a representation of the knowledge architectures in historical pharmacology. Instead of focussing on fixed entities, the knowledge graph models the interconnections between different data and data types, making transparent how they are linked, and allowing users to make novel discoveries.
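To make the node-and-edge idea concrete, here is a minimal sketch that holds a tiny graph in plain Python and converts it to the elements JSON that Cytoscape.js accepts (objects with a `group` and a `data` field). The node types, the relation name `IDENTIFIED_AS`, the identifiers and the `to_cytoscape` helper are assumptions for illustration, not the project's actual schema.

```python
import json

# Node types and ids are hypothetical; labels come from the cloves example.
NODES = [
    {"id": "sp1", "label": "Syzygium aromaticum (L.) Merr. & L.M.Perry",
     "type": "Species"},
    {"id": "zh1", "label": "丁香", "type": "ChineseName"},
    {"id": "ms1", "label": "cengkih", "type": "MalayName"},
]

# Edges as (source id, relation type, target id); the relation is assumed.
EDGES = [
    ("zh1", "IDENTIFIED_AS", "sp1"),
    ("ms1", "IDENTIFIED_AS", "sp1"),
]


def to_cytoscape(nodes, edges):
    """Build a Cytoscape.js-style list of node and edge elements."""
    elements = [{"group": "nodes", "data": dict(node)} for node in nodes]
    for i, (source, relation, target) in enumerate(edges):
        elements.append({
            "group": "edges",
            "data": {
                "id": f"e{i}",
                "source": source,
                "target": target,
                "label": relation,
            },
        })
    return elements


print(json.dumps(to_cytoscape(NODES, EDGES), ensure_ascii=False, indent=2))
```

Keeping the relation type on every edge is what lets an interface show how two items are linked, and not merely that they are.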

New Ways to Explore Historical Name Data

The Knowledge Graph allows new ways of indexing the Chinese Bencao tradition that did not exist before. Chinese drug dictionaries often do not index botanical species, and often introduce variant or mistaken spellings. Thus, it was not possible before to easily discover all the Chinese names related to a single species. We updated the botanical names in order to link to other languages and databases, but this had a secondary effect of improving exploration of the Chinese data itself. The Knowledge Graph can display all the ethnonyms for a species together, in one place, though they may have been scattered in different locations in the Chinese source dictionaries.

These names are just the beginning. The seven Chinese names of Syzygium aromaticum shown above are just those listed as Primary Names. Opening up one Primary Name reveals a host of nodes detailing Alternate Names, Regions where it grows, Common names in English, the Part of Organism used for that drug and the Provenance, or first recorded instance of the drug name. Some Alternate Names also have their own provenance records. The graph below shows the map just for dingxiang 丁香 alone. Each data type is shown along the edges.

All of the nodes carry data like this, so the synonymy can supply dozens of data points and alternate names for the different drugs made from that species. This allows for entirely new forms of exploration of the data.

The graph below displays all the expanded data associated with the species, while concealing the regional data, for clarity (a hide/reveal toggle is available in the display). A close reading of the graph shows that the different names represent different plant parts, indicated by the purple nodes: the flower, twigs, bark, fruit, the root, oil and a distillate. Revealed this way, we are now able to explore the rich material culture of this plant in Chinese medical history. This is also relevant for new drug discovery, as different parts of the plant contain different chemicals.
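The expand-and-hide behaviour described here can be approximated as follows. This is a sketch over an in-memory edge list, not the Neo4j queries the site runs; the relation names, the `expand` helper and the "Region A" placeholder are assumptions, while the names, plant parts and provenance titles are taken from this post.

```python
from collections import defaultdict

# (source node, relation type, target node); relation names are hypothetical.
EDGES = [
    ("丁香樹皮", "PART_USED", "Bark"),
    ("丁香樹皮", "PROVENANCE", "海藥本草"),
    ("丁香樹皮", "GROWS_IN", "Region A"),  # placeholder region
    ("丁香枝", "PART_USED", "Twigs"),
    ("丁香枝", "PROVENANCE", "本草綱目"),
]


def expand(node, edges, hidden=frozenset()):
    """Return the neighbours of a node grouped by relation type.

    Relation types listed in `hidden` are left out, which mimics
    switching a data type off in the display.
    """
    grouped = defaultdict(list)
    for source, relation, target in edges:
        if source == node and relation not in hidden:
            grouped[relation].append(target)
    return dict(grouped)


print(expand("丁香樹皮", EDGES))
print(expand("丁香樹皮", EDGES, hidden={"GROWS_IN"}))
```

Hiding a relation type leaves the underlying data untouched; only the view changes, which is why regional data can be concealed for clarity and revealed again later.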

Not only can we discover what parts of plants were used, we can discover when they came into use, again out of a lucky augmentation that “seemed like a good idea at the time.” The provenance of a drug name has traditionally been recorded as the title of the text where the name first appeared. While we were compiling the provenance data, Feng Yuchen 馮宇晨 of NTU augmented this with a thoroughgoing bibliographic review and collection of dates of publication, author names and dates and even, where known, their place of birth and historical GIS data (GIS data will be posted in the next update). Now when users click on the provenance title (Green Nodes), it reveals background data in the information window, showing when that name first came into use.  

This introduces an entirely novel way to research the entry of new drugs into the Chinese pharmacopoeic tradition. For example, we can discover the introduction of different uses of the same species, cloves, over time:

Fruit (母丁香): 5th century, 雷公炮製論
Seed (丁子香): 6th century, 齊民要術
Bark (丁香樹皮): 8th century, 海藥本草
Root (丁香根): 10th century, 開寶本草
Twigs (丁香枝): 16th century, 本草綱目
Essential oil (丁香油/丁香露): 18th century, 藥性考/本草拾遺
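A timeline like the one above falls out of a simple sort once each provenance record carries a date. The sketch below assumes a flat record of (plant part, drug name, century, source title) and a hypothetical `timeline` helper; the project stores fuller bibliographic data than this. The Kaibao bencao 開寶本草 is entered as 10th century, the date given for it in the Data Confirmation section.

```python
# (plant part, drug name, century of first recorded mention, source title)
# Deliberately listed out of order to show the sort doing the work.
FIRST_MENTIONS = [
    ("Twigs", "丁香枝", 16, "本草綱目"),
    ("Fruit", "母丁香", 5, "雷公炮製論"),
    ("Essential oil", "丁香油/丁香露", 18, "藥性考/本草拾遺"),
    ("Root", "丁香根", 10, "開寶本草"),
    ("Seed", "丁子香", 6, "齊民要術"),
    ("Bark", "丁香樹皮", 8, "海藥本草"),
]


def timeline(records):
    """Order first mentions chronologically by century."""
    return sorted(records, key=lambda record: record[2])


for part, name, century, title in timeline(FIRST_MENTIONS):
    print(f"{century:>2}th century  {part:<14} {name}  ({title})")
```

The ordering is only as good as the claims behind it: each century here is what a reference work reports, not an independently verified date.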

Readers sensitive to the historiography of Chinese medicine will recognise the 8th century Haiyao bencao 海藥本草 which introduced many drugs from India, Persia and Central Asia; that the 16th century was in the midst of a massive influx of silver and goods via South East Asia and the Philippines; and that the 18th century was when European missionary medicine began to enter China.  Already from this simple timeline one can put together a potted history of the use of cloves in traditions other than the Chinese, the likely periods and vectors of contact, and use this as a framework for further research.

Data Confirmation

To clarify, I want to stress again that the Synonymy does not aim to make truth claims, but assembles the claims of other reference works. It makes no claims about provenance, but indicates the dictionary sources which do. Users can often find two different Provenances for a term, because they are listed in different dictionaries. For example, the provenance for 丁香 is listed as the 10th century Kaibao bencao 開寶本草 by the Zhonghua bencao 中華本草, whereas the Zhongyao dacidian 中藥大辭典 cites the seventh century Yao Xing Lun 藥性論.
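One way to hold competing claims side by side is to store every statement together with the reference work that makes it, and never to pick a winner. The sketch below does this for the 丁香 example; the `Claim` record and the `claims_about` helper are hypothetical names, not the project's data model.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    subject: str    # the drug name the claim is about
    predicate: str  # what kind of claim it is
    value: str      # what is claimed
    source: str     # the reference work making the claim


CLAIMS = [
    Claim("丁香", "provenance", "開寶本草", "中華本草"),
    Claim("丁香", "provenance", "藥性論", "中藥大辭典"),
]


def claims_about(subject, predicate, claims):
    """Map each claimed value to the sources asserting it."""
    by_value = {}
    for claim in claims:
        if claim.subject == subject and claim.predicate == predicate:
            by_value.setdefault(claim.value, []).append(claim.source)
    return by_value


for value, sources in claims_about("丁香", "provenance", CLAIMS).items():
    print(f"{value} (according to {', '.join(sources)})")
```

Because the source travels with every claim, a disagreement between dictionaries shows up as two entries to investigate, not as a silent overwrite.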

Users can pursue the question further by turning to the Drugs Across Asia_China database, which contains over 390 Chinese medical texts, all indexed by author/date.  They can search for the term there and may find earlier uses of the term.  There, the earliest mention of 丁香 is in the 華氏中藏經, attributed to Eastern Han physician Hua Tuo 華佗 (ca 110/140-207/208 CE). The authenticity of this text is itself debated, and a point for further research.

Thus the Synonymy is produced with the philologist and the researcher in mind, offering new ways to explore the data, assemble interesting hypotheses, provide leads for further research, and fully document all of the resources along the way. This is a major advantage over print dictionaries, which fail to disclose the sources from which they reach their conclusions. The Neo4j Knowledge Graph format is much better suited to navigating the complexities of Chinese pharmacological history, and testifies to the advantages afforded by Digital Humanities methods over print media.

___

Michael Stanley-Baker is an assistant professor in History and Medical Humanities at Nanyang Technological University, where he researches the histories of Daoism, Chinese medicine and their intersections with other Asian medical systems. You can follow him on X/Twitter, Academia, or here.

Christopher S.G. Khoo is an associate professor at the Wee Kim Wee School of Communication & Information, Nanyang Technological University, where he researches knowledge graph applications, natural language processing and academic writing. Follow his work on his blog, and his publications here.

Faizah Zakaria is an assistant professor in the Departments of Southeast Asian Studies and Malay Studies at the National University of Singapore. She works in the field of religion and ecology, with a particular interest in heritage Malay medicine. She tweets here and her website can be found at www.faizahzak.com
