The Database of Medieval Chinese Texts: A Critical Overview (part one)

This is a guest post by Laurent Van Cutsem (Ghent University).
Part 2 is published as a separate post.



Scholars of pre-modern Chinese Buddhist texts today rely largely on two collections of electronic texts: (1) the SAT Daizōkyō Text Database 大正新脩大蔵経 (SAT版);1 and (2) the Chinese Buddhist Electronic Text Association 中華電子佛典協會 (CBETA).2 Over the last few decades, however, researchers have become increasingly aware of the potential shortcomings of relying solely on these two collections.

The most obvious drawback of SAT and CBETA is that they provide little to no information about the original extant witnesses of the texts we purport to study, obscuring both the codicological and the scribal characteristics of these precious historical documents. Exquisite, proofread copies of Buddhist sūtras produced in Chang’an 長安, manuscripts hastily copied by students in Dunhuang 敦煌, prints from woodblocks carved for the second Goryeo 高麗 Buddhist canon, and so forth are all reduced to the same sanitized and homogenized form.3

The second major concern is that these otherwise convenient collections are the digital products of large-scale editing projects, some of which began around a century ago. It is therefore not surprising that these editions contain a certain number of transcription mistakes and imprecise punctuation, especially in their editions of manuscripts. For instance, the Taishō edition (T.2861, vol. 85) of the Quanzhou Qianfo xinzhu zhuzushi song 泉州千佛新著諸祖師頌 (Or.8210/S.1635) contains more than fifty transcription mistakes, not to mention the problematic punctuation of the preface by Huiguan 慧觀 (d.u.). Similarly, Wendi L. Adamek has noted that the Taishō edition (T.2075, vol. 51) of the Lidai fabao ji 歷代法寶記 (Pelliot chinois 2125; Or.8210/S.516) contains quite a few mistakes.4

This is where the Database of Medieval Chinese Texts (DMCT) comes into play. More specifically, the DMCT aims to address some of these concerns where the need is currently greatest, namely the Dunhuang manuscripts.

Figure 1: Homepage of the DMCT

The DMCT is a collaborative project of the Ghent Centre for Buddhist Studies (GCBS) at Ghent University and the Chung-Hwa Institute of Buddhist Studies 中華佛學研究所 (CHIBS). It was initiated in 2014 by Christoph Anderl (editor-in-chief; Ghent University) in collaboration with Joey Hung 洪振洲 (CHIBS) and with the help of Lin Ching-Hui 林靜慧, Marcus Bingenheimer (Temple University), and Chang Po-Yung 張伯雍, all currently or formerly affiliated with the Dharma Drum Institute of Liberal Arts 法鼓文理學院 (DILA). The technical side of the DMCT, now based on MySQL, has been handled chiefly by Christian Bell and Jan Schrupp.5 The project has been funded by the Research Foundation – Flanders (FWO), the Special Research Fund (BOF) of Ghent University, CHIBS, and the Tianzhu Charitable Foundation 天柱慈善基金會. Partner institutions and projects include the Dunhuang Academy 敦煌研究院, Kyōto University 京都大学, and FROGBEAR.

Overview of the Functions of the DMCT

The DMCT currently consists of the following three components:

  1. The “Variants” module (under the “Databases” tab; see Figure 2): this module allows users to search for the so-called “variant character forms” (yitizi 異體字) of a written word on the basis of modern “standard characters” (guifanzi 規範字 or zhengzi 正字) across medieval Chinese textual witnesses, mostly from Dunhuang. This database is particularly welcome because it helps contextualize a variant character in a given manuscript against a much larger dataset than the one provided by, for example, the Jiaoyubu yitizi zidian 教育部異體字字典 (Dictionary of Variant Characters of the Ministry of Education) or by dictionaries such as Huang Zheng’s 黄征 Dunhuang suzidian 敦煌俗字典 (Dictionary of Demotic Characters from Dunhuang). Scholars can therefore gain a better idea of how common a given variant was in Dunhuang manuscripts, when that specific form began to be used more frequently, and which other forms were common in these manuscripts. Although the “Variants” module was not initially developed with the intention of providing data for quantitative research on variants, the editors aim to gradually improve the soundness of the dataset and to curate it over the coming years.

Figure 2: “Variants” module of the DMCT. The graph displayed is a common variant of 學 (xué, “to study”).

  2. The “Bibliography” module (under the “Databases” tab; see Figure 3): this module contains a list of bibliographic references related to medieval Chinese texts, with a focus on Chinese Buddhist sources, Chan Buddhism, and Chinese linguistics. Owing to limited resources and the editors’ focus on the “Variants” and “Texts” modules, the “Bibliography” section is still under construction and has not been updated in several years.

Figure 3: “Bibliography” module of the DMCT

  3. The “Texts” module (corresponding to the “Texts” tab; see Figure 4 and Figure 5): this module contains good- to excellent-quality TEI (Text Encoding Initiative)-based editions of medieval Chinese Buddhist texts. The DMCT offers diplomatic and regularized editions of specific textual witnesses, with a focus on the manuscripts discovered in Cave 17 at Dunhuang (e.g., “Sūtra lecture texts” 講經文, “Transformation texts” 變文, Chan Buddhist texts). Besides editions produced specifically for the DMCT (marked up by Lin Ching-Hui) and the Zutang ji 祖堂集 (K.1503) and related texts (marked up by Laurent Van Cutsem), the “Texts” module integrates the excellent editions of early Chan texts produced by Marcus Bingenheimer and Chang Po-Yung.

Figure 4: The “DMCT project” part of the “Texts” module of the DMCT

Figure 5: The “DMCT project” part of the “Texts” module of the DMCT: edition of Pelliot chinois 2187.
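How a single TEI file can yield both a diplomatic and a regularized edition can be illustrated with a minimal sketch. The DMCT’s actual encoding scheme is not described here, so the code assumes, purely for illustration, the common TEI pattern of recording the manuscript’s form in <orig> and its standard form in <reg> inside a <choice> element. The sample phrase is made up, and the helper `render` is hypothetical; it is not part of the DMCT.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical TEI fragment (made-up phrase): the manuscript form 斈 is kept
# in <orig>, and the standard character 學 is supplied in <reg>.
SAMPLE = f'<p xmlns="{TEI_NS}">勤<choice><orig>斈</orig><reg>學</reg></choice>佛法</p>'


def render(elem, mode):
    """Return the text of elem, keeping <orig> (diplomatic view) or
    <reg> (regularized view) inside every TEI <choice> element."""
    if mode not in ("diplomatic", "regularized"):
        raise ValueError(f"unknown mode: {mode!r}")
    keep = f"{{{TEI_NS}}}orig" if mode == "diplomatic" else f"{{{TEI_NS}}}reg"
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}choice":
            # Keep only the alternative that matches the requested view.
            for option in child:
                if option.tag == keep:
                    parts.append(render(option, mode))
        else:
            # Any other element: recurse into it unchanged.
            parts.append(render(child, mode))
        # Text that follows the child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
print(render(root, "diplomatic"))   # 勤斈佛法
print(render(root, "regularized"))  # 勤學佛法
```

Because both readings live in the same encoded file, a correction to either form only needs to be made once. This is one reason TEI markup suits editions that must serve both palaeographic and philological readers.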

Besides the publicly accessible modules, the “Databases” tab contains several supplementary modules under development. For instance, the DMCT presently includes modules for “Syntax,” “Sentence analysis,” and “Chan phrases” that will be made available to users at a later stage. The last of these, which grew out of a research project by Zeng Chen 曾辰 (Sichuan University 四川大學; Ghent University) on four-character phrases in Chan Buddhist texts, should be released shortly. Another module, on phonetic borrowings or “loan characters” (tongjiazi 通假字) in Dunhuang manuscripts, is in preparation.6

In summary, the DMCT currently offers two valuable modules for scholars working with pre-modern Chinese (Buddhist) sources: the “Variants” module and the “Texts” module. The extensive and growing dataset of variant characters is of great help to those researching or editing Dunhuang manuscripts and texts copied in more informal contexts. The “Texts” module, for its part, includes good-quality TEI-based editions of mostly “non-canonical” texts based on extant textual witnesses. It is particularly strong for Chan literature before the 11th century and contains some of the most authoritative editions of these texts. These two modules will be reviewed in more detail in subsequent contributions.


  1. The SAT Daizōkyō Text Database has, most importantly, digitized the early-20th-century Taishō shinshū daizōkyō 大正新脩大蔵経 (Takakusu Junjirō 高楠順次郎 et al. 1924–1932), along with additional materials.
  2. CBETA consists primarily of the Taishō shinshū daizōkyō (with revisions), the Shinsan Dai Nippon zokuzōkyō 新纂大日本續藏經 (Kawamura Kōshō 河村孝照 et al. 1975–1989), the Dazangjing bubian 大藏經補編 (Lan Jifu 藍吉富 1984–1985), and other digitized collections.
  3. In recent years, SAT and, to a lesser extent, CBETA have been addressing some of these issues by making high-quality digital reproductions of the original Taishō edition and other historical collections accessible, either through the implementation of IIIF (International Image Interoperability Framework) or simply by providing links to scans of the digitized collections.
  4. See Adamek (2007, 409, n.24).
  5. For more information about the technical framework of the DMCT, see Anderl (2020, 343–44).
  6. For more information on the modules in preparation, see Anderl (2020, 344 and 353–56).


