The China Biographical Database (CBDB): An Introduction and Conversation with Professor Peter Bol

The China Biographical Database

The China Biographical Database (CBDB) is ‘a relational database of biographical information for China before the early twentieth century.’ The February 2024 release contains biographical information on 535,181 individuals concentrated between the 7th and early 20th centuries. Users can browse this data online, or offline by downloading the Access or SQLite databases from the CBDB website. As a ‘relational database,’ the CBDB is geared towards social network analysis (SNA) and prosopography. Both methods aim to identify patterns of relations among large numbers of people, but whereas prosopography looks for patterns across a broad range of relationships (kinship, social, economic, geographical, professional, institutional), SNA tends to focus more narrowly on social relations. At one end of the spectrum, the CBDB can provide data in response to simple questions, such as, ‘Where does [named individual] come from?’ At the other end, the CBDB can return data for much more complex prosopographical queries: ‘Did marriage patterns among high-ranking civil officials differ from those among low-ranking officials in the Northern Song?’ ‘And did these patterns change over time?’ ‘How did the geographic extent of social relations among officeholders shift during the Ming Dynasty?’

Today, the CBDB project is coordinated by an international and multi-institutional executive committee chaired by Professor Peter Bol at Harvard University. In this article, I’ll briefly introduce the database before speaking with Professor Bol about CBDB’s development and future plans.

Visualization of CBDB’s biographical coverage by period (https://inindex.com/biog)

Prosopography Databases

Advances in database technology over the past few decades have produced increasingly sophisticated digital prosopography tools. Today, the CBDB is one among several biographical databases freely available online: Prosopography of the Byzantine World (PBW), Jaina Prosopography (covered in a Digital Orientalist guide), Prosopographical Database of Indic Texts (PANDIT) (covered in a Digital Orientalist article), and Prosopography of Anglo-Saxon England (PASE). However, each of these databases has a slightly different emphasis owing to differences in editorial priorities and source materials. Whereas the CBDB aims to provide biographical data for recorded inhabitants of China before the early twentieth century, the Prosopography of the Byzantine World project aims to provide biographical data on every name mentioned in sources from the Byzantine World. Only the former aims to provide useful statistical information for the study of society at large.

Differences in emphases also arise from the specific sources these databases draw upon. For instance, the CBDB sources include modern syntheses of biographical data (e.g. The Index for Biographical Materials on Song Dynasty Figures 宋人傳記資料索引), traditional biographical records (e.g. Official Dynastic Histories 正史列傳, Epitaphs 墓誌銘), social associations from literary collections, evidence for office holding from modern and traditional sources (e.g. Chronologies of Prefectureships 郡守年表), as well as other databases (e.g. Ming-Qing Women’s Writings). Consequently, the CBDB contains richer biographical data for elite, high-ranking male officials.

The Structure of the CBDB

The CBDB User Guide provides a detailed description of the database’s structure and organization, which I briefly summarize here. The database consists of three basic kinds of code tables:

1) Code tables describing ‘entities’ (e.g. people, places, social institutions, texts, bureaucratic organizations, etc.) and their attributes (e.g. for people: gender, date of birth, date of death, ethnicity, etc.). ‘People’ constitute the focal entity around which other information is organized.

2) Code tables describing relations between entities (e.g. people-place relations, people-bureaucratic office relations, people-text relations, etc.)

3) Code tables describing types of relations between entities (e.g. descriptions of people-place relations might include: birthplace, a place they moved to, the place they were buried)
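The three kinds of tables listed above can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database and answers the simple question of where a person comes from. All table names, column names, and rows here (person, place, person_place, "Person A", "Place X", and so on) are hypothetical placeholders chosen for illustration; they are not the actual CBDB schema or data, for which the User Guide is the authority.

```python
import sqlite3


def build_place_db():
    """Build a toy in-memory database; names and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- (1) entity tables: people and places, with a few attributes
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE place (place_id INTEGER PRIMARY KEY, name TEXT);
        -- (3) table describing the types of person-place relation
        CREATE TABLE person_place_type (
            type_id INTEGER PRIMARY KEY, description TEXT);
        -- (2) relation table linking a person to a place with a given type
        CREATE TABLE person_place (
            person_id INTEGER REFERENCES person(person_id),
            place_id INTEGER REFERENCES place(place_id),
            type_id INTEGER REFERENCES person_place_type(type_id));
    """)
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, "Person A"), (2, "Person B")])
    conn.executemany("INSERT INTO place VALUES (?, ?)",
                     [(1, "Place X"), (2, "Place Y")])
    conn.executemany("INSERT INTO person_place_type VALUES (?, ?)",
                     [(1, "birthplace"), (2, "place moved to"),
                      (3, "burial place")])
    conn.executemany("INSERT INTO person_place VALUES (?, ?, ?)",
                     [(1, 1, 1), (1, 2, 3), (2, 2, 1)])
    return conn


def places_for(conn, person_name):
    """Return (relation type, place name) pairs for one person."""
    rows = conn.execute("""
        SELECT t.description, pl.name
        FROM person_place AS pp
        JOIN person AS p ON p.person_id = pp.person_id
        JOIN place AS pl ON pl.place_id = pp.place_id
        JOIN person_place_type AS t ON t.type_id = pp.type_id
        WHERE p.name = ?
        ORDER BY t.type_id
    """, (person_name,))
    return rows.fetchall()
```

Calling places_for(build_place_db(), "Person A") returns one row per recorded place association. Keeping the relation types in their own table means a new kind of association can be added as a row rather than as a new column.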

This structure allows sophisticated modelling of people’s relationships with their social, kinship, geographic, and institutional environments. Users can frame a query in terms of relations with the central entity—people. For example, a user investigating whether certain kinds of offices were hereditary in specific periods can use the database to find out whether people related to certain officeholders also held those same offices during a given period.
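The hereditary-office question can be sketched as a join between a kinship table and an office-holding table. This is a minimal illustration under stated assumptions: the tables (person, kinship, posting), their columns, and every row are hypothetical placeholders, not the CBDB's own tables or records, and a real study would also need to decide how to treat dates, office name variants, and more distant kin.

```python
import sqlite3


def build_office_db():
    """Build a toy in-memory database; names and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
        -- kin_id is the relative of person_id; kin_type names the relation
        CREATE TABLE kinship (person_id INTEGER, kin_id INTEGER, kin_type TEXT);
        -- one row per office held, with the year of appointment
        CREATE TABLE posting (person_id INTEGER, office TEXT, year INTEGER);
    """)
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, "Person A"), (2, "Person B"), (3, "Person C")])
    conn.executemany("INSERT INTO kinship VALUES (?, ?, ?)",
                     [(2, 1, "father")])
    conn.executemany("INSERT INTO posting VALUES (?, ?, ?)",
                     [(1, "Office X", 1050), (2, "Office X", 1080),
                      (3, "Office X", 1085), (2, "Office Y", 1090)])
    return conn


def kin_sharing_office(conn, start_year, end_year):
    """Find people and kin who both held the same office within a period."""
    rows = conn.execute("""
        SELECT p.name, k.name, kin.kin_type, a.office
        FROM kinship AS kin
        JOIN posting AS a ON a.person_id = kin.person_id
        JOIN posting AS b ON b.person_id = kin.kin_id
                         AND b.office = a.office
        JOIN person AS p ON p.person_id = kin.person_id
        JOIN person AS k ON k.person_id = kin.kin_id
        WHERE a.year BETWEEN ? AND ?
          AND b.year BETWEEN ? AND ?
        ORDER BY p.name, a.office
    """, (start_year, end_year, start_year, end_year))
    return rows.fetchall()
```

In this toy data, Person B and their father Person A both held Office X, so a query over a period covering both appointments returns that pair, while Person C, who has no recorded kin, is excluded. Narrowing the period so that only one of the two appointments falls inside it returns no rows.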

Guides and Tutorials

In addition to the 150-page User Guide, the CBDB provides a helpful series of videos on its YouTube page. This page contains links to a comprehensive set of database tutorials and recorded workshops, covering topics including how to download and install the database, how to conduct database searches based on specific biographical queries (official position, family and social relationships), and how to export search results to QGIS and Pajek, among others.

CBDB Development and Future Directions

The CBDB began with the work of Song Dynasty social historian Robert Hartwell (1932-96), who bequeathed his biographical database of 25,000 Song civil servants to the Harvard-Yenching Institute (HYI) in 1996. Since then, the database has undergone reformatting, restructuring, and expansion.

When asked about the likelihood of expanding the database’s temporal scope, Professor Bol explained that work is already underway to include entries from the Six Dynasties (220-589 CE) and pre-Qin (before 221 BCE) periods. Still, there remain challenges with extracting sufficiently granular information for Han dynasty official titles. A different set of challenges confronts attempts to expand into the 20th-century Republican (1912-49) and PRC (1949-) periods. According to Professor Bol, adding basic information like office titles for the Republican period would be relatively straightforward. However, new kinds of code tables would need to be added to account for the institutional relations that became biographically significant during this period, for instance university and high-school affiliations, or affiliations to banks and other major companies.

The CBDB will not be adding biographical data from the PRC period for different reasons: “The PRC we cannot do, and cannot do for a very simple reason. From our perspective, serious prosopographical research requires kinship.” For obvious reasons, the collection and publication of kinship data for PRC officeholders will remain politically problematic: “The leaders don’t want their kinship known, in contrast to the past where people were extremely proud of their kinship relations and wanted it to be known.”

In conversation, Professor Bol explained that three kinds of collaboration have shaped the CBDB into its current form: collaboration with database builders who have contributed data to the CBDB, collaboration with individual scholars who want to develop databases for their own fields, and collaboration with online platforms (e.g. MARKUS). These types of collaboration are expected to yield new additions in the near future. Among these, Professor Bol described the addition of biographical information on some 10,000 doctors, 6,000 individuals recorded in dictionaries, and some of the several million biographical records associated with office incumbents in Qing court reports.

References:

“China Biographical Database Project (CBDB).” Home. Accessed March 3, 2024. https://projects.iq.harvard.edu/cbdb/home.

Fuller, Michael A, ed. 2023. The China Biographical Database User’s Guide. China Biographical Database Project: Harvard University, Academia Sinica, Peking University. https://projects.iq.harvard.edu/sites/projects.iq.harvard.edu/files/cbdb/files/cbdb_users_guide.pdf.
