The Books in China Database

This is a guest post by Prof. Joseph Dennis.


The Books in China Database (BIC) is designed to address a difficult problem in historical fields for which much of our knowledge comes from extant books: We can read about a book, but often have little understanding of how widely it circulated, when it was produced or how important it was in relation to other books. The project’s goal is to combine computing power with scholarly research to create data sets of books and associated information tied to specific places and times, then use that data in an open access online tool for researching and visualizing historical book circulation. 

BIC was created by Joseph Dennis, Professor of History at the University of Wisconsin, and the Local Gazetteers Research Group in Department III of the Max Planck Institute for the History of Science (MPIWG), directed by Professor Dagmar Schäfer, with Senior Research Scholar and Digital Content Curator Dr. Chen Shih-Pei and IT Developer Calvin Yeh. The underlying data set was produced using Local Gazetteer Research Tools (LoGaRT), software developed by MPIWG that makes possible semi-automated tagging of texts found in 4,410 digitized local gazetteers provided by the Staatsbibliothek zu Berlin and Harvard-Yenching Library. Most BIC data was extracted from lists of books owned by institutions and commemorative records of book acquisitions. (This project was presented at the 2022 DO conference; you can watch the talk here.)

BIC has 32,387 lines of data on books held in 706 school libraries, local government offices, and other sites from the fourteenth to the early twentieth centuries. Of these sites, 630 were schools, thirty-eight were government administrative offices, and thirty-eight were other locations. For each book there is a “Storage Record” consisting of associated information. All Storage Records include time and location information that makes possible chronological visualizations and spatial mapping. Additional information, such as the manner of acquisition, number of volumes, author, format, and publisher, is included when available in the original source. There are 10,746 distinct titles in the data set, including variant titles for the same text. BIC is a platform to which additional data sets can be added in the future.

Data Extraction and Processing for BIC.

Most data for BIC was gathered using LoGaRT’s “Section Search,” which makes targeted searching within named gazetteer sections possible. For example, suppose you wanted to search for medical prescription/formula books that have fang 方 in their title. If you do a full-text search in the 4000 titles of Zhongguo fangzhi ku 中國方志庫 you get over one million hits, far too many to examine. But if you use LoGaRT Section Search to extract lists of books and combine them into a data set, then it is easy to find the medical books. Searching book titles for fang 方 in BIC results in 130 hits. A scholar can then manually exclude non-medical titles and be left with a list of about 50 works. This method has the added benefit of revealing titles that a researcher never knew existed.
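The narrowing step described above can be sketched in a few lines of Python. This is a minimal illustration, not BIC's or LoGaRT's actual code: the record layout, the field names `title` and `site`, and the three sample titles are assumptions made for the example.

```python
# Hypothetical book-list records; the field names and sample titles are
# illustrative and are not drawn from the BIC data set.
records = [
    {"title": "千金方", "site": "School A"},
    {"title": "方輿勝覽", "site": "School B"},
    {"title": "史記", "site": "School A"},
]


def titles_containing(records, char):
    """Return the records whose title contains the given character."""
    return [r for r in records if char in r["title"]]


# Step 1: search the extracted titles only, not the full gazetteer text.
hits = titles_containing(records, "方")

# Step 2: manual review. A scholar would drop non-medical titles such as
# the geographical work 方輿勝覽; here that judgment is simply hard-coded.
non_medical = {"方輿勝覽"}
medical = [r for r in hits if r["title"] not in non_medical]

print([r["title"] for r in medical])
```

Searching a list of titles rather than full text is what keeps the number of hits small enough to review by hand.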

After book lists were found, they were tagged in LoGaRT. The primary tag “book” (shu 書) was used to organize the data, with one line of data per book. LoGaRT users can choose existing tags or compose their own using regular expressions.
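To give a sense of what tagging with a regular expression might look like, here is a hedged sketch using Python's standard `re` module. The pattern, the group names, and the sample passage are all assumptions for illustration. LoGaRT's own tag definitions are not shown in this post and may work quite differently.

```python
import re

# Hypothetical pattern: a title, then a Chinese numeral, then a counter word.
# Real book lists vary widely, so an actual tag would need more cases.
ENTRY = re.compile(
    r"(?P<title>[^、，。\s]+?)"            # shortest run of non-separator characters
    r"(?P<count>[一二三四五六七八九十百]+)"  # quantity written in Chinese numerals
    r"(?P<unit>部|本|卷|冊)"               # counter word that closes the entry
)


def tag_books(text):
    """Return one dict per book entry found in a book-list passage."""
    return [m.groupdict() for m in ENTRY.finditer(text)]


sample = "五經大全一部、四書大全二部、性理大全一部"
for row in tag_books(sample):
    print(row)
```

Each match becomes one row, mirroring the one-line-per-book layout. A pattern this simple would misread a title in which a numeral is directly followed by a counter word, which is one reason the tagging is semi-automated rather than fully automatic.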

Because the ability to search for titles is critical to this project, book titles were made into two data columns, “original” and “refined.” This makes it possible to search BIC for both actual and expected titles. The original title is as it appears in the digitized source, including missing, obscured, or mistaken characters. When such characters are identifiable in other sources the refined title includes corrections. For titles using a variant character in the original, the refined title uses the standard character as listed in the Ministry of Education Variant Character Dictionary (jiaoyubu yitizi zidian 教育部異體字字典). 
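The benefit of keeping both columns can be shown with a small sketch. The rows below are invented for illustration, with □ standing in for an illegible character. The column names `original` and `refined` follow the description above, but the function is hypothetical and is not BIC's search code.

```python
# Hypothetical rows with the two title columns described above.
rows = [
    {"original": "史記", "refined": "史記"},
    {"original": "大明□典", "refined": "大明會典"},  # □ marks an illegible character
]


def search_titles(rows, query):
    """Match the query against either the original or the refined title."""
    return [
        r for r in rows
        if query in r["original"] or query in r["refined"]
    ]


# A search for the expected title finds the damaged record ...
print(search_titles(rows, "會典"))
# ... and a search for the damaged form still works as well.
print(search_titles(rows, "大明□"))
```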

Determining Book Locations and Dates

For BIC to produce spatial and chronological visualizations, a location and date range had to be provided for each book. Locations include the institution housing the book and geocoordinates. Most geocoordinates were assigned automatically by LoGaRT using latitude and longitude data from Harvard’s China Historical Geographic Information System (CHGIS). These coordinates correspond to the administrative seat of the territory that is the subject of the gazetteer. Confucian schools and government offices were in the territorial seats, so CHGIS coordinates are typically close to the school location. In some cases, however, coordinates had to be researched and added manually.

Book dates were determined through tagged date information, additional research, and application of a set of dating rules. Each book in BIC has been assigned a Start Year and End Year to enable time mapping. In some cases, the dates are precise while in others they are educated guesses. The data set contains time information in nine fields:

  • Known Accession Date
  • Acquired By Date
  • Acquired After Date
  • Known Extant Date
  • Date Lost By
  • Known Loss Date
  • Known Serial Acquisition Dates
  • Approximate Acquisition Date
  • Approximate Loss Date
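The dating rules themselves are not spelled out in this post, so the following is only a sketch of how a Start Year and End Year might be derived from fields like those listed above. The snake_case keys, the order of precedence, and the example years are all assumptions and should not be read as the project's actual rules.

```python
from typing import Optional, Tuple

# Assumed precedence: prefer the most precise field that is filled in.
# This ordering is hypothetical and is not taken from BIC's documentation.
START_PRECEDENCE = [
    "known_accession_date",
    "acquired_by_date",
    "acquired_after_date",
    "approximate_acquisition_date",
]
END_PRECEDENCE = [
    "known_loss_date",
    "date_lost_by",
    "approximate_loss_date",
    "known_extant_date",
]


def first_known(record: dict, fields: list) -> Optional[int]:
    """Return the first non-missing year among the given fields."""
    for name in fields:
        year = record.get(name)
        if year is not None:
            return year
    return None


def year_range(record: dict) -> Tuple[Optional[int], Optional[int]]:
    """Derive a (start_year, end_year) pair for one storage record."""
    return (
        first_known(record, START_PRECEDENCE),
        first_known(record, END_PRECEDENCE),
    )


# Invented example: acquired by 1540, last attested in the library in 1602.
example = {"acquired_by_date": 1540, "known_extant_date": 1602}
print(year_range(example))
```

A fallback chain of this kind makes explicit which ranges rest on precise dates and which are educated guesses.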

Using BIC in Research.

The following section provides examples of how BIC can be used in research.

1. Researching a Single Title.

Suppose you wanted to use BIC to research the history of the Jifu tong zhi 畿輔通志, the gazetteer of Zhili 直隸 Province, which had three Qing editions: 1683, 1735, and 1884. If you do a Basic Search in BIC, you get thirty-eight records from thirty-two libraries. These results can be viewed as a data table, timeline, or map. The data table shows that thirty-one of the gazetteers containing the records were published between 1735 and 1884, and that many other titles on the book lists where the Jifu tong zhi appears can be dated to that period. Thus, the data table suggests that the majority of records are for the 1735 edition. 

The timeline view shows that the Jifu tong zhi had two peaks in school libraries, one in the 1760s and one in the 1890s, not long after publication of the 1735 and 1884 editions. 


Figure 1. Distribution of the Jifu tong zhi over time.
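One plausible way to build a timeline like Figure 1 from Start Year and End Year values is to count, for each decade, how many records were in a library at some point during that decade. The sketch below assumes that approach. The function name and the sample year ranges are invented, and BIC's own timeline may be computed differently.

```python
from collections import Counter


def holdings_per_decade(ranges):
    """Count the records whose (start, end) range overlaps each decade."""
    counts = Counter()
    for start, end in ranges:
        first = start // 10 * 10  # decade containing the start year
        last = end // 10 * 10     # decade containing the end year
        for decade in range(first, last + 10, 10):
            counts[decade] += 1
    return dict(sorted(counts.items()))


# Invented (start_year, end_year) pairs for three records of one title.
sample = [(1762, 1785), (1768, 1771), (1891, 1899)]
print(holdings_per_decade(sample))
```

Peaks in such counts correspond to the decades in which the most libraries are recorded as holding the title.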

BIC’s mapping tool adds spatial information. Figure 2 below shows school libraries in Zhili Province that owned a copy of the Jifu tong zhi. The pattern suggests that decisions to acquire it were made at the prefectural and county level; it was not issued to all schools by the provincial or central government. 


Figure 2. Distribution of the Jifu tong zhi.

2. Researching Multiple Titles.

Suppose you were interested in finding the most common books connected to gender in schools. You could start by searching for terms such as 女, 男, 婦, 妻, 妾, etc. The most common books having one of these terms in their titles are Jiaonü yigui 教女遺規 (Bequeathed Rules for the Teaching of Girls) by Chen Hongmou 陳宏謀, and Lie nü zhuan 烈女傳 (Biographies of Virtuous Women). For Jiaonü yigui, BIC’s results page shows that the title appears on thirty-three book lists, the timeline shows that its peak was in the 1760s-1770s, and the data table reveals that Chen distributed Jiaonü yigui to Confucian schools in the provinces where he was serving as provincial governor.

3. Comparisons.

BIC’s comparison tool makes it possible to visualize the frequency of multiple titles together. Suppose you wanted to compare the appearance of the Shiji 史記 (Grand Scribes’ Records) as a stand-alone title to its appearance in sets of dynastic histories during the Qing dynasty. You could search for “史記 十七史 二十一史  二十四史” and add the results into the comparison tool. The results in Figure 3 show that book lists more frequently listed the Shiji separately and that the Twenty-four Histories overtook the Twenty-one Histories in the early 1900s.


Figure 3. Comparison of Shiji, Seventeen Histories, Twenty-one Histories, and Twenty-four Histories.

4. Quantitative Studies.

BIC lists titles by total number of records and locations, which makes quantitative studies possible. If you wanted to find the ten most common books in Ming school libraries, filtering the entire data set for the Ming dynasty yields the results in Table 1:

Table 1. The Ten Most Common Books in Ming School Libraries.

Book Title | Number of Records | Number of Libraries
大明律 | 76 | 58
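A ranking with these two columns can be approximated by counting, for each title, both the number of records and the number of distinct libraries. The sketch below uses invented (title, library) pairs rather than BIC data, and `rank_titles` is a hypothetical helper, not part of BIC.

```python
from collections import defaultdict


def rank_titles(records, top_n=10):
    """Rank titles by record count, also reporting distinct libraries."""
    record_counts = defaultdict(int)
    libraries = defaultdict(set)
    for title, library in records:
        record_counts[title] += 1
        libraries[title].add(library)
    # Sort by descending record count, breaking ties by title.
    ranked = sorted(record_counts, key=lambda t: (-record_counts[t], t))
    return [(t, record_counts[t], len(libraries[t])) for t in ranked[:top_n]]


# Invented sample: one library lists the same title twice.
sample = [
    ("大明律", "School A"),
    ("大明律", "School A"),
    ("大明律", "School B"),
    ("史記", "School A"),
]
for title, n_records, n_libraries in rank_titles(sample):
    print(title, n_records, n_libraries)
```

Keeping the two counts separate matters because a single library can contribute several records for the same title, so records can outnumber libraries.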


The Books in China Database makes it possible to get a better understanding of the relative commonality of particular books, their spatial distribution, and when they appeared and disappeared in local institutions. This information will enable a better understanding of the history of books and their importance in different times and places.
