This is a guest post by Ruth Mostern, University of Pittsburgh
@ruthmostern
Like all terms for disciplines and methodologies, the digital humanities, a label in use since the early years of this century, refers to a multitude of heterogeneous approaches. Its practices collectively encompass every element of the scholarly workflow. Asianists, like other humanists, may bring computational methods to bear at any phase of their work: from discovering materials in libraries and databases; to organizing, reading, and analyzing materials that pertain to a topic of inquiry; to publishing and teaching about their insights. Twenty-first-century Asianists routinely search digitized text corpora and use computers to draft manuscripts, communicate with colleagues, and manage bibliographies. Yet we do not claim to be engaging in the digital humanities when we write emails or save documents. What, then, does it mean to assert that a project represents scholarship in the digital humanities?
Debates in the Digital Humanities is a hybrid print and digital book series that explores the wide range of communities and practices that might be associated with the term digital humanities: from data mining to game design, from visualization to librarianship. Some initiatives require supercomputing centers, bespoke code, or insights intended to advance research in computer and information science. Outside my work as an Asianist, in collaboration with technical director Karl Grossner, I lead the World Historical Gazetteer (WHG), a large-team content and infrastructure project to develop a linked open data platform for integrating datasets of historical place names. The WHG necessitates the development of new standards, new code, and new approaches to data integration. On the other hand, much of the content we index arrives in the form of simple spreadsheet files. Completing a submission that complies with the WHG Linked Places TSV format requires no special training (an Excel template is available), but each file reflects specialist knowledge and spatial history research.
I have written two books about Chinese history: Dividing the Realm in Order to Govern (Harvard Asia Center, 2011) and The Yellow River (Yale University Press, 2021). Both rely on computational practices, but none of those practices has been innovative since the 1970s. As with the WHG index, my books rely on spreadsheets of historical named places. I draw attestations of place-making events from historical sources and aggregate them, permitting quantitative analysis using methods that the literary critic Franco Moretti termed “distant reading” in his 2000 article “Conjectures on World Literature.” Once my team and I transform multiple spreadsheets into relational databases, we can run queries on them and output the results as maps and timelines. My books make their contributions in the realm of historical scholarship, not information science.
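To make the spreadsheet-to-database step more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the schema used for either book: the two tables (places and events), their columns, and the toy rows are hypothetical, chosen only to show how a shared place ID lets a single query join two spreadsheets and count dated events per place.

```python
import csv
import io
import sqlite3

# Toy stand-ins for two project spreadsheets. Every value here is hypothetical.
PLACES_CSV = """place_id,name
p1,Place A
p2,Place B
"""

EVENTS_CSV = """event_id,place_id,year,event_type
e1,p1,1048,flood
e2,p1,1194,flood
e3,p2,1072,county merger
e4,p2,1075,flood
"""


def build_database(places_csv: str, events_csv: str) -> sqlite3.Connection:
    """Load the two spreadsheets into an in-memory relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (place_id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE events ("
        "event_id TEXT PRIMARY KEY, "
        "place_id TEXT REFERENCES places(place_id), "
        "year INTEGER, "
        "event_type TEXT)"
    )
    for row in csv.DictReader(io.StringIO(places_csv)):
        conn.execute("INSERT INTO places VALUES (:place_id, :name)", row)
    for row in csv.DictReader(io.StringIO(events_csv)):
        # Convert the year explicitly so range comparisons are numeric.
        conn.execute(
            "INSERT INTO events VALUES (:event_id, :place_id, :year, :event_type)",
            {**row, "year": int(row["year"])},
        )
    return conn


def events_per_place(conn, event_type, start, end):
    """Count events of one type per place within an inclusive year range."""
    return conn.execute(
        """
        SELECT p.name, COUNT(*) AS n
        FROM events AS e JOIN places AS p ON e.place_id = p.place_id
        WHERE e.event_type = ? AND e.year BETWEEN ? AND ?
        GROUP BY p.name
        ORDER BY p.name
        """,
        (event_type, start, end),
    ).fetchall()


conn = build_database(PLACES_CSV, EVENTS_CSV)
print(events_per_place(conn, "flood", 1000, 1200))  # → [('Place A', 2), ('Place B', 1)]
```

Keeping place_id as the only link between the tables means a correction to a gazetteer entry applies at once to every event that cites that place.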
My research begins with printed material: reference works, tables, and annals collated from historical documents, which comprise lists of the historical phenomena I want to explore. My team and I digitize them, convert them into spreadsheets, standardize the formats of each spreadsheet, merge the spreadsheets, and conduct additional research to ensure that we record all the relevant attributes for each entity (each unique piece of information, such as a named place or a historical event) that we describe. We design each spreadsheet so that every entity retains a unique ID and so that we can track the provenance of any piece of information. These tasks take a great deal of time, and they require attention to best practices for data management, but they do not involve any special computational skills. Once we have done this, I can use my scholarly judgment to oversee the classification of entities into categories: for instance, categories of changes to local government units for Dividing the Realm, and categories of civil engineering activities for The Yellow River. To each evolving database, my team and I add contextual information such as GIS shapefiles and (for The Yellow River) rainfall data from the Monsoon Asia Drought Atlas. Each book includes an appendix with more detail about its data design. “The Digital Gazetteer of the Song Dynasty” is a website with additional information about the data for Dividing the Realm, and all of that data can be downloaded there as well. A similar website for The Yellow River data is currently in progress.
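A sketch of the standardize-and-merge step, under stated assumptions: two hypothetical source spreadsheets with different headers and number formats, a hand-written mapping of each onto one shared set of column names, a sequential unique ID, and a provenance string recording which source and row every record came from. The names SOURCE_A, SOURCE_B, COLUMN_MAPS, and the ID format are illustrative inventions, not the project's conventions.

```python
import csv
import io

# Two hypothetical source spreadsheets with inconsistent headers and number formats.
SOURCE_A = """Name,Year,Type
Place A,1048,flood
Place B,1072,county merger
"""

SOURCE_B = """place_name,date_ce,category
Place A,"1,194",Flood
"""

# Assumed mapping of each source's column names onto one standard schema.
COLUMN_MAPS = {
    "source_a": {"Name": "name", "Year": "year", "Type": "event_type"},
    "source_b": {"place_name": "name", "date_ce": "year", "category": "event_type"},
}


def standardize(row, column_map):
    """Rename columns and normalize value formats for one row."""
    out = {column_map[k]: v.strip() for k, v in row.items()}
    out["year"] = int(out["year"].replace(",", ""))  # "1,194" -> 1194
    out["event_type"] = out["event_type"].lower()    # "Flood" -> "flood"
    return out


def merge_sources(sources):
    """Stack standardized rows from every source, adding a unique ID and provenance."""
    merged = []
    for source_name, text in sources.items():
        column_map = COLUMN_MAPS[source_name]
        # Row 1 of each file is the header, so data rows start at row 2.
        for row_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
            record = standardize(row, column_map)
            record["id"] = f"ev{len(merged) + 1:05d}"
            record["provenance"] = f"{source_name}:row{row_no}"
            merged.append(record)
    return merged


merged = merge_sources({"source_a": SOURCE_A, "source_b": SOURCE_B})
for record in merged:
    print(record)
```

The provenance field lets anyone trace a merged record back to the source row it was transcribed from, even after its formatting has been normalized.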
I ask only simple questions of my data. The insights underlying each book arise from acts of counting and basic arithmetic: for example, how many events of various types occurred within a given date range, with the answers displayed on maps and timelines like the one below. Practices of designing, querying, and visualizing data underlie the research and writing process for each book. Through data exploration, I have been able to identify the patterns, anomalies, key locales, and turning points that have become the architecture of my narratives. Data analysis and visualization have permitted me to formulate arguments about long-term and large-scale historical change that would have been impossible to articulate had I begun only with reading documents. They have also offered clues about what to read in historical sources, since they have allowed me to pinpoint times, places, and phenomena that deserve special notice.
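The counting described here needs nothing more than a tally. As a hypothetical illustration (the records and the century-wide bins are assumptions, not data from either book), this sketch counts events by type within a date range and then groups one event type by period, which is the raw material for a simple timeline.

```python
from collections import Counter

# Hypothetical (year, event_type) records standing in for rows from a project database.
EVENTS = [
    (1048, "flood"),
    (1072, "county merger"),
    (1075, "flood"),
    (1120, "dike repair"),
    (1194, "flood"),
]


def count_by_type(events, start, end):
    """Count events of each type whose year falls within [start, end], inclusive."""
    return Counter(kind for year, kind in events if start <= year <= end)


def timeline(events, event_type, bin_size=100):
    """Count one event type per fixed-width period, sorted by period start year."""
    bins = Counter(
        (year // bin_size) * bin_size for year, kind in events if kind == event_type
    )
    return sorted(bins.items())


print(count_by_type(EVENTS, 1000, 1100))  # → Counter({'flood': 2, 'county merger': 1})
print(timeline(EVENTS, "flood"))          # → [(1000, 2), (1100, 1)]
```

Changing bin_size (for example to 50 or 25 years) is the kind of adjustment that can make a turning point stand out or disappear, so it is worth trying several widths.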

These projects have not presented technological challenges pertaining to innovative, complex, or unstable software or novel formats and standards. However, they have demanded that I develop several habits of mind for which I initially had few models, particularly about how to frame questions for which a large corpus of data might provide illuminating answers. Moreover, in the service of developing powerful classification systems, I have had to learn to disregard ambiguity in historical sources in favor of a more nomothetic approach. I have needed to be able to assert that a complex historical event is simply one instance of “a flood” or “a county merger.” For many kinds of historical research, finer distinctions would matter a great deal, but for the questions I have chosen to ask, they do not.
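One way to record such nomothetic decisions is a crosswalk that maps each term found in the sources to a single coarse category. In this hypothetical sketch the researcher fills in the mapping by hand, so the code only applies judgments already made; any term not yet in the crosswalk is set aside for human review rather than guessed. The terms and categories shown are invented for illustration.

```python
# Hypothetical crosswalk, filled in by hand by the researcher: each term found in the
# sources maps to exactly one coarse category, deliberately discarding finer distinctions.
CROSSWALK = {
    "breach": "flood",
    "inundation": "flood",
    "overflow": "flood",
    "two counties combined": "county merger",
    "county absorbed into neighbor": "county merger",
}


def classify(source_terms):
    """Split terms into (categorized pairs, unresolved terms needing human review)."""
    categorized, unresolved = [], []
    for term in source_terms:
        key = term.strip().lower()  # normalize case and stray whitespace before lookup
        if key in CROSSWALK:
            categorized.append((term, CROSSWALK[key]))
        else:
            unresolved.append(term)
    return categorized, unresolved


coded, review = classify(
    ["Breach", "overflow", "county absorbed into neighbor", "levee collapse"]
)
print(coded)
print(review)  # → ['levee collapse']
```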
Finally, it is essential to emphasize that these have been team projects. Data development is extraordinarily time-consuming. I have been honored to work with student assistants without whom the work would never have been completed. Kaiqi Hua was a stalwart partner for both books, and my acknowledgements and appendixes name the other students who worked on each project. Moreover, each book rests on a partnership with an extraordinarily talented data science expert: Elijah Meeks for Dividing the Realm, and Ryan Horne for The Yellow River. Neither book relies on twenty-first-century technology. Even so, I have only rudimentary skills in the twentieth-century methodologies on which they rest: geographical information systems, database management, and data visualization. To my regret, there are still significant disincentives for graduate students and faculty in history to acquire such expertise, though some people manage to do so anyway. I end this post with my perennial hope for changes to the profession that will make it easier for scholarship like mine to flourish and for everyone who contributes to a collaborative initiative to receive credit.
