Can we really trust Digital Humanities?

When I write posts about Digital Humanities, whether here or on my blog in Russian, I tend to talk about things I wish I had known when I was just starting out in this field: I create tutorials, collect useful links, or share pointers to neat and useful techniques. Today, however, I want to write about something different. Today’s text is a rant about the state of things in DH – a critique, but one coming from someone who is in the field and believes in its potential.

What I want to talk about are the assumptions that lie at the basis of quantitative text analysis – the methods used in the context of “distant reading” that seem to arouse the most suspicion among “traditional” humanists. Rightfully so, I must say; but without a good understanding of the quantitative methods, these critics tend to divert their attention to more familiar objects of criticism. This time, I want to join them.

Digital Humanities are constantly pressed to establish their credibility – the field has been condemned for many sins, from failing to reach any results at all to being the main tool for facilitating neoliberalism in universities. Quantitative approaches as a whole raise serious questions in feminist studies as well (see, for example, Tickner, 2005). As a consequence, we are in a hurry to produce results. Even Franco Moretti, who had long been an established scholar by the time he published his “Distant Reading”, admitted that he felt the pressure and the disappointment with the lack of tangible results.

So, there is a drive to produce results and to produce them fast – to show how the new approaches are different, more efficient, more precise, or even how they can become a way out of the trap of constant self-doubt that 20th-century literary criticism has led us into. At the same time, doing DH is not just a matter of asserting oneself, but also a temptation: it is hard not to be seduced by the idea of a method that relies on mathematics, that someone else has already proven to reveal some kind of truth about ourselves, and that thus renders further scrutiny unnecessary – a safe haven, a Deus ex Machina for the humanities, where “Machina” takes on a new sense and becomes a computer. Too often we choose to believe in this god, to press the “show results” button in whichever tool we use without going into the details, hoping that the answer we get is the answer to our questions and not to someone else’s. These results then get published in a form traditional for the humanities – without too many technical details or formulas, which makes them “more accessible” and creates an illusion that the foundation we are standing on is firm, and that all we need to do is build upon it. But by doing so and not questioning the base assumptions, we undermine the field: it stops being methodologically driven, because “using a computer” is not a methodology.

Statistics cannot be trusted any more than human minds can. As soon as statistics are applied to anything concrete, they stop being “objective” and, before we even notice, become a practice of interpretation.

The reluctance to look deeper into the methods leads to sloppy research. In many cases, people treat different metrics that seem to solve the same problem as interchangeable – this happens, for instance, when metrics are used to find collocations. Moreover, many of the metrics in use today were adopted for practical applications and became widespread simply because they were “good enough” for those purposes – this is the core of what “heuristic” methods are: often not only is there no explanation of why they work, but there is also not much research on how they might be wrong or biased. This is, for example, the case of the tf-idf metric, which is widely used for topic modeling and for the creation of stop lists.
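
To make this concrete, here is a minimal sketch, in Python and on a toy three-sentence corpus of my own invention, of how tf-idf is typically turned into stop-word candidates: score every word, then treat the lowest-scoring ones as “insignificant”. Even in this tiny example, a content word (“records”) ends up scored exactly like “the”, simply because it happens to occur in every document – precisely the kind of behaviour that goes unexamined when the metric is taken on faith.

```python
# A minimal sketch of the heuristic described above: rank words by their
# average tf-idf score and treat the lowest-scoring ones as stop-word
# candidates. The toy corpus and the cut-off are arbitrary illustrations,
# not taken from any of the studies mentioned in this post.
import math
from collections import Counter

corpus = [
    "the king ordered the minister to compile the records",
    "the minister presented the records to the king",
    "the scribe copied the records of the court",
]

docs = [doc.split() for doc in corpus]
n_docs = len(docs)

# document frequency: in how many documents does each word occur?
df = Counter(word for doc in docs for word in set(doc))

def tf_idf(word, doc):
    tf = doc.count(word) / len(doc)
    idf = math.log(n_docs / df[word])  # zero when the word appears in every document
    return tf * idf

# average tf-idf of each word across the documents that contain it
avg_score = {
    word: sum(tf_idf(word, doc) for doc in docs if word in doc) / df[word]
    for word in df
}

# the lowest-scoring words are the usual stop-word candidates;
# note how sensitive this ranking is to the corpus and to the exact formula
for word, score in sorted(avg_score.items(), key=lambda kv: kv[1])[:5]:
    print(f"{word:10s} {score:.3f}")
```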

Now, the stop lists. These are lists of words that are excluded from a corpus before analysis because they are so frequent and ubiquitous that they provide no meaningful information about the contents. They have also been the main source of my frustration for the past several months and the main reason I am writing this post. Let me use them as an example.

Those who work with modern languages don’t usually give stop lists much thought – they come pre-compiled and readily available in major language processing tools. They clean unnecessary noise from the text and, because everyone uses them, ensure the reproducibility of our research. Well, since I work with Classical Chinese, I had to look into this, and I have some bad news. In many cases, these lists were created with a metric that intuitively “worked”, then extra words were excluded manually, and the lengths of the lists were tweaked until the results seemed just right. Then the rules set for one specific task were extrapolated to other tasks and languages.
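
For illustration, this is roughly what that routine step looks like in practice – a minimal sketch using NLTK’s pre-compiled English list (any major package would do; the example sentence is my own):

```python
# Load a ready-made stop list from a standard NLP package and filter a
# tokenized text against it – the step most pipelines perform without a
# second thought.
import nltk

nltk.download("stopwords", quiet=True)  # fetch NLTK's pre-compiled stop lists
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))

text = "The king ordered the minister to compile the records of the court"
tokens = text.lower().split()  # naive whitespace tokenization, enough for a sketch

filtered = [tok for tok in tokens if tok not in stop_words]
print(filtered)  # ['king', 'ordered', 'minister', 'compile', 'records', 'court']
```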

If we look at the comparison of stop lists for the English language (Nothman et al., 2018), we will see that over half of the words in these lists can be called controversial. Moreover, there is often no documentation that justifies the choices made or describes the methods that led to them. And when old lists are reused, or when new ones are created specifically for DH research, humanists who are trained to scrutinize human decisions still sometimes choose to believe blindly in “established practices.”
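
The disagreement is easy to see for yourself. Here is a minimal sketch of the kind of comparison Nothman et al. (2018) carried out systematically, putting the ready-made English lists of two popular packages (NLTK and scikit-learn, chosen simply because both ship one) side by side:

```python
# Compare two widely used English stop lists and see how much they disagree.
import nltk

nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

nltk_list = set(stopwords.words("english"))
sklearn_list = set(ENGLISH_STOP_WORDS)

print("only in NLTK:        ", len(nltk_list - sklearn_list))
print("only in scikit-learn:", len(sklearn_list - nltk_list))
print("shared:              ", len(nltk_list & sklearn_list))

# a few words that one "standard" list removes and the other keeps
print(sorted(sklearn_list - nltk_list)[:10])
```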

Slingerland et al. (2017) provide a list of the stop words used in their research, but there is no explanation of why it was compiled the way it was. This raises many questions, as it is extremely aggressive to exclude numerous meaningful words, such as “woman” 女, from the search results. Would a similar move even be possible in qualitative research? Others (Allen et al., 2017) offer minimal documentation, but the list itself is inaccessible, and without it the results are rather meaningless (as of this writing, I have not been able to retrieve the list even after contacting the authors multiple times). For articles published in Chinese, one is lucky if it is even indicated whether a stop list was used at all. Let me remind you: using a stop list is one of the basic operations in text pre-processing, but it partially relies on the manual exclusion of words until the results of the subsequent operations satisfy the researcher – and in many cases this researcher does not justify these exclusions, or pays no attention to the ones made by others.

Given where things stand at the moment, for modern and pre-modern languages alike, not using stop lists at all and simply noting which results the researcher chose not to consider starts to look rather reasonable – at least then there is only one source of data misrepresentation, as opposed to a heuristic statistical method that hunts for “insignificant” words, complemented by a subjective and undocumented decision to exclude even more words in order to reach a certain list length, which was in turn recommended forty years ago by a computer scientist who dealt with a completely different task and language.
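
A hypothetical sketch of what this could look like in practice: keep the full counts, and attach an explicit, stated reason to every word the researcher sets aside. The words, counts, and reasons below are invented for illustration only.

```python
# Instead of silently removing words before analysis, keep the full results
# and record every exclusion explicitly, so the reader can see (and contest)
# each choice. All counts and reasons here are invented for illustration.
from collections import Counter

counts = Counter({"之": 520, "而": 310, "王": 131, "民": 88, "女": 47})

# exclusions are made after the analysis, one by one, each with a reason
excluded = {
    "之": "grammatical particle, uninformative for this question",
    "而": "grammatical particle, uninformative for this question",
}

reported = {w: c for w, c in counts.items() if w not in excluded}

print("Reported counts:", reported)
print("Set aside by the researcher:")
for word, reason in excluded.items():
    print(f"  {word}: {reason}")
```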

Can we really talk about the absence of bias in digital approaches while treating quantitative methods as a black box? On what grounds can we assert that “distant reading” complements “close” reading if we do not know what our method actually is and apply it in a slapdash manner?

So, as we try to produce meaningful quantitative analyses of our languages, histories, and cultures, it is worth remembering that some of the metrics we use were created by computer scientists who were not particularly interested in our research questions and who were pressed to build tools that were “good enough” for a particular business purpose. Sometimes no one knows why these tools work, and surprisingly often no one can remember how they came to be. By ignoring this, we do not advance the humanities; we escape to a place that makes us believe that the absence of ultimate answers is not an inherent feature of human existence, but a question of technology.

References

Allen, Colin, Hongliang Luo, Jaimie Murdock, Jianghuai Pu, Xiaohong Wang, Yanjie Zhai, and Kun Zhao. “Topic Modeling the Hàn diăn Ancient Classics.” Journal of Cultural Analytics (2017).

Slingerland, Edward, Ryan Nichols, Kristoffer Nielbo, and Carson Logan. “The Distant Reading of Religious Texts: A ‘Big Data’ Approach to Mind-Body Concepts in Early China.” Journal of the American Academy of Religion 85, no. 4 (2017): 985–1016.

Nothman, Joel, Hanmin Qin, and Roman Yurchak. “Stop Word Lists in Free Open-source Software Packages.” Proceedings of the Workshop for NLP Open Source Software (2018): 7–12.

Tickner, J. Ann. “What Is Your Research Program? Some Feminist Answers to International Relations Methodological Questions.” International Studies Quarterly 49, no. 1 (2005): 1–21.
