The Three Levels of Digital Humanities
I wish to sketch out a general framework that should make it easier to understand how you can start with using digital methods and data-driven analysis. All too often I see people struggle with a barrier to entry that is too high. And I think it need not be so. What I mean to say is that we should not let the more fancy, more advanced type of digital humanities dictate what we think DH is or can mean for us.
So today I wish to introduce three distinct levels of digital humanities. The first is really basic and simple: just doing your regular old research but with the help of a computer here and there, instead of pen and paper. It is the humanities, but done digitally. The second is a fairly vast landscape of intermediate DH, typified by the use of specific pieces of software and along with them a fair chunk of time that solely goes into getting the software to output a meaningful result. This level I would properly call Digital Humanities. The last level is superior in its technological complexity: you write your own software and mold and change the input as you see fit. No longer are you so much concerned with digitized surrogates of historical evidence, but with things that are truly (or born) digital, like plain text or metadata. It’s still humanities, on some level, but done through data. It’s computational in nature. This third level may sound attractive but it’s not something you can simply arrive it in most instances. For fields like ours, the first and second level are much more beneficial. Let’s take a closer look.
Starting out: doing your thing, digitally
What we see above is the hermeneutic circle of the academic workflow. Of course, you may identify other areas of major concern and you should think how this workflow fits your situation, but I hope that this particular one catches the majority of cases. We can actually start best on the left, with publishing. I include official ones, like books and articles, as well as unofficial ones, like grant proposals. In any case, for a good academic (like yourself) your publications invariably lead to new questions, ideas, or avenues. By finishing one thing, you start to think of new things to do (by things I mean of course: things that would benefit not just your curiosity but the advancement of the state of the art in your field). This unresolved question needs to be sharpened, tested and fully worked out into subquestions and project descriptions. That is the finding phase, which also includes the first stage of a project in which you collect your materials. By then, it’s time to read (or examine or interview) them. This leads to notes, both while reading and afterwards (notes of notes). Once you’ve collected enough notes, writing is upon you, to bring it all together into a publication, completing the circle.
Of course there is overlap between the stages, but generally they can be distinguished in this order. Now the thing is: each phase can benefit from digital tools. And Microsoft Word is barely the right choice for the writing phase, let alone the other ones.
- Publishing: think of a good filing system into folders and file names, slice up everything you have into categories of importance, with the most important stuff directly on your computer but with the option of storing less important stuff on external hard drives. Think of a backup strategy, like Zenodo for publications, Dropbox for personal files, and a local backup for your entire computer. Also notice how you may prefer certain file formats over others, such as .PDF over .DJVU. Perhaps even go so far as to convert files into your preferred file format, to ensure long-term storage.
- Finding: You know how to find new stuff through footnotes, bibliographies, catalogs, and so forth. But knowing how to find things in the digital world is a different game. How to get, for example, a digitized version of the first edition of a certain book? It could be interesting to get familiar with OSINT: open source intelligence. The same techniques they share in this domain can be applied to our own needs. Over time, of course, you will curate a list of preferred websites (if only in your head). Also invest in a bibliography manager, like Zotero. This greatly helps in managing your project, keeping track of what you have found and what you have read.
- Reading: Since by now you will have collected most of your material digitally, you need a digital reader to go with it. There are some amazing new innovations in this regard. I will only list a few of them: MarginNote, LiquidText, and Flexcil (check out PaperlessX YT channel).
- Note taking: Please don’t do it in Word. In this domain, too, there are some amazing innovations but it should be noted that we are in the midst of it with no definite winner (or end goal) in sight. Among the top contenders are Notion, Obsidian, and EverNote (check out Keep Productive YT channel). Other apps with a more specific purpose can help as well, like outliners such as OmniOutliner or task managers such as Todoist (though I think academic working is less short-term task-oriented and more long-term goal-oriented). Think of note taking as a file cabinet in which you want to leave snippets that interconnect with each other. You will use the connections later in the writing phase.
- Writing: Maybe you open Word at this stage. But did you know there are other options out there, too? If you work with right-to-left scripts, Mellel or Nisus Writer Pro may be a better choice. And Scrivener has an interesting take on writing as well. At this stage, a project management system can help keep you moving in the right direction, something you could set up with (aforementioned) Notion.
The next level: getting digital
In the next level, properly digital humanities if you will, you are no longer simply using digitized surrogates of your historical sources, but you are trying to apply an organization to it that is meaningful (and labor intensive) in its own way, or you are digitizing yourself. In other words, you create digital resources which may be only a stepping stone towards a publication but could be a real result of your project on its own. The use of specialized tools is almost unavoidable, and with it comes a dearth of tutorials and explanations. If something breaks, you may have to do some sleuthing to get it to work (looking at you, ScanTailor).
- Publishing: Next to paper publications, you may wish to produce digital resources. Think of a curated collection of images, or a careful selection of passages, or an annotated map. This is almost always hosted in a way that is easily accessible and manually browsable, i.e. with a graphical user interface. This brings along two factors: 1. the technology you make it in, 2. the platform you host it on. For both factors it is wise to talk with people around you to make the right choice. This choice should be informed by A. easiness to use, B. expected longevity, and C. flexibility to extract the intellectual contents and put into another technology or platform. Specific recommendations are hard to make, but try to use as mainstream and common technology as possible. E.g. selection of texts: maybe you can build it in Notion, which has an option to turn it into a publication, annotated maps: why not use Google Maps at first, curated images: try Omeka. Be aware of the existence of abandonware, even when it still looks shiny (case in point: Palladio).
- Obtaining: At this stage your goal is probably to get as much as possible of X. The nature of the digital world is such that the presence of one item is not unique. Things only become interesting once they are brought together in large numbers. Obtaining bulk data can be achieved by asking nicely. More and more, proper download options are available. At this stage, it probably is no trouble to you to do so by hand, going over item by item and clicking the download button. Something I have had fun with is SiteSucker. But be careful, because it is a powerful weapon that can quickly go overboard and attempt to download the entire internet.
- Organizing: You probably do not want to learn a programming language yet, to be fully equipped to download and obtain whatever you want. Your time is much better spent organizing things. Because organization is an editorial practice, and that is something for which you can use your trusted Humanities skills. Organization starts with folder and file names and quickly turns to using programs such as Zotero or Tropy, to flexibly type everything in, tag it, and rearrange it in whatever way you see fit.
- Annotating: So you have spent considerable time obtaining and organizing your evidence. Typically, you do not ‘close read and make notes’ with what you have by now, but instead you annotate it. You are enriching it with insights of your own or you draw out meaningful connections. Instead of looking into the contents of the evidence, you look at the structure. There are a great many ways to do it, depending on what your evidence is, what kind of annotations you want to make, and what your desired outcome is. Much ado is about Gephi. And I get: it’s crazy powerful and let’s you set up systems and networks within a graphical user interface. My sense is that we may be moving away from it. Gephi is a bit overpowered for most of our use cases and can get clunky. What we need is a lightweight graph-network analyzer with an intuitive interface. I must confess I do not know which piece of software offers this. Annotation can also be speech-to-text, breaking texts into grammatical units, manual object classification, making maps (either with or without GIS), etc. Last but not least, transcription is a big deal here. Transkribus has come out as a winner in this regard. For better or for worse. Give it a try and see if it fits your needs, just remember that if it doesn’t there still are other options out there.
- Writing: You will be producing two things: 1) one is a digital asset, probably a repository or interactive archive, where others can browse through your curated and annotated collection. 2) But you will also write a (few) paper(s). One may be about the digital aspect of your project, but likely at least one paper (or book) will be within your own discipline answering a conventional research question.
Beyond the norm. It’s all data now
- Archiving: So you have seen enough to know you want to get more out of it. You learn a programming language (*cough*, Python, *cough*), because you know that as much as pieces of software are great, the ultimate tool is a programming language so you can write your own software. At this stage, you will need data management, and not on the manual level, but in a more robust way. Learn the ropes of Git and semi-permanent repositories such as Zenodo. Learn how to store data programmatically and retrieve it.
- Collecting: It seems that in data analysis people are fond of speaking about collecting data. And my gut feelings tells me it’s more about simply going out to get it than asking for it. So learn how to scrape and manipulate data in bulk and you’ll be thankful later. If it’s accessible over the internet, you can get it.
- Cleaning: The problem of collecting data in an automated fashion is that you invariably get ‘dirt’ with it. Stuff isn’t formatted the way you want it or by selecting on a key term you accidentally took a bunch of things that are completely unrelated but happen to have that key term. So you need to clean it, and this you do, too, automated as much as possible. Careful though: cleaning is an editorial practice. Included in cleaning could be things like tokenization (used when analyzing natural language). Think of Mallet, or NLTK (and then weep when you realize it basically only works for modern English). Or a custom OCR run, using Kraken.
- Analyzing: Even when it comes down to diving deeply into the meaning of what you have in front of you, you will not make much use of traditional methods you learned in the Humanities, or, rather, you will do so in a more abstract manner. More immediately you will be making use of analytical methods provided by data analysis (e.g. rolling average) and perhaps even machine learning (e.g. clustering).
- Visualizing: This means that when it is time to disclose your results, the main dish will likely be a visualization, a graphic that tries to make concrete some very abstract relations or trends within your data. I am brief about this phase and the previous one because for each there is an incredible amount to learn if you want to and techniques are changing fairly fast.
I hope that presenting this schema in three parts gives you a sense of direction. On what level would you like to operate? What does that entail? I also hope that I have given some arguments in favor of the first level (Humanities, digital). It is a fine level to be at, with already a lot of improvements over one’s regular workflow. Most importantly, it gradually provides a springboard to attempt to get into the second level.