Time and pragmatism in the digital humanities: TEI and Juxta Commons for South Asian manuscript collation

The best advice I had when starting out with digital humanities came from David Beavan and was simply to only do what my research actually needed. It’s easy to get completely carried away by a shiny, new tool, and before you know it you’ve spent all your research time playing around with software instead of doing any research. The question of time is crucial and, I find, one of the main reasons many are hesitant to take up digital humanities methodologies, especially during their PhDs. Quite simply, will they have the time?

When I was preparing to produce a critical edition of a text whose manuscripts typically run to 300 pages for my own PhD in South Asia Studies, the question of time was painfully acute. I needed a way to streamline my workflow with the manuscripts, but also wanted digital tools to aid me in my work, improve my accuracy and, hopefully, speed up the process. After experimenting with various tools, I decided on a combination of TEI for manuscript description and Juxta Commons to aid the collation of the manuscripts.

TEI

Much can be said about TEI (Text Encoding Initiative) and its myriad applications for digital representations of texts, both for research and library purposes. I will not go into all this here, but rather focus on how TEI can work quite out-of-the-box for manuscript collation projects.

TEI is expressed in XML, making it familiar to anyone with experience of mark-up languages. At its core, the complexity of TEI does not lie in its technicalities (these essentially being those of XML), but rather in its bewildering array of element tags, clustered in so-called modules befitting a variety of primary source materials. Working on a TEI project typically entails selecting and tweaking modules to create a schema, and then using a processing instruction at the top of each TEI file to point to that schema, which guides both TEI editors and users of the resulting file as to its correct use.
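As a rough illustration of what such a file looks like, the sketch below embeds a minimal TEI document in Python and checks that it parses as XML. The schema filename my_project.rng, the title and the verse text are invented placeholders, not taken from my project. The check uses only Python's standard library and confirms well-formedness, not validity against a schema.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, which every TEI element lives in.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical minimal TEI file. "my_project.rng" is a placeholder schema path;
# the xml-model processing instruction on the first line points editors to it.
SAMPLE = """<?xml-model href="my_project.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Witness A (placeholder title)</title></titleStmt>
      <publicationStmt><p>Unpublished working file</p></publicationStmt>
      <sourceDesc><p>Transcribed from a manuscript</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg n="1">
        <l n="1">first verse line</l>
        <l n="2">second verse line</l>
      </lg>
    </body>
  </text>
</TEI>
"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML (no schema validation is done)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(SAMPLE)
# Collect every verse line (<l>) anywhere in the document.
verse_lines = root.findall(f".//{{{TEI_NS}}}l")
print(is_well_formed(SAMPLE), len(verse_lines))
```

Checking against the schema itself needs a validating tool, such as an XML editor that reads the xml-model instruction; the standard library parser ignores it.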

Well-formed, consistent TEI adhering to a carefully defined schema is of course a virtue and an aim to strive for. In my current professional role, which is all about collections information and consistent entry of cataloguing metadata for manuscripts, this remains the case. However, the reality is that not every digital humanities project needs to be equally stringent. If you’re going to put together a stack of TEI files to use for a particular outcome, you need to do what works for you with the time you have available.

For instance, TEI provides a very rich selection of element tags to facilitate granular transcriptions. You can record multiple potential readings of a word, with comments as to which of the readings you prefer, and really go into as much detail as you like. However, this may not really be needed if you’re aiming to output a large amount of transcription for comparison.
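To make the trade-off concrete, here is a small sketch comparing a granular encoding of one verse line with a plain one. It assumes alternative readings are recorded with <choice> and <unclear>, which is one of several ways TEI allows, and it assumes the preferred reading is listed first. The words are placeholders, not real readings, and plain_text is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Granular encoding: two candidate readings for the second word (placeholder text).
GRANULAR = (
    '<l xmlns="http://www.tei-c.org/ns/1.0" n="1">word1 '
    '<choice><unclear cert="high">word2a</unclear>'
    '<unclear cert="low">word2b</unclear></choice> word3</l>'
)

# Pragmatic encoding: only the reading the transcriber settled on.
SIMPLE = '<l xmlns="http://www.tei-c.org/ns/1.0" n="1">word1 word2a word3</l>'


def plain_text(elem):
    """Flatten an element to text, keeping only the first reading of each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}choice":
            # Assumption: the first child of <choice> is the preferred reading.
            first = next(iter(child), None)
            if first is not None:
                parts.append(plain_text(first))
        else:
            parts.append(plain_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def normalised(xml_text):
    """Parse one line of TEI and return its text with whitespace collapsed."""
    return " ".join(plain_text(ET.fromstring(xml_text)).split())


print(normalised(GRANULAR))
print(normalised(SIMPLE))
```

Both encodings flatten to the same string for comparison purposes, which is why the extra granularity may not repay the time it takes when collation is the goal.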

Juxta

Juxta is a software toolkit that enables visual collation of TEI files. You can input several files and save them as a comparison set, and Juxta quickly gives you a visual overview of the differences between them, line by line and word by word.

The shading of words indicates the degree of difference between witnesses; clicking on one gives a list of the variations between the different input files. The uses of this function for manuscript collation are obvious; suddenly you don’t need to scrutinise different manuscripts line-by-line at the same time.

Screenshot from Juxta showing manuscript variations.

The right-hand boxes show variations of a single word between manuscripts, here giving the full line a different reading.
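For readers who have not seen this kind of comparison, the sketch below shows the general idea of a word-by-word comparison between two witnesses using difflib from Python's standard library. This is not Juxta's own algorithm, which I have not examined; the function name word_variants and the sample lines are invented for illustration.

```python
import difflib


def word_variants(witness_a, witness_b):
    """List the word-level differences between two lines of text.

    Each result is (kind, words_in_a, words_in_b), where kind is
    'replace', 'delete' or 'insert' as reported by difflib.
    """
    a_words = witness_a.split()
    b_words = witness_b.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    variants = []
    for kind, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if kind != "equal":
            variants.append((kind, a_words[a_start:a_end], b_words[b_start:b_end]))
    return variants


# Placeholder lines standing in for the same verse in two manuscripts.
line_a = "one two three four"
line_b = "one too three four five"

for kind, a_part, b_part in word_variants(line_a, line_b):
    print(kind, a_part, b_part)
```

A comparison like this only flags where the witnesses differ; deciding which reading is better remains a manual task.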

Juxta is not flawless. While it handles Devanagari input well, elongated vowels and nasalisations are sometimes missed. Similarly, small inconsistencies in the use of the <l> and <lg> elements in the TEI source files can throw whole sections into confusion.
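One way to catch this kind of inconsistency before collation is a quick check of the source files. The sketch below assumes a project convention that every <l> sits inside an <lg>; TEI itself does not require this, and the sample body and the helper lines_outside_groups are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Placeholder fragment: line 3 has been left outside any line group.
SAMPLE_BODY = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <lg n="1"><l n="1">a</l><l n="2">b</l></lg>
  <l n="3">c</l>
  <lg n="2"><l n="4">d</l></lg>
</body>
"""


def lines_outside_groups(xml_text):
    """Return the n attribute of every <l> whose parent is not an <lg>."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    stray = []
    for line in root.iter(f"{{{TEI_NS}}}l"):
        parent = parent_of.get(line)
        if parent is None or parent.tag != f"{{{TEI_NS}}}lg":
            stray.append(line.get("n"))
    return stray


print(lines_outside_groups(SAMPLE_BODY))
```

Running a check like this over each witness file shows whether the files follow the same structure, whichever convention the project has chosen.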

A more serious concern, however, is that Juxta in itself does not constitute a worthwhile collation. There are functions that allow output of collated files, but these are naturally quite mechanical. Juxta is only an aid to the manual task of assessing variations and choosing the better reading, not a replacement for it.

Digital, not conceptual?

More importantly, the use of Juxta and TEI, or any digital tool for that matter, is not a replacement for conceptual thinking around what exactly you’re doing. In my case, I was working with manuscripts written in Brajbhasha, an early modern North Indian vernacular that, like most early vernaculars, was not pinned down by rules and expectations of consistency. Linguistic variations can and do occur across a single manuscript. This is part of the texture of the language and something to highlight and showcase rather than bury in the text-critical footnotes.

Juxta actually made me more aware of this latter point by its ability to visually highlight the enormous number of so-called inessential variations between manuscripts. While they may be inessential to linguistic meaning, they’re part and parcel of the literary tradition as it appears to us in manuscripts.

In the end, the critical edition I developed using TEI and Juxta is just a standard OpenOffice document (I find OpenOffice handles Devanagari input better than Word). I did the final collation the old-school way, line-by-line, and entered the variations and so on in the footnotes. However, referring to TEI and Juxta alerted me to tons of issues I otherwise would have missed, and I do think it enabled me to look at the manuscripts with a more discerning eye.

The future

While I was preparing this article, I found to my horror that Juxta Commons is no longer supported by its developers, Performant Software. While the desktop app remains available and functioning (provided you can get Java 6 to run on your hardware), the browser-based version is gone. Juxta’s place will be taken by something called the FairCopy Editor, which promises to be able to do a lot of things with TEI. The developers did not get back to me before this piece was published, but it’s something to watch out for.

Cover image: Detail from MS Hindi 335. Photo: Wellcome Collection, CC BY 4.0.
