Criticus: A tool to move from transcription to collation

Introduction

In my last few posts, I explored how Python can be used for codicological analysis of manuscripts transcribed on the New Testament Virtual Manuscript Room (NTVMR). Codicological analysis is not the only thing you can do with a digital transcription, however, especially if you have transcribed the same text from several different manuscripts. These transcriptions can be compared with one another against a base text to reveal all of the alternate readings. This process is called collating.[1]

Collating Transcriptions

The Institute for Textual Scholarship and Electronic Editing (ITSEE) at the University of Birmingham, UK, has developed a tool to collate digital transcriptions. This tool is called the standalone collation editor and can be found on ITSEE’s GitHub. The collation editor is downloaded onto your local machine and collates locally stored files. Therefore, the first thing you’ll need is your transcriptions! In my last post, I demonstrated how to use Python and the Requests library to retrieve a transcription from the NTVMR. A whole list of transcriptions could be downloaded programmatically in the same way.

Alternatively, I also discovered another tool developed by ITSEE that can be used for transcriptions. They have a standalone transcription editor that can be found here. This editor works just like the one found on the NTVMR, but transcriptions are saved locally from the outset. You are also able to bring your own base text, in case you are transcribing something other than biblical text. The standalone transcription editor does not have any images of the manuscripts available. It just contains a transcription editor, so you will still need to view images on the NTVMR, Center for the Study of New Testament Manuscripts (CSNTM), or elsewhere.

Once you have transcriptions saved locally, you will also need some other dependencies to use the collation editor, such as Python 3 and Java Runtime Environment version 8 or higher. Even with the transcriptions saved and the dependencies installed, however, the collation editor is not ready to use. A transcription from the NTVMR or the standalone transcription editor is an XML file, whereas the collation editor, according to its documentation, takes JSON files as input. The documentation explains how each JSON file should be formatted and what it looks like. I will explain more about using the collation editor itself in a future post.

So, how do we convert our XML transcriptions into JSON files? The least appealing option is to write the JSON files by hand. Surely, XML transcriptions could be programmatically converted to JSON. Rather than reinvent the wheel, though, check out Criticus.
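To give a sense of what such a conversion involves, here is a minimal sketch. It is not how Criticus works internally. It assumes a heavily simplified TEI file in which every verse is an `<ab>` element with an `n` attribute and every word is a `<w>` element. The sample text, the function name, and the layout of the output dictionary are all illustrative assumptions, and the output is not the schema the collation editor expects.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by TEI documents.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical, heavily simplified transcription used only for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <ab n="B06K13V1"><w>πασα</w><w>ψυχη</w></ab>
    <ab n="B06K13V2"><w>ωστε</w></ab>
  </body></text>
</TEI>"""


def verses_to_tokens(tei_xml: str) -> dict:
    """Map each verse reference to a list of numbered word tokens."""
    root = ET.fromstring(tei_xml)
    verses = {}
    for ab in root.iter(f"{TEI_NS}ab"):
        ref = ab.get("n")
        if ref is None:
            continue  # skip blocks that carry no verse reference
        # itertext() also collects text from tags nested inside <w>.
        words = ["".join(w.itertext()).strip() for w in ab.iter(f"{TEI_NS}w")]
        verses[ref] = [{"index": i, "t": word} for i, word in enumerate(words, start=1)]
    return verses


if __name__ == "__main__":
    print(json.dumps(verses_to_tokens(SAMPLE_TEI), ensure_ascii=False, indent=2))
```

Real transcriptions nest corrections, abbreviations, and line breaks inside words, and a toy converter like this one ignores all of that. Handling those cases properly is exactly the work a dedicated tool saves you.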

Criticus

Criticus is a desktop app created by David Flood that can be found on David’s GitHub here. Criticus contains ten different tools for working with XML transcriptions, and together they handle some of the common data conversions needed to use the whole toolchain for transcription analysis. Criticus can be downloaded and run locally, or if you already have Python installed, it can be easily installed with pip!

Criticus can do several things, and David provides extensive instructions for using Criticus. In this post, I will focus mainly on converting XML transcriptions to the proper JSON format needed for the collation editor and the interface for editing the project configuration file needed in the collation editor.

Converting the transcription to JSON is a simple process with Criticus. Simply open a terminal and enter python -m criticus or python3 -m criticus to run the app. Next, click on the button “TEI to JSON”.

A new window should pop up that looks like this:

In the first line of input, click on the browse button to find the XML transcription that you want to convert. Next, you can choose to convert all the verses in the XML file or just one verse. The one-verse option is helpful when you have edited your transcription at a particular verse and need to update your JSON: you can convert just that verse rather than reconverting the entire file when the rest of the verses have not changed.

Note, however, that Criticus follows the verse reference format specified by the IGNTP and INTF. For example, B06K13V1 is the verse reference for Romans 13:1 (B06 = Romans, K13 = chapter 13, V1 = verse 1). This matters especially if you are transcribing something that is not biblical text and may not follow this format. For instance, I transcribed the Euthalian Apparatus Prologues to the Catholic Epistles, and they do not have a book, chapter, or verse system. I created a reference system for the Euthalian Apparatus that maps to this format so that it would work with Criticus, and you may need to do the same. If you are collating biblical text, however, there is no need to worry.
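If you need to generate or check references of this kind for your own material, a small helper can do it. The sketch below is my own illustration and not part of Criticus. It assumes the pattern seen in the B06K13V1 example: a numeric book after B, a chapter after K, and a verse after V. The two-digit zero padding of the book number and the unpadded chapter and verse are inferred from that single example, so check them against your own files.

```python
import re
from typing import NamedTuple


class VerseRef(NamedTuple):
    book: int
    chapter: int
    verse: int


# Assumed pattern: B<book>K<chapter>V<verse>, all numeric.
_REF_PATTERN = re.compile(r"^B(\d+)K(\d+)V(\d+)$")


def parse_ref(ref: str) -> VerseRef:
    """Split a reference such as 'B06K13V1' into its numeric parts."""
    match = _REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a B/K/V verse reference: {ref!r}")
    book, chapter, verse = (int(group) for group in match.groups())
    return VerseRef(book, chapter, verse)


def format_ref(book: int, chapter: int, verse: int) -> str:
    """Build a reference string; the book padding is an assumption."""
    return f"B{book:02d}K{chapter}V{verse}"
```

With these helpers, `parse_ref("B06K13V1")` gives book 6, chapter 13, verse 1, and `format_ref(6, 13, 1)` gives the string back. A reference with a non-numeric book name would be rejected by this pattern and would need its own rule.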

The final step is to choose the output folder that you want to save the JSON file to. The documentation of the standalone collation editor states that the JSON files should be saved in /collation/data/textrepo/json/. So make sure you have the standalone collation editor saved on your machine and select the appropriate folder specified by this path to save the file. Repeat these steps for every transcription, and your collation editor JSON folder should contain a folder for each manuscript with its own JSON files.
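Once several transcriptions have been converted, it can be handy to confirm that the JSON folder really does contain one folder per manuscript. The following is a minimal sketch of such a check, written for illustration only. The function name is hypothetical, and the path in the usage note assumes you run it from the folder that contains the collation editor.

```python
from pathlib import Path


def summarize_witnesses(json_root: Path) -> dict[str, int]:
    """Count the JSON verse files inside each manuscript folder."""
    summary = {}
    # Only sub-folders are treated as manuscripts; loose files are ignored.
    for folder in sorted(p for p in json_root.iterdir() if p.is_dir()):
        summary[folder.name] = sum(1 for _ in folder.glob("*.json"))
    return summary
```

Calling `summarize_witnesses(Path("collation/data/textrepo/json"))` returns a dictionary from folder name to file count. A manuscript with a count of zero, or with far fewer files than the others, is worth a second look.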

Each verse is saved into its own JSON file, and the name of the file follows the format specified above. However, notice that in my files the book symbol, “B,” is not followed by a number: I named the “book” CathProl but still used chapter and verse numbers. As an example, the first “verse” of the Catholic Prologues in MS 33 looks like this:

And that’s it! David has created a fantastic tool for doing this conversion, so you don’t have to. The JSON files are easy to read, and you can skim through them to double-check that the information is correct. XML can be tricky, though, and tags can be nested within one another in several ways. Because of this, there are bound to be edge cases that David’s app does not yet handle. He responds to messages, is always looking for those edge cases, and will update the app if you reach out to him. If you have the technical chops, you could fix the edge case yourself and put in a pull request! David would be happy to have help from contributors. You can find David’s contact information on the homepage of his GitHub account.

The next tool I will briefly cover allows you to create a configuration file for the standalone collation editor. The documentation of the collation editor specifies what the configuration file should look like. You could make the configuration file by hand, but David has provided a GUI for creating it. Click on the button Configure Collation Editor to get started. First, you need to choose the file to save to. Click the browse button and look for /collation/data/project/default/config.json inside the standalone collation editor. This is the path where the configuration file must be saved for the editor to work. A config file already exists there, so select it, and Criticus will update the file with the appropriate information.

You can give your project a name and select a base text to compare all the transcriptions against. The base text must also be prepared in JSON format, just like the transcriptions. Next, you can add witnesses by typing the witness’s name in the text box beside the Add Witness button and then clicking the button. The witness should appear in the large Witnesses box above. Make sure to include the base text in the list of witnesses as well. The names of the witnesses should match the names of the folders that contain your JSON verse files. Witnesses can also be removed from the list. Once you are done with the configuration, make sure to click the Update button so that your changes are written to the config.json file.
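Because the witness names have to match the folder names, a quick check after clicking Update can catch typos. The sketch below is my own illustration, not something Criticus or the collation editor provides. It assumes the config file stores the witness list under a key called `witnesses` and the base text under `base_text`. Those key names are assumptions, so compare them with the collation editor documentation and your own config.json before relying on this.

```python
import json
from pathlib import Path


def check_config(config_path: Path, json_root: Path) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Key names below are assumptions about the config file layout.
    witnesses = config.get("witnesses", [])
    base_text = config.get("base_text")

    problems = []
    if base_text not in witnesses:
        problems.append(f"base text {base_text!r} is not in the witness list")
    for name in witnesses:
        # Each witness should have a folder of JSON verse files.
        if not (json_root / name).is_dir():
            problems.append(f"no folder named {name!r} in {json_root}")
    return problems
```

For example, `check_config(Path("collation/data/project/default/config.json"), Path("collation/data/textrepo/json"))` would report any witness that lacks a matching folder.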

If you click Start Collation Editor before clicking the Update button, the old, unmodified config.json file will be used instead. On that note, clicking Start Collation Editor launches the editor for you, so you do not have to open it from the terminal. The collation editor deserves a post of its own, however, so I will not go any further than this.

Conclusion

Collation is a common task for further analyzing transcriptions of the same text amongst different manuscripts. Unfortunately, the output of transcription tools is not the correct input for collation tools. Criticus provides the tools needed to convert transcriptions to the proper input type. Criticus does much more than this, so feel free to explore the documentation to discover everything else Criticus does. In my next post, I will demonstrate how to use the standalone collation editor now that we have gotten the correct input type and properly configured the collation editor.

  1. D. C. Parker, An Introduction to the New Testament Manuscripts and Their Texts (Cambridge, UK; New York: Cambridge University Press, 2008), 104.
