This week’s guest contribution was organized by our Editor for Northeast African Studies, So Miyagawa, and was written by Lance Martin, a Ph.D. Candidate at the Catholic University of America in the Department of Semitics. His dissertation is on Syriac translations of Coptic texts in late Antiquity. He has also served as the Digital Humanities Specialist for the Coptic Scriptorium project since 2019.
Today, I will introduce some ways to search Coptic Scriptorium’s database. These will include methods that do not require advanced Coptic skills. Therefore they can be useful to students of Greco-Roman late antiquity and not only Coptologist. Coptic is the last stage of Ancient Egyptian, and Christians in Egypt used it extensively throughout the first millennium CE. For those not familiar with Coptic Scriptorium, the project uses interdisciplinary digital and computational methods to make richly annotated Coptic texts openly available online. We have published monastic texts, hagiography, biblical texts, apocrypha, documentary papyri, and magical texts in multiple digital formats for reading and research. Additionally, we have developed tools to process Coptic text and automatically annotate it for features such as parts of speech, language of loan words, lemmas, named entities, non-named entities, and more. All our data is available in a searchable database.
Search by People and Places
From the project’s corpora page, users can search for manuscripts, authors, and works through the filters on the menu or the search box. With the recent addition of named entity recognition, filters for people and places mentioned in Coptic texts are now included. To explore some of the project’s annotation layers, let’s, as an example, find out where and how a familiar place, the city of Alexandria, appears in Coptic texts. In order to find Alexandria, I will select the ‘places’ filter, and an alphabetically arranged list of all named places in the database will load. After browsing a bit, we come across the entry for Alexandria and, under it, see a list of every corpus that mentions it. Note that each unique place on the list is linked to Wikipedia. Linking to Wikipedia gives us a stable table of authorities to assign unique identifiers to people, places, and events, a process commonly known as wikification. This is, as the saying goes, another story. For now, let me point out the other features on this page. There are two basic search options: via documents or ANNIS. To browse all documents that mention Alexandria, click on the title, follow the link, and you will see all relevant texts.
Screenshot of “On Mercy and Judgment” in Analytic View.
Choose one to read the full text. When viewing a document, you can switch between normalized Coptic text (hover over a sentence to see its translation), a diplomatic text, and the analytic text. The latter includes all part-of-speech, entity annotations, and, if available, an interlinear translation. Each of these options prioritizes different kinds of information: the normalized makes reading the text itself easiest, the diplomatic facilitates the study of text on a manuscript page, and the analytic gives immediate access to lexical and parsing assistance.
Document Search from the Alexandria entry in “places.”
If you want to read full texts, searching for documents is an excellent option, but, if you want to compare passages of multiple texts on a single page, an ANNIS query is most effective. ANNIS is an open source, versatile web browser-based search and visualization architecture for corpora of texts in linguistic research. Let’s take a closer look. From the Alexandria entry, I will now follow the “ANNIS search” link to the side of Alexandria which will return a query listing every mention of Alexandria in our corpora.
ANNIS search from the Alexandria entry in “places.”
The interface simultaneously allows you to efficiently analyze how Alexandria appears in texts. Users still have the option to read full texts by selecting relevant dropdown boxes, but the real treasure of ANNIS entries is the annotation and syntax tabs. They provide robust textual annotations including normalization, lemmas, part-of-speech tagging, syntactic analysis and entity annotation. Translations are included in the interface if available and each word is bilaterally linked to the Coptic Dictionary Online. Our corpora form the first syntactically annotated corpus of Coptic, and the syntax tab gives immediate access to syntax trees that follow Universal Dependencies annotation schema for cross-linguistic comparison. Automatic queries from the corpora page enable researchers to quickly find apt passages mentioning specific people and places, but I should mention that you can search directly from ANNIS and that there are a plethora of ways to do so, including by part of speech, lemma, Greek loan words, morphological unit, entity, and more. There is not enough space to go into more detail, but see the “help/example” tab in the interface or watch our tutorial here to learn more complex queries.
Reverse search through the Dictionary
Adjacent to our corpora search, the Coptic Dictionary online improves upon Crum’s Coptic Dictionary on which it is based, by increasing its coverage of Coptic words, adding Greek loan words, and inserting full example sentences. It supports searches by dialect and definition keywords. Each entry lists all known spellings of the word and any Greek words it is known to translate. In addition to all of this, it has a number of features that can support research by linking to our other tools. Let me use a simple example to highlight some of its features and show how to access other Scriptorium annotation layers through the dictionary. Say I want to study how the phrase “Kingdom of Heaven” is used in Coptic texts. I can simply enter kingdom (in the Latin alphabet) to return relevant definitions. Once done, a single viable entry, ⲙⲛⲧⲣⲣⲟ (kingdom), appears. From the entry page, there are multiple options to explore word usage in more depth. For starters, I will view ⲙⲛⲧⲣⲣⲟ’s term network by clicking on the small graph to the right of the listed form. Entity Term Networks provide a graphic visualization of an entity’s relationships with other words in its span. The networks present a large amount of data in a small space, revealing patterns and their relative frequency more readily. In this specific case, the network for ⲙⲛⲧⲣⲣⲟ gives a clearer idea of its potential semantic relationships: almost always followed by ⲛ ‘of’, continuing to God (ⲛⲟⲩⲧⲉ) or heaven (ⲡⲏⲩⲉ) rather than to a temporal emperor such as Diocletian (ⲇⲓⲟⲕⲗⲏϯⲓⲁⲛⲟⲥ), together meaning ‘the reign of Diocletian’. Try pulling up the network for other Coptic nouns by yourself!
Left: Term network for ⲙⲛⲧⲣⲣⲟ. Right: Click on an entity icon for sense-based searches.
From the entry page, I can also search in ANNIS by clicking on the Coptic Scriptorium logo under “Attestation” (It is the one that looks like a C with a dot in it) to return a query for the lemma, “ⲙⲛⲧⲣⲣⲟ.” This will bring up every passage in which ⲙⲛⲧⲣⲣⲟ appears in our database, including the Kingdom of Heaven and Diocletian’s reign, and is, therefore, a bit too broad to help with my present question. Fortunately, the dictionary supports more focused queries. To search by semantic category, you can select one of the entity type icons found just above the definition. In this example, there are three options: abstract, place, and time (the cloud, the place marker, and the clock). Since I am interested in the kingdom not of this world, I will choose the abstract icon. The resulting query includes entries that list the Kingdom of God and the Kingdom of Heaven, but not the Roman Empire or the Kingdom of Israel (places) or kings’ reigns (times). In only a minute, I accessed not only all iterations of ⲙⲛⲧⲣⲣⲟ in our database but filtered these mentions to find those most pertinent to my interests.
Coptic Scriptorium‘s toolset has served as a catalyst for my research. I hope today’s brief introduction to some of the project’s basic search functions will do the same for you.