Classical Ethiopic (Ge’ez: ግዕዝ Gəʽ(ə)z), one of the classical written languages preserved on the African continent, has sustained religious and literary development for over a millennium as the liturgical language of Ethiopian and Eritrean Orthodox Christianity. Nevertheless, a substantial portion of this invaluable documentary heritage remains difficult to access and faces the ongoing threat of dispersal and deterioration. In response to these challenges, digitization projects that gained momentum in earnest at the turn of the twenty-first century are beginning to transform the landscape of Ge’ez studies. This article provides an overview of the current state and future prospects of the digitization of Ge’ez textual materials.

Figure 1. Example of a Ge’ez manuscript (Tegrāy, Gulo Makadā, ʿUrā Qirqos, manuscript Ethio-SPaRe UM-014. Miracles of Mary. Folia 10v–11r; Alessandro Bausi, CC BY 3.0 Unported)1
Following the official adoption of Christianity in the Aksumite Kingdom of ancient Ethiopia in the fourth century CE, Ge’ez developed into the primary language of religious and literary expression. Ethiopia and Eritrea are home today to numerous monasteries and churches, which collectively house tens of thousands of Ge’ez manuscripts. Yet only a small fraction of this vast manuscript tradition has been systematically investigated and cataloged. Similarly, only a limited number of manuscripts have been microfilmed or digitized, leaving the majority vulnerable to loss and damage. As urgent measures are required to preserve this manuscript heritage and transmit it to future generations in a form amenable to scholarly study, several international projects have been launched in recent years.
The forerunner among these is the Ethio-SPaRe project (“Cultural Heritage of Christian Ethiopia: Salvation, Preservation, Research”), conducted between 2009 and 2015 at the Hiob Ludolf Centre for Ethiopian and Eritrean Studies (HLCEES) at the University of Hamburg.2 Funded by the European Research Council (ERC) and headed by Dr. Denis Nosnitsin, the project focused primarily on churches and monasteries in the Tigray region of northern Ethiopia, undertaking the conservation and systematic investigation of important manuscripts. Concretely, it involved identifying significant monastic libraries, compiling manuscript catalogs, producing digital reproductions of important manuscripts, and classifying and analyzing the texts they contain. The outputs of these activities have been made available as interconnected searchable databases (e.g., one for art objects and another for bibliography) and virtual libraries. This initiative has made a substantial contribution to establishing the infrastructure necessary for the preservation and study of Ge’ez textual materials.
Alongside efforts in manuscript preservation, a noteworthy new development in the application of digital technologies to Ge’ez studies is the ERC-funded TraCES project (“From Translation to Creation: Changes in Ethiopic Style and Lexicon from Late Antiquity to the Middle Ages”),3 carried out between 2014 and 2019. Led by Professor Alessandro Bausi of the University of Hamburg, this project aimed to construct a large-scale corpus of Classical Ethiopic texts and to elucidate changes in style and vocabulary through quantitative methods.
Ge’ez literature originated primarily in translations of biblical and theological texts during late antiquity and subsequently came to include translations from Copto-Arabic as well as original compositions during the medieval period. With a focus on the lexical, grammatical, and stylistic changes that occurred over this long literary history, TraCES employed, for the first time, an approach combining large-scale digital corpus construction with linguistic analysis. Building a digital corpus in which morphological and part-of-speech annotations were added to a body of critically edited Ge’ez texts made it possible to analyze in detail the differences in usage across textual origins and historical periods.
The use of a morphologically annotated corpus enables the quantitative detection of changes in word frequency and collocational patterns, thereby revealing diachronic shifts in grammatical and lexical choices. This analysis also yields new evidence for estimating the dates of composition and the historical background of texts of unknown provenance. Among the outputs of the TraCES project, a complete morphologically annotated corpus (approximately 100 MB) has been made publicly available for free download and use by researchers.4 The corpus data are provided in three formats for each text: the Ge’ezTextArchive format, the international standard TEI/XML format, and the ANNIS format for multi-layered corpora, facilitating a wide range of research applications. A number of valuable research tools that emerged as by-products of the project have also been made available, including an annotator for Ge’ez texts (GETA) and a lexicon released on GitHub.5
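The kind of frequency comparison described above can be illustrated with a minimal Python sketch. It is not the TraCES pipeline: the function names and the placeholder lemma lists are hypothetical, and it assumes the annotated corpus has already been reduced to one list of lemmas per period.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Return each lemma's share of the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lemma: n / total for lemma, n in counts.items()}


def frequency_shift(early_tokens, late_tokens):
    """Relative frequency in the later period minus that in the earlier one."""
    early = relative_frequencies(early_tokens)
    late = relative_frequencies(late_tokens)
    lemmas = set(early) | set(late)
    return {lemma: late.get(lemma, 0.0) - early.get(lemma, 0.0) for lemma in lemmas}


# Hypothetical placeholder data, not drawn from any real Ge'ez text.
early_period = ["lemma_a", "lemma_b", "lemma_a", "lemma_c"]
late_period = ["lemma_b", "lemma_b", "lemma_c", "lemma_d"]

shifts = frequency_shift(early_period, late_period)

# List lemmas from the largest decline to the largest increase.
for lemma, delta in sorted(shifts.items(), key=lambda item: item[1]):
    print(f"{lemma}: {delta:+.2f}")
```

A real study would work with far larger samples and would normally add a significance measure before treating any shift as evidence for dating a text.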
As an example of integrated digital archive construction for manuscripts themselves, the long-term project Beta maṣāḥǝft,6 launched in 2016, deserves special mention. Supported by the Hamburg Academy of Sciences and Humanities and planned to continue until 2040, this project aims to build a multimedia environment for the comprehensive description and study of the Christian manuscript cultures of Ethiopia and Eritrea. The project brings together a large number of scholars, including Prof. Alessandro Bausi (scientific director) and Dr. Denis Nosnitsin (research fellow).

Figure 2. Top page of Beta maṣāḥǝft
Beta maṣāḥǝft is developing an XML-based digital platform that consolidates all information pertaining to Ge’ez manuscripts. On this platform, information is recorded and managed in an integrated manner encompassing not only the textual content of each manuscript — including main text, vocalization marks, and marginal annotations — but also its physical characteristics, colophon information recording details of donors and owners, and surrounding metadata such as associated institutions (holding libraries and churches), authors, and successive owners and scribes. High-resolution manuscript images are also provided wherever possible, and for a number of manuscripts, critical editions and transcriptions have been additionally published in machine-readable form.
One of its particularly notable features is a cross-searchable hypercatalogue (integrated catalog) that links to other existing online manuscript databases, including the Ethio-SPaRe and TraCES databases mentioned above, enabling users to search across multiple databases of Ethiopian manuscript materials simultaneously. The platform encompasses the broadest possible range of available data, from manuscripts held in Western libraries to those still extant in Ethiopia and Eritrea. In addition to the manuscript repository itself, the platform provides in an integrated form all the ancillary information necessary for manuscript culture research: an integrated catalogue of texts contained in manuscripts (Clavis Aethiopica), a database of related scholarly literature (an annotated bibliography), a geographical dictionary of scriptoria and the churches and monasteries that became depositories, and prosopographical data on authors, translators, and owners. By accumulating and linking this multidimensional data, Beta maṣāḥǝft is constructing a comprehensive research infrastructure that enables a richly layered understanding of the manuscript heritage of Ethiopia and Eritrea.
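To give a sense of how an XML-based manuscript record can be queried, the following Python sketch reads a small TEI-style description with the standard library. The sample record is invented for illustration and is far simpler than any actual Beta maṣāḥǝft file; the repository name, shelfmark, titles, and the function `summarize_record` are all hypothetical. Only the TEI namespace and the element names (`msDesc`, `msIdentifier`, `repository`, `idno`, `msItem`, `title`) come from the TEI standard.

```python
import xml.etree.ElementTree as ET

# The TEI namespace is standard; everything in the sample record is invented.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE_RECORD = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example manuscript record</title></titleStmt>
      <publicationStmt><p>Illustrative sample only</p></publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <repository>Example Church Library</repository>
            <idno>EX-001</idno>
          </msIdentifier>
          <msContents>
            <msItem><title>Example Text A</title></msItem>
            <msItem><title>Example Text B</title></msItem>
          </msContents>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def summarize_record(xml_text):
    """Extract the holding institution, shelfmark, and text titles from a record."""
    root = ET.fromstring(xml_text)
    ms_desc = root.find(".//tei:msDesc", TEI_NS)
    if ms_desc is None:
        raise ValueError("record contains no msDesc element")
    return {
        "repository": ms_desc.findtext(
            "tei:msIdentifier/tei:repository", default="", namespaces=TEI_NS
        ),
        "shelfmark": ms_desc.findtext(
            "tei:msIdentifier/tei:idno", default="", namespaces=TEI_NS
        ),
        "texts": [
            title.text for title in ms_desc.findall(".//tei:msItem/tei:title", TEI_NS)
        ],
    }


summary = summarize_record(SAMPLE_RECORD)
print(summary)
```

Because institutions, shelfmarks, and texts are held in structured elements rather than free prose, the same kind of query can in principle be run across many records, which is what makes cross-searching and linking feasible.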
As noted above, the construction of Ge’ez corpora and the digitization of manuscripts are opening up avenues for applying digital humanities methods to this field. Frequency analysis and stylometric analysis using morphologically annotated corpora represent one such application; more recently, attempts at the automated transcription of manuscript images using deep learning (HTR/OCR) have also begun.7 Digitized resources are significant not only for specialized research but also for education and outreach. While Ge’ez does not attract a large number of learners, the growing availability of online access to corpora and digital manuscripts is enabling an increasingly broad audience (not only researchers and graduate students but also ecclesiastical communities in Ethiopia and Eritrea and general enthusiasts) to engage with this classical language and participate in its study.
References
Hizkiel Mitiku Alemayehu, “Handwritten Text Recognition Best Practice in the Beta maṣāḥǝft workflow,” Journal of the Text Encoding Initiative [Online], Rolling Issue (2022), https://doi.org/10.4000/jtei.4109.
Alessandro Bausi, “Composite and Multiple-Text Manuscripts: The Ethiopian Evidence,” in One-Volume Libraries: Composite and Multiple-Text Manuscripts, ed. Michael Friedrich and Cosima Schwarke (Berlin and Boston: De Gruyter, 2016), https://doi.org/10.1515/9783110496956-005.
Footnotes
1. The photograph is reproduced under Creative Commons Attribution 3.0 Unported from: Bausi, “Composite and Multiple-Text Manuscripts: The Ethiopian Evidence,” 152.
2. “Ethio-SPaRe: Cultural Heritage of Christian Ethiopia. Salvation, Preservation, Research,” HLCEES Hiob Ludolf Centre for Ethiopian and Eritrean Studies, Universität Hamburg, accessed November 13, 2025, https://www.aai.uni-hamburg.de/en/ethiostudies/research/ethio-spare.html.
3. “<TraCES/> From Translation to Creation: Changes in Ethiopic Style and Lexicon from Late Antiquity to the Middle Ages,” Universität Hamburg, accessed November 13, 2025, https://www.traces.uni-hamburg.de/.
4. “Corpus,” <TraCES/>, Universität Hamburg, accessed November 13, 2025, https://www.traces.uni-hamburg.de/texts/corpus.html/.
5. “Geez lexicon of the TraCES project,” GitHub, accessed November 13, 2025, https://github.com/TraCES-Lexicon/lexicon.
6. “Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea,” Universität Hamburg, accessed November 13, 2025, https://www.betamasaheft.uni-hamburg.de/; and “Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea,” Universität Hamburg, accessed November 13, 2025, https://betamasaheft.eu/.
7. For example, see Alemayehu, “Handwritten Text Recognition Best Practice in the Beta maṣāḥǝft workflow.”
The article is based on an English translation from the Japanese article 「古典エチオピア語(ゲエズ語)のデジタルコーパスと写本アーカイブの動向」(Current Trends in Digital Corpora and Manuscript Archives of Classical Ethiopic (Ge’ez)) written by So Miyagawa, published in the 172nd issue of the Japanese Digital Humanities web-magazine『人文情報学月報』 (Digital Humanities Monthly, DHM) by the International Institute for Digital Humanities (人文情報学研究所). Please visit: www.dhii.jp/DHM/.
Cover image: Book of the Gospels, Northern Highlands artist, late 14th–early 15th century. Ethiopia, Lake Tana region. Parchment, acacia wood, tempera, ink; 41.9 × 28.6 × 10.2 cm. The Metropolitan Museum of Art, Rogers Fund, 1998 (1998.66). Public Domain. https://www.metmuseum.org/art/collection/search/317618
