Readers of The Digital Orientalist, you are among the first to know!
Today’s post is dedicated to the release of the first public Syriac Handwritten Text Recognition (HTR) model on Transkribus and testing the current OCR/HTR capabilities on the Syriac manuscripts and fragments from the Austrian National Library, Vienna (ÖNB).
As mentioned in my previous posts, the rapid progress of text recognition for Syriac has been shaping a new era for Syriac Digital Humanities. We have talked about several practical OCR tools for Syriac, such as Google Lens (here) and Archive.org (here).
Recently, during the HTR Winter School 2024 of IMAFO (Institute for Medieval Research – Austrian Academy of Sciences, Vienna), held between November and December 2024, the first public Syriac HTR model on Transkribus was successfully trained by the Syriac Group. This achievement provides scholars and students with a public base model for their digitized Syriac manuscripts. The model can now be used to generate initial transcriptions of manuscripts in Serto script, and users can then refine it for their own projects, building upon the Vienna Syriac Gospels model.

I. Releasing the First Public Syriac HTR Model on Transkribus
The Vienna Syriac Gospels public model of Serto is now accessible through the Transkribus platform. You can even try the model immediately without registration on the Transkribus website: https://www.transkribus.org/model/syriac-gospels-of-vienna.
For full functionality, and to use the model with your own manuscript images, you will need to register for a Transkribus account. Once registered, you can select the “Vienna Syriac Gospels model (Serto)” from the public models to begin transcribing manuscript images from your projects. If you already have a Transkribus account, you can go directly to the public model here.

While Transkribus operates on a credit-based system, each user receives 100 free credits monthly, which may be sufficient for smaller projects or initial testing. Users with larger needs or funded projects are strongly encouraged to contribute to the development of this valuable HTR tool by exploring Transkribus’ options for project-based subscriptions or collaborations.
This base model was trained on the Syriac Vienna Gospels manuscript “ÖNB Cod. Syr. 1”, copied in Vienna in 1554 by Moses of Mardin (about whom you can read in my first post for The Digital Orientalist here). It offers a starting point for transcribing other Syriac manuscripts in Serto script, and users can further train their own Syriac models on Transkribus for their specific projects.
Thanks to the Transkribus team and the Austrian Academy of Sciences, who made it available for some time to present the results of the Syriac HTR workshop, a user-friendly website now accompanies the HTR model. It allows anyone to explore the Syriac Vienna Gospels online and to read more about the manuscript (ÖNB Cod. Syr. 1) and the public HTR model, including the list of contributors who trained this model (to whom the author of this post is very thankful!): https://app.transkribus.org/sites/Syriac-Vienna-Gospels
This website “Vienna Syriac Gospels – Moses of Mardin 1554” provides:
- Searchable images of the manuscript
- Searchable transcriptions of the text
- Background information about the manuscript and its significance

What is the Importance of a Public Model?
The availability of a public Syriac HTR model and an online platform for exploring the Syriac Vienna Gospels marks a significant step towards democratizing access to Syriac written heritage. It empowers scholars, students, and heritage professionals worldwide to engage with these valuable sources, regardless of their prior experience with HTR technology.
For those interested in HTR technology and integrating it into their research, this resource can also support the development of their own models, as mentioned above. The open-access dataset used to train the model is publicly available on GitHub (https://github.com/HTR-School-Vienna/2024–Syriac/tree/main) and on Zenodo (https://zenodo.org/records/14714089).

II. Testing the Model: Digital Recognition of Syriac Manuscripts in Vienna
To evaluate the effectiveness of the Vienna Syriac Gospels model, I used it to transcribe and identify Syriac texts as part of the ongoing project “Identifying Scattered Puzzles of Syriac Liturgy” (ISP) at the Austrian Academy of Sciences – IMAFO. This project aims to create a digital corpus of extant Syriac liturgical manuscripts and to make both complete and fragmentary manuscripts accessible to scholars and the interested public (for a brief description of the project, see here). The model was tested on a selection of Syriac manuscripts and fragments housed at the Austrian National Library. Below, I will briefly discuss the results obtained from testing on three of these manuscripts.

The first example that I can share here is “MS ÖNB Cod. Syr. 2”, a manuscript of the Syriac Psalms in Serto script measuring 14 x 9.5 cm. It contains the 150 Psalms of David, usually used for liturgical and private devotional purposes, and was perhaps a personal monastic psalter. Using the Vienna Syriac Gospels Model (Serto), I was able to transcribe and identify the text of the psalter successfully. The identification was further facilitated by the ability to search the recognized Syriac text online. Since it is a biblical text, many available online corpora helped verify the content of the HTR-recognized images.
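For readers who would like to automate this kind of lookup, here is a minimal sketch of how recognized text could be compared against a small reference corpus. It is not the workflow used in the ISP project: the function names (strip_points, best_match) and the placeholder corpus are my own assumptions for illustration. The idea is to drop vowel points and other combining marks, which HTR output and online editions often render differently, and then rank candidate passages by string similarity using Python's standard library.

```python
import difflib
import unicodedata


def strip_points(text: str) -> str:
    """Drop combining marks (e.g. Syriac vowel points) and collapse whitespace."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return " ".join(bare.split())


def best_match(recognized: str, corpus: dict[str, str]) -> tuple[str, float]:
    """Return the corpus key most similar to the HTR output, with its score (0 to 1)."""
    query = strip_points(recognized)
    scores = {
        ref: difflib.SequenceMatcher(
            None, query, strip_points(passage), autojunk=False
        ).ratio()
        for ref, passage in corpus.items()
    }
    ref = max(scores, key=scores.get)
    return ref, scores[ref]


# Hypothetical stand-in data: replace with real reference passages and HTR output.
corpus = {
    "passage-A": "abcd efgh ijkl",
    "passage-B": "mnop qrst uvwx",
}
ref, score = best_match("abcd efgx ijkl", corpus)
print(ref, round(score, 2))
```

A low best score would suggest that the text is not in the reference corpus at all, so any threshold should be chosen by inspecting real results rather than fixed in advance.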

The second test was on “MS ÖNB Cod. Syr. 3”, a parchment Gospel fragment in Estrangelo script that was used as a lectionary (biblical readings for the liturgical services). The fragment measures 35.5 x 26 cm. Although it is undated, it can be dated to approximately the 6th or 7th century based on paleographic similarities with the Syriac manuscript of Florence, MS 1.56. As this manuscript is not in Serto script, the Vienna Syriac Gospels Model was not used in this instance. Instead, I used the OCR/HTR tool of Google Lens to recognize its text and linked some of the recognized words with online Syriac Gospel corpora, confirming the fragment’s content as Matthew 5:19-22. This demonstrates the potential of HTR technology for identifying manuscript texts efficiently.

The third test for this post was conducted on the parchment manuscript “ÖNB Cod Syr 6”, which measures 31 x 21 cm and has 209 folia. It is a Syrian Orthodox liturgical Fenqitho (a hymnal for Sundays and Feast Days of the West Syriac liturgical year), written in Estrangelo script. Although there is no colophon to indicate its date, paleographic estimates place it in the 9th or 10th century (probably earlier), which would make it one of the oldest Fenqitho manuscripts. The HTR tests on this manuscript recognized its texts and made it possible to link some of them with those offered by the ISP project.

Further tests on the Syriac manuscripts and fragments in the Austrian National Library confirmed the functionality of the Syriac HTR tools and point to a promising near future for an integrated Syriac ecosystem. The complete identifications will be posted gradually here on the website of the ISP project, in addition to a forthcoming publication and an edition of these scattered Syriac puzzles in ÖNB Vienna and other libraries.
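This post does not report error rates, but readers who want to quantify such tests on their own material often use the character error rate (CER): the edit distance between a manually checked transcription and the HTR output, divided by the length of the checked transcription. The sketch below is a plain-Python illustration under that assumption. The function names are my own, and each Unicode code point (including a vowel point) is counted as one character.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical stand-in strings: one wrong character out of four.
print(character_error_rate("abcd", "abxd"))
```

Because insertions count as errors, the CER can exceed 1.0 when the HTR output is much longer than the reference.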

III. Final Words: Sharing is Caring!
The Vienna Syriac Gospels Public Model on Transkribus is an initiative to encourage other projects to share their models publicly so everyone can benefit. In this post we have observed how privately developed models, even indirectly, contribute to fine-tuning HTR capabilities for tools like Google Lens, improving recognition of scripts such as Estrangelo and East Syriac. This improvement most likely occurred because many projects have transcribed texts available in databases, which is invaluable for linking recognized texts in manuscript images with those in Syriac corpora. Therefore, if you have a model trained on data that can be shared, consider making it public to benefit the entire Syriac community! Sharing via HTR tools like Transkribus or on platforms like GitHub and Zenodo facilitates collaborative development, expands access to these important resources, and supports the Syriac digital ecosystem.

