Artificial Intelligence for Garshuni-Arabic on Gemini

The recent rise of Artificial Intelligence (AI) chatbots like Gemini has opened up exciting new possibilities for interacting with and studying languages, including Syriac. In my previous posts on AI for Syriac using Google Gemini AI (see: Part1 and Part2), I explored Gemini’s capabilities in communicating, translating, and generating Syriac text. For those unfamiliar, Garshuni-Arabic refers to the practice of writing Arabic in the Syriac script, an old tradition that arose especially from the close interaction between these two Semitic languages: Syriac and Arabic. Today, with the help of AI tools like Google Gemini, we can explore how technology can bridge the gap for Syriacists who cannot read Arabic or Arabists who cannot read Syriac.

In this post, I will dive deeper into how AI can help tackle the complexities of transcribing Garshuni texts, saving scholars time and effort. I will share some tests I conducted using Google Gemini to transcribe portions of my recently published article “A ‘Colophon’ or a ‘Chronicle’? A Lengthy Garshuni-Arabic Colophon”[1] and show how the AI handles transcription from Syriac to Arabic script, as well as translation into English. Finally, I will address some challenges and improvements that could make AI even more effective for Garshuni.


[1] Published in: Literary Snippets – A Colophon Reader, ed. George A. Kiraz and Sabine Schmidtke (Gorgias Press, 2024), 99–142. Open access: https://www.degruyter.com/document/doi/10.31826/9781463244033-008/html

1. What is Garshuni or the Garshunography Phenomenon?

Garshuni is the practice of writing Arabic using the Syriac alphabet (see also a previous post in DO, written by Rosie Maxton, on G/Karshuni here). This phenomenon, sometimes referred to as Garshunography, emerged during centuries of cultural exchange between Syriac and Arabic-speaking communities, particularly among Middle Eastern Christians. Garshuni allowed these communities to engage with Arabic-language texts in religious, liturgical, and everyday contexts while still using the familiar Syriac script. This practice helped preserve a connection to the “Holy Language” of Syriac, which was seen as an essential part of their Christian identity.

Although early evidence is sparse, some suggest that Garshuni might date back to as early as the 6th century, though the earliest known text, a note about ink preparation (British Library Add. 14,644), is from the early 9th century.[2] This example, written in the Estrangelo script, points to the practicality of using Syriac script for another language, but such early evidence is rare.

In addition to Arabic, other languages such as Armenian, Greek, Kurdish, Latin, Ottoman Turkish, Persian, Malayalam and others have used similar allographic traditions, with texts written in Syriac script. Notable examples include the use of Syriac for Greek in the Melkite tradition, with one early fragment from the Damascus National Museum (9th-11th centuries) containing a mixture of Syriac rubrics and Greek prayers.[3]


[2] Françoise Briquel-Chatonnet, Alain Desreumaux, and André Binggeli, “Un Cas Très Ancien de Garshouni? Quelques Réflexions Sur Le Manuscrit BL Add. 14644,” in Loquentes Linguis: Studi Linguistici e Orientali in Onore Di Fabrizio A. Pennacchietti = Linguistic and Oriental Studies in Honour of Fabrizio A. Pennacchietti = Lingvistikaj Kaj Orientaj Studoj Honore al Fabrizio A. Pennacchietti, ed. Pier Giorgio Borbone, Alessandro Mengozzi, and Mauro Tosco (Wiesbaden: Harrassowitz Verlag, 2006), 141–147.

[3] Sebastian P. Brock, “Greek and Latin in Syriac Script,” Hugoye: Journal of Syriac Studies 17:1 (2014): 33–52.

Functionality and Ideology in Garshuni

The rise and persistence of Garshuni can be understood from both functional and ideological perspectives. Functionally, Garshuni made Arabic texts accessible to those more familiar with the Syriac script than Arabic script, particularly in liturgical settings. For example, this practice allowed communities to accurately transmit prayers and biblical readings in Arabic, without losing their Syriac heritage.[4]

Ideologically, Garshuni reinforced a cultural identity among Syriac-speaking Christians, allowing them to maintain a connection to their religious traditions and sacred language even as they adapted to the growing use of Arabic. I recall the late Fr. Barsoum Ayyub in Aleppo (d. 1998) who preferred Garshuni for liturgical readings to preserve the sanctity of Syriac.[5]

Garshuni as a Cultural Bridge

Garshuni then acts as a bridge between languages and cultures, preserving the identities of communities who were transitioning to Arabic while keeping Syriac alive. The diversity of Garshuni texts, ranging from liturgical manuscripts to personal letters and chronicles, showcases how this practice adapted over time. However, its complex orthographic variations, dialectal influences, and the interplay between Syriac and Arabic make it a challenging field of study.

Thanks to advances in AI like Google Gemini, we can explore new possibilities for transcription, analysis, and translation of Garshuni texts. Nevertheless, some challenges remain.


[4] For the functionality argument, see: George Anton Kiraz, “A Functional Approach to Garshunography: A Case Study of Syro-X and X-Syriac Writing Systems,” Intellectual History of the Islamicate World 7:2–3 (2019): 264–277.

[5] For a brief biography of Fr. Barsoum Ayyub, see: G. A. Kiraz, “Ayyub, Barsoum,” in Sebastian P. Brock et al. (eds.), The Gorgias Encyclopedic Dictionary of the Syriac Heritage (Gorgias Press, 2011), 49.

2. Gemini AI and Garshuni Tests

Transcription from Syriac to Arabic Script

The first test I conducted with Google Gemini was to transcribe a typed Garshuni text. I copied the Syriac-scripted text into Gemini and asked it to transcribe it into Arabic script. The AI handled the task very well, even correcting and vocalizing some Arabic words automatically. For instance, words like “نُصِفَ”, “المُختصين”, “جَلّت”, “ويَدْنيه” etc., were given the correct vowel markings (tashkil), suggesting that Gemini could semantically interpret the text.

Upon receiving the Arabic text, I checked it quickly and corrected some ambiguities. This was not surprising, since Syriac has fewer letters than Arabic. Consequently, when writing in Garshuni it is necessary to modify some Syriac letters to express Arabic letters which have no equivalents in Syriac. However, these solutions were variable and sometimes inconsistent. This most likely reflects the different ways of writing similar-sounding words or differences in pronunciation of the same words. That is to say, one must expect that corrections to the automatic text transcription for Garshuni texts are often necessary.

The limited number of Syriac letters compared to Arabic led to several transcription problems. The Garshuni word “ܟ̣ܬܐܡ” (“ختام”, meaning “the end”) was misinterpreted as “كتاب” (meaning “book”). This likely happened because the AI tried to match the word to the context of the surrounding lines. Another example was “ܠܟܕܡܗ̈” (meaning “to the service”), which was rendered as the nonsensical “لكُدَمِه” in Arabic because the same Syriac letter “ܟ” is used for both “ك” and “خ” in Arabic. A similar case is the Garshuni word “ܐܠܡܦܪܘܛܗ”, which was rendered as “المَفروطة” when it should be “المفروضة” (meaning “mandatory”).
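The ambiguity described above can be made concrete with a small Python sketch. It is not how Gemini works internally; it only enumerates the Arabic spellings that a single Garshuni word could stand for when one Syriac letter covers several Arabic letters. The table `CANDIDATES` and the function `arabic_candidates` are hypothetical names, and the table is a deliberately partial, illustrative mapping rather than an established standard.

```python
from itertools import product
import unicodedata

# Illustrative, partial table (an assumption for this sketch, not a standard):
# each Syriac letter maps to the Arabic letters it might stand for.
CANDIDATES = {
    "\u0710": ["\u0627"],            # alaph  -> alif
    "\u071F": ["\u0643", "\u062E"],  # kaph   -> kaf or kha
    "\u072C": ["\u062A", "\u062B"],  # taw    -> ta or tha
    "\u0721": ["\u0645"],            # mim    -> mim
    "\u0720": ["\u0644"],            # lamadh -> lam
    "\u0726": ["\u0641"],            # pe     -> fa
    "\u072A": ["\u0631"],            # rish   -> ra
    "\u0718": ["\u0648"],            # waw    -> waw
    "\u071B": ["\u0637", "\u0636"],  # teth   -> emphatic ta or dad
    "\u0717": ["\u0647", "\u0629"],  # he     -> ha or ta marbuta
}


def arabic_candidates(garshuni_word):
    """Return every Arabic spelling the table allows for one Garshuni word."""
    # Drop combining marks (dots, seyame), since their use varies between texts.
    base = [ch for ch in garshuni_word if not unicodedata.combining(ch)]
    # Letters missing from the table pass through unchanged.
    options = [CANDIDATES.get(ch, [ch]) for ch in base]
    return ["".join(combo) for combo in product(*options)]


# kaph + dot below, taw, alaph, mim: the word discussed above as "the end".
word = "\u071F\u0323\u072C\u0710\u0721"
for spelling in arabic_candidates(word):
    print(spelling)
```

Every candidate still has to be checked against a dictionary or the surrounding context, which is the step where both a human reader and an AI model can go wrong.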

Despite these challenges, Gemini was able to pragmatically interpret other words quite well. For instance, it replaced “ܘܝܥܨܝܗ” (meaning “he disobeys”) with the related Arabic word “ويُغضِبه” (meaning “he angers him”). This is impressive considering the complex interplay of meaning and context required to make such a leap.

Translation to English

After transcribing the Garshuni text into Arabic, I tested Gemini’s ability to translate the Arabic into English. The results were promising. While some errors occurred, the AI handled the text better once I combined it into full paragraphs rather than translating it line by line. Gemini has no way of knowing that the line breaks and line numbers in my transcription reflect the layout of the manuscript, which I wanted to preserve in the publication.

The translation improved when I asked Gemini to “take out the line numbers and make the Arabic text as one paragraph”, and then to “translate it to English”. Of course, other tools, such as Google Translate, may offer alternative translations, but Gemini at least gives a workable translation if we present it with a controlled Arabic text.

We should keep in mind that when I prepared this text for publication, I tried to provide a literal English translation of the Arabic, aiming to capture the simplicity of its colloquial style and allow readers to experience its unique characteristics. The text reflects the Arabic dialect of Mardin, which adds another challenge to automatic translation. I therefore found that it helps to make minor corrections to the Arabic text before asking for a translation.
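The same clean-up can also be done locally before the text is pasted into a chatbot. The following Python sketch is one possible way to do it, under the assumption that each line starts with a number written as “12”, “12.”, “(12)” or “[12]”. The actual numbering format of a given transcription may differ, and the names `LINE_NUMBER` and `join_lines` are hypothetical.

```python
import re

# Assumed line-number shapes at the start of a line: "12", "12.", "(12)", "[12]".
# In Python 3, \d also matches Arabic-Indic digits such as "٣".
LINE_NUMBER = re.compile(r"^\s*[\(\[]?\d+[\)\]]?[.:]?\s*")


def join_lines(text):
    """Strip leading line numbers and join the lines into one paragraph."""
    stripped = (LINE_NUMBER.sub("", line).strip() for line in text.splitlines())
    # Skip lines that are empty after stripping, then join with single spaces.
    return " ".join(line for line in stripped if line)


sample = "1. first line\n(2) second line\n\n[3] third line"
print(join_lines(sample))
```

One caveat: a line that genuinely begins with a number, such as a year, would lose it, so the result is worth a quick check against the original.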

Extract Garshuni Texts from Images

There are several methods nowadays to extract Syriac text from images, such as OCR (Optical Character Recognition) with Google Lens (see my previous posts on Syriac OCR, here, and especially on using Google Lens, here), or A/HTR (Automatic/Handwritten Text Recognition, which can be done with either Transkribus or eScriptorium). Gemini AI can also extract text from images, with very good results (depending, of course, on the quality of the images and on whether the digitized texts come from printed books or handwritten manuscripts).

Since Gemini can also extract Garshuni text from images and transcribe it into both Syriac and Arabic scripts, why not use this feature to provide the transcribed text before translating it into English? I tested this by uploading a photo of a Garshuni text from my article in the published book, and I was impressed by the results. The AI transcribed each line of Syriac-scripted Garshuni, then provided an Arabic-scripted version and an English translation. Some of the earlier mis-transcriptions recurred while others did not, although a single test cannot show whether the image input itself accounts for the difference.

Comparing these results with the previous tests, some ambiguities were repeated while others were not. The words “ܟ̣ܬܐܡ” and “ܠܟܕܡܗ̈” are again mis-transcribed as “كتاب” and “لكُدَمِهِ”, respectively. Interestingly, “ܐܠܡܦܪܘܛܗ̈” is correctly transcribed as “المفروضة”, and the transcription of the word “ܘܝܥܨܝܗ” is corrected and given appropriate tashkil, as “ويُعْصِيه”! At the end, Gemini gives an “Overall Meaning”, indicating that the text has a religious theme, but it interprets the text as an “inscription or dedication”, which might have seemed plausible if we did not already know that it is a colophon at the end of a liturgical manuscript. Finally, Gemini gives “Important Notes” to inform the reader of the confusing aspects of dealing with this Garshuni-Arabic text, especially how to render some words and the importance of context in resolving the ambiguities in the text.
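A comparison of this kind, between the transcription of the typed text and the transcription of the image, can be partly automated. The Python sketch below uses the standard library `difflib` to list the words that differ between two runs. It ignores tashkil and other combining marks so that a difference in vocalization alone is not reported. The function names `strip_marks` and `changed_words` are hypothetical, and the approach assumes both runs are plain strings of space-separated words.

```python
import unicodedata
from difflib import SequenceMatcher


def strip_marks(text):
    """Remove combining marks (e.g. Arabic tashkil) so only base letters remain."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def changed_words(first_run, second_run):
    """Return (old, new) pairs for the word spans that differ between two runs."""
    a = strip_marks(first_run).split()
    b = strip_marks(second_run).split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Insertions and deletions appear with an empty string on one side.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


typed_run = "x المفروطة y"
image_run = "x المفروضة y"
for old, new in changed_words(typed_run, image_run):
    print(old, "->", new)
```

Such a list only shows where two runs disagree. Deciding which reading is correct still requires checking the manuscript.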

3. Challenges for AI in Garshuni Transcription

Despite the success of these initial tests, there are still several challenges that need to be addressed for AI to work perfectly with Garshuni texts:

Script Identification: AI must be trained to accurately recognize Syriac script, especially when it is used for Garshuni. Visual similarities between Syriac and Arabic scripts can cause confusion.

Arabic NLP Adaptation: Natural Language Processing (NLP) models need to account for orthographic variations and dialectal influences, requiring retraining on Garshuni-specific data.

Limited Data: The scarcity of digitized Garshuni texts makes it difficult to train AI models; therefore, efforts should focus on digitizing manuscripts and expanding the available datasets for research.

Dialectal Variations: AI systems need to account for the regional and dialectal differences within Garshuni, which reflect spoken Arabic influences from different areas.
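For text that has already been typed, the first of these challenges has a simple starting point, because Unicode assigns Syriac and Arabic letters to separate blocks. The Python sketch below counts letters per block and reports the dominant script. It is only a first-pass check under that assumption: it does not address recognition from images, and it cannot tell Garshuni (Arabic language in Syriac script) from Syriac-language text, which would need linguistic evidence. The function names `script_counts` and `dominant_script` are hypothetical.

```python
def script_counts(text):
    """Count letters in the main Syriac and Arabic Unicode blocks."""
    counts = {"syriac": 0, "arabic": 0, "other": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation and combining marks
        cp = ord(ch)
        if 0x0700 <= cp <= 0x074F:    # Syriac block
            counts["syriac"] += 1
        elif 0x0600 <= cp <= 0x06FF:  # Arabic block
            counts["arabic"] += 1
        else:
            counts["other"] += 1
    return counts


def dominant_script(text):
    """Return the script with the most letters, or "unknown" if there are none."""
    counts = script_counts(text)
    if not any(counts.values()):
        return "unknown"
    return max(counts, key=counts.get)


print(dominant_script("\u071F\u072C\u0710\u0721"))  # a word typed in Syriac letters
print(dominant_script("\u062E\u062A\u0627\u0645"))  # the same word in Arabic letters
```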

Final Remarks

AI offers tremendous potential for Garshuni research, but realizing it will require collaboration between scholars, AI researchers, and communities. By developing more sophisticated OCR/HTR/ATR tools trained specifically for Garshuni, building large digital archives of Garshuni texts, and creating advanced NLP models capable of handling the nuances of this writing practice, we can unlock the wealth of knowledge embedded within Garshuni literature. By working together, we can ensure that Garshuni, like Syriac, remains a vital and accessible part of our shared cultural heritage in the digital age.

I invite readers of the Digital Orientalist to join this exciting journey. Whether you are an AI researcher, a Syriac scholar, or someone interested in learning about this fascinating phenomenon, your contributions are invaluable. Together, we can preserve and study Garshuni with a range of AI tools!
