At the Dawn of Digital Studies on Arabic Script in France

As Dominique Stutzmann pointed out in a 2017 article for the French history magazine L’Histoire, paleography has undergone a “digital revolution” over the past ten years. Digital technologies have profoundly transformed paleography and codicology as applied to Latin and French scripts, especially through advances in handwritten text recognition (HTR). This evolution has addressed several fundamental challenges, such as the indexing and transcription of medieval manuscripts, and has opened new perspectives for researchers. These advances are also beginning to influence Arabic paleography. They enable researchers to tackle similar challenges in the study of Arabic manuscripts and extend the scope of digital humanities to non-Latin scripts.

Three projects originating in France, each collaborating with other European institutions, illustrate this digital revolution and the growing role of digital technologies in paleography. Two were led by Professor Stutzmann: the HIstorical MANuscript Indexing for user-controlled Search (HIMANIS) project and the Hours – Recognition, Analysis, Editions – Heures (HORAE) project. The third is the Consortium pour la Reconnaissance d’Écriture Manuscrite des Matériaux Anciens (CREMMA).

HIMANIS was launched to improve access to the registers of the French royal chancery from the 14th and 15th centuries. The project worked with institutions such as the Institut de Recherche et d’Histoire des Textes (IRHT), the University of Groningen, and the company A2iA. It used HTR to automate the transcription of these registers, which made them easier to index and search. The project structured and automatically transcribed a corpus of over 1,400 pages, making these documents text-searchable and opening new possibilities for historical research.

HORAE, a continuation of HIMANIS, focused on medieval books of hours, a different corpus digitized by the Poitiers media library. This project, funded by the French National Research Agency, developed recognition and segmentation tools tailored to illuminated manuscripts. HORAE led to the creation of the Arkindex platform, facilitating the automatic processing and transcription of books of hours for broader access by researchers and the public.

The CREMMA project and its extension CREMMALAB focus on advancing digital resources for manuscript studies, particularly for historical and heritage materials. CREMMA was initiated in 2017 under the Île-de-France region’s DIM-MAP (Domaine d’Intérêt Majeur – Matériaux Anciens et Patrimoniaux) program. It formed a consortium to build an accessible server infrastructure that provides manuscript recognition tools through the eScriptorium platform. Several academic and research institutions support the platform, including Inria’s ALMAnaCH team and the École Nationale des Chartes.

The CREMMALAB component, established in 2021, emphasizes user support, documentation, and scholarly discussion. Under the guidance of postdoctoral researcher Ariane Pinche, CREMMALAB has organized workshops, training sessions, and seminars to foster collaborative discussion of manuscript text recognition. These initiatives support research, particularly on medieval manuscripts, by offering training materials and promoting community engagement. These activities culminated in a dedicated conference at the École Nationale des Chartes, “Ancient Documents and Automated Handwritten Text Recognition,” held on June 23–24, 2022. The event brought scholars together to assess the state of HTR technologies and to discuss the technical and methodological challenges these technologies raise. It marked a milestone in CREMMA’s contributions to digital paleography.

These projects, focused on Latin and Old French, are essential to this discussion. What we call the dawn of digital studies on Arabic script has already been inspired by the innovations developed for Latin and French corpora, and it will undoubtedly continue to be.

Arabic Script and Digital Technology: CallFront and Other Ongoing Projects

In France, the study of Arabic script spans codicology, the history of the book, and art history. Several institutions make key contributions:

- the Collège de France, where François Déroche focuses on the codicology of Qur’anic manuscripts;
- the École Pratique des Hautes Études (EPHE), which offers specialized programs such as ‘History and Codicology of Arabic Manuscript Books’;
- the IRHT, which provides resources such as the Molé collection and organizes seminars on Arabic codicology.

Arabic script is also studied within Islamic art history. Institutions such as the Louvre, Sorbonne University, and the Institut National d’Histoire de l’Art (INHA) analyze Arabic calligraphy as an art form. The CallFront project, a collaboration between Sorbonne University and the INHA, reflects this interdisciplinary approach and stands out as one of the most prominent projects in the field.

The CallFront project, titled ‘Calligraphies en caractères arabes aux frontières du monde islamique’ (‘Calligraphies in Arabic characters at the frontiers of the Islamic world’), is a collaborative research initiative. Eloïse Brac de la Perrière leads it with Maxime Durocher. The French National Research Agency has funded the project for 36 months starting in January 2023. It also receives support from Sorbonne University’s OPUS (Observatoire des Patrimoines de Sorbonne Université), the Barakat Trust, and the Max van Berchem Foundation.

This project is dedicated to studying calligraphic styles in Arabic script from regions such as the Iberian Peninsula, the Maghreb, Sub-Saharan Africa, Anatolia, the Balkans, India, Southeast Asia, and China. Its aim is to document and understand the production and use of Arabic calligraphy in these areas, considering their geographical distance from the historical center of Islamic civilization and their interactions with other cultural and linguistic communities.

To achieve its goals, CallFront is structured around three main research axes:

1. Gathering and describing isolated corpora 

This part of the project brings together calligraphic styles that have so far been studied in isolation, in order to compare them and identify possible connections. The project is building a digital corpus with the open-source content management system Omeka-S, hosted by the INHA, and plans to put it online in 2025 to support these analyses. Contributors to these corpora come from institutions around the world, and not only from art history. For instance, Nuria de Castilla, whom we have highlighted throughout this article, is among the notable contributors. Her involvement shows once again that the study of script is not the exclusive domain of art history or of codicology but is shared between the two disciplines.

2. Documenting border calligraphic styles 

Given the scarcity of information on these styles, the project collects conventional sources (biographies, hagiographies, historical chronicles) and lesser-known documents circulated within calligraphy circles (manuals, teacher templates, corrections).

3. Analyzing calligraphic practices

In collaboration with a team of professional calligraphers coordinated by Nuria Garcia Masip, CallFront seeks to reconstruct the creation methods of calligraphic styles whose practice has often disappeared.

The CallFront project employs several digital technologies to advance the study of Arabic calligraphy at the frontiers of the Islamic world. It uses Omeka-S to create a shared digital corpus hosted by the INHA. This corpus is based on standardized protocols designed to describe a wide variety of calligraphic styles. Additionally, the project is developing a custom ontology to establish a unified terminology. This ontology is grounded in the general features of writing and involves a systematic analysis of letter morphology, enabling comparative studies across different styles.

Ultimately, CallFront aims to deepen the understanding of calligraphic traditions in Arabic script on the borders of the Islamic world by combining historical, artistic, and practical approaches. To date, it is the most successful project applying digital technology to the study of Arabic script, but other projects and studies could prove just as promising.

The Projects of the École Pratique des Hautes Études on Digital Humanities Related to Arabic Script

In August 2024, the École Pratique des Hautes Études launched a call for applications for a doctoral fellowship in the modeling and digital analysis of Arabic paleography. The fellowship is linked to a research group called “Arabic Pal”, which will work closely with the teams of the ERC Synergy project MiDRASH (Migrations of Textual and Scribal Traditions via Large-Scale Computational Analysis of Medieval Manuscripts in Hebrew Script). The doctoral project will focus on developing descriptors and models for the study of Arabic script and medieval Arabic manuscripts.

Meanwhile, Riham Mokrani is working on another promising and innovative doctoral thesis, entitled “Étude numérique des écritures arabes de la période mamelouke” (‘Digital study of Arabic script from the Mamluk period’). The thesis applies digital and computational methods to the paleographic analysis and codicological modeling of Mamluk Qur’anic manuscripts (1250–1517). It begins with a paleographic and codicological description. It then adapts digital paleography tools to Arabic script and develops a method for the codicological modeling of Qur’anic manuscripts. Next, a computational analysis based on these descriptions will aim to situate the scripts according to three criteria: date of copying, place of copying, and copyist. The thesis aims to provide an overview of Qur’anic manuscript production in the Mamluk period, analyzing both regional influences and factors internal to the Mamluk political system. The results will be set against the evolution of Mamluk society, its production centers, and the socio-political changes of the time, in order to understand how these elements shaped the tradition of Qur’an copying.

This work could enrich our understanding of Qur’anic production in the Mamluk period and also provide a solid methodological foundation for future research in Arabic paleography, which makes it an exciting and promising prospect.

Future Developments: Drawing Inspiration from What Has Been Done in Latin?

The École Pratique also organized a doctoral symposium on the History of the Book and Arabic Codicology, held on May 28 and 29, 2024. Riham Mokrani (EPHE) and Abduselam Fetic (EPHE) organized the symposium, and Nuria de Castilla (EPHE, Proclac) led it. It was intended for PhD students who wanted to deepen their knowledge of Arabic codicology in order to better understand the sources they work with.

An afternoon session was dedicated to methodological advances in the study of Arabic script and included a segment focused on digital humanities. 

Five speakers presented during the afternoon session, and three of them focused on Arabic script. The first was François Déroche (CdF, Proclac), a renowned specialist in Arabic paleography and codicology who holds a chair at the Collège de France dedicated to the history of the Qur’an and its transmission. He gave a general presentation on the analysis of Arabic script and its challenges. Taking a more digital approach, Benjamin Kiessling (EPHE, Aoroc), one of the main architects of the eScriptorium software, discussed recent advances in optical character recognition for historical Arabic documents. Finally, Riham Mokrani, a doctoral candidate at the EPHE, explained in detail the methodology she plans to use in her PhD dissertation, a digital paleographic study of Mamluk Qur’anic manuscripts.

The last two speakers were Ariane Pinche (CNRS, Ciham) and Malamatenia Vlachou (IRHT ENPC, LIGM). As mentioned above, Pinche contributed significantly to the CREMMA project as a postdoctoral researcher. Both specialize primarily in Old French and Latin scripts. Their presentations addressed, respectively, approaches to transcribing manuscripts and the integration of paleographic and computational analysis in the study of handwriting. The presence of two specialists in Old French and Latin scripts at a conference on Arabic codicology is telling. As the session’s conclusions suggested, studies on Arabic script were still at an early stage in their use of digital technology. Drawing inspiration from advances in Latin script studies, while taking the differences between the two scripts into account, was seen as a starting point for advancing the field.

Conclusion 

The exploration of Arabic script through digital technologies in France has made substantial progress, but the field is still in its early stages compared with studies of Latin script. Projects like CallFront and the doctoral research at the École Pratique des Hautes Études have laid the groundwork. They integrate methodologies from established Latin paleography while addressing the specific challenges of Arabic script.

While Arabic script studies are increasingly adopting digital tools, further advances are essential to unlock their full potential. Drawing on the successes of Latin paleography and adapting these techniques to the distinctive complexities of Arabic script offers an exciting avenue for future research. As this field matures, it could reshape our understanding of historical Arabic texts and their contexts. It would enrich both scholarship and public engagement with this rich and underrated cultural heritage.
