This article was written by guest contributors Adrien de Jarmy (University of Strasbourg) and Clarck Junior Membourou Moimecheme (Sorbonne Nouvelle University). The author bios are below.
The BADR Project, “Writing and Memory of the Battle of Badr (7th–21st c.)”, examines the genesis, transmission, and sociopolitical uses of narratives about the Battle of Badr (2/624) from their earliest attestations to contemporary reinterpretations. It explores how these narratives were written, reshaped, and mobilized as a repertoire of action and legitimation in religious, legal, and military contexts throughout the history of the Islamic world.
The project unfolds in two complementary parts. The first aims to retrace, at multiple scales, the evolution of Arabic narratives of the Battle of Badr up to the 13th century, when Ibn Sayyid al-Nās (d. 734/1334) produced a decisive synthesis. A TEI-XML encoding standard was designed for premodern Arabic texts following the logic of hadith literature – that is, texts composed of a main narrative (matn) associated with a chain of transmission (isnād). This model makes it possible to identify and structure the named entities (persons, places, objects) contained in the corpus, before integrating them into a relational database intended to generate visualizations and dynamic analyses.
The second focuses on the memories and reinterpretations of the Battle of Badr from the 13th to 21st century. These questions were at the heart of the international conference held at the University of Strasbourg on 13–14 November 2025, which gathered scholars working on the long afterlives of Badr in the fields of history, theology, literature, and the social sciences. A collective volume synthesizing the two parts of the project is expected to be published in 2027.
Methodology and Results of the Project
The methodology of the first part – namely the semi-automated processing of the BADR textual corpus – relies on five key stages: data extraction, semi-automated text markup, database creation, visualization, and data deposition. The BADR project uses only free or open-access tools, software, and languages (language: Python; Python libraries: matplotlib, PySide6, PyMySQL; tools: Notepad++, VS Code, MySQL, phpMyAdmin), all of which handle the specificities of Arabic script (RTL, right-to-left).
1. Corpus Construction
This phase involved the identification and extraction of texts available in .txt format from online digital libraries, particularly al-Maktaba al-Shamela and the OpenITI repository developed within the KITAB Project. The extraction was carried out thanks to the work of two MA student interns within the project. The texts were then cleaned (removal of footnotes and the lightweight Markdown-based markup system) to facilitate subsequent processing of raw data.
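The cleaning step described above can be illustrated with a minimal Python sketch. It is not the project's actual script: the patterns for metadata lines, page markers, milestones, line prefixes, and inline footnote callers are assumptions about what the downloaded .txt files may contain, and the function name clean_text is hypothetical.

```python
import re

# All patterns below are assumptions about the source markup,
# not the rules actually used by the BADR project.
META_LINE = re.compile(r"^#{1,6}OpenITI#|^#META#")    # file header and metadata lines
LINE_PREFIX = re.compile(r"^(?:###\s*\|+\s*|#\s+|~~)")  # heading, paragraph, continuation marks
PAGE_MARK = re.compile(r"PageV\d+P\d+")               # volume/page markers
MILESTONE = re.compile(r"\bms\d+\b")                  # milestone ids
FOOTNOTE_REF = re.compile(r"\(\d+\)")                 # inline footnote callers such as (1)


def clean_text(raw: str) -> str:
    """Return the text with markup, page markers, and footnote callers removed."""
    cleaned = []
    for line in raw.splitlines():
        if META_LINE.match(line):
            continue  # drop metadata lines entirely
        line = LINE_PREFIX.sub("", line)
        line = PAGE_MARK.sub("", line)
        line = MILESTONE.sub("", line)
        line = FOOTNOTE_REF.sub("", line)
        line = re.sub(r"\s+", " ", line).strip()  # collapse leftover whitespace
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)


sample = (
    "######OpenITI#\n"
    "#META# title\n"
    "### | Chapter\n"
    "# first line (1) PageV01P005\n"
    "~~continued ms12 text\n"
)
print(clean_text(sample))
```

Because Python 3 regular expressions operate on Unicode strings, the same patterns apply unchanged to Arabic text; the ASCII sample is used only to keep the example easy to read.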
2. Semi-Automated Text Markup
The BADR project aims to develop a TEI-based approach for the semi-automated encoding of premodern Islamic texts. It focuses on the development and implementation of a standardized TEI-XML schema. Elaborating an encoding guide was essential for defining tag and attribute choices during both manual and automated markup, applied to selected text extracts to produce a dataset (on this encoding guide, see A. de Jarmy and C. J. Membourou Moimecheme, “Developing a TEI-based Approach for Encoding Premodern Islamic Texts,” Journal of the Text Encoding Initiative, forthcoming in 2026):

Fig. 1. Manual encoding of an excerpt from the Sunan of Abū Dāwūd in Notepad++
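The hadith logic behind this kind of encoding, a chain of transmission (isnād) followed by a main narrative (matn), can be sketched in a few lines of Python. The element names used here (div, seg, persName) exist in TEI, but the way they are combined, the attribute values, and the function name encode_hadith are assumptions made for illustration; the project's encoding guide may define the structure differently.

```python
import xml.etree.ElementTree as ET


def encode_hadith(transmitters, matn):
    """Build a <div type="hadith"> holding an isnad segment and a matn segment.

    The element and attribute choices are hypothetical, not the BADR schema.
    """
    div = ET.Element("div", {"type": "hadith"})
    isnad = ET.SubElement(div, "seg", {"type": "isnad"})
    for position, name in enumerate(transmitters, start=1):
        # One <persName> per link in the chain, numbered in order of appearance.
        person = ET.SubElement(
            isnad, "persName", {"role": "transmitter", "n": str(position)}
        )
        person.text = name
    body = ET.SubElement(div, "seg", {"type": "matn"})
    body.text = matn
    return div


# Placeholder values only; no real chain of transmission is reproduced here.
hadith = encode_hadith(["Transmitter A", "Transmitter B"], "Text of the report")
print(ET.tostring(hadith, encoding="unicode"))
```

Keeping the chain and the narrative in separate segments is what later allows authorities to be counted independently of the persons mentioned inside the narrative itself.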
This dataset, together with several Python scripts forming a complete processing environment, made it possible to perform complex technical operations:
1. Automated encoding of all texts into TEI format based on the analytical outputs, especially named entities: personal names mentioned in the texts, authorities cited in the chains of transmission (asānīd), names of social groups such as tribes, geographical information (cities and physical features such as hills, valleys, and, specifically in the case of Badr, wells), supernatural entities such as angels, objects (clothing, weapons, booty), and animals (particularly horses and camels).
2. Identification and extraction of Qurʾānic and poetic passages.
3. Detection of recurrent narrative motifs based on a predefined selection (text reuse).
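As an illustration of the first operation, the following Python sketch wraps known named entities in TEI elements using a simple lookup list. It is a simplified stand-in, not the project's pipeline: the gazetteer, its reference identifiers, and the function name tag_entities are hypothetical, and a real workflow would need to handle Arabic orthographic variants and ambiguous names.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical gazetteer: surface form -> (TEI element, reference id).
GAZETTEER = {
    "ʿUrwa b. al-Zubayr": ("persName", "#urwa"),
    "Badr": ("placeName", "#badr"),
    "Quraysh": ("orgName", "#quraysh"),
}


def tag_entities(text, gazetteer):
    """Wrap every gazetteer match in its TEI element and return a <p> string."""
    # Longest forms first, so a long name is not split by a shorter one.
    forms = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(form) for form in forms))
    parts, last = [], 0
    for match in pattern.finditer(text):
        element, ref = gazetteer[match.group(0)]
        parts.append(escape(text[last:match.start()]))  # untouched text, XML-escaped
        parts.append(
            f"<{element} ref={quoteattr(ref)}>{escape(match.group(0))}</{element}>"
        )
        last = match.end()
    parts.append(escape(text[last:]))
    return "<p>" + "".join(parts) + "</p>"


encoded = tag_entities("ʿUrwa b. al-Zubayr wrote about Badr & the Quraysh.", GAZETTEER)
root = ET.fromstring(encoded)  # parsing confirms the output is well-formed XML
print([(child.tag, child.text) for child in root])
```

Escaping the surrounding text before inserting tags keeps characters such as & from breaking the XML, which matters when the markup is generated automatically across a large corpus.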
3. Database Creation, Data Visualization, and Deposition
The automatically extracted data are stored and structured in a relational SQL database. Comprising 19 tables, it is regularly enriched and corrected. It is accessible online through the phpMyAdmin web application. In total, the database includes 43 texts (Sīra-maghāzī texts, ṭabaqāt, hadith collections, tafsīr and dalāʾil al-nubuwwa), around 700,000 words, and several tens of thousands of encoded named entities.

Fig. 2. BADR project database in phpMyAdmin
The database can be freely accessed on the server of the University of Strasbourg, and researchers can query cross-referenced named entities using SQL. An application was also developed to facilitate the visualization of data stored in the database. The use of powerful Python libraries such as matplotlib made it possible to transform raw data into informative charts and to generate visualizations for macro-level analyses of named entities. The encoded texts will then be deposited on the Nakala data repository.
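The kind of cross-referenced query mentioned above can be sketched as follows. The sketch uses Python's built-in sqlite3 module with an in-memory database so that it runs anywhere; the project itself uses MySQL, and the three-table schema, the column names, and the function top_entities are assumptions for illustration rather than the actual 19-table structure. The rows are placeholder data.

```python
import sqlite3

# Hypothetical, much-reduced schema; not the BADR database structure.
SCHEMA = """
CREATE TABLE texts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
CREATE TABLE mentions (text_id INTEGER REFERENCES texts(id),
                       entity_id INTEGER REFERENCES entities(id));
"""


def top_entities(conn, kind, limit=10):
    """Rank entities of one kind by number of mentions and of distinct texts."""
    query = """
        SELECT e.name,
               COUNT(*) AS n_mentions,
               COUNT(DISTINCT m.text_id) AS n_texts
        FROM mentions AS m
        JOIN entities AS e ON e.id = m.entity_id
        WHERE e.kind = ?
        GROUP BY e.id, e.name
        ORDER BY n_mentions DESC, e.name
        LIMIT ?
    """
    return conn.execute(query, (kind, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO texts VALUES (?, ?)", [(1, "Text 1"), (2, "Text 2")])
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?)",
    [(1, "Authority A", "transmitter"),
     (2, "Authority B", "transmitter"),
     (3, "Place C", "place")],
)
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 1), (2, 2), (1, 3)],
)
print(top_entities(conn, "transmitter"))
```

The rows returned by such a query (name, number of mentions, number of texts) are the kind of raw material that a plotting library such as matplotlib can then turn into a bar chart.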
4. Examples of Research Topics
Queries in the database allow for precise mapping of texts on the Battle of Badr up to the 13th century and for the exploration of a wide range of topics. Among these, one line of inquiry focused on the authorities cited in the chains of transmission (asānīd).

Fig. 3. Main authorities quoted in chains of narrators or asānīd
Fig. 3 displays the main authorities cited in our corpus. The predominantly Medinan origin of the narrators is immediately apparent, confirming the Medinan formulation of the earliest narratives of the Battle of Badr. However, the hierarchy of narrators revealed by the data does not correspond to what one would expect. The importance traditionally attributed by modern historians to the letter of ʿUrwa b. al-Zubayr (d. 94/712–13) to the Umayyad caliph ʿAbd al-Malik – together with recent studies on the narratives ascribed to him – has often led to the assumption that his version, transmitted by Ibn Isḥāq (d. 150/767) and al-Ṭabarī (d. 310/923), was dominant. Yet Fig. 3 suggests a different picture: the account of Mūsā b. ʿUqba (d. 141/758–59) in his Kitāb al-maghāzī and the narrative attributed to ʿIkrima (d. 105/722–23) appear to have been preferred. This shift invites a re-evaluation of the relative weight of early authorities in the transmission of the Badr narratives and highlights how quantitative approaches can nuance long-standing assumptions in the historiography.
Another example of how this method allows for nuanced research concerns the space devoted to each character in the narrative.

Fig. 4. Percentage distribution of the main characters mentioned in the corpus
Overall, the story of the Battle of Badr is primarily a narrative about Muḥammad, which is unsurprising. Fig. 4 shows that in the entire corpus, significant attention is also given to Companions such as ʿUmar b. al-Khaṭṭāb and ʿAlī b. Abī Ṭālib. In this regard, these figures reflect the dominant approach of the historiographical texts that make up much of the corpus, such as the maghāzī literature.

Fig. 5. Percentage distribution of mentions of the main characters in the Tafsīr of Muqātil b. Sulaymān
However, Fig. 5 presents a different picture in the Tafsīr, the Qurʾānic commentary of the Basran Muqātil b. Sulaymān (d. 150/767), in which Abū Jahl appears immediately before Muḥammad. While the dominant narrative portrays the Battle of Badr as the story of Muḥammad and his Companions fighting the Quraysh, in Tafsīr literature it is framed more as a personal confrontation between Muḥammad and Abū Jahl.
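The comparison between a corpus-wide distribution and the distribution within a single text rests on a simple calculation: each character's mentions divided by all character mentions in the same text. The Python sketch below shows that arithmetic on placeholder data. The function name shares_by_text, the input format, and the counts are hypothetical and do not reproduce the project's figures.

```python
from collections import Counter, defaultdict


def shares_by_text(mentions):
    """Return, for each text, the percentage share of mentions per character.

    `mentions` is an iterable of (text_id, character_name) pairs.
    """
    per_text = defaultdict(Counter)
    for text_id, name in mentions:
        per_text[text_id][name] += 1

    shares = {}
    for text_id, counts in per_text.items():
        total = sum(counts.values())
        # most_common() lists the most frequently mentioned characters first.
        shares[text_id] = {
            name: round(100 * n / total, 1) for name, n in counts.most_common()
        }
    return shares


# Placeholder data: "T1" and "T2" stand for two texts, "A" and "B" for characters.
toy_mentions = [
    ("T1", "A"), ("T1", "A"), ("T1", "B"), ("T1", "A"),
    ("T2", "B"), ("T2", "A"),
]
print(shares_by_text(toy_mentions))
```

Computing shares per text rather than raw counts makes short and long works comparable, which is what allows a single commentary to be set against the corpus as a whole.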
The precise mapping of the texts also makes it possible to identify when new elements are introduced. We may take as an example the mentions of angels in the corpus, a theme already attested in the Qurʾān itself.

Fig. 6. Named entities associated with angels in Ibn Hishām’s (d. 218/833) Sīra al-nabawiyya
In Fig. 6, we see that in Ibn Hishām’s (d. 218/833) Sīra al-nabawiyya, a work largely based on Ibn Isḥāq, angels are mentioned primarily as a collective group, and within this group only Jibrīl appears as an individually named figure.
Fig. 7. Named entities associated with angels in al-Wāqidī’s (d. 207/823) Kitāb al-maghāzī

Fig. 8. Named entities associated with angels in Ibn Saʿd’s (d. 230/845) Ṭabaqāt al-kubrā
However, as Fig. 7 and Fig. 8 show, al-Wāqidī (d. 207/823) explicitly introduces the figures of Mikāʾīl and Isrāfīl, the latter also being cited by his disciple Ibn Saʿd. Isrāfīl is an angel with an eschatological function, expected to appear at the end of time. Therefore, with al-Wāqidī and Ibn Saʿd, the narrative of Badr acquires new angelological dimensions and incorporates elements that go beyond the earlier, more restrained angelology of the sīra tradition.
Conclusion
The BADR Project demonstrates how the combined use of TEI-XML encoding, relational databases, and quantitative visualization tools can shed new light on the composition, transmission, and evolution of premodern Islamic narratives. By structuring a large corpus of sīra, maghāzī, tafsīr, and hadith-related materials, the project makes it possible to trace not only the geographical and intellectual origins of key authorities, but also the shifting narrative weight assigned to individual actors, motifs, and supernatural elements across time and genres. These results show that digital methods do not supplant traditional philological approaches; rather, they reveal patterns and offer new hypotheses about the formation of early Islamic historiography.
Beyond its technical achievements, the BADR Project provides a reproducible model for the encoding and analysis of premodern Arabic texts, one that can be extended to other corpora and research questions. As the encoded texts are deposited and the database continues to expand, the project opens new avenues for collaborative research at the intersection of Islamic studies, digital humanities, and the history of historiography. The forthcoming collective volume will further synthesize these findings and highlight the broader implications of the longue durée study of the Battle of Badr for our understanding of Islamic memory and narrative production.
Project Members
Digital Humanities Team
PI: Adrien de Jarmy (Associate Professor, University of Strasbourg, GEO – UR 1340)
Clarck Junior Membourou Moimecheme (Associate Professor, Sorbonne Nouvelle University, CEAO – EA 1734)
Maxime Antonio (Developer)
Conference Organization
PI: Adrien de Jarmy (Associate Professor, University of Strasbourg, GEO – UR 1340)
Renaud Soler (Associate Professor, University of Strasbourg, GEO – UR 1340)
Clarck Junior Membourou Moimecheme (Associate Professor, Sorbonne Nouvelle University, CEAO – EA 1734)
Éric Vallet (Full Professor, University of Strasbourg, GEO – UR 1340)
Adrien de Jarmy is Associate Professor of Islamic Studies and Digital Humanities at the University of Strasbourg (France). He defended his PhD in 2023 at Sorbonne University under the supervision of Mathieu Tillier. His dissertation, entitled Historiographical and Normative Conceptions of the Figure of the Prophet Muḥammad: From the Earliest Attestations in the Sources to the 3rd/9th Century, examines the formation and circulation of early representations of the Prophet Muḥammad in Islamic historiography and normative literature. Following his doctorate, he was awarded a grant from the French Institute of Islamology (IFI) to develop and lead, as Principal Investigator, the BADR Project.
Clarck Junior Membourou Moimecheme is Associate Professor at Sorbonne Nouvelle University (France). He defended his PhD in 2022 under the supervision of Éric Vallet and Yves Coativy at the University of Western Brittany. His dissertation, entitled The Emirs of Mecca (13th–14th Centuries), focuses on the political and social history of Mecca during the late medieval period. He subsequently joined the BADR Project as a postdoctoral researcher, before being appointed Associate Professor at Sorbonne Nouvelle University in 2025.

