For my first post here on African studies and digital tools, I’ll draw on my direct experience: building a collaborative team to transcribe a complex corpus. The team brought together amateurs and scholars; some met in person, some online, and some never met at all but shared the same tool. The ways they used this tool and their reasons for transcribing were tremendously diverse, yet the result proved highly efficient, and the process joyful and intellectually challenging.
Antoine d’Abbadie’s Notebooks in the Horn of Africa (1837-48): the Challenges of a Complex Corpus
The 20 notebooks of the French scholar Antoine d’Abbadie contain the notes he took during his stay in the Horn of Africa (today’s Ethiopia, Eritrea, and Djibouti) between 1837 and 1848. Preserved at the French National Library (BnF), these notebooks consist of about 4,000 pages of tight handwriting, which have tried the eyes and patience of previous generations of researchers, and even those of Antoine d’Abbadie himself, who made little use of his own notes during his long scientific career. The black-and-white microfilms made in the 1960s did little to improve access: the blurry reproduction lacks contrast, and navigating the scattered notes across the microfilm rolls is slow and cumbersome. The only catalogue describing these notebooks was incomplete and inaccurate. Yet the difficulty of accessing these mythical notebooks is not only material.
The content also presents methodological and disciplinary challenges. On the one hand, the notes were taken as he went along, over eleven years, in apparent disorder. On the other hand, they cover numerous disciplines: geography, hydrography, meteorology, history, epigraphy, anthropology, philology, codicology… Likewise, the many languages and writing systems they contain show that we have lost the broad skills of the polyglot, encyclopedic scholars of the 19th century, or at least that contemporary scholarship no longer values this kind of erudition. French remains the main language, which makes the notebooks hard to access for Ethiopian colleagues in particular. This French is full of terms from local languages transliterated into Latin characters with numerous diacritical marks, some of them invented by Antoine d’Abbadie. Naturally, he makes intensive use of the Ethiopian abugida, not only for the Ethio-Semitic languages, in line with local practice, but also for all the other languages he encountered during his stay, even inventing glyphs to render some sounds. Antoine also uses Greek and Hebrew for his internal referencing system. He sometimes writes in English, Latin, and Basque. Finally, he uses shorthand to write more quickly, or to hide some of his remarks.
All this explains why this corpus has remained under-used until now. Digital tools and collaborative science have now made it possible to overcome these obstacles.
Open Access to High Definition Images
The Transcribing Antoine d’Abbadie project (2020-2023) is a collaboration between the BnF and the CNRS that aims to produce the full text of all the notebooks and publish it electronically. It is led by Vanessa Desclaux (BnF) and myself, Anaïs Wion (CNRS), with the invaluable collaboration of Mathilde Alain, who held a one-year contract in 2020-2021.
The first step was to digitise the corpus in colour and in high definition. The images are deposited in Gallica, the digital library of the BnF. How can these images be turned into text? There are two methods: manual transcription, which is the subject of this post, and Handwritten Text Recognition (HTR), which depends on the first, since HTR models are trained on manual transcriptions. We have also experimented with HTR, but that’s another story; keep an eye on my forthcoming posts. For transcription, the notebook images are imported via the IIIF (International Image Interoperability Framework) protocol into the Transcrire tool, developed under the direction of Fabrice Melka (IMAF, CNRS) within the framework of the Consortium Archives des Ethnologues (in a future post, I will present the Huma-Num consortia and how they serve the social sciences through digital tools).
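For readers unfamiliar with IIIF, the following Python sketch shows roughly what "importing images via IIIF" involves on the consuming side: reading a Presentation API 2.x manifest and listing each page (canvas) with an image URL built from the Image API URL pattern. It assumes a 2.x manifest layout (`sequences`, `canvases`, `images`); the function name `list_pages`, the sample manifest and its example.org URLs are hypothetical, and this is not Transcrire’s actual import code.

```python
from typing import Any


def list_pages(manifest: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (canvas label, image URL) pairs from a IIIF Presentation 2.x manifest.

    Assumes the 2.x layout: manifest -> sequences -> canvases -> images.
    """
    pages = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for annotation in canvas.get("images", []):
                resource = annotation["resource"]
                service = resource.get("service")
                if isinstance(service, list):  # some manifests list several services
                    service = service[0] if service else None
                if service and "@id" in service:
                    # IIIF Image API 2.x URL pattern:
                    # {base}/{region}/{size}/{rotation}/{quality}.{format}
                    url = f"{service['@id'].rstrip('/')}/full/full/0/default.jpg"
                else:
                    url = resource["@id"]  # fall back to the static image URL
                pages.append((str(canvas.get("label", "")), url))
    return pages


# A tiny hand-written manifest with hypothetical URLs, standing in for a real one.
sample = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "sequences": [{
        "canvases": [
            {"label": "f. 1r", "images": [{"resource": {
                "@id": "https://example.org/img/1.jpg",
                "service": {"@id": "https://example.org/iiif/nb1-f1"}}}]},
            {"label": "f. 1v", "images": [{"resource": {
                "@id": "https://example.org/img/2.jpg"}}]},
        ]
    }],
}

for label, url in list_pages(sample):
    print(label, url)
```

With a real manifest, one would first fetch and parse the JSON (for instance with `urllib.request` and `json.load`). Note that Image API 3.0 replaces the `full` size keyword with `max`, so the URL template would need adjusting for 3.0 services.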
Transcrire aims to enable participatory transcription of the manuscripts and archival collections held by French institutions: anyone can create an account and transcribe pages from the many collections on offer.
Technically, the tool (revamped in 2020) is built on Omeka S, with the Scripto module providing the transcription features. It offers back-office tracking of transcribed pages, with a graduated validation system: a page can be declared complete while remaining open to changes, or declared complete and closed to changes. Since any feature can be repurposed, we also used page closure to flag pages we had chosen not to transcribe, such as maps!
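The graduated validation system can be pictured as a small state model. The Python sketch below is purely illustrative: the `Status`, `Page` and `progress` names are hypothetical and do not reflect the Scripto or Omeka S API. It shows the three states described in the text, how a closed page rejects edits, and how the map workaround can be read back from the data, by treating a page that is closed with no text as deliberately skipped.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "in progress"
    COMPLETE_OPEN = "complete, open to changes"
    COMPLETE_CLOSED = "complete, closed to changes"


class Page:
    """One notebook page as tracked in a hypothetical back office."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.text = ""
        self.status = Status.IN_PROGRESS

    def edit(self, text: str) -> None:
        # Closed pages reject any further change; other states accept edits.
        if self.status is Status.COMPLETE_CLOSED:
            raise PermissionError(f"{self.label} is closed to changes")
        self.text = text

    def mark_complete(self, close: bool = False) -> None:
        if self.status is Status.COMPLETE_CLOSED:
            raise PermissionError(f"{self.label} is already closed")
        self.status = Status.COMPLETE_CLOSED if close else Status.COMPLETE_OPEN

    @property
    def skipped(self) -> bool:
        # The workaround described in the text: a page closed with no
        # transcription was deliberately left aside (e.g. a map).
        return self.status is Status.COMPLETE_CLOSED and not self.text.strip()


def progress(pages: list[Page]) -> float:
    """Share of transcribable pages that are closed, ignoring skipped pages."""
    todo = [p for p in pages if not p.skipped]
    if not todo:
        return 1.0
    done = sum(p.status is Status.COMPLETE_CLOSED for p in todo)
    return done / len(todo)


# Usage with placeholder labels and text.
pages = [Page("f. 1r"), Page("f. 1v"), Page("f. 2r (map)")]
pages[0].edit("transcribed text")
pages[0].mark_complete(close=True)   # complete and closed
pages[1].edit("draft")
pages[1].mark_complete()             # complete but still open to changes
pages[2].mark_complete(close=True)   # closed without text: a skipped map
print(progress(pages))
```

Deriving "skipped" from "closed and empty" mirrors the workaround without adding a status; a dedicated status would be cleaner if the platform offered one.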
Transcribing is about Making Choices: the Need for Guidelines
Transcribing is not a mechanical act but a scholarly one that requires choices to be made. Whether or not to keep line breaks is a fairly simple decision; rendering diacritical marks that do not exist in Unicode requires imagination. We wrote two specific guides for the transcribers. The first sets the difficulties aside: it asks transcribers to stay as close as possible to the text, without attempting to reproduce the complexity of the layout or the spelling. The second is aimed at “experts” who have already gained some experience: it covers the rendering of diacritics and the marking of certain structuring elements of the layout.
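Unicode’s combining characters can cover part of this problem: a base letter followed by one or more combining marks displays as a single accented character even when no precomposed code point exists. The Python sketch below illustrates one possible convention, assumed here rather than taken from the project’s guides: a hypothetical `MARKS` table maps short mark names to official Unicode character names (resolved with `unicodedata.lookup`), and any mark without a Unicode equivalent is kept as an explicit, searchable placeholder such as `[?name]`. The function name `render` is likewise hypothetical.

```python
import unicodedata

# Hypothetical labels a transcription guide might define, mapped to the
# official Unicode names of combining characters.
MARKS = {
    "acute": "COMBINING ACUTE ACCENT",         # U+0301
    "dot above": "COMBINING DOT ABOVE",        # U+0307
    "diaeresis": "COMBINING DIAERESIS",        # U+0308
    "macron below": "COMBINING MACRON BELOW",  # U+0331
}


def render(base: str, marks: list[str]) -> str:
    """Attach combining marks to a base letter; flag marks with no Unicode match."""
    out = base
    unknown = []
    for label in marks:
        if label in MARKS:
            out += unicodedata.lookup(MARKS[label])
        else:
            unknown.append(label)
    # NFC uses a precomposed character where one exists (e + acute -> é)
    # and otherwise keeps the base letter followed by its combining marks.
    out = unicodedata.normalize("NFC", out)
    # Marks with no Unicode equivalent become an explicit placeholder.
    return out + "".join(f"[?{label}]" for label in unknown)


print(render("e", ["acute"]))
print(render("a", ["macron below", "acute"]))
print(render("x", ["unknown mark"]))
```

Normalising to NFC keeps searches consistent, since the same letter-plus-mark typed in different ways ends up as the same code point sequence.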
We then uploaded the notebooks to the Transcrire platform gradually, since putting too much material online at once would have been counterproductive: transcribers need to see their work progress, and a huge corpus can be discouraging.