The digital world is, now more than ever, becoming a necessity in our daily lives: because of the unexpected situation caused by the COVID-19 pandemic, most of us are working from home. This has resulted not only in an overall increase in the time we spend at the computer, but also – and maybe more productively – in a boost in curiosity about the interesting things to be found online.
This is, at least, what happened to me when I recently found out about an interesting initiative to digitize India-related material, under way at the Giorgio Cini Foundation (San Giorgio, Venice).
The Cini Foundation is an internationally known cultural institution devoted to the development of scientific research – especially in the humanities – and has its seat on the Isle of San Giorgio, in the Venice Lagoon. Among various kinds of important historical and literary documents, the foundation owns some materials that might be relevant for those of us researching Indian music.
In fact, Alain Daniélou, a great connoisseur of India, founded the Istituto Interculturale di Studi Musicali Comparati (IISMC) in 1969 at the Cini Foundation. Since its beginning, the institute has aimed to promote courses, seminars, conferences, and exhibitions on world music, while maintaining an intercultural approach to musical studies. Archiving audio, visual, and textual works from different musical cultures was one of its main goals, and this led to the creation of the “Fondo Daniélou.”
The Fondo Daniélou originated from the donations that Alain Daniélou made to the Cini Foundation in 1971. The archive consists of a collection of literary material on the music, philosophy, arts and religions of India, and comprises more than 300 manuscript copies of Indian śāstras (treatises) on music. It also includes around 300,000 micro-index cards classifying Indian musical terminology and “secondary sources” on Indian culture and musical theory.
The digitization project at the Cini Foundation started as early as 2001, when the index cards of the collection were scanned with the aim of making them available to a larger audience. The publication A Descriptive Catalogue of Sanskrit Manuscripts in the Alain Daniélou’s Collection at the Giorgio Cini Foundation (Biondi, 2017) gives a definitive list of the manuscripts gathered in the Fondo, not only providing information on the Indian libraries from which they were copied, but also naming 43 paṇḍits who had worked on Daniélou’s project. After that, 178 items were made available through International Image Interoperability Framework (IIIF) technology on the Foundation’s website.
The resource is quite well structured, although the only available language is Italian; the interface is very user-friendly and looks like this:
If you click on one of the items, you will be redirected to a page where all of its metadata are listed:
Metadata of the handwritten copy of Madanapāla’s Ānandasanjīvanisaṅgīta done by A. Daniélou.
The nicest feature is that users can access the original images of the chosen item (click on “accedi all’allegato digitale”) and play with them a bit, as you can see here:
IIIF digitization of Madanapāla’s Ānandasanjīvanisaṅgīta
The website uses Mirador, “a configurable, extensible, and easy-to-integrate image viewer, which enables image annotation and comparison of images from repositories dispersed around the world.” This tool has been optimized to display resources from repositories that support IIIF, and for each of the 178 digitized pieces of the Fondo Daniélou you can adjust brightness, contrast, and saturation, toggle greyscale, and invert colors.
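For readers curious about what lies behind such a viewer: servers that support IIIF expose their images through the URL pattern defined by the IIIF Image API, in which region, size, rotation, and quality (including a greyscale rendering) are simply path segments. The following is a minimal sketch of how such a URL can be built in Python. The base URL and the identifier are hypothetical placeholders, not the Foundation's real endpoints, and `iiif_image_url` is an illustrative helper rather than part of any library. The `max` size keyword assumes a server implementing version 2.1 or 3.0 of the Image API.

```python
from urllib.parse import quote

# Quality values defined by the IIIF Image API.
ALLOWED_QUALITIES = {"default", "color", "gray", "bitonal"}


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality!r}")
    # Slashes inside the identifier must be percent-encoded.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only:
# request the full page, 800 pixels wide, in greyscale.
url = iiif_image_url("https://iiif.example.org/image", "danielou/ms-001",
                     size="800,", quality="gray")
print(url)
```

Whether a given server actually offers the `gray` quality depends on its configuration, so a real client would first check the image's `info.json` document.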
This viewer would be a nice tool in itself, but there is actually more to come: the most recent project, initiated by the Cini Foundation in 2019, aims at digitizing the whole collection through Optical Character Recognition (OCR), something that is only in its infancy with regard to most Indian languages.
Through the ARCHiVe project, in fact, a group of researchers has been trying to run OCR algorithms on a set of 30 cards from the collection, using tools such as the Google Vision API OCR and Transkribus. The scripts are mostly Roman and modern Devanagari, both handwritten and typed – an “easy” data set compared to most of the ancient manuscripts we generally deal with.
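To give an idea of how the output of such tools can be judged, one common measure is the character error rate (CER): the edit distance between the OCR output and a manually checked transcription, divided by the length of the transcription. The sketch below is only an illustration of that measure and makes no claim about how the ARCHiVe researchers evaluate their results. The function names are illustrative, and the text is compared code point by code point after Unicode NFC normalization, which is an assumption that matters for scripts like Devanagari where vowel signs are separate code points.

```python
import unicodedata


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the reference length, in code points."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# A made-up example: the OCR output drops the anusvāra of "saṅgīta".
print(character_error_rate("संगीत", "सगीत"))
```

Here the reference has five code points and one of them is missing from the output, so one error in five characters is counted.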
The results, as far as one can tell, are quite positive and give hope even to classical Indologists, for whom OCR is almost a dream! The manuscript tradition is extremely rich throughout Indian history, and this makes things more complex: the variety of handwriting styles and scripts would require intensive training for any machine, and the error rate would probably still be very high, but there is hope, and we are really curious to see what the results will be!