Legacy cataloguing and TEI: David Pingree and Wellcome Collection

By Alexandra Eveleigh, Stephanie Cornwell, and Adrian Plau.

Wellcome Collection holds one of the world’s most distinctive manuscript collections. The diversity of the collection makes it a considerable challenge to ensure that users can find and access the manuscripts. As an example of our work towards increased accessibility and discoverability, this piece describes an ongoing project in which legacy cataloguing work undertaken decades ago serves as the starting point for representing hitherto almost completely unstudied parts of our Sanskrit manuscript holdings in our online catalogue.

Wellcome Collection manuscripts

With its stated aim of exploring health and human experience, Wellcome Collection is typically assumed to contain material directly relating to medicine and related sciences. While not entirely untrue, this assumption obscures the range and depth of the collections, which trace back to the unusually expansive approach to collecting espoused by Sir Henry Wellcome and his many purchasing agents. The collection of South Asian manuscripts is a case in point. Far from being restricted to Sanskrit texts on Ayurveda, it includes works spanning a myriad of genres and languages, including Persian and Arabic, and well over one thousand years of written records from across the region. Surfacing this range in ways that do justice to the individual manuscript while ensuring consistency and accessibility for users is, to put it mildly, a challenge.

Henry Wellcome (left) and Captain Peter Johnston-Saint, who, among other things, worked as a purchasing agent for Wellcome. Image: CC BY 4.0, Wellcome Collection.

Pingree cataloguing


David Pingree was an historian of premodern, global science. Throughout the 1970s and 1980s, Pingree catalogued many of Wellcome’s South Asian manuscripts in the course of his research. Wellcome staff in London microfilmed the manuscripts and sent the microfilms to Pingree at Brown University, USA; Pingree used the images to create handwritten cataloguing notes, which were then copied and sent back to Wellcome. Some of these notes formed the starting point for Pingree’s 2004 Catalogue of Jyotiṣa Manuscripts in the Wellcome Library, but the overwhelming majority remained unpublished and were therefore only accessible in person at Wellcome.

Like many other workplaces, Wellcome has gradually become increasingly ‘paperless’ over recent years, so in 2019 we made scans of Pingree’s notes for our own working convenience. But these PDF copies of handwritten documents were not searchable, and still required the reader to navigate an arcane 1950s system of manuscript identifiers which had since been partially superseded anyway in Wujastyk’s two Handlists of Sanskrit and Prakrit Manuscripts (1985, 1998). Nevertheless, the Pingree notes represent a rich source of detailed cataloguing information which deserves to be available to a much wider audience.

Seeking to put Pingree’s scholarly legacy to good advantage, we determined to re-purpose his handwritten notes into fully searchable, electronic texts which could be made available through our online catalogue. Whilst this approach is not entirely without risk (Pingree’s entries are sometimes only brief, and clearly scholarship will have moved on since these notes were first written), our methodology was inspired by library and archive catalogue ‘retroconversion’, practised internationally in the 1990s and early 2000s as an efficient and economical means to convert older card, typescript and print catalogues into machine-readable formats. Retroconversion techniques break the cataloguing workflow down into discrete stages: standards definition, technical mark-up, data entry, data validation and quality control. Significantly, some of these stages (for example, quality assurance) can be partially automated, whilst others are completed by staff skilled in the technical aspects of cataloguing but who do not need to be specialists in the languages or subjects of any particular field.
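To make the idea of a partially automated stage a little more concrete, here is a minimal sketch in Python of the kind of check that could be run over each record. It is not the script used at Wellcome: the list of required elements is a hypothetical house rule, and the two sample records contain placeholder text only. The element names (msDesc, msIdentifier, idno, msContents, msItem, title) and the namespace are those of the TEI manuscript description module.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, bound to a prefix for use in path expressions.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical house rule: every record needs an identifier and a title.
REQUIRED_PATHS = [
    ".//tei:msDesc/tei:msIdentifier/tei:idno",
    ".//tei:msDesc/tei:msContents/tei:msItem/tei:title",
]

def check_record(xml_text):
    """Return a list of problems found in one TEI record (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    for path in REQUIRED_PATHS:
        found = root.find(path, TEI_NS)
        if found is None or not (found.text or "").strip():
            problems.append(f"missing or empty: {path}")
    return problems

# Placeholder records for illustration only (not real catalogue data).
GOOD_RECORD = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <msDesc>
    <msIdentifier><idno>MS Example 1</idno></msIdentifier>
    <msContents><msItem><title>Placeholder title</title></msItem></msContents>
  </msDesc>
</TEI>"""

BAD_RECORD = GOOD_RECORD.replace("<title>Placeholder title</title>", "<title/>")

if __name__ == "__main__":
    for name, record in [("good", GOOD_RECORD), ("bad", BAD_RECORD)]:
        print(name, check_record(record))
```

A check like this only confirms that a record is well formed and that the expected fields are filled in. Whether the transcription itself is accurate still calls for a human reader.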

Retroconversion also involves negotiating with copyright owners to enable re-use: we are extremely grateful to the American Philosophical Society, who hold the rights to Pingree’s unpublished papers, for permitting us to make catalogue records based on Pingree’s work openly available under a CC BY 4.0 licence.


TEI cataloguing

Established bibliographic conventions used to describe published books (which can generally be assumed to exist in multiple, identical copies) are relatively slimline in the information recorded – author, title, date and publisher are usually sufficient to enable item discovery online. Manuscripts, however, are unique, and manuscript cataloguing often includes partial or even complete transcription of the texts. Standard library management software is not really designed for this depth of detail, and moreover, licensing terms for the use of library catalogue applications generally restrict editing access to cataloguers with an expert knowledge of specialist MARC encoding standards.

The Text Encoding Initiative (TEI), in contrast, is one of the longest-lived and most influential projects in the Digital Humanities. Established in 1987 as a standard for representing texts in digital form, and now expressed in the generic computer mark-up language XML, TEI takes a generalist approach to overall text structure. This means that even though the manuscript description module was originally developed to meet the needs of scholars working with medieval manuscripts in the European tradition, it is adaptable to texts of other traditions and dates, scripts and languages (including multiple languages in one manuscript). Manuscripts are physically and textually very diverse: Wellcome’s manuscripts collection includes works written on palm leaves, on wood, on paper and parchment, and even on animal bones, and TEI provides a flexible structure for describing these physical characteristics too. Furthermore, TEI is an open standard designed to be both hardware and software independent. It is widely used not only for encoding and storage but also for the analysis of texts, and expertise in the application of TEI can be readily found in most humanities disciplines and across the information professions.

Converting Pingree’s notes into TEI

Whilst there was now a plan and a means to make Pingree’s notes more accessible, the act of creating the TEI records still posed a few challenges for the cataloguers.

Most of the manuscripts are written in Sanskrit, and Pingree’s notes include transliterated extracts of the Devanagari text. Although some cataloguers at Wellcome are multilingual, few could read the manuscripts, and given the volume of material it would have been impractical to restrict cataloguing to those who could. The project also required learning some new systems and practices: for most cataloguers it was the first time using XML or TEI to catalogue, and GitHub to publish and update files. In addition, work had to be carried out entirely online due to the Covid-19 pandemic, which meant no access to the physical manuscripts, few of which had been digitised. In practice, only Pingree’s digitised notes and some previous inventory files could be used to create the records.

Although these seemed to be barriers, the nature of the work became an opportunity for cataloguers to develop their knowledge of the manuscript collections and of previous cataloguing practices, and to improve their technical skills.

Without access to the original works, it was less important for the cataloguer to understand the contents of the manuscript. Instead, Pingree’s handwritten transliterations of incipits, explicits and colophons were copied using an online keyboard to improve the descriptive detail. A benefit of using TEI, XML and GitHub was that they supported the ethos of an iterative record: one which is not authoritative or complete at the point of initial cataloguing, but acts as a starting point to be enhanced in the future.

Because Pingree’s notes are very concise and follow a consistent layout, they were relatively simple to convert to the appropriate TEI elements and attributes in the template. Their simplicity also allowed for relative speed in record creation and gave a sense of general patterns in common subjects, genres and recurring texts. Removing from the template those TEI elements that the notes did not mention also helped to reinforce the different possible physical and intellectual properties of manuscripts, and taught cataloguers more about general manuscript cataloguing conventions in the process.
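The template-and-prune approach described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the template actually used: the template here is a much-reduced, hypothetical one, the `fill_template` helper and the note fields are invented for the example, and the values are placeholders, not text from any manuscript. The element names themselves (author, title, incipit, explicit, colophon within msItem) are standard TEI.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# Serialise with TEI as the default namespace rather than an ns0: prefix.
ET.register_namespace("", NS["tei"])

# A much-reduced, hypothetical template with one empty slot per field.
TEMPLATE = """<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msIdentifier><idno/></msIdentifier>
  <msContents>
    <msItem>
      <author/><title/><incipit/><explicit/><colophon/>
    </msItem>
  </msContents>
</msDesc>"""

def fill_template(note):
    """Fill the template from one note, removing elements the note omits."""
    root = ET.fromstring(TEMPLATE)
    root.find("tei:msIdentifier/tei:idno", NS).text = note["shelfmark"]
    item = root.find("tei:msContents/tei:msItem", NS)
    for child in list(item):                # copy: we remove while looping
        field = child.tag.split("}")[1]     # local name, e.g. "incipit"
        value = note.get(field, "").strip()
        if value:
            child.text = value
        else:
            item.remove(child)              # not mentioned in the note
    return root

# Placeholder note for illustration only (not real catalogue data).
note = {
    "shelfmark": "MS Example 1",
    "title": "Placeholder title",
    "incipit": "placeholder opening words",
}

if __name__ == "__main__":
    print(ET.tostring(fill_template(note), encoding="unicode"))
```

Because empty elements are dropped rather than left blank, the resulting record states only what the note records. That suits an iterative record, since fields can be added later when someone is able to consult the manuscript itself.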

Working on the TEI files from home, although enabled by more recent technological advancements, unexpectedly mirrored Pingree’s own cataloguing process as a remote worker writing notes from microfilm images. This parallel also highlights the main reason for the lack of information on the physical qualities of the manuscripts, such as specific dimensions, bindings, materials and colours, which for the most part neither Pingree nor the TEI cataloguer could capture in their circumstances.

Header image from Wellcome MS Indic Alpha 978, first catalogued by Pingree.

