The Toyo Bunko Archive: a source of joy and torment

As promised previously, in this post I am taking you on a deep dive into a major digital archive I have used throughout the years: the Toyo Bunko. I chose this archive not only because I have spent many hours on it, but also because it suffers from some accessibility issues that make navigation difficult and at times frustrating. I hope that sharing my experience with the website will make your life using it a bit easier. But let me preface this post by saying that Toyo Bunko is an invaluable resource for anyone interested in Buddhist art and archaeology, as well as Central and South Asian Studies, and is worth using extensively. I highly recommend it, despite the navigation issues.

When I was writing my MA thesis on the Buddhist caves of Kizil in modern Xinjiang, I used Toyo Bunko a lot – and let me tell you, it was my cross and delight. The archive has scans of the reports from the excavations carried out in Central Asia in the 19th and early 20th centuries. Some of these books are so rare that even after all these years, I still have not managed to find and hold in my hands a physical copy – I am thinking, for example, of the early travelogs of British army officers, or Russian reports. The fact that a digital copy of these rare books exists at all and is so easily accessible for scholars across the world is amazing in itself. As per the archive’s own description, in fact, the goal of the project is “to make “invisible” books visible [for] everyone. Today surprisingly many books are invisible from the general public because accessibility to precious books is restricted due to their fragility and safety. To let them come out of the dark rooms of libraries, we establish the digital archive of precious books and improve accessibility to them on the Portal site.” The intent is certainly worthy of high praise.

The website, however, has tested my patience time and time again. Page navigation is slow and clunky – depending on the speed of my internet connection, it can take up to a minute to load each page of the book I am trying to access. I do not fault Toyo Bunko for my poor internet access while traveling in rural China (of course!), but over time I have wondered if there are ways of making page navigation faster, for example, by offering a viewing option with lower image resolution. I found that one way to speed up page turning is to use the “Facing pages” option, which is usually found at the top right-hand side of the webpage once a book is selected. Not every book is readable in the “Facing pages” view, though: the pages can appear very small on the screen, in which case you can revert to the single-page view by simply clicking on the image.

Sometimes this is all I could see for minutes while the page loaded…

Once a book is selected, the OCR text of each scanned page appears at the bottom of the webpage – this is a truly welcome feature that I have used time and time again to copy and paste big chunks of text into Google Translate (I admit that I shamelessly use and abuse it, especially if I am skimming a text in a utilitarian way, just to find information). However, in several instances, the format of the OCR text is so chaotic that it is impossible to understand. This happens often when the scanned page has in-text images and footnotes – unfortunately, both abound in archaeological reports and scholarly books. There is definitely room for improvement, and the creators of the site are well aware of the shortcomings, saying that “Although the current OCR technology is imperfect, even imperfect results can support useful search. […] in the future we aim at establishing a mechanism for soliciting collective annotation in a collaborative environment.” Make sure to reach out to them if you want to help improve the OCR.

Some chaos in the OCR.

I have made ample use of the search box, which is perhaps the most intuitive feature of the website and works quite well. For example, after choosing the entry “Expedition Records” from the homepage, I simply typed “gandhara” into the “Full Text” search box at the top right of the webpage and in less than a second I had over 300 hits – pretty standard operation, but the nicest feature is the “Translation and Thesaurus” box on the right. The website automatically indexes different translations of the same word and allows you to search all of them at the same time. Once again, the feature is not perfect, and partial matching is not supported. This has led to some frustration while imagining possible spellings of the names of archaeological sites – as many of us dealing with Central Asian toponyms know, some can be tricky. For instance, a search for ‘Takht-i-Bahi,’ the current spelling of the name of an archaeological site in the Khyber Pakhtunkhwa province of Pakistan, returns no results. A search for ‘takht’ instead gives 44 results, which then have to be carefully looked through one by one to sieve out those hits that actually refer to Takht-i-Bahi (variously spelt as Takht i bāhāi, Takht ībhai, Takht-ī-Bahai, and so on).

My search for “gandhara” and all the spellings in several languages under Translation and Thesaurus on the right.
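
For those comfortable with a little scripting, the sifting of spelling variants described above can be partly automated once the candidate names have been copied out of the hits by hand. What follows is only a minimal Python sketch and not a feature of the Toyo Bunko website: the function names are my own inventions, and the matching rule (strip diacritics, punctuation and vowels, then compare what is left) is a crude assumption that will sometimes lump together names that are in fact different.

```python
import re
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip diacritics, and keep only the letters a-z."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z]", "", no_marks.lower())


def consonant_skeleton(name: str) -> str:
    """Drop vowels so that Bahi, Bahai and bhai reduce to the same key.

    Assumption: transliterations differ mostly in vowels, hyphens and
    diacritics. This is a heuristic, not a linguistic rule.
    """
    return re.sub(r"[aeiou]", "", normalize(name))


def same_site(a: str, b: str) -> bool:
    """True if two spellings reduce to the same consonant skeleton."""
    return consonant_skeleton(a) == consonant_skeleton(b)


def matching_variants(candidates: list[str], target: str) -> list[str]:
    """Keep only the candidate spellings that match the target name."""
    return [c for c in candidates if same_site(c, target)]


# The three variants come from the search described above; the last
# name is an unrelated one added purely for illustration.
variants = ["Takht i bāhāi", "Takht ībhai", "Takht-ī-Bahai", "Takht-i-Sulaiman"]
print(matching_variants(variants, "Takht-i-Bahi"))
# prints ['Takht i bāhāi', 'Takht ībhai', 'Takht-ī-Bahai']
```

A filter like this only narrows the list. Each remaining hit still needs to be checked against the page itself, especially given the state of some of the OCR.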

Despite these flaws, I keep going back to Toyo Bunko again and again. It has 245 rare books and 72,591 pages of writing, images and maps. How could I not? This archive is a treasure chest for those of us who study Central Asia, and especially everything related to early Buddhist archaeology.

