This is the first in a series of interviews with scholars working on Digital Methods for Buddhist Studies. Our initial focus will be on scholars based in Japan, and/or working in Japanese, but we hope to expand beyond Buddhist Studies, and to scholars working in other languages and other parts of the world in future interviews. We hope that these interviews will not only introduce some of the exciting work of researchers in Japan to a new audience, but also explore the background of some well-known projects, while touching on the human stories of Digital Humanities as well.
For our first interview, we spoke to Charles Muller, professor at Musashino University in Tokyo, Japan, and Emeritus Professor at the University of Tokyo. Professor Muller is a scholar of East Asian thought, religion, and intellectual history, specializing in Korean Buddhism, and is the founder and editor of the Digital Dictionary of Buddhism (DDB) and the CJKV-English Dictionary of Confucian, Daoist, and Intellectual Historical Terms, two widely-used online reference works for East Asian Buddhism, East Asian philosophy and intellectual history. He is also the founder and network editor for the H-Buddhism scholars information network, to name only a few of the many projects he is involved in.
The DDB and CJKV dictionaries [CJKV stands for “Chinese, Japanese, Korean, and Vietnamese”] are remarkable not only because of their highly specialized content and their online format, but because of their longevity as digital projects (both of them have been operating and continuously adding content since 1995), and their unique access and editorial model. As Professor Muller explains below, in addition to a standard institutional subscription model that can provide access to university-based users through a VPN, the DDB and CJKV-E dictionaries also allow any individual user on the internet to perform up to twenty searches a day without membership privileges. Users may also contribute an entry or group of entries of 350 words or more to gain full, unlimited access for two years. All entries are edited by Professor Muller before inclusion, so this model not only encourages contributions through “crowd sourcing,” but allows the DDB to be one of the few professionally-edited and open-access resources on the internet for information about Buddhism.
In this interview, we ask Professor Muller about these projects, his long engagement with Digital Humanities, and what he recommends for younger scholars of Buddhism and related fields who are interested in Digital Humanities. Interview by Tom Newhall [Links and text in brackets have been added for clarity]
TN [Tom Newhall, Buddhist Studies Editor for the Digital Orientalist]: What is your involvement with Digital Humanities, and in particular, how and why were the DDB and CJKV-E Dictionaries created and how have they evolved?
CM [Charles Muller]: The story of my involvement in Digital Humanities is long and complicated. When I started compiling the DDB and CJKV-E dictionaries in the mid-80s, PCs were only just coming to be widely used, and there wouldn’t be an Internet for another ten years. Once the dictionaries were placed on the web in ’95, I gradually came to be associated with scholars whose names are now synonymous with the founding of Buddhist studies DH, such as Christian Wittern, Lewis Lancaster, Urs App, Masahiro Shimoda, and others. Unicode and XML were only in the early stages of their formulation, and it wasn’t for another decade, say, around 2005 or so, that I even heard the terms “Digital Humanities,” “crowdsourcing,” etc., mentioned, and it wasn’t until I arrived at the University of Tokyo that I thought about attending or joining any form of digital humanities organization or events. The venue for academic exchange among Buddhologists and others had been, since the early 1990’s, the Electronic Buddhist Text Initiative (EBTI), the Pacific Neighborhood Consortium (PNC) and so forth. It was in the conferences and symposia run by these organizations that most early Buddhist studies and Asian studies digitizers presented and discussed their works.
And of course “digitizers” is a key word here, since so much of what we were doing in the early days was simply digitizing analog materials. Remember, in the early days, there was no content on the web whatsoever—we had to create it. This meant endless long, grueling hours spent with scanners and OCR software, etc. Once I became aware of digital humanities as an emerging methodological tradition, getting involved was a natural fit. But the nature of digital humanities itself has been in constant flux, and will remain in constant flux, until, one day, I expect that it will disappear, since not that much will exist outside of it.
As for the origins of the [DDB and CJKV-E Dictionary] project, the details are pretty well laid out on a page on the dictionary site that gives the history. I also recently published a chapter in a digital humanities volume that recounts events as they happened.
To put it simply, during the early stages, the digital dimensions of the dictionary project developed pretty much by accident. As a graduate student learning how to read Buddhist and other classical Chinese texts (ca. 1986), relying on Japanese, Chinese, and Korean dictionaries, I simply didn’t like looking things up twice. So, I kept a notebook with me, and when I looked up any word, I took a little bit of extra time to write down my definition (I think my advisors thought I was a bit crazy). After a couple of semesters of doing that, I happened to get my hands on my first PC, so I started typing the data into word processor files. In 1993, I sent a draft manuscript of what I had compiled to a university press, and they barely looked at it.
1994 (the same year that I took up my first position in Japan) was the year the web came online. I was immediately intrigued. I read in a computer magazine how to make an HTML page, and that was it: the DDB and CJKV-E (along with the rest of my stuff) on the web were born (I suppose my pages may be among the longest continuously surviving pages on the Web as a whole, let alone among Asian Studies research resources). After that it was just a steady series of developments: learning various kinds of programming and learning some technologies (such as SGML, XML, etc.). The pivotal event for the project was my meeting a brilliant web programmer named Michael Beddow on an internet forum in 2000, as Michael took my XML data and set it up with a search engine in pretty much the same form you can see it today.
After that, the other major technological event was the development of data sharing applications via web API (around 2006-2007) that allowed the dictionary data to be integrated with other sites and applications such as SAT, CBETA, DDBAccess, and so forth. And we just kept building up the content.
TN: What are some of the main challenges with these projects and how have you dealt with them?
CM: In a project that has spanned over thirty years since its inception, we have had to overcome numerous technological hurdles as the technology itself evolved, along with user expectations.
The earliest major obstacle was that of basic character encoding, since prior to the advent of Unicode, the Han characters in Japanese, Korean, Taiwanese, and Chinese computers were all in different encodings, and so data was not transferable. Thus, during the nineties, there were all kinds of workarounds attempted to get around this problem. Also, the character sets were small (for example, the first JIS character set [a text encoding standard for Japanese and Chinese Characters that predates Unicode] only contained around 6000 characters). So when Unicode [an international standard for encoding all types of written characters] came along it was [a] huge [improvement in the ability to use CJKV text with computers].
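The encoding problem described above can be illustrated with a minimal Python sketch. It assumes only the legacy codecs that ship with Python's standard library (`shift_jis`, `big5`, `gb2312`, `euc_kr`); the sample character and the variable names are illustrative choices, not anything taken from the dictionary project.

```python
# One Han character, encoded under four pre-Unicode regional standards.
CHAR = "山"
LEGACY_CODECS = ["shift_jis", "big5", "gb2312", "euc_kr"]


def encode_all(char, codecs):
    """Return a mapping of codec name -> raw bytes for the given character."""
    return {name: char.encode(name) for name in codecs}


legacy_bytes = encode_all(CHAR, LEGACY_CODECS)
for name, raw in legacy_bytes.items():
    # Each regional standard assigns the character its own byte sequence.
    print(f"{name:10s} {raw.hex()}")

# Bytes written on a Big5 system and then read as if they were Shift_JIS
# do not come back as the original character.
misread = legacy_bytes["big5"].decode("shift_jis", errors="replace")
print("Big5 bytes read as Shift_JIS:", repr(misread))

# With Unicode (here serialized as UTF-8) every system agrees on one code point.
utf8 = CHAR.encode("utf-8")
print("UTF-8 round trip:", utf8.decode("utf-8"))
```

Because each legacy encoding maps the same character to different bytes, a file was only readable on a system that assumed the same standard, which is why data could not be exchanged reliably before Unicode.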
Another challenge was that of pre-web API [i.e. before the advent of web API (Application Programming Interface) technology]. During the first ten years of the web the only way another site or application could use my data was if I gave them the whole dataset to load into their database. I had a couple of bad experiences with this from the outset. Once web API came out, it solved a lot of problems. Also before XML [eXtensible Markup Language: a way to encode the contents of text-only documents using a set of special tags] and TEI [Text Encoding Initiative: a standard set of such XML tags used to describe the content and structure of electronic texts] were developed, people were forced to save data in proprietary formats such as Microsoft Word documents, which was a huge problem. The best thing I ever did [for my dictionary projects] was to put the data into TEI-based XML.
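To give a sense of why TEI-based XML is easier to reuse than a word-processor file, here is a minimal sketch that reads one dictionary entry with Python's standard library. The element names (`entry`, `form`, `orth`, `sense`, `def`) come from the TEI dictionaries module, but the sample entry, its identifier, and the `parse_entry` helper are hypothetical; this is not the DDB's actual schema or data.

```python
import xml.etree.ElementTree as ET

# A hypothetical TEI-style entry, for illustration only (not DDB data).
SAMPLE = """<entry xml:id="demo-1">
  <form><orth xml:lang="zh">法</orth></form>
  <sense><def>dharma; teaching</def></sense>
</entry>"""


def parse_entry(xml_text):
    """Extract the headword and definitions from one TEI-style entry."""
    root = ET.fromstring(xml_text)
    headword = root.findtext("form/orth")
    definitions = [d.text for d in root.iter("def")]
    return {"headword": headword, "definitions": definitions}


entry = parse_entry(SAMPLE)
print(entry["headword"], entry["definitions"])
```

Since the structure is explicit in the tags, any program can pull out headwords or definitions without depending on a particular vendor's file format.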
But there was also the problem of how to share that data in the best way. In the beginning, like many others, I was naïve and idealistic, and allowed the full dataset to be downloaded freely. Before long, temples, institutes and individuals began to publish their own versions of the DDB [Digital Dictionary of Buddhism] on their web sites without even mentioning my name. To solve this problem, Larry Lessig convinced me to try his newly-conceived Creative Commons [CC] license. I applied for the license, but it didn’t mean anything to people—they just ignored it and published my data. A good example is the case of the Soothill dictionary [Soothill and Hodous’s A Dictionary of Chinese Buddhist Terms (1937)], which our team digitized with a JSPS [Japan Society for the Promotion of Science] grant and posted with a CC license. Within a year there were scores of sites publishing the data in full, with no acknowledgment of our work. Therefore, I realized that if I wanted the project to grow, I had to protect the data. Luckily, Michael [Beddow] came along to set us up with password security. But then we had to figure out how to arrange who had full access, who had partial access, and so forth. Again, we figured this out by trial and error, and the results are what you see at present.
TN: How does the DDB continue to evolve and be maintained, and how do you envision the future of this project?
CM: Unfortunately for me, I still remain the main coordinator, editor, and manager of the data. I had always expected that I would find collaborators along the way to share the task. But the fact is, not that many people are really obsessed with lexicography the way I am. You have to be crazy about words and compiling them.
In any case, the situation has been for the past twenty years or so that we accept contributions from users via email or the feedback link [on each page of the DDB]. Luckily, we have had a number of people who have offered huge contributions, such as Paul Swanson, Griffith Foulk, Seishi Karashima, and others. And we have also been lucky to have a series of deeply interested excellent young scholars who have contributed huge amounts of data during their Ph.D. studies. Some of these include Michael Radich, Jeffrey Kotyk, and Billy Brewster. There are scores of other scholars (listed on our credits page) who have made steady ongoing contributions. But in the end, I guess about 90% of the entries have been my own work.
As far as the future of the project is concerned, I’ll stay with it in my capacity as long as I can, with my eye out for what can be done in the future. If for some reason it ends up that I can’t handle it anymore, I would probably look first for a Buddhist related academic organization to take it on, and if that couldn’t be arranged, I would probably just dump the data on GitHub or something like that. In any case, with the data being in TEI-XML format it can reasonably be programmed or manipulated in many ways.
TN: How do you work with collaborators on this project?
CM: That’s pretty simple. People just send in their data as MS-Word files and I run some scripts on it, convert to a rough TEI-XML format, and then edit it. The other main way of collaboration is the usage of the feedback link generated with each entry. We get tons of feedback—rarely a day goes by without it.
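As a rough idea of what such a conversion step could look like, the sketch below turns tab-separated "headword, definition" lines into TEI-style entries. It is only an assumption-laden illustration: the input layout, the `rows_to_tei` function, and the sample rows are hypothetical, it works on plain text rather than Word files, and it is not the script Professor Muller actually uses.

```python
import xml.etree.ElementTree as ET


def rows_to_tei(text):
    """Convert 'headword<TAB>definition' lines into rough TEI-style entries."""
    body = ET.Element("body")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the submission
        headword, _, definition = line.partition("\t")
        entry = ET.SubElement(body, "entry")
        form = ET.SubElement(entry, "form")
        ET.SubElement(form, "orth").text = headword.strip()
        sense = ET.SubElement(entry, "sense")
        ET.SubElement(sense, "def").text = definition.strip()
    return body


# Hypothetical submission text, as it might look after export from a document.
submission = "法\tdharma; teaching\n佛\tBuddha\n"
body = rows_to_tei(submission)
print(ET.tostring(body, encoding="unicode"))
```

The output of a step like this would still need the manual editing described above before it could be added to a dictionary.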
TN: What do you see as some of the most significant developments in DH with regard to Buddhist Studies, particularly in East Asian and CJKV-centered Buddhist Studies?
CM: For me as a textual scholar, the most significant developments have been the creation of the online canons and other text databases such as C-Text [i.e. the Chinese Text Project], the Academia Sinica text database [aka Scripta Sinica or Zhongyang yanjiuyuan hanji dianzi wenxian 中央研究院漢籍電子文獻], and related tools. But of course there is a wide range of new applications and technologies available for manipulating graphical data, video data, GIS location data and so forth.
TN: Does your physical location in Japan contribute to or enhance the development of your own project? Are there aspects of the Digital Humanities Community in Japan that have aided your project?
CM: Being in Japan (or at least in an East Asian country) has been one of the most important conditions from the start. In the 80’s and 90’s there was just not enough software support for East Asian languages in the West. But I also benefited from being in Japan in terms of its research and academic structure. If my first position had been in the States, I would’ve had a huge amount of work dumped on me as an assistant professor in terms of teaching and university service, along with which I would have had to focus my main energies on getting peer-reviewed articles and book manuscripts published, in order to get tenure. Here in Japan, my university didn’t really care what I did as long as I handled my classes adequately. Also, in East Asia (and especially Japan) the development of lexicons, and even indexes, is regarded as a valid scholarly contribution. In the States, this work would never have been recognized until it got to be huge.
TN: Does your work on the DDB or in DH more broadly tie into your more traditional scholarly contributions? If so, how? How has work on the DDB helped shape your own research?
CM: The DDB and CJKV-E basically grew out of my primary scholarly interest, which is that of translating ancient, medieval, and pre-modern philosophical and religious Han-character classics into English. Therefore, the two have served to mutually complement each other throughout. Having the DDB and related tools is also an integral component of my work in reviewing and editing manuscripts for the BDK [Bukkyō Dendō Kyōkai 仏教伝道協会 or the Society for the Promotion of Buddhism] translation series, for which I am the Publications Chairman. So while I’m working with the texts translated by both myself and others, I’m constantly adding to and enhancing the data in the dictionaries, while at the same time using the data to confirm the accuracy of translations.
TN: What do you see as the future of DH in Buddhist studies? What other Digital Humanities tools or approaches do you think we need in Buddhist Studies and why?
CM: Buddhist studies was in fact one of the earliest fields to take full advantage of digital technology. This is because Buddhist studies covers such a wide area, culturally, linguistically, and historically, and overlaps with fields such as art history, anthropology, sociology, and so forth. That is why scholars of Buddhism were involved from early on with issues like text encoding, character encoding, image handling, GIS applications, etc. Therefore, I can easily envision a future in which Buddhist studies continues to lead the way. Take for example the case of handling images with the IIIF [International Image Interoperability Framework: a standardized method to handle audio/visual resources from archives around the world]. Dr. Nagasaki of SAT [the SAT Daizōkyō Text Database, a digitized version of the most widely-used collection of East Asian Buddhist texts] and DHII [International Institute for Digital Humanities at the University of Tokyo] has literally taken the lead in introducing Japanese institutions to the application of this technology, and is contributing to its further development. The same is the case with TEI-compliant character encoding and contributing to Unicode.
What other tools are needed? That’s hard for me to answer, because I basically have everything I need to fulfill my simple tasks. In terms of approaches, the main change that is needed is for field mechanisms and administrators to find out how to duly recognize and evaluate digital scholarship. This has been an issue for more than a generation now. The situation is getting a little better, but still hasn’t gone far enough.
TN: What can Digital tools do for Buddhist Studies that traditional scholarship cannot do, or is difficult to do?
CM: Everything!!! What can be done now without digital tools?
TN: What would you recommend for younger scholars in Buddhist Studies getting started in Digital Humanities?
CM: Don’t be afraid to invest the time and energy it takes to learn new technologies, programming, etc. Again, this is not such a dilemma as it was a generation ago, as there are few nowadays who don’t recognize the value of digital skills. As I was coming up, there were no DH courses available in Humanities programs, so virtually all of my skills are self-taught. Learning these things consumed huge amounts of time and energy, time and energy with which I could have probably published a few more books. But in the end it enabled me to handle many more things than I would have been able to otherwise, and I don’t regret a moment that I spent learning.
One other piece of advice to young scholars who already spend, and will probably continue to spend, huge amounts of time sitting, typing, mouse clicking, and staring into a computer screen: take care of your body. Get up often. Stretch, move around, play a sport or something. Otherwise you will develop health problems later on that will be difficult to fix. Take my word for it!
TN: What are the most important skills and tools to learn?
CM: That all depends on what you want to do. Nowadays, DH branches into so many diverse directions that almost no one can master all the tools and skills needed to do all of them. I suspect that for textual studies TEI and XML will remain a basic necessity. For programming languages, it seems that Python is now the main go-to language for many tasks. For images, I guess Mirador is useful. And it doesn’t hurt to know some Linux command line skills. Aside from that, well, it’s getting pretty hard for me to keep up with everything that’s going on.
The Digital Orientalist would like to thank Professor Muller for his time in preparing this interview. If you’d like to know more about Professor Muller and his work, or would like to become involved as a contributor to the DDB or CJKV-E dictionaries, please see Professor Muller’s website, or the “contributions” page of the DDB and CJKV-E.