Continuing our series of interviews with scholars working in Digital Humanities in Japan, this is the first of a two-part interview with Professor Kiyonori Nagasaki (twitter @knagasaki), Senior Fellow at the International Institute for Digital Humanities in Tokyo, and technical director of the SAT project, a searchable database of historical texts and images based on the most widely-used edition of the Chinese-language Buddhist Canon, the Taishō shinshū daizōkyō, originally published in Japan in 1924.
The SAT Project is remarkable as one of the earliest such digitization projects for Buddhist texts and, in addition to providing its own set of valuable resources and tools, is distinctive for its ability to link to other tools such as the Digital Dictionary of Buddhism.
As a member of the Japanese Association for Digital Humanities, Dr. Nagasaki is also a leader in digital humanities in Japan. In this first half of the interview, we ask him about his research and the SAT project, and then about the difficulties he and the SAT team faced in creating the SAT database, which reflect the difficulties of many large-scale digital humanities projects.
[This interview was originally conducted in Japanese and translated by Tom Newhall.]
Tom Newhall: Could you briefly explain your research to those who are unfamiliar with it? In particular, what is your involvement with Digital Humanities? What is the SAT project, and how have you been involved with it?
Kiyonori Nagasaki: My research is on building a foundation for digital research in Buddhist Studies.
Buddhism is not only a type of philosophy, but also comprises literature, art, architecture, and history (specifically, the history of technology, the history of science, and the history of medicine), to name only a few related fields. Because of this, building a foundation for digital research in Buddhist studies is not simply a matter of building a database of information about Buddhism, but requires establishing and maintaining a foundation for digital research in all of these fields.
My work is to support the construction of such a platform for digital research in all these fields, as well as researching and improving the techniques, standards, and specifications for such digital projects.
Up to now, I have been involved with several institutions including the University of Tokyo Next Generation Humanities Development Center, the Kyoto Institute for Research in Humanities, the National Institute of Japanese Literature, the National Institute for Japanese Language and Linguistics, the National Museum of Japanese History, the National Museum of Ethnology, the International Research Center for Japanese Studies, the National Diet Library, the Tokyo National Research Institute for Cultural Properties, and the Kansai University Open Research Center for Asian Studies.
In collaboration with research institutions like these, I have been involved with the improvement of standards and specifications, including the Unicode character encoding standard (aka ISO/IEC 10646), the TEI (Text Encoding Initiative) Guidelines for text encoding, and the IIIF (International Image Interoperability Framework) standards for web content.
As a member of the SAT Daizōkyō database research team, I was involved with registering in Unicode over 3000 unique characters that appear in the Chinese Buddhist Canon. For TEI, in addition to promoting the use of the TEI Guidelines within Japan, I also established a Special Interest Group within the TEI Consortium to consider how TEI could be used for Japanese and East Asian materials. In addition to creating a Japanese translation of the TEI Guidelines, in February 2021 we succeeded in introducing rules for ruby text (aka furigana) into the TEI P5 Guidelines, in order to make them compatible with Japanese-language materials. I am also involved in promoting the use of the IIIF standard within Japan. My work here has mainly involved a number of open-source projects to improve the “IIIF Viewer” platform.
In addition to all that, I have also been involved with Digital Humanities education by leading classes on Digital Humanities at the University of Tokyo and several other universities, and have also made a number of Japanese-language educational resources for Digital Humanities available on the web, in order to train and educate people who can put such tools into practice.
At the SAT project, we do a variety of such activities, aiming to build a platform for digital research in Buddhist Studies. Through cooperation with other Japan-based Digital Humanities projects, sharing resources, and integrating tools with each other, the range of our activities has gradually expanded over the years.
As for my own relationship with the SAT project, I had previously worked on building a web-based database for humanities research, and in the fall of 2005 I was invited to take part in the early stages of a new text database project. In 2008, I built the website for the project, which allowed us to shift our workflow to the web.
After that, I became the technical director of the project, and in addition to thinking about how to set the direction for the technical side of the project, I have also worked on the systems development side of the project. For example, in the 2018 version of SAT, we included a function to use the Word2Vec software to enable text analysis of the Taishō Shinshū Daizōkyō (the Japanese printed edition of the Chinese Buddhist canon that SAT is based on) as well as a function to compare images using IIIF, both tools that I developed.
Additionally, the “Computers and the Humanities” Special Interest Group (SIG) has been a part of the Information Processing Society of Japan (IPSJ, one of the largest computer science organizations in Japan, with about 20,000 members) since 1989. Since it is a group with a keen interest in technical matters, I have presented this research there and at other conferences. There have been over 1000 presentations made on DH at this conference over the years, and I myself have given about 70 presentations at this conference alone.2
About the SAT project
TN: What was the main reason that this project got started? What need did it address, and what do you see as an advantage or opportunity that having these works in a digital format provides that having them as paper doesn’t provide? Is there anything that is lost or anything that is a disadvantage to the digital medium?
KN: The SAT project was started in 1994 by Prof. Yasunori Ejima (of the University of Tokyo) in order to make the Taishō Canon digitally searchable. After his passing, Prof. Masahiro Shimoda took over the project.1
There are great benefits in sharing Buddhist texts through a digital infrastructure. For example, doing so gives us more time to focus on thinking by streamlining the process of obtaining necessary resources. Although there have been many difficulties in digital media in the past, these have been slowly disappearing (and occasionally, newly appearing) through the emergence of new technological frameworks.
For example, before IIIF (International Image Interoperability Framework), we had to view digitized images of Buddhist texts separately on each individual website, and it was difficult to see and compare these rare materials. However, thanks to the emergence of IIIF, we can now view any of them easily across the different websites of the organizations that have preserved them. That is, we can seamlessly browse not only witnesses for a variety of Buddhist texts but also Buddhist icons and other visual materials on the Web. At this point, we are expanding our treatment of visual materials using IIIF. In addition, we are also in the initial stage of applying deep learning to Buddhist resources including texts, images of icons, videos of rituals, and so on.
The Challenges of Breaking New Ground
TN: What have been some of the main challenges with the SAT so far, and how have they been addressed? And how do you see this project developing in the future?
KN: There have been a lot of different problems, and every problem presents a challenge, so it is difficult to say which one is the biggest. For that, I think it would probably be better to ask our project leader, Professor Masahiro Shimoda. Speaking from my perspective as the technical director of the project, though, one of the major problems we had is that in Japan there were almost no university positions for specialists in Digital Humanities, nor were there programs to educate people in Digital Humanities broadly so that they could apply these skills in ways appropriate for Buddhist materials. This made it difficult to retain collaborators who properly understood these materials and were also able to work with the data involved.
Since most of the members of the SAT team were graduate students or worked as instructors at universities, for a long time, much of this work was on a volunteer basis, and could only be done in fits and starts.
Nevertheless, as a result of these efforts, we completed the SAT 2008 version and made it publicly available. Following that, in 2010 we established the International Institute for Digital Humanities and in 2012 we began an education program for Digital Humanities in the Division of Humanities and Social Sciences at the University of Tokyo.
But no matter how well we built SAT, it was difficult to explain the benefits of this database to other researchers within Japan, and there was a period when we had a lot of difficulties creating connections to materials from other fields. Even if there were Buddhist materials digitized by Japanese libraries and universities, it was difficult to link these with the SAT database in a way that was useful for people who research Buddhism.
With this in mind, when IIIF appeared and became prevalent throughout Japanese research institutions and libraries, we saw a way to solve this problem. Alongside the Digital Humanities Center at the University of Tokyo, the SAT project has worked to propagate IIIF by running seminars in cities throughout Japan and introducing a variety of skills through tutorials on my own blog. Now, beginning with the National Diet Library (NDL) and the National Institute of Japanese Literature (NIJL), as well as the libraries of the University of Tokyo, Kyoto University, and other universities, the use of IIIF to publish digitized images is becoming much more common in Japan. As a result, it has become possible to freely use a large volume of images of Buddhist texts from outside organizations and to link these with the SAT database.
So, that is how we have been working with image data, but in order to work in conjunction with text databases, I have also been running seminars and setting up production environments to make the Text Encoding Initiative (TEI) Guidelines usable in Japan. Such work applies not only to Buddhist studies, but also to Japanese linguistics, literature, history, etc. Through such training, the number of younger researchers in these fields who can really work with TEI is growing little by little, and the ability to use such tools on a large scale is finally starting to come to fruition, but this is still a work in progress.
Although there are many databases and projects in the US and Europe that utilize TEI, adoption of this standard and the creation of such databases in Japan has lagged up to now. But with the help of the above-mentioned organizations, we are now aiming to build an entirely new text database for research on Japanese culture within the next decade, one that will enable researchers in Japan and those studying Japan elsewhere to engage in a more robust study of Japanese history and culture, as well as allow better integration and participation in the international DH community. Indeed, the translation of these guidelines is an important step in this process of exchange, one that may one day enable scholars working on Digital Humanities in Japan to emerge as partners with projects in other parts of the world.
In addition to those issues, the handling of gaiji (written characters not supported by a particular font or encoding system) was extremely complicated. We initially planned to use the Konjaku Mojikyō fonts (a project to encode a large number of Chinese-Japanese-Korean characters), but we realized there was a problem with the licensing of those fonts, so we decided to abandon that plan and move to the GT Fonts set (another Chinese-Japanese-Korean font encoding project, developed by the University of Tokyo Multilingual Processing Research Center). However, development and updates for that project were also discontinued, so from 2012 we started applying to register such gaiji from Buddhist texts in Unicode. Now there are over 3000 such unusual gaiji characters from Buddhist texts that can be used as regular characters in Unicode.
When the SAT project began to participate directly in the standardization process developed by the ISO (International Organization for Standardization) for registering such characters, the process was totally different from other academic research, and as an organization composed of researchers, this was a very difficult job for us at first. But thanks to the initiative of some younger scholars who tackled this problem head-on, we have finally been able to bring this aspect of our project to fruition.
In this way, SAT is not just doing Buddhist studies, but is also doing a variety of work to advance Digital Humanities in Japan as a whole, which in turn makes SAT a very rich project. I myself am really nothing more than the technical director for the project, but through perseverance in this endeavor, I hope to create an even better research environment for doing Buddhist studies.
1 This story was published in Japanese in 2019 as Shimoda Masahiro 下田正弘, Nagasaki Kiyonori 永﨑研宣, eds. Dejitaru gakujutsu kūkan no tsukurikata: Bukkyōgaku kara teikisuru jisedai jinbungaku no moderu デジタル学術空間の作り方 仏教学から提起する次世代人文学のモデル. Bungaku tsūshin 文学通信.
2 Past presentations can be found at the IPSJ Digital Library “Informatics Square.”