Text-Matching at the Canonical Crossroads: An Introduction to BuddhaNexus (Part I)

BuddhaNexus is a text-matching database with visualization capabilities that draws its data from Buddhist literary corpora in Pāli, Sanskrit, Tibetan, and Chinese. It allows users to conduct intralingual searches (e.g. searching among texts in Chinese only) of individual volumes for textual matches across the collection in question. Users can also produce Sankey visualizations of connections among different collections in the same language, which offer an intertextual view across collections, across sections within collections, and within single texts.

This is the first of a two-part series covering this platform, its mission, and the technical development of its key features that are sure to widen the scope of research across the four main “Buddhist languages.” For this installment, which covers the mission and current functions of the platform, I corresponded with Dr. Orna Almogi, a scholar at the Khyentse Center for Tibetan Buddhist Textual Scholarship (the initiative’s major sponsor) at Universität Hamburg and one of the directors of the BuddhaNexus project.

BuddhaNexus is one of several platforms that have seized upon the rapidly changing terrain of digital research. From Dr. Almogi’s perspective, the platform provides an opportunity to leverage the growing forms of digital access and technology in order to better understand Buddhist textual cultures:

The first step towards a better understanding of the history of the composition of individual texts, on the one hand, and the emergence of entire corpora of Buddhist works, on the other, is to locate in the various works and textual corpora (approximate) textual matches and to study them from various angles. These textual matches may be cases of either acknowledged citations or “borrowed” textual passages with no attribution. Until not very long ago, such matches were searched for manually, but with the advent of computer technology, coupled with the ongoing digitization of Buddhist texts, faster and more accurate methods have been developed, and this is where BuddhaNexus comes into play.

As a new user, I can see this promise come to life on the platform. In a browser-based, multi-pane interface, BuddhaNexus focuses on one of the four main “Buddhist languages” at a time. After selecting a language, users then select one of four view options: text view generates interlinear matches of strings of characters across the entire Inquiry Text; table view generates tables with isolated matches within the Inquiry Text; numbers view displays segment, volume, and/or folio numbers that correspond to matches within the Inquiry Text; and graph view displays approximate distributions of matches across the Inquiry Text in a pie chart and histogram. Users will appreciate that switching back and forth among these four view options preserves the slider settings in the search filter, as well as the text selections made in the drop-down menus. This means that users can seamlessly view text matches from alternative angles.

Text view of the Chinese canon showing matches within the Mahāparinirvāṇa Sūtra 大般涅槃經 for the eight-character string 如來不久當般涅槃, with slider settings open to the right.

Graph view of Pāli canonical texts showing the distribution of intralingual matches between the Mūlapariyāya Sutta, the first among the Majjhima Nikāya Suttas, and the entire Pāli corpus.

The view options reveal connections and patterns across texts and can illuminate the co-emergent processes of large-scale scriptural evolution and canonical formation. These processes are often not immediately observable to the analog researcher due to the volume of texts necessary to measure these connections. The aptly named BuddhaNexus project thus seeks to locate texts at the intersection of time and space; scriptural evolution and canonical formation, which converge in different ways across histories and geographies, become more easily observable through the platform’s databases and visualizations:

[Understanding this evolution and formation] will necessarily mean, at times, shedding light on various issues related to these two processes, both on the macro and micro levels, such as the existence of intellectual networks, various stages in the evolution of a specific scripture, intertextuality between various scriptures, the linkage between treatises and scriptures, and the impact of various social and political aspects of society on both the evolution of scriptures and the formation of canons […] Understanding how these intellectual networks functioned, who their active scholars were, and what works were available to them or were popular in their circles, will facilitate our understanding of the evolution of individual works or of a group of works and of the formation of canons, small or big.

In order to facilitate research surrounding these textual networks, the database also offers access to visualization tools that display intertextual information across each of the four textual corpora. For more granular information, users can then mouse over individual lines that connect smaller units within the selected corpus in order to spotlight connections linked to a single text. Even a simple intralingual search reveals the magnitude—and research potential—of the BuddhaNexus database.

Sankey diagram showing intertextual links between the Kangyur and Tengyur divisions of the Tibetan canon.
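For readers curious about what sits behind a diagram of this kind, the following is a minimal Python sketch of the general idea: individual matches are collapsed into weighted links between collections, and the weight of each link determines the width of its band. This is an illustration only, not BuddhaNexus's actual data model; the function name `sankey_links` and the toy match list are hypothetical, and the counts are invented for the example.

```python
from collections import Counter


def sankey_links(matches):
    """Collapse (source, target) collection pairs into weighted links.

    `matches` is an iterable of (source_collection, target_collection)
    tuples, one per detected textual match. The return value is a list
    of (source, target, weight) triples, heaviest link first.
    """
    counts = Counter(matches)
    return sorted(
        ((src, tgt, n) for (src, tgt), n in counts.items()),
        # Sort by descending weight, then alphabetically for stable output.
        key=lambda link: (-link[2], link[0], link[1]),
    )


# Toy data: invented pairs, not real BuddhaNexus results.
toy_matches = [
    ("Kangyur", "Tengyur"),
    ("Kangyur", "Tengyur"),
    ("Tengyur", "Kangyur"),
]

for source, target, weight in sankey_links(toy_matches):
    print(f"{source} -> {target}: {weight}")
```

A plotting library would then draw one band per triple; the aggregation step is the part that turns thousands of individual matches into a readable overview.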

Of course, one of the major challenges of a project like this, which combines full-scale data from several databases (SuttaCentral for Pāli, GRETIL for Sanskrit, ACIP and BDRC for Tibetan, and CBETA for Chinese), is search accuracy. This is something that Dr. Almogi and her team are keenly aware of at both a practical and a technological level:

In general, the biggest challenge has been to set “gold standards” in order to get the best results, on the one hand, and avoid “noise,” on the other. While striving to optimize the default “gold standards,” BuddhaNexus understands that such standards may fluctuate depending on the text under inquiry, research questions, and personal interests, so it allows the user to manually set several of the parameters.
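To give a rough sense of how user-adjustable parameters trade completeness against “noise,” here is a minimal Python sketch of approximate matching with a tunable similarity threshold. It is not the method BuddhaNexus uses; it swaps in the standard library's `difflib.SequenceMatcher` as a stand-in, and the function name `find_matches` and the parameters `window` and `min_score` are hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher


def find_matches(inquiry, candidates, window=8, min_score=0.75):
    """Return (inquiry_window, candidate_window, score) triples.

    Only pairs whose similarity ratio is at or above `min_score` are kept.
    `window` (match length in characters) and `min_score` are hypothetical
    stand-ins for the kind of filters a user might adjust with sliders.
    """
    matches = []
    # Slide a fixed-size window over the inquiry text.
    for i in range(len(inquiry) - window + 1):
        segment = inquiry[i:i + window]
        for candidate in candidates:
            # Compare against every same-size window of the candidate text.
            for j in range(len(candidate) - window + 1):
                other = candidate[j:j + window]
                score = SequenceMatcher(None, segment, other).ratio()
                if score >= min_score:
                    matches.append((segment, other, score))
    return matches


inquiry = "如來不久當般涅槃"
# The second candidate is an invented variant that differs by one character.
candidates = ["如來不久當般涅槃", "如來不久當入涅槃"]

strict = find_matches(inquiry, candidates, min_score=0.9)
loose = find_matches(inquiry, candidates, min_score=0.75)
print(len(strict), len(loose))
```

With the stricter threshold only the exact string survives, while the looser one also admits the one-character variant. Whether that variant counts as a meaningful parallel or as noise depends on the research question, which is the point made in the passage above.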

While continuing to fine-tune the current platform tools, Dr. Almogi and her team plan to increase the number of texts and textual corpora. The Sanskrit corpus will soon be considerably enlarged through a new partnership with the Digital Sanskrit Buddhist Canon (DSBC) project at the University of the West. Likewise, the Tibetan corpus will soon be enlarged through the incorporation of two paracanonical collections. Beyond linking with other databases focused on archiving and cataloging, the team is currently developing tools for translingual text matches to complement the current monolingual matches, and is planning to add authorial/translator data to the databases for more differentiated analysis. The future of BuddhaNexus thus looks very bright, and I will explore these and other issues related to the technological development of the platform in Part II of this series with Sebastian Nehrdich, who oversees this development.

In its current state, BuddhaNexus offers users intuitive search and visualization access across four Buddhist literary corpora in four languages. The database will likely see the most use from researchers in Buddhist textual studies, especially those working with large volumes of texts. The slider settings that allow for more granular searches, however, may also be of use to scholars who wish to focus on fewer texts and/or change the parameters regarding the approximation of the textual matches. The global search function also provides a narrower view of the distribution of a short string of characters or a single compound. From a bibliographical perspective, BuddhaNexus is a must-add platform for any libguide focused on Buddhist textual cultures across Asia. Since transregional and translinguistic approaches to Buddhist Studies are now more valuable than ever to our understanding of the tradition and its development, this platform stands to alter the way we think about Buddhist texts, their evolution, and the ways that their content links otherwise disparate religious cultures and communities. Considering future plans for authorial and linguistic expansions to the database, BuddhaNexus promises to be an indispensable tool for Buddhist Studies research in the 21st century.

Please keep an eye on this outlet for Part II of this series, which will be published in early 2021.

