The evolution of Kaom.net

Guyin xiaojing 古音小鏡 is a database that gathers data and tools for Old Chinese linguistics, though not exclusively. I previously mentioned this database for its breadth, and planned to review it. By the time I got back to do so, kaom.net had already changed and evolved in new directions. In this post, I focus on the Chu script database and the advantages that it gives to scholars of premodern Chinese language.

General features of kaom.net

The homepage is structured to convey immediately what’s new on the website. As you land on it, under 更新 you find a list of sections that have been updated, each with a brief description of what changed and a date. This is very useful for keeping track of how the sections of the database that you use for research are changing. If you glance at the dates, the speed at which data is added is impressive, especially considering that this is a project run entirely by volunteers. New or edited sections are also marked in the drop-down navigation menu with yellow asterisks, as indicated in the following image:

The database is organized into 6 units: Old Chinese 上古音; Reconstructions 構擬; Ancient Scripts 古文字; a section that maps the geographical distribution of pronunciations for the same word in 19 Sinitic languages 漢語地理; a section on toponyms 地名; and a section on reference works whose data has been input and made searchable through an interface.

The search box on the top left allows you to search by components. For example, searching for jin 晉 returns tables listing the character’s Unicode data, its structure, and the characters in which it appears as a component in the current script. Ancient graphs are announced as forthcoming.

The Chu strips database

One of the subsections of Ancient Scripts 古文字 is the database of graphic forms in Chu writings. This is based on corpora of bamboo manuscripts from the Warring States 戰國時代, which so far have all come from areas that, back in the last centuries BCE, belonged to the Chu 楚 state (click on the “principal materials” at the end of the table to see the bibliographic details). It is a new development of kaom.net, and a very exciting one for scholars who work with this material. The starting page lists the corpora, the year of discovery or publication of each corpus, the content of those texts, and what proportion of the total number of characters has been added to the database. This is particularly important for understanding how representative the results are.

Let’s look at one example to see all the functions of this section. I will search for 尼. In the search function, you can decide to use the default mode or have the database search for traditional, simplified, and variant forms together. I would say that selecting the second option is always preferable, since it gives you more results (see below).

Hit “search” 查詢 and you land on a page in two sections: the upper one gives you what is now the standard graphic form, its variants according to the Dictionary of Character Variants 異體字字典, and other characters in which 尼 appears as a component according to the Hanyu da zi dian 漢語大字典.

In the second table you have all the occurrences in Chu manuscripts in which 尼 and its variants appear. This gives us 30 examples; if you select the default option when searching, the database limits the search to 尼, giving only 7 results.

The results are organized by corpus, with the corpora listed in the first column. Right underneath the thumbnails that illustrate the graphs are their transcription and normalization. In other words, you have both a representation of what is present in the graph and the indication, using the modern script, of which word the graph is writing. For example, the second entry in the Qinghua corpus is specified to have the components {彳尼} (the use of curly brackets is the standard notation used in manuscript studies). This is the transcription. No normalization is provided in this case. In the last two entries for the same corpus, you have in round brackets the normalization, i.e. the words being written with that graph.

All of these readings are based on the initial publications of manuscripts, and therefore may not be final. Even so, the fact that the database distinguishes these two is very important, because it’s a good reminder that searching for 尼 does not necessarily mean searching for the word “to stop.”
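For readers who copy these labels into their own notes, the distinction can be kept explicit in a small script. What follows is only a sketch under my own assumptions, not how kaom.net stores or serves its data: the function name `parse_reading`, the `Reading` type, and the label layout (a graph or a curly-bracketed list of components, optionally followed by a normalization in round brackets) are all hypothetical, and the character 某 is a placeholder rather than an attested reading.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Reading(NamedTuple):
    """One label split into what is written and which word it writes."""
    transcription: str             # the graph as transcribed, e.g. "{彳尼}"
    components: Tuple[str, ...]    # the components named in the transcription
    normalization: Optional[str]   # the word in modern script, if one is given


# Assumed layout: "{components}" or a single graph, then an optional
# "(normalization)". Full-width brackets are accepted as well.
_LABEL = re.compile(
    r"^(?:\{(?P<parts>[^{}]+)\}|(?P<single>[^{}()（）]))"
    r"(?:[(（](?P<norm>[^()（）]+)[)）])?$"
)


def parse_reading(label: str) -> Reading:
    """Split a label into transcription, components, and normalization."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    parts = match.group("parts")
    if parts is not None:
        # Curly brackets: each character inside is one component.
        return Reading("{" + parts + "}", tuple(parts), match.group("norm"))
    single = match.group("single")
    return Reading(single, (single,), match.group("norm"))


# The transcription discussed above, which has no normalization.
print(parse_reading("{彳尼}"))
# A made-up label: 某 stands in for whatever word an editor proposes.
print(parse_reading("尼(某)"))
```

Keeping `normalization` as a separate, optional field means that a search over transcriptions never silently becomes a search over words.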

What you want to know next is exactly in which texts within the Qinghua and Shanghai corpora these graphs appear. This is easily found out by clicking on each thumbnail. The resulting page provides the image of the strip in which the graph appears, in two formats. On the left, the strip is broken into graphs and each is listed, transcribed, and normalized. The graph you searched for is highlighted. Right above it, you have the name of the text and the volume in which it was published. On the right, you find a replica of the strip as initially published, with the initial transcription. Right above this section, you have an interactive list of the strip numbers for this text (which allows you to navigate the manuscript in full), including the verso side (indicated with a B) when relevant (see fig. 4).

Figure 4

This is important for several reasons. First, as mentioned already, it is a good reminder of the distinction between graphic representations and words. Secondly, it allows you to quickly see the use of the word in context. Again, what it provides is the initial interpretation, which may have been improved already, but these initial readings are the starting point for research, since all scholars respond to them. Thirdly, because what is reproduced is the image of the strip itself, one can see other aspects of a text besides its written content, such as punctuation marks. Finally, while PDFs of these publications are easily available, not everyone may have them. This database guarantees free access to this material to anyone who has an internet connection. Its founder, Gu Guolin 顧國林, strongly believes in free access to knowledge. (Even if you had the PDFs or the physical publications, how convenient to gain a first sense of the passage with a simple click!)

A little-known trick

So now that you have discovered the Chu script with this beautifully informative interface, you want to dig more into ancient scripts! Right next to the Chu script database there are the Oracle Bone Inscriptions database and the Bronze Inscriptions database. You have found your new obsession. You go into either of them and search for 我, and land on a plethora of results! Yay. You go to click on each thumbnail to discover the context, as we did before for 尼. Only… the images are not interactive. Go back to the search box and add “=” after the character, and when you land on the overview of thumbnails again, you will now find them interactive. As with the Chu manuscripts, you are given the context in which the character was used and an image of the page from the official publication.

This function was added in a later phase, after some of the structures of the Oracle Bone Inscriptions and the Bronze Inscriptions databases were already in place. It was an interesting tip given to me by Guolin as we discussed how databases evolve and are sustained, in relation to the DO 23 conference on Sustainability. The case of databases of ancient Chinese manuscripts (and therefore, scripts, languages, reconstructions) is a particularly good one, because of the current influx of data. In comparison to kaom.net, which runs completely on a volunteer basis, databases such as the Bamboo and Silk manuscripts put together by Wuhan University and Hong Kong University are strikingly behind (the website stopped being updated some 10 years ago).

There is no easy solution to the questions of updating and sustaining databases. They depend on funding, the availability of researchers and independent scholars, and priorities. Fortunately, kaom.net is keeping up, with beautiful additions. In future posts, I will continue to review its tools and resources.


Published by

maddalena poli
