Jingyuan Digital Platform: Font Making and Database Development for Shang Oracle Bones (Part 1)

This is a guest post by Peichao Qin | University of Cambridge.

Tired of struggling to find and type out complex oracle bone script characters? You’re not alone. For years, scholars and enthusiasts alike have faced the frustrating challenges of working with these ancient inscriptions, challenges that stem from the lack of a proper font and efficient search tools. More than a century of discoveries and research has produced an enormous number of characters, variants and transcriptions. Imagine spending hours trying to locate a single glyph, or having to manually piece together characters from a mixture of strokes and blot marks in low-resolution rubbings. This creates problems not only for scholars who want to read the texts and search the relevant literature, but also for enthusiasts who just want to type a character or create non-pixelated artwork. This was the reality for the oracle bone script, until now.

I’m excited to introduce the Jingyuan Digital Platform, a brand-new solution designed to transform how we interact with Shang oracle bone inscriptions. This platform offers two major game-changing tools: the world’s first ultra high-resolution font for oracle bone script (available for free download here) and a comprehensive, user-friendly search engine for these ancient glyphs. Whether you’re typing in Word, designing a poster, or conducting in-depth research, the platform streamlines the entire process, making it faster, easier, and more accurate. Plans and proposals for a lexicon, dictionary and thesaurus are also in place, so the website is worth keeping an eye on in the future.

What are the Oracle Bones and Why A Font?

Oracle bone inscriptions (OBI), also known as the oracle bone script, date to the later part of the Shang dynasty (ca. 1250–1046 B.C.). They are the inscriptional product of pyromantic divination conducted by Shang elites: a controlled process of systematically drilling hollows and burning metal rods to produce cracks on turtle shells or ox scapulae, followed by the record-keeping of Shang scribes, who kept track of the relevant divination events (also see Henry’s post). The oracle bone script is known for its highly complex character structures and its huge number of variant forms. Since its first discovery in 1899, over 4,000 characters and 50,000 distinct variants have been identified. In the 125 years since, scholarship has produced a substantial literature on the decipherment of individual characters, together with transcriptions of the published oracle bone corpora, offering invaluable materials for linguistic and historical study of the Shang dynasty. However, the lack of font support and coherent encoding for both the archaic and modern forms of oracle bone characters, and the long absence of efficient database query support, have often made the field difficult to navigate for beginners and advanced learners alike. Inputting oracle bone glyphs and building databases have long relied on copying and pasting rubbing images, which cannot easily be indexed or searched.

For this reason, I have created the Jingyuan Digital Platform (镜原甲骨数字平台) to meet this specific challenge and to solve the technical problems that currently exist in text rendering, glyph information queries, and the input of complex transcriptions in the field of Shang oracle bone research.

The platform was officially launched on July 4, 2024. One of the launch articles can be found here and the website link is also included here and at CUL.

The website is built mainly with MySQL, Python, JavaScript, Vue 3 and Nuxt 3. It is configured for server-side rendering (SSR), which keeps the dynamically rendered data SEO-friendly and improves page loading speed. Built on a current frontend web stack, the platform is a modern, lightweight web application. At the moment, the site consists of four major modules, with others still under active development:

  1. A high-resolution oracle bone font.
  2. A database including over 52,288 glyphs.
  3. A multi-purpose text editor for inputting transcriptions.
  4. Geographical and timeline visualizations.

In this two-part article, I will explain the programming technology that made these modules possible and the academic considerations behind their creation. In doing so, I hope that some reflections on my attempts at font development, database building, and interface design can be helpful for palaeography studies and the wider field of digital humanities (DH).

Oracular Font for Oracle Bone Inscriptions

A font is the foundation on which characters are displayed in computer documents. However, developing fonts for archaic character systems has consistently proven challenging. For scripts like the oracle bone script, whose characters exhibit highly cursive features and are composed of a large number of strokes, the manual labor and cost of both grid-system-based and binarization-based (i.e. inverting the image from a black to a white background) font creation methods have been prohibitively high.

To address this problem and adapt to the unique characteristics of the script, the platform employs a novel method of creating font-compatible glyphs that combines hand drawing with automation based on computer vision. Specifically, the glyphs are first drawn by hand onto A4-sized semi-transparent paper, in either a 4×4 or 3×4 grid, creating an initial upscale from 20px-sized glyphs (usually crops from rubbings or photos) to ~500px. The sheets are then scanned and cropped into individual PNG images by a Python script, and a thinning algorithm and a vectorization algorithm are applied to the detected glyph contour in each image to create fixed-width stroke paths, imitating a procedure not unlike human writing. During this process, noise and blot marks in the image are removed, and the ends of strokes are smoothed with a Gaussian filter to ensure the quality of the glyphs and natural connectivity between strokes. Since the glyphs at this stage are vectorized path data, they are highly resistant to blurring and distortion during upscaling, which allows further upscaling to 2k resolution (~2048×2048px) without losing fidelity.
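
To make the thinning step more concrete, here is a minimal sketch in pure Python. It is illustrative only and not the platform's actual code: the post does not name the thinning algorithm used, so the classic Zhang-Suen method stands in for it as an assumption, and the function names and the tiny test stroke are invented for the example. It covers only thinning (reducing a thick stroke to a one-pixel-wide skeleton), not vectorization or Gaussian smoothing.

```python
# Zhang-Suen thinning on a small binary grid (1 = ink, 0 = background).
# Assumption: this stands in for the unnamed thinning step described above.

def neighbours(img, r, c):
    """Return the 8 neighbours P2..P9, clockwise starting from north."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

def transitions(nb):
    """Count 0 -> 1 transitions in the circular sequence P2, P3, ..., P9, P2."""
    ring = nb + nb[:1]
    return sum(1 for a, b in zip(ring, ring[1:]) if a == 0 and b == 1)

def thin(img):
    """Return a thinned copy of img; expects a background border around the ink."""
    img = [row[:] for row in img]  # work on a copy, leave the input untouched
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of Zhang-Suen
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    nb = neighbours(img, r, c)
                    p2, p3, p4, p5, p6, p7, p8, p9 = nb
                    if not (2 <= sum(nb) <= 6 and transitions(nb) == 1):
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_clear.append((r, c))
            # Pixels are cleared only after the whole grid has been examined.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img

# Hypothetical input: a 3-pixel-thick horizontal stroke with a background border.
stroke = [[0] * 11] + [[0] + [1] * 9 + [0] for _ in range(3)] + [[0] * 11]
skeleton = thin(stroke)
for row in skeleton:
    print("".join("#" if v else "." for v in row))
```

The skeleton keeps only the center line of the stroke. A vectorization step could then trace those pixels into a path and give it a fixed stroke width.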

Over the last three years, I have used this method to trace and convert over 50,000 archaic glyphs from 20px-sized raster images to professional-grade 2k vectors. As the glyphs are re-traced pixel-to-pixel from the rubbings, they are near-identical to the original forms, while being significantly upscaled and cleaned of noise. In this sense, the font can also act as a new “digital character compilation”. Compared with a traditional print publication, a digital font is smaller, quicker to index, and much easier to use.

To make it easier for daily use, the font is also mapped directly to modern Chinese characters such as 日, 月, 云, 气 by assigning the corresponding “code points”, such as U+65E5 for 日 and U+6708 for 月. A code point is the unique number that the Unicode standard assigns to each character, conventionally written in hexadecimal (hence the mix of digits and letters), which allows computers to represent and manipulate text consistently across different systems. By mapping the oracle bone glyphs to their corresponding modern Chinese characters through these code points, the font allows easy typing of oracle bone glyphs in common documents such as Microsoft Word and Excel files, and supports creative designs and artistic mockups including posters, merchandise, and even animated videos.
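
As a small illustration of what a code point is, the following Python sketch prints the code points of the two characters mentioned above. The helper name `code_point` is made up for this example. It also checks the general category of U+F4C9C, a code point that appears among the search examples later in this post: it falls in Unicode's Supplementary Private Use Area-A, a range the standard leaves free for font-specific assignments.

```python
import unicodedata

def code_point(ch: str) -> str:
    """Format a single character as a Unicode code point label such as 'U+65E5'."""
    return f"U+{ord(ch):04X}"

# Modern characters that the font maps oracle bone glyphs onto.
for ch in "日月":
    print(ch, code_point(ch))

# A private-use code point; the category "Co" means "Other, private use".
pua = chr(0xF4C9C)
print(code_point(pua), unicodedata.category(pua))
```

Because the font reuses the code points of the modern characters, typing 日 with the font selected displays the oracle bone form, while the underlying text stays searchable as ordinary Chinese.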

Glyph Database and Query Support for All Characters

A usable font is all well and good, of course: it solves some of our immediate needs for typing commonly occurring characters and making searchable transcriptions. But a standalone font is far from enough if we do not also provide a means of locating the desired glyphs. In academic work, one often needs to quickly find and type a character from a specific rubbing, or one with a specific pronunciation or set of components. And the decipherment status of the oracle bone script dictates that alternatives to modern characters and pinyin 拼音 must be considered; otherwise the glyphs that have no modern equivalent or transcribable form, more than three quarters of the total, would be missed (like the ones below).

Therefore, in addition to creating a font, we must record extra information for each glyph, such as corpus origin, modern transcription, pinyin spelling, and calligraphic classification, so that glyphs can also be indexed by other means. This information, once created (in either Excel or JSON), can then be used in SQL queries; SQL is the most widely used database query language and is designed for searching structured information. Specifically, the website implements nine database query modes to cover as many search methods as possible. These include:

  1. Modern transcription and pinyin spelling of a glyph (search “马” or “ma” to get [glyph image])
  2. Components of a glyph (search “人木” to get [glyph image])
  3. Corpus origin (search “合1234” to get glyphs from no. 1234 in Jiaguwen Heji 甲骨文合集)
  4. Moxi 摹系 number (search “1234” to get glyphs from no. 1234 from the 2022 tracing compilation Jiaguwen moben daxi 甲骨文摹本大系)
  5. Zibian 字编 number (search “1234” to get the glyph no. 1234 from the 2012 character compilation Jiaguwen zibian 甲骨文字编)
  6. Xinbian 新编 number (search “1234” to get the glyph from p. 1234 of the 2014 character compilation Xin jiaguwen zibian 新甲骨文字编)
  7. Gulin 诂林 number (search “1234” to get the glyph no. 1234 from the 1996 dictionary Jiagu wenzi gulin 甲骨文字诂林)
  8. Code point (search “󴲜” (U+F4C9C) to get [glyph image])
  9. UID, the unique identifier string for each glyph (search “p8w7ujqanz” to get [glyph image])

These modes cover cases where a glyph cannot be located by any single method. For example, glyphs that do not have modern equivalents are usually composed of common components, e.g. 八 and 魚 can be combined to produce [glyph image]. For glyphs that do not have identifiable components, we can use the occurrences cited in the corresponding character studies, which is also likely to be a common scenario in daily use. Of course, if all else fails, we can always turn to the number of the rubbing on which the glyph appears.
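
Three of the search modes above (transcription or pinyin, components, and corpus origin) can be sketched as SQL queries. This is a rough illustration under stated assumptions rather than the platform's implementation: the table layout, column names and `demo…` rows are all hypothetical, and Python's built-in `sqlite3` module is swapped in for MySQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical schema and demo rows; the real database layout is not shown in this post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE glyphs (uid TEXT PRIMARY KEY, transcription TEXT, "
    "pinyin TEXT, components TEXT, corpus_ref TEXT)"
)
conn.executemany(
    "INSERT INTO glyphs VALUES (?, ?, ?, ?, ?)",
    [
        ("demo000001", "马", "ma", "", "合1234"),
        ("demo000002", "休", "xiu", "人木", "合1234"),
        ("demo000003", None, None, "八魚", "合5678"),  # no modern equivalent
    ],
)

def uids(sql, params):
    """Run a parameterized query and return the matching UIDs."""
    return [row[0] for row in conn.execute(sql, params)]

def by_reading(term):
    """Mode 1: match a modern transcription or a pinyin spelling."""
    sql = "SELECT uid FROM glyphs WHERE transcription = ? OR pinyin = ? ORDER BY uid"
    return uids(sql, (term, term.lower()))

def by_components(parts):
    """Mode 2: match glyphs that contain every component character given."""
    if not parts:
        return []
    clauses = " AND ".join("instr(components, ?) > 0" for _ in parts)
    return uids(f"SELECT uid FROM glyphs WHERE {clauses} ORDER BY uid", tuple(parts))

def by_corpus(ref):
    """Mode 3: match glyphs recorded on a given rubbing number."""
    return uids("SELECT uid FROM glyphs WHERE corpus_ref = ? ORDER BY uid", (ref,))

print(by_reading("ma"))
print(by_components("八魚"))
print(by_corpus("合1234"))
```

The third demo row has no transcription or pinyin, so it can only be reached through its components or its rubbing number, which is exactly the situation the alternative search modes are meant to handle.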

In addition to these search methods, the platform also offers powerful filtering for the glyphs. The user can further narrow the results by the type of the character (combined glyph, practice inscription, fake inscription, etc.) 字头类型, corpus origin 著录来源, calligraphic categorization 字形组类, glyph tags 字形标签, and decipherment status 释读情况. These filters further improve the precision of the search system, which is designed to locate the correct glyphs based on user requirements.
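
One common way to implement optional filters like these is to build the SQL `WHERE` clause only from the facets the user actually selected. The sketch below shows that idea under assumptions: the column names, facet values and demo rows are hypothetical, `sqlite3` again stands in for MySQL, and glyph tags are left out because they would normally need a separate many-to-many table.

```python
import sqlite3

# Hypothetical facet columns, loosely named after the filters listed above.
FACETS = ("glyph_type", "corpus", "calligraphic_group", "decipherment_status")

def build_filter(**facets):
    """Return (where_clause, params) using only the facets that were supplied."""
    clauses, params = [], []
    for name, value in facets.items():
        if name not in FACETS:  # whitelist column names, never interpolate user text
            raise ValueError(f"unknown facet: {name}")
        if value is not None:
            clauses.append(f"{name} = ?")
            params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return where, params

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE glyphs (uid TEXT, glyph_type TEXT, corpus TEXT, "
    "calligraphic_group TEXT, decipherment_status TEXT)"
)
conn.executemany(
    "INSERT INTO glyphs VALUES (?, ?, ?, ?, ?)",
    [  # invented placeholder rows
        ("demo000001", "standard", "合", "group-1", "deciphered"),
        ("demo000002", "combined", "合", "group-2", "undeciphered"),
        ("demo000003", "practice", "合", "group-1", "undeciphered"),
    ],
)

def filtered_uids(**facets):
    """Apply the selected facets and return the matching UIDs."""
    where, params = build_filter(**facets)
    sql = f"SELECT uid FROM glyphs WHERE {where} ORDER BY uid"
    return [row[0] for row in conn.execute(sql, params)]

print(filtered_uids(decipherment_status="undeciphered"))
print(filtered_uids(decipherment_status="undeciphered", calligraphic_group="group-1"))
```

Each additional facet adds one `AND` condition, so the filters can be combined freely with any of the search modes.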

In general, combining the font as the base glyph compilation with the database interface as the base search tool makes it possible to input the desired oracle bone graphs correctly, and lays the foundation for the future development of a genuine glyph database that covers the functions of a dictionary, lexicon, and eventually a transcription corpus. Some effort is needed to become familiar with the functionalities of these modules, of course. But compared to the current academic situation, where everything is done by copy-pasting images, it is a worthwhile step towards making efficient use of the textual resources this field has to offer.

Later in this series, I will continue to explore the functionalities offered by the Jingyuan platform for other aspects of the oracle bone scholarship.
