Aozora Bunko: Notes on Usage

This article was co-authored by guest contributors Raúl Cervera Álvarez (Universitat de Barcelona) and Celia Gonzalez Diaz (Universitat Autònoma de Barcelona).

Introduction

The well-known database Aozora Bunko has been mentioned more than once in The Digital Orientalist. For example, James Harris Morris used data from Aozora Bunko back in 2020 in a post entitled “Web Scraping with Python for Beginners.” Similarly, Anna Oskina used data from Aozora Bunko in her post “Genius loci: Extracting names and places from Japanese texts.” Her excellent article is a detailed tutorial on how to extract proper names and places from Japanese texts and create word clouds via Python.

Since the repository has already been used and cited by our colleagues, we thought it deserved a post of its own. With the help of images and screenshots, we aim to show the depth and range of uses of this database, which is essential for any digital humanities specialist interested in Japanese Studies, and to explain what exactly Aozora Bunko is, what its goals are, and the main pillars on which it stands.

What is Aozora Bunko?

Aozora Bunko (青空文庫、literally “Blue Sky Library”) is a digital database of Japanese texts. Its main objective is digitizing texts that, according to Japanese copyright law, are in the public domain, so they can be distributed free of charge to the general public. Most of these digitizations are of Japanese literary texts, but some are Japanese translations of foreign (mostly English) works. 

Since its founding in 1997, Aozora Bunko has created and collected electronic copies of copyright-free texts which are compiled, transcribed and corrected by teams of volunteers. It currently contains more than twelve thousand titles. The collection is ordered according to the name of the author, the title of the work, or year of publication, which makes it easy to search for books. We will include instructions on how to use its search engine below.

Aozora Bunko’s logo.

How to use Aozora Bunko

Before we start exploring how this database works, we should mention that the interface is currently available only in Japanese, so it may be difficult to navigate for users with no knowledge of the language. To make the platform accessible to readers of The Digital Orientalist regardless of their language skills, including those new to Japanese Studies, we will guide the reader through its extensive collection of literary texts and the main features of the site.

At the top of the main page, you will find a short list of links to instructions that help users navigate the website, find the files they are looking for, and make use of them. Right below this list, there is a table of contents that includes a complete index of all the texts available in the collection (second row of the table; see the screenshot below) and search criteria based on the first syllable of the author's surname (third row) or the first syllable of the title of a literary work (fourth row).

Authors' surnames are arranged according to the gojūon 五十音 system, the traditional ordering of the Japanese syllabary into rows (gyō 行) whose syllables share the same initial consonant. The user chooses the row whose initial consonant matches that of the first syllable of the author's surname. For example, to find a surname whose first syllable begins with the sound “k” (or the letter “k” when romanized), that is, a surname starting with ka か, ki き, ku く, ke け or ko こ, the user must choose the corresponding syllable group, which in this case is kagyō か行. For the titles of works, however, the list contains every basic syllable of the hiragana chart, so we can choose the syllable we are looking for directly from this list.
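To make the grouping logic concrete, here is a minimal Python sketch that assigns a kana reading to its gojūon row. It is purely illustrative: `GOJUON_ROWS`, `to_hiragana` and `row_of` are hypothetical names, not part of the Aozora Bunko site, and we assume that voiced and semi-voiced syllables (such as が or ぱ) are filed under the row of their unvoiced counterpart, as in conventional Japanese dictionary order. The lookup works on the reading of a surname, since the kanji alone do not tell you which row to choose.

```python
# Hypothetical table of gojūon rows (gyō), for illustration only.
# Assumption: voiced/semi-voiced kana are grouped with their unvoiced row.
GOJUON_ROWS = {
    "あ行": "あいうえおぁぃぅぇぉ",
    "か行": "かきくけこがぎぐげご",
    "さ行": "さしすせそざじずぜぞ",
    "た行": "たちつてとだぢづでどっ",
    "な行": "なにぬねの",
    "は行": "はひふへほばびぶべぼぱぴぷぺぽ",
    "ま行": "まみむめも",
    "や行": "やゆよゃゅょ",
    "ら行": "らりるれろ",
    "わ行": "わゐゑをん",
}


def to_hiragana(ch: str) -> str:
    """Convert one katakana character to hiragana; leave anything else unchanged."""
    code = ord(ch)
    if 0x30A1 <= code <= 0x30F6:  # katakana block ァ..ヶ sits 0x60 above hiragana
        return chr(code - 0x60)
    return ch


def row_of(reading: str) -> str:
    """Return the gojūon row of the first kana in a reading, or 'その他' (other)."""
    if not reading:
        return "その他"
    first = to_hiragana(reading[0])
    for row, kana in GOJUON_ROWS.items():
        if first in kana:
            return row
    return "その他"


print(row_of("こばやし"))  # か行
```

Kobayashi (こばやし) therefore falls under か行, the row used for Kobayashi Takiji later in this post.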

The two bottom rows of this table of contents contain a link to a list of works that are under review and expected to be uploaded to the repository in the near future (fifth row), and another search criterion (sixth row) that allows users to search the database by genre and topic.

Besides the search criteria we have just mentioned, you can also use the search box in the top right corner if you already know what you are looking for. Once you type a keyword (the name of the author, the title of the work, etc.) and press Enter, you will be redirected to a list of search results.

Screenshot of the main page of Aozora Bunko.

Now that we have reviewed the main search tools on the main page, we will walk through a complete search step by step. This will help readers appreciate the wealth of information that the Aozora Bunko site offers.

Let’s suppose we are interested in finding The Crab Cannery Boat (Kani Kōsen 蟹工船), a novel by Kobayashi Takiji 小林多喜二. To search for this work via the author’s surname, we look for the syllable group that starts with “k” (the initial of the writer’s surname), which means selecting kagyō か行, as in the example above. After clicking this option, a list of authors whose surnames start with “k” appears on the screen.

Screenshot of the page where we can find the list of authors whose surname starts with the letter “k.”

Once we have found and clicked on Kobayashi Takiji in this list, we will see a brief section of biographical information about the writer, a list of novels and stories written by this author that are available on Aozora Bunko and a list of titles of works which are under revision, but have not yet been uploaded. 

Screenshot of Kobayashi Takiji’s author page.

On this page, we select Kani Kōsen 蟹工船, the original title of The Crab Cannery Boat. This takes us to a page with further biographical information about Kobayashi, details about the novel and its publication, and files or links to the text in different formats, such as HTML, which can be read online, and ZIP archives, which must be downloaded. The text can be copied and pasted, and in this case it includes furigana, a reading aid consisting of small kana printed above (or beside) the kanji to indicate their pronunciation in that particular context.
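For readers who download the ZIP version, the following Python sketch shows one way to turn it into plain text. It rests on assumptions we state up front: that the archive contains a .txt file encoded in Shift_JIS (read here with Python's `cp932` codec, Microsoft's superset of it), and that furigana follow Aozora Bunko's plain-text conventions, where the reading follows the kanji in 《》, ｜ marks where the annotated text begins, and ［＃…］ encloses editorial notes. The function names `strip_ruby` and `read_aozora_zip` are our own, and the sample sentence is invented for the demonstration rather than taken from the novel.

```python
import io
import re
import zipfile

# Assumed Aozora Bunko plain-text markup:
#   漢字《かんじ》               furigana follow the kanji inside 《》
#   ｜青空文庫《あおぞらぶんこ》  ｜ marks where the annotated text starts
#   ［＃...］                    editorial notes (emphasis, layout, etc.)
RUBY = re.compile(r"《[^》]*》")
NOTE = re.compile(r"［＃[^］]*］")
RUBY_START = "｜"


def strip_ruby(text: str) -> str:
    """Remove furigana, ruby-start markers and editorial notes."""
    text = RUBY.sub("", text)
    text = NOTE.sub("", text)
    return text.replace(RUBY_START, "")


def read_aozora_zip(source) -> str:
    """Decode the first .txt file found in a downloaded ZIP (path or file object)."""
    with zipfile.ZipFile(source) as archive:
        names = [n for n in archive.namelist() if n.lower().endswith(".txt")]
        if not names:
            raise ValueError("no .txt file in archive")
        raw = archive.read(names[0])
    # Assumption: the file is Shift_JIS; cp932 also covers Windows extensions.
    return raw.decode("cp932")


if __name__ == "__main__":
    # Build a small in-memory ZIP holding an invented sample sentence.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        sample = "青空《あおぞら》文庫《ぶんこ》の｜蟹工船《かにこうせん》［＃「蟹工船」は太字］"
        zf.writestr("sample.txt", sample.encode("cp932"))
    print(strip_ruby(read_aozora_zip(buffer)))  # 青空文庫の蟹工船
```

Stripping the furigana suits tasks such as counting words or characters, while keeping them preserves the readings, so whether to remove them depends on the analysis at hand.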

Screenshots of “The Crab Cannery Boat” page.

Review and Concluding Thoughts

Aozora Bunko has established itself as the online repository par excellence for many reasons, from the enormous wealth of documents it holds to its relatively intuitive search engine. Thanks to its accessibility, it is a key tool for studying Japanese literature, especially for academics who lack the resources to build an extensive personal library or who cannot easily reach libraries and physical repositories.

To wrap up, we would like to highlight the notable influence this project has had on the popularization of Japanese literature around the globe. This is not only because it is free of charge for everyone interested in the subject, easy to access and comfortable to read, but also because foreign translators can use its content to translate these works into their own languages at no added cost. Of course, legislation varies from country to country, so it is worth checking the laws that apply to you before creating and publishing translations.

If the text is out of copyright (or not copyrighted) in Japan, or has been published under an open license that grants you the necessary rights, you can create adaptations (such as translations) and publish them. The creator of these adaptations or translations owns the copyright to these new versions if they are sufficiently original.

To better illustrate this global influence, we thought it would be interesting to describe the situation in our own country, Spain, where the fair use doctrine does not exist. We have noticed that many Spanish translations of Japanese literary works are based on the texts available in the Aozora Bunko repository, which has certainly reduced translation costs and allowed freelance translators to hold the copyright to their own versions of these works. See the example in the figure below.

An example of how Aozora Bunko has been the main source of many translations into Spanish. This image is taken from the bibliography of “Vida de un militante y otros relatos proletarios,” Ediciones Satori (2022), a compilation of stories by Kobayashi Takiji (小林 多喜二). Here we can see that all the cited texts were taken and translated directly from the repository.
