HTR, Islamic Languages, Islamic Studies, New Post, OCR, Ottoman Studies

At the Dawn of Digital Studies on Arabic Script in France (3): Bridging Communities and Building the Future

Introduction The first article of this series explored the general landscape of digital studies on Arabic script in France. The …

Japanese Studies, New Post, OCR

Overview of NDL Kotenseki OCR: The Key to Early Modern Japanese Archives?

This article was written by guest contributor Xia-Kang Ziyi (Kyushu University). The author bio is below. One of the biggest …

Japanese Studies, OCR, Online Resources

Politicians Speak: An Introduction to the Database System for the Minutes of the Imperial Diet

Japanese historians often cite the minutes of the Imperial Diet[1] to demonstrate changes in state policy and the views of …

AI, Conference Proceedings, HTR, LLM, OCR

An Introduction to Conference Proceedings You May Have Missed: Unaccompanied Videos

Since November 2025, we have been publishing the proceedings of The Digital Orientalist’s Virtual Conference 2025 (AI and the Digital …

AI, Apps, Archiving, Coding, Conference, DH in General, Digitization, Indian Studies, Machine Learning, OCR, Online Resources, Social Media, Software

Imagining a Contextual Tech for Indian Languages: A Postcard from Bahu Bhasa 2025

Building on the conversations initiated under the Future of the Commons (overviews of which can be found here and here), …

DH in Practice, New Post, OCR, South Asian Studies, Workflow

Why Extracting Hindi Text from PDFs Is So Much Harder Than English (And How You Can Do It)

As a Digital Humanities student working with Hindi-language texts, I expected extracting text from a Hindi PDF to be a …

DH in Practice, New Post, OCR, South Asian Studies, Teaching

Teaching Bengali Digital Texts to Anglophone Undergraduates: What Voyant Reveals about the Infrastructural Bias of DH Tools

In designing an introductory Digital Humanities class, I am often faced with the question of how best to incorporate linguistic …

African Studies, AI, OCR, Textual Analysis, Visualization

Islam West Africa Collection: Dataset, Distant Reading, and Uses of AI for Discourse Analysis

The Islam West Africa Collection (IWAC), created and maintained by F. Madore, is an open-access database of press clippings from the mainstream press in West Africa (Burkina Faso, Ivory Coast, Benin, Togo, Niger, Nigeria), Islamic publications, and video recordings, all related to Islam. Its analytical tools enable discourse analysis and help answer various scientific questions through keyword mapping, topic modelling, sentiment analysis and spatial visualization.

AI, DH in General, HTR, Islamic Studies, New Post, OCR

At the Dawn of Digital Studies on Arabic Script in France (2): A Brief History of Handwritten Arabic Text Recognition in France

Introduction The first article of this series explored recent advances in the digital study of Arabic script in France in …

DH in Practice, HTR, Machine Learning, New Post, OCR, Syriac Studies

From Vienna to the World: Launching the First Public Syriac HTR Model on Transkribus

Readers of The Digital Orientalist, you are among the first to know! Today’s post is dedicated to the release of …