This paper presents a computational pipeline for working with spatial data in Ottoman Turkish, from extracting place names via named-entity recognition (NER) to geocoding and, finally, mapping toponyms.
Introduction
With the “spatial turn,” Geographic Information Systems (GIS) have been increasingly leveraged in Digital Humanities, offering new directions for studying historical texts. Previous studies (e.g., Emiralioğlu 2020) have underscored the depth of Ottoman geographic knowledge. In this context, computationally investigating the spatial limits of this knowledge via GIS can provide new insights and corroborate qualitative findings with quantitative evidence. Although several studies have already applied GIS to Ottoman Turkish sources (Ma et al. 2021; Yaycıoğlu et al. 2022), this paper contributes a reproducible pipeline for analyzing Ottoman Turkish texts.
By presenting an end-to-end automated pipeline that extracts entities, geocodes them, and maps them, this paper addresses the question of how to obtain the raw data needed for mapping. The scripts used in this study are shared openly so that readers can apply the same pipeline to their own data.
Data and Resources
Although Ottoman Turkish is no longer a living language, a substantial amount of data is openly available online. However, the degree of standardization across these data sources varies considerably. While Ottoman Turkish was originally written in the Perso-Arabic script, texts are often transliterated into the Latin alphabet. In practice, no single standard is consistently applied: some texts indicate only long vowels while providing consonants according to the modern Turkish alphabet, whereas others follow the IJMES transliteration chart. The IJMES system minimizes information loss when mapping the Ottoman Turkish script onto Latin characters. In this study, I therefore utilize texts transliterated according to the IJMES standard. Since the named entity recognition (NER) model discussed below was also trained on texts in this format, input data provided in other transliteration schemes may not yield good results.
As potential data sources, readers can check the Latin-transliterated manuscripts held by the Presidency of the Manuscripts Institution of Turkey, which is the largest Latin-transliterated Ottoman Turkish data source. Another data source is the DUDU treebank, the largest annotated Latin-transcribed Ottoman Turkish treebank with 1,782 sentences and 17,125 words (Yılandiloğlu and Siewert 2025).
Beyond the raw texts, a gazetteer is essential for geocoding. Several open-access gazetteers are available online, including NFS, which contains 16,296 populated places from 764 population registers dating between 1830 and 1849 (Kabadayı et al. 2022); OttGaz, an Ottoman Turkish gazetteer of 1,068 geocoded administrative place names in the Ottoman Empire (Hanley 2024); and Pleiades, with 351 Ottoman Turkish names for 269 unique places (Bagnall et al. 2016). While the first two cover administrative locations in the Ottoman Empire, Pleiades also covers places outside the Ottoman Empire and records all types of geographic features, including rivers, mountains, and populated places.
NER can be performed with a state-of-the-art transformer model that was trained on 6,992 IJMES-transliterated Ottoman Turkish entities and achieved a span-level micro-averaged F1 score of 90%. The model is publicly accessible.
The NER Pipeline: Solving the “Sentence” Problem
A central challenge in applying named-entity recognition to Ottoman Turkish texts is the absence of clearly defined sentence boundaries. Many Ottoman Turkish texts lack punctuation marks, which makes sentence-based NER approaches unreliable. To address this issue, a chunk-based approach that is independent of sentence segmentation can be implemented: the input text is split into fixed-length chunks of 128 tokens. To mitigate the loss of contextual information at chunk boundaries, a sliding window strategy can be used with a stride of 32 tokens, meaning that each new window starts 32 tokens after the previous one, so that consecutive windows overlap by 96 tokens. The chunk-based NER processing is implemented in the Jupyter notebook.
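The windowing described above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function name `sliding_chunks` is hypothetical, and the sketch assumes that "tokens" are items of a plain list (for example, whitespace-separated words), whereas the actual model may count subword tokens.

```python
from typing import List


def sliding_chunks(tokens: List[str], size: int = 128, stride: int = 32) -> List[List[str]]:
    """Split `tokens` into windows of `size` items, starting a new window every `stride` items."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not tokens:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + size])
        # Stop once the current window has reached the end of the text.
        if start + size >= len(tokens):
            break
        start += stride
    return chunks


# Toy input: 200 placeholder tokens instead of a real Ottoman Turkish text.
tokens = [f"w{i}" for i in range(200)]
chunks = sliding_chunks(tokens)
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # 4 128 104
```

Because the windows overlap, the same entity can be predicted in several chunks, so predictions would still need to be de-duplicated by their position in the original text before geocoding.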
Geocoding and Disambiguation: From Name to Coordinate
Following entity extraction, the extracted names are geocoded using the gazetteers. Before matching, the pipeline applies a normalization step that converts selected Ottoman transliteration characters (e.g., ḳ → k) into simplified forms. This normalization substantially increases match rates because NFS and OttGaz do not use IJMES transliteration. The same normalization is applied to the place names in Pleiades.
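A minimal sketch of such a normalization and lookup step is given below. The character table is an illustrative subset chosen for this example and is not the list used in the notebook; the names `normalize_name` and `build_index` are hypothetical, and the gazetteer rows are toy values that are not taken from NFS, OttGaz, or Pleiades.

```python
import unicodedata

# Illustrative subset of IJMES-style characters and their simplified forms.
# Keys are written as escapes so that each key is a single code point.
SIMPLIFY = str.maketrans({
    "\u1e33": "k",  # k with dot below
    "\u1e25": "h",  # h with dot below
    "\u1e2b": "h",  # h with breve below
    "\u1e63": "s",  # s with dot below
    "\u1e6d": "t",  # t with dot below
    "\u1e0d": "d",  # d with dot below
    "\u017c": "z",  # z with dot above
    "\u1e93": "z",  # z with dot below
    "\u1e95": "z",  # z with line below
    "\u0101": "a",  # a with macron
    "\u012b": "i",  # i with macron
    "\u016b": "u",  # u with macron
    "\u02bf": "",   # ayn sign
    "\u02be": "",   # hamza sign
})


def normalize_name(name: str) -> str:
    """Lowercase a transliterated name and replace the selected characters."""
    composed = unicodedata.normalize("NFC", name).lower()
    return " ".join(composed.translate(SIMPLIFY).split())


def build_index(rows):
    """Map each normalized gazetteer name to the list of its coordinates."""
    index = {}
    for name, lat, lon in rows:
        index.setdefault(normalize_name(name), []).append((lat, lon))
    return index


# Toy gazetteer rows (name, latitude, longitude); the coordinates are made up.
gazetteer = [("Kara Hisar", 38.0, 30.0), ("Kara Hisar", 40.0, 38.0)]
index = build_index(gazetteer)
print(normalize_name("Ḳara Ḥiṣār"))         # kara hisar
print(index.get(normalize_name("Ḳara Ḥiṣār")))  # [(38.0, 30.0), (40.0, 38.0)]
```

A name that maps to more than one coordinate pair, as in this toy case, is exactly the kind of ambiguous entry that the disambiguation step has to resolve.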
A key challenge in geocoding historical texts is ambiguity, as many place names (e.g., Ḳara Ḥiṣār) correspond to multiple locations. To address this, the pipeline implements a distance-based disambiguation algorithm (Yılandiloğlu 2025b). For each ambiguous place name, the script identifies the five nearest unambiguous locations within the same document. It then selects the candidate that is spatially closest to the majority of those among the five that fall within a 200-kilometer radius. This method leverages geographic context, assuming that place names mentioned together in a text are likely to be spatially proximate. While not perfect, this approach provides a transparent and reproducible mechanism for resolving homonymous toponyms in large-scale automated analyses. Because the heuristic can still select the wrong candidate, it is recommended to manually inspect the output of this algorithm and remove falsely geocoded names.
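One possible reading of this heuristic is sketched below; it is an interpretation for illustration and not the implementation from Yılandiloğlu (2025b) or the notebook. The sketch assumes that the caller has already selected up to five unambiguous locations as context ("anchors"), that each anchor votes for the candidate nearest to it, and that a vote only counts if that candidate lies within 200 km of the anchor. The function names and all coordinates are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in kilometers on a sphere of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def disambiguate(candidates: List[Coord], anchors: List[Coord],
                 radius_km: float = 200.0) -> Optional[Coord]:
    """Return the candidate that is nearest to most anchors, or None if no anchor votes."""
    if not candidates:
        return None
    votes = Counter()
    for anchor in anchors:
        nearest = min(candidates, key=lambda c: haversine_km(c, anchor))
        # An anchor only votes if its nearest candidate is within the radius.
        if haversine_km(nearest, anchor) <= radius_km:
            votes[nearest] += 1
    if not votes:
        return None
    # Ties are resolved in favor of the candidate that received a vote first.
    return votes.most_common(1)[0][0]


# Toy data: two candidates for one ambiguous name and five context locations.
candidates = [(38.0, 30.0), (40.0, 38.0)]
anchors = [(38.5, 30.5), (39.0, 30.0), (37.5, 31.0), (40.2, 38.3), (45.0, 20.0)]
print(disambiguate(candidates, anchors))  # (38.0, 30.0)
```

Returning `None` when no anchor lies within the radius leaves the name unresolved instead of forcing a guess, which makes such cases easy to collect for manual inspection.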
Visualization: Mapping with Kepler.gl
For visualization, the pipeline leverages Kepler.gl, an open-source, browser-based mapping tool that requires no programming expertise (kepler.gl Developers 2025). With its range of mapping tools, Kepler.gl enables researchers to explore questions such as how the place names mentioned in historical sources are distributed geographically and where dense regional clusters occur, for example concentrations in Anatolia or the Balkans.
The final output of the geocoding step is a CSV file containing place names, coordinates, and metadata, which can be directly imported into Kepler.gl via drag-and-drop. Once loaded, users can interactively explore the spatial distribution of locations, for example by applying filters or adjusting density-based clustering. In the left-hand panel, users can create layers, filter the data, and modify the base map configuration (see the official documentation for further information). Figure 1 shows part of the Layers panel.
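A CSV file of this kind could be produced as in the sketch below. The column names (`name`, `latitude`, `longitude`, `mentions`), the function name `write_places_csv`, and the rows are assumptions made for illustration; the notebook's actual output columns may differ. Whatever the columns are called, the latitude and longitude columns can be assigned manually when configuring a point layer in Kepler.gl.

```python
import csv
import io

FIELDS = ["name", "latitude", "longitude", "mentions"]  # hypothetical column names


def write_places_csv(rows, handle):
    """Write geocoded places to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Toy rows; the coordinates and counts are made up for this example.
rows = [
    {"name": "Ḳara Ḥiṣār", "latitude": 38.0, "longitude": 30.0, "mentions": 3},
    {"name": "Cezīre-i Serendīb", "latitude": 7.5, "longitude": 80.5, "mentions": 1},
]

buffer = io.StringIO()
write_places_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, the handle can be opened with `open("places.csv", "w", newline="", encoding="utf-8")`, where the file name is again only an example; UTF-8 keeps the transliteration characters intact.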

Figure 1. Layers panel in Kepler.gl.
An example map for the first volume of Künhü’l-ahbar, written by Mustafa Âlî, is shown in Figure 2.

Figure 2. The geocoded names in the first volume of Künhü’l-ahbar.
Figure 2 shows the wide geographic coverage of Künhü’l-ahbar, from the Atlantic Ocean (Baḥr-i Muḥīt) to Sri Lanka (Cezīre-i Serendīb). It also demonstrates that the regions around Istanbul and the Levant are mentioned most frequently, while Yemen and Europe remain underrepresented. However, gaps in the gazetteers significantly limit the pipeline's ability to capture Ottoman Turkish place names. For instance, although the book mentions Enderīn (London) and Paris (Mustafa Âlî 2020, 1:275, 395), the gazetteers have no entries for them, so they do not appear in Figure 2. This creates the misleading impression that the book does not mention these places at all. The map should therefore be interpreted critically and with these limitations in mind. The CSV file used to generate Figure 2 can be found on Zenodo.
Conclusion
To conclude, this paper presents a fully automated and reproducible pipeline for extracting, geocoding, and mapping toponyms from Ottoman Turkish texts. While such a pipeline substantially reduces manual labor, it also requires critical engagement with the method itself, from scrutinizing why some entities remain unmatched, which may stem from limitations of the NER model or the existing gazetteers, to evaluating the disambiguation strategy.
All scripts used in this study are publicly accessible in the notebook. Readers can upload their own Ottoman Turkish text files and gazetteers to the notebook and adapt the pipeline to their own research questions.
Please feel free to contact me with any questions!
References
Bagnall, Roger S., et al., eds. 2016. Pleiades: A Gazetteer of Past Places. Accessed July 26, 2025. https://pleiades.stoa.org/.
Emiralioğlu, Pınar. 2020. “The Ottoman Enlightenment: Geography and Politics in the Seventeenth- and Eighteenth-Century Ottoman Empire.” The Medieval History Journal 22 (2): 298–320. https://doi.org/10.1177/0971945819897449.
Kabadayı, M. Erdem, Grigor Boykov, Akın Sefer, and Piet Gerrits. 2022. “Kabadayi_Boykov_Sefer_Gerrits_Ottoman_NFS_Gazetteer_23112022_16296_populated_places_version_1.” Dataset. Zenodo, November 23. https://doi.org/10.5281/zenodo.7351936.
kepler.gl Developers. 2025. kepler.gl. Accessed July 26, 2025. https://kepler.gl/.
Ma, Jilian, Akın Sefer, and M. Erdem Kabadayı. 2021. “Geolocating Ottoman Settlements: The Use of Historical Maps for Digital Humanities.” Proceedings of the International Cartographic Association 3: 1–8. https://doi.org/10.5194/ica-proc-3-10-2021.
Mustafa Âlî, Gelibolulu. 2020. Künhü’l-Ahbâr. 1. Rükün. Edited by Suat Donuk. İstanbul: Türkiye Yazma Eserler Kurumu Başkanlığı Yayınları.
OttGaz: Ottoman World Gazetteer. 2025. Accessed July 26, 2025. https://ottgaz.org/wiki/Main_Page.
Yaycıoğlu, Ali, Antonis Hadjikyriacou, Fatma Öncel, Erik Steiner, and Petros Kastrinakis. 2022. “Mapping Ottoman Epirus (MapOE).” Journal of the Ottoman and Turkish Studies Association 9 (2): 145–152. https://doi.org/10.2979/tur.2022.a902180.
Yılandiloğlu, Enes, and Janine Siewert. 2025. “DUDU: A Treebank for Ottoman Turkish in UD Style.” In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL 2025), edited by Špela Arhar Holdt et al., 74–79. Tartu: University of Tartu Library.
Yılandiloğlu, Enes. 2025a. “From Text to Map: A Reproducible Geocoding Pipeline for Ottoman Studies.” Zenodo. https://zenodo.org/records/17968293.
Yılandiloğlu, Enes. 2025b. “Mapping Orientalism: A Quantitative Study of Eighteenth-Century British Travel Writing.” Master’s thesis, University of Helsinki. http://hdl.handle.net/10138/602337.
