Introduction
In my previous work exploring the first volume of Künhü’l-Ahbar, Gelibolulu Mustafa Âlî’s 16th-century work on history and geography, I established a computational pipeline that uses Named Entity Recognition (NER) to extract location mentions from the Ottoman Turkish text and geocodes them via three gazetteers: Pleiades, OttGaz, and the Ottoman NFS Gazetteer (Kabadayı et al., 2022). In this post, I focus on how to analyze the individuals who co-occur in sentences with geocoded location entities, which adds another layer to the map and bridges Geographic Information Systems (GIS) and network analysis. A map consisting solely of geocoded locations shows which places were mentioned in historical records, but it only tells half the story. Analyzing who is mentioned with which place can reveal the source of information, a key figure for a place, and more. Thus, it is worthwhile to move beyond the “where” and reveal the “who.” The final data is visualized through a folium-based map. The source code, output data, and map can be found here.
Why Co-occurrence Matters
Co-occurrence analysis relies on the assumption that proximity in text implies a relationship between two entities, in this case, a place and a person. In historical Ottoman Turkish records such as Künhü’l-Ahbar, co-occurrence analysis can be insightful because a location is rarely mentioned in isolation; it usually appears alongside the figures that the author considers important. Such figures can include the conqueror of a city, the writer of a source mentioned in the text, or an officer who was appointed to or resigned from a position in that place. Thus, we are no longer just mapping Istanbul or Egypt; we are mapping the human interactions that define those spaces within the text.
Step 1: Finding the Sentence Boundaries
The structured data for Künhü’l-Ahbar comes from the Latin-transliterated Ottoman Turkish Corpus (LATOC). The first step is to find the sentence boundaries, which I do with the NLTK sentence tokenizer. The geocoded location entities come from the code that I wrote for the previous post.
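As a rough illustration of this step, the sketch below splits a text into sentences and keeps only those that mention a known location. It is not the post's actual code: the regex splitter is a simplified stand-in for the NLTK sentence tokenizer (so the sketch runs without downloading tokenizer models), the function names are hypothetical, and the second sample sentence is invented filler in modern Turkish.

```python
import re

def split_sentences(text):
    """Naive stand-in for a trained sentence tokenizer:
    split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]

def sentences_with_locations(text, location_names):
    """Return (location, sentence) pairs for each sentence that
    contains one of the already geocoded location names."""
    pairs = []
    for sentence in split_sentences(text):
        for name in location_names:
            if name in sentence:
                pairs.append((name, sentence))
    return pairs

# First sentence is quoted in this post; the second is invented filler.
text = (
    "Baṭlamyūs ḳavlince Baḥr-i Hind’de yigirmi biñden ziyāde cezīre vardur. "
    "Bu cümlede yer adı yoktur."
)
pairs = sentences_with_locations(text, ["Baḥr-i Hind"])
```

With NLTK and its tokenizer data installed, `nltk.sent_tokenize(text)` could replace `split_sentences`. Plain substring matching is also a simplification, since suffixed or variant spellings of a place name may need looser matching.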
Step 2: Extracting People with NER
Once I had cleanly extracted the sentences containing 2,137 geocoded entities for 370 unique locations, I applied Named Entity Recognition (NER) to uncover the people-place relationships. I processed the extracted sentences through the NER model enesyila/ota-roberta-base-ner. This process yielded 833 person-place co-occurrences.
Because the transformer model I use relies on subword tokenization, the output requires minor data cleaning, such as removing the Ġ artifact of RoBERTa tokenizers and joining multi-word entities. I then filter the dataset to include only sentences where at least one person was found, so that the final map shows only the geocoded locations that have associated people. As a result, this method ignores individuals who do not occur in the same sentence as a geocoded location entity. This limitation should be taken into consideration.
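The cleaning described above could look roughly like the following sketch. It assumes token-level predictions shaped as dictionaries with "word" and "entity" keys and B-PER / I-PER labels; both the label scheme and the sample tokens (in simplified spelling) are assumptions for illustration, not actual model output. In RoBERTa's byte-level vocabulary, a leading Ġ marks a token that begins a new word, so tokens without it are glued to the previous piece.

```python
def merge_person_tokens(tokens):
    """Join subword pieces into full person names.

    A leading 'Ġ' means the token starts a new word; pieces without it
    continue the previous word. Consecutive PER tokens form one entity.
    """
    people, current = [], ""
    for tok in tokens:
        label, word = tok["entity"], tok["word"]
        if not label.endswith("PER"):
            # A non-person token closes any name being built.
            if current:
                people.append(current)
                current = ""
            continue
        starts_word = word.startswith("Ġ")
        piece = word.lstrip("Ġ")
        # A B- label on a word-initial token begins a new entity.
        if label.startswith("B-") and starts_word and current:
            people.append(current)
            current = ""
        if current and starts_word:
            current += " " + piece   # next word of a multi-word name
        else:
            current += piece         # subword continuation or first piece
    if current:
        people.append(current)
    return people

# Hypothetical token-level output for illustration only.
tokens = [
    {"word": "ĠBat", "entity": "B-PER"},
    {"word": "lam", "entity": "I-PER"},
    {"word": "yus", "entity": "I-PER"},
    {"word": "Ġkavlince", "entity": "O"},
    {"word": "ĠSultan", "entity": "B-PER"},
    {"word": "ĠSelim", "entity": "I-PER"},
]
people = merge_person_tokens(tokens)
```

Sentences for which this function returns an empty list would then be dropped, which corresponds to the filtering step.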
Step 3: Building the Map
Following the extraction of the locations and associated people, I use folium to build the map. While kepler.gl provides a solid visualization for such data, the web interface truncates the sentence if the context sentence is long. Since viewing the full sentence is important and sentences are often quite long in Ottoman Turkish, I used folium instead of kepler.gl.
I created a CSV table where each row has the location name, latitude and longitude, the name of the co-occurring PER entity, and the sentence of the mention with the APA-style reference.
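A minimal sketch of such a table, using only Python's standard library, is given below. The column names, the shape of the input dictionaries, and the coordinates are assumptions made for illustration; the 0.0 values are placeholders, not gazetteer coordinates. Each location-person pair becomes one row, so a sentence without any person produces no rows.

```python
import csv
import io

FIELDS = ["location", "lat", "lon", "person", "sentence"]

def build_rows(mentions):
    """Expand each mention into one row per co-occurring person.
    Mentions with an empty 'people' list yield no rows."""
    rows = []
    for m in mentions:
        for person in m["people"]:
            rows.append({
                "location": m["location"],
                "lat": m["lat"],
                "lon": m["lon"],
                "person": person,
                "sentence": m["sentence"],
            })
    return rows

def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Placeholder coordinates (0.0, 0.0); not real gazetteer values.
mentions = [
    {
        "location": "Baḥr-i Hind", "lat": 0.0, "lon": 0.0,
        "people": ["Baṭlamyūs"],
        "sentence": "Baṭlamyūs ḳavlince Baḥr-i Hind’de yigirmi biñden "
                    "ziyāde cezīre vardur (Gelibolulu Mustafa Âlî, 2020, 314).",
    },
    # No person found in this (placeholder) sentence, so it is skipped.
    {"location": "Enbār", "lat": 0.0, "lon": 0.0,
     "people": [], "sentence": "(sentence text)"},
]
rows = build_rows(mentions)
csv_text = to_csv(rows)
```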
When a user clicks on a location, in addition to the name of the location, they can see the people who were mentioned in the same sentence as the location in Künhü’l-Ahbar. These people are historians such as Ebū Ḥāmid-i Endülüsī for Baḥr-i Rūm, Süleymān bin Behsā for Enbār, and ʿAzīzī for Meyyāfāriḳīn, religious figures like Ḥażret-i Süleymān bin Dāvūd for Ṭuleyṭule, and political figures like Sulṭān Süleymān Ḫān bin Selīm for Ṭrablūs-Ġarb and Anṭūnyūs for Ḳayṣeriyye. This approach transforms the map from a simple geographic plot into a network.

Figure 1. The co-occurrence map.
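The popup behavior described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the code behind the published map: the row format (dictionaries with location, lat, lon, person, and sentence keys), the helper names, the output file name, and the initial map center are all hypothetical. The grouping and HTML helpers use only the standard library; folium is imported inside build_map so the helpers can be used without it.

```python
import html
from collections import defaultdict

def group_by_location(rows):
    """Collect (person, sentence) pairs under each (name, lat, lon)."""
    grouped = defaultdict(list)
    for r in rows:
        key = (r["location"], r["lat"], r["lon"])
        grouped[key].append((r["person"], r["sentence"]))
    return dict(grouped)

def popup_html(name, entries):
    """Build popup HTML: location name, then each person with the
    full, untruncated context sentence."""
    items = "".join(
        f"<li><b>{html.escape(p)}</b>: {html.escape(s)}</li>"
        for p, s in entries
    )
    return f"<b>{html.escape(name)}</b><ul>{items}</ul>"

def build_map(rows, out_path="cooccurrence_map.html"):
    """Place one marker per location and save the map as HTML."""
    import folium  # imported lazily; needed only when drawing the map

    # Arbitrary starting view; adjust to the data at hand.
    fmap = folium.Map(location=[39.0, 35.0], zoom_start=4)
    for (name, lat, lon), entries in group_by_location(rows).items():
        folium.Marker(
            location=[lat, lon],
            tooltip=name,
            popup=folium.Popup(popup_html(name, entries), max_width=400),
        ).add_to(fmap)
    fmap.save(out_path)
    return fmap

# Placeholder coordinates (0.0, 0.0); not real gazetteer values.
sample_rows = [{
    "location": "Baḥr-i Hind", "lat": 0.0, "lon": 0.0,
    "person": "Baṭlamyūs",
    "sentence": "Baṭlamyūs ḳavlince Baḥr-i Hind’de yigirmi biñden "
                "ziyāde cezīre vardur",
}]
grouped = group_by_location(sample_rows)
popup = popup_html("Baḥr-i Hind", grouped[("Baḥr-i Hind", 0.0, 0.0)])
```

Calling build_map(sample_rows) would write the HTML file. Grouping by location first keeps one marker per place even when several people co-occur with it.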
The final interface enables users to view the person entities that might be of interest to historians. For instance, one mention says Baṭlamyūs ḳavlince Baḥr-i Hind’de yigirmi biñden ziyāde cezīre vardur [According to Ptolemy, there are more than twenty thousand islands in the Indian Ocean] (Gelibolulu Mustafa Âlî, 2020, 314). By utilizing larger datasets and entities labeled as miscellaneous, which may include the titles of works, one can analyze the chain of knowledge transmission in Ottoman Turkish works through such references.
Conclusion
Leveraging such an approach offers a scalable, reproducible way to analyze large corpora of historical texts. By automatically extracting co-occurring entities, this work moves beyond basic mapping to visualize the human-place relations in the Ottoman Turkish work Künhü’l-Ahbar. The same pipeline can be utilized for different domains, works, and authors in Ottoman Turkish. However, the output benefits from manual review: eliminating false positives during postprocessing can significantly reduce the noise. The code, map, and final data are publicly shared here. The code can be executed by updating the paths.
References
M. Erdem Kabadayı, Grigor Boykov, Akın Sefer, and Piet Gerrits, Kabadayi_Boykov_Sefer_Gerrits_Ottoman_NFS_Gazetteer_23112022_16296_populated_places_version_1, dataset (Zenodo, November 23, 2022), https://doi.org/10.5281/zenodo.7351936.
Gelibolulu Mustafa Âlî, Künhü’l-Ahbâr, 1. Rükün, ed. Suat Donuk (İstanbul: Türkiye Yazma Eserler Kurumu Başkanlığı Yayınları, 2020).
OttGaz: Ottoman World Gazetteer, accessed July 26, 2025, https://ottgaz.org/wiki/Main_Page.
Roger S. Bagnall et al., eds., Pleiades: A Gazetteer of Past Places, accessed July 26, 2025, https://pleiades.stoa.org/.
