Linked Data: An Intensive Introduction by Terhi Nurmikko-Fuller

Three seems to be a magical number. Part of its convenience is captured in the way Netflix describes a film for its audience. If one is too tired to read the two- or three-line film description, the three keywords (“mind bending”, “thriller”, “quirky”, and so on) constitute a small network in themselves that helps one decide whether to press “play” or not. Another popular use of three can be seen in applications such as what3words, which let someone describe or navigate to a location in just three words.

Within digital humanities projects, and digital projects in general, thinking in terms of three finds a structuring principle in the linked data publishing model. In her book Linked Data for Digital Humanities, Terhi Nurmikko-Fuller provides an orientation to linked data and its relevance, enabling digital archivists and digital humanists to pay closer attention to how they organize information and knowledge.

As a paradigm of publishing on the Web, linked data is a way of organizing the metadata behind various forms of content – text, images, video, audio – in a manner that makes them discoverable. The Semantic Web, which is a way of referring to the Web as a machine-readable network of information, uses descriptive frameworks such as linked data to help users arrive at the information they are looking for. Those working in the GLAM sector – beginners as well as advanced practitioners – might be unaware of newer publication paradigms on the Semantic Web, or might want to update their metadata as publication models change. They are very likely to find the book an invaluable starting point, prompting reflection on how well their current strategies make their projects machine-friendly and, in turn, audience-friendly. Digital humanists in general should read it for an understanding of the nuances of the digital.

These nuances, digital humanists might be surprised to learn, resonate with literary and philosophical approaches to examining the ways of the world: the Austrian philosopher Ludwig Wittgenstein is known to have quoted fellow writer Ferdinand Kürnberger as saying that everything one knows can be captured in three words.

Another example that helps one understand the origins of linked data in a pre-digital conception of the world comes from English grammar. The Subject-Verb-Object structure – Dickens wrote Great Expectations – presents a way of connecting the author and the text. The Subject and the Object are “data entities” connected by the verb (“wrote”), which defines the relationship between the two.

The linked data approach is thus conceived of in terms of the Resource Description Framework triple (or RDF triple), which captures a relationship between elements in a project. These triples are visualized as “minute networks”, diagrams of circles and lines, and are expressed through Hypertext Transfer Protocol (HTTP) URIs (Uniform Resource Identifiers). These URIs are identifiers rather than ordinary links: they need not lead to a web page and may return a 404 code if clicked. They exist to represent data and to mediate the steps involved in accessing different pieces of information within and outside a discourse.
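As a minimal sketch of the idea, the sentence “Dickens wrote Great Expectations” can be written as one triple of URIs. The example.org identifiers and the `Triple` and `to_ntriples` names below are placeholders invented for illustration, not identifiers from the book or from any real dataset, and the N-Triples line is assembled by hand rather than with an RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Write a triple of URIs as a single N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# Hypothetical URIs, assumed for illustration only.
BASE = "http://example.org/"
statement = Triple(
    subject=BASE + "person/CharlesDickens",   # the Subject: Dickens
    predicate=BASE + "vocab/wrote",           # the Verb: wrote
    obj=BASE + "work/GreatExpectations",      # the Object: Great Expectations
)

print(to_ntriples(statement))
```

Each of the three parts is an identifier in its own right, which is what allows other datasets to refer to the same author or the same work.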

For these triples to work efficiently, they need to be absolute and unambiguous in the way they frame entities and the relationships among them. This is where an understanding of ontologies – descriptions of how entities are related to each other – becomes helpful, especially in determining what kinds of relationships are likely to make sense to machines or software, allowing them to make meaning out of content and its relationship with other pieces of content.
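One way to picture what an ontology contributes is a toy check of which kinds of entity a relationship may connect. The sketch below is a deliberate simplification: the class names, the “wrote” property, and the `conforms` helper are all hypothetical, and real ontology languages typically use such domain and range declarations for inference rather than for the strict validation shown here.

```python
# Hypothetical mini-ontology, invented for illustration only.
# Each entity is assigned a class.
ENTITY_TYPES = {
    "CharlesDickens": "Person",
    "GreatExpectations": "Work",
}

# Each property names the class expected of its subject (domain)
# and the class expected of its object (range).
PROPERTIES = {
    "wrote": {"domain": "Person", "range": "Work"},
}


def conforms(subject: str, predicate: str, obj: str) -> bool:
    """Return True if the statement fits the toy ontology above."""
    rule = PROPERTIES.get(predicate)
    if rule is None:
        return False  # unknown relationship
    return (
        ENTITY_TYPES.get(subject) == rule["domain"]
        and ENTITY_TYPES.get(obj) == rule["range"]
    )


print(conforms("CharlesDickens", "wrote", "GreatExpectations"))  # True
print(conforms("GreatExpectations", "wrote", "CharlesDickens"))  # False
```

The second statement is rejected because a work cannot be the author of a person under these assumed rules, which is the kind of distinction software cannot draw without an explicit description of the relationship.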

In her work, Terhi Nurmikko-Fuller includes quite a few case studies that make use of linked data, spanning a wide range of topics, places, and periods. These examples help demonstrate the principles and applications of linked data, and are likely to inspire digital humanists to build archives and other live projects along similar lines.

For example, PARADISEC (Pacific and Regional Archive for Digital Sources in Endangered Cultures) hosts more than 200 terabytes of audio and video data in more than a thousand languages from the Pacific region.

ElEPHãT (Early English Print in HãthiTrust), another case study, is also built on the linked data paradigm. It makes data and collections from different sources available in a coherent way, easing search and research for those interested in the period and its print culture.
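The general mechanism behind bringing different sources together can be sketched as follows: when two collections describe the same thing under the same URI, their statements can simply be pooled. This is an illustration of the principle only, not a description of how ElEPHãT or PARADISEC is implemented; the two sources, the URIs, and the `describe` helper are all hypothetical.

```python
# Hypothetical shared identifier for one work, assumed for illustration.
WORK = "http://example.org/work/GreatExpectations"
VOCAB = "http://example.org/vocab/"

# Two imaginary collections, each holding (subject, predicate, object) triples.
source_a = {
    (WORK, VOCAB + "title", "Great Expectations"),
    (WORK, VOCAB + "author", "http://example.org/person/CharlesDickens"),
}
source_b = {
    (WORK, VOCAB + "title", "Great Expectations"),
    (WORK, VOCAB + "heldBy", "http://example.org/library/ExampleLibrary"),
}

# Because both sources use the same URI for the work, a set union merges
# their statements and drops the duplicated title automatically.
merged = source_a | source_b


def describe(graph, subject):
    """List every (predicate, object) pair recorded about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, value in describe(merged, WORK):
    print(predicate, value)
```

A single lookup on the merged set now returns what both collections know about the work, which is the sense in which linked data makes separate sources searchable as one.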

Detractors of linked data claim that linking everything can have disastrous consequences for privacy, or that nothing can ever be described without falling into the traps of stereotyping individuals and groups. In addressing these concerns, Nurmikko-Fuller puts forth strong suggestions:

  1. She observes that the solution to these issues is not less information, or broken information, or hidden information. The solution lies in taking greater ownership of the processes of data creation and data consumption by data producers and data consumers. 
  2. Moreover, for those who support openness as an ideal on the Web, linked data that is connected with other people’s data contributes to linked open data, and even to Five Star Linked Data, in keeping with the principles of FAIR (Findable, Accessible, Interoperable, and Reusable) and CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics).
  3. Even if some items or entities need to remain private or closed, such as materials protected by copyright, linked data can still help researchers and users, because the metadata describing those items is not, or should not be, copyrighted.
  4. It is imperative for ontologies to be conceptualized in as diverse ways as possible. If left to Big Tech, the taxonomy involved in the organization of information will, mostly unwittingly, be governed by Western, WASP (White Anglo-Saxon Protestant) models of looking at the world. Projects and perspectives from the Global South may have different ways of linking ideas and concepts, all of which need to be made visible through the invention of novel linked data paradigms.

As the above discussion might reveal, Nurmikko-Fuller’s book provides an in-depth and compelling introduction to linked data and its relevance to the study of digital technology. If one were to reduce it to a single takeaway, it would be that thinking in terms of linked data is a natural extension of how the human mind operates. Just as humans make sense of things through association, publishing information on the Semantic Web via linked data makes navigating websites and applications an intuitive experience for users. As “the ultimate method for identifying and uniquely labelling everything”, in the words of Nurmikko-Fuller, linked data is a reliable way to document information.

How well, or how soon, this approach to publishing data is widely adopted will likely become clearer as more digital humanists and archivists who work with linked data publish their experiments and processes. While the book focuses only on linked data, it also opens up conversations with others who are discussing points of comparison and contrast between linked data and other models.

References

Terhi Nurmikko-Fuller, Linked Data for Digital Humanities, 1st Edition (London: Routledge, 2023) 

____

Cover image by Thomas Shafee. Wikidata in the Linked Open Data Cloud. Databases indicated as circles (with wikidata labelled as ‘WD’), with grey lines linking databases in the network if their data is aligned. (Data from https://lod-cloud.net/datasets)
