Your first computed visualization, or, how to use Python to understand the reception of Bukhari and Muslim Hadith collections

There comes a time when using Word or PowerPoint or Excel is not going to be enough to create a visualization. There is simply too much to process, or the envisioned result is too complex to do by hand. Then what? It’s time to learn a few more tricks. This is best done on something that is only slightly more complex than what you are already used to. For example, going from drawing a master-student network diagram by hand to a project involving thousands and thousands of such relations, in which you want to compute the average distance, is too big a leap to take all at once. Maybe you will eventually learn the technical calculations needed, but even then it is doubtful that you will understand how to apply them and make fruitful use of them. Strategically using software and exploiting regularities in your data is perhaps the most important skill to learn gradually. So here is an example of a small step.

The Example

The problem we are faced with is to say something about two popular works that are very much alike. In the 9th c. CE a person we refer to as al-Bukhari and a person we refer to as Muslim both compiled a collection of Hadith: stories about Muhammad. There were many other compilations, but these two quickly became the most popular. They were deemed highly reliable, in Arabic ṣaḥīḥ, and so we know them as Sahih al-Bukhari and Sahih Muslim. But how popular were they, and how does their popularity compare?

First idea: We can turn this practical question into a research question by observing that commentary writing is a sign of popularity. So a parallel question is: which commentaries were written on these collections throughout history?

Second idea: Collecting all commentaries is way too much work. But maybe counting unique ones is enough as a rough measure. Probably someone else already went to the trouble of charting all commentaries. After some digging around, I found the perfect book that did this: Jāmiʿ al-shurūḥ wa-l-ḥawāshī by Abdallah Muhammad al-Hibshi.

Third idea: 30+ pages of name after name is too much to describe or visualize by hand. What if we only focus on the death dates of the authors? Then the data entry becomes doable by hand.

Fourth idea: if we had those death dates, what could we do with them? One thing we could do is group them by century and draw a simple bar chart for every century from the 9th to the 20th. But I have the impression the data we have (death dates of 600+ commentators) is more granular than simply chunking it by century would do justice to. What if we did some sort of heat map? This step in my thinking required previous experience: I had to know what kinds of visualizations are possible and how each situation invites only certain visualizations.

We can create a heat map showing interest over time in the Sahih collections of al-Bukhari and Muslim by collecting death dates of commentators.

The Central Idea

This is the central idea: to use death dates of commentators to find the parts of history when the Sahih collections were more or less popular. Why is this a good idea?

  1. There are packages within Python, such as Seaborn, that will almost instantaneously generate these heat maps if we feed them the right information. This makes it drastically less complicated than you may have previously assumed. All we need is a simple list of those dates and we’re set (not entirely, but more on that in a minute; see also the small sketch right after this list).
  2. The amount of manual labor is doable: browse through 30+ pages and type out any death date you encounter.
  3. We want to see the big picture, neither too general nor too detailed. Those death dates are a good middle ground between looking at the actual textual transmission and the general perception of the literary canon.
  4. We are covering 1200 years and need an even measure throughout the period. Those death dates are a fairly good bet for this. As an example: we could also count the number of manuscript copies of Sahih al-Bukhari and Sahih Muslim, but this would certainly be severely skewed towards certain centuries, which have a higher survival rate, and certain regions, which have catalogued their manuscripts more comprehensively. It might work if we were only looking at a specific period.
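
To give a sense of point 1: the sketch below (with made-up random numbers, not our commentator data) is roughly all it takes to get a heat map out of Seaborn.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

fake_counts = np.random.rand(2, 50)   # two rows of invented yearly counts
sns.heatmap(fake_counts)              # Seaborn turns the grid of numbers into a colored heat map
plt.show()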

This, then, will give us a fair view for a reasonable amount of work, using technology in a straightforward manner.

You may be taken aback a bit by the prospect of having to use Python, a real programming language, but you will soon see that we won’t do that much with it. Besides, starting out with Python will be just as daunting now as it will be tomorrow, and the benefits of getting the hang of it are plentiful. So why not start now?

Working Backwards

Now that we have a general idea of what we want to do, I find it useful to start at the end and walk backwards. As I indicated before, Seaborn is an easy choice that gives very nice-looking graphics right out of the box. So what does Seaborn need? A list of death dates? Not exactly. We could do that, but that would make us move in the direction of a scatterplot. We initially talked about a heat map, one that looks like this:

This is a heat map of not just the commentators on al-Bukhari and Muslim, but on four other Hadith collections as well, in total more than 600 commentators. From left to right we have time, indicated in the Islamic calendar (200 AH corresponds very roughly to the year 800 CE, 1400 AH very roughly to 2000 CE). The darker the purple, the more commentators were alive at the same time. We can see clearly that there are two major concentrations in history. This coheres with the narrative presented by Joel Blecher, a scholar of the history of Hadith collections.

To give Seaborn what it works best with, we need to pivot our data: instead of a list of death dates, we want the number of commentators per year.

Fifth idea: we can imagine 1200 buckets, one for each year between 200 AH and 1400 AH (okay, make it 1201 buckets to include both of those years as well). Imagine yourself standing in front of those buckets with a very large number of marbles in your hands. We go down the line of buckets and if we think a commentator lived in that year, we put a marble in its bucket. Of course we don’t know exactly when these commentators lived, but we do know when they died. So let us assume a 40-year floruit before they died. Then instead of going down the line of buckets, we can treat the commentators one by one. We take a death date, find the bucket corresponding to it, place a marble in it, and do the same for the 39 buckets to the left of it. Doing so for all death dates will give us 1201 buckets, some holding only a few marbles and others very many.
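
A rough sketch of that bucket-and-marble idea in Python, with a handful of invented death dates (the real script comes later), could look like this:

buckets = [0] * 1201                  # one bucket per year, 200 AH up to and including 1400 AH
death_dates = [676, 855, 911]         # invented examples, not data from al-Hibshi
floruit = 40                          # assumed number of active years, ending at death

for death in death_dates:
    # a marble in the death-year bucket and in the 39 buckets to its left
    for year in range(death - floruit + 1, death + 1):
        buckets[year - 200] += 1      # bucket 0 stands for the year 200 AH

print(buckets[650 - 200])             # how many of these invented commentators were alive in 650 AH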

Sixth idea: Technologically, we are thinking of a simple Excel sheet with the year in the first column and the number of commentators in the second, twelve hundred rows strong. In fact, what we are describing here is served just as well by a plain .csv file.
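
Just to make that concrete, the first few rows of such a file could look like this (the counts here are invented):

year,commentators
200,0
201,0
202,1
203,1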

So from the graphic that Seaborn will make, we walked back to a .csv file. We have basically already described the process of getting from death dates to the number of commentators alive per year, and this we will do with a few lines of Python that we will write ourselves.

Python can best handle these death dates as a list, because then we can simply loop over it. This list can be loaded in, so we are free to create it wherever we want. What I mean is that this list needs to be entered manually, by looking at al-Hibshi’s book and typing out all the death dates. If you are most comfortable with it, you can use Excel. Or just a simple text editor. It doesn’t matter. It’s probably best to save it as a .txt or .csv file.
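
For example, a file named Bukhari.txt (the name the script below expects for the commentaries on Sahih al-Bukhari) would simply contain one death year in AH per line, something like this (these three years are just placeholders):

683
741
786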

So now we have this workflow in mind:

  • Read al-Hibshi’s book

    You will find the relevant passages from the book

  • Manually type out all death dates

    You will get a .txt file with on every new line a year of death

  • Write a Python script to convert the death dates to number of commentators alive per year, for 200AH-1400AH

    You will get a .csv file with 1200 lines, each representing a year, with the number of commentators presumed to be alive

  • Plug this .csv-file into Seaborn using Python, and customize the looks here and there

    You will get .svg and .jpg graphics showing where concentrations of commentators are over the centuries

Putting it into Practice

I hope that you will manage the first two stages without problems. Find your source, type out what you need. But you are probably unsure how to do steps three and four. If you are completely new to Python I have some bad news for you: you will be adding a few more hours to this project to familiarize yourself with Python and its ecosystem. What I would recommend is getting a very general sense of what Python is, what pip is (it helps you install other people’s code so you can immediately use it and don’t have to reinvent the wheel each time), and what Jupyter Notebooks are (it helps you code line by line and get visual feedback, which is especially great for data exploration). There are a myriad of resources available such as ProgrammingHistorian’s introductions to Python, pip, and Jupyter, or take the RealPython introduction to Python and pip and its intro to Jupyter, or give any of these terms a search on YouTube or the wider web. You may need some help setting things up for the first time; it can be a hurdle.
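
As a pointer in the right direction: once Python and pip are installed, the packages used in the rest of this post can usually be installed from a terminal with a single command (the details can differ per system):

pip install pandas matplotlib seaborn jupyter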

Once you have Python installed, and perhaps a more experienced person around to call for help, here is the code I used for step three:

# Setting things up
collections = ['Tirmidhi', 'Nasai', 'Muslim', 'Ibn Maja', 'Bukhari', 'Abu Dawud']

numbersInCollections = []

for i in range(0,len(collections)):
    numbersInCollections.append([0] * 1201)

def FromDeathToLife(collection, lifespan):
    numberOfCommentatorsPerYear = [0] * 1201
    with open(collection+'.txt', 'r') as deathDates:
        for date in deathDates:
            death = int(date)
            birth = int(date)-lifespan
            for year in range(birth,death):
                numberOfCommentatorsPerYear[year-200] += 1
    return numberOfCommentatorsPerYear

# Loading into list of lists
for collection in collections:
    numbersInCollections[collections.index(collection)] = FromDeathToLife(collection, 40)

# Writing to disk for individual Hadith collection
for collection in collections:
    y = 200
    with open('lifespan40commentarieson' + collection + '.csv', 'w') as output:
            output.write("year,commentators,matn")
            for number in numbersInCollections[collections.index(collection)]:
                output.write("\n" + str(y) +"," + str(number)+"," + collection)
                y += 1

# Total commentators six collections
allCommentatorsPerYear = [0] * 1201
for i in range(0, 1201):
    for x in range(len(collections)):
        allCommentatorsPerYear[i] += numbersInCollections[x][i]

with open('lifespan40All.csv', 'w') as output:
    y = 200
    output.write("year,commentators,all")
    for number in allCommentatorsPerYear:
        output.write("\n" + str(y) + "," + str(number) + ",all")
        y += 1

# Sahihayn
with open('lifespan40Sahihayn.csv', 'w') as output:
    y = 200
    output.write("year,commentators,matn")
    for number in range(0,1201):
        Bukhari = numbersInCollections[4][number]
        Muslim = numbersInCollections[2][number]
        output.write("\n" + str(y) +"," + str(Bukhari)+",Bukhari")
        output.write("\n" + str(y) + "," + str(Muslim) + ",Muslim")
        y += 1

A few things to note: you are seeing the final code all at once, so it is almost by necessity overwhelming. You are also seeing the code by itself, and not in a Jupyter Notebook. I am going to leave getting the code to run to you as an exercise. I know it can be a tall order, but I am constrained by space and the medium of a written article doesn’t help either. Here is something you might not have realized: professional programmers constantly search the internet for how to do the next step they want to take. Start small, keep searching, keep trying, and hack your way towards a satisfactory result.

In summary form, here is what the code does:

Lines 1-7: we define which books we study and what time range we will use.

Lines 8-17: we define a function that opens a .txt file with death dates, takes each death date, calculates the earliest floruit date, and then puts a marble in every bucket representing a year of that commentator’s life.

Lines 18-21: here we actually let the computer do the calculations for all six collections, by calling the function for each name and giving it a floruit of 40 years. The computer now has our result in memory, separated by collection.

Lines 22-30: for each collection, make a new file on the computer with the extension .csv, and first write a line that simply says “year,commentators,matn”, which will function as a header for each column. Then go through all the years and write a new line for each, including the actual year, the number of commentators, and the source on which they commented. It may seem counter-intuitive to see “with open” when we actually want to write this information into a new file, but writing it this way makes sure that once we leave the with-block, the file is automatically closed for us, so it is a very compact way of writing the code.
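
To illustrate that last point with a tiny, made-up example: the two snippets below do the same thing, but the first closes the file for us automatically the moment we leave the indented block.

# with the with-statement: no need to close the file ourselves
with open('example.csv', 'w') as output:
    output.write("year,commentators,matn")

# without it, we would have to remember to close it
output = open('example.csv', 'w')
output.write("year,commentators,matn")
output.close()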

The remaining lines basically repeat the logic, but combine several collections.

Here is the code for the next and final step:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('lifespan40Sahihayn.csv')

heatmapData = pd.pivot_table(df, values='commentators', index=['matn'], columns='year')
plt.figure(figsize = (20, 5))

plot = sns.heatmap(heatmapData, cmap='RdPu', xticklabels=100, yticklabels=False)

plt.xlabel("Year in AH")
plt.ylabel("Sahih Muslim                  Sahih Bukhari", verticalalignment='center')
plt.title("When did they write commentaries on the Sahihayn?")

fig = plot.get_figure()
fig.savefig('HeatMapCommentatorsSahihayn40Floruit.svg')

Yup, it’s that simple: we import a few libraries, we read the .csv file we just created into memory (notice I used ‘df’ for that; it is a convention in data science, short for ‘data frame’), we tell Pandas what the different columns are for, we tell Matplotlib we need a sizable canvas, we tell Seaborn to paint a heat map on that canvas out of the data frame, we give the canvas a few more things like labels, and we save it to disk. Done!
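
If the pivot step feels abstract, here is a toy version with a few invented rows, showing what pd.pivot_table does with the long table we wrote to disk:

import pandas as pd

# a tiny long-format table, like a few rows of our .csv (numbers invented)
df = pd.DataFrame({
    'year':         [200, 200, 201, 201],
    'commentators': [0, 1, 2, 1],
    'matn':         ['Bukhari', 'Muslim', 'Bukhari', 'Muslim'],
})

# pivot: one row per matn, one column per year, exactly the grid the heat map colors in
wide = pd.pivot_table(df, values='commentators', index=['matn'], columns='year')
print(wide)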

Wait.

No, we’re not done just yet.

Now we need to interpret our results!

When we compare Sahih al-Bukhari and Sahih Muslim, an interesting lesson emerges from the data, something that would definitely have eluded me had I simply browsed their commentary listings in al-Hibshi’s catalog. As it turns out, Sahih Muslim (the bottom one) was decidedly more popular early on. Then, around the year 800 AH (1400 CE), this abruptly changes in favor of Sahih al-Bukhari and from then on never changes back. Sahih Muslim did keep receiving some commentaries, but they are far outnumbered by those on Sahih al-Bukhari. The concentrations in the 9th century and the 13th century are undeniable too.

These few insights that the data gives us merit closer inspection. From this ‘distant reading’ perspective, we now ought to return to a ‘close reading’ method, picking up books from specific periods in search of an explanation. And perhaps the things we find along the way will invite us to make another data set and perform some more computations.
