Never Trust Al-Shamila Search Results☝️

In this post, our Arabo-Islamic Studies editor Maxim digs deeper into searching full-text Arabic corpora by comparing Shamela with Elkirtasse. He shows that we should not blindly trust a repository's search function: it may return incomplete results, and we may miss important sources as a result.

Ammā baʿd… (And now, to the matter at hand:)

This is a follow-up to my previous post here. I received many questions and letters from colleagues and others (thank you all for your kind words) about al-Shamila's search results.

In short, al-Shamila's search results are simply 💩. NEVER TRUST THEM! It is easier to show than to tell.

Let's pick a random book, say al-Bidāya wa al-nihāya by Ibn Kathīr. Now let's search for the word "khamr" (wine), for no particular reason, on https://al-maktaba.org and see what we get:

[Screenshot: al-maktaba.org search results for "khamr" in al-Bidāya wa al-nihāya]

 

OK, 28 results. Given the title of this post, you already suspect that this is wrong. The results look even stranger if we search the web page itself:

[Screenshot: searching the al-maktaba.org results page itself]


If we search for it on http://islamport.com, a Shamela clone, we get this:

[Screenshot: the same search on islamport.com]

Islamport suggests there are only 23 hits for the word 'wine'. How can we verify whether it is 28, 23, or something else? A Google advanced search does not work on a single book of al-Shamila, although it is pretty fast and good for the website as a whole (if you have found a workaround, please share it in the comments 👇):

[Animation: attempting a Google advanced search on a single al-Shamila book]


The same applies to http://islamport.com:

[Animation: attempting the same Google advanced search on islamport.com]

As for http://shamela.ws, it just confuses me! Honestly, what is going on there? All I get in my native Safari browser is this:

[Animation: searching on shamela.ws in Safari]

Chrome for Mac gives the same result. It's just 🦀:

[Animation: the same search on shamela.ws in Chrome for Mac]


Groooupy, another host for the Shamela corpus, is really just a joke:

[Animation: searching on Groooupy]


OK, you will say: "How about the Shamila application on Windows?" Let's see:

[Screenshot: the same search in the Shamila application on Windows]

Does this confirm that there are 28 results? You might say: "Max, use al-Baḥith, a dedicated search application that indexes the books of the Shamila application (on your Windows machine) and makes searching better and quicker." Yallah (come on), let's see:

[Animation: the same search in al-Baḥith]


Now we might feel confident that there are indeed only 28 instances of the word 'wine' in Ibn Kathīr's al-Bidāya wa al-nihāya. Let us see the real results using the Elkirtasse application. Again, this is why I have been telling everybody to use it:

[Screenshot: the same search in Elkirtasse]

 

223 hits. That is 195 more hits than Shamela could offer us!

Perhaps the most reliable (but also the most labor-intensive) way of searching is to convert the book to plain text and search it in a simple text editor. Let us double-check with the same book, which I downloaded from al-Shamila and converted to a .txt file:

[Animation: searching the .txt version of al-Bidāya wa al-nihāya in a text editor]

Conclusion: ditto, 223 hits. In other words, while the text corpus of al-Maktaba al-Shamela is unrivaled, its search engine is not to be trusted.
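If you want to repeat the plain-text check without a text editor, the count can also be scripted. What follows is a minimal Python sketch, not the procedure used above. It assumes the book has been exported to a UTF-8 .txt file (the filename `bidaya.txt` is hypothetical), strips Arabic diacritics and tatweel so that vocalized and unvocalized spellings match, and then counts raw substring occurrences of خمر (khamr). Like a plain find in a text editor, a substring count also matches forms such as الخمر, والخمر and خمرة, so the hits still need to be read one by one.

```python
import re

# Arabic tanwin, short vowels, shadda and sukun (U+064B to U+0652),
# superscript alef (U+0670) and tatweel/kashida (U+0640).
# Removing them lets vocalized and unvocalized spellings match.
ARABIC_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")


def normalize(text: str) -> str:
    """Strip diacritics and tatweel from Arabic text."""
    return ARABIC_MARKS.sub("", text)


def count_hits(text: str, term: str) -> int:
    """Count non-overlapping substring occurrences of term after normalization."""
    return normalize(text).count(normalize(term))


def count_hits_in_file(path: str, term: str) -> int:
    """Read a UTF-8 plain-text export of a book and count occurrences of term."""
    with open(path, encoding="utf-8") as f:
        return count_hits(f.read(), term)


if __name__ == "__main__":
    sample = "شرب الخَمْر حرام، والخمرُ أم الخبائث"
    print(count_hits(sample, "خمر"))  # prints 2
    # For a whole book (hypothetical filename):
    # print(count_hits_in_file("bidaya.txt", "خمر"))
```

Counting normalized substrings is deliberately crude: it errs on the side of over-counting, which is the safer error when the goal is not to miss sources.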

I look forward to your comments below 👇. You can also message me on Twitter and Facebook. Wa-l-salām (peace).
