In this post our Arabo-Islamic Studies editor Maxim digs deeper into searching full-text Arabic corpora, comparing Shamela with Elkirtasse. He shows that we should not blindly trust a repository’s search function: an incomplete search can make you miss important sources.
This is a follow-up to my previous post. I received a lot of questions and letters from colleagues and others (thank you all for your kind words) about the search results of al-Shamila.
In short, search results in al-Shamila are simply 💩. NEVER TRUST THEM! It is easier to show you than to tell you.
Let’s pick a random book, say al-Bidāya wa al-nihāya by Ibn Kathīr. Now let’s search for the word “khamr” (wine), for no particular reason, on https://al-maktaba.org and see what we get:
OK, 28 results. Given the title of this post, you already suspect that this is wrong. The results look even stranger if we search the web page itself:
If we look for it on http://islamport.com, a Shamela clone, we get this:
Islamport suggests there are only 23 hits for the word ‘wine’. How can we verify whether it is 28, 23, or something else? Google’s advanced search does not work on a single book of al-Shamila, although it is pretty fast and good with the whole website (if you have found a workaround, please share it in the comments 👇):
The same applies to http://islamport.com:
As for the web page http://shamela.ws, it just confuses me! Honestly, what is going on there? All I get in Safari, my default browser, is this:
And on Chrome for Mac it is the same. It’s just 🦀:
Groooupy, another host for the Shamela corpus, is just a joke really:
OK, you will say: “How about the Shamila application on Windows?” Let’s see:
Does this confirm there are 28 results? You might say: “Max, use al-Baḥith, a separate, dedicated search application that indexes the books of the Shamila application (on your Windows machine) and makes the search better and quicker.” Yallah, let’s see:
We might now feel confident that there are indeed only 28 instances of the word ‘wine’ in Ibn Kathīr’s al-Bidāya wa al-nihāya. Now let us see the real results using the Elkirtasse application. Again, this is why I keep telling everybody to use it:
223 hits. Almost 200 more hits than Shamela could offer us!
Perhaps the most reliable (but also the most labor-intensive) way of searching is to convert the book to plain text and use a simple text editor. Let us double-check with the same book, which I downloaded from al-Shamila and converted to a .txt file:
Conclusion: ditto. 223 hits. In other words, while the text corpus of al-Maktaba al-Shamela is unrivaled, its search engine is not to be trusted.
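If you would rather script the plain-text check than click through a text editor, here is a minimal Python sketch. It is not how any of the tools above work internally, just one way to count hits yourself. The function names are my own, and it assumes the exported .txt file is UTF-8 (if your export uses another encoding, such as Windows-1256, pass it explicitly). It counts plain substring matches, the way an editor’s Find does, after stripping short-vowel marks and tatweel so that vocalised and unvocalised spellings are treated alike.

```python
import re
from pathlib import Path

# Tashkeel (U+064B..U+0652), superscript alef (U+0670) and tatweel (U+0640).
_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")


def strip_marks(text: str) -> str:
    """Remove diacritics and tatweel so vocalised and bare spellings match."""
    return _MARKS.sub("", text)


def count_hits(text: str, term: str) -> int:
    """Count non-overlapping substring matches of term in text."""
    needle = strip_marks(term)
    if not needle:
        raise ValueError("search term is empty")
    return strip_marks(text).count(needle)


def count_hits_in_file(path: str, term: str, encoding: str = "utf-8") -> int:
    """Read a plain-text export of a book and count matches of term."""
    return count_hits(Path(path).read_text(encoding=encoding), term)
```

Calling `count_hits_in_file("bidaya.txt", "خمر")` (the file name is only a placeholder) returns the number of matches. Because this is substring matching, prefixed forms such as الخمر are counted too, and so is any other word that happens to contain the same three letters. Keep that in mind when comparing the number with what a search engine reports.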