In this post our Arabo-Islamic Studies editor Maxim digs deeper into searching full-text Arabic corpora, comparing Shamela with Elkirtasse. He shows that we should not blindly trust a repository's search function: if its search capacities are incomplete, you will miss important sources.
…أمّا بعد
This is a follow-up to my previous post here. I received a lot of questions and letters from colleagues and others (thank you all for your kind words) about the search results of al-Shamila.
In short, search results in al-Shamila are simply 💩. DO NOT TRUST THEM! It is easier to show you than to tell you.
Let’s pick a random book, say al-Bidāya wa al-nihāya by Ibn Kathīr. Now let’s search for the word “khamr” (wine), for no particular reason, on https://al-maktaba.org and see what we get:
OK, 28 results. You already suspect that this is wrong because of the title of this blog post. The search results look even stranger if we search the web page itself:
If we look for it on http://islamport.com, a Shamela clone, we get this:
Islamport suggests there are only 23 hits for the word ‘wine’. How can we verify whether it is 28, 23, or something else? Searching in Google with advanced options does not work on a single book of al-Shamila, although it’s pretty fast and good with the whole website (if you have found a workaround, please share it in the comments 👇):
The same applies to http://islamport.com :
As for the web page http://shamela.ws, it just confuses me! Honestly, what is going on there? All I get in my native Safari browser is this:
On Chrome for Mac it is the same. It’s just 🦀:
Groooupy, another host for the Shamela corpus, is just a joke really:
OK, you will say: “How about the Shamila application on Windows?” Let’s see:
Does this confirm there are 28 results? You might say: “Max, use al-Baḥith, a separate, dedicated search application that indexes the books of the Shamila application (on your Windows machine) and makes searching better and quicker.” Yallah, let’s see:
By now we might feel confident that there are indeed only 28 instances of the word ‘wine’ in Ibn Kathīr’s al-Bidāya wa al-nihāya. Now let us see the real results by using the Elkirtasse application. Again, this is why I have been telling everybody to use it:
223 hits. Almost 200 more hits than Shamela could offer us!
Perhaps the most reliable (but also most labor-intensive) way of searching is to convert the book to plain text and use a simple text editor. Let us double-check with the same book, which I downloaded from al-Shamila and converted to a .txt file:
Conclusion: ditto. 223 hits. In other words, while the text corpus of al-Maktaba al-Shamela is unrivaled, its search engine is not to be trusted.
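If you prefer a script to a text editor, the plain-text check can be sketched in a few lines of Python. This is only a rough sketch and not how Shamela or Elkirtasse work internally: the function names and the file name `bidaya.txt` are my own placeholders, and it assumes the .txt export is UTF-8. It strips vowel marks and tatweel first, then counts every place the letters appear, just like the "find" box of a text editor.

```python
import re

# Assumption: the marks worth ignoring are the short-vowel signs
# (U+064B to U+0652), the superscript alef (U+0670) and tatweel (U+0640).
_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")


def strip_marks(text: str) -> str:
    """Remove vowel marks and tatweel so خَمْر and خمر compare equal."""
    return _MARKS.sub("", text)


def count_hits(text: str, term: str) -> int:
    """Count non-overlapping occurrences of term anywhere in text,
    including inside longer words (الخمر, وخمرهم, ...)."""
    return strip_marks(text).count(strip_marks(term))


def count_hits_in_file(path: str, term: str) -> int:
    """Same count for a plain-text file; UTF-8 encoding is assumed."""
    with open(path, encoding="utf-8") as handle:
        return count_hits(handle.read(), term)


sample = "شرب الخمر وحرم خمرا والخَمْر"
print(count_hits(sample, "خمر"))
# For a real book (hypothetical file name):
# print(count_hits_in_file("bidaya.txt", "خمر"))
```

Keep in mind that a plain substring count like this can also pick up unrelated words that merely contain the same three letters, so it is worth skimming the hits rather than trusting the number alone.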
Looking forward to your comments here 👇. You can also message me on Twitter and Facebook. والسلام
There is a search feature that allows you to use wildcards. If you just search the application for “خمر”, it will only show you the term “خمر” as it appears on its own, without any proclitics or enclitics. If you look at the search box you will see a م with a checkbox; if you type the term “خمر” and uncheck this box, you will find significantly more results.
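The difference between the two search modes described above can be illustrated with a small Python sketch. It is an illustration under my own assumptions, not the application's actual logic: `surface_forms` is a made-up helper, and "word" here simply means a run of Arabic letters. It lists every written form that contains the term, so you can see how many hits an exact-word search leaves out.

```python
import re
from collections import Counter

# Assumption: vowel marks, superscript alef and tatweel are ignored.
MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")
# Assumption: a word is a run of Arabic letters (U+0621 to U+064A).
WORD = re.compile("[\u0621-\u064A]+")


def surface_forms(text: str, term: str) -> Counter:
    """Count each distinct word that contains term, with or without
    attached prefixes and suffixes."""
    words = WORD.findall(MARKS.sub("", text))
    return Counter(word for word in words if term in word)


sample = "الخمر خمر وخمرهم بالخمر خمر"
forms = surface_forms(sample, "خمر")

exact_only = forms["خمر"]          # the bare word, nothing attached
with_clitics = sum(forms.values())  # every word containing the term

print(exact_only, with_clitics)
for word, hits in forms.most_common():
    print(word, hits)
```

Listing the forms, rather than only counting them, also makes it easy to spot any word that matched by accident.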
Shakarullaha alladhi ittasalani bikum