Scanning a book by hand is time-consuming. However, there are good reasons to keep scanning books. Many books are only available as hard copy, and scanning is one way to obtain a digital copy of them. Even when a book is also available digitally, it may still be necessary to resort to scanning the hard copy, as we shall see. The principle of wanting a digital copy is fundamental to this blog and will not be called into question here. Instead, in this post I shall reflect on why scanning is one of the better strategies for obtaining digital copies. The following factors play a role in this question:

  • current state of technology and its future
  • current strategy of publishers and its future
  • enforcement of copyright law and its future

It is only recently that the bigger copy machines have gained the ability to scan to e-mail. Only with that functionality has scanning books come within reach of anybody. Not so long ago, we had to use flatbed scanners. These were slow, hard to handle, and produced poor results. The big copy machines from Xerox or Océ produce fine results: file size is relatively low, image quality is good, and a scan takes 7-9 seconds per two pages. Of course, technology marches on, and we can expect better, smaller, and faster results. You may be inclined to conclude that it would be better to wait. But the technology as we have it now has already well surpassed the bare minimum. At the current speed, a 400-page book can be scanned in about half an hour. This is a fair amount of time; it might take longer to go to the bookstore and buy a hard copy, especially when the book is rare and must be ordered. In that case, it could take days before you own the book. Spending 30 minutes at the copy machine is therefore an investment that is well worth it.
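The half-hour figure follows from simple arithmetic. Here is a minimal sketch of the calculation, assuming that every scan captures two facing pages at once and ignoring the time spent turning pages; the function name is my own and purely illustrative.

```python
def scan_minutes(pages, seconds_per_spread):
    """Estimate the scanning time in minutes for a book.

    Assumes each scan covers two facing pages (one spread) and
    that no extra time is lost between scans.
    """
    spreads = (pages + 1) // 2  # round up for an odd page count
    return spreads * seconds_per_spread / 60


# The 7-9 seconds per two pages mentioned above, for a 400-page book.
for seconds in (7, 9):
    print(f"{seconds} s per spread: {scan_minutes(400, seconds):.1f} minutes")
```

At 9 seconds per two pages, the 200 scans add up to 30 minutes; at 7 seconds, to a little over 23 minutes.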

Much of our ability as scholars to use digital resources depends on publishers. Big article databases, like JSTOR, offer their resources as PDFs. This is a great relief. However, they are noticeably reluctant about it. JSTOR, for example, tries to steer the user toward the image-per-page interface, where all we see is a picture of one page of the article, and we have to click to see the next page. It is only by clicking a certain button, and then agreeing to the terms of use, that we get the PDF. Moreover, the file that ends up on our computer has a filename made up of numbers, forcing us to put in yet more effort to change it to something meaningful. Nonetheless, this is obviously vastly superior to scanning an article by hand.

For books, things have taken a turn for the worse. Of course, there are promising developments. For example, Brill Publishers is moving more and more into digital publishing. However, bad precedents have been set. Digital Rights Management (DRM) technology already looked like an ugly invention, allowing access to a file only for a certain time or with a certain password. This is tedious and tiring, and in general it frustrates the workflow of students and scholars. Recently, however, publishers have found far more effective ways to frustrate our workflow. DRM technology still gave the whole file to the user. Now a new trend is emerging in which companies chop the book up into its pages and give out only one page at a time. This makes it nearly impossible to obtain a private copy. I am specifically thinking of the company called EBL (E-Book Library), a subsidiary of ProQuest. For some reason, this company seems to be growing rapidly. It features an awful interface with slow response times and extremely limited navigation options. I can only hope that EBL changes course or that librarians around the world realize that its product is utter rubbish.

This brings us to our final point, the legal aspect. In a perfect world, we would not have to scan, and everything would already be digitally available. In fact, Google tried to take a step in that direction with its massive digitization project. However, it soon incurred the wrath of publishers and authors and was forced to hide the digital copies of those books not in the public domain (that is, books still under copyright). They are still there in Google’s database; Google just does not show them to us. I have noted other acts of legal aggression in an earlier post. In other parts of the Internet we see similarly unpredictable behavior. We can think of the aggregation of data by the US intelligence agency NSA and the heated debate it stirred. We can think of Bitcoin and Tor, which combined allowed relatively free and unhindered dealing in drugs on The Silk Road. The BitTorrent website The Pirate Bay has had its ups and downs, being sued and/or blocked in several countries. A bit further back in history but equally relevant is the story of Napster. Many of these seemed invincible, yet in the end they were taken down or curtailed by legal force. When it comes to technology, we live in a most precarious situation. It could very well be that we are experiencing the ‘Wild West years’ of the Internet, where seemingly anything goes. It may not be like this forever. Case in point: where YouTube started out ad-free, it is showing more and more (and more) ads. What if the big publishers force Océ and Xerox to get rid of the scan-to-email function? What if JSTOR decides it will no longer supply PDFs of entire articles? What if is forced to shut down? These cases seem unlikely, but their probability is definitely not zero.


Therefore, I must conclude, it makes sense to invest some time in scanning material that you need for your personal, academic use.

One thought on “To Scan or Not To Scan”

  1. Hi there, I assume that you know of Library Genesis, don't you? The idea to scan whatever is in reach is a great idea, but do you know Library Genesis at Check that out first. It may be that some of the books are already there.
