Creating a digitization workflow for your library
As our world becomes increasingly electronic, patron demand for digitization services keeps rising. Our patrons expect electronic access to materials to meet their legal information needs. Many libraries, however, struggle to accommodate these requests because they feel ill-equipped to make the decisions that a digitization workflow requires. This brief article discusses some of the issues to consider in this process and shares some of our experiences digitizing a large academic law collection.
The first issue a library needs to consider when embarking upon a digitization effort is whether the work will be done “in-house” or contracted out to third parties. If a library has only a small collection of materials to scan, it may be unwise to purchase digitization equipment, as the return on investment will be too low. A number of companies provide digitization services, some of them at a surprisingly low cost. A company in San Jose, for example, will scan monographs at a cost of $1 for every 100 pages. The price is so low, however, because this is “destructive book scanning,” meaning that the spine is cut from the book and the book is not returned to the library. Obviously, this would be an unacceptable choice for digitizing materials that have archival value. For those materials, non-destructive book scanning is the only choice, but the price is much higher because human intervention is required. For film and bound-book non-destructive scanning, companies such as Hein provide reasonably priced services to their institutional customers.
If your library has a collection of materials to scan that is large enough to justify purchasing scanning equipment, there are a few things to remember. First, flat-bed scanners, though popular and cheap, are poorly suited for scanning monographs. The scans from a flat-bed scanner will often be of low quality, and pressing the book face down on the scanner may damage the spine. The alternative is a planetary (overhead) scanner, which avoids damage to the book and often produces a much higher quality scan. The disadvantage is that these scanners are considerably more expensive than flat-bed scanners. You can purchase a basic unit from Atiz starting at around $6000, or you can build an overhead scanner yourself using two SLR cameras and a “DIY” kit for even less. To make the scanned documents searchable, you will need OCR software. Many libraries already have licenses for Adobe Acrobat, which costs $300 (or $120 with an educational discount). Other OCR products, such as ABBYY FineReader, cost more but also promise a higher accuracy rate.
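The buy-or-outsource decision can be roughed out with a simple break-even calculation. The sketch below is illustrative only: the function name and parameters are our own, it uses the figures quoted in this article ($1 per 100 pages for destructive scanning, about $6000 for a basic overhead scanner, $300 or $120 for Acrobat), and it assumes by default that in-house staff time costs nothing, which is rarely true. Non-destructive vendor rates are higher and should be substituted when archival materials are involved.

```python
import math
from fractions import Fraction


def break_even_pages(fixed_cost, vendor_price_per_100_pages,
                     in_house_cost_per_100_pages=0):
    """Return the page count at which buying equipment matches outsourcing.

    All amounts are in dollars. fixed_cost is the one-time equipment and
    software cost. Returns None when outsourcing is never more expensive
    per page than scanning in-house.
    """
    # Fractions built from strings avoid binary floating-point rounding.
    fixed = Fraction(str(fixed_cost))
    saving_per_100 = (Fraction(str(vendor_price_per_100_pages))
                      - Fraction(str(in_house_cost_per_100_pages)))
    if saving_per_100 <= 0:
        return None
    return math.ceil(fixed / saving_per_100 * 100)


# Figures quoted in the article; labor is assumed to be free here.
scanner = 6000
print(break_even_pages(scanner + 300, 1))  # Acrobat at the regular price
print(break_even_pages(scanner + 120, 1))  # Acrobat with educational discount
```

Under these assumptions the break-even point is in the hundreds of thousands of pages, which illustrates why a small collection rarely justifies the purchase when destructive scanning is acceptable. Adding a realistic per-page labor cost pushes the break-even point higher still.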
Once you have the necessary equipment and have selected the materials to be digitized, you can begin designing the workflow: which metadata fields will be collected, and how the materials will be stored and accessed. Many libraries are purchasing institutional software to host their electronic files, which makes it easier to create a controlled vocabulary and ensure consistent metadata. Other libraries are adding these digital assets to their online collection using next-generation ILS software. Collaborative documents, such as those provided by Google Docs, can be helpful, especially when a digitization project involves multiple staff members. A shared document helps ensure that material is not scanned twice and improves consistency.
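The two goals of a shared tracking document, avoiding duplicate scans and keeping metadata consistent, can be sketched in a few lines of code. This is a minimal illustration, not a description of any particular repository or ILS product: the class name, the field names, and the sample subject terms are all hypothetical.

```python
def normalize_identifier(value):
    """Collapse whitespace and case so 'KF 4550 .A2' matches 'kf4550.a2'."""
    return "".join(value.split()).lower()


class ScanLog:
    """A hypothetical in-memory stand-in for a shared tracking sheet."""

    def __init__(self, allowed_subjects):
        # The controlled vocabulary: only these subject terms are accepted.
        self.allowed_subjects = {s.strip().lower() for s in allowed_subjects}
        self._records = {}

    def is_scanned(self, identifier):
        return normalize_identifier(identifier) in self._records

    def add(self, identifier, title, subject, scanned_by):
        key = normalize_identifier(identifier)
        if key in self._records:
            raise ValueError(f"{identifier!r} has already been scanned")
        term = subject.strip().lower()
        if term not in self.allowed_subjects:
            raise ValueError(f"{subject!r} is not in the controlled vocabulary")
        record = {"identifier": identifier, "title": title,
                  "subject": term, "scanned_by": scanned_by}
        self._records[key] = record
        return record


# Sample terms and call number are invented for illustration.
log = ScanLog(["legal history", "tax law"])
log.add("KF 4550 .A2", "Sample Treatise", "Legal History", "staff_a")
print(log.is_scanned("kf4550.a2"))
```

A spreadsheet with data-validation rules achieves the same effect without code; the point is that every item gets one normalized identifier and one subject term from an agreed list before anyone begins scanning.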
Another advantage of purchasing your own equipment is that once your digitization project is complete, you can use the equipment for document delivery and inter-library loan services. Patrons love having book or practice guide chapters delivered to their email accounts. At Santa Clara, we purchased a student scanning kiosk (the Zeutschel Zeta) that has become immensely popular among our students, who love being able to quickly and easily digitize a document and share it with classmates or professors. We have also noticed a decline in photocopier usage in the library as students increasingly choose to make electronic, rather than print, copies of their documents.
On Atiz scanners, see http://www.atiz.com/usstore/. Other vendors include BetterLight, Digital Library Systems Group, i2S, Indus, Kirtas, Konica/Minolta, Microbox, Phase One, SMA, Tarsia, Treventus, ZBE and Zeutschel. For a review of planetary scanners, see Jody L. DeRidder, Overhead scanners: reports from the field, 29 Library Hi Tech 9 (2011).
On Adobe Acrobat, see http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000621011. Here’s a tip for using Acrobat for OCR (optical character recognition). Adobe provides three output options: (1) Searchable Image, (2) Searchable Image (Exact), and (3) ClearScan. If you are scanning archival materials that you do not want altered in any way, select “Searchable Image (Exact)”. This adds a searchable OCR text layer to the document but does not straighten the pages or alter the text as it appears on the page. If you want a document that you can send via email, consider selecting “ClearScan”. This replaces the text with a font generated by Adobe, straightens the pages, and reduces the file size. Obviously, you should not select “ClearScan” for any court document or archival item.