1/24/2014

Literally saving Books


I've been a fan of the paperless office concept for some time.

But people never seemed to apply it to books until online digital libraries, like those of Google Books and Amazon, started to grab headlines.

Regardless of the copyright and authors' rights issues, there is an inherent 'fear of loss' and 'convenience' factor in having a personal copy backed up as a digital file.

To this end, since online sharing is, for this moment in history, effectively 'banned', we each have to choose how we deal with it.

Being from a generation with access to physical books, and being aware that there is a lot of good reference material in them, I've sought to back things up: first by making paper Xerox copies of pages and chapters, then by using digital scanners and software to turn those into files, then turning those files into PDFs, and finally into OCR'd PDFs in the PDF/A format, optimized for size and for archival-quality accessibility in some far distant future.

I've even recently become interested in M-disc technology from Millenniata, which claims that with the right DVD media, files can be backed up to last 1000 years or more.

Millenniata - Write Once, Read Forever

LG initially was the only company that produced drives with the power to write M-discs, as it takes a higher-power laser to etch the information into a non-dye layer.

But Sony/NEC apparently also dabbled in "Archival Quality" DVD burners in the recent past.

Sony Optiarc

When first NEC and then Sony exited the business, they left behind a fine drive with overburn capabilities, the 5280S-CB-PLUS.

And the Vinpower Digital company licensed these drives to continue providing them through outlets like Amazon.com.

They serve multiple purposes: aside from being able to burn M-disc media, they can also copy traditional discs, like those for game consoles.

The Sony/NEC Optiarc alliance was founded on the idea of producing a more stable, predictable drive that would yield more reliably burned media and fewer badly burned discs, or "coasters". The drives include a stronger direct drive and gear train, and more closely monitored and regulated drive speed.

They are not cheap by today's standards, costing about twice what an ordinary $15 consumer-grade burner would cost.

But back to "Saving books"

Converting, copying or saving a book implies it's digitally scanned into a computer file format.

In order to do that, the book has to be set up under photographic conditions suitable for a capture sensor to take an image.

With some scanners this requires de-binding, or literally destroying the book by cleaving the binding off the back of the spine to free the pages, so they can be shoved into a sheet feeder or laid flat one at a time on a flatbed scanner.

I've been down the sheet-fed path with one book. One alternative to cleaving is to "de-glue" the pages from the binding; a "heat gun" or "hair dryer" is sometimes recommended to melt the waxy glue so the pages pull free. But I try not to collect lots of hardware I may never use again unless absolutely necessary, so I tried what I already had on hand: a clothing iron. I heated it up and slid it along the back of a paperbound book, and it did in fact melt the glue, and I could pull the pages free.

I still had to trim the pages, however, since it is nearly impossible to remove all of the glue, and I couldn't tolerate letting that glue get into the gears and rollers of an expensive sheet-fed scanner. It would simply ruin it, and I would never complete the task.

To trim hundreds of pages, you need a near-professional paper cutter made of a hard material like steel.

In my case this led to purchasing a discount paper cutter from Amazon.com.


It is possible to find them "used" and on sale, which I was fortunate enough to do.

I was aware you can take the books to someplace like a Kinkos or a printing company with a professional paper cutter that would charge $0.50 to $1.00 per book. But some print shops will do it in a less than satisfactory manner, ask questions, or even refuse on the basis of a possible act of piracy. Further, you don't always have control over how much of the border well between pages will remain, and you could end up losing text or images, so I chose to de-bind and trim the pages myself.

Another alternative is a "near edgeless scanner" like the Plustek OptiBook 3600.

I briefly owned one of these, but being a flatbed and optimized for the mass market and cost savings, it was just too slow. I couldn't imagine archiving a book having to lift and heave the spine and entire mass of the book up and down over long periods of time.

I kept a careful eye on the open hardware designs Google had released for hands-free, automated book scanning. But they appeared too ad hoc and hard to reproduce, and they took up so much space.

That left me with the Atiz BookSnap option.

I had purchased a BookSnap at the beginning of 2009, but a family death occupied my time and it sat unused for a while.

A surprisingly innovative and useful device, it is essentially a bundled set of equipment and software to use a pair of Canon PowerShot cameras to acquire images, then post process those into PDF files.

It's somewhat intimidating from a lot of angles, but ultimately I concluded it was the correct path to take for most of my book archiving needs.

First, it was intimidating because the BookSnap was not cheap; then PowerShot cameras are not cheap, and acquiring two of the same model was especially not cheap.

Add to this that support for the product was not great, as it had recently been cancelled because Canon had removed a feature that allowed PowerShot cameras to download images remotely over a USB cable to a computer. This had been a standard feature for a number of years, and then it was abruptly removed in a Canon driver "update" issued after the cameras were released. This caused some confusion in the market, and for the end user, unless you watched what went into the Canon drivers available for download. So basically you needed to "know" not to upgrade your Canon drivers directly from the Canon website to the latest and greatest versions.

Canon apparently had shifted the feature sets around a bit, and PowerShots would no longer be able to be remotely used as capture devices. This feature does remain in the Canon driver sets for DSLR cameras, a different and much higher cost camera type, which they perceive as becoming more adopted by a less frugal middle class.

The documentation that came with the equipment was also not the smoothest or easiest to understand.

First there was the hardware setup and alignment of the cameras. For various reasons the cameras needed to be equipped with SD cards and batteries and then cabled to a computer over a long USB cable; the cables provided with the cameras and the hub that came with the BookSnap were just too short.

Ultimately I learned that the Canon G10s I had also offered an optional battery replacement and a tiny trap door for running a power cable out to an external power brick, avoiding the inconvenience of rechargeable batteries. I also learned that Monoprice had "balun" impedance-protected and corrected USB cables, and a twin long-distance USB repeater cable that took the place of the Atiz hub and delivered good USB signals over a long distance.

After all of that I had wires running everywhere: power lines for the cameras, USB lines for the data, and power cables for the overhead lights. I found an online source for tiny black cable raceway with adhesive tape and a side-grooved lock, which allowed running the cables down the side of the BookSnap.

Then it was time to deal with the software.

BookSnap comes with BookScan and BookEdit software, but it's not labeled that way.

BookScan software is called "BookDrive Capture Control" software and changed names in later products to "BookDrive Capture".

BookEdit software was versioned from V3 to V6 as "BookDrive Editor Pro", and up until V4 it was "locked" during activation to one computer's hardware, so it could not be run on multiple machines.
In V6 of the software they adopted a USB dongle key for "enabling" the software wherever it was installed. However, this was not available to me, and I was stuck on V4 as the last upgrade available for the BookSnap.

It took some time.

But I eventually figured out that the workflow should be to capture the images whole, as JPEG or TIFF files.

The BookScan software allows some configuration to auto-name and save sequential files to Left and Right, or Combined Left&Right, folders of BookScan images. It also allows some minimal "Cropping" of the images during capture (this isn't as important, since more post processing, including Cropping, can be done in the BookEdit software).
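If the Left and Right folders are ever processed outside of BookEdit, the two sets of files have to be merged back into reading order. Here is a minimal Python sketch of that idea. It is not part of the Atiz software, and the file names are hypothetical; it simply assumes each folder holds zero-padded, sequentially numbered files, so that sorting by name matches capture order.

```python
from itertools import zip_longest


def reading_order(left_pages, right_pages):
    """Interleave left-page and right-page captures into one reading-order list.

    Assumes each list is sequentially named (zero-padded), so a plain sort
    puts the files in the order they were captured.
    """
    ordered = []
    for left, right in zip_longest(sorted(left_pages), sorted(right_pages)):
        # zip_longest pads the shorter list with None, e.g. when the book
        # ends on a left-hand page and there is no matching right-hand image.
        if left is not None:
            ordered.append(left)
        if right is not None:
            ordered.append(right)
    return ordered


if __name__ == "__main__":
    # Hypothetical file names, standing in for the contents of the two folders.
    left = ["L_0001.jpg", "L_0002.jpg", "L_0003.jpg"]
    right = ["R_0001.jpg", "R_0002.jpg"]
    for name in reading_order(left, right):
        print(name)
```

The function works on plain lists of names rather than reading the folders itself, so it can be tried out without any scanned images on hand.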

Then the workflow continues after the BookScan capture session has ended and the BookEdit post processing session is begun.

Opening BookEdit, you find both a large central preview region for Left and Right pages, and a Left navigation bar to indicate which book is being worked on. Clicking the L or R or L&R buttons allows designating the file folders or "sources" of images for a "book". A wrench icon for the book opens the post processing tools and permits customizing how the pages for that book will be post processed "en masse" during a Batch session.

At the bottom of the Left navigation is a large ]> arrow key for kicking off the Batch session to process all the pages in all of the books setup in the Left Navigation column.

And finally a Left navigation "tiny" Export button, will open up another utility program for "binding" all of the post processed images for a "book" into a multipage pdf or multipage tiff file.

Within the export tool are options for choosing JPEG or TIFF compression schemes, or for optimizing the book to render it at a smaller size rather than as just a large concatenation of images.

However, you might not want to compress or optimize too much until "after" running the multipage document through an OCR engine, like the one in ABBYY FineReader Pro or Adobe Acrobat Pro, to enable full text indexing. These can often also perform the final optimization step.

Once all of this was understood, I then began to become aware of the importance of "White Balance" and "Exposure" and "Focus" as well as "Image Stabilization" control.

The software with BookSnap manages a few settings, but not things like the camera "Mode", which determines whether Auto White Balance or a Custom White Balance is applied during capture.

Auto Focus can also be a mixed blessing in close quarters unless it is turned off and the focus set manually, or it is helped to do its job by something like an optional laser beam.

Atiz products also have a number of separate options that can be purchased, but I'm not sure any of them would work with the BookSnap or its software.

Normally, for example, the way to trigger the cameras to capture a set of pages is by tapping the keyboard "Enter" key, or by setting up an interval timer to auto capture a set of images every 2, 3 or 5 seconds.

But an optional USB proximity switch could be installed on the frame, such that when the V-shaped transparent sled was brought down to flatten the pages, an "Enter" keypress would be sent to the BookScan software to initiate image capture. These aren't unique to Atiz, however, as they emulate an additional USB HID input device, a second keyboard, and can be configured to send different keystrokes. One such option is a foot switch, or a big-button photobooth or kiosk device.

In all, it opens up some exciting possibilities.

Recently I became aware of a Kickstarter project by way of an Amazon.com offering for a manual scanning stand, called Fopydo - which stands for [ Foto Copy Document ] scanning stand.


Essentially it is a corrugated "black" plastic frame that normally resides flat, and which you can toss in a backpack or briefcase.

It is for all intents and purposes a portable "V-cradle" which as was mentioned before is a superior platform for conducting non-destructive scanning of books and other documents.

What caught my eye was a YouTube video of the device in use 



 


The BookScan portion of the capture would depend upon a cell phone, Android or iPhone, or could be a Canon PowerShot or DSLR camera.

The raw images could then be returned to a computer and offloaded by USB or SD memory card transfer, and post processed in the same way as with the BookSnap, practically distortion free.

And since it's so light, it might be almost as likely that you'd have it in your pack as your cell phone.

This struck a chord with me because I often visit libraries, labs or places where quick access to a scanner or copy machine might not be available, and I'd prefer to have something a little better than just a handheld phone snapshot of a book or document.

It's lighter and less bulky than a tripod, and the cell phone is practically optimized for on the spot shooting with minimal adjustments for the environment.

The same BookEdit software that came with BookSnap should also be able to process these images, but even if that is not an option, Fopydo has a suite of software available for free on its website to download and use for post processing the images. Then an OCR engine like ABBYY FineReader or Adobe Acrobat could be used to index and optimize the result.

And another option for slightly distorted images could be to use Booksorber to "de-warp" the images.




It is more specialized software for taking cell phone images, de-warping them, and preparing them for storage.