In the past week, Google has announced both its acquisition of ReCaptcha, the main company providing “captcha codes” to websites around the world, and a new deal with On Demand Books, the maker of the Espresso Book Machine, which prints books on demand. Both moves are meant to make the proposed Google Settlement more “exciting,” but they only raise further concerns over Google’s monopoly power in the online book market.
The acquisition of ReCaptcha will allow Google to use ReCaptcha’s technology to improve its book scanning, something which ReCaptcha was already doing for the not-for-profit Internet Archive. reCAPTCHA started as a project of the School of Computer Science at Carnegie Mellon University. Back in May of 2007, the ReadWriteWeb blog explained how the use of captcha codes works to improve book scanning:
There are many projects underway to scan old books and other texts into digital format, but Optical Character Recognition software often falls short, especially with oddly stylized text or old, faded works. When the computer can’t figure out a word, a human has to step in and enter it manually. This means reading thousands of digital images of words and deciphering them — or essentially what you do when you solve a CAPTCHA image. The Internet Archive project scans 12,000 books per month and sends the team at Carnegie Mellon hundreds of thousands of images of words the computer can’t figure out, according to the Washington Post. These images are turned into CAPTCHAs for the reCAPTCHA program.
But if the computer doesn’t know the word, how will it know if the human entered it properly? The reCAPTCHA program gives users two words to decipher: one which it already knows, and one which is a mystery. Employing a certain level of trust, the computer assumes that if the user correctly identifies the word it knows, then he probably figured out the one it doesn’t correctly as well.
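The two-word check described in the passage above can be sketched in a few lines of Python. This is only an illustration of the idea, not ReCaptcha's actual code: the function names, the case-insensitive comparison, and the rule that a reading is accepted once three users agree on it are all assumptions made for the example.

```python
from collections import Counter


def normalize(word: str) -> str:
    # Assumption: answers are compared ignoring case and surrounding spaces.
    return word.strip().lower()


def record_response(known_word: str, typed_known: str,
                    typed_unknown: str, votes: Counter) -> bool:
    """Check the control word; if it matches, trust the user's reading
    of the mystery word and count it as one vote."""
    if normalize(typed_known) != normalize(known_word):
        # The user failed the word the computer already knows,
        # so the reading of the mystery word is discarded.
        return False
    votes[normalize(typed_unknown)] += 1
    return True


def accepted_reading(votes: Counter, min_votes: int = 3):
    """Return the most common reading once enough users agree on it.
    The threshold of 3 is a hypothetical value chosen for this sketch."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None
```

Each scanned word the computer cannot read would get its own `votes` counter. A user who mistypes the known word contributes nothing, while users who pass the known word gradually settle the reading of the mystery word.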
In the news about ReCaptcha’s acquisition on Google’s blog (posted by a co-founder of ReCaptcha and a Google representative), no mention is made of the Internet Archive’s project. Will Google pull the plug on the Archive’s use of Google’s newly acquired intellectual property? Since ReCaptcha’s software is partly proprietary rather than entirely open source, the acquisition may have the effect of damaging an important and competing book scanning project.
The On Demand Books deal will make Google the supplier of some 2 million out-of-print and out-of-copyright books for use in On Demand’s machine. While everyone has the right to do what s/he wants with out-of-copyright books, the effect of this deal may be to choke off competition from other book scanning projects.
The Espresso Book Machine is a high-speed printer. According to the company, it can turn out a 300-page paperback in under five minutes and at low cost, about a penny a page. There are only five Espressos currently in operation in the U.S., but the deal will give ample incentive for book stores and libraries to lease the $100,000 machine, and to use Google exclusively as their source for texts. This will be irresistible, in fact, if the Google Settlement is approved: the Settlement grants Google, and only Google, publishing-on-demand rights for out-of-print but in-copyright texts.
Books supplied by Google and printed by the Espresso will supposedly sell for around $8, of which $1 will go to On Demand (in addition to leasing fees) and $1 will go to Google. Google says it will donate its commission to not-for-profit causes, but that highlights precisely the problem with monopolies: they can do what they want, make exclusive deals, give money away and price their competitors right out of the market.
Remember, however, that the Google Settlement isn’t just about Google. Equally culpable is the publishing industry whose main interest in the Settlement is to monetize the library system. As I stated in my earlier posting, no longer will publishers be limited to selling books to libraries across the US. Rather, the industry will provide time-limited licenses for which libraries will have to pay and pay again. Connections for free public libraries will be free, but should any patron wish to print out pages from in-copyright books, a per-page royalty will be charged. Contrast this with the current situation: When you go to the library and want to photocopy something, you pay only a photocopy charge, not a royalty to the copyright holder on top of it.
Moreover, the publishing industry will avail itself of the royalties accrued from Google’s exploitation of “orphan” books. Although the number of titles is “relatively” small, clearly Google and the publishing industry would like to divvy up these potential earnings for their own enrichment. The Google Settlement is a power grab that should be stopped.
The five U.S. locations of the Espresso Book Machine are the Internet Archive (San Francisco), the University of Michigan Shapiro Library (Ann Arbor), the New Orleans Public Library, the Brigham Young University Bookstore (Provo, UT) and the Northshire Bookstore (Manchester Center, VT). Another seven Espressos will go into operation in the fall of 2009. For a complete worldwide list, see http://www.ondemandbooks.com/our_ebm_locations.htm