Once the elation wears off over the prospect of being able to search nearly every book ever published in the United States — whether in- or out-of-print — or being able to buy books for which one previously had to scour rare and used book shops, reality sets in: the proposed Google Settlement does not offer many of the public benefits that its advocates claim. Mainly, the proposed agreement rewards the case defendant, Google, with a near-monopoly over digital books, and the plaintiffs — actually, the Authors Guild and the Association of American Publishers — with far more power than they imagined possible under the copyright law.
Here’s what the lawsuit is about: U.S. copyright law reserves to the copyright holder the right to reproduce copyrighted works. A digital copy is a reproduction. Despite this, Google audaciously embarked on a project to scan entire libraries in order to create a massive searchable database. The search code is Google’s intellectual property, but the database itself is composed of the copyright-protected property of each and every publisher and author whose works were digitally copied. Google has argued, reasonably, “no harm, no foul”: no one would have access to the scanned books. Rather, Google would merely permit the viewing of brief snippets (fragments of sentences) in response to search terms designated by the user.
“Snippet view” is pretty useless, actually: not enough to constitute research and often less useful than even a library card catalog entry. It has no commercial value, except insofar as Google can sell advertising on its search return pages, and no scholarly value, except to point the searcher in the direction of topical books s/he would have to acquire by other means: at a bookstore or library. The commercial and informational limitations of Snippet view provided ample motivation for Google to settle the case and secure the rights to display and exploit much, much more. (For those unfamiliar with Snippet view, click on the image, which shows a Snippet view search return for the phrase “copyright infringement.”)
Many people thought Google should litigate, not settle, in order to obtain a ruling that its “book search” project was “fair use” and thus required no permission from, or compensation to, the copyright holders of the scanned books. That defense, however, is far from guaranteed and even a little doubtful. In litigation Google runs a substantial risk of its project going the way of Napster, Grokster, Morpheus and Pirate Bay, but without the ability to make it a commercial venture. For their part, the Authors Guild and the Association of American Publishers didn’t want to lose on their claim of copyright infringement, but more important, they saw money to be made, and in particular a way to get royalty money out of libraries. As the Authors Guild President, Roy Blount Jr., wrote on the Guild’s site in October 2008, “[f]ar more interesting for most of us — and the ambitious part of our proposal — is the prospect for future revenues.”
The Google settlement is a brilliant collusion between plaintiffs and defendant to create unprecedented riches for themselves in the digital book market. What began as a mere Google Book Search project has transmogrified into a Google Book Industry. Google will be handed a range of opportunities, most of them brand new, to “monetize” its database, while the plaintiffs will be given royalties and control over their distribution.
Spokespersons for both Google and the authors and publishers groups have painted rosy pictures of near-universal access to the information in the database, of sharing with others its newly created “rights,” and of the great educational benefits to students and scholars. Here is a sampling of those comments:
Richard Sarnoff, former chairman of the Association of American Publishers and co-chairman of the U.S. affiliate of Bertelsmann A.G., which owns Random House:
We have never said that the same kinds of outcomes would not be available to Microsoft or Amazon or anyone else who is willing to make the same investments. We have a road map to do it now.
Roy Blount Jr., President of the Authors Guild:
Readers wanting to view books online in their entirety for free need only reacquaint themselves with their participating local public library: every public library building is entitled to a free, view-only license to the collection. College students working on term papers will be able to point their computers to resources other than Wikipedia, if they’re so inclined: students at subscribing institutions will be able to read and print out any books in the collection.
David Drummond, Senior V.P. of Corporate Development and Chief Legal Officer, Google:
We believe strongly in an open and competitive market for digital books. As part of that commitment, today we announced that for the out-of-print books being made available through the Google Books settlement, we will let any book retailer sell access to those books. Google will host the digital books online, and retailers such as Amazon, Barnes & Noble or your local bookstore will be able to sell access to users on any Internet-connected device they choose. Retailers can also pursue their own digitization efforts of out-of-print books in parallel.
Yet a closer look at the Settlement Agreement as it now stands reveals stingier control, self-dealing contrary to the public interest and promises which may not even be legal to make.
1. For a one-time payment of $45 million, Google will be released from all past liability for the books it has already scanned. This money will be distributed to copyright holders (at around $60 per title). In addition, Google will pay the plaintiffs’ legal fees and an additional $34.5 million to fund and launch a “Book Registry.” When you hear about a “$125 million settlement,” keep in mind that that figure includes a hefty amount for the plaintiffs’ lawyers.
For future copying, Google gets a free pass — that is, it will not have to pay anything for the scans it makes until money begins rolling in on database licensing, advertising (e.g., on banner ads that accompany certain types of search results and book-display pages) and other revenue streams. Google will charge a 10% fee off the top for “operating expenses” and split the remaining revenue, paying 30% to itself and 70% to the Book Registry on behalf of authors and publishers. (For convenience, Google calls its share 37%: 10% off the top, and then 30% of 90%.)
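The arithmetic behind the 37% figure can be checked with a short sketch. It assumes only the percentages stated above (a 10% fee off the top, then a 30/70 split of the remainder); the function name and the example gross amount of 100 are illustrative and do not come from the Settlement Agreement.

```python
from fractions import Fraction


def split_revenue(gross, fee_rate=Fraction(10, 100), google_rate=Fraction(30, 100)):
    """Split gross revenue as described in the text above.

    The fee comes off the top, then the remainder is divided
    between Google (30%) and the Book Registry (70%).
    """
    gross = Fraction(gross)
    operating_fee = gross * fee_rate          # 10% off the top
    remainder = gross - operating_fee         # 90% left to divide
    google_share = remainder * google_rate    # 30% of the remainder
    registry_share = remainder - google_share # 70% of the remainder
    return {
        "google_total": operating_fee + google_share,
        "registry": registry_share,
    }


shares = split_revenue(100)
print(shares["google_total"], shares["registry"])  # 37 63
```

On every 100 dollars of gross revenue, then, Google keeps 37 and the Registry receives 63 on behalf of authors and publishers, not 70.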
Contrary to statements made by the parties to the lawsuit, these terms — the one-time fee and the free pass — cannot be granted to anyone other than Google, for the simple reason that neither the New York District Court which has jurisdiction over the case, nor Google nor the plaintiffs have any legal power or authority to do so. The settlement resolves no issue of law regarding whether digital copying for purposes of database searching constitutes copyright infringement or “fair use.” Furthermore, the plaintiffs are not even legally bound to accept the same or similar projects or offers thereof by anyone else. (A “roadmap” doesn’t create a legal right.) In fact, should any party wish to scan books on its own, it must run the same risk that Google ran of being sued. And without the luck of a new class action settled in an identical manner as Google’s, even a single copyright holder could shut down such a project.
2. The Book Registry will keep a database of copyright holders and their works and pay out royalties to authors and publishers. It will also have the power to negotiate prices and conditions of display and sale by Google, and to decide what constitutes an out-of-print book (based on availability and other issues) and what is an “orphan,” i.e., a book whose copyright ownership cannot be ascertained or whose owner cannot reasonably (according to the Book Registry) be found.
Google will license access and otherwise commercially exploit orphan titles, while the Book Registry will collect the income on such titles, but there is no provision in the current copyright law that permits the use of orphans just because their owners aren’t around to object. Nor is there any precedent for the parties to distribute the royalties to these works among themselves and the owners of other copyrighted works. Although Google insists that orphan titles, which probably number under 600,000, constitute a tiny proportion of the more than 10 million books it has scanned so far, if the potential for income were economically insignificant, the parties would never have included orphans in their negotiations.
3. Google will have the right to display the complete contents of out-of-print (but in-copyright) books unless their copyright owners opt out. Owners of orphan works will, of course, not be able to opt out. For in-print non-fiction books, Google will have the right to display up to 20% of a book’s content, but not more than 5 consecutive pages, following which there must be at least two blocked pages. For fiction, the formula is different: up to 5% or 15 pages, whichever is less, adjacent to where a user lands on a given page from a search. To avoid spoilers, the final 5% of such books (or 15 pages, whichever is greater) will be blocked. That said, rightsholders will have several options for previews. One of them, for example, allows Google to display up to 10% of the book without an adjacent page limitation.
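To see what these default preview limits mean for a particular book, here is a rough sketch of the page caps described above. The function names are hypothetical, and the rounding (down for what may be shown, up for what must be blocked) is an assumption, since the description above does not say how fractional pages are treated. The rule limiting non-fiction to 5 consecutive pages is not modeled.

```python
def nonfiction_preview_cap(total_pages):
    """Up to 20% of an in-print non-fiction book (rounded down: an assumption)."""
    return total_pages * 20 // 100


def fiction_preview_cap(total_pages):
    """Up to 5% of the book or 15 pages, whichever is less."""
    return min(total_pages * 5 // 100, 15)


def fiction_blocked_tail(total_pages):
    """Final 5% of the book or 15 pages, whichever is greater.

    The 5% figure is rounded up (an assumption) using ceiling division.
    """
    five_percent_rounded_up = -(-total_pages * 5 // 100)
    return max(five_percent_rounded_up, 15)


# Hypothetical page counts, chosen only for illustration.
print(nonfiction_preview_cap(300))  # 60
print(fiction_preview_cap(400))     # 15
print(fiction_blocked_tail(400))    # 20
```

Under these assumptions, a 400-page novel could show at most 15 pages around the reader's landing point, with its last 20 pages always blocked.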
4. Google will sell time-limited subscriptions for database access to institutions (universities, colleges, corporations, think tanks, etc.), the price for which will be set by Google and the Book Registry. In theory, prices will be set according to the type of institution, but most important here is the creation of a massive revenue stream that never existed before. Until now, authors and publishers derived income from libraries only by selling them books.
Although paid subscription users will have the right to print out books whose contents are displayed in full, piracy concerns inconvenience the user with busywork that will do nothing to dissuade willful infringers:
With respect to copy/paste, the user will not be able to select, copy and paste more than four (4) pages of the content of a Display Book with a single copy/paste command. Printing will be on a page-by-page basis or a page range basis, but the user will not be able to select a page range that is greater than twenty (20) pages with one print command for printing.
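The amount of busywork is easy to estimate. The sketch below assumes only the two limits quoted above (20 pages per print command, 4 pages per copy/paste command); the 300-page book is a hypothetical example.

```python
PRINT_LIMIT = 20       # pages per print command, per the quoted provision
COPY_PASTE_LIMIT = 4   # pages per copy/paste command, per the quoted provision


def commands_needed(total_pages, pages_per_command):
    """Number of separate commands needed to cover a whole book (ceiling division)."""
    return -(-total_pages // pages_per_command)


book_pages = 300  # hypothetical book length
print(commands_needed(book_pages, PRINT_LIMIT))       # 15
print(commands_needed(book_pages, COPY_PASTE_LIMIT))  # 75
```

A determined copier simply repeats the command; only the ordinary reader is slowed down.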
For not-for-profit colleges and universities and public libraries, on the other hand, there are no guarantees. The Settlement Agreement provides that Google may provide public access service “free,” but there is no obligation. Not-for-profit colleges and universities which offer bachelor’s degrees will get no more than one computer terminal for every 4,000 full-time students. Other not-for-profit education institutions will get no more than one computer terminal for every 10,000 full-time students. For public libraries, the agreement specifies “no more than one terminal” per library building.
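A quick sketch shows how few terminals these ratios allow. The ratios come from the paragraph above; the institution labels, the function name, and the choice to round up are assumptions made for illustration, since the description above does not say how partial blocks of students are counted.

```python
# Full-time students per permitted terminal, as stated above.
STUDENTS_PER_TERMINAL = {
    "bachelors": 4000,   # not-for-profit institutions offering bachelor's degrees
    "other": 10000,      # other not-for-profit educational institutions
}


def max_terminals(full_time_students, institution_type):
    """Upper bound on free terminals, rounding partial blocks up (an assumption)."""
    per_terminal = STUDENTS_PER_TERMINAL[institution_type]
    return -(-full_time_students // per_terminal)


# Hypothetical enrollment of 20,000 full-time students.
print(max_terminals(20000, "bachelors"))  # 5
print(max_terminals(20000, "other"))      # 2
```

Even a large university of 20,000 full-time students would be entitled to at most five terminals, and nothing obliges Google to provide even those.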
It will probably be more efficient to wait your turn at a terminal than requisition a book through inter-library loan, but copying any part of the book is going to cost you at these “free” locations: users at libraries and not-for-profit institutions of higher learning will have to pay a fee — set by the Registry but collected by Google — to print out any pages from books. This represents a further victory by the publishing industry over public libraries, which collect no royalties on behalf of authors or publishers when library users photocopy pages from hard copies. In essence, the Google Settlement transforms the “public good” of free public and not-for-profit libraries into a profit center for Google, authors and publishers.
5. Google plans a full range of commercial exploitation: print on demand (POD), custom publishing (e.g., coursepacks), PDF downloads, consumer subscription models, terminals in copy shops that charge a per-hour or per-user fee, and the like. Whether Google may authorize third parties to do any of this is legally questionable, and such offers may also turn out to be economically uninteresting. Although Google has not said what percentages it will offer, presumably it will not forgo its 10% administrative share. Even if it offers third parties two-thirds of its 30% income share (that is, 20% of revenue after the administrative fee), there may be few takers outside of Amazon and a few large bookstore chains. Even then, the chains may find that such meager profits don’t justify the use of a store’s time and resources. Google’s monopoly lies in its 10% administrative fee plus 30% share of revenues. No other bookseller, virtual or otherwise, will be able to come close to that unless it scans the books and enters into a similar settlement itself.
6. The privacy issues raised by the Amazon Kindle incident loom even larger under the proposed Google Settlement. Google will collect data not only on which search terms readers use and which books they look at, but also on how long a reader spends on each page. Readers will “store” their virtual libraries with Google. There is no limit to the kind and extent of data which Google may collect on users’ reading habits, how long it may retain it or what it may do with it, except as Google unilaterally determines. It is only a matter of time before Google begins responding to subpoenas for such information, whether it be in criminal prosecutions, matters of so-called national security or even cases of libel or divorce.
7. Google will have the right not to include books in its database for “editorial” reasons and “non-editorial” reasons. The former is not at all defined and the latter is ambiguously defined as “reasonable quality, legal, or technical concerns that are not solely editorial-based concerns.” For books excluded on solely editorial grounds, Google will advise the Book Registry, but there is no provision to make this information publicly available. Partial or non-editorial exclusions remain Google’s secret.
8. Will the Google Book Industry be a boon to students? Most students seem to do their research these days on the Internet rather than in the library, so anything to add to the possible sources of information could make some difference. However, the content display limitations for in-print works, or those out-of-print works whose owners opt out, are not research-friendly. They will often, if not usually, obscure information needed for a full understanding of the topic being researched. It may be good for quick quotations, but such limited displays are worthless for scholarship.
Furthermore, as Geoffrey Nunberg has pointed out in The Chronicle of Higher Education, the metadata Google has provided for the books scanned thus far can be wildly inaccurate. Mr. Nunberg refers to the metadata Google provides as a “train wreck: a mishmash wrapped in a muddle wrapped in a mess.” For example,
Start with publication dates. To take Google’s word for it, 1899 was a literary annus mirabilis, which saw the publication of Raymond Chandler’s Killer in the Rain, The Portable Dorothy Parker, André Malraux’s La Condition Humaine, Stephen King’s Christine, The Complete Shorter Fiction of Virginia Woolf, Raymond Williams’s Culture and Society 1780-1950, and Robert Shelton’s biography of Bob Dylan, to name just a few. And while there may be particular reasons why 1899 comes up so often, such misdatings are spread out across the centuries. A book on Peter F. Drucker is dated 1905, four years before the management consultant was even born; a book of Virginia Woolf’s letters is dated 1900, when she would have been 8 years old. Tom Wolfe’s Bonfire of the Vanities is dated 1888, and an edition of Henry James’s What Maisie Knew is dated 1848.
There are also classification and other errors (including missing and illegible pages), and some books may inadvertently be hidden from any direct search at all. For example, Google has scanned three volumes of the Victorian title, My Secret Life, but a search for it turns up only the third volume, designated as such not on the search result page, but only in display view. The “Other Editions” link on Volume 3’s display view does turn up volumes 1 and 2, but the user has to click on them to know, since the links don’t indicate that they are for different volumes and not different editions with identical content. Moreover, nowhere to be found is the fact that the work consists of 8 volumes. Only the length is specified — “2359 pages” — and you need to add up the pages in Volumes 1, 2 and 3 to know you have less than half the work. Finally, Google has (laughably) included as a subject category for the work “Literary Criticism,” which this book definitely is not.
Mr. Nunberg is optimistic that most of the “metadata,” informational and scanning errors will eventually be corrected, but currently they are built into the system and Google has no substantial financial interest in doing it better. That is one of the many disadvantages of a monopoly.
Whether the monopoly created by the Settlement Agreement is the fault of Google or the authors’ and publishers’ representatives, it is now everyone’s problem. The Agreement is wholly aimed at benefiting Google to the near-exclusion of competitors. At the same time, it affords minimal provisions for the public good, which provisions must be balanced against a de facto expansion of control by copyright owners on what libraries can offer the public without incurring royalty obligations. Many hoped this case would be settled in the public interest, but the proposed Agreement carries the ball in the opposite direction.
Miguel Helft, “11th-Hour Filings Oppose Google’s Book Settlement,” New York Times, September 8, 2009, http://www.nytimes.com/2009/09/09/technology/internet/09google.html
For further legal failings of the Google Settlement, see James Grimmelmann, “How to Fix the Google Book Search Settlement,” Journal of Internet Law, April 2009, http://works.bepress.com/cgi/viewcontent.cgi?article=1022&context=james_grimmelmann
Geoffrey Nunberg, “Google’s Book Search: A Disaster for Scholars,” The Chronicle Review, August 31, 2009, http://chronicle.com/article/Googles-Book-Search-A/48245/
The proposed Google Settlement agreement can be found at http://www.googlebooksettlement.com/r/view_settlement_agreement