Internet Librarian Keynote: Google: Catalyst for Digitization?
Wed 26 Oct 2005, 1:18 pm
This was an entertaining session, set up as a confrontation, but really Roy Tennant and Richard Wiggins were, I believe, presenting different facets and perspectives on similar goals and desires. Wiggins was looking more at the possibilities and promise, with Tennant reminding us of the problems and pitfalls still surrounding Google’s digitization efforts. Both are great speakers, and Adam Smith from Google was again showing goodwill and a good sense of humor. Click through for the play-by-play.
Technorati tags: il05, google, digitization
Rich Wiggins
4 yrs. ago thought we needed a federally-funded project to digitize the content of LC.
Typically digitization projects are “cream of the crop” or “cherry picking.” Why not have a truly ambitious project: all the content of a large research library (or all the books, or all the text in all the books).
How much space? Depends on what you are measuring (just text?). What about format/compression?
Text-only of all unique print items in LC: estimates 20 terabytes.
Getting so inexpensive (disk space, digital imaging, broadband delivery, even labor). Technology improving–get away from the flatbed.
Cost per page for JSTOR: less than $.40/page with OCR and correction. Reasonable to assume we can get to $.05/page.
Can we digitize everything, but not OCR it until someone looks at it(!)?
Using commodity disks and cheap RAID arrays (Brewster Kahle & Google doing the same thing there).
Axiom: if it is worth keeping the item in the collection, it is worth the one-time cost of digitization.
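To get a feel for what the per-page figures quoted above mean at scale, here is a rough back-of-the-envelope sketch. The two rates ($.40 and $.05 per page) come from the talk; the page count is purely a made-up round number for illustration, not a figure Wiggins gave, and the function name is my own.

```python
def digitization_cost_dollars(pages: int, cents_per_page: int) -> float:
    """One-time cost in dollars to digitize `pages` pages at a flat per-page rate."""
    return pages * cents_per_page / 100


# ASSUMPTION: a hypothetical collection size, chosen only to make the math easy.
HYPOTHETICAL_PAGES = 1_000_000

# Rates from the talk: JSTOR at (just under) 40 cents/page, hoped-for 5 cents/page.
cost_at_jstor_rate = digitization_cost_dollars(HYPOTHETICAL_PAGES, 40)
cost_at_target_rate = digitization_cost_dollars(HYPOTHETICAL_PAGES, 5)

print(f"At $0.40/page: ${cost_at_jstor_rate:,.0f}")
print(f"At $0.05/page: ${cost_at_target_rate:,.0f}")
print(f"Savings factor: {cost_at_jstor_rate / cost_at_target_rate:.0f}x")
```

Whatever the real page count turns out to be, the cost scales linearly, so the drop from $.40 to $.05 is an eightfold reduction in the one-time bill.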
Barriers:
- Once it is digitized, can we legally deliver it? (Some) authors and publishers say no.
- Attitude: let’s just digitize the good stuff (wouldn’t it be fun to have library committee meetings decide what the “good stuff” is?)
Benefits:
- Preservation
- Access
- Improving digitizing technology
- New standards (open XML)
- Force the issue of large-scale rights management
Conclusion: Think big! Let’s build a digital library that is an entire library. Draws parallel with JFK’s moon challenge: do it not because it is easy, but because it is hard.
Why trust Google? They are smart, agile, innovative, show no fear, have enough money to take on Disney and Pat Schroeder, and they won’t do it alone (as Google’s competitors wake up and say “why aren’t we doing this?”)
Roy Tennant, “Google: Catalyst for Digitization? Or Library Destruction?”
Trying for a light tone, but really believes that more/easier access is better, and there is room for many players in this space.
In honor of Halloween: “Google? The Devil or Merely Evil?”
“Scary Monsters”
- Google trying to shield their activities under fair use may destroy it for us all.
- Closed access to open material: Google Print doesn’t show that there are many available versions of public domain titles, instead showing just the publisher’s current versions, and doesn’t link to the library, just to “buy this book.”
- Blind, wholesale digitization: large research collections are not weeded, by policy (to try to improve ARL ranking). “Blind wholesale digitization [of unweeded research collections] is no more a good thing than buying books based on color.” Easily, freely available, dated, inaccurate information will trump newer info just by circumstance.
- Ads: How long before we see ads for antidepressant meds next to Hamlet?
- Secrecy: agreements with libraries have been kept (largely) secret. Michigan did reveal after FOI challenge. Rumors indicate that Michigan has the best agreement from the library perspective, while others are eager to agree to less-favorable terms (but we don’t know)
- Longevity: What do Google, Enron, and WorldCom all have in common? They are/were all publicly-traded companies motivated by profit. Harvard Library is 400 years old, Google is 7 years old. Whom should we trust with our intellectual heritage? Libraries (like, duh).
Adam Smith, Project Manager for Google Print
Welcome comments and criticism to make their product better. They release things quickly, which can take people by surprise at times.
Trying to make information more discoverable by more people.
As ambitious as Google’s plans sound, it’s only a small piece, and they welcome the activity from other companies and institutions.
Q. Tell us about the scanning robots? Smith: Rumors. Publisher scanning is a destructive process. Library scanning is nifty new automated technology that he is not at liberty to discuss.
Q. Privacy issues? Smith: all Google products are governed by Google’s privacy policy.
Q. Is it true that a library is asking for only manual page-turning in the digitization? Smith: no comment. Stephen Abram: at Internet Librarian International, Oxford volunteered that they had asked for that.
Q. Could or should Google have done something different when doing the print for libraries? Suggests that earlier disclosure of details could have staved off some controversy. Smith goes over the snippet policy and display for in-copyright materials.
Tennant: publishers are looking at copyright in a very literal way in terms of “making copies,” while Google holds that making the copies is not the important part, it’s the sharing and distribution.
Wiggins: Hopes that the IP lawsuits for Google go well, and break through to a more modern copyright system.
Q. Is Google working on a better display for better browsing of search results? Smith: right now, the short-term goal is getting more books in the system. Once they get the large amount of content, they can experiment with better ways to find and use that information down the road.
Wiggins: what you are really asking is “does PageRank work well as BookRank,” and I’d say no. Maybe the solution comes with the social networks.
Q. Discovery and retrieval need to go hand-in-hand. Will Google Print become a great way to discover books that you can’t get to? Smith: we already work with OCLC Open WorldCat and many other partners. There is no solution to that yet.
Wiggins: Google could be building a catalog to the world’s largest Carnegie library, and people are complaining that they aren’t building a bus system to get us to the library.
Q. Who decides what snippets get displayed when a book is discovered? Smith: Things are too early to discuss.
Q. (from Liz Lawley) Microsoft Research does this kind of thing all the time and publishes the research on a public Microsoft Research site. Google is much more secretive–how do you reconcile that with your stated desire to share everything? Smith: I may not be the person to discuss that. I have responsibility for this project, not these policies. Lawley: don’t you get frustrated that you always have to say “no comment?” Smith: This is the first time I have been publicly asked.