Dualing Keynotes…And a Third

“Google Print: Making the Virtual Library Real” is the title of Rich Wiggins’ half of this morning’s dual keynote. Great slide on why Google should never be bought by Microsoft: Clippy might suggest Encarta instead of Wikipedia.

This is part of an extended conversation that Rich and Roy Tennant have had over many years. Rich thinks that digital library projects have been about the cream of the crop, things that are easy to digitize. I have to agree. It’s always seemed to me that most digital library projects are really about digitizing a collection, not an entire library. Rich now asks, “Why not a truly ambitious project?” It’s hard to weigh a virtual library. The smallest public library has more content than most “digital libraries.” How many bytes are in the Library of Congress? It depends on how you measure. Ask about resolution, color depth, format, and compression choices. Costs are going down. Disks are cheap, as are digital imaging, broadband delivery, and labor. If you just do a text-only rendition (ASCII), you’re talking about 17 to 20 terabytes. There are lots of cost estimates out there. Rich thinks it could come down to a penny a page, maybe a nickel. We spend more money cataloging a book and putting it on the shelf than we do acquiring it. One option is to digitize everything, but only OCR when a book is requested.
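
To see why the answer "depends on how you measure," here is a rough back-of-the-envelope sketch in Python. Apart from the penny-to-a-nickel cost per page mentioned in the talk, every number in it is my own assumption, picked only for illustration: the collection size, pages per book, characters per page, page dimensions, scan resolution, and compression ratio are all hypothetical, and the text-only inputs were deliberately chosen so the result lands inside the 17 to 20 terabyte range quoted above.

```python
TB = 10**12  # decimal terabyte, in bytes

# Hypothetical collection, NOT figures from the talk.
BOOKS = 20_000_000
PAGES_PER_BOOK = 300
TOTAL_PAGES = BOOKS * PAGES_PER_BOOK


def text_bytes(total_pages, chars_per_page):
    """Plain ASCII text: one byte per character."""
    return total_pages * chars_per_page


def image_bytes(total_pages, width_in, height_in, dpi,
                bits_per_pixel, compression_ratio):
    """Scanned page images: pixels x color depth, then compressed."""
    pixels_per_page = (width_in * dpi) * (height_in * dpi)
    raw_bytes_per_page = pixels_per_page * bits_per_pixel / 8
    return total_pages * raw_bytes_per_page / compression_ratio


def scanning_cost_dollars(total_pages, cents_per_page):
    """Total cost at a flat per-page rate given in cents."""
    return total_pages * cents_per_page / 100


if __name__ == "__main__":
    # Assumed 3,000 characters per page.
    print("ASCII text:", text_bytes(TOTAL_PAGES, 3000) / TB, "TB")

    # Assumed 6 x 9 inch pages at 300 dpi with 10:1 compression.
    for label, depth in (("1-bit black and white", 1), ("24-bit color", 24)):
        size = image_bytes(TOTAL_PAGES, 6, 9, 300, depth, 10)
        print(label + ":", size / TB, "TB")

    # A penny and a nickel per page, the two rates Rich mentioned.
    for cents in (1, 5):
        print(cents, "cent(s) per page: $",
              scanning_cost_dollars(TOTAL_PAGES, cents))
```

With these made-up inputs the text-only version comes to 18 TB, while the same pages stored as images run to hundreds or thousands of terabytes depending on color depth. Change any assumption and the "weight" of the library changes with it, which is the point.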

The major barrier is rights management. The paradox of latent value: Obscure titles sit on a shelf and don’t deliver royalties, but authors still object to digitization. It’s cheaper to digitize everything than to figure out what’s the “good stuff.” The Google digitization project enhances preservation. Access is another benefit. He thinks the technology will improve. New standards: Open, XML-based. This will force the issue of large-scale rights management. Fair use is a balance. Many virtual library projects suffer from a paucity of real content. Let’s think big, let’s build a virtual library that’s really a library. We should do this because it’s hard.

Google’s print vision is clearly one that Rich buys into. He wants the digitization of the most important books in the corpus done by a forward-thinking company rather than the government. It shouldn’t be a TVA project. Why Google? Looks like a love fest to me. Google, says Rich, is smart, agile, innovative, shows no fear, is too young to be afraid, is worth $100 billion, and won’t do this alone. Microsoft’s in the game, as evidenced by last night’s announcement, blogged below by Paula, but only for a few hundred thousand books. Compare that to the billions that Google’s digitizing. “Think Big, Bill,” says Rich.

Now it’s Roy’s turn. He’s titled his half of the talk “Google: Catalyst for Digitization Or Library Destruction?” Great graphics on the slides. He thinks digitization is great. Is Google the devil or merely evil? He’s talking about scary monsters. The first one is copyright. Libraries have been shielded by fair use. Now fair use is in play in the courts. This could be bad for libraries if the court’s decision comes down on the wrong side. Second scary monster: Closed access to open material. Google Print won’t tell you about public domain access to books republished by other publishers. No link to a library, only to buy the book. Third scary monster: Blind wholesale digitization. This isn’t such a great idea. Large research collections aren’t weeded because libraries are judged by the size of their collections, even if it’s crap. “Blind, wholesale digitization is no more a good thing than buying books based on color.” He’s quoting himself here.

Fourth is ads. Content can bring eyeballs to Google and to Google ads. How long before we see ads for antidepressant medication next to Hamlet? Fifth is secrecy. The Open Content Alliance and University of California’s agreement was sent to LJ and Information Today yesterday by Roy. Rumor is that University of Michigan has the best agreement with Google, but we don’t know. Sixth is longevity. Like Enron and WorldCom, Google is a publicly traded company motivated by profit. If Rich’s comments were a love fest, Roy’s are definitely a Google bash. He’s going into some comparisons about what Google has in common with libraries. All he can come up with is that we’re both on planet Earth. Google is 7 years old, but Harvard Library is nearly 400 years old. Who do we trust with our intellectual heritage? Libraries or Google? Roy says, “Libraries (like, duh).”

Now Adam Smith from Google is invited on stage. His remarks are very brief, basically a feel-good message that Google loves libraries and its digitization program will help us. Question from the audience: “Tell us about your scanning robot.” It’s a rumor, according to Adam. “Are you scanning manually?” No comment. Someone else pushes the question, so Adam comes up with a partial answer. There are two scanning processes. One is destructive (that’s for the publisher program, where the publisher knows the book will be cut apart and has backup copies). The other is nondestructive (that’s the one they’ll use for libraries, and how they do it is proprietary). “What about privacy? What about the cookies from downloaded books?” It will be part of the normal privacy policy of Google. “Is there manual page turning for the library scanning project?” Google won’t comment. Ron Milne said last week at Internet Librarian International that Oxford’s books were being turned manually.

At the outset of Google Print, Google said what they were intending to do with snippets, but the press romanticized the program and misstated its purpose. Google will respect copyright. There’s a full text index, but the display for in-copyright books is only three snippets. That’s all you see. You don’t get the entire work. Adam says it’s tremendous from a discovery perspective. He also points out that the index today is almost entirely from publishers, not libraries. And publishers have negotiated the copyrights so that this is not a violation.

Question about search results display. “We’re limited to the 10 to 20 docs on a screen. How many items can one see at a time? Is Google looking at displaying hits in any other way?” Adam says they’re focused on getting more books into Google Print. “Once you have a large collection, then you can figure out how people interface with information. Then you can look at different interfaces.” How to make information more useful. Rich thinks that PageRank won’t work well as book rank. He wants knobs and dials. It will be interesting to see if Google does this.

Google’s digitization of library books may become an easy way to discover books you can’t get. “What steps is Google talking or fantasizing about to connect readers with library books?” Adam notes that the company is working with OCLC for WorldCat, and with SirsiDynix to link people up with a library. Outside the U.S., they’re working with national libraries. “We are working with publishers to provide greater access.” The publisher program will also evolve. Rich says Google is building the world’s largest Carnegie library, but people are complaining that they’re not building a bus system to get you to the library.

Who decides what snippets get posted? It depends on what your search terms are.

Liz Lawley comments that Microsoft is doing sliders, which are similar to the knobs and dials Rich mentioned. Microsoft Research publishes. You can go to its site and read the papers. Google is extraordinarily secretive. “How do you reconcile this with the notion you’re doing this for the good of humanity?” Adam ducks the question, saying he wasn’t around when Google set the policies so he doesn’t know.

Marydee Ojala
Editor, ONLINE: The Leading Magazine for Information Professionals
