Archive for October, 2005

What’s New In Search Engines

ITI Bloggers October 26th, 2005

The “Greg and Gary Show” went on this morning, as Greg Notess and Gary Price discussed the latest developments in search engines. They concentrated mainly on the “Big Four”—AskJeeves, Google, MSN, and Yahoo!—but they also briefly covered some of the lesser known search engines. Their slides with all the links are here.

Greg stressed that it’s important to look at several of the search engines and said that if you are only searching one of them, you are not doing a comprehensive search. Here are a few of the latest search engine features:
AskJeeves: It is vastly improved over what it was several years ago. The search results page shows a number of very useful links to help expand or narrow the search. Its cached pages carry the date and time that the page was cached, a very useful feature. One complaint about AskJeeves has been the large number of ads on its pages, but it has recently decreased the number.
MSN: It shows you the time of caching (date only). A useful feature is free access to the Encarta encyclopedia. In common with some other engines, MSN has a Virtual Earth page for displaying maps. A “Site Builder” button allows the user to choose advanced features and have the command automatically added to the search box, which is good for people (like me!) who never can remember the syntax of many of these features.
Yahoo!: Cached pages have a link to the Wayback Machine. Tabs on the search page allow the user to see many “verticals” or related searches. A limited amount of content from subscription databases is available in search results. Yahoo! now has a page to search blogs and also links to Flickr images. Yahoo! Mindset allows sliders to be set to control the importance of various characteristics of products when shopping. And results can now be sent to a cell phone, indicating Yahoo!’s recognition of the growing importance of mobile computing.
Google: Stock prices appear, if they’re available, on results pages for company searches. Google says it has a blog search capability, but it is really searching RSS feeds, not the entire content of blogs.
A9: Owned by Amazon, A9 has a number of features derived from the Amazon site, such as “Search Inside the Book” (which has data from more publishers than Google Print has and which also can show the 100 most frequently used words in the book to form a rudimentary concordance, a potentially useful feature). The user can indicate which types of content (books, images, etc.) to include in the search. A9’s map and local search feature has the very useful capability of browsing images of buildings along a given block of a street.
Exalead: The only search engine that allows true proximity searching and truncation. It also features automatic display of related search terms.
Gigablast: Provides a link to the Wayback Machine and also allows subset searching of retrieved information, including domains and paths.
Rollyo: Rollyo.com (short for “roll your own”) uses Yahoo!’s database, but incorporates subset searching, which Yahoo! does not.
RedLightGreen: Free access to the RLG Union Catalog.
Topix: A huge database of news, accessing 14,000 sources and organizing the data into over 200,000 topic groups. An RSS feed is available.
Findory: A personalized news search engine.

Clearly, there’s a lot going on in the search engine world, and keeping up with all the changes is a huge job. As Gary mentioned, sometimes changes and enhancements appear, only to disappear shortly afterwards. We can thank Greg and Gary for their Herculean efforts to keep us all on top of this rapidly moving field.

Don Hawkins
Columnist, Information Today




IL05 Tops on Flickr’s Hot Tags List

ITI Bloggers October 26th, 2005

Whoohoo! The IL05 tag just hit the Hot Tags list on Flickr! As Steven Cohen would say, "Librarians rock!" Or at least we are prolific bloggers and Flickr users! Bloggers are using the IL05 tag for both Flickr and Technorati posts. It’s turned out to be a great way to see what’s happening around the conference from many different vantage points.

Nancy Garman
ITI, ngarman {at} infotoday(.)com
Technorati/Flickr Tag:



Suggestion for IL06

ITI Bloggers October 26th, 2005


Overloaded electrical outlet! We’re working on more power strips for IL06, scheduled for Oct. 23-25, 2006, in Monterey.


IL attendees clustered around an electrical outlet. Need juice to blog!
Jane Dysart, Program Chair



Google’s Adam Smith

ITI Bloggers October 26th, 2005


Adam Smith frequently stepped to the end of the speaker’s platform this morning when he answered questions.

Dick Kaser
ITI V.P., Content





Dualing Keynotes…And a Third

ITI Bloggers October 26th, 2005

“Google Print: Making the Virtual Library Real” is the title of Rich Wiggins’ half of this morning’s dual keynote. Great slide on why Google should never be bought by Microsoft: Clippy might suggest Encarta instead of Wikipedia.

This is part of an extended conversation that Rich and Roy Tennant have had over many years. Rich thinks that digital library projects have been about the cream of the crop, things that are easy to digitize. I have to agree. It’s always seemed to me that most digital library projects are really about digitizing a collection, not an entire library. Rich now asks, “Why not a truly ambitious project?” It’s hard to weigh a virtual library. The smallest public library has more content than most “digital libraries.” How many bytes are in the Library of Congress? It depends on how you measure. Ask about resolution, color depth, format, and compression choices. Costs are going down. Disks are cheap, as are digital imaging, broadband delivery, and labor. If you just do a text-only rendition (ASCII), you’re talking about 17 to 20 terabytes. Lots of cost estimates out there. Rich thinks it could come down to a penny a page, maybe a nickel. We spend more money cataloging and putting a book on the shelf than we do in acquiring it. One option is to digitize everything, but only OCR a book when it is requested.

Major barrier is rights management. The paradox of latent value: Obscure titles sit on a shelf and don’t deliver royalties, but authors still object to digitization. It’s cheaper to digitize everything than to figure out what’s the “good stuff.” The Google digitization project enhances preservation. Access is another benefit. He thinks the technology will improve. New standards: Open, XML-based. This will force the issue of large-scale rights management. Fair use is a balance. Many virtual library projects suffer from a paucity of real content. Let’s think big, let’s build a virtual library that’s really a library. We should do this because it’s hard.

Google’s print vision is clearly one that Rich buys into. He wants the digitization of the most important books in the corpus done by a forward-thinking company rather than the government. It shouldn’t be a TVA project. Why Google? Looks like a love fest to me. Google, says Rich, is smart, agile, innovative, shows no fear, is too young to be afraid, is worth $100 billion, and won’t do this alone. Microsoft’s in the game, as evidenced by last night’s announcement, blogged below by Paula, but only for a few hundred thousand books. Compare that to the billions that Google’s digitizing. “Think Big, Bill,” says Rich.

Now it’s Roy’s turn. He’s titled his half of the talk “Google: Catalyst for Digitization Or Library Destruction?” Great graphics on the slides. He thinks digitization is great. Is Google the devil or merely evil? He’s talking about scary monsters. The first one is copyright. Libraries have been shielded by fair use. Now fair use is in play in the courts. This could be bad for libraries if the court’s decision comes down on the wrong side. Second scary monster: Closed access to open material. Google Print won’t tell you about public domain access to books republished by other publishers. No link to a library, only to buy the book. Third scary monster: Blind wholesale digitization. This isn’t such a great idea. Large research collections aren’t weeded because libraries are judged by the size of their collections, even if it’s crap. “Blind, wholesale digitization is no more a good thing than buying books based on color.” He’s quoting himself here.

Fourth is ads. Content can bring eyeballs to Google and to Google ads. How long before we see ads for antidepressant medication next to Hamlet? Number 5 is secrecy. The Open Content Alliance and University of California’s agreement was sent to LJ and Information Today yesterday by Roy. Rumor is that University of Michigan has the best agreement with Google, but we don’t know. Sixth is longevity. Like Enron and WorldCom, Google is a publicly traded company motivated by profit. If Rich’s comments were a love fest, Roy’s are definitely a Google bash. He’s going into some comparisons about what Google has in common with libraries. All he can come up with is that we’re both on planet Earth. Google is 7 years old, but Harvard Library is nearly 400 years old. Who do we trust with our intellectual heritage? Libraries or Google? Roy says, “Libraries (like, duh).”

Now Adam Smith from Google is invited on stage. His remarks are very brief, basically a feel-good message that Google loves libraries and its digitization program will help us. Question from the audience, “Tell us about your scanning robot.” It’s a rumor, according to Adam. “Are you scanning manually?” No comment. Someone else pushes the question, so Adam comes up with a partial answer. There are two scanning processes: one is destructive (that’s for the publisher program, where the publisher knows the book will be cut apart and has backup copies), and the other is nondestructive (that’s the one they’ll use for libraries, and how they do it is proprietary). “What about privacy? What about the cookies from downloaded books?” It will be part of the normal privacy policy of Google. “Is there manual page turning for the library scanning project?” Google won’t comment. Ron Milne said last week at Internet Librarian International that the pages of Oxford’s books were being turned manually.

At the outset of Google Print, Google said what they were intending to do with snippets, but the press romanticized the program and misstated its purpose. Google will respect copyright. There’s a full text index, but the display for in-copyright books is only three snippets. That’s all you see. You don’t get the entire work. Adam says it’s tremendous from a discovery perspective. He also points out that the index today is almost entirely from publishers, not libraries. And publishers have negotiated the copyrights so that this is not a violation.

Question about search results display. “We’re limited to the 10 to 20 docs on a screen. How many items can one see at a time? Is Google looking at displaying hits in any other way?” Adam says they’re focused on getting more books into Google Print. “Once you have a large collection, then you can figure out how people interface with information. Then you can look at different interfaces.” How to make information more useful. Rich thinks that PageRank won’t work well as book rank. He wants knobs and dials. It will be interesting to see if Google does this.

Google’s digitization of library books may become an easy way to discover books you can’t get. “What steps is Google talking or fantasizing about to connect readers with library books?” Adam notes that the company is working with OCLC for WorldCat, and with SirsiDynix to link people up with a library. Outside the U.S., they’re working with national libraries. “We are working with publishers to provide greater access.” Publisher program will also evolve. Rich says Google is building the world’s largest Carnegie library, but people are complaining that they’re not building a bus system to get you to the library.

Who decides what snippets get posted? It depends on what your search terms are.

Liz Lawley comments that Microsoft is doing sliders, which are similar to the knobs and dials Rich mentioned. Microsoft Research publishes. You can go to its site and read the papers. Google is extraordinarily secretive. “How do you reconcile this with the notion you’re doing this for the good of humanity?” Adam ducks the question, saying he wasn’t around when Google set the policies so he doesn’t know.

Marydee Ojala
Editor, ONLINE: The Leading Magazine for Information Professionals





Keynote Q and A

ITI Bloggers October 26th, 2005

Dale Vidmar, Southern Oregon University, asks a question of this morning’s keynote panelists Roy Tennant, Rich Wiggins, and Adam Smith. Conference Chair Jane Dysart holds the roving mic.

Dick Kaser
ITI V.P., Content





Calculating Minds…

ITI Bloggers October 26th, 2005


Click over to a Flickr collage, created by Michael Stephens, for more glimpses of the giant calculators that were ITI’s thank-you gift to this year’s Internet Librarian speakers. These 1970s icons are well on the way to becoming a cult item among certain bloggers, as you’ll see from the comments they’ve posted (also at this link). The Flickr tag is "librarians with giant calculators."

Nancy Garman
ITI, ngarman {at} infotoday(.)com
Technorati/Flickr Tag:



Search Engine Day

ITI Bloggers October 26th, 2005

If yesterday was “Blog & Wiki Day,” today is “Search Engine Day.” With a keynote by Roy Tennant and Rich Wiggins on “Google: Catalyst for Digitization? Or Library Destruction?”, followed by the “Greg and Gary Show” by Greg Notess and Gary Price on Search Engine Update, and extending all the way to the closing endnote by Stephen Abram on “Competing with Google: Library Strategies,” you can learn everything you want to know about search engines.

Here are Greg and Gary preparing for their session.

Don Hawkins
Columnist, Information Today




The Man Behind the Scenes

ITI Bloggers October 26th, 2005



Egads, a mouse! What’s this? A microphone?

Bill Spence is the dapper man you see slipping silently from room to room on winged feet — getting speakers’ slides loaded, and making sure the microphones, computer projectors, and Internet access are working correctly. If you hear a speaker call out "Where’s Bill?" in the middle of a session when something goes down, this is the Bill they’re looking for! He makes the speakers look good (at least when they aren’t fooling around putting large calculators in front of their faces), and keeps the conference sessions running smoothly.

Bill also collects all the presentations and gets them up on the web site after the conference. Check the Internet Librarian web site when you get home. We say two weeks, but he’s likely to have many presentations up much sooner.

And of course Bill’s the Schmi-Fi guy! Thanks, Bill!

Nancy Garman
ITI, ngarman {at} infotoday(.)com
Technorati/Flickr Tag:



Meanwhile, in San Francisco…

ITI Bloggers October 26th, 2005

While attendees in Monterey listened to the Tuesday evening panel musing over Google’s digitization project, a (rival) group called the Open Content Alliance was hosting its inaugural event in San Francisco. The OCA founding members include the Internet Archive (which will host the repository); Yahoo! Search; Hewlett-Packard Labs; Adobe Systems; the University of California; the University of Toronto; the European Archive; the National Archives (U.K.); O’Reilly Media, Inc.; and Prelinger Archives. (For background, see our NewsBreak, posted Oct. 3.) Last night, during the party of about 400 people, a big shoe dropped: Microsoft announced that it was joining the OCA and committed to kick off its support by funding the digitization of 150,000 books in 2006.

According to Lisa Picarille, our reporter at the event, an “impressive group” of people mingled first during a cocktail hour and had a chance to look at the eight demonstration “booths” that were set up (showing the Internet Archive’s Petabox storage, Searchable Books/Flipbook, print-on-demand, and more). Then, the Internet Archive’s Brewster Kahle took the stage and showed this presentation, which detailed the OCA’s vision of an Open Library and showed a book scanning demo.


The scanner scans one page at a time at 500 dpi, and scanning costs approximately 10 cents per page. The scanner is essentially a cradle that holds a book at a 90-degree angle. The operator has to turn the pages. It takes anywhere from 30 to 60 minutes to scan an entire book, depending on the size of the book. “The quality is astonishing to me,” Kahle said. “It was a real challenge to get true color.”

Kahle said that 80 percent of the books published between 1923 and 1964 are out of copyright and those are the books the OCA will focus on first. Next the group will expand the project to include orphaned books, where the publisher and author cannot be found. After that will come out-of-print works. Finally, the OCA is going to tackle in-print works. He called the overall effort “tricky but doable.”

The OCA is working with Lulu.com to create print-on-demand versions of the books that can be purchased for around $8. Users will also be able to listen to audio versions of the digitized books, provided by LibriVox.

The OCA partners each spoke briefly at the end of the presentation, with Microsoft’s announcement coming last. “We are excited to be the newest member,” said Danielle Tiedt, Microsoft’s General Manager of Search Content Acquisitions. “It’s a wonderful cause that is important to us. We are excited to make search more valuable by adding more content.”

In addition to joining the OCA, Microsoft’s press release announced that MSN Search will launch MSN Book Search, “which will support MSN Search’s efforts to help people find exactly what they’re looking for on the Web, including the content from books, academic materials, periodicals and other print resources. MSN Search intends to launch an initial beta of this offering next year.” Watch for a NewsBreak by Barbara Quint reporting the details of this news.

Paula J. Hane
News Bureau Chief
Information Today, Inc.
www.infotoday.com
phane {at} infotoday(.)com






143 Old Marlton Pike, Medford, NJ 08055-8750 | Phone: 609-654-6266 • Fax: 609-654-4309 • custserv@infotoday.com