Web Scale Information Discovery: The Opportunity, The Reality, the Future–An NFAIS Symposium

NFAIS held another one of its popular symposia on September 30, 2011: “Web Scale Information Discovery: the Opportunity, the Reality, the Future”.  There were about 80 attendees, 35 onsite and 45 virtually from as far away as Israel.  Generally used in academic libraries, discovery services provide users with single search box (Google-like) access to all of a library’s resources.  Using discovery services can greatly enhance access to a library’s collection.

This was an excellent overview of the current state of the discovery services market, featuring noted experts as well as representatives from the four current major players.  Not only was the current market described, but NFAIS-sponsored activities and a look into the future were also on the program.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor


The Opportunity

Judy Luther

Judy Luther, President, Informed Strategies, opened the workshop with an overview of the discovery landscape.  (She had authored an article on discovery services in Library Journal approximately a year ago, but much has changed since then.)

Online access to journal content progressed in the 1970s from printed indexes with controlled vocabularies to online services such as Dialog, which used Boolean logic to structure search queries. Initially, only abstracts and indexes were available, but full text was soon added. Many information providers produced CD-ROM versions of their databases for local access. With the coming of the Internet, access to full text became ubiquitous, many specialized services appeared, and Google entered the picture with its Google Scholar service.

End users have become used to content organized by format (databases of books, journals, or local content), but this arrangement does not help them. Users want to be able to run one search across all content. Students do not think of format at all, and some of them have lost the concept of a database of journals, thinking of information sources only by the database name. The idea of an information intermediary is also foreign to many of today’s students, who consider themselves able to use information services without help. The proliferation of databases (for example, Stanford University has 900 of them), each of which must be searched separately, means that many databases are infrequently used, so there is a significant need for discovery services.

Students want things fast and simple; libraries want to maximize the investment they have made in acquiring databases of information; and content providers want inexperienced users to access their information.  Discovery tools, which bring the user the experience of walking through the stacks, are geared towards browsing, which is a huge step forward because it lays out the landscape of the content.

Today’s four main discovery service providers each bring a variety of skills to the market. EBSCO and ProQuest have lots of experience with content; Ex Libris knows technology; and OCLC has extensive experience with metadata. Criteria for selecting a discovery service are the same as for most information products:

  • Scope and depth of content,
  • Richness and consistency of metadata,
  • Frequency of updates,
  • Ease of incorporating local information,
  • Simplicity of the interface,
  • Ability to customize the interface,
  • Support for mobile platforms, and of course
  • Cost and availability of funding.

We are still learning the role of discovery tools in the industry and will be for the next few years.

In the question period, the important issue of “good enough” search results was raised. What happens to the discovery environment if the industry standard becomes “good enough”?  Undergraduate students are under considerable time pressure so “good enough” information may be sufficient for them.  Writing a thesis requires much more specific and in-depth information, as do disciplines such as law and medicine.

The Players

In this segment of the workshop, each of the four current discovery service providers presented an overview of their offerings.

Summon™

John Law

John Law, Vice President of Discovery Services at Serials Solutions, observed students searching for information in their native environments and found that more than 90% of them did not know the resources that were available to them in the library. One of the major functions of discovery services is to make the library a useful starting place for research. Serials Solutions’ service, Summon, launched commercially in August 2009 and was adopted very rapidly, which validated that it filled a need.

 

[Figure: breakdown of Summon’s current customers]

Law said that many customers choose the Summon service because it meets users’ expectations.  It was not built on catalogs or library databases, which have only a small portion of the content that users want and do not scale well.  Summon was designed by web application developers, so it has an interface familiar to digital natives which lets them identify with the library.  The system’s objective is to bring the user back to the library, not to bring them federated search.  Content comes from multiple sources and is de-duped: when the same article is retrieved from more than one source, the metadata from each is brought into a single unified record—a capability unique to Summon.
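The record merging Law described can be pictured with a small sketch. This is only an illustration under stated assumptions, not Summon's actual matching logic: the field names, the `match_key` rule (DOI if present, otherwise normalized title plus year), and the `merge_records` helper are all hypothetical.

```python
from collections import defaultdict


def match_key(record):
    """Hypothetical match rule: use the DOI when present,
    otherwise fall back to normalized title plus year."""
    doi = record.get("doi")
    if doi:
        return ("doi", doi.lower())
    title = " ".join(record.get("title", "").lower().split())
    return ("title", title, record.get("year"))


def merge_records(records):
    """Group records that share a match key and fold each group
    into one unified record that remembers every source."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    merged = []
    for group in groups.values():
        unified = {"sources": []}
        for record in group:
            unified["sources"].append(record["source"])
            for field, value in record.items():
                if field == "source":
                    continue
                if field == "subjects":
                    # Subject terms from every source are combined.
                    subjects = unified.setdefault("subjects", [])
                    for term in value:
                        if term not in subjects:
                            subjects.append(term)
                elif field not in unified and value:
                    # For other fields, the first non-empty value wins.
                    unified[field] = value
        merged.append(unified)
    return merged


records = [
    {"source": "A", "doi": "10.1000/X1", "title": "Web Scale Discovery",
     "year": 2011, "subjects": ["Libraries"]},
    {"source": "B", "doi": "10.1000/x1", "title": "Web scale discovery",
     "abstract": "An overview.", "subjects": ["Libraries", "Search"]},
    {"source": "B", "title": "Another Article", "year": 2010},
]
merged = merge_records(records)
```

In this toy data the first two records share a DOI, so they collapse into one record that carries the abstract from one source and the combined subject terms from both. A real service would need far more careful matching; a record with a DOI and a copy of the same article without one would not be matched by this rule.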

Summon also points users to the library as the starting place for research, thus providing an immediate ROI for the entire collection. It works, people use it, and it has had a very significant impact on usage in libraries. Some libraries have seen as much as a 50% increase in the use of well-known titles, and their database usage has also increased substantially.

Summon has over 760 million records from more than 80 types of content. The Hathi Trust will provide over 9 million records to Summon, and book publishers are also supplying data, so for the first time, libraries can offer users the ability to search the content of the books on their shelves. Local collections can be easily incorporated and made discoverable. All the content is in one index, is treated equally, and can be searched at once. An API allows libraries to build their own interface on top of Summon. It has become a digital front door for libraries. Publishers are also seeing the advantages of discovery services because usage of their content is increasing in libraries that subscribe to Summon.

Primo

Carl Grant

According to Carl Grant, Chief Librarian at Ex Libris, their Primo discovery service was built with the needs of librarians in mind. Primo is layered on all Ex Libris products and can be locally installed or cloud-based. Over 860 sites, 291 of them in North America, are using Primo. Users see their results in a single view, with links back to the original source.

Primo Central, Ex Libris’s comprehensive consolidated index, can be used as a standalone hosted metadata source, or in conjunction with Primo. Hundreds of millions of items are indexed in full text.

Primo integrates with many other systems:  ILSs, digital repositories, XML databases, link resolvers.  It is designed to work with what a library has in place and can be customized to boost local content, thus allowing libraries to emphasize “what we offer”. Open APIs allow libraries or consortia to write their own extensions to serve local end users’ needs.  The source of each displayed record is shown, maintaining the content owner’s brand identity.

Primo subscribers have found that their library’s average number of search sessions increases dramatically (NYU found that usage tripled), but average session length dropped because users are quickly finding what they want, which is good news for librarians.

Other considerations include the importance of content neutrality and non-exclusivity of content in databases. It is important to recognize that discovery services are NOT content providers. The content provider’s branding is maintained, copyright statements are supported, and the discovery service can restrict access to the provider’s subscribers if necessary. Retrieved records in Primo are not de-duped. They are grouped but not merged, and links back to the source record are provided.

WorldCat® Local

Chip Nilges

Chip Nilges, Vice President of Business Development at OCLC, noted that the mission of its WorldCat Local service is to integrate library collections for consumers. Launched in 2008, WorldCat Local was built at the request of an OCLC member. It provides a single search of all library collections, integrated and intuitive resource discovery, and interoperation with existing library systems. Because its goal is comprehensive coverage of a library’s collections, it includes more than books. Over 14 million records from over 1,000 sources, amounting to more than 44 million data elements, have been integrated into the database.


When multiple retrievals of the same item occur, the “most robust” record is displayed. Usage reports show statistics and traffic from WorldCat Local to publisher sites. Full text indexing will be available in early 2012; staff and expert search views are under development; and the user interface is being enhanced.

EBSCO Discovery Service™

Sam Brooks

Sam Brooks, Sr. Vice President, Sales and Marketing at EBSCO Publishing, said that the motivation for developing the EBSCO Discovery Service (EDS) was to try to help libraries compete with Google and Wikipedia for the attention of end users. He quoted a recent OCLC report, “College Student Perceptions of Libraries and Information Services”, which found that only 30% of college students use library resources; they are using Google instead. A service based solely on full text searching is not much different from Google; the advantage of discovery services is subject indexing.

All discovery services load library catalogs and partner with non-journal vendors, but subject index providers are not working with the discovery services.  As the owner of several subject indexes, EBSCO knew why: subject index providers generally consider partnering with discovery services to be a risk to their bottom line.  There is no risk for full text publishers in working with discovery services because users do not come to full text vendors for the metadata.

A combination of subject indexing and full text searching is the best way to provide users with the best results.  The subject indexing is crucial, and a bias toward it is good and helps give better results.  EDS was developed not to replace subject indexes but to embrace them and gain additional value from them. EBSCO does not try to convince libraries that discovery services are a replacement for subject indexing; indeed, Brooks said that other discovery services may mislead users into thinking that subject indexes are not the best source for finding the desired information.

EBSCO and EDS will be featured on the cover of the upcoming Reference 2012 issue of Library Journal.

The Discovery Service Selection Process

Diane Bruxvoort

Discovery services represent a big commitment of people and funds for a library because they are one way of presenting a library to its community.  Diane Bruxvoort, Associate Dean for Scholarly Resources and Research Services at the University of Florida, described her experiences in selecting discovery services for an academic library and grouped the main issues into “7 C’s”:

  • Commitment can be bottom up or top down or it may be driven by a funding opportunity, but administrative support is crucial. Without it, it is impossible to move forward.
  • Costs can be met from new funds or reallocated ones.  They can also be met from the technology fee charged to students, which is a good use of those fees because discovery tools are used by many students and are very visible to them.
  • Choice: who chooses a vendor makes a difference. Sometimes the administration just makes the choice, but usually a task force is involved. The members of the task force must understand all aspects of the service and can include liaisons, librarians, catalogers (they understand metadata), or administrators (having an administrator on the task force is one way to sway the decision with the administration). End users can help with the decision if they know about searching.
  • Criteria: Lay out what matters at the beginning, but be willing to add new criteria or delete some. Be flexible, and avoid the laundry list if possible. Identify the top 5 criteria for your institution.
  • Coverage:  Who has what?  Lots of coverage is common with all discovery services, but it must be matched up with holdings.  Note that coverage changes rapidly.
  • Companies:  All of today’s discovery service providers have been in business for a long time, and all the products are good.  An institution’s previous experience with a company is important.
  • Calendar:  In an academic institution, new products are customarily rolled out at the beginning of the fall semester.  Be sure to allow enough time for the launch; it is a big change, and the librarians need time to prepare for a new product.  Instruction is important in libraries, and it is needed with discovery services.  For an example of how a discovery service is presented to its users, see the University of Houston’s library site.

The Reality: Implementation and Results

Even though discovery services are still in their early days, a number of libraries have acquired implementation experiences.

Demian Katz

Demian Katz, Library Technology Development Specialist at Villanova University, said that when Summon appeared as the first available discovery service, the decision was made to integrate Summon into VuFind, Villanova’s interface for discovery, which was being used to search its OPAC. The challenge was how to do so. Summon’s advantages were that it exposes a wide range of library resources in a single place, it has a simple interface, users do not need instruction to use it, and it is compatible with VuFind. Three implementation options were considered:

 

  1. Install Summon as a standalone service.  Although this was the simplest option, it would result in loss of functionality and might have relevance ranking problems.
  2. Dynamically merge results from Summon and VuFind.  This option would have been complex to implement and the system response would be slow.
  3. (the chosen option) Provide results from Summon and VuFind as separate lists.  This had the advantage of retaining the full functionality of both systems and was relatively simple to implement.  The results are shown side by side on the screen with many navigation options.
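The chosen option can be sketched roughly as follows. This is a minimal illustration and not Villanova's code: `search_side_by_side`, `fake_catalog`, and `fake_articles` are hypothetical names, and the two stand-in backends simply return canned strings rather than calling VuFind or Summon.

```python
from concurrent.futures import ThreadPoolExecutor


def search_side_by_side(query, search_catalog, search_articles, limit=10):
    """Send the same query to two independent backends and return
    their results as two separate lists, with no attempt to merge
    or re-rank across them."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both searches run concurrently, so the page waits only
        # for the slower of the two backends.
        catalog_future = pool.submit(search_catalog, query, limit)
        articles_future = pool.submit(search_articles, query, limit)
        return {
            "catalog": catalog_future.result(),
            "articles": articles_future.result(),
        }


# Hypothetical stand-ins for the local catalog and the article index.
def fake_catalog(query, limit):
    return [f"catalog hit {i} for {query}" for i in range(1, limit + 1)]


def fake_articles(query, limit):
    return [f"article hit {i} for {query}" for i in range(1, limit + 1)]


results = search_side_by_side("discovery", fake_catalog, fake_articles, limit=2)
```

Because each list keeps the ordering of its own backend, this approach sidesteps the cross-system relevance ranking problem noted for the other options, at the cost of showing the user two columns instead of one blended list.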

To see the Villanova implementation of Summon click here.

Gregg Silvis

Gregg Silvis, Assistant Director for Library Computing Systems, University of Delaware (UD) Library, noted that UD was the first production site for WorldCat Local. It went live in August 2008 after a complex installation process. Key issues were the integration of UD’s in-house-built OpenURL resolver and a well-maintained list of databases. It was a major conceptual change for the library staff to search a database of external resources; suddenly articles appeared in search results. The analytics provided by the system were very useful for seeing how users were interacting with the site.

 

 

Scott Anderson

Scott Anderson, Associate Professor and Information Systems Librarian, Millersville University, installed EBSCO’s EDS at the library. The Millersville library was a heavy user of EBSCO’s products, so the integration of EDS was straightforward because all the system parameters were preset. Not only did this save considerable time (installation took only a few days), but it also gave the library flexibility to build subsets of the content for each course.

EDS has been well accepted by the faculty and students.  It has administrative support because the staff of the Provost’s Office liked it.  Freshmen think EDS is like using Amazon, so they understand the idea of facets.  And it has resulted in increased usage of some subject databases that the library has been trying to justify.  EDS usage data was even used to avoid cancellations of some databases.  Millersville branded its implementation of EDS as “Library Search”, so in the eyes of the users, it has become the primary point of accessing all research content.

Erin Rushton

At Binghamton University, Ex Libris’s Primo service spurred the purchase of the underlying products as well because it could be used as a uniform discovery layer for the library collections and other digital collections.  The convenience of one vendor supporting all the services was seen as a plus.  Erin Rushton, Web Services Librarian, described the implementation of Primo.  Link resolver data were uploaded to provide availability information to users. System testing was done with the help of forms supplied by Ex Libris, and the system was customized for the local data that was added.  Primo has become the default search on the library home page.  After the system went live, little feedback was received from users, so it was assumed that they like it.

 

Content Provider Perspective

Bonnie Lawlor

Bonnie Lawlor, NFAIS Executive Director, reported on a survey of NFAIS members (mostly content providers) conducted a year ago to find out who was working with discovery services and what they had learned.  About half of the respondents were working with discovery services, and most regarded them as an opportunity because they offer broad exposure of content, improved searching speed, and better search results for users.  Some respondents were concerned about loss of brand identification, inaccurate usage statistics, and poor rankings.  A number of specific issues were identified by the respondents; the survey results are available here.

Lawlor is leading a task force that is developing a Code of Practice (COP) for discovery services. The COP will create an awareness and understanding of the issues, ensure full disclosure, provide guiding principles for contract negotiations, and list the rights and obligations of each player. It will be patterned on one derived in the 1980s for information gateway services. So far, 16 rights and obligations have been identified. When the draft COP is completed later this year, the providers will be invited to react to it.

The Future

Discovery services are putting themselves in the delivery chain of content, influencing what is exposed to users and which content gets used, so they are significant tools.  In the closing session of the day, the four service representatives presented their view of the future of discovery services.

Chip Nilges

Are discovery services a sustaining or disruptive innovation?  We will see lots more aggregation of content into centralized indexes which will drive the need for more filtering and create a demand for more vertical markets.  Shrinking budgets in libraries will create opportunities for new business models.  Rights, clearance, and subscription agents will come together.  More research will migrate to these services, and interface to library collections will become more complex.  Bringing the library into the social web will become increasingly important.  Abstracting and indexing services should be thinking this way as well.   This is an exciting time, and exciting times are scary!

John Law

Libraries can now have tools to meet user expectations, and they will be able to close the gap between information and users.  The number of volumes in a library is no longer a measure of its size.  It is hard to do coverage assessment, but discovery services should make that transparent.  Abstracting and indexing databases are more problematic.  Including indexing from an abstracting and indexing service in a discovery service does not necessarily mean that the information will be discovered.

A recent article from the Chronicle of Higher Education noted these changes in search:

  • Discovery services need to be agile.  We are in a very early stage.  It is exciting to be able to use a whole new technology platform.
  • Discovery services will have to move past the OPAC as their center.  Be careful not to be distracted by a long list of features—only a few users will use the advanced features.
  • Discovery services will need library expertise built into them.

Carl Grant

The next steps in discovery services are:

  • Personal relevance ranking.  How do we consider what the user is looking for?  What is the context of the user?
  • Open discovery processes and workflows.
  • Improved mobile interfaces.  We are not paying enough attention to this.  The growth of mobile is tremendous and will surpass desktop usage in the near future.  We must be on those content devices.  The world of apps is out of control.  Do they add functionality?  Many of them are quite basic.
  • Address the growth in e-book usage.

Needs currently not being addressed are:

  • A clearer difference between Google and discovery interfaces. Read The Filter Bubble, which describes how search engines tailor results to user behavior. We do not need to sell ads; we should make it clear to end users that the best way to get unbiased results is by using a librarian.
  • Many people like text-based learning, but there are also other ways of learning (see Apple’s GarageBand app for the iPad for a good example of such an interface). We need better support for users with different ways of learning.

Sam Brooks

Do users know that a record comes from an abstracting and indexing database? Such a record is not merged, is not available without authentication, and is not free on the web. There is a real bias toward subject indexing. Usage statistics go up when subject indexes are introduced.

The next step is for discovery services to market themselves appropriately.  There is much confusion in the market, which needs to be addressed.  We need to refute the notion that a discovery service can replace a major subject index.

Discovery services are also competing with Google and Wikipedia. Users think of Google News, Google Images, or Wikipedia, and look at them as legitimate places to get information. Students like Google and Wikipedia because they provide a single search box, real-time news, and a massive encyclopedia. Discovery services must find ways to coexist with them. EDS will shortly offer real-time news from major wire services such as AP, UPI, and PR Newswire. (AP, the most important newswire, is no longer available on Google News.) It is also building an unprecedented collection of high quality encyclopedias. Contracts have already been signed with 9 publishers, and more are in process.

 
