Archive | Online Information 2011

Big Changes Coming Next Year

Online Information 2012 will be at a completely new venue, the ICC ExCel Centre.

ICC Aerial View

The International Convention Centre (ICC) is a new venue in East London and is Europe’s largest convention center.  It will provide comfortable meeting space and much better connection between the conference and the exhibit hall than was possible at the Olympia.  The ICC is conveniently located only 1 mile from the London City Airport, which serves nearby areas in the UK and other parts of Europe.

Besides a new venue, the organizers of Online Information have promised other new features and an updated program.

I hope to see you there.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Search Engine Update

This session featured 3 search experts reviewing current trends and developments.  Marydee Ojala, Editor, ONLINE Magazine and long-time online searcher, led off with a presentation entitled “So Many Search Engines, So Little Time”.  Of course, the most popular search engine is still Google, but its relevancy is declining, there is no commitment to advanced search options, and it seems to be pulling back from features admired by information professionals.  Alternatives to Google are:

  • General web search engines.  Bing by Microsoft is the most familiar.  It features field searching and search refinement (i.e. advanced search).  Yahoo’s search is powered by Bing except in Japan and South Korea, and it remains a takeover target.
  • Specialty search engines concentrate on format (images, video, social media), or subject (news, science, business).  A variety of country search engines are available, such as Baidu (China), Yandex (Russia), and Naver (South Korea).  The Search Engine Colossus is an international directory of search engines.  Blekko has no spam and filters out results from content farms.  DuckDuckGo is known for its privacy because it does not save searches.  Exalead is a cloud-based site for enterprise search and has some advanced features such as soundslike and spellslike.  Topsy is now the only search engine for archival Tweets.

Many search engines feature databases of a variety of information types; for example, one can find databases of images, books, news, and maps on Google; images and finance on Yahoo; and travel, news, images, and video on Bing.  Flickr and Picasa are well-known image databases, which can be searched by image criteria such as color.  YouTube, of course, is the leading video search engine, but one can also find instructional videos from various universities as well as those from the Journal of Visual Experiments (JOVE).

  • Paid search engines are mainly the traditional ones such as Dialog, Factiva, LexisNexis, EBSCO, and ProQuest.  Some subject-oriented paid search engines are also available, such as those from STN International, whose flagship database is Chemical Abstracts.  In contrast to Google and some other web search engines, no SEO manipulation is done by these vendors, so results are very consistent.

Innovation in search continues, but it is happening at the margins and inside the enterprise.  Search algorithms are changed frequently.  (See the closing keynote session for a discussion of the future of search.)  Information professionals must constantly keep up with changes in search engines and be ready to switch search tools quickly.  This is time consuming, but it is necessary if we are to remain relevant.

Marydee closed by urging attendees to read The Filter Bubble.

Arthur Weiss, Managing Director, AWARE, continued Marydee’s theme and reviewed some specialist search engines for people, numeric data, and news.  He noted that although search engines may claim to search the deep web, they may be only using a web crawler to find material on the visible web.  True deep web search tools typically look for information not searchable by crawlers.

Weiss showed how a Google news search returns different results depending on whether one is logged in to a Google account or not.  When you are logged in to your account, Google knows who you are, your location, and any preferences you have set.  Several news search engines cater to business users, including Northern Light, Congoo, and Newsnow.  Silobreaker and Evri aggregate news and return results on a topic.  Silobreaker has a number of innovative features, such as a summary, headlines, and trend charts showing item frequencies.  Evri has more images than Silobreaker.

People search engines are either directories of names or searches for names in the context of articles.  Some of the second type include Pipl, 123People, and Yasni.  Pipl has a US bias; the other two are based in Europe.  Yatedo allows phonetic searches, searches based on links to other people, and other advanced options.  Jigsaw is a database of online business cards and actively solicits contributions of them.  Yoname searches people who are users of any of 27 social media sites.

Numeric searches can be difficult because much numeric data is presented in graphical format.  Data from official statistical sources is available in the Offstats database, and the Open Data Directory provides links to over 400,000 databases of numeric data on a wide range of subjects.  For scientific data, Wolfram Alpha is a good source; it presents data in tabular or graphical format.  Lexxe searches data by using a “semantic key” approach and also reports results in a chart.

Karen Blakeman, Trainer and Consultant, RBA Information Systems, looked at what search engines know about us.  The answer is “a lot”, so users must be well aware of this as they do their searches.  In particular, Google knows us very well and personalizes search results based on the user’s location, browser, search history, blocked sites, “liked” sites, etc.  Searches based on the user’s location attempt to return results relevant to the country, but they may return erroneous results because, for example, a company’s switchboard may be located in a different country.  This has implications because access to some sites is blocked outside their local region.

Panopticlick will test your browser configuration and report how unique it appears to be.  (The more unique it is, the easier it is to track unique information about the user.)
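The link between uniqueness and trackability can be expressed in bits: a configuration shared by one browser in N carries log2(N) bits of identifying information.  The sketch below only illustrates that arithmetic; the function name and the sample figures are assumptions, not Panopticlick's own code or data.

```python
import math


def identifying_bits(one_in_n: float) -> float:
    """Bits of identifying information for a configuration shared by 1 in N browsers."""
    if one_in_n < 1:
        raise ValueError("one_in_n must be at least 1")
    return math.log2(one_in_n)


# Hypothetical figures, chosen only to show how quickly the bits add up.
for n in (2, 1_024, 1_048_576):
    print(f"1 in {n:>9,} browsers -> {identifying_bits(n):.1f} bits")
```

The rarer the configuration, the more bits it reveals, and the fewer other browsers a tracker could confuse it with.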

Search personalization and localization may not be all bad for users; for example, it is useful if you need to quickly find a local restaurant or are researching companies in a particular country.  To explicitly search local listings, country versions of search engines are useful.  Several browsers have an anonymous searching feature that turns off saving of searches, personalization, etc.  You can also set your ad preferences in Google.

Facebook is notorious for making it difficult to delete material, and it keeps material even when you think you have deleted it.  Europe v. Facebook is a collection of complaints against Facebook and instructions for residents of Europe to request their data from Facebook under EU privacy laws.

In the news area, Google can seriously damage search results.  Mary Ellen Bates recently did an experiment where she asked several searchers to enter the term “Israel” and send her the results.  The results were startling:  More than 25% of the stories were retrieved by only one searcher, and only 12% of the searchers saw the same 3 stories in the same order in their results.  Google’s recently introduced “Standout” feature to tag content will make the situation worse.

So what should a searcher do?  You can reject cookies, but then many searches will not run.  Active management of cookies is possible, but it is time consuming. provides an anonymized interface to Google, but it is for web search only.  DuckDuckGo and Blekko do not keep web history or personalize search results.

Here are Karen’s recommendations in this uncertain and sometimes scary search world.

  • You have some control over personalization, so damage limitation is sometimes possible.
  • Sometimes a web search history is a convenience and personalization is a good thing.  You must make this decision.
  • If you have a Google or Bing account, be sure to log out of it when not using it.
  • Regularly check your dashboard, privacy settings, and ad preferences.
  • Clear histories if you do not need them.
  • Remember that if you delete all cookies, you will lose your opt-out preferences.

This was an information-packed session and one that all information professionals should look at.  You are certain to find something of interest!

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Where Next For Search?

Steve Arnold moderated his traditional closing panel discussing the future of searching.  The format was the same as in the past:  A series of questions from Steve with answers from the panelists who were:

  • Gregory Grefenstette, Chief Science Officer, Exalead
  • David Milward, Chief Technology Officer, Linguamatics
  • Dave Patterson, CEO, Sophia

(L-R) Gregory Grefenstette, Steve Arnold (Moderator), David Milward, Dave Patterson

Steve introduced the panel by noting that the participants all represent successful companies gaining in sales. 2012 will be a very challenging year for all companies; the changes in technology that are now underway are as dramatic as we have ever seen. Social change in findability is uncovering a growing interest in using people to find answers.

Below is an edited transcript of the conversation.

1. What is the major trend in enterprise search and content processing for 2012?

Greg: Enterprise search is different from web search because lots of information is in unstructured sources. The current trend is to include both internal and external information in searches, so there is a growing body of linked data available. People want to know everything about what is going on both inside and outside the enterprise.

David: Content is getting more diverse. The number of people available to sift through the data is getting smaller, so text mining and other technologies are necessary to analyze the data, and they are being used in a hidden way behind the scenes. With more automation, keeping things up to date is important. Text mining provides interactive information and can be used like a search tool, so we can combine it with search.

Dave: Our focus is on content, understanding its meaning, and letting organizations leverage more value from their content. Discovery is very important. People want to find out what they do not already know about and link it with information they have. Cloud-based computing will be prominent for the next few years.

Steve: The focus is dramatically different from the basic retrieval we had in the last few years.  We have heard that Microsoft has made its search system available without charge as part of a bundle to large corporations, so in effect, search is free. Lucid Imagination is creating a free search system, and pricing pressure is increasing to the point where basic search has become a commodity.

2. In an environment of low cost systems, what is the value of a commercial system that costs much more?

David: We do not compete directly against search tools. Text mining finds indirect relationships, which you cannot do with a search tool. Information professionals want to use text mining to give added value over what end users would get. Terminologies are difficult and costly to create. Text mining tools are used with the technology.

Dave: Many free search tools are limited in functionality. Just finding some hits is no longer sufficient. If all you want is a flat list of documents, then the free tools are fine, but most enterprises want search tools that understand the content and leverage its value. Tools at the low end of the market will not return that value. Many organizations do not appreciate the time it takes to build something in-house using open source tools.

Greg: It is nice to have free search, and the tools are good at asking a question and getting an answer. We build applications where search is in the background and can connect to many resources. Another added value is the semantic processing we provide which is not available in the free tools. Modifications and fixes are either not available for the free tools or cannot be done quickly.

3. What is the impact of apps on search and findability and content processing?

Dave: We need to be more innovative in the way we present information to users. We cannot present results lists on the small screens of mobile devices, which puts more pressure on the intelligence of the search tool.

David: As you get to smaller devices, you need more understanding of the text and whether you can repurpose it differently. Faceted search and text mining allow us to structure the information and allow people to navigate through it. We may see more emphasis on push services that give people information wherever they are.

Greg: Users’ expectations are being raised; they want instantaneous response time, ease of use without training, and 24 hour availability. This is a challenge and Exalead has a solution. Mashup systems allow you to present information to different devices.

4. How is cloud reliability going to be addressed?

Greg: Search engine technology allows you to have constant availability.

David: We initially thought a cloud service would be interesting to small companies, but we found after launch that big companies were also interested. People want to concentrate on core competencies. They do not want to worry about keeping external services up to date and want someone else to do that for them.

Dave: There are issues of security and reliability. Are there any statistics that in-house servers are more reliable than cloud ones? Are we panicking as a result of the massive failure of BlackBerries on October 10-12? (Google “Blackberry outage” for more information.) Security is also an issue. Companies protect their data as much as possible and are reluctant to put their data on the cloud. Employee behavior is a bigger risk to companies; laptops are frequently lost.

5. In the search and business intelligence space, former search companies are repositioning themselves and now say they offer predictive analytics or customer service. To an old person, this is search. Are these new terms helping or hurting?

Greg: Search-based applications can be very varied. There are many semantic technologies in search engines that allow these different applications.

David: It is always good to describe what you are doing to solve a problem instead of calling it just search. There is lots of subtlety in language.

Dave: We are firmly focused on content and understand its meaning as well as new and better ways of helping people deliver it. Semantics is an example of true business intelligence. We need natural language processing to understand what we are doing. There are many ways of extracting meaning from unstructured business information.

This was an excellent note for the conference to conclude on.  The old standby, search, continues to change as technology advances.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor


E-books Unleashed

So popular and widespread have e-books become that it would be rare to find a major information industry conference these days that lacked at least one session on them.  I attended the “E-Books Unleashed” session which had talks by the presenters shown below that highlighted some recent e-book developments.

(L-R) Anders Mildner, James McFarlane, Giulio Blasi, Mary Joan Crowley

John Akeroyd, Director, Information Reports and Honorary Research Fellow at the University College London and session moderator, introduced the session with the following sales data that shows that the recent explosion in e-book sales continues unabated.

Anders Mildner, a journalist from Sweden, keynoted the session with a presentation on “E-Books, Reading and Culture: What Change Can We Expect?”

People have been reading aloud for hundreds of years, and listening has become a shared experience and a part of culture. The public became used to being passive, but 10 years ago, the emergence of social media changed this when people began to create their own media.

Social media deals with social objects, and when they emerge, we see a shift in value. For example, music only has a value when it is being shared, and with the rise of music downloading and sharing services, we see many music stores closing down.

E-books will create a shift from a passive listening culture to one of participatory reading, so reading will become a shared experience. Books are entering the world of remixable objects, where they can be cut, pasted, and shared. This is shifting power away from those who hold it today and poses an economic threat to the producers. The value of the printed book in economic and cultural terms will decrease as we become surrounded by digital books.

We saw a similar process in the music industry. People care more than ever about music but less about the medium on which it is delivered. Libraries are now making deals with publishers to be able to lend e-books, increasing the value of the book as a social object.

Libraries and librarians are facing an entirely new challenge, but we should be grateful. For the first time, we can re-define reading and do it together. The promise of the future is that we are able to engage in reading more deeply than ever before.

James McFarlane, CEO of Easypress Technologies, began his presentation, “Beyond the EPUB3 e-Book”, with a brief historical overview.  The first serious attempt to produce e-books was on the Apple Newton in 1994. Only 2 books were available: the Bible and the Concise Encyclopedia Britannica. The Newton was backlit and had a short battery life. It did not succeed because there were better devices available. Then Jeff Bezos developed the e-Ink device in 2004, and the Kindle was born. Bezos said he would have “every book ever printed in every language available in less than 60 seconds”.  Then came Apple’s iPad which has transformed the e-book and multimedia worlds. We are now seeing many competing tablets appearing frequently. Harry Potter books will soon be available as e-Books, and 100 million downloads are expected. (There will be 7 books in 68 languages with video clips, audio, games, and other related products).  This will transform the market yet again.

Easypress has developed a way to convert files from Quark to e-books in a couple of minutes. That is EPUB2, which is available today. The goal of EPUB3 is to develop interactive books that go beyond the reading experience. The first EPUB3 reader will not be on the market until next year, and it will have many new features, including indexing, searching and navigating, video and audio, multi-column formats, and active hyperlinking.

A significant problem for e-books is indexing. 83% of print books have an index, but virtually no e-books do because the concept of a page does not exist in an electronic publication. So the index must be redesigned.  Searching and navigating e-books is also problematical.  In a future design, e-book indexes will have active links so that when you click, you jump to the most relevant section of the book, with the selected term highlighted. Then you can click on a search bar to get to related subjects. Today, we have only simple character-based searching which is not very useful.

Many people just want to find something to read. There are thousands of sites to help readers discover e-books.  The next generation of iPads will permit opening several books simultaneously, so readers will be able to jump back and forth between them.

There are about 1.5 million e-books available today. Searching, navigating, and discovery will be highly necessary as that number grows. Keyword navigation, semantic referencing, and sentiment analysis will be used to help us move from simply finding some facts to discovery.  EPUB3 will allow us to move beyond the book experience in ways we have yet to imagine.

In his talk on “Digital Lending Models for Public Libraries”, Giulio Blasi, CEO of Horizons Unlimited, described the Media Library Online (MLOL), a digital lending aggregator serving 2,300 public libraries in Italy with open access and print contents and including several different lending models.

Libraries are currently giving away music legally; is this possible with eBooks? “Social DRM” allows a single user to download the book and keep it forever.

The MLOL offers 4 major services:

  • Shop: libraries manage their collections and get e-book backfiles,
  • Customized library portal: a unified interface for searching, browsing, discovering,
  • API: Integrates with OPACs, external authentication systems, and includes next-generation features allowing embedding the contents of libraries wherever they want from the web, and
  • Cooperation: creation of consortia around any content collection. Service at national level allows any library to cooperate with any others.

Mary Joan Crowley, Librarian, Sapienza University of Rome, noted that for 600 years, the library has guaranteed the organized delivery of printed and other content, provided trained staff to support teaching and research needs of universities, and ensured that the necessary physical facilities were available.   But then that all began to change. Rising costs, networked environments that the library could not compete with, declining usage (people never start their searches from the library building and frequently not from the library website), and generational shifts (students want information immediately) have caused a significant change.  65% of a library’s budget typically goes for journals, squeezing out books. How can we compete with huge information providers?

We have a great brand and must find out how to evolve it into the 21st century. We may not be courageous enough to make the shift. Even though we have a service ethic, good reputation, and the trust of our community that has been built up over centuries, people will not come to the library any more.  We need to manage our resistance to change and direct it towards the critical application of new skills.  One way to do this is to engage our users by going where they are, joining their conversation, and figuring out how to find the information they want.  We must be content providers, not just suppliers.

The academic library is at the heart of the modern university, providing access, support, and services to its users. An e-book reader project with lesson-based content was started to serve customers not physically at the library. Although the e-book market is in its infancy, users are not; they are accustomed to downloading lots of content, content is evolving, and they want untrammeled access to all types of media. The reader project was successful; users became part of the conversation.  As the library adapts to the changes in the value chain, it can provide incremental value in the ways shown here.

Crowley concluded her talk by showing the following adaptation of Ranganathan’s laws.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

The @LIS Tweeting Team

As is common in many conferences these days, Twitter was widely used.  Each conference room had its own Twitter hash tag, which allowed a “Twitter Moderator” to relay questions from the Twitter feed to the speakers.  Large monitors in the hallways of the conference center displayed the feeds for easy reading.

At the e-books session on Thursday morning, I found myself sitting behind two Tweeters.

They turned out to be Stephanie Kenna (at left below) and Hazel Hall (right), from the Library and Information Science Research Coalition.  Stephanie was a Twitter Moderator for the session.

Stephanie Kenna (L) and Hazel Hall (R)

You will probably not be surprised to learn that many interesting opinions surfaced in the Twitter streams.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Delivering Value Through Open Data

Open Data Panel: (L-R) Joy Palmer, Andy Powell, Stephen Dale (Moderator), Jared McGinnis

Jared McGinnis, Research Manager, Semantic Technologies, began this session by describing semantic approaches to news at the Press Association (PA, a wire service for the UK and Ireland).  The PA produces 1 million text stories a year, including detailed sports data for every game, and has an archive of 350,000 photos and 50 million video clips.  It will be the official news service for the 2012 Olympics.  Semantics is at the heart of its strategy; it has between 6 billion and 10 billion RDF triples.

McGinnis listed the challenges in dealing with semantic news.

Another challenge is in capturing the semantic metadata from journalists without them having to manually enter the data.  This is important to the PA because it reduces costs, allows the content and metadata to be separated, thus providing a higher level of abstraction and better and more timely services.

The PA is not a technology company, but it is contributing to the data cloud by covering every topic, which is a major impact. The PA’s reputation is “fast, fair, accurate”, so its products must be unbiased.  Using semantic technology in its products increases the sustainability and feasibility of the Semantic Web and enhances the development of standards and the community.

The architecture is not dependent on a single vendor because the content and metadata are separated. New formats can be integrated flexibly. The PA uses an XML database on a Mark Logic platform. Concept extraction is used to suggest metadata terms to journalists, who then select the appropriate ones for the articles they write. The result is a human-like quality of metadata terms with 90% accuracy.  Here are the advantages of this strategy.

Strategic benefits

Through metadata management, one can capture a relationship between people and locations even though they are not explicitly mentioned in the story, which greatly enhances retrieval as well as creating navigation and SEO advantages.

The PA has created a Simple News and Press (SNaP) ontology for the news industry that provides relationships between terms and creates a basis for sharing. It allows mapping between sets of data regardless of originator, provided both sets are created with the same standards. New products are easy to build because they have a shared view of how data is stored.

Andy Powell, Research Program Director, Eduserv, described a project he did for a Resource Discovery Task Force to develop some metadata guidelines for libraries, museums, and archives.

A draft proposal was developed using the Linked Open Data Star Scheme (at 3-, 4-, and 5-star levels) to suggest 3 approaches: community formats, RDF data, and Linked Data. From the 196 comments received on the draft proposal, the guidelines were re-conceptualized, and the 5-star level was adopted, which will provide a rich semantic framework for the metadata and allow easy use of other people’s ontologies.

The lessons learned in this project were:

Joy Palmer, Senior Manager, Resource Discovery Services, at the University of Manchester described a vision for a ‘virtuous’ flow of metadata across the web (a metadata ecology for UK education and research).  The discovery process is as much about cultural change as technology. There is a new way that the web works and users behave, and the ecosystem is about creating healthy relationships between the various components, thriving on collaboration and cooperation of stakeholders.

Making data open and reusable means getting the legal issues right. Your data is not open unless it has an explicit open licensing statement. You cannot ‘de-risk’ open, and open also means being open to machines.

The principles of open data have now been developed; during the next year they will be implemented. One of the cornerstones will be the creation of case studies of services and outcomes in libraries, archives, and museums. The case studies will look at terms of use, data characteristics, interfaces, and services and sustainability.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor


New Frontiers in Information Management: Linked Data

Linked Data Panel (L-R): James Howard, Neil Wilson, Richard Wallis

This session, entitled “Linking It All Together: Discovering the Benefits of Connecting Data” explored linking large data sets and why one would want to do that, and included reports on two interesting case studies: one on the British National Bibliography, and the other on how the BBC uses large data sets in its coverage of sports as well as its preparations for the 2012 Olympics to be held in London next summer.

Richard Wallis, Technology Evangelist at Talis, discussed why you would want to link data.  He noted that large datasets are becoming more prominent; for example, the Library of Congress has 147 million assets which occupy 147 terabytes of storage.  In 2011, 1.8 zettabytes (a zettabyte is 1 billion terabytes) of data will be created, and this is expected to increase to 7.9 zettabytes by 2015.  This is equivalent to 1.8 million Libraries of Congress.  In such a mass of data, the problem is how to find something, but that is not usually the ultimate goal. This is not a human-scale challenge! Machines must get involved, and we must make it easy for them to do so. We need to identify the “things” we are referencing. Linked data will help us do that because it builds on semantic web standards and is about identifying and linking “things”.
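For a sense of scale, the figures above can be turned into a back-of-the-envelope calculation.  The sketch below assumes only the numbers quoted in this report (147 terabytes for the Library of Congress, 1 zettabyte = 1 billion terabytes); the function name is made up for illustration.  With these inputs the ratio comes out in the tens of millions, so the “Libraries of Congress” comparison quoted in the talk presumably rests on a different estimate of the Library's size.

```python
TERABYTES_PER_ZETTABYTE = 10**9   # 1 zettabyte = 1 billion terabytes, as stated above
LIBRARY_OF_CONGRESS_TB = 147      # storage figure quoted in the talk


def libraries_of_congress(zettabytes: float,
                          library_tb: float = LIBRARY_OF_CONGRESS_TB) -> float:
    """How many collections of `library_tb` terabytes fit into `zettabytes`."""
    return zettabytes * TERABYTES_PER_ZETTABYTE / library_tb


for year, zb in ((2011, 1.8), (2015, 7.9)):
    millions = libraries_of_congress(zb) / 1e6
    print(f"{year}: {zb} ZB is roughly {millions:.1f} million collections of 147 TB")
```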

To identify something, we must put a label on it and categorize it. Things are identified with Uniform Resource Identifiers (URIs), which look like web addresses. Things have attributes that can be linked together so a human can understand them, as in this example describing a spacecraft.

The URIs are shown at the left of the photo, and a complete set of them is called Resource Description Framework (RDF) and is one of the standards that linked data are built on. RDFs are expressed as “triples”: a Thing has Properties, each of which has a Value.  Regardless of where a Thing is found, the same identifier is used to identify it, which allows data from different sources to be easily linked.
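The triple idea can be sketched in a few lines of plain Python.  This is a minimal illustration, not the RDF standard or any particular library: every URI, property name, and value below is invented, and the simplification of one value per property is an assumption made to keep the example short.

```python
from collections import defaultdict

# Each triple is (thing, property, value).  All URIs are made up for illustration.
SPACECRAFT = "http://example.org/thing/spacecraft-1"
AGENCY = "http://example.org/thing/agency-1"

source_a = [
    (SPACECRAFT, "http://example.org/prop/name", "Example Probe"),
    (SPACECRAFT, "http://example.org/prop/launchYear", 1969),
]
source_b = [
    (SPACECRAFT, "http://example.org/prop/operator", AGENCY),
    (AGENCY, "http://example.org/prop/name", "Example Agency"),
]


def describe(triples):
    """Group triples by the thing they describe (one value per property, for brevity)."""
    things = defaultdict(dict)
    for thing, prop, value in triples:
        things[thing][prop] = value
    return dict(things)


# Both sources use the same identifier for the spacecraft, so linking them
# needs nothing more than putting the two lists of triples together.
merged = describe(source_a + source_b)
for prop, value in merged[SPACECRAFT].items():
    print(prop, "->", value)
```

Because the operator is itself identified by a URI, a program can follow that value to the agency's own description, which is the “linking” part of linked data.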

These are the general principles for using linked data.

Linked data facilitates sharing and is easy to use because it is built on web standards.  The barrier to using other people’s data is low.  It is important to understand that not all linked data is open because it is being used in enterprises.  And not all open data is linked. Linking open data liberates its value and helps others discover it.  A linked open data community has grown up, and many organizations are using it.

All of Wallis’s slides from this presentation are available online.

Neil Wilson, Head of Metadata Services at the British Library, described how the British Library is creating the linked open British National Bibliography (BNB).  He noted that McKinsey has predicted that the benefit value of open public data could be as much as €250 billion.  Libraries, a source of trusted information, will find many benefits from linking their data and making it available:

Benefits of Linked Data for Libraries

The British Library is meeting the challenge as part of its 2020 vision:

British Library Open Metadata Strategy

So far, it has signed agreements with over 450 organizations in 71 countries to cooperate in offering free data services and has produced and supplied three 15-million XML datasets under a Creative Commons License.

As part of its linked open data initiative, the Library has produced an open version of the BNB, which is a description of UK published output.  The reasons for this project were to:

  • Advance the discussion of linked data from theory to practice by releasing a critical mass of data,
  • Show commitment by using a core dataset, and
  • Create a service that others could build on.

The data were released under a CC0 license (the least restrictive) and hosted on a platform developed by Talis.  Working with existing tools provided staff and organizational development opportunities. Mentoring and training were done by Talis staff. The project involved matching and generating links, then embedding them into the metadata. MARC records were converted to RDF/XML using a series of automated steps. This resulted in a dataset of 250K records with 80M unique RDF triples.
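To give a feel for what one such automated step might look like, here is a minimal Python sketch. It is not the Library's actual pipeline or data model: the record is a simplified stand-in for a MARC record (real records have indicators and subfields), the control number and base URI are made up, and the mapping of MARC tags 100 and 245 to Dublin Core properties is an assumption chosen for illustration.

```python
# Simplified stand-in for a MARC record: tag -> value.
record = {
    "001": "012345678",            # control number (made up)
    "100": "Austen, Jane",         # main entry, personal name
    "245": "Pride and prejudice",  # title statement
}

# Hypothetical base URI and an assumed tag-to-property mapping.
BASE = "http://example.org/bnb/"
TAG_TO_PROPERTY = {
    "100": "http://purl.org/dc/terms/creator",
    "245": "http://purl.org/dc/terms/title",
}


def marc_to_triples(rec):
    """Turn one simplified record into (subject, property, value) triples."""
    subject = BASE + rec["001"]
    triples = []
    for tag, prop in sorted(TAG_TO_PROPERTY.items()):
        if tag in rec:
            triples.append((subject, prop, rec[tag]))
    return triples


for triple in marc_to_triples(record):
    print(triple)
```

A real conversion would also replace literal values such as the author's name with links to shared identifiers, which is the matching and link-generation work described above.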

This project showed that legacy data was not designed for this process, so care had to be taken with data modeling and sustainability.  They also found that there are often tools or expertise readily available, and the effort to find them pays off and prevents reinventing the wheel.  In all such projects, hidden issues will surface, but it is better to release the results early on, even if they are imperfect, and improve them as time proceeds.  The learning curve can be steep; using pre-existing tools will save development time and assist in data evaluation.  The effort expended to produce the BNB has resulted in significant benefits:

Based on the results of the initial BNB project, further material will be released, and the data model will be revised.  New sources to link to will be identified, and monthly updates will occur.

In closing, Wilson urged anyone contemplating a similar journey to do it.  Even though mistakes will be made, the lessons learned will benefit everyone.

James Howard, Executive Product Manager at the BBC, finished the session with a presentation on “Preparing for the Olympics and Beyond: Metadata, Tagging, and Lots of Sport”.  The BBC Sport site is now 11 years old, and many changes have occurred.   Over time, approximately 320 manually managed pages had been created, but staff resources had not increased.  New sports teams and organizations had emerged, and because of resource limitations, information on them could not be effectively integrated.

Beginning with the 2010 Winter Olympics, an aggregated index for each of 15 top-level sports was created.  For the 2010 South Africa World Cup, a page for every team, group, and player was created, as well as an “event ontology”. The BBC data was linked with external suppliers’ data, but this process incurred too much manual overhead.  FIFA, the organization managing the World Cup, has an identifier for every team and player. Joining these data together will cut costs and maximize publication. Content repositories are separated from the ontology to enrich user experiences. Journalists were asked to tag specific areas; the tags were used to populate the indexes. The challenge was to get enough relevance from data that someone else has tagged.  Howard said that it was important not to let the developers near the tagging tools.
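The tag-driven approach can be sketched in a few lines of Python. Everything here is hypothetical (the tag format, the story headlines, and the supplier fields are invented for illustration and do not describe the BBC's systems); the point is only that once journalists' tags and a supplier's records share an identifier, index pages can be assembled without manual work.

```python
from collections import defaultdict

# Hypothetical stories, each tagged by a journalist with shared identifiers.
stories = [
    {"headline": "Spain win the World Cup", "tags": ["team:spain", "team:netherlands"]},
    {"headline": "Spain squad announced", "tags": ["team:spain"]},
]

# Hypothetical supplier data keyed by the same identifiers.
supplier_data = {
    "team:spain": {"group": "H"},
    "team:netherlands": {"group": "E"},
}


def build_pages(stories, supplier_data):
    """Populate one index page per tag and attach the supplier's record."""
    headlines = defaultdict(list)
    for story in stories:
        for tag in story["tags"]:
            headlines[tag].append(story["headline"])
    return {
        tag: {"stories": items, "supplier": supplier_data.get(tag)}
        for tag, items in headlines.items()
    }


pages = build_pages(stories, supplier_data)
print(pages["team:spain"])
```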

Here are some of the questions that must be answered in designing such a project:

  • What do you need to drive your product or domain?
  • What can you use from other people?
  • What do you need to keep hold of?
  • How do you use the data?
  • How do we contribute to the datasets?

For the London 2012 Olympics, the goal is “to show every piece of live action”. There will be 24 concurrent live streams, with 5,000 hours of live video over 16 days. The International Olympic Committee has defined the names of events and has assigned the venues. They will supply the names of 8,000 to 15,000 athletes as they are determined. These data will be joined with event results, and every event will have a page.  The BBC is working with its suppliers to make sure they supply the data as cleanly and as organized as possible.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Are You a Social Leader?

Jemima Gibbons

Jemima Gibbons, Social Media Strategist at AAB Engage UK and author of Monkeys with Typewriters, presented a fascinating analysis of the characteristics of social leaders.  When social media tools became available, people saw that they were fun and easy to use. Now, they are being incorporated into “social businesses” (designing a business around people) and “social brands” (designing a brand around people). Social leaders must understand social media, social businesses, and social brands. They are entrepreneurial, courageous, and passionate; they know what makes people engage, believe in themselves, and exhibit “servant leadership” (a leader who serves). They also have emotional intelligence, are self-aware and people-aware, and are able to control their emotions. They give to their communities.

These photos show the characteristics of 3 types of people.



Which type are you? It is interesting to note that corporations may exhibit some of the same characteristics as sociopaths. They can be irresponsible and manipulative, grandiose, lacking in empathy, ego-driven, and ruthless.

Here are 6 rules for social leaders:

  1. Empower others. Bring a bright team around all you do. It is all about the network.
  2. Follow your passion (like Craig Newmark). Are we doing what we love and believe in?
  3. Build trust. Be open with your community. These CEOs publicly apologized for service disasters.

    Apologetic CEOs

    Communicate all the time.

  4. Keep it simple. Set up filters for information and make things easier for people. (For example, see i Newspaper, which publishes digests of stories.) Avoid feature creep.
  5. Have a service ethic. Listen, listen, listen to your customer base. Remember that you are there for the stakeholders.
  6. Show empathy, and show that you care. Be generous. Listen to what people want. Try to enable them as much as possible.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Cooperative Consumption: The Wednesday Keynote



Rachel Botsman

The Wednesday keynote address was a fast-paced pre-recorded video presentation by Rachel Botsman, author of What’s Mine is Yours – The Rise of Collaborative Consumption, entitled “Collaborative Consumption: Technology, Business and Society in the 21st Century”.

Botsman defined collaborative consumption as traditional sharing, bartering, lending, trading, renting, gifting, and swapping redefined through technology and peer communities. The concept involves individuals making their services or things available to consumers and using social technologies. Botsman predicts that collaborative consumption will become as big as the Industrial Revolution was.

The critical ingredient of collaborative consumption is trust between strangers. For example, Task Rabbit is a platform on which its users can post a task they wish done and the price they are willing to pay for it. Other people (the “rabbits”) indicate what they would do it for, and when agreement is reached, the task is done. This can be thought of as eBay for errands. The most popular task is assembling Ikea furniture. $4.3 billion has passed through the system since it started. It is about empowerment and making money around lives. 25% of all rabbits are retirees, and the average income is $5,000 per month.


Technology creates the social glue for trust between strangers. We live in a social village where we can mimic personal interactions.

We have moved from being passive consumers to being highly motivated collaborators. Technology is moving us back to our previous ways of interaction. With product service systems, people pay for the benefit of a product without needing to own it outright. For example, bike sharing has become the fastest growing form of transportation in the world.

We do not want stuff but the needs it fulfills. Access trumps ownership. Idling capacity is the untapped social, economic, and environmental value of underutilized or idle assets, such as spare rooms, cars in the garage, or tools not being used. Technology enables us to redistribute that capacity to where it is needed. Thus, car manufacturers have all entered the car sharing space. They have recognized that they are in the personal mobility business, not the car production business. BMW’s DriveNow permits renting a car by the minute. Access is by a chip in your driving licence. The expense in car sharing is the cars. A similar service, WhipCar, lets you rent an idle car in the neighborhood. The transaction depends on trust. Insurance companies protect the owners so that any damage is not charged against their personal policies. This is an example of accountability and transparency: real people dealing with real people who behave better than big faceless institutions. The whole relationship is transformed.

Redistribution markets come in 3 forms: monetary, like-for-like, and gifted. They function on trust and efficiency. eBay was the start of markets built on a web of trust. 98% of all eBay trades receive a positive rating: people are fundamentally good.

Collaborative lifestyles are an important aspect of consumption. People with similar interests are banding together to share and exchange less tangible assets such as time, space, skills, and money. AirBnB is a peer-to-peer travel site that matches properties and renters.

AirBnB Unusual Rentals

It has some very unusual spaces listed for rent, thus creating a market for things that did not have a marketplace previously.  Here is a map of places to rent in New York City.

AirBnB Rental Map for New York City

Through collaborative consumption, people have become micro-entrepreneurs. The average renter makes $1,200/month from their spare room, which has an economic impact on local commerce because people go to different restaurants, etc. and see the city in different ways. Would people trust one another enough to rent out their spaces? The secrets of collaborative consumption are community and trust. The owner’s role is not to connect, but to protect. The “EJ Incident” is a famous example of one of the few rentals that went wrong. Existing networks of information and trust have been set up for these micro-entrepreneurs. Connecting trustworthy strangers is an untapped market.

How do we build trust? Reputation capital is the measure and value of a person’s reputation across communities and marketplaces. In all of your transactions, you leave a trail of how well you can or cannot be trusted. We will start to aggregate different aspects of trust, and reputation capital will become a currency.

Consumption is changing in the 21st century. Here are some of its indicators.

Collaborative consumption is not a threat but a massive multimillion dollar opportunity. We will reinvent entire sectors. This is a high tech industry but also very high touch. Botsman concluded with an appropriate quotation from Rupert Murdoch.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor


Real-Time Mobile Search

Steve Arnold

Well-known search guru Steve Arnold moderated this interesting panel on mobile searching.

Mobile searching panel (L-R): Gregory Grefenstette, Antonio Valderrabanos, Benoit Leclerc

Gregory Grefenstette, Chief Science Officer, Exalead, began with a description of search-based applications for mobile platforms.  Mobility is no longer a choice. It is “everyware” (analogous to hardware and software).  Mobile is the primary platform on which consumers and users engage with information. According to a Forrester report, enterprise mobile workers will represent 73% of the US workforce by 2012, and by 2013, there will be 1.2 billion mobile workers in the world, accounting for 35% of the global workforce.  By next year, you should no longer be asking what your mobile strategy is. Mobile working is not like playing; workers want to have everything available in a single application.

We carry a computer all day long, along with our wallet and keys. In 59 countries, there are more active mobile subscriptions than people. Last year, 6 trillion texts were sent! 32% of adults prefer text. Mobile is now a dominant force. The average teen sends or receives 3,339 text messages every month! That equates to 14 hours of attention to texting per month! The average user attends to text messaging 150 times a day, which equates to once every 6.5 minutes. Any device that people are looking at every 6.5 minutes is very important!
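The arithmetic behind these figures can be checked quickly. The short Python sketch below uses only the numbers quoted above; the reading that “once every 6.5 minutes” refers to waking hours rather than a full 24-hour day is an assumption, inferred from the fact that 150 checks at that interval span about 16 hours.

```python
# Figures quoted in the talk.
checks_per_day = 150
minutes_between_checks = 6.5
texts_per_month = 3339
texting_hours_per_month = 14

# 150 checks, one every 6.5 minutes, cover this many hours of the day.
implied_hours_per_day = checks_per_day * minutes_between_checks / 60

# 14 hours spread over 3,339 texts gives this many seconds per text.
seconds_per_text = texting_hours_per_month * 3600 / texts_per_month

print(implied_hours_per_day)       # 16.25
print(round(seconds_per_text, 1))  # 15.1
```

So the two statistics are at least internally consistent: the interval fits a waking day of roughly 16 hours, and the teen figure works out to about 15 seconds per message.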

The most popular mobile application is Google Maps, followed by weather, and Facebook.  In the shopping area, the average time for consumers to decide to make an online purchase is 1 month; for mobile platforms, the average decision time is 1 day. And when a mobile is lost, it takes only 1 hour on average to report it.

Because of the rise of mobiles, the balance of power has moved from TelCo operators to app developers, and handset manufacturers have lost control over the information experience.

The mobile platform has become completely open. Most phones now being sold are smartphones. By 2013, we can expect that most people will be able to consume information and use apps.  Here is a view of the mobile services ecosystem.

“Experience platforms” are where people spend their time online, the most popular of which by far is Facebook.  We are now in a “doing” environment; mobile has moved from talking and texting to other things using apps.

The top criteria for selection of a phone are its operating system and the selection of apps available.

The computer we carry now is more than just a computer. It has senses: sight, touch, motion, direction, and sound. This enables us to make sense of the world. (Google Goggles can detect faces, for example, and Facebook has integrated this technology.)

The Internet will become the Internet of Things, where there is an API for everything. Intel has predicted that everything that can benefit from being connected will be connected by 2020. We can now gather vast, previously unthinkable amounts of data and process them as a real-time stream. The web will no longer be static, a place from which we pull down documents, but will become one where we consume information in real time. It has become the “right-time web”. Our thinking is being aided by the devices we carry.

Information is shifting from document-centered to distributed to linked and streaming real-time data. There is a huge shift in the way that IT is being thought about. New types of data are emerging and replacing the old transaction-based data. Information can now be tailored and made contextual.

See Golding’s slideshare site for more resources.

John Barnes, Incisive Media UK, told us what we need to know to design for mobile platforms.  He noted that by 2013, mobile phones will overtake PCs as the most common web access device worldwide. Mobile is part of a ‘multiscreen’ life, in which we have a growing number of screens in our lives.  Anyone building a digital service will need to provide for an effective experience across many devices, which Barnes called “polymorphic” publishing.  There are 3 types of polymorphic publishing:

Digital publishing is all about the audience and content, not the technology. For many years developers looked at average screen sizes and aimed at that in their work. Size has increased and has splintered into many sizes, so one can no longer design to a single average size.

m.editions of websites provide full integration with a content management system. Specially optimized templates can be served to the user depending on the mobile platform they use.  This has the advantages of being able to use the same domain name for different platforms and integration with a variety of systems. There is no clear leader in operating systems for browser-based apps, but the iPhone and iPad are extremely popular.
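Serving a template according to the platform can be sketched as follows. This is a deliberately naive Python illustration, not a description of any particular content management system: the template names are hypothetical, and matching on substrings of the User-Agent header is an assumption that real device detection would need to refine considerably.

```python
def choose_template(user_agent):
    """Pick a template name from a browser's User-Agent string."""
    ua = user_agent.lower()
    if "ipad" in ua:
        return "tablet.html"   # hypothetical template names
    if "iphone" in ua or "android" in ua:
        return "phone.html"
    return "desktop.html"


# The same URL can then return different layouts for different devices.
print(choose_template("Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X)"))
print(choose_template("Mozilla/5.0 (Windows NT 6.1)"))
```

Because the choice is made on the server for each request, the same domain name and the same stored content serve every platform, which is the advantage noted above.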

Many people want to have an app; here are some criteria for developing one. Note that commercial requirements are major drivers of the choice of apps.

Key findings:

  • There is a high propensity for sharing, so subscription and search services do well.
  • User behavior is changing.  Users are always “on”, so it is critical that apps are easy to use.
  • Much web content is not ready for the shift to mobile yet because much of it was developed to be used on desktops with big screens.

Sheila Fahy, an attorney at Allen & Overy LLP, described her experiences in bringing one of the first legal apps (The Little Red App) to market.

Why are we all downloading apps? We like them because we are comfortable with them. The Little Red App was designed to bring employment legal facts into a single place. It took 2 months to bring it to market and cost £19,000. It is free to download; since its launch in June 2011, it has been downloaded 837 times, 760 of which came from the UK.

Lessons learned:

  • Keep it simple. Everybody wants to do an app. Recycle something you already have. Don’t try something complex because it will be expensive to produce.
  • Don’t give away your “crown jewels”.
  • Have a business plan.
  • Keep the development team small, flexible, and diverse. Deconstruct the information and work out all the links and actions needed. Draw out every screen.
  • If you have a lot of words, an app is not for you, so you should think of something different.
  • Brand is hugely important.
  • Do as much preparation as possible before you involve the developers.
  • Build in lots of testing time.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor