
Report on DataContent Conference

Below is an e-mail message that I received from InfoCommerce about their recently concluded DataContent conference, which took place in Philadelphia, PA earlier this month.   Although the conference theme was “Crowd, Cloud, and Curation”, it turned out that “Mobile” should have been added.  Interesting!

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor


InfoCommerce Group November 11, 2011
You Had to Be There!

The theme of this year’s DataContent Conference was “Crowd, Cloud and Curation.” But in addition to those big themes, another word cropped up in almost every session: mobile. The excitement around mobile was palpable, and there seemed to be general agreement that tablet devices are already starting to have a significant impact on the industry, and a largely beneficial one, provided publishers embrace the tremendous shift to mobile online access and start to leverage the power of these new devices. My view? As a reliable skeptic, you might expect some push-back, but I’m fully on board. Mobile devices, tablets in particular, are rapidly changing how users access data, and even how they do business. This creates opportunities in some markets, but more fundamentally, it means publishers cannot allow themselves to be left behind as usage patterns and user expectations for data products begin to radically shift.

Our keynote speaker, Clare Hart, CEO of Infogroup, nicely set the stage for the sessions that followed by noting that to maximize the value of data, “you have to innovate around it.” This well sums up the InfoCommerce Group view that data publishers need to focus on “data that does stuff,” not simply providing mountains of raw content from which users are expected to find and extract value on their own. Clare illustrated this with a sneak peek at a soon-to-be-launched Infogroup product called Yesmail Marketing Intelligence that will provide remarkable competitive intelligence to marketers, coupled with a powerful user interface and real-time alerting.

Among our 2011 Models of Excellence nominees, we got in-depth looks at two barely-launched ventures: BestVendor, which is doing some exciting work in social discovery, and First Stop Health, a new data-driven health concierge service. We also got a good overview of FeeFighters which matches businesses to credit card processing services, an area that’s gotten a lot of attention lately.

We also learned about newly launched Chaikin Power Tools, which integrates mountains of data and sentiment analysis into a simple, elegant buy/sell indicator for investors. We also heard from another startup, Brilig, which lets online marketers precisely tap specific market segments, precision that’s been sorely lacking to date. Resolute Digital and b2bAnywhere confirmed the stampede to mobile in our session on that topic, and gave us some useful thinking on how B2B mobile will evolve.

Anne Holland of Subscription Site Insider presented some preliminary findings from a study conducted in conjunction with InfoCommerce Group on paid subscription product renewal rates and retention marketing best practices. In our always popular “Excellence Revisited,” Wanted Technologies, Alacra and AgencyFinder offered candid assessments of what went right – and wrong – with products that had previously won them our Model of Excellence award. The level of candor, as always, was incredibly insightful.

Also insightful were presentations from DonorBase, The Praetorian Group and our own Janice McCallum, who offered specific revenue-generating ideas with potential applicability to many in the audience. And we also got helpful case studies from Depository Trust, PDR Network and ZoomInfo about what’s involved in launching a new data product inside a company that’s not in the data business, and how to successfully re-position existing data businesses that have lost their way.

We also went interplanetary this year, as we learned about the launch (first publicly announced at our conference) of Saturn, a new service jointly developed by Locationary, Neustar and the Local Search Association. Designed to be a frictionless cloud-based platform where data publishers can upload data, the goal of Saturn is to help business partners standardize, synchronize and maintain their data to improve accuracy. There’s a lot more to Saturn, and the potential to revolutionize data collection, enhancement and maintenance is huge. Best of all, none of this vision involves making all information free!

Conference attendees also got the inside scoop on Infochimps, a company with the ambitious goal (well underway) of collecting all the data on the web, arguably doing for data what Google has done for text. It’s a breathtaking vision, a vision, I should note, that fully supports the role of paid data products. In fact, Infochimps would like to be the central marketplace for such datasets.

The 2011 Model of Excellence award winners? This year, the awards went to Artlog, Depository Trust and FeeFighters.

If you now really wish you were there, you can view some of the speakers and presentations here, in Pancasts provided by Panopto. Our compelling programs are our best advertisement, so consider this our first promotion for DataContent 2012. You have to be there!


PS — We are always ready, willing and eager to receive your comments!

About InfoCommerce Group

InfoCommerce Group is a boutique consultancy, conference producer and research firm, serving producers of business information content.

InfoCommerce is particularly respected for its deep expertise in all facets of commercial database publishing.

InfoCommerce has also built an unparalleled base of knowledge in all aspects of the health content business through its Health Content Advisors business.

Workshop on Human-Computer Interaction and Information Retrieval: HCIR 2011

The HCIR workshop began in 2007 as an experiment to see if there was interest among information science researchers in meeting to discuss human-computer interaction (HCI) as it applies to information retrieval (IR), and the experiment has been highly successful. The workshop has grown steadily since then; this year, about 90 information science researchers assembled at Google’s headquarters in Mountain View, CA on October 20, 2011 for the 5th HCIR workshop. (Google operates the most heavily used search engine on the Web, so there is no more appropriate organization to sponsor a workshop on human involvement in searching and information retrieval.)

According to the workshop website, “The workshop unites academic researchers and industrial practitioners working at the intersection of HCI and IR to develop more sophisticated models, tools, and evaluation metrics to support activities such as interactive information retrieval and exploratory search.”

The workshop featured a keynote address, poster sessions, presentations, and a challenge competition.

Keynote Address

Gary Marchionini

One of the highlights of the workshop was the keynote address by Gary Marchionini, Dean and Professor in the School of Information and Library Science at the University of North Carolina. He has had a long and distinguished career in the field, serving as a member of the Editorial Boards of several prominent journals, president of the American Society for Information Science & Technology (ASIST), and chair of several conferences. He is the author of Information Seeking in Electronic Environments and Information Concepts: From books to cyberspace identities. His keynote address “HCIR: Now the Tricky Part”, began with a look back into the history of HCIR and noted that the two pioneers of the field (he called them its “father and mother”) are Nick Belkin from Rutgers University and Susan Dumais from Microsoft Research. He showed this diagram breaking down the history into 3 eras and showing some of the pioneering researchers in each (I was honored to be included as a result of some of my work in the early days of online retrieval at Bell Labs).

  • Pre-1980s: Human and machine intermediaries (human search intermediaries are now largely extinct)
  • 1980s-1990s: Networks, search algorithms, words and links, no human intermediary
  • 2000-present: UIs, facets, usage patterns, social interactions (and involvement of many more people in the search process)

He then presented 3 case studies as examples of HCIR platforms (Open Video, the Relation Browser, and Results Space) and from them assembled a list of challenges and evaluation worries: mixing various approaches to HCIR, information seeker behavior, retrieval and extraction, and individual and group interactions. Here are some pertinent questions:

  • How do we assess query quality (often the first indication of user behavior in a search)? We might think this is basic, but it is really quite difficult. Have we advanced the science to be able to say we are doing something better now? How much confidence can we put in query profiles?
  • How do we use search behavior as evidence? Can we match the behaviors to queries?
  • How do we create document surrogates and assess their effects? Surrogates tend to be active in the early stages of the search process.
  • How do we account for information seeker loads: cognitive, perceptual, and collaborative? What is the perceptual load in an environment where everything looks the same? When 2 people work together, they are less efficient. We are not paying attention to the costs of collaboration.
  • How do we measure session quality, search quality, and solutions to problems?

Marchionini concluded that substantial progress in HCIR has been made over the last 30 years (compare today’s search experience with that of searches done on a teletype terminal by a librarian while you waited for the results), but there is still much more to learn.

Poster Sessions

Topics of the posters included collective information seeking, a high density image retrieval interface, an interactive music information retrieval system, information needs and behavior of mobile users, search quality differences in native and foreign language searching, and search interfaces in consumer health websites. Links to articles describing the research presented in each poster are available on the workshop website.


In the presentation sessions, authors of research articles presented their work, and as with the posters, links to all of these articles are available on the website. Here are the conclusions from a few of the presentations:

  • Chang Liu

    A study of dwell time (how long a user remains on a website) as a function of task difficulty by Chang Liu and colleagues at Rutgers University found that difficult tasks result in more diverse queries and longer dwell times on search result pages. Users with low knowledge of the search subject tend to be less efficient at selecting query terms; those with high domain knowledge spend much more time on content pages than those with low knowledge.

  • Michael Cole

    Michael Cole, also from Rutgers, presented his group’s research on eye movement patterns during a search. Eye movement analysis is quite powerful; Cole et al. found a strong correlation between the level of a searcher’s domain knowledge, length of time reading words, and reading speed.

  • Luanne Freund

    Luanne Freund from the University of British Columbia analyzed document usefulness by genre. People think about genres in different ways; labeling them is difficult for searchers; and they do not always agree about usefulness. Freund identified 5 types of information tasks–fact-finding, deciding, doing, learning, and problem solving–and found that usefulness scores vary considerably by task and genre.

  • Alyona Medelyan

    Alyona Medelyan from Pingar, a New Zealand-based organization, evaluated 5 search interface features in biosciences information retrieval: query autocompletion, search expansions, faceted refinement, related searches, and search results preview. Interface features from several systems were presented to users (without identifying the service), who were asked to rate the interface. They found facets useful as long as there were not too many of them to choose from, but felt negatively about autocompletion, saying that it had too much of a “pigeonholing” effect on their searches. The most important thing to them was the content, not the aesthetics of the interface. Facets were useful for searching; the other features were more useful for browsing.

  • Keith Bagley

    Keith Bagley from IBM raised the interesting question whether concepts from the travel industry could be useful in modeling searching. When we travel, milestones provide reference points along the road. Many searches end prematurely because of user frustration; perhaps searchers could share their “road maps” to success with others.

  • Gene Golovchinsky

    Gene Golovchinsky from FX Palo Alto Laboratory, Inc. studied collaboration in information seeking using the Querium system. He said that just because people talk about a document does not mean that it is useful. Systems have been developed that automatically flag documents that have been used in relevance feedback or queries that returned many useful documents. But are these enhancements useful? Is it appropriate to share results automatically? Does this kind of feedback produce better retrieval results despite users’ initial impressions?

HCIR Challenge

The workshop organizers conducted an “HCIR Challenge”, in which search engine developers were asked to use a set of over 750,000 documents in the CiteSeer digital library of scientific literature to answer several questions focusing on the problem of information availability: when the seeker is uncertain as to whether the information of interest is available at all (for example, in a patent search).  Details of the challenge and the questions are available on the workshop website.

Four teams took up the challenge:

  • The L3S team used its faceted DLBP system to solve questions 1 and 4. Their system shows facets along with the retrieved references. Clustering is based on titles of results.
  • The second team used the Querium system on questions 1 and 2.
  • The VisualPurple team used its GisterPRO system to answer questions 3 and 4. Gister does cloud-powered exploratory searches of unstructured data. It was developed for analysts who need to do hard searches in a short time, and it searches a database visually. The only operation available to the user is quoting to construct phrases; there are no Boolean operators.
  • A team from Elsevier labs demonstrated their query analytics workbench to answer questions 2 and 5.

The challenge winner was chosen by majority vote from members of the audience not involved in the challenge. The vote was very close between the Querium and Elsevier systems. Querium won by a narrow margin.

Future of the HCIR Workshop

The HCIR workshop has clearly been a successful experiment.  It provides a unique venue for researchers in the field to discuss their results in an informal setting.  As it grows, decisions will need to be made as to how to guide it in the future, and an upcoming attendee survey will provide some useful input.  Personally, having been one of those researchers in the past, I hope it will continue.  It was a highly useful and stimulating experience; research in HCIR is making great strides.  We can expect significant improvements in search engines in the future as a result.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor


Frankfurt Book Fair App

The Frankfurt Book Fair starts tomorrow, Wednesday, October 12. It’s huge–and crowded! Finding your way around can be difficult. If you haven’t already planned out your schedule, you might be interested in an app for your smartphone developed by Publishing Technology, plc. It contains exhibitor listings, hall plans, and a full schedule of events. There is also a “Tour Planner” feature that gives you an aerial view of the halls.

Enjoy the Fair!

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Charleston, DataContent, and Many Other Fall Conferences

Besides the two major conferences named in the title of this column, there are many others on the schedule as we approach the last month before the holiday season.

The Charleston Conference

The 31st Charleston Conference will be on November 2-5 at its usual venue in Charleston, SC.  This year’s theme is “Something’s Gotta Give!”, and that might be a widely shared sentiment in today’s continuing rapid-paced environment of change, new technologies, and economic difficulties.  On Wednesday, November 2, an all day preconference on “Shared Print Archiving” will take place, along with several half day sessions and the always popular Vendor’s Showcase.  The opening plenary session features Michael Keller, Stanford University Librarian, speaking on “The Semantic Web for Publishers and Libraries”; MacKenzie Smith, Research Director, MIT Libraries, speaking on “Data Papers in the Network Era” (“Data Papers” means research datasets); and speakers on Hidden Collections and the Digital Public Library of America.  This is just the opening session speaker lineup; the entire conference features similar interesting topics and high quality presentations. Charleston is an excellent conference and regularly draws over 1,000 attendees.

DataContent 2011

Organized by the InfoCommerce Group, the DataContent 2011 conference will be held November 3-4 in Philadelphia, PA.  The keynote speaker is Clare Hart, President and CEO, Infogroup (she formerly held several executive-level positions at Dow Jones and was CEO of Factiva from 2000 to 2006). Presentations on “The 3 Cs: Cloud, Crowd, and Curation”, “Mobile’s Second Coming”, “Strategic Makeovers”, and other topics of current relevance will follow.

Information Science: Maps, Life and Literature, Sentiment Analysis, Digital Humanities

The third in a series of conferences on the future of information sciences (INFuture) will take place November 9-11 in Zagreb, Croatia.  According to the conference website, the objective of these conferences is to “provide a platform for discussing both theoretical and practical issues in information organization and information integration.”  The program for this year’s conference was not yet available when this column was written.

Many maps are produced for general use and are not designed to be preserved.  But the “Exploring Maps: History, Fabrication, and Preservation” conference (November 2-3, Philadelphia, PA) will explore maps that have been preserved for their beauty and link to the past.  Many of the speakers are librarians and curators in map libraries at universities and archives.

The Life and Literature Conference (November 14-15, Chicago, IL) was organized by the Biodiversity Heritage Library (BHL) consortium to discuss digitizing and networking of biodiversity literature.  Topics to be covered include biodiversity informatics, publishing models, digital libraries, and humanistic and artistic intersections with biodiversity literature.  The plenary speakers are George Dyson, a technology historian, and Richard Pyle, who has developed database systems for managing biodiversity information.  Four panel discussions on information-related topics and a “code challenge” to produce new and innovative ways to disseminate and use BHL’s data are on the rest of the conference program.

Sentiment analysis deals with attitudes, emotions, and perspectives and how they are expressed in language.  With the growth of online shopping and product reviews and the use of social media by consumers to voice their opinions, sentiment analysis has become especially important to product sellers and developers.  It is becoming an information research area in its own right.  The Sentiment Analysis Symposium (November 9, San Francisco, CA) will explore various approaches to sentiment analysis and practical uses of it in several industries.  Pre-symposium tutorial and research sessions will be on November 8.

The Supporting Digital Humanities 2011 conference (November 17-18, Copenhagen, Denmark) has not yet organized its program, but a long list of accepted papers appears on the conference website.  The two major themes of the conference are “Sound and movement – music, spoken word, dance and theatre” and “Texts and things – texts, and the relationship between texts and material artifacts, such as manuscripts, stone or other carriers of texts”.

Libraries:  Brick and Click, Library 2.011, RFID

The 11th annual Brick and Click Libraries Symposium will be on November 4 in Maryville, MO.  As in past years, it will be a series of 6 tracks with 5 concurrent presentations in each. The topics cover a wealth of subjects of interest to librarians in physical (brick) libraries as well as those who provide information services to remote users (click).  With 30 presentations to choose from, there is sure to be something of interest to each attendee; indeed, choosing which session to attend may be a challenge!


Library 2.011 is a global online conference organized by the School of Library and Information Science (SLIS) at San José State University to be held November 2-3.  The website says that it will be “a global conversation on the current and future state of libraries.” The conference will be arranged in 6 “strands”:


  • Libraries – The Roles of Libraries in Today’s World
  • Librarians and Information Professionals – Evolving Professional Roles in Today’s World
  • Information Organization
  • Access and Delivery
  • Learning – Digital Age Learning Cultures
  • Content and Creation – Changes in Accessing and Organizing Information

The conference website has lots of information on the technical requirements for connecting to the conference, speakers, and 36 pages (so far) of registered attendees.  All of the topics look highly interesting and relevant in today’s information environment.  If you’re not going to the Charleston Conference (see above), Library 2.011 might be a good alternative.

RFID has become popular in libraries; among other things, it allows users to check out their own books.  A new RFID standard has been issued, and it will give libraries wider freedom to choose among the various vendors of this technology.  CILIP, the Chartered Institute of Library and Information Professionals, has organized a one-day conference on RFID in Libraries to occur on November 8 in London.  Speakers will describe the current status of RFID technology, the new standard, and some practical case studies of how they have used RFID in their libraries.


Open Access (OA)

The Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities was issued in 2003 and has been signed by the leaders of over 300 institutions around the world.  The 9th Berlin Open Access Conference will be held November 9-10 (pre-conference sessions are on November 8) in Washington, DC (this is the first time it has been held in North America).  A coalition of five organizations (the Max Planck Institute, Marine Biological Laboratory, Howard Hughes Medical Institute, Association of Research Libraries, and SPARC) has organized the conference, and the Program Committee has identified the following subjects for discussion:

  • Transforming Research through Open Online Access to Discovery Inputs and Outputs
  • Creation of Innovative New Opportunities for Scholarship and Business
  • The Impact of Open Access and Open Repositories on Research in the Humanities
  • Open Education: Linking Learning and Research through Open Access
  • Public Interaction: the Range and Power of Open Access for Citizen Science, Patients, and Large-scale Collaboration

The Repositories Support Project (RSP), an initiative funded by the UK organization JISC (formerly known as the Joint Information Systems Committee), will hold its “Autumn School” conference, “Bringing the Emphasis back to Open Access, and Demonstrating Value to Your Institution” on November 7-9, near Cardiff, Wales.  The program includes a keynote address by David Prosser, Executive Director of Research Libraries UK (RLUK), and technical development talks on DSpace and Eprints (OA platforms), Google Analytics, and other topics related to OA.

Digital Libraries and Preservation

The 2nd International Conference on African Digital Libraries and Archives (ICADLA-2, November 14-18, Johannesburg, South Africa) takes a broad view of the digital library world with its theme “Developing Knowledge for Economic Advancement in Africa”.  The first three days of the conference will be a training workshop for managers and library staff entitled “Digital Futures: from Digitization to Delivery”. The workshop will be conducted as a combination of presentations, discussions, and exercises. The last two days are a strategic planning conference entitled “Developing National and Institutional Digitization Strategies” for directors of libraries and museums.


The 8th International Conference on Preservation of Digital Objects (iPRES 2011, November 1-4, Singapore) will be keynoted by Professor Seamus Ross, iSchool, University of Toronto, speaking on “Digital Preservation: Why should today’s society pay for the benefit of society in future?”  The first day of the conference will consist of two tutorials, “Preservation Metadata in PREMIS” and “Archiving Websites”, and the last day will also offer tutorials: “Steps toward International Alignment in Digital Preservation” and “Web Analytics”.  As of this writing, two other keynote speakers and the session topics remain to be confirmed.

The University of London is offering a Digital Preservation Training Program on November 14-16.  The course will cover policies, planning, strategies, standards and procedures in digital preservation, and a class project will be part of it as well.

Society Meetings

Finally, here are two society meetings scheduled for November:

The Society for Scholarly Publishing will hold its Fall Seminar Series on November 8-10 in Washington, DC.  Topics are “Content and Apps for Mobile Devices: Engaging Users in the Mobile Experience” and “Moving to the Online-Only Journal: Breaking Free of Print Constraints”.



The 2011 European Summit of Strategic and Competitive Intelligence Professionals (SCIP) will be in Vienna, Austria on November 8-10.  The keynote address will be by David Frigstad, Chairman of the Board, Frost & Sullivan.


As always, many other conferences, including symposia and book fairs, are listed on the Information Today Conference Calendar.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Frankfurt Book Fair to Address Issues on Multimedia Content

Below is part of an article that appeared in Knowledgespeak, a free daily news reporting service published by Scope eKnowledge Center for the STM publishing industry.  I thank them for their kind permission to reproduce this important news from the Frankfurt Book Fair.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor


While last year saw the entry of enhanced e-books, what’s clear this year is that multimedia content is the future, according to Frankfurt Book Fair Director Juergen Boos. He further says the phase of exploration experienced by the global industry is now over, and that the focus today is on actual business strategies as well as the development of completely new forms of cooperation involving the creative industries of film, games and books. The Book Fair, projected as the world’s largest meeting of the publishing industry, expects to welcome 7,500 exhibitors from 110 countries, as well as 280,000 visitors.

Boos believes that the swift changes taking place in the content business are creating a huge demand for knowledge and, above all, for exchanges of ideas on equal terms. The Frankfurt Book Fair has responded to this by working with the German Publishers and Booksellers Association to establish the new Frankfurt Academy conference brand, and by maintaining the digital initiative, Frankfurt SPARKS, which it launched in 2010. The topics covered by the nine conferences will range from new forms of storytelling and story-selling (Frankfurt StoryDrive Conference/ SPARKS, Open Space, Agora, October 12-13) to “Metadata and Rights Management” (TOC, October 11, Marriott Hotel). All in all, more than half of the roughly 1,100 industry events at the Book Fair will address aspects of digitisation.

To do justice to the new standards of the content business, this year, for the first time ever, the Frankfurt Book Fair will dedicate an entire exhibition hall to buying and selling intellectual capital. In addition to the Literary Agents & Scouts Centre (LitAg), Hall 6.0 will now feature a new trading floor called the StoryDrive Business Centre. This is where content dealers and creative work experts from the film, games and publishing industries will come together to generate business.

The importance of ‘knowledge’ will also be addressed in the new-look Hall 4.2. This has traditionally been home to scientific, specialist information and educational publishers, who first began using digital production processes in the 1990s, and whose products are already nearly 100 percent digital. Various events will illuminate the role and working processes of these publishers and their customers, among whom are information professionals from private companies. The topics covered by this varied programme will range from the “open access” debate and metadata, to a presentation of the control room of the Large Hadron Collider by the European nuclear research centre CERN.

The Semantic Web Media Summit

I attended the Semantic Web Media Summit in New York on September 14.  Here is a report on the conference.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Nearly 100 publishers and other information professionals gathered in New York on September 14 for the Semantic Web Media Summit, organized by, a media content organization.  In contrast to many events on semantics that are heavily oriented towards technology, this Summit focused on the business aspects of integrating the semantic web into content.

Semantic Web Overview

Michael Dunn

Michael Dunn, Vice President and CTO, Hearst Interactive Media, gave the opening keynote address and provided a good overview of the semantic web and its relationships with content.  He said that the media industry has been in catch-up mode since the Web started, and the best way to get value out of content is to structure it to improve production (digitization) and enhance consumption (monetization).  The traditional state of media is shifting to the web, and content requirements will keep expanding.  We are experiencing an increasing sense of urgency as a result of a proliferation of devices, continuously changing markets, and shifting audience requirements for real-time, niche, thematic, and contextual content.

In existing content management systems, a major problem is that content is still in silos, which leads to many missed opportunities to increase consumption.  Today’s content is mainly single use, but producers should have their content ready for whatever is coming in the future.  Creators must own every part of what they create, including the text, metadata, multimedia, etc.  Processes must be kept simple so that people creating the content can concentrate on the creative process, with technical steps being done in the background.  This will promote a shift in focus from content being simply a commodity to an innovation.

Content has an ROI, but if it is in silos, it may have been paid for several times (for example, by creating it ourselves, licensing it from others, or via a related or partner entity), which is very wasteful.  In today’s environment, to properly measure the ROI of content, metrics must exist for elements within the content, not just for the number of pages published.  We must also recognize that the context of the intended audience takes priority; the right content must get to the right user at the right time via the right mechanism (increasingly, it will be accessed using smart phones), and all correctly personalized for the user.  Content must be treated as data so that it can be optimized and made harvestable.  Dunn also noted that there is a trust issue with content on the Web: Google does not trust user-supplied metadata and ignores it.

Turning to a discussion of the Semantic Web, Dunn defined it as descriptive markup techniques for content, including links and rich metadata, all of which will foster machine readability.  The media industry should be interested in the Semantic Web because it can create efficiencies during content creation, help to understand content already available, and ensure discoverability.  Structuring content will result in generation of richer metadata, better tags and links, reusable content, and improved workflows.  He referred to the Linked Open Data project as an example and suggested considering Drupal as a suitable semantic content management system.

Linked Open Data Cloud Diagram. Licensed under Creative Commons

He concluded his address by listing the following benefits of structured semantic content:

  • An increase in productivity, reducing time to market and improving consistency,
  • Increased usage of content, lower production costs, and improved discoverability,
  • An improved user experience for the audience, with increased levels of engagement and better personalization and tagging, and
  • Enhanced revenue streams and new Web opportunities for content.

Publishers should focus on the semantic web, beginning with revenue enhancement opportunities, while showing how to solve business problems and how to measure results.

A Call to Action

Mike Petit, Co-founder and CIO, OpenAmplify, followed Dunn with a call to action, noting that:

  1. The Semantic Web and its associated technology have become tangible and effective tools for publishers, and
  2. Social media have complicated the publishing model and have become indispensable.

For a maximum revenue opportunity, the time to act on these developments is now, but there are challenges:

  • Control is challenged by social media.
  • Mobile platforms increase demand, but attention spans are shorter.
  • The Web is no longer about finding information; it is now about your content and you being found.
  • Social media can generate premium content, but some brands may not see it that way.

Petit suggested that semantic technology can meet these challenges (he called it “The Semantic 1-2 Punch”).  It helps in gaining an understanding of content, thus increasing use, and the understanding can be used to drive classification, which in turn helps to enhance the sales model, identify audiences, and connect to readers.  He noted that publishers used to create their own content; now our audience does.  We also used to understand what we publish; now we cannot even read it all.  Without understanding, we cannot monetize the content, follow it, or determine where it is relevant.

Once we understand the content, we can classify it, and then it can be optimized, with tangible benefits.  The technology to do this is available; the ROI is measurable; and the necessary costs and effort are reasonable.   Management is beginning to understand the technology, so we will get a better hearing when projects are proposed.  The means are there, so let’s act!

rNews: A New Standard

Three rNews Users (L-R): Andreas Gebhard (Getty Images), Stuart Myles (Associated Press), Evan Sandhaus (The New York Times)

The International Press Telecommunications Council (IPTC) defines digital standards for the media.  Its latest standard, rNews, is a model for embedding machine-readable metadata in Web documents.  Three rNews users described the standard.   Modern websites are built with a three-tier architecture: the Data Tier, where the content resides; the Logic Tier, which is the software that reads and processes the data; and the Display (or Presentation) Tier, where the data is formatted into the HTML document that the user sees.  Parts of a page are not obvious to a computer because the underlying structure gets lost in presentation to the user, so the quality of the user experience goes down.  Search engines, social networks, and aggregators only see the Display Tier and cannot leverage the underlying structure of the data.  Currently, there are four standard formats for marking up and embedding semantic metadata into documents; rNews is a set of suggested implementations.  The complete first version will be released at the IPTC meeting next month.

News organizations should care about rNews because they will realize these benefits from it:

  • They can provide better links and presentation.
  • Better analytics are available.  JavaScript can extract richer metadata analytics per item, not just per page.
  • Better ad placement will result.  Unfortunate juxtapositions (such as a cruise ad on a page with an article about the Titanic sinking) can be avoided.

rNews is a way to build a news API, level the playing field, encourage open innovation, and lower barriers to cooperation, thus making news pages more stimulating and more interesting.  It also has the advantage that it is based on a documented set of structured markup tags that Google, Bing, and Yahoo will recognize.
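
To make the idea of embedded, machine-readable metadata more concrete, here is a minimal sketch in Python.  It is only an illustration and was not shown at the session.  It assumes a page marked up with microdata-style itemprop attributes; the sample page, the schema.org NewsArticle type, the property names, and the helper names (ItemPropParser, extract_properties) are my own assumptions and are not taken from the rNews specification.  The point is simply that once the structure survives into the Display Tier, a program can read an article's fields without guessing.

```python
from html.parser import HTMLParser

# Hypothetical article page (an assumption for illustration only).
# The itemprop attributes carry structure that would otherwise be
# lost when the page is rendered in the Display Tier.
PAGE = """
<div itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Storm Approaches the Coast</h1>
  <span itemprop="author">A. Reporter</span>
  <p itemprop="articleBody">Residents prepared for the storm on Friday.</p>
</div>
"""


class ItemPropParser(HTMLParser):
    """Collects the text of every element that has an itemprop attribute.

    Simplification: assumes marked-up elements are not nested in each other.
    """

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # name of the itemprop being read, if any

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name:
            self._current = name
            self.properties[name] = ""

    def handle_data(self, data):
        if self._current:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the property currently being read.
        self._current = None


def extract_properties(html):
    """Return a dict mapping each itemprop name to its stripped text."""
    parser = ItemPropParser()
    parser.feed(html)
    return {name: text.strip() for name, text in parser.properties.items()}


if __name__ == "__main__":
    for name, value in extract_properties(PAGE).items():
        print(f"{name}: {value}")
```

A search engine, aggregator, or analytics script could use the same approach to pick out the headline or author of each item on a page, rather than treating the page as one undifferentiated block of text.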

Merging Structure and Meaning

Structure and Meaning Panel (L-R): Mike Petit (OpenAmplify), Rachel Lovinger (Razorfish), Eric Freese (Aptara)

A panel consisting of Rachel Lovinger (Razorfish), Mike Petit (OpenAmplify), and Eric Freese (Aptara), moderated by Christine Connors (TriviumRLG), considered how, in a semantic technology world, structure and meaning can be put together so that content is useful for users.  Here is an edited transcript of the conversation:

RL:  Content should be more reusable and modular, and its designs should be more dynamic.  We must do the design first, and then have new tools that allow the content to publish as it was designed.  Different types of content have varying shelf lives, and much of it has a longer shelf life than most media companies are used to.

MP:  The value and shelf life of content are limited only by the creativity of the users.  To enable creative use of content, it must have a reliable structure.  The meaning must be actionable.

RL:  People are nervous about the information being collected about them.  They are tolerant of ads relevant to them, but if it comes from obviously collected information about them, then they get turned off.

MP:  That is the “spooky factor”, but times are changing.  Cookies used to be feared, but now if you turn them off, you have a bad Web experience.  When you are using social media, you are publicly adding your voice.  If you have the expectation that people should not be able to leverage that, you are being unrealistic.

CC:  People like walled gardens.

EF:  In the book industry, the prime example is Amazon and their ads for related materials.  If you comment, you can get rewards, like $25 off the cheapest Kindle.

MP:  There is value in ads because you might not know the book is out there.

EF: O’Reilly does the same thing if you buy one of their e-books.  When a new edition comes out, they will send you an e-mail.

RL:  Transparency has become extremely important.

MP:  People want the content, and they have insatiable appetites for it.  To the degree that we can deliver that content, they will embrace it.  We need to get the right content to as many people as possible.

CC:  How do we measure how content is being used?

EF:  Book publishers are still trying to figure out how to do it, especially for e-books.  The device makers are not ready to put measurement capabilities in their devices yet.

MP:  All the standards in the world won’t help us if we have data processing capability from the 1950s!  We must choose what to organize.

CC:  We must make sure we’re measuring the right things.

Kasabi: A New Data Platform for the Future

Leigh Dodds

Leigh Dodds, Platform Program Manager at Talis Systems Ltd., described Kasabi, their new data platform, which is now in beta test.  Kasabi is built on the premises that context creates value and that its nature is changing as more and more devices become constantly connected.  We point people at related content, and linking creates context.  The Semantic Web is a natural step in that process.

If you do not have to spend time curating and managing a database, you can save costs and get your product to market much more quickly.  You can use the content in the database and put your content on top of it, and you will not need to figure out the structure of the database, etc.   There is a rapid growth of linked data in several sectors, but that growth presents new problems, such as finding good quality data sources, reliance on the infrastructure, integration into existing systems, and creating revenue from shared data.

Kasabi is making it easy to publish data to extend its reach, while building revenue streams around the data being shared.  It is a data marketplace that is trying to help solve the discovery problem by making datasets easier to find.  Kasabi offers a standard API for consistent access to all datasets.  Every dataset in Kasabi has five APIs associated with it, so there is no need to create one for a new dataset.

Kasabi provides instant access to datasets by allowing click-through licensing.  It is a complete data publishing solution that provides an immediate storefront and platform to host the data.  You can very quickly build a dataset about whatever you are interested in.

Today, Kasabi is in beta, everything is free, and content producers and developers are encouraged to conduct trials of the system.  At present, there are no plans to charge for hosting.  Public domain data can be added to the system and hosted at no charge, and people can use it free.  Producers of commercial data will be charged for high volume usage of the APIs, and they will share revenues.  Developers will pay for the services they are getting for use of your data, so Kasabi is a low cost model for publishers.

“Fireside Chat”

Alan Meckler

The day concluded with a brief “fireside chat” by Alan Meckler, longtime conference organizer in the information industry and founder of WebMediaBrands, owner of the Semantic Web Media Summit, and associated brands, including  [Side note:  Meckler formerly owned the Computers in Libraries conference and sold it to Information Today in 1995.]  He established the first commercial venture on the Web in 1991 and hopes to be in the forefront when the next large explosion in semantic web technologies occurs, which he predicts will happen in about 18 months.  Meckler bases that forecast on the observation that last year had only about 100 readers; now it has 3,800.  And almost every day, another semantic web effort is announced, so in 18 months, some major commercial development will be newsworthy.


Subscription Site Insider Summit–Not Many Tickets Left

I received the following e-mail from Russell Perkins, Founder and Managing Director of the InfoCommerce Group:

If you are selling subscriptions online – or using recurring (auto-renew) billing – I strongly recommend that you consider grabbing one of the last remaining tickets for Anne Holland’s Subscription Site Summit being held in NYC this Oct 24-25th.

Case Studies and Instructional workshops will include:

  • Media Bistro on launching multiple B2B paid subscription offerings from one main brand.
  • Job Search Digest on selling online training courses to high-level business execs
  • Recurring billing guru Paul Larsen on avoiding credit card declines
  • Auto-renew legal expert Lisa Dubrow on laws that affect publishers
  • How to develop mobile editions of your paid content
  • Online and social marketing tactics specifically for paid content

The presentations look interesting and attractive, and several of them are case studies with “How To …” in their titles, so they will provide useful practical advice for those in the subscription selling market.  According to the website, there are only 28 seats left at the Summit, so if you would like to attend, make your reservation now.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Frankfurt Book Fair Academy: A New Addition to the Fair Program

The Frankfurt Book Fair (FBF), occurring this year on October 12-16, is a major event on the publishing industry calendar.  It is enormous, occupying seven large halls, at which publishers all over the world exhibit their wares—virtually anything having to do with book publishing is represented.  And the crowds are equally large, especially on the last day when the public is admitted.  Each year, a different country is designated as the “guest of honor”; this year Iceland is the designated guest.  After my first (and only) visit to the FBF, I said that attending the FBF is an experience everyone in the information industry should have at least once—provided they have a pair of very comfortable shoes!

This year, a new “Academy” has been added to the Fair’s satellite events.  The Academy features “the best international conferences, seminars, and publishers’ trips that the Book Fair has to offer all year round” in four broad subject areas:  strategy, marketing, digital, and rights and licenses.  Here is a sampling of the offerings this year:

In addition, a new series of posts on the Fair’s blog, “Every Think”, will feature 20 essays by the speakers appearing at one of the Academy conferences.

There is truly “something for everyone” at the FBF, and if you have the chance to attend it, don’t miss it!

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

ALCOP: A New Conference

A new conference has appeared on the information scene.  ALCOP, the Association of Library Communications and Outreach Professionals, is holding its first conference at Arcadia University in Glenside, PA (suburban Philadelphia) on October 9 and 10.  ALCOP is a new organization composed of marketing and communications executives at academic and public libraries.  The scope and procedures of the ALCOP organization will be determined at the inaugural conference in October.

The conference is sponsored by Kieserman Media, “a full service media firm focusing on strategic public relations and special event planning” and will feature talks by several noted information professionals.  The two days feature a number of concurrent workshops on topics of current interest, with two keynote sessions.

The dinner keynote on October 9 will be by Kathy Dempsey, a consultant and trainer and author of The Accidental Library Marketer (Information Today, 2009).  Prior to establishing her consulting practice, Kathy was Editor of Information Today’s Computers in Libraries magazine and Marketing Library Services newsletter.

The Monday keynote address will be by Chris Olsen, founder of Chris Olsen & Associates.  She is the Editor and Publisher of the newsletter Marketing Treasures and Past President of SLA’s Maryland Chapter.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Earthquakes, Hurricanes, and the Library 2.011 Conference

By now, you have almost certainly heard about the earthquake in Virginia and the northeast earlier this week.  And now we have the approaching Hurricane Irene.  Right now we’re in the proverbial “calm before the storm”, as these pictures that I took of the ocean at Long Beach Island, NJ, show.

Calm Before the Storm on Long Beach Island, NJ, August 26, 2011. Photo by Don Hawkins

Long Beach Island, NJ, August 26, 2011. Photo by Don Hawkins

Of course, preparations are always important:

Preparations for Irene, August 26, 2011. Photo by Don Hawkins

The information industry is no stranger to tumultuous events and far-reaching changes.  Here’s a conference that you might be interested in attending, and maybe even presenting at.  And it’s an online conference, so you can do it from the comfort of your own home or office!

Submit Your Presentation for the Worldwide Library 2.011 Conference

Have you been exploring how the digital age is impacting the roles libraries and librarians play in how we learn and consume information? Do you have some thoughts to share with the world about the future of our profession? This is your chance to lead the discussion with more than 2,000 colleagues from 133 countries!

The fully online Library 2.011 Worldwide Conference will take place November 2-3, 2011, in multiple time zones and languages. Share your expertise with a global audience. Sign up today to present at the Library 2.011 conference – a free forum for information professionals. You can choose to submit a presentation in any of the conference’s six thought-provoking subject strands:

  • STRAND 1: Libraries – The Roles of Libraries in Today’s World
  • STRAND 2: Librarians & Information Professionals – Evolving Professional Roles in Today’s World
  • STRAND 3: Information Organization
  • STRAND 4: Access & Delivery
  • STRAND 5: Learning – Digital Age Learning Cultures
  • STRAND 6: Content & Creation – Changes in Accessing and Organizing Information

It’s easy to present at the Library 2.011 conference. Since it is online and worldwide, you can present at a time that is most convenient for your time zone, and you can present in your native language. Presentations can also vary in length, between 20 and 60 minutes including Q&A. All sessions will be held in the web conferencing platform Blackboard Collaborate, and volunteers will be available to moderate and provide session support. Live and recorded training will also be provided prior to the conference to get you comfortable with presenting online.

To submit your presentation, go to the Call for Proposals and follow the instructions. The Library 2.011 conference is open to all, so please encourage your friends and colleagues to submit their presentation proposals. It is our intention that all serious proposals will be given the opportunity to be presented. The deadline to submit presentation proposals is September 15th.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor