The Semantic Web Media Summit

I attended the Semantic Web Media Summit in New York on September 14.  Here is a report on the conference.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Nearly 100 publishers and other information professionals gathered in New York on September 14 for the Semantic Web Media Summit, organized by, a media content organization.  In contrast to many events on semantics that are heavily oriented towards technology, this Summit focused on the business aspects of integrating the semantic web into content.

Semantic Web Overview

Michael Dunn

Michael Dunn, Vice President and CTO, Hearst Interactive Media, gave the opening keynote address and provided a good overview of the semantic web and its relationships with content.  He said that the media industry has been in catch-up mode since the Web started, and that the best way to get value out of content is to structure it to improve production (digitization) and enhance consumption (monetization).  Traditional media are shifting to the Web, and content requirements will keep expanding.  We are experiencing an increasing sense of urgency as a result of a proliferation of devices, continuously changing markets, and shifting audience requirements for real-time, niche, thematic, and contextual content.

In existing content management systems, a major problem is that content is still in silos, which leads to many missed opportunities to increase consumption.  Today’s content is mainly single-use, but producers should have their content ready for whatever is coming in the future.  Creators must own every part of what they create, including the text, metadata, multimedia, etc.  Processes must be kept simple so that people creating the content can concentrate on the creative process, with technical steps being done in the background.  This will promote a shift in focus from treating content as simply a commodity to treating it as a source of innovation.

Content has an ROI, but if it is in silos, it may have been paid for several times (for example, by creating it ourselves, licensing it from others, or via a related or partner entity), which is very wasteful.  In today’s environment, to properly measure the ROI of content, metrics must exist for elements within the content, not just for the number of pages published.  We must also recognize that the context of the intended audience takes priority; the right content must get to the right user at the right time via the right mechanism (increasingly, it will be accessed using smart phones), and all correctly personalized for the user.  Content must be treated as data so that it can be optimized and made harvestable.  Dunn also noted that there is a trust issue with content on the Web: Google does not trust user-supplied metadata and ignores it.

Turning to a discussion of the Semantic Web, Dunn defined it as descriptive markup techniques for content, including links and rich metadata, all of which foster machine readability.  The media industry should be interested in the Semantic Web because it can create efficiencies during content creation, help publishers understand the content they already have, and ensure discoverability.  Structuring content will result in richer metadata, better tags and links, reusable content, and improved workflows.  He referred to the Linked Open Data project as an example.

Linked Open Data Cloud Diagram. Licensed under Creative Commons

He also suggested considering Drupal as a suitable semantic content management system and concluded his address by listing the following benefits of structured semantic content:

  • An increase in productivity, reducing time to market and improving consistency,
  • Increased usage of content, lower production costs, and improved discoverability,
  • An improved user experience for the audience, with increased levels of engagement and better personalization and tagging, and
  • Enhanced revenue streams and new Web opportunities for content.

Publishers should focus on the semantic web, beginning with revenue enhancement opportunities, while showing how to solve business problems and how to measure results.
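Dunn's definition of the Semantic Web (descriptive markup with links and rich metadata that machines can read) can be made concrete with a toy model in which every fact is a subject-predicate-object triple, the same shape RDF uses.  The sketch below is plain Python with no RDF library, and all identifiers (`ex:article42`, `ex:about`, and so on) are hypothetical, chosen only to show how explicit links let a program discover related content that would otherwise sit in separate silos.

```python
# A toy "linked data" store: every fact is a (subject, predicate, object) triple,
# the same shape used by RDF. All identifiers below are hypothetical examples.
Triple = tuple[str, str, str]

TRIPLES: list[Triple] = [
    ("ex:article42", "ex:headline", "Harbor reopens after storm"),
    ("ex:article42", "ex:about", "ex:NewYorkHarbor"),
    ("ex:article17", "ex:headline", "Ferry schedule changes"),
    ("ex:article17", "ex:about", "ex:NewYorkHarbor"),
    ("ex:article99", "ex:about", "ex:Broadway"),
]


def objects(subject: str, predicate: str, triples: list[Triple]) -> list[str]:
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def related_articles(article: str, triples: list[Triple]) -> list[str]:
    """Find other articles that share at least one 'ex:about' topic with `article`."""
    topics = set(objects(article, "ex:about", triples))
    return sorted(
        {s for s, p, o in triples if p == "ex:about" and o in topics and s != article}
    )


print(related_articles("ex:article42", TRIPLES))  # ['ex:article17']
```

Because the relationship is stored as data rather than implied by page layout, the same query works no matter which system produced each article.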

A Call to Action

Mike Petit, Co-founder and CIO, OpenAmplify, followed Dunn with a call to action, noting that:

  1. The Semantic Web and its associated technology have become tangible and effective tools for publishers, and
  2. Social media have complicated the publishing model and have become indispensable.

To capture the maximum revenue opportunity, the time to act on these developments is now, but there are challenges:

  • Control is challenged by social media.
  • Mobile platforms increase demand, but attention spans are shorter.
  • The Web is no longer about finding information; it is now about your content and you being found.
  • Social media can generate premium content, but some brands may not see it that way.

Petit suggested that semantic technology can meet these challenges; he called it “The Semantic 1-2 Punch.”  First, it helps in gaining an understanding of content, thus increasing its use; second, that understanding can drive classification, which enhances the sales model, identifies audiences, and connects publishers to readers.  He noted that publishers used to create their own content; now our audience does.  We also used to understand what we publish; now we cannot even read it all.  Without understanding, we cannot monetize the content, follow it, or determine where it is relevant.

Once we understand the content, we can classify it, and then it can be optimized, with tangible benefits.  The technology to do this is available; the ROI is measurable; and the necessary costs and effort are reasonable.   Management is beginning to understand the technology, so we will get a better hearing when projects are proposed.  The means are there, so let’s act!

rNews: A New Standard

Three rNews Users (L-R): Andreas Gebhard (Getty Images), Stuart Myles (Associated Press), Evan Sandhaus (The New York Times)

The International Press Telecommunications Council (IPTC) defines digital standards for the media.  Its latest standard, rNews, is a model for embedding machine-readable metadata in Web documents.  Three rNews users described the standard.  Modern websites are built with a three-tier architecture: the Data Tier, where the content resides; the Logic Tier, the software that reads and processes the data and sends it on; and the Display (or Presentation) Tier, where the data is formatted into the HTML document that the user sees.  Because the underlying structure is lost in the presentation to the user, parts of a page are not obvious to a computer, and the quality of the user experience goes down.  Search engines, social networks, and aggregators see only the Display Tier and cannot leverage the underlying structure of the data.  Currently, there are four standard formats for marking up and embedding semantic metadata into documents; rNews is a set of suggested implementations.  The complete first version of rNews will be released at the IPTC meeting next month.
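To illustrate the problem the speakers described, the sketch below shows how structure embedded in the Display Tier can be recovered by a machine.  It assumes HTML microdata attributes (`itemscope`, `itemtype`, `itemprop`) with property names from schema.org's `NewsArticle` type; the article markup is hypothetical and is not an excerpt from the rNews specification.  The parser uses only Python's standard-library `html.parser` and handles simple, non-nested properties; a production system would use a full microdata or RDFa parser.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical article markup using microdata attributes and schema.org
# NewsArticle property names (an illustration, not taken from the rNews spec).
PAGE = """
<article itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Harbor reopens after storm</h1>
  <span itemprop="author">Jane Reporter</span>
  <time itemprop="datePublished" datetime="2011-09-14">Sept. 14</time>
  <p itemprop="articleBody">Ships returned to the harbor on Wednesday.</p>
</article>
"""


class MicrodataExtractor(HTMLParser):
    """Collect itemprop name/value pairs from simple, non-nested markup."""

    def __init__(self) -> None:
        super().__init__()
        self.props: dict[str, str] = {}
        self._current: Optional[str] = None  # itemprop whose text is being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("itemprop")
        if name is None:
            return
        # A machine-readable attribute value (e.g. <time datetime=...>)
        # takes precedence over the human-readable display text.
        machine_value = a.get("datetime") or a.get("content")
        if machine_value:
            self.props[name] = machine_value
        else:
            self._current = name
            self.props[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        if self._current is not None:
            self.props[self._current] = self.props[self._current].strip()
            self._current = None


def extract(html: str) -> dict[str, str]:
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.props


props = extract(PAGE)
print(props["headline"])       # Harbor reopens after storm
print(props["datePublished"])  # 2011-09-14
```

The visible page still reads "Sept. 14", but a crawler gets the unambiguous date, which is the kind of underlying structure the speakers said is otherwise lost in presentation.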

News organizations should care about rNews because they will realize these benefits from it:

  • They can provide better links and presentation.
  • Better analytics are available.  JavaScript can extract richer metadata analytics per item, not just per page.
  • Better ad placement will result.  Unfortunate juxtapositions (such as a cruise ad on a page with an article about the Titanic sinking) can be avoided.
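The ad-placement benefit above follows from having subject metadata on each item: an ad server can compare an article's topics with topics an advertiser wants to avoid.  The sketch below assumes hypothetical topic labels and a simple "no overlap" rule; it is not part of rNews or any particular ad platform.

```python
from dataclasses import dataclass, field


@dataclass
class Ad:
    name: str
    # Topics next to which this advertiser does not want to appear
    # (hypothetical labels for illustration).
    avoid_topics: set[str] = field(default_factory=set)


def safe_ads(article_topics: set[str], ads: list[Ad]) -> list[str]:
    """Return the names of ads whose avoid-list does not overlap the article's topics."""
    return [ad.name for ad in ads if not (ad.avoid_topics & article_topics)]


ads = [
    Ad("cruise-line", {"shipwreck", "maritime disaster"}),
    Ad("bookstore"),
]
titanic_story = {"history", "shipwreck", "Titanic"}
print(safe_ads(titanic_story, ads))  # ['bookstore']
```

A set intersection keeps the rule easy to audit; a real system would likely also weigh topic relevance rather than apply a hard block.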

rNews is a way to build a news API, level the playing field, encourage open innovation, and lower barriers to cooperation, thus producing more stimulating and interesting news pages.  It also has the advantage of being based on schema.org, a documented vocabulary of structured markup tags that Google, Bing, and Yahoo recognize.

Merging Structure and Meaning

Structure and Meaning Panel (L-R): Mike Petit (OpenAmplify), Rachel Lovinger (Razorfish), Eric Freese (Aptara)

A panel consisting of Rachel Lovinger (Razorfish), Mike Petit (OpenAmplify), and Eric Freese (Aptara), moderated by Christine Connors (TriviumRLG), considered how, in a semantic technology world, structure and meaning can be put together so that content is useful for users.  Here is an edited transcript of the conversation:

RL:  Content should be more reusable and modular, and its designs should be more dynamic.  We must do the design first, and then have new tools that allow the content to be published as it was designed.  Different types of content have varying shelf lives, and much of it has a longer shelf life than most media companies are used to.

MP:  The value and shelf life of content are limited only by the creativity of the users.  To enable creative use of content, it must have a reliable structure.  The meaning must be actionable.

RL:  People are nervous about the information being collected about them.  They are tolerant of ads relevant to them, but if the ads are obviously based on information collected about them, they get turned off.

MP:  That is the “spooky factor”, but times are changing.  Cookies used to be feared, but now if you turn them off, you have a bad Web experience.  When you are using social media, you are publicly adding your voice.  If you expect that people should not be able to leverage that, you are being unrealistic.

CC:  People like walled gardens.

EF:  In the book industry, the prime example is Amazon and their ads for related materials.  If you comment, you can get rewards, like $25 off the cheapest Kindle.

MP:  There is value in ads because you might not know the book is out there.

EF: O’Reilly does the same thing if you buy one of their e-books.  When a new edition comes out, they will send you an e-mail.

RL:  Transparency has become extremely important.

MP:  People want the content, and they have insatiable appetites for it.  To the degree that we can deliver that content, they will embrace it.  We need to get the right content to as many people as possible.

CC:  How do we measure how content is being used?

EF:  Book publishers are still trying to figure out how to do it, especially for e-books.  The device makers are not ready to put measurement capabilities in their devices yet.

MP:  All the standards in the world won’t help us if we have data processing capability from the 1950s!  We must choose what to organize.

CC:  We must make sure we’re measuring the right things.

Kasabi: A New Data Platform for the Future

Leigh Dodds

Leigh Dodds, Platform Program Manager at Talis Systems Ltd., described Kasabi, their new data platform, which is now in beta test.  Kasabi is built on the premise that context creates value and that the nature of context is changing as more and more devices become constantly connected.  We point people at related content, and linking creates context.  The Semantic Web is a natural step in that process.

If you do not have to spend time curating and managing a database, you can save costs and get your product to market much more quickly.  You can use the content already in the database, build your own content on top of it, and avoid having to figure out the structure of the database.  Linked data is growing rapidly in several sectors, but that growth presents new problems, such as finding good-quality data sources, relying on the infrastructure, integrating data into existing systems, and generating revenue from shared data.

Kasabi makes it easy to publish data and extend its reach while building revenue streams around the data being shared.  It is a data marketplace that aims to solve the discovery problem by helping users find datasets.  Kasabi offers a standard API for consistent access to all datasets.  Every dataset in Kasabi has five APIs associated with it, so there is no need to create one for a new dataset.

Kasabi provides instant access to datasets by allowing click-through licensing.  It is a complete data publishing solution that provides an immediate storefront and platform to host the data.  You can very quickly build a dataset about whatever you are interested in.

Today, Kasabi is in beta, everything is free, and content producers and developers are encouraged to try the system.  At present, there are no plans to charge for hosting.  Public domain data can be added to the system and hosted at no charge, and people can use it for free.  Producers of commercial data will be charged for high-volume usage of the APIs, and they will share in the revenues.  Developers will pay for the services they receive when using your data, so Kasabi is a low-cost model for publishers.

“Fireside Chat”

Alan Meckler

The day concluded with a brief “fireside chat” by Alan Meckler, longtime conference organizer in the information industry and founder of WebMediaBrands, owner of the Semantic Web Media Summit and associated brands, including  [Side note:  Meckler formerly owned the Computers in Libraries conference and sold it to Information Today in 1995.]  He established the first commercial venture on the Web in 1991 and hopes to be at the forefront of the next large explosion in semantic web technologies, which he predicts will occur in about 18 months.  Meckler bases that forecast on the observation that last year had only about 100 readers; now it has 3,800.  In addition, almost every day another semantic web effort is announced, so in 18 months some major commercial development will be newsworthy.


