Startups–At the Cutting Edge

One of the best ways to learn what’s coming down the proverbial pike is to see what startup companies are doing.  So I was intrigued by a session (which the moderator called “The Startup Beauty Parade”) featuring a series of short presentations by 6 startup companies in publishing. And I was not disappointed; the products were not only interesting and forward-looking, but they also address real problems of today’s users.

Startup Panel (L-R): Nathan Watson, Bill Ladd, Bill Park, Niko Goncharoff, Kathleen Fitzgerald, Jan Reichelt, Dan Pollock (Moderator)

Here are brief summaries of each of the presentations:

Data from the Collective Desktop
Jan Reichelt (Mendeley co

Too Many Documents!

We are drowning in the PDF documents we have collected, and when we try to remember what was in the ones we have already read, we can’t.  Mendeley helps researchers work smarter.  It works on any platform, and it’s free. It organizes PDF documents, extracts research data, allows highlighting sections of a PDF and adding “sticky notes”, and aggregates research data into the cloud.  This makes science more collaborative and sharable.

Using Mendeley, one can set up a social research network similar to Facebook, and thus enhance collaboration.  Because all the uploaded articles are collected on a single server, Mendeley can identify which ones are the most read and do other statistical analysis, thereby identifying trends.  Tags show popular groups, popular papers, etc.  Clicking on an article brings up a page with its metadata, sources, lists of related articles, and readership statistics.

Opportunities for publishers include deriving insights and analytics, getting traffic and usage statistics on their data, and connecting academia to the “general public”.   People not in academia are using the system to pursue their interests.  Mendeley currently has 950,000 users.

The Reading Revolution–Scribd
Kathleen Fitzgerald

Currently, there is no easy way to share documents on the web, or to search for or share research, which led to the development of Scribd.  Scribd can turn any file type into an HTML page.  It has become the world’s largest reading and sharing website and is integrated with Twitter, Facebook, and other platforms to make it easy to share.  Text of any document can be made searchable by OCR. Connections are made through content interests, not just friends, as in normal social networking platforms.


Scribd Facts

Scribd readers tend to be highly educated: 60% are ages 18-49, and 60% have college or postgrad degrees.  Many publishers are partnering with Scribd.

Scribd provides reading statistics for any document, which allows companies to develop marketing campaigns and to answer questions like “Where is your content being shared?” or “Where are documents being embedded?”  Authors can revise their documents without losing existing read counts, comments, etc.

Readers can create collections of documents, such as things to read later.  The National Archives created a small collection of documents on Presidents’ mothers that drew over 25,000 readers in a very short time.

Reading is ready for a revolution.  Scribd is re-imagining a reading experience for mobile platforms, ignoring the barriers between different types of content.

Take my content — please!  SureChem: the service-based business model.
Niko Goncharoff

SureChem is the first acquisition by Digital Science, a unit of Macmillan.  It is a chemical patent search designed for scientists and allows searching structures embedded in the text of patents.  SureChem has compiled a database of about 12 million structures from 20 million patents and 12 million Medline records.  

From this data, 3 products have been developed.

SureChem--3 products from the data

SureChem can be used for intellectual property “landscape analysis”, large scale chemical analysis, comparison of internal and external data, or competitive analysis. Today’s customers need to store external data beside their internal data, manipulate external data behind the firewall, and freely share results throughout the organization.  All this is much easier if the data are owned by the user.

SureChem customers subscribe to the service, not the content.  They can generate their own content from public sources, mine the data, and add value to it by showing the text where the structure was found. Advantages of this approach include:

  • Owning the data is a better investment for an organization than renting it.
  • Price increases are tied solely to improvements in functionality.
  • It is easy to add other content sources.

Bill Park, CEO, DeepDyve

Today’s market landscape is made up of 250 million knowledge workers who are generating 4 billion visits/year to publisher sites.  About half of them do not find what they want and go away, which represents a large missed opportunity for the publishers.  Many users are discovering content they cannot access because the publishers have high prices for single copies of articles.  DeepDyve has tried to solve this problem by creating a rent-an-article model, in which an article costs $0.99 to $4.99 and expires after 24 hours, with no printing or downloading.

DeepDyve partners with many publishers in both scientific and humanities areas.  Over the last year, most of their effort went into building relationships and acquiring content.  Now they are concentrating on making subscriptions worthwhile.  Customers come from Google or from rental links to DeepDyve that publishers place on their sites.

DeepDyve thinks of itself as a data and technology company, not a content company.  It operates as a Software as a Service (SaaS) company, where the sale is to the user, not the enterprise.

Bill Ladd:  Chief Analytic Officer, Recorded Future

Recorded Future (RF) has built the largest temporal index in the world.  They discovered analytic demands for external data, such as what technology areas are changing, who is talking about what, what’s coming next, etc.  Many of the answers are in newspapers or public websites.  Search engines make it easy to search for specific things, but they operate in a browse paradigm.  RF is organizing the internet for analysis.

The web is loaded with temporal signals, but it is impossible to search on “next month”, “last month”, etc.  Similarly, events provide additional structure.

RF’s engine takes unstructured data from the Web, applies natural language processing to it, and organizes what has been said online about future events to determine what impact is expected.  They have an analytic engine to process this content and also use historical models to find relationships and test predictive models.


Recorded Future's Accomplishments

Nathan Watson, founder and CEO of

Information outside of journals was formerly held only in the heads of researchers, as their personal knowledge.  If you worked for one of those researchers, you could get that knowledge, but otherwise you couldn’t.  Information technology allows that information to leave the heads of those people and go into separate applications and systems, so that everything will be clickable and findable.

BioRaft was developed to use publicly available regulatory compliance data on hazards to scientists, track what they use, and organize it for safety.  Regulatory compliance requires companies to provide information on who is in their laboratories, what they work with, their projects, and their equipment.

BioRaft capabilities

These data are then linked to journal articles and other references to research projects.  Users want direct links to journals as well as updates, etc.  Sometimes researchers buy access to journals but nobody else in the company knows about it or is able to get access.  BioRaft is building an enterprise management site to solve these access hurdles.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

