It’s a question many of us are asking with increasing frequency. In these days of simply putting a few words into a search box, is it really worth all the time, effort, and resources that have been put into constructing (sometimes elaborate) controlled vocabularies, taxonomies, and metadata? Are we really providing added value to our users? It’s true that information professionals, especially in some disciplines (like medicine), rely on controlled vocabularies to aid search, but most end users don’t know how to use them. A panel led by Suzanne BeDell, Vice President of ProQuest and General Manager of Dialog, looked into this issue.
In her introduction to the session, Suzanne BeDell noted that entity extraction is used by publishers to add functionality to articles. For example, Nature Publishing Group uses TEMIS’s software to identify chemicals. Analytics and data mining add another layer of capability to the traditional industry structure of primary journals, abstracting & indexing services, and search and aggregation. Analytics are used to identify knowledge buried in unstructured content, and they are usually based on statistical analysis of content or natural language processing.
Jabe Wilson, Sr. Solutions Manager at Elsevier, agreed, suggesting that taxonomies are more important today than they have ever been and that, because they are based on words, they underlie the development of new technologies. He explained the differences among a dictionary, a taxonomy, and an ontology. The relations among them are shown in the following map.
Tim Mohler, Vice President, Operations, Lexalytics Inc., reviewed human indexing, noting that although it makes navigation easier for users, it is expensive because indexers are scarce and indexing entails considerable effort. Because users tend not to use complex taxonomies, many information producers simply index their content by machine. However, machine indexing depends on developing rules based on the content, and Mohler wondered whether a model based on a taxonomy could be built to guide the machine. This is still an unanswered question.
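To make the idea of rule-based machine indexing guided by a taxonomy a little more concrete, here is a minimal Python sketch. It is not anything Mohler presented; the toy taxonomy, the trigger phrases, and the function names are all hypothetical, chosen only to illustrate how a matched term might pull in its broader terms.

```python
import re

# Hypothetical toy taxonomy: each term maps to its broader (parent) term.
TAXONOMY = {
    "aspirin": "analgesics",
    "ibuprofen": "analgesics",
    "analgesics": "drugs",
    "drugs": None,
}

# Hypothetical indexing rules: phrases that trigger each taxonomy term.
TRIGGER_PHRASES = {
    "aspirin": ["aspirin", "acetylsalicylic acid"],
    "ibuprofen": ["ibuprofen"],
}


def broader_terms(term):
    """Walk up the taxonomy and return every ancestor of a term."""
    chain = []
    parent = TAXONOMY.get(term)
    while parent is not None:
        chain.append(parent)
        parent = TAXONOMY.get(parent)
    return chain


def index_document(text):
    """Tag a text with each matched term plus its broader terms."""
    lowered = text.lower()
    tags = set()
    for term, phrases in TRIGGER_PHRASES.items():
        for phrase in phrases:
            # Whole-phrase match; escape the phrase so it is treated literally.
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                tags.add(term)
                tags.update(broader_terms(term))
                break
    return sorted(tags)


print(index_document("Patients took acetylsalicylic acid daily."))
```

In this sketch the rules stay small (a few phrases per term), and the taxonomy supplies the structure, so a document that mentions only a specific drug is still findable under the broader categories.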
Tyron Stading, CTO and Founder of Innography, echoed this theme, saying that analytics as applied to business intelligence can be used to predict future trends, uncover behavioral patterns, link seemingly unrelated behavior, and identify outliers. Structured data helps people make decisions; taxonomies can provide additional attributes of the text to enhance decision making. Multiple taxonomies can be used in a process called “fingerprinting”, and they can also create additional links between data sources, so that you can find information that would otherwise not be evident. Structured information is necessary for analytics; simple keywords aren’t enough. Taxonomies add features to unstructured text and identify its useful attributes.
So are taxonomies worth the effort necessary to construct them? Based on the examples given by the speakers, the answer is, “Indeed, they are!”
Columnist, Information Today and Conference Circuit Blogger