W3C > Semantic Web Use Cases and Case Studies

Case Study: Publishing STW Thesaurus for Economics as Linked Open Data

Timo Borst and Joachim Neubert, German National Library of Economics (ZBW), Germany

June 2009


General Description

The ZBW German National Library of Economics—Leibniz Information Centre for Economics is the world’s largest economics library. It holds more than four million media items such as books, articles, journals, grey literature and databases. ZBW supports its users with fine-grained thematic access to these information resources. For this purpose the STW Thesaurus for Economics has been developed and applied since the 1990s. It provides a high-level taxonomy of subject categories, thousands of keywords (“descriptors”) and tens of thousands of both synonyms and links between the thesaurus concepts. The media items are indexed with descriptors from this thesaurus. They can be retrieved by these descriptors through the library catalog ECONIS.

The challenge

Although the STW has become an important in-house tool for categorizing and cataloguing economics literature, there was still potential for increasing both access to it and its reuse by other people and institutions. More concretely, we identified five action points: first, to improve the web-based presentation of STW; second, to improve the precision of search results by actively suggesting preferred terms from STW; third, to support the integration of STW into other indexing or retrieval environments; fourth, to encourage third-party reuse of the STW data, e.g. for customizing the vocabulary; and finally, to establish anchor points for linking to other vocabularies and datasets.

The solution

To provide a standard format for publishing the thesaurus as a whole, and to decouple the publication process from the highly proprietary thesaurus maintenance application, we looked for a standardized, highly expressive intermediate format. It turned out that no common serialization format for thesauri exists yet. However, SKOS (Simple Knowledge Organization System), built within the Semantic Web community by vocabulary experts and targeting thesauri, classifications, folksonomies and the like, is expected to achieve the status of a W3C Recommendation in 2009 and already has some implementations.

The mapping of the thesaurus concepts and relations to SKOS proved to be quite straightforward. Since SKOS is inherently multilingual, preferred and alternate labels (synonyms) in English could be attached to concepts as easily as their German equivalents. “Related”, “narrower” and “broader” relations mapped directly to the corresponding SKOS properties. Additional properties, such as publisher, version and licensing information, were added through the use of other RDF vocabularies (e.g., Dublin Core).
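The following is a minimal sketch of such a mapping, not the conversion actually used for STW. It serializes one concept, given as a plain dictionary, into N-Triples using the SKOS properties named above. The base URI, the concept identifiers and the labels are illustrative assumptions.

```python
# Minimal sketch: serialize one thesaurus concept as SKOS in N-Triples.
# The base URI, identifiers and labels below are illustrative assumptions,
# not actual STW data.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/thesaurus/"  # hypothetical namespace


def literal(text, lang):
    """Format a language-tagged RDF literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_ntriples(concept):
    """Return N-Triples lines for a concept given as a plain dictionary."""
    subject = f"<{BASE}{concept['id']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SKOS}Concept> ."]
    # Preferred and alternate labels carry a language tag each.
    for prop in ("prefLabel", "altLabel"):
        for lang, text in concept.get(prop, []):
            lines.append(f"{subject} <{SKOS}{prop}> {literal(text, lang)} .")
    # Thesaurus relations map one-to-one onto SKOS properties.
    for prop in ("broader", "narrower", "related"):
        for target in concept.get(prop, []):
            lines.append(f"{subject} <{SKOS}{prop}> <{BASE}{target}> .")
    return lines


example = {
    "id": "c1",
    "prefLabel": [("de", "Finanzkrise"), ("en", "Financial crisis")],
    "altLabel": [("en", "Financial crash")],
    "broader": ["c0"],
    "related": ["c2"],
}

for line in concept_to_ntriples(example):
    print(line)
```

Publisher, version and licensing statements could be emitted in the same way with properties from another vocabulary such as Dublin Core.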

However, the crucial test for our SKOS adoption was the mapping of STW characteristics which are not very common among thesauri. In particular, STW has about 500 subject categories organizing about 6,000 descriptors in a single mono-hierarchic taxonomy. This taxonomy is not used for indexing, but as an aid for users to find appropriate indexing or search terms by browsing. We wanted to publish this taxonomy together with the descriptors in a single concept scheme. On the other hand, we still needed to distinguish them for rendering or for custom integrity checks. For these purposes SKOS provides extension points, and subclassing of SKOS concepts proved to meet our requirements.


Figure 1: The STW taxonomy forms a high level knowledge organization system of economics and business management
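To illustrate why distinguishing subject categories from descriptors matters for integrity checks, here is a small sketch. The two class names stand in for subclasses of the SKOS concept class and are hypothetical, as are the rules checked: a subject category has at most one parent, that parent is itself a category, and every broader reference resolves. The data is invented.

```python
# Sketch of a custom integrity check over typed concepts.
# Class names and the rules checked are assumptions for illustration.

CATEGORY = "SubjectCategory"   # hypothetical subclass of skos:Concept
DESCRIPTOR = "Descriptor"      # hypothetical subclass of skos:Concept


def check_integrity(concepts):
    """Return a list of human-readable problems found in `concepts`.

    `concepts` maps a concept id to {"type": ..., "broader": [ids]}.
    """
    problems = []
    for cid, concept in sorted(concepts.items()):
        broader = concept.get("broader", [])
        # Every broader reference must point into the same scheme.
        for b in broader:
            if b not in concepts:
                problems.append(f"{cid}: broader concept {b} is not in the scheme")
        if concept["type"] == CATEGORY:
            parents = [b for b in broader if b in concepts]
            # Mono-hierarchy: a category has at most one parent ...
            if len(parents) > 1:
                problems.append(f"{cid}: category has {len(parents)} parents")
            # ... and that parent must itself be a category.
            for b in parents:
                if concepts[b]["type"] != CATEGORY:
                    problems.append(f"{cid}: category placed under non-category {b}")
    return problems


scheme = {
    "cat-A": {"type": CATEGORY, "broader": []},
    "cat-B": {"type": CATEGORY, "broader": ["cat-A"]},
    "cat-C": {"type": CATEGORY, "broader": ["cat-A", "cat-B"]},  # two parents
    "d-1": {"type": DESCRIPTOR, "broader": ["cat-B"]},
    "d-2": {"type": DESCRIPTOR, "broader": ["cat-X"]},           # dangling reference
}

for problem in check_integrity(scheme):
    print(problem)
```

Because both kinds of concept live in one scheme, the type is the only thing that lets a check like this (or a renderer) treat them differently.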

Publishing the data on the Web was one of the main goals of the project. From the RDF file, we generated an XHTML page for each concept in the thesaurus, and embedded all of the data into this page using RDFa. We assigned a persistent, language- and version-independent URI to each page. Thus, the set of pages forms a highly interlinked network of semantic relations, usable for both humans and machines. Web server content negotiation is used to deliver the format (RDF/XML or XHTML, English or German) most appropriate to the request.


Figure 2: An STW descriptor with its relations to other concepts and retrieval links into the library catalog
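The content negotiation mentioned above is normally configured in the web server itself. The sketch below only illustrates the selection logic, assuming the server offers XHTML and RDF/XML in English and German and ranks the client's `Accept` and `Accept-Language` values by q-value. The function names, the offered lists and the defaults are assumptions, and partial media ranges such as `text/*` are not handled.

```python
# Sketch of server-side content negotiation by q-value.
# The offered types and the default choices are assumptions.


def parse_header(value):
    """Parse an Accept-style header into (token, q) pairs, best first."""
    entries = []
    for part in value.split(","):
        fields = [f.strip() for f in part.split(";")]
        token = fields[0].lower()
        if not token:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0
        entries.append((token, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header, offered, default):
    """Pick the best offered value the client accepts, else `default`."""
    ranked = parse_header(header or "")
    rejected = {token for token, q in ranked if q <= 0}
    for token, q in ranked:
        if q <= 0:
            continue
        for candidate in offered:
            if candidate in rejected:
                continue
            # Wildcards accept anything; "de-AT" matches the offered "de".
            if token in ("*", "*/*") or token == candidate \
                    or token.split("-")[0] == candidate:
                return candidate
    return default


MEDIA_TYPES = ["application/xhtml+xml", "application/rdf+xml"]
LANGUAGES = ["en", "de"]

media = negotiate("application/rdf+xml;q=0.9, text/html;q=0.5",
                  MEDIA_TYPES, MEDIA_TYPES[0])
lang = negotiate("de-DE, en;q=0.7", LANGUAGES, LANGUAGES[0])
print(media, lang)
```

A client that asks only for `text/html` matches neither offered type in this sketch and falls back to the XHTML default.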

Providing links to other resources inside and outside the library was another main target of the new STW web presentation. The ZBW’s own library catalog ECONIS was a natural choice for this. Users of the STW website can browse the pages and trigger a search for indexed media items by clicking on the book icon. This opens thematic access paths for the retrieval of library resources. Additional links were created to DBpedia entities, which enable linking to Wikipedia pages.

Since the development of the STW was publicly funded, we felt it was our obligation to publish it for free reuse. Published under a non-commercial Creative Commons license, the STW is now part of the Web of linked open data in legal terms as well.

Future Work

ZBW plans to provide interfaces and widgets to select and aggregate descriptors for retrieval (e.g. to provide compound search expressions like “financial crisis” OR “banking crisis” OR “stock market crash”). For searches in fields with controlled values, including narrower terms may lead to a more complete result set. For full-text searches, adding synonyms and the English or German forms of a term may improve recall. Other widgets may be used for indexing, e.g. to support authors in tagging their uploaded papers on a document server.
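As a rough sketch of how such a compound expression could be assembled, the code below expands a chosen descriptor with its narrower terms and, optionally, its synonyms, then joins the result with OR. The small in-memory thesaurus and the function names are invented for illustration. A real widget would presumably obtain the terms from the thesaurus data.

```python
# Sketch: expand selected descriptors into a boolean OR search expression.
# The tiny in-memory thesaurus below is invented for illustration.

THESAURUS = {
    "financial crisis": {
        "narrower": ["banking crisis"],
        "synonyms": ["financial crash"],
    },
    "banking crisis": {"narrower": [], "synonyms": ["bank crisis"]},
}


def expand(term, include_narrower=True, include_synonyms=False, _seen=None):
    """Collect a term plus, optionally, its narrower terms and synonyms."""
    seen = _seen if _seen is not None else []
    if term in seen:
        return seen  # already visited; also guards against cycles
    seen.append(term)
    entry = THESAURUS.get(term, {})
    if include_synonyms:
        for synonym in entry.get("synonyms", []):
            if synonym not in seen:
                seen.append(synonym)
    if include_narrower:
        # Narrower terms are expanded recursively, depth first.
        for child in entry.get("narrower", []):
            expand(child, include_narrower, include_synonyms, seen)
    return seen


def or_expression(terms):
    """Join terms into a quoted OR expression."""
    return " OR ".join(f'"{t}"' for t in terms)


# Controlled-field search: descriptor plus narrower terms.
print(or_expression(expand("financial crisis")))
# Full-text search: additionally include synonyms.
print(or_expression(expand("financial crisis", include_synonyms=True)))
```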

We will also create mappings, or make use of existing ones, to other vocabularies in economics, e.g. the classification system of the “Journal of Economic Literature” (JEL), and publish them on the Web as linked data.

As part of a future linked data infrastructure, ZBW has already deployed an experimental SPARQL endpoint (based on the Joseki server) and a terminology web service built on the endpoint. The service provides, for example, search for concepts, narrower terms or synonyms. It also powers the autosuggest incremental search service on the STW web site.


Figure 3: Autosuggest search based on thesaurus web service
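A lookup of narrower terms, one of the services mentioned above, could be expressed against a SPARQL endpoint roughly as follows. This is a sketch only: the endpoint URL and concept URI are placeholders, the function names are hypothetical, and the query assumes the data uses `skos:broader` and language-tagged `skos:prefLabel` values. The code builds the request URL for a SPARQL protocol GET request but does not send it.

```python
# Sketch: build a SPARQL query for the narrower terms of a concept and
# the GET request URL for it. Endpoint and concept URIs are placeholders.
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL


def narrower_query(concept_uri, lang="en"):
    """Return a SPARQL SELECT query listing narrower concepts with labels."""
    # Reject characters that could break out of the IRI or the literal.
    if any(ch in concept_uri for ch in '<>" \n'):
        raise ValueError("unsafe concept URI")
    if not lang.replace("-", "").isalnum():
        raise ValueError("unsafe language tag")
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        f"  ?concept skos:broader <{concept_uri}> ;\n"
        "           skos:prefLabel ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}\n"
        "ORDER BY ?label"
    )


def request_url(query):
    """Encode the query as the `query` parameter of a GET request."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = narrower_query("http://example.org/thesaurus/c1", lang="de")
print(query)
print(request_url(query))
```

An autosuggest service would need a different query, for example one filtering labels by a typed prefix, but the request mechanics would stay the same.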

Key Benefits of Semantic Web Technology

Last modified 2009/07/28 08:21:34 by ivan