Workpackage description: 8: Thesaurus Research Prototype

Workpackage number: 8

Start date or starting event: Month 12

Lead Partner: CCLRC (3)

Participant short name:         ILRT  W3C-ERCIM  CCLRC  HP  STILO
Participant number:             1     2          3      4   5
Person-months per participant:  8     0          16     0   0

Total number of deliverables: 8

Objectives

Development of one or more research prototype RDF thesauri showing support for advanced characteristics such as ISO-compatibility, multilinguality, relations to RDF ontologies, classification schemes and cross-mapping between thesauri. The demonstrator will show information scientists how to migrate from their existing systems to RDF-based Semantic Web ones, and will motivate the migration with examples and documentation.

Description of work

Thesauri have been an important component of online database searching within the library community for many years and are now considered useful for the online Web-based search community as well. As a simple form of ontology, they play an important role in the indexing of Web-based documents, adding a certain amount of semantic information. There has been substantial work on RDF thesauri. For example, the DESIRE project defined a standard set of conceptual relationships typical of controlled vocabularies such as thesauri, classification systems and organised metadata collections; this set of relationships was encoded in RDF and used in the SOSIG and LIMBER projects. The challenges at this stage are to show that RDF is a useful encoding for thesauri, and to show how to migrate existing thesauri to RDF. Migration may require extending the functionality of current RDF thesauri in various ways, for example those detailed below.

RDF ontologies

Relating RDF thesauri to more complex RDF ontologies will allow the support of cross-domain resource-discovery and searching.

ISO-compatibility

Refining the existing RDF thesaurus schema to make it compatible with ISO 2788 (Guidelines for the establishment and development of monolingual thesauri) will ensure that the schema is compatible with most existing thesauri, improving the prospects for migration.
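
As an illustration only, the following minimal Python sketch models the standard ISO 2788 relationship tags (BT, NT, RT, USE, UF) as subject/relation/object triples and serialises them as N-Triples. The namespace URI, the Thesaurus class and the example terms are hypothetical and are not part of any existing schema; the sketch only shows that reciprocal thesaurus relations map naturally onto RDF statements.

```python
# Hypothetical namespace, used for illustration only; not a real schema URI.
NS = "http://example.org/thesaurus#"

# ISO 2788 relationship tags and their reciprocals:
# BT/NT = broader/narrower term, RT = related term, USE/UF = use/used for.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    """A tiny in-memory store of (subject, relation, object) statements."""

    def __init__(self):
        self.triples = set()

    def relate(self, subject, relation, obj):
        """Record a relation together with its reciprocal."""
        if relation not in INVERSE:
            raise ValueError(f"unknown relation: {relation}")
        self.triples.add((subject, relation, obj))
        self.triples.add((obj, INVERSE[relation], subject))

    def related(self, subject, relation):
        """Return the sorted terms linked to subject by the given relation."""
        return sorted(o for s, r, o in self.triples
                      if s == subject and r == relation)

    def to_ntriples(self):
        """Serialise all statements, one N-Triples line per statement."""
        return "\n".join(f"<{NS}{s}> <{NS}{r}> <{NS}{o}> ."
                         for s, r, o in sorted(self.triples))


thesaurus = Thesaurus()
thesaurus.relate("Cats", "BT", "Mammals")    # Cats has broader term Mammals
thesaurus.relate("Felines", "USE", "Cats")   # Felines is a non-preferred term
print(thesaurus.to_ntriples())
```

Storing the reciprocal of every relation is a design choice made here for simplicity; a real schema could equally declare the inverse once and leave it to be inferred.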

Multilinguality

A standardised thesaurus schema will provide a means of showing relationships between multilingual terms within individual thesauri, providing an aid for multilingual access to online documentation. The relationships between multilingual terms are not always easily expressed, because equivalent terms do not always exist across languages. This workpackage will investigate the different relationships that may be required and consider how to extend the RDF Schema to express them.
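
To make the difficulty concrete, the sketch below records cross-language links together with a degree of equivalence. The Label and Equivalence types and the kinds "exact" and "partial" are assumptions made for this illustration; the relationships actually required are a subject of the investigation described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    text: str
    lang: str  # language code, e.g. "en" or "fr"


@dataclass(frozen=True)
class Equivalence:
    source: Label
    target: Label
    kind: str  # illustrative values: "exact", "partial"


def translate(label, target_lang, equivalences, accept=("exact",)):
    """Return target-language terms whose equivalence kind is acceptable."""
    return [e.target.text for e in equivalences
            if e.source == label
            and e.target.lang == target_lang
            and e.kind in accept]


cat = Label("cat", "en")
river = Label("river", "en")

# English "river" has no single French equivalent: French distinguishes
# "fleuve" (flows into the sea) from "rivière" (flows into another river).
equivalences = [
    Equivalence(cat, Label("chat", "fr"), "exact"),
    Equivalence(river, Label("fleuve", "fr"), "partial"),
    Equivalence(river, Label("rivière", "fr"), "partial"),
]

print(translate(cat, "fr", equivalences))
print(translate(river, "fr", equivalences, accept=("exact", "partial")))
```

A search interface could use the equivalence kind to decide whether to substitute a term silently or to offer the candidates to the user.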

Cross-mapping

An RDF Schema can also provide the ability to describe relationships between terms within different thesauri covering different subject areas. Such cross-mapping between different thesauri is a vital step in improving access to content provided via various mechanisms for cross-searching currently available on the Web, for example via Z39.50.
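
One way such cross-mapping could support cross-searching is sketched below: a user's term from one thesaurus is expanded into the mapped terms of other thesauri before the query is sent to each target. The scheme names, terms and mapping table are invented for illustration and do not describe any existing thesaurus.

```python
# Hypothetical mapping table: (scheme, term) -> mapped (scheme, term) pairs.
MAPPINGS = {
    ("socsci", "Unemployment"): [("econ", "Joblessness"),
                                 ("stats", "Labour market")],
    ("socsci", "Housing"): [("econ", "Real estate")],
}


def expand_query(scheme, term, mappings):
    """Return the search terms to use per scheme for one user-supplied term."""
    expanded = {scheme: [term]}
    for target_scheme, target_term in mappings.get((scheme, term), []):
        expanded.setdefault(target_scheme, []).append(target_term)
    return expanded


print(expand_query("socsci", "Unemployment", MAPPINGS))
```

A term with no recorded mapping is simply searched under its own scheme, so the expansion never removes results.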

Encoding of classification systems

Classification schemes are also of importance for online subject retrieval as they provide the basis for browsing the collections of Directories and Subject Gateways. Although cross-searching of online databases is well established, cross-browsing of subject gateways is in its infancy. One project currently investigating this approach is Renardus (http://www.renardus.org/). Due to the similarities between classification schemes and thesauri, this workpackage will investigate how the existing RDF Schema needs to be extended to allow a similar encoding of classification systems. Such a Schema could also allow the encoding of classification cross-mapping data and, additionally, term weighting data required for the autoclassification of documents.
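
The role of term weighting data in autoclassification can be sketched as follows. The class identifiers, terms and weights are made up for this example, and the scoring rule (a weighted sum of term frequencies) is only one simple possibility, not a method prescribed by this workpackage.

```python
import re
from collections import Counter

# Hypothetical weighting data: class identifier -> {term: weight}.
CLASS_TERM_WEIGHTS = {
    "class-A": {"thesaurus": 2.0, "indexing": 1.5},
    "class-B": {"protein": 2.0, "enzyme": 1.5},
}


def classify(text, class_term_weights):
    """Score a document against each class; return (best class, all scores)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        cls: sum(weight * counts[term] for term, weight in weights.items())
        for cls, weights in class_term_weights.items()
    }
    best = max(scores, key=scores.get)
    # Report no class at all when no weighted term occurs in the document.
    return (best if scores[best] > 0 else None), scores


best, scores = classify(
    "Indexing with a thesaurus improves thesaurus-based retrieval",
    CLASS_TERM_WEIGHTS,
)
print(best, scores)
```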

DAML+OIL

The recent development of the DAML+OIL semantic markup language provides an extension to RDF Schema. The workpackage will look at how DAML+OIL may be used to make the thesaurus schema relationships more precise.

The demonstrator will be used to quantify the intermediate benefits of using Semantic Web technologies, in order to encourage early adopters and wider adoption and assimilation before the long-term Semantic Web benefits, such as global reasoning, become available.

Deliverables