W3C

Quick Guide to Publishing a Thesaurus on the Semantic Web

Editor's Draft 8 February 2005

This version:
http://www.w3.org/2004/03/thes-tf/primer/2005-02-08
Latest version:
http://www.w3.org/2004/03/thes-tf/primer/
Previous version:
http://www.w3.org/2004/03/thes-tf/primer/2004-11-17
Editors:
Alistair Miles, CCLRC


Abstract

The Semantic Web, which is based on the Resource Description Framework (RDF), provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.

This document describes in brief how to express the content and structure of a thesaurus, and metadata about a thesaurus, in RDF. Using RDF allows your data to be linked to and/or merged with other RDF data by semantic web applications.

Status of this Document

This section describes the status of this document at the time of its publication.

This document is an Editor's Draft for review by the Semantic Web Best Practices and Deployment Working Group (hereafter 'the Working Group') and the participants of the public-esw-thes@w3.org mailing list and is subject to change without notice. This document has no formal standing within W3C. Please consult the Working Group's home page and the W3C technical reports index for information about the latest publications by this group. This document may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document is published by the Semantic Web Best Practices and Deployment Working Group, part of the W3C Semantic Web Activity. The Working Group intends the Quick Guide to Publishing a Thesaurus on the Semantic Web to become a W3C Working Group Note.

We encourage public comments. Please send comments to public-esw-thes@w3.org [archive] and start the subject line of the message with "comment:".

Publication as an Editor's Draft does not imply endorsement by the W3C Membership.

Introduction

The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries [Semantic Web Activity]. It is based on the Resource Description Framework (RDF) [RDF], which provides a simple data formalism for talking about things, their properties, inter-relationships, and categories (classes).

This document describes in brief how to express the content and structure of a thesaurus, and metadata about a thesaurus, in RDF. Using RDF allows your data to be linked to and/or merged with other RDF data by semantic web applications.

The examples in this document are given as a visualisation of an RDF graph. A serialisation of the graph is also given, in the RDF/XML syntax. For an overview of RDF, see [RDF Concepts]. For a description of the RDF/XML syntax, see [RDF Syntax].

The examples in this document use the SKOS Core Vocabulary, which is a set of properties and classes that can be used to express the conceptual content of a thesaurus as an RDF graph. For a complete description of SKOS Core, see [SKOS Core Guide].

The examples in this document use the DCMI Metadata Terms, which are properties and classes for describing resource metadata. For more about DCMI Terms, see [DCMI Terms].

Expressing a Thesaurus in RDF

Below is an extract from the UK Archival Thesaurus (UKAT) [UKAT]:

Term: Economic cooperation

Used For:
    Economic co-operation

Broader terms:
    Economic policy

Narrower terms:
    Economic integration
    European economic cooperation
    European industrial cooperation
    Industrial cooperation

Related terms:
    Interdependence

Scope Note:
Includes cooperative measures in banking, trade, industry etc., between and among countries.

This example, expressed as an RDF graph using the SKOS Core Vocabulary, looks like:

[Figure: Graph of extract from UKAT]

The SKOS Core Vocabulary is designed to allow you to model the conceptual content of a thesaurus as a set of resources. Each of the blue circles in the image above represents a conceptual resource (a resource of type skos:Concept).

An RDF/XML serialisation of this example is below:

<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

    <skos:Concept rdf:about="http://www.ukat.org.uk/thesaurus/concept/1750">
        <skos:prefLabel>Economic cooperation</skos:prefLabel>
        <skos:altLabel>Economic co-operation</skos:altLabel>
        <skos:scopeNote>Includes cooperative measures in banking, trade, industry etc., between and among countries.</skos:scopeNote>
        <skos:inScheme rdf:resource="http://www.ukat.org.uk/thesaurus"/>
        <skos:broader rdf:resource="http://www.ukat.org.uk/thesaurus/concept/4382"/>
        <skos:narrower rdf:resource="http://www.ukat.org.uk/thesaurus/concept/2108"/>
        <skos:narrower rdf:resource="http://www.ukat.org.uk/thesaurus/concept/9505"/>
        <skos:narrower rdf:resource="http://www.ukat.org.uk/thesaurus/concept/15053"/>
        <skos:narrower rdf:resource="http://www.ukat.org.uk/thesaurus/concept/18987"/>
        <skos:related rdf:resource="http://www.ukat.org.uk/thesaurus/concept/3250"/>
    </skos:Concept>

</rdf:RDF>

Note that each concept from the UKAT has been allocated a URI. For example, the URI

http://www.ukat.org.uk/thesaurus/concept/1750

denotes the concept from the UKAT whose preferred term is 'Economic cooperation'. Note also that the UKAT itself has been allocated the URI

http://www.ukat.org.uk/thesaurus.

Allocating URIs to a thesaurus and to the concepts in a thesaurus allows anybody to refer to them from any context.

For a complete description of considerations relevant to allocating URIs, see [WEBARCH].

See also the section 'HTTP URIs for Concepts' in [SKOS Core Guide].
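
As a rough illustration of how an application might consume the concept description in this section, the following Python sketch reads an abbreviated copy of the RDF/XML and lists its statements as (subject, predicate, object) tuples. It uses only the standard library XML parser and assumes the flat element layout used in this guide. The names RDF_XML and concept_triples are illustrative and are not part of SKOS Core or of any RDF toolkit; a general-purpose RDF parser would be needed for arbitrary RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Abbreviated copy of the UKAT concept example (some properties omitted).
RDF_XML = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">
    <skos:Concept rdf:about="http://www.ukat.org.uk/thesaurus/concept/1750">
        <skos:prefLabel>Economic cooperation</skos:prefLabel>
        <skos:altLabel>Economic co-operation</skos:altLabel>
        <skos:inScheme rdf:resource="http://www.ukat.org.uk/thesaurus"/>
        <skos:broader rdf:resource="http://www.ukat.org.uk/thesaurus/concept/4382"/>
        <skos:narrower rdf:resource="http://www.ukat.org.uk/thesaurus/concept/2108"/>
        <skos:related rdf:resource="http://www.ukat.org.uk/thesaurus/concept/3250"/>
    </skos:Concept>
</rdf:RDF>"""


def concept_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for each skos:Concept element.

    Assumption: every concept is a typed node element with rdf:about, and
    every child is either a plain literal or an rdf:resource link.
    """
    triples = []
    root = ET.fromstring(rdf_xml)
    for concept in root.findall(f"{{{SKOS}}}Concept"):
        subject = concept.get(f"{{{RDF}}}about")
        # The element name skos:Concept asserts the rdf:type of the subject.
        triples.append((subject, RDF + "type", SKOS + "Concept"))
        for prop in concept:
            # ElementTree tags look like "{namespace}local"; join them into a URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            target = prop.get(f"{{{RDF}}}resource")
            obj = target if target is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples
```

Calling concept_triples(RDF_XML) yields seven tuples for the abbreviated data: one rdf:type statement plus one for each property element. Literal values and URIs are both returned as plain strings in this sketch, so it does not distinguish between them.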

Expressing Thesaurus Metadata in RDF

RDF can also be used to express metaproperties of a thesaurus, such as its title, description, date of modification and so on. The DCMI Metadata Terms [DCMI Terms] include a number of useful properties for this purpose. For example, below is an RDF/XML serialisation of the UKAT metadata:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">

    <skos:ConceptScheme rdf:about="http://www.ukat.org.uk/thesaurus">
        <dc:title>The UK Archival Thesaurus</dc:title>
        <dc:description>A subject thesaurus produced to support indexing in the UK archive sector.</dc:description>
        <dc:creator>UK Archival Thesaurus project</dc:creator>
        <dc:date>2004-08-22</dc:date>
        <dc:format>text</dc:format>
        <dc:language>en</dc:language>
        <dc:rights>All rights reserved. Data in the UK Archival Thesaurus may be freely used and copied, without prior permission, for educational and other non-commercial purposes. These purposes include (but are not limited to) the incorporation of UKAT data into indexes, thesauri and finding aids created by organisations and projects in the archive sector and the wider heritage sector, in the UK and elsewhere. Under no circumstances may copies of UKAT data be sold without prior written permission from the UKAT Project (support@ukat.org.uk).</dc:rights>
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/1"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/2"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/3"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/4"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/5"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/6"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/8"/>         
    </skos:ConceptScheme>

</rdf:RDF>

See [DCMI Terms] for a description of the recommended usage of these properties.

Note that SKOS Core models a thesaurus as a 'concept scheme'. For more about this, see the section 'Concept Schemes' in [SKOS Core Guide].

Note also that, in the example above, a link has been asserted between the UKAT thesaurus and the top concepts in the UKAT thesaurus (in the UKAT they are known as 'fields') using the skos:hasTopConcept property. Using this property gives applications an efficient way of locating the top concepts for a given scheme.
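
To suggest how an application might locate the top concepts of a scheme, the following Python sketch reads an abbreviated copy of the metadata in this section and collects the title and the skos:hasTopConcept links for a given scheme URI. It uses only the standard library and assumes the flat RDF/XML layout used in this guide. The names SCHEME_XML and describe_scheme are illustrative and do not belong to any existing library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"

# Abbreviated copy of the UKAT metadata example (most properties omitted).
SCHEME_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <skos:ConceptScheme rdf:about="http://www.ukat.org.uk/thesaurus">
        <dc:title>The UK Archival Thesaurus</dc:title>
        <dc:language>en</dc:language>
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/1"/>
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/2"/>
    </skos:ConceptScheme>
</rdf:RDF>"""


def describe_scheme(rdf_xml, scheme_uri):
    """Return the title and top concept URIs of one scheme, or None if absent."""
    root = ET.fromstring(rdf_xml)
    for scheme in root.findall(f"{{{SKOS}}}ConceptScheme"):
        if scheme.get(f"{{{RDF}}}about") != scheme_uri:
            continue
        title = scheme.findtext(f"{{{DC}}}title")
        # Each skos:hasTopConcept element links the scheme to one top concept.
        top_concepts = [
            link.get(f"{{{RDF}}}resource")
            for link in scheme.findall(f"{{{SKOS}}}hasTopConcept")
        ]
        return {"title": title, "top_concepts": top_concepts}
    return None
```

Because the top concepts are listed on the scheme itself, a consumer can start browsing from describe_scheme(SCHEME_XML, "http://www.ukat.org.uk/thesaurus")["top_concepts"] without scanning every concept for one that lacks a broader concept.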

Publishing RDF Data

The simplest way to publish RDF data is to create one or more RDF documents containing your data, and publish them on the web via a normal HTTP server.

Note that, although the examples above all use the RDF/XML serialisation syntax (i.e. file format), RDF has two alternative syntaxes: N3/Turtle [N3][Turtle] and N-Triples [N-Triples]. For documents containing RDF data in the RDF/XML format, the 'Content-Type' field in the HTTP header for that document should be 'application/rdf+xml'.
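
As one possible way to meet the content type requirement while testing locally, the sketch below extends the file server in the Python standard library so that files ending in '.rdf' are labelled as 'application/rdf+xml'. The '.rdf' file extension, the class name RDFRequestHandler, the function name serve and the port number are assumptions made for illustration; a production web server would normally be configured through its own settings instead.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class RDFRequestHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, labelling .rdf files as RDF/XML."""

    # Extend the inherited extension-to-type table with an entry for RDF/XML.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".rdf": "application/rdf+xml",
    }


def serve(port=8000):
    """Start the server and block until interrupted (hypothetical helper)."""
    HTTPServer(("", port), RDFRequestHandler).serve_forever()
```

Running serve() from a directory that contains your RDF/XML documents should make them available over HTTP with the expected header, which can be checked with any HTTP client that displays response headers.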

You can also publish your RDF data on the web via a dedicated RDF server such as Joseki [Joseki] or Sesame [Sesame].

References

[DCMI Terms]
DCMI Metadata Terms. Dublin Core Metadata Initiative, 2004. (See http://dublincore.org/documents/dcmi-terms/)
[Joseki]
Joseki Jena RDF Server. Sourceforge. (See http://www.joseki.org/)
[N3]
Tim Berners-Lee. Primer: Getting into RDF & Semantic Web using N3. World Wide Web Consortium, 2004. (See http://www.w3.org/2000/10/swap/Primer)
[N-Triples]
Jan Grant, Dave Beckett, editors. RDF Test Cases (Section 3. N-Triples). World Wide Web Consortium, 2004. (See http://www.w3.org/TR/rdf-testcases/#ntriples)
[RDF]
Resource Description Framework (RDF). (See http://www.w3.org/RDF/)
[RDF Concepts]
Graham Klyne, Jeremy Carroll, editors. Resource Description Framework (RDF): Concepts and Abstract Syntax. World Wide Web Consortium, 2004. (See http://www.w3.org/TR/rdf-concepts/)
[RDF Syntax]
Dave Beckett, editor. RDF/XML Syntax Specification (Revised). World Wide Web Consortium, 2004. (See http://www.w3.org/TR/rdf-syntax-grammar/)
[Semantic Web Activity]
Semantic Web Activity Statement. World Wide Web Consortium, 2001. (See http://www.w3.org/2001/sw/Activity)
[Sesame]
Sesame RDF Database. (See http://www.openrdf.org/)
[SKOS Core Guide]
Alistair Miles, Dan Brickley, editors. SKOS Core Guide. World Wide Web Consortium, 2004. (See http://www.w3.org/2004/02/skos/core/guide/)
[Turtle]
Dave Beckett. Turtle - Terse RDF Triple Language. ILRT, University of Bristol, 2004. (See http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/)
[UKAT]
The UK Archival Thesaurus. (See http://www.ukat.org.uk/)
[WEBARCH]
Ian Jacobs, Norman Walsh, editors. Architecture of the World Wide Web, Volume One. World Wide Web Consortium, 2004. (See http://www.w3.org/TR/webarch/)