W3C

Quick Guide to Publishing a Thesaurus on the Semantic Web

Editor's Draft 17 November 2004

This version:
http://www.w3.org/2004/03/thes-tf/primer/2004-11-17
Latest version:
http://www.w3.org/2004/03/thes-tf/primer/
Previous version:
No previous version.
Editors:
Alistair Miles, CCLRC
Dan Brickley, W3C
Guus Schreiber, Free University Amsterdam

Abstract

This document is a quick guide to creating an RDF description of a thesaurus, and of thesaurus metadata, and publishing these on the web.


Status of this Document

This section describes the status of this document at the time of its publication.

This document is an Editor's Draft for review by the Semantic Web Best Practices and Deployment Working Group (hereafter 'the Working Group') and the participants of the public-esw-thes@w3.org mailing list and is subject to change without notice. This document has no formal standing within W3C. Please consult the Working Group's home page and the W3C technical reports index for information about the latest publications by this group. This document may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document is published by the Semantic Web Best Practices and Deployment Working Group, part of the W3C Semantic Web Activity. The Working Group intends the Quick Guide to Publishing a Thesaurus on the Semantic Web to become a W3C Working Group Note.

We encourage public comments. Please send comments to public-esw-thes@w3.org [archive] and start the subject line of the message with "comment:".

Publication as an Editor's Draft does not imply endorsement by the W3C Membership.


Creating an RDF Description of Concepts in a Thesaurus

The first step in publishing a thesaurus on the semantic web is to create an RDF description of the concepts in the thesaurus.

Here we use the UK Archival Thesaurus [UKAT@@TODOREF] as an example.

Below is an extract from the UKAT:

A concept from the UKAT

Term: Economic cooperation

Used For:
    Economic co-operation

Broader terms:
    Economic policy

Narrower terms:
    Economic integration
    European economic cooperation
    European industrial cooperation
    Industrial cooperation

Related terms:
    Interdependence

Scope Note:
Includes cooperative measures in banking, trade, industry etc., between and among countries.

This extract contains a description of a single concept from the UKAT.

Below is an RDF description of this UKAT concept, encoded using the RDF/XML syntax [RDFXML@@TODOREF]:

RDF description of a UKAT concept, RDF/XML syntax

<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xml:base="http://www.ukat.org.uk/thesaurus/concept/">

    <skos:Concept rdf:about="1750">
        <skos:prefLabel>Economic cooperation</skos:prefLabel>
        <skos:altLabel>Economic co-operation</skos:altLabel>
        <skos:scopeNote>Includes cooperative measures in banking, trade, industry etc., between and among countries.</skos:scopeNote>
        <skos:inScheme rdf:resource="http://www.ukat.org.uk/thesaurus"/>
        <skos:broader rdf:resource="4382"/>
        <skos:narrower rdf:resource="2108"/>
        <skos:narrower rdf:resource="9505"/>
        <skos:narrower rdf:resource="15053"/>
        <skos:narrower rdf:resource="18987"/>
        <skos:related rdf:resource="3250"/>
    </skos:Concept>

</rdf:RDF>

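Note that in an RDF/XML document, the xml:base attribute can be used to define a base URI against which all relative URIs in the document are resolved. In the example above, rdf:about="1750" therefore identifies http://www.ukat.org.uk/thesaurus/concept/1750. This reduces the size of the document and makes it easier to read.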
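The resolution of relative URIs against a base can be sketched in a few lines of Python. This is only an illustration, assuming the standard library function urllib.parse.urljoin as a stand-in for what an RDF/XML parser does internally; the helper name resolve is hypothetical and not part of any RDF toolkit.

```python
from urllib.parse import urljoin

# Base URI taken from the xml:base attribute in the RDF/XML example.
BASE = "http://www.ukat.org.uk/thesaurus/concept/"


def resolve(reference, base=BASE):
    """Resolve a URI reference against a base URI (hypothetical helper)."""
    return urljoin(base, reference)


# Relative references, as used in rdf:about and rdf:resource.
concept_uri = resolve("1750")
broader_uri = resolve("4382")

# An absolute URI is left unchanged by resolution.
scheme_uri = resolve("http://www.ukat.org.uk/thesaurus")

print(concept_uri)  # http://www.ukat.org.uk/thesaurus/concept/1750
print(broader_uri)  # http://www.ukat.org.uk/thesaurus/concept/4382
print(scheme_uri)   # http://www.ukat.org.uk/thesaurus
```

Note that the trailing slash on the base matters: resolving "1750" against a base ending in ".../concept" (no slash) would replace the last path segment rather than append to it.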

There are several alternative syntaxes for RDF. Below is the same RDF description, encoded using the Notation3/Turtle syntax [TURTLE@@TODOREF]:

RDF description of a UKAT concept, Turtle syntax

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix ukat: <http://www.ukat.org.uk/thesaurus/concept/>.

ukat:1750
  a  skos:Concept;
  skos:prefLabel 'Economic cooperation';
  skos:altLabel 'Economic co-operation';
  skos:scopeNote 'Includes cooperative measures in banking, trade, industry etc., between and among countries.';
  skos:inScheme <http://www.ukat.org.uk/thesaurus>;
  skos:broader ukat:4382;
  skos:narrower ukat:2108;
  skos:narrower ukat:9505;
  skos:narrower ukat:15053;
  skos:narrower ukat:18987;
  skos:related ukat:3250.

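In the Turtle syntax, prefixes (such as ukat: in the example above) can be defined so that the same base URI does not have to be repeated many times within a document. Each prefixed name stands for the prefix URI followed by the local part, so ukat:1750 is shorthand for <http://www.ukat.org.uk/thesaurus/concept/1750>.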
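Prefix expansion amounts to simple string concatenation, as the following Python sketch shows. It is a simplified illustration rather than a Turtle parser: the PREFIXES table mirrors two of the @prefix declarations above, and the function name expand is an assumption made for this example.

```python
# Prefix table mirroring two @prefix declarations from the Turtle example.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ukat": "http://www.ukat.org.uk/thesaurus/concept/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'ukat:1750' into a full URI.

    Hypothetical helper: it handles only the plain 'prefix:local' form
    and ignores escapes and other details of the Turtle grammar.
    """
    prefix, separator, local = name.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return prefixes[prefix] + local


print(expand("ukat:1750"))     # http://www.ukat.org.uk/thesaurus/concept/1750
print(expand("skos:broader"))  # http://www.w3.org/2004/02/skos/core#broader
```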


Assigning URIs to a Thesaurus and its Concepts

To support the use of your thesaurus within semantic web applications, we recommend that you assign URIs both to the thesaurus itself and to each of the concepts in the thesaurus.

For example, the URI:

http://www.ukat.org.uk/thesaurus

... identifies the UK Archival Thesaurus, and the URI:

http://www.ukat.org.uk/thesaurus/concept/1750

... identifies the concept from the UKAT for which the term 'Economic cooperation' is the descriptor.

There are a number of options and issues related to choosing URIs for thesaurus concepts. For a discussion of these issues, see the VM TF note [VMNOTE@@TODOREF].


Creating an RDF Description of Thesaurus Metadata

It is also a good idea to publish metadata about the thesaurus itself as an RDF description.

Below is an RDF description of the UKAT metadata:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" 
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">

    <skos:ConceptScheme rdf:about="http://www.ukat.org.uk/thesaurus">
        <dc:title>The UK Archival Thesaurus</dc:title>
        <dc:description>A subject thesaurus produced to support indexing in the UK archive sector.</dc:description>
        <dc:creator>UK Archival Thesaurus project</dc:creator>
        <dc:date>2004-08-22</dc:date>
        <dc:format>text</dc:format>
        <dc:language>en</dc:language>
        <dc:rights>All rights reserved. Data in the UK Archival Thesaurus may be freely used and copied, without prior permission, for educational and other non-commercial purposes. These purposes include (but are not limited to) the incorporation of UKAT data into indexes, thesauri and finding aids created by organisations and projects in the archive sector and the wider heritage sector, in the UK and elsewhere. Under no circumstances may copies of UKAT data be sold without prior written permission from the UKAT Project (support@ukat.org.uk).</dc:rights>
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/1"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/2"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/3"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/4"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/5"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/6"/>         
        <skos:hasTopConcept rdf:resource="http://www.ukat.org.uk/thesaurus/field/8"/>         
    </skos:ConceptScheme>

</rdf:RDF>

The skos:hasTopConcept properties must be included. They indicate the top-level concepts in the thesaurus (i.e. the concept at the top of each hierarchy).

In the example above, some properties from the Dublin Core vocabulary [DC@@TODOREF] have been used to describe thesaurus metadata. Note that properties from any RDF vocabulary (such as DC [DC@@TODOREF], DC Terms [DCTERMS@@TODOREF] and FOAF [FOAF@@TODOREF]) may be used in an RDF description.
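If the thesaurus metadata is held in a database or spreadsheet, a description of this kind can be generated programmatically. The sketch below is one possible approach, assuming Python's standard xml.etree.ElementTree module; the function name build_scheme and its parameters are hypothetical, and only a subset of the properties shown above is emitted.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to serialise with the conventional prefixes.
for prefix, uri in (("rdf", RDF), ("skos", SKOS), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def build_scheme(scheme_uri, metadata, top_concepts):
    """Build an rdf:RDF element describing one skos:ConceptScheme.

    Hypothetical helper: 'metadata' maps Dublin Core element names
    (e.g. 'title') to literal values; 'top_concepts' lists concept URIs.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    scheme = ET.SubElement(
        root, f"{{{SKOS}}}ConceptScheme", {f"{{{RDF}}}about": scheme_uri}
    )
    for name, value in metadata.items():
        ET.SubElement(scheme, f"{{{DC}}}{name}").text = value
    for uri in top_concepts:
        ET.SubElement(
            scheme, f"{{{SKOS}}}hasTopConcept", {f"{{{RDF}}}resource": uri}
        )
    return root


root = build_scheme(
    "http://www.ukat.org.uk/thesaurus",
    {"title": "The UK Archival Thesaurus", "language": "en"},
    [
        "http://www.ukat.org.uk/thesaurus/field/1",
        "http://www.ukat.org.uk/thesaurus/field/2",
    ],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Building the tree with namespace-qualified names, rather than concatenating strings, leaves the escaping of characters such as & and < in literal values to the serialiser.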


Publishing RDF Data on the Web

Once you have an RDF description of your thesaurus, and of your thesaurus metadata, publish these on the web.

Probably the simplest way to publish an RDF description on the web is to encode it as a document using a syntax such as RDF/XML or Turtle (as in the examples above), and to make this document available as a file on an HTTP or FTP server.
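When serving such files over HTTP, it helps if the server labels them with an appropriate media type. The following sketch, which assumes Python's standard http.server module and is intended for local testing rather than production use, maps two file extensions to the registered media types application/rdf+xml and text/turtle. The class name RDFRequestHandler, the function make_server, and the choice of the .rdf and .ttl extensions are assumptions made for this example.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


class RDFRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels RDF files with RDF media types."""

    # Extend the default extension-to-media-type table.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".rdf": "application/rdf+xml",
        ".ttl": "text/turtle",
    }


def make_server(directory, port=8000):
    """Create (but do not start) a server for the files in 'directory'."""
    handler = partial(RDFRequestHandler, directory=directory)
    return HTTPServer(("", port), handler)
```

Calling make_server("public").serve_forever() would then serve the contents of a (hypothetical) directory named public on port 8000.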

Note also that there are dedicated RDF servers (e.g. Joseki [JOSEKI@@TODOREF]), through which RDF data may be exposed to the web.


About SKOS Core

The RDF descriptions above use the SKOS Core schema [SKOSHOME], an open standard for publishing thesauri on the semantic web.

SKOS Core has many features not described in this document. See the SKOS Core Guide [SKOSGUIDE] and the SKOS Core Vocabulary Specification [SKOSSPEC] for a full description of these features and guidance on using them.


Further Reading

@@TODOREFS