SKOS Implementation: Astronomy Thesauri

Sean Bechhofer
Described in

The International Virtual Observatory Alliance (IVOA) is in the final
stages of developing a standard for vocabularies (really, thesauri) in
astronomy, which mandates the use of SKOS and of the W3C Best Practice
guidelines for publishing RDF vocabularies.  The standard is currently
a 'Proposed Recommendation', and its promotion to an IVOA
Recommendation is being postponed until after the SKOS standard
becomes a W3C Recommendation.

The IVOA is a consortium of national and international 'Virtual
Observatory' projects, which aims to develop standards for the  
discovery of, and access to, astronomical data held in a broad range  
of astronomical image and catalogue servers.

As part of this standard, the IVOA is publishing four existing  
vocabularies and thesauri in SKOS form.  These are intended partly as
examples of the technology, and partly as useful contributions to the VO's
processes.  Other groups within the IVOA expect to develop  
vocabularies in the near future, and will develop them in SKOS form.   
The IVOA Semantics group does not expect to maintain the SKOS versions  
of these vocabularies in future, leaving that to the groups or  
organisations which maintain the source vocabularies.

The four vocabularies published as part of the IVOA standard are:

*** The Astronomy & Astrophysics Keyword List (312 concepts)
This vocabulary is based on a set of keywords maintained jointly by  
the publishers of the journals Astronomy and Astrophysics (A&A),
Monthly Notices of the Royal Astronomical Society (MNRAS) and the  
Astrophysical Journal (ApJ), and updated on an annual basis.  The  
intended usage of the vocabulary is to tag articles with descriptive  
keywords to aid searching for articles on a particular topic.

*** The AVM Taxonomy (218 concepts)

This vocabulary is published by the IVOA to allow images to be tagged  
with keywords that are relevant for the public. It consists of a set  
of keywords organised into an enumerated hierarchical structure. Each  
term consists of a taxonomic number and a label. There are no  
definitions, scope notes, or cross references.

*** The UCD1+ Vocabulary (474 concepts)
The UCD standard is an officially sanctioned and managed vocabulary of  
the IVOA. The normative document is a simple text file containing  
entries consisting of tokens (for example em.IR), a short description,  
and usage information (“syntax codes” which permit UCD tokens to be  
concatenated). The form of the tokens implies a natural hierarchy:  
em.IR.8-15um is obviously a narrower term than em.IR, which in turn is  
narrower than em.  Note that the SKOS document containing the UCD1+  
vocabulary does NOT constitute the official version: the normative
document is still the text list. However, in the long term, the IVOA
may decide to make the SKOS version normative, since the SKOS version  
contains all of the information contained in the original text  
document but has the advantage of being in a standard format easily  
read and used by any application on the semantic web whilst still  
being usable in the current ways.
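
The way the token form implies a hierarchy can be sketched in a few
lines of Python. This is only an illustration, not the conversion
used to produce the SKOS document: the function names are
hypothetical, and it assumes that every dot in a token marks one
level of the hierarchy.

```python
def broader_token(token):
    """Return the parent of a dotted UCD-style token, or None at the top level."""
    # Assumption: every "." separates one hierarchy level from the next.
    head, sep, _ = token.rpartition(".")
    return head if sep else None


def broader_chain(token):
    """List all ancestors of a token, nearest first."""
    chain = []
    parent = broader_token(token)
    while parent is not None:
        chain.append(parent)
        parent = broader_token(parent)
    return chain


print(broader_chain("em.IR.8-15um"))  # ['em.IR', 'em']
```

Each (token, parent) pair found this way would correspond to one
broader/narrower link between two concepts.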

*** The 1993 IAU Thesaurus (2552 concepts)
The IAU Thesaurus consists of concepts with mostly capitalised labels  
and a rich set of thesaurus relationships (“BT” for “broader term”,
“NT” for “narrower term”, and “RT” for “related term”). The thesaurus  
also contains “U” (for “use”) and “UF” (“use for”) relationships. In a  
SKOS model of a vocabulary these are captured as alternative labels. A  
separate document contains translations of the vocabulary terms in  
five languages: English, French, German, Italian, and Spanish.  
Enumerable concepts are plural (for example “SPIRAL GALAXIES”) and
non-enumerable concepts are singular (for example “STABILITY”). Finally,
there are some usage hints like “combine with other”, which have been  
modelled as scope notes.
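
The mapping from thesaurus relationship codes to SKOS properties
described above can be sketched as follows. This is a minimal
illustration rather than the actual conversion code: the function
name and the tuple layout are hypothetical, and it assumes each
thesaurus entry can be read as "term CODE target".

```python
# BT/NT/RT map directly onto SKOS semantic relations.
SEMANTIC = {
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
}


def to_statement(term, code, target):
    """Translate one entry 'term CODE target' into (subject, property, object)."""
    if code in SEMANTIC:
        return (term, SEMANTIC[code], target)
    if code == "UF":
        # 'term' is the preferred label; 'target' is a non-preferred synonym.
        return (term, "skos:altLabel", target)
    if code == "U":
        # 'term' is non-preferred; it becomes an altLabel of the preferred 'target'.
        return (target, "skos:altLabel", term)
    raise ValueError(f"unknown relationship code: {code}")


# Hypothetical entry, for illustration only.
print(to_statement("SPIRAL GALAXIES", "BT", "GALAXIES"))
```

Note that “U” and “UF” describe the same fact from opposite sides, so
both end up as an altLabel on the concept that carries the preferred
label.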

In converting the IAU Thesaurus to SKOS, we have been as faithful as  
possible to the original format of the thesaurus. Thus, preferred  
labels have been kept in their uppercase format.

The IAU Thesaurus has been unmaintained since its initial production  
in 1993; it is therefore significantly out of date in places. This  
vocabulary is published for the sake of completeness, and to make the  
link between the evolving vocabulary work and any uses of the 1993  
vocabulary which come to light. We do not expect to make any future  
maintenance changes to this vocabulary.

Though there are some differences between the SKOS versions of these  
four vocabularies, they use a common subset of SKOS constructs:  
ConceptScheme, hasTopConcept, Concept, broader, narrower, related,  
inScheme, prefLabel, altLabel, notation, scopeNote.  There is no  
discussion of intervocabulary mapping in this work, but there are  
projects going on within the VO which expect to exploit such  
constructs in the near future.
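
To show how the common constructs fit together, here is a minimal
Python sketch that writes one concept as Turtle. It is an
illustration only: the example.org namespace, the "scheme"
identifier, the "Infrared" label and the function names are all
assumptions, and the published vocabularies may be laid out
differently.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
BASE = "http://example.org/vocab#"  # hypothetical namespace, not an IVOA URI


def quote(text):
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def concept_turtle(local_id, pref_label, notation=None,
                   broader=(), alt_labels=(), scope_note=None):
    """Serialise one concept using only constructs from the common subset."""
    parts = [
        f"<{BASE}{local_id}> a skos:Concept",
        f"    skos:inScheme <{BASE}scheme>",
        f"    skos:prefLabel {quote(pref_label)}@en",
    ]
    if notation is not None:
        parts.append(f"    skos:notation {quote(notation)}")
    parts += [f"    skos:broader <{BASE}{b}>" for b in broader]
    parts += [f"    skos:altLabel {quote(a)}@en" for a in alt_labels]
    if scope_note is not None:
        parts.append(f"    skos:scopeNote {quote(scope_note)}@en")
    # Predicates of one subject are joined with ";" and closed with ".".
    return " ;\n".join(parts) + " ."


print(SKOS_PREFIX)
print(concept_turtle("em.IR", "Infrared", notation="em.IR", broader=["em"]))
```

The ConceptScheme itself, with its hasTopConcept links, and the
inverse narrower statements would be written in the same way.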

