Making controlled vocabularies accessible as URI sets

From Data on the Web Best Practices

@@TODO: format according to template (or wherever the URL is transferred to)


This document is intended as a section of the Data on the Web Best Practices document.

Datasets often rely on a range of controlled vocabularies for the data they contain: data values are entered or captured in a controlled way, i.e., for certain positions in a data graph (or columns in a relational table), the value used should come from a limited set of pre-existing resources: for example object types, roles of a person, countries in a geographic area, or possible subjects for books.

Such controlled vocabularies, which usually come in the form of thesauri, classification systems, or 'flat' reference lists, are often referred to as knowledge organization systems or value vocabularies. From the perspective of their role in data design, they differ from the formalized vocabularies of classes and properties referred to as (RDFS/OWL) ontologies in the Semantic Web community. Yet they are important, as their usage ensures a level of control, standardization and interoperability in the data. They can also provide a way to easily create richer data. Say a dataset contains, in a single data statement, a reference to a controlled concept that is described in several languages. This single statement allows applications to localize their display or their search depending on the language of the user. 'Semantic' algorithms can also exploit the semantic relationships (for example hierarchical 'broader concept' links) typically present in a thesaurus.

This section presents best practices for making controlled vocabularies accessible as URI sets on the Web, technically available for anyone to re-use. It specifically focuses on SKOS, an ontology that has been created to publish knowledge organization systems as Linked Data in a simple way.

TODO: do we need to add something about code lists? To check with the group.

Scope / Issues

AI: I've tried to cluster the items a bit, to see whether I understand the scope better this way...

  • Is there already an existing vocabulary that is suitable? Identifying related terms, concepts, categories and properties in other vocabularies

  • Persistence of URI identifiers - see also URI Design and Management for Persistence
  • Developer-friendliness of URIs - through the use of human-readable names rather than cryptic code values familiar only to insiders
    • (e.g. vs
  • Creating URIs that take into account options for de-referencing data about resources - when to use hash (URI fragments) vs slash. Cf Best Practice Recipes for Publishing RDF Vocabularies AI: the sentence 'when to use hash (URI fragments) vs slash' makes one think it is about creating URIs, not de-referencing them after they've been created.

  • Hierarchical relationships - use of properties of the SKOS data model (skos:broader , skos:narrower etc.) to represent and navigate through the hierarchy
  • Non-hierarchical relationships - use of skos:related
  • Expressing constrained values from enumerated lists (e.g. days of the week) e.g. using OWL Enumerated Classes and owl:oneOf predicates AI: not sure I understand this: for KOSs I would rather use the notion of ConceptScheme to express the notion of a set of classes/concepts/etc.
  • Cross-referencing to concepts in other vocabularies - indicating various degrees of equivalence using SKOS mapping terms, rdfs:seeAlso, owl:sameAs vs skos:exactMatch AI: I've separated this from the remark on defining own properties and vocabularies, which sounded more about classes in ontologies (remember: in general skos:exactMatch is for Concepts, owl:sameAs for instances and owl:equivalentClass for classes)
  • Provision of multi-lingual definitions for vocabulary terms
    • Use of skos:hiddenLabel to support discovery of labels for terms whose labels contain accented characters AI: no, hiddenLabel is for variations of labels that a vocabulary creator would prefer not to have displayed, like common typos. Accented characters can be perfectly worthy of display, in many languages. Anyway, as the issue is fairly detailed and is repeated later, I would suggest treating it only there.

  • Sometimes we are motivated to define our own vocabularies or properties if those in other vocabularies are not sufficiently precise - e.g. values are free-form text strings, rather than quantitative values (value + unit of measure) or if no reference unit is specified AI: are you talking about properties or statements or whole vocabularies? The 'or' makes this really hard to get.
    • Need to be able to indicate that this property is related to a property defined elsewhere (even though the expected data type for the range is different (e.g. more structured) and the definition might be more precise (e.g. relate to a specified reference unit)). AI: is this related to specializing properties, via rdfs:subPropertyOf statements? Is this still related to the notion of KOS? It seems here that we're in the realm of ontologies again.
  • Limiting the scope of attribute values within the context of a particular category AI: this seems to be about ontologies as well
    • (e.g. an attribute/property/predicate such as ex:typeOfMaterial might have a large range of values including stone, concrete, iron, as well as cotton, silk, leather, linen - but we only want to permit/consider a subset of those within the context of a particular category - such as ex:Clothing ) AI: in fact this block of items makes me think of Dublin Core's notion of application profiles, which is at the level of ontologies, not KOSs

SKOS - Simple Knowledge Organization System

The basic structure and content of controlled vocabularies can be expressed using the Simple Knowledge Organization System (SKOS) vocabulary.

  • Concepts are identified with URIs and asserted to be of rdf:type skos:Concept (e.g. ex:animals rdf:type skos:Concept. )

SKOS Labels

  • A concept can have at most one preferred lexical label (skos:prefLabel) per natural language tag (e.g. ex:animals skos:prefLabel "animals"@en; skos:prefLabel "animaux"@fr ; skos:prefLabel "Tiere"@de )
  • A concept can have any number of alternative lexical labels (which very often are synonyms of the preferred label), which are expressed using skos:altLabel (e.g. ex:animals skos:altLabel "creatures"@en . )
  • Abbreviations, acronyms and near-synonyms of a concept's preferred label can also be expressed using skos:altLabel
  • Hidden labels can also be defined for a concept and expressed using skos:hiddenLabel. Hidden labels are not intended to be visible but they might be used for indexing and search operations, to match against misspelled variations of the other labels. Hidden labels can be particularly useful when the preferred label or alternative labels contain accented characters that are not found in the regular ASCII character set (e.g. ex:animals rdf:type skos:Concept; skos:prefLabel "animaux"@fr; skos:altLabel "bêtes"@fr; skos:hiddenLabel "betes"@fr.) AI: again I don't understand. Accented characters are perfectly valid. And I guess nowadays most data will come in UTF-8.
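
The labelling rules above can be illustrated with a minimal Python sketch. It is not tied to any RDF library: the in-memory model (a list of property/text/language tuples per concept) and the function names are assumptions made for illustration only. It checks the 'at most one preferred label per language tag' rule, picks a display label for a user's language, and matches a search string against all three kinds of labels.

```python
# Hypothetical in-memory model: each label is a (property, text, language) tuple.
LABELS = {
    "ex:animals": [
        ("skos:prefLabel", "animals", "en"),
        ("skos:prefLabel", "animaux", "fr"),
        ("skos:altLabel", "creatures", "en"),
        ("skos:altLabel", "bêtes", "fr"),
        ("skos:hiddenLabel", "betes", "fr"),
    ],
}


def duplicate_pref_label_languages(labels):
    """Return the language tags that carry more than one skos:prefLabel."""
    seen, duplicates = set(), set()
    for prop, _text, lang in labels:
        if prop != "skos:prefLabel":
            continue
        if lang in seen:
            duplicates.add(lang)
        seen.add(lang)
    return duplicates


def display_label(labels, lang, fallback="en"):
    """Pick the preferred label for the user's language, else the fallback one."""
    pref = {tag: text for prop, text, tag in labels if prop == "skos:prefLabel"}
    return pref.get(lang, pref.get(fallback))


def search(all_labels, query):
    """Match a query against preferred, alternative and hidden labels alike."""
    needle = query.casefold()
    return sorted(
        concept
        for concept, labels in all_labels.items()
        if any(text.casefold() == needle for _prop, text, _lang in labels)
    )
```

With this data, a search for the unaccented string "betes" still finds ex:animals through its hidden label, while only the preferred label "animaux" would be shown to a French-speaking user.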

Semantic Relationships

Hierarchical relationships between broader (more general) concepts and narrower (more specialized) concepts

  • The SKOS properties skos:broader and skos:narrower can represent hierarchical links between a concept and a more general or more specific concept. (e.g. ex:cats skos:broader ex:mammals . ex:mammals skos:broader ex:animals . )
  • skos:broader should be read as 'links to a broader concept' - NOT 'is broader than'. Likewise, skos:narrower should be read as 'links to a narrower concept' - NOT 'is narrower than'.
  • The predicates skos:broader and skos:narrower are mutual inverses but they are not transitive. SKOS defines predicates skos:broaderTransitive and skos:narrowerTransitive that are transitive super-properties of skos:broader and skos:narrower respectively.
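
The difference between skos:broader and skos:broaderTransitive can be sketched as follows. This is an illustrative Python fragment, not a reference implementation: links are modelled as plain (subject, object) pairs and the function names are assumptions. The asserted links stay as they are; the transitive links are computed separately as a closure.

```python
# Asserted skos:broader links, using the cats/mammals/animals example.
BROADER = {
    ("ex:cats", "ex:mammals"),
    ("ex:mammals", "ex:animals"),
}


def narrower_links(broader):
    """skos:narrower is the inverse of skos:broader: swap subject and object."""
    return {(general, specific) for specific, general in broader}


def broader_transitive(broader):
    """Compute skos:broaderTransitive as the transitive closure of skos:broader."""
    closure = set(broader)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```

Here the pair (ex:cats, ex:animals) is absent from the asserted skos:broader links but present in the computed skos:broaderTransitive links, which is the distinction the two properties are meant to keep.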

Non-hierarchical associative relationships

  • The SKOS property skos:related can be used to express non-hierarchical associations, e.g. between an event and a category of entities that participate in it - or between two categories where neither is more general or more specific than the other. skos:related can also be used to represent the relationships between the whole thing and its parts, e.g. between a bicycle and its wheels.
  • Note that skos:related is not transitive and the transitive closure of skos:broader or skos:narrower must be disjoint from skos:related, i.e. if conceptual resources A and B are related via a chain of one or more skos:broader or skos:narrower relationships, there must not be a skos:related link between A and B.
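
A vocabulary publisher may want to check the disjointness condition above before publishing. The following Python sketch shows one possible check, under the assumption that links are held as (subject, object) pairs; the data and function names are hypothetical, and one skos:related link is deliberately invalid.

```python
def ancestors(concept, broader):
    """All concepts reachable from `concept` via one or more skos:broader links."""
    found, stack = set(), [concept]
    while stack:
        current = stack.pop()
        for specific, general in broader:
            if specific == current and general not in found:
                found.add(general)
                stack.append(general)
    return found


def conflicting_related_links(related, broader):
    """Return skos:related pairs that are also connected hierarchically."""
    conflicts = []
    for a, b in sorted(related):
        if b in ancestors(a, broader) or a in ancestors(b, broader):
            conflicts.append((a, b))
    return conflicts


# Hypothetical data: the bicycle example plus one deliberately invalid link.
BROADER = {("ex:cats", "ex:mammals"), ("ex:mammals", "ex:animals")}
RELATED = {("ex:bicycles", "ex:wheels"), ("ex:cats", "ex:animals")}
```

Calling conflicting_related_links(RELATED, BROADER) reports the ex:cats / ex:animals pair, because those two concepts are already linked through a chain of skos:broader statements.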

Documentary Notes

  • SKOS provides a skos:note predicate for general documentation, as well as specializations of skos:note such as skos:scopeNote, skos:definition, skos:example and skos:historyNote for specific purposes, detailed below:
  • skos:scopeNote is used to provide some (possibly incomplete) information about the intended meaning of a concept, (e.g. ex:microwaveFrequencies skos:scopeNote "Used for frequencies between 1GHz to 300GHz"@en .)
  • skos:definition is used to provide a complete explanation of the intended meaning of a concept, (e.g. ex:documentation skos:definition "the process of storing and retrieving information in all fields of knowledge"@en .)
  • skos:example is used to provide an example of the use of a concept, (e.g. ex:organizationsOfScienceAndCulture skos:example "academies of science, general museums, world fairs"@en .)
  • skos:historyNote is used to describe significant changes to the meaning or form of a concept over time, (e.g. ex:childAbuse skos:historyNote "estab. 1975; heading was: Cruelty to children (1952-1975)"@en .)
  • skos:editorialNote can be used for reminders of editorial work still to be done or to warn about anticipated future editorial changes.
  • skos:changeNote is used to document fine-grained changes to a concept for administrative / maintenance reasons.

Concept schemes

  • Concepts can be organized in concept schemes, which represent controlled vocabularies, thesauri, classification schemes or taxonomies. A concept scheme is declared to be of rdf:type skos:ConceptScheme.
  • The name and creator of a concept scheme can be expressed using the Dublin Core properties dct:title and dct:creator. Other properties from Dublin Core or other ontologies can be used to further characterize concept schemes.
  • Concepts are linked to the concept scheme via the SKOS property skos:inScheme (e.g. ex:animals skos:inScheme ex:animalThesaurus . )
  • Schemes can have one or more most general concepts and these are expressed using the skos:hasTopConcept relationship, to provide one or more entry points into the skos:narrower / skos:broader hierarchical relationships among concepts. (e.g. ex:animalThesaurus skos:hasTopConcept ex:mammals; skos:hasTopConcept ex:fish; skos:hasTopConcept ex:reptiles . )
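
To show how top concepts act as entry points, here is a small Python sketch that walks a scheme from its top concepts down the skos:narrower links. The dictionaries and the outline function are assumptions for illustration; ex:cats and ex:dogs as narrower concepts of ex:mammals are made up for the example.

```python
# Hypothetical scheme data, following the ex:animalThesaurus example.
TOP_CONCEPTS = {"ex:animalThesaurus": ["ex:mammals", "ex:fish", "ex:reptiles"]}
NARROWER = {"ex:mammals": ["ex:cats", "ex:dogs"]}  # made-up narrower concepts


def outline(scheme, top_concepts, narrower):
    """List (depth, concept) pairs, walking skos:narrower from the top concepts."""
    rows = []
    visited = set()

    def visit(concept, depth):
        if concept in visited:  # guards against cycles; each concept is listed once
            return
        visited.add(concept)
        rows.append((depth, concept))
        for child in sorted(narrower.get(concept, [])):
            visit(child, depth + 1)

    for top in sorted(top_concepts.get(scheme, [])):
        visit(top, 0)
    return rows
```

An application could use such an outline to render a browsable tree, starting from the concepts declared with skos:hasTopConcept.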

Using SKOS to indicate mappings across concept schemes

AI: the examples in this section have a different format than the ones above. This will need homogenization when the text is ported to the final document.

SKOS provides a number of properties that indicate semantic similarities between concepts from different vocabularies (concept schemes).

  • skos:exactMatch indicates that the two concepts have equivalent meaning. skos:exactMatch is a transitive property.
  e.g. ex1:animal  skos:exactMatch  ex2:animals .
  • skos:closeMatch indicates that the two concepts are sufficiently similar that they can be used interchangeably in applications that consider the two concept schemes to which they belong. However, skos:closeMatch is not a transitive property, which reflects a more 'local' context for the validity of the similarity.
  • skos:broadMatch relates one concept in one concept scheme to a broader concept in another concept scheme, in an analogous way to the use of skos:broader within a single concept scheme. Note that skos:broadMatch is a sub-property of skos:broader, so if concept A is related via skos:broadMatch to concept B, then concept A is also related via skos:broader to concept B (typically this statement is materialized after applying a formal inference process).
  e.g.  ex1:platypus  skos:broadMatch  ex2:eggLayingAnimals .
  • skos:narrowMatch relates one concept in one concept scheme to a narrower concept in another concept scheme, in an analogous way to the use of skos:narrower within a single concept scheme. Note that skos:narrowMatch is a sub-property of skos:narrower, so if concept A is related via skos:narrowMatch to concept B, then concept A is also related via skos:narrower to concept B (typically this statement is materialized after applying a formal inference process).
  e.g.  ex1:mammals  skos:narrowMatch  ex2:cats .
  • skos:relatedMatch indicates that the two concepts across the two different concept schemes are related but in a non-hierarchical manner, in an analogous way to the use of skos:related within a single concept scheme. Note that skos:relatedMatch is a sub-property of skos:related, so if concept A is related via skos:relatedMatch to concept B, then concept A is also related via skos:related to concept B (typically this statement is materialized after applying a formal inference process).
  e.g.  ex1:platypus  skos:relatedMatch  ex2:eggs .
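
The 'materialized after applying a formal inference process' remark can be made concrete with a short Python sketch. It is a simplified illustration, not a SKOS reasoner: triples are plain tuples, only the three sub-property axioms and the transitivity of skos:exactMatch are applied (its symmetry is left out for brevity), and the ex3: concepts and the closeMatch chain are hypothetical.

```python
# Sub-property axioms from the SKOS data model, as described above.
SUPER_PROPERTY = {
    "skos:broadMatch": "skos:broader",
    "skos:narrowMatch": "skos:narrower",
    "skos:relatedMatch": "skos:related",
}


def materialize(triples):
    """Add triples entailed by the sub-property axioms and exactMatch transitivity."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SUPER_PROPERTY:
            inferred.add((s, SUPER_PROPERTY[p], o))
    changed = True
    while changed:
        changed = False
        exact = [(s, o) for s, p, o in inferred if p == "skos:exactMatch"]
        for a, b in exact:
            for c, d in exact:
                new = (a, "skos:exactMatch", d)
                if b == c and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


# Hypothetical mappings; the ex3: concepts are made up for the example.
MAPPINGS = {
    ("ex1:platypus", "skos:broadMatch", "ex2:eggLayingAnimals"),
    ("ex1:animal", "skos:exactMatch", "ex2:animals"),
    ("ex2:animals", "skos:exactMatch", "ex3:fauna"),
    ("ex1:cats", "skos:closeMatch", "ex2:felines"),
    ("ex2:felines", "skos:closeMatch", "ex3:bigCats"),
}
```

Applied to MAPPINGS, the function adds a skos:broader link for the platypus and chains the two skos:exactMatch statements, but it does not chain the skos:closeMatch statements, since that property is not transitive.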

Note that SKOS provides skos:exactMatch to map concepts with equivalent meaning and intentionally does not use owl:sameAs from the OWL ontology language. When two resources are linked via owl:sameAs, they are considered to be the same resource and triples involving those resources are merged. Linking two concepts via skos:exactMatch does not mean that they are the same resource nor does it mean that triples involving those concepts can be merged. This avoids introducing inconsistencies across equivalent concepts from different concept schemes; for example the equivalent concepts might still have different preferred labels (skos:prefLabel) in their respective schemes, and a concept cannot have more than one preferred label per language tag. AI: this paragraph is perfectly made. I guess some of the issues I've flagged in the scope section predate the time this was written!


Notations

AI: I feel that the size of this section gives notations a big focus, perhaps too much wrt. what should be done with them. Notations clearly date from times where knowledge representation had to rely on cruder solutions than the ones we have now. But perhaps I'm biased, and the section is not too long if one considers that the previous section on labels has a much more concise syntax for its examples.

Some controlled vocabularies and classification schemes use alphanumeric codes to provide identifiers for concepts in a way that they cannot be mistaken for natural-language labels. An example is the Universal Decimal Classification, with examples such as:

 512  Algebra
 512.6  Special branches of algebra

Such notations can be represented using the skos:notation property. For example:

 ex:udc512  skos:prefLabel  "Algebra"@en;
 skos:notation  "512"^^ex:UDCNotation .

Note that the object of a skos:notation statement should, by convention, be an RDF typed literal, i.e. a literal with an explicit datatype that indicates the syntax encoding scheme of the notation. If this proves difficult for a vocabulary provider to handle, the value of the notation may instead be given using the skos:prefLabel property, without a language tag. For example:

 ex:udc512  skos:prefLabel  "Algebra"@en;
 skos:prefLabel  "512" .
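
One reason to keep the datatype on a notation is that the same code string may occur in several classification systems. The Python sketch below illustrates this under stated assumptions: a notation is kept as a (lexical form, datatype) pair mirroring the typed literal, and the record layout, the ex:udc512.6 URI and ex:OtherNotation are hypothetical.

```python
# Hypothetical records: a notation is kept as a (lexical form, datatype) pair,
# mirroring a typed literal such as "512"^^ex:UDCNotation.
CONCEPTS = {
    "ex:udc512": {
        "prefLabel": {"en": "Algebra"},
        "notation": ("512", "ex:UDCNotation"),
    },
    "ex:udc512.6": {  # made-up URI for the second UDC example
        "prefLabel": {"en": "Special branches of algebra"},
        "notation": ("512.6", "ex:UDCNotation"),
    },
}


def index_by_notation(concepts):
    """Build a lookup from (lexical form, datatype) to concept URI."""
    index = {}
    for uri, record in concepts.items():
        notation = record.get("notation")
        if notation is not None:
            index[notation] = uri
    return index


INDEX = index_by_notation(CONCEPTS)
```

Looking up ("512", "ex:UDCNotation") in INDEX returns ex:udc512, whereas the same string with a different datatype, for example ("512", "ex:OtherNotation"), is not found.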

Editors and Contributors

  • Antoine Isaac
  • Mark Harrison

Links and References