SkosDev/SkosCore/CollectionsAndArrays

This is a description of a requirement for support for 'collections' and 'arrays' of concepts in SKOS-Core, with a preliminary discussion of possible solutions, and issues pertaining to those solutions.

AJM note 8 Feb 2005 >> N.B. The text below has not been updated since Sep 2004. See the most recent proposal in relation to this requirement, which is not described on this wiki page. This proposal has been implemented, and remains at the 'unstable' status in the SKOS Core Vocabulary (see also the SKOS Core Guide). See also the discussion on the public-esw-thes@w3.org mailing list in September and October 2004.

Requirement

Many thesauri group small sets of concepts under what's called a 'node label' or 'guide term', for example this from the AAT ...

chairs
   <chairs by form>
      armchairs
      ax chairs
      backstools
      Barcelona chairs
      barrel chairs
      ...

... or this from the English Heritage thesaurus of historic aircraft ...

AIRCRAFT
     AIRCRAFT <BY FUNCTION>
          TEST AIRCRAFT
          FIGHTER
          BOMBER
          TRAINER
          TRANSPORTER
          RECONNAISSANCE
          TARGET
          ARMY COOPERATION
          TUG

This type of collection of concepts is commonly called an 'array', where the array label identifies some 'characteristic of division' for the contents of that array.

The consensus seems to be that the node label (i.e. 'chairs by form' or 'aircraft by function') should not be modelled as a label for a concept in its own right, but rather as a label for a collection of concepts.

The matter is complicated further because the ordering of concepts is meaningful in some arrays but not in others. The RDF description of an 'array' must therefore provide a way to distinguish between these two cases, primarily so that applications handling the data know whether they must preserve the original ordering, or whether they are free to reorder the contents of an array by some criterion, for example alphabetically.

SKOS-Core requires some framework for supporting arrays of concepts as described here.

Discussion

To represent the essential features of an 'array' in RDF there are two main options: 'Collections' and 'Containers'.

The examples below reference the following example concepts ...

<rdf:RDF xml:base="http://example.org/">

  <skos:Concept rdf:about="A">
    <skos:prefLabel>armchairs</skos:prefLabel>
  </skos:Concept>

  <skos:Concept rdf:about="B">
    <skos:prefLabel>ax chairs</skos:prefLabel>
  </skos:Concept>

  <skos:Concept rdf:about="C">
    <skos:prefLabel>backstools</skos:prefLabel>
  </skos:Concept>  

</rdf:RDF>

Option A: Collections

A possible representation of an 'array' using RDF collections is below (assuming standard namespace prefixes) ...

<rdf:RDF xml:base="http://example.org/">

  <skos:Collection>
    <rdfs:label>chairs by form</rdfs:label>
    <skos:members rdf:parseType="Collection">
      <skos:Concept rdf:about="A"/>
      <skos:Concept rdf:about="B"/>
      <skos:Concept rdf:about="C"/>
    </skos:members>
  </skos:Collection>

</rdf:RDF>
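
For readers less familiar with the syntax: rdf:parseType="Collection" is shorthand for a linked list of blank nodes joined by rdf:first and rdf:rest and closed by rdf:nil. The Python sketch below spells out that expansion, with plain tuples standing in for triples. The function name, the blank node labels and the short strings 'skos:members', 'A', 'B', 'C' are illustrative assumptions, and the rdf:type statements for the member concepts are left out for brevity.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand_collection(subject, prop, members):
    """Return the triples that an rdf:parseType="Collection" element stands for."""
    if not members:
        # An empty collection is simply rdf:nil.
        return [(subject, prop, RDF + "nil")]
    # One (hypothetically named) blank node per list cell.
    cells = [f"_:l{i}" for i in range(len(members))]
    triples = [(subject, prop, cells[0])]
    for i, member in enumerate(members):
        triples.append((cells[i], RDF + "first", member))
        rest = cells[i + 1] if i + 1 < len(members) else RDF + "nil"
        triples.append((cells[i], RDF + "rest", rest))
    return triples


triples = expand_collection("_:array", "skos:members", ["A", "B", "C"])
for triple in triples:
    print(triple)
```

Three members therefore cost seven statements, and none of them links the array directly to a member. That is the root of the query problems discussed further down.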

Option B: Containers

A possible representation of an 'array' using RDF containers is below (assuming standard namespace prefixes) ...

<rdf:RDF xml:base="http://example.org/">

  <rdf:Seq>
    <rdfs:label>chairs by form</rdfs:label>
    <rdf:li rdf:resource="A"/>
    <rdf:li rdf:resource="B"/>
    <rdf:li rdf:resource="C"/>
  </rdf:Seq>

</rdf:RDF>
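
By way of comparison, each rdf:li element is shorthand for a numbered container membership property (rdf:_1, rdf:_2, ...) assigned in document order. A minimal Python sketch of that expansion follows; the function name and the short strings used for the container node and its members are assumptions made for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand_container(subject, kind, members):
    """Return the triples for an rdf:Seq or rdf:Bag written with rdf:li elements."""
    triples = [(subject, RDF + "type", RDF + kind)]
    for n, member in enumerate(members, start=1):
        # rdf:li becomes rdf:_1, rdf:_2, ... in order of appearance.
        triples.append((subject, f"{RDF}_{n}", member))
    return triples


seq_triples = expand_container("_:array", "Seq", ["A", "B", "C"])
for triple in seq_triples:
    print(triple)
```

Here every member hangs directly off the container node, at the cost of a different property for each position.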

Pros and Cons

Collections tend to be preferred over containers for several reasons (see e.g. this email and follow up on same thread).

(See also David Menendez's email to public-esw-thes@w3.org earlier this year)

Here follow some scenarios that might help evaluate which of these options is the better starting point ...

AJM> RDF gurus if I have got any of this wrong, please correct me :)

Scenario: given an array, obtain its members using an RDF query language (e.g. RDQL)

RDF collections are an absolute pain to query. If the length of the list is not known, then one query has to be issued for each list member until rdf:nil is reached. If each query incurs network latency, there are obvious practical implications. An option to overcome this would be to express the length of the list in an additional statement, e.g. ...

<rdf:RDF xml:base="http://example.org/">

  <skos:Collection>
    <rdfs:label>chairs by form</rdfs:label>
    <skos:members rdf:parseType="Collection">
      <skos:Concept rdf:about="A"/>
      <skos:Concept rdf:about="B"/>
      <skos:Concept rdf:about="C"/>
    </skos:members>
    <skos:length rdf:datatype="http://www.w3.org/2001/XMLSchema#int">3</skos:length>
  </skos:Collection>

</rdf:RDF>

... so with the length known, the contents of the list can be obtained in a single RDF query. This might seem a bit silly, but it is an obvious pragmatic solution to a tricky problem.
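
The difference between the two situations can be sketched in Python. The first function walks a list cell by cell, which corresponds to one round of queries per member. The second builds the triple patterns that a single query would need once the length is known (for instance from the hypothetical skos:length). The toy lookup table, the variable naming scheme (?l1, ?m1, ...) and both function names are assumptions for illustration; the patterns are plain tuples, not RDQL syntax.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def walk_list(lookup, head):
    """Follow rdf:first/rdf:rest from `head` until rdf:nil, counting rounds."""
    members, rounds, node = [], 0, head
    while node != RDF + "nil":
        members.append(lookup(node, RDF + "first"))
        node = lookup(node, RDF + "rest")
        rounds += 1  # one round trip to the repository per list cell
    return members, rounds


def fixed_length_patterns(head_var, length):
    """Triple patterns binding every member of a list of known length at once."""
    patterns, node = [], head_var
    for i in range(1, length + 1):
        nxt = RDF + "nil" if i == length else f"?l{i}"
        patterns.append((node, RDF + "first", f"?m{i}"))
        patterns.append((node, RDF + "rest", nxt))
        node = nxt
    return patterns


# Toy stand-in for a repository: (subject, predicate) -> object.
cells = {
    ("_:l0", RDF + "first"): "A", ("_:l0", RDF + "rest"): "_:l1",
    ("_:l1", RDF + "first"): "B", ("_:l1", RDF + "rest"): "_:l2",
    ("_:l2", RDF + "first"): "C", ("_:l2", RDF + "rest"): RDF + "nil",
}
members, rounds = walk_list(lambda s, p: cells[(s, p)], "_:l0")
patterns = fixed_length_patterns("?l0", 3)
print(members, rounds, len(patterns))
```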

RDF containers are easier to query, provided that the RDF repository has some basic inferencing capabilities, because the container membership super-property rdfs:member can be used. However, without any inferencing, containers run into the same problem as collections in that the length must be known a priori in order for the members to be obtained in a single query.
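
The inference in question is small: every rdf:_n property is a sub-property of rdfs:member, so a repository that applies that rule adds one rdfs:member statement per member. A rough Python sketch of the rule, with tuples standing in for triples, is below. The function names and node names are assumptions; a real repository would do this as part of its RDFS entailment support.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member"
# Matches rdf:_1, rdf:_2, ... (container membership properties).
MEMBERSHIP = re.compile(re.escape(RDF) + r"_[1-9][0-9]*")


def infer_members(triples):
    """Add (s, rdfs:member, o) for every (s, rdf:_n, o)."""
    inferred = set(triples)
    for s, p, o in triples:
        if MEMBERSHIP.fullmatch(p):
            inferred.add((s, RDFS_MEMBER, o))
    return inferred


def members_of(triples, container):
    """Single-pattern lookup: (container, rdfs:member, ?m)."""
    return {o for s, p, o in triples if s == container and p == RDFS_MEMBER}


base = {
    ("_:array", RDF + "type", RDF + "Seq"),
    ("_:array", RDF + "_1", "A"),
    ("_:array", RDF + "_2", "B"),
    ("_:array", RDF + "_3", "C"),
}
print(members_of(base, "_:array"))                 # nothing without inference
print(members_of(infer_members(base), "_:array"))  # all three members
```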

Scenario: given a concept, obtain any arrays of which it is a member using an RDF query language

Where RDF collections have been used to describe arrays, this is impossible to do. A workaround would be to add a statement about the concept, e.g. ...

<rdf:RDF xml:base="http://example.org/">

  <skos:Collection rdf:about="C1">
    <rdfs:label>chairs by form</rdfs:label>
    <skos:members rdf:parseType="Collection">
      <skos:Concept rdf:about="A"/>
      <skos:Concept rdf:about="B"/>
      <skos:Concept rdf:about="C"/>
    </skos:members>
    <skos:length rdf:datatype="http://www.w3.org/2001/XMLSchema#int">3</skos:length>
  </skos:Collection>

  <skos:Concept rdf:about="A">
    <skos:inCollection rdf:resource="C1"/>
  </skos:Concept>

  <!-- ... and so on for other concepts. -->

</rdf:RDF>

The main problem with the hypothetical skos:length and skos:inCollection properties is that they introduce logical dependencies between statements that must be maintained by any programs modifying the structure. In other words, conflicting statements could be accidentally introduced.
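
To make the maintenance burden concrete, here is a sketch of the kind of check a program would need to run after every modification. It assumes the data has already been read into simple Python structures; the function name, the structure layout and the wording of the conflict reports are all hypothetical.

```python
def find_conflicts(members, declared_length, in_collection):
    """Report statements that disagree with the actual list contents.

    members:         {collection: [concept, ...]} taken from the RDF lists
    declared_length: {collection: int} taken from the hypothetical skos:length
    in_collection:   set of (concept, collection) pairs from skos:inCollection
    """
    conflicts = []
    for coll, concepts in members.items():
        declared = declared_length.get(coll)
        if declared is not None and declared != len(concepts):
            detail = f"declared {declared}, actual {len(concepts)}"
            conflicts.append(("length mismatch", coll, detail))
        for concept in concepts:
            if (concept, coll) not in in_collection:
                conflicts.append(("missing inCollection", coll, concept))
    for concept, coll in sorted(in_collection):
        if concept not in members.get(coll, []):
            conflicts.append(("stale inCollection", coll, concept))
    return conflicts


conflicts = find_conflicts(
    members={"C1": ["A", "B", "C"]},
    declared_length={"C1": 4},
    in_collection={("A", "C1"), ("D", "C1")},
)
for conflict in conflicts:
    print(conflict)
```

In this made-up data set one stale length and three inconsistent skos:inCollection statements are reported, which is the sort of drift that would otherwise go unnoticed.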

Where RDF containers have been used to describe arrays, this is possible via rdfs:member, again provided that the repository has some inference capability. Without inference it is impossible.

Issue: preserving order

This is a matter of style, but nevertheless is important.

If RDF collections are used, a boolean-valued property could indicate that the order of an array should be preserved, e.g.

  <skos:Collection rdf:about="C1">
    <rdfs:label>chairs by form</rdfs:label>
    <skos:members rdf:parseType="Collection">
      <skos:Concept rdf:about="A"/>
      <skos:Concept rdf:about="B"/>
      <skos:Concept rdf:about="C"/>
    </skos:members>
    <skos:preserveOrder rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</skos:preserveOrder>
  </skos:Collection>

... an alternative way of doing the same thing (preferred e.g. by D. Menendez) is to use different classes for each type of array, e.g. skos:OrderedCollection and skos:UnorderedCollection.

Under RDF containers the problem is simple, as rdf:Seq can be used where order should be preserved, and rdf:Bag where order doesn't matter.
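
From the point of view of an application, the distinction boils down to a single decision when displaying an array. A small Python sketch, assuming the container type has already been read from the data as the string 'rdf:Seq' or 'rdf:Bag' (the function name and the choice of case-insensitive alphabetical sorting are illustrative):

```python
def display_order(container_type, labels):
    """Return member labels in the order an application should show them."""
    if container_type == "rdf:Seq":
        # Order is meaningful: keep it exactly as given.
        return list(labels)
    if container_type == "rdf:Bag":
        # Order is not meaningful: free to reorder, here alphabetically.
        return sorted(labels, key=str.casefold)
    raise ValueError(f"unsupported container type: {container_type}")


labels = ["backstools", "ax chairs", "Barcelona chairs", "armchairs"]
print(display_order("rdf:Seq", labels))
print(display_order("rdf:Bag", labels))
```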

Issue: nesting an array in the hierarchy of concepts

The original purpose of these arrays is so that their labels can appear in a hierarchical display of the concept scheme, and help people make sense of it.

Thus any visualisation tool taking SKOS data as input has to work out how and when these array labels should be displayed.

The simplest hack here is to introduce a property e.g. skos:viewUnder ...

  <skos:Collection rdf:about="C1">
    <rdfs:label>chairs by form</rdfs:label>
    <skos:members rdf:parseType="Collection">
      <skos:Concept rdf:about="A"/>
      <skos:Concept rdf:about="B"/>
      <skos:Concept rdf:about="C"/>
    </skos:members>
    <skos:viewUnder rdf:resource="X"/>
  </skos:Collection>

... where the concept 'X' is broader than all the collection members.

As a matter of style, a property could be defined to build statements the other way round, e.g. ...

  <skos:Concept rdf:about="X">
    <skos:viewAsChildren rdf:resource="C1"/>
  </skos:Concept>
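
Either way round, a visualisation tool ends up with a mapping from a concept to the arrays displayed beneath it. The Python sketch below shows how such a mapping could drive an indented display like the AAT extract at the top of this page. The data structures, the function name and the identifiers X, A, B, C are assumptions for illustration; concepts that belong to an array are shown under its node label rather than directly under the parent.

```python
def render(concept, labels, narrower, arrays, depth=0):
    """Return indented display lines for `concept` and everything below it.

    labels:   {concept: preferred label}
    narrower: {concept: [narrower concept, ...]}
    arrays:   {concept: [(node label, [member, ...]), ...]}, i.e. the
              information carried by the hypothetical skos:viewUnder
    """
    indent = "   "
    lines = [indent * depth + labels[concept]]
    grouped = set()
    for node_label, members in arrays.get(concept, []):
        lines.append(indent * (depth + 1) + f"<{node_label}>")
        for member in members:
            grouped.add(member)
            lines.extend(render(member, labels, narrower, arrays, depth + 2))
    for child in narrower.get(concept, []):
        if child not in grouped:  # not already shown under a node label
            lines.extend(render(child, labels, narrower, arrays, depth + 1))
    return lines


labels = {"X": "chairs", "A": "armchairs", "B": "ax chairs", "C": "backstools"}
narrower = {"X": ["A", "B", "C"]}
arrays = {"X": [("chairs by form", ["A", "B", "C"])]}
lines = render("X", labels, narrower, arrays)
print("\n".join(lines))
```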

@@TODO any other way of doing this??