ContainersAndCollections

One of the missing features that a reasonable number of users (most recently Jeni Tennison, on Twitter) have asked for is a simpler way of encoding collections and containers in RDFa. This is indeed the major RDF feature that has no direct mapping in RDFa (if we set aside reification, which is not really used anyway). The design below outlines a possible approach. I have created a separate FAQ section on why some of the design decisions were taken.

The user view

Containers (a.k.a. rdf:Seq, rdf:Alt, or rdf:Bag)

Processing containers is triggered by one of the pseudo resources ::Seq, ::Alt, or ::Bag. I.e., the following code:

<span rel="..." resource="::Seq">
    ...
</span>

triggers container processing, as described below, on the subtree. The current subject in the subtree is a newly created blank node, of type rdf:Seq, rdf:Alt, or rdf:Bag, respectively.

Container processing means that the RDFa processor locates in the subtree, depth-first, any element with a @rel or @property value of ::member. These attribute values are replaced, in document order, by the rdf:_1, rdf:_2, … predicates. The depth-first search does not descend below an element once its ::member has been handled, nor below an element that specifies another container or collection via a pseudo resource.

As an example, the code:

<li about="http://www.worldcatlibraries.org/isbn/9780262912423" typeof="bibo:Book">
  <em property="dc:title">A Semantic Web Primer</em>, by
  <span rel="dc:creator" resource="::Seq">
    <span property="::member">Grigoris Antoniou</span> and
    <span property="::member">Frank van Harmelen</span>
  </span>
</li>

yields

<http://www.worldcatlibraries.org/isbn/9780262912423> a bibo:Book ;
     dc:creator
         [ a rdf:Seq ;
             rdf:_1 "Grigoris Antoniou" ;
             rdf:_2 "Frank van Harmelen"
         ] ;
     dc:title "A Semantic Web Primer" . 

Collections (a.k.a. rdf:List)

Processing collections is triggered by the pseudo resource ::List. I.e., the following code:

<span rel="..." resource="::List">
    ...
</span>

triggers collection processing, as described below, on the subtree. The current subject in the subtree is a newly created blank node used as the head of the (RDF) list.

Collection processing means that the RDFa processor locates in the subtree, depth-first, any element with a @rel or @property value of ::member. These attribute values are replaced by the rdf:first predicate; furthermore, the additional triples that bind the list elements together via rdf:rest are added automatically. The depth-first search does not descend below an element once its ::member has been handled, nor below an element that specifies another container or collection via a pseudo resource.

The code below is similar to the previous example, except that a list is used instead of a sequence:

<li about="http://www.worldcatlibraries.org/isbn/9780262912423" typeof="bibo:Book">
  <em property="dc:title">A Semantic Web Primer</em>, by
  <span rel="dc:creator" resource="::List">
    <span property="::member">Grigoris Antoniou</span> and
    <span property="::member">Frank van Harmelen</span>
  </span>
</li>

which yields:

<http://www.worldcatlibraries.org/isbn/9780262912423> a bibo:Book ;
     dc:creator ( "Grigoris Antoniou" "Frank van Harmelen" ) ;
     dc:title "A Semantic Web Primer" . 
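
The parenthesised form in the Turtle output is shorthand for a chain of blank nodes linked by rdf:first and rdf:rest and terminated by rdf:nil. As an illustration only, the following Python sketch spells that chain out for the two authors. The function name expand_list and the _:l1, _:l2 labels are made up for this example and are not part of the proposal.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand_list(members):
    """Return (head, triples) spelling out an RDF list node by node."""
    if not members:
        # In RDF the empty list is rdf:nil itself.
        return RDF + "nil", []
    # One blank node per member; the labels are arbitrary (hypothetical).
    nodes = ["_:l%d" % i for i in range(1, len(members) + 1)]
    triples = []
    for i, (node, member) in enumerate(zip(nodes, members)):
        # The last node points to rdf:nil, every other node to its successor.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", member))
        triples.append((node, RDF + "rest", rest))
    return nodes[0], triples


head, triples = expand_list(["Grigoris Antoniou", "Frank van Harmelen"])
for triple in triples:
    print(triple)
```

The head node returned here is what becomes the object of dc:creator in the example.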

The specification view

Handling of collections and containers does not change the core processing model. Instead, the management of collections and containers is defined via a (conceptual) pre-processing step on the DOM tree of the RDFa source: nodes in the DOM tree are modified (more specifically, attribute values are modified) and some new elements are added to the DOM tree for the creation of new triples. All modifications are such that, once the DOM tree is modified, the normal RDFa Core processing steps can be executed to generate the necessary collection or container triples.

Collection/Container transformation of the DOM

In what follows, the phrase "a new bnode id is generated" means that a string of the form "_:xxx" is created that is distinct from all other similarly formatted strings, whether they appear in the original RDFa source or result from an earlier step that generated a new bnode id.
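
One way to meet this distinctness requirement is sketched below in Python. It is only an illustration under simplifying assumptions: the name bnode_generator, the "_:gen" prefix, and the regular expression used to spot existing ids are all hypothetical, and the source is scanned as raw text rather than attribute by attribute.

```python
import re
from itertools import count

# Rough pattern for strings that look like bnode ids (an assumption).
BNODE = re.compile(r"_:[A-Za-z0-9_]+")


def bnode_generator(source, prefix="_:gen"):
    """Yield bnode ids that collide neither with `source` nor with each other."""
    used = set(BNODE.findall(source))
    for n in count(1):
        candidate = "%s%d" % (prefix, n)
        # The counter never repeats, so only ids from the source must be skipped.
        if candidate not in used:
            yield candidate


new_bnode = bnode_generator('<span about="_:gen1" rel="x" resource="_:a"/>')
first, second = next(new_bnode), next(new_bnode)
print(first, second)
```

Because "_:gen1" already occurs in the sample source, the generator skips it.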

The modification of the DOM tree is made starting from the top element. The steps below are executed on all element nodes as follows.

  1. If the node does not include the @resource attribute, or if the value of that attribute is not ::Seq, ::Bag, ::Alt, or ::List, processing on that node stops.
  2. If the value of @resource is ::Seq, ::Bag, or ::Alt then
    1. a new bnode id is generated
    2. the new bnode id is used to replace the value of @resource
    3. a new child element is added to the node with the attribute @about set to the new bnode id, and the attribute @typeof set to http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq, http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag, or http://www.w3.org/1999/02/22-rdf-syntax-ns#Alt, respectively.
    4. an element counter is set to '1'
    5. the DOM subtree starting at the node is traversed depth-first; for each descendent node:
      1. if the descendent node contains the @rel or @property attribute with a value of ::member then
        1. the attribute value is exchanged against http://www.w3.org/1999/02/22-rdf-syntax-ns#_X where 'X' is the current value of the element counter
        2. the element counter is incremented by one
        3. the traversal stops
      2. if the descendent node contains the @resource attribute with a value of ::Seq, ::Bag, ::Alt, or ::List, traversal stops
      3. otherwise, traversal is continued until the leaf nodes are reached
  3. If the value of @resource is ::List then
    1. a new bnode id is generated; it identifies the head of the list
    2. the new bnode id is used to replace the value of @resource
    3. an array of bnode id-s is created and the newly generated id is added to the tail of the array
    4. the DOM subtree starting at the node is traversed depth-first; for each descendent node:
      1. if the descendent node contains the @rel or @property attribute with a value of ::member
        1. the attribute value is exchanged against http://www.w3.org/1999/02/22-rdf-syntax-ns#first
        2. the attribute @about is set on the descendent node with the value set to the tail element of the array of bnode id-s
        3. a new bnode id is generated and its value is added to the tail of the array
        4. the traversal stops
      2. if the descendent node contains the @resource attribute with a value of ::Seq, ::Bag, ::Alt, or ::List, traversal stops
      3. otherwise, traversal is continued until the leaf nodes are reached
    5. Once the traversal is done, the array of bnode id-s is traversed from start to tail, skipping the last id (it was generated after the final ::member and is never used as a subject); for each remaining id a new child element is added to the node with
      1. the @about attribute set to the current bnode id of the array
      2. the @rel attribute set to http://www.w3.org/1999/02/22-rdf-syntax-ns#rest
      3. the @resource attribute set to the next bnode id of the array if that id is not the skipped last one; otherwise, the value is set to http://www.w3.org/1999/02/22-rdf-syntax-ns#nil

Note: it is not specified what element is added to the node in steps 2.3 or 3.5; these elements in the DOM tree are ephemeral and the implementation is free to use any element name.

Note: this specification describes the conceptual transformation of the DOM tree; implementations are not required to implement collections and containers exactly in these terms. They can, for example, fold these steps directly into the core processing steps.

Transformation Examples

The following code shows what a serialized form of the transformed DOM tree could look like for the previous examples. The names of the elements added to the DOM, and their exact position, may differ from one implementation to another.

The first example shows the container case, using the example above:

<li about="http://www.worldcatlibraries.org/isbn/9780262912423" typeof="bibo:Book">
  <em property="dc:title">A Semantic Web Primer</em>, by
  <span rel="dc:creator" resource="_:xyz10">
    <span property="http://www.w3.org/1999/02/22-rdf-syntax-ns#_1">Grigoris Antoniou</span> and
    <span property="http://www.w3.org/1999/02/22-rdf-syntax-ns#_2">Frank van Harmelen</span>
    <container about="_:xyz10" typeof="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq"/>
  </span>
</li>
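
The container steps (2.1 to 2.5) can be sketched in Python with the standard xml.dom.minidom module. This is a minimal sketch under several assumptions, not the pyRdfa implementation: the helper names are hypothetical, the caller supplies the bnode id generator, attribute values are compared as whole strings (a @rel holding several values is not handled), and the added element is named container as in the example above.

```python
from itertools import count
from xml.dom import minidom

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTAINERS = ("::Seq", "::Bag", "::Alt")
TRIGGERS = CONTAINERS + ("::List",)


def elements(node):
    """Element children of `node`, in document order."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def number_members(node, counter):
    """Step 2.5: depth-first walk replacing ::member by rdf:_1, rdf:_2, ..."""
    for child in elements(node):
        attrs = [a for a in ("rel", "property")
                 if child.getAttribute(a) == "::member"]
        if attrs:
            predicate = RDF + "_%d" % next(counter)
            for a in attrs:
                child.setAttribute(a, predicate)
            # Do not descend below a handled ::member.
        elif child.getAttribute("resource") not in TRIGGERS:
            number_members(child, counter)
        # Otherwise: a nested container or collection, left for its own pass.


def transform_containers(doc, new_bnode):
    # Snapshot the elements first, because the loop adds new ones.
    for node in list(doc.getElementsByTagName("*")):
        kind = node.getAttribute("resource")
        if kind not in CONTAINERS:
            continue                              # step 1 (lists not handled here)
        bnode = next(new_bnode)                   # step 2.1
        node.setAttribute("resource", bnode)      # step 2.2
        number_members(node, count(1))            # steps 2.4 and 2.5
        typed = doc.createElement("container")    # step 2.3, arbitrary element name
        typed.setAttribute("about", bnode)
        typed.setAttribute("typeof", RDF + kind[2:])
        node.appendChild(typed)


source = """<li about="http://www.worldcatlibraries.org/isbn/9780262912423" typeof="bibo:Book">
  <em property="dc:title">A Semantic Web Primer</em>, by
  <span rel="dc:creator" resource="::Seq">
    <span property="::member">Grigoris Antoniou</span> and
    <span property="::member">Frank van Harmelen</span>
  </span>
</li>"""
doc = minidom.parseString(source)
transform_containers(doc, ("_:xyz%d" % n for n in count(10)))
print(doc.documentElement.toxml())
```

Elements are visited in document order, so an outer container is handled before any container nested inside it; the nested one still carries its pseudo resource at that point, which is what makes the outer walk stop there.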

The transformed collection case can look as follows.

<li about="http://www.worldcatlibraries.org/isbn/9780262912423" typeof="bibo:Book">
  <em property="dc:title">A Semantic Web Primer</em>, by
  <span rel="dc:creator" resource="_:xyz10">
   <span about="_:xyz10" property="http://www.w3.org/1999/02/22-rdf-syntax-ns#first">Grigoris Antoniou</span> and
   <span about="_:xyz11" property="http://www.w3.org/1999/02/22-rdf-syntax-ns#first">Frank van Harmelen</span>
   <list_links about="_:xyz10" rel="http://www.w3.org/1999/02/22-rdf-syntax-ns#rest" resource="_:xyz11"/>
   <list_links about="_:xyz11" rel="http://www.w3.org/1999/02/22-rdf-syntax-ns#rest" resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
  </span>
</li>
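
A comparable Python sketch for the ::List case follows, again using xml.dom.minidom. It is written to reproduce the transformed tree shown above rather than to mirror the numbered steps literally: it first collects the ::member elements and then allocates exactly one bnode id per member, so no unused id is left over after the last member. The helper names, the list_links element name, and the handling of a list without members (its @resource is set to rdf:nil) are assumptions of this sketch.

```python
from itertools import count
from xml.dom import minidom

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TRIGGERS = ("::Seq", "::Bag", "::Alt", "::List")


def elements(node):
    """Element children of `node`, in document order."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def collect_members(node, found):
    """Gather ::member elements depth-first, without descending below them."""
    for child in elements(node):
        attrs = [a for a in ("rel", "property")
                 if child.getAttribute(a) == "::member"]
        if attrs:
            found.append((child, attrs))
        elif child.getAttribute("resource") not in TRIGGERS:
            collect_members(child, found)
    return found


def transform_lists(doc, new_bnode):
    # Snapshot the elements first, because the loop adds new ones.
    for node in list(doc.getElementsByTagName("*")):
        if node.getAttribute("resource") != "::List":
            continue
        members = collect_members(node, [])
        if not members:
            # Assumption: a list without members is rdf:nil.
            node.setAttribute("resource", RDF + "nil")
            continue
        cells = [next(new_bnode) for _ in members]   # one list node per member
        node.setAttribute("resource", cells[0])      # head of the list
        for i, (member, attrs) in enumerate(members):
            for a in attrs:
                member.setAttribute(a, RDF + "first")
            member.setAttribute("about", cells[i])
            # Link this list node to the next one, or to rdf:nil at the end.
            link = doc.createElement("list_links")
            link.setAttribute("about", cells[i])
            link.setAttribute("rel", RDF + "rest")
            last = i + 1 == len(cells)
            link.setAttribute("resource", RDF + "nil" if last else cells[i + 1])
            node.appendChild(link)


source = """<li about="http://www.worldcatlibraries.org/isbn/9780262912423" typeof="bibo:Book">
  <em property="dc:title">A Semantic Web Primer</em>, by
  <span rel="dc:creator" resource="::List">
    <span property="::member">Grigoris Antoniou</span> and
    <span property="::member">Frank van Harmelen</span>
  </span>
</li>"""
doc = minidom.parseString(source)
transform_lists(doc, ("_:xyz%d" % n for n in count(10)))
print(doc.documentElement.toxml())
```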

Possible questions

Why not reuse HTML constructs like <ul/>?

First of all, these elements are HTML specific, and RDFa is defined for any XML dialect. But even in HTML it might become a restriction. Indeed, the sentence in the example should look on the screen as follows:

A Semantic Web Primer, by Grigoris Antoniou and Frank van Harmelen

instead of a bulleted list. It is of course possible to define CSS rules to generate the same visual effect for <ul>, but those CSS rules are not obvious, are rarely used, and most HTML authors are not familiar with them. On the other hand, adding the necessary attributes to list elements is not a huge load on the author…

Why use ::Seq and ::member instead of, say, rdf:Seq or rdf:li?

The transformation rules above handle the most frequent use cases; however, they do not cover all possible usages of containers or collections. For example if, instead of a blank node, the author wants to create a sequence as a URI resource, the transformation would not work (the sequence node is set to a blank node by the rules). The same occurs if one of the constituents of a list is supposed to be a URI reference. If the transformations were triggered by one of the RDF terms instead of the pseudo terms, the author’s intentions and the transformation might clash. Also, using real prefixes would contradict the preprocessor approach; indeed, it would require a preprocessor to "understand" the @prefix mechanism, interpret profile files, etc., to make sure that the rdf:Seq symbol is the corresponding RDF URI. A final issue is the possible usage of rdf:li: if it were used, many users would believe that it is a real predicate in the RDF namespace, which is not true (yes, this is an issue in RDF/XML, too!).

It would of course be possible to extend the specification to cover all corner cases listed above, e.g., by considering possible @about attributes on the constituents of, say, a list element. However, the current design probably covers at least 80% of the usual use cases (the authors' list in the examples is a typical one), and it is better to opt for relative simplicity in this case rather than for completeness.

What about using @typeof instead of @resource?

One could consider using @typeof as a triggering mechanism, i.e., something like:

<span typeof="::List">...</span>

It is indeed perfectly possible to define the transformation rules in a more or less analogous fashion. However, some of the side effects could be surprising due to the extra rules @typeof has in terms of generating new blank nodes on the fly. Reconciling those with the encoding of collections or containers might lead to complications for the user.

Is this implementable?

It is, and it has been… The shadow version of pyRdfa does it.

Open Issues

How should this appear on the RDF API level?

The question is whether the API should reflect, shall we say, the original DOM or the transformed DOM when, e.g., creating Property groups. If the mechanism is defined as a preprocessing step, then it might be o.k. to stick to the transformed DOM (similarly to the fact that, if some syntax is used for lists in, say, Turtle, the triple store still contains the rdf:first, rdf:rest, etc., predicates).

Last modified on 23 August 2011, at 09:03