
Team Comment on the RDF/XML Source Declaration Submission

W3C is pleased to receive the RDF/XML Source Declaration Submission from INRIA.

Overview

There are several proposed specifications for what has become known in Semantic Web jargon as “named graphs”, “quadruples”, “source graphs”, or “literal graphs”. All of these terms describe ways of assigning an IRI to a set of one or more triples, i.e., an RDF graph. Possible applications include signing and encryption, query optimization, etc. As an example, the SPARQL specification, defined by W3C, includes the GRAPH keyword to denote named graphs as part of the query pattern. See also “Named Graphs”, a collection of (now somewhat outdated) notes on the subject, created by the former RDF Interest Group (now the Semantic Web Interest Group).

A general issue related to named graphs is how to provide a consistent semantics for them, so that the notion can be added to the standard RDF/RDFS semantics. A good example of this work is the paper of Carroll et al. Another issue is how to provide a consistent syntax to denote named graphs. The paper of Carroll et al. includes a syntax extension to Turtle called TriG. SPARQL uses a similar syntax, and so does N3. An XML-based syntax has also been proposed under the acronym “TriX”. However, no syntax extending the RDF/XML recommendation has been proposed until now.

This is where this submission fits into the general picture. It proposes a simple mechanism to add information on quadruples to RDF/XML. The mechanism is based on a special attribute called “graph” (defined in a separate namespace). The value of this attribute is an IRI; when used, it assigns a “source URI” to a specific set of triples encoded in RDF/XML. Specifically, when used on an XML element in RDF/XML denoting an RDF subject, all triples in the subtree are considered to “belong” to the graph identified by the IRI. If used on an XML element referring to an RDF property, the source also covers the “parent” element of the property (i.e., the subject of the triple). Because the source of the top-level rdf:RDF element defaults to the base URI of the document, every triple belongs to a specific source graph. (See section 3.2. Syntax extension: the source attribute for the precise definition.)
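The scoping rule described above can be sketched in a few lines of Python. This is only an illustration of how a source IRI is inherited down the XML tree, not the submission's algorithm: it generates no triples, ignores the blank node cases discussed later, and assumes plain inheritance from parent to child. The function name graphs_in_scope and the base URI http://example.org/doc.rdf are hypothetical; the cos namespace is the one used in the submission's own example.

```python
import xml.etree.ElementTree as ET

# Attribute name in ElementTree's {namespace}local notation.
COS_GRAPH = "{http://www.inria.fr/acacia/corese#}graph"

def graphs_in_scope(root, base_uri):
    """Map each XML element to the source graph IRI in scope for it.

    Assumption: an element without its own graph attribute inherits the
    source of its parent; the root falls back to the document base URI.
    """
    scope = {}
    stack = [(root, base_uri)]
    while stack:
        elem, inherited = stack.pop()
        current = elem.get(COS_GRAPH, inherited)
        scope[elem] = current
        stack.extend((child, current) for child in elem)
    return scope

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:cos="http://www.inria.fr/acacia/corese#">
 <foaf:Person>
  <foaf:name>Fabien Gandon</foaf:name>
  <foaf:mbox rdf:resource="mailto:fgandon@inria.fr"
             cos:graph="http://www.inria.fr"/>
 </foaf:Person>
</rdf:RDF>"""

root = ET.fromstring(DOC)
scope = graphs_in_scope(root, "http://example.org/doc.rdf")  # hypothetical base URI

# Index by local element name for readability (names are unique in DOC).
by_tag = {elem.tag.split("}")[1]: graph for elem, graph in scope.items()}
for tag, graph in sorted(by_tag.items()):
    print(tag, "<-", graph)
```

In this sketch only the foaf:mbox element carries its own source; every other element falls back to the base URI passed in for the root.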

Some technical comments

The assignment of a source graph IRI to a triple, as described in the previous section, works well as long as no blank nodes are used. However, the submission relies on the assumption that each blank node can belong to only one source graph. (This is a perfectly reasonable assumption, also adopted by SPARQL, for example. It is not explicitly stated in the submission itself, just taken for granted as obvious.) The problem is that this assumption leads to numerous clashes with the simple source graph assignment in RDF/XML.

Section 4 of the submission describes these cases related to blank nodes. Essentially, the modified RDF/XML parser has to generate new blank nodes to abide by this restriction. As an example (the extension is the cos:graph attribute):

<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:cos="http://www.inria.fr/acacia/corese#"
  cos:graph="http://www.w3.org">
 <rdf:Description rdf:about="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">
  <dc:creator>
    <foaf:Person>
     <foaf:name>Fabien Gandon</foaf:name>
     <foaf:mbox rdf:resource="mailto:fgandon@inria.fr" cos:graph="http://www.inria.fr" />
   </foaf:Person>
  </dc:creator>
 </rdf:Description>
</rdf:RDF>

yields, as specified in the submission:

<http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> dc:creator _:a <- http://www.w3.org
_:a rdf:type foaf:Person <- http://www.w3.org
_:a foaf:name "Fabien Gandon" <- http://www.w3.org
_:b foaf:mbox <mailto:fgandon@inria.fr> <- http://www.inria.fr

(where ‘<- URI’ denotes the source graph of the triple). The blank node (denoted by <foaf:Person> in RDF/XML) has to be duplicated to ensure the separation of the two graphs. (It may surprise readers that _:b rdf:type foaf:Person <- http://www.inria.fr is not generated. The specification could be clearer on that point.) Similar problems appear when using, e.g., collections or containers, which also rely on (implicit) blank nodes.
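The restriction that a blank node may belong to only one source graph can be checked mechanically. The following Python sketch does so for the example above; the tuple representation of quadruples and the names blank_node_graphs and shared_blank_nodes are assumptions made for illustration. The "naive" list shows what a direct source assignment without duplication would produce, the second list is the output quoted from the submission.

```python
from collections import defaultdict

def blank_node_graphs(quads):
    """Collect, for every blank node label, the set of graphs it occurs in."""
    seen = defaultdict(set)
    for s, p, o, g in quads:
        for term in (s, o):
            if term.startswith("_:"):
                seen[term].add(g)
    return dict(seen)

def shared_blank_nodes(quads):
    """Blank nodes that violate the one-graph-per-blank-node assumption."""
    return sorted(b for b, gs in blank_node_graphs(quads).items() if len(gs) > 1)

W3C, INRIA = "http://www.w3.org", "http://www.inria.fr"
DOC = "<http://www.w3.org/TR/2004/REC-rdf-mt-20040210/>"

# Direct assignment: the foaf:mbox triple keeps _:a as its subject.
naive = [
    (DOC, "dc:creator", "_:a", W3C),
    ("_:a", "rdf:type", "foaf:Person", W3C),
    ("_:a", "foaf:name", '"Fabien Gandon"', W3C),
    ("_:a", "foaf:mbox", "<mailto:fgandon@inria.fr>", INRIA),
]

# Output as specified in the submission: a fresh blank node _:b is generated.
as_specified = naive[:3] + [
    ("_:b", "foaf:mbox", "<mailto:fgandon@inria.fr>", INRIA),
]

print(shared_blank_nodes(naive))         # _:a occurs in both graphs
print(shared_blank_nodes(as_specified))  # no blank node is shared
```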

The submission refers to this problem as “it is dangerous to change sources around blank nodes”. It does add an extra complication to RDF/XML, which makes mistakes and unexpected behavior more likely. This is true for all users, but especially for those who may not be aware of all the intricacies of blank nodes and RDF(S) semantics. Given that the “simplification rules” of RDF/XML are already a source of considerable confusion in the user community, this problem should not be underestimated.

However, a possibly more serious problem is that the set of RDF triples represented by the RDF/XML content differs depending on whether or not the parser understands the RDF/XML Source Declaration of the submission. By “different”, we do not only mean that some triples may not be generated by a “traditional” RDF/XML parser (which would be acceptable), but that the structure of the generated graph itself is different. In other words, this is not a backward-compatible extension to RDF/XML. In view of the large deployment base of RDF/XML, this may lead to serious interoperability issues.
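One way to make the structural difference concrete is to compare what the two kinds of parser would plausibly yield for the example of the previous section, with the source component set aside. The Python sketch below rests on an assumption: that a traditional parser, which does not act on the extension attribute, keeps a single blank node for the foaf:Person element. The helper creator_mailboxes is hypothetical and simply follows dc:creator to a node and then foaf:mbox from that node.

```python
DOC = "<http://www.w3.org/TR/2004/REC-rdf-mt-20040210/>"
MBOX = "<mailto:fgandon@inria.fr>"

# Assumed output of a "traditional" RDF/XML parser: one blank node.
traditional = [
    (DOC, "dc:creator", "_:a"),
    ("_:a", "rdf:type", "foaf:Person"),
    ("_:a", "foaf:name", '"Fabien Gandon"'),
    ("_:a", "foaf:mbox", MBOX),
]

# Triples of the extended parser, as quoted from the submission,
# with the source graph dropped: the mailbox hangs off a new node _:b.
extended = traditional[:3] + [("_:b", "foaf:mbox", MBOX)]

def creator_mailboxes(triples):
    """Mailboxes reachable via dc:creator followed by foaf:mbox."""
    creators = {o for s, p, o in triples if p == "dc:creator"}
    return sorted(o for s, p, o in triples if p == "foaf:mbox" and s in creators)

print(creator_mailboxes(traditional))  # the creator's mailbox is found
print(creator_mailboxes(extended))     # the link to the creator is lost
```

The same two-step lookup succeeds on one graph and fails on the other, so an application cannot treat the two parser outputs as interchangeable.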

Implementations

At the moment, the submission refers to one implementation, as part of the “COnceptual REsource Search Engine (Corese)” project of INRIA, August 2007.

Relationships to W3C Activities

The issue of Named Graphs, and the syntax thereof, is clearly relevant to the Semantic Web Activity. However, there is, at the moment, no Working Group that would actively pursue this line of work.

Next Steps

It is possible that, at some point in the future, W3C will charter a Working Group to review and possibly update the core RDF specifications. A number of minor and major issues have already surfaced since 2004 (the publication of the core RDF recommendation), and more may come to the fore with the increasing deployment of RDF. The issue of Named Graphs, their semantics and syntax, as well as possible new XML syntax(es) for RDF, appear to be likely work items for such a Working Group. If and when this happens, this submission will have to be taken into account both in the chartering work and in the work of that group. However, it is not clear at the moment when such work would start at W3C.


Author: Ivan Herman