Second RDF WG Face-to-Face meeting

The 2nd RDF F2F meeting took place on 12-13 October, in two locations: about half of the participants were in Cambridge, USA, hosted by MIT, while the other half were in London, UK, hosted by the BBC. Thanks to a schedule that bridged the time difference and a video link between the two meeting rooms, we had an almost genuine F2F. Although a meeting with everybody in one place might have been better, this setup worked well after all.

Most of the two-day meeting concentrated on “graph identification”, i.e., what the community commonly calls “named graphs”. The discussion at the F2F was the outcome of a long series of discussions on the group’s mailing list, which started, essentially, right when the group began its work. Many issues surround this loose notion of named graphs, including terminology, semantics, and syntax; although the F2F meeting did not solve all the problems, significant advances were made.

First, the terminology. At the beginning of the working group’s life, Sandro Hawke proposed a “working” terminology that has become known as the “g-*” terms in the discussions so far. The group has now adopted a more definite terminology. An “(RDF) Graph” (formerly g-snap) is a graph in the mathematical sense, i.e., a set of triples; a Graph is therefore immutable and abstract. A “Graph Container” (formerly g-box) is a concrete and usually mutable entity (e.g., a file or database content) that has an RDF Graph as its state. Finally, an “RDF Serialization” (formerly g-text) is a textual encoding of an RDF Graph in some accepted syntax (Turtle, RDF/XML, etc.). The clear separation of these concepts (for example, the immutable/mutable distinction) is important, and will have to be reflected in the final documents.
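To make the three terms a little more tangible, here is a minimal Python sketch. It is only an illustration under simplifying assumptions: triples are plain string tuples, and the names `Graph`, `GraphContainer` and `serialize` are hypothetical, not taken from any WG document. The point is that a Graph is an immutable value (a `frozenset`), a Graph Container is a mutable object whose *state* is a Graph at any given moment, and a Serialization is just text derived from a Graph.

```python
from typing import FrozenSet, Set, Tuple

# Simplification: a triple is a (subject, predicate, object) tuple of strings
# already written in N-Triples syntax; real RDF terms are richer than this.
Triple = Tuple[str, str, str]

# "RDF Graph" (formerly g-snap): a set of triples. A frozenset cannot change.
Graph = FrozenSet[Triple]


class GraphContainer:
    """Hypothetical "Graph Container" (formerly g-box): a mutable holder,
    think of a file or a database, whose current state is an RDF Graph."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def remove(self, triple: Triple) -> None:
        self._triples.discard(triple)

    def state(self) -> Graph:
        # A snapshot: the Graph the container holds right now.
        return frozenset(self._triples)


def serialize(graph: Graph) -> str:
    """Hypothetical "RDF Serialization" (formerly g-text): one N-Triples-like
    line per triple, sorted so the same Graph always yields the same text."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in sorted(graph))


# Usage: the container changes, but a Graph taken from it earlier does not.
box = GraphContainer()
box.add(("<http://example.org/a>", "<http://example.org/p>", '"x"'))
before = box.state()
box.add(("<http://example.org/a>", "<http://example.org/p>", '"y"'))
after = box.state()
```

Here `before` still contains one triple after the second `add`, while `after` contains two: the container moved on, the earlier Graph did not.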

Most of the discussion was about what a named Graph, i.e., a (u,G) pair, really means. First of all, it has to be recognized that it may matter whether G is a Graph or a Graph Container; depending on the application, (u,G) might actually be a named Graph Container rather than a named Graph. But the main issue, which took up most of the discussion at the meeting, was whether or not some sort of “semantics” ties u to G. One school of thought proposed that u (which is a URI), when dereferenceable, should actually return G when dereferenced, i.e., it should return the state of the Graph Container (if G is a container) or G itself (if G is a Graph), in some RDF Serialization syntax. Others defended the position that there should be no such requirement whatsoever: (u,G) should simply be a loose association, a labeling of a Graph or a Graph Container, and nothing else. The group went through a number of use cases, considered existing deployments (e.g., SPARQL datasets), and discussed Web architectural principles, semantic issues, etc. In the end, although many people believed that a dereferenceable semantics attached to the name would be a good thing to have, the feeling was that current deployments, usage patterns, etc., did not make this possible. “This ship has already sailed”, as one participant said. Although there was no formal resolution, a straw poll reached consensus among the participants: in the formal definition of a named Graph (note that the exact terminology may still change!) no semantics is attached to the name, but the group will document usage practices that reflect dereferencing behavior in a non-normative section of the final document.
Obviously, many details still have to be clarified (e.g., whether the group should define relationships among the names, like graph equivalence, subgraph relations, etc.), but this consensus gives good guidance for the future work on those details.
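The difference between the two positions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anything the group defined: a named graph is modelled as a plain `(u, G)` tuple, and `matches_dereferencing_practice` is a hypothetical helper in which the `fetch` argument stands in for "HTTP GET the URI and parse the result". Under the straw-poll outcome the pair is valid either way; the check only tells whether the optional, non-normative practice happens to be followed.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

# A named graph as a (u, G) pair. The name is only a label: the formal
# definition (per the straw poll) says nothing about what dereferencing u returns.
NamedGraph = Tuple[str, Graph]


def matches_dereferencing_practice(pair: NamedGraph,
                                   fetch: Callable[[str], Graph]) -> bool:
    """Hypothetical check for the optional usage practice: does dereferencing
    u yield G? `fetch` is an assumed stand-in for a GET plus parsing step."""
    name, graph = pair
    return fetch(name) == graph


# A SPARQL-style dataset: a default graph plus graphs paired with names.
t = ("<http://example.org/a>", "<http://example.org/p>", '"x"')
g1: Graph = frozenset({t})
default_graph: Graph = frozenset()
named_graphs: Dict[str, Graph] = {"http://example.org/g1": g1}

# Two imaginary "Webs": one where the name resolves to the graph, one where it doesn't.
web_ok: Dict[str, Graph] = {"http://example.org/g1": g1}
web_empty: Dict[str, Graph] = {}

pair: NamedGraph = ("http://example.org/g1", named_graphs["http://example.org/g1"])
follows = matches_dereferencing_practice(pair, lambda u: web_ok.get(u, frozenset()))
does_not = matches_dereferencing_practice(pair, lambda u: web_empty.get(u, frozenset()))
```

In both cases `pair` is an equally well-formed named graph; only `follows` reports that the dereferencing convention holds.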

Although the named graph discussion took most of the energy and time, some other issues were also discussed. For example, the group finally passed a formal resolution on the long-standing issue of how a language-tagged literal can also be given a datatype, thereby giving every literal a clear datatype instead of the current duality of typed and untyped literals (for literals without a language tag this issue was already closed a while ago). After much discussion on the mailing list, the decision is that a separate, single datatype, (probably) called rdf:langString, will be defined that applies to these and only these literals, with a special lexical-to-value mapping (see the exact text for the details). The discussions that preceded this resolution also considered introducing a series of datatypes, i.e., one per language tag, but this seemed far too complicated in practice (e.g., language tags are case insensitive, whereas the URIs used to identify datatypes are case sensitive, which introduces a huge potential for errors).
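The case-sensitivity argument is easy to see in code. The Python sketch below is a simplified illustration, not the resolution text: it assumes that the value of an rdf:langString literal is the pair (lexical form, lowercased language tag), and the per-tag datatype scheme under `http://example.org/lang#` is entirely hypothetical, standing in for the rejected "one datatype per language tag" design. With a single datatype, `en-GB` and `en-gb` yield the same value; with per-tag datatype URIs, which are compared character by character, they would silently become two different datatypes.

```python
from dataclasses import dataclass

# The rdf: namespace is the standard one; the local name follows the post.
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str
    lang: str = ""  # only meaningful together with rdf:langString


def lang_literal(text: str, tag: str) -> Literal:
    # Every language-tagged literal gets the single datatype rdf:langString.
    return Literal(text, RDF_LANGSTRING, tag)


def value(lit: Literal):
    """Assumed lexical-to-value mapping for rdf:langString: the pair
    (lexical form, language tag), with the tag lowercased because
    language tags are case insensitive."""
    if lit.datatype != RDF_LANGSTRING:
        raise ValueError("only rdf:langString is handled in this sketch")
    return (lit.lexical, lit.lang.lower())


# One shared datatype: tags that differ only in case give the same value.
a = lang_literal("colour", "en-GB")
b = lang_literal("colour", "en-gb")
same_value = value(a) == value(b)


def per_tag_datatype(tag: str) -> str:
    # Hypothetical rejected alternative: one datatype URI per language tag.
    return "http://example.org/lang#" + tag


# URIs compare character by character, so case variants become different datatypes.
same_datatype = per_tag_datatype("en-GB") == per_tag_datatype("en-gb")
```

Here `same_value` is `True` while `same_datatype` is `False`, which is exactly the kind of mismatch the group wanted to avoid.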

All in all, it was a great meeting in an enjoyable atmosphere. We have to thank our two hosts, Sandro Hawke at MIT and Yves Raimond at the BBC, for making all this possible! The minutes of both the first and the second day are public.

(If you have comments on the technical issues, please send them to the relevant WG mailing list rather than posting them on this blog…)

About Ivan Herman

Ivan Herman is the leader of the Digital Publishing Activity at W3C. For more details, see http://www.w3.org/People/Ivan/