Reflections from RDF-WG F2F2

From Provenance WG Wiki
Revision as of 15:49, 20 October 2011 by Tlebo (Talk | contribs)


author: Tim Lebo

Introduction

I had the opportunity and pleasure to participate in the RDF-WG's F2F2 in Boston, MA, on 12-13 October 2011. As a member of PROV-WG who is developing the OWL encoding of PROV-DM, I found it very informative to learn the considerations and perspectives of RDF-WG, especially those related to the future of named graphs. This page summarizes the discussions as they pertain to PROV-WG, along with pointers to materials created and discussed at the F2F.

My largest concern is that SPARQL-WG, RDF-WG, and PROV-WG are not properly aligned, and that a premature conclusion by one WG may hinder the results of the others. This, of course, needs some justification, which I hope to develop quickly enough to be useful for all of the WGs.

Outbrief

Terminology

The three: RDF Graph, Graph Container, and Graph Serialization

RDF-WG resolved on the following terminology [1], which helps clarify some discussions in Using named graphs to model Accounts:

  • RDF Graph (informally, "g-snap") - an instance of an RDF Abstract Model [2] - an unchanging set of triples.
  • Graph Container (informally, "g-box") - a persistent place that one puts an RDF Graph, e.g. in a SPARQL endpoint's GRAPH, a file on a web server, etc.
  • Graph Serialization (informally, "g-text") - a serialization of an RDF Graph, e.g. the characters in a Turtle, TRiG, or RDF/XML file.

The RDF Graph in a Graph Container can differ over time, and an RDF Graph can be encoded by multiple Graph Serializations. (Note: this is my own phrasing, as a summary.)
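A minimal sketch of the three terms, assuming simplified N-Triples-style strings for terms; the names GraphContainer, serialize_ntriples, and parse_ntriples are hypothetical and not part of any RDF-WG resolution. It models an RDF Graph as an immutable set of triples, a Graph Container as a slot whose graph can be replaced, and a Graph Serialization as text produced from a graph:

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
RDFGraph = FrozenSet[Triple]  # "g-snap": an unchanging set of triples


class GraphContainer:
    """A 'g-box': a persistent place whose RDF Graph may be replaced over time."""

    def __init__(self, graph: RDFGraph = frozenset()) -> None:
        self.graph = graph

    def put(self, graph: RDFGraph) -> None:
        # The container stays the same; only the graph it holds changes.
        self.graph = graph


def serialize_ntriples(graph: RDFGraph) -> str:
    """A 'g-text': one line per triple, in sorted order."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in sorted(graph))


def parse_ntriples(text: str) -> RDFGraph:
    """Recover the RDF Graph from the simplified text form above."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the trailing " ." and split into subject, predicate, and the rest.
        s, p, o = line[:-2].split(" ", 2)
        triples.add((s, p, o))
    return frozenset(triples)
```

Two texts that list the same triples in a different order are different Graph Serializations, yet they parse back to the same RDF Graph.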

FRBR analogue

I'd like to point out that the relationships among Graph Container, Graph, and Graph Serialization correspond directly to FRBR's Work, Expression, and Manifestation, respectively:

  • An RDF Graph is a frbr:realizationOf a Graph Container,
    • which would make new RDF Graphs in the container new "editions" of the Graph Container (just as frbr:Expressions do for frbr:Works).
  • A Graph Serialization is an frbr:embodimentOf an RDF Graph,
    • and different Graph Serializations (frbr:Manifestations) of an RDF Graph (frbr:Expression) do not change what is "said" in the RDF Graph.

Even RDF-WG's diagram took the form of the tell-tale "FRBR stack".

See a diagram of the FRBR ontology, as well as a collection of FRBR materials (including a recent paper by Jim and me).

[Figure: Rdf-wg-f2f-frbr.png]

The fourth: RDF Dataset

RDF Dataset is a fourth term that RDF-WG commonly cites, which they recently resolved to adopt as the SPARQL-WG defines it. The definition is [3]:

An RDF Dataset is a set { G, (<u1>, G1), (<u2>, G2),... (<un>, Gn) } 
where G and each Gi are RDF Graphs and each <ui> is an IRI.

G is the default (unnamed) RDF graph, while the Gi RDF Graphs are "associated in some way" with <ui>.
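As a minimal sketch of that definition (the class name RDFDataset and its fields are hypothetical), the dataset can be modeled as one default graph plus a mapping from IRIs to graphs. Using a mapping assumes that each <ui> appears at most once:

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
RDFGraph = FrozenSet[Triple]


@dataclass
class RDFDataset:
    """{ G, (<u1>, G1), ..., (<un>, Gn) }: a default graph plus IRI-graph pairs."""

    default_graph: RDFGraph = frozenset()
    named_graphs: Dict[str, RDFGraph] = field(default_factory=dict)

    def graph(self, iri: Optional[str] = None) -> RDFGraph:
        # No IRI means the default (unnamed) graph G; otherwise the Gi paired with <ui>.
        if iri is None:
            return self.default_graph
        return self.named_graphs[iri]
```

Note that the mapping only records that an IRI is "associated in some way" with a graph; it says nothing about what that IRI identifies on the Web.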

SPARQL-WG has defined the RDF predicate sd:name [4] to represent the <ui> in the definition above. The value of sd:name is the same value mentioned by the GRAPH keyword in SPARQL INSERT statements, such as INSERT DATA { GRAPH <http://www.w3.org/People/Berners-Lee/card> { owl:sameAs owl:sameAs owl:sameAs } }.

Although the WG agrees upon RDF Graph, Graph Container, Graph Serialization, and RDF Dataset, the relationships among them continue to be unresolved.

Part of the problem is the collision between "Graph" in the RDF Dataset definition and "RDF Graph"/"Graph Container" as they've distinguished them. In practice, RDF Dataset's "Graph" is used as both an RDF Graph and Graph Container in confounding ways; it is a Graph Container in a SPARQL Update query, but also an RDF Graph (i.e., fixed set of triples) when query results are materialized. This would be best resolved by introducing "Graph Container" into SPARQL 1.1's recommendation -- and doing so would better align with PROV-WG and RDF-WG's needs.

Naming vs. identifying

Pat Hayes was very particular about distinguishing between "naming" and "identifying" a named graph.

(Everything in this section is my best reconstruction from listening in on discussions of emails that I did not receive.)

Applying Architecture of the World Wide Web to relate Graph Container to its RDF Graph

From what I could tell, "identifying" is more powerful than "naming", since "identifying" something (in the Architecture of the WWW sense [5]) comes with one or more representations of the identified Resource, which are returned when the identifier is requested over HTTP. Sandro pointed us to the triangle-ish diagram from AWWW.

This could be used to relate Graph Containers to RDF Graphs, as recommended by (Pat?)[6]:

a [Graph Container] is a resource whose representations are (a recognized interchange syntactic form of) [RDF Graphs]. 

Although Richard agreed, I might make an argument that a representation can only be a Graph Serialization and NOT an abstract RDF Graph as Pat suggests.

I agree with Pat and Richard, but would annotate it a bit for clarity:

a [Graph Container] is a resource whose representations are (a recognized interchange syntactic form of) [RDF Graphs]. 
                                                                          ^^^-- Graph Serialization

Would it work if we HTTP request the (identified) Graph Container, get 303'd to a URI (L2) for the RDF Graph in the container, and content-negotiate with L2 to get one of many possible Graph Serializations? I think this is how it would work with this proposal.
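A minimal in-memory sketch of that flow, with no real network access; the URIs, the lookup tables, and the functions get and dereference are all hypothetical stand-ins for a server's behavior:

```python
from typing import Dict, Tuple

# Hypothetical URIs: the Graph Container, and the URI (L2) of the RDF Graph currently in it.
CONTAINER = "http://example.org/container"
L2 = "http://example.org/container/graph/2011-10-13"

# The container redirects to the graph it currently holds.
SEE_OTHER: Dict[str, str] = {CONTAINER: L2}

# The graph is available as several Graph Serializations, selected by media type.
SERIALIZATIONS: Dict[Tuple[str, str], str] = {
    (L2, "text/turtle"): "@prefix ex: <http://example.org/> .\nex:a ex:p ex:b .\n",
    (L2, "application/n-triples"): (
        "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
    ),
}


def get(uri: str, accept: str) -> Tuple[int, str]:
    """Simulate one HTTP GET; returns (status, Location header or body)."""
    if uri in SEE_OTHER:
        return 303, SEE_OTHER[uri]
    body = SERIALIZATIONS.get((uri, accept))
    return (200, body) if body is not None else (406, "")


def dereference(uri: str, accept: str) -> Tuple[int, str]:
    """Follow a single 303 redirect, then content-negotiate at the target."""
    status, value = get(uri, accept)
    if status == 303:
        status, value = get(value, accept)
    return status, value
```

Under these assumptions, requesting the container with either media type ends at L2 and yields a different Graph Serialization of the same RDF Graph.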

Naming is just as weak as dcterms:identifier

So, given that "identifying" is as defined by AWWW, what is "naming"?

I'm hoping that it means:

pat:name rdfs:subPropertyOf dcterms:identifier .

dcterms:identifier skos:definition "An unambiguous reference to the resource within a given context."

(emphasis on given context)

With dcterms:identifier, I can "name" my laptop "work macbook pro", but a TON of out-of-band context is needed to know that I was referring to the same Resource awww:identified by <http://purl.org/twc/id/machine/lebot/MacBookPro6_2>. SPARQL 1.1 runs into the same "need more out-of-band context" problem with sd:name and INSERT DATA { GRAPH <http://www.w3.org/People/Berners-Lee/card> { ... } }. <http://www.w3.org/People/Berners-Lee/card> can identify neither the Graph Container nor the RDF Graph it contains, because other SPARQL endpoints (and the URL itself) are elsewhere doing different things; we don't have enough context to name the Graph Container and RDF Graph just created in this SPARQL endpoint.

SPARQL 1.1 is causing a lot of headaches not only because RDF Graphs are confounded with Graph Containers, but also because Graph Containers are only pat:named and dcterms:identified and sd:named, never awww:identified!

Avoiding (an unnecessary) "fifth column"

While a "fourth column" to represent named graphs is generally acknowledged as essential, RDF-WG began to discuss the need for a "fifth column". The fifth column would address the case where different SPARQL endpoints (or anything else passing their data around) may use the same URI in the (sd:name, G) pairs of their RDF Dataset. The problem here is that sd:name does not awww:identify the Graph Container (nor the Graph); it is merely pat:naming (and dcterms:identifying) it. So, the "fifth column" would provide the distinction required to know one SPARQL endpoint's Graph Container <http://www.w3.org/People/Berners-Lee/card> from another SPARQL endpoint's Graph Container <http://www.w3.org/People/Berners-Lee/card> (Consider the example INSERT DATA { GRAPH <http://www.w3.org/People/Berners-Lee/card> { owl:sameAs owl:sameAs owl:sameAs} } when submitted to the endpoint http://dbpedia.org/sparql).
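A minimal sketch of the collision, assuming illustrative contents for each endpoint; the two merge functions are hypothetical. Keying by sd:name alone fuses the two endpoints' Graph Containers, while keying by (endpoint, sd:name) keeps them apart, which is the distinction the "fifth column" was meant to supply:

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

CARD = "http://www.w3.org/People/Berners-Lee/card"
LOGD = "http://logd.tw.rpi.edu/sparql"
DBPEDIA = "http://dbpedia.org/sparql"

# Each endpoint has its own dataset: sd:name -> triples (contents are made up).
endpoints: Dict[str, Dict[str, Set[Triple]]] = {
    LOGD: {CARD: {("card:i", "rdfs:label", '"Tim Berners-Lee"')}},
    DBPEDIA: {CARD: {("owl:sameAs", "owl:sameAs", "owl:sameAs")}},
}


def merge_by_name(eps: Dict[str, Dict[str, Set[Triple]]]) -> Dict[str, Set[Triple]]:
    """Four columns: (s, p, o, name). Same name means same bucket."""
    merged: Dict[str, Set[Triple]] = {}
    for dataset in eps.values():
        for name, graph in dataset.items():
            merged.setdefault(name, set()).update(graph)
    return merged


def merge_by_endpoint_and_name(
    eps: Dict[str, Dict[str, Set[Triple]]]
) -> Dict[Tuple[str, str], Set[Triple]]:
    """Five columns: (s, p, o, name, endpoint). Containers stay distinct."""
    return {
        (endpoint, name): set(graph)
        for endpoint, dataset in eps.items()
        for name, graph in dataset.items()
    }
```

The same separation can be had without an extra column if each Graph Container has its own awww:identifier, since the pair (endpoint, name) then collapses into a single URI.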

Unfortunately, the "fifth column" discussions arose to address the problems created by the (unfinished) SPARQL 1.1 treatment of named graphs. A predominant sentiment among RDF-WG was that the "SPARQL 1.0 ship has sailed" and the WGs cannot abandon current practical successes. I completely agree that we can't abandon these successes, but I'd like to propose a reinterpretation of current implementations' behavior so that we can reconcile them with the new (and VERY useful) distinctions among RDF Graph, Graph Container, and Graph Serialization that RDF-WG has formulated.

We can avoid an unnecessary fifth column by:

  1. Recognizing and resolving where SPARQL 1.1 conflates sd:naming (pat:naming, dcterms:identifying) with awww:identifying
  2. Recognizing and resolving where SPARQL 1.1 conflates RDF Graphs with Graph Containers
  3. Recommending canonical URI structures to awww:identify Graph Containers present in SPARQL endpoints [7] [8], TRiG files, etc.
  4. Providing minimal vocabulary and inference to awww:identify Graph Containers using their pat:names and provenance.
    1. (This could be used to handle URIs that do not follow best practices)
    2. specifically [9], an rdf2:inDataset from rdf2:GraphContainer to sd:Dataset [10] and an owl:hasKey ( sd:name [11] rdf2:inDataset )
  5. Using PROV-O to describe the derivations of Graph Containers, Graphs, and Graph Serializations.

Achieving the above would permit us to address the same problems with a more concise set of technical mechanisms.

In short, the problem can be solved with a clear distinction between Graph Containers and Graphs, a named graph serialization (e.g., TRiG), ways to awww:identify Graph Containers, some vocabulary, and RDF as we already know it.









Materials during meeting

http://www.w3.org/2011/rdf-wg/wiki/F2F2

On the first day, I presented Using named graphs to model Accounts and got some feedback, which I noted at the bottom of the document.

http://www.w3.org/2011/rdf-wg/wiki/Graph_Terminology describes the terminology that RDF-WG is working from.

RDF-WG discussed Sandro's http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs/Options

After the meeting, I tried to consolidate these ideas into another example, RDF-WG GraphContainers.

Naming Graph Containers Via SPARQL 1.1 + RDF 1.1

This section describes three events that populate three different Graph Containers. Two of the Graph Containers are "local" forms of a third "global" Graph Container. We start with the two locals, then finish with the third.

1) Put a triple into LOGD's Graph Container "named" <http://www.w3.org/People/Berners-Lee/card> with a SPARQL Update request:

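# (Telling http://logd.tw.rpi.edu/sparql)
PREFIX card: <http://www.w3.org/People/Berners-Lee/card#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
INSERT DATA { 
  GRAPH <http://www.w3.org/People/Berners-Lee/card> { 
     card:i rdfs:label "Tim Berners-Lee" 
  }
}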

Submitting the above creates a "local" URI derived from <http://www.w3.org/People/Berners-Lee/card>, scoped by the SPARQL endpoint and borrowing the SPARQL 1.1 proposal [12]:

# The above INSERT implies the following descriptions.
<http://logd.tw.rpi.edu/sparql/rdf-graphs/service?graph=http%3A//www.w3.org/People/Berners-Lee/card>
   a rdf2:GraphContainer;
   sd:name            <http://www.w3.org/People/Berners-Lee/card>; # since sd:name was created for this purpose.
   dcterms:identifier <http://www.w3.org/People/Berners-Lee/card>; # "An unambiguous reference to the resource within a given context."
   skos:broader       <http://www.w3.org/People/Berners-Lee/card>; # this "groups" this local GraphContainer "under" the "global".
   # sd:name would be rdfs:subPropertyOf dcterms:identifier and skos:broader
.
<http://www.w3.org/People/Berners-Lee/card>
   a rdf2:GraphContainer;
.
[ a owl:NegativePropertyAssertion; # The local Graph Container is NOT the same as the global.
  owl:sourceIndividual  <http://logd.tw.rpi.edu/sparql/rdf-graphs/service?graph=http%3A//www.w3.org/People/Berners-Lee/card>;
  owl:assertionProperty owl:sameAs;
  owl:targetIndividual  <http://www.w3.org/People/Berners-Lee/card>
].
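A minimal sketch of how such a "local" URI could be derived from the endpoint and the sd:name. The function name local_container_uri is hypothetical, and the /rdf-graphs/service?graph= pattern is simply copied from the example above. Leaving "/" unescaped is a choice made here to reproduce the URI exactly as written on this page; a stricter encoding would escape the slashes too.

```python
from urllib.parse import quote


def local_container_uri(endpoint: str, name: str) -> str:
    """Scope a graph name by the SPARQL endpoint that holds the Graph Container."""
    # quote() escapes ":" as %3A and, by default, leaves "/" as-is.
    return f"{endpoint}/rdf-graphs/service?graph={quote(name)}"


CARD = "http://www.w3.org/People/Berners-Lee/card"
logd_local = local_container_uri("http://logd.tw.rpi.edu/sparql", CARD)
```

Applying the same function to a different endpoint yields a different URI for the same sd:name, which is the point of the scoping.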

(A hacky approach to name the local GraphContainers, which works with SPARQL 1.0, is here)

2) Put a triple into DBpedia's Graph Container "named" <http://www.w3.org/People/Berners-Lee/card> with a SPARQL Update request:

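# (Telling http://dbpedia.org/sparql)
PREFIX card: <http://www.w3.org/People/Berners-Lee/card#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
INSERT DATA { 
  GRAPH <http://www.w3.org/People/Berners-Lee/card> { 
     card:i rdfs:label "Tim Berners-Lee" 
  }
}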

This creates a "local" URI derived from <http://www.w3.org/People/Berners-Lee/card>, scoped by the SPARQL endpoint and borrowing the SPARQL 1.1 proposal. The local URI references the global/web accessible/authoritative URI of the Graph Container.

# implies
<http://dbpedia.org/sparql/rdf-graphs/service?graph=http%3A//www.w3.org/People/Berners-Lee/card>
   a rdf2:GraphContainer;
   sd:name            <http://www.w3.org/People/Berners-Lee/card>;
   dcterms:identifier <http://www.w3.org/People/Berners-Lee/card>;
   skos:broader       <http://www.w3.org/People/Berners-Lee/card>;
.
<http://www.w3.org/People/Berners-Lee/card>
   a rdf2:GraphContainer;
.
[ a owl:NegativePropertyAssertion;
  owl:sourceIndividual  <http://dbpedia.org/sparql/rdf-graphs/service?graph=http%3A//www.w3.org/People/Berners-Lee/card>;
  owl:assertionProperty owl:sameAs;
  owl:targetIndividual  <http://www.w3.org/People/Berners-Lee/card>
].

0) Putting something into the global/authoritative/web accessible GraphContainer

Someone copies card.rdf to /var/www/People/Berners-Lee/ (which is available at http://www.w3.org/People/Berners-Lee/card)

(or any other procedure that results in returning RDF when requesting http://www.w3.org/People/Berners-Lee/card)

Just give me a URI!

We need a way to reference others' rdf2:GraphContainers, so that anybody can describe them. To do this without "URI conventions", we need two predicates, one of which already exists.

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .

rdf2:Dataset
   rdfs:comment "A Dataset is {0,1} DefaultGraph \union {0,N} <sd:name,GraphContainer> pairs.";
   rdfs:subClassOf void:Dataset;
   rdfs:subClassOf [
      a owl:Restriction;
      owl:onProperty void:subset;
      owl:maxQualifiedCardinality 1;
      owl:onClass rdf2:DefaultGraph;
   ];
   rdfs:subClassOf [
      a owl:Restriction;
      owl:onProperty void:subset;
      owl:minQualifiedCardinality 0;
      owl:onClass sd:NamedGraph;
   ];
.

rdf2:DefaultGraph
   owl:equivalentClass [
      a owl:Restriction;
      owl:onProperty sd:name;
      owl:maxCardinality 0; # Unqualified, since no owl:onClass is given. Or force a hasValue :null?
   ];
.

rdf2:inDataset
   a owl:ObjectProperty;
   rdfs:domain rdf2:GraphContainer;
   rdfs:range  rdf2:Dataset;
.

rdf2:GraphContainer
   a owl:Class;
   owl:equivalentClass [
      a owl:Restriction;
      owl:onProperty sd:name; # This is what is mentioned during SPARQL 1.1 INSERTs
      owl:minCardinality 1; # Want to include DefaultGraph here, so would need hasValue :null
   ];
   rdfs:subClassOf [
      a owl:Restriction;
      owl:onProperty rdf2:inDataset;
      owl:minCardinality 1;
   ];

  owl:hasKey ( sd:name rdf2:inDataset );
.


sd:name is reused as it was defined, and is the value referenced in SPARQL INSERTs. It is also a dcterms:identifier (which is defined as "An unambiguous reference to the resource within a given context.") [13]. The rdf2:inDataset references the "given context" within which the sd:name (dcterms:identifier) should be (and is) considered.
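A minimal sketch of the inference that the owl:hasKey ( sd:name rdf2:inDataset ) axiom is meant to license: two descriptions with the same (sd:name, rdf2:inDataset) pair refer to the same Graph Container. The subject URIs and the function same_container_groups are hypothetical, and the sketch ignores OWL 2 details such as keys applying only to named individuals:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

CARD = "http://www.w3.org/People/Berners-Lee/card"
LOGD = "http://logd.tw.rpi.edu/sparql"
DBPEDIA = "http://dbpedia.org/sparql"

# (subject, sd:name, rdf2:inDataset) descriptions gathered from anywhere.
descriptions: List[Tuple[str, str, str]] = [
    ("http://example.org/a", CARD, DBPEDIA),
    ("http://example.org/b", CARD, DBPEDIA),
    ("http://example.org/c", CARD, LOGD),
]


def same_container_groups(rows: List[Tuple[str, str, str]]) -> List[List[str]]:
    """Group subjects that share the key (sd:name, rdf2:inDataset)."""
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for subject, name, dataset in rows:
        groups[(name, dataset)].add(subject)
    # Only groups with more than one member yield an owl:sameAs conclusion.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Here the first two subjects would be inferred to be the same Graph Container, while the third, which shares only the sd:name, stays distinct.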

So, when I run into the following random RDF,

:random_uri
   a rdf2:GraphContainer;
   sd:name        <http://www.w3.org/People/Berners-Lee/card>;
   rdf2:inDataset <http://dbpedia.org/sparql>;
.

anybody can recognize :random_uri as an rdf2:GraphContainer and do something useful with it -- reference a Graph Container in someone's SPARQL endpoint.

Requesting :random_uri returns the same results as <http://dbpedia.org/sparql?query=CONSTRUCT { ?s ?p ?o } WHERE { GRAPH <http://www.w3.org/People/Berners-Lee/card> { ?s ?p ?o } }> (while DESCRIBE <http://www.w3.org/People/Berners-Lee/card> could give different results).
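The query URL above is written unencoded for readability. A minimal sketch of building the percent-encoded form from the endpoint and the sd:name, using an explicit CONSTRUCT { ?s ?p ?o } template; the function name construct_graph_url is hypothetical, and the sketch assumes the endpoint accepts the query in a "query" parameter:

```python
from urllib.parse import urlencode


def construct_graph_url(endpoint: str, name: str) -> str:
    """URL that asks an endpoint for every triple in the graph called `name`."""
    query = f"CONSTRUCT {{ ?s ?p ?o }} WHERE {{ GRAPH <{name}> {{ ?s ?p ?o }} }}"
    # urlencode() percent-encodes the braces, angle brackets, and question marks.
    return f"{endpoint}?{urlencode({'query': query})}"


url = construct_graph_url(
    "http://dbpedia.org/sparql",
    "http://www.w3.org/People/Berners-Lee/card",
)
```

Both inputs are exactly the rdf2:inDataset and sd:name values from the description of :random_uri, so the description alone is enough to retrieve the container's current graph.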

If I dereference this URI, I get the rdf2:Graph (via an rdf2:GraphSerialization) that is in dbpedia's rdf2:GraphContainer. If I compare the triples I get with those from <http://www.w3.org/People/Berners-Lee/card>, I can assert something in my own vocabulary (something that RDF-WG doesn't have to bother with):

:random_uri my:has_same_triples_as <http://www.w3.org/People/Berners-Lee/card> .

By definition,

<http://www.w3.org/People/Berners-Lee/card> a rdf2:GraphContainer .


RDF-WG doesn't need to bother with my:has_same_triples_as or any other arbitrary nuanced relationships among Graph Containers, because RDF-WG gave everybody a way to reference others' Graph Containers, and left the rest to RDF.


David: how does it fit into void?

put into sd:

sd:inDataset rdfs:range sd:Service;
             owl:inverseOf sd:availableGraphDescriptions .

rdf2/sparql1.1 "Dataset" is an optional DefaultGraph and N URI-Graph pairings.

David: how would this work with Federation?