CommentResponse:ID-1

From SPARQL Working Group
Revision as of 21:39, 3 January 2011 by Cogbuji (Talk | contribs)


see: ID-1

Hello, Ian. Thanks for your comments, see the response(s) below (in context):

Section 2

> Graph Store is defined to be mutable. I don't see why it needs that 
> requirement. The read only aspects of this document could apply to a 
> non-mutable Graph Store

This definition is the same as that in the SPARQL 1.1 Update document. In both cases, mutability is necessary in order for the graphs to be subject to all operations (beyond those that are idempotent).

Section 4.1

> I don't at all understand the need for the distinction in this document 
> between a graph and RDF knowledge. I find the supplied explanation 
> particularly confusing:
> "we are not directly identifying an RDF graph but rather the RDF knowledge 
> that is represented by an RDF document, which serializes that graph"

Note that the phrase "that is represented by an RDF document, which serializes that graph" is from the SPARQL 1.0 specification: 8.2.2 Specifying Named Graphs (last paragraph). The following entry has been added to the terminology section (of the editor's draft, for incorporation into the next publication) for clarification:

Serialize (verb.) - When used in a sentence where the subject is an RDF document and the object is an RDF graph, this is understood to mean that the result of parsing the document is the graph.

> I have seen serialization and representation used interchangeably in many REST  
> discussions but never seen them used as distinct operations so I don't know 
> what to make of it really. 

The word serialization is meant to be used in the sense having to do with parsing. Hopefully, this terminology clarification addresses your concern.

> If my understanding of the terminology is correct then I think the 
> relationships are that RDF Knowledge is the result of interpreting an RDF 
> graph which may be represented by an RDF document. In this case the identified 
> resource that is emitting representations is the graph itself.

This is not the case, and the section mentioned above in the original SPARQL 1.0 specification makes this clear (and is the primary reason why this distinction is emphasized): "[...] the relationship between an IRI and a graph in an RDF dataset is indirect. The IRI identifies a resource, and the resource is represented by a graph (or, more precisely: by a document that [...]"

The graph IRI identifies a resource that "emits" representations (serializations of a graph as RDF documents). The relationship between a named graph and its IRI is part of the definition of a dataset; however, the relationship between what the graph IRI identifies and the graph is only briefly described above. This specification uses the term RDF knowledge to build on this and to provide an intuitive understanding of the relationship between that resource and the graph, as a framework behind a RESTful abstraction of an RDF dataset. The (informal) intuition is that the graph IRIs identify the meaning of the graph, and RDF-MT provides a relationship between an RDF graph and its meaning (interpretation).

> The immediately following sentence "Intuitively, the interpretations that 
> satisfy [RDF-MT] the RDF graph serialized by the RDF document can be thought 
> of as this RDF knowledge" implies that the Graph IRI identifies multiple 
> things, i.e. multiple interpretations. It's axiomatic on the web that a URI 
> (IRI) identifies only one resource so I see this as a conflict.

In the editor's draft this has been changed to: "Intuitively, the set of interpretations that [...]". Again, this is meant to be an informal characterization. The idea is that all interpretations that satisfy the graph comprise the meaning that the graph IRI identifies, since they all have in common the fact that they adhere to the (mathematical logic) constraints in the vocabulary and the structure of the graph.

> I assume the introduction of the term "RDF Knowledge" is motivated by an 
> attempt to unify the concept of distinct document-like resources that you 
> encounter on the web and an aggregation of the data in those documents as you 
> might find in a database.

The term is motivated by using RDF-MT to (informally / intuitively) characterize the relationship between the resource identified by the IRI of a named graph and the graph itself.

Section 4.2

> The diagram implies that the encoded URI (e.g. 
> http://www.example.org/other/graph) and the indirect URI 
> http://example.com/rdf-graphs/employees?graph=http%3A//www.example.org/other/graph
> identify the same RDF Knowledge. Does this imply this triple:
> <http://example.com/rdf-graphs/employees?graph=http%3A//www.example.org/other/graph>
> <http://www.w3.org/2002/07/owl#sameAs>
> <http://www.example.org/other/graph> .

Yes. This allows (simultaneously):

  • Clarity regarding the REST principle of identification ("REST uses a resource identifier to identify the particular resource involved in an interaction between components.")
  • Clarity regarding the notion of the scopes of the various parts of a URI (the fragment, the query component, the path, etc.) as defined in RFC 3986, which states that data in the query component further distinguishes which resource (within the scope of the naming authority and path) is being identified
  • The ability to use HTTP to manipulate graphs in a graph store that are not accessible for various reasons (the most probable being that their IRIs are not resolvable)


> I think the whole notion of indirect identification is problematic. What the 
> document is saying, in essence, is that if you have the URI of a graph you 
> need to discover some other URI by an unspecified mechanism with which to 
> manipulate it. 

Recent changes in the editor's draft (see the end of section 4.2) clarify that (in the case of indirect identification), the part of the URI prior to the query component is the URL of the service itself and so it is reasonable to assume that the client knows this URL a priori.
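
As a rough sketch of indirect identification from the client side, the following Python builds a request URI from a service URL (assumed to be known in advance) and a graph IRI. The function name `indirect_graph_uri` is hypothetical and not part of any specification; the URLs are the ones used as examples in section 4.2.

```python
from urllib.parse import quote

def indirect_graph_uri(service_url: str, graph_iri: str) -> str:
    """Embed graph_iri, percent-encoded, as the value of the 'graph' key
    in the query component of service_url (hypothetical helper)."""
    # quote() leaves "/" unescaped by default and escapes ":", "?", "#", "&"
    # and "=", so the embedded IRI cannot be mistaken for query syntax.
    return f"{service_url}?graph={quote(graph_iri)}"

request_uri = indirect_graph_uri(
    "http://example.com/rdf-graphs/employees",  # service URL, known a priori
    "http://www.example.org/other/graph",       # IRI of the named graph
)
print(request_uri)
```

Under these assumptions the client needs only two pieces of information: the service URL and the IRI of the graph it wants to manipulate.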

> If you discover multiple such URIs would you be justified in 
> assuming that they all manipulate the same underlying graph?

The graphs manipulated via their IRIs are scoped to a graph store, which is in turn scoped to the service. There will therefore be only one such (service) URL to discover in order to (indirectly) manipulate the named graphs within that store through the ?graph= query component.


> I am not convinced that it is intuitive that the following identify different 
> graphs that have the same URI

> http://foo.com/graphs?graph=http%3A//www.example.org/other/graph
> http://bar.org/rdf-data?graph=http%3A//www.example.org/other/graph

The service URLs are different and so the HTTP requests that use these as their request URIs would be manipulating the meaning of graphs that exist in separate stores. However, the embedded URIs are the same in both cases, so (since URI identification is a functional relationship) they would be accessing the same RDF knowledge on different stores. Whether or not those stores and the services they are a part of do indeed treat them as the same (by mirroring, for example) is outside the scope of this protocol. Can you further elaborate how this is not intuitive?
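
The scoping described here can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: `GraphStore`, `embedded_graph_iri`, and the `ex:` triples are hypothetical, each store is an in-memory dictionary, and the return values do not model the protocol's HTTP status codes.

```python
from urllib.parse import urlsplit, parse_qs

def embedded_graph_iri(request_uri: str) -> str:
    """Percent-decode the value of the 'graph' key in the query component."""
    values = parse_qs(urlsplit(request_uri).query).get("graph")
    if not values or len(values) != 1:
        raise ValueError("expected exactly one 'graph' query parameter")
    return values[0]

class GraphStore:
    """Hypothetical in-memory store: maps graph IRIs to sets of triples."""

    def __init__(self):
        self.graphs = {}

    def handle(self, method, request_uri, triples=None):
        # The operation applies only to this store's graph with that IRI.
        graph_iri = embedded_graph_iri(request_uri)
        if method == "GET":
            return self.graphs.get(graph_iri)
        if method == "PUT":
            self.graphs[graph_iri] = set(triples or ())
            return None
        if method == "DELETE":
            self.graphs.pop(graph_iri, None)
            return None
        raise ValueError(f"unsupported method: {method}")

# One store behind each service URL.
foo_store, bar_store = GraphStore(), GraphStore()
foo_uri = "http://foo.com/graphs?graph=http%3A//www.example.org/other/graph"
bar_uri = "http://bar.org/rdf-data?graph=http%3A//www.example.org/other/graph"

foo_store.handle("PUT", foo_uri, {("ex:alice", "ex:worksFor", "ex:acme")})

same_iri = embedded_graph_iri(foo_uri) == embedded_graph_iri(bar_uri)
foo_graph = foo_store.handle("GET", foo_uri)
bar_graph = bar_store.handle("GET", bar_uri)
```

Both request URIs embed the same graph IRI, yet the PUT sent to the first service leaves the second store untouched unless the two services choose to mirror each other.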

> Furthermore, should a conformant server that supports multiple independent 
> collections of graphs (e.g. Talis Platform) be required to enforce that graph 
> URIs identify the same knowledge across all the collections? 

This is beyond the scope of this protocol, which only specifies operations on a *single* graph store. However, I would think that such a server would need to do so, or it would be at odds with the AWWW and REST.

> In other words 
> are the following required to manipulate the same "RDF Knowledge":
> ..snip..

This is not required by this protocol, but it is dictated by the URI specification.

> The following sentence implies that this is the case: "Any server that 
> implements this protocol and receives a request URI in this form SHOULD invoke 
> the indicated operation on the RDF knowledge identified by the URI embedded in 
> the query component where the URI is the result of percent-decoding the value 
> associated with the graph key." 

Given what I've said about scoping, would the following modification address your concerns?

"[...] SHOULD invoke the indicated operation on the RDF knowledge (in the underlying graph store) identified by the URI embedded [...]"


> At this point I stopped my review. That the two areas I explored are 
> complicated excessively by the introduction of the RDF Knowledge concept into 
> what I feel should be a very simple and straightforward document. 

See the earlier point about the role of this term with respect to clarifying what is identified by a graph IRI (which is necessary as part of a REST protocol model for an RDF dataset).

> I believe 
> the removal of that concept and the introduction of a non-normative section 
> describing the expected behaviour of Graph Stores would be the best route 
> forward.

In light of the above response, can you clarify how the expected behavior of Graph Stores is not already covered by this specification?

> It is also unclear what this document has to say about a central concept of 
> SPARQL: the dataset. I see in the change summary that the term Graph Store was 
> introduced to replace Dataset but I don't know the background to that 
> decision.

This was done to unify with the notion of a Graph Store that is common to this specification and the SPARQL Update language specification, both of which specify operations that change (or replace) a graph.


> 1. Introduce a Graph Store as a service that manages a collection of datasets 
> and a collection of graphs. Many Graph Stores will have a single dataset, 
> multi-tenant ones will have many.

Currently, this specification and the SPARQL Update specification distinguish between a 'service' and a Graph Store, and there is roughly a 1-1 correspondence between a Graph Store and a dataset. Can you elaborate on use cases where a single service would need to manage multiple collections of graphs (i.e., multiple datasets or graph stores)?

> 2. Describe operations on a Graph Store: GET to obtain a document describing 
> the graph store including a link to the collection of datasets, a link to the 
> collection of graphs, links to provided services and links to other 
> configuration information 

In the most recent batch of changes, HTTP OPTIONS / GET on the service URL returns a service description which includes many of the items you have listed.

> 3. Describe operations on the Collection of Datasets: GET to obtain the list 
> of datasets, POST to append a new one

The HTTP POST operation already specifies how to append a new graph.

> 4. Describe operations on a Dataset: GET to obtain a list of graphs included 
> in the dataset, POST to include an existing  graph in the dataset, PUT to 
> replace the definition, DELETE to remove a dataset

It is not clear to me how this differs from what is currently specified, except that the target in your case is the dataset rather than the resource identified by the graph IRI (the RDF knowledge).

The same questions are relevant to the remaining items in your list. Please indicate whether this response adequately addresses your concerns.