Re: HTTP Protocol review from Chimezie Ogbuji on 2010-05-17 (public-rdf-dawg@w3.org from April to June 2010)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Mon, 17 May 2010 19:22:22 -0400
To: "Gregory Williams" <greg@evilfunhouse.com>, "SPARQL Working Group" <public-rdf-dawg@w3.org>
Message-ID: <C81747EE.119B6%ogbujic@ccf.org>
Thanks for the detailed review, Greg. See my responses below.

On 5/15/10 3:31 AM, "Gregory Williams" <greg@evilfunhouse.com> wrote:
> Chime,
> 
> Below is my review (ACTION-235) of the HTTP Protocol document. Most of the
> issues detailed just need some clarification in the text, or are formatting
> issues. There are a few bigger issues (the use and meaning of "dataset",
> discussion of what constitutes a "compliant implementation", the effect of
> conditional requests on non-GET operations). These might need discussion, but
> I don't know that they have to be nailed down before this draft publication.

> --------
> Abstract
> 
> You link to SPARQL Query, but not to SPARQL Update (even though both are
> mentioned directly).

I've added the standard references to the other SPARQL 1.1 specifications in
the introduction as well as a link to the update specification from the
abstract. 

> --------
> 1. Introduction
> 
> I'm not sure I understand the second sentence. It reads:
> 
> """It emphasizes a clear separation between the RDF graph management actions
> performed from the networked body of RDF knowledge identified by a URI as the
> target of the actions, the lexical form of a Request URI, the URI of a graph
> in an RDF dataset, and the (optional) RDF delivered with the message."""
> 
> Is there any way to make that more clear? In particular, I can't seem to parse
> "the RDF graph management actions performed from the networked body of RDF
> knowledge identified by a URI as the target of the actions".

Changed to "It emphasizes a clear separation between an RDF graph management
action from the networked body of RDF knowledge identified as the target of
the action, the lexical form of a Request URI, the URI of a graph in an RDF
dataset, and the (optional) RDF delivered with the message."

> --------
> 2. Terminology

Note, I also incorporated the terminology changes specified in response to
earlier comments (see:
http://lists.w3.org/Archives/Public/public-rdf-dawg/2010AprJun/0067.html).


> sp. "enduce".

fixed
 
> "Architectural style" is only ever used in the immediately following
> definition for REST. Is it really necessary, or might it be rolled into the
> definition for REST?

removed
 
> "IRI" should be defined before it is used in the definition for "Resource".
> Also, this definition says "Before an IRI found in a document is used by HTTP,
> the IRI is first converted to a URI. See Section 4.1", but section 4.1 doesn't
> seem to actually talk about this conversion process.

"See Section 4.1." was removed
 
> "Network-manipulable RDF (Dataset)" - Is "dataset" here used in the same way
> as it is in SPARQL (with a default graph)? I can't find any discussion of a
> default graph in the rest of the document, and fear two definitions of
> "dataset" in SPARQL documents might be confusing.

The use of dataset here is specific to the term introduced.  It indicates
that a Network-manipulable RDF dataset is a special kind of graph store.

I have removed the parenthesis around "Dataset."

> Also, worth adding an
> explicit reference to section 4.2 when you say "URIs that can be embedded in
> the query component of URI in a manner described later in this document".

Done.
 
> "Networked RDF knowledge" - I find the use of "knowledge" to be confusing for
> an information resource (perhaps just a personal preference, but I'd think
> knowledge would be more likely to be a non-info resource, all else being
> equal). 

I'm not sure I agree.  The term knowledge as it is used here is from
"knowledge representation" (KR).  KRs typically are such that "all of their
essential characteristics can be conveyed in a message": they have
well-defined syntaxes.  Also, knowledge was used to distinguish Networked
RDF knowledge from the more general term 'information resources' due to
having a formal way to interpret them and typically interpretation is what
distinguishes data and information from knowledge.

> Use of "IRI" and "URI" in this definition and the previous one don't
> seem consistent (almost being used interchangeably).

I have cleared up the use of both terms in this definition and the previous
one.

> 
> At the end of this section, you define "the resolvable URI of a graph" as a
> shorthand, but that phrase isn't actually used in the rest of the document as
> far as I can see. You do use "resolvable URI" once, but in a parenthetical, so
> might just be worth expanding the shorthand for the one use.

Done.

> --------
> 3. Protocol Model
> 
> This section says, "A compliant implementation of this specification MUST
> accept HTTP requests directed at its dataset and handle them as specified by
> this protocol," but this disagrees with the discussion in section 7 (Security
> Considerations) and doesn't seem to leave much wiggle room for things like
> refusing requests that seem like DOS attacks, etc.

Changed to "A compliant implementation of this specification SHOULD accept
HTTP requests directed at its dataset and handle them as specified by this
protocol with the exception of security considerations such as those
discussed in section 7 and others (DOS attacks, etc.)."

> --------
> 4.1 Direct Graph Identification
> 
> "However, in using URIs in this way, we are not directly identifying the RDF
> graphs but rather the networked RDF knowledge they represent." Isn't this
> backwards? Doesn't the networked RDF knowledge represent the graph?

Changed to: "However, in using a URI in this way, we are not directly
identifying an RDF graph but rather the networked RDF knowledge that is
represented by an RDF document, which serializes that graph. Intuitively,
the interpretation [RDF-MT] of the RDF graph serialized by the RDF document
can be thought of as the Networked RDF knowledge itself."

This is meant to clarify the relationship between the RDF document / payload
(which is passed back and forth over HTTP), the RDF graph (in graph store),
and the Networked RDF knowledge.  I have updated the diagram accordingly as
well.  

> The diagram in this section should be labeled. The description of the diagram
> talks about URIs, but the diagram itself uses "IRI".

Fixed

> As previously, I'm not
> sure if the "representedBy" arc in the diagram should be "represents".

Changed to 'serialize' (the diagram as it was before was incorrect,
actually).  See above

> Is 
> "NetworkManipulableDataset" meant to signify a single graph, or a dataset?

It is the Network-manipulable *Dataset* - i.e., the graph store managed via
the protocol. The request URI designates a single (IRI, graph) pair from the
graph store.
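To make the "single (IRI, graph) pair" point concrete, here is a minimal Python sketch that models the graph store as a dictionary from graph IRI to a set of triples. The `GraphStore` type, the `lookup_direct` function and the example IRIs are hypothetical illustrations, not part of the draft; with direct identification the request URI is simply used as the key.

```python
from typing import Dict, Optional, Set, Tuple

# A triple is modelled as a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]
# Hypothetical in-memory graph store: each graph IRI names exactly one graph.
GraphStore = Dict[str, Set[Triple]]


def lookup_direct(store: GraphStore, request_uri: str) -> Optional[Tuple[str, Set[Triple]]]:
    """Direct identification: the request URI is used as the graph IRI, so
    the lookup yields one (graph IRI, graph) pair, or None if absent."""
    graph = store.get(request_uri)
    if graph is None:
        return None
    return (request_uri, graph)


store: GraphStore = {
    "http://example.com/rdf-graphs/employees": {
        ("http://example.com/e/1", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
    },
}
pair = lookup_direct(store, "http://example.com/rdf-graphs/employees")
```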

> Finally, what do the black arcs represent (serialized RDF)?

They represent an HTTP request / response (the diagram has been updated).

> --------
> 4.2 Indirect Graph Identification
> 
> Are there any restrictions on what URIs can be used with the ?graph=... query
> component? 

There is no restriction, in the same way that there is no restriction on
which request URI can be specified to interact with any information resource
(a request URI that doesn't identify a resource can be specified, for
example, but it is up to the implementation to respond accordingly).

> It might be helpful to see a full example URI using indirect graph
> identification.

I've added an example
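A minimal sketch of indirect identification, assuming a hypothetical service URI `http://example.com/rdf-graph-store`: the graph IRI is percent-encoded into the `graph` query parameter, and a server can recover the target IRI from it, falling back to the request URI itself when no `graph` parameter is present. Only the standard `urllib.parse` module is used; the function names are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def indirect_request_uri(service_uri: str, graph_iri: str) -> str:
    """Embed a graph IRI in the 'graph' query parameter of a service URI."""
    return service_uri + "?" + urlencode({"graph": graph_iri})


def target_graph_iri(request_uri: str) -> str:
    """Return the graph IRI a request targets: the embedded ?graph=... value
    when present (indirect), otherwise the request URI itself (direct)."""
    params = parse_qs(urlsplit(request_uri).query)
    values = params.get("graph")
    if values:
        return values[0]
    return request_uri


uri = indirect_request_uri(
    "http://example.com/rdf-graph-store", "http://www.example.org/other/graph"
)
```

Percent-encoding the embedded IRI keeps its own `/`, `:` and any `?` or `&` from being confused with the structure of the request URI.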
 
> This diagram should be labelled, too, and at least some mention made of it in
> text. There should also be some sort of visual connection in the diagram
> between the "Networked RDF knowledge" node and the 'http://..?graph=...' node.

I chose not to connect them to emphasize that the reference is indirect (via
embedding the URI).

> --------
> 5. Graph Management Operations
> 
> I'm not sure "Networked RDF Knowledge" can be used in the first sentence if
> you want to include indirectly identified graphs as possible targets for these
> operations. The definition of "Networked RDF Knowledge" seems to only include
> the directly identifiable graphs. This terminology issue occurs throughout
> section 5.

The definition of a graph IRI includes: ".. specified as a request URI *or
embedded* as the query component of a request URI". Networked RDF Knowledge
is defined as "An information resource identified by a *graph IRI*".

> --------
> 5.1 PUT
> 
> The example SPARQL UPDATE operations should be separated by semicolons. The
> syntax should also be aligned with the most recent draft of the Update doc (I
> believe this means using 'INSERT DATA { GRAPH <graph_uri> { ... } }' instead
> of 'INSERT DATA INTO <graph_uri> { ... }'.

changed

> "Note that the DROP and CREATE expressions are only necessary if the networked
> RDF knowledge does not already exist in the server." I'm not sure I understand
> this. If the networked RDF knowledge doesn't exist, why would the DROP be
> necessary?

Changed to "Note that the DROP expression is not necessary if the networked
RDF knowledge does not already exist in the server." I'm not certain whether
this is in line with the consensus regarding non-existent graphs (this was an
issue before but has been resolved).
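One reading of the PUT equivalence under discussion, sketched in Python: build the SPARQL Update request with operations separated by semicolons and the `INSERT DATA { GRAPH <g> { ... } }` form, emitting DROP only when the graph already exists. This assumes the DROP / CREATE / INSERT DATA sequence discussed in this thread and that RDF terms arrive already serialized; `put_as_update` is a hypothetical helper, and the exact sequence may change with the Update draft.

```python
from typing import Iterable, List, Tuple

# Terms are assumed to be pre-serialized, e.g. '<http://...>' or '"literal"'.
Triple = Tuple[str, str, str]


def put_as_update(graph_iri: str, triples: Iterable[Triple], graph_exists: bool) -> str:
    """Hypothetical SPARQL Update equivalent of PUT on graph_iri."""
    body = " ".join(f"{s} {p} {o} ." for s, p, o in triples)
    operations: List[str] = []
    if graph_exists:
        # Only needed when the graph already exists: discard its old content.
        operations.append(f"DROP GRAPH <{graph_iri}>")
    operations.append(f"CREATE GRAPH <{graph_iri}>")
    operations.append(f"INSERT DATA {{ GRAPH <{graph_iri}> {{ {body} }} }}")
    # Operations within one update request are separated by semicolons.
    return ";\n".join(operations)


update = put_as_update(
    "http://example.com/g",
    [("<http://example.com/s>", "<http://example.com/p>", '"o"')],
    graph_exists=False,
)
```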
 
> The use of the word "can" seems very weak in describing the semantics of PUT:
> "the origin server can create the knowledge with that URI in the associated
> network-manipulable dataset". Why not "SHOULD"?

Changed.
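A minimal in-memory sketch of the PUT semantics, assuming the status codes given for PUT in RFC 2616 section 9.6 (201 Created when a new resource is created; 200 or 204 when an existing one is modified). `handle_put` and the dictionary-based store are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
GraphStore = Dict[str, Set[Triple]]


def handle_put(store: GraphStore, graph_iri: str, payload: Set[Triple]) -> int:
    """Replace (or create) the graph named by graph_iri; return HTTP status."""
    created = graph_iri not in store
    # PUT replaces the whole graph with the enclosed RDF payload.
    store[graph_iri] = set(payload)
    # 201 for a newly created graph; for an existing one RFC 2616 allows
    # 200 or 204, and 204 is used here because no response body is sent.
    return 201 if created else 204
```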

> --------
> 5.2 DELETE
> 
> Why does the text here explicitly say that "the client cannot be guaranteed
> that the operation has been carried out, even if the status code returned from
> the origin server indicates that the action has been completed successfully"?

This was pulled directly from the HTTP RFC, which says the same thing. The
intuition (as I understand it) is to support delayed operation. A
successful response indicates the intent to delete the resource, not that
it has already been deleted. If this protocol deviates from
standard HTTP semantics in this regard, it might break the expectations of
upstream applications that expect the methods in this protocol to behave the
same as they would over vanilla HTTP.

The HTTP RFC does say ".. 202 (Accepted) if the action has not yet been
enacted", so I've added a suggestion for the reader to refer to the HTTP RFC
for further information about behavior in the beginning of section 5:

"Developers of implementations of this protocol should refer to [RFC2616]
for additional details of appropriate behavior beyond those specified here.
This section only serves to emphasize the behavior specific to the
manipulation of Networked RDF knowledge."
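The difference between an enacted and a deferred deletion can be sketched as follows, using the RFC 2616 codes for DELETE (202 Accepted if the action has not yet been enacted, 204 No Content if it has been enacted and no body is returned). The `pending` queue and `defer` flag are hypothetical stand-ins for whatever mechanism an implementation uses to delay the operation.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
GraphStore = Dict[str, Set[Triple]]


def handle_delete(store: GraphStore, graph_iri: str, pending: List[str], defer: bool) -> int:
    """Delete (or schedule deletion of) a graph; return HTTP status."""
    if graph_iri not in store:
        return 404
    if defer:
        # Deletion is only queued; the graph still exists for now, so
        # 202 Accepted is returned rather than a "completed" status.
        pending.append(graph_iri)
        return 202
    del store[graph_iri]
    # Enacted immediately with no response body: 204 No Content.
    return 204
```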

> What good is a "completed successfully" response if it doesn't actually mean
> anything? Would using HTTP 202 (Accepted) be a better code to use in that
> situation?
> 
> Why are brackets used in the SPARQL UPDATE operation?

Removed

> --------
> 5.3 POST
> 
> Again, why are the brackets used in the SPARQL UPDATE? This is another place
> where the SPARQL UPDATE syntax should be aligned with the current draft.

Brackets removed, syntax updated
 
> I can't make sense of this: "and distinguish such a request from the insertion
> use case on the basis of whether or not the request URI identifies networked
> RDF knowledge managed by the server".

That paragraph has been changed to "Alternatively, if the request URI
identifies a container resource (designated by the origin server) and not
networked RDF knowledge, the origin server SHOULD accept the RDF payload
enclosed as a request for the container resource identified by the request
or encoded URI to create a new RDF graph comprised of the statements in the
payload. The server SHOULD return the URI associated with the new graph via
the Location HTTP header in a 201 Created response. This scenario is useful
for situations where the requesting agent either does not want to specify
the graph IRI of a new graph to create (via the PUT method) or does not have
the appropriate authorization to do so."

So, if the graph IRI (embedded or otherwise) corresponds to an existing
graph, the triples are inserted into it; otherwise, this operation
corresponds to the addition of a graph by submitting it to a service that
assigns the graph IRI.

> Is the "insertion use case" just the use
> of POST on a previously non-existent graph? Should the Location HTTP header be
> returned with a 201 Created response even if the graph is indirectly
> identified (and possibly not dereferencable)?

If the embedded IRI does not correspond to an existing graph then it either
identifies a 'container resource' or it doesn't.  If it doesn't, the origin
server should return a 404, otherwise the operation should proceed as
defined.  I've added text clarifying this.
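The three POST cases described above (existing graph, container resource, neither) could be sketched as below. The set of container IRIs, the integer-suffix naming scheme for new graphs and the choice of 204 for insertion into an existing graph are assumptions made for illustration, not requirements of the draft.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]
GraphStore = Dict[str, Set[Triple]]


def handle_post(
    store: GraphStore,
    containers: Set[str],
    target_iri: str,
    payload: Set[Triple],
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header value or None)."""
    if target_iri in store:
        # Existing graph: add the payload's statements to it.
        store[target_iri] |= payload
        # RFC 2616 allows 200 or 204 here; 204 since no body is returned.
        return 204, None
    if target_iri in containers:
        # Container resource: the server mints the new graph's IRI by
        # appending the first unused integer (a hypothetical naming scheme).
        base = target_iri.rstrip("/")
        n = 1
        while f"{base}/{n}" in store:
            n += 1
        new_iri = f"{base}/{n}"
        store[new_iri] = set(payload)
        return 201, new_iri
    # Neither an existing graph nor a container resource.
    return 404, None
```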

> --------
> 6. Conditional Requests
> 
> The text in this section talks about "any of the operations", but only goes on
> to talk about GET requests. I'd like to see some discussion of how conditional
> requests work with PUT, DELETE, and POST requests, and especially in
> combination with indirect graph identification.

But the semantics of conditional requests apply to all operations, in the
same way that If-Modified-.., etc. headers can be used with any HTTP method,
and the interpretation of the request would follow from the HTTP RFC. I
don't think the management of RDF presents any caveats that should be called
out.
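As an illustration of conditional requests applying to non-GET methods, this sketch evaluates `If-Match` and `If-None-Match` for a state-changing request (PUT, DELETE or POST) against a graph's current entity tag, returning 412 (Precondition Failed) as RFC 2616 specifies for those methods. Weak entity tags and date-based headers are ignored, and `check_precondition` is a hypothetical helper.

```python
from typing import Optional, Set


def _tags(header: str) -> Set[str]:
    """Split a comma-separated entity-tag header into individual tags."""
    return {t.strip() for t in header.split(",")}


def check_precondition(
    current_etag: Optional[str],
    if_match: Optional[str],
    if_none_match: Optional[str],
) -> Optional[int]:
    """Return 412 if a precondition fails for a PUT/DELETE/POST on a graph,
    else None. current_etag is None when the graph does not exist."""
    if if_match is not None:
        tags = _tags(if_match)
        # Fails if the graph is absent, or if no listed tag matches.
        if current_etag is None or ("*" not in tags and current_etag not in tags):
            return 412
    if if_none_match is not None and current_etag is not None:
        tags = _tags(if_none_match)
        # For non-GET methods a match on If-None-Match also yields 412.
        if "*" in tags or current_etag in tags:
            return 412
    return None
```

A client could, for example, send `If-None-Match: *` with a PUT to create a graph only if it does not already exist.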

-- Chime


Received on Monday, 17 May 2010 23:31:14 UTC