Re: Review of SPARQL 1.1. Update editors' draft from Paul Gearon on 2010-05-18 (public-rdf-dawg@w3.org from April to June 2010)

From: Paul Gearon <gearon@ieee.org>
Date: Tue, 18 May 2010 17:13:22 -0400
To: Alexandre Passant <alexandre.passant@deri.org>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <AANLkTilXYrW_Yn6OfPyFe-e5J2pwWgBz5uwIiurTs4fz@mail.gmail.com>
Thanks for all the work, Alex.

On Mon, May 17, 2010 at 5:33 AM, Alexandre Passant
<alexandre.passant@deri.org> wrote:
> Hi Lee,
>
> Thanks a lot for your feedback.
> Here are some first answers regarding your comments / queries.
> I'll let Paul complete some points, and we'll get back to you re. other unanswered comments

Jumping ahead....

<snip/>

> On 16 May 2010, at 03:53, Lee Feigenbaum wrote:
>> * We haven't closed ISSUE-37, but I would propose that:
>>  PROPOSED: Close ISSUE-37 noting that SPARQL 1.1 will not address basic federated update in any way, no change needed.
>
> I agree with the proposal but I haven't seen such a proposal in the minutes of the previous TC.
> Could it be addressed in tomorrow's t-con ?

While I don't want to see updates applied in a federated way, it would
be awkward to prevent federation in the WHERE clause. After all, it
might be handy to remove data based on data found in another store.

Can we get opinions from people about this please?


>> * 3. I think that the definition of a Graph Store is confusing. In particular, I don't understand the difference/relationship between the unnamed graph and the default graph. (This confusion applies for me throughout the document.)
>
> Could you elaborate on the confusing aspects here ?

Quoting from section 3:
"Unless overridden (for instance, by the SPARQL protocol), then the
unnamed graph will be the default graph for the store."

I was working with what I had, but the approach I ended up taking is
that the "unnamed graph" is something that applies to the store, while
the "default graph" is something that applies to the operation in
question. Usually, if the operation has to rely on its "default", then
this will end up being the unnamed graph in the store.

I realize that my description was a bit weak, so I tried to improve it.

>> * 3. What is an update service?
>
> I guess that's a "SPARQL Update endpoint" ?
> That's referenced several times in the doc so I'll let Paul confirm before making any change.

Don't presume that I invented these terms.  :-)

I've added the following:
"A service (often referred to as an endpoint) that accepts and
processes update operations is referred to as an update service."


<snip/>

>> * 4.1 "Graph update operations change existing graphs in the Graph Store but do not delete them." Given the provision that empty graphs can be removed after any operation, this statement is not strictly true, right?
>
> indeed, I'd suggest "Graph update operations change existing graphs in the Graph Store and may delete graphs when DELETE/DELETE DATA operations lead to graph(s) with no triples."
> Haven't changed anything so far.

I added the word "explicitly" along with a second sentence:

"Graph update operations change existing graphs in the Graph Store but
do not explicitly delete nor create them. An implementation MUST
create graphs that do not exist before triples were inserted into
them, and MAY remove graphs that are left empty after triples are
removed from them."
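
To make the MUST and the MAY concrete, here is a rough Python sketch of
how I read that text. It is not proposed wording: the class, the method
names and the keep_empty_graphs flag are all made up for illustration,
and a graph is just a set of triples.

```python
class GraphStore:
    """Toy graph store: maps a graph name to a set of triples."""

    def __init__(self, keep_empty_graphs=True):
        self.graphs = {}
        # Hypothetical switch modelling the MAY in the quoted text.
        self.keep_empty_graphs = keep_empty_graphs

    def insert(self, graph, triples):
        # MUST: the graph is created implicitly if it does not exist yet.
        self.graphs.setdefault(graph, set()).update(triples)

    def delete(self, graph, triples):
        existing = self.graphs.get(graph)
        if existing is None:
            return
        existing.difference_update(triples)
        # MAY: drop a graph that the removal has left empty.
        if not existing and not self.keep_empty_graphs:
            del self.graphs[graph]


t = ("ex:s", "ex:p", "ex:o")

keeps = GraphStore(keep_empty_graphs=True)
keeps.insert("ex:g", {t})
keeps.delete("ex:g", {t})   # ex:g stays in the store, now empty

drops = GraphStore(keep_empty_graphs=False)
drops.insert("ex:g", {t})
drops.delete("ex:g", {t})   # ex:g disappears along with its last triple
```

Neither store was ever asked to create or remove ex:g explicitly, which
is the point of adding the word "explicitly".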


>> * 4.1.1/4.1.2 Is it purposeful that a single INSERT DATA or DELETE DATA operation can only act on a single named graph? I believe the operations would be equally streamable if graph_triples was graph_triples*.

While possible, I don't see any need for this.

Having one stream go to two places would be an awkward sort of operation:
  INSERT DATA {
    GRAPH <a> { triple1 } GRAPH <b> { triple1 }
    GRAPH <a> { triple2 } GRAPH <b> { triple2 }
    ... etc ...
  }
And doing it consecutively could just be a pair of operations.

>> * 4.1.3. I think the treatment of WITH and USING / USING NAMED is still unclear. In particular, what's the relation between the two? The text about WITH seems in conflict with the text about USING. Also I think that we need to be explicit about what USING and USING NAMED do, rather than just asserting that their behavior is identical to FROM and FROM NAMED from SPARQL Query.

I wrote more about this in DELETE/INSERT. The DELETE and INSERT
operations refer back to DELETE/INSERT for this (DELETE WHERE is now
on its own and does not allow USING).

Datasets in SPARQL Query (FROM/FROM NAMED) take up a large section all
of their own. I don't think it is appropriate to duplicate it.

The other thing is that I really don't understand graphs vs. "named
graphs" anyway. The only thing I ever got out of reading the Query
document is that a variable graph will get bound to the list of named
graphs, if one exists. Other than that, named graphs seem to operate
like graphs. So I've tried to always refer to "USING" and "USING
NAMED" together and referred the reader to the description of datasets
in the original Query document. There is no point in me trying to
explain it any differently to the original, since I don't really
follow the original description anyway (perhaps the reader can follow
it where I cannot).  :-)

>> * 4.1.3. "Recall however that for any clause that stipulates its own GRAPH or USING modifiers, then the WITH will be ignored for that clause." This doesn't seem strictly true... Can't a DELETE or INSERT clause contain both triples within a GRAPH clause _and_ unadorned triples (that would be affected by WITH)?

Yes. How about this?

Note that WITH will be ignored for any section that stipulates a GRAPH
or for the entire WHERE clause if a USING is present.
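
As a sanity check on that wording, here is a small Python sketch of the
rule as I understand it. The function names are invented, None stands
in for "whatever the default graph is", and USING NAMED is left out
entirely, so treat it as an assumption-laden illustration rather than
a definition.

```python
def template_graph(with_graph, graph_clause=None):
    """Graph that one section of a DELETE or INSERT template acts on."""
    # An explicit GRAPH on the section wins; WITH only covers the
    # unadorned triples in the template.
    if graph_clause is not None:
        return graph_clause
    return with_graph


def where_default_graphs(with_graph, using=()):
    """Graphs that the WHERE clause is matched against by default."""
    # Any USING replaces WITH for the entire WHERE clause.
    if using:
        return list(using)
    return [with_graph]


# WITH <g1>, template mixes unadorned triples and GRAPH <g2> { ... }
unadorned_target = template_graph("ex:g1")
graph_section_target = template_graph("ex:g1", graph_clause="ex:g2")

# WITH <g1> alone, and WITH <g1> plus USING <g3>
where_with_only = where_default_graphs("ex:g1")
where_with_using = where_default_graphs("ex:g1", using=["ex:g3"])
```

So a single template can have WITH apply to some of its triples and not
to others, which covers the case Lee raised.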

>> * 4.1.3. "It is legal for an operation to attempt to delete from a graph that does not exist" - Does this mean the result of this is success? Also, probably would be better to avoid the phrase "it is legal" in non-de-jure standards. :-)

Changed to:
"Deleting triples that are not present, or from a graph that is not
present will have no effect and will return success"

>> * 4.1.6. LOAD <...> INTO GRAPH DEFAULT sounds weird. Could we make that case just be LOAD <...> INTO DEFAULT? (This also relates to the general confusion between unnamed graph and default graph and the Query notion of a default graph.)

I thought Andy had mentioned something about this and I asked him
off-list for a reminder (he pointed out that I should have CC'ed the
list).

It looks like it should have been:
  LOAD <uri> INTO (DEFAULT | GRAPH <uri>)

However, Andy suggested that saying "INTO DEFAULT" seemed unnecessary.
I agree, and have removed it. If anyone wants to see "INTO DEFAULT"
put back in, then please let me know.

>> * 4.1.6. LOAD needs more explanation of what documentURI means. Do I need to dereference this URI? Is it an indirect relationship between the URI and the triples to be loaded (a la FROM/FROM NAMED)? That is, can I ever do LOAD <urn:TheGraph> INTO GRAPH <...> ?

I don't want to over specify this, so I've written the following:

"The documentURI specifies the URI of a document such that a store
will be able to identify, locate and read the document. Common forms
will be URLs with the http: or file: protocols. Once the document has
been read, the resulting triples will be inserted into the destination
graph.
If no destination graph URI is provided to load the triples into, then
the data will be loaded into the default graph."

Yes, I think it should be possible to "LOAD <urn:TheGraph>", though it
will be up to an individual store as to how it wants to interpret
that. Most stores will probably indicate an error, while others may
have a particular mechanism for locating documents for different types
of URI schemes (maybe URNs get mapped to a configured RDBMS?). I want
to keep the door open for any kind of URI an implementor wants to
accept. That means I don't want to enumerate what can be accepted, nor
do I want to specify anything that can't be accepted (such as a URN).
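
One way an implementation might do this, sketched in Python. Everything
here is hypothetical: the readers table keyed by URI scheme, the
LoadFailure exception, and the use of None as the key for the default
graph are my own inventions for the example, not anything in the
document.

```python
from urllib.parse import urlsplit


class LoadFailure(Exception):
    """The store could not identify, locate or read the document."""


def load(store, document_uri, readers, into=None):
    """LOAD document_uri into graph `into` (None means the default graph).

    `readers` maps a URI scheme to a function returning a set of triples;
    which schemes appear in it is entirely up to the store.
    """
    reader = readers.get(urlsplit(document_uri).scheme)
    if reader is None:
        raise LoadFailure(f"no way to read {document_uri}")
    store.setdefault(into, set()).update(reader(document_uri))


# A store that has chosen to resolve URNs through some private mechanism.
readers = {"urn": lambda uri: {("ex:s", "ex:p", "ex:o")}}
store = {}

load(store, "urn:TheGraph", readers)                  # default graph
load(store, "urn:TheGraph", readers, into="ex:copy")  # named graph

try:
    load(store, "mailto:nobody@example.org", readers)
    mailto_outcome = "success"
except LoadFailure:
    mailto_outcome = "failure"
```

The text in the document only has to say what happens once the triples
have been read; the lookup table is the part I want to leave open.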

> I'd suggest to replace documentURI by remoteGraphURI, adding that : "The remoteGraphURI has to be dereferenced and the retrieved triples should be added in the Graph Store into the specified graph or the default graph"
> Will it be explanatory enough ?

I missed this bit before I wrote my last response, but I'll stick to
what I said. "Dereferencing" implies a URL, which I don't want to
limit the operation to. Other than that, I think the new text covers
the same points, right?

>> * 4.1.7 Same comment regarding "GRAPH DEFAULT" sounding weird.
>>
>> * 4.1.7 "This operation does not remove the graph from the Graph Store." - This is not strictly true given the ability for implementations to remove any empty graph at any time, right?
>>
>
> right - proposed: "This operation does not remove the graph from the Graph Store, but an implementation may decide to do so."

Changed to:
"This operation is not required to remove the empty graph from the
Graph Store, but an implementation may decide to do so."

>> * 4.2 What does it mean that graph management operations are optional? How should a conformant implementation behave that doesn't want to create/remove graphs?

Changed to:
"These operations are not required to result in any actions, since
graph stores are not required to support named graphs."

>> * 4.2.1 "The graph store does not record empty graph" -- What does this mean? I thought we agreed to deal with these implementations by allowing stores to (theoretically) remove empty graphs at any time, in which case this sort of store would just be a normal success, followed by a silent removal of the empty graph. (In which case this does not need to be called out specially here.) Similarly, the failure modes can be simplified to failing to create a graph or the graph already existing. Similarly for section 4.2.2.

Since we still have "SILENT" as an option, I'm expecting that these
operations can still fail. So I've tried to distinguish between stores
that record the existence of empty graphs and those that do not.

CREATE
For stores that record empty graphs, this will create a new empty
graph in the store with a name specified by the URI. If the graph
already exists, then a failure may be returned, except when the SILENT
keyword is used.
Stores that do not record empty graphs will always return success.

DROP
This operation removes the specified named graph from the Graph Store
associated with the SPARQL 1.1 Update service endpoint. After
successful completion of this operation, the named graph is no longer
available for further graph update operations.
If the store records the existence of empty graphs, then the SPARQL
1.1 Update service, by default, is expected to return failure if the
specified named graph does not exist. If SILENT is present, the
operation will always return success.
Stores that do not record empty graphs will always return success.
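
Here is the distinction as a Python sketch, in case it helps the
discussion. The names are invented, failure is modelled as an
exception, and where the text says a failure "may" be returned for
CREATE this toy store always returns one; a real store could be more
lenient.

```python
class UpdateFailure(Exception):
    """Raised when an operation ends in failure."""


class Store:
    def __init__(self, records_empty_graphs):
        self.records_empty_graphs = records_empty_graphs
        self.graphs = {}  # graph name -> set of triples

    def create(self, name, silent=False):
        if not self.records_empty_graphs:
            return  # nothing to record, so always success
        if name in self.graphs:
            if not silent:
                raise UpdateFailure(f"graph {name} already exists")
            return
        self.graphs[name] = set()

    def drop(self, name, silent=False):
        if name in self.graphs:
            del self.graphs[name]
            return
        # Only a store that tracks empty graphs can tell that the
        # graph is really missing.
        if self.records_empty_graphs and not silent:
            raise UpdateFailure(f"graph {name} does not exist")


store = Store(records_empty_graphs=True)
store.create("ex:g")
store.create("ex:g", silent=True)   # SILENT: success despite the clash
try:
    store.create("ex:g")
    second_create = "success"
except UpdateFailure:
    second_create = "failure"

loose = Store(records_empty_graphs=False)
loose.create("ex:g")          # success, nothing recorded
loose.drop("ex:missing")      # success as well
```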

>> * Section 5: As we've discussed, we need a formal model for the update language. I think it needs to cover (perhaps among other things), Graph Store, transformations on a graph store, transformations on a graph, what it means to "process" a triple, success and failure criteria for each operation, what it means to optionally remove an empty graph between operations, what it means to abort later operations in the face of failure of one operation, the relation between Graph Stores and RDF datasets, the relationship between the unnamed graph and default graph, the processing model of DELETE/INSERT statements ...

> So far, I moved previous 3.2.1 here, and added a note re. the upcoming formal model.

Yes, we need to do it. I'm good at reading formalisms, but not writing
them. Any experts here?


>> * 1.2.1. It may just be my OS/browser setup, but I have a very hard time telling the bold fixed-width font from the non-bold fixed width font.

The other approach is to use quote characters (as you'd use in many
compiler-compiler languages). Personally, I find that this can be
harder to read, so I opted for bold. But if it's illegible in some
setups then I can go back to quoting. I'll wait on some further
feedback before committing to this.

>> * 3.1 The second two sentences are confusing. I think they're trying to say:
>>
>> """
>> A query service MAY offer an RDF dataset formed from graphs that are part of an update service's Graph Store. The graphs in the query service's RDF dataset MAY be a subset of the graphs in the update service's Graph Store. Furthermore, the query service's RDF dataset and the update service's Graph Store may use different names for the same graphs.
>> """

It took me some effort to parse this paragraph as well. It appears to
be working very hard to be non-prescriptive on any implementation.
However, it left the definition so broad as to be
irrelevant for me, so I left it alone.

I've added your text here. I've also removed the sentence:
"The composition of the RDF dataset may also change."

I removed it because it is unclear to me, and I think it is already covered.

>> * 3.1 Note -- there's nothing in 2.1 that seems related to this any longer?

No, it was removed. I've taken the note out.

>> * 3.2 I think we should strike "SPARQL 1.1 Update services are provided over the SPARQL 1.1 Protocol for RDF." The protocol layers on top of the Update spec (as with query)--Update can be used via other mechanisms rather than the protocol (e.g. an API), so the Update spec. should not need to make reference to the protocol.
>>

Removed.

> are -> may ?

Yes, it's a "may", but there is no need to mention the protocol
anyway, so it's gone.

>> * 3.2 What does "This requirement should address concurrency issues." mean?

Removed.

> I don't think we need to explain why an implementation might not satisfy a SHOULD condition.

Maybe not, but there are few good reasons to avoid this requirement,
so I wanted to explain those conditions where it is acceptable. I've
removed it now.

>> * 4. s/must Be/must be
>
> done

??? It was still there after I did a cvs update.

>> * 4. s/will be terminated/is terminated
>
> done
>
>>
>> * 4. I think we probably should simply refer to operations resulting in either success or failure, rather than talking about returning state. In the absence of an access mechanism such as the protocol, it feels a little strange to talk about _returning_ results. (Not sure how strongly I feel about this or not.)
>
> Fair enough.
> Note that I also updated the failure sections to remove any reference to the number of added / removed triples as part of the answer.

I've tried to avoid the use of "return" in some cases, but other times
I just ended up with awkward grammar, so I left it.

>> * 4.1. Re-format as a bulleted list? I might move these descriptions out of this summary area and into the subsections on the specific language statements.

As an introduction to update operations, it is worthwhile to mention
there are two categories and what they are. The bulleted list seemed
clearer than simple prose. Given that the descriptions are only about
a dozen words long, they seemed appropriate for this section.

>> * Throughout the document, I think it would be helpful for examples to be more complete, showing the Graph Store before the operation and then again after the operation.

That will take a long time. I can't get it done for this publication round.

>> * 4.1.2. The example here does not separate operations with a semicolon.
>
> done
>
>> Also, the example is written in a way that implies to me that the PREFIX declaration applies to all of the operations in the request, when I thought the intent was that PREFIX was attached to a single operation.
>
> Indeed, the current SPARQL Query grammar (currently referred to in SPARQL Update) indicates that PREFIX definitions apply for a single operation.
> I personally like having the PREFIX definition shared between operations when there are several operations in the same request.
> When adding the UpdateOperation in the SPARQL grammar, it could be done as follows to solve this issue:
>
> Query     ::=   Prologue
> ( SelectQuery | ConstructQuery | DescribeQuery | AskQuery | UpdateQuery* )

This was sort of how it used to be. Now, with separate statements all
separated by ; characters, I believe that prologues need to be repeated
as you say.

I'll put the repeated prefix in for the moment. Andy, do you have
anything to say on this? I agree with Lee that it would be nice to
avoid repeating prologues (after all, rdf: owl: and foaf: are hardly
likely to change in the midst of your operations).


>> * 4.1.3. What does the comment "# UPDATE outline syntax : general form:" mean?

I had just left it as it was, but as we don't have comments elsewhere,
I've taken it out.

>> * 4.1.3. I think this section would benefit from being re-formatted as a numerical list of steps outlining the processing model. I find it hard to internalize the process described by the prose.

Since I've been working through your email step-by-step, I had already
changed the prose here quite a lot by the time I got to this comment.

Re-reading the section, I realize that there is no description of the
operation, only the effects of each of the individual clauses. So
straight after the grammar I've written the following:

"This operation identifies data with the WHERE clause, binding values
to a set of variables. These bindings are then used in the DELETE
template to remove triples, and then in the INSERT template to create
new triples."

>> * 4.1.3. Is the example improved by removing "?person foaf:firstName 'Bill'" from the WHERE clause? It's not necessary, right?

It *is* necessary. If it weren't there then every person in the system
would be given a new foaf:firstName of "William". It doesn't affect
the deletions though.
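
A quick Python sketch of the difference, under some assumptions: the
graph is a plain set of triples, the only variable is ?person, and the
WHERE clause also carries a "?person a foaf:Person" pattern (without
some other pattern there would be nothing left to bind ?person). The
function and data are made up for the illustration.

```python
def delete_insert(graph, where, delete_tpl, insert_tpl):
    """Apply one DELETE/INSERT to `graph`, a set of (s, p, o) triples.

    `where` is a list of (predicate, object) pairs that ?person must
    match; the templates are (predicate, object) pairs that get
    instantiated once per binding of ?person.
    """
    # 1. WHERE: find every ?person that satisfies all the patterns.
    subjects = {s for s, _, _ in graph}
    bindings = [s for s in subjects
                if all((s, p, o) in graph for p, o in where)]
    # 2. DELETE template first, 3. then the INSERT template.
    result = set(graph)
    result -= {(s, p, o) for s in bindings for p, o in delete_tpl}
    result |= {(s, p, o) for s in bindings for p, o in insert_tpl}
    return result


people = {
    ("ex:bill", "rdf:type", "foaf:Person"),
    ("ex:bill", "foaf:firstName", "Bill"),
    ("ex:anna", "rdf:type", "foaf:Person"),
    ("ex:anna", "foaf:firstName", "Anna"),
}
is_person = ("rdf:type", "foaf:Person")
named_bill = ("foaf:firstName", "Bill")
named_william = ("foaf:firstName", "William")

with_pattern = delete_insert(people, [is_person, named_bill],
                             [named_bill], [named_william])
without_pattern = delete_insert(people, [is_person],
                                [named_bill], [named_william])
```

In with_pattern only ex:bill is renamed. In without_pattern the same
"Bill" triple is deleted, but ex:anna also picks up a firstName of
"William" alongside "Anna".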

>> * 4.1.3/4.1.4/4.1.5. I think this might be clearer if 4.1.3 made DELETE and INSERT optional, and then 4.1.4 and 4.1.5 _only_ presented and talked about the shortcut forms.

This was originally the case, but it made the grammar much more
difficult. In fact, we now have a new section for DELETE WHERE.

>> * 4.1.5 The last example in this section needs a semicolon to separate operations.
>
> done
>
>>
>> * References don't seem to be accurate.

I haven't looked at these references at all. Which ones do you think
are wrong? Are there any you think shouldn't be there? Any that should
be there but aren't?

Regards,
Paul Gearon
Received on Tuesday, 18 May 2010 23:01:43 UTC