Re: SPARQL 1.1 Update from Paul Gearon on 2010-01-14 (public-rdf-dawg@w3.org from January to March 2010)

From: Paul Gearon <gearon@ieee.org>
Date: Thu, 14 Jan 2010 10:31:32 -0500
To: public-rdf-dawg@w3.org
Message-ID: <a25ac1f1001140731w2f4832a2s313688088f348b64@mail.gmail.com>

Sorry everyone. As Andy has already pointed out for me, I made a
mistake and sent this to the Comments mailing list rather than the WG
list. I had wanted to discuss the issues before getting back to
Richard. (I've also apologised to Richard off-list).

Regards,
Paul Gearon

On Tue, Jan 12, 2010 at 11:54 AM, Paul Gearon <gearon@ieee.org> wrote:
> Richard Newman raised some points that I'd like to see addressed, so
> I thought I'd ask about them directly. I think I also need some
> feedback from others before I can adequately form a response.
>
> Starting with the first issue....
>
> On Fri, Jan 8, 2010 at 7:32 PM, Richard Newman <rnewman@twinql.com> wrote:
>> Hi folks,
>>
>> A few questions/comments on the Update portion of the 1.1 draft:
>>
>> * DELETE/MODIFY/INSERT are described in terms of CONSTRUCT templates.
>> CONSTRUCT templates allow blank nodes, which are generated as fresh blank
>> nodes for each input row. This makes sense for INSERT, but it doesn't make
>> sense for DELETE: the fresh blank node will never match a triple in the
>> store, and thus
>>
>>  DELETE { ?s ?p [] } WHERE { ?s ?p ?o }
>>
>> is a no-op by definition. It would be good for this issue to be addressed in
>> the spec, with one of the following possible resolutions:
>>
>>  1. Forbid blank nodes in a DELETE template.
>>
>>  2. Define those blank nodes as being null placeholders, such that
>>
>>      DELETE { ?s _:x _:y } WHERE { ?s rdf:type rdfs:Class }
>>
>>     would delete every triple whose subject is an rdfs:Class.
>>
>>  3. Document that DELETE triple patterns containing blank nodes will never
>> match.
>>
>> * INSERT et al permit multiple "INTO" URIs:
>>
>>  INSERT [ INTO <uri> ]* { template } [ WHERE { pattern } ]
>>
>> but the text discusses the graph in the singular ("The graph URI, if
>> present, must be a valid named graph..."). Is it intended that '*' actually
>> be '?'?
>>
>> If not, the text should be changed, and text added to describe how an
>> implementation should process multiple graphs: e.g., should they run DELETE
>> then INSERT on each graph in turn, or should all DELETEs be batched together
>> prior to the INSERTs?
>
> From memory, we are not allowing blank nodes. Is that right?
>
> I'm fine with this, if that's what's happening, but from a theoretical
> viewpoint I believe that his second option is better (blank nodes can
> match anything). I don't like the third option at all.
>
> Either way, I agree that it should be mentioned in the document.
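
To make the difference between Richard's options 2 and 3 concrete, here is a minimal Python sketch. The toy store (a set of string triples), the "?var" and "_:label" conventions, and both function names are assumptions made for this illustration; none of them come from the draft or from any implementation.

```python
def is_var(term):
    return term.startswith("?")

def is_bnode(term):
    return term.startswith("_:")

def delete_never_matching(store, template, solutions):
    """Option 3 reading: each template blank node is instantiated as a
    fresh node, so the resulting triple cannot equal any stored triple."""
    remaining = set(store)
    for row in solutions:
        triple = tuple(
            object() if is_bnode(t) else row[t] if is_var(t) else t
            for t in template
        )
        remaining.discard(triple)  # never removes anything when a bnode is present
    return remaining

def delete_with_placeholders(store, template, solutions):
    """Option 2 reading: each template blank node is a placeholder that
    matches any term (each placeholder matches independently here)."""
    remaining = set(store)
    for row in solutions:
        pattern = tuple(
            None if is_bnode(t) else row[t] if is_var(t) else t
            for t in template
        )
        remaining = {
            triple for triple in remaining
            if not all(p is None or p == v for p, v in zip(pattern, triple))
        }
    return remaining

# Richard's example: DELETE { ?s _:x _:y } WHERE { ?s rdf:type rdfs:Class }
store = {
    ("ex:A", "rdf:type", "rdfs:Class"),
    ("ex:A", "rdfs:label", "A"),
    ("ex:B", "rdfs:label", "B"),
}
solutions = [{"?s": "ex:A"}]      # hypothetical result of the WHERE clause
template = ("?s", "_:x", "_:y")

unchanged = delete_never_matching(store, template, solutions)
pruned = delete_with_placeholders(store, template, solutions)
```

Under the option 3 reading the store is left as it was; under the option 2 reading every triple with subject ex:A is removed. Option 1 would simply reject the template before evaluation.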
>
>> * Re atomicity: it would seem that, for systems which will allow multiple
>> SPARQL/Update requests within a single transaction, the requirement that
>> "Each request should be treated atomically by a SPARQL-Update service" is
>> onerous. I don't know of too many systems that support sub-transactions, and
>> thus implementations will be forced to take one of two routes:
>>
>>  1. Violating the spec: "sorry pal, that doesn't apply: our transactions
>> have multi-request scope"
>>  2. Annoying users: "sorry pal, we aborted your transaction because SPARQL
>> 1.1 says we have to, even though you wanted to handle it yourself".
>>
>> Neither choice is beneficial to users (the former because it reduces their
>> ability to rely on the spec). I'd suggest changing the language to require
>> that implementations provide "some method of atomically executing the entire
>> contents of a SPARQL/Update request", which allows for the execution of a
>> request within an existing transaction, as well as for approaches that
>> execute requests within their own new transaction.
>
> I pushed for this, since I think it deals with some (though definitely
> not all) of the transaction issues. I intentionally said "should" to
> avoid making it compulsory (should the word be capitalized as
> "SHOULD"?) though I'd like to see it in systems that are capable of
> it.
>
> Should the word be changed to "MAY"? Are his concerns justified, and
> should it be dropped altogether? This has been talked about before,
> but I believe that the discussion has been limited.
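
One way to read "atomically executing the entire contents of a request" is sketched below in Python, purely as an illustration. It assumes an in-memory dict of graphs and models each operation as a plain callable; the names execute_request_atomically, insert and failing_load are hypothetical. A store with real transactions would do this quite differently.

```python
import copy

def execute_request_atomically(graphs, operations):
    """Apply every operation of one request to a working copy and commit
    only if all of them succeed; on any exception `graphs` is untouched."""
    working = copy.deepcopy(graphs)
    for operation in operations:
        operation(working)
    graphs.clear()
    graphs.update(working)

def insert(uri, triple):
    """Build an operation that adds one triple to the graph named `uri`."""
    def operation(graphs):
        graphs.setdefault(uri, set()).add(triple)
    return operation

def failing_load(graphs):
    """Stand-in for an operation that fails part-way through a request."""
    raise RuntimeError("document could not be read")

graphs = {"ex:g": {("ex:s", "ex:p", "ex:o")}}

# A request whose second operation fails leaves the store as it was.
try:
    execute_request_atomically(
        graphs, [insert("ex:g", ("ex:s", "ex:p", "ex:o2")), failing_load]
    )
except RuntimeError:
    pass
after_failure = copy.deepcopy(graphs)

# A request in which every operation succeeds is applied as a whole.
execute_request_atomically(graphs, [insert("ex:g", ("ex:s", "ex:p", "ex:o3"))])
```

Whether the commit step runs inside a new transaction or inside one the client already holds is exactly the freedom Richard's proposed wording leaves open.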
>
>> * There doesn't seem to be any mention at all of responses in the draft. Is
>> that intentional?
>
> I believe so. That's a job for the protocol, right?
>
>> * Re LOAD: if we can have CREATE SILENT, how about LOAD SILENT, which
>> doesn't fail (and abort your transaction!) if the LOAD fails?
>
> There are no transactions, but if multiple operations are to be
> completed atomically, then his point is made.
>
> LOAD can fail in one of the following ways:
> 1. The graph does not exist, and we do not allow graphs to be
> automatically created when data is inserted into them. (See ISSUE-20).
> 2. The document to be loaded is malformed.
> 3. The document to be loaded cannot be read.
> 4. There is an error updating the graph with the contents of the document.
>
> #4 is an internal system error, and not our problem. #3 is an error
> that is also out of our hands (non-existent file, no permissions, i/o
> error, etc). #2 is also an error (should we permit partial inserts for
> documents that are well formed up to that point, or recoverable
> errors?).
>
> #1 is the only one that might not be considered an "error". If we
> create graphs automatically, then it's not an issue. If we don't, then
> inserting into a non-existent graph would be an "error", but one that
> can be avoided with a "CREATE SILENT" guard. In this case I think we
> can just consider the error condition here, rather than allowing LOAD
> SILENT.
>
> As for all of these possible error conditions, this brings me back to
> the point that errors are really part of the protocol. Correct?
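
The failure cases listed above, and the "CREATE SILENT" guard for case #1, can be sketched with a toy Python store. Everything here is an assumption made for illustration: the GraphStore class, the UpdateError exception, the one-triple-per-line document format, and the choice to reject a malformed document outright rather than permit a partial insert. Case #4 (an internal error) is not modelled.

```python
class UpdateError(Exception):
    """Hypothetical error type for a failed update operation."""

class GraphStore:
    def __init__(self, auto_create=False):
        self.graphs = {}
        self.auto_create = auto_create  # the choice left open by ISSUE-20

    def create(self, uri, silent=False):
        if uri in self.graphs:
            if silent:
                return  # CREATE SILENT: an existing graph is not an error
            raise UpdateError(f"graph {uri} already exists")
        self.graphs[uri] = set()

    def load(self, read_document, uri):
        # Case 1: the target graph does not exist and is not auto-created.
        if uri not in self.graphs and not self.auto_create:
            raise UpdateError(f"graph {uri} does not exist")
        # Case 3: the document cannot be read.
        try:
            text = read_document()
        except OSError as exc:
            raise UpdateError("document cannot be read") from exc
        # Case 2: the document is malformed. It is parsed in full before
        # the graph is touched, so no partial insert happens (an assumption).
        triples = set()
        for line in text.splitlines():
            terms = line.split()
            if len(terms) != 3:
                raise UpdateError(f"malformed line: {line!r}")
            triples.add(tuple(terms))
        self.graphs.setdefault(uri, set()).update(triples)

store = GraphStore()
store.create("ex:g", silent=True)
store.create("ex:g", silent=True)  # the guard can be repeated safely
store.load(lambda: "ex:s ex:p ex:o", "ex:g")
```

With auto_create=True the first check disappears, which is the situation in which case #1 "is not an issue".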
>
>> * I'd like to throw my 2¢ in for Issue 20.
>>
>> It strikes me as a little short-sighted to assume that every store operates
>> with first-class graph objects, such that they can be created and deleted in
>> a closed-world fashion: not only does this conflict with some
>> implementations (e.g., those which use quad stores to efficiently implement
>> named graphs, and those which dynamically load data from a graph on an ad
>> hoc basis), but it also is dissonant with the "triple stores are caches of
>> the semantic web" open-world view.
>
> I don't follow his reasoning here. Can someone shed some light on it for me?
>
> I see no conflict between quad stores and graphs being created and
> deleted. Mulgara was one of the earliest quad stores, and it has
> always had operations for creating and deleting a graph.
>
> I *think* I see what he's talking about with the open-world view, in
> that any URI should be treated as a possible graph (just one that we
> may not know the contents of). However, from an implementation
> perspective, this gets tricky, since so many stores implement the
> common extension of de-referencing graph URIs that the store does not
> hold locally. Without the ability to CREATE a graph locally, it
> won't be possible to know if an INSERT or LOAD into a URI should
> create a local graph, or attempt to do an HTTP PUT/POST operation
> (assuming URIs in the HTTP scheme).
>
> Can someone help me out on this please? Even if it's just a response
> to his concern, I don't know what other people think on this issue.
>
>> I see in emails text like "We have agreed on the need to support a graph
>> that exists and is empty"[1]. I would like to see strong supporting evidence
>> for this in the spec (or some other persistent and accessible place) before
>> resolving this issue. I personally don't see any need to distinguish an
>> empty graph (after all, it's easy to add an all-bnodes triple to it to make
>> it non-empty but without excess meaning).
>
> I'm not sure if he's asking for evidence of the need or of us agreeing
> on the need.
>
>> I note that there is no proposal for CREATE SUBJECT (or PREDICATE or
>> OBJECT), nor CREATE LANGTAG. I see little point in unnecessarily
>> special-casing one value space to reduce its dynamism.
>
> SPARQL has always treated graphs differently to subjects, predicates
> and objects. I believe that this is necessary as some implementations
> do not support named graphs. Also, RDF itself clearly defines the
> elements of a triple, while treating the definition of a graph
> somewhat separately. Is this correct?
>
>> From interactions with users, I expect that "oh, you mean I have to CREATE a
>> graph before I can use it in an INSERT query?" will be a common question,
>> and "always preface your query with CREATE SILENT..." the pervasive
>> response. Seems like a waste of time to me.
>>
>> (Regardless of the official outcome of the issue, my implementation is
>> unlikely to strictly follow the CREATE/DROP behavior, because it would be
>> inefficient to track graphs for the sole purpose of throwing errors in edge
>> cases. CREATE will be a no-op, and DROP will be identical to CLEAR.)
>
> Well, Mulgara already tracks it, and we've never considered it a
> problem (indeed, it's quite beneficial in many ways), so I certainly
> have a bias on this question.
>
> Regards,
> Paul Gearon
>
Received on Thursday, 14 January 2010 15:32:05 UTC