Re: SPARQL 1.1 Update

Thanks for moving the discussion, Andy. My comments are inline as well.

On Thu, Jan 14, 2010 at 9:50 AM, Andy Seaborne <andy.seaborne@talis.com> wrote:
> Moved to WG list, cc'ed to Richard.  My comments inline.
>
>        Andy
>
> On 12/01/2010 4:54 PM, Paul Gearon wrote:
>>
>> Richard Newman raised some points that I'd like to see addressed, so
>> I thought I'd ask about them directly. I think I also need some
>> feedback from others before I can adequately form a response.
>>
>> Starting with the first issue....
>>
>> On Fri, Jan 8, 2010 at 7:32 PM, Richard Newman<rnewman@twinql.com>  wrote:
>>>
>>> Hi folks,
>>>
>>> A few questions/comments on the Update portion of the 1.1 draft:
>>>
>>> * DELETE/MODIFY/INSERT are described in terms of CONSTRUCT templates.
>>> CONSTRUCT templates allow blank nodes, which are generated as fresh blank
>>> nodes for each input row. This makes sense for INSERT, but it doesn't
>>> make
>>> sense for DELETE: the fresh blank node will never match a triple in the
>>> store, and thus
>>>
>>>  DELETE { ?s ?p [] } WHERE { ?s ?p ?o }
>>>
>>> is a no-op by definition. It would be good for this issue to be addressed
>>> in
>>> the spec, with one of the following possible resolutions:
>>>
>>>  1. Forbid blank nodes in a DELETE template.
>>>
>>>  2. Define those blank nodes as being null placeholders, such that
>>>
>>>      DELETE { ?s _:x _:y } WHERE { ?s rdf:type rdfs:Class }
>>>
>>>     would delete every triple whose subject is an rdfs:Class.
>>>
>>>  3. Document that DELETE triple patterns containing blank nodes will
>>> never
>>> match.
>>>
>>> * INSERT et al permit multiple "INTO" URIs:
>>>
>>>  INSERT [ INTO<uri>  ]* { template } [ WHERE { pattern } ]
>>>
>>> but the text discusses the graph in the singular ("The graph URI, if
>>> present, must be a valid named graph..."). Is it intended that '*'
>>> actually
>>> be '?'?
>>>
>>> If not, the text should be changed, and text added to describe how an
>>> implementation should process multiple graphs: e.g., should they run
>>> DELETE
>>> then INSERT on each graph in turn, or should all DELETEs be batched
>>> together
>>> prior to the INSERTs?
>>
>> From memory, we are not allowing blank nodes. Is that right?
>
> As far as I know, we are.

OK, this is implicit in the document as it stands. It should be
explicit with the DELETE statement (and will be dependent on what
meaning we associate with them).

So the two options are that we disallow blank nodes in DELETE
templates, or we allow them and figure out the semantics of that.
Presuming that we allow them, then I'm of the opinion that we have
free rein with the semantics, since it should be something that
people usually want to do. We just have to make it look consistent.

>> I'm fine with this, if that's what's happening, but from a theoretical
>> viewpoint I believe that his second option is better (blank nodes can
>> match anything). I don't like the third option at all.
>>
>> Either way, I agree that it should be mentioned in the document.
>
> 1 is possible but we end up with several variations on "template", from
> triples only in DATA forms, triples + named variables (here) and for INSERT
> triples + variables + bnodes.

Yes. This is very annoying from an implementation perspective, though
it might make the most sense for users. OTOH, I think I'd prefer to
use a more consistent approach in our templates.

> For 2 - treating a DELETE template as still being a pattern (so not like
> CONSTRUCT nor INSERT): treating bnodes as ANY and unbound variables as don't
> match (c.f. CONSTRUCT templates) is inconsistent to me.  We need a
> consistent treatment.

Actually, I was seeing blank nodes in this sense as being similar to
an unbound variable, rather than giving it a different meaning. After
all, interpretations on a graph with blank nodes can associate those
nodes with anything (so long as the graph remains consistent). So
treating them like unbound variables just seemed like a natural
approach to me. (It would also be trivial to implement. :-)
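
To make the difference between options 2 and 3 concrete, here is a rough
sketch in Python. None of this is from the draft: the graph is just a set
of (s, p, o) string tuples, variables are strings starting with "?", blank
nodes are strings starting with "_:", and the function names are my own
invention for illustration. An unbound variable is left in place and so
matches nothing, as with CONSTRUCT templates.

```python
def is_bnode(term):
    return term.startswith("_:")

def instantiate(template, row):
    """Substitute bound variables from one WHERE solution into the template."""
    return tuple(row.get(t, t) if t.startswith("?") else t for t in template)

def delete_wildcard(graph, template, rows):
    """Option 2: a blank node in the template matches any term."""
    doomed = set()
    for row in rows:
        pattern = instantiate(template, row)
        for triple in graph:
            if all(is_bnode(p) or p == t for p, t in zip(pattern, triple)):
                doomed.add(triple)
    return graph - doomed

def delete_never_match(graph, template, rows):
    """Option 3: a blank node in the template is fresh, so it matches nothing."""
    doomed = set()
    for row in rows:
        pattern = instantiate(template, row)
        if not any(is_bnode(p) for p in pattern):
            doomed.add(pattern)
    return graph - doomed

# Richard's example: DELETE { ?s _:x _:y } WHERE { ?s rdf:type rdfs:Class }
graph = {
    ("ex:A", "rdf:type", "rdfs:Class"),
    ("ex:A", "rdfs:label", "A"),
    ("ex:B", "rdfs:label", "B"),
}
rows = [{"?s": "ex:A"}]          # solutions of the WHERE pattern
template = ("?s", "_:x", "_:y")

after_option_2 = delete_wildcard(graph, template, rows)     # ex:A triples removed
after_option_3 = delete_never_match(graph, template, rows)  # graph unchanged
```

Under option 2 every triple with subject ex:A goes away; under option 3
the same request is a no-op.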

> We do have the DELETE shortform.
>
> If full-DELETE is still a template, we don't need a short form because it is
> DELETE { template } WHERE {} (the empty pattern).  If you prefer a fewer
> operations, you may like that approach.

While syntactically consistent, it looks a little kludgey to me. But I
could live with it.

> For named variables:
> Do we want to have partially restricted templates or say do it as a proper
> full DELETE { template } WHERE {...} because it is only adding the template
> into the WHERE.
>
> Does not address bNodes directly but let's make a consistent decision.
>
> ----
>
> I mildly favour 3.  This is (1) without the enforcement.  Parsers may choose
> to emit a warning

When viewed as "(1) without the enforcement" then I see what you're
getting at. However, I'm less comfortable with the semantics, in that
it implies that blank nodes refer to nothing. That feels like I'm
skolemizing them to something that doesn't exist.

OTOH, maybe that's OK.

> (caveat: where does the warning go to on the web?)

I run into that issue all the time. Depending on the type of the
response, it may be possible to put it in the body, but not always.
That's not a very consistent approach though.

>>> * Re atomicity: it would seem that, for systems which will allow multiple
>>> SPARQL/Update requests within a single transaction, the requirement that
>>> "Each request should be treated atomically by a SPARQL-Update service" is
>>> onerous. I don't know of too many systems that support sub-transactions,
>>> and
>>> thus implementations will be forced to take one of two routes:
>>>
>>>  1. Violating the spec: "sorry pal, that doesn't apply: our transactions
>>> have multi-request scope"
>>>  2. Annoying users: "sorry pal, we aborted your transaction because
>>> SPARQL
>>> 1.1 says we have to, even though you wanted to handle it yourself".
>>>
>>> Neither choice is beneficial to users (the former because it reduces
>>> their
>>> ability to rely on the spec). I'd suggest changing the language to
>>> require
>>> that implementations provide "some method of atomically executing the
>>> entire
>>> contents of a SPARQL/Update request", which allows for the execution of a
>>> request within an existing transaction, as well as for approaches that
>>> execute requests within their own new transaction.
>>
>> I pushed for this, since I think it deals with some (though definitely
>> not all) of the transaction issues. I intentionally said "should" to
>> avoid making it compulsory (should the word be capitalized as
>> "SHOULD"?) though I'd like to see it in systems that are capable of
>> it.
>
> Some terminology confusion perhaps.  A "request" is several "operations" and
> one request is one HTTP POST.  Need a terminology section - this is still
> outstanding from my WD comments.
>
> When the text says "should", I think it is talking about route 1 already.
>
> So, yes, let's give that full RFC 2119 force of SHOULD.

I'll start on a Terminology section, and I'll include a reference to
RFC terminology.
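
For what it's worth, the "request is several operations, treated
atomically" reading can be sketched like this in Python. This is only an
illustration under my own assumptions: GraphStore and execute are made-up
names, an operation is any callable that mutates a set of triples and
raises on failure, and a real store would use its own transaction
machinery rather than copying the graph.

```python
class GraphStore:
    """Toy store: one graph held as a set of (s, p, o) tuples."""

    def __init__(self, triples=()):
        self.triples = set(triples)

    def execute(self, operations):
        """Run one request (a list of operations) as a single atomic unit.

        Returns True if every operation succeeded, False otherwise. On
        failure the stored graph is left exactly as it was.
        """
        working = set(self.triples)      # operations only touch the copy
        try:
            for op in operations:
                op(working)
        except Exception:
            return False                 # discard the copy, keep the original
        self.triples = working           # commit all changes at once
        return True
```

A system that already has a multi-request transaction open could skip the
copy and simply run the operations inside that transaction, which is the
case Richard is worried about.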

>> Should the word be changed to "MAY"? Are his concerns justified, and
>> should it be dropped altogether? This has been talked about before,
>> but I believe that the discussion has been limited.
>>
>>> * There doesn't seem to be any mention at all of responses in the draft.
>>> Is
>>> that intentional?
>>
>> I believe so. That's a job for the protocol, right?
>
> Yes but it is worth noting that no operations have any results other than
> success/failure (unlike a query, say).

OK. I'll try to figure out where that should go.

>>> * Re LOAD: if we can have CREATE SILENT, how about LOAD SILENT, which
>>> doesn't fail (and abort your transaction!) if the LOAD fails?
>>
>> There are no transactions, but if multiple operations are to be
>> completed atomically, then his point is made.
>>
>> LOAD can fail in one of the following ways:
>> 1. The graph does not exist, and we do not allow graphs to be
>> automatically created when data is inserted into them. (See ISSUE-20).
>> 2. The document to be loaded is malformed.
>> 3. The document to be loaded cannot be read.
>> 4. There is an error updating the graph with the contents of the document.
>
> 5. related to 2/3 - get a partial read so some triples come in.

Ah yes, sorry. I'm used to a system that will roll back an
unsuccessful operation.

>> #4 is an internal system error, and not our problem. #3 is an error
>> that is also out of our hands (non-existent file, no permissions, i/o
>> error, etc). #2 is also an error (should we permit partial inserts for
>> documents that are well formed up to that point, or recoverable
>> errors?).
>>
>> #1 is the only one that might not be considered an "error". If we
>> create graphs automatically, then it's not an issue. If we don't, then
>> inserting into a non-existent graph would be an "error", but one that
>> can be avoided with a "CREATE SILENT" guard. In this case I think we
>> can just consider the error condition here, rather than allowing LOAD
>> SILENT.
>
> We don't actually say what happens for a LOAD.
>
> The ability to load a remote graph as best one can (connection drops,
> document is found to be broken part way through) is useful especially at
> scale, as is the other way round.

Interesting point. So what *should* we say?
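
The two behaviours we would be choosing between look roughly like this.
It is a Python sketch with a deliberately toy document format (one
whitespace-separated triple per line, which is not any real RDF syntax),
and the function names are hypothetical.

```python
def parse_line(line):
    """Parse one 's p o' line of the toy format; raise if it is malformed."""
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"malformed triple: {line!r}")
    return tuple(parts)

def load_all_or_nothing(graph, lines):
    """Roll-back style LOAD: any bad line means the graph is not changed."""
    loaded = {parse_line(line) for line in lines}  # raises before any update
    return graph | loaded

def load_best_effort(graph, lines):
    """Best-effort LOAD: keep whatever was read before the document broke."""
    result = set(graph)
    for line in lines:
        try:
            result.add(parse_line(line))
        except ValueError:
            break                                  # stop at the first bad line
    return result
```

With a document whose second line is broken, the first function raises
and leaves the graph alone, while the second returns the graph plus the
first triple only.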

>        Andy

<snip/>
The remainder of the email refers to the creation of empty graphs, and
insertions into graphs that don't exist.

A point that this raises for me is the assumption of named graphs.
From memory, we don't require that a graph store support named graphs.
Is this correct? (I tend to forget this, since I only deal with stores
that support this). If so, then I'd better put in a comment that
CREATE and DROP need not be supported on those systems (or perhaps that
they should be supported as no-ops).

Regards,
Paul Gearon

Received on Thursday, 14 January 2010 16:17:38 UTC