Re: SPARQL 1.1 Update from Richard Newman on 2010-01-12 (public-rdf-dawg-comments@w3.org from January 2010)

From: Richard Newman <rnewman@twinql.com>
Date: Tue, 12 Jan 2010 11:20:07 -0800
To: Paul Gearon <gearon@ieee.org>
Cc: public-rdf-dawg-comments@w3.org
Message-Id: <CE72F5F2-858C-491E-85C7-90AAA472EED1@twinql.com>

Hi Paul,

> Richard Newman raised some points that I'd like to see addressed, so
> I thought I'd ask about them directly. I think I also need some
> feedback from others before I can adequately form a response.

Thanks for the response. I've added some clarification inline.

>> Neither choice is beneficial to users (the former because it reduces
>> their ability to rely on the spec). I'd suggest changing the language
>> to require that implementations provide "some method of atomically
>> executing the entire contents of a SPARQL/Update request", which
>> allows for the execution of a request within an existing transaction,
>> as well as for approaches that execute requests within their own new
>> transaction.
>
> I pushed for this, since I think it deals with some (though definitely
> not all) of the transaction issues. I intentionally said "should" to
> avoid making it compulsory (should the word be capitalized as
> "SHOULD"?)

That might help for people used to reading RFCs (such as myself).

> though I'd like to see it in systems that are capable of it.
>
> Should the word be changed to "MAY"? Are his concerns justified, and
> should it be dropped altogether? This has been talked about before,
> but I believe that the discussion has been limited.

I don't suggest dropping it altogether: it's good to encourage capable
implementations to support transactions. I simply suggest
weakening/expanding/clarifying the language to acknowledge that not
every transaction will be scoped to a single Update request.

>> * There doesn't seem to be any mention at all of responses in the
>> draft. Is that intentional?
>
> I believe so. That's a job for the protocol, right?

I agree, but it's probably worthwhile putting a very brief section in  
the draft:

   # Responses
   "That's a job for the protocol."

>> * Re LOAD: if we can have CREATE SILENT, how about LOAD SILENT, which
>> doesn't fail (and abort your transaction!) if the LOAD fails?
>
> There are no transactions, but if multiple operations are to be
> completed atomically, then his point is made.

Forgive the smushing.

> LOAD can fail in one of the following ways:
> 1. The graph does not exist, and we do not allow graphs to be
> automatically created when data is inserted into them. (See ISSUE-20).
> 2. The document to be loaded is malformed.
> 3. The document to be loaded cannot be read.
> 4. There is an error updating the graph with the contents of the  
> document.
>
> #4 is an internal system error, and not our problem. #3 is an error
> that is also out of our hands (non-existent file, no permissions, i/o
> error, etc). #2 is also an error (should we permit partial inserts for
> documents that are well formed up to that point, or recoverable
> errors?).
>
> #1 is the only one that might not be considered an "error". If we
> create graphs automatically, then it's not an issue. If we don't, then
> inserting into a non-existent graph would be an "error", but one that
> can be avoided with a "CREATE SILENT" guard. In this case I think we
> can just consider the error condition here, rather than allowing LOAD
> SILENT.

I'm not sure it's valuable to distinguish between "storage errors",  
"protocol errors" (#1), and "load errors" in this way: the LOAD  
operation -- for whatever reason -- could not complete, so the whole  
request (which should be executed atomically) is aborted.

I'm suggesting a way to allow a LOAD to silently fail if, for example,  
a document is a 404. I view this as equivalent to DROP SILENT (a  
different kind of 404!).

> As for all of these possible error conditions, this brings me back to
> the point that errors are really part of the protocol. Correct?

Not if you're specifying atomic behavior in this spec. To do this  
properly you need to document classes of conditions that cause a  
failure such that the request as a whole fails. Error *reporting* is a  
Protocol issue, but errors can't be ignored.
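
A minimal sketch of that distinction, in Python rather than anything
from the draft. It assumes a store modeled as a dict from graph name to
a set of triples; OperationFailed, execute_atomically, insert, and
load_missing are hypothetical names used only for illustration. Any
failure aborts the whole request unless the operation was flagged
SILENT, in which case that one operation is skipped:

```python
import copy


class OperationFailed(Exception):
    """Hypothetical error raised by an operation that could not complete."""


def execute_atomically(store, operations):
    """Apply all operations to a working copy; commit only if the request succeeds.

    `store` maps graph names to sets of triples (an assumption of this sketch).
    `operations` is a list of (callable, silent) pairs.
    """
    working = copy.deepcopy(store)
    for op, silent in operations:
        snapshot = copy.deepcopy(working)
        try:
            op(working)
        except OperationFailed:
            if not silent:
                raise  # the request as a whole fails; `store` is untouched
            working = snapshot  # drop any partial effects of the silent failure
    # Every operation succeeded or failed silently: publish the result.
    store.clear()
    store.update(working)


def insert(graph, triple):
    """Build an operation that adds one triple to the named graph."""
    def op(s):
        s.setdefault(graph, set()).add(triple)
    return op


def load_missing(s):
    """Stand-in for a LOAD whose document cannot be fetched (e.g. a 404)."""
    raise OperationFailed("document not found")
```

With the failing load marked silent, the insert before it is committed;
with it unmarked, the request aborts and the store is left as it was.
Reporting the failure to the client is a separate matter for the
protocol.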

>> * I'd like to throw my 2¢ in for Issue 20.
>>
>> It strikes me as a little short-sighted to assume that every store
>> operates with first-class graph objects, such that they can be
>> created and deleted in a closed-world fashion: not only does this
>> conflict with some implementations (e.g., those which use quad stores
>> to efficiently implement named graphs, and those which dynamically
>> load data from a graph on an ad hoc basis), but it also is dissonant
>> with the "triple stores are caches of the semantic web" open-world
>> view.
>
> I don't follow his reasoning here. Can someone shed some light on it  
> for me?
>
> I see no conflict between quad stores and graphs being created and
> deleted. Mulgara was one of the earliest quad stores, and it has
> always had operations for creating and deleting a graph.

My point is that a *generic* quad store does not distinguish between a  
predicate and a graph, and thus requiring explicit graph creation does  
not make much more sense than requiring explicit predicate creation.
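
To make that concrete, here is a rough Python sketch of a generic quad
store, assuming quads are kept as plain 4-tuples; the QuadStore class
and its method names are hypothetical and do not describe any
particular implementation. The set of graphs is derived from the
fourth column in exactly the same way the set of predicates is derived
from the second, so there is nothing for CREATE to do, and DROP cannot
be told apart from CLEAR:

```python
class QuadStore:
    """Hypothetical generic quad store: a bare set of (s, p, o, g) tuples."""

    def __init__(self):
        self.quads = set()

    def insert(self, s, p, o, g):
        # No existence check: a graph "exists" as soon as a quad names it.
        self.quads.add((s, p, o, g))

    def values(self, column):
        """Distinct values in one column (0=subject, 1=predicate, 2=object, 3=graph)."""
        return {q[column] for q in self.quads}

    def create_graph(self, g):
        # Nothing to record: empty graphs are not tracked in this sketch.
        pass

    def clear_graph(self, g):
        self.quads = {q for q in self.quads if q[3] != g}

    # With no separate graph registry, DROP is the same operation as CLEAR.
    drop_graph = clear_graph
```

For example, after store.insert("s", "p", "o", "g1"), store.values(3)
contains "g1" without any prior create_graph call, and
store.drop_graph("g1") leaves no trace of the graph.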

In my opinion (and it's just an opinion), *triple stores* are created  
and deleted explicitly; graphs are as fluid as any other entity in RDF.

Of course it's possible to write a quad store that tracks allowed  
values for any column... it just seems to me like a waste of time to  
do so.

Put another way: I think having CREATE GRAPH et al. assumes a certain
approach to triple store implementation, restricting the behavior of
conformant implementations for no significant benefit.

I will agree that the "graphs represent documents on the web" view  
admits an empty graph with a name. I just don't see what value the  
complication of all the explicit and non-dynamic graph handling adds  
to the standard.

>> I see in emails text like "We have agreed on the need to support a
>> graph that exists and is empty"[1]. I would like to see strong
>> supporting evidence for this in the spec (or some other persistent
>> and accessible place) before resolving this issue. I personally don't
>> see any need to distinguish an empty graph (after all, it's easy to
>> add an all-bnodes triple to it to make it non-empty but without
>> excess meaning).
>
> I'm not sure if he's asking for evidence of the need or of us agreeing
> on the need.

Both. I'd like to see the emails discussing it, documentation in favor  
of the need, and the opposing points of view addressed.

>> I note that there is no proposal for CREATE SUBJECT (or PREDICATE or
>> OBJECT), nor CREATE LANGTAG. I see little point in unnecessarily
>> special-casing one value space to reduce its dynamism.
>
> SPARQL has always treated graphs differently to subjects, predicates
> and objects. I believe that this is necessary as some implementations
> do not support named graphs. Also, RDF itself clearly defines the
> elements of a triple, while treating the definition of a graph
> somewhat separately. Is this correct?

Indeed; I was being somewhat facetious. My point still stands, though:  
any implementation that could support CREATE SILENT GRAPH can create  
graphs on demand, but implementations that do not track graphs must  
violate the spec.

As I see no benefit to explicitly creating and deleting graphs (and  
erroring when those operations are not performed), just as I see no  
benefit to CREATE PREDICATE, I suggest that those operations be removed.

>> From interactions with users, I expect that "oh, you mean I have to
>> CREATE a graph before I can use it in an INSERT query?" will be a
>> common question, and "always preface your query with CREATE
>> SILENT..." the pervasive response. Seems like a waste of time to me.
>>
>> (Regardless of the official outcome of the issue, my implementation
>> is unlikely to strictly follow the CREATE/DROP behavior, because it
>> would be inefficient to track graphs for the sole purpose of throwing
>> errors in edge cases. CREATE will be a no-op, and DROP will be
>> identical to CLEAR.)
>
> Well, Mulgara already tracks it, and we've never considered it a
> problem (indeed, it's quite beneficial in many ways), so I certainly
> have a bias on this question.

Could you explain the ways in which tracking empty graphs and checking  
existence before insertion are beneficial?

I don't see any problem such tracking solves. (I'm not talking about  
tracking graphs with content, I'm talking about empty graphs.)

A side point that came to mind: there seems to be no efficient way in  
SPARQL/Update to do the following:

   INSERT INTO ?g1 {
     ?s <x> <y> .
   }
   WHERE {
     ?s ?p ?g1 .
     FILTER ( someExpensiveFilter(?g1) )
   }

... because one cannot create a graph during INSERT execution, nor can  
one create a graph using the results of a query (even by running the  
WHERE part twice). This would all be trivial if graphs were dynamic.

Many thanks,

-R
Received on Tuesday, 12 January 2010 19:20:38 UTC