Re: Last Call for comments on "SPARQL Query Language for RDF" from Axel Polleres on 2007-04-17 (public-rdf-dawg-comments@w3.org from April 2007)

From: Axel Polleres <axel.polleres@deri.org>
Date: Tue, 17 Apr 2007 18:42:50 +0100
To: public-rdf-dawg-comments@w3.org
Message-id: <4625071A.9060807@deri.org>
p.s.: Short clarification: I wrote these comments on your draft also on 
behalf of the RIF-WG, but unfortunately the group has not yet had a 
chance to review and discuss them, so for now please just take them as 
my personal comments. Hopefully, the RIF-WG will forward you additional
comments, endorsed by the Working Group, in the next week or two.

thanks,
axel

Axel Polleres wrote:
> Dear all,
> 
> below is my review of the current SPARQL draft from
> 
> http://www.w3.org/TR/rdf-sparql-query/
> 
> on behalf of W3C member organization DERI Galway.
> 
> Generally, I think the formal definitions have improved a lot, but I am 
> still not 100% sure that all definitions are formally watertight. This 
> mainly concerns Section 12 and the partly unclear definitions/pseudocode 
> algorithms for query evaluation therein.
> 
> HTH,
> best,
> Axel
> 
> 
> -------
> 
> Detailed comments:
> 
> 
> Prefix notation is still not aligned with Turtle. Why?
> Would it make sense to align with Turtle and
> use/allow '@prefix' instead of, or in addition to, 'PREFIX'?
> You also have two ways of writing variables... so, why not?
> 
> 
> Section 4.1.1
> 
> The single quote seems to be missing after the table in sec 4.1.1
> in "", or is this '"'?
> 
> Section 4.1.4
> 
> The form
> 
> [ :p "v" ] .
> 
> looks very awkward to me!
> 
> I don't find the grammar snippet for ANON very helpful here, without an
> explanation of what WS is...  Shouldn't that be a PropertyListNotEmpty 
> instead?
> 
> 
> Section 5
> 
> Section 5 is called Graph patterns and has only subsections
> 5.1 and 5.2 for basic and group patterns, whereas the other types are
> given separate top-level sections... this structure seems a bit
> illogical.
> 
> 
> Why the restriction that a blank node label can only be used in a single
> basic graph pattern? And if so, isn't the remark that the scope is the
> enclosing basic graph pattern redundant?
> 
> Why here the section about "extending basic graph pattern matching",
> when not even basic graph pattern matching has been properly introduced
> yet? If you want to only informally introduce about what matching you
> talk here, then I'd call section 5.1.2 simply "Basic Graph Pattern
> Matching" but I think I'd rather suggest to drop this section.
> 
> 
> 
> "with one solution requiring no bindings for variables"
> -->
> rather:
> "with one solution producing no bindings for variables"
> or:
> "with one solution that does not bind any variables"
> 
> Section 5.2.3
> 
> Why do you have a separate examples subsection here? It seems 
> superfluous/repetitive. Just put the last example, which seems to be the 
> only new one, inside Sec 5.2.1 where it seems to fit, and drop the two 
> redundant ones. For the first one, you
> could add "and that basic pattern consists of two triple patterns" to the
> first example in sec 5.2; for the second one, add the remark that "the
> FILTER does not break the basic graph pattern into two basic graph
> patterns" to the respective example in section 5.2.2.
> 
> 
> 
> Section 6:
> 
> One overall question which I didn't sort out completely so far:
> What if I mix OPTIONAL with FILTERs?
> 
> ie.
> 
> {A OPTIONAL B FILTER F OPTIONAL C}
> 
> is that:
> 
> {{A OPTIONAL B} FILTER F OPTIONAL C}
> 
> or rather
> 
> {{A OPTIONAL B FILTER F} OPTIONAL C}
> 
> and: would it make a difference? I assume not; the filter is, in both
> cases, at the level of A, but I am not 100% sure. Maybe such an example 
> would be nice to have...
> 
> 
> Another one about FILTERs: What about this one, ie. a FILTER which
> refers to the outside scope:
> 
> ?x p o OPTIONAL { FILTER (?x != s) }
> 
> concrete example:
> 
> SELECT ?n ?m
> { ?x a foaf:Person .  ?x foaf:name ?n .
>   OPTIONAL { ?x foaf:mbox ?m FILTER (?n != "John Doe") }  }
> 
> Suppresses the email address for John Doe in the output!
> Note: This one is interesting, since the OPTIONAL part may NOT be 
> evaluated separately!, but carries over a binding from the super-pattern!
> 
> Do you have such an example in the testsuite? It seems that the last 
> example in Section 12.2.2 goes in this direction; more on that later.
> 
> Would it make sense to add some non-well-defined OPTIONAL patterns,
> following [Perez et al. 2006] in the document? As mentioned before, I
> didn't yet check section 12, maybe these corner case examples are there..
> 
> 
> Section 7:
> 
> Why "unlike an OPTIONAL pattern"? This is comparing apples with pears...
> I don't see the motivation for this comparison, I would suggest to
> delete the part "unlike an OPTIONAL pattern".
> 
> 
> as described in Querying the Dataset
> -->
> as described in Section 8.3 "Querying the Dataset"
> 
> 
> Section 8
> 
> The example in section 8.2.3 uses GRAPH although GRAPH hasn't been
> explained yet. Either remove this section or start section 8.3 before it; 
> I think GRAPH should be introduced before giving an example using it.
> 
> <you may ignore this comment>
> BTW: Would be cool to have a feature creating a merge from named graphs
> as well...
> 
> ie. I can't have something like
> GRAPH g1
> GRAPH g2 { P }
> 
> where the merge of g1 and g2 is taken for evaluating P.
> whereas I can do this at the top level by several FROM clauses.
> (Note this is rather a wish-list comment than a problem with the current 
> spec, probably, might be difficult to define in combination with 
> variables...)
> </you may ignore this comment>
> 
> Section 8.2.3 makes more sense after the 8.3 examples, and 8.3.2 is
> simpler than 8.3.1, so, I'd suggest the order of subsections in 8.3
> 
> 8.3.2
> 
> 8.3.1
> 
> 8.3.3
> 
> 8.2.3
> 
> 8.3.4 (note that this example somewhat overlaps with what is shown in
> 8.2.3 already, but fine to have both, i guess.)
> 
> 
> 
> Section 9:
> 
> What is "reduced" good for? I personally would tend to make reduced the
> default, and instead put a modifier "STRICT" or "WITHDUPLICATES" which 
> enforces that ALL non-unique solutions are displayed.
> 
> "Offset: control where the solutions start from in the overall solution
> sequence."
> 
> maybe it would be nice to add: "[...] in the overall solution sequence, 
> i.e., offset takes precedence over DISTINCT and REDUCED"
> 
> at least, the formulation  "in the overall solution sequence" would
> suggest this... however, right afterwards you say:
> "modifiers are applied in the order given by the list above"... this
> seems somehow contradicting the "in the overall solution sequence", so
> then you should modify this to:
> "in the overall solution sequence, after application of solution
> modifiers with higher precedence" and give an explicit precedence to
> each solution modifier....
> 
> <you may ignore this comment>
> BTW: Why is the precedence of solution modifiers not simply the order in
> which they are given in a query? Wouldn't that be the simplest thing to do?
> 
> ie.
> 
> OFFSET 3
> DISTINCT
> 
> would be different than
> 
> DISTINCT
> OFFSET 3
> 
> depending on the order.
> Anyway, if you want to (which you probably do) stick with what you have
> now, it would at least be easier to read if you'd take the suggestion 
> with explicit precedence levels for each modifier.
> </you may ignore this comment>
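
A minimal Python sketch of why the order of application matters, using plain lists to stand in for solution sequences. The helper names `distinct` and `offset` and the sample data are assumptions made for illustration; they are not definitions taken from the draft.

```python
def distinct(solutions):
    """Drop repeated solutions, keeping the first occurrence of each."""
    seen = set()
    out = []
    for s in solutions:
        if s not in seen:
            seen.add(s)
            out.append(s)
    return out


def offset(solutions, n):
    """Skip the first n solutions of the sequence."""
    return solutions[n:]


# Hypothetical solution sequence with duplicates.
solutions = ["a", "a", "a", "b", "c"]

# OFFSET 3 applied first, then DISTINCT.
offset_then_distinct = distinct(offset(solutions, 3))

# DISTINCT applied first, then OFFSET 3.
distinct_then_offset = offset(distinct(solutions), 3)

print(offset_then_distinct)  # ['b', 'c']
print(distinct_then_offset)  # []
```

With this data the two orders give different results, which is why an explicit precedence (or a fixed textual order) needs to be stated.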
> 
> 
> Section 9.1
> 
> The ORDER BY construct allows arbitrary constraints/expressions as 
> parameter...ie. you could give an arbitrary constraint condition here,
> right? What is the order of that? TRUE > FALSE? Would be good to add a 
> remark on that.
> 
>  I would put 'ASCENDING' and 'DESCENDING' in normal font, since they look 
> like keywords here, but the respective keywords are ASC and DESC.
> 
> Stupid Question: What is the "codepoint representation"? ... Since more 
> people might be stupid, maybe a reference is in order.
> 
> 
> What is a "fixed, arbitrary order"??? Why not simply change
> 
> "SPARQL provides a fixed, arbitrary order"
> -->
> "SPARQL fixes an order"
> 
> and
> 
> "This arbitrary order"
> -->
> "This order"
> 
> I'd also move the sentence starting with "This order" after the 
> enumeration.
> 
> 
> Note that, in the grammar for OrderCondition I think you could write it 
> maybe shorter:
> 
> Wouldn't simply
>  orderCondition ::= ( 'ASC' | 'DESC' )? (Constraint | Var)
> do?
> 
>  In the paragraph above the grammar snippet, you forgot the ASK result 
> form where ORDER BY also doesn't play a role, correct?
> 
> Sec 9.2:
> 
> Add somewhere in the prose: "using the SELECT result form"...
> 
> It is actually a bit weird that you mix select into the solution 
> modifiers, IMO, it would be better to mention SELECT first in section 9 
> and then introducing the solution modifiers.
> 
> Sec 9.3:
> 
> REDUCED also allows duplicates, or no? you mention before that reduced 
> only *permits* elimination of *some* duplicates... so, delete the "or 
> REDUCED" in the first sentence.
> 
> 
> Sec9.4:
> As for reduced as mentioned earlier, my personal feeling is that 
> REDUCED, or even DISTINCT should be the default, since it is less 
> committing, and I'd on the contrary put an alternative keyword "STRICT" 
> or "WITHDUPLICATES" which has the semantics that really ALL solutions 
> with ALL duplicates are given. My personal feeling is that
> aggregates, which you mention in the "Warning" box, anyway only make 
> sense in connection with DISTINCT. Or you should include a good example 
> where not...
> 
> Sec 9.5/9.6:
> 
> OFFSET 0 has no effect, LIMIT 0 obviously makes no sense since the 
> answer is always the empty solution set... So why not simply allow only 
> positive integers for both? I see no benefit in allowing 0 at all.
> 
> Section 10:
> 
> "query form" or "result form"? I'd suggest to use one of both consistently
> and not switch.  Personally, I'd prefer "result form"...
> 
> Section 10.1
> 
> As for the overall structure, it might make sense to have the whole 
> section 10 before 9, since modifiers are anyway only important for 
> SELECT, and then you could skip the part on projection in section 9, as 
> SELECT is anyway not a solution modifier but a result form...
> You should call it also "projection" in section 10.1, ie. what I suggest 
> is basically merging section 10.1 and 9.2.
> 
> 
> Section 10.2
> 
> CONSTRUCT combines triples "by set union"?
> So, I need to eliminate duplicate triples if I want to implement
> CONSTRUCT in my SPARQL engine?
> Is this really what you wanted? In case of doubt, I'd suggest to
> remove "by set union", or respectively, analogously to SELECT,
> introduce a DISTINCT (or alternatively a WITHDUPLICATES)
> modifier for CONSTRUCT...
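
A small Python sketch of what "by set union" would imply for an implementation, assuming the template is instantiated once per solution and the results are collected in a set. The function `construct`, the `?`-prefixed string convention for variables, and the sample data are hypothetical and only serve the illustration.

```python
def is_var(term):
    """Variables are written as strings starting with '?' in this sketch."""
    return term.startswith("?")


def construct(template, solutions):
    """Instantiate each template triple for each solution; collect as a set."""
    graph = set()
    for sol in solutions:
        for triple in template:
            instantiated = tuple(sol.get(term, term) for term in triple)
            # Skip triples that still contain an unbound variable.
            if any(is_var(term) for term in instantiated):
                continue
            graph.add(instantiated)
    return graph


template = [("?x", "rdf:type", "foaf:Person")]

# Two solutions bind ?x to the same resource and differ only in ?n.
solutions = [
    {"?x": "ex:alice", "?n": "Alice"},
    {"?x": "ex:alice", "?n": "Ally"},
    {"?x": "ex:bob", "?n": "Bob"},
]

graph = construct(template, solutions)
print(len(graph))  # 2
```

Three solutions yield only two triples here, because the set silently drops the duplicate. An engine that streams triples out per solution would have to do that deduplication explicitly.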
> 
> BTW, I miss the semantics for CONSTRUCT given formally in Section 12.
> 
> 
> Section 10.2.1
> 
> <you may ignore this comment>
> What if I want a single blank node connecting all solutions? That would 
> be possible, if I could nest constructs in the FROM part...
> </you may ignore this comment>
> 
> 
> Section 10.2.3
> 
>  Hmm, now you use ORDER BY, whereas you stated before in Section 9.1 that 
> ORDER BY has no effect on CONSTRUCT... ah, I see, in combination with 
> LIMIT!
>  So, would it make sense, in order to emphasize what you mean, to change 
> in section 9.1
> 
> "Used in combination"
> -->
> "However, note that used in combination"
> 
> 10.3/10.4
> 
> I think that ASK should be mentioned before the informative DESCRIBE, 
> thus I suggest to swap these two sections.
> 
> Section 11
> 
> - Any changes in the FILTER handling from the last version? Is there a 
> changelog?
> - As mentioned earlier, I am a bit puzzled about the "evaluation" of 
> Constraints given as an argument to ORDER BY especially since there you 
> don't want to take the EBV but the actual value to order the solutions.
> (Note that what it means that a solution sequence "satisfies an order 
> condition" is also not really formally defined in Section 12!)
> 
> Apart from that, I did not check the section in full detail again since it 
> seems to be similar to the previous version, but some comments still:
> 
> "equivilence"?
> Do you mean equivalence? My dictionary doesn't know that word.
> 
> The codepoint reference should already be given earlier, as mentioned 
> above.
> 
> 
> Section 11.3.1
> 
> The operator extensibility makes me a bit worried with regard to the 
> nonmonotonic behavior of '! bound':
> In combination with '! bound', does it still hold that
> "SPARQL extensions" will produce at least the same solutions as an 
> unextended implementation and may, for some queries, produce more 
> solutions? I have an uneasy feeling here, though not substantiated by 
> proof/counterexample.
> 
> 
> Section 12 :
> 
> 12.1.1
> 
> 
> Is the necessity that the u_i's are distinct in the dataset really 
> important?
> Why not also define the data corresponding to the respective URI as 
> graph merge then, like the default graph?
> 
> 
> 12.2
> 
> The two tables suggest there is a correlation between the patterns and 
> modifiers appearing in the same line of the table, which is not the case.
> 
> Also, why are RDF terms and triple patterns in one line of the first 
> table and not separate?
> 
> Why do you write
>    FILTER(Expression)
> but not
>   ORDER BY (Expression)
> as the syntax suggests?
> 
> Moreover, the tables should be numbered.
> 
> You first use the abbreviation BGP for basic graph pattern in the second 
> table, but it hasn't been introduced. Actually, it would be more intuitive 
> if you actually used *symbols* for your algebra, like e.g. the ones from 
> traditional Relational Algebra, as was done in [Perez et al. 2006].
> 
> "The result of converting such an abstract syntax tree is a SPARQL query 
> that uses these symbols in the SPARQL algebra:"
> -->
> "The result of converting such an abstract syntax tree is a SPARQL query 
> that uses the following  symbols in the SPARQL algebra:"
> or maybe even better:
> "The result of converting such an abstract syntax tree is a SPARQL query 
> that uses the symbols introduced in Table 2 in the SPARQL algebra:"
> 
> What is "ToList"?
> 
> 12.2.1
> 
> The steps here  refer to the grammar?
> The steps obviously take the parse tree nodes of the grammar as the 
> basis...
> anyway this is neither explained nor entirely clear.
> 
> then connected with 'UNION'
> -->
> connected with 'UNION'
> 
> What do you mean by
> 
> "We introduce the following symbols:"
> 
> 1) what you define here is not 'symbols'
> 2) This doesn't seem to be a proper definition but just a bullet
>   list without further explanation.
> 
> as said before, the symbols, should indeed be symbols and be defined 
> properly in section 12.2 with the tables, in my opinion.
> 
> The algorithm for the transformation is a bit confusing, IMO. It seems 
> to be pseudo-code for a recursive algorithm, but it is not clear where 
> there are recursive calls.
> 
> Is the observation correct that in this algebra (following the algorithm)
> 
>     A OPTIONAL {B FILTER F}
> 
> would be the same as
> 
>    A  FILTER F OPTIONAL {B}
> 
> ???
> 
> ie, both result in:
> 
>  LeftJoin(A,B,F)
> 
> That is not necessarily intuitive in my opinion.
> Take the concrete example from above:
> 
> SELECT ?n ?m
> { ?x a foaf:Person .  ?x foaf:name ?n .
>   OPTIONAL { ?x foaf:mbox ?m FILTER (?n != "John Doe") }  }
> 
> As I said, in my understanding, this query could be used to suppress
> email addresses for a particular name, whereas the algorithm suggests
> that this is just the same as writing:
> 
> SELECT ?n ?m
> { ?x a foaf:Person .  ?x foaf:name ?n . FILTER (?n != "John Doe")
>   OPTIONAL { ?x foaf:mbox ?m  }  }
> 
> Is this intended? If yes, the last example of section 12.2.2 is wrong.
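
To make the difference between the two readings concrete, here is a Python sketch. It assumes left-join semantics in which the filter is tested on the merged solution and the left-hand solution is kept unextended when no right-hand solution passes. The functions `left_join` and `filter_solutions` and the sample data are hypothetical, not the draft's definitions.

```python
def compatible(m1, m2):
    """Two solutions are compatible if they agree on all shared variables."""
    return all(m2[k] == v for k, v in m1.items() if k in m2)


def merge(m1, m2):
    return {**m1, **m2}


def left_join(left, right, cond):
    """Extend each left solution where possible; otherwise keep it as is."""
    out = []
    for l in left:
        extended = [merge(l, r) for r in right
                    if compatible(l, r) and cond(merge(l, r))]
        out.extend(extended if extended else [l])
    return out


def filter_solutions(cond, solutions):
    return [s for s in solutions if cond(s)]


# Hypothetical solutions for A (person and name) and B (mailbox).
A = [{"x": "p1", "n": "John Doe"}, {"x": "p2", "n": "Jane Roe"}]
B = [{"x": "p1", "m": "mailto:john@example.org"},
     {"x": "p2", "m": "mailto:jane@example.org"}]


def F(s):
    return s["n"] != "John Doe"


# FILTER inside the OPTIONAL: the filter becomes the left-join condition.
inside = left_join(A, B, F)

# FILTER at the level of A: filter first, then an unconditional left join.
outside = left_join(filter_solutions(F, A), B, lambda s: True)

print(inside)
print(outside)
```

Under these assumptions the first form keeps John Doe but without a mailbox, while the second drops John Doe altogether, so the two queries are not interchangeable.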
> 
> BTW: If so, it seems that the whole second part of the algorithm can be 
> simplified to:


-- 
Dr. Axel Polleres
email: axel@polleres.net  url: http://www.polleres.net/
Received on Tuesday, 17 April 2007 17:43:02 UTC