To Last Call/Federated Query Review

From SPARQL Working Group
Revision as of 18:47, 14 March 2011 by Cbuilara


This partially completes my ACTION-284 on reviewing Federated Query. Part 1 of my review follows.

I have not really got to the meat of Carlos' changes yet, I believe; so far I mainly have feedback on the examples. In general, I think the examples should make clearer what they illustrate. Apart from that, I have some editorial feedback.



1) Remove: "Please refer to the errata for this document, which may include some normative corrections.

The previous errata for this document, are also available.

See also translations.

This document is also available in these non-normative formats: XML and XHTML with color-coded revision indicators. "

-> removed

2)

"This specification defines the syntax and semantics of a SPARQL 1.1 Query extension for executing distributed queries."

- better? ->

"This specification defines the syntax and semantics of a SPARQL 1.1 Query extension for executing queries distributed over different endpoints."

-> changed

3) We should have this either in all or none of our documents:

"The documents produced by this Working Group are:

   * SPARQL 1.1 Query
   * SPARQL 1.1 Federation Extensions (this document)
   * SPARQL 1.1 Update
   * SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs
   * SPARQL 1.1 Protocol for RDF
   * SPARQL 1.1 Service Description
   * SPARQL 1.1 Entailment Regimes
   * SPARQL 1.1 Property Paths
   * SPARQL 1.1 Conformance Tests

"

-> removed, keeping it only in the main document

4) "This publication includes the extension SERVICE to the SPARQL 1.1 Query specification. The structure of this document will change to fully integrate the new features."

-->

"This publication describes the SERVICE extension to the SPARQL 1.1 Query specification."

-> changed

5) Remove: "The design of the features presented here is work-in-progress and does not represent the final decisions of the working group. Implementers and application writers should not assume that the designs in this document will not change."

-> removed

6)

"This document will be presented to the SPARQL Working Group, which is part of the W3C Semantic Web Activity." --> "This document was produced by the SPARQL Working Group, which is part of the W3C Semantic Web Activity."

-> changed

7) Add:

"Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress."

-> added

8) Section 1

"The growing suite of SPARQL query services offer consumers an opportunity to merge data distributed across the web. A small number of extensions to SPARQL 1.1 enable expression of the merging queries. In particular, a SERVICE allows one to direct a portion of a query to a particular SPARQL query service, just as a GRAPH directs queries to particular named graphs. This specification defines the syntax and semantics of these extensions. "

-->

"The growing number of SPARQL query services offers consumers an opportunity to merge data distributed across the web. The SERVICE extension allows one to direct a portion of a query to a particular SPARQL query service, similar to a GRAPH graph pattern, which "directs" queries to particular named graphs in the (local) dataset. This specification defines the syntax and semantics of this extension."

-> changed

9) Meta-remark across all documents: we should have consistent capitalization of "Web" vs "web", "Semantic Web" vs "semantic web", etc.

-> changed

10) Remove:

"The SPARQL query language is closely related to the following specifications:

   * The SPARQL Query for RDF [SQRY] specification defines a language for matching and reporting on RDF data.
   * The SPARQL Protocol for RDF [SPROT] specification defines the remote protocol for issuing SPARQL queries and receiving the results.
   * The SPARQL Query Results XML Format [RESULTS] specification defines an XML document format for representing the results of SPARQL SELECT and ASK queries."

-> removed

11) Section 1.1

You refer to fn: and rdfs:, neither of which is used in the document. In general, I suggest you just say:

"This document uses the same conventions as and terminology from the SPARQL 1.1 Query document [Ref]."

-> changed

12) Editorial note in the beginning of the doc:

"Editorial note The BINDINGS section will be moved to the SPARQL query main document: SPARQL 1.1 Query . All references to BINDINGS in this document will be removed."


Not sure, but wouldn't we want to actually leave the BINDINGS *example* in the document? The example in the query doc is not about the combination of SERVICE with BINDINGS. I think the example at least makes sense.

-> example added

13) SECTION 2

Given that BINDINGS is now defined in Query, this should be renamed to

"SPARQL 1.1 Basic Federation Extension" -> done

and I'd change

"Queries over distributed data often entail querying one source and using the acquired information to constrain queries of the next source. This section covers the SERVICE operator giving examples of how to use it and its behavior."

to

"Querying distributed SPARQL endpoints often involves querying one source and using the acquired information to constrain queries of the next source. This section illustrates by example how this can be achieved using SPARQL 1.1's SERVICE graph patterns."

-> done

I'd then remove subsection heading 2.1 and make subsubsections

2.1.1 -> 2.1, 2.1.2 -> 2.2, 2.1.3 -> 2.3, 2.1.4 -> 2.4, 2.1.5 -> 2.5 -> done

2.2 BINDINGS -> 2.6 Using SERVICE in combination with BINDINGS -> done

(in the following comments I will still use the old section numbers)


14) 2.1.1 "For instance, an endpoint which contains information about people working:

Data in <http://people.example/sparql> endpoint:"


not a sentence...

Next, I'm not sure about the names. Are these names of real people? I would rather use fictitious ones.

Also, I don't find an example that just queries a remote endpoint, without joining the data with any local data, very useful (in that case I can query the endpoint directly, so why would I want to use SERVICE here?). So I suggest rewriting the whole example as follows:


For instance, let us assume a SPARQL service endpoint available at <http://people.example/sparql> that contains the following data in its default graph:

  <http://example.org/people/people15>  <http://xmlns.com/foaf/0.1/name>     "Alice" .
  <http://example.org/people/people16>  <http://xmlns.com/foaf/0.1/name>     "Bob" .
  <http://example.org/people/people17>  <http://xmlns.com/foaf/0.1/name>     "Charles" .
  <http://example.org/people/people18>  <http://xmlns.com/foaf/0.1/name>     "Daisy" .

which I want to combine with my local FOAF file at <http://example.org/myfoaf.rdf> that contains the single triple:

   <http://example.org/myfoaf/I> <http://xmlns.com/foaf/0.1/knows>  <http://example.org/people/people15> .

The following query retrieves the names of the people I know from the remote SPARQL service.

Query:

 SELECT ?name
 FROM <http://example.org/myfoaf.rdf>
 WHERE {
   <http://example.org/myfoaf/I> <http://xmlns.com/foaf/0.1/knows> ?person .
   SERVICE <http://people.example/sparql> {
     ?person <http://xmlns.com/foaf/0.1/name> ?name . }
 }

This query, on the data above, has one solution.

Query Result:

 name
 "Alice"

-> changed
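
The join that the rewritten example performs can be sketched in a few lines of Python. This is only a toy model under stated assumptions: plain lists of triples stand in for the local FOAF file and for the remote endpoint's default graph, and the helper names (`match`, `names_of_people_i_know`) are made up for illustration. It is not how a SPARQL engine evaluates SERVICE.

```python
# Toy model of the example above: plain Python lists stand in for the two
# graphs, so no SPARQL engine or network access is involved.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
ME = "http://example.org/myfoaf/I"

# Triples of the local FOAF file <http://example.org/myfoaf.rdf>.
local_graph = [
    (ME, FOAF_KNOWS, "http://example.org/people/people15"),
]

# Default graph of the remote endpoint <http://people.example/sparql>.
remote_graph = [
    ("http://example.org/people/people15", FOAF_NAME, "Alice"),
    ("http://example.org/people/people16", FOAF_NAME, "Bob"),
    ("http://example.org/people/people17", FOAF_NAME, "Charles"),
    ("http://example.org/people/people18", FOAF_NAME, "Daisy"),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples in graph that agree with every non-None position."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def names_of_people_i_know(local, remote):
    names = []
    # Local part of the WHERE clause: <I> foaf:knows ?person
    for _, _, person in match(local, s=ME, p=FOAF_KNOWS):
        # SERVICE part: ?person foaf:name ?name, joined on ?person
        for _, _, name in match(remote, s=person, p=FOAF_NAME):
            names.append(name)
    return names


print(names_of_people_i_know(local_graph, remote_graph))  # ['Alice']
```

Only "Alice" comes back because the local graph constrains ?person before the remote names are joined in, which is exactly the point the example is meant to make about combining local data with SERVICE.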


15) Section 2.1.2

Again, I'd change the name to "Alice" -> changed

Is this example illustrating something that the first example doesn't illustrate? Is it so much different to have two service queries? It would be good to have a sentence at the beginning of each example that explains what it should show.

"For instance, an endpoint which contains information about people working:"

-> example removed, it does not say anything new

--> "Several SERVICE patterns can be combined in the same query to join results from different SPARQL service endpoints. For example, let us now assume two service endpoints which contain information about people and projects as follows."

16) Section 2.1.3

Again, there's no rationale for what this example should illustrate. I assume something like "SERVICE patterns can be nested and used within other complex patterns, e.g. within OPTIONAL patterns. We again assume two SPARQL endpoints containing information about people and projects."

I don't think the example is correct as it stands, BTW... I think as you wrote it, it should only return the first three results.

Isn't what you want to write rather:


 PREFIX people: <http://people.example/ns#>
 PREFIX project: <http://project.example/ns#>
 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 SELECT ?name ?projectName
 WHERE {
   SERVICE <http://people.example/sparql> {
     ?people foaf:name ?name .
     OPTIONAL { ?people people:worksIn ?project .
       SERVICE <http://project.example/sparql> {
         ?project project:hasTitle ?projectName . } }
   }
 }

That would IMO return the results you put, and also illustrate nested SERVICE patterns.

-> changed to this example

17) Section 2.1.4

The use of dcterms:subject for a numeric id is a bit awkward; dcterms:subject is meant to point at a subject/topic. I suggest changing the example to something like the following:


We assume the following data about SPARQL endpoints for various projects in certain subject categories in the default graph:

 @prefix void: <http://rdfs.org/ns/void#> .
 @prefix dc: <http://purl.org/dc/elements/1.1/> .
 @prefix doap: <http://usefulinc.com/ns/doap#> .

 [] dc:subject "Querying RDF" ;
    void:sparqlEndpoint <http://projects1.example/SPARQL> .

 [] dc:subject "Querying RDF remotely" ;
    void:sparqlEndpoint <http://projects2.example/SPARQL> .

 [] dc:subject "Updating RDF remotely" ;
    void:sparqlEndpoint <http://projects3.example/SPARQL> .

Data in default graph at SPARQL service endpoint http://projects2.example/SPARQL:

 _:project1 doap:name "Querying remote RDF Data" .
 _:project1 doap:created "2011-02-12"^^xsd:date .
 _:project2 doap:name "Querying multiple SPARQL endpoints" .
 _:project2 doap:created "2011-02-13"^^xsd:date .

Data in default graph at SPARQL service endpoint http://projects3.example/SPARQL:


 _:project3 doap:name "Update remote RDF Data" .
 _:project3 doap:created "2011-02-14"^^xsd:date .

We now want to query the names of the projects whose subject contains "remote".


Query:

 PREFIX void: <http://rdfs.org/ns/void#>
 PREFIX dc: <http://purl.org/dc/elements/1.1/>
 PREFIX doap: <http://usefulinc.com/ns/doap#>

 SELECT ?service ?projectName
 WHERE {
   # Find the service with subject "remote".
   ?p dc:subject ?projectSubject ;
      void:sparqlEndpoint ?service
      FILTER regex(?projectSubject, "remote")
   # Query that service projects.
   SERVICE ?service {
      ?project  doap:name ?projectName . }
 }


The bindings of ?service provide the location of the service to query, yielding:

Query result:

 service                             projectName
 <http://projects2.example/SPARQL>   "Querying remote RDF Data"
 <http://projects2.example/SPARQL>   "Querying multiple SPARQL endpoints"
 <http://projects3.example/SPARQL>   "Update remote RDF Data"


-> changed to this example
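
To check the expected rows of this example by hand, the evaluation can be mimicked in Python. This is a rough sketch under assumptions: the `catalog` list and the `endpoints` dictionary are hypothetical stand-ins for the local default graph and the remote services, and only the doap:name values are modelled. No data is given above for projects1.example, so it has no entry; the FILTER excludes it before any lookup would happen.

```python
import re

# Local default graph, reduced to (dc:subject, void:sparqlEndpoint) pairs.
catalog = [
    ("Querying RDF", "http://projects1.example/SPARQL"),
    ("Querying RDF remotely", "http://projects2.example/SPARQL"),
    ("Updating RDF remotely", "http://projects3.example/SPARQL"),
]

# Hypothetical stand-in for the remote endpoints: IRI -> doap:name values.
endpoints = {
    "http://projects2.example/SPARQL": [
        "Querying remote RDF Data",
        "Querying multiple SPARQL endpoints",
    ],
    "http://projects3.example/SPARQL": [
        "Update remote RDF Data",
    ],
}


def project_names_by_subject(pattern):
    rows = []
    for subject, service in catalog:
        # FILTER regex(?projectSubject, "remote")
        if not re.search(pattern, subject):
            continue
        # SERVICE ?service { ?project doap:name ?projectName }
        for project_name in endpoints[service]:
            rows.append((service, project_name))
    return rows


for row in project_names_by_subject("remote"):
    print(row)
```

The sketch yields two rows for projects2.example and one for projects3.example, since the pattern "remote" matches the subjects "Querying RDF remotely" and "Updating RDF remotely" but not "Querying RDF".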

18) "Editorial note When having variables for specifying the address of a SPARQL endpoint in a SERVICE operation this variable must be bounded. In order to clearly define what "must be bounded" mean we point to a boundedness definition. This is still an issue for the SPARQL Working Group, as it the question of having variables in SERVICE calls at all. Feedback from the community is encouraged."

Is this Ed note still appropriate here?

-> removed the editorial note, but I kept the link to the boundedness definition; I think it makes sense to have such a definition

19) 2.1.5

"SERVICE execution may fail due to several reasons: server down, wrong endpoint IRI, or there may be no results from the query. In order to allow users to continue with the other parts of the query we propose to use a service silent operation Service(IRI,G,P,SilentOpt) which is false by default."

--> "The execution of a SERVICE pattern may fail for several reasons: the remote service may be down, the service IRI may not be dereferenceable, or the endpoint may return an error to the query. Normally, under such circumstances the invoking query containing a SERVICE pattern fails as a whole. However, SPARQL 1.1 allows failed SERVICE requests to be tolerated explicitly by means of the keyword 'SILENT'."

-> changed
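
The difference between a plain and a SILENT SERVICE pattern can be illustrated with a small Python sketch. It assumes the reading proposed in Lee's review further down this page, namely that a failed SILENT SERVICE clause is treated as a single solution with no bindings; `ServiceError`, `evaluate_service` and `failing_endpoint` are hypothetical names introduced only for this illustration.

```python
class ServiceError(Exception):
    """Raised by the stand-in for a failed remote request."""


def evaluate_service(invoke, silent=False):
    """Return the remote solutions, tolerating failure only when silent."""
    try:
        return invoke()
    except ServiceError:
        if silent:
            # Assumed reading: one solution with no bindings.
            return [{}]
        raise  # without SILENT the whole query fails


def join(left, right):
    """Merge every pair of solutions that agree on their shared variables."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def failing_endpoint():
    raise ServiceError("endpoint unreachable")


# Solutions from the local part of the query.
local = [{"person": "http://example.org/people/people15"}]

# With SILENT the local solutions survive the join unchanged.
print(join(local, evaluate_service(failing_endpoint, silent=True)))
```

Under this reading the empty solution is compatible with every local solution, so the rest of the query still produces its results, only without the variables the remote service would have bound.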

Again, I'd prefer "Alice" -> all examples changed to Alice & friends

"Query result if an error happens when querying the remote SPARQL endpoint::" --> "Query result if an error happens when querying the remote SPARQL endpoint:" -> fixed


20) Section 2.1.6 is obscure to me. It talks about two results when there is one, and it talks about a query when there is no query. I suggest simply removing that section.

-> removed section

21) Section 2.2 BINDINGS

"In order to efficiently communicate constraints to sparql endpoints, the queryier may follow the WHERE clause with BINDINGS. In order to efficiently address the constraints, the query on http://people.example/data could be expressed as follows:"

I don't understand entirely, as in case BINDINGS doesn't appear in the SERVICE clause, the "constraints" don't even reach the remote endpoint... shouldn't we reformulate the example to actually have the BINDINGS *within* the SERVICE pattern?

That would make more sense to me.

Accordingly, I would suggest to rephrase:

"In order to efficiently communicate constraints to sparql endpoints, the requester may use SERVICE in combination with a BINDINGS clause (see [SPARQL 1.1 Query], Section 18.2.5.6 BINDINGS). In order to efficiently address the constraints, the query on http://people.example/data could be expressed as follows:"

Also, note that the advantage of BINDINGS only comes across if you use several bindings, since a single binding can be written directly into the query. So, I would suggest thinking of a better example or dropping the BINDINGS section altogether.

-> I removed the whole BINDINGS section, I will put it back

22) Section 3 on syntax can be dropped. The syntax is clear from the grammar and illustrated with the examples already, I don't think the schematic syntax adds anything.

-> section removed

to be continued... at section 4.

Here comes the rest of my review... starting at section 4.


1) What do you mean by

"We introduce the following symbols:

   * Join(Pattern, Pattern)
   * LeftJoin(Pattern, Pattern, expression)
   * Filter(expression, Pattern)
   * UNION(Pattern, Pattern)

"

these are defined in the query doc, they don't need to be re-introduced, right? -> right, I removed them

I understand that you want to extend the transformation rules for GroupGraphPattern from Section 18.2.2.4 Translate Graph Patterns of [SPARQL 1.1 Query Language], since you want to reuse information about variables already bound. Fine, but that should be said/explained.

So instead of the "We introduce" part, say:

"In order to define the transformation of SERVICE patterns we extend the transformation of GroupGraphPattern from Section 18.2.2.4 Translate Graph Patterns of [SPARQL 1.1 Query Language], since we assume the Service invocation

-> changed

2)

Why do you have two different definitions for

 Definition: Evaluation of a Service Pattern

and

 Definition: Service Silent Function


Can't they be merged into one, where SilentOpt is just a boolean flag that's true for SILENT (in which case execution doesn't fail) and false otherwise (where overall execution fails)?

-> yes, I added Silent Function to Evaluation of a Service Pattern

3) I think this looks weird to me:


if IRI is a SPARQL service Service(IRI,G,P)) = Invocation( IRI, vars n bound, P, Bindings(G, vars) )

eval(D(G), Service(var,G,P)) =

    Let R be the empty multiset
    foreach i in O(?var->i)
       if i is an IRI
         R := Union(R, Join( Invocation( i, vars n bound, P, Bindings(G, vars) ) , O(?var->i) ) )
       else
         exection fails.
    the result is R

shouldn't this rather be:


if IRI is a SPARQL service
    Service(IRI,G,P)) = Invocation( IRI, vars n bound, P, Bindings(G, vars) )
else:
    eval(D(G), Service(var,G,P)) =

    Let R be the empty multiset
    foreach i in O(?var->i)
       if i is an IRI
         R := Union(R, Join( Invocation( i, vars n bound, P, Bindings(G, vars) ) , O(?var->i) ) )
       else
         execution fails.
    the result is R

Also, by only projecting "vars intersect bound", you can get strange effects, since the evaluation becomes order dependent. I am not sure whether that is implied by the algorithm referred to in Section 4.1. There you have:


"For each element E in the GroupGraphPattern"

note that this - per se - doesn't imply any order of the elements in GroupGraphPattern

However, I assume that you assume/imply that

 { P1 SERVICE i {... } P2 }

behaves differently from

 { P2 SERVICE i {... } P1 }

do you? -> yes, if P1 and P2 are group graph patterns. If they are just triple patterns, they behave the same: P1 = ?s1 ?p1 ?o1 and P2 = ?s2 ?p2 ?o2 would behave the same

-> that is a part I am not clear about; it was written by Eric, and I did not work much on it, since my idea was to incorporate the semantics we discussed by email a couple of weeks ago.
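
One possible reading of the Service(var,G,P) loop in item 3 can be written out in Python. This is a sketch of my interpretation of the pseudocode, not the specification's definition: solution mappings are modelled as dicts, the Invocation(...) call is reduced to a hypothetical `invoke(iri)` callback, and `is_iri` is a deliberately crude stand-in for a real IRI check.

```python
def is_iri(value):
    # Crude stand-in: anything that looks like scheme://... counts as an IRI.
    return isinstance(value, str) and "://" in value


def join(left, right):
    """Merge every pair of solutions that agree on their shared variables."""
    out = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def eval_service_var(var, omega, invoke):
    """Evaluate a SERVICE pattern whose endpoint is given by a variable.

    omega  -- solutions that already bind var (assumed bound in each one)
    invoke -- hypothetical callback returning the remote solutions for an IRI
    """
    result = []  # R := the empty multiset
    for mu in omega:
        i = mu[var]
        if not is_iri(i):
            raise ValueError("SERVICE variable is not bound to an IRI")
        # R := Union(R, Join(Invocation(i, ...), O(?var->i)))
        result.extend(join(invoke(i), [{var: i}]))
    return result


remote = {
    "http://projects2.example/SPARQL": [{"projectName": "Querying remote RDF Data"}],
    "http://projects3.example/SPARQL": [{"projectName": "Update remote RDF Data"}],
}
omega = [{"service": iri} for iri in remote]
for row in eval_service_var("service", omega, lambda iri: remote[iri]):
    print(row)
```

Joining each invocation result with the single mapping for ?var is what keeps the endpoint IRI attached to the rows it produced; the sketch leaves out the "vars n bound" projection, which is the part the order-dependence question above is about.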

4) What about

" @@All binary operators that have open LHS: new UNION, MINUS, (NOT)EXISTS

  @@SubSELECT??

"

?

-> I fixed it using Lee's comments

5) I skipped sections 4.2 and 4.4, assuming they will be removed

-> I removed them

Lee's review

This review discharges my ACTION-385.

Overall: I think this specification still needs a fair amount of work before it is ready for Last Call. Please see my detailed comments below.


  • Title: Since we've moved BINDINGS to the query document, should we change the name of this document, since there are not multiple "Extensions"? Perhaps "SPARQL 1.1 Federated Query"? -> changed to Fed Query as you suggest

  • Status of this Document -- this should be updated to reflect the fact that this is now an active WG editor's draft. That said, this isn't that important since the SotD gets replaced when we publish as a WD. -> there is a reference to the editor's WD, is that enough?

  • 1. Introduction. Suggest rewording the first few sentences like: """ This specification defines the syntax and semantics of the SERVICE keyword for SPARQL 1.1. The SERVICE keyword extends SPARQL 1.1 to support queries that merge data distributed across the Web. """ -> rephrased according to Axel's comments

  • 1. Introduction. Replace the listing of other documents with the full set of documents or a pointer to the overview document. -> I removed it following Axel's comments

  • 1.1.1 The only prefix listed here that's used elsewhere in the document is "xsd:". -> I removed it following Axel's comments

  • 1.1.2 I don't think the formal definition of "binding" helps much. If you do want to keep it, it should be in section 1.1.3 Terminology. -> moved to terminology

  • 1.1.3 No need to repeat the information about IRIs. I'd remove the first paragraph.


  • 1.1.3 "and used in SPARQL" => "and reused in this document" ? -> changed
  • 2.2 BINDINGS needs to be removed. It can be referenced informatively from this document, but should not have its own section. Once this is done, Section 2 needs to be restructured so that it is all about the SERVICE keyword.

-> Almost removed; I kept it following Axel's comments

  • I think there needs to be at least an informative, informal explanation of the SERVICE keyword before diving in with examples. Perhaps the examples should all come after the syntax and semantics sections.

-> added an explanation for each example

  • I think the examples should be more consistent both in presentation and content. Specifically, the example in 2.1.3 specifies that the data is part of the default graph for the endpoint, but the previous examples don't specify that. More troubling is the fact that the data in 2.1.3 uses blank node subjects whereas the previous examples use URIs. It seems to me that all the examples could use a common set of endpoints and data which could be presented upfront before the examples. This might be easier to work with and less distracting as you read from one example to another.

-> I changed all the examples in the document; I hope it is better now

  • In 2.1.4, is this an appropriate use of dcterms:subject? -> no, changed following Axel's comments
  • I haven't tested this, but I think there's a syntax error in the 2.1.4 example - there should be a "." after the "void:sparqlEndpoint" triple pattern, right? -> it is not necessary; I tested it on other SPARQL endpoints such as bio2rdf

  • What is the status of variables in SERVICE? There is still an editorial note by the example in 2.1.4 that says this is unresolved. We need to clarify this before Last Call. -> I'd like to keep the definition of what "must be bounded" means; I deleted the editorial note

  • 2.1.5 Why is returning no results considered a failure condition? I would omit that. -> it returns no results instead of failing because of the SILENT token

  • 2.1.5 This text needs to be cleaned up to be more prescriptive. In particular, it should not say that "we propose" something or other. It should say something like "The SILENT keyword indicates that errors encountered while accessing a remote SPARQL endpoint should be ignored while processing the query. The failed SERVICE clause is treated as if it had a result of a single solution with no bindings."

-> fixed using Axel's comments

  • 2.1.5 This section is presenting examples. It should not talk about algebra constructs such as the Service(...) construct.

-> removed

  • 2.1.5 The example should be improved. It should include valid data at the endpoint and indicate the comparative results when there is an error at the remote endpoint and when there is not an error. To make this as clear as possible, there should be another part to the query that just accesses the local default graph.

  • 2.1.6 This section needs to be rewritten. It is very hard to understand. Here are some of the issues with it:

  * There is no example.
  * I don't understand the comparison with GRAPH.
  * The terminology needs to be tightened to align with terminology used in SPARQL query. (E.g., what is a "querying system"?)
  * As with 2.1.5, the text here should be more prescriptive, and less speculative sounding.

-> subsection removed

  • 3 Syntax - I don't think all of these examples are necessary. I think that one sentence about the SERVICE clause would suffice, along with the grammar rules. That would be more consistent with how SPARQL 1.1 Query presents syntax.

  • This section should include the SILENT keyword.
  • Remove 3.2.

-> I removed the whole section 3


  • 4.1 I don't think this should restate the text from SPARQL 1.1 Query. It should only include the new additions to the algorithm, along with a clear reference to where the new bit is inserted.

  • 4.1 This doesn't seem to take SILENT into consideration.
  • 4.1 The algebra expression given for an example seems to be completely incorrect. Can this be checked?

  • 4.1. I think the definition is unclear as written. Specific questions I have are:

  * What is "B"? -> BINDINGS
  * What does "if IRI is a SPARQL service" mean? -> if it is a valid URL pointing to a SPARQL endpoint
  * What is omega? -> the solution variables
  • Remove 4.2. -> done
  • I'm unclear as to how 4.1 relates to 4.3? -> removed section 4.3 following the discussion we had by email
  • Remove 4.4 -> removed
  • 4.5 needs to be explained in the context of 4.1 and 4.3.
  • The conformance section needs to be tuned specifically to federated query.


Lee