Lee's feature proposal
The following is a proposal for the features that the SPARQL WG should adopt. It is an attempt to reach consensus by balancing previously stated goals, including:
- group preference
- group energy
- implementation experience
- utility to developers
- utility to end users
(Roughly in this order.)
This proposal has 5 mandatory features and 5 time/energy-permitting features. This is more than I think is desirable, but I have a hard time making the proposal narrower.
The required features include the three features identified early on as having the highest level of consensus.
I've also included project expressions, the ability to include arbitrary expressions in a SELECT clause, as a required feature. The aggregate feature already requires the group to find a way to include values not explicitly mentioned in the RDF dataset in a query's results (i.e. the computed value of aggregate functions), and it seems confusing and unnecessarily limiting not to allow the same or a similar (syntactic) mechanism to introduce new scalar values into query result sets. In addition, project expressions in conjunction with the other required features enables the same capabilities as various other proposed features, including assignment and scalar expressions in construct. Project expressions received significant but not overwhelming WG support in our survey, with five organizations ranking it amongst their top four features, and no organizations explicitly objecting to it. Project expressions is widely implemented in existing SPARQL engines.
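To make the parallel between aggregates and project expressions concrete, here is a minimal Python sketch. It assumes solution rows are held as plain dictionaries; the data and the helper names `project` and `aggregate_sum` are hypothetical and do not correspond to any SPARQL engine's API. Both helpers produce values that appear nowhere in the underlying data, which is the sense in which the two features need a similar mechanism.

```python
# Hypothetical solution rows, as an engine might hold them before projection.
SOLUTIONS = [
    {"item": "ex:pen", "price": 2.0, "qty": 3},
    {"item": "ex:ink", "price": 5.0, "qty": 1},
]


def project(solutions, expressions):
    """Project expression: evaluate each named expression once per row."""
    return [
        {name: expr(row) for name, expr in expressions.items()}
        for row in solutions
    ]


def aggregate_sum(solutions, expr):
    """Aggregate: fold one expression over all rows into a single value."""
    return sum(expr(row) for row in solutions)


# A per-row computed value (scalar expression in the projection).
rows = project(
    SOLUTIONS,
    {"item": lambda r: r["item"], "total": lambda r: r["price"] * r["qty"]},
)

# A single computed value over the whole result set (aggregate).
grand_total = aggregate_sum(SOLUTIONS, lambda r: r["price"] * r["qty"])
```

In this sketch the only difference between the two is whether the expression is evaluated per row or folded across rows.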
Finally, I suggest that service description be a required deliverable of the Working Group. While there are various design pieces to draw on, service description would require the Working Group to do a fair bit of design work. However, I believe that this sort of leading-edge-of-the-curve design work is appropriate for the SPARQL WG in the case of a feature such as service description that is an extensibility point and an enabler for future standardization efforts. Service description provides a standard way for extended SPARQL implementations to advertise their capabilities, and in doing so encourages similar implementations to coalesce around common syntax and semantics of extensions. It can be used to advertise entailment regimes, extended surface syntax, data set information (including optimization hints for federation), supported functions, and much more. Service description received moderate WG support in the survey (5 organizations including it in their top 10), and no organizations explicitly objected to it. Under Condorcet ranking, service description is preferred to everything except the top 3 features and negation. (See below for more on negation.)
I've included five time-permitting features in this proposal, ranked roughly in the order in which I believe the group should pursue them. I acknowledge at the same time that some of these efforts can reasonably go on in parallel, either with other time-permitting features or with the development of required features.
I believe that SPARQL/OWL is an important deliverable for this WG. The SPARQL community sees something of a divide between those using SPARQL purely to query RDF graphs and those using SPARQL in conjunction with richer semantics. The original SPARQL effort acknowledged this by providing a mechanism to define extensions that would define basic graph pattern matching for entailment regimes other than simple entailment. This extension mechanism is key to enabling groups other than the SPARQL working group (whether formal or informal groups) to define how SPARQL queries behave in the presence of other semantic regimes. But the extension mechanism has never been formally tested, and it seems prudent to test it (a) under the auspices of the SPARQL WG, so that the results may feed back into the SPARQL BGP extension specification itself, and (b) in the context of OWL semantics, probably the most popular richer entailment regime that currently exists. There are numerous implementations that implement SPARQL/OWL already, though likely not in an interoperable fashion. And in the person of Bijan Parsia, the SPARQL WG has the expertise and energy necessary to properly specify the SPARQL/OWL basic graph pattern matching extension. SPARQL/OWL received minimal support in the survey, but seemed to have a somewhat warmer reception in the discussion on the April 28 teleconference.
I believe that property paths is an important deliverable for the WG as it enables variable-length path queries for SPARQL developers. It has significant support within the WG, and it also enables most cases of the proposed feature for accessing RDF lists.
I believe that surface syntax and function library represent reasonable maintenance tasks for the WG to examine, time-permitting. Accepting surface syntax as a time-permitting feature gives the WG an opportunity to examine capabilities of the SPARQL language that are particularly onerous to use and to consider specialized syntax for these features. Accepting function library allows the WG to consider extending the core set of functions available when moving between SPARQL implementations to include things like basic string or mathematical operations.
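As a rough illustration of what a variable-length path query computes, the following Python sketch follows one predicate for one or more steps over an in-memory list of triples. The dataset, the `ex:` names, and the function `reachable` are hypothetical and chosen only for illustration; this is not a proposed design or syntax for the feature.

```python
from collections import deque

# Hypothetical toy dataset: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:c", "ex:knows", "ex:d"),
    ("ex:a", "ex:name", '"A"'),
]


def reachable(triples, start, predicate):
    """Return every node reachable from `start` by following one or more
    `predicate` edges. Breadth-first search; the `seen` set keeps the
    traversal from looping on cyclic data."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


# Everyone ex:a knows, directly or through any chain of ex:knows links.
known = reachable(TRIPLES, "ex:a", "ex:knows")
```

Without such a feature, a query author has to write one fixed-length pattern per path length, which cannot cover chains of unknown depth. RDF lists are such a chain, which is why the two features overlap.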
Finally, I believe the WG should deliver a specification for basic federated query, time-permitting. Federated query is implemented in a variety of forms in several implementations, and the feature received significant support in the survey (6 organizations including it amongst their top six choices). I believe that looking at a design for basic federated query is important for the growing Linked Data community, and the time is ripe to standardize on basic federated query as a way to encourage implementations to explore more and more sophisticated approaches to federated query.
This proposal leaves out many good features, and I'd be remiss not to address several specific ones.
- Negation. The survey indicated strong support for providing a simpler form of asking negative queries than the current OPTIONAL/!bound construct. I've excluded this from my proposal in the hope that the design for subqueries may obviate the need for this feature.
- Full text. The survey indicated strong support for standardizing the syntax and semantics for full text search in SPARQL. While I believe that this is one of the top interoperability stumbling blocks for SPARQL, the wide-open design space (both for syntax and semantics) of the problem worries me.
- Parameterized inference. The survey indicated support from a small number of organizations for parameterized inference. The discussion during the April 28 teleconference made clear to me that some members of the WG see a need both to define what it means to query other entailment regimes (a la SPARQL/OWL) and also how to go about doing that on a query-by-query basis. The latter is what parameterized inference is about. I have omitted parameterized inference from my proposal because of the lack of existing implementations/designs to draw on, coupled with the fact that service descriptions provide an out-of-band way for endpoints to indicate the entailment regime or rulesets that they service. I recognize that this does not fully address the use case of on-demand rulesets, but I believe that this would be better served via a SPARQL protocol feature, and I do not see any mature designs yet in this space to draw upon. I believe that (1) standardizing on the semantics of SPARQL/OWL and (2) the increasing maturity and deployment of RIF, will encourage SPARQL implementations to begin to explore this space more and make this an appropriate feature for a future round of standardization.
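For readers unfamiliar with the OPTIONAL/!bound construct mentioned under Negation above, the following Python sketch mimics how it is evaluated. It assumes a toy in-memory list of triples; the data and the function `people_without_mbox` are hypothetical. The query being imitated is shown in the leading comment.

```python
# The SPARQL idiom being imitated:
#
#   SELECT ?person WHERE {
#     ?person rdf:type foaf:Person .
#     OPTIONAL { ?person foaf:mbox ?mbox }
#     FILTER (!bound(?mbox))
#   }

# Hypothetical toy dataset: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "rdf:type", "foaf:Person"),
]


def people_without_mbox(triples):
    # Required pattern: ?person rdf:type foaf:Person
    solutions = [
        {"person": s}
        for s, p, o in triples
        if p == "rdf:type" and o == "foaf:Person"
    ]

    # OPTIONAL: extend a solution with ?mbox when a match exists,
    # otherwise keep the solution with ?mbox left unbound.
    extended = []
    for sol in solutions:
        mboxes = [
            o for s, p, o in triples
            if s == sol["person"] and p == "foaf:mbox"
        ]
        if mboxes:
            extended.extend({**sol, "mbox": m} for m in mboxes)
        else:
            extended.append(sol)

    # FILTER (!bound(?mbox)): keep only solutions where ?mbox is unbound.
    return [sol["person"] for sol in extended if "mbox" not in sol]


result = people_without_mbox(TRIPLES)
```

The negative question ("who has no mailbox?") is expressed indirectly: first try to match the thing that should be absent, then discard the rows where the match succeeded. That indirection is what the survey respondents wanted a simpler form for.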