SPARQL/Extensions/Aggregates
SQL contains aggregate functions that compute and return a single value from multiple result values after grouping query solutions in a certain way. The SPARQL specification contains no machinery for dealing with aggregates, though there are ways to query for some universal aggregates like MIN or MAX.
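The MIN/MAX workarounds mentioned above usually rely either on sorting and truncation (ORDER BY with LIMIT 1) or on a negated pattern (OPTIONAL combined with FILTER and !bound). The following Python sketch mimics both ideas over an in-memory list of solutions. The data and the function names are hypothetical, and no SPARQL engine is involved; it only illustrates the logic.

```python
# Hypothetical query solutions: one dict per solution, variable name -> value.
solutions = [
    {"item": "a", "price": 12},
    {"item": "b", "price": 7},
    {"item": "c", "price": 30},
]


def max_by_order_limit(rows, var):
    """Mimic ORDER BY DESC(?var) LIMIT 1: sort, then keep the first solution."""
    ordered = sorted(rows, key=lambda r: r[var], reverse=True)
    return ordered[:1]  # at most one solution survives the LIMIT


def max_by_negation(rows, var):
    """Mimic OPTIONAL { ... FILTER(?other > ?var) } FILTER(!bound(?other)):
    keep every solution for which no other solution has a larger value."""
    return [r for r in rows if not any(o[var] > r[var] for o in rows)]


print(max_by_order_limit(solutions, "price"))
print(max_by_negation(solutions, "price"))
```

The two approaches differ when several solutions share the maximum: the sort-and-limit version returns one of them, while the negation version returns all of them.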
Several SPARQL implementations support aggregates:
- OpenLink Virtuoso supports COUNT, COUNT DISTINCT, MAX, MIN and AVG in queries and subqueries. Virtuoso does not implement an explicit GROUP BY clause, instead implicitly grouping solution results by all variables appearing in aggregate functions in a query projection. Virtuoso does not implement a HAVING clause, but the functionality can be emulated via subqueries. Virtuoso allows aggregate functions to be used as arguments to other projected expressions and allows the arguments to aggregate functions to be arbitrary expressions.
- ARQ supports COUNT and COUNT DISTINCT. ARQ implements a GROUP BY clause that can act on either variables or expressions. Expressions in a GROUP BY can be named and then selected from the query, providing a way of selecting arbitrary expressions. If GROUP BY is omitted, then ARQ groups on all variables in the query pattern. ARQ implements a HAVING clause that can filter the result set after grouping.
- ARC supports COUNT, MAX, MIN, AVG, and SUM. ARC requires that aggregate functions in a query's projection be named with the AS keyword. ARC implements a GROUP BY clause that must be present if anything other than a single aggregate is selected. ARC only allows variables (not expressions) in aggregate functions or GROUP BY conditions.
- Glitter, part of Open Anzo, supports COUNT and COUNT DISTINCT. Glitter implements a GROUP BY clause that can only contain variables.
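The features compared in this list (grouping on variables, COUNT versus COUNT DISTINCT, and a HAVING filter applied after grouping) can be sketched generically in Python. This is a minimal illustration over hypothetical data, not the behavior of any implementation named above; in particular, skipping unbound values when counting is an assumption made here.

```python
from collections import defaultdict

# Hypothetical query solutions: one dict per solution, variable name -> value.
solutions = [
    {"author": "alice", "book": "b1"},
    {"author": "alice", "book": "b2"},
    {"author": "alice", "book": "b2"},
    {"author": "bob", "book": "b3"},
]


def group_by(rows, group_vars):
    """Partition solutions by the values of the grouping variables."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row.get(v) for v in group_vars)
        groups[key].append(row)
    return dict(groups)


def count(rows, var, distinct=False):
    """COUNT / COUNT DISTINCT over one variable.
    Assumption: unbound values (None or missing) are not counted."""
    values = [r[var] for r in rows if r.get(var) is not None]
    return len(set(values)) if distinct else len(values)


def aggregate(rows, group_vars, agg_var, distinct=False, having=None):
    """Group, count, then apply an optional HAVING-style predicate."""
    result = []
    for key, members in group_by(rows, group_vars).items():
        out = dict(zip(group_vars, key))
        out["count"] = count(members, agg_var, distinct)
        if having is None or having(out):
            result.append(out)
    return result


print(aggregate(solutions, ["author"], "book"))
print(aggregate(solutions, ["author"], "book", distinct=True))
print(aggregate(solutions, ["author"], "book", having=lambda g: g["count"] > 1))
```

Passing an empty list of grouping variables puts every solution into one group, which corresponds to selecting a single aggregate without any grouping.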
A paper on RAP's SPARQL DB engine discusses aggregates. Open question: does RAP implement aggregates?
Design Questions
What happens when aggregate functions are applied to results with unbound values or mixed data types?
Does anyone have an answer?
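To make the question concrete, here is a small Python sketch of two possible policies for an AVG-like aggregate when the input contains unbound values (modelled as None) or non-numeric values. Both policies and their names are hypothetical; this page does not settle which one, if either, an implementation should follow.

```python
def is_number(value):
    """True for int/float values; bool is excluded on purpose."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def avg_skip(values):
    """Policy A (assumption): silently ignore unbound and non-numeric values."""
    nums = [v for v in values if is_number(v)]
    return sum(nums) / len(nums) if nums else None


def avg_strict(values):
    """Policy B (assumption): any unbound or non-numeric value makes the
    whole aggregate undefined, represented here by None."""
    if not values or not all(is_number(v) for v in values):
        return None
    return sum(values) / len(values)


# One unbound value and one plain string mixed in with numbers.
values = [10, None, "n/a", 20]
print(avg_skip(values))    # averages only 10 and 20
print(avg_strict(values))  # undefined under the strict policy
```

The same input yields a number under one policy and no result under the other, which is the kind of divergence the design question is about.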
Fundamentals
SQL is a very old language, and the meaning of all but the simplest aggregation statements in SQL is opaque because of the notation, and is also highly implementation-dependent.
- This is not true, wrote Chimezie...
Chimezie -- If you are interested, I can supply an example SQL query on which Oracle and MySQL return different results, both of which are intuitively wrong to most people. The different results are concrete evidence of a shortcoming. The general problem is that there is no model theory or other implementation-independent standard that specifies what the results of any query should be. -- Adrian
This raises the question: why stick with 1970s-style SQL-like syntax for SPARQL aggregation?
adriandwalker-at-gmail-dot-com suggests that, instead of the 1970s-style SQL aggregation notation, it would benefit SPARQL to use a rule-based notation similar to the examples in
www.reengineeringllc.com/demo_agents/Aggregation.agent