SQL provides aggregate functions that compute a single value (such as a count, sum or average) from multiple result rows after the query results have been grouped in some way. The SPARQL specification contains no machinery for dealing with aggregates, though there are ways to query for some universal aggregates like …
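To make the "group, then aggregate" idea concrete, here is a minimal Python sketch. It models query solutions as dictionaries that map variable names to values, partitions them on a grouping variable, and reduces each group to a single row. The data and the helpers `group_by` and `aggregate` are hypothetical illustrations. They are not part of any SPARQL or SQL engine.

```python
from collections import defaultdict

# Hypothetical query solutions: each maps variable names to bound values.
solutions = [
    {"author": "alice", "pages": 120},
    {"author": "alice", "pages": 80},
    {"author": "bob", "pages": 200},
]


def group_by(rows, keys):
    """Partition rows by the tuple of values bound to the grouping variables."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


def aggregate(groups, var):
    """Reduce each group to one row holding COUNT, SUM and AVG of `var`."""
    result = {}
    for key, members in groups.items():
        values = [m[var] for m in members]
        result[key] = {
            "count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        }
    return result


# Roughly the SQL-style pseudo-query:
#   SELECT ?author COUNT(?pages) SUM(?pages) AVG(?pages) ... GROUP BY ?author
per_author = aggregate(group_by(solutions, ["author"]), "pages")
print(per_author)
```

Each group collapses to one output row. That collapse is exactly what plain SPARQL solution sequences cannot express without an aggregation extension.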
Several SPARQL implementations support aggregates:
- OpenLink Virtuoso supports … AVG in queries and subqueries. Virtuoso does not implement an explicit GROUP BY clause. Instead, it implicitly groups solution results by all variables in the query projection that do not appear in aggregate functions. Virtuoso does not implement a HAVING clause, but the functionality can be emulated via subqueries. Virtuoso allows aggregate functions to be used as arguments to other projected expressions, and it allows the arguments to aggregate functions to be arbitrary expressions.
- ARQ supports … COUNT DISTINCT. ARQ implements a GROUP BY clause that can act on either variables or expressions. Expressions in a GROUP BY can be named and then selected from the query, which provides a way of selecting arbitrary expressions. If GROUP BY is omitted, ARQ groups on all variables in the query pattern. ARQ implements a HAVING clause that can filter the result set after grouping.
- ARC supports … SUM. ARC requires that aggregate functions in a query's projection be named with the AS keyword. ARC implements a GROUP BY clause that must be present if anything other than a single aggregate is selected. ARC only allows variables (not expressions) in aggregate functions or …
- Glitter, part of Open Anzo, supports … COUNT DISTINCT. Glitter implements a GROUP BY clause that can only contain variables.
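The implementations differ mainly in how the grouping key is chosen and whether groups can be filtered afterwards. The Python sketch below contrasts two styles. The first infers the key from the projection: it groups on the projected variables that are not aggregate arguments, which is the usual SQL convention and is assumed here. The second uses an explicit GROUP BY key plus a HAVING-style filter applied after grouping. All function names and data are hypothetical, and the code does not reproduce any particular engine's behaviour.

```python
from collections import defaultdict

# Hypothetical solutions produced by a query pattern.
solutions = [
    {"author": "alice", "book": "b1", "pages": 120},
    {"author": "alice", "book": "b2", "pages": 80},
    {"author": "bob", "book": "b3", "pages": 300},
]


def implicit_group_keys(projection, aggregated_vars):
    """Assumed implicit rule: group on projected variables not used inside aggregates."""
    return [v for v in projection if v not in aggregated_vars]


def group_sum(rows, keys, var, having=None):
    """SUM(var) per group; `having` optionally filters groups after aggregation."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[var])
    sums = {key: sum(vals) for key, vals in groups.items()}
    if having is not None:
        sums = {key: total for key, total in sums.items() if having(total)}
    return sums


# No GROUP BY clause: projection is (?author, SUM(?pages)), so the key is ?author.
keys = implicit_group_keys(["author", "pages"], {"pages"})
implicit = group_sum(solutions, keys, "pages")

# Explicit GROUP BY ?author with a HAVING-style filter on the aggregate.
explicit = group_sum(solutions, ["author"], "pages", having=lambda total: total > 250)

print(implicit)
print(explicit)
```

The HAVING filter runs on aggregated values, so it cannot be replaced by a FILTER on the ungrouped solutions. That is why an engine without HAVING has to fall back on a subquery to get the same effect.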
A paper on RAP's SPARQL DB engine discusses aggregates. Open question: does RAP itself implement aggregates?
What happens when aggregate functions are applied to results with unbound values or mixed data types?
Does anyone have an answer?
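The question matters because several plausible policies give different answers for the same input. The Python sketch below does not say what any engine actually does. It only shows how far the results can diverge. `None` stands in for an unbound variable, the string `"7"` stands in for a non-numeric literal, and all three policies are hypothetical.

```python
# Hypothetical values bound to ?x across four solutions:
# None models an unbound variable, "7" models a plain (string) literal.
values = [10, None, "7", 3.5]


def is_number(v):
    """True for int/float values (bool excluded, since it subclasses int)."""
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def sum_skip(vals):
    """Policy A: silently ignore unbound and non-numeric values."""
    return sum(v for v in vals if is_number(v))


def sum_cast(vals):
    """Policy B: ignore unbound values, try to cast everything else to a number."""
    total = 0.0
    for v in vals:
        if v is None:
            continue
        try:
            total += float(v)
        except (TypeError, ValueError):
            continue
    return total


def sum_strict(vals):
    """Policy C: any unbound or non-numeric value is an error."""
    total = 0
    for v in vals:
        if v is None:
            raise ValueError("unbound value in aggregate")
        if not is_number(v):
            raise TypeError(f"non-numeric value in aggregate: {v!r}")
        total += v
    return total


print(sum_skip(values))  # 13.5
print(sum_cast(values))  # 20.5
```

COUNT raises the same kind of question, because counting solutions (4 here) and counting bound values (3 here) are different operations.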
SQL is a very old language. The meaning of all but the simplest SQL aggregation statements is opaque because of the notation, and it is also highly implementation-dependent.
- "This is not true," wrote Chimezie.
Chimezie -- If you are interested, I can supply an example SQL query on which Oracle and MySQL return different results, both of which are intuitively wrong to most people. The different results are concrete evidence of a shortcoming. The general problem is that there is no model theory or other implementation-independent standard that specifies what the results of any query should be. -- Adrian
This raises the question -- Why stick with 1970s style SQL-like syntax for SPARQL aggregation?
adriandwalker-at-gmail-dot-com suggests that, instead of the 1970s-style SQL aggregation notation, it would benefit SPARQL to use a rule-based notation similar to the examples in