Feature: Aggregate Functions
Feature description
In SPARQL/Query 1.0 (original SPARQL), query patterns yield a solution set (effectively a table of solutions) from which certain columns are projected and returned as the result of the query. Aggregates provide the ability to partition a solution set into one or more groups of rows that share specified values, and then to create a new solution set that contains one row per group. Each row in this aggregate solution set may contain variables whose values are constant throughout the group, or the results of aggregate functions applied to the rows in the group, each yielding a single value. Common aggregate functions include COUNT, SUM, AVG, MIN, and MAX.
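As a rough illustration of the partition-then-aggregate step, the Python sketch below assumes a solution set represented as a list of dicts mapping variable names to values. The helper `group_and_aggregate` and the sample data are hypothetical and do not reflect how any particular SPARQL engine works.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Assumed representation: each solution is a dict from variable name to value.
Solution = Dict[str, Any]


def group_and_aggregate(
    solutions: Iterable[Solution],
    group_vars: List[str],
    aggregates: Dict[str, Tuple[str, Callable[[List[Any]], Any]]],
) -> List[Solution]:
    """Partition solutions on group_vars, then emit one row per group.

    aggregates maps an output name to (input variable, function); the function
    reduces the list of that variable's values within a group to one value.
    """
    groups: Dict[Tuple[Any, ...], List[Solution]] = defaultdict(list)
    for sol in solutions:
        # Unbound grouping variables are keyed as None in this sketch.
        key = tuple(sol.get(v) for v in group_vars)
        groups[key].append(sol)

    result: List[Solution] = []
    for key, rows in groups.items():
        # Grouping variables are constant within a group, so copy them once.
        out: Solution = dict(zip(group_vars, key))
        for name, (var, fn) in aggregates.items():
            values = [r[var] for r in rows if var in r]
            out[name] = fn(values)
        result.append(out)
    return result


# Hypothetical exam scores, grouped by school district.
solutions = [
    {"district": "North", "score": 80},
    {"district": "North", "score": 90},
    {"district": "South", "score": 70},
]
rows = group_and_aggregate(
    solutions,
    ["district"],
    {"avg": ("score", lambda vs: sum(vs) / len(vs)), "n": ("score", len)},
)
print(rows)
# [{'district': 'North', 'avg': 85.0, 'n': 2}, {'district': 'South', 'avg': 70.0, 'n': 1}]
```

Keying groups on a tuple lets several variables act as the group key at once, and each aggregate reduces one variable's values within a group to a single value, so three input rows become two output rows.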
Aggregate functions are commonly needed for a wide range of application and data-analysis tasks, such as:
- Determining the number of distinct resources that satisfy certain criteria
- Calculating the average exam score of students grouped by school district
- Summing the campaign contributions of donors, grouped by postal code and political party
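The first and third tasks above can be sketched in Python over made-up data; the tuple layout and the values are assumptions for illustration only. A set yields a distinct count, and a tuple key groups on postal code and party together.

```python
from collections import defaultdict

# Hypothetical donor data: (donor, postal_code, party, amount).
contributions = [
    ("d1", "02139", "Green", 50),
    ("d2", "02139", "Green", 25),
    ("d1", "02139", "Blue", 10),
    ("d3", "10001", "Blue", 100),
]

# Number of distinct donors: duplicates collapse in the set.
distinct_donors = len({donor for donor, _, _, _ in contributions})

# Sum of amounts grouped by the composite key (postal code, party).
totals = defaultdict(int)
for _, postcode, party, amount in contributions:
    totals[(postcode, party)] += amount

print(distinct_donors)  # 3
print(dict(totals))
# {('02139', 'Green'): 75, ('02139', 'Blue'): 10, ('10001', 'Blue'): 100}
```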
Applications can typically take a SPARQL/Query 1.0 solution set and calculate aggregate values themselves. Letting SPARQL engines calculate aggregates instead moves that work from the application to the engine, and usually results in a significantly smaller solution set being returned to the application.
Example
SELECT (COUNT(?person) AS ?alices) WHERE { ?person :name "Alice" . }
returns the number of times a triple of the form _ :name "Alice" appears in the source data.
SELECT (AVG(?value) AS ?average) WHERE { ?good a :Widget ; :value ?value . }
returns the average value of known Widgets.
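To make the semantics of the two example queries concrete, here is a minimal Python sketch that evaluates them by hand over a tiny in-memory set of triples. The resource names and values are hypothetical, and a Python set merely stands in for an RDF graph; no real SPARQL engine is involved.

```python
# Toy graph: (subject, predicate, object). All names and values are hypothetical.
# Because this is a set, each triple appears at most once, as in an RDF graph.
triples = {
    ("ex:p1", ":name", "Alice"),
    ("ex:p2", ":name", "Alice"),
    ("ex:p3", ":name", "Bob"),
    ("ex:w1", "a", ":Widget"),
    ("ex:w1", ":value", 10),
    ("ex:w2", "a", ":Widget"),
    ("ex:w2", ":value", 20),
    ("ex:g1", ":value", 99),  # not a Widget, so excluded from the average
}

# { ?person :name "Alice" } yields one solution per matching triple;
# COUNT(?person) counts those solutions.
alice_solutions = [{"person": s} for (s, p, o) in triples if p == ":name" and o == "Alice"]
alices = len(alice_solutions)

# { ?good a :Widget ; :value ?value } joins the two patterns on ?good;
# AVG(?value) averages the bound values.
widgets = {s for (s, p, o) in triples if p == "a" and o == ":Widget"}
values = [o for (s, p, o) in triples if s in widgets and p == ":value"]
average = sum(values) / len(values)

print(alices, average)  # 2 15.0
```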
Existing Implementation(s)
- Garlik's JXT implements COUNT() and AVG()
- Yahoo's Redland implements COUNT()
- ARQ implements COUNT() and SUM(), using syntax such as (COUNT(*) AS ?c) to fit with expression syntax. A bare COUNT(*) is also allowed.
- Open Anzo's Glitter engine implements AVG(), COUNT(), SUM(), MIN(), MAX().
- Virtuoso implements AVG(), COUNT(), SUM(), MIN(), MAX(), and user-defined aggregates such as VECTOR_AGG or XML tree constructors. If the original query has no GROUP BY clause, an appropriate one is composed automatically.
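The source does not state Virtuoso's exact rule for composing a missing GROUP BY clause. One plausible rule, assumed here purely for illustration, is to group by every projected item that is not an aggregate call. The function `implied_group_by` and the string-based projection format are hypothetical simplifications, not Virtuoso's API.

```python
import re
from typing import List

# Hypothetical simplification: the projection is a list of strings, each
# either a plain variable ("?x") or an aggregate call ("COUNT(?y)").
AGGREGATE = re.compile(r"^(COUNT|SUM|AVG|MIN|MAX)\s*\(", re.IGNORECASE)


def implied_group_by(projection: List[str]) -> List[str]:
    """Return the items to group by when the query omits GROUP BY.

    Assumed rule: every projected item that is not an aggregate call.
    """
    return [item for item in projection if not AGGREGATE.match(item.strip())]


print(implied_group_by(["?district", "AVG(?score)"]))  # ['?district']
```

Under this assumed rule, a query projecting ?district and AVG(?score) would behave as if it ended with GROUP BY ?district.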
See the aggregate extension page on the ESW wiki, which mentions that ARC and Virtuoso also implement some form of aggregates.
Existing Specification / Documentation
None known, but the feature closely resembles aggregation in SQL (GROUP BY with aggregate functions such as COUNT, SUM, AVG, MIN, and MAX).
Compatibility
This is backward compatible with SPARQL/Query 1.0.
Links to postponed Issues
Postponed in [1].
Related Use Cases/Extensions
Related to UseCase:ProjectExpressions, also Grouping.
Champions
- Garlik is willing to push for this feature.
- Ivan Mikhailov / OpenLink
- Kjetil Kjernsmo, Computas AS.
Use Cases
@@