Feature:AggregateFunctions

From SPARQL Working Group
Revision as of 06:41, 4 June 2009 by Lfeigenb


Feature: Aggregate Functions

Feature description

In SPARQL/Query 1.0 (original SPARQL), query patterns yield a solution set (effectively a table of solutions) from which certain columns are projected and returned as the result of the query. Aggregates provide the ability to partition a solution set into one or more groups of rows that share specified values, and then to create a new solution set that contains one row per group. Each solution in this new aggregate solution set may contain either variables whose values are constant throughout the group or aggregate functions that are applied to the rows in a group to yield a single value. Common aggregate functions include COUNT, SUM, MIN, and MAX.
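
The partition-then-aggregate step described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not how any SPARQL engine implements it; the function name `aggregate`, the variable names, and the sample data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(solutions, group_vars, aggregates):
    """Partition a solution set by group_vars and emit one row per group.

    solutions  -- list of dicts, one dict per solution (row)
    group_vars -- variables whose values define a group
    aggregates -- maps an output name to (function, input variable)
    """
    groups = defaultdict(list)
    for row in solutions:
        key = tuple(row[v] for v in group_vars)
        groups[key].append(row)

    result = []
    for key, rows in groups.items():
        # Grouping variables are constant within a group, so they can be
        # copied straight into the output row.
        out = dict(zip(group_vars, key))
        for name, (func, var) in aggregates.items():
            # Each aggregate collapses the group's values to a single value.
            out[name] = func([r[var] for r in rows])
        result.append(out)
    return result


# Hypothetical solution set: one row per exam result.
solutions = [
    {"district": "North", "student": "s1", "score": 80},
    {"district": "North", "student": "s2", "score": 90},
    {"district": "South", "student": "s3", "score": 70},
]

summary = aggregate(
    solutions,
    ["district"],
    {"average": (mean, "score"), "count": (len, "score")},
)
```

Here `summary` holds one row per district. The `student` variable does not appear in the output because its value is not constant within a group.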

Aggregate functions are commonly required for a wide range of application and data-analysis tasks, such as:

  • Determining the number of distinct resources that satisfy certain criteria
  • Calculating the average exam score of students grouped by school district
  • Summing the campaign contributions of donors, grouped by postal code and political party

Applications can typically take a SPARQL/Query 1.0 solution set and calculate aggregate values themselves. Enabling SPARQL engines to calculate aggregates, however, moves that work from the application to the SPARQL engine and usually returns significantly smaller solution sets to the application.
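
As a rough sketch of the application-side work this paragraph refers to, the following Python snippet sums campaign contributions by postal code and party from an unaggregated solution set. The rows, field names, and the helper `sum_by_postal_code_and_party` are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical SPARQL/Query 1.0 solution set: one row per contribution.
rows = [
    {"donor": "d1", "postal_code": "02139", "party": "A", "amount": 100},
    {"donor": "d2", "postal_code": "02139", "party": "A", "amount": 250},
    {"donor": "d3", "postal_code": "02139", "party": "B", "amount": 50},
    {"donor": "d4", "postal_code": "94110", "party": "A", "amount": 75},
]


def sum_by_postal_code_and_party(rows):
    """Client-side SUM, grouped by the (postal_code, party) pair."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["postal_code"], row["party"])] += row["amount"]
    return dict(totals)


totals = sum_by_postal_code_and_party(rows)
```

The application has to fetch all four rows to produce three totals. With engine-side aggregation only the three grouped rows would need to be returned, and the gap grows with the number of rows per group.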

Example

SELECT COUNT(?person) AS alices
WHERE {
  ?person :name "Alice" .
}

returns the number of times a triple of the form _ :name "Alice" appears in the source data.
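
To make the intended result concrete, here is a minimal Python sketch that counts matching triples in a small, made-up dataset. The triples and the helper `count_matches` are hypothetical; a real engine would evaluate the graph pattern rather than scan a list.

```python
# Hypothetical source data as (subject, predicate, object) triples.
triples = [
    ("ex:p1", ":name", "Alice"),
    ("ex:p2", ":name", "Alice"),
    ("ex:p3", ":name", "Bob"),
    ("ex:p1", ":age", "30"),
]


def count_matches(triples, predicate, obj):
    """Count triples of the form `_ predicate obj`, with any subject."""
    return sum(1 for (_subject, p, o) in triples if p == predicate and o == obj)


alices = count_matches(triples, ":name", "Alice")
```

With this data, `alices` is 2: one solution for each subject that has the name "Alice".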

SELECT AVG(?value) AS average
WHERE {
  ?good a :Widget ;
        :value ?value .
}

returns the average value of known Widgets.

Existing Implementation(s)

  • Garlik's JXT implements COUNT() and AVG()
  • Yahoo's Redland implements COUNT()
  • ARQ implements COUNT() and SUM() with syntax like (COUNT(*) AS ?c) to fit with expressions. A bare COUNT(*) is also allowed.
  • Open Anzo's Glitter engine implements AVG(), COUNT(), SUM(), MIN(), MAX().
  • Virtuoso implements AVG(), COUNT(), SUM(), MIN(), MAX(), and user-defined aggregates such as VECTOR_AGG or XML tree constructors. An appropriate GROUP BY clause is composed automatically if it is missing from the original query.

See the aggregate extension page on the ESW wiki, which mentions that ARC and Virtuoso also implement some measure of aggregates.

Existing Specification / Documentation

None known, but the feature is strongly similar to aggregates in SQL.

Compatibility

This is backward compatible with SPARQL.

Links to postponed Issues

Postponed in [1].

Related Use Cases/Extensions

Related to UseCase:ProjectExpressions, also Grouping.

Champions

Use Cases

@@

References