Feature:FunctionLibrary

From SPARQL Working Group
Revision as of 21:05, 27 September 2010 by Aseaborne (Talk | contribs)


Feature: Standard Function Library for SPARQL

A standard library of functions expected of all SPARQL implementations.

Feature description

A standard library of functions expected of all SPARQL implementations would aid interoperability and expressiveness. A clearly documented core set of functions would help applications whose authors are not able to add functionality at the data provider.

The library can be subdivided into a small set of functions that every implementation must support and a larger set of optional functions introduced by other specifications. The web service protocol could be extended so that a self-description request enumerates the optional parts of the standard library, and any non-standard extension functions, that a SPARQL endpoint supports.
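
As a rough illustration of the mandatory/optional split, the Python sketch below checks the functions used by a query against a mandatory core plus whatever an endpoint advertises. The contents of CORE, the advertised set, and the helper unsupported_functions are all hypothetical; no self-description format is defined on this page.

```python
# Hypothetical sketch: the mandatory core, the advertised set and this helper
# are illustrative assumptions, not part of any specification.
FN = "http://www.w3.org/2005/xpath-functions#"

# Assumed mandatory core: every implementation would have to support these.
CORE = {FN + "contains", FN + "starts-with", FN + "ends-with"}


def unsupported_functions(used, advertised):
    """Return the function IRIs in `used` that are neither core nor advertised."""
    return sorted(set(used) - CORE - set(advertised))


# An endpoint that advertises one optional function on top of the core.
advertised = {FN + "upper-case"}
query_functions = [FN + "contains", FN + "upper-case", FN + "round"]

# Only fn:round is neither in the core nor advertised by this endpoint.
print(unsupported_functions(query_functions, advertised))
```

A client could run such a check before sending a query, rather than discovering the gap from an empty result.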

If another standard defines a function whose signature mentions datatypes that the SPARQL specification does not support, the function can still be part of the SPARQL core function library with a restricted signature. For example, XQuery supports sequences but SPARQL does not. So the "clone" of

fn:string-join($arg1 as xs:string*, $arg2 as xs:string) as xs:string

from the XQuery library can be incomplete and support only the variants

fn:string-join($arg1 as xs:string, $arg2 as xs:string) as xs:string
fn:string-join(unbound, $arg2 as xs:string) as xs:string

which remain backward-compatible with the original function. An implementation of the complete function would also satisfy the requirements of the SPARQL specification.
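
The two restricted variants can be sketched in Python as follows. This is only an illustration: None stands in for an unbound argument, and treating unbound like XQuery's empty sequence (for which fn:string-join returns the zero-length string) is an assumption of the sketch, not something this page specifies.

```python
# Illustrative sketch of the restricted signature; None stands in for "unbound".
def string_join(arg1, arg2):
    """Restricted fn:string-join: arg1 is a single string or unbound (None)."""
    if not isinstance(arg2, str):
        raise TypeError("separator must be a string")
    if arg1 is None:
        # Assumption: unbound behaves like the empty sequence in XQuery,
        # for which fn:string-join returns the zero-length string.
        return ""
    if isinstance(arg1, str):
        # A one-item sequence joins to the item itself; the separator is unused.
        return arg1
    # Sequences of several strings are outside the restricted signature.
    raise TypeError("sequences are not supported by the restricted signature")


print(string_join("a", ", "))   # same result as the full function on ("a")
print(repr(string_join(None, ", ")))
```

Both calls give the same answer that the full XQuery function gives for a one-item or empty sequence, which is what makes the restricted form backward-compatible.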

Example

PREFIX fn:      <http://www.w3.org/2005/xpath-functions#>
...
FILTER ( fn:contains(?x, "foo") )

Existing Implementation(s)

The function-by-IRI calling mechanism for custom functions is part of the SPARQL/2008 recommendation.

XQuery 1.0 and XPath 2.0 Functions and Operators (F&O) is a very large library of functions. SPARQL expression evaluation is based on this work.

The ARQ function library provides a small number of functions, chosen from F&O where possible.

Leigh Dodds's survey.

Existing Specification / Documentation

Compatibility

Fully compatible with SPARQL/2008, except that a query using a function that a SPARQL/2008 implementation does not support will not produce any results on that implementation.

New-style queries would fail to be legal SPARQL/2008 queries only if new keywords are introduced for functions, as well as or instead of IRIs.

No existing query is invalidated by this feature.
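
The first compatibility point can be illustrated with a toy FILTER evaluator in Python. It assumes that an implementation treats a call to an unknown function as an expression error, and relies on the SPARQL rule that a FILTER whose expression raises an error eliminates the solution. The names SUPPORTED, call and apply_filter are hypothetical.

```python
# Illustrative sketch: a toy FILTER evaluator, not a real SPARQL engine.
FN_CONTAINS = "http://www.w3.org/2005/xpath-functions#contains"

# Functions this hypothetical SPARQL/2008 implementation supports.
SUPPORTED = {}  # fn:contains is deliberately missing at first


class ExprError(Exception):
    """Stands in for a SPARQL expression evaluation error."""


def call(function_iri, *args):
    if function_iri not in SUPPORTED:
        raise ExprError("unsupported function: " + function_iri)
    return SUPPORTED[function_iri](*args)


def apply_filter(solutions, function_iri, var, constant):
    """Keep the solutions for which the function call evaluates to true."""
    kept = []
    for solution in solutions:
        try:
            if call(function_iri, solution[var], constant):
                kept.append(solution)
        except ExprError:
            # An error in a FILTER eliminates the solution; the query still runs.
            pass
    return kept


solutions = [{"x": "foobar"}, {"x": "baz"}]
print(apply_filter(solutions, FN_CONTAINS, "x", "foo"))  # no solution survives

# Once the function is supported, the same query keeps the matching solution.
SUPPORTED[FN_CONTAINS] = lambda s, sub: sub in s
print(apply_filter(solutions, FN_CONTAINS, "x", "foo"))
```

The query is legal in both cases; only the result set differs.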

Links to postponed Issues

Related Features

The following introduce new functions for literal and IRI construction, but propose using a keyword, not an IRI, for the constructor.

Where functions are not part of a core library that every implementation is expected to provide, Feature:ServiceDescriptions can be used to declare the supported functions.

Champions

Use cases

The feature is general-purpose.

Starting Points

Operators

  • Current SPARQL Filter Functions
    • Compatibility: need to be preserved/extended
    • Optionally, we could give them URIs (e.g. to datatype)
  • Existing "base libraries":
    • XQuery/XPath
      • pro: open standard
      • con: less attractive to users familiar with SQL
    • SQL99 (or later versions)
      • con: standard not open
      • pro: still, very widely known/used :-)
    • RIF-DTB
      • pro: open standard, tries to extract an essential subset of XQuery/XPath functions, adds some special functions for RDF (for plain literals, etc.)
      • con: same problem as XQuery/XPath, not widely known/implemented
  • Other "missing" stuff (some functions which have been mentioned/claimed in discussions or mails):
    • COALESCE, IF (already in SQL99), cf. Andy's mail
    • getNamespace, getLocalName: e.g. "Get me all triples using foaf: properties", cf. Jena's qnameFor, getLocalName(), getNameSpace()
    • Full-text search:
      • Note: Earlier discussion of full-text search as a language feature suggested that consensus might be hard to reach even if it is wrapped in a function. Still, it might be part of some "external function library".
      • some SQL dialects already have it, e.g. "MATCH (col1,col2,...) AGAINST (expr [search_modifier])" from MySQL
      • LARQ (the Lucene extension of ARQ) uses property functions; this could be written in a more "pure function" style, e.g. using a project expression (mock-up syntax):
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
SELECT ?doc pf:textMatch(?lit,'+text',0.5,100 ) AS ?score
{
  ?doc ?p ?lit
}
ORDER BY ?score
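
The getNamespace/getLocalName idea from the list above can be sketched in Python. The split rule used here (cut after the last '#' or '/') is a simplifying assumption; real toolkits may split IRIs differently, and the function names are hypothetical.

```python
# Illustrative sketch: splits an IRI after its last '#' or '/'.
# This rule is an assumption; real toolkits may split differently.
def split_iri(iri):
    """Return (namespace, local_name) for an IRI string."""
    cut = max(iri.rfind("#"), iri.rfind("/")) + 1
    return iri[:cut], iri[cut:]


def get_namespace(iri):
    return split_iri(iri)[0]


def get_local_name(iri):
    return split_iri(iri)[1]


FOAF = "http://xmlns.com/foaf/0.1/"
triples = [
    ("http://example.org/a", FOAF + "name", "Alice"),
    ("http://example.org/a", "http://purl.org/dc/elements/1.1/title", "Dr"),
    ("http://example.org/a", FOAF + "mbox", "mailto:alice@example.org"),
]

# "Get me all triples using foaf: properties"
foaf_triples = [t for t in triples if get_namespace(t[1]) == FOAF]
print([get_local_name(p) for _, p, _ in foaf_triples])  # ['name', 'mbox']
```

With such functions in a standard library, the same selection could be written as a FILTER on the property variable instead of a string-prefix test.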

SPARQL specific

Constructors of RDF terms are mostly missing from SPARQL 1.0:

IRI(string) -> IRI
BNODE() -> fresh blank node
BNODE(string) -> same blank node as other use of BNODE(string)
LITERAL(str) -> 
LITERAL(str, IRI) ->  (not strictly needed -- xsd:integer("123") does it.)
LITERAL(str, string) ->

These should also have IRIs, and the existing operators (e.g. sameTerm) need IRIs.
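
The intended behaviour of IRI(string), BNODE() and BNODE(string) can be sketched in Python. Terms are modelled as plain tuples, and the scope within which BNODE(string) returns the same node (here, one BNodeFactory instance) is an assumption, since the page does not fix it.

```python
import itertools


# Illustrative sketch: terms are plain tuples; the label scope (one factory,
# e.g. per query execution) is an assumption.
class BNodeFactory:
    def __init__(self):
        self._counter = itertools.count()
        self._by_label = {}

    def bnode(self, label=None):
        """BNODE() gives a fresh node; BNODE(label) reuses the node for that label."""
        if label is None:
            return ("bnode", next(self._counter))
        if label not in self._by_label:
            self._by_label[label] = ("bnode", next(self._counter))
        return self._by_label[label]


def iri(string):
    """IRI(string) -> IRI term."""
    return ("iri", string)


factory = BNodeFactory()
print(factory.bnode() == factory.bnode())          # False: each call is fresh
print(factory.bnode("a") == factory.bnode("a"))    # True: same label, same node
print(iri("http://example.org/x"))
```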

XQuery 1.0 and XPath 2.0 Functions and Operators

SPARQL already uses operations from XQuery 1.0 and XPath 2.0 Functions and Operators.

Things marked (()) are already in SPARQL as operator symbols.

A minimal approach would be to add the string operations, being the most requested.

((Already:
       6.2.1 op:numeric-add
       6.2.2 op:numeric-subtract
       6.2.3 op:numeric-multiply
       6.2.4 op:numeric-divide
       6.2.5 op:numeric-integer-divide
       6.2.6 op:numeric-mod
       6.2.7 op:numeric-unary-plus
       6.2.8 op:numeric-unary-minus
   6.3 Comparison Operators on Numeric Values
       6.3.1 op:numeric-equal
       6.3.2 op:numeric-less-than
       6.3.3 op:numeric-greater-than
))
   6.4 Functions on Numeric Values
       6.4.1 fn:abs
       6.4.2 fn:ceiling
       6.4.3 fn:floor
       6.4.4 fn:round
       6.4.5 fn:round-half-to-even
((    7.3.2 fn:compare))
       7.4.1 fn:concat
       7.4.3 fn:substring
       7.4.4 fn:string-length
       7.4.7 fn:upper-case
       7.4.8 fn:lower-case
       7.4.10 fn:encode-for-uri
       7.5.1 fn:contains     (collation form optional)
       7.5.2 fn:starts-with
       7.5.3 fn:ends-with
((
       9.2.1 op:boolean-equal
       9.2.2 op:boolean-less-than
       9.2.3 op:boolean-greater-than
       9.3.1 fn:not
))

Without requiring support for xsd:date:

((
       10.4.6 op:dateTime-equal
       10.4.7 op:dateTime-less-than
       10.4.8 op:dateTime-greater-than
))        

Maybe:

       10.5.7 fn:year-from-dateTime
       10.5.8 fn:month-from-dateTime
       10.5.9 fn:day-from-dateTime
       10.5.10 fn:hours-from-dateTime
       10.5.11 fn:minutes-from-dateTime
       10.5.12 fn:seconds-from-dateTime
       10.5.13 fn:timezone-from-dateTime
       3. fn:error

Aggregate functions

    • Aggregation functions from XPath/XQuery (count(), ...) seem unsuitable, since they work on sequences.
    • SQL: MIN, MAX, AVG, COUNT, SUM
      • Fairly standard, but we need to make some decisions, e.g. how are blank nodes treated in aggregation, and what do SUM/AVG do over non-numerical values?
    • Extensibility mechanism? (e.g. different versions of AVG, COUNT, ignoring bnodes or not)
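
The open decisions above can be made concrete with a small Python sketch in which each choice is an explicit parameter. The parameter names, the "_:" prefix convention for blank nodes, and the use of None to stand for an aggregate error are all assumptions of the sketch; the page does not choose between the policies.

```python
from numbers import Number


# Illustrative sketch: blank nodes are modelled as strings starting with "_:".
def is_bnode(value):
    return isinstance(value, str) and value.startswith("_:")


def agg_sum(values, skip_non_numeric):
    """SUM under two policies: skip non-numeric values, or fail (None)."""
    total = 0
    for v in values:
        if isinstance(v, Number) and not isinstance(v, bool):
            total += v
        elif not skip_non_numeric:
            return None  # strict policy: the whole aggregate is an error
    return total


def agg_count(values, ignore_bnodes):
    """COUNT with or without blank nodes."""
    return sum(1 for v in values if not (ignore_bnodes and is_bnode(v)))


values = [1, 2.5, "abc", "_:b0"]
print(agg_sum(values, skip_non_numeric=True))    # 3.5
print(agg_sum(values, skip_non_numeric=False))   # None
print(agg_count(values, ignore_bnodes=True))     # 3
print(agg_count(values, ignore_bnodes=False))    # 4
```

An extensibility mechanism would amount to letting a query select between such variants, for instance through different function IRIs.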

Alternative approaches to establishing a SPARQL 1.1 function library

  • We define "supported modules" (e.g. "SQL99 module", "XPath/XQuery module", etc.) by namespace:
    • pro:
      • nothing needs to be done except "documenting" existing function libraries and giving a namespace to those which don't yet have one (SQL functions); everything else is already supported by the common extensibility mechanism.
      • Trivial for XPath/XQuery: fn: namespace.
        • Note that there are still issues with that: Does fn: support imply datatype support? Which datatypes shall be supported? Are there additional issues with respect to entailment regimes (esp. D-entailment)?
    • con: overlapping functions in different libraries (e.g. similar functions in SQL and XPath/XQuery) hamper interoperability and query exchange
  • A selection of most-wanted functions, which we include as first-class citizens in the language:
    • pro: these could be used in any implementation without interoperability problems; we have a small set of these already
    • con: the selection is doomed to be arbitrary and will never cover all needs
    • Possible process for getting there: ask what the 50 most wanted/most implemented SQL/XQuery/RIF-DTB functions are.

References