W3C

Algae RDF Query Language


Abstract

The document describes Algae, an RDF query language used in the W3C Annotea Server. Algae has also served as a research platform for studies in interfacing with relational databases and system information.

Status of this Document

This document is a work in progress by the author and does not represent any endorsement from the W3C.


Table of contents

  1. Introduction
  2. Informal Description
    1. Query Description
  3. Extensibility
  4. Data Stores
  5. Result Set
  6. Statement Data Model
  7. Grammar
  8. References

Introduction

The Algae query language evolved to serve the needs of W3C's Annotea protocol. It provides query and assertion access to RDF Graphs. Algae's functionality followed the development and refinement of RDF itself, including the addition of XML datatypes, nodeIds and collections.

Informal Description

Algae can be used to query a graph, insert data into a graph, or write rules to automatically insert data when a query is matches. These are called actions and the directives are spelled ask, assert, and fwrule respectively. Each Algae action works on an RDF graph called the working graph and a result set. The result set holds tuples for all the query solutions and proofs.

Query Description

This section describes how an Algae query selects nodes and statements from an RDF graph. Many examples demonstrate queries similar to those used in the RDQL Submission [1]. In such cases, the analogous RDQL example is referenced for each in comparing the two languages.

Example 1: find nodes of a given type (RDQL Example 1)

ask (?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
           <http://www.w3.org/2000/10/annotation-ns#Annotation>)

The query pattern (inside 'ask()') is a series (in this case, a single one) of query terms. Variables are prefixed with a "?" and URIs are enclosed in "<>". This query binds the variable ?x to each node with type Annotation. For example, ?x will bind to Annot1 and Annot8 in the example data

<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:a="http://www.w3.org/2000/10/annotation-ns#">
   <r:Description r:about="http://example.com/things#Annot1">
      <r:type r:resource="http://www.w3.org/2000/10/annotation-ns#Annotation" />
   </r:Description>
   <a:Annotation r:about="http://example.com/things#Annot8" />
<r:RDF>

The query terms may be simplified (or at least shortened) by declaring namespace prefixes. Textual report tables are specified with the collect directive.

Example 1b: use namespace delcarations and specify a report

ns r=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ns a=<http://www.w3.org/2000/10/annotation-ns#>
ask (?x rdf:type a:Annotation)
collect (?x)

Run against the previous data set, this query produces

?x
<http://example.com/things#Annot1>
<http://example.com/things#Annot8>

Namespace abbreviate is especially handy when introducing multiple terms to a query.

Example 2: query with multiple terms (RDQL Example 2)

ns vcard=<http://www.w3.org/2001/vcard-rdf/3.0#>
ask (?vcard  vcard:FN "John Smith" .
     ?vcard  vcard:N  ?name .
     ?name   vcard:Family  ?family .
     ?name   vcard:Given  ?given)

As this query has multiple query terms, they are separated by a term terminator ('.'). The could also be separated by a disjunction connective ('||').

This query will bind ?family and ?given to "Smith" and "Johnathan" respectively when matched against the following data: nodes and arcs diagram of vcard entry for John Smith

Example 3: integer value constraint (RDQL Example 3)

ns info=<http://example.org/peopleInfo#>
ask (?resource info:age ?age {?age >= 18 && ?age < 24} .
     ?resource info:shoeSize ?size {?size <= 44})

In algae, constraints are syntactically differentiated from the query terms. This indicates to the implementation that it should compare the ?size binding to 44 rather than looking for the extra assertions:

foo info:showSize bar. 
bar <= 44

Frequently it a query will unify data from more than one data source. The read directive tells Algae to incorporate the assertions from a given resource into the working graph.

Example 4: query targeting

ns info=<http://example.org/peopleInfo#>
read <http://example.org/someWebPage> ()
ask (?resource info:age ?age {?age >= 24})

The working graph defaults to the data supplied by the application. The result set contains all the bindings discovered at any point. Subsequent ask actions will act on the result set from previous actions.

Example 5: result set extension (extends Example 4: query targeting)

ns info=<http://example.org/peopleInfo#>
read <http://example.org/someWebPage> ()
ask (?resource info:age ?age {?age >= 24})
read <http://example.org/anotherWebPage> ()
ask (?resource info:shoeSize ?shoeSize {?shoeSize >= 42})

The query pattern describes a graph template which is compared against the working graph. The query pattern may include optional terms — terms which need not correspond to arcs in the working graph. This functionality is analogous to outer joins in SQL, and, in the SQL triple store [3], is implemented with outer joins.

Example 6: query with optional terms (extends Example 2: query with multiple terms)

ns vcard=<http://www.w3.org/2001/vcard-rdf/3.0#>
ask (?vcard  vcard:FN "John Smith" .
     ?vcard  vcard:N  ?name .
     ?name   vcard:Family  ?family .
     ?name   vcard:Given  ?given .
     ~?vcard foaf:mbox ?mbox .
     ~?vcard foaf:mbox_sha1sum ?sum)

A particular vcard entry may or may not have FOAF [4] data attached. This query will retrieve two FOAF properties if present, but will still return vcard entries without this vcard information.

An ask action may start with optional terms. This simply means that no solutions will be struck from the result set inherited from earlier actions.

Extensibility

Algae provides an extensibility mechanism where clients may specify a set of features required to execute a query. Sets of features are identified by a profile name. Algae processors are required to support the set of features identified by the URI http://www.w3.org/2004/05/06-Algae/#core. Additional requirements are identified by the requires directive. Such features are subsequently identified as "required features". A processor encountering features that are neither in the core profile, nor included in a the required features, should return MalformedQueryException.

[informative] The requires mechanism is intended to provide a balance between interoperability through standardization and flexibility. Ideally, a small set of profiles will evolve. Those designing profiles are encouraged, when practical, to adopt already-existing profiles that together encompass the desired feature set.

core

The core features does not require an ordered result set — it is suitable for streaming large result sets. It includes the connectives:

[informative] The core profile is intended to address the RDF Data Access Working Group's proposed requirements on streaming and result limits [6].

sortable

The sortable features specifically requires an ordered result set. It includes the connective:

Data Stores

Algae may be used over a variety of stores. By default, the working graph is an in-memory data store [2]. The query may specifically draw upon other stores by using the attach directive:

ns ot=<http://localhost/OrderTracking#>
attach <http://www.w3.org/2003/01/21-RDF-RDB-access/ns#dbs> ot:db (
                    properties=\"../test/OrderTracking.prop\")
ns alg=<http://www.w3.org/1999/02/26-modules/algae#>
ask ot:db (
       ?p	ot:Products_partNo		?partNo .
       ?p	ot:Products_price		?price {?price >= 15.00 && ?price <= 50.00} .
       ?p	ot:Products_description		?description
      )
read <file:../test/Federate0-reviews.rdf> (inputLang=\"rdf\") 
ns rvw=<http://localhost/Reviews#>
ask ( ~?p	rvw:rating			?rating .
      ~?p	rvw:distribution		?dist
    )

Implemented data stores include an SQL data store [3] and various application-data accessors [4].

Result Set

All logical Algae operations are defined in terms of a result set. The result set is an ordered 1 set of results. Each result is a tuple of variable bindings paired with a set of supporting assertions.

?name?mbox proofs
Alicealice@example.com_:1 pim:given "Alice" .
_:1 foaf:mbox <mailto:alice@example.com> .
Bob bob@example.com _:2 foaf:first_name "Bob" .
foaf:first_name owl:sameAs pim:mbox .
_:2 foaf:mbox <mailto:bob@example.com> .

An Algae query starts with an unbound result set. This is a result set with a single result. That result has no bindings and no proofs. The operations defined below transform the result set. Those operators in the core profile manipulate only a single result, potentially duplicating it or eliminating it. Operators in the sortable profile manipulate more than one result at a time and thus need access to the entire result set.

A result set from which all the results have been eliminated is an empty result set.

1 The core profile, defined above, may use an unordered result set.

Statement Data Model

Statements in Algae extend RDF triples [5]. In addition to the predicate, subject and object of a triple, Algae statements contain an attribution for tracking provenance information. Attributions are select by the the internalName %ATTRIB and may be constrained or assigned to variables within exprs.

ask (?an rdfs:type a:Annotation {%ATTRIB == <http://example.com/a1.rdf>}.
     ?an a:body ?body {%ATTRIB == <http://example.com/a1.rdf>})

Grammar

There are no reserved words in the Algae grammar. Variables may be called ?ask or ?namespace.

The Algae grammar starts with the query production:

query &mdash an Algae query string

[1]    query    ::=    (actionStr
| '@' sourceStr)*
An algae query is made up of actions and include directives. Include directives identify a URL or file whose contents are to be included in the query. The query processor replaces all include directives with the actions resulting from parsing the included resource. A resource may be included more than once, however, circular inclusions are a CircularInclusionException.
@foaf-query.alg
collect (?who ?knowsWho)

actionStr &mdash Algae functions

[2]    actionStr    ::=    askStr
| attachStr
| collectStr
| namespaceStr
| requireStr
| slurpStr
[3]    askStr    ::=    'ask' dbSpec? '(' graphPattern '.'? ')'
The ask directive identifies a query pattern and optionally database in which to attempt to match that pattern.
ask <http://example.com/annotationsDB>
    (?a rdf:type a:Annotation
     ?a a:annotates <http://example.com/some/page.html>)
[4]    attachStr    ::=    'attach' urlvar urlvarlit '(' binding* ')'
This provides the parameters necessary for attaching to a database and determines the handle used to subsequently identify this database. The first parameter is a URL which identifies a database. This document reserves the name http://www.w3.org/2004/05/06-Algae/#ephemoral to refer to a memory resident database. The second parameter is the handle. The handle may be either a URL or a variable. The third parameter is a set of zero or more bindings (see bindings below).
attach <http://www.w3.org/2003/01/21-RDF-RDB-access/ns#dbs> ot:db (
                    properties="OrderTracking.prop")
The parameters are determined by the type of the database.
[5]    collectStr    ::=    ('collect' | 'select') '(' knownVar+ ')'
Identify a set of variables to report.
collect (?name ?mbox)
[6]    namespaceStr    ::=    ('ns' | 'namespace') namespaceDecl
Specify a namespace prefix associated with a namespace.
ns a=<http://www.w3.org/2000/10/annotation-ns#>
[7]    slurpStr    ::=    ('slurp' | 'read') sourceStr dbSpec? '(' binding* ')'
Import statements from the sourceStr. The sources of these statements will often be an RDF document, but need not be.
read <http://example.org/someWebPage> (user="bob" password="obo")
[8]    requireStr    ::=    'require' url
Specify a profile that the query server must support in order to satisfy this query.
require <http://www.w3.org/2004/06/20-rules/#rule>

decl &mdash statement declaration

Algae uses a subset of the Notation 3 [4] syntax to define a graph pattern. This subset includes only the syntax for making assertions with constants or variables. The following productions define the syntax for that syntax.
[9]    decl    ::=    subject propValueList
A single decl specifies that each result in the result set be replaced by (zero or more) results obtained from the following two steps:
1 Substitute the variables from that result for the variables in the decl and search for the resulting triple in the (current) database.
2 Evaluate the constraints on the triples obtained from step 1.
[10]    propValueList    ::=    property valueList
| propValueList ';' property valueList
[11]    valueList    ::=    value ('{' expr '}')*
| valueList ',' value ('{' expr '}')*
[12]    subject    ::=    urlvar
| '[' nulPropValList ']'
[13]    value    ::=    urlvarlit
| '[' nulPropValList ']'
[14]    nulPropValList    ::=   
| propValueList
| propValueList '.'
| propValueList ';' '.'
[15]    property    ::=    urlvar
For example, the following is a valid algae decl:
<#pat> <#child>  <#al>, <#chaz>, <#mo> ;
       <#age>    24 ;
       <#eyecolor> "blue" .

graphPattern &mdash represent a set of graphs

The graphPattern expresses a set of graphs with logical connectives.
[16]    graphPattern    ::=    QTerm
| graphPattern '.' graphPattern
| graphPattern '.'? '||' graphPattern
| graphPattern '.'? '|&' graphPattern
| graphPattern '.'? '|-' graphPattern
The "." connective indicates a conjunction, ie. the construction of a graph. However, when combined with the other connectives ("||", "|&", "|-"), it has no meaning — the pattern has the same meaning as if no "." were present. It is included only for ease in authoring queries.
A single QTerm specifies that each result in the result set be replaced by (zero or more) results obtained by evaluating the QTerm.
The conjunction connective, ".", specifies that each result in the result set be replaced by the (zero or more) results obtained by evaluating each graph pattern in the conjunction.
The shortcut disjunction connective, "||", specifies that each result in the result set be replaced by the (zero or more) results obtained by either, evaluating the left graph pattern in the conjunction, or, if the left pattern produced no results, evaluating the right graph pattern.
The union disjunction connective, "|&", specifies that each result in the result set be replaced by the union of the results from evaluating the left graph pattern and evaluating the right graph pattern.
The merged union disjunction connective, "|-", specifies that each result in the result set be replaced by the results from evaluating the left graph pattern and merged with the results from evaluating the right graph pattern. This merging operation reduces duplicate result tuples to a single result with that tuple with the proofs from both merged results. This conjunction is not in the core profile, but is in the sortable profile.
?annotation rdfs:type a:Annotation .
(?annotation d0:title ?title ||
 ?annotation d1:title ?title).
[17]    QTerm    ::=    decl
| '(' graphPattern '.'? ')'
| '~' QTerm
| '!' decl

expr &mdash expressions operating on atoms within a decl

exprs provide arithmetic and comparison functions on elements within the statement data model.
[18]    expr    ::=    expr '*' expr
| expr '/' expr
| expr '%' expr
| expr '+' expr
| expr '-' expr
| expr '<' expr
| expr '>' expr
| expr '<=' expr
| expr '>=' expr
| expr '==' expr
| expr '!=' expr
| expr '&' expr
| expr '^' expr
| expr '|' expr
| expr '&&' expr
| expr '||' expr
| single
| varOrName '=' expr
| varOrName '@' str
[19]    single    ::=    '-' single UMINUS
| '!' single
| '~' single
| '(' expr ')'
| internalName
| variable
| literal
| url

convenience productions

[20]    namespaceDecl    ::=    name '=' url
| url
[21]    sourceStr    ::=    url
| filename
[22]    varOrName    ::=    variable
| internalName
[23]    internalName    ::=    '%' name
[24]    urlvar    ::=    url
| variable
[25]    urlvarlit    ::=    urlvar
| literal
| internalName
[26]    variable    ::=    '?' variableName
[27]    knownVar    ::=    '?' variableName
[28]    variableName    ::=    VAR
[29]    literal    ::=    str
| INT
| FLOAT
| DtSTR url
[30]    str    ::=    STR
[31]    binding    ::=    name '=' urlvarlit
| internalName '=' urlvarlit
[32]    name    ::=    VAR
[33]    dbSpec    ::=    url
| variable
[34]    filename    ::=    FILE
[35]    url    ::=    URL
| qname
[36]    qname    ::=    VAR ':' VAR
| VAR

terminals

in order of precedence:
[T1]    URL    ::=    '<' space* ([^space>]+) space* '>'
[T2]    DtSTR    ::=    '"' ([^"]+) '"'^^
| "'" ([^']+) "'"^^
[T3]    STR    ::=    '"' ([^"]+) '"'
| "'" ([^']+) "'"
[T4]    FLOAT    ::=    [0-9]* '.' [0-9]+
[T5]    INT    ::=    [0-9]+
[T6]    VAR    ::=    [A-Za-z][A-Za-z0-9_]+

References

[1] "RDQL - A Query Language for RDF", Andy Seaborne

[2] "RDF Access to Relational Databases"

[3] "RDF Datbase Access to the Finger Protocol"

[4] "Primer: Getting into RDF & Semantic Web using N3", Tim Berners-Lee

[5] "Resource Description Framework (RDF): Concepts and Abstract Syntax", Graham Klyne, Jeremy J. Carroll

[6] "RDF Data Access Use Cases and Requirements", Kendall Grant Clark

Valid XHTML 1.0! Author: Eric Prud'hommeaux
$Date: 2004/06/21 08:51:36 $