The document describes Algae, an RDF query language used in the W3C Annotea Server. Algae has also served as a research platform for studies in interfacing with relational databases and system information.
This document is a work in progress by the author and does not represent any endorsement from the W3C.
The Algae query language evolved to serve the needs of W3C's Annotea protocol. It provides query and assertion access to RDF Graphs. Algae's functionality followed the development and refinement of RDF itself, including the addition of XML datatypes, nodeIds and collections.
Algae can be used to query a graph, insert data into a graph, or write rules to automatically insert data when a query is matches. These are called actions and the directives are spelled ask, assert, and fwrule respectively. Each Algae action works on an RDF graph called the working graph and a result set. The result set holds tuples for all the query solutions and proofs.
This section describes how an Algae query selects nodes and statements from an RDF graph. Many examples demonstrate queries similar to those used in the RDQL Submission [1]. In such cases, the analogous RDQL example is referenced for each in comparing the two languages.
Example 1: find nodes of a given type (RDQL Example 1)
ask (?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/10/annotation-ns#Annotation>)
The query pattern (inside 'ask()') is a series (in this case, a single one) of query terms. Variables are prefixed with a "?" and URIs are enclosed in "<>". This query binds the variable ?x
to each node with type
Annotation
. For example, ?x will bind to Annot1 and Annot8 in the example data
<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:a="http://www.w3.org/2000/10/annotation-ns#"> <r:Description r:about="http://example.com/things#Annot1"> <r:type r:resource="http://www.w3.org/2000/10/annotation-ns#Annotation" /> </r:Description> <a:Annotation r:about="http://example.com/things#Annot8" /> <r:RDF>
The query terms may be simplified (or at least shortened) by declaring namespace prefixes. Textual report tables are specified with the collect directive.
Example 1b: use namespace delcarations and specify a report
ns r=<http://www.w3.org/1999/02/22-rdf-syntax-ns#> ns a=<http://www.w3.org/2000/10/annotation-ns#> ask (?x rdf:type a:Annotation) collect (?x)
Run against the previous data set, this query produces
?x |
---|
<http://example.com/things#Annot1> |
<http://example.com/things#Annot8> |
Namespace abbreviate is especially handy when introducing multiple terms to a query.
Example 2: query with multiple terms (RDQL Example 2)
ns vcard=<http://www.w3.org/2001/vcard-rdf/3.0#> ask (?vcard vcard:FN "John Smith" . ?vcard vcard:N ?name . ?name vcard:Family ?family . ?name vcard:Given ?given)
As this query has multiple query terms, they are separated by a term terminator ('.'). The could also be separated by a disjunction connective ('||').
This query will bind ?family and ?given to "Smith" and "Johnathan" respectively when matched against the following data:
Example 3: integer value constraint (RDQL Example 3)
ns info=<http://example.org/peopleInfo#> ask (?resource info:age ?age {?age >= 18 && ?age < 24} . ?resource info:shoeSize ?size {?size <= 44})
In algae, constraints are syntactically differentiated from the query terms. This indicates to the implementation that it should compare the ?size binding to 44 rather than looking for the extra assertions:
foo info:showSize bar. bar <= 44
Frequently it a query will unify data from more than one data source. The read directive tells Algae to incorporate the assertions from a given resource into the working graph.
ns info=<http://example.org/peopleInfo#> read <http://example.org/someWebPage> () ask (?resource info:age ?age {?age >= 24})
The working graph defaults to the data supplied by the application. The result set contains all the bindings discovered at any point. Subsequent ask actions will act on the result set from previous actions.
Example 5: result set extension (extends Example 4: query targeting)
ns info=<http://example.org/peopleInfo#> read <http://example.org/someWebPage> () ask (?resource info:age ?age {?age >= 24}) read <http://example.org/anotherWebPage> () ask (?resource info:shoeSize ?shoeSize {?shoeSize >= 42})
The query pattern describes a graph template which is compared against the working graph. The query pattern may include optional terms — terms which need not correspond to arcs in the working graph. This functionality is analogous to outer joins in SQL, and, in the SQL triple store [3], is implemented with outer joins.
Example 6: query with optional terms (extends Example 2: query with multiple terms)
ns vcard=<http://www.w3.org/2001/vcard-rdf/3.0#> ask (?vcard vcard:FN "John Smith" . ?vcard vcard:N ?name . ?name vcard:Family ?family . ?name vcard:Given ?given . ~?vcard foaf:mbox ?mbox . ~?vcard foaf:mbox_sha1sum ?sum)
A particular vcard entry may or may not have FOAF [4] data attached. This query will retrieve two FOAF properties if present, but will still return vcard entries without this vcard information.
An ask action may start with optional terms. This simply means that no solutions will be struck from the result set inherited from earlier actions.
Algae provides an extensibility mechanism where clients may specify a set of features required to execute a query. Sets of features are identified by a profile name. Algae processors are required to support the set of features identified by the URI http://www.w3.org/2004/05/06-Algae/#core. Additional requirements are identified by the requires directive. Such features are subsequently identified as "required features". A processor encountering features that are neither in the core
profile, nor included in a the required features
, should return MalformedQueryException.
[informative] The requires
mechanism is intended to provide a balance between interoperability through standardization and flexibility. Ideally, a small set of profiles will evolve. Those designing profiles are encouraged, when practical, to adopt already-existing profiles that together encompass the desired feature set.
The core features does not require an ordered result set — it is suitable for streaming large result sets. It includes the connectives:
[informative] The core profile is intended to address the RDF Data Access Working Group's proposed requirements on streaming and result limits [6].
The sortable features specifically requires an ordered result set. It includes the connective:
Algae may be used over a variety of stores. By default, the working graph is an in-memory data store [2]. The query may specifically draw upon other stores by using the attach directive:
ns ot=<http://localhost/OrderTracking#> attach <http://www.w3.org/2003/01/21-RDF-RDB-access/ns#dbs> ot:db ( properties=\"../test/OrderTracking.prop\") ns alg=<http://www.w3.org/1999/02/26-modules/algae#> ask ot:db ( ?p ot:Products_partNo ?partNo . ?p ot:Products_price ?price {?price >= 15.00 && ?price <= 50.00} . ?p ot:Products_description ?description ) read <file:../test/Federate0-reviews.rdf> (inputLang=\"rdf\") ns rvw=<http://localhost/Reviews#> ask ( ~?p rvw:rating ?rating . ~?p rvw:distribution ?dist )
Implemented data stores include an SQL data store [3] and various application-data accessors [4].
All logical Algae operations are defined in terms of a result set. The result set is an ordered 1 set of results. Each result is a tuple of variable bindings paired with a set of supporting assertions.
?name | ?mbox | proofs |
---|---|---|
Alice | alice@example.com | _:1 pim:given "Alice" . |
_:1 foaf:mbox <mailto:alice@example.com> . | ||
Bob | bob@example.com | _:2 foaf:first_name "Bob" . |
foaf:first_name owl:sameAs pim:mbox . | ||
_:2 foaf:mbox <mailto:bob@example.com> . |
An Algae query starts with an unbound result set. This is a result set with a single result. That result has no bindings and no proofs. The operations defined below transform the result set. Those operators in the core profile manipulate only a single result, potentially duplicating it or eliminating it. Operators in the sortable profile manipulate more than one result at a time and thus need access to the entire result set.
A result set from which all the results have been eliminated is an empty result set.
1 The core profile, defined above, may use an unordered result set.
Statements in Algae extend RDF triples [5]. In addition to the predicate, subject and object of a triple, Algae statements contain an attribution for tracking provenance information. Attributions are select by the the internalName %ATTRIB
and may be constrained or assigned to variables within exprs.
ask (?an rdfs:type a:Annotation {%ATTRIB == <http://example.com/a1.rdf>}. ?an a:body ?body {%ATTRIB == <http://example.com/a1.rdf>})
There are no reserved words in the Algae grammar. Variables may be called ?ask or ?namespace.
The Algae grammar starts with the query production:
query &mdash an Algae query string | [1] | query |
::= | (actionStr |
An algae query is made up of actions and include directives. Include directives identify a URL or file whose contents are to be included in the query. The query processor replaces all include directives with the actions resulting from parsing the included resource. A resource may be included more than once, however, circular inclusions are a CircularInclusionException.
@foaf-query.alg collect (?who ?knowsWho) | |||
actionStr &mdash Algae functions | |||
[2] | actionStr |
::= | askStr |
[3] | askStr |
::= | 'ask' dbSpec? '(' graphPattern '.'? ')' |
The ask directive identifies a query pattern and optionally database in which to attempt to match that pattern.
ask <http://example.com/annotationsDB> (?a rdf:type a:Annotation ?a a:annotates <http://example.com/some/page.html>) | |||
[4] | attachStr |
::= | 'attach' urlvar urlvarlit '(' binding* ')' |
This provides the parameters necessary for attaching to a database and determines the handle used to subsequently identify this database. The first parameter is a URL which identifies a database. This document reserves the name http://www.w3.org/2004/05/06-Algae/#ephemoral to refer to a memory resident database. The second parameter is the handle. The handle may be either a URL or a variable. The third parameter is a set of zero or more bindings (see bindings below).
attach <http://www.w3.org/2003/01/21-RDF-RDB-access/ns#dbs> ot:db ( properties="OrderTracking.prop")The parameters are determined by the type of the database. | |||
[5] | collectStr |
::= | ('collect' | 'select') '(' knownVar+ ')' |
Identify a set of variables to report.
collect (?name ?mbox) | |||
[6] | namespaceStr |
::= | ('ns' | 'namespace') namespaceDecl |
Specify a namespace prefix associated with a namespace.
ns a=<http://www.w3.org/2000/10/annotation-ns#> | |||
[7] | slurpStr |
::= | ('slurp' | 'read') sourceStr dbSpec? '(' binding* ')' |
Import statements from the sourceStr . The sources of these statements will often be an RDF document, but need not be.
read <http://example.org/someWebPage> (user="bob" password="obo") | |||
[8] | requireStr |
::= | 'require' url |
Specify a profile that the query server must support in order to satisfy this query.
require <http://www.w3.org/2004/06/20-rules/#rule> | |||
decl &mdash statement declaration | |||
Algae uses a subset of the Notation 3 [4] syntax to define a graph pattern. This subset includes only the syntax for making assertions with constants or variables. The following productions define the syntax for that syntax. | |||
[9] | decl |
::= | subject propValueList |
A single decl specifies that each result in the result set be replaced by (zero or more) results obtained from the following two steps: 1 Substitute the variables from that result for the variables in the decl and search for the resulting triple in the (current) database. 2 Evaluate the constraints on the triples obtained from step 1. | |||
[10] | propValueList |
::= | property valueList |
[11] | valueList |
::= | value ('{' expr '}')* |
[12] | subject |
::= | urlvar |
[13] | value |
::= | urlvarlit |
[14] | nulPropValList |
::= |
|
[15] | property |
::= | urlvar |
For example, the following is a valid algae decl:
<#pat> <#child> <#al>, <#chaz>, <#mo> ; <#age> 24 ; <#eyecolor> "blue" . | |||
graphPattern &mdash represent a set of graphs | |||
The graphPattern expresses a set of graphs with logical connectives. | [16] | graphPattern |
::= | QTerm |
The ". " connective indicates a conjunction, ie. the construction of a graph. However, when combined with the other connectives ("|| ", "|& ", "|- "), it has no meaning — the pattern has the same meaning as if no ". " were present. It is included only for ease in authoring queries. | |||
A single QTerm specifies that each result in the result set be replaced by (zero or more) results obtained by evaluating the QTerm. | |||
The conjunction connective, ". ", specifies that each result in the result set be replaced by the (zero or more) results obtained by evaluating each graph pattern in the conjunction. | |||
The shortcut disjunction connective, "|| ", specifies that each result in the result set be replaced by the (zero or more) results obtained by either, evaluating the left graph pattern in the conjunction, or, if the left pattern produced no results, evaluating the right graph pattern. | |||
The union disjunction connective, "|& ", specifies that each result in the result set be replaced by the union of the results from evaluating the left graph pattern and evaluating the right graph pattern. | |||
The merged union disjunction connective, "|- ", specifies that each result in the result set be replaced by the results from evaluating the left graph pattern and merged with the results from evaluating the right graph pattern. This merging operation reduces duplicate result tuples to a single result with that tuple with the proofs from both merged results. This conjunction is not in the core profile, but is in the sortable profile. | |||
?annotation rdfs:type a:Annotation . (?annotation d0:title ?title || ?annotation d1:title ?title). | |||
[17] | QTerm |
::= | decl |
expr &mdash expressions operating on atoms within a decl | |||
exprs provide arithmetic and comparison functions on elements within the statement data model. | |||
[18] | expr |
::= | expr '*' expr |
[19] | single |
::= | '-' single UMINUS |
convenience productions | |||
[20] | namespaceDecl |
::= | name '=' url |
[21] | sourceStr |
::= | url |
[22] | varOrName |
::= | variable |
[23] | internalName |
::= | '%' name |
[24] | urlvar |
::= | url |
[25] | urlvarlit |
::= | urlvar |
[26] | variable |
::= | '?' variableName |
[27] | knownVar |
::= | '?' variableName |
[28] | variableName |
::= | VAR |
[29] | literal |
::= | str |
[30] | str |
::= | STR |
[31] | binding |
::= | name '=' urlvarlit |
[32] | name |
::= | VAR |
[33] | dbSpec |
::= | url |
[34] | filename |
::= | FILE |
[35] | url |
::= | URL |
[36] | qname |
::= | VAR ':' VAR |
terminals | |||
in order of precedence: | |||
[T1] | URL |
::= | '<' space* ([^space>]+) space* '>' |
[T2] | DtSTR |
::= | '"' ([^"]+) '"'^^ |
[T3] | STR |
::= | '"' ([^"]+) '"' |
[T4] | FLOAT |
::= | [0-9]* '.' [0-9]+ |
[T5] | INT |
::= | [0-9]+ |
[T6] | VAR |
::= | [A-Za-z][A-Za-z0-9_]+ |
[1] "RDQL - A Query Language for RDF", Andy Seaborne
[2] "RDF Access to Relational Databases"
[3] "RDF Datbase Access to the Finger Protocol"
[4] "Primer: Getting into RDF & Semantic Web using N3", Tim Berners-Lee
[5] "Resource Description Framework (RDF): Concepts and Abstract Syntax", Graham Klyne, Jeremy J. Carroll
[6] "RDF Data Access Use Cases and Requirements", Kendall Grant Clark
Author: Eric Prud'hommeaux