W3C Technology and Society Domain

Technical Plenary and WG Meeting Event, Royal Sonesta Hotel - Cambridge, MA USA

Possible RDF Query Work

At the 6-7 March 2003 Semantic Web Architecture Meeting there will be/was a discussion of possible work on query languages for RDF. The background reading quoted for that discussion includes:

From the background reading, specifically Andy and Alberto's use cases and EricP's summary of the various languages, a pattern starts to emerge: many very similar languages vary in certain features but otherwise remain alike.

The abstract syntax of the query

There was remarkable consistency in the abstract form of the query for a broad range of positive Horn(?) graph match query languages. Four levels appear:

  1. A path traversal query, in which predicates cannot be variables
  2. A general graph-match query, in which predicates cannot be variables
  3. A general graph-match query, in which predicates can be variables
  4. As above with literals and subgraphs allowed as nodes.

One notes, however, that all levels can be covered by an unrestricted syntax, with services characterized by the restrictions they impose on the graph. An open question is which level or levels should be standardized.
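
The idea of one unrestricted syntax with per-service restrictions can be sketched as follows. This is a minimal illustration, not any existing query language: the triple-pattern representation, the "?name" variable convention, and the function names are all assumptions made for the example.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]      # (subject, predicate, object)
Binding = Dict[str, str]           # variable name -> matched term


def is_var(term: str) -> bool:
    """Assumed convention for this sketch: variables start with '?'."""
    return term.startswith("?")


def match(patterns: List[Triple], graph: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding under which all patterns occur in the graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)


def has_variable_predicate(patterns: List[Triple]) -> bool:
    """A service restricted to constant predicates would reject such queries."""
    return any(is_var(predicate) for _, predicate, _ in patterns)


graph = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", "Alice"),
]
query = [("?x", "ex:knows", "?y"), ("?y", "ex:knows", "?z")]
results = list(match(query, graph))
```

The same `match` function serves every level; a more restricted service would simply check a query with a predicate such as `has_variable_predicate` before accepting it.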

An interesting comparison is with the RuleML project, which aims to integrate many non-RDF rule languages. It also uses a generic syntax with multiple sublanguages (see DTDs), and its categories of sublanguage should be compared with the levels above. The languages being considered are not webized, in that they do not in general use URIs to identify things: individuals and predicates are identified by PCDATA strings. Connections to RDF include the addition of a ur form of constant (for some reason, rather than simply the use of URIs as identifiers), and the move toward the RDF reification of a RuleML query. It may be possible to converge this reification with the reification of an RDF query language. It seems evident that translation from an RDF query standard into RuleML will be straightforward. It isn't clear to what extent the result will be re-exportable into the various rule engines. Conversion in the reverse direction would require the supply of namespace URIs, and the conversion of n-ary relations to combinations of binary relations.
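
As an illustration of the reverse conversion, the sketch below turns one n-ary atom into binary triples by introducing a node for the relation instance and one property per argument position. The namespace URI, the role names, and the blank-node label are all hypothetical and would have to be supplied by whoever performs the conversion; nothing here is taken from RuleML itself.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def nary_to_triples(predicate: str, args: List[str], roles: List[str],
                    namespace: str, node_id: str) -> List[Triple]:
    """Express predicate(args...) as binary triples about a fresh node.

    `roles` names each argument position; `namespace` is the URI prefix
    that the string-identified predicate and roles are placed under.
    """
    if len(roles) != len(args):
        raise ValueError("one role name is needed per argument")
    node = "_:" + node_id                      # blank node for this instance
    triples = [(node, "rdf:type", namespace + predicate)]
    for role, arg in zip(roles, args):
        triples.append((node, namespace + role, arg))
    return triples


# Hypothetical ternary atom: gives(john, mary, book)
triples = nary_to_triples(
    "gives", ["john", "mary", "book"],
    ["giver", "recipient", "gift"],
    "http://example.org/ns#", "g1",
)
```

The arguments themselves remain plain strings in this sketch; giving them URIs as well is a separate step of the same kind.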

Not covered by these levels are the ability to distinguish, within the query, matches from different data sources, and the ability to take action when a particular data source does not contain a given piece of information. These are not addressed here.

The abstract syntax of the returned result

Query systems differed in whether the result is returned as
  1. a set of bindings,
  2. the matched RDF graph, or
  3. an RDF graph built from a given template

Clearly these are interconvertible and have different advantages and disadvantages, incompletely tabulated below. It may be best to require support for more than one or even all three.

Returned result: pros and cons

a set of bindings
  Pro:
  • Byte-efficient
  Con:
  • Bindings are only useful when linked to the specific query.
  • Needs an arbitrary RDF encoding to be chosen.

the matched RDF graph
  Pro:
  • RDF data is a valid subset of the datastore, irrespective of context.
  Con:
  • Returns parts of the graph which may not be required

an RDF graph built from a given template
  Pro:
  • Most flexible
  • Makes query equivalent to rules
  • Provides a possible extension for remote query to add deduced results back into the data store
  Con:
  • (Privacy implications in a distributed system)
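
The interconvertibility of the three result forms can be sketched as below, assuming (for this example only) that queries and templates are lists of triple patterns with "?name" variables and that a binding is a dictionary from variable to term. Substituting each binding into the query yields the matched graph; substituting it into a different template yields the constructed graph.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def instantiate(patterns: List[Triple], bindings: List[Binding]) -> Set[Triple]:
    """Substitute every binding into every pattern and collect the triples."""
    triples: Set[Triple] = set()
    for binding in bindings:
        for s, p, o in patterns:
            # Terms not present in the binding (constants) are kept as they are.
            triples.add((binding.get(s, s), binding.get(p, p), binding.get(o, o)))
    return triples


query = [("?x", "ex:knows", "?y")]
template = [("?y", "ex:knownBy", "?x")]           # hypothetical template
bindings = [
    {"?x": "ex:alice", "?y": "ex:bob"},
    {"?x": "ex:bob", "?y": "ex:carol"},
]

matched_graph = instantiate(query, bindings)      # result form 2
built_graph = instantiate(template, bindings)     # result form 3
```

Note that the conversion needs the query or template alongside the bindings, which is the sense in which bindings on their own are only useful when linked to the query.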

Syntax choices

The syntax choices (except for Versa's path-specific syntax) were mainly independent of the above semantic distinctions. Most query languages had a non-XML compact syntax, which is not surprising given that even the native XML Query language uses a non-XML syntax. The non-XML examples differed in style.

Among XML syntaxes there are two basic approaches: one, to wrap regular RDF syntax with punctuation to make it a rules language, adding variables and the grouping statements of the query and return template; two, to reify a query in great and quite verbose detail. The latter method is very explicit, but for clarity in an RDF world would best take the form of RDF itself. It seems that attempts to use XML for RuleML in an RDF-compatible way led Harold Boley to conclude that XML should be changed.

Semantics

The semantics of the queries chiefly differ along two axes:

Different query systems had quite different powers of inference: several simply query a static database, several do a query with built-in use of certain axioms such as OWL axioms, and some precompute an index of transitive closure, class membership, and so on. However, for all these differences in the deductive power of the store, the operation of query could always conceptually be considered to be a straightforward graph match query on some conceptual data store which was the deductive closure of the data under the kind of inference supported. [ref Pat Hayes presentation to DAML-PI meeting]

This concept can be extended to include the support or otherwise of built-in functions: they do not change the form of the query language either.

Therefore, the RDF query language can be defined independently of the specification of the inference levels of the service.
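
A small sketch of this separation: the same plain lookup is run once against the raw data and once against its deductive closure. The closure here covers only two rules chosen for illustration (transitivity of rdfs:subClassOf and inheritance of rdf:type through it); it is an assumption of the example, not a statement of what any particular service supports.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def closure(graph: Set[Triple]) -> Set[Triple]:
    """Add inferred triples until nothing new can be derived."""
    closed = set(graph)
    while True:
        new: Set[Triple] = set()
        for a, p, b in closed:
            if p != SUBCLASS:
                continue
            for c, q, d in closed:
                if q == SUBCLASS and c == b:
                    new.add((a, SUBCLASS, d))     # a < b, b < d  =>  a < d
                if q == TYPE and d == a:
                    new.add((c, TYPE, b))         # c : a, a < b  =>  c : b
        if new <= closed:
            return closed
        closed |= new


def instances_of(cls: str, graph: Set[Triple]) -> List[str]:
    """The query itself: a plain match, unaware of any inference."""
    return sorted(s for s, p, o in graph if p == TYPE and o == cls)


data = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Dog"),
}
static_answer = instances_of("ex:Animal", data)
inferred_answer = instances_of("ex:Animal", closure(data))
```

The query function is identical in both calls; only the conceptual data store it runs against differs.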

It does make sense to make an ontology of the types of service offered, for example OWL-complete service, and to define relationships between datasets with or without various forms of inference. It seems that this is a lower priority, and less advanced. The need for standards is not so acute.

If a given query service supports optional powers of inference, one would expect a description of the service requested to be sent with the query, and that description would use the same ontology.

Built-in functions such as arithmetic and string operations, and web access, are a classic standardization problem, and indeed many libraries exist and should be referenced. Existing systems which have libraries include the XQ set of functions (many of which are not XQ-specific), and the Cyc and Prolog libraries. This work is closely connected with datatype definitions, and as datatypes (with the noted exception of rational numbers) are defined by XML Schema, it could be hoped that definitions for the datatype operations would be provided by other work, with some effort required to reference them for use in the RDF query language.

Remote query

Once a query language is defined, the mechanism or mechanisms for accessing a query service should be fairly straightforward to define on top of remote access protocols such as raw HTTP+GET, and/or SOAP 1.2.
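
For the raw HTTP+GET case, a remote query could be as simple as a URL carrying the query text. The sketch below only builds such a URL; the endpoint address and the parameter names "query" and "result" are hypothetical, since no such protocol is defined here.

```python
from urllib.parse import urlencode


def query_url(endpoint: str, query_text: str, result_form: str = "bindings") -> str:
    """Build a GET URL for a query service.

    `endpoint`, and the parameter names "query" and "result", are
    assumptions of this sketch, not part of any defined protocol.
    """
    params = urlencode({"query": query_text, "result": result_form})
    return endpoint + "?" + params


url = query_url(
    "http://example.org/rdf-query",            # hypothetical service address
    "?x ex:knows ?y",                          # query in some agreed syntax
    result_form="graph",
)
```

The `result` parameter stands in for the choice among the returned result forms discussed above; fetching the URL with any HTTP client would then retrieve the result.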

Possible Deliverables

So, if work were begun in this area, formally or informally, more or less in chronological order, one might hope to see:

Future work which would extend the query language, which one could imagine being done in parallel, but which is not at such an advanced stage of development or need for standardization as the basic query language, includes:

Need among RDF users

There is a need for RDF Query not only as a stand-alone language for RDF systems, but also for

References and footnotes

Background reading above and all references

R. V. Guha et al., Enabling Inferencing, position paper for the W3C Query Languages meeting in Boston, December 3-4, 1998. Mentions many of the points made above (as do other papers from that workshop).

XML Query

RuleML. Links within the text above may help the reader find various aspects of this project.

RFML

[] Minutes, if any, from the DAML-PI rules breakout meeting.

[] Rational numbers have been defined by CC/PP and are used in UAProf profiles, although a complete definition of operators on them does not exist as far as I know (2003-03).