Position paper for the W3C Query Languages Workshop 03-December-1998

Editors:
Paul Cotton (IBM) <cotton@ca.ibm.com>
David Fallside (IBM) <fallside@us.ibm.com>
Ashok Malhotra (IBM) <petsa@us.ibm.com>

Abstract

XML query covers a spectrum of requirements. We discuss a few points on the spectrum.

Position Paper on XML Query

IBM

Table of Contents

1. Spectrum
2. Characteristics and Requirements
    2.1 XML Query as an extension to a URL
    2.2 Querying from within a programming language
    2.3 Constructing XML documents
3. References

1. Spectrum

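The requirements for querying XML documents cover a wide spectrum. We identify three points on the spectrum, based on where the query executes: as an extension to a URL, from within a programming language, and embedded within an XML document in order to construct new documents. Later sections discuss characteristics and requirements for each of them.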

The document(s) to be queried may be stored as a text file, in memory as a [DOM] tree, or in a repository. Documents stored in a repository may be stored as files or in a repository-specific representation.

2. Characteristics and Requirements

2.1 XML Query as an extension to a URL

The query expression can be appended to the end of a URL and identifies a part (a sub-element or attribute) of an XML document. This applies only to documents stored as files, although the files themselves may be stored in a repository. Clearly, this requires a linear syntax, i.e. no markup. Also, while this requires navigation and selection, we suggest that it requires only simple selection. Specifically, it does not require full-text search.
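As an illustration only, the following sketch shows a linear query appended to a URL and resolved to a sub-element or attribute. The convention of placing the query after a "#", the "/@name" suffix for attributes, the example URL and the in-memory document store are all assumptions made for this sketch. The path syntax is the limited path subset of Python's ElementTree, used here as a stand-in; it is not a syntax proposed in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document store standing in for files reachable by URL.
DOCUMENTS = {
    "http://example.org/orders.xml": (
        "<orders>"
        "<order id='1'><item>pen</item></order>"
        "<order id='2'><item>ink</item></order>"
        "</orders>"
    ),
}


def resolve(reference):
    """Resolve 'URL#query' to a list of elements or attribute values.

    Assumed convention: the query is a simple path relative to the root
    element; a trailing '/@name' selects an attribute of the matched elements.
    """
    url, _, query = reference.partition("#")
    root = ET.fromstring(DOCUMENTS[url])
    if not query:
        return [root]
    path, has_attribute, attribute = query.partition("/@")
    elements = root.findall(path)
    if has_attribute:
        return [e.get(attribute) for e in elements if attribute in e.attrib]
    return elements


# Navigation plus simple selection; no full-text search is involved.
order_ids = resolve("http://example.org/orders.xml#order/@id")
second_item = resolve("http://example.org/orders.xml#order[@id='2']/item")
```

Note that the whole query is a single string without markup, which is what allows it to travel as part of a URL.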

Such a simple query facility is also required by [XSL] and [XLL]. The [XQL] proposal from Microsoft and Texcel attempts to address this point in the requirements spectrum.

2.2 Querying from within a programming language

In applications built around XML documents, queries would typically execute from within a programming language. The query may execute against a DOM structure in memory or against a collection of documents in files or in a repository. It could equally well execute against a collection of documents generated from data stored in database tables. In fact, working with databases as if they were collections of XML documents may well become a popular style for writing database applications.

This point on the query spectrum requires a full-function query capability including:

Although an XML-style syntax would seem desirable, it is not strictly necessary. An XML-style syntax would need to be embedded into a programming language, much as [SQL] is embedded today (e.g. via [JDBC] or [SQLJ] for Java). Alternatively, the programming language could be extended with constructs for working with XML documents. Java could, for example, be extended with classes representing XML documents, elements, etc., with appropriate methods for querying, constructing and updating. Language-specific XML constructs would, however, be different for every programming language, and the query facilities would not be declarative.

Let us assume, then, a single declarative, XML-style query language that would be embedded into various programming languages for execution. The code that actually does the querying would need to be quite different depending on whether it was querying a DOM tree or a set of XML documents stored as relational database tables. For execution, therefore, it should be possible to translate the query into executable code appropriate for the manner in which the document(s) are stored. That is, the query should be translatable into operations on the DOM for parsed documents in memory, into SQL for documents stored in a relational database, into [OQL] for documents stored in an object database, into an LDAP query, into a Lotus Notes query, etc.

The syntactic translation of the query is usually straightforward. Differences between data models cause more difficult problems. For example, if a collection of XML documents with the same DTD is mapped into a set of relational tables, then the XML query will need to be translated into a SQL query. The translation will depend on the exact manner in which the documents were mapped to relational tables, so the mapping strategy would need to be an input to the translator. Some repositories may not support all the facilities of the XML Query language.
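The following sketch illustrates the idea of one declarative query with two execution strategies. It is a minimal example under stated assumptions: the Query and RelationalMapping classes, the table and column names, and the restriction to a single equality condition are all hypothetical and chosen only for illustration. The same query is evaluated once as DOM operations on a parsed document and once by translation to SQL, with the mapping strategy supplied as an input to the translator.

```python
import sqlite3
import xml.dom.minidom
from dataclasses import dataclass


@dataclass
class Query:
    element: str  # repeating element to search, e.g. "customer"
    select: str   # child element whose text is returned
    where: str    # child element whose text is tested
    equals: str   # value the tested child must have


@dataclass
class RelationalMapping:
    tables: dict   # element name -> table name
    columns: dict  # (element name, child name) -> column name


def to_sql(query, mapping):
    """Translate the query into parameterized SQL using the mapping."""
    table = mapping.tables[query.element]
    selected = mapping.columns[(query.element, query.select)]
    tested = mapping.columns[(query.element, query.where)]
    return f"SELECT {selected} FROM {table} WHERE {tested} = ?", (query.equals,)


def child_text(node, name):
    """Return the text of the first child element called name, or None."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName == name:
            return "".join(t.data for t in child.childNodes
                           if t.nodeType == t.TEXT_NODE)
    return None


def run_on_dom(query, document):
    """Evaluate the same query as operations on a DOM tree in memory."""
    return [child_text(e, query.select)
            for e in document.getElementsByTagName(query.element)
            if child_text(e, query.where) == query.equals]


query = Query("customer", "name", "city", "Toronto")

# Storage 1: a parsed document in memory.
document = xml.dom.minidom.parseString(
    "<customers>"
    "<customer><name>Ann</name><city>Toronto</city></customer>"
    "<customer><name>Bob</name><city>Austin</city></customer>"
    "</customers>")
results_from_dom = run_on_dom(query, document)

# Storage 2: the same data in a relational table (names are assumptions).
mapping = RelationalMapping(
    tables={"customer": "CUST"},
    columns={("customer", "name"): "CUST_NAME",
             ("customer", "city"): "CUST_CITY"})
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE CUST (CUST_NAME TEXT, CUST_CITY TEXT)")
connection.executemany("INSERT INTO CUST VALUES (?, ?)",
                       [("Ann", "Toronto"), ("Bob", "Austin")])
sql, parameters = to_sql(query, mapping)
results_from_sql = [row[0] for row in connection.execute(sql, parameters)]
```

A different mapping of the same documents (for example, one table per element type) would require a different translation of the same query, which is why the mapping must be made available to the translator.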

Note that the SQL standard has recently been extended [SQL/MM] with a user-defined type "FullText" which supports a method (Contains()) that can be used to search occurrences of FullText using a text sub-query language.

The [XML-QL] proposal from AT&T and others attempts to address this point in the spectrum. It does not, however, have an XML syntax. Another difficulty is that some of its constructs, such as tag variables, cannot be translated easily into SQL.

[DASL] (DAV Searching and Locating), an outgrowth of WebDAV, is another point on the spectrum. While it cannot be called from a programming language, it provides a relatively full-function query language in XML syntax, which can be used from a client machine to query a collection of XML documents on a server and return an XML document as a result.

2.3 Constructing XML documents

It should be possible to embed XML queries within XML documents. This will be primarily used to construct XML documents from templates.

Definition: XML templates are XML documents in which certain element or attribute values are designated as variables.

To construct XML documents from a template, the variables are mapped to an input source that will supply the required values. This may be a user interface, a file or an XML query. We shall discuss only the XML query case in this paper. When the XML template is processed, the query is executed, the values returned are inserted in place of the appropriate variables, and the XML document is generated. As discussed above, it may be necessary to translate the query into a form appropriate for the manner in which the source document(s) are stored.

The indirection between the template variables and the information source provides two benefits. First, it allows a template to be used with several different sources. Second, it allows a single query to return a set of values, each of which is bound to one template variable. A single query that returns a set of values is typically much more efficient than executing a separate query to retrieve each value.

This architecture can also be used to create a set of documents from one template. Suppose that we are trying to create offer letters for a direct mail campaign. For simplicity, suppose the letter template has two string variables, one for the name and the other for the address. The query can be set up to return a set of (name, address) pairs. Each pair can be fed to the template to generate a letter.
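The direct mail example can be sketched as follows. This is a minimal illustration, not a proposed notation: the "$name" style of designating variables (Python's string.Template), the element names in the letter and the run_query function that stands in for an XML query are all assumptions of the sketch.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical letter template with two string variables.
LETTER_TEMPLATE = Template(
    "<Letter>"
    "<To>$name</To>"
    "<Address>$address</Address>"
    "<Body>Dear $name, we have an offer for you.</Body>"
    "</Letter>")


def run_query():
    """Stand-in for an XML query returning a set of (name, address) pairs."""
    return [
        {"name": "Ann & Co", "address": "1 Main St, Toronto"},
        {"name": "Bob", "address": "2 Elm St, Austin"},
    ]


def instantiate(template, values):
    """Bind each template variable to a value, escaping XML markup."""
    return template.substitute({k: escape(v) for k, v in values.items()})


# One query execution supplies the values for every generated letter.
letters = [instantiate(LETTER_TEMPLATE, pair) for pair in run_query()]
```

Because the template refers only to variable names, run_query could be replaced by a file reader or a user interface without changing the template.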

The template model can be used recursively. Template variables can themselves be associated with templates which, when instantiated, supply a sub-element. For example, a template could be defined to produce the following XML structure:

<Address>
  <Line1> ... </Line1>
  <Line2> ... </Line2>
  <City> ... </City>
  <State> ... </State> 
  <Zip> ... </Zip>
</Address>

This template could be tied to a query that returns five values, one for each variable, and so creates an Address sub-element. A more complex template, such as the letter template above, could then use this address template to create its address sub-element.
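A recursive use of templates might look like the sketch below. The "$variable" notation, the render function and the convention that a variable may be bound either to a plain value or to a (template, bindings) pair are assumptions made for illustration; the sample values stand in for the five values a query would return.

```python
from string import Template
from xml.sax.saxutils import escape

# Address template with five variables, following the structure shown above.
ADDRESS = Template(
    "<Address>"
    "<Line1>$line1</Line1><Line2>$line2</Line2>"
    "<City>$city</City><State>$state</State><Zip>$zip</Zip>"
    "</Address>")

# Letter template whose address variable is supplied by another template.
LETTER = Template("<Letter><Name>$name</Name>$address</Letter>")


def render(template, bindings):
    """Instantiate a template; a binding may itself be a nested template."""
    values = {}
    for variable, bound in bindings.items():
        if isinstance(bound, tuple):
            # (sub-template, its bindings): instantiate it recursively and
            # insert the resulting sub-element without escaping.
            sub_template, sub_bindings = bound
            values[variable] = render(sub_template, sub_bindings)
        else:
            values[variable] = escape(str(bound))
    return template.substitute(values)


# Hypothetical result of a query returning the five address values.
address_values = {"line1": "1 Main St", "line2": "Suite 5",
                  "city": "Toronto", "state": "ON", "zip": "M5V 1A1"}

letter = render(LETTER, {"name": "Ann", "address": (ADDRESS, address_values)})
```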

3. References

DASL
DASL Proposal. See http://www.ics.uci.edu/pub/ietf/dasl
DCD
DCD Submission. See http://www.w3.org/TR/NOTE-dcd.
DOM
Document Object Model (DOM) Level 1 Specification. See http://www.w3.org/TR/REC-DOM-Level-1.
JDBC
JDBC 2.0 Specification. See http://java.sun.com/products/jdbc/jdbcse2.html.
OQL
Object Query Language. See The Object Database Standard, ODMG 2.0, R.G.G. Cattell (ed.), Morgan Kaufmann, 1997.
SQL
SQL Standard. See http://www.jcc.com/sql_stnd.html.
SQL/MM
SQL Multimedia Standard. See ftp://jerry.ece.umassd.edu/isowg3/dbl/BASEdocs/public/fcd3found.ps.
SQLJ
Proposed ISO/ANSI SQLJ Standard. See X3H2-98-320 DBL:BBN-015 June, 1998
XLL
XLL proposal. See http://www.w3.org/TR/NOTE-xlink-principles.
XML-QL
XML-QL Submission. See http://www.w3.org/TR/NOTE-xml-ql.
XQL
XQL Proposal. See http://www.w3.org/Style/XSL/Group/1998/09/XQL-proposal.html.
XSL
XSL working draft. See http://www.w3.org/TR/WD-xsl.