XML query covers a spectrum of requirements. We discuss a few points on the spectrum.
The requirements for querying XML documents cover a wide spectrum. Below, we identify three points on the spectrum based on where the query executes. Later sections discuss the characteristics and requirements of each.
The document(s) to be queried may be stored as a text file, in memory as a [DOM] tree, or in a repository. Documents held in a repository may be stored as files or in a repository-specific representation.
The query expression can be appended to the end of a URL and identifies a part (a sub-element or an attribute) of an XML document. This applies only to documents stored as files, although the files themselves may be stored in a repository. Clearly, this requires a linear syntax, i.e. no markup. Also, while this requires navigation and selection, we suggest that it only requires simple selection. Specifically, it does not require full-text search.
Such a simple query facility is also required for [XSL] as well as [XLL]. The [XQL] proposal from Microsoft and Texcel attempts to address this point in the requirements spectrum.
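The URL-appended query described above can be sketched as follows. This is a minimal illustration, not the syntax of any of the cited proposals: we assume the query travels in the URL fragment, that it is a slash-separated element path optionally ending in `/@attribute`, and that Python's `xml.etree.ElementTree` stands in for the document store. The URL, the document and the function name `select_part` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical document standing in for the file the URL points at.
DOCUMENT = """<letter>
  <name>Ann Lee</name>
  <address country="US"><city>Austin</city></address>
</letter>"""

def select_part(url: str, xml_text: str) -> Optional[str]:
    """Resolve the linear path carried in the URL fragment to one value.

    Assumed syntax: 'a/b' selects the text of sub-element b,
    'a/@x' selects attribute x of sub-element a.
    """
    path = urlsplit(url).fragment
    root = ET.fromstring(xml_text)
    element_path, _, attribute = path.partition("/@")
    # Navigation: walk from the root element down the element path.
    node = root.find(element_path) if element_path else root
    if node is None:
        return None
    # Simple selection: either one attribute value or the element text.
    return node.get(attribute) if attribute else node.text

city = select_part("http://example.org/letter.xml#address/city", DOCUMENT)
country = select_part("http://example.org/letter.xml#address/@country", DOCUMENT)
print(city, country)
```

The sketch deliberately supports navigation and simple selection only; there is no markup in the query and no full-text search.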
In applications built around XML documents, queries would typically execute from within a programming language. The query may execute against a DOM structure in memory or against a collection of documents in files or in a repository. It could equally well execute against a collection of documents generated from data stored in database tables. In fact, working with databases as if they were collections of XML documents may well become a popular style for writing database applications.
This point on the query spectrum requires a full-function query capability including:
Although an XML-style syntax would seem desirable, it is not strictly necessary. An XML-style syntax would need to be embedded into a programming language, as [SQL] is (e.g. through [JDBC] or [SQLJ] for Java). Alternatively, the programming language could be extended with constructs for working with XML documents. Java could, for example, be extended with classes representing XML documents, elements, etc., with appropriate methods for querying, constructing and updating. Language-specific XML constructs would, however, be different for every programming language, and the query facilities would not be declarative.
Let us assume, then, a single declarative, XML-style query language that would be embedded into various programming languages for execution. The code that actually does the querying would need to be quite different depending on whether it was querying a DOM tree or a set of XML documents stored as relational database tables. For execution, it should therefore be possible to translate the query into executable code appropriate for the manner in which the document(s) are stored: into operations on the DOM for parsed documents in memory, into SQL for documents stored in a relational database, into [OQL] for documents stored in an object database, into an LDAP query, a Lotus Notes query, etc.
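To make the idea of one query with several execution forms concrete, the following sketch runs the same declarative query once as tree operations and once as SQL. It rests on several assumptions that are ours, not part of any proposal: the query is written as a small Python dictionary rather than in an XML syntax, `xml.etree.ElementTree` stands in for a DOM tree, SQLite stands in for the relational database, and the relational layout is one table per element type with one column per sub-element.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical declarative query: names of customers whose city is Austin.
QUERY = {"element": "customer", "select": "name", "where": ("city", "Austin")}

def run_on_tree(query, root):
    """Execute the query as operations on a parsed document in memory."""
    field, value = query["where"]
    return [
        node.findtext(query["select"])
        for node in root.iter(query["element"])
        if node.findtext(field) == value
    ]

def run_on_sql(query, conn):
    """Execute the same query as SQL (assumed layout: table per element)."""
    field, value = query["where"]
    table, column = query["element"], query["select"]
    # Identifiers come from the trusted query description, not from users;
    # the comparison value is passed as a bound parameter.
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{field}" = ?'
    return [row[0] for row in conn.execute(sql, (value,))]

root = ET.fromstring(
    "<customers>"
    "<customer><name>Ann Lee</name><city>Austin</city></customer>"
    "<customer><name>Bob Ray</name><city>Boston</city></customer>"
    "</customers>"
)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Ann Lee", "Austin"), ("Bob Ray", "Boston")],
)

tree_result = run_on_tree(QUERY, root)
sql_result = run_on_sql(QUERY, conn)
print(tree_result, sql_result)
```

Both functions accept the same query description; only the generated operations differ with the storage form.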
The syntactic translation of the query is usually straightforward. Differences between data models cause more difficult problems. For example, if a collection of XML documents with the same DTD is mapped into a set of relational tables, then the XML query will need to be translated into a SQL query. The translation will depend on the exact manner in which the documents were mapped to relational tables, and the mapping strategy would need to be an input to the translator. Some repositories may not support all the facilities of the XML Query language.
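The role of the mapping strategy as an input to the translator can be illustrated with a small sketch. Everything here is hypothetical: the dictionary form of the query, the two mappings, the table and column names, and the function `to_sql`. The point is only that the same query yields different SQL under different mappings, and that an unmapped construct must be rejected rather than silently dropped.

```python
# Hypothetical query: names of customers whose city is Austin.
QUERY = {"element": "customer", "select": "name", "where": ("city", "Austin")}

# Two assumed mapping strategies for the same DTD.
MAPPING_A = {
    "customer": {
        "table": "customers",
        "columns": {"name": "cust_name", "city": "cust_city"},
    }
}
MAPPING_B = {
    "customer": {
        "table": "party",
        "columns": {"name": "full_name", "city": "town"},
    }
}

def to_sql(query, mapping):
    """Translate the query into (sql, parameters) under the given mapping."""
    entry = mapping[query["element"]]
    columns = entry["columns"]
    field, value = query["where"]
    for name in (query["select"], field):
        if name not in columns:
            # The repository has no counterpart for this part of the query.
            raise LookupError(f"no column mapped for sub-element {name!r}")
    sql = (
        f"SELECT {columns[query['select']]} "
        f"FROM {entry['table']} "
        f"WHERE {columns[field]} = ?"
    )
    return sql, (value,)

sql_a, params_a = to_sql(QUERY, MAPPING_A)
sql_b, params_b = to_sql(QUERY, MAPPING_B)
print(sql_a)
print(sql_b)
```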
Note that the SQL standard has recently been extended [SQL/MM] with a user-defined type "FullText" which supports a method (Contains()) that can be used to search occurrences of FullText using a text sub-query language.
The [XML-QL] proposal from AT&T and others attempts to address this point in the spectrum. It does not, however, have an XML syntax. Another difficulty is that some of its constructs, such as tag variables, cannot be translated easily into SQL.
[DASL] (DAV Searching and Locating), an outgrowth of WebDAV, is another point on the spectrum. While it cannot be called from a programming language, it provides a relatively full-function query language in XML syntax, which can be used from a client machine to query a collection of XML documents on a server and return an XML document as a result.
It should be possible to embed XML queries within XML documents. This will be primarily used to construct XML documents from templates.
Definition: XML templates are XML documents in which certain element or attribute values are designated as variables.
To construct XML documents from a template, the variables are mapped to an input source that will supply the required values. This may be a user interface, a file or an XML query. We shall discuss only the XML query case in this paper. When the XML template is processed the query is executed and values returned are inserted into the appropriate variables and the XML document is generated. As discussed above, it may be necessary to translate the query into a form appropriate for the manner in which the source document(s) are stored.
The indirection between the template variables and the information source provides two benefits. First, it allows a template to be used with several different sources. Second, it allows a single query to return a set of values, each of which is bound to one template variable. A single query that returns a set of values is typically much more efficient than executing a separate query to retrieve each value.
This architecture can also be used to create a set of documents from one template. Suppose that we are trying to create offer letters for a direct mail campaign. For simplicity suppose the letter template has two string variables, one for the name and the other for the address. The query can be set up to return a set of name, address pairs. Each pair can be fed to the template to generate a letter.
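The template mechanism, including the direct mail case, can be sketched as follows. This is an illustration under stated assumptions rather than a proposed design: Python's `string.Template` placeholders stand in for template variables, SQLite stands in for the information source, and the table `prospects`, its columns, and the names `BINDINGS`, `fetch_value_sets` and `instantiate` are all hypothetical.

```python
import sqlite3
from string import Template
from xml.sax.saxutils import escape

# Hypothetical letter template: $name and $address are the template variables.
LETTER_TEMPLATE = Template(
    "<letter><name>$name</name><address>$address</address></letter>"
)

# Indirection between template variables and the source: variable -> column.
# Pointing the same template at another source only requires new bindings.
BINDINGS = {"name": "full_name", "address": "street"}

def fetch_value_sets(conn, bindings):
    """Run one query that returns every bound value for every recipient."""
    columns = ", ".join(bindings.values())
    rows = conn.execute(f"SELECT {columns} FROM prospects ORDER BY id")
    # Each row becomes one {variable: value} set for the template.
    return [dict(zip(bindings, row)) for row in rows]

def instantiate(template, values):
    """Insert one set of values into the template, escaping XML markup."""
    return template.substitute({k: escape(v) for k, v in values.items()})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prospects (id INTEGER, full_name TEXT, street TEXT)")
conn.executemany(
    "INSERT INTO prospects VALUES (?, ?, ?)",
    [(1, "Ann Lee", "1 Main St"), (2, "Bob & Co", "9 Elm Rd")],
)

# One query, one template, one generated letter per (name, address) pair.
letters = [instantiate(LETTER_TEMPLATE, v) for v in fetch_value_sets(conn, BINDINGS)]
for letter in letters:
    print(letter)
```

Only one SELECT is issued regardless of the number of template variables or recipients, which is the efficiency argument made above.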
The template model can be used recursively. Template variables can themselves be associated with templates which when instantiated supply a sub-element. For example, a template could be created to create the following XML structure:
<Address> |
This template could be tied to a query that returns five variables and creates an Address sub-element. A more complex template, such as the letter template above, could then use this address template to create its address sub-element.
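The recursive use of templates can be sketched as below. The five sub-element names of the address template (`Street`, `City`, `State`, `Zip`, `Country`) are our assumptions for illustration only, as are the use of `string.Template` placeholders for template variables and the convention that a variable bound to a `(template, values)` pair is filled by instantiating that sub-template.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Hypothetical address template with five variables (names are assumed).
ADDRESS_TEMPLATE = Template(
    "<Address><Street>$street</Street><City>$city</City>"
    "<State>$state</State><Zip>$zip</Zip><Country>$country</Country></Address>"
)

# The letter template delegates its address sub-element to another template.
LETTER_TEMPLATE = Template("<Letter><Name>$name</Name>$address</Letter>")

def instantiate(template, values):
    """Fill a template; a value may itself be a (template, values) pair."""
    filled = {}
    for variable, value in values.items():
        if isinstance(value, tuple):
            sub_template, sub_values = value
            # Recursive case: the variable is supplied by a sub-template,
            # whose output is already markup and is inserted unescaped.
            filled[variable] = instantiate(sub_template, sub_values)
        else:
            # Base case: a plain string value, escaped for use in XML.
            filled[variable] = escape(value)
    return template.substitute(filled)

# The five address values would normally come from one query result.
address_values = {
    "street": "1 Main St",
    "city": "Austin",
    "state": "TX",
    "zip": "78701",
    "country": "US",
}
letter = instantiate(
    LETTER_TEMPLATE,
    {"name": "Ann Lee", "address": (ADDRESS_TEMPLATE, address_values)},
)
print(letter)

# The result parses as XML, with Address nested inside Letter.
parsed = ET.fromstring(letter)
```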