Copyright ©2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document specifies goals, usage scenarios, and requirements for the W3C XML Query data model, algebra, and query language.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This is a W3C Working Draft for review by W3C Members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This is work in progress and does not imply endorsement by the W3C membership.
This document has been produced as part of the W3C XML Activity, following the procedures set out for the W3C Process. The document has been written by the XML Query Working Group (W3C members only). The goals of the XML Query working group are discussed in the XML Query Working Group charter (W3C members only).
The XML Query Working Group considers the contents of this Working Draft to be largely stable, and therefore encourages reviewers to provide feedback as early as possible.
Comments on this document should be sent to the W3C mailing list www-xml-query-comments@w3.org (archived at http://lists.w3.org/Archives/Public/www-xml-query-comments/).
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
1 Goals
2 Usage Scenarios
3 Requirements
3.1 Terminology
3.2 General Requirements
3.3 XML Query Data Model
3.4 XML Query Functionality
4 Relationship to Other Activities
5 References (non-normative)
A Glossary
The goal of the XML Query Working Group is to produce a data model for XML documents, a set of query operators on that data model, and a query language based on these query operators. The data model will be based on the W3C XML Information Set, and will include support for Namespaces.
Queries operate on single documents or fixed collections of documents. They can select whole documents or subtrees of documents that match conditions defined on document content and structure, and can construct new documents based on what is selected.
The following usage scenarios describe how XML queries may be used in various environments, and cover a wide range of activities and needs representative of the problem space to be addressed. They are intended to be used as design cases during the development of XML Query, and should be reviewed when critical decisions are made. These usage scenarios should also help non-members of the XML Query Working Group understand the intent and goals of the project.
Perform queries on structured documents and collections of documents, such as technical manuals, to retrieve individual documents, to generate tables of contents, to search for information in structures found within a document, or to generate new documents as the result of a query.
Perform queries on the XML representation of database data, object data, or other traditional data sources to extract data from these sources, to transform data into new XML representations, or to integrate data from multiple heterogeneous data sources. The XML representation of data sources may be either physical or virtual; that is, data may be physically encoded in XML, or an XML representation of the data may be produced.
Perform both document-oriented and data-oriented queries on documents with embedded data, such as catalogs, patient health records, employment records, or business analysis documents.
Perform queries on configuration files, user profiles, or administrative logs represented in XML.
Filter streams of XML data, such as logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data, either as a traditional UNIX-style filter that extracts or transforms its input data, or to specify filters and profiles for routing messages represented in XML.
Perform queries on DOM structures to return sets of nodes that meet the specified criteria.
Perform queries on collections of documents managed by native XML repositories or web servers.
Perform queries to search catalogs that describe document servers, document types, or documents. Such catalogs may be combined to support search among multiple servers. A document-retrieval system could use queries to allow the user to select server catalogs, represented in XML, by the information provided by the servers, by access cost, or by authorization. Once a server is selected, a retrieval system could query the kinds of documents found on the server and allow the user to query those documents.
Queries may be used in many environments. For example, a query might be embedded in a URL, an XML page, or a JSP or ASP page; represented by a string in a program written in a general-purpose programming language; provided as an argument on the command line or on standard input; or supported by a protocol, such as DASL or Z39.50.
The following key words are used throughout the document to specify the extent to which an item is a requirement for the work of the XML Query Working Group:
When the words MUST, SHOULD, or MAY are used in this technical sense, they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.
The XML Query Language MAY have more than one syntax binding. One query language syntax MUST be convenient for humans to read and write. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query.
The XML Query Language MUST be declarative. Notably, it MUST not enforce a particular evaluation strategy.
The XML Query Language MUST be defined independently of any protocols with which it is used. (Relationships to some specific protocols are discussed in [4 Relationship to Other Activities].)
The XML Query Language MUST define standard error conditions that can occur during the execution of a query, such as processing errors within expressions, unavailability of external functions to the query processor, or processing errors generated by external functions.
Version 1.0 of the XML Query Language MUST not preclude the ability to add update capabilities in future versions.
The XML Query Language MUST be defined for finite instances of the data model. It MAY be defined for infinite instances.
The XML Query Data Model relies on information provided by XML Processors and Schema Processors, and it MUST ensure that it does not require information that is not made available by such processors. For XML constructs found in XML 1.0 or the Namespaces Recommendation, the XML Query Data Model MUST show how the equivalent XML Query Data Model constructs are defined in terms of items in the XML Information Set. The XML Query Data Model SHOULD represent all information items, or provide justification for any information items omitted. For information found in the XML Schema, such as datatypes, the Data Model MUST coordinate with the XML Schema Working Group to ensure that schema processors may be relied on to provide the information needed to construct the Data Model.
The XML Query Data Model MUST represent both XML 1.0 character data and the simple and complex types of the XML Schema specification.
The XML Query Data Model MUST represent collections of documents and collections of simple and complex values. (Note that collections are not part of the current XML Infoset.)
The XML Query Data Model MUST include support for references, including both references within an XML document and references from one XML document to another.
Queries MUST be possible whether or not a schema is available (in this document, the term "schema" may refer to either an XML Schema or a DTD). If a schema is available, the data model MUST represent any items that the schema defines for its instances, such as default attributes, entity expansions, or data types. These items will not be present if a schema is not present.
The XML Query Language and XML Query Language Data Model MUST be namespace aware.
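As an illustration of what namespace awareness means for a query, the following Python sketch matches elements by namespace name and local name rather than by the prefix that happens to appear in the document. It uses the standard library's xml.etree.ElementTree; the namespace URI urn:example:books, the sample documents, and the helper titles are assumptions made for this example only and are not part of this document.

```python
import xml.etree.ElementTree as ET

# Two documents that bind different prefixes to the same (hypothetical) namespace URI.
doc_a = ET.fromstring(
    '<a:book xmlns:a="urn:example:books"><a:title>One</a:title></a:book>'
)
doc_b = ET.fromstring(
    '<b:book xmlns:b="urn:example:books"><b:title>Two</b:title></b:book>'
)
# Same local names, but in no namespace: a namespace-aware query must not match these.
doc_c = ET.fromstring("<book><title>Three</title></book>")

# The prefix used inside the query is arbitrary; only the URI matters.
NS = {"bk": "urn:example:books"}


def titles(doc):
    """Return the text of every title element in the books namespace."""
    return [t.text for t in doc.findall(".//bk:title", NS)]


print(titles(doc_a), titles(doc_b), titles(doc_c))
```

The first two documents yield their titles even though they use different prefixes, while the third yields nothing because its elements are not in the namespace.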
The XML Query Language MUST support operations on all data types represented by the XML Query Data Model (see datatypes, collections, references).
Operations on text MUST be applicable to text that spans element boundaries.
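One way to picture this requirement is a phrase search that ignores inline markup. The Python sketch below, using the standard library's xml.etree.ElementTree, concatenates the character data beneath an element before searching it. The sample paragraph and the helpers spanning_text and contains_phrase are assumptions made for illustration, not something this document defines.

```python
import xml.etree.ElementTree as ET

# The phrase "XML Query Working Group" is split across <b> and <i> elements.
doc = ET.fromstring("<p>The <b>XML</b> Query <i>Working</i> Group</p>")


def spanning_text(element):
    """Concatenate all character data below element, in document order."""
    return "".join(element.itertext())


def contains_phrase(element, phrase):
    """Test for a phrase in the text of element, regardless of child element boundaries."""
    return phrase in spanning_text(element)


print(contains_phrase(doc, "XML Query Working Group"))
```

A search restricted to the text of a single element would miss this phrase, since no one element contains it in full.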
Operations on collections MUST include support for universal and existential quantifiers.
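The two quantifiers can be sketched in Python as follows: a universal quantifier holds when every member of a collection satisfies a condition, and an existential quantifier holds when at least one member does. The catalog data and the helper names every and some are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <item price="12.50"/>
  <item price="7.00"/>
  <item price="30.00"/>
</catalog>
""")


def every(items, predicate):
    """Universal quantifier: True if predicate holds for all items."""
    return all(predicate(item) for item in items)


def some(items, predicate):
    """Existential quantifier: True if predicate holds for at least one item."""
    return any(predicate(item) for item in items)


items = list(catalog.iter("item"))
print(every(items, lambda i: float(i.get("price")) < 50))  # every item costs less than 50
print(some(items, lambda i: float(i.get("price")) > 25))   # some item costs more than 25
```

In this sketch an empty collection satisfies every and fails some, which is the usual convention; this document does not itself state how a query language should treat the empty case.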
Queries MUST support operations on hierarchy and sequence of document structures.
The XML Query Language MUST be able to combine related information from different parts of a given document or from multiple documents.
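Combining related information from multiple documents resembles a join. The Python sketch below pairs each book in one document with the name of its author from another document by matching an identifier. The element names, attribute names, and data are hypothetical, and the dictionary lookup is only one possible evaluation strategy.

```python
import xml.etree.ElementTree as ET

authors = ET.fromstring("""
<authors>
  <author id="a1"><name>Ada</name></author>
  <author id="a2"><name>Grace</name></author>
</authors>
""")

books = ET.fromstring("""
<books>
  <book author="a2"><title>Compilers</title></book>
  <book author="a1"><title>Engines</title></book>
  <book author="a2"><title>Languages</title></book>
</books>
""")


def join_books_with_authors(books_doc, authors_doc):
    """Pair each book title with the name of the author it refers to."""
    # Index the authors document by identifier so each book needs one lookup.
    names = {a.get("id"): a.findtext("name") for a in authors_doc.iter("author")}
    return [
        (b.findtext("title"), names.get(b.get("author")))
        for b in books_doc.iter("book")
    ]


pairs = join_books_with_authors(books, authors)
print(pairs)
```

The result keeps the books in their original document order, which also relates to the requirement on preserving sequence in query results.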
The XML Query Language MUST be able to compute summary information from a group of related document elements (this operation is sometimes called "aggregation").
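A minimal Python sketch of such a summary computation groups related elements by a key and sums a value for each group. The orders document, its attribute names, and the helper total_by_customer are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

orders = ET.fromstring("""
<orders>
  <order customer="acme" total="10.0"/>
  <order customer="zeta" total="4.5"/>
  <order customer="acme" total="2.5"/>
</orders>
""")


def total_by_customer(root):
    """Group order elements by customer and sum their totals."""
    sums = defaultdict(float)
    for order in root.iter("order"):
        sums[order.get("customer")] += float(order.get("total"))
    return dict(sums)


summary = total_by_customer(orders)
print(summary)
```

Other summaries, such as counts, minima, or averages, follow the same pattern of grouping first and then reducing each group to a single value.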
The XML Query Language MUST support expressions in which operations can be composed, including the use of queries as operands.
The XML Query Language MUST include support for NULL values. Therefore, all operators MUST take NULL values into account, including logical operators.
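This document does not say how logical operators should treat NULL. One common convention, used here purely as an assumption, is the three-valued logic familiar from SQL, in which NULL stands for "unknown". The Python sketch below represents NULL as None; the function names and3, or3, and not3 are hypothetical.

```python
def and3(a, b):
    """Three-valued AND: False dominates, then unknown (None), then True."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: True dominates, then unknown (None), then False."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a):
    """Three-valued NOT: the negation of unknown is still unknown."""
    return None if a is None else not a


print(and3(True, None), and3(False, None), or3(True, None), not3(None))
```

Under this convention a condition that compares a missing value is neither true nor false, so a query language has to decide whether such items are selected; that decision is outside the scope of this sketch.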
Queries MUST be able to preserve the relative hierarchy and sequence of input document structures in query results.
Queries MUST be able to transform XML structures and MUST be able to create new structures.
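As a sketch of a query that creates a new structure from selected content, the following Python code builds a table of contents from the chapter titles of a manual, one of the usage scenarios mentioned earlier. The manual document, the toc and entry element names, and the helper build_toc are hypothetical.

```python
import xml.etree.ElementTree as ET

manual = ET.fromstring("""
<manual>
  <chapter><title>Install</title><para>Text.</para></chapter>
  <chapter><title>Configure</title><para>Text.</para></chapter>
</manual>
""")


def build_toc(root):
    """Construct a new toc element with one numbered entry per chapter title."""
    toc = ET.Element("toc")
    for number, chapter in enumerate(root.iter("chapter"), start=1):
        entry = ET.SubElement(toc, "entry", {"number": str(number)})
        entry.text = chapter.findtext("title")
    return toc


toc_xml = ET.tostring(build_toc(manual), encoding="unicode")
print(toc_xml)
```

The toc element does not exist in the input; it is constructed from what the query selects, and the entries follow the order of the chapters in the source document.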
Queries MUST be able to traverse intra- and inter-document references.
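The Python sketch below illustrates traversing an intra-document reference: it follows each reference to the element that it identifies and returns that element's title. It assumes, for illustration only, that an attribute named id acts as an identifier and an attribute named ref acts as a reference to it; no DTD or schema is consulted, and inter-document references are not covered.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<report>
  <section id="s1"><title>Scope</title></section>
  <section id="s2"><title>Goals</title><see ref="s1"/></section>
</report>
""")


def referenced_titles(root):
    """Follow each see/@ref to its target element and return the target's title."""
    # Index every element that carries an id attribute.
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    result = []
    for see in root.iter("see"):
        target = by_id.get(see.get("ref"))
        # A dangling reference yields None instead of raising an error.
        result.append(target.findtext("title") if target is not None else None)
    return result


print(referenced_titles(doc))
```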
Queries MUST be able to preserve the identity of items in the XML Query Data Model.
Queries SHOULD be able to operate on XML Query Data Model instances specified with the query. We refer to such data as "literal data" in this document.
Queries SHOULD be able to operate on names, such as element names, attribute names, and processing instruction targets, and to operate on combinations of names and data.
Queries SHOULD provide access to the XML schema or DTD for a document, if there is one. If the schema is represented as a DTD, a mapping to an appropriate XML Schema representation MAY be required.
The XML Query Language SHOULD support the use of externally defined functions on all datatypes of the XML Query Data Model. The interface to such functions SHOULD be defined by the Query Language, and SHOULD distinguish these functions from functions defined in the Query Language. The implementation of externally defined functions is not part of the Query Language.
The XML Query Language MUST provide access to information derived from the environment in which the query is executed, such as the current date, time, or user.
Queries MUST be closed with respect to the XML Query Data Model. Both the input to a query and the output of a query MUST be defined purely in terms of the XML Query Data Model. Non-XML sources such as traditional databases or objects may be queried if they are given an XML Query Data Model representation. Similarly, query results are defined purely in terms of the XML Query Data Model. In software systems these results may be instantiated in any convenient representation such as DOM nodes, hyperlinks, XML text, or various data formats.
XML has become a strategic technology in W3C and in the global Web market. The deliverable of the XML Query Working Group MUST satisfy the dependencies from the following Working Groups before it can advance to Proposed Recommendation. Some dependencies to and from the following W3C Working Groups will require close cooperation during the development process; the requirements posed for the Query work by these Working Groups may change during the development process, which means the interdependency of the Query work with these Working Groups must be managed actively:
For example, it should be possible to base query predicates on the existing DTD or XSDL definition of the content of an XML document and on the new data types being defined as part of the XDTL.
There are no requirements for co-development of features with the following Working Groups, but there are points of contact between their work and that of this Working Group, and thus logical dependency between their deliverables and those of this Working Group. Requirements from these Working Groups are expected to be well suited for communication via documents:
Formal liaison between the XML Query Working Group and other W3C working groups, including the other XML working groups and the WAI (Web Accessibility Initiative) group, as well as organizations outside of the W3C, shall be accomplished by the exchange of documents (requirements, reviews, etc.) transmitted through the XML Coordination Group.
The following references are some of the works considered by the WG in deriving its requirements.