Non-Position Paper for Quark, Inc.

This paper does not take a specific position on XML QL; rather, it contains a set of guidelines by which we will evaluate competing proposals. Instead of presenting implementation specifics for our vision of an XML query language, we list the features upon which we will rely.

Quark product users generally fall into two categories: 1) small shops with little need for automated job tracking, and 2) large publication sites with extensive data storage and retrieval systems. It is the second class of customers that seems to have the greatest need for a document workflow that includes XML. Therefore, our requirements for XML QL are derived mostly from the needs of those large sites.

Some big generalizations can be made about large publication sites:

A minority of their users consider themselves "power users", but a non-trivial number consider themselves "computer-phobic".
Users seek information at the granularity of a document. They will rarely wish to retrieve only part of a document.
The DTDs of these sites are few in number, similar in structure, and are altered rarely.

The first generalization requires that Quark provide a GUI for query operations. Therefore, we are not deeply concerned about the simplicity of the query language, because queries will be computer generated (with rare exceptions). Also, the complexity of the average query will be low, which eliminates the need for the more exotic features of the query language. The second generalization obviates the need for Quark to provide extensive document fragment functionality. The third generalization emphasizes the static nature of the structure of the content at these sites.

The favorite phrase of a fellow programmer is:
"Good programmers imitate, great programmers steal."
In this spirit, I have parsed other position papers for this conference and shamelessly stolen the pieces that seem most relevant to Quark. Once again, we are interested more in features than syntax.

In each excerpt below, the comments that follow the quoted text were added by me.

Features that Interest Quark

Excerpts from: David Maier

XML Output
An XQuery should yield XML output. Such a closure property has many benefits. Derived databases (views) can be defined via a single XQuery. Query composition and decomposition is aided. It is transparent to applications whether they are looking at base data or a query result. Yes, it is convenient.
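
The closure property can be illustrated with a minimal sketch. It assumes Python's standard `xml.etree.ElementTree` as a stand-in for a query engine; the `select` helper, the `results` wrapper element, and the sample document are hypothetical and are not taken from any of the proposals.

```python
import xml.etree.ElementTree as ET

def select(root, path, result_tag="results"):
    """Run a path query and wrap the matches in a new XML element."""
    results = ET.Element(result_tag)      # hypothetical wrapper element
    results.extend(root.findall(path))    # matches are shared, not copied
    return results

base = ET.fromstring(
    "<issue>"
    "<article section='sports'><title>Cup final</title></article>"
    "<article section='news'><title>Budget</title></article>"
    "</issue>"
)

# The first result is itself XML, so it can be queried again like base data.
sports = select(base, "article[@section='sports']")
titles = select(sports, "article/title")
```

Because `sports` is ordinary XML, the second call does not need to know whether it is looking at base data or at a query result.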

Exploit Available Schema
Conversely, when DTDs are available for a data source, it should be possible to judge whether an XQuery is correctly formed relative to the DTDs, and to calculate a DTD for the output. This capability can detect errors at compile time rather than run time, and allows a simpler interface for applications to manipulate a query result. Yes. If we allow users to write free-form queries, this will be essential. We cannot expect them to debug their queries; we need to do that for them through validation.

XML Representation
An XQuery should be representable in XML. While there may be more than one syntax for XQuery, one should be as XML data. (Note that XSL is written in XML.) This property means that there do not need to be special mechanisms to store and transport XQueries, beyond what is required for XML itself. It also helps satisfy Requirement 2.7. Yes, it is convenient.

Programmatic Manipulation
XQueries should be amenable to creation and manipulation by programs. Most queries will not be written directly by users or programmers. Rather, they will be constructed through user-interfaces or tools in application development environments. Yes!
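
As a rough sketch of what query construction by a GUI might look like, the following builds a query as XML data from form values. The `query`, `select`, `where`, and `equals` vocabulary is entirely hypothetical; none of the proposals defines it, and it is used here only to show that an XML-encoded query can be assembled and serialized with ordinary XML tools.

```python
import xml.etree.ElementTree as ET

def build_query(doc_type, conditions):
    """Build a hypothetical XML-encoded query from GUI form values."""
    query = ET.Element("query")                          # hypothetical vocabulary
    ET.SubElement(query, "select", {"element": doc_type})
    where = ET.SubElement(query, "where")
    for field, value in conditions.items():
        condition = ET.SubElement(where, "equals", {"field": field})
        condition.text = value
    return query

# Values a user might have picked from menus in a search dialog.
query = build_query("article", {"section": "sports", "status": "approved"})

# The query is plain XML, so it can be stored and transported like any document.
query_text = ET.tostring(query, encoding="unicode")
```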

Support for New Datatypes
XQuery should have an extension mechanism for conditions and operations specific to particular datatypes. I am thinking mainly of specialized operations for selecting different kinds of multimedia content. Yes, but how?

Excerpts from: Olken and McCarthy

Ability to query multiple documents
We view this as essential for our applications. Yes, so do we.
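
A minimal sketch of a query that spans several documents, assuming the documents are already available as strings keyed by name. The `query_documents` helper, the file names, and the sample content are hypothetical; a real repository would supply its own storage and lookup.

```python
import xml.etree.ElementTree as ET

def query_documents(documents, path):
    """Yield (document name, matching element) pairs across all documents."""
    for name, text in documents.items():
        root = ET.fromstring(text)
        for match in root.iterfind(path):
            yield name, match

# Hypothetical in-memory collection standing in for a document repository.
documents = {
    "issue1.xml": "<issue><article><title>Budget</title></article></issue>",
    "issue2.xml": (
        "<issue>"
        "<article><title>Election</title></article>"
        "<article><title>Weather</title></article>"
        "</issue>"
    ),
}

titles = [(name, element.text)
          for name, element in query_documents(documents, "article/title")]
```

Each result carries the name of its source document, which matters when users seek information at the granularity of a document.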

Uniform treatment of attributes and elements.
XML has two different syntaxes which are often used equivalently: attributes specified in the tags, and nested elements. A number of XML dialect proposals treat these as equivalent. We would therefore like a query language which also did so. Obviously, attributes can not be nested. Yes.
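
One possible reading of uniform treatment is a lookup that does not care which of the two syntaxes was used. The sketch below assumes that reading; the `field_value` helper and the sample documents are hypothetical, and the attribute is simply checked first.

```python
import xml.etree.ElementTree as ET

def field_value(element, name):
    """Return the value of `name`, whether it is an attribute or a child element."""
    if name in element.attrib:
        return element.attrib[name]
    child = element.find(name)
    return child.text if child is not None else None

# The same information encoded in the two syntaxes.
as_attribute = ET.fromstring('<article author="Smith"/>')
as_element = ET.fromstring("<article><author>Smith</author></article>")

author_a = field_value(as_attribute, "author")
author_b = field_value(as_element, "author")
```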

Excerpts from: Robie, Lapp and Schach

XQL shall be expressed in strings that can easily be embedded in programs, scripts, and XML or HTML attributes. Yes, well, probably not HTML attributes for us.

XQL queries may return any number of results, including 0. Yes. It would also be nice to limit the number of returns, like "give me first 5 results" or "give me the first one you find".
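
Limiting the number of results can be sketched with a lazy iterator that is cut off after the requested count. This assumes Python's `xml.etree.ElementTree` and `itertools.islice` as stand-ins for a query engine; the `first_results` helper and the sample data are hypothetical.

```python
from itertools import islice
import xml.etree.ElementTree as ET

def first_results(root, path, limit):
    """Return at most `limit` matches, consuming no more matches than needed."""
    return list(islice(root.iterfind(path), limit))

# Hypothetical issue containing twenty articles.
root = ET.fromstring(
    "<issue>" + "".join(f"<article id='{i}'/>" for i in range(20)) + "</issue>"
)

first_five = first_results(root, "article", 5)   # "give me the first 5 results"
first_one = first_results(root, "article", 1)    # "give me the first one you find"
no_match = first_results(root, "advert", 5)      # zero results is a valid answer
```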

Excerpt from: Noah Mendelsohn

A non-XML repository, such as a relational database, which stores information in a manner optimized to its own needs. In these situations, XML will be presented as a "view" of the underlying data. Yes, having an XML object metaphor on top of an RDB would be sweet. But is this the job of the query language?

Excerpt from: Deutsch, Fernandez, Florescu, Levy, Suciu

Transforming XML Data
An important use of XML-QL is transforming XML data. For example, we may want to translate data from one DTD (a.k.a. ontology) into another. Yes, if a site alters its DTD, it would be nice for it to have a mapping mechanism to coerce documents of the old DTD into the new DTD. But I pity the poor sap who has to implement a UI for this.
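
The simplest form of such a mapping is a table of renamed elements. The sketch below assumes the old and new DTDs differ only in element names; the `TAG_MAP` table, the `migrate` helper, and the element names are hypothetical. Structural changes (moved, split, or merged elements) would need considerably more than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from old-DTD element names to new-DTD element names.
TAG_MAP = {"headline": "title", "byline": "author"}

def migrate(root, tag_map):
    """Rename elements in place according to tag_map; others are left untouched."""
    for element in root.iter():
        element.tag = tag_map.get(element.tag, element.tag)
    return root

old = ET.fromstring(
    "<story><headline>Budget</headline><byline>Smith</byline><body>Text</body></story>"
)
new_text = ET.tostring(migrate(old, TAG_MAP), encoding="unicode")
```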

Features that are not so important to Quark:

Excerpts from: Robie, Lapp and Schach

XQL strings shall be compact.
XQL shall be easy to type and read.
XQL syntax shall be simple for the simple and common cases.
Yes, we should strive for this, but for Quark, functionality is more important than clean syntax.

Excerpt from: David Maier

No Schema Required
XQuery should be usable on XML data when there is no schema (DTD) known in advance. XML data is structurally self-describing, and it should be possible for an XQuery to rely on such "just-in-time" schema information in its evaluation. This capability means XQueries can be used against an XML source with limited knowledge of its documents' precise structures. Our customers are strict enough about their content that this should not be an issue for us.

Contact Information:
Peter Laird
Quark, Inc.
plaird@quark.com