
Communicating Query

Where XQuery Fits and How to Deploy It

Liam Quin

Images from www.fromoldbooks.org used by permission. Photographs by Liam Quin.

Introduction

This talk is for people who have to choose whether to use XQuery, or which implementation(s) to use, and for people who have to help others make those decisions.

If you are implementing or marketing an XQuery engine yourself, you might also find it helpful.

Communicating:

Shared understanding of what is available and how it works;

Shared vocabulary to describe what is available;

Shared expectations about what you can achieve.

Shared Understanding

There are at least fifty different XQuery implementations. How do they work? What sets one apart from another? How can you describe them quickly and succinctly?

Knowing them all is not enough, and if we divide them into categories it is not necessary either.

In this talk I will name names, but you must still choose the software most appropriate for your task.

Shared Vocabulary

Terminology can confuse and obfuscate. What we need is an ontology to simplify and clarify.

I do not (yet) have a formal ontology; maybe someone would like to help me make a Topic Map for this?

For now, I hope you find what I have done so far useful. Besides, using jargon can make you sound like a more expensive consultant.

Ongoing work will appear on the XQuery Home Page: www.w3.org/XML/Query/

Shared Expectations

What will you achieve by using XML Query? What will you achieve by using a particular implementation?

Who will see the benefits? And who will do the work?

Our vocabulary may help people to talk about this early on in a project, and to have a clearer understanding of what they might achieve.

Primary Axes

Business Model

Access

Primary Purpose

Storage Strategy

Business Model

Supported Closed Source (many products)

Unsupported Closed Source (avoid these people)

Open Source with a commercial version, e.g. one with more features (Saxon, Qizx)

Open source with informal or third-party support (e.g. Galax)

Research or personal project

Demo Versions are not the same as Free or Open Source!

Business Model: Surrounding Issues

Having a standard language makes it easier to move between implementations, so the choice of vendor can be less critical, as long as you avoid proprietary features and extensions.

Conformance is what lets you migrate. Does the implementation run the XQuery Use Cases unchanged? Has the vendor submitted results for the W3C XQuery Test Suite (XQTS)? Do they have a formal conformance statement?

Moving between programming languages and operating systems can also be easy, as long as you avoid writing language-native functions. If you do write them, put them in a module with a clearly documented interface, and try to keep that interface platform-neutral.

Moving between vendors often involves some money and some acrimony.

Access

How do you need to access the query engine? Most implementations support several access methods, but many have one primary method; the others may not be as well supported.

Access

Command-line (Galax, Saxon, Qizx)

Servlet (Qizx)

Embedded in another language (SQL: DB2, Oracle, MSSQL)

API/Library (Berkeley DB XML, dbxml)

Web service or server (Sedna, MarkLogic, DataDirect)

GUI (e.g. Stylus Studio, oXygen)

Primary Purpose

Software can be very general-purpose or very specialised. If you have particular needs, such as a low memory footprint, use a tool built for them; but specialisation has costs. An implementation intended for streaming data on a mobile phone might have compromised on optimising joins, or might not support external modules.

Primary Purpose

Mobile/embedded (e.g. MXQuery)

Streaming (e.g. BEA AquaLogic)

Database query language (many products)

General query language (e.g. Sherlock)

Middleware (e.g. DataDirect)

Web applications (many products)

General purpose (many products)

Development and debugging (Stylus Studio, oXygen)

Large collections (MarkLogic, Oracle, DB2)

Storage Strategy

On top of SQL (e.g. XQuark)

Alongside SQL (e.g. DB2, Oracle, MonetDB/XQuery)

XML-native Database (e.g. eXist, MarkLogic)

Other XDM (e.g. Sherlock)

Files (e.g. Saxon)

Storage: Surrounding Issues

Many implementations can read external (unindexed) XML documents, either locally or via HTTP. Many can also use ODBC or JDBC to run SQL and present the result as if it were XML.
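As a minimal sketch of reading external, unindexed documents, the query below uses the standard fn:doc and fn:doc-available functions. The URL, the local file catalog.xml and the book and title elements are hypothetical, and whether an http: URI is fetched at all is up to the implementation.

```xquery
(: Hypothetical document locations; HTTP retrieval is implementation-dependent :)
let $remote := "http://www.example.com/catalog.xml"
let $catalog :=
  if (doc-available($remote))
  then doc($remote)
  else doc("catalog.xml")  (: fall back to a local file :)
return
  <summary books="{ count($catalog//book) }">{
    for $b in $catalog//book
    order by $b/title
    return $b/title
  }</summary>
```

Without an index, every evaluation of this query parses and scans the whole document, which is why the size of the data matters so much.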

You might need to run an external indexing program if documents change.

For large documents (e.g. 100 MBytes) or large collections (e.g. terabytes), indexing is very important. Make sure performance requirements are part of any contract. In some cases, the full-text facility gives the fastest results.

Features

We have now talked about context: how you get at the XQuery engine and where it lives. Next, a little about what it does, and about specific features.

Static Typing and Schema Support

Static typing is XPath 2’s name for strong typing: using restrictions on the types of values to help detect errors even before a query is run.
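A minimal sketch of what static typing can catch, using a hypothetical function local:line-total whose parameter and return types are declared. Passing a string where an xs:integer is required is a type error (XPTY0004); an implementation with the Static Typing Feature reports it during static analysis, while others may not report it until the call is evaluated.

```xquery
declare function local:line-total(
  $quantity as xs:integer,
  $unit-price as xs:decimal
) as xs:decimal {
  $quantity * $unit-price
};

(: Well typed: the integer is promoted, giving the xs:decimal 7.5 :)
local:line-total(3, 2.50)

(: Replacing the call above with local:line-total("three", 2.50)
   makes the query ill-typed: "three" is an xs:string,
   not an xs:integer. :)
```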

Full schema support in XQuery lets you define your own types using an external XSD.
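As a sketch of schema support, assume a hypothetical schema po.xsd with target namespace http://example.com/ns/po that declares a global order element containing item elements, each with exactly one quantity and one price of type xs:decimal. A schema-aware implementation can import it, validate a document against it, and use the imported declarations in function signatures; an implementation without schema import must reject the import schema declaration with a static error.

```xquery
import schema namespace po = "http://example.com/ns/po" at "po.xsd";

(: Only an element validated as po:order is accepted here :)
declare function local:order-total($order as schema-element(po:order))
  as xs:decimal
{
  sum(
    for $item in $order/po:item
    (: typed values: both operands are xs:decimal after validation :)
    return $item/po:quantity * $item/po:price
  )
};

local:order-total(validate { doc("order.xml")/po:order })
```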

Implementations can have any combination of these two features!

Combinations:

Typing and schema: e.g. Galax, MarkLogic

Typing, no schema: e.g. MSSQL

Schema, no typing: e.g. SaxonSA

No typing, no schema: e.g. Saxon, Qizx

A Common Trap

Don't get locked into open systems [IBM]

If you write native functions, or if you use implementation-specific features, hide them in a module with its own namespace so you know exactly what you might have to reimplement.

A Common Trap

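(: Implementation-specific calls scattered through the query :)
for $a in jdbc:get_rows("circus.performers")
  where ok_together($a/sql_row[4], $candidate/sock_colour)
  return $a

(: The same query, with those details hidden behind a circus: module :)
for $a in circus:getperformers()
  where circus:oktogether($a, $candidate)
  return $a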
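A minimal sketch of a circus: library module for the second version of the query. The namespace URI, the files performers.xml and circus.xqm, the performer and sock-colour elements, and the compatibility rule are all hypothetical; the point is that if the data moved from an XML file to an implementation's own SQL extension, only the function bodies in this module would change.

```xquery
module namespace circus = "http://example.com/ns/circus";

(: Where the performers come from is private to this module.
   Here they are read from an XML file; a SQL- or JDBC-based
   version would replace this body and nothing else. :)
declare function circus:getperformers() as element(performer)*
{
  doc("performers.xml")/performers/performer
};

(: Invented rule for illustration: two performers can work
   together if their sock colours differ. :)
declare function circus:oktogether(
  $a as element(performer),
  $candidate as element(performer)
) as xs:boolean
{
  not($a/sock-colour = $candidate/sock-colour)
};
```

A main module would typically begin with `import module namespace circus = "http://example.com/ns/circus" at "circus.xqm";` and could then run the second query unchanged, given a `$candidate` performer element.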

More Features

Native functions (and Web services too)

XSLT natively or via Saxon

Official extensions, e.g. full text and updates

Proprietary extensions
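As an illustration of the official extensions, the query below combines the contains text operator from XQuery and XPath Full Text 1.0 with the insert node expression from the XQuery Update Facility 1.0. It runs only on an implementation that supports both; the document books.xml and its book, title and topic elements are hypothetical, and whether the change is written back to storage depends on the implementation.

```xquery
(: Full Text: select books whose title contains both words :)
for $book in doc("books.xml")//book[title contains text "query" ftand "language"]

(: Update Facility: an updating expression in the return clause;
   the pending updates are applied together when the query ends :)
return insert node <topic>query languages</topic> into $book
```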

Who does what?

When a Web application is written with XQuery, the actual results might be delivered using XSLT. This means more XML people working on the project, but fewer SQL or Java people.

Summary

XQuery implementations can be categorised usefully

You can often move between implementations. The standard language encourages you to experiment!

Use modules and namespaces to protect yourself

Consider staffing and skills

Very high performance is possible, but might not be easy or cheap.

Questions and Ending