
Images from www.fromoldbooks.org used by permission. Photographs by Liam Quin.
This talk is for people who have to choose whether to use XQuery, or which implementation(s) to use, and for people who have to help others make those decisions.
If you are implementing or marketing an XQuery engine yourself, you might also find it helpful.
Shared understanding of what is available and how it works;
Shared vocabulary to describe what is available;
Shared expectations about what you can achieve.
There are at least fifty different XQuery implementations. How do they work? What sets one apart from another? How can you describe them quickly and succinctly?
It's not practical to know them all, and not necessary either, if we divide them into categories.
In this talk I will name names but you must choose the most appropriate software for your task.
Terminology can confuse and obfuscate. What we need is an ontology to simplify and clarify.
I do not (yet) have a formal ontology; maybe someone would like to help me make a Topic Map for this?
For now, I hope you find what I have already done to be of use. Plus, using jargon can make you sound like a more expensive consultant.
Ongoing work will appear on the XQuery Home Page: www.w3.org/XML/Query/
What will you achieve by using XML Query? What will you achieve by using a particular implementation?
Who will see the benefits? And who will do the work?
Our vocabulary may help people to talk about this early on in a project, and to have a clearer understanding of what they might achieve.
Business Model
Access
Primary Purpose
Storage Strategy
Supported Closed Source (many products)
Unsupported Closed Source (avoid these people)
Open Source with commercial version e.g. with more features (e.g. Saxon, Qizx)
Open source with informal or third-party support (e.g. Galax)
Research or personal project
Demo Versions are not the same as Free or Open Source!
Moving between implementations is facilitated by having a standard language. So choice of vendor can be less critical as long as you avoid using proprietary features or extensions.
Conformance is what lets you migrate. Does the implementation run the XQuery Use Cases unchanged? Has the vendor submitted test results? Do they have a formal conformance statement?
Moving between programming languages and operating systems can also be easy if you avoid writing language-native functions or, if you must write them, put them in a module with a clearly documented, platform-neutral interface.
Moving between vendors often involves some money and some acrimony.
How do you need to access the query engine? Most implementations support multiple access methods, but many have one primary method, and the others may not be as well supported.
Command-line (Galax, Saxon, Qizx)
Servlet (Qizx)
embedded in another language (SQL: DB2, Oracle, MSSQL)
API/Library (Berkeley DB XML)
Web service or server (Sedna, MarkLogic, DataDirect)
GUI (e.g. StylusStudio, oXygen)
Software can be very general-purpose or it can be very specialised. If you have particular needs, such as a low memory footprint, use the right tool; on the other hand, an implementation intended for streaming data on a mobile phone might have compromised on optimising joins or might not support external modules.
Mobile/embedded (e.g. MXquery)
Streaming (e.g. BEA AquaLogic)
Database query language (many products)
General query language (e.g. Sherlock)
Middleware (e.g. DataDirect)
Web applications (many products)
general purpose (many products)
development and debugging (StylusStudio, oXygen)
large collections (MarkLogic, Oracle, DB2)
On top of SQL (e.g. XQuark)
Alongside SQL (e.g. DB2, Oracle, MonetDB/XQuery)
XML-native Database (e.g. eXist, MarkLogic)
Other XDM (e.g. Sherlock)
Files (e.g. Saxon)
Many implementations can read external (unindexed) XML documents, either locally or via HTTP. Many can also use ODBC or JDBC to run SQL and present the result as if it were XML.
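As a minimal sketch of reading an external, unindexed document over HTTP: `doc()` is the standard XQuery function for this, while the URL and the `book`, `price` and `title` element names are made up for illustration.

```xquery
xquery version "1.0";

(: Fetch a (hypothetical) remote document and query it directly.
   Nothing is indexed here: the engine parses the whole file on each run. :)
for $b in doc("http://example.org/books.xml")//book
where $b/price > 20
return $b/title
```

The same query should work against a local file by changing only the URI, which is part of what makes it easy to try several implementations.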
You might need to run an external indexing program if documents change.
For large documents (e.g. 100 MB) or large collections (e.g. terabytes), indexing is very important. Make sure performance constraints are part of any contract. In some cases, the full text facility can give the fastest results.
Now we have talked about context, about how you get at the XQuery engine and where it lives. Next we talk a little about what it does, about specific features.
Static typing is XPath 2’s name for strong typing, that is, using restrictions on values to help detect errors even before a program is run.
Full schema support in XQuery lets you define your own types using an external XSD.
Implementations can have any combination of these two features!
typing + schema, e.g. Galax, MarkLogic
typing, no schema, e.g. MSSQL
no typing + schema, e.g. SaxonSA
no typing, no schema, e.g. Saxon, Qizx
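A small sketch of what the typing feature buys you, using only built-in types so that it does not depend on schema support. The function name is made up for illustration.

```xquery
xquery version "1.0";

(: The declared types are the "restrictions on values". :)
declare function local:double($n as xs:integer) as xs:integer
{
  2 * $n
};

(: Fine: the argument is an xs:integer. :)
local:double(21)

(: Type error XPTY0004, a string where xs:integer is required:
     local:double("twenty-one")
   An engine with static typing should report this before the query runs;
   an engine without it may only report it when the call is evaluated. :)
```

Schema support is the separate step of bringing your own types into play, with a prolog declaration of the form `import schema namespace c = "http://example.org/ns/circus" at "circus.xsd";` (namespace and location hypothetical here). An engine without that feature will reject the import.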
Don't get locked into open systems [IBM]
If you write native functions, or if you use implementation-specific features, hide them in a module with its own namespace so you know exactly what you might have to reimplement.
for $a in jdbc:get_rows("circus.performers")
where ok_together($a/sql_row[4], $candidate/sock_colour)
for $a in circus:getperformers()
where circus:oktogether($a, $candidate)
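The second fragment above relies on a `circus` module that keeps the implementation-specific parts out of sight. One possible shape for that module, as a sketch only: the namespace URIs and the file name are invented, `jdbc:get_rows` stands for whatever extension your engine really provides, and the sock-colour comparison is a placeholder for the real rule.

```xquery
xquery version "1.0";

(: circus.xq: the one module that contains implementation-specific calls.
   Both namespace URIs are hypothetical. :)
module namespace circus = "http://example.org/ns/circus";

(: Hypothetical vendor extension; this is the line you would have to
   reimplement when moving to another engine. :)
declare namespace jdbc = "http://example.org/ns/hypothetical-jdbc";

(: The only place that knows performers live in an SQL table. :)
declare function circus:getperformers() as element()*
{
  jdbc:get_rows("circus.performers")
};

(: The only place that knows which column holds the sock colour.
   The equality test is a placeholder rule, not the real one. :)
declare function circus:oktogether($a as element(), $candidate as element())
  as xs:boolean
{
  $a/sql_row[4] = $candidate/sock_colour
};
```

The main query then sees only the portable interface:

```xquery
xquery version "1.0";

import module namespace circus = "http://example.org/ns/circus" at "circus.xq";

(: Supplied by the caller; how you bind it depends on the implementation. :)
declare variable $candidate as element() external;

for $a in circus:getperformers()
where circus:oktogether($a, $candidate)
return $a
```

If you change vendors, the list of functions declared in `circus.xq` is exactly what you might have to rewrite.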
native functions (and Web services too)
XSLT natively or via Saxon
official extensions e.g. full text, updates
proprietary extensions
When a Web application is written with XQuery, the actual results might be delivered using XSLT. This means more XML people working on the project, but fewer SQL or Java people.
XQuery implementations can be categorised usefully
You can often move between implementations. The standardised language helps to encourage you to experiment!
Use modules and namespaces to protect yourself
Consider staffing and skills
Very high performance is possible, but might not be easy or cheap.