Position Paper for W3C Schema Experiences Workshop

Authors:
Ashok Malhotra (Oracle Corp.) <ashok.malhotra@oracle.com>
Shih-Chang Chen (Oracle Corp.) <shih-chang.chen@oracle.com>
Ravi Murthy (Oracle Corp.) <ravi.murthy@oracle.com>
Blaise Doughan (Oracle Corp.) <blaise.doughan@oracle.com>

Table of contents

  1. Introduction
  2. Mapping XML Schema Structures to Java Classes
        2.1 Choice
        2.2 Derivation by Restriction
        2.3 Derivation Control
        2.4 Facets
        2.5 Namespace Restrictions for Wildcards
        2.6 Problems with Annotations
  3. Managing XML Schemas in the Database
        3.1 Substitution Groups
        3.2 Duration
        3.3 Redefine
        3.4 Key/Keyref
        3.5 Derivation by Restriction
        3.6 Wildcard with namespace exclusion
  4. Summary

Appendices

  A. References

1 Introduction

The primary goal of XML Schema was to provide a language to specify the structure of XML documents. We already had DTDs, but these did not use XML syntax and needed to be updated with new features such as namespaces, as well as additional datatypes. There was also a need for stronger typing. But when XML Schema finally appeared ([XML Schema Part 1: Structures Second Edition], [XML Schema Part 2: Datatypes Second Edition]), it was put to a host of other uses that its creators had not anticipated. In this paper we discuss two such uses: mapping XML documents to Java™ objects under the control of an XML Schema, and using XML Schemas to control the structure of XML documents shredded into a relational database. In each case we find some Schema features difficult to map and discuss possible workarounds.

2 Mapping XML Schema Structures to Java Classes

In this section we discuss some XML Schema constructs that are difficult to map into object structures.

2.1 Choice

The [XML Schema Part 1: Structures Second Edition] construct "choice" allows variations in the structure of an element. There is no corresponding construct in Java™, although some other languages provide variant types. For example:

<xsd:complexType name="PurchaseOrder">
  <xsd:sequence>
    <xsd:choice>
      <xsd:element name="BritishAddress" type="xsd:string"/>
      <xsd:element name="USAddress" type="xsd:string"/>
    </xsd:choice>
    <xsd:element name="items" type="xsd:string" minOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>

The structure of the PurchaseOrder type varies according to which option of the choice is present. There is no direct mapping of this feature into Java™, but a check can be performed at runtime. This means, however, that errors cannot be caught until runtime. An alternative would be to drop choice support altogether and require that the choices in a schema be defined as separate types using derivation by extension.
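The runtime check mentioned above might look like the following Java™ sketch. It is a minimal illustration under our own assumptions, not the mapping used by any particular product: each branch of the choice becomes an ordinary nullable field, and a hypothetical validate() method enforces the "exactly one branch" rule that the compiler cannot.

```java
// Sketch of a Java mapping for the PurchaseOrder complex type.
// The xsd:choice branches become two nullable fields; nothing in the Java
// type system stops a caller from setting both, or neither.
class PurchaseOrder {
    private String britishAddress; // choice branch 1
    private String usAddress;      // choice branch 2
    private String items;          // required element (minOccurs="1")

    void setBritishAddress(String value) { britishAddress = value; }
    void setUSAddress(String value) { usAddress = value; }
    void setItems(String value) { items = value; }

    String getBritishAddress() { return britishAddress; }
    String getUSAddress() { return usAddress; }
    String getItems() { return items; }

    // Runtime replacement for the missing language construct: exactly one
    // branch of the choice must be present, and items is mandatory.
    void validate() {
        int present = (britishAddress != null ? 1 : 0) + (usAddress != null ? 1 : 0);
        if (present != 1) {
            throw new IllegalStateException(
                "choice requires exactly one of BritishAddress or USAddress, found " + present);
        }
        if (items == null) {
            throw new IllegalStateException("items is required");
        }
    }
}

class PurchaseOrderDemo {
    public static void main(String[] args) {
        PurchaseOrder po = new PurchaseOrder();
        po.setUSAddress("123 Main St");
        po.setBritishAddress("10 High St"); // compiles fine; the error surfaces only below
        po.setItems("widgets");
        try {
            po.validate();
        } catch (IllegalStateException e) {
            System.out.println("rejected at runtime: " + e.getMessage());
        }
    }
}
```

The demo compiles even though it sets both branches; the violation is reported only when validate() runs. The derivation-by-extension alternative would instead model each branch as a subclass of a common abstract type.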

2.2 Derivation by Restriction

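In the same way, derivation by restriction cannot be mapped to a Java™ construct and must be checked at runtime: a Java™ subclass can add members to its superclass, but it cannot narrow the set of values that its inherited members accept.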
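To make this concrete, the Java™ sketch below assumes a hypothetical complex type ItemList with an item element of maxOccurs="unbounded", and a type SingleItemList derived from it by restriction with maxOccurs="1". Neither type appears in the schemas discussed here; the names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical base type: an <item> element with maxOccurs="unbounded".
class ItemList {
    protected final List<String> items = new ArrayList<>();

    void addItem(String item) { items.add(item); }

    List<String> getItems() { return Collections.unmodifiableList(items); }
}

// Hypothetical type derived by restriction (maxOccurs="1"). A Java subclass
// inherits the unbounded list unchanged, so the tighter bound can only be
// enforced at runtime, here by overriding the mutator.
class SingleItemList extends ItemList {
    @Override
    void addItem(String item) {
        if (!items.isEmpty()) {
            throw new IllegalStateException("restricted type allows at most one item");
        }
        super.addItem(item);
    }
}

class RestrictionDemo {
    public static void main(String[] args) {
        ItemList list = new SingleItemList(); // legal: a restriction is usable where the base is
        list.addItem("widget");
        try {
            list.addItem("gadget"); // fine for ItemList, rejected by SingleItemList
        } catch (IllegalStateException e) {
            System.out.println("rejected at runtime: " + e.getMessage());
        }
    }
}
```

Note that code written against ItemList compiles unchanged when handed a SingleItemList, so the restriction is only discovered when the overriding method runs.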

2.3 Derivation Control

In [XML Schema Part 1: Structures Second Edition], the "fixed", "final", "finalDefault", "block" and "blockDefault" attributes can be used to control whether and how derivation from a complex or simple type is allowed. For example:

<complexType name="Address" final="restriction">
  <sequence>
    <element name="name"  type="string"/>
    <element name="street" type="string"/>
  </sequence>
</complexType>

The only one of these attributes that has a direct mapping to Java™ is the "final" attribute with a value of "#all" on a complexType definition, which prohibits any further derivation from that type. This corresponds to the "final" modifier that can be applied to a Java™ class. Other values of "final", such as the "restriction" value in the example above, have no Java™ counterpart, because Java™ does not distinguish between derivation by extension and derivation by restriction. The "finalDefault", "block" and "blockDefault" attributes likewise have no direct mapping to Java™ features, although they are very similar to what is provided by "final". It is not clear to us how an element carrying such controlling attributes could be implemented in Java™. The "fixed" attribute relates to facets and, in our view, need not be supported.
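The one direct mapping takes only a few lines of Java™. The sketch assumes the Address type had been declared with final="#all" rather than the final="restriction" shown above; the accessors are illustrative.

```java
import java.lang.reflect.Modifier;

// Sketch: <complexType name="Address" final="#all"> mapped to a final class.
// Any attempt to write "class UKAddress extends Address" is a compile-time error.
final class Address {
    private final String name;
    private final String street;

    Address(String name, String street) {
        this.name = name;
        this.street = street;
    }

    String getName() { return name; }
    String getStreet() { return street; }
}

class AddressDemo {
    public static void main(String[] args) {
        // The derivation block is visible through reflection as the final modifier.
        System.out.println(Modifier.isFinal(Address.class.getModifiers())); // prints "true"
    }
}
```

For final="restriction" there is no comparable keyword: marking the class final would also forbid extension, while leaving it non-final would also permit subclasses that emulate restriction through runtime checks.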

2.4 Facets

Facets in [XML Schema Part 2: Datatypes Second Edition] allow datatypes to be constrained by value and by lexical form. For example:

<xsd:simpleType name="heightInInches">
  <xsd:restriction base="xsd:positiveInteger">
     <xsd:maxExclusive value="100"/>
  </xsd:restriction>
</xsd:simpleType>

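Such constraints cannot be expressed in the Java™ type system: the nearest Java™ types, such as int or java.math.BigInteger, also admit values outside the permitted range. The only way to implement facets is therefore to create a configurable validator that checks the value of the datatype at runtime.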
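A configurable validator of this kind might look like the following Java™ sketch. The class name IntegerFacetValidator and its two-bound design are our assumptions; a fuller validator would also cover lexical facets such as pattern and length. The heightInInches type combines minInclusive 1, inherited from xsd:positiveInteger, with the maxExclusive 100 facet from the example.

```java
import java.math.BigInteger;

// Sketch of a configurable validator for integer value facets.
// A null bound means the corresponding facet is not set.
class IntegerFacetValidator {
    private final BigInteger minInclusive;
    private final BigInteger maxExclusive;

    IntegerFacetValidator(BigInteger minInclusive, BigInteger maxExclusive) {
        this.minInclusive = minInclusive;
        this.maxExclusive = maxExclusive;
    }

    boolean isValid(BigInteger value) {
        if (minInclusive != null && value.compareTo(minInclusive) < 0) {
            return false;
        }
        if (maxExclusive != null && value.compareTo(maxExclusive) >= 0) {
            return false;
        }
        return true;
    }
}

class HeightInInchesDemo {
    // heightInInches: minInclusive 1 (from xsd:positiveInteger), maxExclusive 100.
    static final IntegerFacetValidator HEIGHT_IN_INCHES =
        new IntegerFacetValidator(BigInteger.ONE, BigInteger.valueOf(100));

    public static void main(String[] args) {
        System.out.println(HEIGHT_IN_INCHES.isValid(BigInteger.valueOf(72)));  // true
        System.out.println(HEIGHT_IN_INCHES.isValid(BigInteger.valueOf(100))); // false: maxExclusive
        System.out.println(HEIGHT_IN_INCHES.isValid(BigInteger.ZERO));         // false: not positive
    }
}
```

Keeping the bounds as data rather than code lets one validator class serve every integer-derived simple type in a schema.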

2.5 Namespace Restrictions for Wildcards

[XML Schema Part 1: Structures Second Edition] allows the elements matched by an xs:any wildcard to be restricted to a given namespace. For example:

<complexType name="foo">
  <sequence>
    <any namespace="http://www.example.org"/>
  </sequence>
</complexType>

Again, this cannot be mapped into a Java™ feature and must be checked at runtime.
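A runtime check of this kind is sketched below in Java™ using the JDK's DOM and JAXP parser classes. Holding the wildcard content as an org.w3c.dom.Element, and the setAny() and parse() helpers, are assumptions made for illustration rather than a description of any particular binding framework.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Sketch of a mapped class for complexType "foo": the wildcard content is
// held as a DOM Element, and the namespace constraint of
// <any namespace="http://www.example.org"/> is checked when it is assigned.
class Foo {
    static final String ALLOWED_NS = "http://www.example.org";

    private Element any;

    void setAny(Element element) {
        String ns = element.getNamespaceURI(); // null if the element has no namespace
        if (!ALLOWED_NS.equals(ns)) {
            throw new IllegalArgumentException(
                "wildcard content must be in namespace " + ALLOWED_NS + " but was " + ns);
        }
        this.any = element;
    }

    Element getAny() { return any; }

    // Helper: parse a string into a DOM element. Namespace awareness must be
    // switched on, otherwise the parser does not record namespace URIs.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }
}

class FooDemo {
    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.setAny(Foo.parse("<ext xmlns=\"http://www.example.org\"/>")); // accepted
        try {
            foo.setAny(Foo.parse("<ext xmlns=\"http://www.example.com/other\"/>"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```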

2.6 Problems with Annotations

[XML Schema Part 1: Structures Second Edition] allows an annotation element to be specified for most elements, but the specification is ambiguous in some cases. The ambiguity concerns an annotation element specified on a reference to a schema component made with the "ref" attribute. This arises in three cases:

For example, consider the following schema fragment.

<xs:element name="Customer">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Name"/>
      <xs:element ref="Address"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

The XML Schema specification is unclear on whether an annotation element can be specified on the reference to the "Name" element and, if so, whether it takes precedence over an annotation on the element referred to. In our products, we assume that an annotation element can be specified in each of the three cases mentioned above. Furthermore, the annotation element is assumed to be associated with the abstract schema component as follows:

3 Managing XML Schemas in the Database

The Oracle database products provide facilities for managing XML Schemas in the database as well as shredding XML instances into relational tables in a Schema-controlled manner. Product documentation is available at [Using Oracle XML DB and XML Schema]. A paper with the technical highlights is available at [XML Schemas in Oracle XML DB].

Support is provided for the following tasks:

We discuss below some XML Schema features that make the implementation of the above facilities particularly challenging. Although we attempt to push as many constructs as possible down to the SQL level, some schema constructs either do not map well to SQL or do not affect the storage. Such constructs are enforced at the XML layer (above the object-relational level).

3.1 Substitution Groups

For elements in a substitution group, the instance data is stored in the column or table corresponding to the head element. The actual element QName, however, is stored in a system (binary) column. This system column also holds other information that is not mapped directly, such as namespace prefixes, comments and processing instructions (PIs), but that must be preserved to ensure the fidelity of retrieved documents and fragments. We have seen heavy use of substitution groups in XBRL schemas.
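The storage scheme can be modelled in memory as in the Java™ sketch below. It illustrates the idea only and is not XML DB's implementation: the namespace, the head element Address and its members USAddress and BritishAddress are hypothetical, and the system column is reduced to a javax.xml.namespace.QName, whereas the real column also carries prefixes, comments and PIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.namespace.QName;

// Illustrative model of substitution-group storage: all members share the
// head element's data column, and a separate "system column" records which
// element actually appeared so that it can be restored on retrieval.
class AddressTable {
    static final String NS = "http://www.example.org/po";  // hypothetical namespace
    static final QName HEAD = new QName(NS, "Address");     // hypothetical head element
    static final Set<QName> GROUP = Set.of(
        HEAD, new QName(NS, "USAddress"), new QName(NS, "BritishAddress"));

    // One stored row.
    static final class Row {
        final String addressData; // column corresponding to the head element
        final QName actualName;   // system column: the element name used in the instance

        Row(String addressData, QName actualName) {
            this.addressData = addressData;
            this.actualName = actualName;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    // Any member of the group is accepted and stored under the head's column.
    int insert(QName elementName, String data) {
        if (!GROUP.contains(elementName)) {
            throw new IllegalArgumentException(
                elementName + " is not in the substitution group of " + HEAD);
        }
        rows.add(new Row(data, elementName));
        return rows.size() - 1;
    }

    // On retrieval, the preserved QName tells us which element to emit.
    QName elementNameAt(int row) { return rows.get(row).actualName; }

    String dataAt(int row) { return rows.get(row).addressData; }
}
```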

3.2 Duration

Datatypes such as duration that have no SQL equivalent are stored in a VARCHAR2 column. This is also an option for other datatypes, such as xsd:integer: although xsd:integer is stored as SQL NUMBER by default, users can override the storage mechanism with a Schema annotation.
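In Java™ terms, character-column storage amounts to keeping the lexical form of the value and re-parsing it on read. The sketch below uses the JDK's javax.xml.datatype API; the class DurationColumn and its method names are hypothetical, and a plain String stands in for the VARCHAR2 column.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;

// Sketch: an xs:duration value kept in its lexical form, as it would be in a
// VARCHAR2 column (modelled here by a String), and re-parsed when read back.
class DurationColumn {
    private static final DatatypeFactory FACTORY;

    static {
        try {
            FACTORY = DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Value written to the character column: the xs:duration lexical form.
    static String toColumn(Duration value) {
        return value.toString();
    }

    // Value read back: parsed from the stored string (throws
    // IllegalArgumentException if the string is not a valid duration).
    static Duration fromColumn(String stored) {
        return FACTORY.newDuration(stored);
    }
}

class DurationDemo {
    public static void main(String[] args) {
        Duration d = DurationColumn.fromColumn("P1Y2M3DT4H");
        String stored = DurationColumn.toColumn(d);
        System.out.println(stored);
        System.out.println(DurationColumn.fromColumn(stored).equals(d));
    }
}
```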

3.3 Redefine

Redefine is the only XML Schema feature that is not supported by XML DB. Interestingly, none of our customers has asked for it as yet. In contrast, we have seen customer schemas that make heavy use of every other schema feature, including type derivation, substitution groups and wildcards.

3.4 Key/Keyref

Key/keyref constraints are enforced at the XML layer and are not directly captured at the SQL level. The main reason is that they express constraints within a single document, whereas in most cases users want to express constraints on the document collection. For example, users typically want /PurchaseOrder/@ID to be unique across all documents stored in the PurchaseOrder table. This cannot be expressed in the Schema unless one defines a virtual collection element, virtual documents representing a table, and so on. Users are not very comfortable with this virtualization. Also, it does not make sense when the unit of storage and retrieval is a single purchase order.
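The difference in scope can be sketched in a few lines of Java™. The helpers are hypothetical and treat each stored document as a list of the key values it contains. For /PurchaseOrder/@ID each list has a single entry, so a per-document key is trivially satisfied while the collection-wide constraint can still be violated.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch contrasting two scopes for a key. Each inner list holds the key
// values (e.g. /PurchaseOrder/@ID) found in one stored document.
class KeyScopes {
    // Scope of xs:key: values must be unique within each document separately.
    static boolean uniqueWithinEachDocument(List<List<String>> documents) {
        for (List<String> keys : documents) {
            if (new HashSet<>(keys).size() != keys.size()) {
                return false;
            }
        }
        return true;
    }

    // Scope users usually want: values unique across the whole collection,
    // i.e. across every row of the PurchaseOrder table.
    static boolean uniqueAcrossCollection(List<List<String>> documents) {
        Set<String> seen = new HashSet<>();
        for (List<String> keys : documents) {
            for (String key : keys) {
                if (!seen.add(key)) {
                    return false;
                }
            }
        }
        return true;
    }
}

class KeyScopesDemo {
    public static void main(String[] args) {
        // Two purchase orders that reuse the same ID.
        List<List<String>> orders = List.of(List.of("PO-1"), List.of("PO-1"));
        System.out.println(KeyScopes.uniqueWithinEachDocument(orders)); // true
        System.out.println(KeyScopes.uniqueAcrossCollection(orders));   // false
    }
}
```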

A few other observations from our implementation experience:

3.5 Derivation by Restriction

Though derivation by extension maps cleanly to the sub-type (UNDER) construct in SQL, there is no corresponding construct for derivation by restriction. We chose to map each restricted complex type to a dummy sub-type, i.e., a derived SQL type with no extra fields. This provides a full-fledged SQL type corresponding to the restricted complex type, but does not enforce the restriction at the SQL level.

3.6 Wildcard with namespace exclusion

While mapping the [HTTP Extensions for Distributed Authoring -- WEBDAV] resource model to XML DB using XML Schema, we needed to define a wildcard that allows any element not in our (Oracle) namespace but also permits elements in no namespace (the null namespace). This cannot be expressed in XML Schema: a wildcard with namespace="##other" matches any namespace other than the schema's target namespace, but it also excludes elements that have no namespace. We had to work around this by adding proprietary extensions to our handling of wildcards.
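The two namespace tests can be contrasted in Java™. The sketch follows the XML Schema 1.0 rule for ##other; the method names and the excluded namespace URI are placeholders, not the namespace XML DB actually uses. A null URI stands for an element in no namespace.

```java
import java.util.Objects;

// Namespace tests for an element wildcard. A null namespace URI stands for an
// element in no namespace (an unqualified element).
class WildcardNamespaces {
    // XML Schema 1.0 namespace="##other": any namespace other than the schema's
    // target namespace; elements in no namespace are rejected too.
    static boolean matchesOther(String elementNs, String targetNs) {
        return elementNs != null && !elementNs.equals(targetNs);
    }

    // The test that was actually needed: any namespace except one excluded
    // namespace, with elements in no namespace still allowed.
    static boolean matchesAllExcept(String elementNs, String excludedNs) {
        return !Objects.equals(elementNs, excludedNs);
    }
}

class WildcardDemo {
    public static void main(String[] args) {
        String ours = "http://www.example.com/hypothetical-oracle-ns"; // placeholder URI
        // An unqualified element: rejected by ##other, accepted by the desired test.
        System.out.println(WildcardNamespaces.matchesOther(null, ours));     // false
        System.out.println(WildcardNamespaces.matchesAllExcept(null, ours)); // true
        // A WebDAV element in the "DAV:" namespace is accepted by both.
        System.out.println(WildcardNamespaces.matchesOther("DAV:", ours));     // true
        System.out.println(WildcardNamespaces.matchesAllExcept("DAV:", ours)); // true
    }
}
```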

4 Summary

This paper has discussed the mapping of XML Schema constructs into two other languages, Java™ and SQL. Some XML Schema constructs do not map cleanly into the facilities offered by these languages, and we have discussed workarounds for some of them.


A References

[HTTP Extensions for Distributed Authoring -- WEBDAV]
    HTTP Extensions for Distributed Authoring -- WEBDAV. Available at: http://webdav.org/specs/rfc2518.html
[Using Oracle XML DB and XML Schema]
    Using Oracle XML DB and XML Schema. Available at: http://st-doc.us.oracle.com/10/101/appdev.101/b10790/xdb05sto.htm#sthref438
[XML Schema Part 1: Structures Second Edition]
    XML Schema Part 1: Structures Second Edition, 28 October 2004. Available at: http://www.w3.org/TR/xmlschema-1/
[XML Schema Part 2: Datatypes Second Edition]
    XML Schema Part 2: Datatypes Second Edition, 28 October 2004. Available at: http://www.w3.org/TR/xmlschema-2/
[XML Schemas in Oracle XML DB]
    XML Schemas in Oracle XML DB. VLDB 2003. Available at: http://www.vldb.org/conf/2003/papers/S30P02.pdf