XML Schema User Experience Report from Sun

Part 1. Experience with XML Schema Specs

Kohsuke Kawaguchi (kohsuke.kawaguchi@sun.com)

I have been working on “Java Architecture for XML Binding” (aka JAXB) for a few years now. I am the lead engineer of its reference implementation and a member of its expert group, which designs the technology. We have had our fair share of issues with XML Schema, and I would like to discuss them in this report.

Issue 1: Failing to import is an error!

The XML Schema spec says explicitly that it is not an error for <xs:import> to fail (see Schema Representation Constraint: Import Constraints and Semantics). Because of this, some XML Schema implementations (in particular Apache Xerces) do not report an error if:

@schemaLocation contains a typo and fails to point to a proper file,
@schemaLocation points to a schema file, but the file contains a typo and is not well-formed, or
@schemaLocation points to http://schemas.xmlsoap.org/soap/envelope/ but the proxy configuration is wrong and the resource couldn't be retrieved.

Instead, the user will most likely receive an error such as “element foo is not defined”, pointing to the location where he or she references a component defined in the imported schema.

These are very common operator errors, and even experienced developers get very confused because nothing in the error message indicates that the <xs:import> failed. More than a few people blamed our schema compiler for this reason.

I have yet to come across a case where failing to resolve <xs:import> was not meant to be an error. This design of XML Schema is causing a lot of grief among developers.
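On the user side, one possible defense is to treat every schema-loading warning as fatal. The sketch below assumes that the processor reports a failed <xs:import> through the standard SAX ErrorHandler.warning callback rather than dropping it silently (as far as we know, Xerces and the Xerces-derived processor bundled with the JDK report it as a warning). The class names StrictErrorHandler and StrictSchemaLoader are our own, hypothetical names.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/**
 * Hypothetical ErrorHandler that escalates warnings to errors, so that a
 * processor which merely warns about an unresolved <xs:import> stops loading.
 */
class StrictErrorHandler implements ErrorHandler {
    @Override
    public void warning(SAXParseException e) throws SAXException {
        throw e; // a failed import is assumed to arrive here
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        throw e;
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        throw e;
    }
}

class StrictSchemaLoader {
    /** Compiles the schema at the given system ID, failing on any warning. */
    static Schema load(String systemId) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setErrorHandler(new StrictErrorHandler());
        return factory.newSchema(new StreamSource(systemId));
    }
}
```

With this handler installed, a typo in @schemaLocation surfaces at the <xs:import> itself instead of as a later “element foo is not defined” error.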

Issue 2: Anonymous types cannot always be used

In most places where you can reference a named type, you can use an anonymous type instead. For example,

<xs:attribute name="foo" type="xs:string"/>

… and

<xs:attribute name="foo">
  <xs:simpleType> … </xs:simpleType>
</xs:attribute>

… are both allowed. However, there is one place where you cannot use an anonymous type: when you define a complex type with simple content by extension, the simple type being extended must always be a named type.

This lack of consistency hurts JAXB. When we try to map a user-written class to XML Schema, we sometimes have to give a meaningless name to a simple type.
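The asymmetry can be checked with the JDK's built-in schema processor (javax.xml.validation). In the sketch below, the schemas and the ZipCode/ZipCodeType names are hypothetical illustrations: the named form compiles, while the anonymous form is rejected, because <xs:extension> requires a base attribute and its content model has no place for an <xs:simpleType> child.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SimpleContentExtensionDemo {
    // Allowed: the extended simple type is named and referenced through @base.
    static final String NAMED =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:simpleType name='ZipCodeType'>"
      + "<xs:restriction base='xs:string'><xs:length value='5'/></xs:restriction>"
      + "</xs:simpleType>"
      + "<xs:complexType name='ZipCode'><xs:simpleContent>"
      + "<xs:extension base='ZipCodeType'>"
      + "<xs:attribute name='country' type='xs:string'/>"
      + "</xs:extension>"
      + "</xs:simpleContent></xs:complexType>"
      + "</xs:schema>";

    // Not allowed: the same type written as an anonymous simple type.
    static final String ANONYMOUS =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:complexType name='ZipCode'><xs:simpleContent>"
      + "<xs:extension>"
      + "<xs:simpleType>"
      + "<xs:restriction base='xs:string'><xs:length value='5'/></xs:restriction>"
      + "</xs:simpleType>"
      + "<xs:attribute name='country' type='xs:string'/>"
      + "</xs:extension>"
      + "</xs:simpleContent></xs:complexType>"
      + "</xs:schema>";

    /** Returns true if the schema compiles, false if the processor reports an error. */
    static boolean compiles(String xsd) {
        try {
            // With no ErrorHandler set, schema errors are thrown as SAXParseException.
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

This is exactly why a mapping from a user-written class has to invent a name such as ZipCodeType even when nothing else refers to it.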

Issue 3: UPA violations often go undetected

Some tools (in particular Altova XML Spy) fail to detect violations of the Unique Particle Attribution (UPA) constraint. I have even seen a “standard” schema produced by a consortium that contains UPA violations, although this happens more often with schemas written by smaller organizations.

When people run these broken schemas through our schema compiler, we reject them as errors, which only makes them think that ours is the broken one. Troubleshooting can get quite complicated when it involves type hierarchies, substitution groups, and/or wildcards.
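For the simplest case, the constraint is easy to check mechanically. The sketch below is a deliberately simplified UPA check of our own: it only handles a flat <xs:sequence> of element particles, and ignores choices, nesting, wildcards, and substitution groups, which are exactly the cases that make real troubleshooting hard. The names FlatParticle and FlatSequenceUpa are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

/** One element particle of a flat <xs:sequence>. */
final class FlatParticle {
    static final int UNBOUNDED = Integer.MAX_VALUE; // maxOccurs="unbounded"
    final String name;
    final int min;
    final int max;

    FlatParticle(String name, int min, int max) {
        this.name = name;
        this.min = min;
        this.max = max;
    }
}

final class FlatSequenceUpa {
    /**
     * Names of the particles that could match the next element when no
     * particle from index k onward has been matched yet: k, k+1, ... up to
     * and including the first required particle.
     */
    private static List<String> window(List<FlatParticle> seq, int k) {
        List<String> names = new ArrayList<>();
        for (int j = k; j < seq.size(); j++) {
            names.add(seq.get(j).name);
            if (seq.get(j).min > 0) {
                break; // a required particle must be matched before anything after it
            }
        }
        return names;
    }

    /** Returns true if the flat sequence violates Unique Particle Attribution. */
    static boolean violatesUpa(List<FlatParticle> seq) {
        // Two competing particles with the same name at some point in the sequence.
        for (int k = 0; k <= seq.size(); k++) {
            List<String> w = window(seq, k);
            if (new HashSet<>(w).size() != w.size()) {
                return true;
            }
        }
        // A particle that, after some match, may either match again or hand
        // over to the particles that follow it; a shared name is ambiguous.
        for (int i = 0; i < seq.size(); i++) {
            FlatParticle p = seq.get(i);
            boolean mayRepeatOrMoveOn = p.max > Math.max(p.min, 1);
            if (mayRepeatOrMoveOn && window(seq, i + 1).contains(p.name)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, the content model (a?, a) is reported as a violation, because the first <a> could be attributed to either particle, whereas (a*, b, a) is not.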

Issue 4: What are root elements?

XML Schema does not provide a way of marking the possible root elements; any globally declared element may appear as the root of a valid document. Because of this, given the following schema, JAXB needs to assume that the “name” element might be a root element.

<xs:complexType name="Address">
  <xs:sequence>
    <xs:element ref="name" />
    <xs:element ref="street" />
    <xs:element ref="zipCode" />
  </xs:sequence>
</xs:complexType>
<xs:element name="name" type="xs:string" />
<xs:element name="street" type="xs:string" />
<xs:element name="zipCode" type="xs:integer"/>

This prevents us from generating the following class and calling it a day.

import java.math.BigInteger;

class Address {
  String name;
  String street;
  BigInteger zipCode;
}

Instead, JAXB needs to generate additional code for name, street, and zipCode, which complicates the Address class unnecessarily.

Many schemas are written in this style (partly because it is recommended in XML Schemas: Best Practices), and this makes the generated code less than optimal. This is one example of “schema allowing more than what's intended”.
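Without a root marker, a tool can only treat every global element declaration as a candidate root. The sketch below (our own, using the JDK's namespace-aware DOM parser; CandidateRoots is a hypothetical name) lists those candidates for the Address schema above, wrapped in an <xs:schema> element. It returns name, street, and zipCode, even if the author only ever meant them to appear inside Address content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CandidateRoots {
    static final String ADDRESS_SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:complexType name='Address'><xs:sequence>"
      + "<xs:element ref='name'/><xs:element ref='street'/><xs:element ref='zipCode'/>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='street' type='xs:string'/>"
      + "<xs:element name='zipCode' type='xs:integer'/>"
      + "</xs:schema>";

    /** Returns the names of all global element declarations in a schema document. */
    static List<String> of(String xsd) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Element schema = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xsd)))
                .getDocumentElement();
            List<String> roots = new ArrayList<>();
            // Only direct children of <xs:schema> are global declarations.
            for (Node n = schema.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element
                        && XMLConstants.W3C_XML_SCHEMA_NS_URI.equals(n.getNamespaceURI())
                        && "element".equals(n.getLocalName())) {
                    roots.add(((Element) n).getAttribute("name"));
                }
            }
            return roots;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed schema document", e);
        }
    }
}
```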

Issue 5: Unused element substitution capability

Another example of “schema allowing more than what's intended” is the element substitution capability.

JAXB would like to allow schemas to be compiled separately and used together at runtime. That is, if schema X refers to Y, one person can compile Y, another person can compile X (while referencing Y), and the generated code can then be put together and run.

This is challenging for many reasons, but one of the challenges is that XML Schema allows element substitution by default. Suppose Y contains the following fragment:

<xs:complexType name="Address">
  <xs:sequence>
    <xs:element ref="name" />
    <xs:element ref="street" />
    <xs:element ref="zipCode" />
  </xs:sequence>
</xs:complexType>
<xs:element name="name" type="xs:string" />
<xs:element name="street" type="xs:string" />
<xs:element name="zipCode" type="xs:integer"/>

When presented with this schema, JAXB needs to consider the theoretical possibility of the “name” element being substituted by another element in X, since X may declare elements that join the substitution group headed by “name”.

A schema can prohibit element substitution (for example, with block="substitution" on the element declaration), but most schemas don't bother to set that flag. The end result is that many schemas allow element substitutions even though they are not intended. I believe the developer community would have been better served if element substitution had been opt-in rather than opt-out.

This makes it difficult for JAXB to just generate this:

import java.math.BigInteger;

class Address {
  String name;
  String street;
  BigInteger zipCode;
}
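The sketch below shows that theoretical possibility with the JDK's javax.xml.validation API. To stay self-contained, it is a simplification of our own: it puts the head element and a hypothetical substitute fullName into one no-namespace schema instead of separate schemas X and Y, and wraps name in a hypothetical address element so that the document has a root. By default <fullName> is accepted in place of <name>; with block="substitution" on the head element it is rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SubstitutionDemo {
    /** Schema in which fullName joins the substitution group headed by name. */
    static String schema(boolean blockSubstitution) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='address'><xs:complexType><xs:sequence>"
             + "<xs:element ref='name'/>"
             + "</xs:sequence></xs:complexType></xs:element>"
             + "<xs:element name='name' type='xs:string'"
             + (blockSubstitution ? " block='substitution'" : "") + "/>"
             + "<xs:element name='fullName' type='xs:string' substitutionGroup='name'/>"
             + "</xs:schema>";
    }

    /** Returns true if the document is valid against the schema. */
    static boolean accepts(boolean blockSubstitution, String xml) {
        try {
            Schema s = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schema(blockSubstitution))));
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            s.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the opt-out sits on the head element in Y, a JAXB compiler looking only at Y cannot rule out a fullName-like element appearing later unless that flag is set.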

Issue 6: Unused type substitution capability

Yet another example of “schema allowing more than what's intended” is the type substitution capability.

XML Schema allows every type reference to be substitutable by default. For example, consider the following fragment taken from UBL:

<xsd:complexType name="TextType">
  <xsd:simpleContent>
    <xsd:extension base="xsd:string">
      <xsd:attribute name="languageID" … />
      <xsd:attribute name="languageLocaleID" … />
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>
…
<xsd:element name="Name" type="xsd:string"/>

Because “TextType” derives from “string”, XML Schema considers the following document valid:

<Name xsi:type="TextType" languageID="…">Kohsuke</Name>

This happens very often, because the only way to define an element with both text and attributes is to define a complex type like this one. While type substitution can be explicitly turned off (by using the block attribute), many schemas don't bother to prohibit it, even when the substitution is not intended by the schema author.

The net result is that the schema allows unintended type substitutions.
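The sketch below reproduces the UBL example in miniature with the JDK's javax.xml.validation API. It is our own simplification: namespace prefixes are dropped and the elided type of languageID is replaced with a hypothetical xs:token. The xsi:type document is accepted by default and rejected once the Name declaration carries block="extension".

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class TypeSubstitutionDemo {
    static final String DOC =
        "<Name xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
      + " xsi:type='TextType' languageID='en'>Kohsuke</Name>";

    /** Miniature of the UBL fragment: TextType extends string, Name is a plain string. */
    static String schema(boolean blockExtension) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:complexType name='TextType'><xs:simpleContent>"
             + "<xs:extension base='xs:string'>"
             + "<xs:attribute name='languageID' type='xs:token'/>"
             + "</xs:extension></xs:simpleContent></xs:complexType>"
             + "<xs:element name='Name' type='xs:string'"
             + (blockExtension ? " block='extension'" : "") + "/>"
             + "</xs:schema>";
    }

    /** Returns true if the document is valid against the schema. */
    static boolean accepts(boolean blockExtension, String xml) {
        try {
            Schema s = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schema(blockExtension))));
            s.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. the xsi:type is blocked by the element declaration
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```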

When JAXB is presented with this schema, it has two choices:

Assume that “string” isn't going to be substituted by “TextType”, and generate the value of the Name element as java.lang.String
Assume that the substitution might be possible, and generate the type as java.lang.Object (the most specific common superclass of java.lang.String and the TextType class)

The former runs a risk of not being able to handle some valid documents. The latter is less usable.
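To illustrate the usability cost of the second choice, here is a hand-written sketch of the two shapes. The class names are hypothetical and do not claim to be what any particular JAXB implementation generates; the point is that with Object every caller has to inspect the runtime type before using the value.

```java
/** Hypothetical Java counterpart of the TextType complex type. */
class TextType {
    String value;
    String languageID;

    TextType(String value, String languageID) {
        this.value = value;
        this.languageID = languageID;
    }
}

/** Choice 1: ignore type substitution; simple, but cannot hold a TextType value. */
class NameAsString {
    String value;
}

/** Choice 2: allow type substitution; callers must now check the runtime type. */
class NameAsObject {
    Object value; // either a java.lang.String or a TextType

    /** Extracts the text content regardless of which type was used. */
    String text() {
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof TextType) {
            return ((TextType) value).value;
        }
        throw new IllegalStateException("unexpected content: " + value);
    }
}
```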

Again, I believe the community would have been better served if type substitution had been opt-in rather than opt-out.

Part 2. Experience with the XML Schema Test Suite

Leonid Arbouzov (leonid.arbouzov@sun.com)

I have been working on Java conformance test suites for Java SE, JAXB and JAXP (“Java API for XML Processing”). The W3C XML Schema test suite has been a great help to us. It saved us test development resources and helped us identify and fix problems in XML Schema implementations. Over 10000 W3C XML Schema tests are included in the Java conformance test suites, and all Java implementations are required to pass every single one of them. This helps to improve the conformance and compatibility of XML Schema support on the Java platform. At the same time, the tests need to be of high quality. Working with the W3C XML Schema test suite, we have discovered a number of issues with the tests, described below.

Issue 7: Some tests contradict the XML Schema specification

We've discovered 383 tests (out of roughly 10000 in total) that seem to contradict the XML Schema specification: 120 tests in the NISTTEST subsuite and 263 tests in the MSXDSTEST subsuite. Developers who try to make their implementations pass all W3C XML Schema tests may end up implementing the wrong semantics, and it is not easy for them to find out which tests are valid and which are not. This may lead to incompatible implementations.

Issue 8: Test coverage is limited and not easy to improve

Even though the W3C XML Schema test suite contains more than 10000 tests, the coverage those tests provide is not clear. We made a sample-based estimate of assertion coverage, and the result was 40-50%. This means there are parts of the XML Schema specification that are not tested. If implementations behave differently in those areas, the differences may go unnoticed and lead to incompatibilities. Even if somebody wanted to improve test coverage, it would not be easy to identify the portions of the specification that require additional tests.
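Identifying those portions becomes mechanical once each test is annotated with the specification assertions it exercises. The sketch below assumes such a hypothetical mapping from test IDs to assertion IDs (the suite does not provide one, which is part of the problem described above); AssertionCoverage and the IDs used with it are illustrative names only. It computes the coverage ratio and lists the assertions that no test touches.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class AssertionCoverage {
    /** Assertions that no test claims to exercise: candidates for new tests. */
    static Set<String> uncovered(Set<String> assertions, Map<String, Set<String>> testsToAssertions) {
        Set<String> missing = new TreeSet<>(assertions);
        for (Set<String> covered : testsToAssertions.values()) {
            missing.removeAll(covered); // IDs outside the assertion set are ignored
        }
        return missing;
    }

    /** Fraction of assertions exercised by at least one test (NaN if there are none). */
    static double coverage(Set<String> assertions, Map<String, Set<String>> testsToAssertions) {
        int covered = assertions.size() - uncovered(assertions, testsToAssertions).size();
        return (double) covered / assertions.size();
    }
}
```

With four hypothetical assertions A1 to A4 and tests covering only A1 and A3, the sketch reports 50% coverage and lists A2 and A4 as needing tests.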

Issue 9: Test suite is not updated regularly

Errata to the specs are published regularly, and some of them invalidate tests. However, the test suite is not always updated accordingly, so it becomes out of date. It is not easy for users to find out which level of errata the test suite corresponds to. As a result, implementers do not get proper guidance and may make errors in their implementations that lead to incompatibilities.

Issue 10: No effective test appeal process in place

There should be a way for test suite users to challenge tests and get a quick response. Test suite issues do not always get proper attention from the XML Schema group.

Recommendations

to provide coverage information (showing where coverage is insufficient and should be improved)
to improve test coverage
to publish information on invalid tests
to update the test suite according to published errata
to set up a process for quick resolution of test issues