XML Schema Language Experience Report

Microsoft Corporation
May 27, 2005

Background

Microsoft has been involved in defining and implementing XML schema languages since the beginning of the XML era. XML Data Reduced [XDR], a schema language developed largely by Microsoft, provided strongly typed XML support in the early versions of our XML-enabled products. XDR was an input to the W3C XML Schema Working Group, but Microsoft publicly committed to support whatever the Working Group produced, and has followed through by deprecating XDR and moving to the W3C XML Schema Definition language (XSD) as its only supported XML schema language.

Several Microsoft products and tools support XSD:

In the core technology stack, MSXML v4 to 6 (native stack) and System.Xml (.NET Framework) fully implement the XSD 1.0 Recommendation in several APIs.
The SQLXML feature in SQL Server 2000 supports XSD with annotations for mapping XML to tables. SQL Server 2005 supports XSD both to guide “shredding” XML into tables and to validate and type columns of the new native XML type.
Microsoft Office 2003 uses XSD both to define the built-in XML serialization formats for documents and spreadsheets and to allow the various components to handle instances of user-defined schemas.
Web services tools such as the Indigo framework use XSD to map XML elements and attributes back and forth to programming language objects.

Given the widespread use of these tools and products, the usage scenarios that rely on their XSD support are too numerous to enumerate. In short, Microsoft depends heavily on the XSD 1.0 Recommendation, is fully committed to its success, and has implemented the full specification in both its native and managed API libraries.

Microsoft’s position is that the XSD spec is far from perfect, but has been valuable to us and to the XML industry as a whole. It could be even more valuable to us all if vendors and the open source community invested some effort to fully implement XSD validation, perform interoperability tests, and fix ambiguities and errors in the original Recommendation. This experience paper focuses on areas in which realistic user expectations have not been met, and suggests how the XML industry can address them.

Interoperability Problems Experienced

With such a diverse set of products and use cases supporting XSD, it was inevitable that Microsoft and its customers would experience the range of interoperability problems that led to this conference being organized.

Rick Jelliffe [RJ05] describes the situation faced by an actual customer where incomplete support for XSD in various products has “stuffed up the ready interoperability they thought they were buying into with XML.” That sums up the problem nicely and mirrors the experience of many of our users. The problems we have seen include the following:

Some systems don’t handle recursive elements.
Wildcards are frequently not supported (sometimes for performance reasons, e.g. in XML databases).
Some tools do not handle substitution groups.
Dynamic typing using xsi:type is frequently problematic.
Some tool vendors deliberately do not implement parts of the spec with which they disagree, e.g. the Unique Particle Attribution (UPA) constraint.
Tools that depend on mapping back and forth between XSD constructs and programming language objects are particularly weak in their support of the XSD spec.

Particular Frustrations

UPA Constraint

XSD requires that during validation an element can be uniquely attributed to an element declaration in a content model so as to provide deterministic and efficient implementations of validators. While this is useful from a schema processor point of view, it creates problems when generating schemas from structures that do not have similar constraints. Some XSD implementations follow the spec and others do not.

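The UPA constraint is problematic because it breaks idiomatic uses of XML, such as a content model that ends with an optional element followed by a wildcard for extensions: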

<xs:schema blockDefault="#all"  elementFormDefault="qualified"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.example.com/incorrect">
  <!-- THIS TYPE IS NON-DETERMINISTIC: an isbn element could match either
       the optional isbn declaration or the wildcard that follows it -->
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
      <xs:element name="author" type="xs:string" />
      <xs:element name="isbn" type="xs:string" minOccurs="0" />
      <!-- The wildcard admits elements in the target namespace, including isbn -->
      <xs:any namespace="##any" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="publisher" type="xs:string" />
  </xs:complexType>
</xs:schema>

This breaks a common XML vocabulary design practice, and the problem is exacerbated because certain tools do not enforce the rule: a number of schemas in the wild work with popular tools but break in conformant processors. Worse, the average user often can’t tell by simple inspection whether a content model is deterministic. An idiomatic construct whose validity is for practical purposes only machine-checkable, combined with the lack of that check in one of the most popular schema creation tools, does not bode well for XSD interoperability.
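
A common workaround, sketched here under the assumption that extension elements can be required to come from a foreign namespace, is to narrow the wildcard to ##other. The wildcard can then no longer match an element in the target namespace, so an isbn element can only be attributed to its own declaration. The target namespace URI below is hypothetical.

```xml
<xs:schema blockDefault="#all" elementFormDefault="qualified"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.example.com/deterministic">
  <!-- Deterministic variant: the wildcard excludes the target namespace -->
  <xs:complexType name="bookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string" />
      <xs:element name="author" type="xs:string" />
      <xs:element name="isbn" type="xs:string" minOccurs="0" />
      <!-- ##other: elements from namespaces other than the target namespace
           (unqualified elements are also excluded in XSD 1.0) -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="publisher" type="xs:string" />
  </xs:complexType>
</xs:schema>
```

The cost is that a later version of the vocabulary cannot add elements in its own namespace at this position without revising the schema.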

Derivation of Complex Types by Restriction

This feature is the most complex and buggiest feature in the spec (see the various errata) with some parts having key details missing (e.g. how does restriction apply to identity constraints?). It is also very difficult to handle in Object<->XML mapping technologies since there is simply no way to model the fact that a subtype may restrict away properties from the base type in an intuitive manner in an OO language. Whether XSD is too broad or OO too limited is debatable, but end users are frustrated by the mismatch here.
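
To make the mismatch concrete, the following sketch derives a type by restriction that removes an optional element from its base. The type names and namespace are hypothetical. An object model would normally expect a subclass to keep every property of its base class, whereas here the derived type has fewer.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="http://www.example.com/people"
           targetNamespace="http://www.example.com/people"
           elementFormDefault="qualified">
  <!-- Base type: nickname is optional -->
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="nickname" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <!-- Restriction: the content model is restated without nickname,
       so the derived type no longer allows that element at all -->
  <xs:complexType name="formalPersonType">
    <xs:complexContent>
      <xs:restriction base="tns:personType">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

Every valid formalPersonType instance is still a valid personType instance, which is what restriction guarantees, but a class generated for formalPersonType cannot simply inherit a nickname property and then hide it.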

Versioning

Another major frustration has been in supporting the evolution of schemas. XSD doesn't provide any constructs that make it easy to extend schemas or document formats in a backwards compatible way. The closed content model and the existence of UPA break the versioning model used by formats such as HTML (processors must ignore markup they don't understand).
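
As a minimal illustration with a hypothetical order vocabulary, the version 1 schema below has a closed content model. The instance shown in the comment, produced by a later version that adds one element, is rejected by a version 1 validator instead of having the unknown element ignored.

```xml
<!-- Version 1 of a hypothetical order schema -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/orders"
           elementFormDefault="qualified">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <xs:element name="quantity" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!--
    Instance from a version 2 producer (invalid against version 1,
    because giftWrap is not declared in the content model):
      <order xmlns="http://www.example.com/orders">
        <id>A17</id>
        <quantity>2</quantity>
        <giftWrap>true</giftWrap>
      </order>
  -->
</xs:schema>
```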

Nillability

xsi:nil is used by some products in a way that does not match its specification in XSD 1.0. Its actual, defined meaning is to control validation, nothing more. However, some early SOAP products reused the xsi:nil attribute as a signal that the corresponding parameter was "null". This continues to cause interoperability problems in mapping between XML structures and objects, because, while .NET will now accept either style of signaling a "null", many products will only accept the xsi:nil style.
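
As an illustration, the hypothetical declaration below marks an element as nillable, and the comment shows an instance that uses xsi:nil. From the schema's point of view, xsi:nil="true" only means that the element is valid with empty content despite its declared type. Reading it as a programming-language null is a convention layered on top by mapping tools. Omitting an optional element, also shown in the comment, is assumed here to be the other style that such tools use.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/person"
           elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <!-- nillable permits xsi:nil; minOccurs="0" permits omission -->
        <xs:element name="birthDate" type="xs:date"
                    nillable="true" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!--
    Style 1, element present but nilled (valid although empty content
    is not a legal xs:date):
      <person xmlns="http://www.example.com/person"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <name>Ada</name>
        <birthDate xsi:nil="true"/>
      </person>

    Style 2 (assumed), element omitted:
      <person xmlns="http://www.example.com/person">
        <name>Ada</name>
      </person>
  -->
</xs:schema>
```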

Simple Types

Many simple types do not map easily to existing types in type systems such as SQL, CLR, and Java. They also include features that are useful for validation but have negative consequences for type system usage (e.g., pattern facets on types). Furthermore, a simpleType defined with enumeration values cannot be extended with additional enumeration values. This is an extremely common usage scenario and a frequent source of questions and complaints.
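
The usual workaround for the enumeration case, sketched below with hypothetical type names, is to define the additional values as a second enumeration and combine the two with xs:union. This is an approximation rather than a true extension: the union is a new type, not a derivation of the original enumeration, so it cannot be used where the original type is required.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="http://www.example.com/colors"
           targetNamespace="http://www.example.com/colors">
  <!-- Original enumeration -->
  <xs:simpleType name="primaryColor">
    <xs:restriction base="xs:string">
      <xs:enumeration value="red"/>
      <xs:enumeration value="green"/>
      <xs:enumeration value="blue"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Additional values, declared as a separate enumeration -->
  <xs:simpleType name="extraColor">
    <xs:restriction base="xs:string">
      <xs:enumeration value="orange"/>
      <xs:enumeration value="purple"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Accepts any value from either enumeration -->
  <xs:simpleType name="anyColor">
    <xs:union memberTypes="tns:primaryColor tns:extraColor"/>
  </xs:simpleType>
</xs:schema>
```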

Unordered Content Models

xs:all has not proven to be very useful in practice. For example, the limitations of xs:all are probably the #1 reason why the IETF Atom syndication format will likely not have an XSD, nor can WSDL describe the Atom Publishing Protocol.
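
The sketch below shows the kind of limitation involved, using hypothetical element names rather than the actual Atom grammar. In XSD 1.0 each child of xs:all may occur at most once, so an unordered model with repeatable children has to fall back on a repeated xs:choice, which in turn cannot require that a particular child appears exactly once.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/feed"
           elementFormDefault="qualified">
  <!-- Legal in XSD 1.0: children in any order, each at most once -->
  <xs:complexType name="entryAllType">
    <xs:all>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="summary" type="xs:string" minOccurs="0"/>
    </xs:all>
  </xs:complexType>
  <!-- Fallback when a child such as category must be repeatable:
       any order and any number of each child, but the schema can no
       longer require exactly one title -->
  <xs:complexType name="entryChoiceType">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="title" type="xs:string"/>
      <xs:element name="summary" type="xs:string"/>
      <xs:element name="category" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>
```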

Internal Test Suites / Best Practices Guides

Test-driven development has become widely used in the software industry in the last few years, and can help wring the remaining bugs and ambiguities out of the XSD spec and implementations. Microsoft contributed many of the tests in the current test suite and will offer our current test suite to W3C. We believe widespread use of a common and comprehensive test suite is the surest path to interoperable XSD implementations.

Microsoft has recently launched an internal Schema Best Practices effort. This is not simply a matter of identifying problematic features of XSD to avoid, since a large majority of its features have some audience that depends on them. Just as different applications need different features from the XSD specification, different application areas also appear to require somewhat different best practices guidelines. A multi-team effort is underway to determine which candidate guidelines are generally applicable, which are specific to products, and which are necessary for interoperability across domains. Automated tools to detect and help correct guideline violations are being investigated.

Conclusion / Recommendations

The best approach to interoperability is to focus on getting widespread, conformant implementation of the XSD 1.0 specification, and not on efforts to find a subset that avoids interoperability traps. This will require industry cooperation to develop and share test suites to shake out interoperability problems, and possibly further W3C work to flag errata and generally clarify the spec itself to remove ambiguities in interpretations.

We believe that the following points can be the basis for making the “ready interoperability” that XML promises much more of a practical reality than it is today:

Most of the widely discussed “schema interoperability problems” are really schema-to-object mapping problems. This is a difficult challenge; implementers of web services tools in particular have addressed it in different ways, and it has taken many discussions to get them to work together. This should remain out of scope for W3C, however.
The XSD 1.0 Recommendation should continue to define the basic foundation for XML document and message interoperability.
   1. Infrastructure-level components such as the .NET Framework should try very hard to completely and correctly implement it.
   2. W3C should focus on clarifying ambiguities and correcting actual bugs via the errata process.
   3. We oppose bringing the breaking changes in the XSD 1.1 proposal forward to Recommendation status. XSD has enough interoperability problems today without adding another dimension to the space.
   4. Attempts to define a common subset are futile. Different applications and use cases will require a different profile of the schema spec.
   5. Profiles to meet specific usage scenarios, e.g. Java to CLR RPC-style web services, may be appropriate, but we will discourage them from becoming de facto standards.
Applications need only use the subset of XSD that meets their specific needs. For example, the unique, key, and keyref features are arguably inappropriate to support in a DBMS environment that has a very different concept of identity and identity constraints. Likewise, RPC-style web services may need to focus on XSD constructs that can be reliably and interoperably mapped onto the type systems of modern programming languages.
Implementations should fail gracefully when given a legal XSD construct that is not supported or not appropriate. Falling back to some pure XML interface, e.g. a DOM-like API, is the preferred approach.
Microsoft applauds the relaunched and expanded XSD Test Collection and plans to contribute its internal test suite to that effort.