SAP

SAP User Experience Report on XML Schema

This paper is SAP’s User Experience Report on XML Schema. It is submitted to the W3C as input to the “W3C Workshop on XML Schema 1.0 User Experiences” being held on June 21-22, 2005.

This paper has been prepared by: Ümit Yalçınalp, David Burdett and Gunther Stuhec, Platform EcoSystem Industry Standards, SAP.

May 20th, 2005.

  1. Introduction
  2. XML Schema Experiences
    1. Usability Perspective
      1. Schema Profiles
      2. Extending Schema definitions
      3. Occurrences
      4. Schema Versioning
      5. Infrequently used Schema Features
      6. Global vs. Local Element Declaration
    2. Implementation Perspective
    3. Language Binding Perspective
      1. XML Schema Constructs and Programming Language Constructs
  3. Conclusion and Recommendations
    1. Schema Profiles
    2. Schema Versioning
    3. Extensions
  4. Acknowledgments
  5. References

Introduction

SAP is the leading provider of business applications for Global 2000 companies.

The representation and processing of XML documents plays a key role in the development and operation of SAP’s business applications and in enabling business and application integration as part of SAP’s Enterprise Services Architecture. ESA is SAP’s blueprint for an enterprise-level Service Oriented Architecture.

The SAP NetWeaver application platform on which ESA is built provides SAP’s environment for writing business applications, process integration, business integration, and people and information integration. Our integration server provides proxies to existing applications that utilize XML Schema for building Web Services. Our development tools and our J2EE- and ABAP-based language environments also require access to and processing of XML Schema based documents. In addition, vertical business vocabularies are expressed using XML Schema, which is crucial for business integration and application development.

SAP has broad practical experience of using XML Schema for building business applications. This paper reflects this multidimensional experience.

XML Schema Experiences

Our experience with XML Schema is described in three different categories covering the usability, implementation and language binding aspects of XML Schema.

Usability Perspective

Schema Profiles

Development of XML business vocabularies is important for SAP.  Therefore, SAP actively participates in and is a key contributor to:

In essence, the XML Naming and Design Rules [NDR] define a “profile” of XML Schema that reduces the complexity and variability of XML business vocabularies. The profile defines best practices as well as which features of XML Schema to use. Certain features of XML Schema are not permitted or were not found useful, such as redefine, extensibility with substitution groups, and xs:any.

Furthermore, restrictions have only been used to constrain values by facets, and extensions have only been used to add attributes to existing types. Hence, the complexity of XML Schema authoring has been reduced when building business vocabularies.

As a result, the XML Naming and Design Rules profile defines a well-defined subset of XML Schema that avoids interoperability issues and promotes a well-understood design approach for defining components, which results in consistent Schema designs. It provides an excellent example of how profiling is useful for the industry, for the usability of XML Schema, and for providing readability and interoperability.
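To illustrate what checking a schema against such a profile could look like, the following is a minimal sketch, not the NDR tooling itself. It assumes a hypothetical profile that forbids only the three constructs named above (redefine, substitution groups, and xs:any); the function name `profile_violations` and the sample schema are our own illustrative inventions.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical profile: only the constructs the text names as not permitted.
FORBIDDEN_ELEMENTS = {"redefine", "any"}
FORBIDDEN_ATTRIBUTES = {"substitutionGroup"}


def profile_violations(xsd_text):
    """Return a sorted list of forbidden constructs found in a schema document."""
    root = ET.fromstring(xsd_text)
    found = set()
    for node in root.iter():
        # Only inspect elements that belong to the XML Schema namespace.
        if not node.tag.startswith("{" + XS + "}"):
            continue
        local = node.tag.split("}", 1)[1]
        if local in FORBIDDEN_ELEMENTS:
            found.add("xs:" + local)
        for attr in node.attrib:
            if attr in FORBIDDEN_ATTRIBUTES:
                found.add("@" + attr)
    return sorted(found)


# Illustrative schema that breaks the hypothetical profile in two ways.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Text" type="xs:string"/>
  <xs:element name="Note" type="xs:string" substitutionGroup="Text"/>
  <xs:complexType name="OpenType">
    <xs:sequence>
      <xs:any minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

print(profile_violations(SAMPLE))
```

A real profile would also need to cover positive rules (naming, required design patterns), which a simple scan for forbidden constructs like this one does not address.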

Extending Schema definitions

The way extension/restriction works in XML Schema is not always helpful. For example:

All the features in XML Schema can only be extended in one dimension. However, Core Components [CCTS] requires extensibility in eight dimensions. For example, an existing schema definition may need to have elements or attributes added depending on any combination of the geopolitical environment, industry, product or business process in which that Schema is used.

Occurrences

Using unbounded occurrences of schema components such as sequence or choice leads automatically to uncontrolled growth in the XML instances that are generated, and does not help readability or interoperability.

A controlled mechanism for the definition of occurrences must be established in XML Schema.
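Until such a mechanism exists, a vocabulary designer can at least locate the unbounded occurrences in a schema. The following is a small sketch under our own assumptions: the helper name `unbounded_particles` and the sample schema are hypothetical, and the check only looks for the literal attribute value maxOccurs="unbounded".

```python
import xml.etree.ElementTree as ET

XS_PREFIX = "{http://www.w3.org/2001/XMLSchema}"


def unbounded_particles(xsd_text):
    """List (construct, name-or-ref) pairs for every maxOccurs="unbounded"."""
    root = ET.fromstring(xsd_text)
    hits = []
    for node in root.iter():  # document order
        if node.get("maxOccurs") == "unbounded":
            kind = node.tag.replace(XS_PREFIX, "xs:")
            # Model groups such as xs:sequence have neither name nor ref.
            hits.append((kind, node.get("name") or node.get("ref")))
    return hits


# Illustrative schema with one unbounded sequence and one unbounded element.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="OrderType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="Header" type="xs:string"/>
      <xs:element name="Line" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

for kind, name in unbounded_particles(SAMPLE):
    print(kind, name)
```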

Schema Versioning

Versioning is a problem that has been subject to discussion by many groups that develop XML Schema components. This is a multifaceted problem faced by groups that design vocabularies that build on XML Schema, such as UN/CEFACT, as well as by languages that build on top of XML Schema, such as WSDL.

Efficient development and evolution of XML Schema requires tools that provide support for versioning. However, inconsistent approaches to versioning have been adopted, for example:

Resolving this problem requires clear guidelines on what versions mean and how forwards or backwards compatibility is handled.

For example, the XML Naming and Design Rules activity in UN/CEFACT is currently attempting to define, within the context of the profile, the semantics of versioning:

The minor versioning scheme helps establish backward and forward compatibility, and usually also allows the validation of XML instances that were generated from XML Schemas with earlier minor versions.

Minor versions often work by adding values to enumerations, optional extensions and/or optional elements. However, problems often occur if an XML instance created using one minor version is validated against an XML Schema with a lower minor version.
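The asymmetry described here can be made concrete with a deliberately simplified model in which an enumeration is treated as a plain set of allowed values. The code list, its values, and the version numbers below are hypothetical and serve only to illustrate the point; no actual schema validation is performed.

```python
# Hypothetical code list in two minor versions of a vocabulary.
CURRENCY_V1_0 = frozenset({"EUR", "USD"})
CURRENCY_V1_1 = CURRENCY_V1_0 | {"GBP"}  # the minor version only adds a value


def is_valid(value, enumeration):
    """Model enumeration validation as simple set membership."""
    return value in enumeration


def is_backward_compatible(old, new):
    """True if every value the old version allows is still allowed by the new one."""
    return old <= new


# Instances written against 1.0 remain valid against 1.1 ...
print(is_backward_compatible(CURRENCY_V1_0, CURRENCY_V1_1))
# ... but an instance written against 1.1 may be rejected by 1.0.
print(is_valid("GBP", CURRENCY_V1_1), is_valid("GBP", CURRENCY_V1_0))
```

The same reasoning applies to optional elements added in a higher minor version: a validator that only knows the lower version has no declaration for them.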

We think that the integration of versioning into XML Schema in general, and the ability to name and refer to versioned schema components, require standardization in the very near future. Otherwise, the industry will face more interoperability problems in trying to use “versioned” components or “vocabularies” in the same context without consistent success and uniform semantics.

Infrequently used Schema Features

From a general XML usability perspective within our integration platform and Web services, we have noticed that the following features of XML Schema have not been widely used:

Unions are also infrequently used although they are useful, for example, when concatenating code lists together.

SAP does not use the following features because of the performance overhead of resolving references. This can be critical when developing high-performance web services:

On the other hand, these same key/keyref/unique/xpath features are very useful for uniquely identifying components, especially if these components are used to represent objects in an OO model or tables in a relational database.

Notably, the extensibility aspects of XML Schema, such as substitution groups and extensibility with xs:any / xs:anyAttribute, have not been utilized much so far. Having said that, we note that, especially with Web Services, the trend is to build arbitrary extensibility into containers by design.

Global vs. Local Element Declaration

Recently, many different design patterns have emerged that describe how to design XML Schema. These design patterns have very fancy names [Costello], such as:

All these design patterns address the decoupling and cohesiveness of types and elements in different ways. Some patterns require global element declarations and others local element declarations. Different styles of declaration lead to completely different representations of XML instances. The biggest concern here is that XML instances produced under one style cannot be validated against an XML Schema that uses a different style of element declaration.

For example, the Universal Business Language [UBL] and UN/CEFACT cannot align their approaches for designing business documents because they use different design patterns. This means that, even though they use the same CCTS approach with similar or identical semantics, the XML instances will always be different.
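One visible way in which instances can differ is namespace qualification. The sketch below assumes a schema whose local element declarations are left unqualified (the XML Schema default for elementFormDefault), whereas globally declared elements always belong to the target namespace. The namespace urn:example:order and the element names are hypothetical, and this is only one of the differences that the design patterns can produce.

```python
import xml.etree.ElementTree as ET

# Same business content, written for two hypothetical schema designs.
# Global declaration of ID: the child element is namespace-qualified.
GLOBAL_STYLE = '<o:Order xmlns:o="urn:example:order"><o:ID>42</o:ID></o:Order>'
# Local, unqualified declaration of ID: the child element is in no namespace.
LOCAL_STYLE = '<o:Order xmlns:o="urn:example:order"><ID>42</ID></o:Order>'


def child_names(instance_text):
    """Return the expanded names of the root element's children."""
    return [child.tag for child in ET.fromstring(instance_text)]


print(child_names(GLOBAL_STYLE))
print(child_names(LOCAL_STYLE))
```

Because the expanded element names differ, an instance written for one design is not valid against the other, even though both carry the same value.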

Implementation Perspective

In addition to the usability category discussed above, we find the following features of XML Schema harder to implement and support:

Language Binding Perspective

XML Schema Constructs and Programming Language Constructs

SAP’s programming environment supports two distinctly different languages, ABAP and Java. Naturally, the interfaces and proxies to existing applications, especially in business integration and Web Services applications, require data binding between XML Schema and both of these languages. This includes binding XML Schema constructs to programming language concepts and constructs, and serialization / deserialization of XML content.

We have observed that a one-to-one binding between XML Schema constructs and their equivalents in the programming languages SAP uses is hard or impossible to implement. Certain features, such as extensions for complex types and recursive structures, have been problematic for binding purposes; hence, their support has in general been limited.

In addition to the features listed above that are recognized to be hard to implement, we find that extensibility in general is hard to support with proxies to our supported languages, including xs:any and filtering content by using the processContents attribute.

Data binding problems cause serialization and deserialization to be non-uniform. Namely, an XML Schema generated automatically from a programming language definition can be inconsistent with the programming language definitions generated automatically from that XML Schema. This leads both to round-tripping problems when going from programming language to XML and back to programming language, and to non-uniform schema support between languages.
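The round-tripping problem can be sketched with two small mapping tables. These tables are hypothetical and far simpler than the rules of any real binding tool; they only illustrate that when two language types map to the same XML Schema type, the reverse mapping cannot recover both.

```python
# Hypothetical binding tables; real binding tools apply their own rules.
LANGUAGE_TO_XSD = {
    "char": "xs:string",    # no single-character type is assumed on the XSD side
    "String": "xs:string",
    "short": "xs:short",
    "int": "xs:int",
}
XSD_TO_LANGUAGE = {
    "xs:string": "String",
    "xs:short": "short",
    "xs:int": "int",
}


def round_trip(language_type):
    """Map a language type to XML Schema and back again."""
    return XSD_TO_LANGUAGE[LANGUAGE_TO_XSD[language_type]]


def lossy_types():
    """Language types that do not survive the round trip unchanged."""
    return sorted(t for t in LANGUAGE_TO_XSD if round_trip(t) != t)


print(lossy_types())
```

With a second language and a second pair of tables, the same schema type could also map to different constructs in each language, which is the non-uniformity between languages mentioned above.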

Conclusion and Recommendations

In a heterogeneous environment, usability, implementation and language binding issues impact the choice of features that are supported. This led SAP to have different levels of support for XML Schema features due to the impedance mismatch between the language constructs and XML Schema constructs or the frequency of use of certain XML Schema constructs.

Schema Profiles

XML Schema is used to define many different types of documents, from 50 MB+ business documents to small messages sent to control a printer. These different environments often result in the use of different subsets of Schema features. In addition, some groups, such as UN/CEFACT, have developed “profiles” consisting of the Schema features and specific design patterns that meet their own individual needs.

It is unrealistic to expect that these diverse groups will ever align the “profiles” that describe how they use Schema. Schema profiles exist now and will continue to do so.

However, the lack of any formal way of identifying and defining the profile of XML Schema that is being used means that it is hard to build tools that help authors write schemas conforming to a profile, or that check whether an XML Schema follows a profile correctly.

A formal way of defining profiles that restrict the usage of XML Schema would improve usability, especially for language bindings as well as for business vocabularies. To maximize the benefit, the Schema profiling method would require the ability to define, as well as identify the use of, profiles that specify the limited set of XML Schema features to be used.

If XML Schema provided a native profiling mechanism, profiles that target language restrictions or application restrictions could be generated for specific user communities and language bindings.

SAP favors this direction, as a catalog of profiles and their constraints could then be published and used as a basis for interoperability when designing a certain class of language bindings and applications as well as business vocabularies. Further, it would help reduce interoperability issues for features of XML Schema that are hard to implement and understand. We envision that profiles will be created by communities or application vendors.

Schema Versioning

We also encourage the XML Schema working group to tackle the versioning problem so that it is solved uniformly.

Extensions

Multidimensional extensions of XML Schema must be possible based on a context driver principle as described in CCTS. The current one-dimensional extension mechanism is helpful for closed communities where everyone knows each other, but it leads to significant interoperability problems in a broader environment when multiple extensions in different contexts from different sources need to be combined in a single Schema.

Acknowledgments

The authors thank Chavdar Baikov, Vladislav Bezrukov, and Uwe Schlarb for providing valuable input to this paper.

References

[CCTS]
UN/CEFACT Core Components Technical Specification – see http://www.untmg.org/artifacts/CCTS_v2.01_2003-11-15.pdf
[NDR]
UN/CEFACT Naming and Design Rules – download from http://www.disa.org/cefact-groups/atg/downloads/index.cfm
[Costello]
Roger L. Costello, Global versus Local (A Collectively Developed Set of Schema Design Guidelines) – see http://www.xfront.com/GlobalVersusLocal.html
[UBL]
OASIS Universal Business Language Technical Committee – see http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl