Simple types for XML, or how to untie one's specification from an unneeded XML 1.0 dependency

Draft 19 August 2005

This version:
http://www.w3.org/2005/08/17-xml-simp-types
Previous version:
http://www.w3.org/2005/08/xml-simp-types (W3C Team restricted)
Editor:
Hugo Haas, W3C <hugo@w3.org>

Abstract

This document defines a set of simple types to describe abstract content, e.g. in an XML Information Set [XML Information Set] or in an abstract model (e.g. WSDL 2.0's component model [WSDL 2.0 Core Language]).

The types are defined in order to be largely independent of the version of XML used when serializing the abstract content as an XML document.

Status of this Document

This document is an editors' copy that has no official standing.

This document is a draft intended to be reviewed by the Web Services Description Working Group for possible publication as a Working Group Note and has no formal status.

The author feels that it contains useful information that is worth publishing for the benefit of others.

Table of Contents

1 Introduction
    1.1 Notational Conventions
2 Background
3 Definition of the Simple Types
    3.1 string Type
    3.2 Token Type
    3.3 NCName Type
    3.4 anyURI Type
    3.5 QName Type
    3.6 boolean Type
    3.7 int Type
    3.8 unsignedLong Type
    3.9 anyType Type
4 Serialization as various versions of XML
    4.1 Serialization as XML 1.0 and Relationship with XML Schema 1.0 Datatypes
    4.2 Example of the serialization as different versions of XML of a simple type: stype:NCName
        4.2.1 XML 1.0 serialization of an stype:NCName
        4.2.2 XML 1.1 serialization of an stype:NCName
5 Using the simple types
6 Interoperability considerations
7 References
    7.1 Normative References
    7.2 Informative References
8 Acknowledgements


1 Introduction

This document defines a set of simple types commonly used in Web services specifications. They are defined independently of any version of XML.

This document is an example of how to allow specifications to be abstracted from a particular version of XML, in particular XML 1.0. Other types MAY be added to this document depending on the feedback received.

1.1 Notational Conventions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 [IETF RFC 2119].

This specification uses a number of namespace prefixes throughout, listed in the table below. Note that the choice of any namespace prefix is arbitrary and not semantically significant (see [XML Information Set]).

Prefixes and Namespaces used in this specification
Prefix | Namespace | Notes
stype | "http://www.w3.org/2005/08/17-xml-simp-types" | Defined in this document.
xs | "http://www.w3.org/2001/XMLSchema" | Defined in the W3C XML Schema specification [XML Schema: Structures], [XML Schema: Datatypes].

2 Background

The use of XML Schema 1.0 datatypes [XML Schema: Datatypes] to define properties in a specification mandates an XML 1.0 [XML 1.0] serialization and prevents an XML 1.1 [XML 1.1] serialization, because the definitions of datatypes in [XML Schema: Datatypes] depend on XML 1.0 productions [XML 1.0].

This unfortunate side effect of XML Schema datatypes unnecessarily prevents certain specifications from being compatible with XML 1.1, and probably with other versions of XML that the community may come up with in the future.

A previous Working Draft of WSDL 2.0 defined simple types independent of any particular version of XML to free itself from an unnecessary dependency on XML 1.0, making the XML Schema for WSDL 2.0, defined with [XML Schema: Structures], normative only for XML 1.0 serializations.

However, the Working Group later decided that this additional layer of abstraction was too complex and went back to defining its properties with XML Schema datatypes.

This document captures the method used in that 2004-08-03 Working Draft of WSDL 2.0 and explains the objectives it was trying to reach, as this technique for writing specifications independent of a particular version of XML is believed to have merit.

3 Definition of the Simple Types

This specification provides its own definitions of a set of simple types, patterned after [XML Schema: Datatypes] but independent of it. This allows processors to accept content serialized using a mechanism that is not compatible with [XML Schema: Datatypes], such as XML 1.1 [XML 1.1].

All types defined in this section are formally assigned to the "http://www.w3.org/2005/08/17-xml-simp-types" namespace. All references to them in this specification are made via qualified names that use the stype prefix. It should be noted though that there is no schema (in the sense of [XML Schema: Structures]) for that namespace, because the types defined here go beyond the capabilities of XML Schema to describe.

All types defined in this section are such that their value spaces are supersets of the value spaces of the types with the same names defined by XML Schema [XML Schema: Datatypes]. In particular, the value space of the stype:string type is a strict superset of the value space of xs:string: the one-character string consisting solely of the #x0 character belongs to the former but not to the latter, since #x0 is not an XML 1.0 character.
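The strict-superset claim can be illustrated with a short Python sketch. The function names are hypothetical, and the xs:string check assumes that its value space is bounded by the Char production of XML 1.0 (Third Edition), as [XML Schema: Datatypes] specifies.

```python
# Hypothetical helpers contrasting stype:string with xs:string.

def is_stype_string(s: str) -> bool:
    """Every character lies in #x0-#x10FFFF (always true for a Python str)."""
    return all(0x0 <= ord(c) <= 0x10FFFF for c in s)


def is_xml10_char(cp: int) -> bool:
    """Assumed Char production of XML 1.0 (Third Edition)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xs_string(s: str) -> bool:
    """A string is in the xs:string value space if every character matches Char."""
    return all(is_xml10_char(ord(c)) for c in s)


nul = "\x00"  # the one-character string consisting solely of #x0
print(is_stype_string(nul))  # True
print(is_xs_string(nul))     # False
```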

Note:

The small list of types provided here is believed to cover the needs of WSDL 2.0 [WSDL 2.0 Core Language] and WS-Addressing 1.0 [WS-Addressing 1.0 - Core], [WS-Addressing 1.0 - SOAP Binding]. Other simple types may be defined.

3.1 string Type

The value space of the stype:string type consists of finite-length sequences of characters in the range #x0-#x10FFFF inclusive, where a character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646] and Unicode [Unicode].

3.2 Token Type

The value space of the stype:Token type is the subset of the value space of the stype:string type consisting of strings that do not contain the line feed (#xA) or tab (#x9) characters, that have no leading or trailing spaces (#x20), and that have no internal sequences of two or more spaces.
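As a sketch, the stype:Token constraints translate into a simple predicate. The function name is hypothetical. The sketch follows the definition above literally, so a carriage return (#xD) is accepted, whereas xs:token in [XML Schema: Datatypes] excludes it.

```python
def is_stype_token(s: str) -> bool:
    """Hypothetical check of the stype:Token constraints on a string."""
    # No line feed (#xA) or tab (#x9) characters.
    if "\n" in s or "\t" in s:
        return False
    # No leading or trailing space (#x20).
    if s.startswith(" ") or s.endswith(" "):
        return False
    # No internal sequence of two or more spaces.
    return "  " not in s


print(is_stype_token("hello world"))   # True
print(is_stype_token("hello  world"))  # False
```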

3.3 NCName Type

The value space of the stype:NCName type is the subset of the value space of the stype:Token type consisting of tokens that contain neither the space (#x20) nor the colon (':') character.
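A sketch, again with hypothetical function names, layering the stype:NCName restriction on an stype:Token check. The definition places no constraint on the first character, so a string such as 1abc is an stype:NCName even though it does not match the NCName production of [Namespaces in XML]; this is consistent with the superset relationship stated earlier in this section.

```python
def is_stype_token(s: str) -> bool:
    """Hypothetical stype:Token check: no #xA or #x9, trimmed, single inner spaces."""
    return (
        "\n" not in s
        and "\t" not in s
        and not s.startswith(" ")
        and not s.endswith(" ")
        and "  " not in s
    )


def is_stype_ncname(s: str) -> bool:
    """An stype:NCName is an stype:Token with no space and no colon."""
    return is_stype_token(s) and " " not in s and ":" not in s


print(is_stype_ncname("1abc"))  # True
print(is_stype_ncname("a:b"))   # False
```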

3.4 anyURI Type

The value space of the stype:anyURI type consists of all Internationalized Resource Identifiers (IRIs) as defined by [IETF RFC 3987].

3.5 QName Type

The value space of the stype:QName type consists of the set of 2-tuples whose first component is of type stype:anyURI and whose second component is of type stype:NCName.
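One way to model this value space in code is an immutable pair. The sketch below uses Python's typing.NamedTuple; the class and helper names are hypothetical, and the IRI component is not validated against [IETF RFC 3987]. Because the value is the pair itself, two QNames written with different prefixes in a serialization compare equal once their prefixes resolve to the same namespace.

```python
from typing import NamedTuple


class QName(NamedTuple):
    """Hypothetical model of an stype:QName value: (stype:anyURI, stype:NCName)."""
    namespace: str   # stype:anyURI component; IRI syntax is NOT checked here
    local_name: str  # stype:NCName component


def is_stype_ncname(s: str) -> bool:
    """stype:NCName: no line feed, tab, space, or colon characters."""
    return not any(c in s for c in ("\n", "\t", " ", ":"))


def make_qname(namespace: str, local_name: str) -> QName:
    """Build a QName, rejecting a local name outside the stype:NCName value space."""
    if not is_stype_ncname(local_name):
        raise ValueError(f"not an stype:NCName: {local_name!r}")
    return QName(namespace, local_name)


# Prefixes are lexical only; the resolved pair is the value.
q1 = make_qname("http://example.org/ns", "item")
q2 = QName("http://example.org/ns", "item")
print(q1 == q2)  # True
```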

3.6 boolean Type

The value space of the stype:boolean type consists of the two distinct values true and false.

An instance of a datatype that is defined as stype:boolean has the following legal literals: {true, false, 1, 0}.
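A minimal parsing sketch for these literals. The mapping of 1 to true and 0 to false is an assumption borrowed from xs:boolean in [XML Schema: Datatypes]; no whitespace normalization is applied, and the function name is hypothetical.

```python
# Assumed literal-to-value mapping, following xs:boolean.
_BOOLEAN_LITERALS = {"true": True, "1": True, "false": False, "0": False}


def parse_stype_boolean(literal: str) -> bool:
    """Map one of the legal literals {true, false, 1, 0} to a boolean value."""
    try:
        return _BOOLEAN_LITERALS[literal]
    except KeyError:
        raise ValueError(f"not a legal stype:boolean literal: {literal!r}") from None


print(parse_stype_boolean("1"))      # True
print(parse_stype_boolean("false"))  # False
```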

3.7 int Type

The value space of the stype:int type consists of the infinite set {…,-2,-1,0,1,2,…} representing the standard mathematical concept of the integer numbers.

An instance of a datatype that is defined as int has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) with an optional leading sign ("-" or "+"). If the sign is omitted, "+" is assumed.
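The lexical rule above is easy to express as a regular expression. This Python sketch (function name hypothetical) also shows that, unlike xs:int, the stype:int value space has no upper or lower bound, so a literal far outside the 32-bit range still parses.

```python
import re

# Optional sign followed by one or more ASCII decimal digits (#x30-#x39).
# [0-9] is used rather than \d, which would also match non-ASCII digits.
_INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def parse_stype_int(literal: str) -> int:
    """Hypothetical parser for the stype:int lexical representation."""
    if _INT_LEXICAL.fullmatch(literal) is None:
        raise ValueError(f"not an stype:int literal: {literal!r}")
    # Python ints are arbitrary-precision (CPython 3.11+ does cap the
    # length of int/str conversions by default).
    return int(literal)


print(parse_stype_int("+42"))   # 42
print(parse_stype_int("-007"))  # -7
```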

3.8 unsignedLong Type

The value space of the stype:unsignedLong type consists of the set {0,1,2,…,18446744073709551615} of integer numbers.

unsignedLong has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39).
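A corresponding sketch for stype:unsignedLong, with a hypothetical function name: the lexical check rejects any sign, and the upper bound 18446744073709551615 (that is, 2^64 - 1) is enforced after conversion.

```python
import re

_DIGITS = re.compile(r"[0-9]+")  # ASCII decimal digits only; no sign allowed
UNSIGNED_LONG_MAX = 18446744073709551615  # 2**64 - 1


def parse_stype_unsigned_long(literal: str) -> int:
    """Hypothetical parser for the stype:unsignedLong lexical representation."""
    if _DIGITS.fullmatch(literal) is None:
        raise ValueError(f"not an stype:unsignedLong literal: {literal!r}")
    value = int(literal)
    if value > UNSIGNED_LONG_MAX:
        raise ValueError(f"out of range for stype:unsignedLong: {literal}")
    return value
```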

3.9 anyType Type

The value space of the stype:anyType type consists of any combination of element, processing instruction, unexpanded entity reference, character, and comment information items as defined by [XML Information Set].

4 Serialization as various versions of XML

When serializing content as a particular version of XML, such as XML 1.0 [XML 1.0] or XML 1.1 [XML 1.1], the set of characters allowed by the simple types defined in section 3 Definition of the Simple Types is restricted to the characters allowed by that version of XML.
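The restriction can be sketched as a per-version character check. The ranges below are assumptions transcribed from the Char productions of XML 1.0 (Third Edition) and XML 1.1, and the function names are hypothetical. XML 1.1 additionally requires its restricted characters (such as #x1-#x8) to appear as character references, which this sketch does not model.

```python
def is_xml_char(cp: int, version: str) -> bool:
    """Return True if code point cp matches the Char production of `version`."""
    if version == "1.0":
        return (
            cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )
    if version == "1.1":
        return (
            0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )
    raise ValueError(f"unsupported XML version: {version!r}")


def serializable_as(value: str, version: str) -> bool:
    """True if every character of an stype:string can appear in that XML version."""
    return all(is_xml_char(ord(c), version) for c in value)


s = "a\x01b"  # contains #x1
print(serializable_as(s, "1.0"))  # False
print(serializable_as(s, "1.1"))  # True
```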

4.1 Serialization as XML 1.0 and Relationship with XML Schema 1.0 Datatypes

When serializing content as XML 1.0 [XML 1.0], the simple types defined in section 3 Definition of the Simple Types map naturally to well-known datatypes defined in [XML Schema: Datatypes], which impose additional constraints on the serialized content:

Simple type | Corresponding schema type for an XML 1.0 serialization
stype:string | xs:string
stype:Token | xs:token
stype:NCName | xs:NCName
stype:anyURI | xs:anyURI
stype:QName | xs:QName
stype:boolean | xs:boolean
stype:int | xs:int
stype:unsignedLong | xs:unsignedLong
stype:anyType | xs:anyType
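The table can be captured as a lookup, as sketched below with hypothetical names. The sketch also illustrates one of the additional constraints: stype:int is unbounded, whereas [XML Schema: Datatypes] limits xs:int to the range -2147483648 to 2147483647, so an stype:int value outside that range cannot be serialized as a valid xs:int. (XML Schema spells the token type xs:token.)

```python
# Correspondence between stype types and XML Schema 1.0 datatypes
# for XML 1.0 serializations.
STYPE_TO_XS = {
    "stype:string": "xs:string",
    "stype:Token": "xs:token",
    "stype:NCName": "xs:NCName",
    "stype:anyURI": "xs:anyURI",
    "stype:QName": "xs:QName",
    "stype:boolean": "xs:boolean",
    "stype:int": "xs:int",
    "stype:unsignedLong": "xs:unsignedLong",
    "stype:anyType": "xs:anyType",
}

# xs:int is a 32-bit signed integer type.
XS_INT_MIN, XS_INT_MAX = -2**31, 2**31 - 1


def fits_xs_int(value: int) -> bool:
    """True if an stype:int value also lies in the xs:int value space."""
    return XS_INT_MIN <= value <= XS_INT_MAX


print(STYPE_TO_XS["stype:int"])  # xs:int
print(fits_xs_int(2**31))        # False
```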

4.2 Example of the serialization as different versions of XML of a simple type: stype:NCName

As an example, consider the conditions under which an stype:NCName may be serialized as XML 1.0 and as XML 1.1.

4.2.1 XML 1.0 serialization of an stype:NCName

An stype:NCName MAY be serialized in an XML 1.0 document if it is only composed of the characters allowed by XML 1.0, i.e. matching the NCName production from the Namespaces in XML specification [Namespaces in XML].

4.2.2 XML 1.1 serialization of an stype:NCName

An stype:NCName MAY be serialized in an XML 1.1 document if it is only composed of the characters allowed by XML 1.1, i.e. matching the NCName production from the Namespaces in XML 1.1 specification [Namespaces in XML 1.1].
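For the XML 1.1 case, the test can be sketched directly from the character ranges of the NameStartChar and NameChar productions of [XML 1.1], with ':' removed as in [Namespaces in XML 1.1]. The ranges are transcribed by hand and the function name is hypothetical. An equivalent XML 1.0 check would need the much longer Letter, Digit, CombiningChar, and Extender tables of XML 1.0 (Third Edition) Appendix B, and is not reproduced here.

```python
# NameStartChar ranges from XML 1.1, minus ':' (Namespaces in XML 1.1).
_START = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),  # A-Z, '_', a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional NameChar ranges: '-', '.', 0-9, #xB7, #x300-#x36F, #x203F-#x2040.
_EXTRA = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_xml11_ncname(s: str) -> bool:
    """Hypothetical check against the NCName production of Namespaces in XML 1.1."""
    if not s or not _in(ord(s[0]), _START):
        return False
    return all(_in(ord(c), _START) or _in(ord(c), _EXTRA) for c in s[1:])


print(is_xml11_ncname("foo-bar.baz"))  # True
print(is_xml11_ncname("1abc"))         # False
```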

5 Using the simple types

Typically, a specification with a dependency on XML 1.0 [XML 1.0] will have defined its content using types from XML Schema 1.0 Part 2 [XML Schema: Datatypes], and provided a normative XML 1.0 schema [XML Schema: Structures].

In order to allow independence from the XML version, types defined by this specification SHOULD be used instead. Any XML 1.0 schema provided SHOULD be declared normative for XML 1.0 serializations only.

Note:

Because this document has not gone through the W3C Recommendation Track Process and therefore has not received wide review, a normative reference to it is difficult.

6 Interoperability considerations

Conformance to a specification defined independently of any version of XML does NOT require accepting documents that use every existing version of XML, unless specifically called out.

Conformance is assessed for processing documents that use the version(s) of XML supported by the implementation.

7 References

7.1 Normative References

Character Model for the WWW
Character Model for the World Wide Web 1.0: Fundamentals, M. Dürst, F. Yergeau, R. Ishida, M. Wolf, T. Texin, Editors. W3C Recommendation, 15 February 2005. Latest version available at http://www.w3.org/TR/charmod/. (See http://www.w3.org/TR/2005/REC-charmod-20050215/.)
IETF RFC 2119
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, Author. Internet Engineering Task Force, June 1999. Available at http://www.ietf.org/rfc/rfc2119.txt. (See http://www.ietf.org/rfc/rfc2119.txt.)
IETF RFC 3987
Internationalized Resource Identifiers (IRIs), M. Duerst, M. Suignard, Authors. Internet Engineering Task Force, January 2005. Available at http://www.ietf.org/rfc/rfc3987.txt. (See http://www.ietf.org/rfc/rfc3987.txt.)
ISO/IEC 10646
ISO/IEC 10646-1:2003. Information technology -- Universal Multiple-Octet Coded Character Set (UCS), as, from time to time, amended, replaced by a new edition or expanded by the addition of new parts. See http://www.iso.org/iso/en/ISOOnline.openerpage for the latest version. (See http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=39921.)
Unicode
The Unicode Standard, Version 4.1, The Unicode Consortium, ISBN 0-321-18578-1, as it may from time to time be revised or amended. See http://www.unicode.org/unicode/standard/versions for the latest version and additional information on versions of the standard and of the Unicode Character Database. (See http://www.unicode.org/versions/Unicode4.1.0/.)
XML 1.0
Extensible Markup Language (XML) 1.0 (Third Edition), T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau, Editors. World Wide Web Consortium, 4 February 2004. This version of the XML 1.0 Recommendation is http://www.w3.org/TR/2004/REC-xml-20040204/. The latest version of "Extensible Markup Language (XML) 1.0" is available at http://www.w3.org/TR/REC-xml. (See http://www.w3.org/TR/2004/REC-xml-20040204/.)
XML 1.1
Extensible Markup Language (XML) 1.1, T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau, and J. Cowan, Editors. World Wide Web Consortium, 4 February 2004, edited in place 15 April 2004. This version of the XML 1.1 Recommendation is http://www.w3.org/TR/2004/REC-xml11-20040204. The latest version of XML 1.1 is available at http://www.w3.org/TR/xml11. (See http://www.w3.org/TR/2004/REC-xml11-20040204/.)
XML Information Set
XML Information Set (Second Edition), J. Cowan and R. Tobin, Editors. World Wide Web Consortium, 4 February 2004. This version of the XML Information Set Recommendation is http://www.w3.org/TR/2004/REC-xml-infoset-20040204. The latest version of XML Information Set is available at http://www.w3.org/TR/xml-infoset. (See http://www.w3.org/TR/2004/REC-xml-infoset-20040204.)
XML Schema: Structures
XML Schema Part 1: Structures, H. Thompson, D. Beech, M. Maloney, and N. Mendelsohn, Editors. World Wide Web Consortium, 28 October 2004. This version of the XML Schema Part 1 Recommendation is http://www.w3.org/TR/2004/REC-xmlschema-1-20041028. The latest version of XML Schema Part 1 is available at http://www.w3.org/TR/xmlschema-1. (See http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/.)
XML Schema: Datatypes
XML Schema Part 2: Datatypes, P. Biron and A. Malhotra, Editors. World Wide Web Consortium, 28 October 2004. This version of the XML Schema Part 2 Recommendation is http://www.w3.org/TR/2004/REC-xmlschema-2-20041028. The latest version of XML Schema Part 2 is available at http://www.w3.org/TR/xmlschema-2. (See http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/.)

7.2 Informative References

WSDL 2.0 Core Language
Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language, R. Chinnici, M. Gudgin, J-J. Moreau, S. Weerawarana, Editors. World Wide Web Consortium, 3 August 2005. This version of the "Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language" specification is available at http://www.w3.org/TR/2005/WD-wsdl20-20050803. The latest version of "Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language" is available at http://www.w3.org/TR/wsdl20. (See http://www.w3.org/TR/2005/WD-wsdl20-20050803.)
WS-Addressing 1.0 - Core
Web Services Addressing 1.0 - Core, M. Gudgin, M. Hadley, Editors. World Wide Web Consortium, 17 August 2005. This version of the "Web Services Addressing 1.0 - Core" Specification is available at http://www.w3.org/TR/2005/WD-wsa-core-20050817. The latest version of "Web Services Addressing 1.0 - Core" is available at http://www.w3.org/TR/wsa-core. (See http://www.w3.org/TR/2005/WD-wsa-core-20050817.)
WS-Addressing 1.0 - SOAP Binding
Web Services Addressing 1.0 - SOAP Binding, M. Gudgin, M. Hadley, Editors. World Wide Web Consortium, 17 August 2005. This version of the "Web Services Addressing 1.0 - SOAP Binding" Specification is available at http://www.w3.org/TR/2005/WD-wsa-soap-20050817. The latest version of "Web Services Addressing 1.0 - SOAP Binding" is available at http://www.w3.org/TR/wsa-soap. (See http://www.w3.org/TR/2005/WD-wsa-soap-20050817.)
Namespaces in XML
Namespaces in XML, T. Bray, D. Hollander, and A. Layman, Editors. World Wide Web Consortium, 14 January 1999. This version of the Namespaces in XML Recommendation is http://www.w3.org/TR/1999/REC-xml-names-19990114. The latest version of Namespaces in XML is available at http://www.w3.org/TR/REC-xml-names. (See http://www.w3.org/TR/1999/REC-xml-names-19990114.)
Namespaces in XML 1.1
Namespaces in XML 1.1, T. Bray, D. Hollander, A. Layman, and R. Tobin, Editors. World Wide Web Consortium, 4 February 2004. This version of the Namespaces in XML 1.1 Recommendation is http://www.w3.org/TR/2004/REC-xml-names11-20040204. The latest version of Namespaces in XML 1.1 is available at http://www.w3.org/TR/xml-names11. (See http://www.w3.org/TR/2004/REC-xml-names11-20040204.)

8 Acknowledgements

The original idea for defining types independent of a version of XML was proposed by Jonathan Marsh.

The core content of this document is extracted from the 2004-08-03 Working Draft of WSDL 2.0 Part 1. The editors of that specification were:

Commenters on this part of the WSDL 2.0 specification are acknowledged, as well as Richard Ishida and Felix Sasaki for their feedback.