XML Schema 1.1 Part 2: Datatypes

Editor's Wording Proposal 19 September 2007

This version:
../datatypes/
Latest version:
http://www.w3.org/TR/xmlschema11-2/
Previous versions:
http://www.w3.org/TR/2006/WD-xmlschema11-2-20060217/ http://www.w3.org/TR/2006/WD-xmlschema11-2-20060116/ http://www.w3.org/TR/2005/WD-xmlschema11-2-20050224/ http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/
Editors:
David Peterson, invited expert (SGMLWorks!) <davep@iit.edu>
Paul V. Biron, Kaiser Permanente, for Health Level Seven <Paul.V.Biron@kp.org>
Ashok Malhotra, Oracle Corporation <ashokmalhotra@alum.mit.edu>

...

Status of this Document

This document is an editors' copy that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a member-only review version which will in due course become a Public Working Draft of XML Schema 1.1: Datatypes. It has no formal standing within W3C; it is here made available for review by W3C members and the public. This version of this document was created on 19 September 2007. It reflects (unless otherwise noted elsewhere) all decisions on this document made by the Working Group through 14 September 2007. The document thus incorporates all decisions made by the Working Group to date.

It also includes the following proposals for changes to the wording of the spec:
  • A wording proposal intended to resolve issue 3228 "Lists|Unions of Lists|Unions", by adding notes after the definition of ·list· and ·union· specifying what kinds of datatypes may appear as ·item type· or among the ·member types·.
  • A wording proposal intended to discharge an obligation we accepted when accepting the proposal for conditional type assignment and resolving issue 2861 "RQ-38 Add co-constraints (coconstraints)", by allowing unions with an empty sequence of member type definitions.

For those primarily interested in the changes since version 1.0, the Changes since version 1.0 (§H) appendix, which summarizes both changes already made and also those in prospect, with links to the relevant sections of this draft, is the recommended starting point. An accompanying version of this document displays in color all changes to normative text since version 1.0; another shows changes since the previous Working Draft.

Comments on this document should be made in W3C's public installation of Bugzilla, specifying "XML Schema" as the product. Instructions can be found at http://www.w3.org/XML/2006/01/public-bugzilla. If access to Bugzilla is not feasible, please send your comments to the W3C XML Schema comments mailing list, www-xml-schema-comments@w3.org (archive). Each Bugzilla entry and email message should contain only one comment.

The end of the Last Call review period is 8 November 2007; comments received after that date will be considered if time allows, but no guarantees can be offered.

Although feedback based on any aspect of this specification is welcome, there are certain aspects of the design presented herein for which the Working Group is particularly interested in feedback. These are designated 'priority feedback' aspects of the design, and identified as such in editorial notes at appropriate points in this draft.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced by the W3C XML Schema Working Group as part of the W3C XML Activity. The goals of the XML Schema language version 1.1 are discussed in the Requirements for XML Schema 1.1 document. The authors of this document are the members of the XML Schema Working Group. Different parts of this specification have different editors.

This document was produced under the 5 February 2004 W3C Patent Policy. The Working Group maintains a public list of patent disclosures made in connection with this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification must disclose the information in accordance with section 6 of the W3C Patent Policy.

The English version of this specification is the only normative version. Information about translations of this document is available at http://www.w3.org/2003/03/Translations/byTechnology?technology=xmlschema.

The presentation of this document has been augmented to identify changes from a previous version, controlled by dg-b3228.xml, which shows the status-quo text without adornment. Three kinds of changes are highlighted: new (added) text, changed text, and deleted text.


Table of Contents

1 Introduction
    1.1 Introduction to Version 1.1
    1.2 Purpose
    1.3 Dependencies on Other Specifications
    1.4 Requirements
    1.5 Scope
    1.6 Terminology
    1.7 Constraints and Contributions
2 Datatype System
    2.1 Datatype
    2.2 Value space
    2.3 The Lexical Space and Lexical Mapping
    2.4 Datatype Distinctions
3 Built-in Datatypes and Their Definitions
    3.1 Namespace considerations
    3.2 Special Built-in Datatypes
anySimpleType · anyAtomicType
    3.3 Primitive Datatypes
string · boolean · decimal · precisionDecimal · float · double · duration · dateTime · time · date · gYearMonth · gYear · gMonthDay · gDay · gMonth · hexBinary · base64Binary · anyURI · QName · NOTATION
    3.4 Other Built-in Datatypes
normalizedString · token · language · NMTOKEN · NMTOKENS · Name · NCName · ID · IDREF · IDREFS · ENTITY · ENTITIES · integer · nonPositiveInteger · negativeInteger · long · int · short · byte · nonNegativeInteger · unsignedLong · unsignedInt · unsignedShort · unsignedByte · positiveInteger · yearMonthDuration · dayTimeDuration
4 Datatype components
    4.1 Simple Type Definition
    4.2 Fundamental Facets
    4.3 Constraining Facets
5 Conformance
    5.1 Partial Implementation of Infinite Datatypes

Appendices

A Schema for Schema Documents (Datatypes) (normative)
B DTD for Datatype Definitions (non-normative)
C Illustrative XML representations for the built-in simple type definitions
    C.1 Illustrative XML representations for the built-in primitive type definitions
    C.2 Illustrative XML representations for the built-in ordinary type definitions
D Built-up Value Spaces
    D.1 Numerical Values
    D.2 Date/time Values
E Function Definitions
    E.1 Generic Number-related Functions
    E.2 Duration-related Definitions
    E.3 Date/time-related Definitions
    E.4 Lexical and Canonical Mappings for Other Datatypes
F Datatypes and Facets
    F.1 Fundamental Facets
G Regular Expressions
    G.1 Character Classes
H Changes since version 1.0
    H.1 Datatypes and Facets
    H.2 Numerical Datatypes
    H.3 Date/time Datatypes
    H.4 Other changes
I Glossary (non-normative)
J References
    J.1 Normative
    J.2 Non-normative
K Acknowledgements (non-normative)

...

2 Datatype System

This section describes the conceptual framework behind the datatype system defined in this specification.  The framework has been influenced by the [ISO 11404] standard on language-independent datatypes as well as the datatypes for [SQL] and for programming languages such as Java.

The datatypes discussed in this specification are for the most part well-known abstract concepts such as integer and date. It is not the place of this specification to thoroughly define these abstract concepts; many other publications provide excellent definitions. However, this specification will attempt to describe the abstract concepts well enough that they can be readily recognized and distinguished from other abstractions with which they may be confused.

...
...
...
...

2.4 Datatype Distinctions

It is useful to categorize the datatypes defined in this specification along various dimensions, defining terms which can be used to characterize datatypes and the Simple Type Definitions which define them.

2.4.1 Atomic vs. List vs. Union Datatypes

First, we distinguish ·atomic·, ·list·, and ·union· datatypes.

For example, a single token which ·matches· Nmtoken from [XML] is in the value space of the ·atomic· datatype NMTOKEN, while a sequence of such tokens is in the value space of the ·list· datatype NMTOKENS.
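The distinction can be illustrated with a minimal Python sketch. It assumes a simplified, ASCII-only approximation of the Nmtoken production (the production in [XML] admits many more name characters), and the function names parse_nmtoken and parse_nmtokens are hypothetical. A ·list· literal is split on whitespace and each item is mapped by the ·item type·.

```python
import re
from typing import List

# Simplified, ASCII-only approximation of the Nmtoken production in [XML];
# an assumption for illustration, not the full Unicode NameChar set.
NMTOKEN_RE = re.compile(r"[A-Za-z0-9._:-]+")


def parse_nmtoken(lexical: str) -> str:
    """Map one literal onto a value of the atomic datatype NMTOKEN."""
    if not NMTOKEN_RE.fullmatch(lexical):
        raise ValueError(f"not an NMTOKEN: {lexical!r}")
    return lexical


def parse_nmtokens(lexical: str) -> List[str]:
    """Map a literal onto a value of the list datatype NMTOKENS: the
    whitespace-separated items are each mapped by the item type."""
    items = lexical.split()
    if not items:
        # NMTOKENS is derived with minLength 1, so an empty list is invalid.
        raise ValueError("NMTOKENS requires at least one item")
    return [parse_nmtoken(item) for item in items]


print(parse_nmtoken("en-US"))             # a single atomic value
print(parse_nmtokens("en-US  fr\tde.1"))  # a sequence of three values
```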

2.4.1.3 Union datatypes

Union types may be defined in either of two ways. When a union type is ·constructed· by ·union·, its ·value space·, ·lexical space·, and ·lexical mapping· are the "ordered unions" of the ·value spaces·, ·lexical spaces·, and ·lexical mappings· of its ·member types·. When a union type is defined by ·restricting· another ·union·, its ·value space·, ·lexical space·, and ·lexical mapping· are subsets of the ·value spaces·, ·lexical spaces·, and ·lexical mappings· of its ·base type·. ·Union· datatypes are always ·constructed· from other datatypes; they are never ·primitive·. Currently, there are no ·built-in· ·union· datatypes.

Example
A prototypical example of a ·union· type is the maxOccurs attribute on the element element in XML Schema itself: it is a union of nonNegativeInteger and an enumeration with the single member, the string "unbounded", as shown below.
  <attributeGroup name="occurs">
    <attribute name="minOccurs" type="nonNegativeInteger"
        use="optional" default="1"/>
    <attribute name="maxOccurs" use="optional" default="1">
      <simpleType>
        <union>
          <simpleType>
            <restriction base='nonNegativeInteger'/>
          </simpleType>
          <simpleType>
            <restriction base='string'>
              <enumeration value='unbounded'/>
            </restriction>
          </simpleType>
        </union>
      </simpleType>
    </attribute>
  </attributeGroup>
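The behavior of this union can be sketched in Python: a literal is tried first against the nonNegativeInteger member and, failing that, against the enumeration member whose only value is the string "unbounded". The function name parse_max_occurs is hypothetical, and the integer check is a simplified reading of the nonNegativeInteger lexical space.

```python
import re
from typing import Union

XML_WHITESPACE = " \t\r\n"


def parse_max_occurs(lexical: str) -> Union[int, str]:
    """Sketch of the maxOccurs union: member types are tried in order."""
    # Member 1: nonNegativeInteger (whiteSpace collapse, optional sign,
    # decimal digits, value >= 0; "-0" denotes zero and is accepted).
    s = lexical.strip(XML_WHITESPACE)
    if re.fullmatch(r"[+-]?[0-9]+", s) and int(s) >= 0:
        return int(s)
    # Member 2: string restricted to the single enumerated value.
    if lexical == "unbounded":
        return "unbounded"
    raise ValueError(f"invalid maxOccurs value: {lexical!r}")
```

Because the two member value spaces are disjoint here, the order of the members does not change which literals are accepted; it only fixes which member is consulted first.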

Any number (zero or more) of ordinary or ·primitive· ·datatypes· can participate in a ·union· type.

[Definition:]   The datatypes that participate in the definition of a ·union· datatype are known as the member types of that ·union· datatype.

[Definition:]  The transitive membership of a ·union· is the set of its own ·member types·, and the ·member types· of its members, and so on. More formally, if U is a ·union·, then (a) its ·member types· are in the transitive membership of U, and (b) for any datatypes T1 and T2, if T1 is in the transitive membership of U and T2 is one of the ·member types· of T1, then T2 is also in the transitive membership of U.

[Definition:]  Those members of the ·transitive membership· of a ·union· datatype U which are themselves not ·union· datatypes are the basic members of U.
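The two definitions above can be computed directly. The following Python sketch assumes a hypothetical, minimal model of a simple type (a name, a variety, and, for unions, the ordered ·member types·). It collects the ·transitive membership· by following clauses (a) and (b) and then filters out the unions to obtain the ·basic members·.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)  # eq=False keeps identity-based equality and hashing
class SimpleType:
    """Hypothetical minimal model: a name, a variety, and, for unions,
    the ordered member types."""
    name: str
    variety: str = "atomic"  # "atomic", "list", or "union"
    members: List["SimpleType"] = field(default_factory=list)


def transitive_membership(u: SimpleType) -> Set[SimpleType]:
    """Clause (a): u's member types; clause (b): members of members, etc."""
    result: Set[SimpleType] = set()
    pending = list(u.members)
    while pending:
        t = pending.pop()
        if t not in result:
            result.add(t)
            pending.extend(t.members)  # empty for non-union types
    return result


def basic_members(u: SimpleType) -> Set[SimpleType]:
    """Members of the transitive membership that are not themselves unions."""
    return {t for t in transitive_membership(u) if t.variety != "union"}


# U = union(A, Inner) where Inner = union(B, C) and C is a list type
a, b, c = SimpleType("A"), SimpleType("B"), SimpleType("C", "list")
inner = SimpleType("Inner", "union", [b, c])
u = SimpleType("U", "union", [a, inner])
print(sorted(t.name for t in transitive_membership(u)))
print(sorted(t.name for t in basic_members(u)))
```

Note that an empty union has an empty ·transitive membership· and therefore no ·basic members·.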

[Definition:]  If a datatype M is in the ·transitive membership· of a ·union· datatype U, but not one of U's ·member types·, then a sequence of one or more ·union· datatypes necessarily exists, such that the first is one of the ·member types· of U, each is one of the ·member types· of its predecessor in the sequence, and M is one of the ·member types· of the last in the sequence. The ·union· datatypes in this sequence are said to intervene between M and U. When U and M are given by the context, the datatypes in the sequence are referred to as the intervening unions. When M is one of the ·member types· of U, the set of intervening unions is the empty set.
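A sequence of intervening unions can be found by a depth-first search, as in the Python sketch below. It reuses a hypothetical minimal model of simple types and assumes the union definitions are not circular. Where M can be reached along more than one path, the sketch simply returns the first path found in member order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class SimpleType:
    """Hypothetical minimal model of a simple type (see lead-in)."""
    name: str
    variety: str = "atomic"  # "atomic", "list", or "union"
    members: List["SimpleType"] = field(default_factory=list)


def intervening_unions(u: SimpleType, m: SimpleType) -> Optional[List[SimpleType]]:
    """Return one sequence of unions intervening between m and u:
    [] if m is a direct member type of u, None if m is not reachable."""
    if any(t is m for t in u.members):
        return []
    for child in u.members:
        if child.variety == "union":
            rest = intervening_unions(child, m)
            if rest is not None:
                # child is a member type of u; rest leads from child down to m.
                return [child] + rest
    return None


b = SimpleType("B")
inner2 = SimpleType("Inner2", "union", [b])
inner1 = SimpleType("Inner1", "union", [inner2])
u = SimpleType("U", "union", [SimpleType("A"), inner1])
print([t.name for t in intervening_unions(u, b)])
```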

[Definition:]  In a valid instance of any ·union·, the first of its members in order which accepts the instance as valid is the active member type. [Definition:]  If the ·active member type· is itself a ·union·, one of its members will be its ·active member type·, and so on, until finally a ·basic (non-union) member· is reached. That ·basic member· is the active basic member of the union.
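The ·active member type· and the ·active basic member· can be sketched in Python as follows. The model is hypothetical: each non-union type carries an accepts predicate over literals (str.isdigit and str.isalpha stand in for real member types), and each union carries its ordered members.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(eq=False)
class SimpleType:
    """Hypothetical model: a non-union type carries an `accepts` predicate
    over literals; a union carries its ordered `members` instead."""
    name: str
    accepts: Optional[Callable[[str], bool]] = None
    members: Optional[List["SimpleType"]] = None

    def is_union(self) -> bool:
        return self.members is not None

    def validates(self, literal: str) -> bool:
        if self.is_union():
            return active_member_type(self, literal) is not None
        return self.accepts(literal)


def active_member_type(u: SimpleType, literal: str) -> Optional[SimpleType]:
    """The first member type, in order, that accepts the literal."""
    for m in u.members:
        if m.validates(literal):
            return m
    return None  # no member accepts it (always the case for an empty union)


def active_basic_member(u: SimpleType, literal: str) -> Optional[SimpleType]:
    """Descend through active member types until a non-union is reached."""
    t = active_member_type(u, literal)
    while t is not None and t.is_union():
        t = active_member_type(t, literal)
    return t


digits = SimpleType("digits", accepts=str.isdigit)
letters = SimpleType("letters", accepts=str.isalpha)
inner = SimpleType("inner", members=[digits])
u = SimpleType("U", members=[inner, letters])
print(active_member_type(u, "42").name, active_basic_member(u, "42").name)
```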

The order in which the ·member types· are specified in the definition (that is, in the case of datatypes defined in a schema document, the order of the <simpleType> children of the <union> element, or the order of the QNames in the memberTypes attribute) is significant. During validation, an element or attribute's value is validated against the ·member types· in the order in which they appear in the definition until a match is found.  The evaluation order can be overridden with the use of xsi:type.

Example
For example, given the definition below, the first instance of the <size> element validates correctly as an integer (§3.4.13), and the second and third as a string (§3.3.1).
  <xsd:element name='size'>
    <xsd:simpleType>
      <xsd:union>
        <xsd:simpleType>
          <xsd:restriction base='integer'/>
        </xsd:simpleType>
        <xsd:simpleType>
          <xsd:restriction base='string'/>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>
  </xsd:element>
  <size>1</size>
  <size>large</size>
  <size xsi:type='xsd:string'>1</size>
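The ordered evaluation in this example can be mimicked with a small Python sketch. Member types are represented, hypothetically, as (name, lexical mapping) pairs. The xsi_type parameter is a simplification of xsi:type: it merely selects the named member instead of trying the members in order.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical representation of a member type: (name, lexical mapping).
# A mapping returns the value or raises ValueError if the literal is invalid.
Member = Tuple[str, Callable[[str], object]]


def integer_value(literal: str) -> int:
    s = literal.strip(" \t\r\n")  # integer has whiteSpace="collapse"
    if not re.fullmatch(r"[+-]?[0-9]+", s):
        raise ValueError(f"not an integer: {literal!r}")
    return int(s)


def string_value(literal: str) -> str:
    return literal  # every literal is a valid string


SIZE: List[Member] = [("integer", integer_value), ("string", string_value)]


def validate_union(literal: str, members: List[Member],
                   xsi_type: Optional[str] = None) -> Tuple[str, object]:
    """Try members in definition order; xsi:type, if given, names the member
    to use instead (a simplification of xsi:type processing)."""
    if xsi_type is not None:
        members = [m for m in members if m[0] == xsi_type]
    for name, mapping in members:
        try:
            return name, mapping(literal)
        except ValueError:
            continue
    raise ValueError(f"no member type accepts {literal!r}")


print(validate_union("1", SIZE))                     # matched by integer
print(validate_union("large", SIZE))                 # falls through to string
print(validate_union("1", SIZE, xsi_type="string"))  # order overridden
```

Swapping the two entries of SIZE would make every literal, including "1", validate as a string, which is why the order of the ·member types· is significant.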

The ·canonical mapping· of a ·union· datatype maps each value onto the ·canonical representation· of that value obtained using the ·canonical mapping· of the first ·member type· in whose value space it lies.
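The rule can be sketched in Python as a search over the ·member types· in order. The Member record and the member names digit, integer, and string are hypothetical, and Python's str() is used for integers because it coincides with the integer canonical form (no leading zeros, no plus sign).

```python
from typing import Callable, List, NamedTuple, Tuple


class Member(NamedTuple):
    """Hypothetical description of a member type for this sketch."""
    name: str
    in_value_space: Callable[[object], bool]
    canonical: Callable[[object], str]


def union_canonical(value: object, members: List[Member]) -> Tuple[str, str]:
    """Use the canonical mapping of the first member type whose value space
    contains the value; return (member name, canonical representation)."""
    for m in members:
        if m.in_value_space(value):
            return m.name, m.canonical(value)
    raise ValueError("value is not in the union's value space")


def is_int(v: object) -> bool:
    return type(v) is int  # excludes bool, which Python treats as an int


MEMBERS = [
    Member("digit", lambda v: is_int(v) and 0 <= v <= 9, str),  # 0..9 only
    Member("integer", is_int, str),
    Member("string", lambda v: isinstance(v, str), lambda v: v),
]

print(union_canonical(5, MEMBERS))   # in both digit and integer: digit wins
print(union_canonical(42, MEMBERS))
```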

Note: A datatype which is ·atomic· in this specification need not be an "atomic" datatype in any programming language used to implement this specification.  Likewise, a datatype which is a ·list· in this specification need not be a "list" datatype in any programming language used to implement this specification. Furthermore, a datatype which is a ·union· in this specification need not be a "union" datatype in any programming language used to implement this specification.
...

4 Datatype components

...

4.1 Simple Type Definition

Simple Type Definitions provide for:

  • In the case of ·primitive· datatypes, identifying a datatype with its definition in this specification.
  • In the case of ·constructed· datatypes, defining the datatype in terms of other datatypes.
  • Attaching a QName to the datatype.

4.1.1 The Simple Type Definition Schema Component

The Simple Type Definition schema component has the following properties:

{primitive type definition}
A Simple Type Definition component. With one exception, required if {variety} is atomic, otherwise must be absent. The exception is ·anyAtomicType·, whose {primitive type definition} is absent.

If not absent, must be a ·primitive· built-in definition.

{item type definition}
A Simple Type Definition component. Required if {variety} is list, otherwise must be absent.
{member type definitions}
A sequence of Simple Type Definition components.

Must be present (but may be empty) if {variety} is union, otherwise must be absent.

Simple type definitions are identified by their {name} and {target namespace}.  Except for anonymous Simple Type Definitions (those with no {name}), Simple Type Definitions must be uniquely identified within a schema. Within a valid schema, each Simple Type Definition uniquely determines one datatype. The ·value space·, ·lexical space·, ·lexical mapping·, etc., of a Simple Type Definition are the ·value space·, ·lexical space·, etc., of the datatype uniquely determined (or "defined") by that Simple Type Definition.

If {variety} is ·atomic· then the ·value space· of the datatype defined will be a subset of the ·value space· of {base type definition} (which is a subset of the ·value space· of {primitive type definition}). If {variety} is ·list· then the ·value space· of the datatype defined will be the set of finite-length sequences of values from the ·value space· of {item type definition}. If {variety} is ·union· then the ·value space· of the datatype defined will be a subset (possibly an improper subset) of the union of the ·value spaces· of each Simple Type Definition in {member type definitions}.

If {variety} is ·atomic· then the {variety} of {base type definition} must be ·atomic·, unless the {base type definition} is anySimpleType. If {variety} is ·list· then the {variety} of {item type definition} must be either ·atomic· or ·union·, and if {item type definition} is ·union· then all its ·basic members· must be ·atomic·. If {variety} is ·union· then {member type definitions} must be a list of Simple Type Definitions.
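The variety-related constraints above can be checked mechanically. The Python sketch below uses a hypothetical STD record holding only the properties these checks need, recognizes anySimpleType by name (an assumption), and accepts an empty but present {member type definitions}. It does not cover the other constraints on Simple Type Definitions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class STD:
    """Hypothetical stand-in for a Simple Type Definition component."""
    name: str
    variety: Optional[str]  # "atomic", "list", "union", or None (absent)
    base: Optional["STD"] = None
    item_type: Optional["STD"] = None
    member_types: Optional[List["STD"]] = None  # None means absent


def basic_members(u: STD) -> List[STD]:
    out: List[STD] = []
    for m in u.member_types or []:
        out.extend(basic_members(m) if m.variety == "union" else [m])
    return out


def check(t: STD) -> List[str]:
    """Return violations of the variety-related constraints (sketch)."""
    errors = []
    if t.variety == "atomic" and t.base is not None:
        if t.base.variety != "atomic" and t.base.name != "anySimpleType":
            errors.append("base of an atomic type must be atomic")
    if t.variety == "list":
        it = t.item_type
        if it is None or it.variety not in ("atomic", "union"):
            errors.append("item type must be atomic or union")
        elif it.variety == "union" and any(
                b.variety != "atomic" for b in basic_members(it)):
            errors.append("basic members of a union item type must be atomic")
    elif t.item_type is not None:
        errors.append("{item type definition} must be absent")
    if t.variety == "union":
        if t.member_types is None:  # must be present, but may be empty
            errors.append("{member type definitions} must be present")
    elif t.member_types is not None:
        errors.append("{member type definitions} must be absent")
    return errors


any_simple = STD("anySimpleType", None)
any_atomic = STD("anyAtomicType", "atomic", base=any_simple)
decimal = STD("decimal", "atomic", base=any_atomic)
dec_list = STD("decList", "list", base=any_simple, item_type=decimal)
mixed = STD("mixed", "union", base=any_simple, member_types=[decimal, dec_list])
empty = STD("empty", "union", base=any_simple, member_types=[])

for t in (decimal, dec_list, mixed, empty,
          STD("listOfMixed", "list", base=any_simple, item_type=mixed)):
    print(t.name, check(t))
```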

The {facets} property determines the ·value space· and ·lexical space· of the datatype being defined by imposing constraints which must be satisfied by values and ·lexical representations·.

The {fundamental facets} property provides some basic information about the datatype being defined: its cardinality, whether an ordering is defined for it by this specification, whether it has upper and lower bounds, and whether it is numeric.

If {final} is the empty set then the type can be used in deriving other types; the explicit values restriction, list and union prevent further derivations of Simple Type Definitions by ·facet-based restriction·, ·list· and ·union· respectively; the explicit value extension prevents any derivation of Complex Type Definitions by extension.

The {context} property is only relevant for anonymous type definitions, for which its value is the component in which this type definition appears as the value of a property, e.g. {item type definition} or {base type definition}.

...