XML Schema Part 2: Datatypes

This document specifies a language for defining datatypes to be used in XML Schemas and possibly elsewhere.

This is a W3C Working Draft for review by members of the W3C and other interested parties in the general public.

It has been reviewed by the XML Schema Working Group and the Working Group has agreed to its publication. Note that not all sections of the draft represent the current consensus of the WG. Different sections of the specification may well command different levels of consensus in the WG. Public comments on this draft will be instrumental in the WG's deliberations.

The facilities described herein are in a preliminary state of design. The Working Group anticipates substantial changes, both in the mechanisms described herein, and in additional functions yet to be described. The present version should not be implemented except as a check on the design and to allow experimentation with alternative designs. The Schema WG will not allow early implementation to constrain its ability to make changes to this specification prior to final release.

A list of current W3C working drafts can be found at http://www.w3.org/TR. They may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress".

2 Type System

This section describes the conceptual framework behind the type system defined in this specification. The framework has been influenced by the [ISO 11404] standard on language-independent datatypes as well as the datatypes for [SQL] and for programming languages such as Java.

The datatypes discussed in this specification are computer representations of well known abstract concepts such as integer and date. It is not the place of this specification to define these concepts; many other publications provide excellent definitions.

Two concepts are essential for an understanding of datatypes as they are discussed here: a value space is an abstract collection of permitted values for a datatype. Each datatype also has a space of valid lexical representations or literals, each of which denotes a single value. A single value in the value space may be denoted by several distinct valid literals.

2.1 Datatype

[Definition:] In this specification, a datatype is defined as a 3-tuple, consisting of a) a set of distinct values, called its value space, b) a set of lexical representations, called its lexical space, and c) a set of facets that characterize properties of the value space or individual values or lexical items.

2.2 Value space

A value space is a set of permitted values for a datatype. Value spaces have certain properties. For example, they always have the property of cardinality and some definition of equality and may have the concept of order by which individual values within the value space can be compared to one another.

[Definition:] A value space is the set of permitted values for a given datatype. The value space of a given datatype can be defined in one of the following ways:

enumerated outright (extensional definition)
defined axiomatically from fundamental notions (intensional definition)
defined as the subset of values from an already defined value space with a given set of properties
defined as a combination of values from some already defined value space(s) by a specific construction procedure

In addition to the value space, each datatype has a space of valid lexical representations or literals. A single value in the value space may be denoted by several valid literals. For example, "100" and "1.0E2" are two different representations for the same value. Depending on the situation, either or both of these representations might be acceptable. The type system defined in this specification provides a mechanism for schema designers to control the set of values and the corresponding set of acceptable lexical representations of those values for a datatype.
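The relationship between literals and values can be sketched in Python. This is an illustration only: the helper name denote and the use of Python's Decimal as a stand-in for a numeric value space are assumptions made for the example and are not part of this specification.

```python
from decimal import Decimal


def denote(literal: str) -> Decimal:
    """Map a literal from the lexical space to the value it denotes.

    Decimal is used here only as a convenient stand-in for a numeric
    value space; it is not the value space defined by this specification.
    """
    return Decimal(literal)


# Two distinct literals ...
print("100" == "1.0E2")
# ... that denote the same value.
print(denote("100") == denote("1.0E2"))
```

The comparison of the two strings is made in the lexical space, while the comparison of the two results of denote is made in the value space.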

2.3 Lexical Space

In addition to its value space, each datatype also has a lexical representation space. [Definition:] The lexical space consists of a set of valid literals for a datatype. Each value in the datatype's value space is denoted by one or more literals in its lexical space. The definition of each of the Primitive datatypes (§3.2) includes a detailed description of its lexical space.

2.4 Facets

[Definition:] A facet is a single defining aspect of a concept or an object. Generally speaking, each facet characterizes a concept or object along independent aspects or dimensions.

The facets of a datatype serve to distinguish those aspects of one datatype which differ from other datatypes. Rather than being defined solely in terms of a prose description the datatypes in this specification are defined in terms of the synthesis of facet values which together determine the value space and properties of the datatype.

Facets are of two types: fundamental facets that define the datatype and non-fundamental or constraining facets that constrain the permitted values of a datatype.

2.4.1 Fundamental facets

Datatypes are characterized by properties of their value spaces. These optional properties are discussed in this section. Each of these properties gives rise to a facet that serves to characterize the datatype.

2.4.1.1 Order

[Definition:] A value space, and hence a datatype, is said to be ordered if there exists an order relation defined for that value space. Order relations have the following rules:

for every pair (a, b) from the value space, either a ≤ b or b ≤ a, or a = b;
for every triple (a, b, c) from the value space, if a ≤ b and b ≤ c, then a ≤ c.
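The two rules can be checked mechanically on a finite sample of a value space. The following Python sketch is illustrative only; the function name satisfies_order_rules and the idea of passing the candidate relation as a callable leq are assumptions of the example.

```python
from itertools import product


def satisfies_order_rules(values, leq):
    """Check the two order rules on a finite collection of values.

    leq(a, b) is the candidate relation "a <= b".
    """
    values = list(values)
    # Rule 1: every pair is comparable (or the two values are equal).
    comparable = all(
        leq(a, b) or leq(b, a) or a == b
        for a, b in product(values, repeat=2)
    )
    # Rule 2: transitivity over every triple.
    transitive = all(
        leq(a, c)
        for a, b, c in product(values, repeat=3)
        if leq(a, b) and leq(b, c)
    )
    return comparable and transitive


print(satisfies_order_rules(range(5), lambda a, b: a <= b))
# "a divides b" leaves 2 and 3 incomparable, so rule 1 fails.
print(satisfies_order_rules([2, 3, 6], lambda a, b: b % a == 0))
```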

There may exist several possible order relations for a given value space. Additionally, there may exist multiple datatypes with the same value space. In such cases, each datatype will define a different order relation on the value space.

Ed. Note: Currently, no order relations are defined on the built-in datatypes provided by this specification; additionally, there is no means to specify an order relation on user-generated datatypes. This will be addressed in a future draft.

2.4.1.2 Bounds

Some ordered value spaces, and hence some datatypes, are said to be bounded. [Definition:] A value space is bounded above if there exists a unique value U in the value space such that, for all values v in the value space, v ≤ U. The value U is said to be an upper bound of the value space. [Definition:] A value space is bounded below if there exists a unique value L in the space such that, for all values v in the value space, L ≤ v. The value L is then said to be a lower bound of the value space.

[Definition:] A datatype is bounded if its value space has both an upper and a lower bound.
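For a finite sample of a value space, the definitions of upper and lower bound can be sketched as follows. The function names and the callable leq are hypothetical and serve only to illustrate the definitions; for an infinite value space the bounds cannot be found by enumeration in this way.

```python
def upper_bound(values, leq):
    """Return the value U with leq(v, U) for every v, or None if none exists."""
    values = list(values)
    for u in values:
        if all(leq(v, u) for v in values):
            return u
    return None


def lower_bound(values, leq):
    """Return the value L with leq(L, v) for every v, or None if none exists."""
    values = list(values)
    for low in values:
        if all(leq(low, v) for v in values):
            return low
    return None


def is_bounded(values, leq):
    """A value space is bounded if it has both an upper and a lower bound."""
    values = list(values)
    return (upper_bound(values, leq) is not None
            and lower_bound(values, leq) is not None)


print(upper_bound([3, 1, 2], lambda a, b: a <= b))
print(lower_bound([3, 1, 2], lambda a, b: a <= b))
```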

2.4.1.3 Cardinality

[Definition:] Every value space has associated with it the concept of cardinality. Some value spaces are finite, some are countably infinite while still others are uncountably infinite. A datatype is said to have the cardinality of its value space. It is sometimes useful to categorize value spaces (and hence, datatypes) as to their cardinality. There are three significant cases:

value spaces that are finite
value spaces that are countably infinite and exact (see Exact and Approximate (§2.4.1.4))
value spaces that are countably infinite and approximate (see Exact and Approximate (§2.4.1.4))

Every conceptually finite value space is necessarily exact. No computational datatype is uncountably infinite.

Ed. Note: Currently, cardinality is not specified for the built-in datatypes provided by this specification; additionally, there is no means to specify a cardinality on user-generated datatypes. This will be addressed in a future draft.

2.4.1.4 Exact and Approximate

The computational representation of a datatype may limit the degree to which values of the datatype can be distinguished. If every value in the value space of the conceptual datatype is distinguishable in the computational representation from every other value in the value space, then the datatype is said to be exact.

Certain mathematical datatypes with very large or infinite value spaces have representations which are said to be approximate in that multiple values in the conceptual value space map to single values in the value space of the representation. In this specification, all approximate datatypes have computational models which specify, via parametric values, a degree of approximation, that is, they require a certain minimum set of values of the mathematical datatype to be distinguishable in the computational datatype. Further, each value in the conceptual value space must be capable of being represented in the representational value space within a certain distance, i.e. the difference between the conceptual value and the representational value must not exceed some agreed upon value.
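As an illustration of an approximate representation, the following Python sketch rounds values to IEEE 754 single precision. The choice of single precision is an assumption made for the example; this specification does not prescribe any particular computational representation.

```python
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


a = 0.1
b = 0.1 + 1e-10

# The two values are distinct before rounding ...
print(a != b)
# ... but map to the same single-precision value.
print(to_float32(a) == to_float32(b))
# The representation stays within a known distance of the original value.
print(abs(to_float32(a) - a) <= 2 ** -28)
```

The last line shows the "certain distance" requirement: for values between 0.0625 and 0.125, adjacent single-precision values are 2^-27 apart, so rounding to the nearest one moves a value by at most 2^-28.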

Ed. Note: Currently, exactness is not specified for the built-in datatypes provided by this specification; additionally, there is no means to specify an exactness for user-generated datatypes. This will be addressed in a future draft.

2.4.1.5 Numeric

A datatype is said to be numeric if its values are conceptually quantities (in some mathematical number system). A datatype whose values do not have this property is said to be non-numeric.

2.4.2 Constraining or Non-fundamental facets

Constraining facets are optional properties that can be applied to a datatype to (further) constrain its value space. Constraining the value space consequently constrains the allowed lexical representations. Adding constraining facets to a Base type (§2.5.2.1) is used in Defining Generated Datatypes (§4).

2.4.2.1 length

[Definition:] For the string (§3.2.1) datatype, length specifies the number of allowable characters in the string. For the binary (§3.2.7) datatype it specifies the length in bits. The value of the length facet must be a positive integer.

Ed. Note: We need to ultimately reconcile the notion of string length with the resolution of the i18n issues around character, indexing, etc. I18N recommends that length and maxLength be a "character count" and not indicate storage requirements.

2.4.2.2 maximum length

[Definition:] The maxlength facet indicates the maximum length, in characters, of a string (§3.2.1) datatype for which the length (§2.4.2.1) facet is not specified. For the binary (§3.2.7) datatype it specifies the maximum length in bits if the length (§2.4.2.1) facet is not specified. The value of the maximum length facet must be a positive integer.
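A check of the length and maxlength facets for the string datatype might look like the following Python sketch. The function name check_length and its keyword parameters are hypothetical. Length is counted in characters, which Python's len reports as code points for str values; whether that matches the eventual character model is an open question noted above.

```python
def check_length(value: str, length=None, max_length=None) -> bool:
    """Check a string against the length or maxlength facet.

    At most one of the two facets may be given, since maxlength only
    applies when length is not specified.
    """
    if length is not None and max_length is not None:
        raise ValueError("length and maxlength cannot both be specified")
    for facet in (length, max_length):
        if facet is not None and facet <= 0:
            raise ValueError("facet value must be a positive integer")
    if length is not None:
        return len(value) == length       # fixed-length string
    if max_length is not None:
        return len(value) <= max_length   # variable-length string with a cap
    return True                           # no length constraint


print(check_length("abc", length=3))
print(check_length("abcd", max_length=3))
```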

2.4.2.3 pattern

[Definition:] For the string datatype, the pattern facet can be used to constrain the format of allowable values using Regular Expressions (§E).

2.4.2.4 enumeration

[Definition:] Presence of an enumeration facet constrains the value space of the datatype to the specified list. The enumeration facet can be applied to the following datatypes: string (§3.2.1), real (§3.2.3), timeInstant (§3.2.4), timeDuration (§3.2.5), recurringInstant (§3.2.6), language (§3.2.9), NMTOKEN (§3.3.1), Name (§3.3.3), NCName (§3.3.4), ENTITY (§3.3.8), NOTATION (§3.3.10), decimal (§3.3.11), integer (§3.3.12), non-negative-integer (§3.3.13), positive-integer (§3.3.14), non-positive-integer (§3.3.15), negative-integer (§3.3.16), date (§3.3.17), time (§3.3.18). No order or any other relationship is implied between the elements of the enumeration list.

2.4.2.5 minAbsoluteValue

[Definition:] The minAbsoluteValue facet specifies the minimum absolute value of the value space for generated datatypes whose basetype is real (§3.2.3). This facet (together with maxAbsoluteValue (§2.4.2.6)) can be used to generate subtypes of real (§3.2.3) which correspond to common floating point representations.

2.4.2.6 maxAbsoluteValue

[Definition:] The maxAbsoluteValue facet specifies the maximum absolute value of the value space for generated datatypes whose basetype is real (§3.2.3). This facet (together with minAbsoluteValue (§2.4.2.5)) can be used to generate subtypes of real (§3.2.3) which correspond to common floating point representations.

2.4.2.7 maxInclusive

[Definition:] The maxInclusive facet determines the upper bound of the value space for a datatype with the Order (§2.4.1.1) property. The maximum value specified with this facet is inclusive in the sense that the value specified for the facet is itself included in the value space for the datatype.

2.4.2.8 maxExclusive

[Definition:] The maxExclusive facet determines the upper bound of the value space for a datatype with the Order (§2.4.1.1) property. The maximum value specified with this facet is exclusive in the sense that the value specified for the facet is itself excluded from the value space for the datatype.

2.4.2.9 minInclusive

[Definition:] The minInclusive facet determines the lower bound of the value space for a datatype with the Order (§2.4.1.1) property. The minimum value specified with this facet is inclusive in the sense that the value specified for the facet is itself included in the value space for the datatype.

2.4.2.10 minExclusive

[Definition:] The minExclusive facet determines the lower bound of the value space for a datatype with the Order (§2.4.1.1) property. The minimum value specified with this facet is exclusive in the sense that the value specified for the facet is itself excluded from the value space for the datatype.
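The difference between the inclusive and exclusive forms of the four bounding facets can be sketched in Python as follows. The function name in_range and its keyword parameters are hypothetical; the sketch assumes values that Python can compare with its built-in ordering.

```python
def in_range(value, min_inclusive=None, min_exclusive=None,
             max_inclusive=None, max_exclusive=None) -> bool:
    """Check a value against the four bounding facets.

    Inclusive bounds admit the bound itself; exclusive bounds do not.
    """
    if min_inclusive is not None and value < min_inclusive:
        return False
    if min_exclusive is not None and value <= min_exclusive:
        return False
    if max_inclusive is not None and value > max_inclusive:
        return False
    if max_exclusive is not None and value >= max_exclusive:
        return False
    return True


print(in_range(10, max_inclusive=10))   # the bound itself is included
print(in_range(10, max_exclusive=10))   # the bound itself is excluded
```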

2.4.2.11 precision

[Definition:] The precision facet, which only applies to the decimal (§3.3.11) datatype, refers to the total number of digits in the number. Its value must be a positive integer.

2.4.2.12 scale

[Definition:] The scale facet, which only applies to the decimal (§3.3.11) datatype, refers to the total number of digits to the right of the decimal point. Its value must be a positive number less than or equal to the precision.
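The interplay of precision and scale can be illustrated with a small Python sketch. Several points here are assumptions made for the example and are not settled by the text above: both facets are treated as upper limits, digits are counted as written in the literal, and the literal form (optional sign, digits, optional fractional part) is a simplification. The function name check_decimal is hypothetical.

```python
import re

# Simplified decimal literal: optional sign, digits, optional fraction.
DECIMAL_LITERAL = re.compile(r"[+-]?(\d+)(?:\.(\d+))?")


def check_decimal(literal: str, precision: int, scale: int) -> bool:
    """Check a decimal literal against the precision and scale facets."""
    if scale > precision:
        raise ValueError("scale must not exceed precision")
    match = DECIMAL_LITERAL.fullmatch(literal)
    if match is None:
        return False
    integer_digits = match.group(1)
    fraction_digits = match.group(2) or ""
    total_digits = len(integer_digits) + len(fraction_digits)
    return total_digits <= precision and len(fraction_digits) <= scale


print(check_decimal("123.45", precision=5, scale=2))
print(check_decimal("123.456", precision=5, scale=2))
```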

2.4.2.13 encoding

[Definition:]

Ed. Note: need to fill out definition of this facet, which applies (currently) only to binary (§3.2.7)

2.4.2.14 period

[Definition:]

Ed. Note: need to fill out definition of this facet, which applies (currently) only to recurringInstant (§3.2.6)

2.5 Datatype dichotomies

It is useful to categorize the datatypes defined in this specification along various dimensions, forming a set of characterization dichotomies.

2.5.1 Atomic vs. aggregate datatypes

The first distinction to be made is that between atomic and aggregate datatypes.

[Definition:] Atomic datatypes are those having values which are intrinsically indivisible.
[Definition:] Aggregate datatypes are those having values which can be decomposed into two or more component values.

For example, a date that is represented as a single character string could be the value of an atomic date datatype; while another date represented as separate "month", "day" and "year" elements would be the value of an aggregate date datatype. Not surprisingly, the distinction is analogous to that between an XML element whose content model is #PCDATA and one with element content.

As discussed above, this specification focuses mainly on atomic datatypes. Later versions will address aggregate datatypes in more detail. Note that the legacy XML attribute types IDREFS (§3.3.7), ENTITIES (§3.3.9) and NMTOKENS (§3.3.2) can be thought of as aggregate (list) types.

A datatype which is atomic in this specification need not be an "atomic" datatype in any programming language used to implement this specification.

2.5.2 Primitive vs. generated datatypes

[Definition:] Primitive datatypes are those that are not defined in terms of other datatypes; they exist ab initio.
[Definition:] Generated datatypes are those that are defined in terms of other datatypes.

For example, a real (§3.2.3) is a well defined mathematical concept that cannot be defined in terms of other datatypes while a date (§3.3.17) is a special case of the more general datatype recurringInstant (§3.2.6).

The datatypes defined by this specification fall into both the primitive and the generated categories. It is felt that a judiciously chosen set of primitive datatypes will serve the widest possible audience by providing a set of convenient datatypes that can be used as is, as well as providing a rich enough base from which the variety of datatypes needed by schema designers can be generated.

A datatype which is primitive in this specification need not be a "primitive" datatype in any programming language used to implement this specification.

2.5.2.1 Base type

[Definition:] Every generated datatype is defined in terms of an existing datatype, referred to as the base type. Base types may be either primitive or generated.

[Definition:] In the example above, date (§3.3.17) is referred to as a subtype of the base type recurringInstant (§3.2.6). The value space of a subtype is a subset of the value space of the base type.
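The subset relationship between a subtype and its base type can be sketched by modelling a value space as a membership test. This is an illustration only; the names restrict, is_integer and is_positive_integer are hypothetical and do not correspond to any syntax defined by this specification.

```python
def restrict(base, constraint):
    """Build a subtype's membership test from a base type's membership test.

    A value belongs to the subtype only if it belongs to the base type
    and also satisfies the additional constraint, so the subtype's value
    space is a subset of the base type's value space.
    """
    return lambda value: base(value) and constraint(value)


def is_integer(value) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)


# A generated datatype whose base type is itself usable as a base type.
is_positive_integer = restrict(is_integer, lambda v: v > 0)
is_small_positive_integer = restrict(is_positive_integer, lambda v: v < 100)

print(is_positive_integer(7))
print(is_positive_integer(-7))
```

Because restrict returns a membership test of the same shape as its input, a generated datatype can in turn serve as the base type of a further generated datatype.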

2.5.3 Built-in vs. user-generated datatypes

[Definition:] Built-in datatypes are those which are entirely defined in this specification, and may be either primitive or generated.
[Definition:] User-generated datatypes are those generated datatypes that are defined by individual schema designers by giving values to constraining facets.

Conceptually there is no difference between the built-in generated datatypes included in this specification and the user-generated datatypes which will be created by individual schema designers. The built-in generated datatypes are those which are believed to be so common that if they were not defined in this specification many schema designers would end up "reinventing" them. Furthermore, including these generated datatypes in this specification serves to demonstrate the mechanics and utility of the datatype generation facilities of this specification.

A datatype which is built-in in this specification need not be a "built-in" datatype in any programming language used to implement this specification.

3 Built-in datatypes

3.1 Namespace considerations

The built-in datatypes defined by this specification are designed so that systems other than the XML Schema Definition Language may access them. To facilitate such usage, the built-in datatypes in this specification come from the XML Datatype Language namespace, the specific namespace defined by this specification. This applies to both built-in primitive and built-in generated datatypes.

Ed. Note: The exact URLs for the namespace(s) defined by this W3C specification are still an open issue. This issue has been raised with the XML Coordination Group (issue 1999-0201-07 Standardizing W3C namespace URIs) for general coordination and resolution. On August 11, Dan Connolly recommended we make up our own URL for datatypes. See http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/1999Aug/0060.html.

Each user-generated datatype is also associated with a unique namespace. However, user-generated datatypes do not come from the XML Datatype Language namespace; rather, they come from the namespace of the schema in which they are defined. Note that associating a namespace with a user-generated datatype is not a general purpose extensibility mechanism and does not apply to primitive datatypes. Suppose a schema author wanted to introduce a new set of primitive datatypes, say a core set of mathematical datatypes not based on the Number datatype defined as a built-in primitive by this specification. Such a schema author might try to define those datatypes, associate a unique namespace with them and expect schema processors to understand them. Unfortunately, such a scenario would not work. Each such datatype would need specialized validation code and there are still many unresolved issues regarding standard mechanisms for sharing such code.

As described in more detail in Defining Generated Datatypes (§4), each user-generated datatype must be defined in terms of a base type included in this specification, by assigning facets which serve to constrain the value set of the user-generated datatype to a subset of the base type. Such a mechanism works because all schema processors are required to be able to validate datatypes defined by subsetting the value space of a datatype included in this specification.

3.2 Primitive datatypes

The primitive datatypes are described below. For each primitive datatype we discuss the fundamental facets, if any, and the constraining facets, if any.

3.2.1 string

[Definition:] The string datatype represents character strings in XML. The value space of the string datatype is the set of finite sequences of UCS characters ([ISO 10646] and [Unicode]). A UCS character (or just character, for short) is an atomic unit of communication; it is not further specified except to note that every UCS character has a corresponding UCS code point, which is an integer.

Ed. Note: We need to harmonize this definition with the I18N character model.

string has the following subtypes:

NMTOKEN (§3.3.1)

3.2.1.1 Pattern

The string datatype has an optional constraining facet called pattern (§2.4.2.3). The value of this facet is a regular expression. Regular expression constraints are discussed in Appendix Regular Expressions (§E). If this facet is not present, there is no restriction on the lexical representation.

3.2.1.2 Length

The string datatype has an optional constraining facet called length (§2.4.2.1). If length is specified we have a fixed length character string, where length is measured in the number of characters in the string. If length is not specified we have a variable length character string. The value of the length facet must be a positive integer.

3.2.1.3 Maximum Length

The string datatype has an optional constraining facet called maximum length (§2.4.2.2). If maxlength is specified for a variable length string it represents an upper bound of the length of the string. The value of the maxlength facet must be a positive integer. The length (§2.4.2.1) and maximum length (§2.4.2.2) facets cannot both be specified for the same datatype. The absolute maximum length of variable length character strings depends on the XML parser implementation.

3.2.1.4 Maximum and Minimum Values

The string datatype also has the following constraining facets:

maxInclusive
maxExclusive
minInclusive
minExclusive

Clearly, the effect of these constraining facets depends on the collating sequence used to define the Order (§2.4.1.1) property for strings.

Ed. Note: The issue of collating sequences for strings is complex. It will be discussed in detail in a subsequent version of this specification.

3.2.2 boolean

[Definition:] The boolean datatype has the value space required to support the mathematical concept of binary-valued logic: {true, false}.

3.2.2.1 Lexical Representation

An instance of a datatype that is defined as boolean can have the following legal lexical values {true, false}. The lexical representation is fixed and cannot be changed. The lexical representation facet is not supported.

3.2.3 real

[Definition:] The real datatype represents the standard mathematical concept of the real numbers.

real has the following constraining facets:

minAbsoluteValue (§2.4.2.5)
maxAbsoluteValue (§2.4.2.6)
maxInclusive (§2.4.2.7)
maxExclusive (§2.4.2.8)
minInclusive (§2.4.2.9)
minExclusive (§2.4.2.10)

real has the following subtype:

decimal (§3.3.11)

3.2.3.1 Lexical representation

real values have a single standard lexical representation consisting of a mantissa followed, optionally, by the character "E" followed by an exponent. The exponent must be an integer. The mantissa must be a decimal number. The representations for exponent and mantissa must follow the default lexical rules for integer and decimal numbers discussed above. If the "E" and the following exponent are omitted, an exponent value of 0 is assumed, so that the literal denotes the mantissa itself. For example: -1E4, 1267.43233E12, 12.78E-2, 12.
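A parser for this lexical representation might be sketched in Python as follows. The regular expression is a simplification written for this example (optional sign, digits, optional fractional part, optional "E" and signed integer exponent) and is not the normative grammar; the function name parse_real is hypothetical. An omitted exponent is treated as 0 here, so that the literal "12" denotes twelve.

```python
import re
from decimal import Decimal

# Simplified form: mantissa, then optionally "E" and an integer exponent.
REAL_LITERAL = re.compile(r"([+-]?\d+(?:\.\d+)?)(?:E([+-]?\d+))?")


def parse_real(literal: str) -> Decimal:
    """Return the value denoted by a real literal: mantissa * 10**exponent."""
    match = REAL_LITERAL.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a real literal: {literal!r}")
    mantissa = Decimal(match.group(1))
    exponent = int(match.group(2)) if match.group(2) is not None else 0
    # scaleb shifts the decimal exponent without any rounding.
    return mantissa.scaleb(exponent)


for example in ["-1E4", "1267.43233E12", "12.78E-2", "12"]:
    print(example, "denotes", parse_real(example))
```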

3.2.4 timeInstant

[Definition:] The timeInstant datatype represents a combination of date and time values representing a single instant of time, encoded as a single string.

3.2.4.1 Lexical Representation

A single lexical representation, which is a subset of the lexical representations allowed by [ISO 8601], is allowed for timeInstant. This lexical representation is the [ISO 8601] extended format CCYY-MM-DDThh:mm:ss.sss, preceded by an optional sign (+ or -), where "CC" represents the century, "YY" the year, "MM" the month and "DD" the day. The letter "T" is the date/time separator and "hh", "mm", "ss.sss" represent hour, minute and second respectively. Additional digits can be used to increase the precision of fractional seconds if desired.

This representation can be immediately followed by a "Z" to indicate Coordinated Universal Time. To indicate the time zone, i.e. the difference between the local time and Coordinated Universal Time, the difference immediately follows the time and consists of a sign, + or -, followed by hh:mm. See also ISO 8601 Date and Time Formats (§D).

For example, to indicate 1:20 pm on May the 31st, 1999 for Eastern Standard Time which is 5 hours behind Coordinated Universal Time, one would write: 1999-05-31T13:20:00-05:00.
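The following Python sketch parses a timeInstant literal of the kind shown in the example. It is illustrative only: the function name parse_time_instant is hypothetical, the optional leading sign on the year is not handled, the time zone difference is read in the hh:mm form used by the example, and fractional seconds beyond six digits are truncated because of the resolution of Python's datetime.

```python
import re
from datetime import datetime, timedelta, timezone

TIME_INSTANT = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(\.\d+)?"
    r"(Z|[+-]\d{2}:\d{2})?"
)


def parse_time_instant(literal: str) -> datetime:
    """Parse CCYY-MM-DDThh:mm:ss.sss with an optional Z or +/-hh:mm suffix."""
    match = TIME_INSTANT.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a timeInstant literal: {literal!r}")
    year, month, day, hour, minute, second = (
        int(match.group(i)) for i in range(1, 7)
    )
    fraction = match.group(7)
    # Keep at most six fractional digits (microseconds), padding with zeros.
    microsecond = int(fraction[1:7].ljust(6, "0")) if fraction else 0
    zone = match.group(8)
    if zone is None:
        tzinfo = None
    elif zone == "Z":
        tzinfo = timezone.utc
    else:
        sign = -1 if zone[0] == "-" else 1
        offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
        tzinfo = timezone(sign * offset)
    return datetime(year, month, day, hour, minute, second, microsecond,
                    tzinfo=tzinfo)


instant = parse_time_instant("1999-05-31T13:20:00-05:00")
print(instant.astimezone(timezone.utc).isoformat())
```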

3.2.5 timeDuration

[Definition:] The timeDuration datatype represents a combination of year, month, day and time values representing a single duration of time, encoded as a single string.

3.2.5.1 Lexical Representation

A single lexical representation, which is a subset of the lexical representations allowed by [ISO 8601], is allowed for timeDuration. This lexical representation is the [ISO 8601] extended format CCYY-MM-DDThh:mm:ss.sss, preceded by an optional sign (+ or -), where "CC" represents the number of centuries, "YY" the number of years, "MM" the number of months (a month represents a time duration of 30 days) and "DD" the number of days. The letter "T" is the date/time separator and "hh", "mm", "ss.sss" represent number of hours, minutes and seconds respectively. Additional digits can be used to increase the precision of fractional seconds if desired. See also ISO 8601 Date and Time Formats (§D).

For example, to indicate a duration of 1 year, 2 months, 3 days, 10 hours, and 30 minutes, one would write: 0001-02-03T10:30:00.
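The components of a timeDuration literal can be extracted with a sketch such as the following. The names Duration and parse_time_duration are hypothetical. The sketch keeps the components separate rather than converting them to a single length of time, since the text above fixes a month at 30 days but does not state the length of a year.

```python
import re
from typing import NamedTuple


class Duration(NamedTuple):
    sign: int        # +1 or -1
    years: int       # "CCYY" read as a single count of years
    months: int
    days: int
    hours: int
    minutes: int
    seconds: float


TIME_DURATION = re.compile(
    r"([+-])?(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)"
)


def parse_time_duration(literal: str) -> Duration:
    """Split a timeDuration literal into its components."""
    match = TIME_DURATION.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a timeDuration literal: {literal!r}")
    sign = -1 if match.group(1) == "-" else 1
    years, months, days, hours, minutes = (
        int(match.group(i)) for i in range(2, 7)
    )
    return Duration(sign, years, months, days, hours, minutes,
                    float(match.group(7)))


print(parse_time_duration("0001-02-03T10:30:00"))
```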

Time periods, i.e. specific durations of time, can be represented by supplying two items of information: a start instant and a duration or a start instant and an end instant or an end instant and a duration.

3.2.6 recurringInstant

[Definition:] The recurringInstant datatype represents an instant of time that recurs with a specific timeDuration (§3.2.5). Note that we do not attempt to support general recurring instants of time, just those that are needed to support the generated date (§3.3.17) and time (§3.3.18) datatypes and those that arise from truncated and reduced lexical representations of timeInstant (§3.2.4). See also ISO 8601 Date and Time Formats (§D).

recurringInstant has a single constraining facet.

period (§2.4.2.14)

which can be used to constrain the frequency of recurrence. The only values permitted for the period facet are those that arise from truncated representations of timeInstant (§3.2.4). See Lexical Representation (§3.2.6.1) below. Values of the period facet must be of type timeDuration (§3.2.5).

recurringInstant has the following subtypes:

date (§3.3.17)
time (§3.3.18)

3.2.6.1 Lexical Representation

The lexical representation for recurringInstant is the left truncated [ISO 8601] representation for timeInstant (§3.2.4). For example, if the century "CC" is omitted from the timeInstant representation it means a timeInstant that recurs every hundred years. Similarly, if "CCYY" is omitted it designates a time instant that recurs every year.

Every two character "unit" of the representation that is omitted is indicated by a single hyphen "-". For example, to indicate 1:20 pm on May the 31st every year, one would write: --05-31T13:20:00-05:00.

3.2.7 binary

[Definition:] The binary datatype represents strings (blobs) of binary data. It has three constraining facets. The optional length (§2.4.2.1) facet specifies the length of the data in bits. This defines a datatype with a fixed length. If the length is not specified, a datatype with variable length is specified. In this case, the optional maximum length (§2.4.2.2) facet specifies the maximum length of the data in bits. If the maximum length is not specified the default is unlimited length. The optional "encoding" facet specifies the encoding which may be "hex" for hexadecimal digits or "base64" for MIME style Base64 data.
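The two encodings and the measurement of length in bits can be illustrated with the following Python sketch. The function names decode_binary and length_in_bits are hypothetical, and the sketch assumes that the value of a binary datatype is the decoded byte sequence, which is one of the readings left open by the issue on the value space of this datatype.

```python
import base64


def decode_binary(literal: str, encoding: str) -> bytes:
    """Decode a binary literal according to the "encoding" facet."""
    if encoding == "hex":
        return bytes.fromhex(literal)
    if encoding == "base64":
        # validate=True rejects characters outside the Base64 alphabet.
        return base64.b64decode(literal, validate=True)
    raise ValueError(f"unsupported encoding: {encoding!r}")


def length_in_bits(data: bytes) -> int:
    """The length and maximum length facets are expressed in bits."""
    return 8 * len(data)


hex_value = decode_binary("48656c6c6f", "hex")
b64_value = decode_binary("SGVsbG8=", "base64")
print(hex_value == b64_value, length_in_bits(hex_value))
```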

Issue (application-specific-binary-formats): Should we add a facet to allow a binary datatype to be restricted to an application-specific format such as video, audio, image?

Issue (binary-mime-type): should we add facet(s) for mime-type/subtype?

Issue (binary-value-space): Is this really a datatype? What is the value space of this datatype: the set of encoded strings or the set of binary streams after decoding?

3.2.8 uri

[Definition:] The uri datatype represents a Uniform Resource Identifier (URI) Reference as defined in [RFC 2396]. It has no fundamental or constraining facets.

Issue (uri-scheme-facet): should we have a facet to allow a limitation to a specific scheme? It might be useful to be able to say that something was not only a URI, but that it was a "mailto" and not a "http://...".

3.2.9 language

[Definition:] The language datatype represents natural language identifiers as defined by [RFC 1766]. The value space of language is the set of all tokens that match the LanguageID production in [XML]. The lexical space of language is the set of all strings that match the LanguageID production in [XML].

3.3 Generated datatypes

This section gives conceptual definitions for all built-in generated datatypes defined by this specification, including a description of the facets which apply to each datatype. The abstract syntax used to define generated datatypes (whether built-in or user-generated) is given in section Defining Generated Datatypes (§4) and the complete definitions of the built-in generated datatypes (written in the concrete syntax based on that abstract syntax given in Appendix Schema for Datatype Definitions (normative) (§A)) are provided in Appendix Schema for Datatype Definitions (normative) (§A).

3.3.1 NMTOKEN

[Definition:] The NMTOKEN datatype represents the NMTOKEN attribute type from [XML]. The value space of NMTOKEN is the set of all tokens that match the Nmtoken production in [XML]. The lexical space of NMTOKEN is the set of all strings that match the Nmtoken production in [XML]. The basetype of NMTOKEN is string (§3.2.1).

For compatibility (see Terminology (§1.4)) this datatype should be used only on attributes.

NMTOKEN has the following subtypes:

Name (§3.3.3)

3.3.2 NMTOKENS

[Definition:] The NMTOKENS datatype represents the NMTOKENS attribute type from [XML]. It consists of a whitespace-separated list of NMTOKENs. The value space of NMTOKENS is the set of all tokens that match the Nmtokens production in [XML]. The lexical space of NMTOKENS is the set of all strings that match the Nmtokens production in [XML].

NMTOKENS has no fundamental or constraining facets. For compatibility (see Terminology (§1.4)) this datatype should be used only on attributes.

3.3.3 Name

[Definition:] The Name datatype represents XML Names. The value space of this datatype is the set of all tokens which match the Name production of [XML]. The lexical space of this datatype is the set of all strings which match the Name production of [XML]. The basetype of Name is NMTOKEN (§3.3.1).

Name has the following subtypes:

NCName (§3.3.4)
ID (§3.3.5)
ENTITY (§3.3.8)
NOTATION (§3.3.10)

3.3.4 NCName

[Definition:] The NCName datatype represents XML "non-colonized" Names. The value space of this datatype is the set of all tokens which match the NCName production of [Namespaces in XML]. The lexical space of this datatype is the set of all strings which match the NCName production of [Namespaces in XML]. The basetype of NCName is Name (§3.3.3).

3.3.5 ID

[Definition:] The ID datatype represents the ID attribute type from [XML]. The value space of ID is the set of all tokens that match the Name production in [XML] and have been used in an XML document. The lexical space of ID is the set of all strings that match the Name production in [XML]. The basetype of ID is Name (§3.3.3).

ID has no fundamental or constraining facets. For compatibility (see Terminology (§1.4)) this datatype should be used only on attributes.

Schema-validity Constraint: ID Unique
An ID must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.

ID has the following subtypes:

IDREF (§3.3.6)

Issue (better-reference-mechanisms): There are several situations in which we need better reference mechanisms than those provided by ID and IDREF/IDREFS. For example, it would be desirable to extend IDs and IDREFs to be typed and scoped to better represent primary key/foreign key relationships in a database. XSL has recently introduced the concept of xsl:key and xsl:keyref whereby a single property of an element can be used as a key. We need such a mechanism for XML as a whole and it would be nice if this were extended to support multi-part keys.

3.3.6 IDREF

[Definition:] The IDREF datatype represents the IDREF attribute type from [XML]. The value space of IDREF is the set of all tokens that match the Name production in [XML] and have been used in an XML document as the value of an element or attribute of type ID. The lexical space of IDREF is the set of all strings that match the Name production in [XML]. The basetype of IDREF is ID (§3.3.5).

IDREF has no fundamental or constraining facets. For compatibility (see Terminology (§1.4)) this datatype should be used only on attributes.
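
The ID Unique constraint and the requirement that an IDREF match some ID in the same document can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes the ID and IDREF values have already been collected from a document.

```python
from collections import Counter


def check_ids(id_values, idref_values):
    """Return (duplicate_ids, dangling_refs) for the values of one document."""
    counts = Counter(id_values)
    # An ID value may appear at most once as a value of type ID.
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    # Every IDREF value must have been used as an ID value.
    dangling = sorted(set(idref_values) - set(counts))
    return duplicates, dangling
```

A document is acceptable with respect to these two checks when both returned lists are empty.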

3.3.7 IDREFS

[Definition:] The IDREFS datatype represents the IDREFS attribute type from [XML]. It consists of a white-space-separated list of IDREFs. The value space of IDREFS is the set of all tokens that match the Names production in [XML] and have been used in an XML document as the value of an element or attribute of type ID. The lexical space of IDREFS is the set of all strings that match the Names production in [XML].

IDREFS has no fundamental or constraining facets. For compatibility (see Terminology (§1.4)) this datatype should be used only on attributes.

3.3.8 ENTITY

[Definition:] The ENTITY datatype represents the ENTITY attribute type from [XML]. The value space of ENTITY is the set of all tokens that match the Name production in [XML] and have been declared as an Unparsed Entity in a schema. The lexical space of ENTITY is the set of all strings that match the Name production in [XML]. The basetype of ENTITY is Name (§3.3.3).

ENTITY has no fundamental or constraining facets. For compatibility (see Terminology (§1.4)) this datatype should be used only on attributes.

3.3.9 ENTITIES

[Definition:] The ENTITIES datatype represents the ENTITIES attribute type from [XML]. It consists of a white-space-separated list of ENTITYs. The value space of ENTITIES is the set of all tokens that match the Names production in [XML] and have been declared as an Unparsed Entity in a schema. The lexical space of ENTITIES is the set of all strings that match the Names production in [XML].

ENTITIES has no fundamental or constraining facets. For compatibility (see Terminology (§1.4)) this datatype should be used only on attributes.

3.3.10 NOTATION

[Definition:] The NOTATION datatype represents the NOTATION attribute type from [XML] and allows the identification of the format of unparsed entities by name. To use this datatype one or more notation names must be declared in the schema. These names can then be used as values for attributes or elements declared as NOTATIONs.

The value space of NOTATION is the set of all notations declared in a schema. The lexical space of NOTATION is the set of all strings that match the Name production in [XML]. The basetype of NOTATION is Name (§3.3.3).

Schema processors must provide the application with the name and external identifier(s) of any notation name that appears as a value.

For compatibility (see Terminology (§1.4)) this datatype should be used only on attributes.

3.3.11 decimal

[Definition:] The decimal datatype restricts allowable values to real numbers with an exact fractional part. Since this datatype specifies a fixed number of decimal digits it may also be called "fixed point decimal". The basetype of decimal is real (§3.2.3).

Decimal has the following required fundamental facets:

precision: the total number of digits in the number.
scale: the number of digits to the right of the decimal point. Must be less than or equal to precision.

Decimal has the following constraining facets:

maxInclusive
maxExclusive
minInclusive
minExclusive

decimal has the following subtypes:

integer (§3.3.12)

3.3.11.1 Lexical representation

Decimal values have a single standard lexical representation. This consists of a string of digits separated by a period as a decimal indicator, in accordance with the scale and precision facets, with an optional leading sign to indicate a negative number. If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional. For example: -1.23, 12678967.543233, 100000.00.
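
As a rough illustration of how the precision and scale facets constrain a literal, the following Python sketch uses the standard decimal module. It assumes that a decimal with precision p and scale s allows at most s digits after the decimal point and at most p - s digits before it, ignoring optional leading and trailing zeroes; the function name is hypothetical.

```python
from decimal import Decimal


def fits_decimal(literal: str, precision: int, scale: int) -> bool:
    """Check a literal against precision/scale (assumed interpretation)."""
    value = Decimal(literal)
    if not value.is_finite():
        return False
    if value == 0:
        return True
    # normalize() drops trailing zeroes, which the text treats as optional.
    # The default context keeps 28 significant digits.
    _sign, digits, exponent = value.normalize().as_tuple()
    fraction_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return fraction_digits <= scale and integer_digits <= precision - scale
```

With precision 8 and scale 2, the literals -1.23 and 100000.00 from the examples above are accepted, while 12678967.543233 is rejected because it has six fractional digits.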

3.3.12 integer

[Definition:] The integer datatype is the standard mathematical concept of the integer numbers. The basetype of integer is decimal (§3.3.11). The value space of the integer datatype is the infinite set {-∞,...,-2,-1,0,1,2,...,∞} although computer implementations restrict this to a finite set.

Integer has the following constraining facets:

maxInclusive
maxExclusive
minInclusive
minExclusive

integer has the following subtypes:

non-negative-integer (§3.3.13)
non-positive-integer (§3.3.15)

3.3.12.1 Lexical representation

Integer values have a single, standard lexical representation. This consists of a string of digits with an optional leading sign. If the sign is omitted, "+" is assumed. For example: -1, 0, 12678967543233, +100000.

3.3.13 non-negative-integer

[Definition:] The non-negative-integer datatype is the standard mathematical concept of the non-negative integers. The value space of the non-negative-integer datatype is the infinite set {0,1,2,...,∞} although computer implementations restrict this to a finite set. The basetype of non-negative-integer is integer (§3.3.12).

non-negative-integer has the following constraining facets:

maxInclusive
maxExclusive
minInclusive
minExclusive

non-negative-integer has the following subtypes:

positive-integer (§3.3.14)

3.3.13.1 Lexical representation

Non-negative-integer values have a single, standard lexical representation. This consists of a string of digits with an optional leading "+" sign. If the sign is omitted, "+" is assumed. For example: 1, 0, 12678967543233, +100000.

3.3.14 positive-integer

[Definition:] The positive-integer datatype is the standard mathematical concept of the positive integers. The value space of the positive-integer datatype is the infinite set {1,2,...,∞} although computer implementations restrict this to a finite set. The basetype of positive-integer is non-negative-integer (§3.3.13).

positive-integer has the following constraining facets:

maxInclusive
maxExclusive
minInclusive
minExclusive

3.3.14.1 Lexical representation

positive-integer values have a single, standard lexical representation. This consists of a string of digits with an optional leading "+" sign. For example: 1, 12678967543233, +100000.

3.3.15 non-positive-integer

[Definition:] The non-positive-integer datatype is the standard mathematical concept of the non-positive integers. The value space of the non-positive-integer datatype is the infinite set {-∞,...,-2,-1,0} although computer implementations restrict this to a finite set. The basetype of non-positive-integer is integer (§3.3.12).

non-positive-integer has the following constraining facets:

maxInclusive
maxExclusive
minInclusive
minExclusive

non-positive-integer has the following subtypes:

negative-integer (§3.3.16)

3.3.15.1 Lexical representation

Non-positive-integer values have a single, standard lexical representation. This consists of a string of digits with a leading "-" sign; the sign may be omitted only for zero. For example: -1, 0, -12678967543233, -100000.

3.3.16 negative-integer

[Definition:] The negative-integer datatype is the standard mathematical concept of the negative integers. The value space of the negative-integer datatype is the infinite set {-∞,...,-2,-1} although computer implementations restrict this to a finite set. The basetype of negative-integer is non-positive-integer (§3.3.15).

negative-integer has the following constraining facets:

maxInclusive
maxExclusive
minInclusive
minExclusive

3.3.16.1 Lexical representation

negative-integer values have a single, standard lexical representation. This consists of a string of digits with a leading "-" sign. For example: -1, -12678967543233, -100000.
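
The relationship between the five integer datatypes can be illustrated by listing, for a given literal, every built-in integer datatype whose value space contains its value. This is a minimal sketch with a hypothetical function name; it classifies by value only and does not enforce the sign rules of each datatype's own lexical representation.

```python
import re

# Lexical form of integer: optional sign followed by one or more digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def integer_types(literal: str) -> list:
    """List the built-in integer datatypes whose value space holds the value."""
    if _INTEGER.fullmatch(literal) is None:
        raise ValueError(f"not an integer literal: {literal!r}")
    value = int(literal)
    names = ["integer"]
    if value >= 0:
        names.append("non-negative-integer")
    if value > 0:
        names.append("positive-integer")
    if value <= 0:
        names.append("non-positive-integer")
    if value < 0:
        names.append("negative-integer")
    return names
```

Zero is the only value shared by the non-negative-integer and non-positive-integer branches.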

3.3.17 date

[Definition:] The date datatype represents a timeDuration (§3.2.5) that starts at midnight of a specified day and lasts for 24 hours. The basetype of date is recurringInstant (§3.2.6). date is generated from recurringInstant (§3.2.6) by setting the value of the period facet equal to 24 hours.

3.3.17.1 Lexical Representation

The lexical representation for date is the reduced (right truncated) lexical representation for recurringInstant (§3.2.6): CCYY-MM-DD. For example, to indicate May the 31st, 1999, one would write: 1999-05-31. See also ISO 8601 Date and Time Formats (§D).

Left truncated representations can be used to represent recurring dates. If the CC is omitted it signifies a date that occurs every century. If the YY is omitted it signifies a date every year and so on. Every two character "unit" of the representation that is omitted is indicated by a single hyphen "-". For example, ---05 signifies the fifth day of every month.
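
A minimal Python sketch of reading such literals is given below. It handles only three forms that appear in this document: the full CCYY-MM-DD form, the --MM-DD form used for yearly dates in the holidays example of section 4, and the ---DD form for monthly dates. Other truncations are deliberately left out, field ranges are not validated, and the function name is hypothetical.

```python
import re

# Each entry pairs a pattern with the names of the fields it captures.
_FORMS = [
    (re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})"), ("year", "month", "day")),
    (re.compile(r"--([0-9]{2})-([0-9]{2})"), ("month", "day")),
    (re.compile(r"---([0-9]{2})"), ("day",)),
]


def parse_date(literal: str) -> dict:
    """Return the fields present in a full or left-truncated date literal."""
    for pattern, names in _FORMS:
        match = pattern.fullmatch(literal)
        if match:
            return dict(zip(names, map(int, match.groups())))
    raise ValueError(f"unsupported date literal: {literal!r}")
```

The fields missing from the result are the ones over which the date recurs: a result with only "day" recurs every month.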

3.3.18 time

[Definition:] The time datatype represents a recurring instant of time that recurs every day. The basetype of time is recurringInstant (§3.2.6). The time datatype can be considered to be a shorthand to designate a specific truncated representation for recurringInstant (§3.2.6). time is generated from recurringInstant (§3.2.6) by setting the value of the period facet equal to 24 hours.

3.3.18.1 Lexical Representation

The lexical representation for time is the left truncated lexical representation for timeInstant (§3.2.4): hh:mm:ss.sss. For example, to indicate 1:20 pm for Eastern Standard Time which is 5 hours behind Coordinated Universal Time, one would write: 13:20:00-05:00. See also ISO 8601 Date and Time Formats (§D).
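
The effect of the time zone offset can be checked with Python's standard datetime module, whose time.fromisoformat happens to accept the hh:mm:ss form with a numeric offset. This is only a sketch: the function name is hypothetical, and the anchor date is an arbitrary assumption needed to perform the conversion.

```python
from datetime import date, datetime, time, timezone


def to_utc(literal: str) -> time:
    """Convert a time literal with a numeric offset to Coordinated Universal Time."""
    parsed = time.fromisoformat(literal)
    if parsed.tzinfo is None:
        raise ValueError("no time zone offset given")
    # Attach an arbitrary date so the offset arithmetic can be carried out.
    anchor = datetime.combine(date(2000, 1, 1), parsed)
    return anchor.astimezone(timezone.utc).time()
```

For the example above, 13:20:00-05:00 corresponds to 18:20:00 in Coordinated Universal Time. A conversion that crosses midnight wraps around, since the result carries no date.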

4 Defining Generated Datatypes

A generated datatype can be defined from a primitive datatype (or another generated datatype) by adding optional constraining facets. For example, it may be useful to define a datatype called i4 (signed 4-byte integer) from the built-in datatype integer by supplying maxInclusive and minInclusive facets. In this case, i4 is the name of the new user-generated datatype, integer is its base type and maxInclusive and minInclusive are the constraining facets.

Example

<datatype name="i4">
   <basetype name="integer"/>
   <minInclusive>
      -2147483648
   </minInclusive>
   <maxInclusive>
      2147483647
   </maxInclusive>
</datatype>
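
For reference, a signed 4-byte (two's complement) integer ranges from -2^31 to 2^31 - 1. The following Python sketch shows how inclusive bounds of this kind restrict a value space; the constant and function names are hypothetical.

```python
I4_MIN = -(2 ** 31)   # -2147483648
I4_MAX = 2 ** 31 - 1  # 2147483647


def in_range(literal: str, min_inclusive: int, max_inclusive: int) -> bool:
    """True if the integer literal lies within the inclusive bounds."""
    return min_inclusive <= int(literal) <= max_inclusive
```

Both bounds are themselves members of the restricted value space, which is what distinguishes the inclusive facets from minExclusive and maxExclusive.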

This section defines the abstract syntax used for defining generated datatypes. This abstract syntax is used for defining both the built-in generated datatypes (see Generated datatypes (§3.3)) and user-generated datatypes. The only difference between the two is that the datatype definitions for built-in generated datatypes are included in the Schema for Datatype Definitions (normative) (§A), while the datatype definitions for user-generated datatypes appear in schemas written by users.

[Definition:] An abstract syntax provides a formal specification of the information provided for each generated datatype definition. The abstract syntax is presented using a simplified BNF. Defined terms are to the left. Their components are to the right, with a small amount of meta-syntax: ()s for grouping, | to separate alternatives, ? for optionality, * and + for iteration.

[Definition:] The concrete syntax for generated datatype definitions is the exact element and attribute names used in definitions. The concrete syntax is a key feature of the proposed design: it is the form in which the schema language is used by datatype designers. Though its elements and attributes are often different from the terms of the abstract syntax BNF, the features and expressive power of the two are congruent.

We include a preliminary concrete syntax in this draft, via examples, as well as in Schema for Datatype Definitions (normative) (§A) (defined using the schema language of [XML Schema Part 1: Structures]) and DTD for Datatype Definitions (normative) (§B). The emphasis in this version has been to stay quite close to the abstract syntax.

Ed. Note: The abstract syntax proposed here (and hence, the concrete syntax) are preliminary, as they allow datatype definitions which are logically inconsistent (e.g., they allow numeric facets on non-numeric datatypes). This will be corrected in future drafts, as the XML Schema language comes to allow the specification of tighter constraints.

Ed. Note: This section needs more explanatory text describing the productions and their relationship to the conceptual framework described in sections Type System (§2) and Built-in datatypes (§3).

Datatype definitions

`[1]`	`datatypeDefn`	`::=`	`NCName basetype facets`	`[ Constraint: Unique datatype definitions ]`
`[2]`	`basetype`	`::=`	`datatypename`
`[3]`	`facets`	`::=`	`ordered? unordered?`	`[ Constraint: Appropriate facets ]`

The following is the definition for a possible built-in generated datatype "currency". This datatype definition would appear in the schema which defines datatypes for XML Schemas and shows that a generated datatype can have the same value space as its basetype, which might mean that it is just an "alias" or "renaming" of basetype. In this case, the specification would probably also define some "semantics" for currency which went beyond those of decimal.

Example

<datatype name="currency">   
   <basetype name="decimal"/>   
</datatype>

Constraint on Schemas: Unique datatype definitions
The name of the datatype being defined must be unique among the datatypes defined in the containing schema.

Constraint on Schemas: Appropriate facets
Ordered facets may appear in a datatype definition only if the value space of the basetype is ordered.

Datatype names

`[4]`	`datatypename`	`::=`	`builtinname \| usergenname`
`[5]`	`builtinname`	`::=`	`Name \| NCName \|`
			`ID \| IDREF \|IDREFS \|`
			`NMTOKEN \| NMTOKENS \|`
			`ENTITY \| ENTITIES \|`
			`string \| uri \|`
			`timeInstant \| timeDuration \| recurringInstant \|`
			`binary \|`
			`real \| decimal \|integer \|`
			`non-negative-integer \| positive-integer \|`
			`non-positive-integer \| negative-integer \|`
			`date \| time \| language`
`[6]`	`usergenname`	`::=`	`NCName schemaRef`	`[ Constraint: Datatype name ]`

NOTE: The datatypename production above is not to be confused with that labeled datatypeName in [XML Schema Part 1: Structures].

Constraint on Schemas: Datatype name
The name specified must be the name of a datatype defined in the schema in which the user-generated datatype is defined.

Facets

`[7]`	`ordered`	`::=`	`bounds? numeric? dateTime?`
`[8]`	`unordered`	`::=`	`pattern? enumeration? length? maxLength? encoding?`

Ordered facets

`[9]`	`bounds`	`::=`	`(minInclusive \| minExclusive)? (maxInclusive \| maxExclusive)?`
`[10]`	`maxInclusive`	`::=`	`literalValue`	`[ Constraint: Literal type ]`
`[11]`	`minInclusive`	`::=`	`literalValue`	`[ Constraint: Literal type ]`
`[12]`	`minExclusive`	`::=`	`literalValue`	`[ Constraint: Literal type ]`
`[13]`	`maxExclusive`	`::=`	`literalValue`	`[ Constraint: Literal type ]`

Constraint on Schemas: Literal type
The literal value given must be of the same type as the basetype given in the datatype definition in which this facet appears.

Numeric facets

`[14]`	`numeric`	`::=`	`(minAbsoluteValue maxAbsoluteValue)?`
			`precision? scale?`
`[15]`	`minAbsoluteValue`	`::=`	`realLiteral`	`[ Constraint: minMaxAbsoluteValue ]`
`[16]`	`maxAbsoluteValue`	`::=`	`realLiteral`	`[ Constraint: minMaxAbsoluteValue ]`
`[17]`	`precision`	`::=`	`integerLiteral`
`[18]`	`scale`	`::=`	`integerLiteral`

Constraint on Schemas: minMaxAbsoluteValue
In a generated subtype of real (§3.2.3), if a value is specified for the minAbsoluteValue facet a value must also be specified for the maxAbsoluteValue facet.
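
A checker for the numeric facets might look like the following Python sketch. It covers the minMaxAbsoluteValue constraint above and the requirement from decimal (§3.3.11) that scale be less than or equal to precision. Representing the facets as a dictionary and the function name are assumptions made for illustration.

```python
def numeric_facet_errors(facets: dict) -> list:
    """Return the violated numeric-facet constraints (empty if none)."""
    errors = []
    # minAbsoluteValue may only be given together with maxAbsoluteValue.
    if "minAbsoluteValue" in facets and "maxAbsoluteValue" not in facets:
        errors.append("minAbsoluteValue requires maxAbsoluteValue")
    # scale counts digits right of the decimal point, so it cannot exceed precision.
    if "precision" in facets and "scale" in facets:
        if facets["scale"] > facets["precision"]:
            errors.append("scale must not exceed precision")
    return errors
```

Note that the constraint is one-directional: maxAbsoluteValue on its own is not reported as an error.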

dateTime facets

`[19]`	`dateTime`	`::=`	`period?`
`[20]`	`period`	`::=`	`timeDurationLiteral`

The following examples define datatypes by specifying facet values which constrain the range of the basetype in a manner specific to the basetype (as opposed to specifying max/min values as before). The first three define floating point subtypes of real. The last, amount, is a user-generated datatype which could be used to represent monetary amounts in a financial management application that does not handle figures of $1M or more and allows only whole cents; such a definition would appear in a schema authored by an "end-user".

Example
<datatype name="ieee32">
   <basetype name="real"/>
   <minAbsoluteValue>
      1.40129846e-45
   </minAbsoluteValue>
   <maxAbsoluteValue>
      3.40282347e38
   </maxAbsoluteValue>
</datatype>
The above subtype of real represents an IEEE 32-bit floating point number. While the explanation is beyond the scope of this specification, the above minimum and maximum absolute values correspond to values which are representable with the IEEE 32-bit floating point format, which has 1 bit for sign, 8 bits of exponent and 23 bits of mantissa.

Example
<datatype name="ieee64">
   <basetype name="real"/>
   <minAbsoluteValue>
      4.94065645841246544e-324
   </minAbsoluteValue>
   <maxAbsoluteValue>
      1.79769313486231570e308
   </maxAbsoluteValue>
</datatype>
The above subtype of real represents an IEEE 64-bit floating point number. While the explanation is beyond the scope of this specification, the above minimum and maximum absolute values correspond to values which are representable with the IEEE 64-bit floating point format, which has 1 bit for sign, 11 bits of exponent and 52 bits of mantissa.
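
Although the derivation is out of scope here, the extreme magnitudes of the two IEEE formats can be recovered from their bit layouts. The Python sketch below assumes that the minimum absolute value is the smallest positive subnormal number and the maximum is the largest finite number; the helper names are hypothetical.

```python
import struct


def from_bits32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE single precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def from_bits64(bits: int) -> float:
    """Interpret a 64-bit pattern as an IEEE double precision value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Smallest positive value: all bits zero except the lowest mantissa bit.
MIN32 = from_bits32(0x00000001)          # 2**-149
MIN64 = from_bits64(0x0000000000000001)  # 2**-1074
# Largest finite value: maximum non-reserved exponent, mantissa all ones.
MAX32 = from_bits32(0x7F7FFFFF)          # (2 - 2**-23) * 2**127
MAX64 = from_bits64(0x7FEFFFFFFFFFFFFF)  # (2 - 2**-52) * 2**1023
```

These four constants are the magnitudes that the minAbsoluteValue and maxAbsoluteValue facets of the two definitions are intended to capture.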

Example
<datatype name="ibmhex32">
   <basetype name="real"/>
   <minAbsoluteValue>
      5.2e-85
   </minAbsoluteValue>
   <maxAbsoluteValue>
      7.2e75
   </maxAbsoluteValue>
</datatype>
The above subtype of real represents an IBM 32-bit hexadecimal floating point number. While the explanation is beyond the scope of this specification, the above minimum and maximum absolute values correspond to values which are representable with the IBM 32-bit hexadecimal floating point format, which has 1 bit for sign, 7 bits of exponent and 24 bits of mantissa.

Example
This type could just as well have been defined with the potential built-in generated type "currency" (defined above) as its basetype.
<datatype name="amount">   
   <basetype name="decimal"/>   
   <precision>   
      8   
   </precision>   
   <scale>   
      2   
   </scale>   
</datatype>
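
The range implied by a given precision and scale can be computed directly. The Python sketch below assumes the fixed point interpretation in which precision counts all digits and scale counts those after the decimal point; the function name is hypothetical.

```python
from decimal import Decimal


def decimal_range(precision: int, scale: int):
    """Return (smallest, largest, step) for a fixed point decimal."""
    step = Decimal(1).scaleb(-scale)                 # 10 ** -scale
    largest = (Decimal(10) ** precision - 1) * step  # all digits set to 9
    return -largest, largest, step
```

For precision 8 and scale 2 this gives a range of -999999.99 to 999999.99 in steps of 0.01, that is, whole cents below one million.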

Unordered facets

`[21]`	`length`	`::=`	`integerLiteral`
`[22]`	`maxLength`	`::=`	`integerLiteral`
`[23]`	`enumeration`	`::=`	`literal+`
`[24]`	`pattern`	`::=`	`regularExpression+`
`[25]`	`lexical`	`::=`	`LexicalSpec`	`[ Constraint: Lexical specification ]`
`[26]`	`encoding`	`::=`	`'hex' \| 'base64'`

Constraint: Lexical specification
The lexical specification must be of the "correct" kind, e.g., a dateTime lexical for datatypes generated from timeInstant (§3.2.4).

The following example is a datatype definition for a user-generated datatype which limits the possible literal values of dates to the four US holidays enumerated. This datatype definition would appear in a schema authored by an "end-user" and shows how to define a datatype by enumerating the values in its value space. The enumerated values must be type-valid literals for the basetype.

Example

<datatype name="holidays">   
   <basetype name="date"/>   
   <enumeration>   
      <literal>   
        --01-01    <!-- New Year's day -->   
      </literal>   
      <literal>   
        --07-04    <!-- 4th of July -->   
      </literal>   
      <literal>   
        --11-25    <!-- Thanksgiving -->   
      </literal>   
      <literal>   
        --12-25    <!-- Christmas -->   
      </literal>   
   </enumeration>   
</datatype>
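
An enumeration facet amounts to a membership test against the listed literals. The Python sketch below reads a definition in the preliminary concrete syntax with the standard xml.etree.ElementTree module and collects the literals, ignoring surrounding white space and comments. The function name and the abbreviated inline definition are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DEFINITION = """
<datatype name="holidays">
   <basetype name="date"/>
   <enumeration>
      <literal>--01-01</literal>
      <literal>
        --07-04    <!-- 4th of July -->
      </literal>
      <literal>--11-25</literal>
      <literal>--12-25</literal>
   </enumeration>
</datatype>
"""


def enumerated_literals(xml_text: str) -> set:
    """Collect the white-space-trimmed text of every literal element."""
    root = ET.fromstring(xml_text)
    return {"".join(lit.itertext()).strip() for lit in root.iter("literal")}


HOLIDAYS = enumerated_literals(DEFINITION)
```

A candidate value is then valid for the datatype only if its literal is a member of the resulting set, in addition to being a valid literal of the basetype.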

Literals

`[27]`	`literal`	`::=`	`literalValue`
`[28]`	`literalValue`	`::=`	`stringLiteral \| numericLiteral \| dateTimeLiteral \| uriLiteral \| languageLiteral`
`[29]`	`stringLiteral`	`::=`	`(see string (§3.2.1))`
`[30]`	`uriLiteral`	`::=`	`(see uri (§3.2.8))`
`[31]`	`languageLiteral`	`::=`	`(see language (§3.2.9))`

Numeric Literals

`[32]`	`numericLiteral`	`::=`	`realLiteral \| decimalLiteral \| integerLiteral`
`[33]`	`realLiteral`	`::=`	`(mantissa exponent?) \| NaN \| INF \| -INF`
`[34]`	`mantissa`	`::=`	`decimalLiteral`
`[35]`	`exponent`	`::=`	`('E' \| 'e') integerLiteral`
`[36]`	`decimalLiteral`	`::=`	`(integerLiteral ('.' digit*)?)`
`[37]`	`integerLiteral`	`::=`	`(('+' \| '-')? digit+)`
`[38]`	`digit`	`::=`	`'0' \| '1' \| '2' \| '3' \| '4' \|`
			`'5' \| '6' \| '7' \| '8' \| '9'`
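
Productions [33] through [38] translate almost directly into regular expressions. The following Python sketch builds them bottom-up and reports the most specific production that a string matches; the function name is hypothetical.

```python
import re

DIGIT = r"[0-9]"                           # [38]
INTEGER = rf"[+-]?{DIGIT}+"                # [37]
DECIMAL = rf"{INTEGER}(?:\.{DIGIT}*)?"     # [36]
EXPONENT = rf"[Ee]{INTEGER}"               # [35]
REAL = rf"(?:{DECIMAL}(?:{EXPONENT})?|NaN|INF|-INF)"  # [33], mantissa per [34]

_PRODUCTIONS = (
    ("integerLiteral", INTEGER),
    ("decimalLiteral", DECIMAL),
    ("realLiteral", REAL),
)


def classify(literal: str):
    """Return the most specific numeric production matched, or None."""
    for name, pattern in _PRODUCTIONS:
        if re.fullmatch(pattern, literal):
            return name
    return None
```

Because production [36] uses digit* after the period, a string such as "12." is a decimalLiteral, whereas ".5" matches none of the productions since an integerLiteral must come first.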

Date and Time Literals

`[39]`	`dateTimeLiteral`	`::=`	`timeInstantLiteral \| timeDurationLiteral \| recurringInstantLiteral \| dateLiteral \| timeLiteral`
`[40]`	`timeInstantLiteral`	`::=`	`dateLiteral 'T' timeLiteral`
`[41]`	`timeDurationLiteral`	`::=`	`dateLiteral 'T' timeLiteral`
`[42]`	`recurringInstantLiteral`	`::=`	`dateLiteral 'T' timeLiteral`
`[43]`	`dateLiteral`	`::=`	`CCYYMMDD`
`[44]`	`timeLiteral`	`::=`	`hhmmss.sss timeZoneOffset?`
`[45]`	`timeZoneOffset`	`::=`	`'Z' \| (('+' \| '-') hhmmss (.sss?))`

Issue (definition-overriding): In some cases it may be desirable to specify datatype constraints in instance documents rather than in a schema. Should this be allowed? If the document does not have a schema then, clearly, the only possibility of adding datatype constraints is in the document instance. Even if the document has a schema, the document instance may want to further restrict the content. For example, the schema may specify a value to be a string but the instance may want to impose a particular regex constraint on it. If we decide to allow datatype specification or specialization in instance documents, what syntax should be used? This needs to be coordinated with the structural schema editorial team.

Issue (non-positive-integer-literal): Do we need productions for the literals of non-negative-integer, positive-integer, non-positive-integer and negative-integer?

A Schema for Datatype Definitions (normative)

Ed. Note: This section (both its abstract content and its concrete wording) has not yet garnered consensus among WG members.

<?xml version='1.0'?>
<!-- XML Schema schema for XML Schemas: Part 2: Datatypes -->
<!-- Id: datatypes.xsd,v 1.7 1999/10/27 10:20:33 ht Exp  -->
<!DOCTYPE schema PUBLIC "-//W3C//DTD XMLSCHEMA 19991105//EN"
                 "http://www.w3.org/TR/1999/WD-xmlschema-1-19991105/structures.dtd">
<schema xmlns='http://www.w3.org/1999/XMLSchema'
        targetNS='http://www.w3.org/1999/XMLSchema'
        version='0.5'>

  <group name='facets' order='choice'>
        <element archRef='maxBound'/>
        <element archRef='minBound'/>
        <element ref='minAbsoluteValue'/>
        <element ref='maxAbsoluteValue'/>
        <element ref='precision'/>
        <element ref='scale'/>
        <element ref='length'/>
        <element ref='maxLength'/>
        <element ref='enumeration'/>
        <element ref='pattern'/>
        <element ref='encoding'/>
        <element ref='period'/>
  </group>

  <element name='datatype'>
     <archetype>
        <element ref='basetype'/>
        <group ref='facets' order='many'/>
        <attribute name='name' type='NMTOKEN' minOccurs='1'/>
        <attribute name='export' type='boolean' default='true'/>
     </archetype>
  </element>

  <element name='basetype'>
     <archetype content='empty'>
        <attribute name='name' type='NMTOKEN' minOccurs='1'/>
        <attribute name='schemaAbbrev' type='NMTOKEN'/>
        <attribute name='schemaName' type='uri'/>
     </archetype>
  </element>

  <!-- these are here to bridge between the content model above
       and the elements below -->
  <archetype name='minBound'/>
  <archetype name='maxBound'/>

  <!-- these can only be applied when the base type is 'real'
       and must be used in concert with one another -->
  <element name='minAbsoluteValue' type='real'/>
  <element name='maxAbsoluteValue' type='real'/>

  <!-- the true datatype of the four following depends on the basetype -->
  <element name='maxExclusive' type='string'>
     <archetype>
       <refines name='maxBound'/>
     </archetype>
  </element>
  <element name='maxInclusive' type='string'>
     <archetype>
       <refines name='maxBound'/>
     </archetype>
  </element>
  <element name='minExclusive' type='string'>
     <archetype>
       <refines name='minBound'/>
     </archetype>
  </element>
  <element name='minInclusive' type='string'>
     <archetype>
       <refines name='minBound'/>
     </archetype>
  </element>

  <element name='precision' type='integer'/>
  <element name='scale' type='integer'/>

  <element name='length' type='integer'/>
  <element name='maxLength' type='integer'/>

  <!-- the following datatype is used to limit the
       possible values for the encoding facet on
       the binary datatype -->
  <datatype name='encodings'>
     <basetype name='NMTOKEN'/>
     <enumeration>
        <literal>hex</literal>
        <literal>base64</literal>
     </enumeration>
  </datatype>
  <element name='encoding' type='encodings'/>

  <element name='period' type='timeDuration'/>

  <element name='enumeration'>
    <archetype>
       <element ref='literal' minOccurs='1' maxOccurs='*'/>
    </archetype>
  </element>
  <!-- the true datatype of the following depends on the basetype -->
  <element name='literal' type='string'/>

  <element name='pattern'>
     <archetype>
        <element ref='lexical' minOccurs='1' maxOccurs='*'/>
     </archetype>
  </element>
    <!-- the true datatype of the following depends on the basetype -->
  <element name='lexical' type='string'/>

<!-- built-in generated datatypes -->
<!-- only has a few for now, eventually needs to have all of them -->

  <datatype name='integer'>
    <basetype name='decimal'/>
    <scale>0</scale>
  </datatype>
	
  <datatype name='non-negative-integer'>
    <basetype name='integer'/>
    <minInclusive>0</minInclusive>
  </datatype>

  <datatype name='positive-integer'>
    <basetype name='non-negative-integer'/>
    <minInclusive>1</minInclusive>
  </datatype>

  <datatype name='non-positive-integer'>
    <basetype name='integer'/>
    <maxInclusive>0</maxInclusive>
  </datatype>

  <datatype name='negative-integer'>
    <basetype name='non-positive-integer'/>
    <maxInclusive>-1</maxInclusive>
  </datatype>

  <datatype name='date'>
    <basetype name='recurringInstant'/>
    <period>000000T2400</period>
  </datatype>

  <datatype name='time'>
    <basetype name='recurringInstant'/>
    <period>000000T2400</period>
  </datatype>
</schema>
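
The way the integer definitions in this schema build on one another can be illustrated by walking the basetype chain and combining the bounds found along the way. In the Python sketch below the definitions are transcribed by hand from the schema, decimal is treated as the unbounded root, and the names are hypothetical.

```python
import math

# name -> (basetype, facets), transcribed from the definitions in the schema.
DEFINITIONS = {
    "integer": ("decimal", {"scale": 0}),
    "non-negative-integer": ("integer", {"minInclusive": 0}),
    "positive-integer": ("non-negative-integer", {"minInclusive": 1}),
    "non-positive-integer": ("integer", {"maxInclusive": 0}),
    "negative-integer": ("non-positive-integer", {"maxInclusive": -1}),
}


def effective_bounds(name: str):
    """Return the tightest (minInclusive, maxInclusive) along the basetype chain."""
    low, high = -math.inf, math.inf
    while name in DEFINITIONS:
        base, facets = DEFINITIONS[name]
        low = max(low, facets.get("minInclusive", -math.inf))
        high = min(high, facets.get("maxInclusive", math.inf))
        name = base
    return low, high
```

Taking the tightest bound at each step reflects the idea that a generated datatype can only narrow the value space of its basetype.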

B DTD for Datatype Definitions (normative)

Ed. Note: This section (both its abstract content and its concrete wording) has not yet garnered consensus among WG members.

<!-- DTD for XML Schemas: Part 2: Datatypes -->
<!-- Id: datatypes.dtd,v 1.10 1999/10/27 10:21:21 ht Exp  -->
<!-- Note that the use of 'facets' below is less restrictive than is
     really intended:  There should in fact be no more than one of each of
     minInclusive, minExclusive, maxInclusive, maxExclusive,
     maxAbsoluteValue, minAbsoluteValue,
     precision, scale, pattern, enumeration,
     length, maxLength, encoding, period within datatype -->
<!ENTITY % minBound '(minInclusive | minExclusive)'>
<!ENTITY % maxBound '(maxInclusive | maxExclusive)'>
<!ENTITY % bounds '%minBound; | %maxBound;'>
<!ENTITY % numeric '(maxAbsoluteValue, minAbsoluteValue)? | precision | scale'>
<!ENTITY % ordered '%bounds; | %numeric;'>
<!ENTITY % unordered
   'pattern | enumeration | length | maxLength | encoding | period'>
<!ENTITY % facets '%ordered; | %unordered;'>
<!ELEMENT datatype (basetype, (%facets;)*)>
<!ATTLIST datatype
    name NMTOKEN #IMPLIED
    export (true|false) 'true'>
<!-- name is required at top level, export irrelevant when nested -->

<!ELEMENT basetype EMPTY>
<!ATTLIST basetype
	name NMTOKEN #REQUIRED
	schemaAbbrev NMTOKEN #IMPLIED
	schemaName CDATA #IMPLIED>

<!ELEMENT minAbsoluteValue (#PCDATA)>
<!ELEMENT maxAbsoluteValue (#PCDATA)>

<!ELEMENT maxExclusive (#PCDATA)>
<!ELEMENT minExclusive (#PCDATA)>
<!ELEMENT maxInclusive (#PCDATA)>
<!ELEMENT minInclusive (#PCDATA)>

<!ELEMENT precision (#PCDATA)>
<!ELEMENT scale (#PCDATA)>

<!ELEMENT length (#PCDATA)>
<!ELEMENT maxLength (#PCDATA)>
<!ELEMENT enumeration (literal)+>
<!ELEMENT literal (#PCDATA)>
<!ELEMENT pattern (lexical)+>
<!ELEMENT lexical (#PCDATA)>
<!ELEMENT encoding (#PCDATA)>
<!ELEMENT period (#PCDATA)>

C Datatypes and Facets

Ed. Note: This section (both its abstract content and its concrete wording) has not yet garnered consensus among WG members.

C.1 Fundamental Facets

The following table shows the values of the fundamental facets for each built-in datatype.

Ed. Note: (PVB 1999-07-09) Some entries in this table might conflict with what it says elsewhere in this draft, as creating this table pointed out to me some problems with the way some of the fundamental facets are defined (not to mention any transcription errors on my part in creating the table).
We obviously need more introductory text here explaining this table to the reader.

	Datatype	Order (§2.4.1.1)	Bounds (§2.4.1.2)	Cardinality (§2.4.1.3)	Exact and Approximate (§2.4.1.4)	Numeric (§2.4.1.5)
Primitive	string (§3.2.1)	yes	none	countably infinite	exact	no
	boolean (§3.2.2)	no	none	finite	exact	no
	real (§3.2.3)	yes	none	uncountably infinite	approximate	yes
	timeInstant (§3.2.4)	yes	no	uncountably infinite	approximate	no
	timeDuration (§3.2.5)	yes	no	uncountably infinite	approximate	no
	recurringInstant (§3.2.6)	yes	no	uncountably infinite	approximate	no
	binary (§3.2.7)	no	no	?	?	no
	uri (§3.2.8)	no	no	uncountably infinite	exact	no
	language (§3.2.9)	no	no	countably infinite	exact	no

Generated	NMTOKEN (§3.3.1)	no	none	countably infinite	exact	no
	NMTOKENS (§3.3.2)	no	no	countably infinite	exact	no
	Name (§3.3.3)	no	no	countably infinite	exact	no
	NCName (§3.3.4)	no	no	countably infinite	exact	no
	ID (§3.3.5)	no	no	countably infinite	exact	no
	IDREF (§3.3.6)	no	no	countably infinite	exact	no
	IDREFS (§3.3.7)	no	no	countably infinite	exact	no
	ENTITY (§3.3.8)	no	no	countably infinite	exact	no
	ENTITIES (§3.3.9)	no	no	countably infinite	exact	no
	NOTATION (§3.3.10)	no	no	countably infinite	exact	no
	decimal (§3.3.11)	yes	no	countably infinite	exact	yes
	integer (§3.3.12)	yes	no	countably infinite	exact	yes
	non-negative-integer (§3.3.13)	yes	yes	countably infinite	exact	yes
	positive-integer (§3.3.14)	yes	yes	countably infinite	exact	yes
	non-positive-integer (§3.3.15)	yes	yes	countably infinite	exact	yes
	negative-integer (§3.3.16)	yes	yes	countably infinite	exact	yes
	date (§3.3.17)	yes	no	countably infinite	exact	no
	time (§3.3.18)	yes	no	uncountably infinite	approximate	no

C.2 Constraining Facets

The constraining facets are listed below with all the primitive and generated datatypes that they apply to.

Ed. Note: Some entries in this table might conflict with what it says elsewhere in this draft, as creating this table pointed out to me some problems with the way some of the constraining facets and datatypes are defined (not to mention any transcription errors on my part in creating the table).

The constraining facet length (§2.4.2.1) applies to:

string (§3.2.1)
binary (§3.2.7)

The constraining facet maximum length (§2.4.2.2) applies to:

string (§3.2.1)
binary (§3.2.7)

The constraining facet pattern (§2.4.2.3) applies to:

string (§3.2.1)

The constraining facet enumeration (§2.4.2.4) applies to:

string (§3.2.1)
real (§3.2.3)
timeInstant (§3.2.4)
timeDuration (§3.2.5)
recurringInstant (§3.2.6)
language (§3.2.9)
NMTOKEN (§3.3.1)
Name (§3.3.3)
NCName (§3.3.4)
ENTITY (§3.3.8)
NOTATION (§3.3.10)
decimal (§3.3.11)
integer (§3.3.12)
non-negative-integer (§3.3.13)
positive-integer (§3.3.14)
non-positive-integer (§3.3.15)
negative-integer (§3.3.16)
date (§3.3.17)
time (§3.3.18)

The constraining facet maxInclusive (§2.4.2.7) applies to:

string (§3.2.1)
real (§3.2.3)
timeInstant (§3.2.4)
timeDuration (§3.2.5)
recurringInstant (§3.2.6)
decimal (§3.3.11)
integer (§3.3.12)
non-negative-integer (§3.3.13)
positive-integer (§3.3.14)
non-positive-integer (§3.3.15)
negative-integer (§3.3.16)
date (§3.3.17)
time (§3.3.18)

The constraining facet maxExclusive (§2.4.2.8) applies to:

string (§3.2.1)
real (§3.2.3)
timeInstant (§3.2.4)
timeDuration (§3.2.5)
recurringInstant (§3.2.6)
decimal (§3.3.11)
integer (§3.3.12)
non-negative-integer (§3.3.13)
positive-integer (§3.3.14)
non-positive-integer (§3.3.15)
negative-integer (§3.3.16)
date (§3.3.17)
time (§3.3.18)

The constraining facet minInclusive (§2.4.2.9) applies to:

string (§3.2.1)
real (§3.2.3)
timeInstant (§3.2.4)
timeDuration (§3.2.5)
recurringInstant (§3.2.6)
decimal (§3.3.11)
integer (§3.3.12)
non-negative-integer (§3.3.13)
positive-integer (§3.3.14)
non-positive-integer (§3.3.15)
negative-integer (§3.3.16)
date (§3.3.17)
time (§3.3.18)

The constraining facet minExclusive (§2.4.2.10) applies to:

string (§3.2.1)
real (§3.2.3)
timeInstant (§3.2.4)
timeDuration (§3.2.5)
recurringInstant (§3.2.6)
decimal (§3.3.11)
integer (§3.3.12)
non-negative-integer (§3.3.13)
positive-integer (§3.3.14)
non-positive-integer (§3.3.15)
negative-integer (§3.3.16)
date (§3.3.17)
time (§3.3.18)

The constraining facet precision (§2.4.2.11) applies to:

decimal (§3.3.11)

The constraining facet scale (§2.4.2.12) applies to:

decimal (§3.3.11)

The constraining facet encoding (§2.4.2.13) applies to:

binary (§3.2.7)

The constraining facet period (§2.4.2.14) applies to:

recurringInstant (§3.2.6)

D ISO 8601 Date and Time Formats

Ed. Note: This section (both its abstract content and its concrete wording) has not yet garnered consensus among WG members.

D.1 ISO 8601 Conventions

Three primitive datatypes described above, timeInstant (§3.2.4), timeDuration (§3.2.5), and recurringInstant (§3.2.6), and two generated datatypes, date (§3.3.17) and time (§3.3.18), use lexical formats inspired by [ISO 8601]. This appendix provides more detail on the ISO formats and discusses some deviations from them for the datatypes we have defined.

[ISO 8601] "specifies the representation of dates in the Gregorian calendar and times and representations of periods of time". It should be pointed out that the datatypes described in this specification do not cover all the types of data covered by [ISO 8601], nor do they support all the lexical representations for those types of data. Specifically, we permit only a single lexical representation for each datatype.

[ISO 8601] lexical formats are described using "pictures" in which characters are used in place of digits. These characters have the following meanings:

C -- represents a digit used in the thousands and hundreds components, the "century" component, of the time element "year".
Y -- represents a digit used in the tens and units components of the time element "year".
M -- represents a digit used in the time element "month".
D -- represents a digit used in the time element "day".
h -- represents a digit used in the time element "hour".
m -- represents a digit used in the time element "minute".
s -- represents a digit used in the time element "second". In the formats described in this specification, the whole number of seconds may be followed by decimal seconds to an arbitrary level of precision. This is represented in the picture by "ss.sss".

For all the information items indicated by the above characters, leading zeros are required where indicated.

In addition to the above, certain characters are used as designators and appear as themselves in lexical formats.

T -- is used as the time designator to indicate the start of the representation of the time of day in timeInstant (§3.2.4) and recurringInstant (§3.2.6). It is also used to indicate the start of the representation of the time units for hours, minutes, seconds and fractional seconds in timeDuration (§3.2.5).
Z -- is used as the time-zone designator, immediately (without a space) following a data element expressing the time of day in Coordinated Universal Time (UTC) in timeInstant (§3.2.4), recurringInstant (§3.2.6), and time (§3.3.18).
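
As a rough illustration of the picture notation, the following Python sketch checks a literal against the picture CCYY-MM-DDThh:mm:ss.sss with an optional trailing Z. The normative lexical representation of timeInstant is given in §3.2.4, not here; the picture chosen, the regular expression, and the helper name parse_instant are all assumptions made for this example, and the optional sign described in D.3.2 is not handled.

```python
import re

# Assumed picture: CCYY-MM-DDThh:mm:ss.sss, optionally followed by the Z designator.
# The group names mirror the picture characters; this helper is hypothetical.
INSTANT = re.compile(
    r"(?P<CC>\d{2})(?P<YY>\d{2})-(?P<MM>\d{2})-(?P<DD>\d{2})"
    r"T(?P<hh>\d{2}):(?P<mm>\d{2}):(?P<ss>\d{2}(?:\.\d+)?)"
    r"(?P<Z>Z)?"
)

def parse_instant(literal):
    """Return the picture components as a dict, or None if the literal does not fit."""
    m = INSTANT.fullmatch(literal)
    if m is None:
        return None
    parts = m.groupdict()
    # Replace the raw designator with a flag saying whether the time is in UTC.
    parts["utc"] = parts.pop("Z") is not None
    return parts
```

Under these assumptions, parse_instant("1999-11-05T13:20:00.5Z") yields the century "19", the decimal seconds "00.5", and a true UTC flag, while a literal with the century omitted, such as "99-11-05T13:20:00", is rejected.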

D.2 Truncated Formats

[ISO 8601] supports a variety of "truncated" formats in which some of the leftmost characters of a specific format, for example the century, can be omitted. Truncated formats are, in general, not permitted for the datatypes defined in this specification, with two exceptions. The recurringInstant (§3.2.6) datatype uses a truncated format for timeInstant (§3.2.4) to indicate recurring instants of time. In fact, only recurring instants that can be represented by truncated representations of timeInstant (§3.2.4) are permitted.

Truncated representations are also allowed for the date (§3.3.17) datatype and can be used to represent recurring dates, i.e., the same date every century, every year, or every month.

D.3 Deviations from ISO 8601 Formats

D.3.1 Decimal seconds

As mentioned above, we have extended the [ISO 8601] formats to allow decimal seconds to as many decimal places as required.

D.3.2 Sign Allowed

An optional sign, + or -, is allowed immediately preceding, without a space, the lexical representations for timeInstant (§3.2.4) and timeDuration (§3.2.5).

D.3.3 Lexical Representation for Time Period

The lexical representation for timeDuration (§3.2.5) follows the lexical representation for timeInstant (§3.2.4) rather than the [ISO 8601] representation for "periods of time". In this representation a month represents a period of 30 days.
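
The 30-day month can be made concrete with a small arithmetic sketch in Python. Only the rule that one month counts as 30 days is taken from the text above; the function name duration_seconds and its parameters are hypothetical, and years are deliberately left out because this appendix does not state how long a year is taken to be.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def duration_seconds(months=0, days=0, hours=0, minutes=0, seconds=0.0):
    """Convert duration components to seconds, counting each month as 30 days."""
    total_days = months * 30 + days
    return total_days * SECONDS_PER_DAY + hours * 3600 + minutes * 60 + seconds
```

For example, a duration of one month, two days and three hours comes to 32 days plus 3 hours, that is 32 × 86400 + 10800 = 2775600 seconds.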

E Regular Expressions

Ed. Note: This section (both its abstract content and its concrete wording) has not yet garnered consensus among WG members.

Ed. Note: The following description of regular expressions is copied (with slight modification) by permission from the documentation of the [Perl] programming language. This entire section should probably be replaced by something derived from the Unicode Regex TechReport [Unicode Regular Expression Guidelines] and the ECMAScript Regex proposal [ECMAScript Regex].

Issue (perl-regex): Should the final recommendation use Perl's regular expression "extensions"?

[Definition:] Regular expressions, similar to those in [Perl], can be used to constrain the format of strings. A regular expression is an alphanumeric string consisting of character symbols. Each symbol, which is usually one character but may be two characters, is a placeholder that stands for a set of characters.

Any single character matches itself, unless it is a metacharacter with a special meaning described in this appendix. You can cause characters that normally function as metacharacters to be interpreted literally by prefixing them with a "\" (e.g., "\." matches a ".", not any character; "\\" matches a "\"). A series of characters matches that series of characters in the target string, so the pattern blurfl would match "blurfl" in the target string.
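
The escaping rule can be tried out with Python's re module, whose syntax follows Perl closely for these constructs. This is only an illustration: Python is not the notation defined by this appendix, and the variable names are made up for the example.

```python
import re

# An unescaped "." is a metacharacter and matches any character except newline.
dot_matches_b = re.fullmatch(r"a.c", "abc") is not None

# "\." matches only a literal ".".
escaped_dot_matches_b = re.fullmatch(r"a\.c", "abc") is not None
escaped_dot_matches_dot = re.fullmatch(r"a\.c", "a.c") is not None

# "\\" matches a single backslash character.
backslash_matches = re.fullmatch(r"\\", "\\") is not None

# A plain series of characters matches that series inside the target string.
blurfl_found = re.search(r"blurfl", "xx blurfl yy") is not None
```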

You can specify a character class, by enclosing a list of characters in [], which will match any one character from the list. If the first character after the "[" is "^", the class matches any character not in the list. Within a list, the "-" character is used to specify a range, so that a-z represents all characters between "a" and "z", inclusive. If you want "-" itself to be a member of a class, put it at the start or end of the list, or escape it with a backslash. (The following all specify the same class of three characters: [-az], [az-], and [a\-z]. All are different from [a-z], which specifies a class containing twenty-six characters.)
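
The claim about the placement of "-" can be checked with a short Python sketch. Python's re module is used here as a stand-in for the Perl-like syntax described above; the helper name members and the small test alphabet are assumptions of the example.

```python
import re

def members(pattern, alphabet="-abcxyz"):
    """Return the characters of `alphabet` that the single-character class matches."""
    return {ch for ch in alphabet if re.fullmatch(pattern, ch)}

# Three spellings of the same three-character class.
three_char_classes = [members(r"[-az]"), members(r"[az-]"), members(r"[a\-z]")]

# A range, and its negation, over the same alphabet.
range_class = members(r"[a-z]")
negated_class = members(r"[^a-z]")
```

Over this alphabet the three spellings each select "-", "a" and "z", the range [a-z] selects every letter but not "-", and the negated class [^a-z] selects only "-".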

Certain characters are used as metacharacters. The following list contains all of the metacharacters and their meanings.

\: Quote the next metacharacter
^: Match the beginning of the line
.: Match any character (except newline)
$: Match the end of the line (or before newline at the end)
|: Alternation
(): Grouping
[]: Character class

Within a regular expression, the following standard quantifiers are recognized:

*: Match 0 or more times
+: Match 1 or more times
?: Match 1 or 0 times
{n}: Match exactly n times
{n,}: Match at least n times
{n,m}: Match at least n but not more than m times
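
A brief Python sketch of the quantifiers follows. It assumes that a pattern must match the whole string (re.fullmatch), which this appendix does not itself state; the helper name matches and the sample strings are illustrative.

```python
import re

def matches(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

quantifier_results = {
    "ab*":     [matches(r"ab*", s) for s in ("a", "ab", "abbb")],
    "ab+":     [matches(r"ab+", s) for s in ("a", "ab", "abbb")],
    "ab?":     [matches(r"ab?", s) for s in ("a", "ab", "abb")],
    "b{2}":    [matches(r"b{2}", s) for s in ("b", "bb", "bbb")],
    "b{2,}":   [matches(r"b{2,}", s) for s in ("b", "bb", "bbb")],
    "b{2,3}":  [matches(r"b{2,3}", s) for s in ("b", "bb", "bbb", "bbbb")],
}
```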

The following character sequences also have special meaning within a regular expression.

\t: tab
\n: newline
\r: return
\033: octal char 033
\x1B: hex char 1B
\w: Match a "word" character (alphanumeric plus "_")
\W: Match a non-word character
\s: Match a whitespace character
\S: Match a non-whitespace character
\d: Match a digit character
\D: Match a non-digit character

Ed. Note: we should probably define XML-specific character sequences for things like Nmtoken, Name, etc., as well as ones for the character classes listed in XML 1.0 Appendix B. Character Classes

Regular expressions may also contain the following zero-width assertions:

\b: Match a word boundary
\B: Match a non-(word boundary)

A word boundary (\b) is defined as a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.
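
The definition of a word boundary can be illustrated in Python, whose \b and \B behave as described for these inputs. The sample strings and variable names are assumptions of the example.

```python
import re

text = "cat concat cat's category"

# \bcat\b finds "cat" only where both sides are word boundaries:
# the first word, and the "cat" before the apostrophe.
whole_word_starts = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat finds "cat" only where it is preceded by a word character.
inside_word_starts = [m.start() for m in re.finditer(r"\Bcat", "cat concat")]

# The imaginary characters off both ends of the string count as \W,
# so a boundary exists at the start and end of "ab".
ends_are_boundaries = re.fullmatch(r"\bab\b", "ab") is not None
```

Here "concat" and "category" are skipped by \bcat\b because a \w character sits on both sides of one end of the candidate match.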

Example

   
  555-1212     is matched by \d{3}-\d{4}           (phone number)   
  888-555-1212 is matched by (\d{3}-)?\d{3}-\d{4}  (phone number with optional area code)   
  $123,45.90   is matched by \$\d{3},\d{2}\.\d{2}   
  123-45-5678  is matched by \d{3}-?\d{2}-?\d{4}   (Social Security Number)
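
The four examples above can be checked mechanically. The sketch below uses Python's re module as an approximation of the Perl-like syntax and assumes whole-string matching; the names EXAMPLES and example_results are illustrative.

```python
import re

# (target string, pattern) pairs copied from the example above.
EXAMPLES = [
    ("555-1212", r"\d{3}-\d{4}"),
    ("888-555-1212", r"(\d{3}-)?\d{3}-\d{4}"),
    ("$123,45.90", r"\$\d{3},\d{2}\.\d{2}"),
    ("123-45-5678", r"\d{3}-?\d{2}-?\d{4}"),
]

example_results = [re.fullmatch(p, s) is not None for s, p in EXAMPLES]

# The optional parts behave as described: the area code and the hyphens may be absent.
no_area_code = re.fullmatch(r"(\d{3}-)?\d{3}-\d{4}", "555-1212") is not None
no_hyphens = re.fullmatch(r"\d{3}-?\d{2}-?\d{4}", "123455678") is not None
```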

F References

Ed. Note: This section (both its abstract content and its concrete wording) has not yet garnered consensus among WG members.

F.1 Normative

ECMAScript Regex: ECMAScript v2 Draft. Regular Expressions. See http://www2.hursley.ibm.com/tc39/regexp30.pdf
ISO 10646: ISO (International Organization for Standardization). ISO/IEC 10646-1993 (E). Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane. [Geneva]: International Organization for Standardization, 1993 (plus amendments AM 1 through AM 7).
Namespaces in XML: Namespaces in XML, Tim Bray et al. W3C, 1998 Available at: http://www.w3.org/TR/REC-xml-names/
Perl: The Perl Programming Language. See http://www.perl.org
RFC 1766: H. Alvestrand, ed. RFC 1766: Tags for the Identification of Languages. 1995. Available at: http://www.ietf.org/rfc/rfc1766.txt
RFC 2396: Tim Berners-Lee, et al. RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax. 1998. Available at: http://www.ietf.org/rfc/rfc2396.txt
SQL: SQL Standard. See http://www.jcc.com/SQLPages/jccs_sql.htm
Unicode: The Unicode Consortium. The Unicode Standard, Version 2.0. Reading, Mass.: Addison-Wesley Developers Press, 1996.
Unicode Regular Expression Guidelines: Mark Davis. Unicode Regular Expression Guidelines, 1988. Available at: http://www.unicode.org/unicode/reports/tr18/
XML: XML Standard. See http://www.w3.org/TR/REC-xml
XML Schema Part 1: Structures: XML Schema Part 1: Structures. Available at: http://www.w3.org/TR/1999/WD-xmlschema-1-19991105/structures.html
XML Schema Requirements: XML Schema Requirements. Available at: http://www.w3.org/TR/NOTE-xml-schema-req

F.2 Non-normative

ISO 11404: Language-independent Datatypes. Available from http://www.iso.ch/cate/d19346.html
ISO 8601: Representations of dates and times. Available from http://www.iso.ch/markete/8601.pdf A draft revision is also available from http://www.cl.cam.ac.uk/~mgk25/8601v04.pdf
RDF Schema: RDF Schema Specification. See http://www.w3.org/TR/PR-rdf-schema
XSL: XSL Working Draft. See http://www.w3.org/TR/WD-xsl/

G Acknowledgments (non-normative)

The following have contributed material to this draft:

Andrew Layman, Microsoft
David Fallside, IBM
Scott Lawrence, Agranat Systems

The editors acknowledge the members of the XML Schema Working Group, the members of other W3C Working Groups, and industry experts in other forums who have contributed directly or indirectly to the process or content of creating this document. The Working Group is particularly grateful to Lotus Development Corp. and IBM for providing teleconferencing facilities.

The current members of the XML Schema Working Group are:

Paula Angerstein, Vignette Corporation; David Beech, Oracle Corp.; Paul V. Biron, Health Level Seven; Allen Brown, Microsoft; Greg Bumgardner, Rogue Wave Software; Lee Buck, Extensibility; Dean Burson, Lotus Development Corporation; Peter Chen, Bootstrap Alliance and LSU; David Cleary, Progress Software; Dan Connolly, W3C (staff contact); Andrew Eisenberg, Progress Software; Rob Ellman, Calico Technology; David Ezell, Hewlett Packard Company; David Fallside, IBM; Matthew Fuchs, Commerce One; Paul Grosso, ArborText, Inc.; Dave Hollander, CommerceNet (co-chair); Mary Holstege, Calico Technology; Jane Hunter, Distributed Systems Technology Centre (DSTC Pty Ltd); Renato Iannella, Distributed Systems Technology Centre (DSTC Pty Ltd); Rick Jelliffe, Academia Sinica; Dianne Kennedy, Graphic Communications Association; Setrag Khoshafian, Technology Deployment International (TDI); Janet Koenig, Sun Microsystems; Ara Kullukian, Technology Deployment International (TDI); Andrew Layman, Microsoft; Dmitry Lenkov, Hewlett Packard Company; Eve Maler, ArborText, Inc.; Ashok Malhotra, IBM; Murray Maloney, Commerce One; John McCarthy, Lawrence Berkeley National Laboratory; Noah Mendelsohn, Lotus Development Corporation; Don Mullen, Extensibility; Murata Makoto, Xerox; Frank Olken, Lawrence Berkeley National Laboratory; Dave Peterson, Graphic Communications Association; Mark Reinhold, Sun Microsystems; Shriram Revankar, Xerox; Jonathan Robie, Software AG; Lew Shannon, NCR; C. M. Sperberg-McQueen, W3C (co-chair); Henry S. Thompson, University of Edinburgh; Matt Timmermans, Microstar; Jim Trezzo, Oracle Corp.; Steph Tryphonas, Microstar; Mark Tucker, Health Level Seven; Priscilla Walmsley, XMLSolutions; Aki Yoshida, SAP AG

The XML Schema Working Group has benefited in its work from the participation and contributions of a number of people not currently members of the Working Group, including in particular those named below. Affiliations given are those current at the time of their work with the WG.

Gabe Beged-Dov, Rogue Wave Software; George Feinberg, Object Design; Charles Frankston, Microsoft; Ernesto Guerrieri, Inso; Michael Hyman, Microsoft; Chris Olds, Wall Data; William Shea, Merrill Lynch; Ralph Swick, W3C; Tony Stewart, Rivcom

I Revisions from Previous Draft

1999-10-20: AM: Rewrote "NOTATION".
1999-10-20: AM: Made NMTOKEN a subtype of string.
1999-10-20: AM: Changed lex reps for all date and time datatypes to ISO extended format i.e. with separators.
1999-10-20: AM: Removed issue on non-Gregorian dates.
1999-10-20: AM: Renamed "lexical representation" facet for string to "pattern".
1999-10-26: AM: Added appendix discussing ISO 8601 formats. Removed note asking for such explanation.
1999-10-26: PVB: fixed errors in datatypes.xsd and datatypes.dtd as pointed out by Curt Arnold
1999-10-26: PVB: added period to the facets production
1999-10-26: PVB: added a note on the basetype to the definition of datatype NMTOKEN
1999-10-26: PVB: removed NaN, INF and -INF from the lexical space of integer and decimal

H Open Issues

application-specific-binary-formats
binary-mime-type
binary-value-space
uri-scheme-facet
better-reference-mechanisms
definition-overriding
non-positive-integer-literal

XML Schema Part 2: Datatypes

W3C Working Draft 05 November 1999

Abstract

Status of this document

Table of contents

Appendices

1 Introduction

1.1 Purpose

1.2 Requirements

1.3 Scope

1.4 Terminology

2 Type System

2.1 Datatype

2.2 Value space

2.3 Lexical Space

2.4 Facets

2.4.1 Fundamental facets

2.4.1.1 Order

2.4.1.2 Bounds

2.4.1.3 Cardinality

2.4.1.4 Exact and Approximate

2.4.1.5 Numeric

2.4.2 Constraining or Non-fundamental facets

2.4.2.1 length

2.4.2.2 maximum length

2.4.2.3 pattern

2.4.2.4 enumeration

2.4.2.5 minAbsoluteValue

2.4.2.6 maxAbsoluteValue

2.4.2.7 maxInclusive

2.4.2.8 maxExclusive

2.4.2.9 minInclusive

2.4.2.10 minExclusive

2.4.2.11 precision

2.4.2.12 scale

2.4.2.13 encoding

2.4.2.14 period

2.5 Datatype dichotomies

2.5.1 Atomic vs. aggregate datatypes

2.5.2 Primitive vs. generated datatypes

2.5.2.1 Base type

2.5.3 Built-in vs. user-generated datatypes

3 Built-in datatypes

3.1 Namespace considerations

3.2 Primitive datatypes

3.2.1 string

3.2.1.1 Pattern

3.2.1.2 Length

3.2.1.3 Maximum Length

3.2.1.4 Maximum and Minimum Values

3.2.2 boolean

3.2.2.1 Lexical Representation

3.2.3 real

3.2.3.1 Lexical representation

3.2.4 timeInstant

3.2.4.1 Lexical Representation

3.2.5 timeDuration

3.2.5.1 Lexical Representation

3.2.6 recurringInstant

3.2.6.1 Lexical Representation

3.2.7 binary

3.2.8 uri

3.2.9 language

3.3 Generated datatypes

3.3.1 NMTOKEN

3.3.2 NMTOKENS

3.3.3 Name

3.3.4 NCName

3.3.5 ID

3.3.6 IDREF

3.3.7 IDREFS

3.3.8 ENTITY

3.3.9 ENTITIES

3.3.10 NOTATION

3.3.11 decimal

3.3.11.1 Lexical representation

3.3.12 integer

3.3.12.1 Lexical representation

3.3.13 non-negative-integer

3.3.13.1 Lexical representation