This is a
The text of this draft is essentially that which appeared in the Last Call version of this specification published 17 February 2006. The WG has not approved any changes since that publication; new versions of the status-quo documents have been generated primarily for technical reasons internal to the editorial document-production system.
For those primarily interested in the changes since version 1.0,
the
The major changes since version 1.0 include:
Support for XML 1.1 has been added. It is now implementation-defined
whether datatypes dependent on definitions in
A new primitive decimal type has been defined, which retains information about the precision of the value. This type is aligned with the floating-point decimal types which will be part of the next edition of IEEE 754.
In order to align this specification with those being prepared by
the XSL and XML Query Working Groups, a new datatype named
The conceptual model of the date- and time-related types has been defined more formally.
A more formal treatment of the fundamental facets of the primitive datatypes has been adopted.
More formal definitions of the lexical space of most types have
been provided, with detailed descriptions of the mappings from lexical
representation to value and from value to
Changes since the previous Working Draft include the following:
Explicit definitions are provided for the
The validation rule
The rules governing partial implementations of infinite datatypes have been clarified.
Various changes have been made in order to align the relevant
parts of this specification more closely with the corresponding
sections of
In order to correct an error in
version 1 of this specification and of
An error in the
prose descriptions of the lexical spaces of
The schema for schema documents found in
Comments on this document should be made in
W3C's public installation of Bugzilla, specifying "XML Schema" as the
product. Instructions can be found at
The end of the Last Call review period is 31 March 2006; comments received after that date will be considered if time allows, but no guarantees can be offered.
Although feedback based on any
aspect of this specification is welcome, there are certain aspects of
the design presented herein for which the Working Group is
particularly interested in feedback. These are designated
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced by the
This document was produced under the 5 February 2004 W3C Patent
Policy. The Working Group maintains a
The English version of this specification is the only normative
version. Information about translations of this document is available
at
Whether an SQL-implementation supports leap seconds, and the consequences of such support for date and interval arithmetic, is implementation-defined. Short and sweet.
XMLSchemadatatypes
.
Wording accepted without change 20060113. The Working Group has two main goals for this version of W3C XML Schema:
Significant improvements in simplicity of design and clarity
of exposition
Provision of support for versioning of XML languages defined using the XML Schema specification, including the XML transfer syntax for schemas itself.
These goals are slightly in tension with one another. The following summarizes the Working Group's strategic guidelines for changes between versions 1.0 and 1.1:
Add support for versioning (acknowledging that this
Allow bug fixes (unless in specific cases we decide that the fix is too disruptive for a point release)
Allow editorial changes
Allow design cleanup to change behavior in edge cases
Allow relatively non-disruptive changes to type hierarchy (to better support current and forthcoming international standards and W3C recommendations)
Allow design cleanup to change component structure (changes to functionality restricted to edge cases)
Do not allow any significant changes in functionality
Do not allow any changes to XML transfer syntax except those required by version control hooks and bug fixes
The overall aim as regards compatibility is that
All schema documents conformant to version 1.0 of this specification should also conform to version 1.1, and should have the same validation behavior across 1.0 and 1.1 implementations (except possibly in edge cases and in the details of the resulting PSVI);
The vast majority of schema documents conformant to version 1.1 of this specification should also conform to version 1.0, leaving aside any incompatibilities arising from support for versioning, and when they are conformant to version 1.0 (or are made conformant by the removal of versioning information), should have the same validation behavior across 1.0 and 1.1 implementations (again except possibly in edge cases and in the details of the resulting PSVI);
The
The table below offers two typical examples of XML instances in which datatypes are implicit: the instance on the left represents a billing invoice, the instance on the right a memo or perhaps an email message in XML.
Data oriented  Document oriented 



The invoice contains several dates and telephone numbers, the postal abbreviation for a state (which comes from an enumerated list of sanctioned values), and a ZIP code (which takes a definable regular form). The memo contains many of the same types of information: a date, telephone number, email address and an "importance" value (from an enumerated list, such as "low", "medium" or "high"). Applications which process invoices and memos need to raise exceptions if something that was supposed to be a date or telephone number does not conform to the rules for valid dates or telephone numbers.
In both cases, validity constraints exist on the content of the instances that are not expressible in XML DTDs. The limited datatyping facilities in XML have prevented validating XML processors from supplying the rigorous type checking required in these situations. The result has been that individual applications writers have had to implement type checking in an ad hoc manner. This specification addresses the need of both document authors and applications writers for a robust, extensible datatype system for XML which could be incorporated into XML processors. As discussed below, these datatypes could be used in other XML-related standards as well.
Other specifications on which this one depends
are listed in
This specification defines some datatypes which depend on
definitions in
Conforming implementations of this specification may provide either
the 1.1-based datatypes or the 1.0-based datatypes, or both. If both
are supported, the choice of which datatypes to use in a particular
assessment episode
When this specification is used to check the datatype validity of XML
input, implementations
The
provide for primitive data typing, including byte, date, integer, sequence, SQL and Java primitive datatypes, etc.;
define a type system that is adequate for import/export from database systems (e.g., relational, object, OLAP);
distinguish requirements relating to lexical data representation vs. those governing an underlying information set;
allow creation of user-defined datatypes, such as datatypes that are derived from existing datatypes and which may constrain certain of its properties (e.g., range, precision, length, format).
This portion of the XML Schema Language discusses datatypes that can
be used in an XML Schema. These datatypes can be specified for
element content that would be specified as
The terminology used to describe XML Schema Datatypes is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a datatype processor:
A feature of this specification included solely to ensure that schemas
which use this feature remain compatible with
Conforming documents and processors are permitted to but need not behave as described.
(Of strings or names:) Two strings or names being compared must be
identical. Characters with multiple possible representations in
ISO/IEC 10646 (e.g. characters with both precomposed and
base+diacritic forms) match only if they have the same representation
in both strings. No case folding is performed. (Of strings and rules
in the grammar:) A string matches a grammatical production if
Conforming documents and processors are required to behave as
described; otherwise they are in
A violation of the rules of this specification; results are undefined.
Conforming software
This specification provides three different kinds of normative statements about schema components, their representations in XML and their contribution to the schema-validation of information items:
Constraints on the schema components themselves, i.e. conditions
components
Constraints on the representation of schema components in XML.
Some but not all of these are expressed in
Constraints expressed by schema components which information items
This section describes the conceptual framework behind the
The datatypes discussed in this specification are
Only those operations and relations needed for schema processing
are defined in this specification. Applications using these datatypes
are generally expected to implement appropriate additional functions
and/or relations to make the datatype generally useful. For
example, the description herein of the
A
A
A small collection of
This specification only defines the operations and relations needed
for schema processing. The choice of terminology for
describing/naming the datatypes is selected to guide users and
implementers in how to expand the datatype to be generally
useful—i.e., how to recognize the real world
datatypes and their variants for which the datatypes defined herein
are meant to be used for data interchange.
Along with the
Where
The
The value spaces of datatypes are abstractions,
and are defined in
In addition, other applications are expected to define additional
appropriate operations and/or relations on these value spaces (e.g.,
addition and multiplication on the various numerical datatypes'
value spaces), and are permitted where appropriate to even redefine
the operations and relations defined within this specification,
provided that
The
defined
enumerated outright
defined by restricting the
defined as a combination of values from one or more already
defined
The relations of
The identity relation is always defined. Every value space
inherently has an identity relation. Two things are
This does not preclude implementing datatypes by using more than
one
In the identity relation defined herein, values from different
Each
On the other hand, equality need not cover the entire value space
of the datatype (though it usually does).
The equality relation is used in conjunction with order when making
In the prior version of this specification (1.0), equality was
always identity. This has been changed to permit the datatypes
defined herein to more closely match the
For example, the
For another example, the
In the equality relation defined herein, values from different
primitive data spaces are made artificially unequal even if they might
otherwise be considered equal. For example, there is a number
For the purposes of this specification, there is one equality
relation for all values of all datatypes (the union of the various
datatypes' individual equalities, if one considers relations to be
sets of ordered pairs). The
Each datatype has an order relation prescribed. This order may be
a
In this specification, this less-than order relation is denoted by
The weak order
The value spaces of primitive datatypes are abstractions, which may
have values in common. In the order relation defined herein,
these value spaces are made artificially
While it is not an error to attempt to compare values from the
value spaces of two different primitive datatypes, they will always be
In addition to its
For example, "100" and "1.0E2" are two different literals from the
The literals in the
The number of literals for each value has been kept small; for many datatypes there is a one-to-one mapping between literals and values. This makes it easy to exchange the values between different systems. In many cases, conversion from locale-dependent representations will be required on both the originator and the recipient side, both for computer processing and for interaction with humans.
Textual, rather than binary, literals are used. This makes hand editing, debugging, and similar activities possible.
Where possible, literals correspond to those found in common programming languages and libraries.
While the datatypes defined in this specification have, for the most
part, a single lexical representation i.e. each value in the
datatype's
Should a derivation be made using a derivation mechanism that
removes
This could happen by means of a
Conversely, should a derivation remove values then their
There are currently no facets with such an impact. There may be in the future.
For example, '100' and '1.0E2' are two
different
While the datatypes defined in this specification generally have a
single
The facets of a datatype serve to distinguish those aspects of
one datatype which
Facets are of two types:
All
Constraining the
All
It is useful to categorize the datatypes defined in this
specification along various dimensions,
The first distinction to be made is that
between
First, we distinguish
For example, a single token which
Several type systems (such as the one described in
A
In the above example, the value of the
When a datatype is derived
For each of
For
The
The
A prototypical example of a
Any number (greater than
The order in which the
For example, given the definition below, the first instance of the <size> element
validates correctly as an
The
The
A datatype which is
Next, we distinguish between
Next, we distinguish
For example, in this specification,
The datatypes defined
by this specification fall into
In the example above,
A datatype which is
As described in more detail in
A
One datatype can be
Definition, derivation, restriction, and construction are conceptually distinct, although in practice they are frequently performed by the same mechanisms.
By
The properties of the
For all other datatypes, a
By
More generally,
B is the
There is some datatype X
such that X is the
It is a consequence of these definitions that
every datatype other than
Since each datatype has exactly one
By
Formally,
the
the
Note that all three forms of datatype
By
,
, and
, respectively.
Datatypes so constructed may be understood fully (for
purposes of a type system) in terms of (a) the properties
of the datatype(s) from which they are constructed, and
(b) their
Conceptually there is no difference between the
A datatype which is
Each built-in datatype in this specification
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype
For example, to address the
http://www.w3.org/2001/XMLSchema#int
Additionally, each facet definition element can be uniquely
addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the facet
For example, to address the maxInclusive facet, the URI is:
http://www.w3.org/2001/XMLSchema#maxInclusive
Additionally, each facet usage in a built-in
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the
For example, to address the usage of the maxInclusive facet in
the definition of int, the URI is:
http://www.w3.org/2001/XMLSchema#int.maxInclusive
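The three addressing conventions above can be sketched as small helper functions. This is only an illustration based on the example URIs shown in the text; the function names are hypothetical and are not defined by this specification.

```python
# Namespace URI taken from the examples in the text.
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def datatype_uri(datatype: str) -> str:
    """URI for a built-in datatype: namespace URI plus '#' plus the datatype name."""
    return f"{XSD_NAMESPACE}#{datatype}"


def facet_uri(facet: str) -> str:
    """URI for a facet definition: namespace URI plus '#' plus the facet name."""
    return f"{XSD_NAMESPACE}#{facet}"


def facet_usage_uri(datatype: str, facet: str) -> str:
    """URI for a facet usage, following the 'int.maxInclusive' example above."""
    return f"{XSD_NAMESPACE}#{datatype}.{facet}"


if __name__ == "__main__":
    print(datatype_uri("int"))
    print(facet_uri("maxInclusive"))
    print(facet_usage_uri("int", "maxInclusive"))
```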
The
http://www.w3.org/2001/XMLSchema
To facilitate usage in specifications other than the XML Schema definition language,
such as those that do not want to know anything about aspects of the
XML Schema definition language other than the datatypes, each
http://www.w3.org/2001/XMLSchema-datatypes
This applies to both
The use of the XMLSchema-datatypes
namespace and the definitions therein are deprecated as of
XML Schema 1.1.
Each
The two datatypes at the root of the hierarchy of simple
types are
For further details of
The
The
It is implementation-defined whether an
implementation of this specification supports the
The
When a new datatype is defined
by
For further details of
The
The
It is implementation-defined whether an
implementation of this specification supports the
The
When a new datatype is defined
by
The
Many human languages have writing systems that require
child elements for control of aspects such as bidirectional formatting or
ruby annotation (see
The
It is implementation-defined whether an
implementation of this specification supports the
Equality for
As noted in
The
It is implementation-defined whether an
implementation of this specification supports the
The
An instance of a datatype that is defined as
The
The
All
1.23, 12678967.543233, +100000.00,
210
The lexical space of decimal is the set of
lexical representations which match the grammar given above, or
(equivalently) the regular expression

(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))
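As an illustration, the lexical space of decimal can be checked with an ordinary regular-expression engine. The sketch below assumes the pattern is an optional sign followed by either digits with an optional fraction, or a fraction alone; the helper names are hypothetical, and Python's `Decimal` is used only as a convenient stand-in for the value space.

```python
import re
from decimal import Decimal

# Assumed reading of the decimal pattern: optional sign, then digits with an
# optional fractional part, or a bare fractional part.
DECIMAL_LEXICAL = re.compile(r"(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))")


def is_decimal_literal(text: str) -> bool:
    """Return True if the whole string is in the lexical space sketched above."""
    return DECIMAL_LEXICAL.fullmatch(text) is not None


def decimal_value(text: str) -> Decimal:
    """Map a literal to a value; several literals may map to the same value."""
    if not is_decimal_literal(text):
        raise ValueError(f"not a decimal literal: {text!r}")
    return Decimal(text)


if __name__ == "__main__":
    for literal in ["-1.23", "12678967.543233", "+100000.00", "210", "1.0E2"]:
        print(literal, is_decimal_literal(literal))
```

Note that exponent notation such as 1.0E2 is rejected, since it belongs to the lexical spaces of float and double rather than decimal.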
The mapping from lexical representations to values is the usual
one for decimal numerals; it is given formally in:
The mapping from lexical representations to values is the usual
one for decimal numerals; it is given formally in
The mapping from values to
The
The mapping from values to
The mapping from values to
Precision is sometimes given in absolute, sometimes in relative
terms. 5
has an arithmetic precision of 0, and
5.01
an arithmetic precision of 2.
See the conformance note in
The
As explained below, the lexical
representation of the not-a-number
, positive infinity
,
and negative infinity
.
The latter two together are called
the infinities
.
Equality and order for
Two numerical
INF is equal only to itself, and is greater than
−INF and all numerical
−INF is equal only to itself, and is less than
INF and all numerical
NaN is incomparable with all values,
The
"Equality" in this Recommendation is defined to be "identity" (i.e.,
values that are identical in the
Any value incomparable with the value used for the four bounding facets
(
This datatype differs from that of
A
The
As explained below, the
Equality and order for
Equality is identity, except that 0 = −0 (although they are not identical) and NaN ≠ NaN (although NaN is of course identical to itself).
0 and −0 are thus distinct for purposes of enumerations and identity constraints, but equal for purposes of minimum and maximum values.
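The distinction between equality and identity can be illustrated with IEEE arithmetic as exposed by Python. This is a sketch only: Python's `float` is a double-precision type, and the function names are hypothetical.

```python
import math


def xsd_equal(a: float, b: float) -> bool:
    """Equality as described above: 0 equals -0, and NaN is not equal to NaN."""
    return a == b  # IEEE comparison already behaves this way


def xsd_identical(a: float, b: float) -> bool:
    """Identity: 0 and -0 are distinct, and NaN is identical to itself."""
    if math.isnan(a) and math.isnan(b):
        return True
    # Equal values are identical only if they also carry the same sign,
    # which separates positive zero from negative zero.
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


if __name__ == "__main__":
    print(xsd_equal(0.0, -0.0), xsd_identical(0.0, -0.0))
    print(xsd_equal(math.nan, math.nan), xsd_identical(math.nan, math.nan))
```

Under this reading, an enumeration or identity constraint would use the identity test, while bounding facets would use the equality test together with the order relation.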
For the basic values, the order relation
on float is the order relation for rational numbers. INF is greater
than all other non-NaN values; −INF is less than all other non-NaN
values. NaN is
Any value
The Schema 1.0 version of this datatype did not differentiate between
0 and −0 and NaN was equal to itself. The changes were
made to make the datatype more closely mirror
The INF
, INF
and
NaN
, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0 and INF are all legal
The
The (\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))((e|E)(\+|-)?[0-9]+)?|-?INF|NaN
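A minimal sketch of a literal-to-value mapping for float follows. It assumes the pattern reads as an optionally signed decimal numeral with an optional exponent, or one of the special literals INF, -INF and NaN. The function name is hypothetical, and Python's double-precision `float` is used, so the rounding to single precision that a real float implementation needs is not modelled.

```python
import math
import re

# Assumed reading of the float pattern: numeral with optional exponent,
# or one of the special literals INF, -INF, NaN.
FLOAT_LEXICAL = re.compile(
    r"(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))((e|E)(\+|-)?[0-9]+)?|-?INF|NaN"
)

SPECIAL_VALUES = {"INF": math.inf, "-INF": -math.inf, "NaN": math.nan}


def parse_float_literal(text: str) -> float:
    """Map a literal to a value, rejecting strings outside the lexical space."""
    if FLOAT_LEXICAL.fullmatch(text) is None:
        raise ValueError(f"not a float literal: {text!r}")
    if text in SPECIAL_VALUES:
        return SPECIAL_VALUES[text]
    return float(text)


if __name__ == "__main__":
    for literal in ["-1E4", "1267.43233E12", "12.78e-2", "12", "-0", "0", "INF"]:
        print(literal, parse_float_literal(literal))
```

The regular expression is applied first because Python's own `float()` accepts spellings such as "inf" or "Infinity" that are not in the lexical space described here.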
The
Since IEEE allows some variation in rounding of values, processors
conforming to this specification may exhibit some variation in their
The
The Schema 1.0 version of this datatype did not permit rounding
algorithms whose results differed from
The
The only significant differences between float and double are the three defining constants 53 (vs 24), −1074 (vs −149), and 971 (vs 104).
"Equality" in this Recommendation is defined to be "identity" (i.e.,
values that are identical in the
Any value incomparable with the value used for the four bounding facets
(
This datatype differs from that of
A
The
As explained below, the
Equality and order for
Equality is identity, except that 0 = −0 (although they are not identical) and NaN ≠ NaN (although NaN is of course identical to itself).
0 and −0 are thus distinct for purposes of enumerations and identity constraints, but equal for purposes of minimum and maximum values.
For the basic values, the order relation
on double is the order relation for rational numbers. INF is greater
than all other non-NaN values; −INF is less than all other non-NaN
values. NaN is
Any value
The Schema 1.0 version of this datatype did not differentiate between
0 and −0 and NaN was equal to itself. The changes were
made to make the datatype more closely mirror
The INF
, INF
and
NaN
, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0 and INF are all legal
The
The (\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))((e|E)(\+|-)?[0-9]+)?|-?INF|NaN
The
Since IEEE allows some variation in rounding of values, processors
conforming to this specification may exhibit some variation in their
The
The Schema 1.0 version of this datatype did not permit rounding
algorithms whose results differed from
The
All YYYY
) and a minimum fractional second precision of
milliseconds or three decimal digits (i.e. s.sss
).
However,
) is
a
1696-09-01T00:00:00Z
1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-07-01T00:00:00Z
These four values are chosen so as to maximize
the possible differences in results that could occur,
such as the difference when adding P1M and P30D:
1697-02-01T00:00:00Z + P1M < 1697-02-01T00:00:00Z + P30D,
but
1903-03-01T00:00:00Z + P1M > 1903-03-01T00:00:00Z + P30D,
so that P1M <> P30D.
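The comparison against the four reference points can be sketched as follows. This is an illustration under simplifying assumptions: a duration is modelled as a hypothetical (months, seconds) pair, and month addition relies on every reference point falling on the first day of a month, so no day-of-month adjustment is needed.

```python
from datetime import datetime, timedelta

# The four reference dateTimes listed above (all at 00:00:00Z).
REFERENCE_POINTS = [
    datetime(1696, 9, 1),
    datetime(1697, 2, 1),
    datetime(1903, 3, 1),
    datetime(1903, 7, 1),
]


def add_duration(start: datetime, months: int, seconds: float) -> datetime:
    """Add whole months first, then the seconds part, to a first-of-month start."""
    total = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(total, 12)
    return start.replace(year=year, month=month0 + 1) + timedelta(seconds=seconds)


def compare_durations(a, b) -> str:
    """Return '<', '>', '=' or '<>' (indeterminate) for two (months, seconds) pairs."""
    outcomes = set()
    for ref in REFERENCE_POINTS:
        end_a, end_b = add_duration(ref, *a), add_duration(ref, *b)
        outcomes.add("<" if end_a < end_b else ">" if end_a > end_b else "=")
    # The order is determinate only if all four reference points agree.
    return outcomes.pop() if len(outcomes) == 1 else "<>"


DAY = 86400
P1M, P30D = (1, 0), (0, 30 * DAY)

if __name__ == "__main__":
    print("P1M vs P30D:", compare_durations(P1M, P30D))
```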
If two
Two totally ordered datatypes (
There are many ways to implement
See the conformance notes in
The lexical representation for
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary
integer.
Similarly, the value of the Seconds component
allows an arbitrary decimal.
Thus, the lexical representation of
An optional preceding minus sign ('-') is
allowed, to indicate a negative duration. If the sign is omitted a
positive duration is indicated. See also
For example, to indicate a duration of 1 year, 2 months, 3 days, 10
hours, and 30 minutes, one would write: P1Y2M3DT10H30M
.
One could also indicate a duration of minus 120 days as:
-P120D
.
Reduced precision and truncated representations of this format are allowed provided they conform to the following:
If the number of years, months, days, hours, minutes, or seconds in any
expression equals zero, the number and its corresponding designator
The seconds part
The designator 'T'
For example, P1347Y, P1347M and P1Y2MT2H are all allowed; P0Y1347M and P0Y1347M0D are allowed. P-1347M is not allowed although -P1347M is allowed. P1Y2MT is not allowed.
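The rules for reduced-precision representations can be checked mechanically. The sketch below is not the specification's own regular expression; it is a shorter pattern, written for illustration, that uses lookahead to require at least one component after 'P' and after 'T'. The names are hypothetical.

```python
import re

# Simplified, assumed pattern: optional leading '-', then 'P', then any of the
# date components in order, then an optional 'T' part. The lookaheads (?=.)
# reject a bare 'P' and a dangling 'T'.
DURATION_LEXICAL = re.compile(
    r"-?P(?=.)([0-9]+Y)?([0-9]+M)?([0-9]+D)?"
    r"(T(?=.)([0-9]+H)?([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)?"
)


def is_duration_literal(text: str) -> bool:
    """Return True if the whole string has the duration form sketched above."""
    return DURATION_LEXICAL.fullmatch(text) is not None


if __name__ == "__main__":
    for literal in ["P1347Y", "P1Y2MT2H", "P0Y1347M0D", "-P1347M", "P-1347M", "P1Y2MT"]:
        print(literal, is_duration_literal(literal))
```

A sign is accepted only in front of the whole duration, so -P1347M passes while P-1347M does not, and a trailing 'T' with nothing after it is rejected.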
The PnYnMnDTnHnMnS
More precisely, the
Thus, a
The language accepted by the
The expression
The expression
The expression
?P(((([09]+Y([09]+M)?)
( ( [09]+M) ) )(([09]+D(T(([09]+H([09]+M)?([09]+(\.[09]+)?S)?)
( ( [09]+M) ([09]+(\.[09]+)?S)?)
( ( [09]+(\.[09]+)?S) ) ))?)
( ( T(([09]+H([09]+M)?([09]+(\.[09]+)?S)?)
( ( [09]+M) ([09]+(\.[09]+)?S)?)
( ( [09]+(\.[09]+)?S) ) )) ) )?)
( ( ([09]+D(T(([09]+H([09]+M)?([09]+(\.[09]+)?S)?)
( ( [09]+M) ([09]+(\.[09]+)?S)?)
( ( [09]+(\.[09]+)?S) ) ))?)
( ( T(([09]+H([09]+M)?([09]+(\.[09]+)?S)?)
( ( [09]+M) ([09]+(\.[09]+)?S)?)
( ( [09]+(\.[09]+)?S) ) )) ) ) ) )
The
In general, the
1696-09-01T00:00:00Z
1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-07-01T00:00:00Z
The following table shows the strongest relationship that can be determined
between example durations. The symbol <> means that the order relation is
indeterminate.
Relation  

P1Y  > P364D  <> P365D  <> P366D  < P367D  
P1M  > P27D  <> P28D  <> P29D  <> P30D  <> P31D  < P32D  
P5M  > P149D  <> P150D  <> P151D  <> P152D  <> P153D  < P154D 
Implementations are free to optimize the computation of the ordering relationship. For example, the following table can be used to compare durations of a small number of months against days.
Months  1  2  3  4  5  6  7  8  9  10  11  12  13  ...  

Days  Minimum  28  59  89  120  150  181  212  242  273  303  334  365  393  ... 
Maximum  31  62  92  123  153  184  215  245  276  306  337  366  397  ... 
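The minimum and maximum day counts in the table can be reproduced by scanning every possible starting month over one full 400-year Gregorian cycle. The following sketch does that; the function name and the choice of the years 2000 to 2399 as the cycle are assumptions made for illustration.

```python
from datetime import date


def month_span_bounds(n_months: int) -> tuple:
    """Return (minimum, maximum) number of days spanned by n_months months."""
    spans = []
    for year in range(2000, 2400):  # one complete 400-year Gregorian cycle
        for month in range(1, 13):
            total = year * 12 + (month - 1) + n_months
            end_year, end_month0 = divmod(total, 12)
            start = date(year, month, 1)
            end = date(end_year, end_month0 + 1, 1)
            spans.append((end - start).days)
    return min(spans), max(spans)


if __name__ == "__main__":
    for n in range(1, 14):
        low, high = month_span_bounds(n)
        print(f"{n:2d} months: minimum {low}, maximum {high}")
```

A duration of n months is then certainly greater than any number of days below the minimum, certainly less than any number above the maximum, and indeterminate in between.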
In comparing
Certain derived datatypes of durations can be guaranteed to have a total order. For this, they must have fields from only one row in the list below and the time zone must either be required or prohibited.
year, month
day, hour, minute, second
For example, a datatype could be defined to correspond to the
The
All timezoned times are Coordinated Universal Time
(
The date and time datatypes described in this recommendation were inspired
by
Those using this (1.0) version of this Recommendation to
represent negative years should be aware that the interpretation of lexical
representations beginning with a '-'
is likely to change in
subsequent versions.
See the conformance note in
In version 1.0 of this specification, the
Note that 1 BCE, 5 BCE, and so on (years 0000, -0004, etc. in the lexical representation defined here) are leap years in the proleptic Gregorian calendar used for the date/time datatypes defined here. Version 1.0 of this specification was unclear about the treatment of leap years before the common era; caution should be used if existing schemas or data specify dates of 29 February for any years before the common era. With that possible exception, schemas and data valid under the old interpretation remain valid under the new.
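The leap-year rule for years before the common era can be illustrated with a short sketch. It assumes the numbering described in the note above, in which 1 BCE is written as year 0000 and n BCE as year 1 − n; the function names are hypothetical.

```python
def lexical_year_for_bce(n: int) -> int:
    """Map 'n BCE' to the year number used in the lexical representation (1 BCE -> 0)."""
    return 1 - n


def is_leap_year(year: int) -> bool:
    """Proleptic Gregorian leap-year rule, valid for zero and negative years too."""
    # Python's % always returns a non-negative result for a positive modulus,
    # so the usual rule works unchanged for year 0 and for negative years.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


if __name__ == "__main__":
    for bce in (1, 2, 5):
        year = lexical_year_for_bce(bce)
        print(f"{bce} BCE -> year {year}: leap = {is_leap_year(year)}")
```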
The
See the conformance note in
Equality and order are as prescribed
in
Since the order of a
Although
Order and equality are essentially the same for
The '-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?
,
where
'-'?
the remaining '-'s are separators between parts of the date portion;
the first
'T' is a separator indicating that time-of-day follows;
':' is a separator between parts of the time-of-day portion;
the second
'.'
For example, 2002-10-10T12:00:00-05:00 (noon on 10 October 2002, Central Daylight Savings Time as well as Eastern Standard Time in the U.S.) is 2002-10-10T17:00:00Z, five hours later than 2002-10-10T12:00:00Z.
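The normalization of a timezoned value to UTC can be illustrated with Python's standard library. This is a sketch only: `to_utc` is a hypothetical helper, it handles only the full year-to-second form with a timezone, and Python's `datetime` does not cover negative years or the '24:00:00' representation.

```python
from datetime import datetime, timedelta, timezone


def to_utc(lexical: str) -> datetime:
    """Parse a timezoned dateTime literal and express it in UTC."""
    if lexical.endswith("Z"):
        # Older Python versions do not accept 'Z' in fromisoformat.
        lexical = lexical[:-1] + "+00:00"
    value = datetime.fromisoformat(lexical)
    if value.tzinfo is None:
        raise ValueError("this sketch only handles timezoned values")
    return value.astimezone(timezone.utc)


if __name__ == "__main__":
    local_noon = to_utc("2002-10-10T12:00:00-05:00")
    utc_noon = to_utc("2002-10-10T12:00:00Z")
    print(local_noon.isoformat())
    print(local_noon - utc_noon)
```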
For further guidance on arithmetic with
Except for trailing fractional zero digits in the seconds representation,
'24:00:00' time representations,
and timezone (for timezoned values), the mapping
from literals to values is one-to-one. Where there is more than
one possible representation, the
The 2-digit numeral representing
the hour must not be '24';
The fractional second string, if present,
must not end in '0';
for timezoned values, the timezone must be represented with
'Z' (All timezoned
The lexical representations for
Within a Constraint: Day-of-month Values
) given above.
Subsequent
Alternatively,
For example, 2002-10-10T12:00:00-05:00 (noon on 10 October 2002, Central Daylight Savings Time as well as Eastern Standard Time in the U.S.) is equal to 2002-10-10T17:00:00Z, five hours later than 2002-10-10T12:00:00Z.
The
\?([19][09][09][09]+)(0[09][09][09])\(0[19])(1[02])\(0[19])([12][09])(3[01])
T(([01][09])(2[03]):[05][09]:
([+\](0[09])(1[04]):[05][09])?
The
Timezones are durations with (integer-valued) hour and minute properties (with the hour magnitude limited to at most 14, and the minute magnitude limited to at most 59, except that if the hour magnitude is 14, the minute value must be 0); they may be both positive or both negative.
The lexical representation of a timezone is a string of the form:
(('+' | '-') hh ':' mm) | 'Z'
,
where
'+' indicates a nonnegative duration,
'-' indicates a nonpositive duration.
The mapping so defined is one-to-one, except that '+00:00',
'-00:00', and 'Z' all represent the same zero-length duration
timezone,
When a timezone is added to a
In general, the
The following definition uses the notation S[year] to represent the year
field of S, S[month] to represent the month field, and so on. The notation (Q
& "-14:00") means adding the timezone -14:00 to Q, where Q did not
already have a timezone.
The ordering between two
A. Normalize P and Q. That is, if there is a timezone present, but
it is not Z, convert it to Z using the addition operation defined in
Thus 2000-03-04T23:00:00+03:00 normalizes to 2000-03-04T20:00:00Z
B. If P and Q either both have a time zone or both do not have a time zone, compare P and Q field by field from the year field down to the second field, and return a result as soon as it can be determined. That is:
For each i in {year, month, day, hour, minute, second}
If P[i] and Q[i] are both not specified, continue to the next i
If P[i] is not specified and Q[i] is, or vice versa, stop and return P <> Q
If P[i] < Q[i], stop and return P < Q
If P[i] > Q[i], stop and return P > Q
Stop and return P = Q
C. Otherwise, if P contains a time zone and Q does not, compare as follows:
P < Q if P < (Q with time zone +14:00)
P > Q if P > (Q with time zone -14:00)
P <> Q otherwise, that is, if (Q with time zone +14:00) < P < (Q with time zone -14:00)
D. Otherwise, if P does not contain a time zone and Q does, compare as follows:
P < Q if (P with time zone -14:00) < Q.
P > Q if (P with time zone +14:00) > Q.
P <> Q otherwise, that is, if (P with time zone +14:00) < Q < (P with time zone -14:00)
Examples:
Determinate  Indeterminate 

2000-01-15T00:00:00 < 2000-02-15T00:00:00  2000-01-01T12:00:00 <> 1999-12-31T23:00:00Z 
2000-01-15T12:00:00 < 2000-01-16T12:00:00Z  2000-01-16T12:00:00 <> 2000-01-16T12:00:00Z 
2000-01-16T00:00:00 <> 2000-01-16T12:00:00Z 
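Steps A to D can be sketched for the common case in which every field from year to second is present. The code below is an illustration, not a complete implementation: it does not handle values with unspecified fields (the "continue to the next i" case in step B), negative years, or '24:00:00', and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PLUS_14 = timezone(timedelta(hours=14))
MINUS_14 = timezone(timedelta(hours=-14))


def parse(lexical: str) -> datetime:
    """Parse a full dateTime literal; the result is naive if no timezone is given."""
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    return datetime.fromisoformat(lexical)


def _order(a: datetime, b: datetime) -> str:
    return "<" if a < b else ">" if a > b else "="


def compare(p: datetime, q: datetime) -> str:
    """Return '<', '>', '=' or '<>' (indeterminate)."""
    p_zoned, q_zoned = p.tzinfo is not None, q.tzinfo is not None
    if p_zoned == q_zoned:
        # Step B: aware values compare on the UTC timeline (step A is implicit),
        # naive values compare field by field.
        return _order(p, q)
    if p_zoned:
        # Step C: Q could lie anywhere between (Q at +14:00) and (Q at -14:00).
        if p < q.replace(tzinfo=PLUS_14):
            return "<"
        if p > q.replace(tzinfo=MINUS_14):
            return ">"
        return "<>"
    # Step D: P has no timezone and Q does.
    if p.replace(tzinfo=MINUS_14) < q:
        return "<"
    if p.replace(tzinfo=PLUS_14) > q:
        return ">"
    return "<>"


if __name__ == "__main__":
    print(compare(parse("2000-01-15T12:00:00"), parse("2000-01-16T12:00:00Z")))
    print(compare(parse("2000-01-16T12:00:00"), parse("2000-01-16T12:00:00Z")))
```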
Certain derived types from
Since the lexical representation allows an optional time zone
indicator,
See the conformance note in
See the conformance note in
Equality and order are as prescribed in
A calendar ( or
Since the order of a
Examples that show the difference from version 1.0 of this specification (see
A day is a calendar (or
08:00:00+10:00 < 17:00:00+10:00 (just as 08:00:00Z has always been less than 17:00:00Z, but in version 1.0 08:00:00+10:00 > 17:00:00+10:00 )
A
00:00:00+01:00 is less than
A calendar day with a very early timezone may be completely disjoint from a calendar day with a very late timezone:
Each value with
22:00:00Z > 03:00:00+05:00 (since 1971-12-31T03:00:00+05 is 1971-12-30T22:00:00Z, not 1971-12-31T22:00:00Z; in the previous version of this specification 22:00:00Z = 03:00:00+05:00)
The lexical representation for
The
The lexical representations for
(((([01][0-9])|(2[0-3])):([0-5][0-9]):(([0-5][0-9])(\.[0-9]+)?))
|(24:00:00(\.0+)?))
(Z|((\+|-)(0[0-9]|1[0-4]):[0-5][0-9]))?
The
A "date object" is an object with year,
month, and day properties just like those
of
Timezoned
For example: the first moment of 2002-10-10+13:00 is 2002-10-10T00:00:00+13,
which is 2002-10-09T11:00:00Z, which is also the first moment of 2002-10-09-11:00.
Therefore 2002-10-10+13:00 is 2002-10-09-11:00;
For most timezones, either the first moment or last moment of the day (a
See the conformance note in
The
See the conformance note in
Equality and order are as prescribed in
In version 1.0 of this specification,
Examples that show the difference from version 1.0 (see
A day is a calendar (or
2000-12-12+13:00 < 2000-12-12+11:00
(just as 2000-12-12+12:00 has always been less than
2000-12-12+11:00, but in version 1.0
2000-12-12+13:00 > 2000-12-12+11:00 ,
since 2000-12-12+13:00's
Similarly:
2000-12-12+13:00 = 2000-12-11−11:00 (whereas under 1.0, as just stated, 2000-12-12+13:00 = 2000-12-12−11:00)
For the following discussion, let the
"date portion" of a
The '-'? yyyy '-' mm '-' dd zzzzzz?
where the '-' yyyy '-' mm '-' dd 'T00:00:00' zzzzzz?
and the least upper bound of the interval is the timeline point represented
(noncanonically) by:
'-' yyyy '-' mm '-' dd 'T24:00:00' zzzzzz?
.
The
Given a member of the
The lexical representations for
Within a
Constraint: Day-of-month Values
) given above.
\-?(([1-9][0-9][0-9][0-9]+)|(0[0-9][0-9][0-9]))\-((0[1-9])|(1[0-2]))\-(([0-2][0-9])|(3[01]))((\+|\-)(0[0-9]|1[0-4]):[0-5][0-9])?
The
Since the lexical representation allows an optional
time zone indicator,
Because month/year combinations in one calendar only rarely correspond to month/year combinations in other calendars, values of this type are not, in general, convertible to simple values corresponding to month/year combinations in other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
See the conformance note in
See the conformance note in
Equality and order are as prescribed in
In version 1.0 of this specification,
An example that shows the difference from version 1.0 (see
A day is a calendar (or
2000-12+13:00 < 2000-12+11:00
(just as 2000-12+12:00 has always been less than 2000-12+11:00,
but in version 1.0 2000-12+13:00 >
2000-12+11:00, since 2000-12+13:00's
The lexical representation for
For example, to indicate the month of May 1999, one would write: 1999-05.
See also
The lexical representations for
\-?([1-9][0-9][0-9][0-9]+)|(0[0-9][0-9][0-9])\-(0[1-9])|(1[0-2])((\+|\-)(0[0-9]|1[0-4]):[0-5][0-9])?
The
The
Since the lexical representation allows an optional time zone
indicator,
Because years in one calendar only rarely correspond to years in other calendars, values of this type are not, in general, convertible to simple values corresponding to years in other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
See the conformance note in
See the conformance note in
Equality and order are as prescribed in
In version 1.0 of this specification,
An example that shows the difference from version 1.0 (see
A day is a calendar (or
2000+13:00 < 2000+11:00
(just as 2000+12:00 has always been less than 2000+11:00,
but in version 1.0 2000+13:00 >
2000+11:00 , since 2000+13:00's
The lexical representation for
For example, to indicate 1999, one would write: 1999.
See also
The lexical representations for
\-?([1-9][0-9][0-9][0-9]+)|(0[0-9][0-9][0-9])((\+|\-)(0[0-9]|1[0-4]):[0-5][0-9])?
The
The
This datatype can be used, for example, to record birthdays; an instance of the datatype could be used to say that someone's birthday occurs on the 14th of September every year.
Since the lexical representation allows an optional time zone
indicator,
Because day/month combinations in one calendar only rarely correspond to day/month combinations in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
The
Equality and order are as prescribed in
In version 1.0 of this specification,
An example that shows the difference from version 1.0 (see
A day is a calendar (or
--12-12+13:00 < --12-12+11:00
(just as --12-12+12:00 has always been less than
--12-12+11:00, but in version 1.0
--12-12+13:00 > --12-12+11:00, since
--12-12+13:00's
The lexical representation for
The lexical representations for
Within a Constraint: Day-of-month Values
) given above.
\-\-(0[1-9])|(1[0-2])\-([0-2][0-9])|(3[01])((\+|\-)(0[0-9]|1[0-4]):[0-5][0-9])?
This datatype can be used to represent a specific day in a month: to say, for example, that my birthday occurs on the 14th of September every year.
The
The
Since the lexical representation allows an optional time zone
indicator,
Because days in one calendar only rarely
correspond to days in other calendars,
Equality and order are as prescribed in
Examples that may appear anomalous (see
---15 < ---16, but ---15−13:00 > ---16+13:00
---15−11:00 = ---16+13:00
---15−13:00 <> ---16, because ---15−13:00 > ---16+14:00 and ---15−13:00 < ---16−14:00
Timezones do not cause wraparound at the end of the month:
The lexical representation for
The lexical representations for
The
Since the lexical representation allows an optional time zone
indicator,
Because months in one calendar only rarely correspond to months in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
Equality and order are as prescribed in
In version 1.0 of this specification,
An example that shows the difference from version 1.0 (see
A month is a calendar (or
--12+13:00 < --12+11:00
(just as --12+12:00 has always been less than --12+11:00, but in version 1.0
--12+13:00 > --12+11:00, since --12+13:00's
The lexical representation for
The lexical representations for
\-\-(0[1-9])|(1[0-2])((\+|\-)(0[0-9]|1[0-4]):[0-5][0-9])?
The
The
The
More formally, the
The set recognized by
The
The
The
The
The a-z, A-Z, 0-9, the plus sign (+), the forward slash (/) and the
equal sign (=), together with
For compatibility with older mail gateways,
The
Base64Binary ::= ((B64S B64S B64S B64S)*
((B64S B64S B64S B64) |
(B64S B64S B16S '=') |
(B64S B04S '=' #x20? '=')))?
B64S ::= B64 #x20?
B16S ::= B16 #x20?
B04S ::= B04 #x20?
B04 ::= [AQgw]
B16 ::= [AEIMQUYcgkosw048]
B64 ::= [A-Za-z0-9+/]
Note that this grammar requires the number of non-whitespace
characters in the
The
The above definition of the
The canonical
Canonical-base64Binary ::= (B64 B64 B64 B64)*
((B64 B64 B16 '=') | (B64 B04 '=='))?
That is, the
For some values the
The length of a
lex2 := killwhitespace(lexform)  -- remove whitespace characters
lex3 := strip_equals(lex2)  -- strip padding characters at end
length := floor (length(lex3) * 3 / 4)  -- calculate length
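The three steps above can be sketched in Python as follows. The function name base64_length_in_octets is illustrative only; the sketch assumes its argument is already a valid lexical form and performs no validation of its own.

```python
def base64_length_in_octets(lexform: str) -> int:
    """Length in octets of the value denoted by a valid base64 lexical form."""
    lex2 = "".join(lexform.split())  # remove whitespace characters
    lex3 = lex2.rstrip("=")          # strip padding characters at end
    return (len(lex3) * 3) // 4      # floor(length(lex3) * 3 / 4)
```

Integer floor division is used here so that no floating-point rounding enters the calculation.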
Note on encoding:
Section 5.4
Each URI scheme imposes specialized syntax rules for URIs in
that scheme, including restrictions on the syntax of allowed
fragment
identifiers. Because it is
impractical for processors to check that a value is a
context-appropriate URI reference, this specification follows the
lead of
The
For an
Each URI scheme imposes specialized syntax rules
for URIs in that scheme, including restrictions on the syntax of
allowed fragment identifiers. Because it is impractical for processors
to check that a value is a context-appropriate URI reference,
neither the syntactic constraints defined by the definitions of individual
schemes nor the generic syntactic constraints defined by
Spaces are, in principle, allowed in the
The
The definitions of URI in the current
IETF specifications define certain URIs as equivalent to each other.
Those equivalences are not part of this datatype as defined here:
if two
It is implementation-defined whether an
implementation of this specification supports the
The mapping between
Because the lexical representations available for
any value of type
The use of
It is an
Because the lexical representations available for any given value
of
The use of
This section gives conceptual definitions for all
[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
The regular expression above provides the only normative
constraint on the lexical and value spaces of this type. The
additional constraints imposed on language identifiers by
are to be treated as case insensitive; there
exist conventions for capitalization of some of them, but these should
not be taken to carry meaning. For instance, [ISO 3166] recommends
that country codes are capitalized (MN Mongolia), while [ISO 639]
recommends that language codes are written in lower case (mn
Mongolian).
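A minimal sketch of a lexical check against the regular expression for this type is given below. It assumes the expression reads with its character ranges and hyphen restored; the names LANGUAGE_LEXICAL and is_language_lexical are illustrative only. The check deliberately enforces nothing beyond the regular expression and does not normalize case.

```python
import re

# Assumed reading of the pattern: a primary subtag of 1 to 8 letters,
# followed by any number of hyphen-separated alphanumeric subtags.
LANGUAGE_LEXICAL = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_language_lexical(literal: str) -> bool:
    """Return True when the whole literal matches the pattern."""
    return LANGUAGE_LEXICAL.fullmatch(literal) is not None
```

Under this check both "mn-MN" and "MN-mn" are accepted, since capitalization conventions are not part of the pattern.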
Since the
It is implementation-defined whether an
implementation of this specification supports the
For compatibility (see
For compatibility (see
It is implementation-defined whether an
implementation of this specification supports the
It is implementation-defined whether an
implementation of this specification supports the
It is implementation-defined whether an
implementation of this specification supports
the
For compatibility (see
Uniqueness of items validated as
It is implementation-defined whether an
implementation of this specification supports
the
For compatibility (see
Existence of referents for items validated as
For compatibility (see
Existence of referents for items validated as
It is implementation-defined whether an
implementation of this specification supports
the
The
For compatibility (see
The
For compatibility (see
The
The
The
The
The
The
The
The
The
The
The
The
The
The always-zero
The lexical space is reduced from that of
The
The
The lexical space is reduced from that of
The
The preceding sections of this
specification have described datatypes in a way largely
independent of their use in the particular context of
This section presents the mechanisms necessary to integrate datatypes into
the context of
The following sections provide full details on the properties and
significance of each kind of schema component involved in datatype
definitions. For each property, the kinds of values it is allowed to have are
specified. Any property not identified as optional is required to
be present; optional properties which are not present have
For more information on the notion of
Simple Type Definitions provide for:
Establishing the
Attaching a unique name (actually a
In the case of
In the case of
Attaching a
The Simple Type Definition schema component has the following properties:
If
If
The value of
The value of
The
The
If
The
The XML representation for a
name
targetNamespace
schema
the
the type definition base
the
final
finalDefault
the empty set;
a set with members drawn from the set above, each being present or absent depending on whether the string contains an equivalently named space-delimited substring.
Although the finalDefault
{
}
, determined as follows.
the empty set;
{
}
;
Consider
the
the parent element information item is
the corresponding
the parent element information item is
the corresponding
the parent element information item is
the
(the parent element information item is
the grandparent element information item is
the
(the grandparent element information item is
the
a set of
the
a set with one member, a
the empty set
the
If the
If the
If the
If the
An electronic commerce schema might define a datatype called
In this case,
If the
the
the itemType
itemType
(that is, the
In this case, a
A system might want to store lists of floating point values.
In this case,
If the
the
memberTypes
memberTypes
(that is, the
In this case, a
As an example, taken from a typical display oriented text markup language,
one might want to express font sizes as an integer between 8 and 72, or with
one of the tokens "small", "medium" or "large". The
A
base
An electronic commerce schema might define a datatype called
In this case,
itemType
A
A system might want to store lists of floating point values.
In this case,
As mentioned in
regardless of the
For each of
memberTypes
A
As an example, taken from a typical display oriented text markup language,
one might want to express font sizes as an integer between 8 and 72, or with
one of the tokens "small", "medium" or "large". The
As mentioned in
regardless of the
Unless otherwise specifically allowed by this specification
Either the itemType
Either the base
simpleType
Either the memberTypes
simpleType
A value in a
the value is facet-valid with respect to the particular
A
it
if
if
if
if
if
the value denoted by the
A
Since every value in the
That is, the constraints on
If there is a
The appropriate case among the following is true:
If the
If the
If the
V, as determined by the
appropriate subclause of
Note that
The
If
If
There is a simple type definition nearly equivalent to the simple version
of the
The
anySimpleType
The
definition of
The
anyAtomicType
Simple type definitions for all the built-in primitive datatypes,
namely
Similarly,
Schema components are identified by kind.
is not a kind of component. Each kind of ordered
,
bounded
, etc.) is
a separate kind of schema component.
The term
A
The value of any
Every
for any
there is no pair
for all
for any
for any
for any
the
On every datatype, the operation Equal is defined in terms of the equality
property of the
Note that in consequence of the above:
given
two values which are members of the
if a datatype
if a datatype
if datatypes
There is no schema component corresponding to the
A
for no
for all
for all
The notation
A
for all
The fact that this specification does not define an
indicating whether an
Some datatypes have a non-trivial order relation associated with
their value spaces (see
A
Some of the real-world
datatypes which are the basis for those defined herein
are ordered in some applications, even though no order is prescribed for schema-processing
purposes. For example, lexical
orderings. They are
When
When
When
If every member of
If every member of
the
the
the
the
every
each member of the
indicating whether a
Some ordered datatypes have the property that
there is one value greater than or equal to every other value, and
another that
When
When
When the
It
is sometimes useful to categorize
indicating whether the
Every value space has a specific number of members. This number can be characterized as
When
When
one of
all of the following are true:
one of
one of
either of the following is true:
When the
the
at least one of
all of the following are true:
one of
one of
either of the following is true:
When
When the
indicating whether a
Some value spaces are made up of things that
are
When
When
When
Schema components are identified by kind. Constraining
is not a kind of component. Each kind of whiteSpace
,
length
, etc.) is a separate kind of schema component.
The term
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value
fixed
A value in a
if the
if
if
if
if the
The use of
If
It is an error for
the
there is a type definition from which this one is derived by
one or more
It is an error for
the
there is a type definition from which this one is derived by
one or more restriction steps in which
It is an
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value
fixed
A value in a
if the
if
if
if
if the
The use of
If both
It is an
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value
fixed
A value in a
if the
if
if
if
if the
The use of
It is an
Constraining a
The following is the definition of a
The XML representation for a
value
there is only one
the value
the concatenation of the value
the
the union of that
just {
The
Thus, to impose two
The value
If multiple
It is a consequence of the schema representation constraint
Thus, to impose two pattern constraints simultaneously, schema authors may either write a single pattern which expresses the intersection of the two patterns they wish to impose, or define each pattern on a separate type derivation step.
A
As noted in
Constraining a
The following example is a
The XML representation for an
value
there is only one
a set with one member, the value
a set of the value
The value
If multiple
A value in a
It is an
No normalization is done, the value is not changed (this is the
behavior required by
All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space)
After the processing implied by
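The three normalization behaviours can be sketched as follows. The sketch assumes that the collapsing behaviour, whose description is incomplete in this text, first performs the replacement step and then reduces each run of #x20 to a single #x20 and removes leading and trailing #x20; the function name normalize_whitespace and the use of the strings "preserve", "replace" and "collapse" as a mode argument are illustrative only.

```python
import re


def normalize_whitespace(value: str, mode: str) -> str:
    """Apply one of the three whitespace normalization behaviours."""
    if mode == "preserve":
        return value  # no normalization is done
    # Replace each #x9 (tab), #xA (line feed) and #xD (carriage return)
    # with #x20 (space).
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # Assumed: collapse runs of #x20 and trim both ends.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError("unknown mode: " + mode)
```

Note that only the four characters #x9, #xA, #xD and #x20 are treated as whitespace here; other characters that some libraries regard as spaces are left untouched.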
The notation #xA used here (and elsewhere in this specification)
represents the Universal Character Set (UCS) code point
hexadecimal A
(line feed), which is denoted by
U+000A. This notation is to be distinguished from
&#xA;, which is the XML
collapse
and cannot be changed by a schema author; for
preserve
; for any type derived by
collapse
and cannot
be changed by a schema author. For all datatypes
For more information on
Constraining a
The following example is the
The values replace and collapse may appear to provide a
convenient way to "unwrap" text (i.e. undo the effects of
pretty-printing and word-wrapping). In some cases, especially
highly constrained data consisting of lists of artificial tokens
such as part numbers or other identifiers, this appearance is
correct. For natural-language data, however, the whitespace
processing prescribed for these values is not only unreliable but
will systematically remove the information needed to perform
unwrapping correctly. For Asian scripts, for example, a correct
unwrapping process will replace line boundaries not with blanks but
with zero-width separators or nothing. In consequence, it is
normally unwise to use these values for natural-language data, or
for any data other than lists of highly constrained tokens.
If
The XML representation for a
value
fixed
There are no
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value
fixed
A value
in an
if the
if the
It is an
It is an
Constraining a
The following is the definition of a
Note that the
If
The XML representation for a
value
fixed
A value
in an
if the
if the
It is an
It is an
It is an
Constraining a
The following is the definition of a
Note that the
If
The XML representation for a
value
fixed
A value in an
if the
if the
It is an
It is an
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value
fixed
A value in an
if the
if the
It is an
It is an
For
For
The
The term
If
The XML representation for a
value
fixed
A value in a
that value is expressible as
A value v
is facet-valid with respect to a
v is a
v is a
v is a
It is an
It is an
The term
The following is the definition of a
If
The XML representation for a
value
fixed
A value
It is an
It is an
The term
The following is the definition of a user-defined datatype which could be used to represent a floating-point decimal datatype which allows seven decimal digits for the coefficient and exponents between −95 and 96. Note that the scale is −1 times the exponent.
If
The XML representation for a
value
fixed
A
v has
The
It is an
The term
The following is the definition of a user-defined
datatype which could be used to represent amounts in a decimal
currency; it corresponds to a SQL column definition of
DECIMAL(8,2). The effect is to allow values
between −999,999.99 and 999,999.99, with a fixed interval
of 0.01 between values.
If
The XML representation for a
value
fixed
A
v has
The
It is an
Note that it is
It is an
This specification describes two levels of conformance for datatype processors. The first is required of all processors. Support for the other will depend on the application environments for which the processor is intended.
By separating the conformance requirements relating to the concrete
syntax of XML schema documents, this specification admits processors
which validate using schemas stored in optimized binary representations,
dynamically created schemas represented as programming language data
structures, or implementations in which particular schemas are compiled
into executable code such as C or Java. Such processors can be said to
be
Some
When this specification is used in the context of other languages
(as it is, for example, by
When presented with a literal or value exceeding the capacity of its partial implementation of a datatype, a minimally conforming implementation of this specification will sometimes be unable to determine with certainty whether the value is datatype-valid or not. Sometimes it will be unable to represent the value correctly through its interface to any downstream application.
When either of these is so, a conforming processor
This specification does not constrain the method used to indicate that a literal or value in the input data has exceeded the capacity of the implementation, or the form such indications take.
These are the partial-implementation
All
All
All
All
All
All
The conformance limits given in the text correspond to those
of the decimal64 type defined in the current draft of IEEE 754R, which
can be stored in a 64-bit field. The XML Schema Working Group
recommends that implementors support limits corresponding to those of
the decimal128 type. This entails supporting the values in the value
space of the otherwise unconstrained datatype for which
The XML representation of the datatypes-relevant part of the schema for schema documents is presented here as a normative part of the specification.
targetNamespace
attribute on the schema
element refers to the XML Schema namespace
itself.
Schema documents conforming to this specification may be in XML
1.0 or XML 1.1. Conforming implementations may accept input in
XML 1.0 or XML 1.1 or both. See
The DTD for the datatypes-specific
aspects of schema documents is given below. Note there is
schema
The following, although in the form of a
schema document, does not conform to the rules for schema documents
defined in this specification. It contains explicit XML
representations of the primitive datatypes which need not be declared
in a schema document, since they are automatically included in every
schema, and indeed must not be declared in a schema document, since it
is forbidden to try to derive types with
The following, although in the form of a schema document, contains XML representations of components already present in all schemas by definition. It is included here as a form of documentation.
These datatypes do not need to be declared in a schema document, since they are automatically included in every schema.
It is an open question whether this and similar XML documents should be accepted or rejected by software conforming to this specification. The XML Schema Working Group expects to resolve this question in connection with its work on issues relating to schema composition.
In the meantime, some existing schema processors will accept declarations for them; other existing processors will reject such declarations as duplicates.
Some datatypes, such as
In this document, the arguments to functions are assumed to be
Properties always have values.
Those values that are more primitive, and are used (among other things) herein to
construct object value spaces but which we do not explicitly define are described here:
A
The following standard operators are defined here in case the reader is unsure of their definition:
n the greatest integer
.
There are several different primitive but related datatypes defined in the specification which pertain to various combinations of dates and times, and parts thereof. They all use related value-space models, which are described in detail in this section. Because a casual reader of the descriptions of the individual datatypes elsewhere in this specification can easily misunderstand some of the details of what the datatypes are intended to represent, more detail is presented in this section.
All of the value spaces for dates and times
described here represent moments or periods of time in Universal
Coordinated Time (UTC).
There are two distinct ways to model moments in time: either
by tracking their year, month, day, hour, minute and second (with
fractional seconds as needed), or by tracking
their time (measured generally in seconds or
days) from some starting moment. Each has
its advantages. The two are isomorphic. For
definiteness, we choose to model the first
using five
There is also a seventh
Nonnegative values of the properties map
to the years, months, days of month, etc. of the Gregorian
calendar in the obvious way.
Values less than 1582 in the
In version 1.0 of this specification, the
Note that 1 BCE, 5 BCE, and so on (years 0000, 0004, etc. in the lexical representation defined here) are leap years in the proleptic Gregorian calendar used for the date/time datatypes defined here. Version 1.0 of this specification was unclear about the treatment of leap years before the common era; caution should be used if existing schemas or data specify dates of 29 February for any years before the common era. With that possible exception, schemas and data valid under the old interpretation remain valid under the new.
The model just described is called herein the
"seven-property" model for date/time
datatypes. It is used
Leap-seconds are not permitted
As of the time this specification was published,
leap-seconds (always one leap-second) have been introduced
by the responsible authorities at the end (in
1972-06-30
1972-12-31
1973-12-31
1974-12-31
1975-12-31
1976-12-31
1977-12-31
1978-12-31
1981-06-30
1982-06-30
1983-06-30
1985-06-30
1987-12-31
1989-12-31
1990-12-31
1992-06-30
1993-06-30
1994-06-30
1995-12-31
1997-06-30
1998-12-31
2005-12-31
While calculating, property values from the
Each fragment other than
(The redundancy between
The following fragment
The more important functions and
procedures defined here are summarized in the
text When there is a text summary, the name of the function in each is a
The following functions are used with various numeric and date/time datatypes.
0 when d =
1 when d =
2 when d =
−
−
etc.
s_{0} = i and
s_{j+1} = s_{j}
s_{0} = f − 10 , and
s_{j+1} = (s_{j}
For example:
123.4567
n when F is present, and
0 otherwise.
Set pD's
0 when LEX is a
Set pD's
Return pD.
Set d to
Return d.
If d is an integer, then return
Otherwise, return
s be an
c be a nonnegative
e be an
Set s to −1 when nV < 0 .
So select e that 2^{cWidth} × 2^{(e − 1)} < |nV| ≤ 2^{cWidth} × 2^{e}.
So select c that (c − 1) × 2^{e} ≤ |nV| < c × 2^{e} and 2^{cWidth−1} < c ≤ 2^{cWidth}.
when eMax < e
otherwise:
When e < eMin
Set e = eMin
So select c that (c − 1) × 2^{e} ≤ |nV| < c × 2^{e}.
Set nV to
c × 2^{e} when |nV| > c × 2^{e} − 2^{(e−1)};
(c − 1) × 2^{e} when |nV| < c × 2^{e} − 2^{(e−1)};
c × 2^{e} or (c − 1) × 2^{e} according to whether c is even or c − 1 is even, otherwise (i.e., |nV| = c × 2^{e} − 2^{(e−1)}, the midpoint between the two values).
Return
s × nV when nV < 2^{cWidth} × 2^{eMax},
Implementers will find the algorithms of
Return
otherwise (LEX is a numeral):
Set nV to
Set nV to
Return:
When nV is zero:
nV otherwise.
This specification permits the substitution of any other rounding algorithm
which conforms to the requirements of
Return
otherwise (LEX is a numeral):
Set nV to
Set nV to
Return:
When nV is zero:
nV otherwise.
This specification permits the substitution of any other rounding algorithm
which conforms to the requirements of
l be a nonnegative
s be an
c be a positive
e be an
Return
return
return
otherwise (f is numeric and nonzero):
Set s to −1 when f < 0 .
Let c be the smallest
Let e be log_{10}(|f| / c) (so that |f| = c × 10^{e}).
Let l be the largest nonnegative integer for which
c × 10^{e} =
Return
l be a nonnegative
s be an
c be a positive
e be an
Return
return
return
otherwise (f is numeric and nonzero):
Set s to −1 when f < 0 .
Let c be the smallest
Let e be log_{10}(|f| / c) (so that |f| = c × 10^{e}).
Let l be the largest nonnegative integer for which
c × 10^{e} =
Return
Let nV be the
Let aP be the
If pD is one of NaN, INF, or −INF, then return
Otherwise, if nV is an integer and aP is zero and
1E−6 ≤ nV ≤ 1E6, then return
Otherwise, if aP is greater than zero and
1E−6 ≤ nV ≤ 1E6, then let s be
Otherwise, it will be the case that
nV is less than 1E−6 or greater than 1E6.
Let
s be
m be the part of s which precedes the E
.
n be the part of s which follows the E
.
p be the integer denoted by n.
f be the number of fractional digits in m; note that f will invariably be less than or equal to aP + p.
t be a string consisting of
aP + p − f
occurrences of the digit
y be
m be
h be
m be
s be
d be
t be
0 if Y is not present,
−
0 if D is not present,
−
−
−
y be ym
m be ym
the empty string (
the empty string (
the empty string (
the empty string (
the empty string (
d is
ss
h is
(ss
m is
(ss
s is
ss
m be v's
s be v's
sgn be
sgn &
sgn &
sgn &
m be ym's
sgn be
s be dt's
sgn be
When adding and subtracting numbers from date/time properties, the
immediate results may not conform to the limits specified.
Accordingly, the following procedures are used to
Add (mo − 1)
Set mo to
(mo − 1)
Repeat until da is positive and not greater than
If da exceeds the upper limit from the table then:
Subtract that limit from da.
Add 1 to mo.
If da is not positive then:
Subtract 1 from mo.
Add the new upper limit from the table to da.
Add mi
Set mi to mi
Add hr
Set hr to hr
Add se
Set se to
28 when m is 2 and
y is not evenly divisible by 4,
or is evenly divisible by 100 but not by 400,
or is
29 when m is 2 and y is evenly divisible by 400, or is evenly divisible by 4 but not by 100,
30 when m is 4, 6, 9, or 11,
31 otherwise (m is 1, 3, 5, 7, 8, 10, or 12)
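The cases above can be sketched as a small Python function. The name days_in_month is illustrative only, and the sketch assumes that a year is always supplied; it does not attempt to cover a date/time value that has no year.

```python
def days_in_month(y: int, m: int) -> int:
    """Number of days in month m (1..12) of year y, proleptic Gregorian."""
    if m == 2:
        # 29 when y is evenly divisible by 400, or by 4 but not by 100.
        leap = y % 400 == 0 or (y % 4 == 0 and y % 100 != 0)
        return 29 if leap else 28
    if m in (4, 6, 9, 11):
        return 30
    return 31  # m is 1, 3, 5, 7, 8, 10, or 12
```

Under this rule year 0 is treated as a leap year, which agrees with the remark elsewhere in this specification that 1 BCE (year 0000) is a leap year in the proleptic Gregorian calendar.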
dt be an instance of the
the
the
the
the
the
the
the
Given a
Essentially, this calculation
Leap seconds are
Thus the addition of either PT1M or PT60S to any dateTime will always produce the same result. This is a special definition of addition which is designed to match common practice, and—most importantly—be stable over time.
A definition that attempted to take leap-seconds into account would
need to be constantly updated, and could not predict the results of
future implementation's additions. The decision to introduce a leap
second in
yr be dt's
mo be dt's
da be dt's
hr be dt's
mi be dt's
se be dt's
tz be dt's
Add du's
Set da
to min(da,
Add du's
Return
This algorithm may be applied to date/time types
other than
For each
Call the function.
For each property
dateTime | duration | result
--- | --- | ---
2000-01-12T12:13:14Z | P1Y3M5DT7H10M3.3S | 2001-04-17T19:23:17.3Z
2000-01 | -P3M | 1999-10
2000-01-12 | PT33H | 2000-01-13
(2000-03-30 + P1D) + P1M = 2000-03-31 + P1M = 2000-04-30
(2000-03-30 + P1M) + P1D = 2000-04-30 + P1D = 2000-05-01
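The order dependence shown in these two sums can be reproduced with a simplified sketch restricted to calendar dates. It assumes that months are added first, that the day is then pinned to the last day of the resulting month, and that days are added last; the function name add_duration and its months and days parameters are illustrative only, and times of day, timezones and years outside the range of Python's date type are out of scope.

```python
import calendar
from datetime import date, timedelta


def add_duration(start: date, months: int = 0, days: int = 0) -> date:
    """Add a (months, days) duration to a date, months first."""
    # Add the months, carrying any overflow or underflow into the year.
    carry, month_index = divmod(start.month - 1 + months, 12)
    year, month = start.year + carry, month_index + 1
    # Pin the day to the last day of the resulting month.
    day = min(start.day, calendar.monthrange(year, month)[1])
    # Add the days, letting them overflow into later months.
    return date(year, month, day) + timedelta(days=days)
```

Applying P1D and then P1M to 2000-03-30 in this sketch pins 31 to 30 in April, while applying P1M and then P1D rolls over into May, which is why the two results differ.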
yr be
mo be 12 or
dt's
da be
hr be 0 or
dt's
mi be 0 or
dt's
Subtract
(
Set ToTl to 31536000 × yr .
(Leap-year Days,
Add 86400 ×
(yr
Add 86400 ×
Add 86400 × da to ToTl.
(
Add 3600 × hr + 60 × mi + se to ToTl.
Return ToTl.
0 when TZ is
−(
DT when
DT &
T when
T &
D when
D &
YM when ym's
YM &
MD when md's
MD &
The following functions are used with various datatypes neither numeric nor date/time related.
The
The auxiliary functions
0000 when d =
0001 when d =
0010 when d =
0011 when d =
...
1110 when d =
1111 when d =
The
...
The following table shows the values of the fundamental facets
for each
The
C -- represents a digit used in the thousands and hundreds components, the "century" component, of the time element "year". Legal values are from 0 to 9.
Y -- represents a digit used in the tens and units components of the time element "year". Legal values are from 0 to 9.
M -- represents a digit used in the time element "month". The two digits in a MM format can have values from 1 to 12.
D -- represents a digit used in the time element "day". The two digits in a DD format can have values from 1 to 28 if the month value equals 2, 1 to 29 if the month value equals 2 and the year is a leap year, 1 to 30 if the month value equals 4, 6, 9 or 11, and 1 to 31 if the month value equals 1, 3, 5, 7, 8, 10 or 12.
h -- represents a digit used in the time element "hour". The two digits in a hh format can have values from 0 to 24. If the value of the hour element is 24 then the values of the minutes element and the seconds element must be 00 and 00.
m -- represents a digit used in the time element "minute". The two digits in a mm format can have values from 0 to 59.
s -- represents a digit used in the time element "second". The two
digits in a ss format can have values from 0 to 60. In the formats
described in this specification the whole number of seconds
Strictly speaking, a value of
60 or more is not sensible unless the month and day could
represent March 31, June 30, September 30, or December 31
For all the information items indicated by the above characters, leading zeroes are required where indicated.
In addition to the above, certain characters are used as designators and appear as themselves in lexical formats.
T -- is used as time designator to indicate the start of the
representation of the time of day in
Z -- is used as timezone designator, immediately (without a space)
following a data element expressing the time of day in Coordinated
Universal Time (
In the lexical format for
P -- is used as the time duration designator, preceding a data element representing a given duration of time.
Y -- follows the number of years in a time duration.
M -- follows the number of months or minutes in a time duration.
D -- follows the number of days in a time duration.
H -- follows the number of hours in a time duration.
S -- follows the number of seconds in a time duration.
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary integer. Similarly, the value of the Seconds component
allows an arbitrary decimal. Thus, the lexical format for
An optional minus sign is allowed immediately preceding, without a space,
the lexical representations for
The year "0000" is an illegal year value.
To accommodate year values greater than 9999, more than four digits are
allowed in the year representations of
The lexical representations for the datatypes
Given a
fQuotient(a, b) = the greatest integer less than or equal to a/b
fQuotient(-1,3) = -1
fQuotient(0,3)...fQuotient(2,3) = 0
fQuotient(3,3) = 1
fQuotient(3.123,3) = 1
modulo(a, b) = a - fQuotient(a,b)*b
modulo(-1,3) = 2
modulo(0,3)...modulo(2,3) = 0...2
modulo(3,3) = 0
modulo(3.123,3) = 0.123
fQuotient(a, low, high) = fQuotient(a - low, high - low)
fQuotient(0, 1, 13) = -1
fQuotient(1, 1, 13) ... fQuotient(12, 1, 13) = 0
fQuotient(13, 1, 13) = 1
fQuotient(13.123, 1, 13) = 1
modulo(a, low, high) = modulo(a - low, high - low) + low
modulo(0, 1, 13) = 12
modulo(1, 1, 13) ... modulo(12, 1, 13) = 1...12
modulo(13, 1, 13) = 1
modulo(13.123, 1, 13) = 1.123
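The two- and three-argument forms of fQuotient and modulo can be sketched in Python as follows. The names `f_quotient` and `modulo` and the optional third parameter are conventions of this sketch; `math.floor` supplies the "greatest integer less than or equal to" rounding, which matters for negative arguments.

```python
import math

def f_quotient(a, b, high=None):
    """fQuotient(a, b), or fQuotient(a, low, high) when three arguments are given."""
    if high is not None:
        low = b
        return f_quotient(a - low, high - low)
    return math.floor(a / b)

def modulo(a, b, high=None):
    """modulo(a, b), or modulo(a, low, high) when three arguments are given."""
    if high is not None:
        low = b
        return modulo(a - low, high - low) + low
    return a - f_quotient(a, b) * b
```

Passing `decimal.Decimal` values keeps fractional results such as modulo(3.123, 3) exact; with binary floats the result is only approximately 0.123.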
maximumDayInMonthFor(yearValue, monthValue) =
M := modulo(monthValue, 1, 13)
Y := yearValue + fQuotient(monthValue, 1, 13)
Return a value based on M and Y:
31  M = January, March, May, July, August, October, or December  
30  M = April, June, September, or November  
29  M = February AND (modulo(Y, 400) = 0 OR (modulo(Y, 100) != 0) AND modulo(Y, 4) = 0)  
28  Otherwise 
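As an illustration, maximumDayInMonthFor can be written in Python and cross-checked against the standard library. This is a sketch under the assumption that both arguments are integers; for integers, Python's `%` and `//` operators round toward negative infinity and so coincide with modulo and fQuotient as defined above. The name `maximum_day_in_month_for` is hypothetical.

```python
import calendar

def maximum_day_in_month_for(year_value: int, month_value: int) -> int:
    """Number of days in the month, allowing month values outside 1..12."""
    m = (month_value - 1) % 12 + 1             # modulo(monthValue, 1, 13)
    y = year_value + (month_value - 1) // 12   # yearValue + fQuotient(monthValue, 1, 13)
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    # February: 29 days in leap years, 28 otherwise.
    if y % 400 == 0 or (y % 100 != 0 and y % 4 == 0):
        return 29
    return 28

# Cross-check against calendar.monthrange for in-range months.
agrees = all(
    maximum_day_in_month_for(y, m) == calendar.monthrange(y, m)[1]
    for y in range(1900, 2101)
    for m in range(1, 13)
)
```

Out-of-range months are normalised before the lookup, so month 0 of 2000 is treated as December 1999 and month 14 of 2000 as February 2001.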
Essentially, this calculation is equivalent to separating D into <year,month>
and <day,hour,minute,second> fields. The <year,month> is added to S.
If the day is out of range, it is
Leap seconds are
Thus the addition of either PT1M or PT60S to any dateTime will always produce the same result. This is a special definition of addition which is designed to match common practice and, most importantly, to be stable over time.
A definition that attempted to take leap seconds into account would need to
be constantly updated, and could not predict the results of future
implementation's additions. The decision to introduce a leap second in
The following is the precise specification. These steps must be followed in the same order. If a field in D is not specified, it is treated as if it were zero. If a field in S is not specified, it is treated in the calculation as if it were the minimum allowed value in that field; however, after the calculation is concluded, the corresponding field in E is removed (set to unspecified).
Months (may be modified additionally below)
temp := S[month] + D[month]
E[month] := modulo(temp, 1, 13)
carry := fQuotient(temp, 1, 13)
Years (may be modified additionally below)
E[year] := S[year] + D[year] + carry
Zone
E[zone] := S[zone]
Seconds
temp := S[second] + D[second]
E[second] := modulo(temp, 60)
carry := fQuotient(temp, 60)
Minutes
temp := S[minute] + D[minute] + carry
E[minute] := modulo(temp, 60)
carry := fQuotient(temp, 60)
Hours
temp := S[hour] + D[hour] + carry
E[hour] := modulo(temp, 24)
carry := fQuotient(temp, 24)
Days
if S[day] > maximumDayInMonthFor(E[year], E[month])
  tempDays := maximumDayInMonthFor(E[year], E[month])
else if S[day] < 1
  tempDays := 1
else
  tempDays := S[day]
E[day] := tempDays + D[day] + carry
START LOOP
if E[day] < 1
  E[day] := E[day] + maximumDayInMonthFor(E[year], E[month] - 1)
  carry := -1
else if E[day] > maximumDayInMonthFor(E[year], E[month])
  E[day] := E[day] - maximumDayInMonthFor(E[year], E[month])
  carry := 1
else EXIT LOOP
temp := E[month] + carry
E[month] := modulo(temp, 1, 13)
E[year] := E[year] + fQuotient(temp, 1, 13)
GOTO START LOOP
dateTime  duration  result 

2000-01-12T12:13:14Z  P1Y3M5DT7H10M3.3S  2001-04-17T19:23:17.3Z 
2000-01  -P3M  1999-10 
2000-01-12  PT33H  2000-01-13 
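The addition steps can be traced in code. The Python sketch below follows the order of the steps given above, with the day adjustment written as a loop; it is illustrative only. The dictionary representation, the names `add_duration` and `MINIMUM`, and the choice of 1 as the minimum year are assumptions of this sketch, and duration fields are expected to carry their sign already.

```python
import math
from decimal import Decimal

def f_quotient(a, b, high=None):
    # Three-argument form: b plays the role of "low".
    return f_quotient(a - b, high - b) if high is not None else math.floor(a / b)

def modulo(a, b, high=None):
    if high is not None:
        return modulo(a - b, high - b) + b
    return a - f_quotient(a, b) * b

def max_day_in_month_for(year, month):
    m = modulo(month, 1, 13)
    y = year + f_quotient(month, 1, 13)
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    return 29 if (y % 400 == 0 or (y % 100 != 0 and y % 4 == 0)) else 28

# Assumed minimum values used for fields that S leaves unspecified.
MINIMUM = {"year": 1, "month": 1, "day": 1, "hour": 0, "minute": 0, "second": Decimal(0)}

def add_duration(s, d):
    """Add duration fields d (signed) to dateTime fields s; returns the fields of E."""
    sv = lambda f: s.get(f, MINIMUM[f])
    dv = lambda f: d.get(f, 0)
    e = {}
    temp = sv("month") + dv("month")                   # Months
    e["month"] = modulo(temp, 1, 13)
    carry = f_quotient(temp, 1, 13)
    e["year"] = sv("year") + dv("year") + carry        # Years
    temp = sv("second") + dv("second")                 # Seconds
    e["second"], carry = modulo(temp, 60), f_quotient(temp, 60)
    temp = sv("minute") + dv("minute") + carry         # Minutes
    e["minute"], carry = modulo(temp, 60), f_quotient(temp, 60)
    temp = sv("hour") + dv("hour") + carry             # Hours
    e["hour"], carry = modulo(temp, 24), f_quotient(temp, 24)
    # Days: pin the starting day into the range of the new month first.
    temp_days = min(max(sv("day"), 1), max_day_in_month_for(e["year"], e["month"]))
    e["day"] = temp_days + dv("day") + carry
    while True:
        if e["day"] < 1:
            e["day"] += max_day_in_month_for(e["year"], e["month"] - 1)
            carry = -1
        elif e["day"] > max_day_in_month_for(e["year"], e["month"]):
            e["day"] -= max_day_in_month_for(e["year"], e["month"])
            carry = 1
        else:
            break
        temp = e["month"] + carry
        e["month"] = modulo(temp, 1, 13)
        e["year"] += f_quotient(temp, 1, 13)
    # Fields unspecified in S are removed again; the zone is copied unchanged.
    result = {f: v for f, v in e.items() if f in s}
    if "zone" in s:
        result["zone"] = s["zone"]
    return result
```

For example, `add_duration({"year": 2000, "month": 1}, {"month": -3})` returns the year 1999 and month 10 fields only, matching the second row of the table. Seconds should be given as `int` or `Decimal` values so that fractional seconds stay exact.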
Time durations are added by simply adding each of their fields, respectively, without overflow.
The order of addition of durations to instants is significant. For example, there are cases where:
((dateTime + duration1) + duration2) != ((dateTime + duration2) + duration1)
For example:
(2000-03-30 + P1D) + P1M = 2000-03-31 + P1M = 2000-04-30
(2000-03-30 + P1M) + P1D = 2000-04-30 + P1D = 2000-05-01
A
Unlike some popular regular expression languages (including those defined by Perl and standard Unix utilities), the regular expression language defined here implicitly anchors all regular expressions at the head and tail, as the most common use of regular expressions in patterns is to match entire literals. For example, a datatype derived from string such that all values must begin with the character A (#x41) and end with the character Z (#x5a) would be defined by a pattern whose value is:
A.*Z
In regular expression languages that are not implicitly anchored at the head and tail, it is customary to write the equivalent regular expression as:
^A.*Z$
where "^" anchors the pattern at the head and "$" anchors at the tail.
In those rare cases where an unanchored match is desired, including .* at the beginning and ending of the regular expression will achieve the desired results. For example, a datatype derived from string such that all values must contain at least 3 consecutive A (#x41) characters somewhere within the value could be defined by a pattern whose value is:
.*AAA.*
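The effect of implicit anchoring can be illustrated with Python's `re` module, where `re.fullmatch` plays the role of the implicitly anchored match and `re.search` that of an unanchored one. This is only an analogy for simple patterns: Python's regular expression syntax is not the language defined here (for example, its `.` excludes only the newline), and the helper name `matches_pattern` is hypothetical.

```python
import re

def matches_pattern(pattern: str, literal: str) -> bool:
    """True if the whole literal matches, mirroring implicit anchoring."""
    return re.fullmatch(pattern, literal) is not None

# Anchored at head and tail: the literal must start with A and end with Z.
anchored_ok = matches_pattern("A.*Z", "AxyZ")
anchored_rejected = matches_pattern("A.*Z", "xAxyZ!")

# An unanchored search finds the same pattern inside the longer literal.
unanchored_found = re.search("A.*Z", "xAxyZ!") is not None

# Wrapping the pattern in .* recovers the unanchored behaviour under full matching.
contains_three_a = matches_pattern(".*AAA.*", "xxAAAyy")
```

Under full matching, `A.*Z` accepts "AxyZ" but rejects "xAxyZ!", whereas `.*AAA.*` accepts any literal that contains three consecutive A characters.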

characters.
For all 
Denoting the set of strings 

(empty string)  the set containing just the empty string 
all strings in 

all strings in 
For all 
Denoting the set of strings 

all strings in 

all strings 
For all 
Denoting the set of strings 

all strings in 

the empty string, and all strings in


All strings in 

All strings 

All strings 

All strings in 

All strings in L(S{n}S*) 

All strings 

The set containing only the empty string 
The regular expression language in the Perl Programming Language does not include a quantifier of the form S{,m}, since it is logically equivalent to S{0,m}. We have, therefore, left this logical possibility out of the regular expression language defined by this specification.
A quantifier is one of ?, *, +, {n,m} or {n,}, which have the meanings defined in the table above.
For all 
Denoting the set of strings 

the single string consisting only of 

all strings in 

( 
all strings in 
.
, \
, ?
,
*
, +
, {
, }
(
, )

[
]
.
These characters have special meanings in
Note that a
A character class is either a
[
and ]
characters. For all character
groups
For all 
Identifying the set of characters 

all characters in 

all characters in 

all characters in 

all characters in 
^
character.
For all

character.
For any
A single XML character is a
The [, ], - and \ characters are not valid character ranges;
The ^ character is only valid at the beginning of a
The - character is a valid character range only at the beginning or end of a
The grammar for
A
\
If s is the first character in a ^
\
or [
; and
The code point of
The code point of a
The valid 
Identifying the set of characters 

\n  the newline character (#xA) 
\r  the return character (#xD) 
\t  the tab character (#x9) 
\\  \ 
\|  | 
\.  . 
\-  - 
\^  ^ 
\?  ? 
\*  * 
\+  + 
\{  { 
\}  } 
\(  ( 
\)  ) 
\[  [ 
\]  ] 
X, can be identified with a \p{X}. The complement of this set is specified with the \P{X}. ([\P{X}] = [^\p{X}]).
The following table specifies the recognized values of the "General Category" property.
Category  Property  Meaning 

Letters  L  All Letters 
Lu  uppercase  
Ll  lowercase  
Lt  titlecase  
Lm  modifier  
Lo  other  
Marks  M  All Marks 
Mn  nonspacing  
Mc  spacing combining  
Me  enclosing  
Numbers  N  All Numbers 
Nd  decimal digit  
Nl  letter  
No  other  
Punctuation  P  All Punctuation 
Pc  connector  
Pd  dash  
Ps  open  
Pe  close  
Pi  initial quote (may behave like Ps or Pe depending on usage)  
Pf  final quote (may behave like Ps or Pe depending on usage)  
Po  other  
Separators  Z  All Separators 
Zs  space  
Zl  line  
Zp  paragraph  
Symbols  S  All Symbols 
Sm  math  
Sc  currency  
Sk  modifier  
So  other  
Other  C  All Others 
Cc  control  
Cf  format  
Co  private use  
Cn  not assigned 
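Membership in the categories listed in the table can be explored with Python's `unicodedata` module, which reports the two-letter General Category of a character. The sketch below is illustrative: the helper name `has_general_category` is hypothetical, and the Unicode version bundled with a given Python release may differ from the one this specification refers to.

```python
import unicodedata

def has_general_category(ch: str, prop: str) -> bool:
    """True if the single character ch has General Category property prop.

    A one-letter property such as "L" covers every two-letter
    category that starts with that letter (Lu, Ll, Lt, Lm, Lo).
    """
    return unicodedata.category(ch).startswith(prop)
```

For example, "A" has both the Lu and the L property, while "5" has Nd but not L.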
The properties mentioned above exclude the Cs property. The Cs property identifies
X (with all white space stripped out), can be identified with a \p{IsX}. The complement of this set is specified with the \P{IsX}. ([\P{IsX}] = [^\p{IsX}]).
The following table specifies the recognized block names (for more
information, see the "Blocks.txt" file in
Start Code  End Code  Block Name  Start Code  End Code  Block Name  

#x0000  #x007F  BasicLatin  #x0080  #x00FF  Latin-1Supplement  
#x0100  #x017F  LatinExtended-A  #x0180  #x024F  LatinExtended-B  
#x0250  #x02AF  IPAExtensions  #x02B0  #x02FF  SpacingModifierLetters  
#x0300  #x036F  CombiningDiacriticalMarks  #x0370  #x03FF  Greek  
#x0400  #x04FF  Cyrillic  
#x0530  #x058F  Armenian  #x0590  #x05FF  Hebrew  
#x0600  #x06FF  Arabic  #x0700  #x074F  Syriac  
#x0780  #x07BF  Thaana  
#x0900  #x097F  Devanagari  #x0980  #x09FF  Bengali  
#x0A00  #x0A7F  Gurmukhi  #x0A80  #x0AFF  Gujarati  
#x0B00  #x0B7F  Oriya  #x0B80  #x0BFF  Tamil  
#x0C00  #x0C7F  Telugu  #x0C80  #x0CFF  Kannada  
#x0D00  #x0D7F  Malayalam  #x0D80  #x0DFF  Sinhala  
#x0E00  #x0E7F  Thai  #x0E80  #x0EFF  Lao  
#x0F00  #x0FFF  Tibetan  #x1000  #x109F  Myanmar  
#x10A0  #x10FF  Georgian  #x1100  #x11FF  HangulJamo  
#x1200  #x137F  Ethiopic  
#x13A0  #x13FF  Cherokee  #x1400  #x167F  UnifiedCanadianAboriginalSyllabics  
#x1680  #x169F  Ogham  #x16A0  #x16FF  Runic  
#x1780  #x17FF  Khmer  #x1800  #x18AF  Mongolian  
#x1E00  #x1EFF  LatinExtendedAdditional  #x1F00  #x1FFF  GreekExtended  
#x2000  #x206F  GeneralPunctuation  #x2070  #x209F  SuperscriptsandSubscripts  
#x20A0  #x20CF  CurrencySymbols  #x20D0  #x20FF  CombiningMarksforSymbols  
#x2100  #x214F  LetterlikeSymbols  #x2150  #x218F  NumberForms  
#x2190  #x21FF  Arrows  #x2200  #x22FF  MathematicalOperators  
#x2300  #x23FF  MiscellaneousTechnical  #x2400  #x243F  ControlPictures  
#x2440  #x245F  OpticalCharacterRecognition  #x2460  #x24FF  EnclosedAlphanumerics  
#x2500  #x257F  BoxDrawing  #x2580  #x259F  BlockElements  
#x25A0  #x25FF  GeometricShapes  #x2600  #x26FF  MiscellaneousSymbols  
#x2700  #x27BF  Dingbats  
#x2800  #x28FF  BraillePatterns  
#x2E80  #x2EFF  CJKRadicalsSupplement  #x2F00  #x2FDF  KangxiRadicals  
#x2FF0  #x2FFF  IdeographicDescriptionCharacters  #x3000  #x303F  CJKSymbolsandPunctuation  
#x3040  #x309F  Hiragana  #x30A0  #x30FF  Katakana  
#x3100  #x312F  Bopomofo  #x3130  #x318F  HangulCompatibilityJamo  
#x3190  #x319F  Kanbun  #x31A0  #x31BF  BopomofoExtended  
#x3200  #x32FF  EnclosedCJKLettersandMonths  #x3300  #x33FF  CJKCompatibility  
#x3400  #x4DB5  CJKUnifiedIdeographsExtensionA  
#x4E00  #x9FFF  CJKUnifiedIdeographs  #xA000  #xA48F  YiSyllables  
#xA490  #xA4CF  YiRadicals  
#xAC00  #xD7A3  HangulSyllables  
#xE000  #xF8FF  PrivateUse  
#xF900  #xFAFF  CJKCompatibilityIdeographs  #xFB00  #xFB4F  AlphabeticPresentationForms  
#xFB50  #xFDFF  ArabicPresentationForms-A  
#xFE20  #xFE2F  CombiningHalfMarks  
#xFE30  #xFE4F  CJKCompatibilityForms  #xFE50  #xFE6F  SmallFormVariants  
#xFE70  #xFE  ArabicPresentationForms-B  
#xFF00  #xFFEF  HalfwidthandFullwidthForms  #xFFF0  #xFF  Specials  
The blocks mentioned above exclude the HighSurrogates, LowSurrogates and HighPrivateUseSurrogates blocks. These blocks identify "surrogate" characters, which do not occur at the level of the "character abstraction" that XML instance documents operate on.
For example, the escape for the BasicLatin block (#x0000 to #x007F) is \p{IsBasicLatin}.
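A block escape amounts to a code point range test. The Python sketch below copies a few rows from the table above into a dictionary; the names `BLOCKS` and `in_block` are hypothetical, and only a handful of blocks are included for illustration.

```python
# A few (start, end) code point ranges taken from the block table above.
BLOCKS = {
    "BasicLatin": (0x0000, 0x007F),
    "Greek": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Hebrew": (0x0590, 0x05FF),
}

def in_block(ch: str, block_name: str) -> bool:
    """Analogue of \\p{IsX}: is the single character ch inside block X?"""
    start, end = BLOCKS[block_name]
    return start <= ord(ch) <= end

def not_in_block(ch: str, block_name: str) -> bool:
    """Analogue of \\P{IsX}, the complement of the block."""
    return not in_block(ch, block_name)
```

With these definitions, `in_block("A", "BasicLatin")` holds, and `not_in_block("\u00e9", "BasicLatin")` holds because U+00E9 lies outside #x0000 to #x007F.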
Character sequence  Equivalent 

.  [^\n\r] 
\s  [#x20\t\n\r] 
\S  [^\s] 
\i 
the set of initial name characters, those

\I  [^\i] 
\c 
the set of name characters, those

\C  [^\c] 
\d  \p{Nd} 
\D  [^\d] 
\w  [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of punctuation, separator and other characters) 
\W  [^\w] 
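Three of the escapes in this table can be approximated as per-character predicates in Python. This is a sketch for comparison only: the function names are hypothetical, the results depend on the Unicode data bundled with Python, and the definitions here intentionally differ from the `\s`, `\d` and `\w` of Python's own `re` module (for instance, the underscore is punctuation and so is not a `\w` character in this table).

```python
import unicodedata

def is_s(ch: str) -> bool:
    """\\s: space, tab, newline or return only."""
    return ch in {" ", "\t", "\n", "\r"}

def is_d(ch: str) -> bool:
    """\\d: any character whose General Category is Nd."""
    return unicodedata.category(ch) == "Nd"

def is_w(ch: str) -> bool:
    """\\w: any character that is not punctuation (P), separator (Z) or other (C)."""
    return unicodedata.category(ch)[0] not in "PZC"
```

Each predicate expects a single character; the upper-case escapes \S, \D and \W are simply the negations.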
The
In order to align this specification with
those being prepared by the XSL and XML Query Working Groups, a new
datatype named
The
Units of length have been
The use of the namespace http://www.w3.org/2001/XMLSchema-datatypes has been deprecated. The definition of a namespace separate from the main namespace defined by this specification proved not to be necessary or helpful in facilitating the use, by other specifications, of the datatypes defined here, and its use raises a number of difficult unsolved practical questions.
The
As noted above, positive and negative zero,
The description of the lexical spaces of
The
The minimum requirements for implementation support of the
The lexical representation
Algorithms for arithmetic involving
The treatment of leap seconds is no longer implementation-defined: the date/time types described here do not include leap-second values.
Support has been added for
Two new totally ordered restrictions of
The XML representations of the
Numerous minor corrections have been made in response to comments on earlier working drafts.
The treatment of topics handled both in this specification and in
Several references to other specifications have been updated to
refer to current versions of those specifications, including
Requirements for the datatype-validity of values of type
Explicit definitions have been provided for the lexical and
Some errors in the definition of regular-expression metacharacters have been corrected.
The descriptions of the
A warning against using the whitespace facet for tokenizing natural-language data has been added at the request of the W3C Internationalization Working Group.
The listing below is for the benefit of readers of a printed version of this document: it collects together all the definitions which appear in the document above.