This is a
Public Working Draft of XML Schema 1.1. It is here made
available for review by W3C members and the public. It is intended to
give an indication of the W3C XML Schema Working Group's intentions
for this new version of the XML Schema language and our progress in
achieving them. It attempts to be complete in indicating
For those primarily interested in the changes since version 1.0,
the
This draft was published on 24 February 2005. The major changes are:
A new primitive decimal type has been defined, which retains information about the precision of the value. This type is aligned with the floating-point decimal types which will be part of the next edition of IEEE 754.
In order to align this specification with those being prepared
by the XSL and XML Query Working Groups, a new datatype named
The conceptual model of the date- and time-related types has been defined more formally.
Two subtypes of
A more formal treatment of the fundamental facets of the primitive datatypes has been adopted.
More formal definitions of the lexical space of most types have been provided, with detailed descriptions of the mappings from lexical representation to value and from value to canonical representation.
Please send comments on this Working Draft to
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced by the
Patent disclosures relevant to this specification may
be found on the Working Group's
The English version of this specification is the only normative
version. Information about translations of this document is available
at
How should this specification be aligned with XML 1.1? The changes in character set and name characters, and the question of what determines which ones to use, must be addressed.
The current plan is that all datatypes defined herein will have EBNF productions at least approximately defining their lexical space, and will include a non-normative regex derived from the EBNF if a user wishes to copy it directly.
It is not possible for all datatypes to have canonical representations of all values without violating the rules of derivation or adding special-purpose constraining facets which the WG does not deem appropriate. The WG has not yet decided how to deal with datatypes whose lexical and/or canonical mappings are context-sensitive.
The word will probably be removed.
"Derivations" other than "derivations by restriction" will be renamed "constructions".
The Working Group has two main goals for this version of W3C XML Schema:
Significant improvements in simplicity of design and clarity of
exposition
Provision of support for versioning of XML languages defined using the XML Schema specification, including the XML transfer syntax for schemas itself.
These goals are slightly in tension with one another; the following summarizes the Working Group's strategic guidelines for changes between versions 1.0 and 1.1:
Add support for versioning (acknowledging that this
Allow bug fixes (unless in specific cases we decide that the fix is too disruptive for a point release)
Allow editorial changes
Allow design cleanup to change behavior in edge cases
Allow relatively non-disruptive changes to type hierarchy (to better support current and forthcoming international standards and W3C recommendations)
Allow design cleanup to change component structure (changes to functionality restricted to edge cases)
Do not allow any significant changes in functionality
Do not allow any changes to XML transfer syntax except those required by version control hooks and bug fixes
The overall aim as regards compatibility is that
All schema documents conformant to version 1.0 of this specification should also conform to version 1.1, and should have the same validation behaviour across 1.0 and 1.1 implementations (except possibly in edge cases and in the details of the resulting PSVI);
The vast majority of schema documents conformant to version 1.1 of this specification should also conform to version 1.0, leaving aside any incompatibilities arising from support for versioning, and when they are conformant to version 1.0 (or are made conformant by the removal of versioning information), should have the same validation behaviour across 1.0 and 1.1 implementations (again except possibly in edge cases and in the details of the resulting PSVI);
The
The table below offers two typical examples of XML instances in which datatypes are implicit: the instance on the left represents a billing invoice, the instance on the right a memo or perhaps an email message in XML.
Data oriented (billing invoice) | Document oriented (memo)
The invoice contains several dates and telephone numbers, the postal abbreviation for a state (which comes from an enumerated list of sanctioned values), and a ZIP code (which takes a definable regular form). The memo contains many of the same types of information: a date, telephone number, email address and an "importance" value (from an enumerated list, such as "low", "medium" or "high"). Applications which process invoices and memos need to raise exceptions if something that was supposed to be a date or telephone number does not conform to the rules for valid dates or telephone numbers.
In both cases, validity constraints exist on the content of the instances that are not expressible in XML DTDs. The limited datatyping facilities in XML have prevented validating XML processors from supplying the rigorous type checking required in these situations. The result has been that individual applications writers have had to implement type checking in an ad hoc manner. This specification addresses the need of both document authors and applications writers for a robust, extensible datatype system for XML which could be incorporated into XML processors. As discussed below, these datatypes could be used in other XML-related standards as well.
The
provide for primitive data typing, including byte, date, integer, sequence, SQL and Java primitive datatypes, etc.;
define a type system that is adequate for import/export from database systems (e.g., relational, object, OLAP);
distinguish requirements relating to lexical data representation vs. those governing an underlying information set;
allow creation of user-defined datatypes, such as datatypes that are derived from existing datatypes and which may constrain certain of its properties (e.g., range, precision, length, format).
This portion of the XML Schema Language discusses datatypes that can be
used in an XML Schema. These datatypes can be specified for element
content that would be specified as
The terminology used to describe XML Schema Datatypes is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a datatype processor:
A feature of this specification included solely to ensure that schemas
which use this feature remain compatible with
Conforming documents and processors are permitted to but need not behave as described.
(Of strings or names:) Two strings or names being compared must be
identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g.
characters with both precomposed and base+diacritic forms) match only if they have
the same representation in both strings. No case folding is performed. (Of strings and
rules in the grammar:) A string matches a grammatical production
if
Conforming documents and processors are required to behave as
described; otherwise they are in
A violation of the rules of this specification; results are undefined.
Conforming software
This specification provides three different kinds of normative statements about schema components, their representations in XML and their contribution to the schema-validation of information items:
Constraints on the schema components themselves, i.e. conditions
components
Constraints on the representation of schema components in XML. Some but
not all of these are expressed in
Constraints expressed by schema components which information
items
This section describes the conceptual framework behind the
The datatypes discussed in this specification are
Only those operations and relations needed for schema processing are defined in this
specification. Applications using these datatypes are generally expected to implement
appropriate additional functions and/or relations to make the datatype generally
useful. For example, the description herein of the
A values
)
is influenced by the set of value-space operations and relations used therewith.
A
A small collection of
A
This specification only defines the operations and relations needed for schema processing. The
choice of terminology for describing/naming the datatypes is selected to guide users and implementers
in how to expand the datatype to be generally useful, i.e., how to recognize the real-world
datatypes and their variants for which the datatypes defined herein are
meant to be used for data interchange.
Along with the
The value spaces of datatypes are abstractions,
and are defined in
In addition, other applications are expected to define additional appropriate
operations and/or relations on these value spaces (e.g., addition and multiplication
on the various numerical datatypes' value spaces), and are permitted where
appropriate to even redefine the operations and relations defined within this
specification, provided that
The
defined
enumerated outright
defined by restricting the
defined as a combination of values from one or more already defined
The relations of
The identity relation is always defined. Every value space inherently has an
identity relation. Two things are
This does not preclude implementing datatypes by using more than one
In the identity relation defined herein, values
from different
Each
On the other hand, equality need not cover the entire value space of the datatype (though it usually does).
The equality relation is used in conjunction with
order when making restrictions involving order. This is the only use of
In the prior version of
this specification (1.0), equality was always identity. This has been changed
to permit the datatypes defined herein to more closely match the
For example, the
For another example, the
In the equality relation defined herein, values
from different primitive data spaces are made artificially unequal even if they might
otherwise be considered equal. For example, there is a number
For the purposes of this specification, there is one equality relation for all values
of all datatypes (the union of the various datatypes' individual equalities, if one
considers relations to be sets of ordered pairs). The
Each datatype has an order relation prescribed. This order may be a
In this specification, this less-than order relation is denoted by
The weak order
The value spaces of primitive datatypes are abstractions, which may have values in common. In
the order relation defined herein, these value spaces are made artificially
While it is not an error to attempt to compare values from the
value spaces of two different primitive datatypes, they will always be
In addition to its
For example, "100" and "1.0E2" are two different literals from the
The literals in the
The number of literals for each value has been kept small; for many datatypes there is a one-to-one mapping between literals and values. This makes it easy to exchange the values between different systems. In many cases, conversion from locale-dependent representations will be required on both the originator and the recipient side, both for computer processing and for interaction with humans.
Textual, rather than binary, literals are used. This makes hand editing, debugging, and similar activities possible.
Where possible, literals correspond to those found in common programming languages and libraries.
While the datatypes defined in this specification have, for the most part,
a single lexical representation, i.e. each value in the datatype's
Should a derivation be made using a derivation mechanism that
removes
This could happen by means of a
Conversely, should a derivation remove values then their
There are currently no facets with such an impact. There may be in the future.
For example, '100' and '1.0E2' are two different
The dependencies are in Part 1; they will be resolved there. Text in this Part will reflect that canonical representations are provided for the benefit of other users, including other specifications that might want to reference these datatypes.
Given the "pattern" constraining facet, restricting away canonical representations cannot be prohibited without undue processing expense. A warning will be inserted, and RQ129 will ensure that loss of canonical representations will not affect schema processing.
While the datatypes defined in this specification generally have
a single
This decision is not yet written up herein: The four informational facets, each of which has only one property, will be lumped into one facet having four properties. This will represent a further technical change to the facet structure, but will not result in any additional or lost information in a schema.
The facets of a datatype serve to distinguish those aspects of
one datatype which
Facets are of two kinds:
In the 1.0 version of this specification, information facets were called "fundamental facets". Information facets are not required for schema processing, but some applications use them.
All
Constraining the
All
It is useful to categorize the datatypes defined in this specification along various dimensions, forming a set of characterization dichotomies.
The first distinction to be made is that between
For example, a single token which
Several type systems (such as the one described in
A
In the above example, the value of the
When a datatype is
For each of
For
The
The
A prototypical example of a
Any number (greater than 1) of
The order in which the
For example, given the definition below, the first instance of the <size> element
validates correctly as an
The
A datatype which is
Next, we distinguish between
For example, in this specification,
A new
The datatypes defined by this specification fall into both
the
In the example above,
A datatype which is
As described in more detail in
A
One datatype can be
Conceptually there is no difference between the
A datatype which is
Datatypes as defined above exist, in the abstract, independently of whether they have any relation to schemas as defined in this specification. Datatypes are tied to schemas either by explicit description in this specification, or by user mechanisms prescribed in this specification for use in user-created schemas.
The user-usable mechanism prescribed by this specification is the
ability to add additional
Datatypes associated with a schema are organized in a hierarchy
that exactly parallels the datatypes' defining (or selecting)
At the root of the hierarchy are two
special datatypes,
All other (
Ordinary datatypes may be characterized as
All datatypes in the
Special and primitive datatypes are placed in the hierarchy by explicit rules in this specification. As mentioned above,
A constructed datatype (
The special, primitive, and other ordinary datatypes described in this specification are present in every schema's
Each built-in datatype in this specification (both
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype
For example, to address the
http://www.w3.org/2001/XMLSchema#int
Additionally, each facet definition element can be uniquely addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the facet
For example, to address the maxInclusive facet, the URI is:
http://www.w3.org/2001/XMLSchema#maxInclusive
Additionally, each facet usage in a built-in datatype definition can be uniquely addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype, followed by a period (".") followed by the name of the facet
For example, to address the usage of the maxInclusive facet in the definition of int, the URI is:
http://www.w3.org/2001/XMLSchema#int.maxInclusive
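The three addressing schemes above are plain string concatenation on the XML Schema namespace URI. The following Python sketch spells them out. The function names `datatype_uri`, `facet_uri` and `facet_usage_uri` are illustrative only and are not defined by this specification.

```python
# Namespace URI of XML Schema, as given in the text above.
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def datatype_uri(datatype: str) -> str:
    """Built-in datatype: namespace URI, '#', datatype name."""
    return f"{XSD_NAMESPACE}#{datatype}"


def facet_uri(facet: str) -> str:
    """Facet definition: namespace URI, '#', facet name."""
    return f"{XSD_NAMESPACE}#{facet}"


def facet_usage_uri(datatype: str, facet: str) -> str:
    """Facet usage in a built-in datatype: namespace URI, '#', datatype, '.', facet."""
    return f"{XSD_NAMESPACE}#{datatype}.{facet}"


print(datatype_uri("int"))                     # http://www.w3.org/2001/XMLSchema#int
print(facet_usage_uri("int", "maxInclusive"))  # http://www.w3.org/2001/XMLSchema#int.maxInclusive
```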
The
http://www.w3.org/2001/XMLSchema
To facilitate usage in specifications other than the XML Schema definition language,
such as those that do not want to know anything about aspects of the
XML Schema definition language other than the datatypes, each
http://www.w3.org/2001/XMLSchema-datatypes
This applies to both
Each
Special datatypes
There are two
The
Many human languages have writing systems that require
child elements for control of aspects such as bidirectional formatting or
ruby annotation (see
As noted in
An instance of a datatype that is defined as
The canonical representation for
The minimum number of digits implementations are required to support will be lowered to 16 digits; a health warning will be added to note that implementations of derived datatypes may support more digits of precision than the base decimal type does, but that they are not required to do so.
All
-1.23, 12678967.543233, +100000.00, 210
.
The lexical space of &odec; is the set of
lexical representations which match the grammar given above, or
(equivalently) the regular expression
The mapping from lexical representations to values is the usual
one for decimal numerals; it is given formally in:
The canonical representation for
The mapping from values to canonical representations
is given formally in:
Precision is sometimes given in absolute, sometimes in relative
terms. 5
has an arithmetic precision of 0, and
5.01
an arithmetic precision of 2.
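The following is a minimal Python sketch of arithmetic precision for plain decimal numerals with no exponent. It assumes that precision is simply the number of digits written after the decimal point, as in the two examples above. The sketch relies on the standard `decimal` module, which keeps trailing zeros and so records how many fractional digits were written. `arithmetic_precision` is a hypothetical helper name.

```python
import re
from decimal import Decimal

# Plain decimal numerals only: optional sign, digits, optional fraction.
DECIMAL_NUMERAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def arithmetic_precision(literal: str) -> int:
    """Number of digits written after the decimal point in `literal`."""
    if DECIMAL_NUMERAL.fullmatch(literal) is None:
        raise ValueError(f"not a plain decimal numeral: {literal!r}")
    # Decimal keeps trailing zeros, so its exponent is minus the number of
    # fractional digits as written: Decimal("5.01") has exponent -2.
    return -Decimal(literal).as_tuple().exponent


print(arithmetic_precision("5"))     # 0
print(arithmetic_precision("5.01"))  # 2
```

Because trailing zeros are kept, "+100000.00" also reports a precision of 2.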
The
As explained below, the lexical
representation of the not-a-number
, positive infinity
,
and negative infinity
.
The latter two together are called
the infinities
.
Equality and order for
Two numerical
A numerical value n
is less than, equal to, or greater than
and a
INF is equal only to itself, and is greater than
−INF and all numerical
−INF is equal only to itself, and is less than
INF and all numerical
NaN is incomparable with all values,
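The rules above produce a four-valued result: less than, equal, greater than, or incomparable. The Python sketch below models them with built-in floats. It assumes that numerical values compare by ordinary numeric magnitude. `float_order` is a hypothetical name, and `"<>"` stands for "incomparable".

```python
import math


def float_order(a: float, b: float) -> str:
    """Return '<', '=', '>' or '<>' (incomparable) for two values."""
    # NaN is incomparable with every value, itself included.
    if math.isnan(a) or math.isnan(b):
        return "<>"
    # Built-in comparisons already place INF above, and -INF below, every
    # other non-NaN value, and each infinity compares equal only to itself.
    # Assumption: in Python -0.0 == 0.0, so signed zeros compare equal here.
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "="
```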
The lexical mapping and canonical mapping
for
The
a
a
{
an
a
a
a
The description of canonical representations for float and double needs to be cleaned up.
Two zeros will be provided similar to those in precisionDecimal
"Equality" in this Recommendation is defined to be "identity" (i.e., values that
are identical in the
Any value incomparable with the value used for the four bounding facets
(
This datatype differs from that of
A literal in the
The INF, -INF and NaN, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0 and INF
are all legal literals for
The canonical representation for
NaN has the canonical form
"Equality" in this Recommendation is defined to be "identity" (i.e., values that
are identical in the
Any value incomparable with the value used for the four bounding facets
(
This datatype differs from that of
A literal in the
The INF, -INF and NaN, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0 and INF
are all legal literals for
The canonical representation for
NaN has the canonical form
All YYYY
) and a minimum fractional second precision of
milliseconds or three decimal digits (i.e. s.sss
).
However,
) is
a
The
The
Durations can be modeled in at least two ways: as six-property tuples (similar to
the seven-property model used for other date/time datatypes) or as two-property tuples
(somewhat similar to the alternative one-property timeOnTimeline model especially useful for
1696-09-01T00:00:00Z
1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-07-01T00:00:00Z
These four values are chosen so as to maximize
the possible differences in results that could occur, such as the difference when adding
P1M and P30D: 1697-02-01T00:00:00Z + P1M < 1697-02-01T00:00:00Z + P30D,
but 1903-03-01T00:00:00Z + P1M > 1903-03-01T00:00:00Z + P30D, so
that P1M <> P30D. If two
This minor anomaly is the result of having
It turns out that under the definition just given, two
Two totally ordered datatypes (
There are many ways to implement
The lexical representation for
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary
[0-9]+
.[0-9]+(\.[0-9]+)?
.
An optional preceding minus sign ('-') is
allowed, to indicate a negative duration. If the sign is omitted a
positive duration is indicated. See also
For example, to indicate a duration of 1 year, 2 months, 3 days, 10
hours, and 30 minutes, one would write: P1Y2M3DT10H30M
.
One could also indicate a duration of minus 120 days as:
-P120D
.
Reduced precision and truncated representations of this format are allowed provided they conform to the following:
If the number of years, months, days, hours, minutes, or seconds in any
expression equals zero, the number and its corresponding designator
The seconds part
The designator 'T'
For example, P1347Y, P1347M and P1Y2MT2H are all allowed; P0Y1347M and P0Y1347M0D are allowed. P-1347M is not allowed although -P1347M is allowed. P1Y2MT is not allowed.
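The rules above can be checked mechanically. The Python sketch below shows one way to do it. It assumes the PnYnMnDTnHnMnS form with an optional leading '-'. It also assumes two further rules: at least one number with its designator must be present, and 'T' may appear only when an hour, minute or second field follows it. `parse_duration` and the dictionary it returns are illustrative, not part of this specification.

```python
import re
from decimal import Decimal

# Optional '-', then 'P', the date fields, and an optional 'T' section of time fields.
DURATION = re.compile(
    r"(-)?P"
    r"(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?"
    r"(?:T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+(?:\.[0-9]+)?)S)?)?"
)
INTEGER_FIELDS = ("years", "months", "days", "hours", "minutes")


def parse_duration(literal: str) -> dict:
    match = DURATION.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a duration literal: {literal!r}")
    sign, *parts = match.groups()
    if all(p is None for p in parts):
        # Assumption: at least one number and designator must be present.
        raise ValueError(f"no fields in {literal!r}")
    if "T" in literal and all(p is None for p in parts[3:]):
        # 'T' is allowed only if an hour, minute or second field follows it.
        raise ValueError(f"'T' without time fields in {literal!r}")
    # Omitted fields count as zero.
    result = {name: int(p or 0) for name, p in zip(INTEGER_FIELDS, parts)}
    result["seconds"] = Decimal(parts[5] or 0)
    result["negative"] = sign == "-"
    return result
```

With this sketch, `parse_duration("-P120D")` gives `days` 120 with `negative` set, and both `parse_duration("P1Y2MT")` and `parse_duration("P-1347M")` raise `ValueError`.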
The PnYnMnDTnHnMnS
More precisely, the
Thus, a
The -?P(((([0-9]+Y([0-9]+M)?)
The
herein
Canonical mappings are not used during schema processing. They are provided in this specification for the benefit of other users of these datatype definitions who may find them useful, and for other specifications which might find it useful to reference them normatively.
herein
In general, the
1696-09-01T00:00:00Z
1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-07-01T00:00:00Z
The following table shows the strongest relationship that can be determined
between example durations. The symbol <> means that the order relation is
indeterminate. Note that because of leap seconds, a seconds field can vary
from 59 to 60. However, because of the way that addition is defined in
Relation
P1Y | > P364D | <> P365D | <> P366D | < P367D
P1M | > P27D | <> P28D | <> P29D | <> P30D | <> P31D | < P32D
P5M | > P149D | <> P150D | <> P151D | <> P152D | <> P153D | < P154D
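The relations in this table follow from adding each duration to the four reference dateTimes given earlier (1696-09-01, 1697-02-01, 1903-03-01 and 1903-07-01) and comparing the results. The Python sketch below reproduces that check. It assumes durations with only month and day fields, written as a `(months, days)` pair, and adds months before days. It ignores seconds and leap seconds. `compare_durations` and `add_duration` are hypothetical names.

```python
import calendar
from datetime import datetime, timedelta

# The four reference dateTimes given earlier in this section (all UTC).
REFERENCE_POINTS = (
    datetime(1696, 9, 1),
    datetime(1697, 2, 1),
    datetime(1903, 3, 1),
    datetime(1903, 7, 1),
)


def add_duration(start: datetime, months: int, days: int) -> datetime:
    """Add months first (pinning the day to the month's length), then days."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    moved = start.replace(year=year, month=month_index + 1,
                          day=min(start.day, last_day))
    return moved + timedelta(days=days)


def compare_durations(p: tuple, q: tuple) -> str:
    """Return '<', '=', '>' or '<>' for (months, days) durations p and q."""
    outcomes = set()
    for s in REFERENCE_POINTS:
        a, b = add_duration(s, *p), add_duration(s, *q)
        outcomes.add("<" if a < b else ">" if a > b else "=")
    # A relation holds only if it holds at all four reference points.
    return outcomes.pop() if len(outcomes) == 1 else "<>"
```

For instance, `compare_durations((1, 0), (0, 30))` gives `'<>'`, matching P1M <> P30D. `compare_durations((5, 0), (0, 154))` gives `'<'`, matching the last column of the P5M row.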
Implementations are free to optimize the computation of the ordering relationship. For example, the following table can be used to compare durations of a small number of months against days.
Months | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | ...
Days (minimum) | 28 | 59 | 89 | 120 | 150 | 181 | 212 | 242 | 273 | 303 | 334 | 365 | 393 | ...
Days (maximum) | 31 | 62 | 92 | 123 | 153 | 184 | 215 | 245 | 276 | 306 | 337 | 366 | 397 | ...
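The minimum and maximum rows of this table can be reproduced by scanning every month start in one 400-year Gregorian cycle, after which the calendar repeats. The Python sketch below assumes that each span starts on the first day of a month. `day_range_for_months` and `month_start` are illustrative helpers.

```python
from datetime import date


def month_start(year: int, months_after_january: int) -> date:
    """First day of the month lying the given number of months after January of `year`."""
    extra_years, month_index = divmod(months_after_january, 12)
    return date(year + extra_years, month_index + 1, 1)


def day_range_for_months(n: int, base_year: int = 2000) -> tuple:
    """(minimum, maximum) number of days spanned by n consecutive calendar months."""
    spans = [
        (month_start(base_year, i + n) - month_start(base_year, i)).days
        for i in range(400 * 12)  # every month start in one 400-year cycle
    ]
    return min(spans), max(spans)
```

For example, `day_range_for_months(13)` returns `(393, 397)`, which matches the column for 13 months.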
In comparing
Certain derived datatypes of durations can be guaranteed to have a total order. For this, they must have fields from only one row in the list below and the time zone must either be required or prohibited.
year, month
day, hour, minute, second
For example, a datatype could be defined to correspond to the
The
All timezoned times are Coordinated Universal Time
(
The date and time datatypes described in this recommendation were inspired
by
Those using this (1.0) version of this Recommendation to
represent negative years should be aware that the interpretation of lexical
representations beginning with a '-'
is likely to change in
subsequent versions.
See the conformance note in
The
The
See the conformance note in
Equality and order are as prescribed
in
Since the order of a
Although
Order and equality are essentially the same for
The '-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?
,
where
'-'?
the remaining '-'s are separators between parts of the date portion;
the first
'T' is a separator indicating that timeofday follows;
':' is a separator between parts of the timeofday portion;
the second
'.'
For example, 2002-10-10T12:00:00-05:00 (noon on 10 October 2002, Central Daylight Savings Time as well as Eastern Standard Time in the U.S.) is 2002-10-10T17:00:00Z, five hours later than 2002-10-10T12:00:00Z.
For further guidance on arithmetic with
Except for trailing fractional zero digits in the seconds representation,
'24:00:00' time representations,
and timezone (for timezoned values), the mapping
from literals to values is one-to-one. Where there is more than
one possible representation, the canonical representation is as follows:
The 2-digit numeral representing
the hour must not be '24';
The fractional second string, if present,
must not end in '0';
for timezoned values, the timezone must be represented with
'Z' (All timezoned
The lexical representations for
Within a Constraint: Day-of-month Values
) given above.
Within a Constraint: Leap-second Values
) given
above. Should a negative leap second be declared, the
-?([1-9][0-9][0-9][0-9]+|0[0-9][0-9][0-9])-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])
T(([01][0-9]|2[0-3]):[0-5][0-9]:([0-5][0-9]|60)(\.[0-9]+)?|24:00:00(\.[0-9]+)?)
([+\-](0[0-9]|1[0-4]):[0-5][0-9]|Z)?
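A version of this pattern, with the hyphens and alternation bars written out, can be used directly for a quick lexical check. This Python sketch compiles it and applies `re.fullmatch`. It checks only the shape of the literal. The day-of-month and leap-second constraints, and the rule that a 14-hour timezone must have 00 minutes, still need separate checks. `is_datetime_literal` is a hypothetical helper.

```python
import re

# Pattern from the text, with '-' and '|' written out; fullmatch anchors both ends.
DATETIME_PATTERN = re.compile(
    r"-?([1-9][0-9][0-9][0-9]+|0[0-9][0-9][0-9])"  # year
    r"-(0[1-9]|1[0-2])"                             # month
    r"-(0[1-9]|[12][0-9]|3[01])"                    # day (month length not checked)
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:([0-5][0-9]|60)(\.[0-9]+)?"
    r"|24:00:00(\.[0-9]+)?)"                        # time of day, or 24:00:00
    r"([+-](0[0-9]|1[0-4]):[0-5][0-9]|Z)?"          # optional timezone
)


def is_datetime_literal(text: str) -> bool:
    """True if `text` has the lexical shape of a dateTime literal."""
    return DATETIME_PATTERN.fullmatch(text) is not None
```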
The lexical mapping and canonical mapping
for
Timezones are durations with (integer-valued) hour and minute properties (with the hour magnitude limited to at most 14, and the minute magnitude limited to at most 59, except that if the hour magnitude is 14, the minute value must be 0); they may be both positive or both negative.
The lexical representation of a timezone is a string of the form:
(('+' | '-') hh ':' mm) | 'Z'
,
where
'+' indicates a non-negative duration,
'-' indicates a non-positive duration.
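The following is a minimal Python sketch of this lexical mapping. It returns the timezone as a signed number of minutes and enforces the range limits stated for timezones: hours at most 14, minutes at most 59, and 00 minutes when the hour is 14. `timezone_minutes` is an illustrative name. The sketch also shows that '+00:00', '-00:00' and 'Z' all map to the same zero-length duration.

```python
import re

TIMEZONE_PATTERN = re.compile(r"([+-])([0-9]{2}):([0-9]{2})|Z")


def timezone_minutes(literal: str) -> int:
    """Signed offset in minutes for a timezone literal such as '+05:30' or 'Z'."""
    match = TIMEZONE_PATTERN.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a timezone literal: {literal!r}")
    if literal == "Z":
        return 0
    sign, hours, minutes = match.group(1), int(match.group(2)), int(match.group(3))
    # Hours at most 14, minutes at most 59, and exactly 14:00 at the extreme.
    if hours > 14 or minutes > 59 or (hours == 14 and minutes != 0):
        raise ValueError(f"timezone out of range: {literal!r}")
    total = hours * 60 + minutes
    return -total if sign == "-" else total
```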
The mapping so defined is one-to-one, except that '+00:00',
'-00:00', and 'Z' all represent the same zero-length duration
timezone,
When a timezone is added to a
In general, the
The following definition uses the notation S[year] to represent the year
field of S, S[month] to represent the month field, and so on. The notation (Q
& "14:00") means adding the timezone 14:00 to Q, where Q did not
already have a timezone.
The ordering between two
A. Normalize P and Q. That is, if there is a timezone present, but
it is not Z, convert it to Z using the addition operation defined in
Thus 2000-03-04T23:00:00+03:00 normalizes to 2000-03-04T20:00:00Z
B. If P and Q either both have a time zone or both do not have a time zone, compare P and Q field by field from the year field down to the second field, and return a result as soon as it can be determined. That is:
For each i in {year, month, day, hour, minute, second}
If P[i] and Q[i] are both not specified, continue to the next i
If P[i] is not specified and Q[i] is, or vice versa, stop and return P <> Q
If P[i] < Q[i], stop and return P < Q
If P[i] > Q[i], stop and return P > Q
Stop and return P = Q
C. Otherwise, if P contains a time zone and Q does not, compare as follows:
P < Q if P < (Q with time zone +14:00)
P > Q if P > (Q with time zone -14:00)
P <> Q otherwise, that is, if (Q with time zone +14:00) < P < (Q with time zone -14:00)
D. Otherwise, if P does not contain a time zone and Q does, compare as follows:
P < Q if (P with time zone -14:00) < Q.
P > Q if (P with time zone +14:00) > Q.
P <> Q otherwise, that is, if (P with time zone +14:00) < Q < (P with time zone -14:00)
Examples:
Determinate | Indeterminate
2000-01-15T00:00:00 < 2000-02-15T00:00:00 | 2000-01-01T12:00:00 <> 1999-12-31T23:00:00Z
2000-01-15T12:00:00 < 2000-01-16T12:00:00Z | 2000-01-16T12:00:00 <> 2000-01-16T12:00:00Z
 | 2000-01-16T00:00:00 <> 2000-01-16T12:00:00Z
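Steps A to D translate fairly directly into code. This Python sketch assumes complete dateTime values, with all six fields present, so step B reduces to an ordinary comparison. Timezoned values are represented as aware `datetime` objects and non-timezoned values as naive ones, and `'<>'` denotes an indeterminate result. `compare_datetimes` and `to_utc` are hypothetical names. Python's `datetime` cannot hold a seconds value of 60, so leap-second values are outside what this sketch can represent.

```python
from datetime import datetime, timedelta, timezone

FOURTEEN_HOURS = timedelta(hours=14)


def to_utc(value: datetime) -> datetime:
    """Step A: convert a timezoned value to UTC (as a naive datetime); leave others alone."""
    if value.tzinfo is None:
        return value
    return value.astimezone(timezone.utc).replace(tzinfo=None)


def compare_datetimes(p: datetime, q: datetime) -> str:
    """Return '<', '=', '>' or '<>' (indeterminate) following steps A to D."""
    p_has_tz, q_has_tz = p.tzinfo is not None, q.tzinfo is not None
    p, q = to_utc(p), to_utc(q)
    if p_has_tz == q_has_tz:
        # Step B: with every field present, field-by-field is plain comparison.
        return "<" if p < q else ">" if p > q else "="
    if p_has_tz:
        # Step C. As a UTC instant, "Q with time zone +14:00" is q - 14h
        # and "Q with time zone -14:00" is q + 14h.
        if p < q - FOURTEEN_HOURS:
            return "<"
        if p > q + FOURTEEN_HOURS:
            return ">"
        return "<>"
    # Step D: "P with time zone -14:00" is p + 14h, "P with +14:00" is p - 14h.
    if p + FOURTEEN_HOURS < q:
        return "<"
    if p - FOURTEEN_HOURS > q:
        return ">"
    return "<>"
```

Run against the examples in the table, this sketch returns `'<'` for the two determinate pairs and `'<>'` for the three indeterminate ones.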
Certain derived types from
Since the lexical representation allows an optional time zone
indicator,
See the conformance note in
The
See the conformance note in
The "seven property model" rewrite of
date/time datatype descriptions includes
a carefully crafted definition of order
that ensures that for repeating datatypes (time, gDay, etc.), timezoned values
will be compared as though they are on the same
Equality and order are as prescribed in
A calendar ( or
Since the order of a
Examples that show the difference from version 1.0 of this specification (see
A day is a calendar (or
08:00:00+10:00 < 17:00:00+10:00 (just as 08:00:00Z has always been less than 17:00:00Z, but in version 1.0 08:00:00+10:00 > 17:00:00+10:00 )
A
00:00:00+01:00 is less than
A calendar day with a very early timezone may be completely disjoint from a calendar day with a very late timezone:
Each value with
22:00:00Z > 03:00:00+05:00 (since 1971-12-31T03:00:00+05:00 is 1971-12-30T22:00:00Z, not 1971-12-31T22:00:00Z; in the previous version of this specification 22:00:00Z = 03:00:00+05:00)
The lexical representation for
The canonical representation for
The lexical representations for
An
(([01][0-9]|2[0-3]):[0-5][0-9]:([0-5][0-9]|60)(\.[0-9]+)?|24:00:00(\.[0-9]+)?)
([+\-](0[0-9]|1[0-4]):[0-5][0-9]|Z)?
The lexical mapping and canonical mapping
for
A "date object" is an object with year,
month, and day properties just like those
of
Timezoned
For example: the first moment of 2002-10-10+13:00 is 2002-10-10T00:00:00+13:00,
which is 2002-10-09T11:00:00Z, which is also the first moment of 2002-10-09-11:00.
Therefore 2002-10-10+13:00 is 2002-10-09-11:00;
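The first-moment argument above can be checked with aware `datetime` objects in Python. In this sketch, a timezoned date is identified with the UTC instant at which it begins, as in the example above. `first_moment_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def first_moment_utc(year: int, month: int, day: int, offset_hours: float) -> datetime:
    """UTC instant at which the given date begins in a timezone offset_hours from UTC."""
    local_midnight = datetime(year, month, day,
                              tzinfo=timezone(timedelta(hours=offset_hours)))
    return local_midnight.astimezone(timezone.utc)


print(first_moment_utc(2002, 10, 10, 13))  # 2002-10-09 11:00:00+00:00
print(first_moment_utc(2002, 10, 9, -11))  # 2002-10-09 11:00:00+00:00
```

The same helper also shows that 2000-12-12+13:00 begins before 2000-12-12+11:00.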
For most timezones, either the first moment or last moment of the day (a
The
See the conformance note in
Equality and order are as prescribed in
In version 1.0 of this specification,
Examples that show the difference from version 1.0 (see
A day is a calendar (or
2000-12-12+13:00 < 2000-12-12+11:00
(just as 2000-12-12+12:00 has always been less than
2000-12-12+11:00, but in version 1.0
2000-12-12+13:00 > 2000-12-12+11:00,
since 2000-12-12+13:00's
Similarly:
2000-12-12+13:00 = 2000-12-11-11:00 (whereas under 1.0, as just stated, 2000-12-12+13:00 = 2000-12-12-11:00)
For the following discussion, let the
"date portion" of a
The '-'? yyyy '-' mm '-' dd zzzzzz?
where the '-' yyyy '-' mm '-' dd 'T00:00:00' zzzzzz?
and the least upper bound of the interval is the timeline point represented
(non-canonically) by:
'-' yyyy '-' mm '-' dd 'T24:00:00' zzzzzz?
.
The
Given a member of the
The lexical representations for
Within a
Constraint: Day-of-month Values
) given above.
-?([1-9][0-9][0-9][0-9]+|0[0-9][0-9][0-9])-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])([+\-](0[0-9]|1[0-4]):[0-5][0-9]|Z)?
The lexical mapping and canonical mapping
for
Since the lexical representation allows an optional
time zone indicator,
Because month/year combinations in one calendar only rarely correspond to month/year combinations in other calendars, values of this type are not, in general, convertible to simple values corresponding to month/year combinations in other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
See the conformance note in
See the conformance note in
Equality and order are as prescribed in
In version 1.0 of this specification,
An example that shows the difference from version 1.0 (see
A day is a calendar (or
2000-12+13:00 < 2000-12+11:00
(just as 2000-12+12:00 has always been less than 2000-12+11:00,
but in version 1.0 2000-12+13:00 >
2000-12+11:00, since 2000-12+13:00's
The lexical representation for
For example, to indicate the month of May 1999, one would write: 1999-05.
See also
The lexical representations for
-?([1-9][0-9][0-9][0-9]+|0[0-9][0-9][0-9])-(0[1-9]|1[0-2])([+\-](0[0-9]|1[0-4]):[0-5][0-9]|Z)?
The lexical mapping and canonical mapping
for
Since the lexical representation allows an optional time zone
indicator,
Because years in one calendar only rarely correspond to years in other calendars, values of this type are not, in general, convertible to simple values corresponding to years in other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
See the conformance note in
See the conformance note in
Equality and order are as prescribed in
In version 1.0 of this specification,
An example that shows the difference from version 1.0 (see
A day is a calendar (or
2000+13:00 < 2000+11:00
(just as 2000+12:00 has always been less than 2000+11:00,
but in version 1.0 2000+13:00 >
2000+11:00 , since 2000+13:00's
The lexical representation for
For example, to indicate 1999, one would write: 1999.
See also
The lexical representations for
-?([1-9][0-9][0-9][0-9]+|0[0-9][0-9][0-9])([+\-](0[0-9]|1[0-4]):[0-5][0-9]|Z)?
The lexical mapping and canonical mapping
for
This datatype can be used, for example, to record birthdays; an instance of the datatype could be used to say that someone's birthday occurs on the 14th of September every year.
Since the lexical representation allows an optional time zone
indicator,
Because day/month combinations in one calendar only rarely correspond to day/month combinations in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
The
Equality and order are as prescribed in
In version 1.0 of this specification,
An example that shows the difference from version 1.0 (see
A day is a calendar (or
--12-12+13:00 < --12-12+11:00
(just as --12-12+12:00 has always been less than
--12-12+11:00, but in version 1.0
--12-12+13:00 > --12-12+11:00, since
--12-12+13:00's
The lexical representation for
The lexical representations for
Within a Constraint: Day-of-month Values
) given above.
--(0[1-9]|1[0-2])-([0-2][0-9]|3[01])((\+|-)(0[0-9]|1[0-4]):[0-5][0-9])?
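The gMonthDay pattern is looser than the Day-of-month constraint: on its own it accepts strings such as `--02-30` or `--12-00`. The Python sketch below, a hypothetical validator rather than a normative procedure, applies the reconstructed pattern and then checks the day separately. It assumes the constraint limits the day to 29 in February and 30 in April, June, September and November, since no year is available to decide whether February has 28 days.

```python
import re

# Reconstructed gMonthDay pattern (assumption: lost "-" and "|" restored).
G_MONTH_DAY = re.compile(
    r"--(0[1-9]|1[0-2])"                   # month 01..12
    r"-([0-2][0-9]|3[01])"                 # day 00..31 (the pattern alone even allows 00)
    r"((\+|-)(0[0-9]|1[0-4]):[0-5][0-9])?"  # optional numeric timezone offset
)

# Largest day per month when no year is known; February allows 29.
MAX_DAY = {1: 31, 2: 29, 3: 31, 4: 30, 5: 31, 6: 30,
           7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}

def is_gmonthday_lexical(s: str) -> bool:
    """Hypothetical helper: pattern match plus the day-of-month check."""
    m = G_MONTH_DAY.fullmatch(s)
    if m is None:
        return False
    month, day = int(m.group(1)), int(m.group(2))
    return 1 <= day <= MAX_DAY[month]
```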
This datatype can be used to represent a specific day in a month, for example to say that something happens on the 14th of every month.
The lexical mapping and canonical mapping
for
Since the lexical representation allows an optional time zone
indicator,
Because days in one calendar only rarely
correspond to days in other calendars,
Equality and order are as prescribed in
Examples that may appear anomalous (see
---15 < ---16, but ---15-13:00 > ---16+13:00
---15-11:00 = ---16+13:00
---15-13:00 <> ---16, because ---15-13:00 > ---16+14:00 and ---15-13:00 < ---16-14:00
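The examples above can be reproduced mechanically. The Python sketch below is an illustration, not the specification's algorithm. It places each gDay on a simple day-counting timeline with no month wraparound, applies the timezone offset (in minutes) to reach UTC, and, when only one operand has a timezone, compares it against the other operand read with both +14:00 and -14:00. If the result falls between those bounds, the values are incomparable (`<>`). `compare_gday` and the `(day, offset)` tuple representation are hypothetical.

```python
from typing import Optional, Tuple

MAX_TZ = 14 * 60  # largest timezone offset, in minutes

GDay = Tuple[int, Optional[int]]  # (day of month, timezone offset in minutes or None)

def _instant(day: int, tz: Optional[int]) -> int:
    """Minutes after the start of a reference 'day 1', shifted to UTC when tz is given."""
    return (day - 1) * 24 * 60 - (tz or 0)

def compare_gday(a: GDay, b: GDay) -> str:
    """Return '<', '>', '=' or '<>' (incomparable)."""
    (da, ta), (db, tb) = a, b
    if (ta is None) == (tb is None):
        # Both have a timezone, or neither does: compare as single points.
        x, y = _instant(da, ta), _instant(db, tb)
        return "<" if x < y else ">" if x > y else "="
    if ta is None:
        # Make `a` the operand that has a timezone, then flip the answer.
        flip = {"<": ">", ">": "<", "=": "=", "<>": "<>"}
        return flip[compare_gday(b, a)]
    x = _instant(da, ta)
    earliest = _instant(db, MAX_TZ)   # b read with +14:00
    latest = _instant(db, -MAX_TZ)    # b read with -14:00
    if x < earliest:
        return "<"
    if x > latest:
        return ">"
    return "<>"
```

For instance, `compare_gday((15, -780), (16, 780))` gives `">"`, matching `---15-13:00 > ---16+13:00`.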
Timezones do not cause wraparound at the end of the month:
The lexical representation for
The lexical representations for
The lexical mapping and canonical mapping for
Since the lexical representation allows an optional time zone
indicator,
Because months in one calendar only rarely correspond to months in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
Equality and order are as prescribed in
In version 1.0 of this specification,
An example that shows the difference from version 1.0 (see
A month is a calendar (or
--12+13:00 < --12+11:00
(just as --12+12:00 has always been less than --12+11:00, but in version 1.0
--12+13:00 > --12+11:00, since --12+13:00's
The lexical representation for
The lexical representations for
--(0[1-9]|1[0-2])((\+|-)(0[0-9]|1[0-4]):[0-5][0-9])?
The lexical mapping and canonical mapping
for
The canonical representation for
The lexical forms of
a-z, A-Z, 0-9, the plus sign (+), the forward slash (/) and the
equal sign (=), together with the characters defined in
For compatibility with older mail gateways,
The lexical space of
Base64Binary ::= ((B64S B64S B64S B64S)*
                  ((B64S B64S B64S B64) |
                   (B64S B64S B16S '=') |
                   (B64S B04S '=' #x20? '=')))?
B64S ::= B64 #x20?
B16S ::= B16 #x20?
B04S ::= B04 #x20?
B04 ::= [AQgw]
B16 ::= [AEIMQUYcgkosw048]
B64 ::= [A-Za-z0-9+/]
Note that this grammar requires the number of non-whitespace characters in the lexical
form to be a multiple of four, and for equals signs to appear only at the end of the
lexical form; strings which do not meet these constraints are not legal lexical forms
of
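The Base64Binary grammar is regular, so it can be translated directly into a regular expression. The Python sketch below builds the pattern from the grammar's own character classes. Each `B64S`, `B16S` or `B04S` item becomes a character followed by an optional single space, and `fullmatch` accepts the whole string or nothing. The constant and function names are hypothetical.

```python
import re

# Character classes taken from the grammar.
B64 = "[A-Za-z0-9+/]"
B16 = "[AEIMQUYcgkosw048]"
B04 = "[AQgw]"
B64S, B16S, B04S = B64 + " ?", B16 + " ?", B04 + " ?"

# ((B64S B64S B64S B64S)*
#  ((B64S B64S B64S B64) | (B64S B64S B16S '=') | (B64S B04S '=' #x20? '=')))?
BASE64_LEXICAL = re.compile(
    "(?:(?:" + B64S * 4 + ")*"
    + "(?:" + B64S * 3 + B64
    + "|" + B64S * 2 + B16S + "="
    + "|" + B64S + B04S + "= ?="
    + "))?"
)

def is_base64_lexical(s: str) -> bool:
    """Hypothetical helper: True if s is a legal base64Binary lexical form."""
    return BASE64_LEXICAL.fullmatch(s) is not None
```

Under this grammar, `QUI=` is legal because `I` is in the `B16` class, `QR==` is not because `R` is not in the `B04` class, and a trailing space (`QUJD `) is rejected.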
The above definition of the lexical space is more restrictive than that
given in
The canonical lexical form of a
Canonical-base64Binary ::= (B64 B64 B64 B64)*
                           ((B64 B64 B16 '=') | (B64 B04 '=='))?
For some values the canonical form defined above does not conform to
The length of a
lex2   := killwhitespace(lexform)       -- remove whitespace characters
lex3   := strip_equals(lex2)            -- strip padding characters at end
length := floor (length(lex3) * 3 / 4)  -- calculate length
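The three pseudocode steps translate directly into Python. In this sketch, `kill_whitespace` and `strip_equals` are hypothetical stand-ins for the pseudocode's `killwhitespace` and `strip_equals`, and whitespace is taken to be the four XML whitespace characters. Each group of four base64 characters encodes three octets, which is where the factor 3/4 comes from.

```python
def kill_whitespace(lexform: str) -> str:
    """Drop XML whitespace (#x20, #x9, #xA, #xD)."""
    return "".join(ch for ch in lexform if ch not in " \t\n\r")

def strip_equals(lex: str) -> str:
    """Drop the trailing '=' padding characters."""
    return lex.rstrip("=")

def base64_length(lexform: str) -> int:
    """Number of octets encoded by a base64Binary lexical form."""
    lex3 = strip_equals(kill_whitespace(lexform))
    return (len(lex3) * 3) // 4  # floor(length(lex3) * 3 / 4)
```

For example, `QUI=` leaves three characters after stripping, and floor(9 / 4) gives a length of 2 octets.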
Note on encoding:
The mapping from
Section 5.4
Each URI scheme imposes specialized syntax rules for URIs in
that scheme, including restrictions on the syntax of allowed
fragment identifiers. Because it is
impractical for processors to check that a value is a
context-appropriate URI reference, this specification follows the
lead of
The
Spaces are, in principle, allowed in the
The mapping between literals in the
The use of
It is an
For compatibility (see
The use of
This section gives conceptual definitions for all
[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
.
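The following minimal Python sketch applies the language pattern, reconstructed with its lost hyphens restored, to a few candidate values. It checks only the shape of the string: a primary subtag of one to eight letters followed by any number of hyphen-prefixed alphanumeric subtags. It does not check whether a subtag is registered. `is_language_lexical` is a hypothetical name.

```python
import re

# Primary subtag of 1-8 letters, then any number of "-"-prefixed alphanumeric subtags.
LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

def is_language_lexical(s: str) -> bool:
    """Hypothetical helper: shape check only, no registry lookup."""
    return LANGUAGE.fullmatch(s) is not None
```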
The
For compatibility (see
For compatibility (see
For compatibility (see
For compatibility (see
For compatibility (see
The
For compatibility (see
The
For compatibility (see
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The always-zero
The lexical space is reduced from that of
herein, is that of
The regular expression
Canonical mappings are not used during schema processing. They are provided in this specification for the benefit of other users of these datatype definitions who may find them useful, and for other specifications which might find it useful to reference them normatively.
The
The
pattern
enumeration
whiteSpace
minInclusive
minExclusive
maxInclusive
maxExclusive
The lexical space is reduced from that of
herein, is that of
The regular expression
Canonical mappings are not used during schema processing. They are provided in this specification for the benefit of other users of these datatype definitions who may find them useful, and for other specifications which might find it useful to reference them normatively.
The
pattern
enumeration
whiteSpace
minInclusive
minExclusive
maxInclusive
maxExclusive
The following sections provide full details on the properties and
significance of each kind of schema component involved in datatype
definitions. For each property, the kinds of values it is allowed to have are
specified. Any property not identified as optional is required to
be present; optional properties which are not present have
For more information on the notion of datatype (schema) components,
see
Simple Type definitions provide for:
Establishing the
Attaching a unique name (actually a
In the case of
In the case of
Attaching a
The Simple Type Definition schema component has the following properties:
Datatypes are identified by their
If
If
The value of
The value of
If
The XML representation for a
name [attribute]
final [attribute]
finalDefault [attribute]
the empty set;
a set with members drawn from the set above, each being present or absent depending on whether the string contains an equivalently named space-delimited substring.
Although the finalDefault [attribute] of
Although the targetNamespace [attribute] of the parent schema
element information item
A
A
A user-defined
base [attribute]
An electronic commerce schema might define a datatype called
SKU
In this case,
itemType [attribute]
or the
A
A system might want to store lists of floating point values.
In this case,
As mentioned in
regardless of the
For each of
memberTypes [attribute], if any,
in order, followed by the
A
As an example, taken from a typical display-oriented text markup language,
one might want to express font sizes as an integer between 8 and 72, or with
one of the tokens "small", "medium" or "large". The
this is a test
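The font-size union can be sketched in Python to show how union validation works. Each member type is tried in order, and the first one that accepts the literal supplies the value. The member checks below are hypothetical simplifications. The first is an integer restricted to 8 through 72, and the second is one of the three tokens. Surrounding XML whitespace is trimmed as `collapse` would do.

```python
import re
from typing import Optional, Union

INTEGER = re.compile(r"[+-]?[0-9]+")
SIZE_TOKENS = ("small", "medium", "large")

def point_size_member(literal: str) -> Optional[int]:
    """Member 1: an integer between 8 and 72 inclusive."""
    text = literal.strip(" \t\n\r")
    if INTEGER.fullmatch(text) and 8 <= int(text) <= 72:
        return int(text)
    return None

def size_token_member(literal: str) -> Optional[str]:
    """Member 2: one of the tokens "small", "medium" or "large"."""
    text = literal.strip(" \t\n\r")
    return text if text in SIZE_TOKENS else None

def font_size_value(literal: str) -> Optional[Union[int, str]]:
    """Try member types in order; None means the literal is invalid for the union."""
    for member in (point_size_member, size_token_member):
        value = member(literal)
        if value is not None:
            return value
    return None
```

Because the integer member comes first, `"12"` yields the integer 12. Under these simplified members, `"12.5"` and `"huge"` are rejected by both.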
As mentioned in
regardless of the
Unless otherwise specifically allowed by this specification
(
Either the itemType [attribute] or the
Either the base [attribute] or the simpleType [child] of the
Either the memberTypes [attribute] of the simpleType [child].
A value in a
the value is facet-valid with respect to the particular
A string is datatype-valid with respect to a datatype definition
if
it
if
if
if
if
if
the value denoted by the literal
The
If
If
The Constraints just given serve, among other things, to ensure
that
There is a simple type definition nearly equivalent to the simple version
of the
The definition of
The
definition of
The
Simple type definitions for all the built-in primitive datatypes,
namely
Similarly, simple type definitions for all the built-in derived
datatypes are present by definition in every schema, with properties
as specified in
Every
for any
there is no pair
for all
for any
for any
for any
the
On every datatype, the operation Equal is defined in terms of the equality
property of the
Note that in consequence of the above:
given
two values which are members of the
if a datatype
if a datatype
if datatypes
There is no schema component corresponding to the
The decision that the four informational facets, each of which has only one property, would be lumped into one facet having four properties was rescinded by the WG before it made it into the text of this specification.
Schema components are identified by kind.
is not a kind of component. Each kind of ordered,
bounded, etc.) is
A
The value of any
A
for no
for all
for all
The notation
A
for all
The fact that this specification does not define an
indicating whether an
Some datatypes have a nontrivial order relation associated with
their value spaces (see
A
Some of the real-world
datatypes which are the basis for those defined herein
are ordered in some applications, even though no order is prescribed for schema-processing
purposes. For example, lexical
orderings. They are
When
When
When
If every member of
If every member of
the
the
the
the
every member of the
each member of the
indicating whether a
Some ordered datatypes have the property that
there is one value greater than or equal to every other value, and
another that is less than or equal to every other value. (In the
case of derived datatypes, these two values
When
When
When the
It is sometimes useful to categorize
indicating whether the
Every value space has a specific number of members. This number can be characterized as
When
When
one of
all of the following are true:
one of
one of
either of the following is true:
When the
the
at least one of
all of the following are true:
one of
one of
either of the following is true:
When
When the
indicating whether a
Some value spaces are made up of things that
are
When
When
When
Schema components are identified by kind. Constraining
is not a kind of component. Each kind of whiteSpace,
length, etc.) is a separate kind of schema component.
The WG is considering the ramifications of removing the length constraining facet, letting the schema document elements that currently set that facet set both minLength and maxLength instead.
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A value in a
if the
if
if
if
if the
The use of
If
It is an error for
the
there is a type definition from which this one is derived by
one or more restriction steps in which
It is an error for
the
there is a type definition from which this one is derived by
one or more restriction steps in which
It is an
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A value in a
if the
if
if
if
if the
The use of
If both
It is an
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A value in a
if the
if
if
if
if the
The use of
It is an
Constraining a
The following is the definition of a
The XML representation for a
value [attribute]
If multiple
It is a consequence of the schema representation constraint
Thus, to impose two
A literal in a
the literal is among the set of character sequences denoted by
the
Constraining a
The following example is a datatype definition for a
The XML representation for an
value [attribute]
If multiple
A value in a
It is an
No normalization is done; the value is not changed (this is the
behavior required by
All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space)
After the processing implied by
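The three whiteSpace behaviors are easy to express as string functions. In this sketch, `preserve` is the identity, `replace` maps each #x9, #xA and #xD to #x20, and `collapse` first performs `replace`, then squeezes runs of #x20 into a single space and trims leading and trailing spaces. The function names are hypothetical.

```python
import re

def ws_preserve(value: str) -> str:
    """preserve: no normalization."""
    return value

def ws_replace(value: str) -> str:
    """replace: each tab, line feed and carriage return becomes a space."""
    return value.translate(str.maketrans("\t\n\r", "   "))

def ws_collapse(value: str) -> str:
    """collapse: replace, then squeeze runs of spaces and trim both ends."""
    return re.sub(" +", " ", ws_replace(value)).strip(" ")
```

Note that `replace` alone keeps the length of the string unchanged: `"a\tb\r\nc"` becomes `"a b  c"`, with two spaces where the CR LF pair was.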
The notation #xA used here (and elsewhere in this specification) represents
the Universal Character Set (UCS) code point hexadecimal A
(line feed), which is denoted by
U+000A. This notation is to be distinguished from &#xA;,
which is the XML
collapse
and cannot be changed by a schema author; for
preserve; for any type
collapse
and cannot
be changed by a schema author. For all datatypes
For more information on
Constraining a
The following example is the datatype definition for
the
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
There are no
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A value
if the
if the
The
The
It is an
It is an
Constraining a
The following is the definition of a
Note that the
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A value
if the
if the
The
The
It is an
It is an
It is an
Constraining a
The following is the definition of a
Note that the
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A value in an
if the
if the
The
The
It is an
It is an
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A value in an
if the
if the
It is an
It is an
For
For
The
The term
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A value in a
that value is expressible as
A value v
is facetvalid with respect to a
v is a
v is a
v is a
It is an
It is an
The term
The following is the definition of a
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A value
It is an
It is an
The term
The following is the definition of a user-defined datatype which could be used to represent a floating-point decimal datatype which allows seven decimal digits for the coefficient and exponents between −95 and 96. Note that the scale is −1 times the exponent.
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A
v has
The
It is an
The term
The following is the definition of a user-defined
datatype which could be used to represent amounts in a decimal
currency; it corresponds to a SQL column definition of
DECIMAL(8,2). The effect is to allow values
between −999,999.99 and 999,999.99, with a fixed interval
of 0.01 between values.
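As a check on that description, the value space can be characterized directly: every allowed value is an integer multiple of 0.01, and the multiplier has at most eight digits. The Python sketch below tests exactly that using the standard `decimal` module. It uses Python's `Decimal` parser rather than the XML Schema lexical grammar, and `in_decimal_8_2` is a hypothetical name.

```python
from decimal import Decimal, InvalidOperation

STEP = Decimal("0.01")     # fixed interval between allowed values
LIMIT = Decimal(10) ** 8   # at most 8 digits in total, so |value / STEP| < 10**8

def in_decimal_8_2(literal: str) -> bool:
    """True if `literal` denotes a value allowed by a DECIMAL(8,2)-like type."""
    try:
        value = Decimal(literal)
    except InvalidOperation:
        return False
    if not value.is_finite():
        return False
    units = value / STEP   # the value counted in steps of 0.01
    return units == units.to_integral_value() and abs(units) < LIMIT
```

The bounds fall out of the digit limit: 999,999.99 is 99,999,999 steps, while 1,000,000.00 would need nine digits.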
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A
v has
The
It is an
Note that it is
It is an
The
It has been suggested that the
The
A
v's
If
The XML representation for a
value [attribute]
fixed [attribute], if present, otherwise
A
L is a
It is an
If during a
The constraints on the various facets and the explicit descriptions of
The
In each of the above,
The most basic datatypes are
Each
Only values from the item type which have
Strings that are
This specification describes two levels of conformance for datatype processors. The first is required of all processors. Support for the other will depend on the application environments for which the processor is intended.
By separating the conformance requirements relating to the concrete
syntax of XML schema documents, this specification admits processors
which validate using schemas stored in optimized binary representations,
dynamically created schemas represented as programming language data
structures, or implementations in which particular schemas are compiled
into executable code such as C or Java. Such processors can be said to
be
All YYYY
) and a minimum fractional second precision of
milliseconds or three decimal digits (i.e., s.sss).
However,
Some datatypes, such as
In this document, the arguments to functions are assumed to be
Properties always have values.
Those values that are more primitive, and are used (among other things) herein to
construct object value spaces but which we do not explicitly define are described here:
A
The following standard operators are defined here in case the reader is unsure of their definition:
⌊n⌋ the greatest integer less than or equal to n.
Numbers are sometimes thought of as including both a numerical value and a
"five plus-or-minus two"
or "two million to the nearest thousand".
There is a smaller class of plus-or-minus
in order to indicate their
precision. They indicate their precision by the number of digits to the
right of the decimal point. 5.0 has precision plus-or-minus 0.05, but
5.00 has precision plus-or-minus 0.005.
There is also a kind of precision
where the plus-or-minus is expressed as a percentage (or other proportion) of
the numerical value, rather than an exact value: "15 plus-or-minus
10 percent" or "15000 plus-or-minus 10 percent", where the
same percentage indicates a different absolute precision depending on the
size. This kind of precision is properly called "geometric
precision"; the absolute precision first described is properly called
"arithmetic precision".
A close approximation to geometric precision also can, for some combinations
of numerical value and precision, be indicated without the
"plus-or-minus": The precision is indicated by the total
number of digits (not counting leading zero digits). 5.0 has precision
plus-or-minus 1 percent but 5.00 has precision plus-or-minus one-tenth percent.
Geometric precision doesn't quite match with the digit count. 5.0 and 50
both have precision plus-or-minus 1 percent but 1.5 and 15 both have precision
plus-or-minus about 3 percent. For various reasons we choose to call this digit-count
precision "floating-point precision".
The
One point needs to be made about the notations and the precisions they can
indicate. It's impossible for ordinary decimal notation to indicate a
positive arithmetic precision (as in "one million to the nearest thousand");
this needs
Much of the material defining the various date/time datatypes is
found here and is or will be referenced in the sections defining each
individual date/time datatype. See e.g.
There are several different primitive but related datatypes defined in the specification which pertain to various combinations of dates and times, and parts thereof. They all use related valuespace models, which are described in detail in this section. It is not difficult for a casual reader of the descriptions of the individual datatypes elsewhere in this specification to misunderstand some of the details of just what the datatypes are intended to represent, so more detail is presented here in this section.
All of the value spaces for dates and times
described here represent moments or periods of time in Coordinated
Universal Time (UTC).
There are various concepts involving dates (counting days) and times (counting moments) that have developed over the millennia. This section does not pretend to be a complete tutorial on the history; it only discusses the methods which are necessary to understand just which set of the possible reasonable choices has been adopted for Schema date/time datatypes.
A day is, at least approximately, the time of one rotation of the
Earth about its axis with respect to the Sun. Each day is
divided into 24 hours; each hour into 60 minutes, and each minute
zenith)
Thus a day is (usually) 86400 (= 60 × 60 × 24) seconds.
"real" time: One day is (exactly,
or at least as close as can be astronomically measured) one revolution
of the Earth about its axis with respect to the Sun. The day is
divided into 86400 equal-length seconds, which may vary in length from
day to day.
TAI seconds are all the same length, and there are exactly 86400 seconds in each day.
UT1 seconds vary in length, but there are exactly 86400 seconds each day. Days always have the Sun at the zenith at noon in Greenwich, England. (As a historical note, the TAI second, defined in 1956 in terms of the excitation frequency of cesium atoms, was chosen to be the average length of a UT1 second during the year 1900.)
Noon on a TAI day does not necessarily coincide with the Sun at the zenith. In 1958, TAI was promulgated and synchronized with UT1. Since then, the difference has been slowly increasing, with a given number of seconds from that date measured in UT1 coming later than that same number measured in TAI.
As of the writing of this specification, leap-seconds have been added
to
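| Date | Number of Leap-seconds | Date | Number of Leap-seconds |
|---|---|---|---|
| 1960-12-31 | 1.422818 | 1975-12-31 | 1 |
| 1961-07-31 | 0.224752 | 1976-12-31 | 1 |
| 1961-01-31 | 0.198288 | 1977-12-31 | 1 |
| 1963-10-30 | 0.8514208 | 1978-12-31 | 1 |
| 1963-12-31 | 0.0685152 | 1979-12-31 | 1 |
| 1964-03-31 | 0.217936 | 1981-06-30 | 1 |
| 1964-08-31 | 0.298288 | 1982-06-30 | 1 |
| 1964-01-31 | 0.258112 | 1983-06-30 | 1 |
| 1965-02-28 | 0.176464 | 1985-06-30 | 1 |
| 1965-06-30 | 0.258112 | 1987-12-31 | 1 |
| 1965-08-31 | 0.180352 | 1989-12-31 | 1 |
| 1965-12-31 | 0.158112 | 1990-12-31 | 1 |
| 1968-01-31 | 1.872512 | 1992-06-30 | 1 |
| 1971-12-31 | 3.814318 | 1993-06-30 | 1 |
| 1972-06-30 | 1 | 1994-06-30 | 1 |
| 1972-12-31 | 1 | 1995-12-31 | 1 |
| 1973-12-31 | 1 | 1997-06-30 | 1 |
| 1974-12-31 | 1 | 1998-12-31 | 1 |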
standard
times.
There are inherently no precise measurements of the difference
between UT1 on the one hand and proleptic (i.e., used to measure times
prior to their adoption) TAI and
Schema date/time datatypes (except
Once one decides on how many seconds are in each day, one must also
count the days, and the months and years. The standard used for
Schema date/time datatypes is the so-called "Gregorian
calendar". Since days are (generally) 86400 seconds, and
one wants each year to correspond to one complete cycle of the Earth
around the Sun (which is not exactly a multiple of 86400 seconds), and
traditionally months have various numbers of days, the following
algorithm was chosen to determine which days fell in which months in
which years: Counting from an agreed-upon arbitrary day, years are
numbered consecutively, each year has 12 months (numbered 1 through
12, as well as named) within it, and each month has between 28 and 31
days (also numbered from 1), depending on the month and year according
to the following table:
| Month | Number of Days |
|---|---|
| 1 (January) | 31 |
| 2 (February) | If the associated year is divisible by 400, or by 4 but not 100, then 29; otherwise 28 |
| 3 (March) | 31 |
| 4 (April) | 30 |
| 5 (May) | 31 |
| 6 (June) | 30 |
| 7 (July) | 31 |
| 8 (August) | 31 |
| 9 (September) | 30 |
| 10 (October) | 31 |
| 11 (November) | 30 |
| 12 (December) | 31 |
For example, the three numbers (year, month, and day) for 20 January 2003 (2003-01-20) are 2003, 1, and 20 respectively.
The following rewrite includes allowing year 0000 (1 BCE) and redefining all the lexical representations with negative years from that specified in Schema 1.0, as warned in a Note in Schema 1.0 2E. A formal Note calling attention to this change elsewhere in the "normative" part of this specification will be added.
The count of years, months, and days was made official and locked
to "real" time by decree of (the Roman Catholic) Pope
Gregory in 1582 (from which comes the name "Gregorian"). Since then, and somewhat even before,
days have been counted with reasonable historical accuracy so that the
Gregorian calendar algorithm can even be used proleptically, i.e., to
establish dates prior to its official adoption. By relatively
recent convention (it began to be adopted by astronomers during the
1800s), there is a year numbered zero; this makes calculating the
difference between two dates easier. The year called 1 of
the Common Era ("1 CE", or "1 AD")
is numbered one; the preceding year is numbered zero, not minus
one. (Warning: The date using the proleptic Gregorian calendar
will not generally be the same for a given day as the date using the
Julian calendar which was in common use prior to the
adoption of the Gregorian calendar, nor will Gregorian years
before the Common Era ("BCE", or "BC") be numbered the same as with the current standard
negative numbering.)
There are also standard schemes for numbering days without
reference to months and years. The most common is
Note that the JD
All of the preceding discussion applies to real
times Greenwich meridian, the meridian
where longitude is 0 degrees
standard time to get the local time. The
standard time is selected to be that where noon is when
the Sun is exactly overhead at 0 degrees longitude;
A moment in time is like a point on a line; the point does not change if we change where we put zero on the line, but the number we use to represent that point changes. Similarly, when one specifies a moment in time, one can specify the same moment regardless of which timezone one specifies, but the numbers one uses for year, month, day, hour, minute, and second will be different.
There are two distinct ways to model moments in time: either
by tracking their year, month, day, hour, minute and second (with
fractional seconds as needed), or by tracking
their time (measured generally in seconds or
days) from some starting moment. Each has
its advantages. The two are isomorphic;
the Gregorian calendar algorithm, modified for
There is also a seventh
The model just described is called herein the "seven-property" model for date/time
datatypes. It is used
Leap-seconds are not permitted when
As of the time this specification was published,
leap-seconds (always one leap-second) have been introduced
by the responsible authorities at the end (in
1972-06-30
1972-12-31
1973-12-31
1974-12-31
1975-12-31
1976-12-31
1977-12-31
1978-12-31
1979-12-31
1981-06-30
1982-06-30
1983-06-30
1985-06-30
1987-12-31
1989-12-31
1990-12-31
1992-06-30
1993-06-30
1994-06-30
1995-12-31
1997-06-30
1998-12-31
While calculating, property values from the
Each fragment other than
(The redundancy between
The following fragment
The more important functions and
procedures defined here are summarized in the
text When there is a text summary, the name of the function in each is a
The following functions are used with various numeric and date/time datatypes.
0 when d =
1 when d =
2 when d =
−
−
etc.
s_{0} = i and
s_{j+1} = s_{j}
s_{0} = f − 10 , and
s_{j+1} = (s_{j}
For example:
123.4567
n when F is present, and
0 otherwise.
Set pD's
0 when LEX is a
Set pD's
Return pD.
Set d to
Return d.
If d is an integer, then return
Otherwise, return
Let nV be the
Let aP be the
If pD is one of NaN, INF, or −INF, then return
Otherwise, if nV is an integer and aP is zero and
1E6 ≤ nV ≤ 1E6, then return
Otherwise, if aP is greater than zero and
1E6 ≤ nV ≤ 1E6, then let s be
Otherwise, it will be the case that
nV is less than 1E−6 or greater than 1E6.
Let
s be
m be the part of s which precedes the "E".
n be the part of s which follows the "E".
p be the integer denoted by n.
f be the number of fractional digits in m; note that f will invariably be less than or equal to aP + p.
t be a string consisting of
aP + p − f
occurrences of the digit
y be
m be
h be
m be
s be
d be
t be
0 if Y is not present,
−
0 if D is not present,
−
−
−
y be ym
m be ym
the empty string (
the empty string (
the empty string (
the empty string (
the empty string (
d is
ss
h is
(ss
m is
(ss
s is
ss
m be v's
s be v's
sgn be
sgn &
sgn &
sgn &
m be ym's
sgn be
s be dt's
sgn be
When adding and subtracting numbers from date/time properties, the
immediate results may not conform to the limits specified.
Accordingly, the following procedures are used to
Add (mo − 1)
Set mo to
(mo − 1)
Repeat until da is positive and not greater than
If da exceeds the upper limit from the table then:
Subtract that limit from da.
Add 1 to mo.
If da is not positive then:
Subtract 1 from mo.
Add the new upper limit from the table to da.
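The normalization steps can be sketched in Python as follows. The month steps above are truncated in this draft, so the sketch assumes the natural reading: carry `(mo − 1) div 12` into the year and set the month to `(mo − 1) mod 12 + 1`, using floor division. The day loop follows the listed steps, adjusting the day by one month's limit at a time. Leap-seconds are ignored, and all names are hypothetical.

```python
from typing import Tuple

def days_in_month(y: int, m: int) -> int:
    """Day limit from the month table (proleptic Gregorian)."""
    if m == 2:
        return 29 if y % 400 == 0 or (y % 4 == 0 and y % 100 != 0) else 28
    return 30 if m in (4, 6, 9, 11) else 31

def normalize_month(yr: int, mo: int) -> Tuple[int, int]:
    # Assumed reading of the truncated steps: carry whole years out of mo.
    yr += (mo - 1) // 12
    mo = (mo - 1) % 12 + 1
    return yr, mo

def normalize_day(yr: int, mo: int, da: int) -> Tuple[int, int, int]:
    yr, mo = normalize_month(yr, mo)
    # Repeat until da is positive and not greater than the table limit.
    while not (0 < da <= days_in_month(yr, mo)):
        if da > days_in_month(yr, mo):
            da -= days_in_month(yr, mo)
            yr, mo = normalize_month(yr, mo + 1)
        else:
            yr, mo = normalize_month(yr, mo - 1)
            da += days_in_month(yr, mo)
    return yr, mo, da
```

For example, `normalize_day(2000, 3, 0)` moves back into February of a leap year and returns `(2000, 2, 29)`.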
Add mi
Set mi to mi
Add hr
Set hr to hr
Add se
Set se to
Add 60 × mi + 3600 × hr to se .
Set mi and hr to zero.
Repeat until se is non-negative and less than 86400
plus the number of leap-seconds
If se equals or exceeds 86400 plus the upper limit from the table then:
Subtract (86400 plus that leap-second count) from se.
Add 1 to da.
If se is negative then:
Subtract 1 from da.
Add 86400 plus the new leap-second count from the table to se.
If se is less than 86340 then:
Set mi to se
Set se to se
If se is not less than 86340 then:
Set mi to 1439.
Subtract 86340 from se.
The
yr be 1971 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day
limits in
hr be 0 or dt's
mi be 0 or dt's
Add
If
yr be 1971 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day
limits in
hr be 0 or dt's
mi be 0 or dt's
Add
If
yr be 1971 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day
limits in
hr be 0 or dt's
mi be 0 or dt's
Add
If
yr be 1971 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day
limits in
hr be 0 or dt's
mi be 0 or dt's
Add
If
yr be 1971 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day
limits in
hr be 0 or dt's
mi be 0 or dt's
Add
If
yr be rawYear when rawYear
is not
mo be rawMo, dt's
da be rawDa, dt's
hr be rawHr, dt's
mi be rawMi, dt's
se be rawSe, dt's
If dt's
Subtract
Set
30 when m is 4, 6, 9, or 11,
29 when m is 2 and y is divisible by
400, or by 4 but not by 100, or is absent,
28 otherwise and m is 2, and
31 otherwise (m is 1, 3, 5, 7, 8, 10, or 12)
yr be rawYear when rawYear
is not
mo be rawMo, dt's
da be rawDa, dt's
hr be rawHr, dt's
mi be rawMi, dt's
se be rawSe, dt's
yr be
mo be 12 or
dt's
da be
hr be 0 or
dt's
mi be 0 or
dt's
Subtract
(
Set ToTl to 31536000 × yr.
(Leap-year Days,
Add 86400 ×
(yr
Add 86400 ×
Add 86400 × da to ToTl.
(Leap-seconds)
Add (the count of leap-seconds prior to
(
Add 3600 × hr + 60 × mi + se to ToTl.
Return ToTl.
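The timeline computation can be sketched in Python for a complete value, with all seven properties present. The sketch assumes the usual reading of the truncated steps. The year, day and minute are taken from the value, with one subtracted from the year and the day and the timezone offset subtracted from the minutes. Leap-year days are counted with `yr div 400 − yr div 100 + yr div 4`, and the days of the earlier months of the year are added in. It deliberately omits the leap-seconds step, so it computes seconds on a timeline without leap-seconds. `time_on_timeline` is a hypothetical name, and the timezone offset is given in minutes.

```python
from typing import Optional

def days_in_month(y: int, m: int) -> int:
    """Day limit from the month table (proleptic Gregorian)."""
    if m == 2:
        return 29 if y % 400 == 0 or (y % 4 == 0 and y % 100 != 0) else 28
    return 30 if m in (4, 6, 9, 11) else 31

def time_on_timeline(year: int, month: int, day: int, hour: int, minute: int,
                     second: int, tz_minutes: Optional[int] = None) -> int:
    """Seconds since 0001-01-01T00:00:00, leap-seconds ignored."""
    yr, mo, da = year - 1, month, day - 1
    mi = minute - (tz_minutes or 0)          # shift local time to UTC
    total = 31536000 * yr                    # 365-day years
    total += 86400 * (yr // 400 - yr // 100 + yr // 4)  # leap-year days
    total += 86400 * sum(days_in_month(yr + 1, m) for m in range(1, mo))
    total += 86400 * da
    total += 3600 * hour + 60 * mi + second
    return total
```

As a consistency check, `2000-01-01T00:00:00+01:00` lands on the same instant as `1999-12-31T23:00:00Z`.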
0 when TZ is
−(
Set dt's
Return dt.
Set ti's
Return ti.
Set da's
Return da.
Set gYM's
Return gYM.
Set gY's
Return gY.
Set gMD's
Return gMD.
Set gD's
Return gD.
Set gM's
Return gM.
DT when
DT &
T when
T &
D when
D &
YM when ym's
YM &
MD when md's
MD &
The following table shows the values of the fundamental facets
for each
The
C -- represents a digit used in the thousands and hundreds components, the "century" component, of the time element "year". Legal values are from 0 to 9.
Y -- represents a digit used in the tens and units components of the time element "year". Legal values are from 0 to 9.
M -- represents a digit used in the time element "month". The two digits in a MM format can have values from 1 to 12.
D -- represents a digit used in the time element "day". The two digits in a DD format can have values from 1 to 28 if the month value equals 2, 1 to 29 if the month value equals 2 and the year is a leap year, 1 to 30 if the month value equals 4, 6, 9 or 11, and 1 to 31 if the month value equals 1, 3, 5, 7, 8, 10 or 12.
h -- represents a digit used in the time element "hour". The two digits in a hh format can have values from 0 to 24. If the value of the hour element is 24 then the values of the minutes element and the seconds element must be 00 and 00.
m -- represents a digit used in the time element "minute". The two digits in a mm format can have values from 0 to 59.
s -- represents a digit used in the time element "second". The two
digits in a ss format can have values from 0 to 60. In the formats
described in this specification the whole number of seconds
Strictly speaking, a value of
60 or more is not sensible unless the month and day could
represent March 31, June 30, September 30, or December 31
For all the information items indicated by the above characters, leading zeros are required where indicated.
In addition to the above, certain characters are used as designators and appear as themselves in lexical formats.
T -- is used as time designator to indicate the start of the
representation of the time of day in
Z -- is used as timezone designator, immediately (without a space)
following a data element expressing the time of day in Coordinated
Universal Time (
In the lexical format for
P -- is used as the time duration designator, preceding a data element representing a given duration of time.
Y -- follows the number of years in a time duration.
M -- follows the number of months or minutes in a time duration.
D -- follows the number of days in a time duration.
H -- follows the number of hours in a time duration.
S -- follows the number of seconds in a time duration.
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary integer. Similarly, the value of the Seconds component
allows an arbitrary decimal. Thus, the lexical format for
An optional minus sign is allowed immediately preceding, without a space,
the lexical representations for
The year "0000" is an illegal year value.
To accommodate year values greater than 9999, more than four digits are
allowed in the year representations of
The lexical representations for the datatypes
Given a
fQuotient(a, b) = the greatest integer less than or equal to a/b
fQuotient(-1,3) = -1
fQuotient(0,3)...fQuotient(2,3) = 0
fQuotient(3,3) = 1
fQuotient(3.123,3) = 1
modulo(a, b) = a - fQuotient(a,b)*b
modulo(-1,3) = 2
modulo(0,3)...modulo(2,3) = 0...2
modulo(3,3) = 0
modulo(3.123,3) = 0.123
fQuotient(a, low, high) = fQuotient(a - low, high - low)
fQuotient(0, 1, 13) = -1
fQuotient(1, 1, 13) ... fQuotient(12, 1, 13) = 0
fQuotient(13, 1, 13) = 1
fQuotient(13.123, 1, 13) = 1
modulo(a, low, high) = modulo(a - low, high - low) + low
modulo(0, 1, 13) = 12
modulo(1, 1, 13) ... modulo(12, 1, 13) = 1...12
modulo(13, 1, 13) = 1
modulo(13.123, 1, 13) = 1.123
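The four helper functions above can be transcribed almost directly into Python. This is a minimal sketch, assuming Python `int` for integer fields and `decimal.Decimal` for fractional values (binary floats would give results such as 0.12300000000000022 for modulo(3.123, 3)). The snake_case names, including the `_range` suffix for the three-argument forms, are hypothetical, since Python does not overload on arity.

```python
import math
from decimal import Decimal


def f_quotient(a, b):
    """fQuotient(a, b): the greatest integer less than or equal to a/b."""
    if isinstance(a, int) and isinstance(b, int):
        return a // b  # exact floor division for integers
    # Decimal's // truncates toward zero, so take the floor explicitly.
    return math.floor(a / b)


def modulo(a, b):
    """modulo(a, b) = a - fQuotient(a, b) * b."""
    return a - f_quotient(a, b) * b


def f_quotient_range(a, low, high):
    """fQuotient(a, low, high) = fQuotient(a - low, high - low)."""
    return f_quotient(a - low, high - low)


def modulo_range(a, low, high):
    """modulo(a, low, high) = modulo(a - low, high - low) + low."""
    return modulo(a - low, high - low) + low


if __name__ == "__main__":
    print(f_quotient(-1, 3), modulo(-1, 3))                    # -1 2
    print(modulo(Decimal("3.123"), 3))                         # 0.123
    print(f_quotient_range(0, 1, 13), modulo_range(0, 1, 13))  # -1 12
```

Because fQuotient rounds toward negative infinity, modulo(a, b) always has the sign of b, which is what lets month arithmetic wrap cleanly from 1 back to 12.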
maximumDayInMonthFor(yearValue, monthValue) =
M := modulo(monthValue, 1, 13)
Y := yearValue + fQuotient(monthValue, 1, 13)
Return a value based on M and Y:
31 if M = January, March, May, July, August, October, or December
30 if M = April, June, September, or November
29 if M = February AND (modulo(Y, 400) = 0 OR (modulo(Y, 100) != 0 AND modulo(Y, 4) = 0))
28 otherwise
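A minimal Python sketch of maximumDayInMonthFor, for integer arguments only. It relies on Python's `%` and `//` rounding toward negative infinity for integers, so `(m - 1) % 12 + 1` computes modulo(m, 1, 13) and `(m - 1) // 12` computes fQuotient(m, 1, 13). The function name is a hypothetical transliteration.

```python
def maximum_day_in_month_for(year_value: int, month_value: int) -> int:
    """Days in the month, after normalizing an out-of-range month into 1..12."""
    # M := modulo(monthValue, 1, 13)
    m = (month_value - 1) % 12 + 1
    # Y := yearValue + fQuotient(monthValue, 1, 13)
    y = year_value + (month_value - 1) // 12
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    if m == 2 and (y % 400 == 0 or (y % 100 != 0 and y % 4 == 0)):
        return 29
    return 28
```

Normalizing the month first matters when the addition algorithm asks about month E[month] - 1 with E[month] = 1: month 0 of 2001 is treated as December 2000.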
Essentially, this calculation is equivalent to separating D into <year,month>
and <day,hour,minute,second> fields. The <year,month> is added to S.
If the day is out of range, it is
Leap seconds are handled by the computation by treating them as overflows. Essentially, a value of 60 seconds in S is treated as if it were a duration of 60 seconds added to S (with a zero seconds field). All calculations thereafter use 60 seconds per minute.
Thus the addition of either PT1M or PT60S to any dateTime will always produce the same result. This is a special definition of addition which is designed to match common practice and, most importantly, to be stable over time.
A definition that attempted to take leap seconds into account would need to
be constantly updated, and could not predict the results of future
implementations' additions. The decision to introduce a leap second in
The following is the precise specification. These steps must be followed in the same order. If a field in D is not specified, it is treated as if it were zero. If a field in S is not specified, it is treated in the calculation as if it were the minimum allowed value in that field; however, after the calculation is concluded, the corresponding field in E is removed (set to unspecified).
temp := S[month] + D[month]
E[month] := modulo(temp, 1, 13)
carry := fQuotient(temp, 1, 13)
E[year] := S[year] + D[year] + carry
E[zone] := S[zone]
temp := S[second] + D[second]
E[second] := modulo(temp, 60)
carry := fQuotient(temp, 60)
temp := S[minute] + D[minute] + carry
E[minute] := modulo(temp, 60)
carry := fQuotient(temp, 60)
temp := S[hour] + D[hour] + carry
E[hour] := modulo(temp, 24)
carry := fQuotient(temp, 24)
if S[day] > maximumDayInMonthFor(E[year], E[month])
    tempDays := maximumDayInMonthFor(E[year], E[month])
else if S[day] < 1
    tempDays := 1
else
    tempDays := S[day]
E[day] := tempDays + D[day] + carry
repeat
    if E[day] < 1
        E[day] := E[day] + maximumDayInMonthFor(E[year], E[month] - 1)
        carry := -1
    else if E[day] > maximumDayInMonthFor(E[year], E[month])
        E[day] := E[day] - maximumDayInMonthFor(E[year], E[month])
        carry := 1
    else
        exit loop
    temp := E[month] + carry
    E[month] := modulo(temp, 1, 13)
    E[year] := E[year] + fQuotient(temp, 1, 13)
end repeat
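The steps above can be transcribed into Python as a sketch. Several assumptions are not taken from the specification: a dateTime is a dict with keys year, month, day, hour, minute, second and zone, where None marks an unspecified field; a duration is a dict of signed field values (a negative duration such as -P3M has every field negated); seconds are `decimal.Decimal`; and an unspecified year is treated as 1, since no minimum year is named here. All function names are hypothetical.

```python
import math
from decimal import Decimal

FIELDS = ("year", "month", "day", "hour", "minute", "second")
# Values substituted for unspecified fields of S (year = 1 is an assumption).
MINIMUM = {"year": 1, "month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0}


def f_quotient(a, b):
    # Floor of a/b; Decimal's // truncates toward zero, so use math.floor.
    if isinstance(a, int) and isinstance(b, int):
        return a // b
    return math.floor(a / b)


def modulo(a, b):
    return a - f_quotient(a, b) * b


def f_quotient3(a, low, high):
    return f_quotient(a - low, high - low)


def modulo3(a, low, high):
    return modulo(a - low, high - low) + low


def max_day(year, month):
    m, y = modulo3(month, 1, 13), year + f_quotient3(month, 1, 13)
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    if m == 2 and (modulo(y, 400) == 0 or (modulo(y, 100) != 0 and modulo(y, 4) == 0)):
        return 29
    return 28


def add_duration(s, d):
    """E := S + D, following the steps in the order given."""
    unspecified = [f for f in FIELDS if s.get(f) is None]
    S = {f: MINIMUM[f] if s.get(f) is None else s[f] for f in FIELDS}
    D = {f: d.get(f, 0) for f in FIELDS}  # unspecified duration fields are zero
    E = {}
    # Months, then years.
    temp = S["month"] + D["month"]
    E["month"] = modulo3(temp, 1, 13)
    carry = f_quotient3(temp, 1, 13)
    E["year"] = S["year"] + D["year"] + carry
    E["zone"] = s.get("zone")
    # Seconds, minutes and hours, each carrying into the next.
    temp = S["second"] + D["second"]
    E["second"], carry = modulo(temp, 60), f_quotient(temp, 60)
    temp = S["minute"] + D["minute"] + carry
    E["minute"], carry = modulo(temp, 60), f_quotient(temp, 60)
    temp = S["hour"] + D["hour"] + carry
    E["hour"], carry = modulo(temp, 24), f_quotient(temp, 24)
    # Days: pin S[day] into 1..maximumDayInMonthFor, then add.
    temp_days = min(max(S["day"], 1), max_day(E["year"], E["month"]))
    E["day"] = temp_days + D["day"] + carry
    while True:
        if E["day"] < 1:
            E["day"] += max_day(E["year"], E["month"] - 1)
            carry = -1
        elif E["day"] > max_day(E["year"], E["month"]):
            E["day"] -= max_day(E["year"], E["month"])
            carry = 1
        else:
            break
        temp = E["month"] + carry
        E["month"] = modulo3(temp, 1, 13)
        E["year"] += f_quotient3(temp, 1, 13)
    # Fields that were unspecified in S are removed from E.
    for f in unspecified:
        E[f] = None
    return E
```

With this sketch, adding {"month": -3} to {"year": 2000, "month": 1} gives year 1999, month 10 and an unspecified day, matching the second row of the table that follows.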
dateTime  duration  result 

2000-01-12T12:13:14Z  P1Y3M5DT7H10M3.3S  2001-04-17T19:23:17.3Z
2000-01  -P3M  1999-10
2000-01-12  PT33H  2000-01-13
Time durations are added by simply adding each of their fields, respectively, without overflow.
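A minimal sketch of field-wise duration addition, assuming durations are represented as dicts of field values (a hypothetical representation chosen only for illustration). The point it shows is that nothing is normalized: 50 minutes plus 20 minutes stays 70 minutes rather than becoming 1 hour 10 minutes.

```python
DURATION_FIELDS = ("year", "month", "day", "hour", "minute", "second")


def add_durations(d1: dict, d2: dict) -> dict:
    """Add two durations field by field; no field overflows into another."""
    # A field missing from either duration is treated as zero.
    return {f: d1.get(f, 0) + d2.get(f, 0) for f in DURATION_FIELDS}
```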
The order of addition of durations to instants
((dateTime + duration1) + duration2) != ((dateTime + duration2) + duration1)
(2000-03-30 + P1D) + P1M = 2000-03-31 + P1M = 2000-04-30
(2000-03-30 + P1M) + P1D = 2000-04-30 + P1D = 2000-05-01
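The order dependence comes from day pinning: April 31 does not exist, so it is pinned to April 30 before any days are added. The sketch below is a date-only reduction of the addition algorithm (only month and day duration fields, dates as hypothetical (year, month, day) tuples) that reproduces both results.

```python
def max_day(year, month):
    m = (month - 1) % 12 + 1       # modulo(month, 1, 13)
    y = year + (month - 1) // 12   # year + fQuotient(month, 1, 13)
    if m in (4, 6, 9, 11):
        return 30
    if m == 2:
        return 29 if y % 400 == 0 or (y % 100 != 0 and y % 4 == 0) else 28
    return 31


def add_date(date, months=0, days=0):
    """Add a duration with only month and day fields to (year, month, day)."""
    year, month, day = date
    temp = month + months
    month = (temp - 1) % 12 + 1
    year += (temp - 1) // 12
    # Pin the day into the new month, then add the days.
    day = min(max(day, 1), max_day(year, month)) + days
    while not 1 <= day <= max_day(year, month):
        if day < 1:
            day += max_day(year, month - 1)
            carry = -1
        else:
            day -= max_day(year, month)
            carry = 1
        temp = month + carry
        month = (temp - 1) % 12 + 1
        year += (temp - 1) // 12
    return (year, month, day)
```

For example, `add_date(add_date((2000, 3, 30), days=1), months=1)` yields (2000, 4, 30), whereas `add_date(add_date((2000, 3, 30), months=1), days=1)` yields (2000, 5, 1).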
A
Unlike some popular regular expression languages (including those defined by Perl and standard Unix utilities), the regular expression language defined here implicitly anchors all regular expressions at the head and tail, as the most common use of regular expressions in
A (#x41) and end with the character Z (#x5a) would be defined as follows:
In regular expression languages that are not implicitly anchored at the head and tail, it is customary to write the equivalent regular expression as:
^A.*Z$
where "^" anchors the pattern at the head and "$" anchors at the tail.
In those rare cases where an unanchored match is desired, including .* at the beginning and end of the regular expression will achieve the desired results. For example, a datatype
A (#x41) characters somewhere within the value could be defined as follows:

characters.
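Implicit anchoring corresponds to Python's `re.fullmatch`, while an unanchored language behaves like `re.search`. The helper below is a hypothetical sketch and holds only for patterns in the syntax both languages share: Python's "." also matches "\r" (here "." is [^\n\r]), and Python treats "^" and "$" as anchors, so the sketch is not valid for patterns that use them outside character classes.

```python
import re


def xsd_like_match(pattern: str, value: str) -> bool:
    """Approximate an implicitly anchored match: the whole value must match."""
    # re.fullmatch anchors the pattern at both the head and the tail.
    return re.fullmatch(pattern, value) is not None
```

So `xsd_like_match("A.*Z", "xAyZ")` is False, although `re.search("A.*Z", "xAyZ")` finds a match; wrapping the pattern as `".*A.*"` restores the unanchored behavior.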
For all 
Denoting the set of strings 

(empty string)  the set containing just the empty string 
all strings in 

all strings in 
For all 
Denoting the set of strings 

all strings in 

all strings 
For all 
Denoting the set of strings 

all strings in 

the empty string, and all strings in


All strings in 

All strings 

All strings 

All strings in 

All strings in L(S{n}S*) 

All strings 

The set containing only the empty string 
The regular expression language in the Perl Programming Language S{,m}, since it is logically equivalent to S{0,m}.
We have, therefore, left this logical possibility out of the regular
expression language defined by this specification.
?, *, +, {n,m} or {n,}, which have the meanings defined in the table above.
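The quantifier semantics can be checked against Python's `re` with full matching. In this sketch the CASES table and the helper names are hypothetical; each entry lists strings inside and outside L(pattern). Python's `re` also accepts the `{,m}` form, but a pattern meant for this language should spell it `{0,m}`.

```python
import re

# (pattern, strings in L(pattern), strings not in L(pattern))
CASES = [
    ("a?", ["", "a"], ["aa"]),
    ("a*", ["", "a", "aaaa"], ["b"]),
    ("a+", ["a", "aaaa"], [""]),
    ("a{2,3}", ["aa", "aaa"], ["a", "aaaa"]),
    ("a{2,}", ["aa", "aaaaaa"], ["a"]),
    ("a{2}", ["aa"], ["a", "aaa"]),
    ("a{0,2}", ["", "a", "aa"], ["aaa"]),  # written {0,m}, not {,m}
]


def in_language(pattern: str, s: str) -> bool:
    # fullmatch supplies the implicit head-and-tail anchoring.
    return re.fullmatch(pattern, s) is not None


def check(cases) -> bool:
    for pattern, members, non_members in cases:
        assert all(in_language(pattern, s) for s in members), pattern
        assert not any(in_language(pattern, s) for s in non_members), pattern
    return True
```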
For all 
Denoting the set of strings 

the single string consisting only of 

all strings in 

( 
all strings in 
., \, ?, *, +, {, }, (, ), [ or ].
These characters have special meanings in
Note that a
A character class is either a
[
and ]
characters. For all character
groups
For all 
Identifying the set of characters 

all characters in 

all characters in 

all characters in 

all characters in 
^
character.
For all

character.
For any
A single XML character is a
The [, ], - and \ characters are not valid character ranges;
The ^ character is only valid at the beginning of a
The grammar for
A
\
If s is the first character in a ^
\
or [
; and
The code point of
The code point of a
The valid 
Identifying the set of characters 

\n  the newline character (#xA)
\r  the return character (#xD)
\t  the tab character (#x9)
\\  \
\|  |
\.  .
\-  -
\^  ^
\?  ?
\*  *
\+  +
\{  {
\}  }
\(  (
\)  )
\[  [
\]  ]
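A small Python sketch that decodes a single-character escape. It assumes the self-denoting characters are \, |, ., -, ^, ?, *, +, {, }, (, ), [ and ], and that \n, \r and \t map to the control characters listed; the function name is hypothetical.

```python
# Escapes that denote a control character.
_CONTROL = {"n": "\n", "r": "\r", "t": "\t"}
# Characters that denote themselves when preceded by a backslash.
_SELF = set("\\|.-^?*+{}()[]")


def single_char_escape(seq: str) -> str:
    """Return the character denoted by a two-character escape such as '\\n'."""
    if len(seq) == 2 and seq[0] == "\\":
        c = seq[1]
        if c in _CONTROL:
            return _CONTROL[c]
        if c in _SELF:
            return c
    raise ValueError(f"not a single-character escape: {seq!r}")
```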
X, can be identified with a \p{X}.
The complement of this set is specified with the \P{X}. ([\P{X}] = [^\p{X}]).
The following table specifies the recognized values of the "General Category" property.
Category  Property  Meaning 

Letters  L  All Letters 
Lu  uppercase  
Ll  lowercase  
Lt  titlecase  
Lm  modifier  
Lo  other  
Marks  M  All Marks 
Mn  nonspacing  
Mc  spacing combining  
Me  enclosing  
Numbers  N  All Numbers 
Nd  decimal digit  
Nl  letter  
No  other  
Punctuation  P  All Punctuation 
Pc  connector  
Pd  dash  
Ps  open  
Pe  close  
Pi  initial quote (may behave like Ps or Pe depending on usage)  
Pf  final quote (may behave like Ps or Pe depending on usage)  
Po  other  
Separators  Z  All Separators 
Zs  space  
Zl  line  
Zp  paragraph  
Symbols  S  All Symbols 
Sm  math  
Sc  currency  
Sk  modifier  
So  other  
Other  C  All Others 
Cc  control  
Cf  format  
Co  private use  
Cn  not assigned 
The properties mentioned above exclude the Cs property.
The Cs property identifies "surrogate" characters, which do not occur at the level of the "character abstraction" that XML instance documents operate on.
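Category escapes can be approximated with Python's `unicodedata.category`, which returns the two-letter General Category of a character. This is a sketch: the helper names are hypothetical, and `unicodedata` reflects the Unicode version bundled with the running Python, which is newer than the one the tables here were based on, so results can differ for recently assigned characters. A one-letter property such as L matches every category beginning with that letter, and Cs is excluded throughout.

```python
import unicodedata

# Recognized values from the table; "Cs" (surrogates) is deliberately absent.
_RECOGNIZED = {
    "L", "Lu", "Ll", "Lt", "Lm", "Lo",
    "M", "Mn", "Mc", "Me",
    "N", "Nd", "Nl", "No",
    "P", "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",
    "Z", "Zs", "Zl", "Zp",
    "S", "Sm", "Sc", "Sk", "So",
    "C", "Cc", "Cf", "Co", "Cn",
}


def in_category(ch: str, prop: str) -> bool:
    """Approximate \\p{prop} for a single character."""
    if prop not in _RECOGNIZED:
        raise ValueError(f"unrecognized category: {prop}")
    cat = unicodedata.category(ch)
    if cat == "Cs":
        return False  # surrogates are outside every recognized property
    return cat[0] == prop if len(prop) == 1 else cat == prop


def not_in_category(ch: str, prop: str) -> bool:
    """Approximate \\P{prop}, the complement of \\p{prop}."""
    return not in_category(ch, prop)
```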
X (with all white space stripped out), can be identified with a \p{IsX}.
The complement of this set is specified with the \P{IsX}. ([\P{IsX}] = [^\p{IsX}]).
The following table specifies the recognized block names (for more
information, see the "Blocks.txt" file in
Start Code  End Code  Block Name  Start Code  End Code  Block Name  

#x0000  #x007F  BasicLatin  #x0080  #x00FF  Latin1Supplement  
#x0100  #x017F  LatinExtendedA  #x0180  #x024F  LatinExtendedB  
#x0250  #x02AF  IPAExtensions  #x02B0  #x02FF  SpacingModifierLetters  
#x0300  #x036F  CombiningDiacriticalMarks  #x0370  #x03FF  Greek  
#x0400  #x04FF  Cyrillic  #x0530  #x058F  Armenian  
#x0590  #x05FF  Hebrew  #x0600  #x06FF  Arabic  
#x0700  #x074F  Syriac  #x0780  #x07BF  Thaana  
#x0900  #x097F  Devanagari  #x0980  #x09FF  Bengali  
#x0A00  #x0A7F  Gurmukhi  #x0A80  #x0AFF  Gujarati  
#x0B00  #x0B7F  Oriya  #x0B80  #x0BFF  Tamil  
#x0C00  #x0C7F  Telugu  #x0C80  #x0CFF  Kannada  
#x0D00  #x0D7F  Malayalam  #x0D80  #x0DFF  Sinhala  
#x0E00  #x0E7F  Thai  #x0E80  #x0EFF  Lao  
#x0F00  #x0FFF  Tibetan  #x1000  #x109F  Myanmar  
#x10A0  #x10FF  Georgian  #x1100  #x11FF  HangulJamo  
#x1200  #x137F  Ethiopic  #x13A0  #x13FF  Cherokee  
#x1400  #x167F  UnifiedCanadianAboriginalSyllabics  #x1680  #x169F  Ogham  
#x16A0  #x16FF  Runic  #x1780  #x17FF  Khmer  
#x1800  #x18AF  Mongolian  #x1E00  #x1EFF  LatinExtendedAdditional  
#x1F00  #x1FFF  GreekExtended  #x2000  #x206F  GeneralPunctuation  
#x2070  #x209F  SuperscriptsandSubscripts  #x20A0  #x20CF  CurrencySymbols  
#x20D0  #x20FF  CombiningMarksforSymbols  #x2100  #x214F  LetterlikeSymbols  
#x2150  #x218F  NumberForms  #x2190  #x21FF  Arrows  
#x2200  #x22FF  MathematicalOperators  #x2300  #x23FF  MiscellaneousTechnical  
#x2400  #x243F  ControlPictures  #x2440  #x245F  OpticalCharacterRecognition  
#x2460  #x24FF  EnclosedAlphanumerics  #x2500  #x257F  BoxDrawing  
#x2580  #x259F  BlockElements  #x25A0  #x25FF  GeometricShapes  
#x2600  #x26FF  MiscellaneousSymbols  #x2700  #x27BF  Dingbats  
#x2800  #x28FF  BraillePatterns  #x2E80  #x2EFF  CJKRadicalsSupplement  
#x2F00  #x2FDF  KangxiRadicals  #x2FF0  #x2FFF  IdeographicDescriptionCharacters  
#x3000  #x303F  CJKSymbolsandPunctuation  #x3040  #x309F  Hiragana  
#x30A0  #x30FF  Katakana  #x3100  #x312F  Bopomofo  
#x3130  #x318F  HangulCompatibilityJamo  #x3190  #x319F  Kanbun  
#x31A0  #x31BF  BopomofoExtended  #x3200  #x32FF  EnclosedCJKLettersandMonths  
#x3300  #x33FF  CJKCompatibility  #x3400  #x4DB5  CJKUnifiedIdeographsExtensionA  
#x4E00  #x9FFF  CJKUnifiedIdeographs  #xA000  #xA48F  YiSyllables  
#xA490  #xA4CF  YiRadicals  #xAC00  #xD7A3  HangulSyllables  
#xE000  #xF8FF  PrivateUse  
#xF900  #xFAFF  CJKCompatibilityIdeographs  #xFB00  #xFB4F  AlphabeticPresentationForms  
#xFB50  #xFDFF  ArabicPresentationFormsA  #xFE20  #xFE2F  CombiningHalfMarks  
#xFE30  #xFE4F  CJKCompatibilityForms  #xFE50  #xFE6F  SmallFormVariants  
#xFE70  #xFEFE  ArabicPresentationFormsB  #xFEFF  #xFEFF  Specials  
#xFF00  #xFFEF  HalfwidthandFullwidthForms  #xFFF0  #xFFFD  Specials 
The blocks mentioned above exclude the HighSurrogates, LowSurrogates and HighPrivateUseSurrogates blocks.
These blocks identify "surrogate" characters, which do not occur at the level of the "character abstraction" that XML instance documents operate on.
For example, the \p{IsBasicLatin}
.
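A block escape reduces to a code-point range test. The sketch below copies only a few rows of the block table into a hypothetical BLOCKS dict (names outside this subset raise KeyError). Because the table lists Specials twice (#xFEFF and #xFFF0 to #xFFFD), each name maps to a list of inclusive ranges rather than to a single range.

```python
# A subset of the block table: name -> list of inclusive code point ranges.
BLOCKS = {
    "BasicLatin": [(0x0000, 0x007F)],
    "Latin1Supplement": [(0x0080, 0x00FF)],
    "Greek": [(0x0370, 0x03FF)],
    "Cyrillic": [(0x0400, 0x04FF)],
    "Hiragana": [(0x3040, 0x309F)],
    "Specials": [(0xFEFF, 0xFEFF), (0xFFF0, 0xFFFD)],
}


def in_block(ch: str, name: str) -> bool:
    """Approximate \\p{IsName} for a single character."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in BLOCKS[name])
```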
Character sequence  Equivalent 

.  [^\n\r] 
\s  [#x20\t\n\r] 
\S  [^\s] 
\i 
the set of initial name characters, those

\I  [^\i] 
\c 
the set of name characters, those

\C  [^\c] 
\d  \p{Nd} 
\D  [^\d] 
\w 
[#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}]
( 
\W  [^\w] 
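The multi-character escapes can be written as character predicates that follow the "Equivalent" column literally. This sketch uses hypothetical function names and omits \i and \c. It also shows that Python's own `\w` and `\s` are different sets: Python's `\w` matches "_" (category Pc, hence excluded here), and Python's `\s` matches form feed, which is not in [#x20\t\n\r]. Conversely, symbols such as "+" (Sm) are in \w here.

```python
import re
import unicodedata


def xsd_dot(ch: str) -> bool:
    # .  = [^\n\r]
    return ch not in "\n\r"


def xsd_s(ch: str) -> bool:
    # \s = [#x20\t\n\r]
    return ch in " \t\n\r"


def xsd_d(ch: str) -> bool:
    # \d = \p{Nd}
    return unicodedata.category(ch) == "Nd"


def xsd_w(ch: str) -> bool:
    # \w = [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}]
    return unicodedata.category(ch)[0] not in "PZC"


# \S, \D and \W are the complements: [^\s], [^\d] and [^\w].

if __name__ == "__main__":
    print(bool(re.fullmatch(r"\w", "_")), xsd_w("_"))    # True False
    print(bool(re.fullmatch(r"\s", "\f")), xsd_s("\f"))  # True False
```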
The
The revision of
The model of an abstract datatype has been made more precise and
explicit. more precise and explicit
but also a specific formal
requirement to redo the handling of facets
(
The
Units of length have been selected for all datatypes that are
permitted the length constraining facet
(
The
The seven-property model rewrite of the date/time datatype descriptions includes a carefully crafted definition of order
that ensures that, for repeating datatypes (time, gDay, etc.), timezoned values are compared as though they are on
the same "calendar day" ("local" property values), so that in any given timezone the days start at "local" 00:00:00 and
end just before "local" 24:00:00. Days are not 00:00:00Z to 24:00:00Z in timezones other than Z. This covers
the requirements of
In addition to the changes already made, the Working Group has decided on a number of further changes which have not yet been reflected in this draft. These are indicated throughout the text as issues, including more or less detail on the intended resolution. The ones remaining in this draft are summarized below, linked to their occurrence in the text above, where more detail can be found, including links to the original requirement or other point of origin.
The listing below is for the benefit of readers of a printed version of this document: it collects together all the definitions which appear in the document above.
Coeditor Ashok Malhotra's work on this specification from March 1999 until February 2001 was supported by IBM, and from then until May 2004 by Microsoft. Since July 2004 his work on this specification has been supported by Oracle Corporation.
The XML Schema Working Group acknowledges with thanks the members of other W3C Working Groups and industry experts in other forums who have contributed directly or indirectly to the creation of this document and its predecessor.
At the time this Working Draft is published, the members in good standing of the XML Schema Working Group are:
The XML Schema Working Group has benefited in its work from the participation and contributions of a number of people who are no longer members of the Working Group in good standing at the time of publication of this Working Draft. Their names are given below. In particular we note with sadness the accidental death of Mario Jeckle shortly before publication of the first Working Draft of XML Schema 1.1. Affiliations given are those current at the time of their first work with the WG.