This
Changes since the previous public Working Draft include the following:
Some typographic errors have been corrected.
An
For those primarily interested in the changes since version 1.0,
the
The major changes since version 1.0 include:
Support for XML 1.1 has been added. It is now implementation
defined whether datatypes dependent on definitions in
In order to align this specification with those being prepared by
the XSL and XML Query Working Groups, a new datatype named
The conceptual model of the date- and time-related types has been defined more formally.
A more formal treatment of the fundamental facets of the primitive datatypes has been adopted.
More formal definitions of the lexical space of most types have
been provided, with detailed descriptions of the mappings from lexical
representation to value and from value to
The validation rule
The rules governing partial implementations of infinite datatypes have been clarified.
Various changes have been made to align the relevant
parts of this specification more closely with other
specifications, including especially the corresponding
sections of
Comments on this document should be made in
W3C's public installation of Bugzilla, specifying "XML Schema" as the
product. Instructions can be found at
W3C Advisory Committee Representatives are invited to submit
their formal reviews as described in the Call for Review
(see
Publication as a Proposed Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The
This document has been produced by the
This document was produced
The English version of this specification is the only normative
version. Information about translations of this document is available
at
Whether an SQL-implementation supports leap seconds, and the consequences of such support for date and interval arithmetic, is
XMLSchema-datatypes
.
The Working Group has two main goals for this version of W3C XML Schema:
Significant improvements in simplicity of design and clarity
of exposition
Provision of support for versioning of XML languages defined using the XML Schema specification, including the XML transfer syntax for schemas itself.
These goals are slightly in tension with one another. The following summarizes the Working Group's strategic guidelines for changes between versions 1.0 and 1.1:
Add support for versioning (acknowledging that this
Allow bug fixes (unless in specific cases we decide that the fix is too disruptive for a point release)
Allow editorial changes
Allow design cleanup to change behavior in edge cases
Allow relatively non-disruptive changes to type hierarchy (to better support current and forthcoming international standards and W3C recommendations)
Allow design cleanup to change component structure (changes to functionality restricted to edge cases)
Do not allow any significant changes in functionality
Do not allow any changes to XML transfer syntax except those required by version control hooks and bug fixes
The overall aim as regards compatibility is that
All schema documents conformant to version 1.0 of this specification should also conform to version 1.1, and should have the same validation behavior across 1.0 and 1.1 implementations (except possibly in edge cases and in the details of the resulting PSVI);
The vast majority of schema documents conformant to version 1.1 of this specification should also conform to version 1.0, leaving aside any incompatibilities arising from support for versioning, and when they are conformant to version 1.0 (or are made conformant by the removal of versioning information), should have the same validation behavior across 1.0 and 1.1 implementations (again except possibly in edge cases and in the details of the resulting PSVI);
The
The table below offers two typical examples of XML instances in which datatypes are implicit: the instance on the left represents a billing invoice, the instance on the right a memo or perhaps an email message in XML.
Data oriented | Document oriented |
---|---|
|
|
The invoice contains several dates and telephone numbers, the postal abbreviation for a state (which comes from an enumerated list of sanctioned values), and a ZIP code (which takes a definable regular form). The memo contains many of the same types of information: a date, telephone number, email address and an "importance" value (from an enumerated list, such as "low", "medium" or "high"). Applications which process invoices and memos need to raise exceptions if something that was supposed to be a date or telephone number does not conform to the rules for valid dates or telephone numbers.
In both cases, validity constraints exist on the content of the instances that are not expressible in XML DTDs. The limited datatyping facilities in XML have prevented validating XML processors from supplying the rigorous type checking required in these situations. The result has been that individual application writers have had to implement type checking in an ad hoc manner. This specification addresses the needs of both document authors and application writers for a robust, extensible datatype system for XML that could be incorporated into XML processors. As discussed below, these datatypes could be used in other XML-related standards as well.
Other specifications on which this one depends
are listed in
This specification defines some datatypes which depend on
definitions in
Conforming implementations of this specification
When this specification is used to check the datatype validity of XML
input, implementations
This specification
makes use of the EBNF notation used in the
The
provide for primitive data typing, including byte, date, integer, sequence, SQL and Java primitive datatypes, etc.;
define a type system that is adequate for import/export from database systems (e.g., relational, object, OLAP);
distinguish requirements relating to lexical data representation
allow creation of user-defined datatypes, such as datatypes that are derived from existing datatypes and which may constrain certain of its properties (e.g., range, precision, length, format).
This
The terminology used to describe XML Schema Datatypes is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a datatype processor:
A feature of this specification included solely to ensure that
schemas which use this feature remain compatible with
It is recommended that schemas, schema documents, and processors behave as described, but there can be valid reasons for them not to; it is important that the full implications be understood and carefully weighed before adopting behavior at variance with the recommendation.
Schemas, schema documents
and processors are forbidden to behave as
described; schemas and documents which nevertheless
do so are in
A violation of the rules of this
specification; results are undefined.
Conforming software
A failure of a
Except as otherwise specified,
processors
Failure of an XML element or attribute to be datatype-valid against a particular datatype in a particular schema is not in itself a failure to conform to this specification and thus, for purposes of this specification, not an error.
A choice left under the control of the user of a processor, rather than being fixed for all users or uses of the processor.
Statements in this specification that Processors
behave in a certain way mean that
processors
The normal expectation is that the default setting for
such options will be to disable the
Nothing in this specification constrains the manner in which processors allow users to control user options. Command-line options, menu choices in a graphical user interface, environment variables, alternative call patterns in an application programming interface, and other mechanisms may all be taken as providing user options.
This specification provides three different kinds of normative statements about schema components, their representations in XML and their contribution to the schema-validation of information items:
Constraints on the schema components themselves, i.e. conditions
components
Constraints on the representation of schema components in XML.
Some but not all of these are expressed in
Constraints expressed by schema components which information items
This section describes the conceptual framework behind the
The datatypes discussed in this specification are
Only those operations and relations needed for schema processing
are defined in this specification. Applications using these datatypes
are generally expected to implement appropriate additional functions
and/or relations to make the datatype generally useful. For
example, the description herein of the
A small collection of
This specification defines only the operations and relations needed
for schema processing. The terminology used to describe and name
the datatypes is chosen to guide users and implementers in extending
each datatype to be generally useful, i.e., in recognizing the
real-world datatypes and variants for which the datatypes defined
herein are meant to be used in data interchange.
Along with the
Where
The
This specification sometimes uses the shorter form "type"
where one might strictly speaking expect the longer form
"datatype" (e.g. in the phrases "union type", "list type",
"base type", "item type", etc.).
No systematic distinction is intended between
the forms of these phrases with "type" and
those with "datatype";
the two forms are used interchangeably.
The distinction between "datatype" and "simple type definition",
by contrast, carries more information: the datatype is characterized by its
The value spaces of datatypes are abstractions,
and are defined in
In addition, other applications are expected to define additional
appropriate operations and/or relations on these value spaces (e.g.,
addition and multiplication on the various numerical datatypes'
value spaces), and are permitted, where appropriate, even to redefine
the operations and relations defined within this specification,
provided that
The defined enumerated outright defined by restricting the defined as a combination of values from one or more already
defined
The relations of
The identity relation is always defined. Every value space
inherently has an identity relation. Two things are
This does not preclude implementing datatypes by using more than
one
In the identity relation defined herein, values from different
Datatypes
Given a list A and a list B, A and B are the same list if they are the same sequence of atomic values. The necessary and sufficient conditions for this identity are that A and B have the same length and that the items of A are pairwise identical to the items of B.
It is a consequence of the rule just given for list identity
that there is only one empty list. An empty list declared as
having
Each
On the other hand, equality need not cover the entire value space
of the datatype (though it usually does).
sameness
prescribed by this specification
test for
In the prior version of this specification (1.0), equality was
always identity. This has been changed to permit the datatypes
defined herein to more closely match the
For example, the
For another example, the
In the equality relation defined herein, values from different
primitive data spaces are made artificially unequal even if they might
otherwise be considered equal. For example, there is a number
Two lists A and B are equal if and only if they have the same length and their items are pairwise equal. A list of length one containing a value V1 and an atomic value V2 are equal if and only if V1 is equal to V2.
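The list rules above (identity as pairwise identity of items, equality as pairwise equality, and the rule for one-item lists) can be sketched in Python. This is a minimal sketch, not a normative algorithm: it assumes every list item is an IEEE double (Python `float`), chosen because for that datatype identity and equality visibly differ (0 and −0 are equal but not identical; NaN is identical to itself but not equal to it). Because only one atomic type is modelled, it ignores the rule that values from different primitive value spaces are never equal. All function names are hypothetical.

```python
import math
from typing import Sequence


def atoms_identical(a: float, b: float) -> bool:
    """Identity: NaN is identical to itself; 0.0 and -0.0 are distinct values."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


def atoms_equal(a: float, b: float) -> bool:
    """Equality: 0.0 == -0.0 and NaN != NaN (Python's == already behaves this way)."""
    return a == b


def lists_identical(xs: Sequence[float], ys: Sequence[float]) -> bool:
    # Same length and pairwise-identical items, so there is only one empty list.
    return len(xs) == len(ys) and all(atoms_identical(x, y) for x, y in zip(xs, ys))


def lists_equal(xs: Sequence[float], ys: Sequence[float]) -> bool:
    # Same length and pairwise-equal items.
    return len(xs) == len(ys) and all(atoms_equal(x, y) for x, y in zip(xs, ys))


def list_equals_atom(xs: Sequence[float], v: float) -> bool:
    # A list of length one is equal to an atomic value V iff its item equals V.
    return len(xs) == 1 and atoms_equal(xs[0], v)
```

For example, `[0.0]` and `[-0.0]` are equal lists but not identical ones, while `[nan]` is identical to itself but not equal to itself.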
For the purposes of this specification, there is one equality
relation for all values of all datatypes (the union of the various
datatypes' individual equalities, if one considers relations to be
sets of ordered pairs). The
The order relation is used
in conjunction with equality when making
In this specification, this less-than order relation is denoted by
The weak order
Comparison of values from different
When made for purposes of checking an enumeration constraint,
such a comparison is not in itself an error, but since
Specifying an upper or lower bound which is of the wrong primitive
datatype (and therefore
Comparison of
In addition to its
For example, "100" and "1.0E2" are two different literals from the
The literals in the
The number of literals for each value has been kept small; for many datatypes there is a one-to-one mapping between literals and values. This makes it easy to exchange the values between different systems. In many cases, conversion from locale-dependent representations will be required on both the originator and the recipient side, both for computer processing and for interaction with humans.
Textual, rather than binary, literals are used. This makes hand editing, debugging, and similar activities possible.
Where possible, literals correspond to those found in common programming languages and libraries.
While the datatypes defined in this specification have, for the most
part, a single lexical representation, i.e., each value in the
datatype's
For the
For the
For
If a derivation introduces a
One should be aware that in the context of XML
Should a derivation be made using a derivation mechanism that
removes
This could happen by means of a
Conversely, should a derivation remove values then their
There are currently no facets with such an impact. There may be in the future.
For example, '100' and '1.0E2' are two
different
While the datatypes defined in this specification
The facets of a datatype serve to distinguish those aspects of
one datatype which
Facets are of two types:
All
Constraining the
It is useful to categorize the datatypes defined in this
specification along various dimensions,
The first distinction to be made is that
between
First, we distinguish
It is a consequence of constraints normatively specified elsewhere
in this document
It is a consequence of constraints normatively specified
elsewhere in this document
For example, a single token which
An
Atomic values are sometimes regarded, and described, as "not
decomposable", but in fact the values in several datatypes
defined here are described with internal structure, which is appealed
to in checking whether particular values satisfy various constraints
(e.g. upper and lower bounds on a datatype). Other specifications
which use the datatypes defined here may define operations which
attribute internal structure to values and expose or act upon that
structure.
The
There is one
Several type systems (such as the one described in
A
In the above example, the value of the
When a datatype is
For each of
For
The
The
It will be observed that the
A prototypical example of a
Any number
When datatypes are represented using XSD schema components, as
described in
The
The order in which the
For example, given the definition below, the first instance of the <size> element
validates correctly as an
The
The
A datatype which is
When a datatype is
Next, we distinguish between
Next, we distinguish
As normatively specified elsewhere,
conforming processors
Processors
For example, in this specification,
The
datatypes defined by this specification fall into
In the example above,
A datatype which is
As described in more detail in
A
One datatype can be
Definition, derivation, restriction, and construction are conceptually distinct, although in practice they are frequently performed by the same mechanisms.
By
The properties of the
The properties of any
For all other datatypes, a
By
The above does not preclude the
More generally,
B is the There is some datatype X
such that X is the
A datatype
It is a consequence of
Since each datatype has exactly one
By
Formally,
the the
Note that all three forms of datatype
By
,
, and
, respectively.
Datatypes so constructed may be understood fully (for
purposes of a type system) in terms of (a) the properties
of the datatype(s) from which they are constructed, and
(b) their
The
The mechanism for making
From the schema author's perspective, a reference to
a datatype which proves to be An error has been made in giving the name of the datatype. The datatype is a The datatype is an The datatype is an The datatype is a
In the terminology of
Conceptually there is no difference between the
A datatype which is
Each built-in datatype can be uniquely addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace;
the fragment identifier is the name of the datatype.
For example, to address the int datatype, the URI is:
http://www.w3.org/2001/XMLSchema#int
Additionally, each facet definition element can be uniquely
addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace;
the fragment identifier is the name of the facet.
For example, to address the maxInclusive facet, the URI is:
http://www.w3.org/2001/XMLSchema#maxInclusive
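Additionally, each facet usage in a built-in datatype definition can be uniquely addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace;
the fragment identifier is the name of the datatype, followed by a period ('.'), followed by the name of the facet.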
For example, to address the usage of the maxInclusive facet in
the definition of int, the URI is:
http://www.w3.org/2001/XMLSchema#int.maxInclusive
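The three URI constructions can be sketched as simple string concatenation on the XML Schema namespace URI. This is an illustrative sketch; the helper names are hypothetical, and the facet-usage form follows the `#int.maxInclusive` example above.

```python
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def datatype_uri(datatype: str) -> str:
    # Base URI = namespace URI; fragment identifier = datatype name.
    return f"{XSD_NAMESPACE}#{datatype}"


def facet_uri(facet: str) -> str:
    # Fragment identifier = facet name.
    return f"{XSD_NAMESPACE}#{facet}"


def facet_usage_uri(datatype: str, facet: str) -> str:
    # Fragment identifier = datatype name, a period, then the facet name.
    return f"{XSD_NAMESPACE}#{datatype}.{facet}"
```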
The
http://www.w3.org/2001/XMLSchema
To facilitate usage in specifications other than the XML Schema definition language,
such as those that do not want to know anything about aspects of the
XML Schema definition language other than the datatypes, each
http://www.w3.org/2001/XMLSchema-datatypes
This applies to both
Each
The two datatypes at the root of the hierarchy of simple
types are
For further details of
The
It is a consequence of this definition, together with the
definition of the potential
effable
or nameable
The
It is
The
When a new datatype is defined
by
For further details of
The
The
It is
The
When a new datatype is defined
by
The
Processors
Many human languages have writing systems that require
child elements for control of aspects such as bidirectional formatting or
ruby annotation (see
The
It is
Equality for
As noted in
The
It is
The
An instance of a datatype that is defined as
The
The
For a decimal datatype whose values do reflect precision, see
All
-1.23, 12678967.543233, +100000.00, 210
The lexical space of decimal is the set of
lexical representations which match the grammar given above, or
(equivalently) the regular expression
(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)
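A sketch of checking the decimal lexical space with the regular expression above, then mapping a literal to a value. It assumes Python's `decimal.Decimal` as a stand-in for the abstract value space: `Decimal` keeps trailing zeros, but its `==` compares numerically, which matches a value space in which "100000.00" and "100000" denote the same value. Function names are hypothetical.

```python
import re
from decimal import Decimal

# The regular expression given above for the lexical space of decimal.
DECIMAL_LEXICAL = re.compile(r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def is_decimal_lexical(literal: str) -> bool:
    # fullmatch anchors the pattern at both ends, as XSD patterns implicitly are.
    return DECIMAL_LEXICAL.fullmatch(literal) is not None


def decimal_value(literal: str) -> Decimal:
    # Gate on the XSD grammar first: Decimal() alone would also accept
    # forms such as "1E2" or "NaN" that are not decimal literals.
    if not is_decimal_lexical(literal):
        raise ValueError(f"not a decimal literal: {literal!r}")
    return Decimal(literal)
```

Note that "1.0E2" is rejected here even though it is a valid float or double literal.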
The mapping from lexical representations to values is the usual
one for decimal numerals; it is given formally in
The
The mapping from values to
The
The mapping from values to
"Equality" in this Recommendation is defined to be "identity" (i.e.,
values that are identical in the
Any value incomparable with the value used for the four bounding facets
(
This datatype differs from that of
A
The
As explained below, the
Equality and order for float
Equality is identity, except that 0 = −0 (although they are not identical) and NaN ≠ NaN
(although NaN is of course identical to itself). 0 and −0 are thus
For the basic values, the order relation on float is the order relation for rational numbers. INF is greater
than all other non-NaN values; −INF is less than all other non-NaN
values. NaN is
Any value
The Schema 1.0 version of this datatype did not differentiate between
0 and −0 and NaN was equal to itself. The changes were
made to make the datatype more closely mirror
The INF
, -INF
and
NaN
, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0 and INF are all legal
The
The
(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?|(\+|-)?INF|NaN
The
Since IEEE allows some variation in rounding of values, processors
conforming to this specification may exhibit some variation in their
The
The Schema 1.0 version of this datatype did not permit rounding
algorithms whose results differed from
The
The only significant differences between float and double are the three defining constants 53 (vs 24), −1074 (vs −149), and 971 (vs 104).
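The equality and order rules for float described above can be sketched as a four-way comparison returning "<", "=", ">" or "<>" (incomparable). This is a sketch under stated assumptions: Python's `float` is binary64, so it models the comparison logic rather than single-precision values, and it treats NaN as incomparable with every value including itself, which is how XSD 1.1 orders NaN (the sentence about NaN is cut off above). Since only the constants differ between float and double, the same logic applies to both. The function name is hypothetical.

```python
import math


def float_compare(a: float, b: float) -> str:
    """Return '<', '=', '>' or '<>' (incomparable) for two float values."""
    if math.isnan(a) or math.isnan(b):
        return "<>"  # NaN is unordered with respect to every value, itself included
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "="  # also covers 0.0 versus -0.0: equal although not identical
```

Python's built-in ordering already places INF above and −INF below every other non-NaN value, so no special cases are needed for them.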
"Equality" in this Recommendation is defined to be "identity" (i.e.,
values that are identical in the
Any value incomparable with the value used for the four bounding facets
(
This datatype differs from that of
A
The
As explained below, the
Equality and order for double
Equality is identity, except that 0 = −0 (although they are not identical) and NaN ≠ NaN
(although NaN is of course identical to itself).
For the basic values, the order relation on double is the order relation for rational numbers. INF is greater
than all other non-NaN values; −INF is less than all other non-NaN
values. NaN is
Any value
The Schema 1.0 version of this datatype did not differentiate between
0 and −0 and NaN was equal to itself. The changes were
made to make the datatype more closely mirror
The INF
, -INF
and
NaN
, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0 and INF are all legal
The
The
(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?|(\+|-)?INF|NaN
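A sketch of the lexical mapping for double: first check the literal against the expression above, then let Python's `float()` do the round-to-nearest conversion to IEEE binary64. The regex gate matters because `float()` is more lenient than the XSD grammar (it accepts "inf", "1_000" and surrounding whitespace, for example). This relies on `float()` being correctly rounded, as it is in CPython; the function name is hypothetical.

```python
import math
import re

# The lexical-space expression given above for double.
DOUBLE_LEXICAL = re.compile(
    r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?|(\+|-)?INF|NaN"
)


def parse_double(literal: str) -> float:
    # Reject anything outside the XSD lexical space before calling float().
    if DOUBLE_LEXICAL.fullmatch(literal) is None:
        raise ValueError(f"not a double literal: {literal!r}")
    # float() rounds to the nearest binary64 value and understands
    # "INF", "+INF", "-INF" and "NaN"; "-0" maps to negative zero.
    return float(literal)
```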
The
Since IEEE allows some variation in rounding of values, processors
conforming to this specification may exhibit some variation in their
The
The Schema 1.0 version of this datatype did not permit rounding
algorithms whose results differed from
The
All YYYY) and a minimum fractional second precision of
milliseconds or three decimal digits (i.e. s.sss).
However,
) is
a
1696-09-01T00:00:00Z
1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-07-01T00:00:00Z
These four values are chosen so as to maximize
the possible differences in results that could occur,
such as the difference when adding P1M and P30D:
1697-02-01T00:00:00Z + P1M < 1697-02-01T00:00:00Z + P30D,
but
1903-03-01T00:00:00Z + P1M > 1903-03-01T00:00:00Z + P30D,
so that P1M <> P30D.
If two
Two totally ordered datatypes (
There are many ways to implement
See the conformance notes in
The lexical representation for
The values of the Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary integer.
Similarly, the value of the Seconds component allows an arbitrary decimal.
Thus, the lexical representation of
An optional preceding minus sign ('-') is
allowed, to indicate a negative duration. If the sign is omitted, a
positive duration is indicated. See also
For example, to indicate a duration of 1 year, 2 months, 3 days, 10
hours, and 30 minutes, one would write: P1Y2M3DT10H30M.
One could also indicate a duration of minus 120 days as: -P120D.
Reduced precision and truncated representations of this format are allowed provided they conform to the following:
If the number of years, months, days, hours, minutes, or seconds in any
expression equals zero, the number and its corresponding designator
The seconds part
The designator 'T'
For example, P1347Y, P1347M and P1Y2MT2H are all allowed; P0Y1347M and P0Y1347M0D are allowed. P-1347M is not allowed although -P1347M is allowed. P1Y2MT is not allowed.
The PnYnMnDTnHnMnS
More precisely, the
Thus, a
The language accepted by the The expression
The expression
The expression
-?P([0-9]+Y)?([0-9]+M)?([0-9]+D)?(T([0-9]+H)?([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)?
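A sketch of a duration lexical check that combines the field-order expression with the two rules illustrated by the examples above: at least one field must be present ("P" alone is not a duration), and the designator 'T' must be followed by at least one time field (so "P1Y2MT" is rejected). Here the year field is written as the optional group `([0-9]+Y)?`, so that forms such as P1347M, which the text allows, are accepted. The function name is hypothetical.

```python
import re

# Field order: optional sign, P, then Y, M, D, and an optional T part with H, M, S.
DURATION_FIELDS = re.compile(
    r"-?P([0-9]+Y)?([0-9]+M)?([0-9]+D)?(T([0-9]+H)?([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)?"
)


def is_duration_lexical(literal: str) -> bool:
    return (
        DURATION_FIELDS.fullmatch(literal) is not None
        and re.search(r"[YMDHS]", literal) is not None  # at least one field present
        and not literal.endswith("T")  # 'T' must introduce at least one time field
    )
```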
The
In general, the
1696-09-01T00:00:00Z
1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-07-01T00:00:00Z
The following table shows the strongest relationship that can be determined
between example durations. The symbol <> means that the order relation is
indeterminate.
Duration | Relations |
---|---|
P1Y | > P364D, <> P365D, <> P366D, < P367D |
P1M | > P27D, <> P28D, <> P29D, <> P30D, <> P31D, < P32D |
P5M | > P149D, <> P150D, <> P151D, <> P152D, <> P153D, < P154D |
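The table above can be reproduced by adding both durations to each of the four reference dateTimes listed earlier and comparing the results. This is a sketch under stated assumptions: durations are reduced to a hypothetical (months, seconds) pair, the reference points are naive Python `datetime` values understood as UTC, and month addition clamps the day to the length of the target month (never triggered here, since every reference point falls on the first of a month). The relation is determinate only when all four points agree.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Duration:
    """Hypothetical reduced form of a duration: total months plus total seconds."""
    months: int
    seconds: float


# The four reference dateTimes (all 00:00:00Z).
REFERENCE_POINTS = [
    datetime(1696, 9, 1), datetime(1697, 2, 1),
    datetime(1903, 3, 1), datetime(1903, 7, 1),
]


def add_duration(t: datetime, d: Duration) -> datetime:
    # Add the month part first, clamping the day to the target month's length,
    # then add the seconds part on the timeline.
    index = t.year * 12 + (t.month - 1) + d.months
    year, month = index // 12, index % 12 + 1
    day = min(t.day, calendar.monthrange(year, month)[1])
    return t.replace(year=year, month=month, day=day) + timedelta(seconds=d.seconds)


def compare_durations(p: Duration, q: Duration) -> str:
    """'<', '=' or '>' if all four reference points agree, otherwise '<>'."""
    outcomes = set()
    for t in REFERENCE_POINTS:
        a, b = add_duration(t, p), add_duration(t, q)
        outcomes.add("<" if a < b else ">" if a > b else "=")
    return outcomes.pop() if len(outcomes) == 1 else "<>"


def days(n: int) -> Duration:
    return Duration(0, n * 86400)
```

For example, `compare_durations(Duration(1, 0), days(30))` returns `"<>"`, matching P1M <> P30D.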
Implementations are free to optimize the computation of the ordering relationship. For example, the following table can be used to compare durations of a small number of months against days.
Months | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | ... |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Minimum days | 28 | 59 | 89 | 120 | 150 | 181 | 212 | 242 | 273 | 303 | 334 | 365 | 393 | ... |
Maximum days | 31 | 62 | 92 | 123 | 153 | 184 | 215 | 245 | 276 | 306 | 337 | 366 | 397 | ... |
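The minimum and maximum rows of this table can be regenerated by brute force. The sketch below (function names hypothetical) starts at the first of every month in one 400-year Gregorian cycle, which covers every leap-year pattern, measures how many days n whole months span, and keeps the extremes.

```python
from datetime import date
from typing import Tuple


def add_months(start: date, n: int) -> date:
    # Only first-of-month dates are used, so no day clamping is needed.
    index = start.year * 12 + (start.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def day_range_for_months(n: int) -> Tuple[int, int]:
    """Fewest and most days that n consecutive calendar months can span."""
    spans = [
        (add_months(date(year, month, 1), n) - date(year, month, 1)).days
        for year in range(2000, 2400)
        for month in range(1, 13)
    ]
    return min(spans), max(spans)
```

A duration of n months is then certainly longer than any day-count duration below the minimum and certainly shorter than any above the maximum; for example, P5M spans 150 to 153 days, so P5M > P149D and P5M < P154D.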
In comparing
Certain derived datatypes of durations can be guaranteed to have a total order. For this, they must have fields from only one row in the list below and the time zone must either be required or prohibited.
year, month
day, hour, minute, second
For example, a datatype could be defined to correspond to the
The
All timezoned times are Coordinated Universal Time
(
The date and time datatypes described in this recommendation were inspired
by
Those using this (1.0) version of this Recommendation to
represent negative years should be aware that the interpretation of lexical
representations beginning with a '-'
is likely to change in
subsequent versions.
See the conformance note in
In version 1.0 of this specification, the
old style
calendar; the two correspond
approximately but not exactly to each other.
In this version of this specification,
two changes are made in order to agree with existing usage.
First,
Note that 1 BCE, 5 BCE, and so on (years 0000, -0004, etc. in the
lexical representation defined here) are leap years in the proleptic
Gregorian calendar used for the date/time datatypes defined here.
Version 1.0 of this specification was unclear about the treatment of
leap years before the common era
The
See the conformance note in
Equality and order are as prescribed
in
Since the order of a
Although
Order and equality are essentially the same for
The '-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?,
where
'-'?
the remaining '-'s are separators between parts of the date portion;
the first
'T' is a separator indicating that time-of-day follows;
':' is a separator between parts of the time-of-day portion;
the second
'.'
For example, 2002-10-10T12:00:00-05:00 (noon on 10 October 2002, Central Daylight Saving Time as well as Eastern Standard Time in the U.S.) is 2002-10-10T17:00:00Z, five hours later than 2002-10-10T12:00:00Z.
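The conversion in this example can be sketched with Python's `datetime`, which stores a numeric offset and converts to UTC on request. This is a sketch, not the normative lexical mapping: `datetime.fromisoformat` is used as a stand-in for the XSD parser, so it assumes years 1 to 9999, an explicit `+hh:mm`/`-hh:mm` offset, and no '24:00:00' form. The function name is hypothetical.

```python
from datetime import datetime, timezone


def normalize_to_utc(literal: str) -> str:
    """Rewrite a timezoned dateTime literal in its equivalent 'Z' form."""
    moment = datetime.fromisoformat(literal)  # needs an explicit numeric offset
    if moment.tzinfo is None:
        raise ValueError("no timezone to normalize")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Applied to the example above, `normalize_to_utc("2002-10-10T12:00:00-05:00")` gives `"2002-10-10T17:00:00Z"`.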
For further guidance on arithmetic with
Except for trailing fractional zero digits in the seconds representation,
'24:00:00' time representations,
and timezone (for timezoned values), the mapping
from literals to values is one-to-one. Where there is more than
one possible representation, the
The 2-digit numeral representing the hour must not be '24';
the fractional second string, if present, must not end in '0';
for timezoned values, the timezone must be represented with 'Z'
(All timezoned
The lexical representations for Within a Subsequent Alternatively, For example, 2002-10-10T12:00:00−05:00
(noon on 10 October 2002, Central Daylight
Savings Time as well as Eastern Standard Time
in the U.S.) is equal to 2002-10-10T17:00:00Z,
five hours later than 2002-10-10T12:00:00Z. For the most part, this specification adopts the distinction between
Constraint: Day-of-month Values
) given above.
The
The
Timezones are durations with (integer-valued) hour and minute properties (with the hour magnitude limited to at most 14, and the minute magnitude limited to at most 59, except that if the hour magnitude is 14, the minute value must be 0); they may be both positive or both negative.
The lexical representation of a timezone is a string of the form:
(('+' | '-') hh ':' mm) | 'Z',
where
'+' indicates a nonnegative duration,
'-' indicates a nonpositive duration.
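A sketch of mapping a timezone literal to a signed offset in minutes, enforcing the limits stated above (hour magnitude at most 14, minute magnitude at most 59, and exactly 14:00 at the extreme). It also shows that '+00:00', '-00:00' and 'Z' all map to the same zero offset. The function name and the minutes-based representation are assumptions for illustration.

```python
import re

TIMEZONE_LEXICAL = re.compile(r"Z|([+-])([0-9]{2}):([0-9]{2})")


def timezone_offset_minutes(lexical: str) -> int:
    """Map a timezone literal such as '-05:00' or 'Z' to a signed offset in minutes."""
    m = TIMEZONE_LEXICAL.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not a timezone literal: {lexical!r}")
    if lexical == "Z":
        return 0
    sign, hours, minutes = m.group(1), int(m.group(2)), int(m.group(3))
    # Hour magnitude at most 14, minutes at most 59, and only 14:00 at the limit.
    if hours > 14 or minutes > 59 or (hours == 14 and minutes != 0):
        raise ValueError(f"timezone out of range: {lexical!r}")
    total = hours * 60 + minutes
    return -total if sign == "-" else total
```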
The mapping so defined is one-to-one, except that '+00:00',
'-00:00', and 'Z' all represent the same zero-length duration
timezone,
When a timezone is added to a
In general, the
The following definition uses the notation S[year] to represent the year
field of S, S[month] to represent the month field, and so on. The notation (Q
& "-14:00") means adding the timezone -14:00 to Q, where Q did not
already have a timezone.
The ordering between two
A. Normalize P and Q. That is, if there is a timezone present, but
it is not Z, convert it to Z using the addition operation defined in
Thus 2000-03-04T23:00:00+03:00 normalizes to 2000-03-04T20:00:00Z
B. If P and Q either both have a time zone or both do not have a time zone, compare P and Q field by field from the year field down to the second field, and return a result as soon as it can be determined. That is:
For each i in {year, month, day, hour, minute, second}:
If P[i] and Q[i] are both not specified, continue to the next i.
If P[i] is not specified and Q[i] is, or vice versa, stop and return P <> Q.
If P[i] < Q[i], stop and return P < Q.
If P[i] > Q[i], stop and return P > Q.
Stop and return P = Q.
C. Otherwise, if P contains a time zone and Q does not, compare as follows:
P < Q if P < (Q with time zone +14:00)
P > Q if P > (Q with time zone -14:00)
P <> Q otherwise, that is, if (Q with time zone +14:00) < P < (Q with time zone -14:00)
D. Otherwise, if P does not contain a time zone and Q does, compare as follows:
P < Q if (P with time zone -14:00) < Q.
P > Q if (P with time zone +14:00) > Q.
P <> Q otherwise, that is, if (P with time zone +14:00) < Q < (P with time zone -14:00)
Examples:
Determinate | Indeterminate |
---|---|
2000-01-15T00:00:00 < 2000-02-15T00:00:00 | 2000-01-01T12:00:00 <> 1999-12-31T23:00:00Z |
2000-01-15T12:00:00 < 2000-01-16T12:00:00Z | 2000-01-16T12:00:00 <> 2000-01-16T12:00:00Z |
 | 2000-01-16T00:00:00 <> 2000-01-16T12:00:00Z |
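The ordering procedure (steps A to D) can be sketched for complete dateTime values using Python's `datetime`, where a naive value stands for a dateTime without a timezone. This is a sketch under assumptions: it handles only values with all six fields present, so the "not specified" branches of step B never arise, and step A is implicit because Python compares timezone-aware values as UTC instants. The constant and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PLUS_14 = timezone(timedelta(hours=14))
MINUS_14 = timezone(timedelta(hours=-14))


def compare_datetimes(p: datetime, q: datetime) -> str:
    """Return '<', '=', '>' or '<>' for two complete dateTime values."""
    if (p.tzinfo is None) == (q.tzinfo is None):
        # Step B: both timezoned (compared as UTC instants) or both without one.
        return "<" if p < q else ">" if p > q else "="
    if p.tzinfo is not None:
        # Step C: only P has a timezone; bound Q by its earliest and latest instants.
        if p < q.replace(tzinfo=PLUS_14):
            return "<"
        if p > q.replace(tzinfo=MINUS_14):
            return ">"
        return "<>"
    # Step D: only Q has a timezone; bound P the same way.
    if p.replace(tzinfo=MINUS_14) < q:
        return "<"
    if p.replace(tzinfo=PLUS_14) > q:
        return ">"
    return "<>"
```

With these definitions, every row of the table above comes out as shown; for example, `compare_datetimes(datetime(2000, 1, 16), datetime(2000, 1, 16, 12, tzinfo=timezone.utc))` returns `"<>"`.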
Certain derived types from
Since the lexical representation allows an optional time zone
indicator,
See the conformance note in
See the conformance note in
Equality and order are as prescribed in
A calendar (or
The lexical representation for
The
The lexical representations for
The
The
A "date object" is an object with year,
month, and day properties just like those
of
timezoned
For example: the first moment of 2002-10-10+13:00 is 2002-10-10T00:00:00+13:00,
which is 2002-10-09T11:00:00Z, which is also the first moment of 2002-10-09-11:00.
Therefore 2002-10-10+13:00 is 2002-10-09-11:00.
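The coincidence in this example can be checked by computing each timezoned date's interval on the UTC timeline: its first moment, and the least upper bound one day later. This sketch assumes whole-hour offsets for brevity; the function name and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def day_interval(year: int, month: int, day: int, offset_hours: int):
    """Return (first moment, least upper bound) of a timezoned date, in UTC."""
    start = datetime(year, month, day, tzinfo=timezone(timedelta(hours=offset_hours)))
    end = start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)
```

Both `day_interval(2002, 10, 10, 13)` and `day_interval(2002, 10, 9, -11)` run from 2002-10-09T11:00:00Z to 2002-10-10T11:00:00Z.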
For most timezones, either the first moment or last moment of the day (a
See the conformance note in
The
See the conformance note in
Equality and order are as prescribed in
In version 1.0 of this specification,
For the following discussion, let the
"date portion" of a
The '-'? yyyy '-' mm '-' dd zzzzzz?
where the '-'? yyyy '-' mm '-' dd 'T00:00:00' zzzzzz?
and the least upper bound of the interval is the timeline point represented
(noncanonically) by:
'-'? yyyy '-' mm '-' dd 'T24:00:00' zzzzzz?.
The
Given a member of the
The lexical representations for Within a
Constraint: Day-of-month Values) given above.
-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?
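The regular expression alone accepts dates such as 2002-02-30, so a day-of-month check is still needed. The sketch below combines the two, assuming the usual proleptic Gregorian month lengths, with year 0000 (1 BCE) and -0004 (5 BCE) treated as leap years as stated earlier for the date/time datatypes. Function names are hypothetical.

```python
import re

# The regular expression given above for date lexical representations.
DATE_LEXICAL = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
    r"(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)


def is_leap_year(year: int) -> bool:
    # Proleptic Gregorian rule; Python's % keeps this correct for years <= 0.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year: int, month: int) -> int:
    if month == 2:
        return 29 if is_leap_year(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def is_date_lexical(literal: str) -> bool:
    m = DATE_LEXICAL.fullmatch(literal)
    if m is None:
        return False
    year = int(m.group(1)) * (-1 if literal.startswith("-") else 1)
    month, day = int(m.group(2)), int(m.group(3))
    # Day-of-month check: reject e.g. 2002-02-30 or 2002-04-31.
    return day <= days_in_month(year, month)
```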
The
Since the lexical representation allows an optional
time zone indicator,
Because month/year combinations in one calendar only rarely correspond to month/year combinations in other calendars, values of this type are not, in general, convertible to simple values corresponding to month/year combinations in other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
See the conformance note in
See the conformance note in
Equality and order are as prescribed in
The lexical representation for
For example, to indicate the month of May 1999, one would write: 1999-05.
See also
The lexical representations for
-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?
The
The
Since the lexical representation allows an optional time zone
indicator,
Because years in one calendar only rarely correspond to years in other calendars, values of this type are not, in general, convertible to simple values corresponding to years in other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
See the conformance note in
See the conformance note in
Equality and order are as prescribed in
The lexical representation for
For example, to indicate 1999, one would write: 1999.
See also
The lexical representations for
-?([1-9][0-9]{3,}|0[0-9]{3})(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?
The
The
This datatype can be used, for example, to record birthdays; an instance of the datatype could be used to say that someone's birthday occurs on the 14th of September every year.
Since the lexical representation allows an optional time zone
indicator,
Because day/month combinations in one calendar only rarely correspond to day/month combinations in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
The
Equality and order are as prescribed in
In version 1.0 of this specification,
An example that shows the difference from version 1.0 (see
A day is a calendar (or --12-12+13:00 < --12-12+11:00
(just as --12-12+12:00 has always been less than
--12-12+11:00, but in version 1.0
--12-12+13:00 > --12-12+11:00 , since
--12-12+13:00's
The lexical representation for
The lexical representations for
Within a Constraint: Day-of-month Values
) given above.
--(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?
This datatype can be used to represent a specific day in a month; for example, to say that my birthday occurs on the 14th of September every year.
The
The
Since the lexical representation allows an optional time zone
indicator,
Because days in one calendar only rarely
correspond to days in other calendars,
Equality and order are as prescribed in
Examples that may appear anomalous (see
---15 < ---16, but ---15-13:00 > ---16+13:00
---15-11:00 = ---16+13:00
---15-13:00 <> ---16, because ---15-13:00 > ---16+14:00 and ---15-13:00 < ---16-14:00
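These results follow from two rules. Each gDay is placed on the timeline by the start of its day. When only one operand has a timezone, it is compared against both extremes of the allowed offset range: +14:00 gives the earliest start and -14:00 the latest. The following minimal Python sketch applies that reasoning. It assumes offsets are given in minutes east of UTC and that all values share one hypothetical reference month. The function names are illustrative only.

```python
from typing import Optional

MINUTES_PER_DAY = 24 * 60
MAX_TZ = 14 * 60  # offsets range from -14:00 to +14:00


def start(day: int, tz: int) -> int:
    """Start of gDay `day` with offset `tz` (minutes east of UTC), in UTC minutes
    from a common reference point (all values placed in one hypothetical month)."""
    return day * MINUTES_PER_DAY - tz


def compare_gday(d1: int, tz1: Optional[int], d2: int, tz2: Optional[int]) -> str:
    """Return '<', '>', '=' or '<>' (incomparable); None means no timezone."""
    if (tz1 is None) == (tz2 is None):
        # Both timezoned, or both not (same implicit timezone): compare directly.
        a, b = start(d1, tz1 or 0), start(d2, tz2 or 0)
        return "<" if a < b else ">" if a > b else "="
    if tz1 is None:
        # Swap the operands and invert the answer.
        flipped = compare_gday(d2, tz2, d1, tz1)
        return {"<": ">", ">": "<"}.get(flipped, flipped)
    a = start(d1, tz1)
    earliest, latest = start(d2, MAX_TZ), start(d2, -MAX_TZ)
    if a < earliest:
        return "<"
    if a > latest:
        return ">"
    return "<>"  # a lies inside the range of possible starts of the other value


print(compare_gday(15, -13 * 60, 16, None))  # <>
```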
The lexical representation for
The lexical representations for
---(0[1-9]|[12][0-9]|3[01])(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?
The
Since the lexical representation allows an optional time zone
indicator,
Because months in one calendar only rarely correspond to months in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
Equality and order are as prescribed in
The lexical representation for
The lexical representations for
--(0[1-9]|1[0-2])(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?
The
The
The
More formally, the
The set recognized by
The
The
The
The
The
The a-z, A-Z, 0-9, the plus sign (+), the forward slash (/) and the equal sign (=), together with
For compatibility with older mail gateways,
The
Base64Binary ::= ((B64S B64S B64S B64S)*
((B64S B64S B64S B64) |
(B64S B64S B16S '=') |
(B64S B04S '=' #x20? '=')))?
B64S ::= B64 #x20?
B16S ::= B16 #x20?
B04S ::= B04 #x20?
B04 ::= [AQgw]
B16 ::= [AEIMQUYcgkosw048]
B64 ::= [A-Za-z0-9+/]
The
((([A-Za-z0-9+/] ?){4})*(([A-Za-z0-9+/] ?){3}[A-Za-z0-9+/]|([A-Za-z0-9+/] ?){2}[AEIMQUYcgkosw048] ?=|[A-Za-z0-9+/] ?[AQgw] ?= ?=))?
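The pattern admits only single #x20 characters between base64 characters and forces the pad bits before '=' to be zero. A processor can therefore check it first and then pass the space-stripped string to an ordinary base64 decoder. The following is a Python sketch that assumes Python's standard base64 module as the decoder. decode_base64_lexical is a hypothetical helper.

```python
import base64
import re

# The regular expression above for the lexical space of base64Binary.
B64_LEXICAL = re.compile(
    r"((([A-Za-z0-9+/] ?){4})*(([A-Za-z0-9+/] ?){3}[A-Za-z0-9+/]"
    r"|([A-Za-z0-9+/] ?){2}[AEIMQUYcgkosw048] ?="
    r"|[A-Za-z0-9+/] ?[AQgw] ?= ?=))?"
)


def decode_base64_lexical(lex: str) -> bytes:
    """Check lex against the schema pattern, then decode it (hypothetical helper)."""
    if B64_LEXICAL.fullmatch(lex) is None:  # schema patterns match the whole string
        raise ValueError(f"not a base64Binary lexical form: {lex!r}")
    # The grammar allows only single #x20 separators, so dropping them is enough.
    return base64.b64decode(lex.replace(" ", ""), validate=True)


print(decode_base64_lexical("QU JD QQ=="))  # b'ABCA'
```

For example, "QUJ=" is rejected by the pattern because its third character, J, would leave non-zero pad bits.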
Note that this grammar requires the number of non-whitespace
characters in the
The
The above definition of the
The canonical
Canonical-base64Binary ::= (B64 B64 B64 B64)*
((B64 B64 B16 '=') | (B64 B04 '=='))?
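A standard base64 encoder that emits no line breaks should produce exactly this form, because it never inserts spaces and always leaves the pad bits zero. The following is a small Python consistency check. It assumes the regex below is a faithful transcription of the grammar. B64_CANONICAL and canonical_base64 are hypothetical names.

```python
import base64
import re

# Canonical-base64Binary as a regex: no spaces, and the character before '='
# restricted to B16 (one '=') or B04 ('==') so that the pad bits are zero.
B64_CANONICAL = re.compile(
    r"([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)?"
)


def canonical_base64(data: bytes) -> str:
    """Base64 text as produced by Python's encoder (no line breaks)."""
    return base64.b64encode(data).decode("ascii")


for n in range(8):
    text = canonical_base64(bytes(range(n)))
    assert B64_CANONICAL.fullmatch(text), text  # every encoder output is canonical
```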
That is, the
For some values the
The length of a
lex2 := killwhitespace(lexform)           -- remove whitespace characters
lex3 := strip_equals(lex2)                -- strip padding characters at end
length := floor (length(lex3) * 3 / 4)    -- calculate length
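The length computation above translates directly into Python. This sketch assumes the input is already a valid lexical form. It treats the four XML whitespace characters as the ones killwhitespace removes. base64_length is a hypothetical name.

```python
import base64
import re


def base64_length(lexform: str) -> int:
    """Octet length of a base64Binary value, following the three steps above."""
    lex2 = re.sub(r"[ \t\n\r]", "", lexform)  # killwhitespace
    lex3 = lex2.rstrip("=")                    # strip_equals: drop trailing padding
    return len(lex3) * 3 // 4                  # floor(length(lex3) * 3 / 4)


# Cross-check against Python's decoder on a few valid lexical forms.
for lex in ["", "QQ==", "QUI=", "QUJD", "QU JD QQ=="]:
    assert base64_length(lex) == len(base64.b64decode(lex.replace(" ", "")))
```

For example, base64_length("QU JD QQ==") gives 4, matching the four octets ABCA.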
Note on encoding:
Section 5.4
Each URI scheme imposes specialized syntax rules for URIs in
that scheme, including restrictions on the syntax of allowed
fragment identifiers. Because it is
impractical for processors to check that a value is a
context-appropriate URI reference, this specification follows the
lead of
The value space of
The
For an
Each URI scheme imposes specialized syntax rules
for URIs in that scheme, including restrictions on the syntax of
allowed fragment identifiers. Because it is impractical for processors
to check that a value is a context-appropriate URI reference,
neither the syntactic constraints defined by the definitions of individual
schemes nor the generic syntactic constraints defined by
Spaces are, in principle, allowed in the
The
The definitions of URI in the current
IETF specifications define certain URIs as equivalent to each other.
Those equivalences are not part of this datatype as defined here:
if two
It is
The mapping from lexical space to value space for a particular
When
The host language, whether XML-based or otherwise,
The mapping between
Because the lexical representations available for
any value of type
The use of
Because its current schema
, as instantiated for example
by
The lexical mapping rules for
It is an
It is (with one exception) an
The exception is that in the
Because the lexical representations available for any given value
of
The use of
This section gives conceptual definitions for all
[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
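Schema patterns are implicitly anchored at both ends, so a general-purpose regex library must apply this expression to the whole string. The following is a minimal Python sketch; is_language_lexical is a hypothetical name. Because only the pattern is normative, the sketch accepts any syntactically well-formed tag, whether or not it is registered.

```python
import re

# The pattern above; fullmatch reproduces the implicit anchoring of schema patterns.
LANGUAGE_RE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_language_lexical(s: str) -> bool:
    return LANGUAGE_RE.fullmatch(s) is not None


print(is_language_lexical("en-US"), is_language_lexical("en_US"))  # True False
```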
The regular expression above provides the only normative
constraint on the lexical and value spaces of this type. The
additional constraints imposed on language identifiers by
are to be treated as case insensitive; there
exist conventions for capitalization of some of
Since the
The empty string is not a member of the
One way to define the desired set of possible values is
illustrated by the schema document for the XML namespace
at
It is
For compatibility (see
anySimpleType
For compatibility (see
It is
It is
It is
For compatibility (see
Uniqueness of items validated as
It is
For compatibility (see
Existence of referents for items validated as
anySimpleType
For compatibility (see
Existence of referents for items validated as
It is
The
For compatibility (see
anySimpleType
The
For compatibility (see
The
The
The
The
The
The
The
The
The
The
The
The
The
The always-zero
The lexical space is reduced from that of
The
The
The
The lexical space is reduced from that of
The
The
As a consequence of requiring an explicit time zone offset, the
lexical space of
For details of the
In other words, the lexical space of
The
The
The
The preceding sections of this
specification have described datatypes in a way largely
independent of their use in the particular context of
This section presents the mechanisms necessary to integrate datatypes into
the context of
The following sections provide full details on the properties and
significance of each kind of schema component involved in datatype
definitions. For each property, the kinds of values it is allowed to have are
specified. Any property not identified as optional is required to
be present; optional properties which are not present have
For more information on the notion of
Simple Type Definitions provide for:
Establishing the
Attaching a unique name (actually a
In the case of
In the case of
Attaching a
The Simple Type Definition schema component has the following properties:
If
If
The value of
The value of
The
The
If
The
The XML representation for a
name
targetNamespace
schema
the
the type definition base
the
final
finalDefault
the empty set;
a set with members drawn from the set above, each being present or absent depending on whether the string contains an equivalently named space-delimited substring.
Although the finalDefault
{
}
, determined as follows.
the empty set;
{
}
;
Consider
the
the parent element information item is the corresponding the parent element information item is the corresponding the parent element information item is the (the parent element information item is the grandparent element information item is the (the grandparent element information item is
the
the
a set with one member, a
the empty set
If the
An electronic commerce schema might define a datatype called
In this case,
If the
the
the itemType
itemType
(that is, the In this case, a
A system might want to store lists of floating point values.
In this case,
If the
the
memberTypes
memberTypes
(that is, the In this case, a
As an example, taken from a typical display oriented text markup language,
one might want to express font sizes as an integer between 8 and 72, or with
one of the tokens "small", "medium" or "large". The
A
base
An electronic commerce schema might define a datatype called
In this case,
itemType
A
A system might want to store lists of floating point values.
In this case,
As mentioned in
regardless of the
For each of
memberTypes
A
As an example, taken from a typical display oriented text markup language,
one might want to express font sizes as an integer between 8 and 72, or with
one of the tokens "small", "medium" or "large". The
As mentioned in
regardless of the
Unless otherwise specifically allowed by this specification
Either the itemType
Either the base
simpleType
Either the memberTypes
simpleType
A value in a
the value is facet-valid with respect to the particular
A
it
if
if
if
if
if
the value denoted by the
A
Since every value in the
That is, the constraints on If there is a The appropriate case among the following is true:
If the For If the If the V, as determined by the
appropriate sub-clause of appropriate documentation
is the relevant
section of this specification. For
Note that
The
For any
Similarly, for any
If
If
There is a simple type definition nearly equivalent to the simple version
of the
The
anySimpleType
The
definition of
The
anyAtomicType
Simple type definitions for all the built-in primitive datatypes,
namely
The The value of The value of
It is a consequence of the rule just given that each
Similarly,
Schema components are identified by kind.
is not a kind of component. Each kind of ordered, bounded, etc.) is a separate kind of schema component.
A
The value of any
Every
for any
there is no pair
for all
for any
for any
for any
the
On every datatype, the operation Equal is defined in terms of the equality
property of the
Note that in consequence of the above:
given
two values which are members of the
if a datatype
if a datatype
if datatypes
There is no schema component corresponding to the
A
for no
for all
for all
The notation
A
for all
The fact that this specification does not define an
indicating whether an
A
The value
When new datatypes are derived from datatypes with partial orders,
the constraints imposed can sometimes result in a value space
for which the ordering is total, or trivial. The value of the
Some of the real-world
datatypes which are the basis for those defined herein
are ordered in some applications, even though no order is prescribed for schema-processing
purposes. For example, lexical
orderings. They are
When
When
When
If every member of
If every member of
the
the the the every
each member of the
indicating whether a
Some ordered datatypes have the property that
there is one value greater than or equal to every other value, and
another that
When
When
When the
It is sometimes useful to categorize
indicating whether the
Every value space has a specific number of members. This number can be characterized as
When
When
one of
all of the following are true:
one of
one of
either of the following are true:
When the the at least one of all of the following are true: one of one of either of the following are true:
When
When the
indicating whether a
Some value spaces are made up of things that
are
When
When
When
Schema components are identified by kind. Constraining
is not a kind of component. Each kind of whiteSpace, length, etc.) is a separate kind of schema component.
This specification distinguishes three kinds of constraining facets: This specification defines just one This specification defines As specified normatively elsewhere, Most of the constraining facets defined by this specification
are As specified normatively elsewhere,
Conforming processors
A reference to an
The descriptions of individual facets given
below include both constraints on
The preceding paragraph does not forbid implementations from attempting
to make use of such partial information as they have about
For
For
Constraining a
The following is the definition of a
If
The
The XML representation for a
value
fixed
A value in a
if the
if
if
if
if the
The use of
If It is an error for the there is It is an error for the there is
It is an
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value
fixed
A value in a
if the
if
if
if
if the
The use of
If both
It is an
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value
fixed
A value in a
if the
if
if
if
if the
The use of
It is an
An XML
Constraining a
The following is the definition of a
The XML representation for a
value
there is only one
the value
the concatenation of the value
the
the union of that
just {
The
Thus, to impose two
The value
If multiple
It is a consequence of the schema representation constraint
Thus, to impose two pattern constraints simultaneously, schema authors may either write a single pattern which expresses the intersection of the two patterns they wish to impose, or define each pattern on a separate type derivation step.
A
As noted in
It is an
For components constructed from XML representations in schema documents,
the satisfaction of this constraint is a consequence of the XML mapping rules:
any pattern imposed by a simple type definition S will always
also be imposed by any type derived from S by born-binary
components) similarly preserve
Constraining a
The following example is a
The XML representation for an
value
there is only one
a set with one member, the value
a set of the value
The anySimpleType
The value
If multiple
A value in a
As specified normatively elsewhere, for purposes of checking enumerations, no distinction is made between an atomic value V and a list of length one containing V as its only item.
In this question, the behavior of this specification is thus
the same as the behavior specified by
It is an
No normalization is done; the value is not changed (this is the
behavior required by
All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space)
After the processing implied by
The notation #xA used here (and elsewhere in this specification)
represents the Universal Character Set (UCS) code point
hexadecimal A
(line feed), which is denoted by
U+000A. This notation is to be distinguished from &#xA;, which is the XML
collapse
and cannot be changed by a schema author; for
preserve
; for any type collapse
and cannot
be changed by a schema author. For all datatypes
For more information on
Constraining a
The following example is the
The values replace and collapse may appear to provide a
convenient way to "unwrap" text (i.e. undo the effects of
pretty-printing and word-wrapping). In some cases, especially
highly constrained data consisting of lists of artificial tokens
such as part numbers or other identifiers, this appearance is
correct. For natural-language data, however, the whitespace
processing prescribed for these values is not only unreliable but
will systematically remove the information needed to perform
unwrapping correctly. For Asian scripts, for example, a correct
unwrapping process will replace line boundaries not with blanks but
with zero-width separators or nothing. In consequence, it is
normally unwise to use these values for natural-language data, or
for any data other than lists of highly constrained tokens.
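The replace and collapse normalizations can be sketched as two Python functions. The collapse step assumes the familiar definition: after replace, each run of #x20 becomes a single #x20, and leading and trailing #x20 are removed. The function names are hypothetical.

```python
import re


def ws_replace(s: str) -> str:
    """whiteSpace='replace': each #x9, #xA and #xD becomes #x20."""
    return re.sub(r"[\t\n\r]", " ", s)


def ws_collapse(s: str) -> str:
    """whiteSpace='collapse': replace, then squeeze runs of #x20 and trim both ends."""
    return re.sub(r" +", " ", ws_replace(s)).strip(" ")


print(repr(ws_collapse("  PN-1\t\n PN-2  ")))  # 'PN-1 PN-2'
```

This is the right behavior for a list of part numbers. The note above warns against natural-language text: ws_collapse("日本\n語") yields "日本 語", inserting a blank where a correct unwrapping would insert nothing.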
If
The XML representation for a
value
fixed
There are no
It is an
In order of increasing restrictiveness, the
legal values for the
Constraining a
The following is the definition of a
If
The XML representation for a
value
fixed
A value
in an
if the
if the
It is an
It is an
Constraining a
The following is the definition of a
Note that the
If
The XML representation for a
value
fixed
A value
in an
if the
if the
It is an
It is an
It is an
Constraining a
The following is the definition of a
Note that the
If
The XML representation for a
value
fixed
A value in an
if the
if the
It is an
It is an
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value
fixed
A value in an
if the
if the
It is an
It is an
For
The
The term
If
The XML representation for a
value
fixed
A value in a
that value is expressible as
A value v
is facet-valid with respect to a
It is an
It is an
The term
The following is the definition of a
If
The XML representation for a
value
fixed
A value
It is an
It is an
The following is the definition of a
The following example defines the datatype triple, whose
The same datatype can be defined without the use of assertions, but the pattern necessary to represent the set of triples is long and error-prone:
The assertion used in the first version of triple is likely to be clearer for many readers of the schema document.
The XML representation for an
If the
Annotations specified within an
The following rule refers to
the nearest built-in
datatype
XDM representation
of a value
under a datatype
If T = If T . If T . If T . If the Because the xs:anySimpleType
xs:anyAtomicType
xs:untypedAtomic
.
A value V
is facet-valid with respect to an
true
under the
conditions laid out below, without raising any
Evaluation of {test} is performed as defined in
The XPath expression {test} is evaluated,
following the rules given in
The The XDM type label There is no In the terminology of
As a consequence the expression The variable There is likewise no value for the
The If V has no expanded QName
of that member has type
of the member is
anyAtomicType*
anyAtomicType*
simply says
that for static typing purposes the variable $value
will have a value consisting of a sequence of zero or more
atomic values.
undefined
.
can be
used to refer to the value being checked.
$value
expanded QName
of that member has value
of the member is
The evaluation result is converted to either true or false as if by a call to the XPath
The
For components constructed from XML representations in schema documents,
the satisfaction of this constraint is a consequence of the XML mapping rules:
any assertion imposed by a simple type definition S will always
also be imposed by any type derived from S by born-binary
components) similarly preserve
The following
The same effect could also be achieved using the
If
It is a consequence of
The XML representation for an
value
fixed
A The The The
If the
The effect of this rule is to allow datatypes with
a
This specification describes two levels of conformance for datatype processors. The first is required of all processors. Support for the other will depend on the application environments for which the processor is intended.
By separating the conformance requirements relating to the concrete
syntax of XML schema documents, this specification admits processors
which validate using schemas stored in optimized binary representations,
dynamically created schemas represented as programming language data
structures, or implementations in which particular schemas are compiled
into executable code such as C or Java. Such processors can be said to
be
In the usual case, it will be embedded in a
Certain aspects of the behavior of conforming
processors are described in this specification as
When
This specification imposes certain constraints on the
embedding of host language
as the subject
of the verb.
For convenience, the most important of these constraints
are noted here: Host languages Host languages If user-defined datatypes are to be supported in the host language,
then the host language
In addition, host languages
Support all the Completely and correctly implement all of
the Completely and correctly implement all of
the
Implementations claiming Accept simple type definitions in the form specified in
Completely and correctly implement all of
rules governing the XML representation of simple type definitions
specified in Map the XML representations of simple type definitions to
simple type definition components as specified in the mapping
rules given in
The term
Abstract representations of simple type definitions conform to this
specification if and only if they obey all of the
XML representations of simple type definitions conform to this specification if they obey all of the applicable rules defined in this specification.
Because the conformance of the resulting simple type definition
component depends not only on the XML representation of a given
simple type definition, but on the properties of its
Some
When this specification is used in the context of other languages
(as it is, for example, by
When presented with a literal or value exceeding the capacity of its partial implementation of a datatype, a minimally conforming implementation of this specification will sometimes be unable to determine with certainty whether the value is datatype-valid or not. Sometimes it will be unable to represent the value correctly through its interface to any downstream application.
When either of these is so, a conforming processor
This specification does not constrain the method used to indicate that a literal or value in the input data has exceeded the capacity of the implementation, or the form such indications take.
These are the partial-implementation
All All All All All
The XML representation of the datatypes-relevant
part of the schema for schema documents is presented here
as a normative
part of the specification.
targetNamespace
attribute on the schema
element refers to the XML Schema namespace
itself.
Schema documents conforming to this specification may be in XML
1.0 or XML 1.1. Conforming implementations may accept input in
XML 1.0 or XML 1.1 or both. See
The DTD for the datatypes-specific
aspects of schema documents is given below. Note there is
schema
The following, although in the form of a
schema document, does not conform to the rules for schema documents
defined in this specification. It contains explicit XML
representations of the primitive datatypes which need not be declared
in a schema document, since they are automatically included in every
schema, and indeed must not be declared in a schema document, since it
is forbidden to try to derive types with
The following, although in the form of a schema document, contains XML representations of components already present in all schemas by definition. It is included here as a form of documentation.
These datatypes do not need to be declared in a schema document, since they are automatically included in every schema.
It is an open question whether this and similar XML documents should be accepted or rejected by software conforming to this specification. The XML Schema Working Group expects to resolve this question in connection with its work on issues relating to schema composition.
In the meantime, some existing schema processors will accept declarations for them; other existing processors will reject such declarations as duplicates.
Some datatypes, such as
A
In the case of
The following standard operators are defined here
in case the reader is unsure of their definition:
n the greatest integer
.
There are several different primitive but related datatypes defined in the specification which pertain to various combinations of dates and times, and parts thereof. They all use related value-space models, which are described in detail in this section. It is not difficult for a casual reader of the descriptions of the individual datatypes elsewhere in this specification to misunderstand some of the details of just what the datatypes are intended to represent, so more detail is presented here in this section.
All of the value spaces for dates and times
described here represent moments or periods of time in Coordinated
Universal Time (UTC).
There are two distinct ways to model moments in time: either
by tracking their year, month, day, hour, minute and second (with
fractional seconds as needed), or by tracking
their time (measured generally in seconds or
days) from some starting moment. Each has
its advantages. The two are isomorphic. For
definiteness, we choose to model the first
using five
There is also a seventh
Non-negative values of the properties map
to the years, months, days of month, etc. of the Gregorian
calendar in the obvious way.
Values less than 1582 in the
In version 1.0 of this specification, the
old style
calendar; the two correspond
approximately but not exactly to each other.
In this version of this specification,
two changes are made in order to agree with existing usage.
First,
Note that 1 BCE, 5 BCE, and so on (years 0000, −0004, etc. in the
lexical representation defined here) are leap years in the proleptic
Gregorian calendar used for the date/time datatypes defined here.
Version 1.0 of this specification was unclear about the treatment of
leap years before the common era
The model just described is called herein the "seven-property" model for date/time datatypes. It is used
Leap-seconds are not permitted
While calculating, property values from the
Each fragment other than
The redundancy between
The following fragment
The more important functions and
procedures defined here are summarized in the
text When there is a text summary, the name of the function in each is a
The following functions are used with various numeric and date/time datatypes.
0 when d =
1 when d =
2 when d =
−
−
etc.
s0 = i and
sj+1 = sj
s0 = f − 10 , and
sj+1 = (sj
For example:
123.4567
Set d to
Return d.
If d is an integer, then return
Otherwise, return
s be an
c be a nonnegative
e be an
Set $s$ to $-1$ when $nV < 0$.
So select $e$ that
$2^{cWidth} \times 2^{(e-1)}$
So select $c$ that $(c - 1) \times 2^{e} \le |nV| < c \times 2^{e}$.
when $eMax < e$. Otherwise: when $e < eMin$, set $e = eMin$. So select $c$ that
$(c - 1) \times 2^{e} \le |nV| < c \times 2^{e}$.
Set $nV$ to:
$c \times 2^{e}$ when $|nV| > c \times 2^{e} - 2^{(e-1)}$;
$(c - 1) \times 2^{e}$ when $|nV| < c \times 2^{e} - 2^{(e-1)}$;
$c \times 2^{e}$ or $(c - 1) \times 2^{e}$ according to whether $c$ is even or $c - 1$ is even, otherwise (i.e., $|nV| = c \times 2^{e} - 2^{(e-1)}$, the midpoint between the two values).
Return
$s \times nV$
when $nV < 2^{cWidth} \times 2^{eMax}$,
Implementers will find the algorithms of
Return
otherwise (LEX is a numeral): Set nV to Set nV to
Return: When nV is zero: nV
otherwise.
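The rounding procedure described above can be sketched with exact rational arithmetic. It selects e, then c, rounds to the nearer of (c − 1) × 2^e and c × 2^e with ties to even c, and clamps e at eMin for subnormals. This is a Python sketch, not the normative algorithm. It assumes the IEEE 754 binary64 and binary32 parameters (cWidth, eMin, eMax) = (53, −1074, 971) and (24, −149, 104). It returns None where the specification would return an infinity. floating_point_round is a hypothetical name.

```python
import math
from fractions import Fraction
from typing import Optional

DOUBLE_PARAMS = (53, -1074, 971)  # assumed binary64 parameters
FLOAT_PARAMS = (24, -149, 104)    # assumed binary32 parameters


def floating_point_round(nV: Fraction, c_width: int, e_min: int, e_max: int) -> Optional[Fraction]:
    """Round exact nV to c * 2**e (c < 2**c_width, e >= e_min), ties to even c.
    Returns None on overflow, standing in for the infinities."""
    if nV == 0:
        return Fraction(0)
    s = -1 if nV < 0 else 1
    a = abs(nV)
    two = Fraction(2)
    # Select e with 2**c_width * 2**(e - 1) <= a < 2**c_width * 2**e.
    e = a.numerator.bit_length() - a.denominator.bit_length() - c_width
    while two ** (c_width + e - 1) > a:
        e -= 1
    while a >= two ** (c_width + e):
        e += 1
    if e > e_max:
        return None
    e = max(e, e_min)  # subnormal range: keep the smallest exponent
    unit = two ** e
    c = math.floor(a / unit) + 1  # (c - 1) * unit <= a < c * unit
    midpoint = c * unit - unit / 2
    if a > midpoint or (a == midpoint and c % 2 == 0):
        rounded = c * unit
    else:
        rounded = (c - 1) * unit
    if rounded >= two ** (c_width + e_max):
        return None
    return s * rounded


print(floating_point_round(Fraction(1, 10), *FLOAT_PARAMS) == Fraction(13421773, 2**27))  # True
```

Exact fractions keep the sketch faithful to the real-number description. A production implementation would use a correctly rounded decimal-to-binary conversion instead.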
This specification permits the substitution of any other rounding algorithm
which conforms to the requirements of
Return
otherwise (LEX is a numeral): Set nV to Set nV to
Return: When nV is zero: nV
otherwise.
This specification permits the substitution of any other rounding algorithm
which conforms to the requirements of
l be a nonnegative
s be an
c be a positive
e be an
Return
return
return
otherwise (f is numeric and non-zero): Set s to −1 when
$f < 0$. Let $c$ be the smallest Let $e$ be $\log_{10}(|f| / c)$
(so that $|f| = c \times 10^{e}$). Let $l$ be the largest nonnegative integer for which
$c \times 10^{e} =$
Return
l be a nonnegative
s be an
c be a positive
e be an
Return
return
return
otherwise (f is numeric and non-zero): Set s to −1 when
$f < 0$. Let $c$ be the smallest Let $e$ be $\log_{10}(|f| / c)$
(so that $|f| = c \times 10^{e}$). Let $l$ be the largest nonnegative integer for which
$c \times 10^{e} =$
Return
y be m be h be m be s be d be t be 0 if Y is not present, − 0 if D is not present, − − − y be ym m be ym the empty string ( the empty string ( the empty string ( the empty string ( the empty string ( d is
ss h is
(ss m is
(ss s is
ss m be v's s be v's sgn be sgn & sgn & sgn & m be ym's sgn be s be dt's sgn be
When adding and subtracting numbers from date/time properties, the
immediate results may not conform to the limits specified.
Accordingly, the following procedures are used to
Add (mo − 1)
Set mo to
(mo − 1) Repeat until da is positive and not greater than
If da exceeds
Subtract that limit from da. Add 1 to mo. If da is not positive then:
Subtract 1 from mo. Add the new upper limit from the table to da. Add mi Set mi to mi Add hr Set hr to hr Add se Set se to
28 when m is 2 and y is not evenly divisible by 4, or is evenly divisible by 100 but not by 400,
29 when m is 2 and y is evenly divisible by 400, or is evenly divisible by 4 but not by 100,
30 when m is 4, 6, 9, or 11,
31 otherwise (m is 1, 3, 5, 7, 8, 10, or 12)
dt be an instance of the Set the Set the Set the Set the Set the Set the Set the Return dt.
Given a
Essentially, this calculation
Leap seconds are
Thus the addition of either PT1M or PT60S to any dateTime will always produce the same result. This is a special definition of addition which is designed to match common practice and, most importantly, to be stable over time.
A definition that attempted to take leap-seconds into account would
need to be constantly updated, and could not predict the results of
future implementations' additions. The decision to introduce a leap
second in
yr be dt's
mo be dt's
da be dt's
hr be dt's
mi be dt's
se be dt's
tz be dt's
Add du's
Set da
to min(da,
Add du's
Return
This algorithm may be applied to date/time types
other than
For each
Call the function.
For each property
dateTime | duration | result |
---|---|---|
2000-01-12T12:13:14Z | P1Y3M5DT7H10M3.3S | 2001-04-17T19:23:17.3Z |
2000-01 | -P3M | 1999-10 |
2000-01-12 | PT33H | 2000-01-13 |
(2000-03-30 + P1D) + P1M = 2000-03-31 + P1M = 2000-04-30
(2000-03-30 + P1M) + P1D = 2000-04-30 + P1D = 2000-05-01
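The steps listed above can be sketched in Python under some simplifying assumptions. A duration is passed as its two components (months, seconds). The timezone is omitted because the algorithm leaves it unchanged. Seconds are kept as Decimal to avoid binary rounding. The function names are hypothetical. The sketch reproduces the table rows and the two order-dependent examples above.

```python
import math
from decimal import Decimal


def days_in_month(y: int, m: int) -> int:
    if m == 2:
        return 29 if y % 4 == 0 and (y % 100 != 0 or y % 400 == 0) else 28
    return 30 if m in (4, 6, 9, 11) else 31


def normalize_month(yr: int, mo: int) -> tuple:
    """Carry a month outside 1..12 into the year."""
    return yr + (mo - 1) // 12, (mo - 1) % 12 + 1


def date_time_plus_duration(dt: tuple, months: int, seconds) -> tuple:
    """dt = (yr, mo, da, hr, mi, se); the duration is given as (months, seconds)."""
    yr, mo, da, hr, mi, se = dt
    yr, mo = normalize_month(yr, mo + months)
    da = min(da, days_in_month(yr, mo))  # pin the day, e.g. April 31 -> April 30
    se = Decimal(se) + Decimal(seconds)
    # Carry seconds -> minutes -> hours -> days; a minute is always 60 s (no leap seconds).
    carry = math.floor(se / 60)
    se -= 60 * carry
    carry, mi = divmod(mi + carry, 60)
    carry, hr = divmod(hr + carry, 24)
    da += carry
    # Carry days into months, in either direction.
    while da > days_in_month(yr, mo) or da < 1:
        if da < 1:
            yr, mo = normalize_month(yr, mo - 1)
            da += days_in_month(yr, mo)
        else:
            da -= days_in_month(yr, mo)
            yr, mo = normalize_month(yr, mo + 1)
    return (yr, mo, da, hr, mi, se)


# First table row: 2000-01-12T12:13:14Z + P1Y3M5DT7H10M3.3S (15 months, 457803.3 s)
print(date_time_plus_duration((2000, 1, 12, 12, 13, 14), 15, Decimal("457803.3")))
# (2001, 4, 17, 19, 23, Decimal('17.3'))
```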
yr be
mo be 12 or
dt's
da be
hr be 0 or
dt's
mi be 0 or
dt's
se be 0 or
dt's
Subtract
( Set ToTl to 31536000 × yr.
(Leap-year Days, Add 86400 ×
(yr Add 86400 × Add 86400 × da to
ToTl.
( Add 3600 × hr + 60 × mi + se to ToTl.
Return ToTl.
0 when TZ is
−(
There is no
DT when
DT &
T when
T &
D when
D &
YM when ym's
YM &
MD when md's
MD &
The following functions are used with various datatypes neither numeric nor date/time related.
The
The auxiliary functions
0000 when d =
0001 when d =
0010 when d =
0011 when d =
...
1110 when d =
1111 when d =
The
...
The following table shows the values of the fundamental facets
for each
The
C -- represents a digit used in the thousands and hundreds components, the "century" component, of the time element "year". Legal values are from 0 to 9.
Y -- represents a digit used in the tens and units components of the time element "year". Legal values are from 0 to 9.
M -- represents a digit used in the time element "month". The two digits in a MM format can have values from 1 to 12.
D -- represents a digit used in the time element "day". The two digits in a DD format can have values from 1 to 28 if the month value equals 2, 1 to 29 if the month value equals 2 and the year is a leap year, 1 to 30 if the month value equals 4, 6, 9 or 11, and 1 to 31 if the month value equals 1, 3, 5, 7, 8, 10 or 12.
h -- represents a digit used in the time element "hour". The two digits in a hh format can have values from 0 to 24. If the value of the hour element is 24 then the values of the minutes element and the seconds element must be 00 and 00.
m -- represents a digit used in the time element "minute". The two digits in a mm format can have values from 0 to 59.
s -- represents a digit used in the time element "second". The two
digits in a ss format can have values from 0 to 60. In the formats
described in this specification the whole number of seconds
Strictly speaking, a value of
60 or more is not sensible unless the month and day could
represent March 31, June 30, September 30, or December 31
For all the information items indicated by the above characters, leading zeroes are required where indicated.
In addition to the above, certain characters are used as designators and appear as themselves in lexical formats.
T -- is used as time designator to indicate the start of the
representation of the time of day in
Z -- is used as time-zone designator, immediately (without a space)
following a data element expressing the time of day in Coordinated
Universal Time (
In the lexical format for
P -- is used as the time duration designator, preceding a data element representing a given duration of time.
Y -- follows the number of years in a time duration.
M -- follows the number of months or minutes in a time duration.
D -- follows the number of days in a time duration.
H -- follows the number of hours in a time duration.
S -- follows the number of seconds in a time duration.
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary integer. Similarly, the value of the Seconds component
allows an arbitrary decimal. Thus, the lexical format for
An optional minus sign is allowed immediately preceding, without a space,
the lexical representations for
The year "0000" is an illegal year value.
To accommodate year values greater than 9999, more than four digits are
allowed in the year representations of
The lexical representations for the datatypes
Given a
fQuotient(a, b) = the greatest integer less than or equal to a/b
    fQuotient(-1,3) = -1
    fQuotient(0,3)...fQuotient(2,3) = 0
    fQuotient(3,3) = 1
    fQuotient(3.123,3) = 1
modulo(a, b) = a - fQuotient(a,b)*b
    modulo(-1,3) = 2
    modulo(0,3)...modulo(2,3) = 0...2
    modulo(3,3) = 0
    modulo(3.123,3) = 0.123
fQuotient(a, low, high) = fQuotient(a - low, high - low)
    fQuotient(0, 1, 13) = -1
    fQuotient(1, 1, 13) ... fQuotient(12, 1, 13) = 0
    fQuotient(13, 1, 13) = 1
    fQuotient(13.123, 1, 13) = 1
modulo(a, low, high) = modulo(a - low, high - low) + low
    modulo(0, 1, 13) = 12
    modulo(1, 1, 13) ... modulo(12, 1, 13) = 1...12
    modulo(13, 1, 13) = 1
    modulo(13.123, 1, 13) = 1.123
maximumDayInMonthFor(yearValue, monthValue) =
    M := modulo(monthValue, 1, 13)
    Y := yearValue + fQuotient(monthValue, 1, 13)
    Return a value based on M and Y:
Result | Condition |
---|---|
31 | M = January, March, May, July, August, October, or December |
30 | M = April, June, September, or November |
29 | M = February AND (modulo(Y, 400) = 0 OR (modulo(Y, 100) != 0 AND modulo(Y, 4) = 0)) |
28 | Otherwise |
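The helper functions above translate directly into Python. The sketch below uses hypothetical snake_case names and `fractions.Fraction`, so that decimal arguments such as 3.123 are handled exactly instead of through binary floating point; the printed values reproduce some of the examples given above.

```python
import math
from fractions import Fraction


def f_quotient(a, b):
    # fQuotient(a, b): the greatest integer less than or equal to a/b
    return math.floor(Fraction(a) / Fraction(b))


def modulo(a, b):
    # modulo(a, b) = a - fQuotient(a, b) * b
    return Fraction(a) - f_quotient(a, b) * Fraction(b)


def f_quotient_range(a, low, high):
    # fQuotient(a, low, high) = fQuotient(a - low, high - low)
    return f_quotient(Fraction(a) - low, Fraction(high) - low)


def modulo_range(a, low, high):
    # modulo(a, low, high) = modulo(a - low, high - low) + low
    return modulo(Fraction(a) - low, Fraction(high) - low) + low


def maximum_day_in_month_for(year_value, month_value):
    # Normalize the month into 1..12, carrying any overflow into the year.
    m = modulo_range(month_value, 1, 13)
    y = year_value + f_quotient_range(month_value, 1, 13)
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    if m == 2 and (modulo(y, 400) == 0 or (modulo(y, 100) != 0 and modulo(y, 4) == 0)):
        return 29
    return 28


print(f_quotient(-1, 3), modulo(-1, 3))                 # -1 2
print(modulo(Fraction("3.123"), 3))                     # 123/1000
print(modulo_range(0, 1, 13), modulo_range(13, 1, 13))  # 12 1
print(maximum_day_in_month_for(2000, 2), maximum_day_in_month_for(1900, 2))  # 29 28
print(maximum_day_in_month_for(2000, 0))                # 31 (month 0 is December 1999)
```

Normalizing the month inside `maximum_day_in_month_for` is what lets callers pass out-of-range values such as 0 or 13, which the addition procedure relies on.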
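Essentially, this calculation is equivalent to separating D into <year,month>
and <day,hour,minute,second> fields. The <year,month> is added to S.
If the day is out of range, it is pinned to the nearest valid day of the
resulting month (for example, April 31 becomes April 30); the
<day,hour,minute,second> fields are then added.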
Leap seconds are
Thus the addition of either PT1M or PT60S to any dateTime will always produce the same result. This is a special definition of addition which is designed to match common practice, and -- most importantly -- be stable over time.
A definition that attempted to take leap-seconds into account would need to
be constantly updated, and could not predict the results of future
implementation's additions. The decision to introduce a leap second in
The following is the precise specification. These steps must be followed in the same order. If a field in D is not specified, it is treated as if it were zero. If a field in S is not specified, it is treated in the calculation as if it were the minimum allowed value in that field; however, after the calculation is concluded, the corresponding field in E is removed (set to unspecified).
Months
    temp := S[month] + D[month]
    E[month] := modulo(temp, 1, 13)
    carry := fQuotient(temp, 1, 13)
Years
    E[year] := S[year] + D[year] + carry
Zone
    E[zone] := S[zone]
Seconds
    temp := S[second] + D[second]
    E[second] := modulo(temp, 60)
    carry := fQuotient(temp, 60)
Minutes
    temp := S[minute] + D[minute] + carry
    E[minute] := modulo(temp, 60)
    carry := fQuotient(temp, 60)
Hours
    temp := S[hour] + D[hour] + carry
    E[hour] := modulo(temp, 24)
    carry := fQuotient(temp, 24)
Days
    if S[day] > maximumDayInMonthFor(E[year], E[month])
        tempDays := maximumDayInMonthFor(E[year], E[month])
    else if S[day] < 1
        tempDays := 1
    else
        tempDays := S[day]
    E[day] := tempDays + D[day] + carry
    loop:
        if E[day] < 1
            E[day] := E[day] + maximumDayInMonthFor(E[year], E[month] - 1)
            carry := -1
        else if E[day] > maximumDayInMonthFor(E[year], E[month])
            E[day] := E[day] - maximumDayInMonthFor(E[year], E[month])
            carry := 1
        else
            exit loop
        temp := E[month] + carry
        E[month] := modulo(temp, 1, 13)
        E[year] := E[year] + fQuotient(temp, 1, 13)
        repeat loop
dateTime | duration | result |
---|---|---|
2000-01-12T12:13:14Z | P1Y3M5DT7H10M3.3S | 2001-04-17T19:23:17.3Z |
2000-01 | -P3M | 1999-10 |
2000-01-12 | PT33H | 2000-01-13 |
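The addition procedure above can be transcribed into Python and checked against the examples in the table. This is a hypothetical sketch, not a reference implementation: date/time values and durations are plain dictionaries (None or a missing key means unspecified), durations carry signed fields (so -P3M is `{"month": -3}`), the time zone is passed through unchanged, and it assumes the year of S is always specified, since the year field has no minimum value to substitute.

```python
import math
from fractions import Fraction

FIELDS = ("year", "month", "day", "hour", "minute", "second")
# Minimum allowed values substituted for unspecified fields of S.
# Assumption of this sketch: the year field of S is always specified.
MINIMUM = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0}


def f_quotient(a, b):
    return math.floor(Fraction(a) / Fraction(b))


def modulo(a, b):
    return Fraction(a) - f_quotient(a, b) * Fraction(b)


def max_day(year, month):
    # maximumDayInMonthFor, normalizing the month into 1..12 first
    m = modulo(month - 1, 12) + 1
    y = year + f_quotient(month - 1, 12)
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    leap = modulo(y, 400) == 0 or (modulo(y, 100) != 0 and modulo(y, 4) == 0)
    return 29 if leap else 28


def add_duration(s, d):
    S = {k: s[k] if s.get(k) is not None else MINIMUM.get(k) for k in FIELDS}
    D = {k: d.get(k) or 0 for k in FIELDS}
    E = {"zone": s.get("zone")}
    # Months, then years (month overflow carried into the year)
    temp = S["month"] + D["month"]
    E["month"] = modulo(temp - 1, 12) + 1
    E["year"] = S["year"] + D["year"] + f_quotient(temp - 1, 12)
    # Seconds, minutes, hours, each carrying into the next
    temp = S["second"] + D["second"]
    E["second"], carry = modulo(temp, 60), f_quotient(temp, 60)
    temp = S["minute"] + D["minute"] + carry
    E["minute"], carry = modulo(temp, 60), f_quotient(temp, 60)
    temp = S["hour"] + D["hour"] + carry
    E["hour"], carry = modulo(temp, 24), f_quotient(temp, 24)
    # Days: pin S[day] into the new month, then normalize with month carries
    last = max_day(E["year"], E["month"])
    if S["day"] > last:
        temp_days = last
    elif S["day"] < 1:
        temp_days = 1
    else:
        temp_days = S["day"]
    E["day"] = temp_days + D["day"] + carry
    while True:
        if E["day"] < 1:
            E["day"] += max_day(E["year"], E["month"] - 1)
            carry = -1
        elif E["day"] > max_day(E["year"], E["month"]):
            E["day"] -= max_day(E["year"], E["month"])
            carry = 1
        else:
            break
        temp = E["month"] + carry
        E["month"] = modulo(temp - 1, 12) + 1
        E["year"] += f_quotient(temp - 1, 12)
    # Fields unspecified in S are removed from the result
    for k in FIELDS:
        if s.get(k) is None:
            E[k] = None
    return E
```

Nesting calls also reproduces the order dependence described below: adding P1D and then P1M to 2000-03-30 yields 2000-04-30, whereas adding P1M and then P1D yields 2000-05-01.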
Time durations are added by simply adding each of their fields, respectively, without overflow.
The order of addition of durations to instants is significant. For example:
((dateTime + duration1) + duration2) != ((dateTime + duration2) + duration1)
(2000-03-30 + P1D) + P1M = 2000-03-31 + P1M = 2000-04-30
(2000-03-30 + P1M) + P1D = 2000-04-30 + P1D = 2000-05-01
A
Unlike some popular regular expression languages (including those
defined by Perl and standard Unix utilities), the regular
expression language defined here implicitly anchors all regular
expressions at the head and tail, as the most common use of
regular expressions in A
Z
In regular expression languages that are not implicitly anchored at the head and tail,
it is customary to write the equivalent regular expression as:
^A.*Z$
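The difference between implicit anchoring and unanchored search can be illustrated with Python's `re` module, where `re.fullmatch` behaves like an implicitly anchored match and `re.search` like the Perl default. This is only an illustration with hypothetical function names: Python's regular expression dialect is not the one defined here (for instance, Python's `.` excludes only the newline, whereas `.` here also excludes the carriage return).

```python
import re


def matches_implicitly_anchored(pattern: str, literal: str) -> bool:
    # The whole literal must match, as with the patterns defined here.
    return re.fullmatch(pattern, literal) is not None


def matches_unanchored(pattern: str, literal: str) -> bool:
    # Any substring may match, as in Perl or grep by default.
    return re.search(pattern, literal) is not None


print(matches_implicitly_anchored("A.*Z", "ABCZ"))        # True
print(matches_implicitly_anchored("A.*Z", "xABCZy"))      # False
print(matches_unanchored("A.*Z", "xABCZy"))               # True
print(matches_unanchored("^A.*Z$", "xABCZy"))             # False
print(matches_implicitly_anchored(".*A.*Z.*", "xABCZy"))  # True
```

The last line shows the workaround for an unanchored match under implicit anchoring: surrounding the expression with `.*`.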
In those rare cases where an unanchored match is desired, including
.*
A
(#x41
)
|
For all
| Denoting the set of strings
|
---|---|
(empty string) | |
S | all strings in
|
S | | all strings in
|
For all
| Denoting the set of strings
|
---|---|
S | all strings in
|
ST | all strings
|
For all
| Denoting the set of strings
|
---|---|
S | all strings in
|
S? | the empty string, and all strings in
|
S * | ? )* ) |
S+ | * ) |
S{ n, m} | { n−1, m−1} ) |
S{ n} | { n, n} ) |
{ n,} | { n} S * ) |
S {0, m} | ? ){ 0, m−1} ) |
S{0,0} |
The regular expression language in the Perl Programming Language
does not include a quantifier of the form S{,m}, since it is
logically equivalent to S{0,m}.
We have, therefore, left this logical possibility out of the regular
expression language defined by this specification.
?
, *
,
+
, {n,m}
or
{n,}
,{
n,
m}
or {
n,}
,
For all
| Denoting the set of strings
|
---|---|
the single string consisting only of
| |
C | all strings in
|
( S ) |
.
, \
,
?
, *
, +
, {
,
}
(
,
)
,
[
or
]
Note that a
A character class is either
The rules for which characters must be escaped and which
can represent themselves are different when inside a
[
]
[
G ]
[
G]
)
If the first character in a
For example, the string
The string
A
For any -
C
For all
|
Identifying the set of characters
|
---|---|
R | all characters in
|
all characters in
| |
all characters in
| |
all characters in
|
^
^
P)
-
character.
For any
If a If the hyphen is immediately followed by
If the hyphen is immediately followed by
If the hyphen is immediately followed by
Otherwise, the hyphen
If the hyphen is followed by any other character sequence,
then the string in which it occurs
is not recognized as a regular expression.
The rule just given resolves what would otherwise
be the ambiguous interpretation of some strings, e.g.
A single XML character is a
The [, ], -, and \ characters are not valid character ranges;
The ^ character is only valid at the beginning of a
The - character is a valid character range only at the beginning
or end of a
The grammar for
A -
e
s is a
s is not \
If s is the first character in a ^
\
or [
; and
The code point of
The code point of a
A single unescaped character
(
A single unescaped character identifies the singleton set of characters containing that character alone.
A single escaped character
The valid single-character escapes are:
Escape | Identifying the set of characters |
---|---|
\n | the newline character (#xA) |
\r | the return character (#xD) |
\t | the tab character (#x9) |
\\ | \ |
\| | the vertical bar character (#x7C) |
\. | . |
\- | - |
\^ | ^ |
\? | ? |
\* | * |
\+ | + |
\{ | { |
\} | } |
\( | ( |
\) | ) |
\[ | [ |
\] | ] |
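The single-character escapes in the table above amount to a fixed lookup table. The following Python sketch (the names are hypothetical) resolves a two-character escape sequence to the single character it identifies and rejects anything else; multi-character escapes such as \d are deliberately not handled here.

```python
# Single-character escapes from the table above, keyed by the character after "\".
SINGLE_CHAR_ESCAPES = {
    "n": "\n",  # newline, #xA
    "r": "\r",  # return, #xD
    "t": "\t",  # tab, #x9
    # Each escaped metacharacter stands for itself.
    **{c: c for c in "\\|.-^?*+{}()[]"},
}


def resolve_single_char_escape(seq: str) -> str:
    # seq is a two-character string such as "\\t" (backslash followed by t).
    if len(seq) != 2 or seq[0] != "\\" or seq[1] not in SINGLE_CHAR_ESCAPES:
        raise ValueError(f"not a single-character escape: {seq!r}")
    return SINGLE_CHAR_ESCAPES[seq[1]]
```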
X
\p{X}
\p{
X}
\P{X}
\P{
X}
[\P{X}]
= [^\p{X}]
.
In order to benefit from continuing work on the Unicode database,
a conforming implementation might by default use the latest supported
version of the character properties. In order to maximize consistency
with other implementations of this specification, however, an
implementation might choose to provide
PropertyAliases.txt
and PropertyValueAliases.txt
files of
the Unicode database may be helpful to implementors in this connection.
N, the union of Nd, Nl and No.
As of this publication the Java regex
library does not include Cn in its definition of C,
so that definition cannot be used without modification
in conformant implementations.
Category | Property | Meaning |
---|---|---|
Letters | L | All Letters |
Letters | Lu | uppercase |
Letters | Ll | lowercase |
Letters | Lt | titlecase |
Letters | Lm | modifier |
Letters | Lo | other |
Marks | M | All Marks |
Marks | Mn | nonspacing |
Marks | Mc | spacing combining |
Marks | Me | enclosing |
Numbers | N | All Numbers |
Numbers | Nd | decimal digit |
Numbers | Nl | letter |
Numbers | No | other |
Punctuation | P | All Punctuation |
Punctuation | Pc | connector |
Punctuation | Pd | dash |
Punctuation | Ps | open |
Punctuation | Pe | close |
Punctuation | Pi | initial quote (may behave like Ps or Pe depending on usage) |
Punctuation | Pf | final quote (may behave like Ps or Pe depending on usage) |
Punctuation | Po | other |
Separators | Z | All Separators |
Separators | Zs | space |
Separators | Zl | line |
Separators | Zp | paragraph |
Symbols | S | All Symbols |
Symbols | Sm | math |
Symbols | Sc | currency |
Symbols | Sk | modifier |
Symbols | So | other |
Other | C | All Others |
Other | Cc | control |
Other | Cf | format |
Other | Co | private use |
Other | Cn | not assigned |
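A category escape tests a character's Unicode General Category. The Python sketch below (hypothetical names, using the standard `unicodedata` module) treats a one-letter property as the union of its two-letter subcategories; following the note that the listed properties exclude Cs, it takes C to mean only Cc, Cf, Co and Cn. Category assignments come from the Unicode version bundled with the interpreter, which may differ from the version a given processor uses.

```python
import unicodedata

# C is restricted to the subcategories listed in the table (no Cs).
OTHER = ("Cc", "Cf", "Co", "Cn")


def in_category(ch: str, prop: str) -> bool:
    # Approximates \p{prop} for a single character ch.
    cat = unicodedata.category(ch)  # two-letter General Category, e.g. "Lu"
    if prop == "C":
        return cat in OTHER
    if len(prop) == 1:
        return cat[0] == prop
    return cat == prop


def not_in_category(ch: str, prop: str) -> bool:
    # Approximates \P{prop}: the complement of \p{prop}.
    return not in_category(ch, prop)


print(in_category("A", "Lu"), in_category("A", "L"), in_category("a", "Lu"))  # True True False
print(in_category("$", "Sc"), not_in_category("7", "L"))                    # True True
```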
The properties mentioned above exclude the Cs
Cs
X
\p{IsX}
\P{IsX}
[\P{IsX}]
=
[^\p{IsX}]
\p{IsX}
(using lower-case
\P{IsX}
(using upper-case
[\P{Is
X}]
=
[^\p{Is
X}]
.
For example, the
Current versions of the Unicode database recommend that whenever
block names are being matched, hyphens, underbars, and white space
should be dropped and letters folded to a single case, so both the
string Basic Latin
.
The handling of block names in block escapes differs from this
behavior in two ways. First, the normalized block names defined in
this specification do not suppress hyphens in the Unicode block
names and do not level case distinctions. The normalized form of the
block name
Start Code | End Code | Block Name | Start Code | End Code | Block Name | |
---|---|---|---|---|---|---|
#x0000 | #x007F | BasicLatin | #x0080 | #x00FF | Latin-1Supplement | |
#x0100 | #x017F | LatinExtended-A | #x0180 | #x024F | LatinExtended-B | |
#x0250 | #x02AF | IPAExtensions | #x02B0 | #x02FF | SpacingModifierLetters | |
#x0300 | #x036F | CombiningDiacriticalMarks | #x0370 | #x03FF | Greek | |
#x0400 | #x04FF | Cyrillic | ||||
#x0530 | #x058F | Armenian | #x0590 | #x05FF | Hebrew | |
#x0600 | #x06FF | Arabic | #x0700 | #x074F | Syriac | |
#x0780 | #x07BF | Thaana | ||||
#x0900 | #x097F | Devanagari | #x0980 | #x09FF | Bengali | |
#x0A00 | #x0A7F | Gurmukhi | #x0A80 | #x0AFF | Gujarati | |
#x0B00 | #x0B7F | Oriya | #x0B80 | #x0BFF | Tamil | |
#x0C00 | #x0C7F | Telugu | #x0C80 | #x0CFF | Kannada | |
#x0D00 | #x0D7F | Malayalam | #x0D80 | #x0DFF | Sinhala | |
#x0E00 | #x0E7F | Thai | #x0E80 | #x0EFF | Lao | |
#x0F00 | #x0FFF | Tibetan | #x1000 | #x109F | Myanmar | |
#x10A0 | #x10FF | Georgian | #x1100 | #x11FF | HangulJamo | |
#x1200 | #x137F | Ethiopic | ||||
#x13A0 | #x13FF | Cherokee | #x1400 | #x167F | UnifiedCanadianAboriginalSyllabics | |
#x1680 | #x169F | Ogham | #x16A0 | #x16FF | Runic | |
#x1780 | #x17FF | Khmer | #x1800 | #x18AF | Mongolian | |
#x1E00 | #x1EFF | LatinExtendedAdditional | #x1F00 | #x1FFF | GreekExtended | |
#x2000 | #x206F | GeneralPunctuation | #x2070 | #x209F | SuperscriptsandSubscripts | |
#x20A0 | #x20CF | CurrencySymbols | #x20D0 | #x20FF | CombiningMarksforSymbols | |
#x2100 | #x214F | LetterlikeSymbols | #x2150 | #x218F | NumberForms | |
#x2190 | #x21FF | Arrows | #x2200 | #x22FF | MathematicalOperators | |
#x2300 | #x23FF | MiscellaneousTechnical | #x2400 | #x243F | ControlPictures | |
#x2440 | #x245F | OpticalCharacterRecognition | #x2460 | #x24FF | EnclosedAlphanumerics | |
#x2500 | #x257F | BoxDrawing | #x2580 | #x259F | BlockElements | |
#x25A0 | #x25FF | GeometricShapes | #x2600 | #x26FF | MiscellaneousSymbols | |
#x2700 | #x27BF | Dingbats | ||||
#x2800 | #x28FF | BraillePatterns | ||||
#x2E80 | #x2EFF | CJKRadicalsSupplement | #x2F00 | #x2FDF | KangxiRadicals | |
#x2FF0 | #x2FFF | IdeographicDescriptionCharacters | #x3000 | #x303F | CJKSymbolsandPunctuation | |
#x3040 | #x309F | Hiragana | #x30A0 | #x30FF | Katakana | |
#x3100 | #x312F | Bopomofo | #x3130 | #x318F | HangulCompatibilityJamo | |
#x3190 | #x319F | Kanbun | #x31A0 | #x31BF | BopomofoExtended | |
#x3200 | #x32FF | EnclosedCJKLettersandMonths | #x3300 | #x33FF | CJKCompatibility | |
#x3400 | #x4DB | CJKUnifiedIdeographsExtensionA | ||||
#x4E00 | #x9FFF | CJKUnifiedIdeographs | #xA000 | #xA48F | YiSyllables | |
#xA490 | #xA4CF | YiRadicals | ||||
#xAC00 | #xD7A | HangulSyllables | ||||
#xE000 | #xF8FF | PrivateUse | ||||
#xF900 | #xFAFF | CJKCompatibilityIdeographs | #xFB00 | #xFB4F | AlphabeticPresentationForms | |
#xFB50 | #xFDFF | ArabicPresentationForms-A | ||||
#xFE20 | #xFE2F | CombiningHalfMarks | ||||
#xFE30 | #xFE4F | CJKCompatibilityForms | #xFE50 | #xFE6F | SmallFormVariants | |
#xFE70 | #xFE | ArabicPresentationForms-B | ||||
#xFF00 | #xFFEF | HalfwidthandFullwidthForms | #xFFF0 | #xFF | Specials | |
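A block escape is a code-point range test. The Python sketch below uses a small, hypothetical subset of the table (a real processor would carry every row) and stores each block as a list of ranges, because PrivateUse spans three separate ranges once the supplementary planes are included. It raises an error for names it does not know, which is only one of the possible treatments of unrecognized block names discussed below.

```python
# Hypothetical subset of the block table, as inclusive code-point ranges.
BLOCKS = {
    "BasicLatin": [(0x0000, 0x007F)],
    "Latin-1Supplement": [(0x0080, 0x00FF)],
    "Greek": [(0x0370, 0x03FF)],
    "Cyrillic": [(0x0400, 0x04FF)],
    "Hebrew": [(0x0590, 0x05FF)],
    "PrivateUse": [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)],
}


def in_block(ch: str, block_name: str) -> bool:
    # Approximates \p{IsBlockName} for a single character ch.
    ranges = BLOCKS.get(block_name)
    if ranges is None:
        # This sketch rejects names missing from its table.
        raise ValueError(f"unrecognized block name: {block_name}")
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)
```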
The blocks mentioned above exclude the HighSurrogates, LowSurrogates, and HighPrivateUseSurrogates blocks.
In particular, the version of
#x0370 - #x03FF: Greek
#x20D0 - #x20FF: CombiningMarksforSymbols
#xE000 - #xF8FF: PrivateUse
#xF0000 - #xFFFFD: PrivateUse
#x100000 - #x10FFFD: PrivateUse
For example, the \p{IsBasicLatin}
A tabulation of normalized block names for Unicode 2.0.0 and
later is given in
For the treatment of regular expressions
containing unrecognized Unicode block names, see
A string of the form
constitutes a \p{S}
constitutes
a \P{S}
If an unknown string of characters is used in a
category escape instead of a known character category code
or a string matching the
Any string of hyphens, digits, and Basic Latin characters
beginning with
Treating unrecognized block names as errors increases the
likelihood that errors in spelling the block name will be detected
and can be helpful in checking the correctness of schema
documents. However, it also decreases the portability of schema
documents among processors supporting different versions of
If a string
matches the
non-terminal IsX
and
\p{IsX}
each denote the set of all
characters.
Processors \P{IsX}
and
\p{IsX}
as
denoting the empty set, instead of the set of all characters.
\P{IsX}
The meaning defined for a block escape with an unrecognized
block name makes it synonymous with the regular expression
If (at
Which behavior is preferable in concrete circumstances depends on
the relative cost of failure to accept valid input (false negatives)
and failure to reject invalid input (false positives). It is for
this reason that processors are allowed to provide
Character sequence | Equivalent |
---|---|
. | [^\n\r] |
\s | [#x20\t\n\r] |
\S | [^\s] |
\i | the set of initial name characters, those |
\I | [^\i] |
\c | the set of name characters, those |
\C | [^\c] |
\d | \p{Nd} |
\D | [^\d] |
\w | [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the punctuation, separator, and "other" characters) |
\W | [^\w] |
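Several of the multi-character escapes in the table above can be approximated in Python with the `unicodedata` module. This is a sketch with hypothetical function names; `\i` and `\c` are omitted because they depend on the XML name-character productions, and `unicodedata` reflects whatever Unicode version the interpreter ships with. Note that, unlike Perl's `\w`, the `\w` defined here excludes the underscore (a connector punctuation character) and includes symbols such as `$`.

```python
import unicodedata


def is_dot(ch: str) -> bool:
    # "." matches every character except newline (#xA) and carriage return (#xD)
    return ch not in "\n\r"


def is_s(ch: str) -> bool:
    # \s: space, tab, newline and carriage return only
    return ch in " \t\n\r"


def is_d(ch: str) -> bool:
    # \d = \p{Nd}: any Unicode decimal digit, not just 0-9
    return unicodedata.category(ch) == "Nd"


def is_w(ch: str) -> bool:
    # \w: everything except punctuation (P), separator (Z) and other (C) characters
    return unicodedata.category(ch)[0] not in "PZC"


print(is_d("\u0663"), is_w("_"), is_w("$"), is_s("\u00a0"))  # True False True False
```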
The
The following features in this specification are
For the datatypes which depend on
For the datatypes with infinite
It is
For each
In addition, the following information
The nature of the datatype's The nature of the equality relation; in particular, how to determine
whether two values which are not identical are equal. There is no requirement that equality be distinct from identity,
but it The values of the Which of the If What URI reference (more precisely, what It is convenient if the URI for a datatype and the
For each As specified normatively elsewhere, the set of facets given values
will at the very least include the
The
The
For consistency with the
The implementor
It is
For each What properties the facet has, viewed as a schema component. For most Whether the facet is a Whether restriction of the facet takes the form of replacing
a less restrictive facet value with a more restrictive value
(as in the When an The effect of the preceding paragraph is to ensure that
a type derived by What For a For a For a The host language What URI reference (more precisely, what What element is to be used in XSD schema documents to apply the facet
in the course of The elements'
It follows from the above that each
The following features in this specification are
When multiple errors are encountered in type definitions or elsewhere,
it is
In order to align this specification with
those being prepared by the XSL and XML Query Working Groups, a new
datatype named
The
Units of length have been
An
The discussion of whitespace handling in
Conforming implementations
As noted above, positive and negative zero,
The description of the lexical spaces of
The
The character sequence
At the suggestion of the
The lexical representation
Algorithms for arithmetic involving
The treatment of leap seconds is no longer
At the suggestion of the
time zone
have been replaced with references to
time zone offset
; this
resolves issue
A number of syntactic and semantic errors in some of the regular
expressions given to describe the lexical spaces of the
The lexical mapping for times of the form
Support has been added for
To reduce confusion and avert a widespread misunderstanding,
the normative references to various W3C specifications now state
explicitly that while the reference describes the particular edition
of a specification current at the time this specification is
published, conforming implementations of this specification
are not required to ignore later editions of the other
specification but instead
References to various other specifications have also been updated.
Two new totally ordered restrictions of
The XML representations of the
Numerous minor corrections have been made in response to comments on earlier working drafts.
The treatment of topics handled both in this specification and in
Several references to other specifications have been updated to
refer to current versions of those specifications, including
Requirements for the datatype-validity of values of type
Explicit definitions have been provided for the lexical and
Schema Component Constraint
Some errors in the definition of regular-expression metacharacters have been corrected.
The descriptions of the
A warning against using the whitespace facet for tokenizing natural-language data has been added at the request of the W3C Internationalization Working Group.
In order to correct an error in
version 1 of this specification and of
The requirements of conformance have been clarified in various
ways.
The definitions of
The lexical mapping of the
The characterization of
The nature of equality and identity of lists has been clarified.
Enumerations, identity constraints, and value constraints now
The mutual relations of lists and unions have been clarified, in
particular the restrictions on what kinds of datatypes
Unions with no member types (and thus with empty
Cycles in the definitions of
A number of minor errors and obscurities have been fixed.
The listing below is for the benefit of readers of a printed version of this document: it collects together all the definitions which appear in the document above.
Co-editor Ashok Malhotra's work on this specification from March 1999 until February 2001 was supported by IBM, and from then until May 2004 by Microsoft. Since July 2004 his work on this specification has been supported by Oracle Corporation.
The work of Dave Peterson
as a co-editor of this specification was supported by IDEAlliance
(formerly GCA) through March 2004, and beginning in
April 2004 by SGML
The work of C. M. Sperberg-McQueen as a co-editor of this specification was supported by the World Wide Web Consortium through January 2009 and again from June 2010 through May 2011, and beginning in February 2009 by Black Mesa Technologies LLC.
The XML Schema Working Group acknowledges with thanks the members of other W3C Working Groups and industry experts in other forums who have contributed directly or indirectly to the creation of this document and its predecessor.
At the time this
The XML Schema Working Group has benefited in its work from the participation and contributions of a number of people who are no longer members of the Working Group in good standing at the time of publication of this Working Draft. Their names are given below. In particular we note with sadness the accidental death of Mario Jeckle shortly before publication of the first Working Draft of XML Schema 1.1. Affiliations given are (among) those current at the time of the individuals' work with the WG.