This is the
For those primarily interested in the changes since version 1.0,
the
Please send comments on this Working Draft to
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced by the
Patent disclosures relevant to this specification may be found
on the Working Group's
Per
The English version of this specification is the only normative
version. Information about translations of this document is available
at
How should this specification be aligned with XML 1.1? The changes in character set and name characters, and the question of what determines which ones to use, must be addressed.
Current plan is that all datatypes defined herein will have EBNF productions at least approximately defining their lexical space, and will include a non-normative regex derived from the EBNF if a user wishes to copy it directly.
It is not possible for all datatypes to have canonical representations of all values without violating the rules of derivation or adding special-purpose &cfacet;s which the WG does not deem appropriate. The WG has not yet decided how to deal with datatypes whose lexical and/or canonical mappings are context-sensitive.
The word will probably be removed.
"Derivations" other than "derivations by restriction" will be renamed "constructions".
The Working Group has two main goals for this version of W3C XML Schema:
Significant improvements in simplicity of design and clarity of
exposition
Provision of support for versioning of XML languages defined using the XML Schema specification, including the XML transfer syntax for schemas itself.
These goals are slightly in tension with one another. The following summarizes the Working Group's strategic guidelines for changes between versions 1.0 and 1.1:
Add support for versioning (acknowledging that this
Allow bug fixes (unless in specific cases we decide that the fix is too disruptive for a point release)
Allow editorial changes
Allow design cleanup to change behavior in edge cases
Allow relatively nondisruptive changes to type hierarchy (to better support current and forthcoming international standards and W3C recommendations)
Allow design cleanup to change component structure (changes to functionality restricted to edge cases)
Do not allow any significant changes in functionality
Do not allow any changes to XML transfer syntax except those required by version control hooks and bug fixes
The overall aim as regards compatibility is that
All schema documents conformant to version 1.0 of this specification should also conform to version 1.1, and should have the same validation behaviour across 1.0 and 1.1 implementations (except possibly in edge cases and in the details of the resulting PSVI);
The vast majority of schema documents conformant to version 1.1 of this specification should also conform to version 1.0, leaving aside any incompatibilities arising from support for versioning, and when they are conformant to version 1.0 (or are made conformant by the removal of versioning information), should have the same validation behaviour across 1.0 and 1.1 implementations (again except possibly in edge cases and in the details of the resulting PSVI);
The
The table below offers two typical examples of XML instances in which datatypes are implicit: the instance on the left represents a billing invoice, the instance on the right a memo or perhaps an email message in XML.
Data oriented  Document oriented 



The invoice contains several dates and telephone numbers, the postal abbreviation for a state (which comes from an enumerated list of sanctioned values), and a ZIP code (which takes a definable regular form). The memo contains many of the same types of information: a date, telephone number, email address and an "importance" value (from an enumerated list, such as "low", "medium" or "high"). Applications which process invoices and memos need to raise exceptions if something that was supposed to be a date or telephone number does not conform to the rules for valid dates or telephone numbers.
In both cases, validity constraints exist on the content of the instances that are not expressible in XML DTDs. The limited datatyping facilities in XML have prevented validating XML processors from supplying the rigorous type checking required in these situations. The result has been that individual applications writers have had to implement type checking in an ad hoc manner. This specification addresses the need of both document authors and applications writers for a robust, extensible datatype system for XML which could be incorporated into XML processors. As discussed below, these datatypes could be used in other XML-related standards as well.
The
provide for primitive data typing, including byte, date, integer, sequence, SQL and Java primitive datatypes, etc.;
define a type system that is adequate for import/export from database systems (e.g., relational, object, OLAP);
distinguish requirements relating to lexical data representation vs. those governing an underlying information set;
allow creation of user-defined datatypes, such as datatypes that are derived from existing datatypes and which may constrain certain of their properties (e.g., range, precision, length, format).
This portion of the XML Schema Language discusses datatypes that can be
used in an XML Schema. These datatypes can be specified for element
content that would be specified as
The terminology used to describe XML Schema Datatypes is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of a datatype processor:
A feature of this specification included solely to ensure that schemas
which use this feature remain compatible with
Conforming documents and processors are permitted to but need not behave as described.
(Of strings or names:) Two strings or names being compared must be identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g. characters with both precomposed and base+diacritic forms) match only if they have the same representation in both strings. No case folding is performed. (Of strings and rules in the grammar:) A string matches a grammatical production if it belongs to the language generated by that production.
Conforming documents and processors are required to behave as
described; otherwise they are in
A violation of the rules of this specification; results are undefined.
Conforming software
This specification provides three different kinds of normative statements about schema components, their representations in XML and their contribution to the schema-validation of information items:
Constraints on the schema components themselves, i.e. conditions
components
Constraints on the representation of schema components in XML. Some but
not all of these are expressed in
Constraints expressed by schema components which information
items
This section describes the conceptual framework behind the
The datatypes discussed in this specification are
Only those operations and relations needed for schema processing are defined in this
specification. Applications using these datatypes are generally expected to implement
appropriate additional functions and/or relations to make the datatype generally
useful. For example, the description herein of the
A values
)
is influenced by the set of valuespace operations and relations used therewith.
A
A small collection of
This specification only defines the operations and relations needed for schema processing. The
choice of terminology for describing/naming the datatypes is selected to guide users and implementers
in how to expand the datatype to be generally useful—i.e., how to recognize the real world
datatypes and their variants for which the datatypes defined herein are
meant to be used for data interchange.
A
The value spaces of datatypes are abstractions, and are defined
in
In addition, other applications are expected to define additional appropriate
operations and/or relations on these value spaces (e.g., addition and multiplication
on the various numerical datatypes' value spaces), and are permitted where
appropriate to even redefine the operations and relations defined within this
specification, provided that
The
defined
enumerated outright
defined by restricting the
defined as a combination of values from one or more already defined
The relations of
The identity relation is always defined. Every value space inherently has an
identity relation. Two things are
This does not preclude implementing datatypes by using more than one
In the identity relation defined herein, values
from different
Each
On the other hand, equality need not cover the entire value space of the datatype (though it usually does).
The equality relation is used in conjunction with
order when making restrictions involving order. This is the only use of
In the prior version of
this specification (1.0), equality was always identity. This has been changed
to permit the datatypes defined herein to more closely match the
For example, the
For another example, the
In the equality relation defined herein, values
from different primitive data spaces are made artificially unequal even if they might
otherwise be considered equal. For example, there is a number
For the purposes of this specification, there is one equality relation for all values
of all datatypes (the union of the various datatypes' individual equalities, if one
considers relations to be sets of ordered pairs). The
Each datatype has an order relation prescribed. This order may be a
In this specification, this less-than order relation is denoted by
The weak order
The value spaces of primitive datatypes are abstractions, which may have values in common. In the order relation defined herein, these value spaces are made artificially incomparable. For example, the numbers two and three are values in both the decimal datatype and the float datatype. In the order relation defined herein, two in the decimal datatype and three in the float datatype are incomparable values. Other applications making use of these datatypes may choose to consider values such as these comparable.
While it is not an error to attempt to compare values from the value spaces of two different primitive datatypes, they will always be incomparable and therefore unequal: If x and y are in the value spaces of different primitive datatypes then x &inc; y (and hence x ≠ y ).
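The cross-primitive rule above can be illustrated with a minimal sketch. The `TypedValue` class, its `primitive` tag and the `compare` function are hypothetical names introduced here for illustration; they are not part of this specification.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TypedValue:
    """A value tagged with the name of its primitive datatype (hypothetical model)."""
    primitive: str
    value: object


def compare(x: TypedValue, y: TypedValue) -> str:
    """Return '<', '>', '=' or '<>' (incomparable)."""
    # Values from different primitive value spaces are made artificially
    # incomparable, even when they denote the same number.
    if x.primitive != y.primitive:
        return "<>"
    if x.value < y.value:
        return "<"
    if x.value > y.value:
        return ">"
    if x.value == y.value:
        return "="
    # Reached for values such as a float NaN, which compares false to everything.
    return "<>"


two_decimal = TypedValue("decimal", Decimal(2))
three_decimal = TypedValue("decimal", Decimal(3))
three_float = TypedValue("float", 3.0)
print(compare(two_decimal, three_decimal))
print(compare(two_decimal, three_float))
```

An application that prefers to treat decimal two and float three as comparable would replace the first check with its own conversion rule, as the text permits.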
In addition to its
For example, "100" and "1.0E2" are two different literals from the
The literals in the
The number of literals for each value has been kept small; for many datatypes there is a one-to-one mapping between literals and values. This makes it easy to exchange the values between different systems. In many cases, conversion from locale-dependent representations will be required on both the originator and the recipient side, both for computer processing and for interaction with humans.
Textual, rather than binary, literals are used. This makes hand editing, debugging, and similar activities possible.
Where possible, literals correspond to those found in common programming languages and libraries.
While the datatypes defined in this specification have, for the most part,
a single lexical representation i.e. each value in the datatype's
Should a derivation be made using a derivation mechanism that
removes
This could happen by means of a
Conversely, should a derivation remove values then their
There are currently no facets with such an impact. There may be in the future.
For example, '100' and '1.0E2' are two different
The dependencies are in Part 1; they will be resolved there. Text in this Part will reflect that canonical representations are provided for the benefit of other users, including other specifications that might want to reference these datatypes.
Given the "pattern" &cfacet;, restricting away canonical representations cannot be prohibited without undue processing expense. A warning will be inserted, and RQ129 will ensure that loss of canonical representations will not affect schema processing.
While the datatypes defined in this specification generally have
a single
This decision is not yet written up herein: The four informational facets, each of which has only one property, will be lumped into one facet having four properties. This will represent a further technical change to the facet structure, but will not result in any additional or lost information in a schema.
The facets of a datatype serve to distinguish those aspects of
one datatype which
Facets are of two kinds:
In the 1.0 version of this specification, information facets were called "fundamental facets". Information facets are not required for schema processing, but some applications use them.
All
Constraining the
All
It is useful to categorize the datatypes defined in this specification along various dimensions, forming a set of characterization dichotomies.
The first distinction to be made is that between
For example, a single token which
Several type systems (such as the one described in
A
In the above example, the value of the
When a datatype is
For each of
For
The
The
A prototypical example of a
Any number (greater than 1) of
The order in which the
For example, given the definition below, the first instance of the <size> element
validates correctly as an
The
A datatype which is
Next, we distinguish between
For example, in this specification,
A new "magic" datatype will be introduced as a child of anySimpleType and the parent of all primitive atomic datatypes.
The datatypes defined by this specification fall into both
the
In the example above,
A datatype which is
As described in more detail in
A
One datatype can be
Conceptually there is no difference between the
A datatype which is
Each built-in datatype in this specification (both
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype
For example, to address the
http://www.w3.org/2001/XMLSchema#int
Additionally, each facet definition element can be uniquely addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the facet
For example, to address the maxInclusive facet, the URI is:
http://www.w3.org/2001/XMLSchema#maxInclusive
Additionally, each facet usage in a built-in datatype definition can be uniquely addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype, followed by a period (".") followed by the name of the facet
For example, to address the usage of the maxInclusive facet in the definition of int, the URI is:
http://www.w3.org/2001/XMLSchema#int.maxInclusive
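The three URI construction rules above can be sketched as follows. The function names are hypothetical; only the namespace URI and the fragment layout come from the text.

```python
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def datatype_uri(datatype: str) -> str:
    """Base URI plus the datatype name as fragment identifier."""
    return f"{XSD_NAMESPACE}#{datatype}"


def facet_uri(facet: str) -> str:
    """Base URI plus the facet name as fragment identifier."""
    return f"{XSD_NAMESPACE}#{facet}"


def facet_usage_uri(datatype: str, facet: str) -> str:
    """Base URI plus 'datatype.facet' as fragment identifier."""
    return f"{XSD_NAMESPACE}#{datatype}.{facet}"


print(datatype_uri("int"))
print(facet_uri("maxInclusive"))
print(facet_usage_uri("int", "maxInclusive"))
```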
The
http://www.w3.org/2001/XMLSchema
To facilitate usage in specifications other than the &schemalanguage;,
such as those that do not want to know anything about aspects of the
&schemalanguage; other than the datatypes, each
http://www.w3.org/2001/XMLSchema-datatypes
This applies to both
Each
The
Many human languages have writing systems that require
child elements for control of aspects such as bidirectional formatting or
ruby annotation (see
As noted in
An instance of a datatype that is defined as
The canonical representation for
The minimum will be lowered to 16 digits; a health warning will be added to indicate that optimized implementations of derived datatypes may exceed the limits of the base, but are not required to.
All
1.23, 12678967.543233, +100000.00, 210
.
The canonical representation for
The description of canonical representations for float and double needs to be cleaned up.
Two zeros will be provided similar to those in precisionDecimal
"Equality" in this Recommendation is defined to be "identity" (i.e., values that
are identical in the
Any value incomparable with the value used for the four bounding facets
(
This datatype differs from that of
A literal in the
The INF, -INF and NaN, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0 and INF are all legal literals for
The canonical representation for
"Equality" in this Recommendation is defined to be "identity" (i.e., values that
are identical in the
Any value incomparable with the value used for the four bounding facets
(
This datatype differs from that of
A literal in the
The INF, -INF and NaN, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0 and INF are all legal literals for
The canonical representation for
precisionDecimal has been added. It is possible that precisionDecimal will replace decimal.
The WG feels that having this capability for precisionDecimal will be adequate.
The
As explained below, the lexical
representation of the notanumber
, positive infinity
,
and negative infinity
.
Equality and order for
Two numerical
INF is equal only to itself, and is greater than
−INF and all numerical
−INF is equal only to itself, and is less than
INF and all numerical
NaN is incomparable with all values,
numerals
.) The
Canonical mappings are not used during schema processing. They are provided in this specification for the benefit of other users of these datatype definitions who may find them useful, and for other specifications which might find it useful to reference them normatively.
precisionDecimal has the following
fractionDigits
minFractionDigits
totalDigits
specials
maxInclusive
maxExclusive
minInclusive
minExclusive
pattern
whitespace
enumeration
Durations can be modeled in at least two ways: as six-property tuples (similar to
the seven-property model used for other date/time datatypes) or as two-property tuples
(somewhat similar to the alternative one-property timeOnTimeline model especially useful for
1696-09-01T00:00:00Z
1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-07-01T00:00:00Z
These four values are chosen so as to maximize
the possible differences in results that could occur, such as the difference when adding
P1M and P30D: 1697-02-01T00:00:00Z + P1M < 1697-02-01T00:00:00Z + P30D ,
but 1903-03-01T00:00:00Z + P1M > 1903-03-01T00:00:00Z + P30D , so
that P1M <> P30D . If two
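The comparison described above can be tried out with a small sketch. It assumes a simplified duration model, a hypothetical `(months, seconds)` pair, and relies on the fact that all four reference points fall on the first day of a month, so adding months never needs day clamping. It is an illustration, not a complete duration implementation.

```python
from datetime import datetime, timedelta, timezone

# The four reference dateTimes listed above.
REFERENCE_POINTS = [
    datetime(1696, 9, 1, tzinfo=timezone.utc),
    datetime(1697, 2, 1, tzinfo=timezone.utc),
    datetime(1903, 3, 1, tzinfo=timezone.utc),
    datetime(1903, 7, 1, tzinfo=timezone.utc),
]


def add_duration(start: datetime, months: int, seconds: int) -> datetime:
    """Add whole months first, then seconds (assumes start.day == 1)."""
    total_months = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total_months, 12)
    shifted = start.replace(year=year, month=month_index + 1)
    return shifted + timedelta(seconds=seconds)


def compare_durations(a: tuple, b: tuple) -> str:
    """Return '<', '>', '=' or '<>' for two (months, seconds) durations."""
    outcomes = set()
    for point in REFERENCE_POINTS:
        end_a = add_duration(point, *a)
        end_b = add_duration(point, *b)
        outcomes.add((end_a > end_b) - (end_a < end_b))
    if outcomes == {0}:
        return "="
    if outcomes == {-1}:
        return "<"
    if outcomes == {1}:
        return ">"
    return "<>"  # the reference points disagree: the order is indeterminate


P1M = (1, 0)
P30D = (0, 30 * 86400)
print(compare_durations(P1M, P30D))
```

With the February 1697 starting point P1M ends earlier than P30D, while with the March 1903 starting point it ends later, so the sketch reports the pair as incomparable.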
This minor anomaly is the result of having
It turns out that under the definition just given, two
Two totally ordered datatypes (
There are many ways to implement
The PnYnMnDTnHnMnS
More precisely, the
Thus, a
The ?P(((([09]+Y([09]+M)?)
The
herein
Canonical mappings are not used during schema processing. They are provided in this specification for the benefit of other users of these datatype definitions who may find them useful, and for other specifications which might find it useful to reference them normatively.
herein
The
All timezoned times are Coordinated Universal Time (
The date and time datatypes described in this recommendation were inspired
by
Those using this (1.0) version of this Recommendation to
represent negative years should be aware that the interpretation of lexical
representations beginning with a '-'
is likely to change in
subsequent versions.
See the conformance note in
The '-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?
,
where
'-'?
the remaining '-'s are separators between parts of the date portion;
the first
'T' is a separator indicating that time-of-day follows;
':' is a separator between parts of the time-of-day portion;
the second
'.'
For example, 2002-10-10T12:00:00-05:00 (noon on 10 October 2002, Central Daylight Savings Time as well as Eastern Standard Time in the U.S.) is 2002-10-10T17:00:00Z, five hours later than 2002-10-10T12:00:00Z.
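The conversion in this example can be reproduced with Python's standard library. This is a sketch only: `datetime.fromisoformat` accepts a subset of the lexical forms discussed here (for instance, it does not handle negative years), and the helper name `to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(lexical: str) -> str:
    """Normalize a timezoned literal to its 'Z' form."""
    moment = datetime.fromisoformat(lexical)
    if moment.tzinfo is None:
        raise ValueError("literal has no timezone, so it cannot be normalized")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_utc("2002-10-10T12:00:00-05:00"))
```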
For further guidance on arithmetic with
Except for trailing fractional zero digits in the seconds representation,
'24:00:00' time representations, and timezone (for timezoned values), the mapping
from literals to values is one-to-one. Where there is more than
one possible representation, the canonical representation is as follows:
The 2-digit numeral representing the hour must not be '24';
The fractional second string, if present, must not end in '0';
for timezoned values, the timezone must be represented with 'Z'
(All timezoned
Timezones are durations with (integervalued) hour and minute properties (with the hour magnitude limited to at most 14, and the minute magnitude limited to at most 59, except that if the hour magnitude is 14, the minute value must be 0); they may be both positive or both negative.
The lexical representation of a timezone is a string of the form:
(('+' | '-') hh ':' mm) | 'Z'
,
where
'+' indicates a nonnegative duration,
'-' indicates a nonpositive duration.
The mapping so defined is one-to-one, except that '+00:00', '-00:00', and 'Z'
all represent the same zero-length duration timezone,
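A minimal sketch of the timezone lexical mapping follows, assuming the form is read as `(('+' | '-') hh ':' mm) | 'Z'` and that the value is modeled as a signed number of minutes. The function name `timezone_minutes` and the minutes-based model are assumptions made for this example.

```python
import re

# Either a signed hh:mm pair or the single letter Z.
_TIMEZONE = re.compile(r"(?:([+-])([0-9]{2}):([0-9]{2})|Z)")


def timezone_minutes(lexical: str) -> int:
    """Map a timezone literal to a signed offset in minutes."""
    match = _TIMEZONE.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a timezone literal: {lexical!r}")
    if lexical == "Z":
        return 0
    sign, hh, mm = match.groups()
    hours, minutes = int(hh), int(mm)
    # Hour magnitude at most 14, minute magnitude at most 59,
    # and minutes must be 0 when the hour magnitude is 14.
    if hours > 14 or minutes > 59 or (hours == 14 and minutes != 0):
        raise ValueError(f"timezone out of range: {lexical!r}")
    total = hours * 60 + minutes
    return -total if sign == "-" else total


# The three spellings of the zero-length timezone map to the same value.
print(timezone_minutes("+00:00"), timezone_minutes("-00:00"), timezone_minutes("Z"))
```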
When a timezone is added to a
In general, the
The following definition uses the notation S[year] to represent the year
field of S, S[month] to represent the month field, and so on. The notation (Q
& "-14:00") means adding the timezone -14:00 to Q, where Q did not
already have a timezone.
The ordering between two
A. Normalize P and Q. That is, if there is a timezone present, but
it is not Z, convert it to Z using the addition operation defined in
Thus 2000-03-04T23:00:00+03:00 normalizes to 2000-03-04T20:00:00Z
B. If P and Q either both have a time zone or both do not have a time zone, compare P and Q field by field from the year field down to the second field, and return a result as soon as it can be determined. That is:
For each i in {year, month, day, hour, minute, second}
If P[i] and Q[i] are both not specified, continue to the next i
If P[i] is not specified and Q[i] is, or vice versa, stop and return P <> Q
If P[i] < Q[i], stop and return P < Q
If P[i] > Q[i], stop and return P > Q
Stop and return P = Q
C. Otherwise, if P contains a time zone and Q does not, compare as follows:
P < Q if P < (Q with time zone +14:00)
P > Q if P > (Q with time zone -14:00)
P <> Q otherwise, that is, if (Q with time zone +14:00) < P < (Q with time zone -14:00)
D. Otherwise, if P does not contain a time zone and Q does, compare as follows:
P < Q if (P with time zone -14:00) < Q.
P > Q if (P with time zone +14:00) > Q.
P <> Q otherwise, that is, if (P with time zone +14:00) < Q < (P with time zone -14:00)
Examples:
| Determinate | Indeterminate |
| --- | --- |
| 2000-01-15T00:00:00 < 2000-02-15T00:00:00 | 2000-01-01T12:00:00 <> 1999-12-31T23:00:00Z |
| 2000-01-15T12:00:00 < 2000-01-16T12:00:00Z | 2000-01-16T12:00:00 <> 2000-01-16T12:00:00Z |
| | 2000-01-16T00:00:00 <> 2000-01-16T12:00:00Z |
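Steps A through D can be sketched for fully specified values using Python `datetime` objects, where a naive object stands for a value without a timezone. This sketch does not cover values with unspecified fields (the "both not specified" branch of step B), and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PLUS_14 = timezone(timedelta(hours=14))
MINUS_14 = timezone(timedelta(hours=-14))


def _total_order(a: datetime, b: datetime) -> str:
    return "<" if a < b else ">" if a > b else "="


def compare_datetimes(p: datetime, q: datetime) -> str:
    """Return '<', '>', '=' or '<>' following steps A to D above."""
    p_zoned = p.tzinfo is not None
    q_zoned = q.tzinfo is not None
    # Steps A and B: Python normalizes aware values to UTC when comparing,
    # and compares two naive values field by field.
    if p_zoned == q_zoned:
        return _total_order(p, q)
    if p_zoned:
        # Step C: P has a timezone, Q does not.
        if p < q.replace(tzinfo=PLUS_14):
            return "<"
        if p > q.replace(tzinfo=MINUS_14):
            return ">"
        return "<>"
    # Step D: P has no timezone, Q does.
    if p.replace(tzinfo=MINUS_14) < q:
        return "<"
    if p.replace(tzinfo=PLUS_14) > q:
        return ">"
    return "<>"


utc = timezone.utc
print(compare_datetimes(datetime(2000, 1, 15, 12), datetime(2000, 1, 16, 12, tzinfo=utc)))
print(compare_datetimes(datetime(2000, 1, 1, 12), datetime(1999, 12, 31, 23, tzinfo=utc)))
```

The two calls correspond to one determinate and one indeterminate row of the examples table.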
Certain derived types from
Since the lexical representation allows an optional time zone
indicator,
See the conformance note in
The lexical representation for
The canonical representation for
A "date object" is an object with year, month, and day properties just like those
of
Timezoned
For example: the first moment of 2002-10-10+13:00 is 2002-10-10T00:00:00+13,
which is 2002-10-09T11:00:00Z, which is also the first moment of 2002-10-09-11:00.
Therefore 2002-10-10+13:00 is 2002-10-09-11:00;
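The arithmetic in this example can be checked with a short sketch. The helper `first_moment_utc` is a hypothetical name, and whole-hour offsets are assumed for brevity.

```python
from datetime import datetime, timedelta, timezone


def first_moment_utc(year: int, month: int, day: int, offset_hours: int) -> datetime:
    """First moment of a timezoned date, expressed in UTC."""
    zone = timezone(timedelta(hours=offset_hours))
    return datetime(year, month, day, tzinfo=zone).astimezone(timezone.utc)


# 2002-10-10+13:00 and 2002-10-09-11:00 begin at the same instant.
print(first_moment_utc(2002, 10, 10, 13))
print(first_moment_utc(2002, 10, 9, -11))
```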
For most timezones, either the first moment or last moment of the day (a
See the conformance note in
For the following discussion, let the "date portion" of a
The '-'? yyyy '-' mm '-' dd zzzzzz?
where the '-' yyyy '-' mm '-' dd 'T00:00:00' zzzzzz?
and the least upper bound of the interval is the timeline point represented
(noncanonically) by:
'-' yyyy '-' mm '-' dd 'T24:00:00' zzzzzz?
.
The
Given a member of the
Since the lexical representation allows an optional time zone
indicator,
Because month/year combinations in one calendar only rarely correspond to month/year combinations in other calendars, values of this type are not, in general, convertible to simple values corresponding to month/year combinations in other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
See the conformance note in
The lexical representation for
For example, to indicate the month of May 1999, one would write: 1999-05.
See also
Since the lexical representation allows an optional time zone
indicator,
Because years in one calendar only rarely correspond to years in other calendars, values of this type are not, in general, convertible to simple values corresponding to years in other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
See the conformance note in
The lexical representation for
For example, to indicate 1999, one would write: 1999.
See also
Since the lexical representation allows an optional time zone
indicator,
Because day/month combinations in one calendar only rarely correspond to day/month combinations in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
The lexical representation for
This datatype can be used to represent a specific day in a month, for example to say that my birthday occurs on the 14th of September every year.
Since the lexical representation allows an optional time zone
indicator,
Because days in one calendar only rarely correspond
to days in other calendars,
The "seven property model" rewrite of date/time datatype descriptions includes a carefully crafted definition of order that ensures that for repeating datatypes (time, gDay, etc.), timezoned values will be compared as though they are on the same "calendar day" ("raw" property values) so that in any given timezone, the days start at "raw" 00:00:00 and end not quite including "raw" 24:00:00. Days are not 00:00:00Z to 24:00:00Z in timezones other than Z.
Equality and order are as prescribed in
Examples that may appear anomalous (see
15 < 16 , but 15−13:00 > 16+13:00
15−11:00 = 16+13:00
15−13:00 <> 16 , because 15−13:00 > 16+14:00 and 15−13:00 < 16−14:00
Timezones do not cause wraparound at the end of the month: 31−13:00 in
one month may start after 01+13:00 in the
The lexical representation for
The lexical representations for ([02][09]3[01])((+)(0[09]1[04]):[05][09])?
The lexical mapping and canonical mapping for
This datatype can be used to represent a specific month, for example to say that Thanksgiving falls in the month of November.
Since the lexical representation allows an optional time zone
indicator,
Because months in one calendar only rarely correspond to months in other calendars, values of this type do not, in general, have any straightforward or intuitive representation in terms of most other calendars. This type should therefore be used with caution in contexts where conversion to other calendars is desired.
The lexical representation for
The canonical representation for
The lexical forms of a-z, A-Z, 0-9, the plus sign (+), the forward slash (/) and the
equal sign (=), together with the characters defined in
For compatibility with older mail gateways,
The lexical space of
Base64Binary ::= ((B64S B64S B64S B64S)*
                  ((B64S B64S B64S B64) |
                   (B64S B64S B16S '=') |
                   (B64S B04S '=' #x20? '=')))?
B64S ::= B64 #x20?
B16S ::= B16 #x20?
B04S ::= B04 #x20?
B04 ::= [AQgw]
B16 ::= [AEIMQUYcgkosw048]
B64 ::= [A-Za-z0-9+/]
Note that this grammar requires the number of non-whitespace characters in the lexical
form to be a multiple of four, and for equals signs to appear only at the end of the
lexical form; strings which do not meet these constraints are not legal lexical forms
of
The above definition of the lexical space is more restrictive than that
given in
The canonical lexical form of a
Canonical-base64Binary ::= (B64 B64 B64 B64)*
                           ((B64 B64 B16 '=') | (B64 B04 '=='))?
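The two grammars can be transcribed into regular expressions for experimentation. This is a sketch that reads the character classes as `[AQgw]`, `[AEIMQUYcgkosw048]` and `[A-Za-z0-9+/]` and the alternatives as separated by `|`; the names `LEXICAL` and `CANONICAL` are introduced here for illustration.

```python
import re

B04 = "[AQgw]"
B16 = "[AEIMQUYcgkosw048]"
B64 = "[A-Za-z0-9+/]"
# Each "...S" production allows one optional trailing space (#x20?).
B04S, B16S, B64S = f"{B04} ?", f"{B16} ?", f"{B64} ?"

LEXICAL = re.compile(
    f"(?:(?:{B64S}{B64S}{B64S}{B64S})*"
    f"(?:{B64S}{B64S}{B64S}{B64}"
    f"|{B64S}{B64S}{B16S}="
    f"|{B64S}{B04S}= ?=))?"
)

CANONICAL = re.compile(
    f"(?:{B64}{{4}})*(?:{B64}{B64}{B16}=|{B64}{B04}==)?"
)

for literal in ("AQID", "AQ==", "AQ= =", "A Q I D", "AR=="):
    print(repr(literal),
          bool(LEXICAL.fullmatch(literal)),
          bool(CANONICAL.fullmatch(literal)))
```

A literal such as "AR==" is rejected because "R" would leave non-zero padding bits, which is what the B04 and B16 classes rule out.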
For some values the canonical form defined above does not conform to
The length of a
lex2 := killwhitespace(lexform)    -- remove whitespace characters
lex3 := strip_equals(lex2)         -- strip padding characters at end
length := floor (length(lex3) * 3 / 4)    -- calculate length
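The three steps above translate directly into Python. The sketch below assumes its input is already a legal lexical form; the function name is hypothetical, and the standard `base64` module is used only to cross-check the result.

```python
import base64


def base64_octet_length(lexform: str) -> int:
    """Length in octets of the value denoted by a base64Binary literal."""
    lex2 = "".join(lexform.split())   # remove whitespace characters
    lex3 = lex2.rstrip("=")           # strip padding characters at end
    return len(lex3) * 3 // 4         # floor(length(lex3) * 3 / 4)


for literal in ("AQ==", "AQI=", "AQID"):
    print(literal, base64_octet_length(literal), len(base64.b64decode(literal)))
```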
Note on encoding:
The mapping from
Section 5.4
Each URI scheme imposes specialized syntax rules for URIs in
that scheme, including restrictions on the syntax of allowed
fragment
identifiers. Because it is
impractical for processors to check that a value is a
context-appropriate URI reference, this specification follows the
lead of
The
Spaces are, in principle, allowed in the
The mapping between literals in the
The use of
It is an
For compatibility (see
The use of
This section gives conceptual definitions for all
[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
.
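As a quick check of the pattern above, read here as `[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*`, the following sketch tests a few candidate strings with Python's `re` module. Python's regular expression dialect differs from the schema regex language in general, but not for this particular pattern. The helper name is hypothetical.

```python
import re

LANGUAGE_PATTERN = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def matches_language_pattern(literal: str) -> bool:
    """True if the whole literal matches the pattern."""
    return LANGUAGE_PATTERN.fullmatch(literal) is not None


for candidate in ("en", "en-US", "x-klingon", "en_US", "toolonglanguage"):
    print(candidate, matches_language_pattern(candidate))
```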
The
For compatibility (see
For compatibility (see
For compatibility (see
For compatibility (see
For compatibility (see
The
For compatibility (see
The
For compatibility (see
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The canonical representation for
The alwayszero
The lexical space is reduced from that of
herein, is that
of
The regular expression
Canonical mappings are not used during schema processing. They are provided in this specification for the benefit of other users of these datatype definitions who may find them useful, and for other specifications which might find it useful to reference them normatively.
The
The
pattern
enumeration
whitespace
minInclusive
minExclusive
maxInclusive
maxExclusive
The lexical space is reduced from that of
herein, is that
of
The regular expression
Canonical mappings are not used during schema processing. They are provided in this specification for the benefit of other users of these datatype definitions who may find them useful, and for other specifications which might find it useful to reference them normatively.
The
pattern
enumeration
whitespace
minInclusive
minExclusive
maxInclusive
maxExclusive
The following sections provide full details on the properties and
significance of each kind of schema component involved in datatype
definitions. For each property, the kinds of values it is allowed to have are
specified. Any property not identified as optional is required to
be present; optional properties which are not present have
For more information on the notion of datatype (schema) components,
see
Simple Type definitions provide for:
Establishing the
Attaching a unique name (actually a
The Simple Type Definition schema component has the following properties:
Datatypes are identified by their
If
If
The value of
The value of
If
The XML representation for a
name
&iattribute;, if present,
otherwise final
&iattribute;, if present, otherwise
the &vvalue; of the
finalDefault
&iattribute; of the ancestor
the empty set;
a set with members drawn from the set above, each being present or absent depending on whether the string contains an equivalently named space-delimited substring.
Although the finalDefault
&iattribute; of
targetNamespace
&iattribute;
of the parent schema
element information item.
A
base
&iattribute; or the
An electronic commerce schema might define a datatype called
In this case,
itemType
&iattribute;
or the
A
A system might want to store lists of floating point values.
In this case,
As mentioned in
regardless of the
For each of
memberTypes
&iattribute;, if any,
in order, followed by the
A
As an example, taken from a typical display oriented text markup language,
one might want to express font sizes as an integer between 8 and 72, or with
one of the tokens "small", "medium" or "large". The
this is a test
]]>
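The font-size union described above can be mimicked outside a schema processor. The sketch below assumes the two member types are an integer restricted to the range 8 to 72 and a token restricted to "small", "medium" or "large", tried in that order; the function name and the returned pair are hypothetical.

```python
import re

_INTEGER_LITERAL = re.compile(r"[+-]?[0-9]+")
_SIZE_TOKENS = ("small", "medium", "large")


def validate_font_size(literal: str):
    """Return (member type, value) for a valid literal, else None."""
    # First member type: an integer between 8 and 72 inclusive.
    if _INTEGER_LITERAL.fullmatch(literal):
        value = int(literal)
        if 8 <= value <= 72:
            return ("integer", value)
    # Second member type: one of the enumerated tokens.
    if literal in _SIZE_TOKENS:
        return ("token", literal)
    return None


for literal in ("18", "medium", "100", "huge"):
    print(literal, validate_font_size(literal))
```

Trying the member types in a fixed order matters only when a literal is valid against more than one of them, which cannot happen with these two member types.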
As mentioned in
regardless of the
Unless otherwise specifically allowed by this specification
(
Either the itemType
&iattribute; or the
Either the base
&iattribute; or the
simpleType
&ichild; of the
Either the memberTypes
&iattribute; of the simpleType
&ichild;.
A value in a
the value is facet-valid with respect to the particular
A string is datatype-valid with respect to a datatype definition if:
it
if
if
if
if
if
the value denoted by the literal
The
If
If
There is a simple type definition nearly equivalent to the simple version
of the
Every
for any
there is no pair
for all
for any
for any
for any
the
On every datatype, the operation Equal is defined in terms of the equality
property of the
Note that in consequence of the above:
given
two values which are members of the
if a datatype
if a datatype
if datatypes
There is no schema component corresponding to the
(
Schema components are identified by kind. Information
is not a kind of component. Each kind of ordered
,
bounded
, etc.) is realized as a separate kind of schema component.
An
The value of any
A
for no
for all
for all
The notation
A
for all
The fact that this specification does not define an
indicating whether an
Some datatypes have a nontrivial order relation associated with
their value spaces (see
A
Some of the real-world
datatypes which are the basis for those defined herein
are ordered in some applications, even though no order is prescribed for schema-processing
purposes. For example, lexical
orderings. They are
When
When
When
If every member of
If every member of
the
the
the
the
every member of the
each member of the
indicating whether a
Some ordered datatypes have the property that there is one value greater than or equal to
every other value, and another that is less than or equal to every other value. (In the case of derived
datatypes, these two values may not be in the value space of the derived datatype, but must be in the
value space of the primitive datatype from which they have been derived.) The
When
When
When the
It
is sometimes useful to categorize
indicating whether the
Every value space has a specific number of members. This number can be characterized as
When
When
one of
all of the following are true:
one of
one of
either of the following is true:
When the
the
at least one of
all of the following are true:
one of
one of
either of the following is true:
When the
When the
indicating whether a
Some value spaces are made up of things that are generally considered
When
When
When
The WG is considering the ramifications of removing the length &cfacet;, letting the schema document elements that currently set that facet set both minLength and maxLength instead.
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value
&iattribute;
fixed
&iattribute;, if present, otherwise false
A value in a
if the
if
if
if
if the
The use of
If
It is an error for
the
there is a type definition from which this one is derived by
one or more restriction steps in which
It is an error for
the
there is a type definition from which this one is derived by
one or more restriction steps in which
It is an
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value
&iattribute;
fixed
&iattribute;, if present, otherwise false
A value in a
if the
if
if
if
if the
The use of
If both
It is an
For
For
Constraining a
The following is the definition of a
If
The XML representation for a
value
&iattribute;
fixed
&iattribute;, if present, otherwise false
A value in a
if the
if
if
if
if the
The use of
It is an
Constraining a
The following is the definition of a
The XML representation for a
value
&iattribute;
If multiple
It is a consequence of the schema representation constraint
Thus, to impose two
A literal in a
the literal is among the set of character sequences denoted by
the
Constraining a
The following example is a datatype definition for a
The XML representation for an
value
&iattribute;
If multiple
A value in a
It is an
No normalization is done; the value is not changed (this is the
behavior required by
All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space)
After the processing implied by
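The three normalization behaviors listed above can be sketched as follows. This is a minimal, non-normative illustration: the function name `normalize_whitespace` is hypothetical, and the collapse step assumes the usual reading (runs of #x20 reduced to a single #x20, with leading and trailing #x20 removed), since that sentence is incomplete in the text above.

```python
import re


def normalize_whitespace(value: str, mode: str) -> str:
    """Apply one of the three behaviors: 'preserve', 'replace' or 'collapse'."""
    if mode == "preserve":
        # No normalization is done; the value is not changed.
        return value
    # replace: each #x9 (tab), #xA (line feed) and #xD (carriage return)
    # becomes one #x20 (space).
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # Assumption: after 'replace', runs of spaces collapse to one space
        # and leading/trailing spaces are removed.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError("mode must be 'preserve', 'replace' or 'collapse'")
```

Note that replace keeps the length of the string unchanged, while collapse may shorten it.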
The notation #xA used here (and elsewhere in this specification) represents
the Universal Character Set (UCS) code point hexadecimal A
(line feed), which is denoted by
U+000A. This notation is to be distinguished from 

,
which is the XML
collapse
and cannot be changed by a schema author; for
preserve
; for any type collapse
and cannot
be changed by a schema author. For all datatypes
For more information on
Constraining a
The following example is the datatype definition for
the
{preserve, replace, collapse}
.
If
The XML representation for a
value
&iattribute;
fixed
&iattribute;, if present, otherwise false
There are no
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value
&iattribute;
fixed
&iattribute;, if present, otherwise false
A value in an
if the
if the
It is an
It is an
Constraining a
The following is the definition of a
Note that the
If
The XML representation for a
value
&iattribute;
fixed
&iattribute;, if present, otherwise false
A value in an
if the
if the
It is an
It is an
It is an
Constraining a
The following is the definition of a
Note that the
If
The XML representation for a
value
&iattribute;
fixed
&iattribute;, if present, otherwise false
A value in an
if the
if the
It is an
It is an
It is an
Constraining a
The following is the definition of a
If
The XML representation for a
value
&iattribute;
fixed
&iattribute;, if present, otherwise false
A value in an
if the
if the
It is an
It is an
The term
If
The XML representation for a
value
&iattribute;
fixed
&iattribute;, if present, otherwise false
A value in a
that value is expressible as
It is an
The term
The following is the definition of a
If
The XML representation for a
value
&iattribute;
fixed
&iattribute;, if present, otherwise false
A value in a
that value is expressible as
It is an
It is an
This specification describes two levels of conformance for datatype processors. The first is required of all processors. Support for the other will depend on the application environments for which the processor is intended.
By separating the conformance requirements relating to the concrete
syntax of XML schema documents, this specification admits processors
which validate using schemas stored in optimized binary representations,
dynamically created schemas represented as programming language data
structures, or implementations in which particular schemas are compiled
into executable code such as C or Java. Such processors can be said to
be
All YYYY
) and a minimum fractional second precision of milliseconds or three decimal digits (i.e. s.sss
). However,
Some datatypes, such as
In this document, the arguments to functions are assumed to be
Properties always have values.
Those values that are more primitive, and are used (among other things) herein to
construct object value spaces but which we do not explicitly define are described here:
A
The following standard operators are defined here in case the reader is unsure of their definition:
n the greatest integer in n
.
Numbers are sometimes thought of as including both a numerical value and a
five plus-or-minus two
or two million to the
nearest thousand
.
There is a smaller class of plus-or-minus
in order to indicate their
precision. They indicate their precision by the number of digits to the
right of the decimal point. 5.0 has precision plus-or-minus 0.05, but
5.00 has precision plus-or-minus 0.005.
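The relationship between the number of fractional digits and the implied arithmetic precision can be illustrated with a short sketch. The function name `arithmetic_precision` is hypothetical and the code is not part of this specification; it simply takes half a unit in the last written digit.

```python
from decimal import Decimal


def arithmetic_precision(literal: str) -> Decimal:
    """Return the plus-or-minus bound implied by the digits of a decimal numeral."""
    # The exponent of the last written digit: -1 for "5.0", -2 for "5.00".
    exponent = Decimal(literal).as_tuple().exponent
    # Half a unit in the last place: 5 x 10^(exponent - 1).
    return Decimal(5).scaleb(exponent - 1)
```

Under this reading, "5.0" and "5.00" denote the same numerical value but carry different precisions.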
There is also a kind of precision
where the plus-or-minus is expressed as a percentage (or other proportion) of
the numerical value, rather than an exact value: 15 plus-or-minus
10 percent
or 15000 plus-or-minus 10 percent
, where the
same percentage indicates a different absolute precision depending on the
size. This kind of precision is properly called geometric
precision
; the absolute precision first described is properly called
arithmetic precision
.
A close approximation to geometric precision also can, for some combinations
of numerical value and precision, be indicated without the
plus-or-minus
: The precision is indicated by the total
number of digits (not counting leading zero digits). 5.0 has precision
plus-or-minus 1 percent but 5.00 has precision plus-or-minus one-tenth percent.
Geometric precision doesn't quite match with the digit count. 5.0 and 50
both have precision plus-or-minus 1 percent but 1.5 and 15 both have precision
plus-or-minus 3 percent. For various reasons we choose to call this digit-count
precision floating-point precision
.
The
One point needs to be made about the notations and the precisions they can
indicate. It's impossible for ordinary decimal notation to indicate a
positive arithmetic precision (as in one million to the nearest thousand
);
this needs
Much of the material defining the various date/time datatypes is found here and is or will be referenced
in the sections defining each individual date/time datatype. See e.g.
There are several different primitive but related datatypes defined in the specification which pertain to various combinations of dates and times, and parts thereof. They all use related value-space models, which are described in detail in this section. It is not difficult for a casual reader of the descriptions of the individual datatypes elsewhere in this specification to misunderstand some of the details of just what the datatypes are intended to represent, so more detail is presented here.
There are various concepts involving dates (counting days) and times (counting moments) that have developed over the millennia. This section does not pretend to be a complete tutorial on the history; it only discusses the methods which are necessary to understand just which set of the possible reasonable choices has been adopted for Schema date/time datatypes.
A day is, at least approximately, the time of one rotation of the Earth about
its axis with respect to the Sun. Each day is divided into 24 hours; each hour
into 60 minutes, and each minute zenith
)
Thus a day is (usually) 86400 (= 60 × 60 × 24) seconds.
real
time: One day is (exactly, or at least as close as can be astronomically
measured) one revolution of the Earth about its axis with respect to the Sun. The day is
divided into 86400 equal-length seconds, which may vary in length from day to day.
TAI seconds are all the same length, and there are exactly 86400 seconds in each day.
UT1 seconds vary in length, but there are exactly 86400 seconds each day. Days always have the sun at zenith at noon in Greenwich, England. (As a historical note, the TAI second, defined in 1956 in terms of the excitation frequency of Cesium atoms, was chosen to be the average length of a UT1 second during the year 1900.)
Noon on TAI days does not necessarily coincide with the Sun at the zenith. In 1958, TAI was promulgated and synchronized with UT1. Since then, the difference has been slowly increasing, with a given number of seconds from that date measured in UT1 coming later than that same number measured in TAI.
As of the writing of this specification, leap-seconds have been added to
Date  Number of Leap-seconds  Date  Number of Leap-seconds 
1960-12-31  1.422818  1975-12-31  1 
1961-07-31  0.224752  1976-12-31  1 
1961-01-31  0.198288  1977-12-31  1 
1963-10-30  0.8514208  1978-12-31  1 
1963-12-31  0.0685152  1979-12-31  1 
1964-03-31  0.217936  1981-06-30  1 
1964-08-31  0.298288  1982-06-30  1 
1964-01-31  0.258112  1983-06-30  1 
1965-02-28  0.176464  1985-06-30  1 
1965-06-30  0.258112  1987-12-31  1 
1965-08-31  0.180352  1989-12-31  1 
1965-12-31  0.158112  1990-12-31  1 
1968-01-31  1.872512  1992-06-30  1 
1971-12-31  3.814318  1993-06-30  1 
1972-06-30  1  1994-06-30  1 
1972-12-31  1  1995-12-31  1 
1973-12-31  1  1997-06-30  1 
1974-12-31  1  1998-12-31  1 
standard
times.
There are inherently no precise measurements of the difference between
UT1 on the one hand and proleptic (i.e., used to measure times prior to their
adoption) TAI and
Schema date/time datatypes (except
Once one decides on how many seconds are in each day, one must also count the days, as well as
the months and years. The standard used for Schema date/time datatypes is the so-called
Gregorian calendar
. Since days are (generally) 86400 seconds, and one
wants each year to correspond to one complete cycle of the Earth around the Sun (which
is not exactly a multiple of 86400 seconds), and traditionally months have various numbers
of days, the following algorithm was chosen to determine which days
fell in which months in which years: Counting from an agreed-upon arbitrary day, years are
numbered consecutively, each year has 12 months (numbered 1 through 12, as well as named)
within it, and each month has between 28 and 31 days (also numbered from 1), depending on
the month and year according to the following table:
Month  Nbr of Days 

1 (January)  31 
2 (February)  If the associated year is divisible by 400, or by 4 but not 100, then 29; otherwise 28 
3 (March)  31 
4 (April)  30 
5 (May)  31 
6 (June)  30 
7 (July)  31 
8 (August)  31 
9 (September)  30 
10 (October)  31 
11 (November)  30 
12 (December)  31 
For example, the three numbers (year, month, and day) for 20 January 2003 (2003-01-20) are 2003, 1, and 20 respectively.
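The table of day limits above can be expressed directly as a function of year and month. The following is an illustrative sketch only; the names `is_leap_year` and `days_in_month` are not defined by this specification.

```python
def is_leap_year(year: int) -> bool:
    """True when the year is divisible by 400, or by 4 but not by 100."""
    return year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)


def days_in_month(year: int, month: int) -> int:
    """Number of days in the given month (1 through 12) of the given year."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if month == 2:
        return 29 if is_leap_year(year) else 28
    # April, June, September and November have 30 days; the rest have 31.
    return 30 if month in (4, 6, 9, 11) else 31
```

For instance, February has 29 days in 2000 (divisible by 400) but 28 days in 1900 (divisible by 100 but not by 400).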
The following rewrite includes allowing year 0000 (1 BCE) and redefining all the lexical representations with negative years from that specified in Schema 1.0, as warned in a Note in Schema 1.0 2E. A formal Note calling attention to this change elsewhere in the "normative" part of this specification will be added.
The count of years, months, and days was made official and locked to real
time
by decree of (the Roman Catholic) Pope Gregory in 1582 (from which comes the name
Gregorian
). Since then, and somewhat even before, days had been counted
with reasonable historical accuracy so that the Gregorian calendar algorithm can even be
used proleptically, i.e., to establish dates prior to its official adoption.
By relatively recent convention (it began to be adopted by astronomers during the 1800s), there is a year
numbered zero; this makes calculating the difference between two dates easier. The year called
1 of the Common Era
(1 CE
, or 1 AD
) is numbered one; the
preceding year is numbered zero, not minus one. (Warning: The date using the proleptic
Gregorian calendar will not generally be the same for a given day as the date using the
Julian
calendar which was in common use prior to the adoption of the
Gregorian calendar, nor will Gregorian years before the Common Era
(BCE
,
or BC
) be numbered the same as with the current standard negative numbering.)
There are also standard
schemes for numbering days without reference to months and years. The most common is
Note that the JD
All of the preceding discussion applies to real
times
Greenwich meridian
, the meridian where longitude is 0
degreesstandard
time to get the local time. The
standard
time is selected to be that where noon is when the Sun is
exactly overhead at 0 degrees longitude;
A moment in time is like a point on a line; the point does not change if we change where we put zero on the line, but the number we use to represent that point changes. Similarly, when one specifies a moment in time, one can specify the same moment regardless of which timezone one specifies, but the numbers one uses for year, month, day, hour, minute, and second will be different.
There are two distinct ways to model moments in time: either by tracking their year,
month, day, hour, minute and second (with fractional seconds as needed), or by tracking
their time (measured generally in seconds or days) from some starting moment. Each has
its advantages. The two are isomorphic; the Gregorian calendar algorithm, modified for
There is also a seventh
The model just described is called herein the seven-property
model for date/time
datatypes. It is used
Leap-seconds are not permitted when
While calculating, property values from the
Values from any one date/time datatype using the seven-component model (all except
Each fragment other than
(The redundancy between
The following fragment
The more important functions and procedures defined here are summarized in the
text When there is a text summary, the name of the function in each is a
The following functions are used with various numeric and date/time datatypes.
0 when d =
1 when d =
2 when d =
−
−
etc.
s_{0} = i and
s_{j+1} = s_{j}
s_{0} = f − 10 , and
s_{j+1} = (s_{j}
For example:
123.4567
n when F is present, and
0 otherwise.
−
Set pD's
set pD's
0 when LEX is a
Set pD's
y be
m be
h be
m be
s be
d be
t be
0 if Y is not present,
−
0 if D is not present,
−
−
−
y be ym
m be ym
the empty string (
the empty string (
the empty string (
the empty string (
the empty string (
d is
ss
h is
(ss
m is
(ss
s is
ss
m be v's
s be v's
sgn be
sgn &concat;
sgn &concat;
sgn &concat;
m be ym's
sgn be
s be dt's
sgn be
When adding and subtracting numbers from date/time properties, the immediate results may not conform
to the limits specified. Accordingly, the following procedures are used to
Add (mo − 1)
Set mo to (mo − 1)
Repeat until da is positive and not greater than the limit specified in the table of day
limits in
If da exceeds the upper limit from the table then:
Subtract that limit from da.
Add 1 to mo.
If da is not positive then:
Subtract 1 from mo.
Add the new upper limit from the table to da.
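The month and day normalization steps above can be sketched as follows. This is a non-normative illustration under stated assumptions: the operators in the two month steps are incomplete in the text above and are assumed here to be integer division by 12 and remainder modulo 12 (plus 1), and the month is assumed to be re-normalized each time the day loop changes it. The names `normalize_month`, `normalize_day` and `days_in_month` are hypothetical.

```python
def days_in_month(yr: int, mo: int) -> int:
    """Upper day limit from the table of day limits (mo must be 1 through 12)."""
    if mo == 2:
        leap = yr % 400 == 0 or (yr % 4 == 0 and yr % 100 != 0)
        return 29 if leap else 28
    return 30 if mo in (4, 6, 9, 11) else 31


def normalize_month(yr: int, mo: int) -> tuple:
    """Bring mo into 1..12, carrying whole years into yr."""
    # Assumption: "Add (mo - 1) div 12 to yr" and "Set mo to (mo - 1) mod 12 + 1".
    yr += (mo - 1) // 12
    mo = (mo - 1) % 12 + 1
    return yr, mo


def normalize_day(yr: int, mo: int, da: int) -> tuple:
    """Bring da into the range allowed for (yr, mo), adjusting mo and yr."""
    yr, mo = normalize_month(yr, mo)
    # Repeat until da is positive and not greater than the limit for the month.
    while not 1 <= da <= days_in_month(yr, mo):
        limit = days_in_month(yr, mo)
        if da > limit:
            da -= limit
            mo += 1
            yr, mo = normalize_month(yr, mo)
        else:
            # da is not positive: step back a month, then add its limit.
            mo -= 1
            yr, mo = normalize_month(yr, mo)
            da += days_in_month(yr, mo)
    return yr, mo, da
```

For example, under these assumptions day 0 of March 2003 normalizes to 28 February 2003, and day 32 of January 2003 normalizes to 1 February 2003.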
Add mi
Set mi to mi
Add hr
Set hr to hr
Add se
Set se to se
Add 60 × mi + 3600 × hr to se .
Set mi and hr to zero.
Repeat until se is nonnegative and less than 86400 plus the number of leap-seconds
specified by the leap-second table in
If se equals or exceeds 86400 plus the upper limit from the table then:
Subtract (86400 plus that leap-second count) from se.
Add 1 to da.
If se is negative then:
Subtract 1 from da.
Add 86400 plus the new leap-second count from the table to se.
If se is less than 86340 then:
Set mi to se
Set se to se
If se is not less than 86340 then:
Set mi to 1439.
Subtract 86340 from se.
The
yr be 1971 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day limits in
hr be 0 or dt's
mi be 0 or dt's
Add
If
yr be 1971 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day limits in
hr be 0 or dt's
mi be 0 or dt's
Add
If
yr be 1971 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day limits in
hr be 0 or dt's
mi be 0 or dt's
Add
If
yr be 1971 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day limits in
hr be 0 or dt's
mi be 0 or dt's
Add
If
yr be 1971 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day limits in
hr be 0 or dt's
mi be 0 or dt's
Add
If
yr be rawYear when rawYear is not
mo be rawMo, dt's
da be rawDa, dt's
hr be rawHr, dt's
mi be rawMi, dt's
se be rawSe, dt's
If dt's
Subtract
Set
yr be 1970 when dt's
mo be 12 or dt's
da be (the limit specified in the table of day limits) − 1
or (dt's
hr be 0 or dt's
mi be 0 or dt's
(
Set ToTl to 31536000 × yr .
(Leapyear Days,
Add 86400 ×
(yr
Add 86400 × (total number of days in months less than
mo, from table in
Add 86400 × da to ToTl.
(
Add 3600 × hr + 60 × mi + se to ToTl.
Return ToTl.
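The accumulation of ToTl described above can be sketched as follows for the simple case in which every property is present and no timezone adjustment applies. Several details are assumptions, because the corresponding steps are incomplete in the text above: yr is taken to be the year minus 1, da the day minus 1, and the leap-year day count is taken to be (yr div 400) − (yr div 100) + (yr div 4). Leap-seconds are ignored. The function names are hypothetical.

```python
def days_in_month(year: int, month: int) -> int:
    """Upper day limit from the table of day limits."""
    if month == 2:
        leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def time_on_timeline(year, month, day, hour=0, minute=0, second=0):
    """Seconds elapsed since 0001-01-01T00:00:00 (no timezone, no leap-seconds)."""
    yr = year - 1   # assumption: whole years completed before this one
    da = day - 1    # assumption: whole days completed before this one
    # Year: 365 days of 86400 seconds each.
    to_tl = 31536000 * yr
    # Leap-year days (assumed formula, see lead-in).
    to_tl += 86400 * (yr // 400 - yr // 100 + yr // 4)
    # Days in the months of this year that precede mo.
    to_tl += 86400 * sum(days_in_month(year, m) for m in range(1, month))
    # Days, then hour, minute and second.
    to_tl += 86400 * da
    to_tl += 3600 * hour + 60 * minute + second
    return to_tl
```

Because the result is a single number, two values can be compared simply by comparing their positions on the timeline.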
0 when TZ is
−(
Set gD's
Return gD.
The following table shows the values of the fundamental facets
for each
The
C  represents a digit used in the thousands and hundreds components, the "century" component, of the time element "year". Legal values are from 0 to 9.
Y  represents a digit used in the tens and units components of the time element "year". Legal values are from 0 to 9.
M  represents a digit used in the time element "month". The two digits in a MM format can have values from 1 to 12.
D  represents a digit used in the time element "day". The two digits in a DD format can have values from 1 to 28 if the month value equals 2, 1 to 29 if the month value equals 2 and the year is a leap year, 1 to 30 if the month value equals 4, 6, 9 or 11, and 1 to 31 if the month value equals 1, 3, 5, 7, 8, 10 or 12.
h  represents a digit used in the time element "hour". The two digits in a hh format can have values from 0 to 24. If the value of the hour element is 24 then the values of the minutes element and the seconds element must be 00 and 00.
m  represents a digit used in the time element "minute". The two digits in a mm format can have values from 0 to 59.
s  represents a digit used in the time element "second". The two
digits in a ss format can have values from 0 to 60. In the formats
described in this specification the whole number of seconds
Strictly speaking, a value of
60 or more is not sensible unless the month and day could
represent March 31, June 30, September 30, or December 31
For all the information items indicated by the above characters, leading zeros are required where indicated.
In addition to the above, certain characters are used as designators and appear as themselves in lexical formats.
T  is used as time designator to indicate the start of the
representation of the time of day in
Z  is used as timezone designator, immediately (without a space)
following a data element expressing the time of day in Coordinated
Universal Time (
In the lexical format for
P  is used as the time duration designator, preceding a data element representing a given duration of time.
Y  follows the number of years in a time duration.
M  follows the number of months or minutes in a time duration.
D  follows the number of days in a time duration.
H  follows the number of hours in a time duration.
S  follows the number of seconds in a time duration.
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary integer. Similarly, the value of the Seconds component
allows an arbitrary decimal. Thus, the lexical format for
An optional minus sign is allowed immediately preceding, without a space,
the lexical representations for
The year "0000" is an illegal year value.
To accommodate year values greater than 9999, more than four digits are
allowed in the year representations of
The lexical representations for the datatypes
Given a
fQuotient(a, b) = the greatest integer less than or equal to a/b
fQuotient(-1,3) = -1
fQuotient(0,3)...fQuotient(2,3) = 0
fQuotient(3,3) = 1
fQuotient(3.123,3) = 1
modulo(a, b) = a - fQuotient(a,b)*b
modulo(-1,3) = 2
modulo(0,3)...modulo(2,3) = 0...2
modulo(3,3) = 0
modulo(3.123,3) = 0.123
fQuotient(a, low, high) = fQuotient(a - low, high - low)
fQuotient(0, 1, 13) = -1
fQuotient(1, 1, 13) ... fQuotient(12, 1, 13) = 0
fQuotient(13, 1, 13) = 1
fQuotient(13.123, 1, 13) = 1
modulo(a, low, high) = modulo(a - low, high - low) + low
modulo(0, 1, 13) = 12
modulo(1, 1, 13) ... modulo(12, 1, 13) = 1...12
modulo(13, 1, 13) = 1
modulo(13.123, 1, 13) = 1.123
maximumDayInMonthFor(yearValue, monthValue) =
M := modulo(monthValue, 1, 13)
Y := yearValue + fQuotient(monthValue, 1, 13)
Return a value based on M and Y:
31  M = January, March, May, July, August, October, or December  
30  M = April, June, September, or November  
29  M = February AND (modulo(Y, 400) = 0 OR (modulo(Y, 100) != 0) AND modulo(Y, 4) = 0)  
28  Otherwise 
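The helper functions defined above translate almost directly into code. The sketch below is illustrative only; because Python has no overloading by argument count, the three-argument forms are given the hypothetical names `f_quotient_range` and `modulo_range`.

```python
import math


def f_quotient(a, b):
    """The greatest integer less than or equal to a/b."""
    return math.floor(a / b)


def modulo(a, b):
    """a - fQuotient(a, b) * b; the result always lies in [0, b) for b > 0."""
    return a - f_quotient(a, b) * b


def f_quotient_range(a, low, high):
    """Three-argument fQuotient(a, low, high)."""
    return f_quotient(a - low, high - low)


def modulo_range(a, low, high):
    """Three-argument modulo(a, low, high); the result lies in [low, high)."""
    return modulo(a - low, high - low) + low


def maximum_day_in_month_for(year_value, month_value):
    """Day limit for a month value that may lie outside 1..12."""
    m = modulo_range(month_value, 1, 13)
    y = year_value + f_quotient_range(month_value, 1, 13)
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    # m is February.
    if modulo(y, 400) == 0 or (modulo(y, 100) != 0 and modulo(y, 4) == 0):
        return 29
    return 28
```

Allowing out-of-range month values is what lets the addition algorithm ask for the length of "month 0" (December of the previous year) without a special case.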
Essentially, this calculation is equivalent to separating D into <year,month>
and <day,hour,minute,second> fields. The <year,month> is added to S.
If the day is out of range, it is
Leap seconds are handled by the computation by treating them as overflows. Essentially, a value of 60 seconds in S is treated as if it were a duration of 60 seconds added to S (with a zero seconds field). All calculations thereafter use 60 seconds per minute.
Thus the addition of either PT1M or PT60S to any dateTime will always produce the same result. This is a special definition of addition which is designed to match common practice and, most importantly, to be stable over time.
A definition that attempted to take leap-seconds into account would need to
be constantly updated, and could not predict the results of future
implementations' additions. The decision to introduce a leap second in
The following is the precise specification. These steps must be followed in the same order. If a field in D is not specified, it is treated as if it were zero. If a field in S is not specified, it is treated in the calculation as if it were the minimum allowed value in that field; however, after the calculation is concluded, the corresponding field in E is removed (set to unspecified).
temp := S[month] + D[month]
E[month] := modulo(temp, 1, 13)
carry := fQuotient(temp, 1, 13)
E[year] := S[year] + D[year] + carry
E[zone] := S[zone]
temp := S[second] + D[second]
E[second] := modulo(temp, 60)
carry := fQuotient(temp, 60)
temp := S[minute] + D[minute] + carry
E[minute] := modulo(temp, 60)
carry := fQuotient(temp, 60)
temp := S[hour] + D[hour] + carry
E[hour] := modulo(temp, 24)
carry := fQuotient(temp, 24)
if S[day] > maximumDayInMonthFor(E[year], E[month])
tempDays := maximumDayInMonthFor(E[year], E[month])
else if S[day] < 1
tempDays := 1
else
tempDays := S[day]
E[day] := tempDays + D[day] + carry
START LOOP
IF E[day] < 1
E[day] := E[day] + maximumDayInMonthFor(E[year], E[month] - 1)
carry := -1
ELSE IF E[day] > maximumDayInMonthFor(E[year], E[month])
E[day] := E[day] - maximumDayInMonthFor(E[year], E[month])
carry := 1
ELSE EXIT LOOP
temp := E[month] + carry
E[month] := modulo(temp, 1, 13)
E[year] := E[year] + fQuotient(temp, 1, 13)
GOTO START LOOP
dateTime  duration  result 

2000-01-12T12:13:14Z  P1Y3M5DT7H10M3.3S  2001-04-17T19:23:17.3Z 
2000-01  -P3M  1999-10 
2000-01-12  PT33H  2000-01-13 
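The addition steps and the examples in the table above can be checked with a short sketch. It is not normative. It assumes that S and D are passed as six-field tuples (year, month, day, hour, minute, second) with unspecified fields already replaced by their defaults, that a negative duration is expressed by negating every field, and that the day adjustment repeats until the day falls within the month. The timezone is carried through unchanged by the steps above and is omitted here. All function names are hypothetical.

```python
import math
from decimal import Decimal


def f_quotient(a, b):
    return math.floor(a / b)


def modulo(a, b):
    return a - f_quotient(a, b) * b


def f_quotient_range(a, low, high):
    return f_quotient(a - low, high - low)


def modulo_range(a, low, high):
    return modulo(a - low, high - low) + low


def max_day_in_month(year, month):
    m = modulo_range(month, 1, 13)
    y = year + f_quotient_range(month, 1, 13)
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    leap = modulo(y, 400) == 0 or (modulo(y, 100) != 0 and modulo(y, 4) == 0)
    return 29 if leap else 28


def add_duration(start, duration):
    """Add a duration to a dateTime; both are (year, month, day, hour, minute, second)."""
    s_yr, s_mo, s_da, s_hr, s_mi, s_se = start
    d_yr, d_mo, d_da, d_hr, d_mi, d_se = duration
    # Months and years.
    temp = s_mo + d_mo
    month = modulo_range(temp, 1, 13)
    year = s_yr + d_yr + f_quotient_range(temp, 1, 13)
    # Seconds, minutes, hours, each carrying into the next field.
    temp = s_se + d_se
    second, carry = modulo(temp, 60), f_quotient(temp, 60)
    temp = s_mi + d_mi + carry
    minute, carry = modulo(temp, 60), f_quotient(temp, 60)
    temp = s_hr + d_hr + carry
    hour, carry = modulo(temp, 24), f_quotient(temp, 24)
    # Days: first pin the starting day into the new month.
    limit = max_day_in_month(year, month)
    temp_days = limit if s_da > limit else (1 if s_da < 1 else s_da)
    day = temp_days + d_da + carry
    # Assumption: repeat until the day lies within the current month.
    while True:
        if day < 1:
            day += max_day_in_month(year, month - 1)
            carry = -1
        elif day > max_day_in_month(year, month):
            day -= max_day_in_month(year, month)
            carry = 1
        else:
            break
        temp = month + carry
        month = modulo_range(temp, 1, 13)
        year += f_quotient_range(temp, 1, 13)
    return (year, month, day, hour, minute, second)
```

Using `Decimal` for a fractional seconds field, as in `add_duration((2000, 1, 12, 12, 13, 14), (1, 3, 5, 7, 10, Decimal("3.3")))`, avoids binary floating-point rounding in the seconds result.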
Time durations are added by simply adding each of their fields, respectively, without overflow.
The order of addition of durations to instants
((dateTime + duration1) + duration2) != ((dateTime + duration2) + duration1)
(2000-03-30 + P1D) + P1M = 2000-03-31 + P1M = 2000-04-30
(2000-03-30 + P1M) + P1D = 2000-04-30 + P1D = 2000-05-01
A
Unlike some popular regular expression languages (including those
defined by Perl and standard Unix utilities), the regular
expression language defined here implicitly anchors all regular
expressions at the head and tail, as the most common use of
regular expressions in A
(#x41) and end with the character
Z
(#x5a) would be defined as follows:
In regular expression languages that are not implicitly anchored at the head and tail, it is customary to write the equivalent regular expression as:
^A.*Z$
where "^" anchors the pattern at the head and "$" anchors at the tail.
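The effect of implicit anchoring can be illustrated with Python's `re` module, whose `fullmatch` function requires the whole string to match, while `search` accepts a match anywhere. This is only an analogy: Python's regular expression syntax differs from the language defined here in several respects (for example, its `.` excludes only the line feed character), and the function name `matches_schema_pattern` is hypothetical.

```python
import re


def matches_schema_pattern(pattern: str, value: str) -> bool:
    """Match with implicit anchoring at head and tail, like ^pattern$."""
    return re.fullmatch(pattern, value) is not None


# Implicitly anchored: the value must begin with A and end with Z.
anchored_ok = matches_schema_pattern("A.*Z", "ABCZ")
anchored_fail = matches_schema_pattern("A.*Z", "xABCZy")

# Unanchored search finds the same pattern inside a longer string.
unanchored_ok = re.search("A.*Z", "xABCZy") is not None
```

Here `anchored_ok` and `unanchored_ok` are true and `anchored_fail` is false, which shows why an unanchored match needs `.*` written explicitly at both ends.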
In those rare cases where an unanchored match is desired, including
.*
at the beginning and ending of the regular expression will
achieve the desired results. For example, a datatype A
(#x41
) characters somewhere within the value could be defined as follows:

characters.
For all 
Denoting the set of strings 

(empty string)  the set containing just the empty string 
all strings in 

all strings in 
For all 
Denoting the set of strings 

all strings in 

all strings 
For all 
Denoting the set of strings 

all strings in 

the empty string, and all strings in


All strings in 

All strings 

All strings 

All strings in 

All strings in L(S{n}S*) 

All strings 

The set containing only the empty string 
The regular expression language in the Perl Programming Language
S{,m}
, since it is logically equivalent to S{0,m}
.
We have, therefore, left this logical possibility out of the regular
expression language defined by this specification.
?
, *
, +
,
{n,m}
or {n,}
, which have the meanings
defined in the table above.
For all 
Denoting the set of strings 

the single string consisting only of 

all strings in 

( 
all strings in 
.
, \
, ?
,
*
, +
, {
, }
(
, )
, [
or ]
.
These characters have special meanings in
Note that a
A character class is either a
[
and ]
characters. For all character
groups
For all 
Identifying the set of characters 

all characters in 

all characters in 

all characters in 

all characters in 
^
character.
For all

character.
For any
A single XML character is a
The [
, ]
, 
and \
characters are not
valid character ranges;
The ^
character is only valid at the beginning of a
A
\
If s is the first character in a ^
\
or [
; and
The code point of
The code point of a
The valid 
Identifying the set of characters 

\n 
the newline character (#xA) 
\r 
the return character (#xD) 
\t 
the tab character (#x9) 
\\ 
\ 
\ 
 
\. 
. 
\ 
 
\^ 
^ 
\? 
? 
\* 
* 
\+ 
+ 
\{ 
{ 
\} 
} 
\( 
( 
\) 
) 
\[ 
[ 
\] 
] 
X
,
can be identified with a \p{X}
.
The complement of this set is specified with the
\P{X}
.
([\P{X}]
= [^\p{X}]
).
The following table specifies the recognized values of the "General Category" property.
Category  Property  Meaning 

Letters  L  All Letters 
Lu  uppercase  
Ll  lowercase  
Lt  titlecase  
Lm  modifier  
Lo  other  
Marks  M  All Marks 
Mn  nonspacing  
Mc  spacing combining  
Me  enclosing  
Numbers  N  All Numbers 
Nd  decimal digit  
Nl  letter  
No  other  
Punctuation  P  All Punctuation 
Pc  connector  
Pd  dash  
Ps  open  
Pe  close  
Pi  initial quote (may behave like Ps or Pe depending on usage)  
Pf  final quote (may behave like Ps or Pe depending on usage)  
Po  other  
Separators  Z  All Separators 
Zs  space  
Zl  line  
Zp  paragraph  
Symbols  S  All Symbols 
Sm  math  
Sc  currency  
Sk  modifier  
So  other  
Other  C  All Others 
Cc  control  
Cf  format  
Co  private use  
Cn  not assigned 
The properties mentioned above exclude the Cs
property.
The Cs
property identifies "surrogate" characters, which do not
occur at the level of the "character abstraction" that XML instance documents
operate on.
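Membership in a general category can be approximated with Python's `unicodedata` module, as in the following sketch. The function name `in_category` is hypothetical, and the Unicode database version bundled with a given Python release may differ from the one this specification refers to, so results for recently assigned characters are not guaranteed to agree.

```python
import unicodedata


def in_category(ch: str, prop: str) -> bool:
    """Approximate a category escape for prop such as 'L', 'Lu' or 'Nd'."""
    category = unicodedata.category(ch)
    # Surrogates (Cs) are excluded, as noted in the text above.
    if category == "Cs":
        return False
    # A one-letter property such as 'L' covers all of Lu, Ll, Lt, Lm and Lo.
    return category.startswith(prop)


def not_in_category(ch: str, prop: str) -> bool:
    """The complement escape: true exactly when in_category is false."""
    return not in_category(ch, prop)
```

A one-letter property is therefore the union of the two-letter properties that share its first letter.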
X
(with all white space stripped out),
can be identified with a \p{IsX}
.
The complement of this set is specified with the
\P{IsX}
.
([\P{IsX}]
= [^\p{IsX}]
).
The following table specifies the recognized block names (for more
information, see the "Blocks.txt" file in
Start Code  End Code  Block Name  Start Code  End Code  Block Name  

#x0000  #x007F  BasicLatin  #x0080  #x00FF  Latin-1Supplement  
#x0100  #x017F  LatinExtended-A  #x0180  #x024F  LatinExtended-B  
#x0250  #x02AF  IPAExtensions  #x02B0  #x02FF  SpacingModifierLetters  
#x0300  #x036F  CombiningDiacriticalMarks  #x0370  #x03FF  Greek  
#x0400  #x04FF  Cyrillic  #x0530  #x058F  Armenian  
#x0590  #x05FF  Hebrew  #x0600  #x06FF  Arabic  
#x0700  #x074F  Syriac  #x0780  #x07BF  Thaana  
#x0900  #x097F  Devanagari  #x0980  #x09FF  Bengali  
#x0A00  #x0A7F  Gurmukhi  #x0A80  #x0AFF  Gujarati  
#x0B00  #x0B7F  Oriya  #x0B80  #x0BFF  Tamil  
#x0C00  #x0C7F  Telugu  #x0C80  #x0CFF  Kannada  
#x0D00  #x0D7F  Malayalam  #x0D80  #x0DFF  Sinhala  
#x0E00  #x0E7F  Thai  #x0E80  #x0EFF  Lao  
#x0F00  #x0FFF  Tibetan  #x1000  #x109F  Myanmar  
#x10A0  #x10FF  Georgian  #x1100  #x11FF  HangulJamo  
#x1200  #x137F  Ethiopic  #x13A0  #x13FF  Cherokee  
#x1400  #x167F  UnifiedCanadianAboriginalSyllabics  #x1680  #x169F  Ogham  
#x16A0  #x16FF  Runic  #x1780  #x17FF  Khmer  
#x1800  #x18AF  Mongolian  #x1E00  #x1EFF  LatinExtendedAdditional  
#x1F00  #x1FFF  GreekExtended  #x2000  #x206F  GeneralPunctuation  
#x2070  #x209F  SuperscriptsandSubscripts  #x20A0  #x20CF  CurrencySymbols  
#x20D0  #x20FF  CombiningMarksforSymbols  #x2100  #x214F  LetterlikeSymbols  
#x2150  #x218F  NumberForms  #x2190  #x21FF  Arrows  
#x2200  #x22FF  MathematicalOperators  #x2300  #x23FF  MiscellaneousTechnical  
#x2400  #x243F  ControlPictures  #x2440  #x245F  OpticalCharacterRecognition  
#x2460  #x24FF  EnclosedAlphanumerics  #x2500  #x257F  BoxDrawing  
#x2580  #x259F  BlockElements  #x25A0  #x25FF  GeometricShapes  
#x2600  #x26FF  MiscellaneousSymbols  #x2700  #x27BF  Dingbats  
#x2800  #x28FF  BraillePatterns  #x2E80  #x2EFF  CJKRadicalsSupplement  
#x2F00  #x2FDF  KangxiRadicals  #x2FF0  #x2FFF  IdeographicDescriptionCharacters  
#x3000  #x303F  CJKSymbolsandPunctuation  #x3040  #x309F  Hiragana  
#x30A0  #x30FF  Katakana  #x3100  #x312F  Bopomofo  
#x3130  #x318F  HangulCompatibilityJamo  #x3190  #x319F  Kanbun  
#x31A0  #x31BF  BopomofoExtended  #x3200  #x32FF  EnclosedCJKLettersandMonths  
#x3300  #x33FF  CJKCompatibility  #x3400  #x4DB5  CJKUnifiedIdeographsExtensionA  
#x4E00  #x9FFF  CJKUnifiedIdeographs  #xA000  #xA48F  YiSyllables  
#xA490  #xA4CF  YiRadicals  #xAC00  #xD7A3  HangulSyllables  
#xE000  #xF8FF  PrivateUse  
#xF900  #xFAFF  CJKCompatibilityIdeographs  #xFB00  #xFB4F  AlphabeticPresentationForms  
#xFB50  #xFDFF  ArabicPresentationFormsA  #xFE20  #xFE2F  CombiningHalfMarks  
#xFE30  #xFE4F  CJKCompatibilityForms  #xFE50  #xFE6F  SmallFormVariants  
#xFE70  #xFEFE  ArabicPresentationFormsB  #xFEFF  #xFEFF  Specials  
#xFF00  #xFFEF  HalfwidthandFullwidthForms  #xFFF0  #xFFFD  Specials 
The blocks mentioned above exclude the HighSurrogates, LowSurrogates
and HighPrivateUseSurrogates blocks.
These blocks identify "surrogate" characters, which do not
occur at the level of the "character abstraction" that XML instance documents
operate on.
For example, the block escape for identifying the ASCII characters is \p{IsBasicLatin}.
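As an illustration of how the block table can be read, the following minimal sketch looks up the block of a single character. It assumes only a handful of rows copied from the table above; `BLOCKS`, `block_of` and `matches_block_escape` are hypothetical names introduced here and are not part of this specification.

```python
# A few (start, end, name) rows copied from the block table above.
# This is deliberately a subset; a real processor would carry the full table.
BLOCKS = [
    (0x0900, 0x097F, "Devanagari"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x10A0, 0x10FF, "Georgian"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJKUnifiedIdeographs"),
    (0xAC00, 0xD7A3, "HangulSyllables"),
    (0xE000, 0xF8FF, "PrivateUse"),
]


def block_of(ch):
    # Return the block name for one character, or None if no row in BLOCKS covers it.
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None


def matches_block_escape(ch, block_name):
    # Hypothetical helper: emulate \p{IsBlockName} for a single character.
    return block_of(ch) == block_name
```

Under these assumptions, `matches_block_escape("\u3042", "Hiragana")` holds, while a surrogate code point such as #xD800 falls between HangulSyllables and PrivateUse and is covered by no row, which is consistent with the exclusion of the surrogate blocks noted above.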
Character sequence  Equivalent 

.  [^\n\r] 
\s  [#x20\t\n\r] 
\S  [^\s] 
\i  the set of initial name characters, those
\I  [^\i] 
\c  the set of name characters, those
\C  [^\c] 
\d  \p{Nd} 
\D  [^\d] 
\w  [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)
\W  [^\w] 
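The equivalences for the dot, \s, \d and \w can be sketched as single-character predicates. This is a non-normative illustration: the function names are hypothetical, and it assumes that the general categories reported by Python's `unicodedata` module (which depend on the Unicode version shipped with the interpreter) are an acceptable stand-in for the character properties this specification refers to. The \i and \c escapes are omitted because they depend on the XML name character definitions.

```python
import unicodedata


def is_dot(ch):
    # "." is equivalent to [^\n\r]
    return ch not in "\n\r"


def is_s(ch):
    # \s is equivalent to [#x20\t\n\r]
    return ch in " \t\n\r"


def is_d(ch):
    # \d is equivalent to \p{Nd} (decimal digits in any script)
    return unicodedata.category(ch) == "Nd"


def is_w(ch):
    # \w is every character except those whose general category is
    # punctuation (P*), separator (Z*) or other (C*)
    return unicodedata.category(ch)[0] not in "PZC"
```

Each predicate expects a one-character string. The negated escapes \S, \D and \W are simply the complements of these. Note that under this definition the underscore (category Pc) is not matched by \w, which differs from the \w of many programming-language regular expression libraries.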
The
A number of proposals for final wording satisfying various approved requirements for Schema 1.1 are included in the prose
of this document. Only one has been formally
accepted by the WG: the rewrite of
The model of an abstract datatype is being made more precise and explicit. This responds not only to the general aim of being more precise and explicit
but also to a specific formal requirement to redo the handling of facets.
The
Units of length have been selected for all datatypes that are permitted the length constraining facet.
The
The seven-property model rewrite of the date/time datatype descriptions includes a carefully crafted definition of order.
It ensures that, for repeating datatypes (time, gDay, etc.), timezoned values are compared as though they are on
the same "calendar day" ("raw" property values), so that in any given timezone the days start at "raw" 00:00:00 and
end just short of "raw" 24:00:00. Days do not run from 00:00:00Z to 24:00:00Z in timezones other than Z. This covers
the requirements of
In addition to the changes already made, the Working Group has decided on a number of further changes which have not yet been reflected in this draft. These are indicated throughout the text as issues, including more or less detail on the intended resolution. The ones remaining in this draft are summarized below, linked to their occurrence in the text above, where more detail can be found, including links to the original requirement or other point of origin.
The listing below is for the benefit of readers of a printed version of this document: it collects together all the definitions which appear in the document above.
Co-editor Ashok Malhotra's work on this specification from March 1999 until
February 2001 was supported by IBM.
The editors acknowledge the members of the XML Schema Working Group, the members of other W3C Working Groups, and industry experts in other
forums who have contributed directly or indirectly to the process or content of
creating this document.
The current members of the XML Schema Working Group are:
The XML Schema Working Group has benefited in its work from the
participation and contributions of a number of people