XML Schema Part 2: Datatypes
W3C Proposed Recommendation 16 March 2001 (datatypes-20010316)
This version:
http://www.w3.org/TR/2001/PR-xmlschema-2-20010316/
(in XML and HTML, with a schema and DTD including datatype definitions,
as well as a schema for built-in datatypes only, in a separate namespace.)
Previous version:
http://www.w3.org/TR/2000/CR-xmlschema-2-20001024/
Latest version:
http://www.w3.org/TR/xmlschema-2/
Editors:
Paul V. Biron (Kaiser Permanente, for Health Level Seven) Paul.V.Biron@kp.org
Ashok Malhotra (Microsoft, formerly of IBM) ashokma@microsoft.com
This specification of the XML Schema language is a Proposed
Recommendation of the World Wide Web Consortium. This means that
the specification is stable and that implementation experience has
been gathered showing that each feature of the specification can be
implemented. After review by the Consortium's Advisory Committee,
this specification will either be published as a Recommendation, or
(if review shows further changes are required) republished as a
Candidate Recommendation or as a Working Draft.
Implementors should note that this part of this specification makes a
normative reference to the current version of the Unicode Database,
which specifies properties for characters on which the regular
expression language defined here relies. A new version of the
Unicode Database is expected to appear between the time this
Proposed Recommendation is published and the time it becomes a W3C
Recommendation; it is expected that the normative reference to the
Unicode Database will be updated accordingly.
The deadline for review of this document is Monday 16 April 2001.
Technical and editorial comments should be sent to the publicly
archived www-xml-schema-comments@w3.org mailing list.
This document has been produced as part of the W3C XML Activity. The
authors of this document are the XML Schema WG members. Different
parts of this specification have different editors.
There have been no declarations regarding patents related to this
specification within the XML Schema Working Group.
A list of current W3C Recommendations and other technical documents can be found at
http://www.w3.org/TR/.
XML Schema: Datatypes is part 2 of the specification of the XML
Schema language. It defines facilities for defining datatypes to be used
in XML Schemas as well as other XML specifications.
The datatype language, which is itself represented in
XML 1.0, provides a superset of the capabilities found in XML 1.0
document type definitions (DTDs) for specifying datatypes on elements
and attributes.
2000-09-28: PVB: fixed syntax errors in example schemas for "Derivation by Union"
and "enumeration" facet.
2000-09-28: PVB: fixed typos in content models of restriction, list and union
in section "XML Representation of Datatype Definitions". Still need to fix stylesheet
to correctly generate "List of QName" for the type of the memberTypes attribute
on union.
2000-09-28: PVB: fixed typo in section on equality, where "restriction" was left
out of the final sentence, beginning "By definition".
2000-09-28: PVB: added appropriate definitions for a list's "itemType" and
a union's "memberTypes".
2000-09-28: PVB: folded old Constraint on Schemas: length and maxLength into
the existing Constraint on Schemas: length and minLength
2000-09-28: PVB: fixed many typos as reported by Wayne Carr in post on
2000-09-17.
2000-09-29: PVB: fixed NOTATION datatype, by requiring at least one enumeration
facet and further requiring that all enumeration facets name a declared notation.
Folded the old "NOTATION declared" constraint into a new COS: "enumeration required
for NOTATION"
2000-09-29: PVB: changed SVC "encoding required" to a COS.
2000-09-29: PVB: implemented WG decision in LC-7: minimum number of
decimal digits for precision.
2000-09-29: PVB: started removing inconsistencies introduced by the presence
of list and union as derivation methods: i.e., it is no longer the case that
all derived types have a base type, it is only those types derived by
restriction that do (lists have itemType's, while unions have memberTypes).
Still have much more to clean up in this regard though, including rework in
sections 4.1 and 5.1.
2000-09-29: PVB: updated the schema and datatypes namespaces to be consistent
with the Hawthorne votes
2000-10-02: AM: fixed value space for recurring duration.
2000-10-02: AM: added info about timeDuration to Appendix D.
2000-10-02: AM: rewrote order property for timeDuration and recurringDuration.
2000-10-02: AM: added canonical lexical forms for list and union.
2000-10-04: PVB: minor editorial fix in prose describing recurringDuration
and timeDuration
2000-10-04: PVB: fixed lexical space of NOTATION to be the set of names of
declared NOTATIONs and added the fact that NOTATION is derived from QName.
2000-10-05: PVB: added S{n} to the regex language, which should have been
there all the time (equiv to S{n,n})
2000-10-06: PVB: added whiteSpace facet, component def and XML Rep for it.
2000-10-06: PVB: changed the initial wording of CVC Datatype Valid to
say that "a string is..." instead of "a sequence of char info items is...".
Makes the spec more generally applicable.
2000-10-06: PVB: fleshed out schema components and XML representation/
property mapping, to incorporate derivation by restriction, list
and union
2000-10-06: PVB: sync'd the acknowledgement sections with Part 1
2000-10-06: PVB: fleshed out the schema for datatypes, such that all
datatype definitions have an id attribute, all elements involved in
datatype definitions also now have an id attribute. Each
built-in datatype definition has a documentation annotation that
points to the section of the spec where that datatype is described;
each element used for facets also has a documentation annotation
that points to the section of the spec where that facet is defined.
2000-10-11: PVB: fixed bug in SVC that said that union/@memberTypes and
union/child::simpleType were mutually exclusive...the corrected constraint
is simply that at least one of them must be valued.
2000-10-11: PVB: fixed the broken link on the XML Rep for union, to point
to the Datatype Definition component (there is no union component).
2000-10-16: PVB: clarified property mapping for memberTypes property
in the XML Rep of the Union Element Information Item for Datatype Definitions.
2000-10-16: PVB: fixed typo on the XML Rep of the Union Element Information Item
that incorrectly referred to a "union" schema component.
2000-10-16: PVB: fixed many broken links
2000-10-16: PVB: added xml:lang='en' to all documentation elements in
the schema for datatypes
2000-10-18: PVB: fixed typo in century which said that 20 was the literal
for the 19th century...it now says that 19 is the literal for the 20th century
2000-10-18: PVB: fixed copy-paste typos in specification of min/maxLength
facet, which said just "length" in several places
2000-10-18: PVB: removed mention of character info items in the definition
of lexical space (and literal)
2000-10-18: PVB: cleared up ambiguous (many) uses of the word "may" that were
not used in the sense of the term "may" in the Terminology section...made sure
that all correct uses of "may" were linked to the definition
2000-10-18: PVB: definition of match aligned with XML 1.0 2e
2000-10-18: PVB: changed string length-related facets to be measured in
terms of XML 1.0 characters instead of code points
2000-10-18: PVB: added note to string length-related facets stating that
length may not be what some users perceive as the "string length"
2000-10-18: PVB: changed example in description of hex encoding to something
more "binary"
2000-10-18: PVB: clarified value space of string in terms of XML 1.0 characters.
2000-10-18: PVB: fixed (thought this was already done, but I guess not) Appdx
D to note that hours range from 0-23, minutes from 0-59 and seconds from 0-59 or
0-60 in the case of leap seconds.
2000-10-18: PVB: definition of language now references the "Language Identifiers"
section in XML 1.0 2e instead of the LanguageID production (which is gone in 2e).
2000-10-18: PVB: added a PFR requesting advice on whether future versions should
allow embedded white space in regex's.
2000-10-18: PVB: removed Cs property from regex language and added note stating
why it is the only property not allowed
2000-10-18: PVB: fixed typo in character range expansion of \w escape in
the regex language
2000-10-18: PVB: now cites XML 1.0 2e and Unicode 3 normatively, and ISO 10646
and Unicode 2 non-normatively.
2000-10-18: PVB: added note stating that conforming processors are only required
to support the Unicode char props and block names in the regex language that are
current at the time this spec goes to Rec, but that implementors are encouraged
to provide access to future revisions to Unicode.
2000-10-18: PVB: added note about possible future support for "Level 2" in
regex's
2000-10-18: PVB: changed body-temp example from fahrenheit to celsius
2000-10-18: PVB: added xml:lang='en' to all documentation annotations in
the schema for datatypes
2000-10-19: PVB: added note to the effect that recurringDuration won't meet
the needs of all calendaring/scheduling applications
2000-10-19: PVB: added PFR to recurringDuration asking for interop feedback
not just between schema processors but with other date/time systems
2000-10-19: PVB: added PFR regarding order-relation on timeDuration
2000-10-19: PVB: added note on uriReference about hex encoding and
possible "out-of-sync" problems with XML 1.0, XPointer and CharMod.
2000-10-19: PVB: removed ednote on pattern for binary
2000-11-21: AM: added sentence on float and double special values.
2000-11-22: AM: fixed minor bugs in lexical and canonical representations of the date/time datatypes.
2000-11-27: AM: removed century and recurringDuration. Made timeInstant,
timePeriod, recurringDate, recurringDay primitive.
2000-11-27: AM: renamed CDATA to normalizedString.
2000-11-28: AM: made timeInstant and timePeriod primitive datatypes.
Also, recurringDate and recurringDay.
2000-11-28: AM: added new sections on order relations for timeDuration and
timeInstant. Added Appendix E on addition of timeInstant and durations.
2000-11-28: AM: changed canonical representation for decimal.
2000-11-28: AM: made all date/time datatypes primitive types.
2000-11-28: AM: changed recurringDate to gMonthDay, recurringDay to day and month to yearMonth.
2000-12-21: AM: fixed NMTOKENS, ENTITIES, IDREFS to have minLength = 1.
2000-12-21: AM: changed introductory section on order facet to discuss partial order.
2001-01-22: am: added 1 and 0 to lexical space of boolean. Created placeholder
section for canonical representation for boolean.
2001-01-22: am: Changed section on facet comparison for timeDuration.
2001-01-22: am: Changed name timeDuration to duration and timeInstant to dateTime.
2001-01-22: am: Removed binary. Replaced with hexBinary and base64Binary.
Removed encoding facet.
2001-02-16: PVB: A whole host of changes that I haven't documented yet.
2001-03-01: PVB: changed the name of uriReference to anyURI as a result
of the vote at the Boston f2f
2001-03-08: PVB: moved references to unicode and ISO 10646 to the non-normative
section, since we get their content through the normative ref to XML 1.0 2e.
2001-03-08: PVB: correctly updated reference to the Jan 2001 CharMod draft
2001-03-08: PVB: removed all Priority Feedback Requests
2001-03-11: PVB: changed am's affiliation
2001-03-11: PVB: added the ability to make a simpleType final, thus
blocking any further derivation
2001-03-11: PVB: added the text that has always been in the schema for
datatypes regarding the unique IDs for datatypes, facets, and datatype/facet
pairs to the beginning of section 3.
2001-03-11: PVB: added language intended to clarify why the built-in
types are defined in both the schema NS and in the built-in NS
2001-03-11: PVB: added {true,false} as the canonical rep for boolean
2001-03-11: PVB: added definition for derivation by restriction, which
makes it clear that a restriction must result in a subsetted value space
2001-03-11: PVB: added a SRC which rules out multiple occurrences of
facets, other than pattern and enumeration
2001-03-11: PVB: removed fixed attribute from pattern and enumeration facets
2001-03-11: PVB: added note that spaces are discouraged in anyURI
2001-03-11: PVB: added note clarifying that a namespace decl must be
in scope for the lexical-to-value space mapping
2001-03-11: PVB: added "related by union" to the equality
clause that says that values from unrelated types are not equal.
2001-03-11: PVB: fixed definition(s) of bounded, such that the bound
does not necessarily have to be in the value space (to account for
min/maxExclusive) and modified the definitions of min/maxExclusive to
reference the new definitions
2001-03-12: PVB: fixed typo ("although [not=>no] value space...") in
definition of cardinality
2001-03-13: PVB: changed name of decimal to number; precision to totalDigits
and scale to fractionDigits
2001-03-14: pvb: removed references to Unicode and 10646 (but kept
normative ref to the Unicode DB)
2001-03-14: pvb: changed names of day, month, year, monthDay and yearMonth
to gDay, gMonth, gYear, gMonthDay and gYearMonth; also added health warnings
that conversion to other calendar systems may not result in simple values
2001-03-14: pvb: added a note about how to get AND behavior with
patterns
2001-03-14: pvb: updated description of value/lexical space of anyURI,
including note that scheme-specific syntax checking is not part of
type validity. Also added note that spaces are discouraged, and added
anyURI to the discussion in the section on lists of types with spaces.
2001-03-14: pvb: added health warning to string, stating that it isn't
always appropriate for text
2001-03-14: pvb: made NOTATION primitive, made the value space all QNames,
and removed the constraint that the enumeration given must be that of
the name of a notation declared in the schema.
2001-03-14: pvb: added note that AM used to work for IBM and that his
participation until now was supported by IBM
2001-03-14: pvb: removed Cs from the legal properties in the regex BNF
(it was already absent from the table of properties)
2001-03-14: pvb: removed surrogate blocks from the BlockNames table in
the regex appendix
2001-03-15: pvb: fixed one remaining problem in the defns for bounds
introduced on 2001-03-11.
2001-03-15: pvb: changed all occurrences of "namespace URI" to "namespace name"
to be consistent with the latest Infoset draft
2001-03-15: pvb: renamed datatype definition component as simple type definition
component; changed itemType and memberTypes properties to "item type" and "member types"
2001-03-15: pvb: added components for all fundamental facets; moved the
prose that specified how they get their values from the simple type definition
component to each individual component.
2001-03-15: pvb: changed wording in property mapping of annotations to be
the same as that in structures
2001-03-15: pvb: changed final from a boolean to {restriction, list, union}
2001-03-15: pvb: changed pattern-valid and datatype-valid constraints
to check pattern against the lexical-space
2001-03-15: pvb: added "F restriction valid" COS's that rule out all
forms of invalid restriction for each facet F
2001-03-15: pvb: cleaned up equality, and partial and total orders
Introduction
Purpose
The XML 1.0 (Second Edition) specification defines limited
facilities for applying datatypes to document content in that documents
may contain or refer to DTDs that assign types to elements and attributes.
However, document authors, including authors of traditional
documents and those transporting data in XML,
often require a higher degree of type checking to ensure robustness in
document understanding and data interchange.
The table below offers two typical examples of XML instances
in which datatypes are implicit: the instance on the left
represents a billing invoice, the instance on the
right a memo or perhaps an email message in XML.
Data oriented
1999-01-21
1999-01-25
Ashok Malhotra
123 Microsoft Ave.
Hawthorne
NY
10532-0000
555-1234
555-4321
Document oriented
Paul V. Biron
Ashok Malhotra
Latest draft
We need to discuss the latest
draft immediately.
Either email me at
mailto:paul.v.biron@kp.org
or call 555-9876
The invoice contains several dates and telephone numbers, the postal
abbreviation for a state
(which comes from an enumerated list of sanctioned values), and a ZIP code
(which takes a definable regular form). The memo contains many
of the same types of information: a date, telephone number, email address
and an "importance" value (from an enumerated
list, such as "low", "medium" or "high"). Applications which process
invoices and memos need to raise exceptions if something that was
supposed to be a date or telephone number does not conform to the rules
for valid dates or telephone numbers.
In both cases, validity constraints exist on the content of the
instances that are not expressible in XML DTDs. The limited datatyping
facilities in XML have prevented validating XML processors from supplying
the rigorous type checking required in these situations. The result
has been that individual applications writers have had to implement type
checking in an ad hoc manner. This specification addresses
the need of both document authors and applications writers for a robust,
extensible datatype system for XML which could be incorporated into
XML processors. As discussed below, these datatypes could be used in other
XML-related standards as well.
Requirements
The XML Schema Requirements document spells out
concrete requirements to be fulfilled by this specification,
which state that the XML Schema Language must:
provide for primitive data typing, including byte, date,
integer, sequence, SQL and Java primitive datatypes, etc.;
define a type system that is adequate for import/export
from database systems (e.g., relational, object, OLAP);
distinguish requirements relating to lexical data representation
vs. those governing an underlying information set;
allow creation of user-defined datatypes, such as
datatypes that are derived from existing datatypes and which
may constrain certain of their properties (e.g., range,
precision, length, format).
Scope
This portion of the XML Schema Language discusses datatypes that can be
used in an XML Schema. These datatypes can be specified for element
content that would be specified as #PCDATA and for attribute
values of various types in a DTD. It is the intention of this specification
that it be usable outside of the context of XML Schemas for a wide range
of other XML-related activities such as XSL and RDF Schema.
Terminology
The terminology used to describe XML Schema Datatypes is defined in the
body of this specification. The terms defined in the following list are
used in building those definitions and in describing the actions of a
datatype processor:
for compatibility
A feature of this specification included solely to ensure that schemas
which use this feature remain compatible with XML 1.0 (Second Edition).
may
Conforming documents and processors are permitted to but need
not behave as described.
match
(Of strings or names:) Two strings or names being compared must be
identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g.
characters with both precomposed and base+diacritic forms) match only if they have
the same representation in both strings. No case folding is performed. (Of strings and
rules in the grammar:) A string matches a grammatical production if it belongs to the
language generated by that production.
must
Conforming documents and processors are required to behave as
described; otherwise they are in error.
error
A violation of the rules of this specification; results are undefined.
Conforming software may detect and report an
error and may recover from it.
Constraints and Contributions
This specification provides three different kinds of normative
statements about schema components, their representations in XML and
their contribution to the schema-validation of information items:
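Constraint on Schemas
Constraints on the schema components themselves, i.e. conditions
components must satisfy to be components at all.
Largely to be found in the section on datatype components.
Schema Representation Constraint
Constraints on the representation of schema components in XML. Some but
not all of these are expressed in the schema for datatype definitions and
the DTD for datatype definitions. Largely to be found in
the section on datatype components.
Validation Rule
Constraints expressed by schema components which information
items must satisfy to be schema-valid. Largely
to be found in the section on datatype components.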
Type System
This section describes the conceptual framework behind the type system
defined in this specification. The framework has been influenced by the
ISO 11404 standard on language-independent datatypes as
well as the datatypes for SQL and for programming
languages such as Java.
The datatypes discussed in this specification are computer
representations of well known abstract concepts such as
integer and date. It is not the place of this
specification to define these abstract concepts; many other publications
provide excellent definitions.
Datatype
In this specification,
a datatype is a 3-tuple, consisting of
a) a set of distinct values, called its value space,
b) a set of lexical representations, called its
lexical space, and c) a set of facets
that characterize properties of the value space,
individual values or lexical items.
Value space
A value space is the set of values for a given datatype.
Each value in the value space of a datatype is denoted by
one or more literals in its lexical space.
The value space of a given datatype can
be defined in one of the following ways:
defined axiomatically from fundamental notions (intensional definition)
[see primitive datatypes]
defined by restricting the value space of
an already defined datatype to a particular subset with a given set
of properties [see derived datatypes]
defined as a combination of values from one or more already defined
value space(s) by a specific construction procedure
[see list and union datatypes]
Value spaces have certain properties. For example,
they always have the property of cardinality and
some definition of equality,
and might be ordered, by which individual
values within the value space can be compared to
one another. The properties of value spaces that
are recognized by this specification are defined in
the section on fundamental facets.
Lexical space
In addition to its value space, each datatype also
has a lexical space.
A lexical space is the set of valid literals
for a datatype.
For example, "100" and "1.0E2" are two different literals from the
lexical space of float which both
denote the same value. The type system defined in this specification
provides a mechanism for schema designers to control the set of values
and the corresponding set of acceptable literals of those values for
a datatype.
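As a minimal sketch of the distinction between literals and values, the snippet below uses Python's built-in float parsing as a stand-in for a lexical-to-value mapping. This is an assumption made for illustration only: Python accepts some literals that this specification may not, and the helper name `denote` is hypothetical.

```python
def denote(literal: str) -> float:
    """Map a literal to the value it denotes.

    Assumption: Python's float() stands in for the lexical-to-value
    mapping; it is not the mapping defined by this specification.
    """
    return float(literal)


# Two different literals (members of the lexical space) ...
literals = ["100", "1.0E2"]

# ... that denote one and the same value (member of the value space).
values = {denote(literal) for literal in literals}
```

Here `literals` holds two distinct strings while `values` ends up with a single member, which is the sense in which several literals can denote the same value.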
The literals in the lexical spaces defined in this specification
have the following characteristics:
The number of literals for each value has been kept small; for many
datatypes there is a one-to-one mapping between literals and values.
This makes it easy to exchange the values between different systems.
In many cases, conversion from locale-dependent representations will
be required on both the originator and the recipient side, both for
computer processing and for interaction with humans.
Textual, rather than binary, literals are used.
This makes hand editing, debugging, and similar activities possible.
Where possible, literals correspond to those found in common
programming languages and libraries.
Canonical Lexical Representation
While the datatypes defined in this specification have, for the most part,
a single lexical representation, i.e. each value in the datatype's
value space is denoted by a single literal in its
lexical space, this is not always the case. The
example in the previous section showed two literals for the datatype
float which denote the same value. Similarly, there may be
several literals for one of the date or time datatypes that denote the
same value using different timezone indicators.
A canonical lexical representation
is a set of literals from among the valid set of literals
for a datatype such that there is a one-to-one mapping between literals
in the canonical lexical representation and
values in the value space.
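The idea of a canonical lexical representation can be sketched for boolean, whose lexical space (per the change log above) includes 1 and 0 alongside true and false, with {true, false} as the canonical representation. The table and function names below are hypothetical, and the code is an illustration rather than a normative mapping.

```python
# Lexical space of boolean mapped to its two values.
BOOLEAN_LEXICAL_TO_VALUE = {"true": True, "false": False, "1": True, "0": False}


def canonical_boolean(literal: str) -> str:
    """Return the canonical literal for the value that `literal` denotes."""
    if literal not in BOOLEAN_LEXICAL_TO_VALUE:
        raise ValueError(f"{literal!r} is not in the lexical space of boolean")
    return "true" if BOOLEAN_LEXICAL_TO_VALUE[literal] else "false"


# Four literals, two values, exactly one canonical literal per value.
canonical_literals = {canonical_boolean(lit) for lit in BOOLEAN_LEXICAL_TO_VALUE}
```

Every literal maps to a value, and every value maps back to exactly one canonical literal, which gives the one-to-one mapping the definition requires.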
Facets
A facet is a single
defining aspect of a value space. Generally
speaking, each facet characterizes a value space
along independent axes or dimensions.
The facets of a datatype serve to distinguish those aspects of
one datatype which differ from other datatypes.
Rather than being defined solely in terms of a prose description,
the datatypes in this specification are defined in terms of
the synthesis of facet values which together determine the
value space and properties of the datatype.
Facets are of two types: fundamental facets that define
the datatype and non-fundamental or constraining
facets that constrain the permitted values of a datatype.
Fundamental facets
A fundamental facet is an abstract property which
serves to semantically characterize the values in a
value space.
These properties are discussed in this section.
Equal
Every value space supports the notion of equality,
with the following rules:
for any a and b in the value space,
either a is equal to b, denoted a = b, or a
is not equal to b, denoted a != b
there is no pair a and b from the value space
such that both a = b and a != b
for all a in the value space, a = a
for any a and b in the value space,
a = b if and only if b = a
for any a, b and c in the value space,
if a = b and b = c, then a = c
for any a and b in the value space,
if a = b, then a and b cannot be distinguished
(i.e., equality is identity)
Note that a consequence of the above is that, given value space A and
value space B where A and B are not related by restriction or union,
for every pair of values a from A and b from B, a != b.
On every datatype, the operation Equal is defined in terms of the equality
property of the value space: for any values
a, b drawn from the value space, Equal(a,b) is
true if a = b, and false otherwise.
Order
An order relation on a value space
is a mathematical relation that imposes a
total order or a partial order on the
members of the value space.
A value space, and hence a datatype, is said to be
ordered if there exists an order relation
defined for that value space.
A partial order is an order relation
that is irreflexive, antisymmetric and transitive.
A partial order has the following properties:
for no a in the value space, a < a
(irreflexivity)
for all a and b in the value space,
a < b and b < a implies a = b
(antisymmetry)
for all a, b and c in the value space,
a < b and b < c implies a < c
(transitivity)
The notation a <> b is used to indicate the
case when a != b and neither
a < b nor b < a.
A total order is a partial order
such that for no a and b
is it the case that a <> b.
A total order has all of the properties specified
above for partial order, plus
the following property:
for all a and b in the value space,
either a < b or b < a or a = b
The fact that this specification does not define an
order relation for some datatype does not
mean that some other application cannot treat that datatype as
being ordered by imposing its own order relation.
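The properties above can be checked mechanically on a small finite set. The sketch below is only an illustration: the function names are hypothetical, and the two sample relations (proper subset on sets, and less-than on integers) are chosen here as examples and do not come from this specification.

```python
from itertools import product


def is_partial_order(values, less):
    """Check irreflexivity, antisymmetry and transitivity of `less` on `values`."""
    irreflexive = all(not less(a, a) for a in values)
    antisymmetric = all(
        a == b for a, b in product(values, repeat=2) if less(a, b) and less(b, a)
    )
    transitive = all(
        less(a, c)
        for a, b, c in product(values, repeat=3)
        if less(a, b) and less(b, c)
    )
    return irreflexive and antisymmetric and transitive


def incomparable(a, b, less):
    """The a <> b case: a != b and neither a < b nor b < a."""
    return a != b and not less(a, b) and not less(b, a)


def is_total_order(values, less):
    """A partial order in which no two values are incomparable."""
    return is_partial_order(values, less) and not any(
        incomparable(a, b, less) for a, b in product(values, repeat=2)
    )


def proper_subset(a, b):
    return a < b  # for frozensets, < means "is a proper subset of"


def less_than(a, b):
    return a < b


SUBSETS = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
INTEGERS = [1, 2, 3]
```

With these samples, proper subset is a partial order that is not total, because {1} and {2} are incomparable, while less-than on the integers is a total order.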
Bounds
A value u in an ordered value space U
is said to be an inclusive upper bound of a value space V
(where V is a subset of U)
if for all v in V, u >= v.
A value u in an ordered value space U
is said to be an exclusive upper bound of a value space V
(where V is a subset of U)
if for all v in V, u > v.
A value l in an ordered value space L
is said to be an inclusive lower bound of a value space V
(where V is a subset of L)
if for all v in V, l <= v.
A value l in an ordered value space L
is said to be an exclusive lower bound of a value space V
(where V is a subset of L)
if for all v in V, l < v.
A datatype is bounded
if its value space has either an inclusive upper bound
or an exclusive upper bound
and either an inclusive lower bound or an
exclusive lower bound.
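The four bound definitions translate directly into predicates over a finite sample of values. The following is a minimal sketch with hypothetical function names; a finite range of integers stands in for a value space V purely for illustration.

```python
def is_inclusive_upper_bound(u, values):
    return all(u >= v for v in values)


def is_exclusive_upper_bound(u, values):
    return all(u > v for v in values)


def is_inclusive_lower_bound(l, values):
    return all(l <= v for v in values)


def is_exclusive_lower_bound(l, values):
    return all(l < v for v in values)


# Sample value space V = {1, ..., 10}, a subset of the integers.
V = range(1, 11)
```

Note that a bound need not itself be a member of V: 11 is an exclusive upper bound of V even though it lies outside it, whereas 10 is an inclusive upper bound but not an exclusive one.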
Cardinality
Every value space
has associated with it the concept of
cardinality. Some value spaces
are finite, some are countably infinite while still others could
conceivably be uncountably infinite (although no value space
defined by this specification is uncountably infinite). A datatype is
said to have the cardinality of its value space.
It is sometimes useful to categorize value spaces
(and hence, datatypes) as to their cardinality. There are two
significant cases:
value spaces that are finite
value spaces that are countably infinite
Numeric
A datatype is said to be
numeric if its values are conceptually quantities (in some
mathematical number system).
A datatype whose values
are not numeric is said to be
non-numeric.
Constraining or Non-fundamental facets
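A constraining facet is an optional property that can be
applied to a datatype to constrain its value space.
Constraining the value space consequently constrains
the lexical space. Adding
constraining facets to a base type
is described in the section on derivation by restriction.
In this section we define all constraining facets
that are available for use when defining
derived datatypes.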
length
length is the number
of units of length, where units of length
vary depending on the type that is being derived from.
The value of length must be a nonNegativeInteger.
For string and datatypes derived from string,
length is measured in units of
characters as defined in XML 1.0 (Second Edition).
For hexBinary and base64Binary and datatypes derived from them,
length is measured in octets (8 bits) of binary data.
For datatypes derived by list,
length is measured in number of list items.
For string and datatypes derived from string,
length will not always coincide with "string length" as perceived
by some users or with the number of storage units in some digital representation.
Therefore, care should be taken when specifying a value for length
and in attempting to infer storage requirements from a given value for
length.
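The three units of length can be sketched as follows. The function names are hypothetical, and the sketch assumes that a Python str element (a Unicode code point) corresponds to a character as counted here, and that list items are separated by white space.

```python
def string_length(value: str) -> int:
    """Length in characters (Python counts Unicode code points)."""
    return len(value)


def hex_binary_length(literal: str) -> int:
    """Length in octets of the binary data denoted by a hex literal."""
    return len(bytes.fromhex(literal))


def list_length(literal: str) -> int:
    """Length in list items, assuming white-space-separated items."""
    return len(literal.split())


# "e" followed by a combining acute accent: one perceived character,
# two characters by the measure used here, three octets in UTF-8.
accented = "e\u0301"
```

The `accented` example shows why length may differ both from the "string length" a user perceives and from the number of storage units in a particular encoding.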
minLength
minLength is
the minimum number of units of length, where
units of length vary depending on the type that is being
derived from.
The value of minLength must be a nonNegativeInteger.
For string and datatypes derived from string,
minLength is measured in units of
characters as defined in XML 1.0 (Second Edition).
For hexBinary and base64Binary and datatypes derived from them,
minLength is measured in octets (8 bits) of binary data.
For datatypes derived by list,
minLength is measured in number of list items.
For string and datatypes derived from string,
minLength will not always coincide with "string length" as perceived
by some users or with the number of storage units in some digital representation.
Therefore, care should be taken when specifying a value for minLength
and in attempting to infer storage requirements from a given value for
minLength.
maxLength
maxLength is
the maximum number of units of length, where
units of length vary
depending on the type that is being derived from.
The value of maxLength must be a nonNegativeInteger.
For string and datatypes derived from string,
maxLength is measured in units of
characters as defined in XML 1.0 (Second Edition).
For hexBinary and base64Binary and datatypes derived from them,
maxLength is measured in octets (8 bits) of binary data.
For datatypes derived by list,
maxLength is measured in number of list items.
For string and datatypes derived from string,
maxLength will not always coincide with "string length" as perceived
by some users or with the number of storage units in some digital representation.
Therefore, care should be taken when specifying a value for maxLength
and in attempting to infer storage requirements from a given value for
maxLength.
pattern
pattern is a constraint on the
value space of a datatype which is achieved by
constraining the lexical space to literals
which match a specific pattern. The value of pattern must be a regular expression.
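A pattern check can be sketched with the ZIP code from the invoice example in the introduction. Python's `re` module is used here only as a stand-in: its syntax and character classes are not identical to the regular expression language of this specification, and both the pattern and the function name are assumptions made for illustration.

```python
import re

# Hypothetical pattern: five digits, optionally followed by "-" and four digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def matches_zip_pattern(literal: str) -> bool:
    """True if the whole literal matches the pattern (no partial matches)."""
    return ZIP_PATTERN.fullmatch(literal) is not None
```

`fullmatch` is used so that the entire literal has to match, which mirrors the idea that the facet restricts the lexical space to literals matching the pattern.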
enumeration
enumeration constrains the value space
to a specified set of values.
enumeration does not impose an order relation on the
value space it creates; the value of the
ordered property of the derived
datatype remains that of the datatype from which it is
derived.
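As a minimal sketch, an enumeration amounts to a membership test against a fixed set of values. The example reuses the "importance" values mentioned in the introduction; the names are hypothetical. A Python `frozenset` is used deliberately because it carries no ordering, in line with the statement that enumeration does not impose an order relation.

```python
# Hypothetical enumerated value space for an "importance" datatype.
IMPORTANCE_VALUES = frozenset({"low", "medium", "high"})


def is_valid_importance(value: str) -> bool:
    """True if `value` is one of the enumerated values."""
    return value in IMPORTANCE_VALUES
```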
whiteSpace
whiteSpace constrains the value space
of types derived from string such that
the various behaviors
specified in Attribute Value Normalization
in XML 1.0 (Second Edition) are realized. The value of
whiteSpace must be one of {preserve, replace, collapse}.
preserve
No normalization is done, the value is not changed (this is the
behavior required by XML 1.0 (Second Edition) for element content)
replace
All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return)
are replaced with #x20 (space)
collapse
After the processing implied by replace, contiguous
sequences of #x20's are collapsed to a single #x20, and leading and
trailing #x20's are removed.
The notation #xA used here (and elsewhere in this specification) represents
the Universal Character Set (UCS) code point hexadecimal A (line feed), which is denoted by
U+000A. This notation is to be distinguished from &#xA;,
which is the XML character reference
to that same UCS code point.
whiteSpace is applicable to all atomic and
list datatypes. For all atomic
datatypes other than string (and types derived
by restriction from it) the value of whiteSpace is
collapse and cannot be changed by a schema author; for
string the value of whiteSpace is
preserve; for any type derived by
restriction from string
the value of whiteSpace can
be any of the three legal values. For all datatypes
derived by list the
value of whiteSpace is collapse and cannot
be changed by a schema author. For all datatypes
derived by union whiteSpace does not apply directly; however, the
normalization behavior of union types is controlled by
the value of whiteSpace on that one of the
member types against which the union
is successfully validated.
For more information on whiteSpace, see the
discussion on white space normalization in
Schema Component Details
in XML Schema Part 1: Structures.
maxInclusive
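maxInclusive is the inclusive upper bound
of the value space for a datatype with the ordered
property. The value of
maxInclusive must be
in the value space of the
base type.
maxExclusive
maxExclusive is the exclusive upper bound
of the value space for a datatype with the ordered
property. The value of maxExclusive must be in the value space of the
base type.
minInclusive
minInclusive is the inclusive lower bound
of the value space for a datatype with the ordered
property. The value of
minInclusive must be in the value space of the
base type.
minExclusive
minExclusive is the exclusive lower bound
of the value space for a datatype with the ordered
property.
The value of minExclusive
must be in the value space of the
base type.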
totalDigits
totalDigits
is the maximum number of digits in values of datatypes
derived from number. The value of
totalDigits must be a
positiveInteger.
fractionDigits
fractionDigits
is the maximum number of digits in the fractional part
of values of datatypes derived from
number. The value of fractionDigits must be a nonNegativeInteger.
Datatype dichotomies
It is useful to categorize the datatypes defined in this specification
along various dimensions, forming a set of characterization dichotomies.
Atomic vs. list vs. union datatypes
The first distinction to be made is that between
atomic, list and union
datatypes.
Atomic datatypes
are those having values which are regarded by this specification as
being indivisible.
List
datatypes are those having values each of which consists of a
finite-length (possibly empty) sequence of values of an atomic
datatype.
Union
datatypes are those whose value spaces and
lexical spaces are the union of
the value spaces and
lexical spaces of one or more other datatypes.
For example, a single token which matches
Nmtoken from [XML 1.0 (Second Edition)]
could be the value of an atomic
datatype (NMTOKEN); while a sequence of such tokens
could be the value of a list datatype
(NMTOKENS).
Atomic datatypes
Atomic datatypes can be either
primitive or derived. The value space
of an atomic datatype
is a set of "atomic" values, which for the purposes of this specification,
are not further decomposable. The lexical space of
an atomic datatype is a set of literals
whose internal structure is specific to the datatype in question.
List datatypes
Several type systems (such as the one described in
[ISO 11404]) treat list datatypes as
special cases of the more general notions of aggregate or collection
datatypes.
List datatypes are always derived.
The value space of a list
datatype is a set of finite-length sequences of atomic
values. The lexical space of a list
datatype is a set of literals whose internal
structure is a white space separated sequence of literals of the
atomic datatype of the items in the list
(where whitespace matches
S in [XML 1.0 (Second Edition)]).
The atomic datatype that participates in the
definition of a list datatype is known as the
itemType of that list datatype.
8 10.5 12
A list datatype can be derived
from an atomic datatype whose lexical space
allows whitespace (such as string
or anyURI). In such a case, regardless of the input, list items
will be separated at whitespace boundaries.
<someElement xsi:type='listOfString'>
this is not list item 1
this is not list item 2
this is not list item 3
</someElement>
In the above example, the value of the someElement element
is not a list of length 3;
rather, it is a list of length
18.
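The count of 18 can be checked with a short sketch that splits a literal at white space boundaries. The function name list_items is hypothetical, and the sketch assumes that white space means the XML S characters #x20, #x9, #xD and #xA.

```python
import re


def list_items(literal):
    """Split a list literal into its items at runs of XML white space."""
    return [item for item in re.split(r"[ \t\r\n]+", literal) if item]


content = """
this is not list item 1
this is not list item 2
this is not list item 3
"""

# Each of the three lines contributes six white space separated tokens.
item_count = len(list_items(content))
```

Because the split ignores line structure entirely, each word and each digit becomes its own list item.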
When a datatype is derived from a
list datatype, the following
constraining facets apply:
For each of length, maxLength
and minLength, the unit of length is
measured in number of list items. The value of whiteSpace
is fixed to the value collapse.
The canonical lexical representation for the list
datatype is defined as the lexical form in which
each item in the list has the canonical lexical
representation of its itemType.
Union datatypes
The value space and lexical space
of a union datatype are the union of the
value spaces and lexical spaces of
its memberTypes.
Union datatypes are always derived.
Currently, there are no built-in union
datatypes.
A prototypical example of a union type is the
maxOccurs attribute on the
element element
in XML Schema itself: it is a union of nonNegativeInteger
and an enumeration with the single member, the string "unbounded", as shown below.
Any number (greater than 1) of atomic or list datatypes can participate in a union type.
The datatypes that participate in the
definition of a union datatype are known as the
memberTypes of that union datatype.
The order in which the memberTypes are specified in the
definition (that is, the order of the <simpleType> children of the <union>
element, or the order of the QNames in the memberTypes
attribute) is significant.
During validation, an element or attribute's value is validated against the
memberTypes in the order in which they appear in the
definition until a match is found. The evaluation order can be overridden
with the use of xsi:type. See
and for
more details.
For example, given the definition below, the first instance of the <size> element
validates correctly as an integer, the second and third as
string.
1
large
1
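The effect of member type order can be illustrated with a minimal sketch. It assumes, for illustration only, a union whose member types are an integer type followed by a string type; the names validate_union, is_integer and is_string are hypothetical, and the integer check is a simplified lexical test.

```python
import re


def is_integer(literal):
    """Simplified lexical test for an integer literal."""
    return re.fullmatch(r"[+-]?[0-9]+", literal) is not None


def is_string(literal):
    """Every literal is acceptable as a string."""
    return True


# Order is significant: the first member type that accepts the literal wins.
SIZE_MEMBER_TYPES = [("integer", is_integer), ("string", is_string)]


def validate_union(literal, member_types, xsi_type=None):
    """Return the name of the member type that validates the literal.

    If xsi_type is given, only that member type is tried, which models
    overriding the evaluation order. Returns None if nothing matches.
    """
    for name, accepts in member_types:
        if xsi_type is not None and name != xsi_type:
            continue
        if accepts(literal):
            return name
    return None
```

Under these assumptions the literal 1 validates as integer, large validates as string, and 1 with an explicit override to string validates as string.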
The canonical lexical representation for a union
datatype is defined as the lexical form in which
the values have the canonical lexical representation
of the appropriate memberTypes.
A datatype which is atomic in this specification
need not be an "atomic" datatype in any programming language used to
implement this specification. Likewise, a datatype which is a
list in this specification need not be a "list"
datatype in any programming language used to implement this specification.
Furthermore, a datatype which is a union in this
specification need not be a "union" datatype in any programming
language used to implement this specification.
Primitive vs. derived datatypes
Next, we distinguish between primitive and
derived datatypes.
Primitive
datatypes are those that are not defined in terms of other datatypes;
they exist ab initio.
Derived
datatypes are those that are defined in terms of other datatypes.
For example, in this specification, is a well-defined
mathematical
concept that cannot be defined in terms of other datatypes, while
a is a special case of the more general datatype
.
There exists a conceptual datatype, whose name is anySimpleType,
that is the simple version of the
ur-type definition from
. anySimpleType can be
considered as the base type of all primitive
types. The value space of anySimpleType
can be considered to be the union of the
value spaces of all primitive datatypes.
The datatypes defined by this specification fall into both
the primitive and derived
categories. It is felt that a judiciously chosen set of
primitive datatypes will serve the widest
possible audience by providing a set of convenient datatypes that
can be used as is, as well as providing a rich enough base from
which the variety of datatypes needed by schema designers can be
derived.
In the example above, is
from .
A datatype which is primitive in this specification
need not be a "primitive" datatype in any programming language used to
implement this specification. Likewise, a datatype which is
derived in this specification need not be a
"derived" datatype in any programming language used to implement
this specification.
As described in more detail in ,
each user-derived datatype
must be defined in terms of another datatype in one of three ways: 1) by assigning
constraining facets which serve to restrict the
value space of the user-derived
datatype to a subset of that of the base type; 2) by creating
a list datatype whose value space
consists of finite-length sequences of values of its
itemType; or 3) by creating a union
datatype whose value space consists of the union of the
value spaces of its memberTypes.
Derived by restriction
A datatype is said to be derived
by restriction from another datatype when
values for one or more constraining facets are specified
that serve to constrain its value space and/or its
lexical space to a subset of those of its
base type.
Every
datatype that is derived by restriction
is defined in terms of an existing datatype, referred to as its
base type. Base types can be either
primitive or derived.
Derived by list
A list datatype can be derived
from another datatype (its itemType) by creating
a value space that consists of a finite-length sequence
of values of its itemType.
Derived by union
One datatype can be derived from one or more
datatypes by unioning their value spaces
and, consequently, their lexical spaces.
Built-in vs. user-derived datatypes
Built-in
datatypes are those which are defined in this specification,
and can be either primitive or
derived.
User-derived datatypes are those derived
datatypes that are defined by individual schema designers.
Conceptually there is no difference between the
built-in derived datatypes
included in this specification and the user-derived
datatypes which will be created by individual schema designers.
The built-in derived datatypes
are those which are believed to be so common that if they were not
defined in this specification many schema designers would end up
"reinventing" them. Furthermore, including these
derived datatypes in this specification serves to
demonstrate the mechanics and utility of the datatype generation
facilities of this specification.
A datatype which is built-in in this specification
need not be a "built-in" datatype in any programming language used
to implement this specification. Likewise, a datatype which is
user-derived in this specification need not
be a "user-derived" datatype in any programming language used to
implement this specification.
Built-in datatypes
Each built-in datatype in this specification (both
primitive and
derived) can be uniquely addressed via a
URI Reference constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype
For example, to address the int datatype, the URI is:
http://www.w3.org/2001/XMLSchema#int
Additionally, each facet definition element can be uniquely
addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the facet
For example, to address the maxInclusive facet, the URI is:
http://www.w3.org/2001/XMLSchema#maxInclusive
Additionally, each facet usage in a built-in datatype definition
can be uniquely addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype, followed
by a period (".") followed by the name of the facet
For example, to address the usage of the maxInclusive facet in
the definition of int, the URI is:
http://www.w3.org/2001/XMLSchema#int.maxInclusive
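The three addressing schemes above can be sketched as simple string constructions. The function names are hypothetical, and the sketch assumes that the base URI is the namespace name http://www.w3.org/2001/XMLSchema stated for the built-in datatypes in this section.

```python
# Assumption: the XML Schema namespace name used as the base URI.
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def datatype_uri(datatype):
    """URI addressing a built-in datatype."""
    return XSD_NAMESPACE + "#" + datatype


def facet_uri(facet):
    """URI addressing a facet definition element."""
    return XSD_NAMESPACE + "#" + facet


def facet_usage_uri(datatype, facet):
    """URI addressing the usage of a facet in a built-in datatype definition."""
    return XSD_NAMESPACE + "#" + datatype + "." + facet
```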
The built-in datatypes defined by this specification
are designed to be used with the XML Schema definition language as well as other
XML specifications.
To facilitate usage within the XML Schema definition language, the built-in
datatypes in this specification have the namespace name:
http://www.w3.org/2001/XMLSchema
To facilitate usage in specifications other than the XML Schema definition language,
such as those that do not want to know anything about aspects of the
XML Schema definition language other than the datatypes, each built-in
datatype is also defined in the namespace whose URI is:
http://www.w3.org/2001/XMLSchema-datatypes
This applies to both built-in primitive
and built-in derived
datatypes.
Each user-derived datatype is also associated with a
unique namespace. However, user-derived datatypes
do not come from the namespace defined by this specification; rather,
they come from the namespace of the schema in which they are defined
(see XML Representation of
Schemas in [XML Schema Part 1: Structures]).
Primitive datatypes
The primitive datatypes defined by this specification
are described below. For each datatype, the value space
and lexical space
are defined, constraining facets which apply
to the datatype are listed and any datatypes derived
from this datatype are specified.
Primitive datatypes can only be added by revisions
to this specification.
string
The string datatype
represents character strings in XML. The value space
of string is the set of finite-length sequences of
characters (as defined in
[XML 1.0 (Second Edition)]) that match the
Char production from [XML 1.0 (Second Edition)].
A character is an atomic unit of
communication; it is not further specified except to note that every
character has a corresponding
Universal Character Set code point, which is an integer.
Many human languages have writing systems that require
child elements for control of aspects such as bidirectional formatting or
ruby annotation (see and Section 8.2.4
Overriding the
bidirectional algorithm: the BDO element of ).
Thus, string, as a simple type that can contain only
characters but not child elements, is often not suitable for representing text.
In such situations, a complex type that allows mixed content should be considered.
For more information, see Section 5.5
Any Element, Any Attribute
of .
As noted in , the fact that this specification does
not specify an order relation for string
does not preclude other applications from treating strings as being ordered.
Constraining facets
Derived datatypes
boolean
boolean has the value space
required to support the mathematical
concept of binary-valued logic: {true, false}.
Lexical representation
An instance of a datatype that is defined as boolean
can have the following legal literals {true, false, 1, 0}.
Canonical representation
The canonical representation for boolean is the set of
literals {true, false}.
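The mapping from the four legal literals to the two canonical literals can be sketched as follows. The function names parse_boolean and canonical_boolean are hypothetical.

```python
# The four legal literals and the value each one denotes.
_BOOLEAN_LITERALS = {"true": True, "1": True, "false": False, "0": False}


def parse_boolean(literal):
    """Map a legal boolean literal to a value; reject anything else."""
    try:
        return _BOOLEAN_LITERALS[literal]
    except KeyError:
        raise ValueError("not a legal boolean literal: %r" % (literal,)) from None


def canonical_boolean(value):
    """Return the canonical literal for a boolean value."""
    return "true" if value else "false"
```

For instance, the literal 1 denotes the same value as true, so its canonical representation is true.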
Constraining facets
number
number
represents arbitrary precision decimal numbers.
The value space of number
is the set of the values i × 10^-n,
where i and n are integers
such that n >= 0.
The order relation on number
is: x < y iff y - x is positive.
The value space of types derived from number
with a value for totalDigits of p
is the set of values i × 10^-n, where
n and i are integers such that
p >= n >= 0 and the number of significant decimal digits
in i is less than or equal to p.
The value space of types derived from number
with a value for fractionDigits of s
is the set of values i × 10^-n, where
i and n are integers such
that 0 <= n <= s.
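The two value space constraints can be checked with a short sketch built on Python's decimal module. The function names digit_counts and satisfies are hypothetical. The sketch works on the value rather than the literal, so trailing zeroes do not count, and it treats zero as having one digit, which is an assumption. Decimal also accepts some literals that are not legal here (for example exponent notation); lexical checking is out of scope.

```python
from decimal import Decimal


def digit_counts(literal):
    """Return (total, fraction) for the smallest n with value = i * 10**-n.

    total is the smallest totalDigits the value satisfies, that is
    max(number of digits in i, n); fraction is n itself.
    """
    value = Decimal(literal)  # exact conversion, no rounding
    if not value.is_finite():
        raise ValueError("not a finite decimal number")
    if value == 0:
        return 1, 0  # assumption: zero counts as a single digit
    _, digits, exponent = value.as_tuple()
    digits = list(digits)
    # Drop trailing zeroes of the coefficient so that n is minimal.
    while digits[-1] == 0:
        digits.pop()
        exponent += 1
    n = max(0, -exponent)
    digits_in_i = len(digits) + max(0, exponent)
    return max(digits_in_i, n), n


def satisfies(literal, total_digits=None, fraction_digits=None):
    """Check a value against optional totalDigits and fractionDigits."""
    total, fraction = digit_counts(literal)
    if total_digits is not None and total > total_digits:
        return False
    if fraction_digits is not None and fraction > fraction_digits:
        return False
    return True
```

For example, 0.05 equals 5 × 10^-2, so it needs n = 2 and does not fit a totalDigits of 1 even though i has a single digit.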
All minimally conforming processors
must support decimal numbers with a minimum of 18 decimal digits (i.e., with a
totalDigits of 18). However,
minimally conforming processors
may set an application-defined limit on the maximum number of decimal digits
they are prepared to support, in which case that application-defined
maximum number must be clearly documented.
Lexical representation
number has a lexical representation
consisting of a finite-length sequence of decimal digits (#x30-#x39) separated
by a period as a decimal indicator. If totalDigits is
specified, the number of digits must be less than or equal to
totalDigits.
If fractionDigits is specified, the
number of digits following the decimal point must be less than or equal to
the fractionDigits. An optional leading sign is allowed.
If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional.
If the fractional part is zero, the period and following zero(es) can
be omitted.
For example: -1.23, 12678967.543233, +100000.00, 210.
Canonical representation
The canonical representation for number is defined by
prohibiting certain options from the
lexical representation. Specifically, the preceding
optional "+" sign is prohibited. The decimal point is required. Leading and
trailing zeroes are prohibited subject to the following: there must be at least
one digit to the right and to the left of the decimal point, which may be a zero.
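The canonicalization rules above can be sketched as a purely textual rewrite. The function name canonical_number is hypothetical. Two points are assumptions rather than statements of this specification: literals such as .5 and 5. are accepted, and zero is always written without a sign.

```python
import re

# Optional sign, integer digits, and an optional period with fraction digits.
_NUMBER = re.compile(r"([+-]?)([0-9]*)(?:\.([0-9]*))?")


def canonical_number(literal):
    """Rewrite a legal literal into the canonical representation."""
    match = _NUMBER.fullmatch(literal)
    if match is None or not (match.group(2) or match.group(3)):
        raise ValueError("not a legal literal: %r" % (literal,))
    sign = match.group(1)
    # Leading and trailing zeroes are dropped, but one digit must remain
    # on each side of the decimal point.
    integer = match.group(2).lstrip("0") or "0"
    fraction = (match.group(3) or "").rstrip("0") or "0"
    # The "+" sign is prohibited; zero is written unsigned (assumption).
    if sign == "+" or (integer == "0" and fraction == "0"):
        sign = ""
    return sign + integer + "." + fraction
```

Applied to the examples given earlier, +100000.00 becomes 100000.0 and 210 becomes 210.0, while -1.23 is already canonical.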
Constraining facets
Derived datatypes
float
float corresponds
to the IEEE single-precision 32-bit floating point type
[IEEE 754-1985]. The basic value space of
float consists of the values
m × 2^e, where m
is an integer whose absolute value is less than
2^24, and e is an integer
between -149 and 104, inclusive. In addition to the basic
value space described above, the
value space of float also contains the
following special values: positive and negative zero,
positive and negative infinity and not-a-number.
The order relation on float
is: x < y iff y - x is positive.
Positive zero is greater than negative zero. Not-a-number equals itself and is
greater than all float values including positive infinity.
A literal in the lexical space representing a
decimal number d maps to the normalized value
in the value space of float that is
closest to d; if d is
exactly halfway between two such values then the even value is chosen.
This is the best approximation of d, which is more
accurate than the mapping required by [IEEE 754-1985].
Lexical representation
float values have a lexical representation
consisting of a mantissa followed, optionally, by the character
"E" or "e", followed by an exponent. The exponent
must be an integer. The mantissa must be a number. The representations
for exponent and mantissa must follow the lexical rules for
integer and number. If the "E" or "e" and
the following exponent are omitted, an exponent value of 0 is assumed.
The special values positive and negative zero, positive
and negative infinity and not-a-number have lexical representations 0,
-0, INF, -INF and
NaN, respectively.
For example, -1E4, 1267.43233E12, 12.78e-2, 12 and INF
are all legal literals for float.
Canonical representation
The canonical representation for float is defined by
prohibiting certain options from the
lexical representation. Specifically, the exponent
must be indicated by "E". Leading zeroes are prohibited in the exponent.
For the mantissa, the preceding optional "+" sign is prohibited
and the decimal point is required.
Leading and trailing zeroes are prohibited subject to the following:
number representations must
be normalized such that there is a single
digit to the left of the decimal point and at least a single digit to the
right of the decimal point.
Constraining facets
double
The double
datatype corresponds to IEEE double-precision 64-bit floating point
type [IEEE 754-1985]. The basic value space
of double consists of the values
m × 2^e, where m
is an integer whose absolute value is less than
2^53, and e is an
integer between -1075 and 970, inclusive. In addition to the basic
value space described above, the
value space of double also contains
the following special values: positive and negative zero,
positive and negative infinity and not-a-number.
The order relation on double
is: x < y iff y - x is positive.
Positive zero is greater than negative zero. Not-a-number equals itself and is
greater than all double values including positive infinity.
A literal in the lexical space representing a
decimal number d maps to the normalized value
in the value space of double that is
closest to d; if d is
exactly halfway between two such values then the even value is chosen.
This is the best approximation of d
([Clinger, WD (1990)], [Gay, DM (1990)]), which is more
accurate than the mapping required by [IEEE 754-1985].
Lexical representation
double values have a lexical representation
consisting of a mantissa followed, optionally, by the character "E" or
"e", followed by an exponent. The exponent must be
an integer. The mantissa must be a decimal number. The representations
for exponent and mantissa must follow the lexical rules for
integer and number. If the "E" or "e"
and the following exponent are omitted, an exponent value of 0 is assumed.
The special values positive and negative zero, positive
and negative infinity and not-a-number have lexical representations 0,
-0, INF, -INF and
NaN, respectively.
For example, -1E4, 1267.43233E12, 12.78e-2, 12 and INF
are all legal literals for double.
Canonical representation
The canonical representation for double is defined by
prohibiting certain options from the
lexical representation. Specifically, the exponent
must be indicated by "E". Leading zeroes are prohibited in the exponent.
For the mantissa, the preceding optional "+" sign is prohibited
and the decimal point is required.
Leading and trailing zeroes are prohibited subject to the following:
number representations must
be normalized such that there is a single
digit to the left of the decimal point and at least a single digit to the
right of the decimal point.
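The canonical form for double can be sketched in Python, whose float type is an IEEE double on common platforms. The function name canonical_double is hypothetical. Two choices are assumptions, not requirements stated above: the mantissa digits are the shortest digits that round-trip (as produced by repr), and zero is written 0.0E0 or -0.0E0.

```python
import math
from decimal import Decimal


def canonical_double(value):
    """Format a Python float as mantissa, "E", exponent in normalized form."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "INF" if value > 0 else "-INF"
    sign = "-" if math.copysign(1.0, value) < 0 else ""
    if value == 0:
        return sign + "0.0E0"  # assumption: canonical form chosen for zero
    # Decimal(repr(x)) exposes the shortest round-trip digits of x.
    _, digits, exponent = Decimal(repr(value)).as_tuple()
    text = "".join(str(d) for d in digits)
    # Exponent when a single digit stands to the left of the decimal point.
    scientific_exponent = exponent + len(text) - 1
    text = text.rstrip("0")
    mantissa = text[0] + "." + (text[1:] or "0")
    return "%s%sE%d" % (sign, mantissa, scientific_exponent)
```

Under these assumptions the literals -1E4 and 12.78e-2 have the canonical forms -1.0E4 and 1.278E-1 respectively. The same approach would apply to float after rounding to single precision, which this sketch does not do.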
Constraining facets
duration
duration represents a duration of time.
The value space of duration is
a six-dimensional space where the coordinates
designate the Gregorian year, month, day, hour, minute, and second components defined in
§ 5.5.3.2 of [ISO 8601],
respectively. These components are ordered
in their significance by their order of appearance i.e. as year, month, day,
hour, minute, and second.
Lexical representation
The lexical representation for duration is the
extended format PnYnMnDTnHnMnS, where
nY represents the number of years, nM the
number of months, nD the number of days, 'T' is the
date/time separator, nH the number of hours,
nM the number of minutes and nS the
number of seconds. The number of seconds can include decimal digits
to arbitrary precision.
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary integer. Similarly, the value of the Seconds component
allows an arbitrary decimal. Thus, the lexical representation of
duration does not follow the alternative
format of § 5.5.3.2.1 of [ISO 8601].
An optional preceding minus sign ('-') is
allowed, to indicate a negative duration. If the sign is omitted a
positive duration is indicated. See also .
For example, to indicate a duration of 1 year, 2 months, 3 days, 10
hours, and 30 minutes, one would write: P1Y2M3DT10H30M.
One could also indicate a duration of minus 120 days as:
-P120D.
Reduced precision and truncated representations of this format are allowed
provided they conform to the following:
The lowest order items may be omitted. If omitted their value is
assumed to be zero.
The lowest order item may have a decimal fraction.
If the number of years, months, days, hours, minutes, or seconds in any
expression equals zero, the number and its corresponding designator
may be omitted. However, at least one number and its designator must be present.
The designator 'T' shall be absent if all of the time items are absent.
The designator 'P' must always be present.
For example, P1347Y, P1347M and P1Y2MT2H are all allowed;
P0Y1347M and P0Y1347M0D are allowed. P-1347M is not allowed although
-P1347M is allowed. P1Y2MT is not allowed.
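The rules and examples above can be approximated with a regular expression. This is a sketch, not a normative grammar; the names DURATION and is_duration are hypothetical, and the sketch assumes that a decimal fraction is permitted only on the seconds item.

```python
import re

DURATION = re.compile(
    # Optional minus sign, mandatory 'P', and at least one number must follow
    # (either directly or after the 'T' separator).
    r"-?P(?=[0-9]|T[0-9])"
    # Date items, each optional, in fixed order.
    r"(?:[0-9]+Y)?(?:[0-9]+M)?(?:[0-9]+D)?"
    # 'T' is present only if at least one time item follows it.
    r"(?:T(?=[0-9])(?:[0-9]+H)?(?:[0-9]+M)?(?:[0-9]+(?:\.[0-9]+)?S)?)?"
)


def is_duration(literal):
    """Return True if the literal matches the sketched duration format."""
    return DURATION.fullmatch(literal) is not None
```

With this pattern P1347Y, P1Y2MT2H and -P1347M are accepted, while P-1347M and P1Y2MT are rejected, in line with the examples above.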
Order relation on duration
In general, the order relation on duration
is a partial order since there is no determinate relationship between certain
durations such as one month (P1M) and 30 days (P30D).
The order relation
of two duration values x and
y is x <= y iff s+x <= s+y
for each qualified dateTime s
in the list below. These values for s cause the greatest deviations in the addition of
dateTimes and durations. Addition of durations to time instants is defined
in .
1696-09-01T00:00:00Z
1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-07-01T00:00:00Z
The following table shows the strongest relationship that can be determined
between example durations. The symbol <> means that the order relation is
indeterminate. Note that because of leap-seconds, a seconds field can vary
from 59 to 60. However, because of the way that addition is defined in
, they are still totally ordered.
Relation
P1Y   > P364D   >= P365D   <= P366D   < P367D
P1M   > P27D    >= P28D    <> P29D    <> P30D    <= P31D    < P32D
P5M   > P149D   >= P150D   <> P151D   <> P152D   <= P153D   < P154D
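The relations in the table can be reproduced with a sketch that adds each duration to the four reference instants and compares the results. The names are hypothetical, a duration is modelled as a (months, seconds) pair, and the addition shown is a simplification that is valid only because every reference instant falls on the first day of a month.

```python
from datetime import datetime, timedelta

REFERENCE_INSTANTS = [
    datetime(1696, 9, 1), datetime(1697, 2, 1),
    datetime(1903, 3, 1), datetime(1903, 7, 1),
]
DAY = 86400  # seconds in a day, ignoring leap seconds


def add_duration(start, months, seconds):
    """Add (months, seconds) to a start instant on the first of a month."""
    total = start.year * 12 + (start.month - 1) + months
    shifted = start.replace(year=total // 12, month=total % 12 + 1)
    return shifted + timedelta(seconds=seconds)


def relation(x, y):
    """Return the strongest of '<', '<=', '=', '>=', '>' or '<>' for x and y."""
    pairs = [(add_duration(s, *x), add_duration(s, *y))
             for s in REFERENCE_INSTANTS]
    le = all(a <= b for a, b in pairs)
    ge = all(a >= b for a, b in pairs)
    if le and ge:
        return "="
    if le:
        return "<" if all(a < b for a, b in pairs) else "<="
    if ge:
        return ">" if all(a > b for a, b in pairs) else ">="
    return "<>"
```

For example, relation((1, 0), (0, 30 * DAY)) compares P1M with P30D and yields the indeterminate result, because one month from the reference instants spans 28, 30 or 31 days.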
Implementations are free to optimize the computation of the ordering relationship. For example, the following table can be used to
compare durations of a small number of months against days.
Months         1    2    3    4    5    6    7    8    9    10   11   12   13   ...
Days Minimum   28   59   89   120  150  181  212  242  273  303  334  365  393  ...
Days Maximum   31   62   92   123  153  184  215  245  276  306  337  366  397  ...
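The minimum and maximum values in the table can be recomputed by brute force over one full 400-year Gregorian cycle. The function name day_range and its default parameters are hypothetical choices for this sketch.

```python
from datetime import date


def day_range(months, first_year=2000, years=400):
    """Return (minimum, maximum) days spanned by `months` consecutive months."""
    lengths = []
    # Enumerate every possible starting month in the cycle.
    for index in range(first_year * 12, (first_year + years) * 12):
        start = date(index // 12, index % 12 + 1, 1)
        end_index = index + months
        end = date(end_index // 12, end_index % 12 + 1, 1)
        lengths.append((end - start).days)
    return min(lengths), max(lengths)
```

A duration of n months is then less than a duration of d days when the maximum for n is below d, greater when the minimum is above d, and otherwise requires the full comparison.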
Facet Comparison for durations
In comparing duration
values with minInclusive, minExclusive,
maxInclusive and maxExclusive facet values
indeterminate comparisons should be considered as "false".
Totally ordered durations
Certain derived datatypes of durations can be guaranteed to have a total order. For
this, they must have fields from only one row in the list below and the time zone
must either be required or prohibited.
year, month
day, hour, minute, second
For example, a datatype could be defined to correspond to the
datatype Year-Month interval that required a four digit
year field and a two digit month field but required all other fields to be unspecified. This datatype could be defined as below and would have a total order.
Constraining facets
dateTime
dateTime represents a specific instant of time. The
value space of dateTime is the space
of Combinations of date and time of day values as defined
in § 5.4 of [ISO 8601].
Lexical representation
A single lexical representation, which is a subset of the lexical
representations allowed by [ISO 8601], is allowed for
dateTime. This lexical representation is the
extended format CCYY-MM-DDThh:mm:ss
where "CC" represents the century, "YY" the year, "MM" the month and
"DD" the day, preceded by an optional leading "-" sign to indicate a
negative number. If the sign is omitted, "+" is assumed. The letter
"T" is the date/time separator and "hh", "mm", "ss" represent hour,
minute and second respectively. Additional digits can be used to
increase the precision of fractional seconds if desired, i.e., the format
ss.ss... with any number of digits after the decimal point is supported.
To accommodate
year values greater than 9999 additional digits can be added to the
left of this representation. The year 0000 is prohibited.
This representation may be immediately followed by a "Z" to indicate
Coordinated Universal Time (UTC) or, to indicate the time zone, i.e. the
difference between the local time and Coordinated Universal Time,
immediately followed by a sign,
+ or -, followed by the difference from UTC represented as hh:mm.
See [ISO 8601] for details about legal values in the
various fields.
For example, to indicate 1:20 pm on May the 31st, 1999 for Eastern
Standard Time which is 5 hours behind Coordinated Universal Time (UTC), one
would write: 1999-05-31T13:20:00-05:00.
Canonical representation
The canonical representation for dateTime is defined
by prohibiting certain options from the
lexical representation.
Specifically, either the time zone must
be omitted or, if present, the time zone must be Coordinated Universal
Time (UTC) indicated by a "Z".
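A minimal sketch of this canonicalization, using Python's datetime module, is shown below. The function name canonical_datetime is hypothetical. The sketch covers only years 0001 to 9999 and does not address how fractional seconds should be written, since neither point is settled by the text above.

```python
from datetime import datetime, timedelta, timezone


def canonical_datetime(value):
    """Return the canonical literal for a datetime.

    Values without a time zone are written as they are; values with a
    time zone are converted to UTC and written with a trailing "Z".
    """
    if value.tzinfo is None:
        return value.isoformat()
    utc = value.astimezone(timezone.utc).replace(tzinfo=None)
    return utc.isoformat() + "Z"


# 1:20 pm on 31 May 1999, Eastern Standard Time (5 hours behind UTC).
eastern = timezone(timedelta(hours=-5))
example = canonical_datetime(datetime(1999, 5, 31, 13, 20, 0, tzinfo=eastern))
```

Here example is 1999-05-31T18:20:00Z, since a local time 5 hours behind UTC corresponds to a UTC time 5 hours later.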
Order relation on dateTime
In general, the order relation on dateTime
is a partial order since there is no determinate relationship between certain
instants. For example, there is no determinate
ordering between
(a)
2000-01-20T12:00:00 and (b) 2000-01-20T12:00:00Z. Based on
timezones currently in use, (a) could vary from 2000-01-20T12:00:00+12 to
2000-01-20T12:00:00-13. It is, however, possible for this range to expand or
contract in the future, based on local laws. Because of this, the following
definition uses a somewhat broader range of indeterminate values: +14..-14.
The following definition uses the notation S[year] to represent the year
field of S, S[month] to represent the month field, and so on. The notation (Q
& "-14") means adding the timezone -14 to Q, where Q did not
already have a timezone. This is a logical explanation of the process. Actual
implementations are free to optimize as long as they produce the same results.
The ordering between two dateTimes P and Q is defined by the following
algorithm:
A. Normalize P and Q. That is, if there is a timezone present, but
it is not Z, convert it to Z using the addition operation defined in
Thus 2000-03-04T23:00:00+03 normalizes to 2000-03-04T20:00:00Z
B. If P and Q either both have a time zone or both do not have a time
zone, compare P and Q field by field from the year field down to the
second field, and return a result as soon as it can be determined. That is:
For each i in {year, month, day, hour, minute, second}
If P[i] and Q[i] are both not specified, continue to the next i
If P[i] is not specified and Q[i] is, or vice versa, stop and return
P <> Q
If P[i] < Q[i], stop and return P < Q
If P[i] > Q[i], stop and return P > Q
Stop and return P = Q
C. Otherwise, if P contains a time zone and Q does not, compare
as follows:
P <= Q if P <= (Q with time zone +14)
P >= Q if P >= (Q with time zone -14)
P <> Q otherwise, that is, if (Q with time zone +14) < P < (Q with time zone -14)
D. Otherwise, if P does not contain a time zone and Q does, compare
as follows:
P <= Q if (P with time zone -14) <= Q.
P >= Q if (P with time zone +14) >= Q.
P <> Q otherwise, that is, if (P with time zone +14) < Q < (P with time zone -14)
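The comparison can be sketched for fully specified values as follows. The names are hypothetical. The sketch assumes that a value without a time zone denotes some instant within 14 hours either side of the same reading taken as UTC, and it does not handle unspecified fields.

```python
from datetime import datetime, timedelta, timezone

MAX_OFFSET = timedelta(hours=14)


def to_utc(value):
    """Return (naive UTC datetime, whether a time zone was present)."""
    if value.tzinfo is None:
        return value, False
    return value.astimezone(timezone.utc).replace(tzinfo=None), True


def compare(p, q):
    """Return '<', '<=', '=', '>=', '>' or '<>' for two datetimes."""
    p_utc, p_zoned = to_utc(p)
    q_utc, q_zoned = to_utc(q)
    if p_zoned == q_zoned:
        # Step B: both zoned (after normalization) or both unzoned.
        return "<" if p_utc < q_utc else ">" if p_utc > q_utc else "="
    if p_zoned:
        # Step C: q lies somewhere in [q_utc - 14h, q_utc + 14h].
        earliest, latest = q_utc - MAX_OFFSET, q_utc + MAX_OFFSET
        if p_utc <= earliest:
            return "<" if p_utc < earliest else "<="
        if p_utc >= latest:
            return ">" if p_utc > latest else ">="
        return "<>"
    # Step D: p lies somewhere in [p_utc - 14h, p_utc + 14h].
    earliest, latest = p_utc - MAX_OFFSET, p_utc + MAX_OFFSET
    if latest <= q_utc:
        return "<" if latest < q_utc else "<="
    if earliest >= q_utc:
        return ">" if earliest > q_utc else ">="
    return "<>"
```

The weak results "<=" and ">=" arise only when the zoned value coincides exactly with one end of the 28-hour interval.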
Examples:
Determinate:
2000-01-15T00:00:00 < 2000-02-15T00:00:00
2000-01-15T12:00:00 < 2000-01-16T12:00:00Z
Indeterminate:
2000-01-01T12:00:00 <> 1999-12-31T23:00:00Z
2000-01-16T12:00:00 <> 2000-01-16T12:00:00Z
2000-01-16T00:00:00 <> 2000-01-16T12:00:00Z
Totally ordered dateTimes
Certain derived types from dateTime
can be guaranteed to have a total order. To
do so, they must require that a specific set of fields are always specified, and
that remaining fields (if any) are always unspecified. For example, the date
datatype without time zone is defined to contain exactly year, month, and day.
Thus dates without time zone have a total order among themselves.
Constraining facets
time
time
represents an instant of time that recurs every day. The
value space of time is the space
of time of day values as defined in § 5.3 of
[ISO 8601]. Specifically, it is a set of zero-duration daily
time instances.
Since the lexical representation allows an optional time zone
indicator, time values are partially ordered because it may
not be possible to determine the order of two values one of which has a
time zone and the other does not. The order relation on
time values is the order relation on dateTime
using an arbitrary date. See also
. Pairs of time values that both have, or both lack, a time zone indicator are totally ordered.
Lexical representation
The lexical representation for time is the left
truncated lexical representation for dateTime:
hh:mm:ss.sss with optional following time zone indicator. For example,
to indicate 1:20 pm for Eastern Standard Time which is 5 hours behind
Coordinated Universal Time (UTC), one would write: 13:20:00-05:00. See also
.
Canonical representation
The canonical representation for time is defined
by prohibiting certain options from the
lexical representation. Specifically, either the time zone must
be omitted or, if present, the time zone must be Coordinated Universal
Time (UTC) indicated by a "Z".
Constraining facets
date
date
represents a calendar date. The value space
of date is the set of Gregorian calendar dates as defined
in § 5.2.1 of [ISO 8601]. Specifically,
it is a set of one-day long, non-periodic instances
e.g. lexical 1999-10-26 to represent the calendar date 1999-10-26, independent
of how many hours this day has.
Since the lexical representation allows an optional time zone
indicator, date values are partially ordered because it may
not be possible to unequivocally determine the order of two values one of which has a
time zone and the other does not. If
date values are considered as periods of time, the order relation
on date values is the order relation on their starting instants.
This is discussed in . See also
. Pairs of date values that both have, or both lack, a time zone indicator are totally ordered.
Lexical representation
The lexical representation for date is the reduced (right
truncated) lexical representation for dateTime:
CCYY-MM-DD. No left truncation is allowed. An optional following time
zone qualifier is allowed as for dateTime. To
accommodate year values outside the range from 0001 to 9999, additional
digits can be added to the left of this representation and a preceding "-"
sign is allowed.
For example, to indicate May the 31st, 1999, one would write: 1999-05-31.
See also .
Constraining facets
gYearMonth
gYearMonth represents a
specific Gregorian month in a specific Gregorian year. The
value space of gYearMonth
is the set of Gregorian calendar months as defined in § 5.2.1 of
[ISO 8601]. Specifically, it is a set of one-month long,
non-periodic instances
e.g. 1999-10 to represent the whole month of 1999-10, independent of
how many days this month has.
Since the lexical representation allows an optional time zone
indicator, gYearMonth values are partially ordered because it may
not be possible to unequivocally determine the order of two values one of
which has a time zone and the other does not. If gYearMonth
values are considered as periods of time, the order relation on
gYearMonth values is the order relation on their starting instants.
This is discussed in . See also
. Pairs of gYearMonth
values that both have, or both lack, a time zone indicator are totally ordered.
Because month/year combinations in one calendar only rarely correspond
to month/year combinations in other calendars, values of this type
are not, in general, convertible to simple values corresponding to month/year
combinations in other calendars. This type should therefore be used with caution
in contexts where conversion to other calendars is desired.
Lexical representation
The lexical representation for gYearMonth is the reduced
(right truncated) lexical representation for dateTime:
CCYY-MM. No left truncation is allowed. An optional following time
zone qualifier is allowed. To accommodate year values outside the range from 0001 to 9999, additional digits
can be added to the left of this representation and a preceding "-" sign is allowed.
For example, to indicate the month of May 1999, one would write: 1999-05.
See also .
Constraining facets
gYear
gYear represents a
Gregorian calendar year. The value space of
gYear is the set of Gregorian calendar years as defined in
§ 5.2.1 of [ISO 8601]. Specifically, it is a set of one-year
long, non-periodic instances
e.g. lexical 1999 to represent the whole year 1999, independent of
how many months and days this year has.
Since the lexical representation allows an optional time zone
indicator, gYear values are partially ordered because it may
not be possible to unequivocally determine the order of two values one of which has a
time zone and the other does not. If
gYear values are considered as periods of time, the order relation
on gYear values is the order relation on their starting instants.
This is discussed in . See also
. Pairs of gYear values that both have, or both lack, a time zone indicator are totally ordered.
Because years in one calendar only rarely correspond to years
in other calendars, values of this type
are not, in general, convertible to simple values corresponding to years
in other calendars. This type should therefore be used with caution
in contexts where conversion to other calendars is desired.
Lexical representation
The lexical representation for gYear is the reduced (right
truncated) lexical representation for : CCYY.
No left truncation is allowed. An optional following time
zone qualifier is allowed as for . To
accommodate year values outside the range from 0001 to 9999, additional
digits can be added to the left of this representation and a preceding
"-" sign is allowed.
For example, to indicate 1999, one would write: 1999.
See also .
Constraining facets
gMonthDay
gMonthDay is a Gregorian date that recurs, specifically a day of
the year such as the third of May. Arbitrary recurring dates are not
supported by this datatype. The of
gMonthDay is the set of calendar
dates, as defined in § 3 of . Specifically,
it is a set of one-day long, annually periodic instances.
Since the lexical representation allows an optional time zone
indicator, gMonthDay values are partially ordered because it may
not be possible to unequivocally determine the order of two values one of which has a
time zone and the other does not. If
gMonthDay values are considered as periods of time, the order relation
on gMonthDay values is the order relation on their starting instants.
This is discussed in . See also
. Pairs of gMonthDay values with or without time zone indicators are totally ordered.
Because day/month combinations in one calendar only rarely correspond
to day/month combinations in other calendars, values of this type do not,
in general, have any straightforward or intuitive representation
in terms of most other calendars. This type should therefore be
used with caution in contexts where conversion to other calendars
is desired.
Lexical representation
The lexical representation for gMonthDay is the left
truncated lexical representation for : --MM-DD.
An optional following time
zone qualifier is allowed as for .
No preceding sign is allowed. No other formats are allowed. See also .
This datatype can be used to represent a specific day in a month,
for example, to say that my birthday occurs on the 14th of September every year.
Constraining facets
gDay
gDay is a Gregorian day that recurs, specifically a day
of the month such as the 5th of the month. Arbitrary recurring days
are not supported by this datatype. The
of gDay is the space of a set of calendar
dates as defined in § 3 of . Specifically,
it is a set of one-day long, monthly periodic instances.
This datatype can be used to represent a specific day of the month,
for example, to say that I get my paycheck on the 15th of each month.
Since the lexical representation allows an optional time zone
indicator, gDay values are partially ordered because it may
not be possible to unequivocally determine the order of two values one of
which has a time zone and the other does not. If
gDay values are considered as periods of time, the order relation
on gDay values is the order relation on their starting instants.
This is discussed in . See also
. Pairs of gDay
values with or without time zone indicators are totally ordered.
Because days in one calendar only rarely correspond
to days in other calendars, values of this type do not,
in general, have any straightforward or intuitive representation
in terms of most other calendars. This type should therefore be
used with caution in contexts where conversion to other calendars
is desired.
Lexical representation
The lexical representation for gDay is the left
truncated lexical representation for : ---DD .
An optional following time
zone qualifier is allowed as for . No preceding sign is
allowed. No other formats are allowed. See also .
Constraining facets
gMonth
gMonth is a Gregorian month that recurs every year.
The
of gMonth is the space of a set of calendar
months as defined in § 3 of . Specifically,
it is a set of one-month long, yearly periodic instances.
This datatype can be used to represent a specific month,
for example, to say that Thanksgiving falls in the month of November.
Since the lexical representation allows an optional time zone
indicator, gMonth values are partially ordered because it may
not be possible to unequivocally determine the order of two values one of which has a
time zone and the other does not. If
gMonth values are considered as periods of time, the order relation
on gMonth is the order relation on their starting instants.
This is discussed in . See also
. Pairs of gMonth
values with or without time zone indicators are totally ordered.
Because months in one calendar only rarely correspond
to months in other calendars, values of this type do not,
in general, have any straightforward or intuitive representation
in terms of most other calendars. This type should therefore be
used with caution in contexts where conversion to other calendars
is desired.
Lexical representation
The lexical representation for gMonth is the left
and right truncated lexical representation for : --MM--.
An optional following time
zone qualifier is allowed as for . No preceding sign is
allowed. No other formats are allowed. See also .
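The truncated forms given above for gMonthDay, gDay and gMonth can be compared side by side in a short Python sketch. The table `PATTERNS`, the helper `matches_form` and the shape assumed for the time zone qualifier ("Z" or a signed hh:mm offset) are illustrative assumptions, not definitions made by this specification. The patterns check shape only, so a literal such as --02-31 is accepted by this sketch.

```python
import re

# Assumed shape of the optional time zone qualifier.
TZ = r"(Z|[+-][0-9]{2}:[0-9]{2})?"

# One pattern per recurring type, following the forms quoted in the text:
# --MM-DD, ---DD and --MM--.
PATTERNS = {
    "gMonthDay": re.compile(r"--(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])" + TZ),
    "gDay": re.compile(r"---(0[1-9]|[12][0-9]|3[01])" + TZ),
    "gMonth": re.compile(r"--(0[1-9]|1[0-2])--" + TZ),
}


def matches_form(type_name: str, literal: str) -> bool:
    """Return True if the literal has the truncated form of the named type."""
    return PATTERNS[type_name].fullmatch(literal) is not None
```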
Constraining facets
hexBinary
hexBinary represents
arbitrary hex-encoded binary data. The of
hexBinary is the set of finite-length sequences of binary
octets. Each binary octet is encoded as a character tuple, consisting of two
hexadecimal digits ([0-9a-fA-F]) representing the octet code. For example,
"0FB7" is the hex encoding for the 16-bit integer 4023
(whose binary representation is 111110110111).
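The worked example above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `hex_binary_to_octets` is hypothetical, and the explicit pattern check is added because Python's `bytes.fromhex` would otherwise tolerate embedded white space.

```python
import re

_HEX_BINARY = re.compile(r"([0-9a-fA-F]{2})*")


def hex_binary_to_octets(literal: str) -> bytes:
    """Decode a hexBinary literal (two hex digits per octet) into octets."""
    if _HEX_BINARY.fullmatch(literal) is None:
        raise ValueError(f"not a hexBinary literal: {literal!r}")
    return bytes.fromhex(literal)


octets = hex_binary_to_octets("0FB7")
# Interpret the two octets as an unsigned big-endian 16-bit integer.
as_integer = int.from_bytes(octets, byteorder="big")
```

Here `as_integer` is 4023, matching the value quoted in the text.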
Constraining facets
base64Binary
base64Binary
represents Base64-encoded arbitrary binary data. The of
base64Binary is the set of finite-length sequences of binary
octets. For base64Binary data the
entire binary stream is encoded using the Base64
Content-Transfer-Encoding defined in Section 6.8 .
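For comparison with hexBinary, the following sketch converts between octets and a Base64 literal using Python's standard `base64` module. The function names are hypothetical, and removing white space before decoding is an assumption made for this example rather than a rule stated above.

```python
import base64


def base64_binary_to_octets(literal: str) -> bytes:
    """Decode a Base64 literal into its sequence of octets."""
    compact = "".join(literal.split())  # assumption: white space is ignorable
    return base64.b64decode(compact, validate=True)


def octets_to_base64_binary(octets: bytes) -> str:
    """Encode a sequence of octets as a Base64 literal."""
    return base64.b64encode(octets).decode("ascii")
```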
Constraining facets
anyURI
anyURI represents a Uniform Resource Identifier Reference
(URI). An anyURI value can be absolute or relative, and may
have an optional fragment identifier (i.e., it may be a URI Reference). This
type should be used to specify the intention that the value fulfills
the role of a URI as defined by , as amended by
.
The mapping from anyURI values to URIs is as defined in
Section 5.4 Locator Attribute
of (see also Section 8
Character Encoding in URI References
of ). This means
that a wide range of internationalized resource identifiers can be specified
when an anyURI is called for, and still be understood as
URIs per , as amended by ,
where appropriate to identify resources.
Each URI scheme imposes specialized syntax rules for URIs in
that scheme, including restrictions on the syntax of allowed fragment
identifiers. Because it is
impractical for processors to check that a value is a
context-appropriate URI reference, this specification follows the
lead of (as amended by )
in this matter: such rules and restrictions are not part of type validity
and are not checked by processors.
Thus in practice the above definition imposes only very modest obligations
on processors.
Lexical representation
The of anyURI is
finite-length character sequences which, when the algorithm defined in
Section 5.4 of is applied to them, result in strings
which are legal URIs according to , as amended by
.
Spaces are, in principle, allowed in the
of anyURI; however, their use is highly discouraged
(unless they are encoded as %20).
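The escaping step referred to above can be approximated as follows. This is a minimal sketch which assumes that the procedure amounts to percent-escaping, as UTF-8 octets, every character that is neither an ASCII letter or digit nor one of the reserved, mark, "#", "%", "[" or "]" characters. The function name `escape_any_uri` and the constant `_SAFE` are hypothetical, and scheme-specific rules are not checked.

```python
from urllib.parse import quote

# Characters left untouched in this sketch: reserved characters, marks,
# and "#", "%", "[" and "]". ASCII letters and digits are always left
# untouched by quote().
_SAFE = ";/?:@&=+$,-_.!~*'()#%[]"


def escape_any_uri(value: str) -> str:
    """Percent-escape characters that are assumed not to be allowed in a URI."""
    return quote(value, safe=_SAFE, encoding="utf-8")
```

Under these assumptions a space becomes %20, while an existing escape such as %20 is left unchanged because "%" itself is not escaped.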
Constraining facets
QName
QName represents
XML qualified names.
The of QName is the set of
tuples {namespace name,
local part},
where namespace name
is a
and local part is
an .
The of QName is the set
of strings that match the
QName production of .
The mapping between literals in the and
values in the of QName requires
a namespace declaration to be in scope for the context in which QName
is used.
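The dependence on in-scope namespace declarations can be illustrated with a small Python sketch. The function `resolve_qname` and the dictionary representation of the declarations (prefix to namespace name, with the empty string standing for a default namespace) are assumptions made for this example; the treatment of unprefixed names is likewise an assumption.

```python
from typing import Dict, Optional, Tuple


def resolve_qname(literal: str,
                  in_scope: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Map a QName literal to a (namespace name, local part) tuple."""
    prefix, colon, local = literal.partition(":")
    if not colon:
        # Unprefixed name: use the default namespace if one is declared.
        return in_scope.get(""), literal
    if prefix not in in_scope:
        raise ValueError(f"no namespace declaration in scope for {prefix!r}")
    return in_scope[prefix], local
```

The same literal can therefore denote different values in different contexts, which is why the mapping cannot be computed from the literal alone.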
Constraining facets
NOTATION
NOTATION
represents the NOTATION attribute
type from . The
of NOTATION is the set s. The
of NOTATION is the set
of all names of notations
declared in the current schema.
enumeration facet value required for NOTATION
It is an for NOTATION
to be used directly in a schema. Only datatypes that are
from NOTATION by
specifying a value for can be used
in a schema.
For compatibility (see ) NOTATION
should be used only on attributes.
Constraining facets
Derived datatypes
This section gives conceptual definitions for all
datatypes
defined by this specification. The XML representation used to define
datatypes (whether
or ) is
given in section and the complete
definitions of the datatypes are provided in Appendix A
.
normalizedString
normalizedString
represents white space normalized strings.
The of normalizedString is the
set of strings that do not
contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters.
The of normalizedString is the
set of strings that do not
contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters.
The of normalizedString is .
Constraining facets
Derived datatypes
token
token
represents tokenized strings.
The of token is the
set of strings that do not
contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no
leading or trailing spaces (#x20) and that have no internal sequences
of two or more spaces.
The of token is the
set of strings that do not
contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, that have no
leading or trailing spaces (#x20) and that have no internal sequences
of two or more spaces.
The of token is .
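The difference between normalizedString and token can be pictured as two successive normalization steps. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the first step replaces each tab, line feed and carriage return with a space, and that the second step additionally removes leading and trailing spaces and reduces internal runs of spaces to one.

```python
import re


def replace_white_space(value: str) -> str:
    """Replace each tab, line feed and carriage return with a space."""
    return re.sub(r"[\t\n\r]", " ", value)


def collapse_white_space(value: str) -> str:
    """Apply the replacement step, then collapse and trim spaces."""
    return re.sub(r" +", " ", replace_white_space(value)).strip(" ")
```

A string produced by the first function satisfies the description of normalizedString above; one produced by the second satisfies the description of token.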
Constraining facets
Derived datatypes
language
language
represents natural language identifiers as defined by
. The of
language is the set of all strings that are valid
language identifiers as defined in the
language identification
section of . The of
language is the set of all strings that are valid
language identifiers as defined in the
language identification
section of .
The of language is .
Constraining facets
IDREFS
IDREFS represents the
IDREFS attribute type from
. The of
IDREFS is the set of finite, non-zero-length sequences of
s that have been used in an XML document.
The of IDREFS is the
set of white space separated lists of tokens, of which each token is in the
of .
The of IDREFS is
.
The of IDREFS is scoped
to a specific instance document.
For compatibility (see ) IDREFS
should be used only on attributes.
Constraining facets
ENTITIES
ENTITIES
represents the ENTITIES attribute
type from . The
of ENTITIES is the set of finite, non-zero-length sequences of
s that have been declared as
unparsed entities
in a document type definition.
The of ENTITIES is the
set of white space separated lists of tokens, of which each token is in the
of .
The of ENTITIES is
.
The of ENTITIES is scoped
to a specific instance document.
For compatibility (see ) ENTITIES
should be used only on attributes.
Constraining facets
NMTOKEN
NMTOKEN represents
the NMTOKEN attribute type
from . The of
NMTOKEN is the set of tokens that match
the Nmtoken production in
. The of
NMTOKEN is the set of strings that match
the Nmtoken production in
. The of
NMTOKEN is .
For compatibility (see ) NMTOKEN
should be used only on attributes.
Constraining facets
Derived datatypes
NMTOKENS
NMTOKENS
represents the NMTOKENS attribute
type from . The
of NMTOKENS is the set of finite, non-zero-length sequences of
s. The
of NMTOKENS is the set of white space separated lists of tokens,
of which each token is in the of
. The of
NMTOKENS is .
For compatibility (see )
NMTOKENS should be used only on attributes.
Constraining facets
Name
Name
represents XML Names.
The of Name is
the set of all strings which match the
Name production of
. The of
Name is the set of all strings which match
the Name production of
. The of Name
is .
Constraining facets
Derived datatypes
NCName
NCName represents XML
"non-colonized" Names. The of
NCName is the set of all strings which match
the NCName production of
. The of
NCName is the set of all strings which match
the NCName production of
. The of
NCName is .
Constraining facets
Derived datatypes
ID
ID represents the
ID attribute type from
. The of
ID is the set of all strings that match
the NCName production in
. The
of ID is the set of all
strings that match the
NCName production in
.
The of ID is .
For compatibility (see )
ID should be used only on attributes.
Constraining facets
IDREF
IDREF represents the
IDREF attribute type from
. The of
IDREF is the set of all strings that match
the NCName production in
. The
of IDREF is the set of
strings that match the
NCName production in
.
The of IDREF is .
For compatibility (see ) this datatype
should be used only on attributes.
Constraining facets
Derived datatypes
ENTITY
ENTITY represents the
ENTITY attribute type from
. The of
ENTITY is the set of all strings that match
the NCName production in
and have been declared as an
unparsed entity in
a document type definition.
The of ENTITY is the set
of all strings that match the
NCName production in
.
The of ENTITY is .
For compatibility (see ) ENTITY
should be used only on attributes.
Constraining facets
Derived datatypes
integer
integer is
from by fixing the
value of to be 0. This results in the standard
mathematical concept of the integer numbers. The
of integer is the infinite
set {...,-2,-1,0,1,2,...}. The of
integer is .
Lexical representation
integer has a lexical representation consisting of a finite-length sequence
of decimal digits (#x30-#x39) with an optional leading sign. If the sign is omitted,
"+" is assumed. For example: -1, 0, 12678967543233, +100000.
Canonical representation
The canonical representation for integer is defined
by prohibiting certain options from the
. Specifically, the preceding optional "+" sign is prohibited and leading zeroes are prohibited.
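The canonical form of integer described above can be computed mechanically. The following Python sketch is one way to do so; the function name `canonical_integer` is hypothetical, and the pattern simply restates the lexical representation given above (an optional sign followed by decimal digits).

```python
import re

_INTEGER = re.compile(r"[+-]?[0-9]+")


def canonical_integer(literal: str) -> str:
    """Return the literal without a "+" sign and without leading zeroes."""
    if _INTEGER.fullmatch(literal) is None:
        raise ValueError(f"not an integer literal: {literal!r}")
    # Python integers are unbounded, so the round trip loses nothing.
    return str(int(literal))
```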
nonPositiveInteger is from
by setting the value of
to be 0. This results in the
standard mathematical concept of the non-positive integers.
The of nonPositiveInteger
is the infinite set {...,-2,-1,0}. The
of nonPositiveInteger is .
Lexical representation
nonPositiveInteger has a lexical representation consisting
of a negative sign ("-") followed by a finite-length
sequence of decimal digits (#x30-#x39). If the sequence of digits consists of all
zeros then the sign is optional.
For example: -1, 0, -12678967543233, -100000.
Canonical representation
The canonical representation for nonPositiveInteger is defined
by prohibiting certain options from the
. Specifically, the
negative sign ("-") is required with the token "0" and leading zeroes are prohibited.
negativeInteger is from
by setting the value of
to be -1. This results in the
standard mathematical concept of the negative integers. The
of negativeInteger
is the infinite set {...,-2,-1}. The
of negativeInteger is .
Lexical representation
negativeInteger has a lexical representation consisting
of a negative sign ("-") followed by a finite-length
sequence of decimal digits (#x30-#x39). For example: -1, -12678967543233, -100000.
Canonical representation
The canonical representation for negativeInteger is defined
by prohibiting certain options from the
. Specifically, leading zeroes are prohibited.
Constraining facets
long
long is
from by setting the
value of to be 9223372036854775807
and to be -9223372036854775808.
The of long is
.
Lexical representation
long has a lexical representation consisting
of an optional sign followed by a finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0,
12678967543233, +100000.
Canonical representation
The canonical representation for long is defined
by prohibiting certain options from the
. Specifically, the
optional "+" sign is prohibited and leading zeroes are prohibited.
Constraining facets
Derived datatypes
int
int
is from by setting the
value of to be 2147483647 and
to be -2147483648. The
of int is .
Lexical representation
int has a lexical representation consisting
of an optional sign followed by a finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0,
126789675, +100000.
Canonical representation
The canonical representation for int is defined
by prohibiting certain options from the
. Specifically, the
optional "+" sign is prohibited and leading zeroes are prohibited.
Constraining facets
Derived datatypes
short
short is
from by setting the
value of to be 32767 and
to be -32768. The
of short is
.
Lexical representation
short has a lexical representation consisting
of an optional sign followed by a finite-length sequence of decimal
digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0, 12678, +10000.
Canonical representation
The canonical representation for short is defined
by prohibiting certain options from the
. Specifically, the
optional "+" sign is prohibited and leading zeroes are prohibited.
Constraining facets
Derived datatypes
byte
byte
is from
by setting the value of to be 127
and to be -128.
The of byte is
.
Lexical representation
byte has a lexical representation consisting
of an optional sign followed by a finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0,
126, +100.
Canonical representation
The canonical representation for byte is defined
by prohibiting certain options from the
. Specifically, the
optional "+" sign is prohibited and leading zeroes are prohibited.
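The bounds quoted above for long, int, short and byte coincide with the ranges of 64-, 32-, 16- and 8-bit two's complement integers. The sketch below collects them in a table and tests membership; the names `BOUNDS` and `in_value_space` are hypothetical and introduced only for illustration.

```python
import re

# Lower and upper bounds as quoted in the text above.
BOUNDS = {
    "long": (-9223372036854775808, 9223372036854775807),
    "int": (-2147483648, 2147483647),
    "short": (-32768, 32767),
    "byte": (-128, 127),
}

_INTEGER = re.compile(r"[+-]?[0-9]+")


def in_value_space(type_name: str, literal: str) -> bool:
    """Return True if the literal denotes a value within the type's bounds."""
    if _INTEGER.fullmatch(literal) is None:
        return False
    low, high = BOUNDS[type_name]
    return low <= int(literal) <= high
```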
Constraining facets
nonNegativeInteger
nonNegativeInteger is from
by setting the value of
to be 0. This results in the
standard mathematical concept of the non-negative integers. The
of nonNegativeInteger
is the infinite set {0,1,2,...}. The of
nonNegativeInteger is .
Lexical representation
nonNegativeInteger has a lexical representation consisting
of an optional sign followed by a finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example:
1, 0, 12678967543233, +100000.
Canonical representation
The canonical representation for nonNegativeInteger is defined
by prohibiting certain options from the
. Specifically, the
optional "+" sign is prohibited and leading zeroes are prohibited.
unsignedLong is from
by setting the value of
to be 18446744073709551615.
The of unsignedLong is
.
Lexical representation
unsignedLong has a lexical representation consisting
of a finite-length sequence of decimal digits (#x30-#x39).
For example: 0,
12678967543233, 100000.
Canonical representation
The canonical representation for unsignedLong is defined
by prohibiting certain options from the
. Specifically,
leading zeroes are prohibited.
unsignedByte is from
by setting the value of
to be 255. The
of unsignedByte is
.
Lexical representation
unsignedByte has a lexical representation consisting
of a finite-length
sequence of decimal digits (#x30-#x39).
For example: 0,
126, 100.
Canonical representation
The canonical representation for unsignedByte is defined
by prohibiting certain options from the
. Specifically,
leading zeroes are prohibited.
Constraining facets
positiveInteger
positiveInteger is from
by setting the value of
to be 1. This results in the standard
mathematical concept of the positive integer numbers.
The of positiveInteger
is the infinite set {1,2,...}. The of
positiveInteger is .
Lexical representation
positiveInteger has a lexical representation consisting
of an optional positive sign ("+") followed by a finite-length
sequence of decimal digits (#x30-#x39).
For example: 1, 12678967543233, +100000.
Canonical representation
The canonical representation for positiveInteger is defined
by prohibiting certain options from the
. Specifically, the
optional "+" sign is prohibited and leading zeroes are prohibited.
Constraining facets
Datatype components
The following sections provide full details on the properties and
significance of each kind of schema component involved in datatype
definitions. For each property, the kinds of values it is allowed to have are
specified. Any property not identified as optional is required to
be present; optional properties which are not present have
absent as their value.
Any property identified as having a set, subset or
value may have an empty value unless this is explicitly ruled out: this is
not the same as absent.
Any property value identified as a superset or a subset of some set may
be equal to that set, unless a proper superset or subset is explicitly
called for.
For more information on the notion of datatype (schema) components,
see Schema Component Details
of .
Readers whose primary interest is in the XML representation of datatype
definitions might wish to skip this section on the first reading,
concentrating instead on .
Simple Type Definition
Simple Type definitions provide for:
Establishing the and
of a datatype, through
the combined set of s specified
in the definition;
Attaching a unique name (actually a ) to the
and .
The Simple Type Definition schema component has the following properties:
Optional. An NCName as defined by
.
Either absent or a
namespace name, as defined in .
One of {atomic, list, union}. Depending on the
value of , further properties are defined as follows:
A
datatype definition (or the
simple ur-type definition).
An or simple type definition.
A non-empty sequence of simple type definitions.
A possibly empty set of .
A set of
If the datatype has been by
then the component
from which it is , otherwise
the .
A subset of {restriction, list, union}.
Optional. An annotation.
Datatypes are identified by their
and . Except
for anonymous datatypes (those with no ),
datatype definitions must be uniquely identified
within a schema.
If is
then the of the datatype defined will
be a subset of the of
(which is a subset of the
of ).
If is
then the of the datatype defined will
be the set of finite-length sequences of values from the
of .
If is then the
of the datatype defined will be the
union of the s of each datatype in
.
If is
then the of
must be .
If is
then the of
must be either or .
If is
then
must be a list of datatype definitions.
The value of consists of the set of
s specified directly in the datatype definition
unioned with the possibly empty set of of
.
The value of consists of the set of
s and their values.
If is the empty set then the type can be used
in deriving other types; the explicit values restriction,
list and union prevent further derivations
by , and
respectively.
applicable facets
The s which are allowed
to be members of are dependent on
as specified in the following table:
list of atomic
If is , then
the of be or
.
no circular unions
If is ,
then
it is an if
and
and of any member of
.
Datatype Valid
A string is datatype-valid with respect to a datatype definition if:
it matches a literal in the
of the datatype, determined as follows:
if is a member of ,
then the string must be ;
if is not a member of ,
then
if is then
the string must match a literal in the
of
if is then
the string must be a sequence of white space separated tokens, each of
which matches a literal in the
of
if is then
the string must match a literal in the
of at least one member of
the value denoted by the literal matched in the previous step
is a member of the of the datatype, as determined
by it being
with respect to each member of (except
for ).
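The lexical part of the rule above (the first step) has a simple compositional shape, which the following Python sketch tries to convey. It is not a conforming validator: facets (the second step) are ignored, and the names `list_of`, `union_of`, `is_integer` and `is_boolean` are hypothetical helpers introduced only for this example.

```python
import re
from typing import Callable, Iterable

Validator = Callable[[str], bool]


def list_of(item: Validator) -> Validator:
    """Every white space separated token must be valid for the item type."""
    def validate(text: str) -> bool:
        tokens = [t for t in re.split(r"[ \t\r\n]+", text) if t]
        return all(item(t) for t in tokens)
    return validate


def union_of(members: Iterable[Validator]) -> Validator:
    """The string must be valid for at least one member type."""
    member_list = list(members)

    def validate(text: str) -> bool:
        return any(member(text) for member in member_list)
    return validate


# Two stand-in atomic validators, used only for the demonstration.
def is_integer(text: str) -> bool:
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None


def is_boolean(text: str) -> bool:
    return text in {"true", "false", "1", "0"}
```

For instance, `list_of(union_of([is_integer, is_boolean]))` accepts the string "1 true 0" under these assumptions.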
Simple Type Definition for anySimpleType
There is a simple type definition nearly equivalent to the simple version
of the ur-type definition present
in every schema by definition. It has the following properties:
anySimpleType
http://www.w3.org/2001/XMLSchema
the ur-type definition
the empty set
absent
Fundamental facets
This section provides the details of each
component.
s provide for:
a semantic characterization of the values in a
ordered
provides for:
indicating whether an is
defined on a , and if so,
whether that is
a or a
One of {false, partial, total}.
depends on ,
and
in the component in which a
component appears as a member of
.
When is ,
is inherited from
of .
When is ,
is false.
When is ,
if is true
for every member of
and all members of share a common
ancestor, then is true;
else is false.
bounded
provides for:
indicating whether a is
A .
depends on ,
and
in the component in which a
component appears as a member of
.
When is ,
if one of or
and one of or
are among , then
is true; else
is false.
When is ,
if or both of
and
are among , then
is true; else
is false.
When is ,
if is true
for every member of
and all members of share a common
ancestor, then is true;
else is false.
cardinality
provides for:
indicating whether the
of a is
finite or countably infinite
One of {finite, countably infinite}.
depends on ,
and
in the component in which a
component appears as a member of
.
When is ,
if one of or
and one of or
are among , then
is finite; else
is countably infinite.
When is ,
if or both of
and
are among , then
is finite; else
is countably infinite.
When is ,
if is finite
for every member of , then
is finite;
else is countably infinite.
numeric
provides for:
indicating whether a is
A
depends on ,
, and
in the component
in which a component appears as a member of
.
When is ,
is inherited from
of .
When is ,
is false.
When is ,
if is true
for every member of , then
is true;
else is false.
Constraining facets
This section provides the details of each
component.
s provide for:
Constraining the of a datatype
by specifying optional properties which serve to semantically
characterize the values in the .
Facet Valid
A value in a is facet-valid with
respect to a component if:
the value is facet-valid with respect to the particular
as specified below.
length
provides for:
Constraining a
to values with a specific number of units of length,
where units of length
varies depending on .
A .
A .
Optional. An annotation.
If is true, then types for which
the current type is the cannot specify a
value for other than .
length and minLength or maxLength
It is an for both
and either of
or
to be members of .
length valid restriction
It is an if
is among the members of of
and is
greater than the of the parent
.
Length Valid
A value in a is facet-valid with
respect to , determined as follows:
if the is then
if is , then the
length of the value, as measured in
characters
be equal to ;
if is or , then the
length of the value, as measured in octets of the binary data,
be equal to ;
if the is ,
then the length of the value, as measured
in list items, be equal to
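The three units of length named above can be made concrete with a short Python sketch. The function `length_in_units` and its `kind` argument are hypothetical; the kinds are assumed here to be string (measured in characters), hexBinary and base64Binary (measured in octets) and list (measured in items).

```python
import base64


def length_in_units(kind: str, literal: str) -> int:
    """Measure a literal in the unit of length appropriate to its kind."""
    if kind == "string":
        return len(literal)                    # characters
    if kind == "hexBinary":
        return len(bytes.fromhex(literal))     # octets
    if kind == "base64Binary":
        return len(base64.b64decode(literal))  # octets
    if kind == "list":
        return len(literal.split())            # list items
    raise ValueError(f"no unit of length assumed for {kind!r}")
```

Note that the four-character hexBinary literal 0FB7 has length 2, not 4, because the unit is the octet rather than the character.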
minLength
provides for:
Constraining a
to values with at least a specific number of units of length,
where units of length
varies depending on .
A .
A .
Optional. An annotation.
If is true, then types for which
the current type is the cannot specify a
value for other than .
minLength <= maxLength
If both and
are members of , then the
of be less than or equal to the
of .
minLength valid restriction
It is an if
is among the members of of
and is
less than the of the parent
.
minLength Valid
A value in a is facet-valid with
respect to , determined as follows:
if the is then
if is , then the
length of the value, as measured in
characters
be greater than or equal to
;
if is or , then the
length of the value, as measured in octets of the binary data,
be greater than or equal to
;
if the is ,
then the length of the value, as measured
in list items, be greater than or equal
to
maxLength
provides for:
Constraining a
to values with at most a specific number of units of length,
where units of length
varies depending on .
A .
A .
Optional. An annotation.
If is true, then types for which
the current type is the cannot specify a
value for other than .
maxLength valid restriction
It is an if
is among the members of of
and is
greater than the of the parent
.
maxLength Valid
A value in a is facet-valid with
respect to , determined as follows:
if the is then
if is , then the
length of the value, as measured in
characters
be less than or equal to
;
if is or , then the
length of the value, as measured in octets of the binary data,
be less than or equal to ;
if the is ,
then the length of the value, as measured
in list items, be less than or equal to
pattern
provides for:
Constraining a
to values that are denoted by literals which match a specific
.
A .
Optional. An annotation.
pattern valid
A literal in a is facet-valid with
respect to if:
the literal is among the set of character sequences denoted by
the specified in .
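The rule above requires the whole literal, not merely some substring of it, to be among the character sequences denoted by the regular expression. A rough Python analogue uses `re.fullmatch`. This is a sketch only: Python's regular expression syntax differs from the regular expression language of this specification, so it is meaningful only for expressions written in the subset common to both, and the function name `pattern_valid` is hypothetical.

```python
import re


def pattern_valid(literal: str, pattern: str) -> bool:
    """Return True if the entire literal matches the pattern."""
    return re.fullmatch(pattern, literal) is not None
```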
enumeration
provides for:
Constraining a
to a specified set of values.
A set of values from the of the
.
Optional. An annotation.
enumeration valid
A value in a is facet-valid with
respect to if:
the value must be one of the values specified in
.
whiteSpace
provides for:
Constraining a according to
the white space normalization rules.
One of {preserve, replace, collapse}.
A .
Optional. An annotation.
If is true, then types for which
the current type is the cannot specify a
value for other than .
whiteSpace valid restriction
It is an if
is among the members of of
and any of the following conditions is
true:
is replace or preserve
and the of the parent
is collapse
is preserve
and the of the parent
is replace
There are no s associated .
For more information, see the
discussion on white space normalization in
Schema Component Details
in .
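The whiteSpace valid restriction conditions above amount to saying that a derived type may keep or strengthen, but never weaken, the normalization of its base type, with the three values ordered preserve, replace, collapse. A minimal Python sketch of that reading follows; the names `_STRICTNESS` and `is_valid_white_space_restriction` are hypothetical.

```python
# Ordering implied by the error conditions listed in the text.
_STRICTNESS = {"preserve": 0, "replace": 1, "collapse": 2}


def is_valid_white_space_restriction(base: str, derived: str) -> bool:
    """Return True if `derived` is at least as strict as `base`."""
    return _STRICTNESS[derived] >= _STRICTNESS[base]
```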
maxInclusive
provides for:
Constraining a to values
with a specific .
A value from the of the
.
A .
Optional. An annotation.
If is true, then types for which
the current type is the cannot specify a
value for other than .
minInclusive <= maxInclusive
It is an for the value specified for
to be greater than the value
specified for for the same datatype.
maxInclusive valid restriction
It is an if any of the following conditions
is true:
is among the members of
of
and is
greater than the of the parent
is among the members of
of
and is
greater than or equal to the of the parent
is among the members of
of
and is
less than the of the parent
is among the members of
of
and is
less than or equal to the of the parent
maxInclusive Valid
A value in an
is facet-valid with respect to , determined as
follows:
if the property in
is true, then the value
be numerically less than or
equal to ;
if the property in
is false (i.e.,
is one of the date and time related
datatypes), then the value be chronologically
less than or equal to ;
maxExclusive
provides for:
Constraining a to values
with a specific .
A value from the of the
.
A .
Optional. An annotation.
If is true, then types for which
the current type is the cannot specify a
value for other than .
maxInclusive and maxExclusive
It is an for both
and
to be specified for the same datatype.
minExclusive <= maxExclusive
It is an for the value specified for
to be greater than the value
specified for for the same datatype.
maxExclusive valid restriction
It is an if any of the following conditions
is true:
is among the members of
of
and is
greater than the of the parent
is among the members of
of
and is
greater than or equal to the of the parent
is among the members of
of
and is
less than or equal to the of the parent
is among the members of
of
and is
less than or equal to the of the parent
maxExclusive Valid
A value in an
is facet-valid with respect to , determined
as follows:
if the property in
is true, then the
value be numerically less than
;
if the property in
is false (i.e.,
is one of the date and time related
datatypes), then the value be chronologically
less than ;
minExclusive
provides for:
Constraining a to values
with a specific .
A value from the of the
.
A .
Optional. An annotation.
If is true, then types for which
the current type is the cannot specify a
value for other than .
minInclusive and minExclusive
It is an for both
and
to be specified for the same datatype.
minExclusive < maxInclusive
It is an for the value specified for
to be greater than or equal to the value
specified for for the same datatype.
minExclusive valid restriction
It is an if any of the following conditions
is true:
is among the members of
of
and is
greater than the of the parent
is among the members of
of
and is
greater than the of the parent
is among the members of
of
and is
less than or equal to the of the parent
is among the members of
of
and is
greater than or equal to the of the parent
minExclusive Valid
A value in an
is facet-valid with respect to if:
if the property in
is true, then the
value be numerically greater than
;
if the property in
is false (i.e.,
is one of the date and time related
datatypes), then the value must be chronologically
greater than ;
minInclusive
provides for:
Constraining a to values
with a specific .
A value from the of the
.
A .
Optional. An annotation.
If is true, then types for which
the current type is the cannot specify a
value for other than .
minInclusive < maxExclusive
It is an for the value specified for
to be greater than or equal to the value
specified for for the same datatype.
minInclusive valid restriction
It is an if any of the following conditions
is true:
is among the members of
of
and is
less than the of the parent
is among the members of
of
and is
greater than the of the parent
is among the members of
of
and is
less than or equal to the of the parent
is among the members of
of
and is
greater than or equal to the of the parent
minInclusive Valid
A value in an
is facet-valid with respect to , determined as follows:
if the property in
is true, then the
value must be numerically greater than or equal to
;
if the property in
is false (i.e.,
is one of the date and time related
datatypes), then the value must be chronologically
greater than or equal to ;
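Once a literal has been mapped to a value, the four bounding facets reduce to ordinary comparisons. The following is a minimal sketch in Python, not part of this specification: the helper name `facet_valid_bounds` and its keyword arguments are hypothetical, numeric values are assumed to be modelled as `Decimal`, and date and time values as time-zone-aware `datetime` objects. It does not attempt to model the partial ordering of date and time values that lack a time zone.

```python
from datetime import datetime, timezone
from decimal import Decimal


def facet_valid_bounds(value, *, min_inclusive=None, min_exclusive=None,
                       max_inclusive=None, max_exclusive=None):
    """Return True if value satisfies every bound that is not None.

    Hypothetical helper: value and bounds must be mutually comparable,
    e.g. all Decimal (numeric) or all aware datetime (chronological).
    """
    if min_inclusive is not None and not value >= min_inclusive:
        return False
    if min_exclusive is not None and not value > min_exclusive:
        return False
    if max_inclusive is not None and not value <= max_inclusive:
        return False
    if max_exclusive is not None and not value < max_exclusive:
        return False
    return True


# Numeric case: 100 satisfies an inclusive bound of 100 but not an exclusive one.
print(facet_valid_bounds(Decimal("100"), max_inclusive=Decimal("100")))  # True
print(facet_valid_bounds(Decimal("100"), max_exclusive=Decimal("100")))  # False

# Chronological case: the same operators compare instants in time.
start = datetime(2001, 3, 16, tzinfo=timezone.utc)
later = datetime(2001, 4, 16, tzinfo=timezone.utc)
print(facet_valid_bounds(later, min_exclusive=start))  # True
```

Because Python uses the same comparison operators for both kinds of value, the numeric and chronological branches of the rules share one code path in this sketch.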
totalDigits
provides for:
Constraining a to values
with a specific maximum number of decimal digits (#x30-#x39).
A .
A .
Optional. An annotation.
If is true, then types for which
the current type is the cannot specify a
value for other than
.
totalDigits valid restriction
It is an if
is among the members of
of
and is
greater than the of the parent
totalDigits Valid
A value in a is facet-valid with
respect to if:
the number of decimal digits in the value is less than or equal to
;
fractionDigits
provides for:
Constraining a to values
with a specific maximum number of decimal digits in the fractional
part.
A .
A .
Optional. An annotation.
If is true, then types for which
the current type is the cannot specify a
value for other than
.
fractionDigits less than or equal to totalDigits
It is an for to
be greater than .
fractionDigits Valid
A value in a is facet-valid with
respect to if:
the number of decimal digits in the fractional part of the
value is less than or equal to ;
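The two digit-counting facets can be illustrated with Python's `decimal` module. This is a sketch under stated assumptions, not a normative definition: it assumes that trailing zeros in the fractional part do not count as digits of the value, which is only one possible reading of "the number of decimal digits in the value". The names `digit_counts` and `facet_valid_digits` are hypothetical.

```python
from decimal import Decimal


def digit_counts(value):
    """Return (total_digits, fraction_digits) for a finite Decimal.

    Assumption: trailing fractional zeros are not counted as digits.
    """
    _sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError("NaN and infinities have no digit counts")
    if not any(digits):
        return (1, 0)  # zero is treated as the single digit 0
    digits = list(digits)
    # Drop trailing zeros that sit to the right of the decimal point.
    while exponent < 0 and digits[-1] == 0:
        digits.pop()
        exponent += 1
    if exponent >= 0:
        return (len(digits) + exponent, 0)
    fraction = -exponent
    return (max(len(digits), fraction), fraction)


def facet_valid_digits(value, *, total_digits=None, fraction_digits=None):
    """Check a value against optional totalDigits and fractionDigits limits."""
    total, fraction = digit_counts(value)
    if total_digits is not None and total > total_digits:
        return False
    if fraction_digits is not None and fraction > fraction_digits:
        return False
    return True


print(digit_counts(Decimal("123.450")))  # (5, 2)
print(facet_valid_digits(Decimal("36.6"), total_digits=3, fraction_digits=1))  # True
```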
XML representation of datatype definitions
The sections below define correspondences between element information
items and datatype definition components. All the element information
items in the XML representation of a datatype definition are in the
XML Schema namespace, that is their
namespace name
is http://www.w3.org/2001/XMLSchema.
Throughout the following sections, the [value] of an attribute
information item or the [children] of an element information item means
a string composed of, in order, the [character code] of each character information
item in the [children] of that attribute information item or in the
[children] of that element information item respectively.
XML representation of datatype definitions
The XML representation for a schema component
is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the name [attribute], if present,
otherwise null
A set corresponding to the actual value of the
final [attribute], if present, otherwise of the actual value of the
finalDefault [attribute] of the ancestor
schema
element information item, if present, otherwise the empty string, as follows:
the empty string: the empty set;
#all: {restriction, list, union};
otherwise: a set with members drawn from the set above, each being present or
absent depending on whether the string contains an equivalently named
space-delimited substring.
Although the finalDefault [attribute] of
schema may include
values other than
restriction, list or union, those values
are ignored in the determination of
The actual value of the targetNamespace [attribute]
of the parent schema element information item.
The annotation corresponding to the
element information item in the [children], if present, otherwise
null
A datatype can be
from a datatype or another
datatype by one of three means:
by restriction, by list or by union.
Derivation by restriction
The actual value of of
The union of the set of components
resolved to by the facet [children] merged with
from , subject to the Facet Restriction Valid
constraints specified in .
The component resolved to by the actual value of the
base [attribute] or the [children],
whichever is present.
base attribute or simpleType child
Either the base [attribute] or the
simpleType [child] must be present, but not both.
An electronic commerce schema might define a datatype called
Sku (the barcode number that appears on products) from the
datatype by
supplying a value for the facet.
In this case, Sku is the name of the new
datatype, is
its and
is the facet.
Derivation by list
list
The component resolved to by the actual value of the
itemType [attribute]
or the [children],
whichever is present.
itemType attribute or simpleType child
Either the itemType [attribute] or the
[child] must be present, but not both.
A datatype must be
from an or a datatype,
known as the
of the datatype.
This yields a datatype whose is composed of
finite-length sequences of values from the of the
and whose is
composed of white space separated lists of literals of the
.
A system might want to store lists of floating point values.
In this case, listOfFloat is the name of the new
datatype, is its
and is the
derivation method.
As mentioned in ,
when a datatype is from a
datatype, the following
s can be used:
regardless of the s that are applicable
to the datatype that serves as the
of the .
For each of ,
and , the
unit of length is measured in number of list items.
The value of
is fixed to the value collapse.
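The rule that length is measured in list items rather than characters can be sketched as follows. This is an illustration only: the helper names `list_items` and `list_length_valid` are hypothetical, and the sketch assumes that items are separated by the four XML white space characters (space, tab, line feed, carriage return).

```python
import re

XML_WHITE_SPACE = " \t\n\r"


def list_items(literal):
    """Split a list literal into its items on XML white space."""
    collapsed = literal.strip(XML_WHITE_SPACE)
    return re.split(r"[ \t\n\r]+", collapsed) if collapsed else []


def list_length_valid(literal, *, length=None, min_length=None, max_length=None):
    """Check length facets, counting list items rather than characters."""
    count = len(list_items(literal))
    if length is not None and count != length:
        return False
    if min_length is not None and count < min_length:
        return False
    if max_length is not None and count > max_length:
        return False
    return True


# Three items, even though the literal is 19 characters long.
print(list_items("  1.0\t2.5E3\n -INF  "))                 # ['1.0', '2.5E3', '-INF']
print(list_length_valid("1.0 2.5E3 -INF", max_length=3))   # True
```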
Derivation by union
union
The sequence of components resolved to by the
items in the actual value of the
memberTypes [attribute], if any,
in order, followed by the components resolved to by the
[children], if any, in order.
If is union for
any components resolved to above, then
the that is replaced by its
.
memberTypes attribute or simpleType children
Either the memberTypes [attribute] must be non-empty or
there must be at least one simpleType [child].
A datatype can be
from one or more , or
other datatypes, known as the
of that datatype.
As an example, taken from a typical display oriented text markup language,
one might want to express font sizes as an integer between 8 and 72, or with
one of the tokens "small", "medium" or "large". The
type definition below would accomplish that.
A header
this is a test
As mentioned in ,
when a datatype is from a
datatype, only the following
s can be used:
regardless of the s that are
applicable to the datatypes that participate in the
Constraining facets
This section discusses the details of the XML Representation for specifying
s in
a datatype definition.
Single Facet Value
Unless otherwise specifically allowed by this specification
( and
) any given
can only be specified once within
a single derivation step.
length
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
The following is the definition of a
datatype to represent product codes which must be
exactly 8 characters in length. By fixing the value of the
length facet we ensure that types derived from productCode can
change or set the values of other facets, such as pattern, but
cannot change the length.
minLength
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
The following is the definition of a
datatype which requires strings to have at least one character (i.e.,
the empty string is not in the
of this datatype).
maxLength
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
The following is the definition of a
datatype which might be used to accept form input with an upper limit
to the number of characters that are acceptable.
pattern
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
be a valid
.
The actual value of the value [attribute]
The annotations corresponding to all the
element information items in the [children], if any.
Multiple patterns
If multiple element information items appear as
[children] of a , the [value]s should
be combined as if they appeared in a single
as separate
es.
It is a consequence of the schema representation constraint
and of the rules for
that
facets specified on the same step in a type
derivation are ORed together, while
facets specified on different steps of a type derivation
are ANDed together.
Thus, to impose two constraints simultaneously,
schema authors may either write a single which
expresses the intersection of the two s they wish to
impose, or define each on a separate type derivation
step.
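The OR-within-a-step, AND-across-steps behaviour can be sketched in Python. This is an illustration under assumptions: Python's `re` dialect stands in for the regular expression language of this specification (the two are not identical), the function name `matches_patterns` is hypothetical, and the sample patterns are invented for the example.

```python
import re


def matches_patterns(literal, steps):
    """Return True if literal satisfies the pattern facets of every step.

    steps is a list of derivation steps, each a list of pattern strings.
    Patterns in one step are ORed; the steps themselves are ANDed.
    """
    return all(
        any(re.fullmatch(pattern, literal) for pattern in step)
        for step in steps
    )


# Step 1 allows either of two shapes; step 2 further requires a leading 9.
steps = [
    [r"[0-9]{5}", r"[0-9]{5}-[0-9]{4}"],
    [r"9.*"],
]
print(matches_patterns("90210", steps))       # True
print(matches_patterns("12345-6789", steps))  # False: fails the second step
```

Using `fullmatch` reflects that a pattern must match the entire literal, not just a prefix or substring.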
The following is the definition of a
datatype which is a better representation of postal codes in the
United States, by limiting strings to those which are matched by
a specific .
enumeration
The XML representation for an schema
component is an element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
be
in the of .
The actual value of the value [attribute]
The annotations corresponding to all the
element information items in the [children], if any.
Multiple enumerations
If multiple element information items appear
as [children] of a the
of the
component should be the set of all such [value]s.
The following example is a datatype definition for a
datatype which limits the values
of dates to the three US holidays enumerated. This datatype
definition would appear in a schema authored by an "end-user" and
shows how to define a datatype by enumerating the values in its
. The enumerated values must be
type-valid literals for the .
some US holidays: New Year's day, 4th of July, Christmas
whiteSpace
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
The following example is the datatype definition for
the
datatype.
maxInclusive
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
be
in the of .
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
The following is the definition of a
datatype which limits values to integers less than or equal to
100, using .
maxExclusive
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
be
in the of .
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
The following is the definition of a
datatype which limits values to integers less than or equal to
100, using .
Note that the
of this datatype is identical to
the previous one (named 'one-hundred-or-less').
minInclusive
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
be
in the of .
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
The following is the definition of a
datatype which limits values to integers greater than or equal to
100, using .
minExclusive
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
be
in the of .
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
The following is the definition of a
datatype which limits values to integers greater than or equal to
100, using .
Note that the
of this datatype is identical to
the previous one (named 'one-hundred-or-more').
totalDigits
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
The following is the definition of a
datatype which could be used to represent monetary amounts, such as
in a financial management application which does not have figures
of $1M or more and only allows whole cents. This definition would appear
in a schema authored by an "end-user" and shows how to define a datatype by
specifying facet values which constrain the range of the
in a manner specific to the
(different than specifying max/min values
as before).
fractionDigits
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
The following is the definition of a
datatype which could be used to represent the magnitude
of a person's body temperature on the Celsius scale.
This definition would appear in a schema authored by an "end-user"
and shows how to define a datatype by specifying facet values which
constrain the range of the .
Conformance
This specification describes two levels of conformance for
datatype processors. The first is
required of all processors. Support for the other will depend on the
application environments for which the processor is intended.
Minimally conforming processors must
completely and correctly implement the and
.
Processors which accept schemas in the form of XML documents as described
in are additionally said to provide
conformance to the XML Representation of Schemas,
and must, when processing schema documents, completely and
correctly implement all
s
in this specification, and adhere exactly to the
specifications in for mapping the contents of such
documents to schema components
for use in validation.
By separating the conformance requirements relating to the concrete
syntax of XML schema documents, this specification admits processors
which validate using schemas stored in optimized binary representations,
dynamically created schemas represented as programming language data
structures, or implementations in which particular schemas are compiled
into executable code such as C or Java. Such processors can be said to
be minimally conforming
but not necessarily in conformance to
the XML Representation of Schemas.
Schema for Datatype Definitions (normative)
DTD for Datatype Definitions (non-normative)
Datatypes and Facets
Fundamental Facets
The following table shows the values of the fundamental facets
for each datatype.
ISO 8601 Date and Time Formats
ISO 8601 Conventions
The datatypes
, , ,
, , ,
, and
use lexical formats inspired by
. This appendix provides more detail on the ISO
formats and discusses some deviations from them for the datatypes
defined in this specification.
"specifies the representation of dates in the
proleptic Gregorian calendar and times and representations of periods of time".
The proleptic Gregorian calendar includes dates prior to 1582 (the year it came
into use as an ecclesiastical calendar).
It should be pointed out that the datatypes described in this
specification do not cover all the types of data covered by
, nor do they support all the lexical
representations for those types of data.
lexical formats are described using "pictures"
in which characters are used in place of digits. For the primitive datatypes
, ,
, , ,
, and ,
these characters have the following meanings:
C -- represents a digit used in the thousands and hundreds components,
the "century" component, of the time element "year". Legal values are
from 0 to 9.
Y -- represents a digit used in the tens and units components of the time
element "year". Legal values are from 0 to 9.
M -- represents a digit used in the time element "month". The two
digits in a MM format can have values from 1 to 12.
D -- represents a digit used in the time element "day". The two digits
in a DD format can have values from 1 to 28 if the month value equals 2,
1 to 29 if the month value equals 2 and the year is a leap year, 1 to 30
if the month value equals 4, 6, 9 or 11, and 1 to 31 if the month value
equals 1, 3, 5, 7, 8, 10 or 12.
h -- represents a digit used in the time element "hour". The two digits
in a hh format can have values from 0 to 23.
m -- represents a digit used in the time element "minute". The two digits
in a mm format can have values from 0 to 59.
s -- represents a digit used in the time element "second". The two
digits in a ss format can have values from 0 to 60. In the formats
described in this specification the whole number of seconds
may be followed by decimal seconds to an arbitrary level of precision.
This is represented in the picture by "ss.sss". A value of 60 or more is
allowed only in the case of leap seconds.
Strictly speaking, a value of
60 or more is not sensible unless the month and day could
represent March 31, June 30, September 30, or December 31 in UTC.
Because the leap second is added or subtracted as the last second of the day
in UTC time, the long (or short) minute could occur at other times in local
time. In cases where the leap second is used with an inappropriate month
and day, it and any fractional seconds should be considered as added to or
subtracted from the following minute.
For all the information items indicated by the above characters, leading
zeros are required where indicated.
In addition to the above, certain characters are used as designators
and appear as themselves in lexical formats.
T -- is used as time designator to indicate the start of the
representation of the time of day in .
Z -- is used as time-zone designator, immediately (without a space)
following a data element expressing the time of day in Coordinated
Universal Time (UTC) in
, ,
, , ,
, , and .
In the lexical format for the following
characters are also used as designators and appear as themselves in
lexical formats:
P -- is used as the time duration designator, preceding a data element
representing a given duration of time.
Y -- follows the number of years in a time duration.
M -- follows the number of months or minutes in a time duration.
D -- follows the number of days in a time duration.
H -- follows the number of hours in a time duration.
S -- follows the number of seconds in a time duration.
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary integer. Similarly, the value of the Seconds component
allows an arbitrary decimal. Thus, the lexical format for
and datatypes derived from it
does not follow the alternative
format of § 5.5.3.2.1 of .
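The duration designators can be illustrated with a small parser. The following is a hedged sketch, not the normative lexical definition: it assumes the usual ISO 8601 arrangement in which `T` separates the date part from the time part (this is what disambiguates `M` as months or minutes), it accepts the optional leading minus sign described later in this appendix, and the names `DURATION` and `parse_duration` are hypothetical.

```python
import re
from decimal import Decimal

# Assumed shape: -?P nY nM nD T nH nM n(.n)S, every component optional.
DURATION = re.compile(
    r"(?P<sign>-)?P"
    r"(?:(?P<years>[0-9]+)Y)?"
    r"(?:(?P<months>[0-9]+)M)?"
    r"(?:(?P<days>[0-9]+)D)?"
    r"(?:T(?:(?P<hours>[0-9]+)H)?"
    r"(?:(?P<minutes>[0-9]+)M)?"
    r"(?:(?P<seconds>[0-9]+(?:\.[0-9]+)?)S)?)?"
)


def parse_duration(text):
    """Parse a duration literal into a dict of its component values."""
    match = DURATION.fullmatch(text)
    # A bare "P" or a dangling "T" carries no components and is rejected.
    if match is None or text.endswith(("P", "T")):
        raise ValueError(f"not a duration literal: {text!r}")
    fields = {name: int(match.group(name) or 0)
              for name in ("years", "months", "days", "hours", "minutes")}
    fields["seconds"] = Decimal(match.group("seconds") or "0")
    fields["negative"] = match.group("sign") is not None
    return fields


# Component values are not range-restricted: 15 months and 90 minutes are fine.
print(parse_duration("P15M")["months"])    # 15
print(parse_duration("PT90M")["minutes"])  # 90
```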
Truncated and Reduced Formats
supports a variety of "truncated" formats in
which some of the characters on the left of specific formats, for example,
the
century, can be omitted.
Truncated formats are, in
general, not permitted for the datatypes defined in this specification
with three exceptions. The datatype uses
a truncated format for
which represents an instant of time that recurs every day.
Similarly, the and
datatypes use left-truncated formats for .
The datatype uses a right and left truncated format for
.
also supports a variety of "reduced" or right-truncated
formats in which some of the characters to the right of specific formats,
such as the
time specification, can be omitted. Right truncated formats are also, in
general,
not permitted for the datatypes defined in this specification
with the following exceptions:
right-truncated representations of are used as
lexical representations for , ,
.
Deviations from ISO 8601 Formats
Sign Allowed
An optional minus sign is allowed immediately preceding, without a space,
the lexical representations for , ,
, , .
No Year Zero
The year "0000" is an illegal year value.
More Than 9999 Years
To accommodate year values greater than 9999, more than four digits are
allowed in the year representations of ,
, , and .
This follows
.
Adding durations to dateTimes
Given a S and a
D, this appendix specifies how to compute
a E where E is the end of the
time period with start S and duration D, i.e.,
E = S + D. Such computations are used, for example,
to determine whether a is within a specific
time period. This appendix also addresses the addition of s
to the datatypes , and
which can be viewed as a set of s.
In such cases, the addition is made to the first or starting
in the set.
This is a logical explanation of the process.
Actual implementations are free to optimize as long as they produce the same
results. The calculation uses the notation S[year] to represent the year
field of S, S[month] to represent the month field, and so on. It also depends on
the following functions:
fQuotient(a, b) = the greatest integer less than or equal to a/b
fQuotient(-1,3) = -1
fQuotient(0,3)...fQuotient(2,3) = 0
fQuotient(3,3) = 1
fQuotient(3.123,3) = 1
modulo(a, b) = a - fQuotient(a,b)*b
modulo(-1,3) = 2
modulo(0,3)...modulo(2,3) = 0...2
modulo(3,3) = 0
modulo(3.123,3) = 0.123
fQuotient(a, low, high) = fQuotient(a - low, high - low)
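The helper functions above translate directly into Python. This is a sketch: the Python names are hypothetical, exact rational arithmetic (`Fraction`) is used so that decimal inputs are not disturbed by binary floating point, and `modulo_range` assumes that the three-argument form of modulo used by the algorithm is `modulo(a - low, high - low) + low`, which the text as shown here does not spell out.

```python
import math
from decimal import Decimal
from fractions import Fraction


def f_quotient(a, b):
    """Greatest integer less than or equal to a/b."""
    return math.floor(Fraction(a) / Fraction(b))


def modulo(a, b):
    """a - fQuotient(a, b) * b; the result always lies in [0, b) for b > 0."""
    return a - f_quotient(a, b) * b


def f_quotient_range(a, low, high):
    """Three-argument fQuotient(a, low, high)."""
    return f_quotient(a - low, high - low)


def modulo_range(a, low, high):
    """Assumed three-argument modulo: the result lies in [low, high)."""
    return modulo(a - low, high - low) + low


print(f_quotient(-1, 3), modulo(-1, 3))  # -1 2
print(modulo(Decimal("3.123"), 3))       # 0.123
print(modulo_range(13, 1, 13))           # 1
```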
31    M = January, March, May, July, August, October, or December
30    M = April, June, September, or November
29    M = February AND (modulo(Y, 400) = 0 OR
      (modulo(Y, 100) != 0) AND modulo(Y, 4) = 0)
28    Otherwise
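The month-length table can be sketched as a Python function. The name `maximum_day_in_month_for` mirrors the function used by the algorithm below; the normalisation of month values outside 1..12 into neighbouring years is an assumption of this sketch, chosen to agree with the three-argument fQuotient above.

```python
def maximum_day_in_month_for(year_value, month_value):
    """Return the last day number of the given month in the given year."""
    # Assumption: out-of-range months roll over into adjacent years.
    month = (month_value - 1) % 12 + 1
    year = year_value + (month_value - 1) // 12
    if month in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if month in (4, 6, 9, 11):
        return 30
    # February: leap years are divisible by 400, or by 4 but not by 100.
    if year % 400 == 0 or (year % 100 != 0 and year % 4 == 0):
        return 29
    return 28


print(maximum_day_in_month_for(2000, 2))  # 29
print(maximum_day_in_month_for(1900, 2))  # 28
```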
Algorithm
Essentially, this calculation is equivalent to separating D into <year,month>
and <day,hour,minute,second> fields. The <year,month> is added to S.
If the day is out of range, it is pinned to be within range. Thus April
31 turns into April 30. Then the <day,hour,minute,second> is added. This
latter addition can cause the year and month to change.
Leap seconds are handled by the computation by treating them as overflows.
Essentially, a value of 60
seconds in S is treated as if it were a duration of 60 seconds added to S
(with a zero seconds field). All calculations
thereafter use 60 seconds per minute.
Thus the addition of either PT1M or PT60S to any dateTime will always
produce the same result. This is a special definition of addition which
is designed to match common practice, and -- most importantly -- be stable
over time.
A definition that attempted to take leap-seconds into account would need to
be constantly updated, and could not predict the results of future
implementations' additions. The decision to introduce a leap second in UTC
is the responsibility of the . They make periodic
announcements as to when
leap seconds are to be added, but this is not known more than a year in
advance. For more information on leap seconds, see .
The following is the precise specification. These steps must be followed in
the same order. If a field in D is not specified, it is treated as if it were
zero. If a field in S is not specified, it is treated in the calculation as if
it were the minimum allowed value in that field; however, after the calculation
is concluded, the corresponding field in E is removed (set to unspecified).
Months (may be modified additionally below)
temp := S[month] + D[month]
E[month] := modulo(temp, 1, 13)
carry := fQuotient(temp, 1, 13)
Years (may be modified additionally below)
E[year] := S[year] + D[year] + carry
Zone
E[zone] := S[zone]
Seconds
temp := S[second] + D[second]
E[second] := modulo(temp, 60)
carry := fQuotient(temp, 60)
Minutes
temp := S[minute] + D[minute] + carry
E[minute] := modulo(temp, 60)
carry := fQuotient(temp, 60)
Hours
temp := S[hour] + D[hour] + carry
E[hour] := modulo(temp, 24)
carry := fQuotient(temp, 24)
Days
if S[day] > maximumDayInMonthFor(E[year], E[month])
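The month, year, zone, second, minute and hour steps can be sketched in Python as follows. This is an illustration, not a complete implementation: the day step is not included because it is not fully shown here, so the function returns the carry that the day step would consume. The names `add_fields`, `f_quotient` and `modulo` are hypothetical, fields are held in plain dicts, every field of S is assumed to be specified, and the three-argument modulo is assumed to shift its result into the range [low, high).

```python
import math
from decimal import Decimal
from fractions import Fraction


def f_quotient(a, b):
    return math.floor(Fraction(a) / Fraction(b))


def modulo(a, b):
    return a - f_quotient(a, b) * b


def add_fields(s, d):
    """Apply the steps from Months through Hours; return (E, day_carry)."""
    e = {}
    # Months: modulo(temp, 1, 13) and fQuotient(temp, 1, 13)
    temp = s["month"] + d.get("month", 0)
    e["month"] = modulo(temp - 1, 12) + 1
    carry = f_quotient(temp - 1, 12)
    # Years
    e["year"] = s["year"] + d.get("year", 0) + carry
    # Zone
    e["zone"] = s.get("zone")
    # Seconds
    temp = s["second"] + d.get("second", 0)
    e["second"] = modulo(temp, 60)
    carry = f_quotient(temp, 60)
    # Minutes
    temp = s["minute"] + d.get("minute", 0) + carry
    e["minute"] = modulo(temp, 60)
    carry = f_quotient(temp, 60)
    # Hours
    temp = s["hour"] + d.get("hour", 0) + carry
    e["hour"] = modulo(temp, 24)
    carry = f_quotient(temp, 24)
    return e, carry


start = {"year": 2000, "month": 12, "hour": 23, "minute": 59,
         "second": Decimal("59"), "zone": "Z"}
end, day_carry = add_fields(start, {"month": 1, "second": Decimal("1")})
print(end["year"], end["month"], end["hour"], end["minute"], end["second"], day_carry)
# 2001 1 0 0 0 1
```

Fields of D that are absent default to zero, matching the rule that an unspecified field in D is treated as zero.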
A regular expression R is a sequence of
characters that denote a set of strings L(R).
When used to constrain a lexical space, a
regular expression R asserts that only strings
in L(R) are valid literals for values of that type.
A
regular expression is composed from zero or more
branches, separated by | characters.
Regular Expression
regExp ::= branch
( '|' branch )*
For all branches S, and for all
regular expressions T, valid
regular expressions R are:
Denoting the set of strings L(R) containing:
(empty string)
the set containing just the empty string
S
all strings in L(S)
S|T
all strings in L(S) and
all strings in L(T)
A branch consists
of zero or more pieces, concatenated together.
Branch
branch ::= piece*
For all pieces S, and for all
branches T, valid
branches R are:
Denoting the set of strings L(R) containing:
S
all strings in L(S)
ST
all strings st with s in
L(S) and t in L(T)
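For finite languages, the two constructions above (alternation and concatenation) are plain set operations. A minimal sketch in Python, with hypothetical function names `union` and `concat`:

```python
def union(l_s, l_t):
    """L(S|T): every string in L(S) together with every string in L(T)."""
    return set(l_s) | set(l_t)


def concat(l_s, l_t):
    """L(ST): every string st with s in L(S) and t in L(T)."""
    return {s + t for s in l_s for t in l_t}


print(sorted(union({"a"}, {"b", "c"})))         # ['a', 'b', 'c']
print(sorted(concat({"a", "b"}, {"x", "y"})))   # ['ax', 'ay', 'bx', 'by']
```

Concatenating with the language that contains only the empty string leaves a language unchanged, which is why an empty branch is harmless.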
A piece is an
atom, possibly followed by a
quantifier.
Piece
piece ::= atom quantifier?
For all atoms S and non-negative
integers n, m such that
n <= m, valid pieces
R are:
Denoting the set of strings L(R) containing:
S
all strings in L(S)
S?
the empty string, and all strings in
L(S).
S*
All strings in L(S?) and all strings st
with s in L(S*)
and t in L(S). ( all concatenations
of zero or more strings from L(S) )
S+
All strings st with s in L(S)
and t in L(S*). ( all concatenations
of one or more strings from L(S) )
S{n,m}
All strings st with s in L(S)
and t in L(S{n-1,m-1}). ( All
sequences of at least n, and at most m, strings from L(S) )
S{n}
All strings in L(S{n,n}). ( All
sequences of exactly n strings from L(S) )
S{n,}
All strings in L(S{n}S*) ( All
sequences of at least n, strings from L(S) )
S{0,m}
All strings st with s in L(S?)
and t in L(S{0,m-1}). ( All
sequences of at most m, strings from L(S) )
S{0,0}
The set containing only the empty string
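The quantifier table can be checked against a brute-force construction for a finite L(S). The sketch below is illustrative only: `repeat_language` is a hypothetical name, and Python's `re` module is used merely as a cross-check because it happens to share the `{n,m}` notation.

```python
import re
from itertools import product


def repeat_language(strings, n, m):
    """L(S{n,m}): all sequences of at least n and at most m strings from L(S)."""
    result = set()
    for count in range(n, m + 1):
        result |= {"".join(parts) for parts in product(strings, repeat=count)}
    return result


language = repeat_language({"a", "b"}, 0, 2)
print(sorted(language))  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']

# Cross-check: every generated string is accepted by the equivalent pattern.
print(all(re.fullmatch(r"(?:a|b){0,2}", s) for s in language))  # True
```

With `n = m = 0` the only sequence is the empty one, so the result is the set containing only the empty string, as in the last row of the table.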
The regular expression language in the Perl Programming Language
does not include a quantifier of the form
S{,m}, since it is logically equivalent to S{0,m}.
We have, therefore, left this logical possibility out of the regular
expression language defined by this specification. We welcome
further input from implementors and schema authors on this issue.
A quantifier
is one of ?, *, +,
{n,m} or {n,}, which have the meanings
defined in the table above.
A metacharacter
is either ., \, ?,
*, +, {, }, (, ), [ or ].
These characters have special meanings in regular expressions,
but can be escaped to form atoms that denote the
sets of strings containing only themselves, i.e., an escaped
metacharacter behaves like a normal character.
A
normal character is any XML character that is not a
metacharacter. In regular expressions, a normal character is an
atom that denotes the singleton set of strings containing only itself.
Normal Character
Char ::= [^.\?*+()|#x5B#x5D]
Note that a normal character can be represented either as
itself, or with a character
reference.
Character Classes
A
character class is an atom R that identifies a set of characters C(R). The set of strings L(R) denoted by a
character class R contains one single-character string
"c" for each character c in C(R).
Character Class
charClass ::= charClassEsc | charClassExpr
A character class is either a character class escape or a
character class expression.
A
character class expression is a character group surrounded
by [ and ] characters. For all character
groups G, [G] is a valid character class
expression, identifying the set of characters
C([G]) = C(G).
Character Class Expression
charClassExpr ::= '[' charGroup ']'
A
character group is either a positive character group,
a negative character group, or a character class subtraction.
Character Group
charGroup ::= posCharGroup | negCharGroup | charClassSub
A positive character group consists of one or more
character ranges or character class escapes, concatenated
together. A positive character group identifies the set of
characters containing all of the characters in all of the sets identified
by its constituent ranges or escapes.
Positive Character Group
posCharGroup ::= ( charRange | charClassEsc )+
For all character ranges R, all
character class escapes E, and all
positive character groups P, valid
positive character groups G are:
Identifying the set of characters C(G) containing:
R
all characters in C(R).
E
all characters in C(E).
RP
all characters in C(R) and all
characters in C(P).
EP
all characters in C(E) and all
characters in C(P).
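A negative character group is a
positive character group preceded by the ^ character.
For all positive character groups P, ^P
is a valid negative character group, and C(^P)
contains all XML characters that are not in C(P).
Negative Character Group
negCharGroup ::= '^' posCharGroup
A
character class subtraction is a character class expression
subtracted from a positive character group or
negative character group, using the - character.
For any positive character group or negative character group
G, and any character class expression
C, G-C is a valid
character class subtraction, identifying the set of all characters in
C(G) that are not also in C(C).
A
character range R identifies a set of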
characters C(R) containing all XML characters with UCS
code points in a specified range.
Character Range
charRange |
|
seRange '-' charOrEsc | XmlCharIncDash[^\#x5B#x5D]
A single XML character is a character range that identifies
the set of characters containing only itself. All XML characters are valid
character ranges, except as follows:
The [, ], and \ characters are not
valid character ranges;
The ^ character is only valid at the beginning of a
if it is part of a
; and
The - character is a valid character range only at the
beginning or end of a .
A character range may also be written
in the form s-e, identifying the set that contains all XML characters
with UCS code points greater than or equal to the code point
of s, but not greater than the code point of e.
s-e is a valid character range iff:
s is a , or an XML character;
s is not \
If s is the first character in a , then
s is not ^
e is a , or an XML character;
e is not \ or [; and
The code point of e is greater than or equal to the code
point of s;
The code point of a is the code point of the
single character in the set of characters that it identifies.
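The sets identified by a range s-e, and by a subtraction G-C, can be modelled with Python sets of characters. This is a sketch with hypothetical helper names; it works on code points via `ord` and `chr` and does not check the additional syntactic restrictions on s and e listed above.

```python
def char_range(s, e):
    """C(s-e): all characters with code points from s to e inclusive."""
    if ord(e) < ord(s):
        raise ValueError("the code point of e must not be less than that of s")
    return {chr(code_point) for code_point in range(ord(s), ord(e) + 1)}


def subtract(group, removed):
    """C(G-C): all characters in C(G) that are not also in C(C)."""
    return set(group) - set(removed)


lower = char_range("a", "z")
consonants = subtract(lower, set("aeiou"))   # models [a-z-[aeiou]]
print(len(lower), len(consonants))           # 26 21
print("e" in consonants, "f" in consonants)  # False True
```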
Character Class Escapes
A character class escape is a short sequence of characters
that identifies a predefined character class. The valid character
class escapes are the s, the
s, and the s (including
the s).
Character Class Escape
charClassEsc
(
|
|
|
)
A
single character escape identifies a set containing only
one character -- usually because that character is difficult or
impossible to write directly into a regular expression.
Single Character Escape
SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
The valid single character escapes are:
Identifying the set of characters C(R) containing:
\n    the newline character (#xA)
\r    the return character (#xD)
\t    the tab character (#x9)
\\    \
\|    |
\.    .
\-    -
\^    ^
\?    ?
\*    *
\+    +
\{    {
\}    }
\(    (
\)    )
\[    [
\]    ]
specifies a number of possible
values for the "General Category" property
and provides mappings from code points to specific character properties.
The set containing all characters that have property X,
can be identified with a category escape \p{X}.
The complement of this set is specified with the
category escape \P{X}.
([\P{X}] = [^\p{X}]).
is subject to future revision. For example, the
mapping from code points to character properties might be updated.
All processors
support the character properties defined in the version of
that is current at the time this specification became a W3C
Recommendation. However, implementors are encouraged to support the
character properties defined in any future version.
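As an illustration of how \p{X} and \P{X} relate, membership can be tested against the General Category reported by Python's standard unicodedata module. This is only a sketch: the function name in_category_escape is hypothetical, and unicodedata reflects whichever Unicode version ships with the interpreter, which may differ from the version this specification refers to.

```python
import unicodedata


def in_category_escape(ch, prop, complement=False):
    """True if ch is in \\p{prop}, or in \\P{prop} when complement is set."""
    general_category = unicodedata.category(ch)  # two-letter code such as 'Lu'
    if len(prop) == 1:
        # A one-letter property such as 'L' covers Lu, Ll, Lt, Lm and Lo.
        member = general_category.startswith(prop)
    else:
        member = general_category == prop
    # \P{X} is the complement of \p{X}, so the answer is simply flipped.
    return member != complement
```

With this helper, "A" belongs to both \p{Lu} and \p{L}, and for any character exactly one of \p{X} and \P{X} matches.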
The following table specifies the recognized values of the
"General Category" property.
Category       Property   Meaning
Letters        L          All Letters
               Lu         Uppercase
               Ll         Lowercase
               Lt         Titlecase
               Lm         Modifier
               Lo         Other
Marks          M          All Marks
               Mn         Non-Spacing
               Mc         Spacing Combining
               Me         Enclosing
Numbers        N          All Numbers
               Nd         Decimal Digit
               Nl         Letter
               No         Other
Punctuation    P          All Punctuation
               Pc         Connector
               Pd         Dash
               Ps         Open
               Pe         Close
               Pi         Initial quote (can behave like Ps or Pe depending on usage)
               Pf         Final quote (can behave like Ps or Pe depending on usage)
The properties mentioned above exclude the Cs property.
The Cs property identifies "surrogate" characters, which do not
occur at the level of the "character abstraction" that XML instance documents
operate on.
[Unicode Database] groups code points into a number of blocks
such as Basic Latin (i.e., ASCII), Latin-1 Supplement, Hangul Jamo,
CJK Compatibility, etc.
The set containing all characters that have block name X
(with all white space stripped out)
can be identified with a block escape \p{IsX}.
The complement of this set is specified with the
block escape \P{IsX}.
([\P{IsX}] = [^\p{IsX}]).
Block Escape
IsBlock ::= 'Is' [a-zA-Z#x2D]+
The following table specifies the recognized block names (for more
information, see the "Blocks.txt" file in [Unicode Database]).
Start Code   End Code   Block Name
#x0000       #x007F     BasicLatin
#x0080       #x00FF     Latin-1Supplement
#x0100       #x017F     LatinExtended-A
#x0180       #x024F     LatinExtended-B
#x0250       #x02AF     IPAExtensions
#x02B0       #x02FF     SpacingModifierLetters
#x0300       #x036F     CombiningDiacriticalMarks
#x0370       #x03FF     Greek
#x0400       #x04FF     Cyrillic
#x0530       #x058F     Armenian
#x0590       #x05FF     Hebrew
#x0600       #x06FF     Arabic
#x0700       #x074F     Syriac
#x0780       #x07BF     Thaana
#x0900       #x097F     Devanagari
#x0980       #x09FF     Bengali
#x0A00       #x0A7F     Gurmukhi
#x0A80       #x0AFF     Gujarati
#x0B00       #x0B7F     Oriya
#x0B80       #x0BFF     Tamil
#x0C00       #x0C7F     Telugu
#x0C80       #x0CFF     Kannada
#x0D00       #x0D7F     Malayalam
#x0D80       #x0DFF     Sinhala
#x0E00       #x0E7F     Thai
#x0E80       #x0EFF     Lao
#x0F00       #x0FFF     Tibetan
#x1000       #x109F     Myanmar
#x10A0       #x10FF     Georgian
#x1100       #x11FF     HangulJamo
#x1200       #x137F     Ethiopic
#x13A0       #x13FF     Cherokee
#x1400       #x167F     UnifiedCanadianAboriginalSyllabics
#x1680       #x169F     Ogham
#x16A0       #x16FF     Runic
#x1780       #x17FF     Khmer
#x1800       #x18AF     Mongolian
#x1E00       #x1EFF     LatinExtendedAdditional
#x1F00       #x1FFF     GreekExtended
#x2000       #x206F     GeneralPunctuation
#x2070       #x209F     SuperscriptsandSubscripts
#x20A0       #x20CF     CurrencySymbols
#x20D0       #x20FF     CombiningMarksforSymbols
#x2100       #x214F     LetterlikeSymbols
#x2150       #x218F     NumberForms
#x2190       #x21FF     Arrows
#x2200       #x22FF     MathematicalOperators
#x2300       #x23FF     MiscellaneousTechnical
#x2400       #x243F     ControlPictures
#x2440       #x245F     OpticalCharacterRecognition
#x2460       #x24FF     EnclosedAlphanumerics
#x2500       #x257F     BoxDrawing
#x2580       #x259F     BlockElements
#x25A0       #x25FF     GeometricShapes
#x2600       #x26FF     MiscellaneousSymbols
#x2700       #x27BF     Dingbats
#x2800       #x28FF     BraillePatterns
#x2E80       #x2EFF     CJKRadicalsSupplement
#x2F00       #x2FDF     KangxiRadicals
#x2FF0       #x2FFF     IdeographicDescriptionCharacters
#x3000       #x303F     CJKSymbolsandPunctuation
#x3040       #x309F     Hiragana
#x30A0       #x30FF     Katakana
#x3100       #x312F     Bopomofo
#x3130       #x318F     HangulCompatibilityJamo
#x3190       #x319F     Kanbun
#x31A0       #x31BF     BopomofoExtended
#x3200       #x32FF     EnclosedCJKLettersandMonths
#x3300       #x33FF     CJKCompatibility
#x3400       #x4DB5     CJKUnifiedIdeographsExtensionA
#x4E00       #x9FFF     CJKUnifiedIdeographs
#xA000       #xA48F     YiSyllables
#xA490       #xA4CF     YiRadicals
#xAC00       #xD7A3     HangulSyllables
#xE000       #xF8FF     PrivateUse
#xF900       #xFAFF     CJKCompatibilityIdeographs
#xFB00       #xFB4F     AlphabeticPresentationForms
#xFB50       #xFDFF     ArabicPresentationForms-A
#xFE20       #xFE2F     CombiningHalfMarks
#xFE30       #xFE4F     CJKCompatibilityForms
#xFE50       #xFE6F     SmallFormVariants
#xFE70       #xFEFE     ArabicPresentationForms-B
#xFEFF       #xFEFF     Specials
#xFF00       #xFFEF     HalfwidthandFullwidthForms
#xFFF0       #xFFFD     Specials
[Unicode Database] is subject to future revision.
For example, the
grouping of code points into blocks might be updated.
All minimally conforming processors must
support the blocks defined in the version of
[Unicode Database] that is current at the time this specification became a W3C
Recommendation. However, implementors are encouraged to support the
blocks defined in any future version of the Unicode Standard.
For example, the block escape for identifying the
ASCII characters is \p{IsBasicLatin}.
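The relationship between a Unicode block name, its escape, and the code point range it covers can be sketched in Python. The sketch is illustrative only: it copies just four rows of the block table, and the names BLOCKS, block_escape and in_block are hypothetical helpers, not part of any library.

```python
# A few rows copied from the block table in this section: (start, end, name).
BLOCKS = [
    (0x0000, 0x007F, "BasicLatin"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
]


def block_escape(unicode_block_name, complement=False):
    """Build the escape text for a block name, stripping all white space."""
    stripped = "".join(unicode_block_name.split())
    return ("\\P" if complement else "\\p") + "{Is" + stripped + "}"


def in_block(ch, block_name, complement=False):
    """True if ch is in \\p{IsX} (or \\P{IsX}) for a block in BLOCKS."""
    for start, end, name in BLOCKS:
        if name == block_name:
            return (start <= ord(ch) <= end) != complement
    raise KeyError(f"block not in this excerpt: {block_name}")
```

Here block_escape("Basic Latin") yields the escape text \p{IsBasicLatin}, and in_block("A", "BasicLatin") holds because #x41 lies between #x0000 and #x007F.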
A multi-character escape provides a simple way to identify
a commonly used set of characters:
\i    the set of initial name characters, those matched by Letter | '_' | ':'
\I    [^\i]
\c    the set of name characters, those matched by NameChar
\C    [^\c]
\d    \p{Nd}
\D    [^\d]
\w    [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "control" characters)
\W    [^\w]
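The definitions of \d and \w in terms of General Category values can be approximated in Python with the standard unicodedata module. This sketch reads the parenthetical description of \w (punctuation, separator and control characters) as the category initials P, Z and C; that reading, the function names, and the use of the interpreter's bundled Unicode version are all assumptions of the example.

```python
import unicodedata


def matches_d(ch):
    """\\d is defined as \\p{Nd}: decimal digits in any script."""
    return unicodedata.category(ch) == "Nd"


def matches_w(ch):
    """\\w: every character except punctuation, separators and controls."""
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")


def matches_D(ch):
    """\\D is [^\\d]."""
    return not matches_d(ch)


def matches_W(ch):
    """\\W is [^\\w]."""
    return not matches_w(ch)
```

Note that under this definition \d is not limited to ASCII digits: a character such as ARABIC-INDIC DIGIT THREE (#x663) also has General Category Nd.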
The regular expression language defined here does not
attempt to provide a general solution to "regular expressions" over
UCS character sequences. In particular, it does not easily provide
for matching sequences of base characters and combining marks.
The language is targeted at support of "Level 1" features as defined in
[Unicode Regular Expression Guidelines]. It is hoped that future versions of this
specification will provide support for "Level 2" features.
References
Normative
IEEE. IEEE Standard for Binary Floating-Point Arithmetic.
See
http://standards.ieee.org/reading/ieee/std_public/description/busarch/754-1985_desc.html
World Wide Web Consortium. XML Linking Language (XLink).
Available at:
&xlink;
World Wide Web Consortium. Extensible Markup Language (XML) 1.0, Second
Edition.
Available at: &xmlspec;
XML Schema Part 1: Structures. Available at:
&xsdl;
World Wide Web Consortium. XML Schema Requirements. Available at:
http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
World Wide Web Consortium. Namespaces in XML. Available at:
&xmlnsspec;
Tim Berners-Lee, et al. RFC 2396: Uniform Resource Identifiers (URI):
Generic Syntax. 1998. Available at:
http://www.ietf.org/rfc/rfc2396.txt
RFC 2732: Format for Literal IPv6 Addresses in URL's. 1999.
Available at:
http://www.ietf.org/rfc/rfc2732.txt
N. Freed and N. Borenstein. RFC 2045: Multipurpose Internet Mail Extensions
(MIME) Part One: Format of Internet Message Bodies. 1996. Available at:
http://www.ietf.org/rfc/rfc2045.txt
H. Alvestrand, ed. RFC 1766: Tags for the Identification of Languages.
1995. Available at:
http://www.ietf.org/rfc/rfc1766.txt
William D Clinger. How to Read Floating Point Numbers Accurately.
In Proceedings of Conference on Programming Language Design and
Implementation, pages 92-101.
Available at:
ftp://ftp.ccs.neu.edu/pub/people/will/howtoread.ps
The Unicode Consortium. The Unicode Character Database.
Available at:
http://www.unicode.org/Public/3.0-Update1/UnicodeCharacterDatabase-3.0.1.html
Non-normative
L. Masinter and M. Durst.
Internationalized Resource Identifiers.
2001. Available at:
http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-07.txt
World Wide Web Consortium. Ruby Annotation. Available at:
http://www.w3.org/TR/2001/WD-ruby-20010216/
World Wide Web Consortium. Hypertext Markup Language, version 4.01. Available at:
&html4;
World Wide Web Consortium. XML Schema Language: Part 0 Primer. Available at:
http://www.w3.org/TR/2001/PR-xmlschema-0-20010316/
Mark Davis. Unicode Regular Expression Guidelines, 1988.
Available at:
http://www.unicode.org/unicode/reports/tr18/
The Perl Programming Language. See
http://www.perl.com/pub/language/info/software.html
ISO (International Organization for Standardization). ISO/IEC
9075-2:1999, Information technology --- Database languages ---
SQL --- Part 2: Foundation (SQL/Foundation).
[Geneva]: International Organization for Standardization, 1999.
See
http://www.iso.ch/cate/d26197.html
International Earth Rotation Service (IERS).
See http://maia.usno.navy.mil
ISO (International Organization for Standardization).
Representations of dates and times, 1988-06-15. Available at:
http://www.iso.ch/markete/8601.pdf
ISO (International Organization for Standardization).
Representations of dates and times, draft revision, 2000.
ISO (International Organization for Standardization).
Language-independent Datatypes. See
http://www.iso.ch/cate/d19346.html
World Wide Web Consortium. RDF Schema Specification.
Available at:
http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
Information about Leap Seconds.
Available at:
http://tycho.usno.navy.mil/leapsec.990505.html
World Wide Web Consortium.
Extensible Stylesheet Language (XSL).
Available at:
http://www.w3.org/TR/2000/CR-xsl-20001121/
Martin J. Dürst and François Yergeau, eds.
Character Model for the World Wide Web. World Wide Web Consortium
Working Draft. 2001.
Available at:
&charmod;
David M. Gay. Correctly Rounded Binary-Decimal and
Decimal-Binary Conversions. AT&T Bell Laboratories Numerical
Analysis Manuscript 90-10, November 1990.
Available at:
http://cm.bell-labs.com/cm/cs/doc/90/4-10.ps.gz
Acknowledgements (non-normative)
The following have contributed material to this draft:
Asir S. Vedamuthu, webMethods, Inc
Mark Davis, IBM
Co-editor Ashok Malhotra's work on this specification from March 1999 until
February 2001 was supported by IBM.
The editors acknowledge the members of the XML Schema Working Group, the members of other W3C Working Groups, and industry experts in other
forums who have contributed directly or indirectly to the process or content of
creating this document. The Working Group is particularly grateful to Lotus
Development Corp. and IBM for providing teleconferencing facilities.
The current members of the XML Schema Working Group are:
Jim Barnette, Defense Information Systems Agency (DISA)
Paul V. Biron, Health Level Seven
Don Box, DevelopMentor
Allen Brown, Microsoft
Lee Buck, TIBCO Extensibility
Charles E. Campbell, Informix
Wayne Carr, Intel
Peter Chen, Bootstrap Alliance and LSU
David Cleary, Progress Software
Dan Connolly, W3C (staff contact)
Ugo Corda, Xerox
Roger L. Costello, MITRE
Haavard Danielson, Progress Software
Josef Dietl, Mozquito Technologies
David Ezell, Hewlett Packard Company
Alexander Falk, Altova GmbH
David Fallside, IBM
Dan Fox, Defense Logistics Information Service (DLIS)
Matthew Fuchs, Commerce One
Andrew Goodchild, Distributed Systems Technology Centre (DSTC Pty Ltd)
Paul Grosso, ArborText, Inc
Martin Gudgin, DevelopMentor
Dave Hollander, Contivo, Inc (co-chair)
Mary Holstege, Invited Expert
Jane Hunter, Distributed Systems Technology Centre (DSTC Pty Ltd)
Rick Jelliffe, Academia Sinica
Simon Johnston, Rational Software
Bob Lojek, Mozquito Technologies
Ashok Malhotra, Microsoft
Lisa Martin, IBM
Noah Mendelsohn, Lotus Development Corporation
Adrian Michel, Commerce One
Alex Milowski, Invited Expert
Don Mullen, TIBCO Extensibility
Dave Peterson, Graphic Communications Association
Jonathan Robie, Software AG
Eric Sedlar, Oracle Corp.
C. M. Sperberg-McQueen, W3C (co-chair)
Bob Streich, Calico Commerce
William K. Stumbo, Xerox
Henry S. Thompson, University of Edinburgh
Mark Tucker, Health Level Seven
Asir S. Vedamuthu, webMethods, Inc
Priscilla Walmsley, XMLSolutions
Norm Walsh, Sun Microsystems
Aki Yoshida, SAP AG
Kongyi Zhou, Oracle Corp.
The XML Schema Working Group has benefited in its work from the
participation and contributions of a number of people not currently
members of the Working Group, including
in particular those named below. Affiliations given are those current at
the time of their work with the WG.
Paula Angerstein, Vignette Corporation
David Beech, Oracle Corp.
Gabe Beged-Dov, Rogue Wave Software
Greg Bumgardner, Rogue Wave Software
Dean Burson, Lotus Development Corporation
Mike Cokus, MITRE
Andrew Eisenberg, Progress Software
Rob Ellman, Calico Commerce
George Feinberg, Object Design
Charles Frankston, Microsoft
Ernesto Guerrieri, Inso
Michael Hyman, Microsoft
Renato Iannella, Distributed Systems Technology Centre (DSTC Pty Ltd)
Dianne Kennedy, Graphic Communications Association
Janet Koenig, Sun Microsystems
Setrag Khoshafian, Technology Deployment International (TDI)
Ara Kullukian, Technology Deployment International (TDI)
Andrew Layman, Microsoft
Dmitry Lenkov, Hewlett Packard Company
John McCarthy, Lawrence Berkeley National Laboratory
Murata Makoto, Xerox
Eve Maler, Sun Microsystems
Murray Maloney, Muzmo Communication, acting for Commerce One
Chris Olds, Wall Data
Frank Olken, Lawrence Berkeley National Laboratory
Shriram Revankar, Xerox
Mark Reinhold, Sun Microsystems
John C. Schneider, MITRE
Lew Shannon, NCR
William Shea, Merrill Lynch
Ralph Swick, W3C
Tony Stewart, Rivcom
Matt Timmermans, Microstar
Steph Tryphonas, Microstar
Revisions from Previous Draft