XML Schema 1.1 Part 2: Datatypes
W3C Working Draft 24 February 2005
This version: http://www.w3.org/TR/2005/WD-xmlschema11-2-20050224/
Also available in these formats: XML; XHTML with changes since version 1.0 marked; XHTML with changes since previous Working Draft marked; independent copy of the schema for schema documents; a schema for built-in datatypes only, in a separate namespace; independent copy of the DTD for schema documents; list of translations.
Latest version: http://www.w3.org/TR/xmlschema11-2/
Previous version: http://www.w3.org/TR/2004/WD-xmlschema11-2-20040716/
Editors:
David Peterson, invited expert (SGMLWorks!) <davep@iit.edu>
Paul V. Biron, Kaiser Permanente, for Health Level Seven <Paul.V.Biron@kp.org>
Ashok Malhotra, Oracle Corporation <ashokmalhotra@alum.mit.edu>
C. M. Sperberg-McQueen, World Wide Web Consortium <cmsmcq@w3.org>
This section describes the status of this document at the
time of its publication. Other documents may supersede this document.
A list of current W3C publications and the latest revision of this
technical report can be found in the W3C technical reports index at
http://www.w3.org/TR/.
This is a
Public Working Draft of XML Schema 1.1. It is here made
available for review by W3C members and the public. It is intended to
give an indication of the W3C XML Schema Working Group's intentions
for this new version of the XML Schema language and our progress in
achieving them. It attempts to be complete in indicating
what will change from version 1.0, but does
not specify in all cases how things will
change.
For those primarily interested in the changes since version 1.0,
the appendix, which summarizes
both changes already made and those in prospect, with links to
the relevant sections of this draft, is the recommended starting
point. Accompanying versions of this document display in color
all changes to normative text since version 1.0 and since the
previous Working Draft.
This draft was published on 24 February 2005.
The major changes are:
A new primitive decimal type has been defined, which retains
information about the precision of the value. This type is
aligned with the floating-point decimal types which will be
part of the next edition of IEEE 754.
In order to align this specification with those being prepared
by the XSL and XML Query Working Groups, a new datatype named
anyAtomicType has been introduced.
The conceptual model of the date- and time-related types has
been defined more formally.
Two subtypes of duration
(yearMonthDuration and
dayTimeDuration) have been introduced, each of which is
totally ordered.
A more formal treatment of the fundamental facets of the primitive
datatypes has been adopted.
More formal definitions of the lexical space of most types have
been provided, with detailed descriptions of the mappings from lexical
representation to value and from value to canonical representation.
Please send comments on this Working Draft to
www-xml-schema-comments@w3.org
(archive).
Publication as a Working Draft does not imply endorsement by the
W3C Membership. This is a draft document and may be updated, replaced
or obsoleted by other documents at any time. It is inappropriate to
cite this document as other than work in progress.
This document has been produced by the
W3C XML Schema Working Group
as part of the W3C XML
Activity. The goals of the XML Schema language version 1.1 are
discussed in the Requirements
for XML Schema 1.1 document. The authors of this document are
the members of the XML Schema Working Group. Different parts of this
specification have different editors.
Patent disclosures relevant to this specification may
be found on the Working Group's Patent
disclosure page in conformance with the W3C Patent
Policy of 5 February 2004. An individual who has actual
knowledge of a patent which the individual believes contains Essential
Claim(s) with respect to this specification should disclose the
information in accordance with section
6 of the W3C Patent Policy.
The English version of this specification is the only normative
version. Information about translations of this document is available
at http://www.w3.org/2003/03/Translations/byTechnology?technology=xmlschema.
XML Schema: Datatypes is part 2 of the specification of the XML
Schema language. It specifies facilities for defining datatypes to be used
in XML Schemas as well as in other XML specifications.
The datatype language, which is itself represented in
XML 1.0, provides a superset of the capabilities found in XML 1.0
document type definitions (DTDs) for specifying datatypes on elements
and attributes.
RQ-152 (xml1.1)
How should this specification be aligned with XML 1.1? The changes in
character set and name characters, and the question of what determines which
ones to use, must be addressed.
Introduction
RQ-21 (regex/BNF for all primitive types)
The current plan is that all datatypes defined herein will have EBNF productions at least approximately defining their lexical spaces,
and will include a nonnormative regex derived from the EBNF for users who wish to copy it directly.
RQ-24 (systematic facets: canonical representations for all datatypes)
It is not possible for all datatypes to have canonical representations of all values without violating the rules of derivation
or adding special-purpose constraining facets, which the WG does not deem appropriate. The WG has not yet decided how to deal with
datatypes whose lexical and/or canonical mappings are context sensitive.
RQ-148 (clarify use of "truncation")
The word will probably be removed.
RQ-120 (consistent use of "derived")
"Derivations" other than "derivations by restriction" will be renamed "constructions".
RQ-24 (systematic facets: assignment of datatype to nodes without components)
Introduction to Version 1.1
The Working Group has two main goals for this version of W3C XML Schema:
Significant improvements in simplicity of design and clarity of
exposition without loss of backward or forward compatibility;
Provision of support for versioning of XML languages defined using
the XML Schema specification, including the XML transfer syntax for
schemas itself.
These goals are slightly in tension with one another. The following
summarizes the Working Group's strategic guidelines for changes
between versions 1.0 and 1.1:
Add support for versioning (acknowledging that this may
be slightly disruptive to the XML transfer syntax at the margins)
Allow bug fixes (unless in specific cases we decide that the fix
is too disruptive for a point release)
Allow editorial changes
Allow design cleanup to change behavior in edge cases
Allow relatively non-disruptive changes to type hierarchy (to
better support current and forthcoming international standards and
W3C recommendations)
Allow design cleanup to change component structure (changes
to functionality restricted to edge cases)
Do not allow any significant changes in functionality
Do not allow any changes to XML transfer syntax except those
required by version control hooks and bug fixes
The overall aim as regards compatibility is that
All schema documents conformant to version 1.0 of this
specification should also conform to version 1.1, and should have
the same validation behaviour across 1.0 and 1.1 implementations
(except possibly in edge cases and in the details of the resulting
PSVI);
The vast majority of schema documents conformant to version 1.1 of
this specification should also conform to version 1.0, leaving
aside any incompatibilities arising from support for versioning,
and when they are conformant to version 1.0 (or are made
conformant by the removal of versioning information), should have
the same validation behaviour across 1.0 and 1.1 implementations
(again except possibly in edge cases and in the details of the
resulting PSVI).
Purpose
The XML 1.0 specification defines limited
facilities for applying datatypes to document content, in that documents
may contain or refer to DTDs that assign types to elements and attributes.
However, document authors, including authors of traditional
documents and those transporting data in XML,
often require a higher degree of type checking to ensure robustness in
document understanding and data interchange.
The table below offers two typical examples of XML instances
in which datatypes are implicit: the instance on the left
represents a billing invoice, the instance on the
right a memo or perhaps an email message in XML.
Data oriented (an invoice):
1999-01-21
1999-01-25
Ashok Malhotra
123 Microsoft Ave.
Hawthorne
NY
10532-0000
555-1234
555-4321
Document oriented (a memo):
Paul V. Biron
Ashok Malhotra
Latest draft
We need to discuss the latest draft immediately.
Either email me at mailto:paul.v.biron@kp.org
or call 555-9876
The invoice contains several dates and telephone numbers, the postal
abbreviation for a state
(which comes from an enumerated list of sanctioned values), and a ZIP code
(which takes a definable regular form). The memo contains many
of the same types of information: a date, telephone number, email address
and an "importance" value (from an enumerated
list, such as "low", "medium" or "high"). Applications which process
invoices and memos need to raise exceptions if something that was
supposed to be a date or telephone number does not conform to the rules
for valid dates or telephone numbers.
In both cases, validity constraints exist on the content of the
instances that are not expressible in XML DTDs. The limited datatyping
facilities in XML have prevented validating XML processors from supplying
the rigorous type checking required in these situations. The result
has been that individual applications writers have had to implement type
checking in an ad hoc manner. This specification addresses
the need of both document authors and applications writers for a robust,
extensible datatype system for XML which could be incorporated into
XML processors. As discussed below, these datatypes could be used in other
XML-related standards as well.
Requirements
The document spells out
concrete requirements to be fulfilled by this specification,
which state that the XML Schema Language must:
provide for primitive data typing, including byte, date,
integer, sequence, SQL and Java primitive datatypes, etc.;
define a type system that is adequate for import/export
from database systems (e.g., relational, object, OLAP);
distinguish requirements relating to lexical data representation
vs. those governing an underlying information set;
allow creation of user-defined datatypes, such as
datatypes that are derived from existing datatypes and which
may constrain certain of its properties (e.g., range,
precision, length, format).
Scope
This portion of the XML Schema Language discusses datatypes that can be
used in an XML Schema. These datatypes can be specified for element
content that would be specified as
#PCDATA and attribute
values of various
types in a DTD. It is the intention of this specification
that it be usable outside of the context of XML Schemas for a wide range
of other XML-related activities such as and
.
Terminology
The terminology used to describe XML Schema Datatypes is defined in the
body of this specification. The terms defined in the following list are
used in building those definitions and in describing the actions of a
datatype processor:
for compatibility: A feature of this specification included solely to ensure that schemas
which use this feature remain compatible with XML 1.0.
may: Conforming documents and processors are permitted to, but need
not, behave as described.
match: (Of strings or names:) Two strings or names being compared must be
identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g.
characters with both precomposed and base+diacritic forms) match only if they have
the same representation in both strings. No case folding is performed. (Of strings and
rules in the grammar:) A string matches a grammatical production
if and only if it belongs to the
language generated by that production.
must: Conforming documents and processors are required to behave as
described; otherwise they are in error.
error: A violation of the rules of this specification; results are undefined.
Conforming software may detect and report an error and may recover from it.
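The following sketch, in Python, illustrates the match definition for strings under the assumption that a string is a sequence of Unicode code points. The function name matches is hypothetical; the point is that the comparison is plain code-point identity, with no normalization and no case folding.

```python
import unicodedata


def matches(a: str, b: str) -> bool:
    """Hypothetical helper: 'match' as defined above, i.e. code-point-for-code-point
    identity with no Unicode normalization and no case folding."""
    return a == b


precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

print(matches(precomposed, decomposed))  # False: different representations
print(matches("Schema", "schema"))       # False: no case folding
# Normalizing first would make them equal, which the definition above does not do.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```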
Constraints and Contributions
This specification provides three different kinds of normative
statements about schema components, their representations in XML and
their contribution to the schema-validation of information items:
Constraints on the schema components themselves, i.e. conditions
components must satisfy to be components at all.
Largely to be found in .
Constraints on the representation of schema components in XML. Some but
not all of these are expressed in and
.
Constraints expressed by schema components which information
items must satisfy to be schema-valid. Largely
to be found in .
Datatype System
This section describes the conceptual framework behind the
datatype system
defined in this specification. The framework has been influenced by the
standard on language-independent datatypes as
well as the datatypes for and for programming
languages such as Java.
The datatypes discussed in this specification are computer
representations of, for the most part, well-known abstract concepts such as
integer and date. It is not the place of this
specification to thoroughly define these abstract concepts; many other publications
provide excellent definitions. However, this specification will attempt to
describe the abstract concepts well enough that they can be readily recognized
and distinguished from other abstractions with which they may be confused.
Only those operations and relations needed for schema processing are defined in this
specification. Applications using these datatypes are generally expected to implement
appropriate additional functions and/or relations to make the datatype generally
useful. For example, the description herein of the datatype
does not define addition or multiplication, much less all of the operations defined for
that datatype in on which it is based.
Datatype
In this specification,
a datatype is a 3-tuple, consisting of
a) a set of distinct values, called its value space,
b) a set of lexical representations, called its
lexical space, and c) a set of facets
that characterize properties of the value space,
individual values or lexical items.
In this specification,
a datatype has the following properties:
A value space, which is
simply a set of values.
What the members of this set are called
(beyond being generically called values)
is influenced by the set of value-space operations and relations used therewith.
A lexical space, which is the domain of the
lexical mapping: a set of strings used to denote the values. Some
lexical mappings are context sensitive,
so that the value denoted by a lexical representation depends on the context in which the
lexical representation occurs.
A small collection of functions, relations, and procedures associated with the datatype. Included
are equality and order relations on the value space, and a
lexical mapping, which is a function on the lexical space
onto the value space.
A , which serves to define and/or identify the datatype.
This specification only defines the operations and relations needed for schema processing. The
choice of terminology for describing/naming the datatypes is selected to guide users and implementers
in how to expand the datatype to be generally useful—i.e., how to recognize the real world
datatypes and their variants for which the datatypes defined herein are
meant to be used for data interchange.
Along with the lexical mapping it is often useful
to have an inverse which provides a standard lexical representation for
each value. Such a canonical mapping is not required for schema
processing, but is described herein for the benefit of users of this specification, and of other
specifications which might find it useful to reference these descriptions normatively.
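As a concrete, hedged illustration of a lexical mapping and its canonical inverse, consider the boolean datatype. This Python sketch assumes its lexical space (after whitespace processing) is exactly the four literals true, false, 1 and 0, with true and false as the canonical representations; the names BOOLEAN_LEXICAL_MAPPING, lexical_mapping and canonical_mapping are hypothetical.

```python
# Assumed lexical space of boolean (after whitespace processing) and its lexical mapping.
BOOLEAN_LEXICAL_MAPPING = {"true": True, "1": True, "false": False, "0": False}


def lexical_mapping(literal: str) -> bool:
    """Map a boolean literal to its value; anything else is not in the lexical space."""
    try:
        return BOOLEAN_LEXICAL_MAPPING[literal]
    except KeyError:
        raise ValueError(f"{literal!r} is not in the boolean lexical space") from None


def canonical_mapping(value: bool) -> str:
    """Inverse direction: choose one standard literal for each value."""
    return "true" if value else "false"


for literal in ["true", "1", "false", "0"]:
    value = lexical_mapping(literal)
    print(literal, "->", value, "->", canonical_mapping(value))
```

Two literals map to each value, but the canonical mapping picks exactly one, so applying the lexical mapping to the canonical representation of any value gives that value back.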
Value space
A value
space is the set of values for a given datatype.
Each value in the value space of a datatype is denoted by
one or more literals in its lexical space.
The value space of
a datatype is the set of values for that datatype. Associated
with each value space are selected operations and
relations necessary to permit proper schema processing. Each value in the value space
of a datatype is denoted by one or more character strings in its
lexical space, according
to the lexical mapping. (If
the mapping is restricted during a derivation in such a way
that a value has no denotation, that value is dropped from the value space.)
The value spaces of datatypes are abstractions,
and are defined herein only to the extent needed to clarify
them for readers. For example, in defining the numerical
datatypes, we assume some general numerical concepts such as number
and integer are known. In many cases we provide references to
other documents providing more complete definitions.
The value spaces and the values therein are abstractions. This specification does not
prescribe any particular internal representations that must be used when implementing these datatypes.
In some cases, there are references to other specifications which do prescribe specific internal
representations; these specific internal representations must be used to comply with those other
specifications, but need not be used to comply with this specification.
In addition, other applications are expected to define additional appropriate
operations and/or relations on these value spaces (e.g., addition and multiplication
on the various numerical datatypes' value spaces), and are permitted, where
appropriate, even to redefine the operations and relations defined within this
specification, provided that for schema processing the relations and operations
used are those defined herein.
The value space of a given datatype can
be defined in one of the following ways:
defined elsewhere axiomatically from fundamental notions
(intensional definition);
enumerated outright from values of an already defined
datatype (extensional definition);
defined by restricting the value space of
an already defined datatype to a particular subset with a given set
of properties;
defined as a combination of values from one or more already defined
value spaces by a specific construction procedure.
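The restriction and enumeration approaches listed above can be sketched by modelling a value space as a membership predicate. This is only a modelling assumption for illustration; the Python helper names (integer_space, restrict, enumerate_values, month_number, quarter_start) are hypothetical and do not correspond to facet names in this specification.

```python
from typing import Callable, Iterable

# Modelling assumption: a value space is represented by a membership predicate.
ValueSpace = Callable[[object], bool]


def integer_space(v: object) -> bool:
    """Stand-in for an already defined datatype's value space (integers)."""
    return isinstance(v, int) and not isinstance(v, bool)


def restrict(base: ValueSpace, condition: Callable[[object], bool]) -> ValueSpace:
    """Restriction: keep only those base values that also satisfy `condition`."""
    return lambda v: base(v) and condition(v)


def enumerate_values(base: ValueSpace, values: Iterable[object]) -> ValueSpace:
    """Extensional definition: list the permitted base values outright."""
    allowed = list(values)
    for v in allowed:
        if not base(v):
            raise ValueError(f"{v!r} is not in the base value space")
    return lambda v: base(v) and v in allowed


month_number = restrict(integer_space, lambda v: 1 <= v <= 12)
quarter_start = enumerate_values(month_number, [1, 4, 7, 10])

print(month_number(12), month_number(13))  # True False
print(quarter_start(7), quarter_start(8))  # True False
```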
Value spaces have certain properties. For example,
they always have the property of cardinality,
some definition of equality,
and might be ordered, by which individual
values within the value space can be compared to
one another. The properties of value spaces that
are recognized by this specification are defined in
.
The relations of identity, equality, and order are
required for each value space. A very few datatypes have other relations or operations prescribed for the purposes of this
specification.
Identity
The identity relation is always defined. Every value space inherently has an
identity relation. Two things are
identical
if and only if
they are actually the same thing: i.e., if there is no way whatever to
tell them apart. The identity relation is used when making restrictions by enumeration, and when checking
identity constraints. These are the only uses of identity for schema processing.
This does not preclude implementing datatypes by using more than one
internal representation for a given value, provided no mechanism inherent in
the datatype implementation (i.e., other than bit-string-preserving "casting" of
the datum to a different datatype) will distinguish between the two representations.
In the identity relation defined herein, values
from different datatypes' value
spaces are made artificially distinct if they
might otherwise be considered identical. For example, there is a
number two in the decimal
datatype and a number two in the float
datatype. In the identity relation defined herein, these
two values are considered distinct. Other applications
making use of these datatypes may choose to consider values such as these identical, but for the
view of datatypes' value
spaces used herein, they are distinct.
WARNING: Care must be taken when identifying values across distinct primitive
datatypes. For example, 0.1 and 0.10000000009 are effectively identical in
float but not in decimal. (Neither 0.1 nor 0.10000000009 is in
the float value space, but the lexical mapping
of float maps both 0.1 and 0.10000000009 to
the same number (0.100000001490116119384765625), which is in the value space.)
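The float example in the warning above can be checked with a short Python sketch. It assumes that float values are IEEE single-precision numbers and approximates the lexical mapping by parsing the literal as a Python (double-precision) float and then rounding to single precision with the standard struct module. This two-step rounding is only an approximation of a direct nearest-value mapping, though it gives the same result for these literals. The helper to_float32 is hypothetical.

```python
import struct
from decimal import Decimal


def to_float32(literal: str) -> float:
    """Hypothetical helper: round a decimal literal to the nearest IEEE
    single-precision value (via a Python double, i.e. in two rounding steps)."""
    return struct.unpack("<f", struct.pack("<f", float(literal)))[0]


a = to_float32("0.1")
b = to_float32("0.10000000009")
print(a == b)      # True: both literals denote the same float value
print(Decimal(a))  # 0.100000001490116119384765625, the exact single-precision value
print(Decimal("0.1") == Decimal("0.10000000009"))  # False: distinct decimal values
```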
Equality
Each datatype has prescribed an equality relation for its value
space. The equality relation for most datatypes is the identity relation. In the few cases
where it is not, it has been carefully defined so as to be a congruence relation for most
other operations of interest to the datatype. (This means simply that if two values are equal
and one is substituted for the other as an argument to any of the operations, the results will always
also be equal.
For example, identity is by definition a congruence relation for all other operations
of interest.) Equality is always a congruence for the order relation.
On the other hand,
equality need not cover the entire value space of the
datatype (though it usually does).
The equality relation is used in conjunction with
order when making restrictions involving order. This is the only use of
equality for schema processing.
In the prior version of
this specification (1.0), equality was always identity. This has been changed
to permit the datatypes defined herein to more closely match the real-world
datatypes for which they are intended to be used as transmission formats.
For example, the float datatype has an equality which is not the
identity (−0 = +0, but they are not identical, although
they were identical in the 1.0 version of this specification), and whose
domain excludes one value, NaN, so that NaN ≠ NaN.
For another example, the dateTime datatype previously lost any timezone
information in the value space as the value was
converted to timezone
Z;
now the timezone is retained, and two values representing the
same moment in time but with different remembered timezones are
equal but not identical.
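The two examples above can be illustrated with Python's built-in types, used here only as an analogy: Python's float is an IEEE double rather than the float datatype described here, but it shows the same signed-zero and NaN behaviour, and timezone-aware datetime values compare by moment in time while still remembering their offsets.

```python
import math
from datetime import datetime, timedelta, timezone

# Signed zeros: equal, yet distinguishable (so not identical).
pos_zero, neg_zero = 0.0, -0.0
print(pos_zero == neg_zero)                                         # True
print(math.copysign(1.0, pos_zero), math.copysign(1.0, neg_zero))  # 1.0 -1.0

# NaN lies outside the domain of equality: it is not even equal to itself.
nan = float("nan")
print(nan == nan)  # False

# Same moment in time, different remembered offsets: equal but not identical.
utc_noon = datetime(2005, 2, 24, 12, 0, tzinfo=timezone.utc)
one_pm_plus_one = datetime(2005, 2, 24, 13, 0, tzinfo=timezone(timedelta(hours=1)))
print(utc_noon == one_pm_plus_one)                        # True
print(utc_noon.utcoffset(), one_pm_plus_one.utcoffset())  # 0:00:00 1:00:00
```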
In the equality relation defined herein, values
from different primitive value spaces are made artificially unequal even if they might
otherwise be considered equal. For example, there is a number two in
the decimal
datatype and a number two in the float datatype. In the equality
relation defined herein, these two values are considered unequal. Other
applications making use of these datatypes
may choose to consider values such as these equal (and must do so if they choose to consider
them identical); nonetheless, in the equality relation defined herein, they are unequal.
For the purposes of this specification, there is one equality relation for all values
of all datatypes (the union of the various datatypes' individual equalities, if one
considers relations to be sets of ordered pairs). The equality relation is denoted
by = and its negation by ≠, each used as a binary
infix predicate: x = y
and x ≠ y. On
the other hand, identity relationships are always described in words.
Order
Each datatype has an order relation prescribed. This order may be a partial
order, which means that there may be values in the value space
which are neither equal, less-than, nor greater-than. Such value pairs are
incomparable. In many cases, the prescribed order is the null
order: the ultimate partial order, in which no pairs are less-than or
greater-than; they are all equal or incomparable.
Two
values that are neither equal, less-than, nor greater-than are
incomparable.
Two
values that are not incomparable are
comparable.
The order relation is used in
conjunction with equality when making restrictions involving order. This is the
only use of order for schema processing.
In this specification, this less-than order relation is denoted by
< (and its inverse by >), the weak order by ≤
(and its inverse by ≥), and the resulting incomparability
relation by <>, each used as a binary infix predicate:
x < y, x ≤ y,
x > y, x ≥ y,
and x <> y.
The weak order less-than-or-equal means less-than or
equal, and one can tell which. For example, the duration P1M
(one month) is not less-than-or-equal P31D (thirty-one
days) because P1M is not less than P31D, nor is P1M equal to P31D. Instead,
P1M is incomparable with P31D. The formal definition of order for
duration ensures that this is true.
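One way to see why P1M and P31D are incomparable is the approach taken in XML Schema 1.0, where two durations are compared by adding each to four fixed reference dateTimes (1696-09-01, 1697-02-01, 1903-03-01 and 1903-07-01, all at 00:00:00Z). If all four comparisons agree, that relation holds; otherwise the durations are incomparable. The Python sketch below assumes that algorithm and models a duration as a hypothetical (months, seconds) pair; it is not the formal definition referred to above.

```python
from datetime import datetime, timedelta

# Reference dateTimes from XML Schema 1.0 (all at 00:00:00Z; naive UTC is used here).
REFERENCE_DATES = [
    datetime(1696, 9, 1),
    datetime(1697, 2, 1),
    datetime(1903, 3, 1),
    datetime(1903, 7, 1),
]


def add_duration(start: datetime, months: int, seconds: int) -> datetime:
    """Add a hypothetical (months, seconds) duration to `start`.
    Every reference date falls on day 1, so no end-of-month clamping is needed."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    return start.replace(year=year, month=month0 + 1) + timedelta(seconds=seconds)


def compare_durations(a: tuple[int, int], b: tuple[int, int]) -> str:
    """Return '<', '>', '=' or '<>' (incomparable)."""
    outcomes = set()
    for ref in REFERENCE_DATES:
        x, y = add_duration(ref, *a), add_duration(ref, *b)
        outcomes.add("<" if x < y else ">" if x > y else "=")
    return outcomes.pop() if len(outcomes) == 1 else "<>"


P1M = (1, 0)                   # one month
P31D = (0, 31 * 24 * 60 * 60)  # thirty-one days, in seconds
print(compare_durations(P1M, P31D))  # <>
```

In this sketch P1M comes out less than P31D from the first two reference dates (September has 30 days, February 1697 has 28) and equal to it from the last two, so no single relation holds for all four.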
The value spaces of primitive datatypes are abstractions, which may have values in common. In
the order relation defined herein, these value spaces are made artificially incomparable. For example,
the numbers two and three are values in both the decimal datatype and the float datatype. In the
order relation defined herein, two in the decimal datatype and three in the float datatype are
incomparable values. Other applications making use of these datatypes may choose to consider
values such as these comparable.
While it is not an error to attempt to compare values from the
value spaces of two different primitive datatypes, they will alway be and therefore
unequal: If x and y are in the value spaces of different primitive
datatypes then x &inc; y (and
hence x ≠ y ).
Lexical space
In addition to its value space, each datatype also
has a lexical space.
A
lexical space is the set of valid literals
for a datatype.
For example, "100" and "1.0E2" are two different literals from the
lexical space of float which both
denote the same value. The type system defined in this specification
provides a mechanism for schema designers to control the set of values
and the corresponding set of acceptable literals of those values for
a datatype.
The literals in the lexical spaces defined in this specification
have the following characteristics:
The number of literals for each value has been kept small; for many
datatypes there is a one-to-one mapping between literals and values.
This makes it easy to exchange the values between different systems.
In many cases, conversion from locale-dependent representations will
be required on both the originator and the recipient side, both for
computer processing and for interaction with humans.
Textual, rather than binary, literals are used.
This makes hand editing, debugging, and similar activities possible.
Where possible, literals correspond to those found in common
programming languages and libraries.
Canonical Lexical Representation
While the datatypes defined in this specification have, for the most part,
a single lexical representation (i.e., each value in the datatype's
value space is denoted by a single literal in its
lexical space), this is not always the case. The
example in the previous section showed two literals for the float datatype
which denote the same value. Similarly, there may be
several literals for one of the date or time datatypes that denote the
same value using different timezone indicators.
A
canonical lexical representation
is a set of literals from among the valid set of literals
for a datatype such that there is a one-to-one mapping between literals
in the canonical lexical representation and
values in the value space.
The Lexical Space and Lexical Mapping
The
lexical mapping for a datatype is a prescribed function whose domain is a prescribed set of character
strings (the lexical space) and whose range is the value space
of that datatype.
The
lexical space of a datatype is the prescribed domain of
the lexical mapping
for that datatype.
The
members of the lexical space are lexical
representations of the values to which they are mapped.
Should a derivation be made using a derivation mechanism that
removes lexical representations from
the lexical space to the extent that one or more values cease
to have any lexical representation, then those values are
dropped from the value space.
This could happen by means of a pattern facet.
Conversely, should a derivation remove values, then their
lexical representations are dropped
from the lexical space unless there is a facet value whose
impact is defined to cause the otherwise-dropped lexical representations
to be mapped to another value instead.
There are currently no facets with such an impact. There may be
in the future.
For example, '100' and '1.0E2' are two different
lexical
representations from the float datatype
which both denote the same value. The datatype
system defined in this specification provides mechanisms for schema designers
to control the value space and the corresponding set of acceptable
lexical
representations of those values for a datatype.
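A lexical mapping is a function (each literal denotes exactly one value) but need not be one-to-one. The Python sketch below groups literals by the value they denote; it uses fractions.Fraction as a hypothetical stand-in for an exact numeric lexical mapping, which is an assumption for illustration only (the float mapping defined later also rounds).

```python
from collections import defaultdict
from fractions import Fraction


def lexical_mapping(literal: str) -> Fraction:
    """Hypothetical exact numeric lexical mapping: literal -> value."""
    return Fraction(literal)


def group_by_value(literals: list[str]) -> dict[Fraction, list[str]]:
    """Collect, for each value, the lexical representations that denote it."""
    groups: dict[Fraction, list[str]] = defaultdict(list)
    for lit in literals:
        groups[lexical_mapping(lit)].append(lit)
    return dict(groups)


groups = group_by_value(["100", "1.0E2", "+1e2", "100.00", "7"])
print(groups[Fraction(100)])  # ['100', '1.0E2', '+1e2', '100.00']
```

A canonical mapping would then pick exactly one literal from each group.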
Canonical Mapping
RQ-129 (remove dependency on canonical representations)
The dependencies are in Part 1; they will be resolved there. Text in this Part will reflect that canonical representations
are provided for the benefit of other users, including other specifications that might want to reference these datatypes.
Given the "pattern" constraining facet, restricting away canonical representations cannot be prohibited without undue processing
expense. A warning will be inserted, and RQ-129 will ensure that loss of canonical representations will not affect schema processing.
While the datatypes defined in this specification generally have
a single lexical representation for each value (i.e., each value in the datatype's
value space is denoted by a single lexical
representation in its
lexical space), this is not always the case. The
example in the previous section shows two lexical
representations from the float
datatype which denote the same value.
The
canonical mapping is a prescribed subset of the inverse of a
lexical mapping which is
one-to-one and whose domain (where possible) is the entire range of the
lexical mapping (the
value space). Thus a canonical mapping
selects one lexical representation
for each
value in the value space.
The
canonical representation of a value in the
value space of a datatype is the lexical representation
associated with that value
by the datatype's canonical mapping.
Canonical mappings are not
available for datatypes whose lexical
mappings are context dependent (i.e., mappings for which the value
of a lexical representation
depends on the context in which it occurs, or for which a character string
may or may not be a valid lexical representation
similarly depending on its context).
Canonical
representations are provided where feasible for the use of other applications; they are not
required for schema processing itself. A conforming schema processor implementation is
not required to implement canonical mappings.
Facets
RQ-24 (systematic approach to facets)
This decision is not yet written up herein: The four informational facets, each of which has only one property,
will be lumped into one facet having four properties. This will represent a further technical change to the
facet structure, but will not result in any additional or lost information in a schema.
A facet is a single
defining aspect of a value space. Generally
speaking, each facet characterizes a value space
along independent axes or dimensions.
The facets of a datatype serve to distinguish those aspects of
one datatype which differ from other datatypes.
Rather than being defined solely in terms of a prose description
the datatypes in this specification are defined in terms of
the synthesis of facet values which together determine the
value space and properties of the datatype.
Facets are of two types: fundamental facets that define
the datatype and non-fundamental or constraining
facets that constrain the permitted values of a datatype.
Facets are designated and named
values that either provide information about an aspect of the datatype (information
facets) or control some aspect of the datatype
(constraining facets). For example, each datatype has a cardinality
facet whose
value generally tells something about the finiteness of the datatype, and each datatype has
a whiteSpace facet whose value controls the "normalization" that the
raw character string in the XML document undergoes prior to being treated as a potential
member of the lexical space.
Facets are of two kinds:
information facets provide the
application with some information about the datatype, and
constraining facet values may be set or changed
during derivation (subject to facet-specific controls)
and control various aspects of the derived datatype. For example,
cardinality is an information facet and whiteSpace is a constraining facet. The various information
facets and constraining facets are each described in their own sections of this specification.
In the 1.0 version of this specification, information facets were called
"fundamental facets". Information facets are not required for schema processing,
but some applications use them.
Fundamental facets
A fundamental facet is an abstract property which
serves to semantically characterize the values in a
value space.
All fundamental facets are fully described in
.
Constraining or Non-fundamental facets
A
constraining facet is an optional property that can be
applied to a datatype to constrain its value space.
Constraining the value space consequently constrains
the lexical space. Adding
constraining facets to a base type
is described under derivation by restriction.
All constraining facets are fully described in
.
Datatype dichotomies
It is useful to categorize the datatypes defined in this specification
along various dimensions, forming a set of characterization dichotomies.
Atomic vs. list vs. union datatypes
The first distinction to be made is that between
atomic, list, and union
datatypes.
Atomic datatypes
are those having values which are regarded by this specification as
being indivisible. Atomic
datatypes are anyAtomicType
and all
datatypes derived from it.
List
datatypes are those having values each of which consists of a
finite-length (possibly empty) sequence of values of an atomic
datatype.
List datatypes are those which are explicitly constructed as lists, or are derived from another list datatype.
Union
datatypes are those whose value spaces and
lexical spaces are the union of
the value spaces and
lexical spaces of one or more other datatypes. Union datatypes are those which are explicitly constructed as unions, or are derived from another union datatype.
For example, a single token which matches Nmtoken from the XML Recommendation
could be the value of an atomic
datatype (NMTOKEN); while a sequence of such tokens
could be the value of a list datatype
(NMTOKENS).
Atomic datatypes
Atomic datatypes can be either primitive
or derived. An atomic datatype
has a value space consisting of a set of
atomic values which for purposes of this specification
are not further decomposable.
The lexical space of
an atomic datatype is a set of literals
whose internal structure is specific to the datatype in question.
There is one special
atomic type (anyAtomicType) and a number of primitive
atomic types, which have anyAtomicType
as their base type.
All other atomic types are derived by restriction either from
one of the primitive atomic types or from another ordinary atomic
type. No user-defined type may have anyAtomicType
as its base type.
List datatypes
Several type systems (such as the one described in
ISO 11404) treat list datatypes as
special cases of the more general notions of aggregate or collection
datatypes.
List datatypes are always derived.
The value space of a list
datatype is a set of finite-length sequences of atomic
values. The lexical space of a list
datatype is a set of literals whose internal
structure is a space-separated
sequence of literals of the atomic
datatype of the items in the
list.
The atomic or union
datatype that participates in the definition of a list datatype
is known as the itemType of that list datatype.
8 10.5 12
A list datatype can be
derived from an ordinary atomic
datatype whose lexical space allows space
(such as string
or anyURI) or a union
datatype any of whose
memberTypes' lexical spaces
allow space.
In such a case, regardless of the input, list items
will be separated at space boundaries.
<someElement xsi:type='listOfString'>
this is not list item 1
this is not list item 2
this is not list item 3
</someElement>
In the above example, the value of the someElement element
is not a list of length 3;
rather, it is a list of length
18.
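The count of 18 follows from splitting the collapsed content at space boundaries. The Python sketch below models that step; the helper name list_items is hypothetical, and it treats only the four XML whitespace characters (space, tab, line feed, carriage return) as separators, assuming whiteSpace collapse as stated for lists.

```python
import re

# XML whitespace: #x20 (space), #x9 (tab), #xA (line feed), #xD (carriage return).
XML_SPACE = re.compile(r"[\x20\t\n\r]+")


def list_items(literal: str) -> list[str]:
    """Collapse whitespace, then split a list literal into item literals."""
    collapsed = XML_SPACE.sub(" ", literal).strip(" ")
    return collapsed.split(" ") if collapsed else []


content = """
  this is not list item 1
  this is not list item 2
  this is not list item 3
"""
items = list_items(content)
print(len(items))  # 18
```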
When a datatype is derived from a list
datatype, the following constraining facets apply:
For each of length, maxLength
and minLength, the unit of length is
measured in number of list items. The value of whiteSpace
is fixed to the value collapse.
For list datatypes the lexical space
is composed of space-separated
literals of its itemType. Hence, any pattern
specified when a new datatype is
derived from a list datatype is matched against
each literal of the list datatype and
not against the literals of the datatype that serves as its
itemType.
123 456
123 987 456
123 987 567 456
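Because a pattern on a list datatype applies to the whole list literal rather than to each item, a pattern that spans several items can accept the literals 123 456, 123 987 456 and 123 987 567 456. The Python sketch below assumes a hypothetical pattern 123 (\d+\s)*456 on a list of integers; re.fullmatch is used because XML Schema patterns are implicitly anchored at both ends.

```python
import re

# Hypothetical pattern facet on a list-of-integer datatype.
PATTERN = re.compile(r"123 (\d+\s)*456")


def list_literal_matches(literal: str) -> bool:
    """Apply the pattern to the entire (collapsed) list literal."""
    return PATTERN.fullmatch(literal) is not None


for lit in ["123 456", "123 987 456", "123 987 567 456"]:
    print(lit, list_literal_matches(lit))  # each line ends with True

# A single item on its own does not match the list-level pattern.
print(list_literal_matches("987"))  # False
```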
The canonical lexical representation for the list
datatype is defined as the lexical form in which
each item in the list has the canonical lexical
representation of its itemType.
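A minimal sketch of that rule, assuming a hypothetical list whose itemType is boolean: each item is replaced by its canonical form and the items are rejoined with single spaces. The mapping table is an illustrative assumption based on boolean's literals {true, false, 1, 0} and canonical forms {true, false}.

```python
# Assumed canonical mapping for boolean item literals.
BOOLEAN_CANONICAL = {"true": "true", "1": "true", "false": "false", "0": "false"}


def canonical_list(literal: str) -> str:
    """Canonicalize each item, then join the items with single spaces."""
    items = literal.split()  # splits on runs of whitespace
    return " ".join(BOOLEAN_CANONICAL[item] for item in items)


print(canonical_list("  1 false\n0  true "))  # true false false true
```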
Union datatypes
The value space and lexical space
of a union datatype are the union of the
value spaces and lexical spaces of
its memberTypes.
Union datatypes are always derived.
Currently, there are no built-in union
datatypes.
A prototypical example of a union type is the
maxOccurs attribute on the
element element
in XML Schema itself: it is a union of nonNegativeInteger
and an enumeration with the single member, the string "unbounded", as shown below.
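<attribute name="maxOccurs" use="optional" default="1">
  <simpleType><union memberTypes="nonNegativeInteger">
    <simpleType><restriction base="string"><enumeration value="unbounded"/></restriction></simpleType>
  </union></simpleType>
</attribute>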
Any number (greater than 1) of ordinary atomic
or list datatypes can participate in a union type.
The datatypes that participate in the
definition of a union datatype are known as the
memberTypes of that union datatype.
The order in which the memberTypes are specified in the
definition (that is, the order of the <simpleType> children of the <union>
element, or the order of the QNames in the memberTypes
attribute) is significant.
During validation, an element or attribute's value is validated against the memberTypes
in the order in which they appear in the
definition until a match is found. The evaluation order can be overridden
with the use of xsi:type.
For example, given the definition below, the first instance of the <size> element
validates correctly as an integer, the second and third as
string.
<size>1</size>
<size>large</size>
<size xsi:type='xsd:string'>1</size>
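The first-match rule for union validation can be sketched in Python as follows. The member list (integer first, then string) is an assumption mirroring the size example; the validators are hypothetical simplifications (for instance, is_string accepts every string), and xsi:type is modelled as an optional member-type name that restricts the search.

```python
import re
from typing import Callable, Optional


def is_integer(lit: str) -> bool:
    """Simplified integer lexical check: optional sign, then digits."""
    return re.fullmatch(r"[+-]?[0-9]+", lit) is not None


def is_string(lit: str) -> bool:
    """Simplification: accept every character string."""
    return True


# Member types in declaration order; the order is significant.
MEMBERS: list[tuple[str, Callable[[str], bool]]] = [
    ("integer", is_integer),
    ("string", is_string),
]


def validate_union(lit: str, xsi_type: Optional[str] = None) -> Optional[str]:
    """Return the first member type that accepts lit, or None if none does."""
    for name, check in MEMBERS:
        if xsi_type is not None and name != xsi_type:
            continue  # xsi:type overrides the default evaluation order
        if check(lit):
            return name
    return None


print(validate_union("1"))                     # integer
print(validate_union("large"))                 # string
print(validate_union("1", xsi_type="string"))  # string
```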
The canonical lexical representation for a union
datatype is defined as the lexical form in which
the values have the canonical lexical representation
of the appropriate memberTypes.
A datatype which is atomic in this specification
need not be an "atomic" datatype in any programming language used to
implement this specification. Likewise, a datatype which is a list
in this specification need not be a "list"
datatype in any programming language used to implement this specification.
Furthermore, a datatype which is a union in this
specification need not be a "union" datatype in any programming
language used to implement this specification.
Primitive vs. Constructed Datatypes
Next, we distinguish between primitive,
derived, and constructed
datatypes.
Primitive
datatypes are those that are not defined in terms of other datatypes;
they exist ab initio.
Derived
datatypes are those that are defined in terms of other datatypes.
Constructed
datatypes are those that are defined in terms of other datatypes.
For example, in this specification, float is a well-defined
mathematical
concept that cannot be defined in terms of other datatypes, while
an integer is a special case of the more general datatype
decimal.
RQ-141 (add abstract
anyAtomicType); RQ-24 (systematic facets: status and value space of
anySimpleType)
A new special datatype will be introduced as a child
of anySimpleType and the base type of all primitive atomic datatypes.
The simple type definition of anySimpleType, whose name is
anySimpleType in the XML Schema namespace, is a special restriction of
the ur-type definition.
anySimpleType can be
considered as the ancestor of all primitive
datatypes. anySimpleType is considered to have an unconstrained lexical space and a
value space consisting of the union of the
value spaces of all the built-in primitive
datatypes and the set of all lists of all members of the
value spaces of all the built-in primitive
datatypes.
The datatypes defined by this specification fall into both
the primitive and derived
categories. It is felt that a judiciously chosen set of primitive
datatypes will serve the widest
possible audience by providing a set of convenient datatypes that
can be used as is, as well as providing a rich enough base from
which the variety of datatypes needed by schema designers can be
derived.
In the example above, integer is
derived from decimal.
A datatype which is primitive in this specification
need not be a "primitive" datatype in any programming language used to
implement this specification. Likewise, a datatype which is derived
in this specification need not be a
"derived" datatype in any programming language used to implement
this specification.
As described in more detail under derivation,
each derived datatype
must be defined in terms of another datatype in one of three ways: 1) by assigning
constraining facets which serve to restrict the value space
of the derived
datatype to a subset of that of the base type; 2) by creating
a list datatype whose value space
consists of finite-length sequences of values of its
itemType; or 3) by creating a union
datatype whose value space consists of the union of the
value spaces of its memberTypes.
Derived by restriction
A datatype is said to be derived
by restriction from another datatype
when values for zero or more constraining facets are specified
that serve to constrain its value space and/or its lexical space
to a subset of those of its
base type.
Every
datatype that is derived by restriction
is defined in terms of an existing datatype, referred to as its
base type. Base types can be either primitive
or derived.
Derived by list
A list datatype can be
derived from another datatype (its itemType) by creating
a value space that consists of a finite-length sequence
of values of its itemType.
Derived by union
One datatype can be derived from one or more
datatypes by unioning their value spaces
and, consequently, their lexical spaces.
Built-in vs. user-derived datatypes
Built-in
datatypes are those which are defined in this specification,
and can be either primitive or
derived.
User-derived datatypes are those
datatypes that are defined by individual schema designers.
Conceptually there is no difference between the built-in derived
datatypes
included in this specification and the user-derived
datatypes which will be created by individual schema designers.
The built-in derived datatypes
are those which are believed to be so common that if they were not
defined in this specification many schema designers would end up
"reinventing" them. Furthermore, including these derived
datatypes in this specification serves to
demonstrate the mechanics and utility of the datatype generation
facilities of this specification.
A datatype which is built-in in this specification
need not be a "built-in" datatype in any programming language used
to implement this specification. Likewise, a datatype which is
user-derived in this specification need not
be a "user-derived" datatype in any programming language used to
implement this specification.
Datatypes and Schemas
Datatypes as defined above exist, in the abstract, independently of
whether they have any relation to schemas as defined in this
specification. Datatypes are tied to schemas either by explicit
description in this specification, or by user mechanisms prescribed in
this specification for use in user-created schemas.
The user-usable mechanism prescribed by this specification is the
ability to add additional simple type definitions to schemas.
Simple type definitions and their use within schemas are described
elsewhere in this specification. A simple type definition selects a
particular datatype and gives it a name and a place in the
schema's datatype hierarchy, which
is a structuring of all the datatypes associated with a schema.
The Datatype Derivation Hierarchy
Datatypes associated with a schema are organized in a hierarchy
that exactly parallels the datatypes' defining (or selecting)
simple type definitions in the schema's corresponding schema type hierarchy, as described elsewhere. A datatype is immediately
derived from another
if and only if
it is immediately below the other
(i.e., away from the root) in the derivation
hierarchy. A datatype is the base type of another
if and only if
the other
is immediately derived from it. A datatype is derived from
another
if and only if
there is a chain of datatypes beginning with it
and ending with the other. It is often easiest to
determine a datatype's location in the hierarchy by examining the
corresponding simple type definition in the schema
type hierarchy.
At the root of the hierarchy are two
special datatypes, anySimpleType and anyAtomicType. anySimpleType is the real
root; anyAtomicType is derived from anySimpleType.
All other (ordinary)
datatypes are derived from these two
special datatypes. The most important class of datatypes
derived from these two are
the primitive datatypes, all of which are described later in this specification. Starting with the
primitive datatypes, all other schema-usable datatypes are either
facet-derived, constructed as lists, or constructed as unions.
Atomic, List, and Union Datatypes
Ordinary datatypes may be characterized as atomic,
list, or union datatypes.
An atomic
datatype is one which is derived from
anyAtomicType. Since only (and all)
primitive datatypes are immediately derived from anyAtomicType, all other atomic datatypes are derived from primitives.
A list
datatype is one that is constructed to have lists of values from some
other datatype, or any datatype subsequently derived from a list datatype.
The other
datatype from which a list datatype is constructed is the list
datatype's item type. Datatypes that
are constructed as lists are immediately derived
from
anySimpleType, so all list datatypes are derived from anySimpleType.
A union
datatype is one that is constructed to have the union of the value spaces of some
other datatypes, or any datatype subsequently derived from a union datatype.
The
other datatypes from which a union datatype is constructed are the
union datatype's member types.
Datatypes that are constructed as
unions are immediately derived from anySimpleType, so all union datatypes are derived from anySimpleType.
All datatypes in the datatype
hierarchy of a schema that are not immediately derived from anySimpleType or anyAtomicType are facet-derived from their base types. The mechanisms of
construction and facet-derivation are
described later in this specification.
Placing a Datatype in the Hierarchy
Special and primitive datatypes are placed in the hierarchy by explicit rules in this specification. As mentioned above, anySimpleType is at the root of the hierarchy, anyAtomicType is immediately derived from anySimpleType, and all primitive datatypes (and only primitive datatypes) are immediately derived from anyAtomicType.
A constructed datatype (list or union) is always immediately derived from anySimpleType. A facet-derived datatype is always immediately derived from its base type. These are the only ways a datatype that is neither special nor primitive can be placed in a schema's datatype hierarchy.
The special, primitive, and other ordinary datatypes described in this specification are present in every schema's datatype hierarchy. Any others depend on the schema.
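The placement rules above can be expressed as a small function. In the Python sketch below, the Datatype record and its kind labels are hypothetical; immediate_base applies the stated rules, and is_derived_from follows the chain of immediate derivations upward.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Datatype:
    """Hypothetical record of a datatype's name and how it was defined."""
    name: str
    kind: str  # "special", "primitive", "list", "union" or "facet-derived"
    base: Optional["Datatype"] = None  # used only for facet-derived datatypes


ANY_SIMPLE_TYPE = Datatype("anySimpleType", "special")
ANY_ATOMIC_TYPE = Datatype("anyAtomicType", "special")


def immediate_base(dt: Datatype) -> Optional[Datatype]:
    """Return the datatype dt is immediately derived from (None for the root)."""
    if dt == ANY_SIMPLE_TYPE:
        return None
    if dt == ANY_ATOMIC_TYPE:
        return ANY_SIMPLE_TYPE
    if dt.kind == "primitive":
        return ANY_ATOMIC_TYPE
    if dt.kind in ("list", "union"):  # constructed datatypes
        return ANY_SIMPLE_TYPE
    if dt.kind == "facet-derived" and dt.base is not None:
        return dt.base
    raise ValueError(f"cannot place {dt.name} in the hierarchy")


def is_derived_from(dt: Datatype, ancestor: Datatype) -> bool:
    """True if a chain of immediate derivations leads from ancestor down to dt."""
    current = immediate_base(dt)
    while current is not None:
        if current == ancestor:
            return True
        current = immediate_base(current)
    return False


DECIMAL = Datatype("decimal", "primitive")
INTEGER = Datatype("integer", "facet-derived", DECIMAL)
SIZES = Datatype("sizes", "list")  # hypothetical user-defined list datatype
print(is_derived_from(INTEGER, ANY_ATOMIC_TYPE))  # True
print(is_derived_from(SIZES, ANY_ATOMIC_TYPE))    # False
```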
Built-in Simple Type Definitions and their Datatypes
Each built-in datatype in this specification (both
primitive and
derived) can be uniquely addressed via a
URI Reference constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype
For example, to address the datatype, the URI is:
http://www.w3.org/2001/XMLSchema#int
Additionally, each facet definition element can be uniquely
addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the facet
For example, to address the maxInclusive facet, the URI is:
http://www.w3.org/2001/XMLSchema#maxInclusive
Additionally, each facet usage in a built-in datatype definition
can be uniquely addressed via a URI constructed as follows:
the base URI is the URI of the XML Schema namespace
the fragment identifier is the name of the datatype, followed
by a period (".") followed by the name of the facet
For example, to address the usage of the maxInclusive facet in
the definition of int, the URI is:
http://www.w3.org/2001/XMLSchema#int.maxInclusive
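The three addressing rules reduce to string concatenation. The Python helpers below are hypothetical names; they assume the datatype and facet names are plain NCNames, so no escaping is needed.

```python
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def datatype_uri(datatype: str) -> str:
    """URI reference for a built-in datatype: namespace URI, '#', name."""
    return f"{XSD_NAMESPACE}#{datatype}"


def facet_uri(facet: str) -> str:
    """URI reference for a facet definition element."""
    return f"{XSD_NAMESPACE}#{facet}"


def facet_usage_uri(datatype: str, facet: str) -> str:
    """URI reference for a facet's use in a built-in datatype definition."""
    return f"{XSD_NAMESPACE}#{datatype}.{facet}"


print(facet_usage_uri("int", "maxInclusive"))
# http://www.w3.org/2001/XMLSchema#int.maxInclusive
```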
Namespace considerations
The datatypes defined by this specification
are designed to be used with the XML Schema definition language as well as other
XML specifications.
To facilitate usage within the XML Schema definition language, the
datatypes in this specification have the namespace name:
http://www.w3.org/2001/XMLSchema
To facilitate usage in specifications other than the XML Schema definition language,
such as those that do not want to know anything about aspects of the
XML Schema definition language other than the datatypes, each
datatype is also defined in the namespace whose URI is:
http://www.w3.org/2001/XMLSchema-datatypes
This applies to both built-in primitive
and built-in derived
datatypes.
Each user-derived datatype is also associated with a
unique namespace. However, user-derived datatypes
do not come from the namespace defined by this specification; rather,
they come from the namespace of the schema in which they are defined
(see XML Representation of
Schemas in XML Schema Part 1: Structures).
Special Simple Type Definitions
Special datatypes
There are two special simple type definitions. (All others are ordinary.) The
special simple type definitions are, unlike the ordinary ones, more
important as simple type definitions than as datatypes.
anySimpleType
anyAtomicType
Primitive Simple Type Definitions
and Datatypes
The primitive datatypes defined by this specification
are described below. For each datatype, the value space
and lexical space
are defined, constraining facets which apply
to the datatype are listed, and any datatypes derived
from this datatype are specified.
Primitive datatypes can only be added by revisions
to this specification.
string
The string datatype
represents character strings in XML. The value space
of string is the set of finite-length sequences of
characters (as defined in
the XML Recommendation) that match the
Char production from the XML Recommendation.
A character is an atomic unit of
communication; it is not further specified except to note that every
character has a corresponding
Universal Character Set code point, which is an integer.
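Membership in the value space of string can be checked character by character against the Char production. The Python sketch below assumes the XML 1.0 form of that production (#x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD], [#x10000-#x10FFFF]); the function names are hypothetical.

```python
def is_xml_char(ch: str) -> bool:
    """True if the code point of ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def in_string_value_space(s: str) -> bool:
    """A finite sequence of characters, each matching Char."""
    return all(is_xml_char(c) for c in s)


print(in_string_value_space("caf\u00e9\t"))  # True
print(in_string_value_space("a\x00b"))       # False: NUL is not a Char
```

Lone surrogate code points (#xD800 to #xDFFF) and #xFFFE/#xFFFF are rejected by the same ranges.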
Many human languages have writing systems that require
child elements for control of aspects such as bidirectional formatting or
ruby annotation (see the Ruby Annotation specification and Section 8.2.4
Overriding the
bidirectional algorithm: the BDO element of HTML 4.01).
Thus, string, as a simple type that can contain only
characters but not child elements, is often not suitable for representing text.
In such situations, a complex type that allows mixed content should be considered.
For more information, see Section 5.5
Any Element, Any Attribute
of XML Schema Part 0: Primer.
As noted in the discussion of order, the fact that this specification does
not specify an
order for string
does not preclude other applications from treating
strings as being ordered.
Constraining Facets
Constructed and Immediately Derived Datatypes
boolean
boolean has the value space
required to support the mathematical
concept of binary-valued logic: {true, false}.
Lexical representation
An instance of a datatype that is defined as boolean
can have the following legal literals {true, false, 1, 0}.
Canonical representation
The canonical representation for boolean is the set of
literals {true, false}.
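The boolean lexical and canonical mappings are small enough to write as lookup tables. In this Python sketch the names are hypothetical, whitespace is assumed to have been collapsed already, and literals are case-sensitive, so TRUE is not a legal literal.

```python
# Lexical mapping: the four legal literals and the values they denote.
LEXICAL_MAPPING = {"true": True, "1": True, "false": False, "0": False}
# Canonical mapping: one literal per value.
CANONICAL_MAPPING = {True: "true", False: "false"}


def parse_boolean(literal: str) -> bool:
    """Map a boolean literal to its value, rejecting anything else."""
    if literal not in LEXICAL_MAPPING:
        raise ValueError(f"not a boolean literal: {literal!r}")
    return LEXICAL_MAPPING[literal]


def canonical_boolean(literal: str) -> str:
    """Return the canonical representation of the value a literal denotes."""
    return CANONICAL_MAPPING[parse_boolean(literal)]


print([canonical_boolean(s) for s in ["true", "1", "false", "0"]])
# ['true', 'true', 'false', 'false']
```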
Constraining facets
decimal
RQ-150 (minimum number of digits for decimal)
The minimum number of digits implementations are required to support
will be lowered to 16 digits; a health warning will be added to note
that implementations of derived datatypes may support more digits of
precision than the base decimal type does, but that they are not required
to do so.
decimal
represents a subset of the real numbers, which can be represented by decimal numerals.
The value space of decimal
is the set of numbers that can be obtained by
dividing an integer by a non-negative
power of ten, i.e., expressible as
i / 10^n
where i and n are integers
and n ≥ 0.
Precision is not reflected in this value space;
the number 2.0 is not distinct from the number 2.00.
(The precisionDecimal datatype may be used
for values in which precision is significant.)
The order relation on decimal
is the order relation on real numbers, restricted
to this subset.
All processors must
support decimal numbers with a minimum of
16 decimal digits
(i.e., they must support all values which would be
allowed by a simple type definition which set totalDigits
to 16). However, processors may set
an application-defined limit on the maximum number of decimal digits
they are prepared to support, in which case that application-defined
maximum number must be clearly documented.
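A rational number can be written as i / 10^n exactly when, in lowest terms, its denominator has no prime factors other than 2 and 5 (such a denominator divides some power of ten). The Python sketch below uses that fact to test membership in the decimal value space; the function name is hypothetical, and fractions.Fraction is used because it keeps values exact and in lowest terms.

```python
from fractions import Fraction


def in_decimal_value_space(q: Fraction) -> bool:
    """True if q = i / 10**n for integers i and n >= 0."""
    den = q.denominator  # Fraction always stores lowest terms
    for p in (2, 5):
        while den % p == 0:
            den //= p
    return den == 1


print(in_decimal_value_space(Fraction(1, 8)))  # True: 0.125
print(in_decimal_value_space(Fraction(1, 3)))  # False
# Precision is not reflected in the value space:
print(Fraction("2.0") == Fraction("2.00"))     # True
```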
Lexical representation
decimal has a lexical representation
consisting of a finite-length sequence of decimal digits (#x30-#x39) separated
by a period as a decimal indicator.
An optional leading sign is allowed.
If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional.
If the fractional part is zero, the period and following zeroes can
be omitted.
For example: -1.23, 12678967.543233, +100000.00, 210.
The Lexical Representation
&odec;LexicalRep |
The lexical space of decimal is the set of
lexical representations which match the grammar given above, or
(equivalently) the regular expression
(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+).
The mapping from lexical representations to values is the usual
one for decimal numerals; it is given formally in:
Canonical representation
The canonical representation for decimal is defined by
prohibiting certain options from the
lexical representation. Specifically, the preceding
optional "+" sign is prohibited. The decimal point is required. Leading and
trailing zeroes are prohibited subject to the following: there must be at least
one digit to the right and to the left of the decimal point which may be a zero.
The mapping from values to canonical representations
is given formally in:
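The canonical rules for decimal amount to a few string operations on a valid literal. The Python sketch below is not the formal mapping referred to above; it assumes a lexical-space regular expression that permits an optional "+" or "-" sign (as the prose allows), and it drops the sign of zero because decimal's value space has a single zero. The function name is hypothetical.

```python
import re

# Assumed lexical space: optional sign, digits with an optional fraction.
DECIMAL_LEXICAL = re.compile(r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def canonical_decimal(literal: str) -> str:
    """Rewrite a decimal literal into the canonical form of its value."""
    if DECIMAL_LEXICAL.fullmatch(literal) is None:
        raise ValueError(f"not a decimal literal: {literal!r}")
    sign = "-" if literal.startswith("-") else ""   # "+" is never canonical
    int_part, _, frac_part = literal.lstrip("+-").partition(".")
    int_part = int_part.lstrip("0") or "0"          # keep at least one digit
    frac_part = frac_part.rstrip("0") or "0"        # keep at least one digit
    if int_part == "0" and frac_part == "0":
        sign = ""                                   # a single, unsigned zero
    return f"{sign}{int_part}.{frac_part}"


for lit in ["-1.23", "12678967.543233", "+100000.00", "210."]:
    print(canonical_decimal(lit))
# -1.23, 12678967.543233, 100000.0, 210.0 (one per line)
```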
Constraining facets
Datatypes based on decimal
precisionDecimal
The precisionDecimal
datatype represents the
numeric value and (arithmetic) precision of decimal numbers which retain
precision; it also
includes special values for positive and negative infinity and
not a number, and it differentiates
between positive zero and negative
zero. The special values are
introduced to make the datatype correspond closely to the
floating-point decimal datatypes described by the forthcoming
revision of IEEE/ANSI 754.
Precision is sometimes given in absolute, sometimes in relative
terms. The arithmetic precision of a value is
expressed in absolute quantitative terms,
by indicating
how many digits to the right of the decimal point are significant. Thus, 5 has an arithmetic precision of 0, and
5.01 an arithmetic precision of 2.
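Python's decimal.Decimal happens to retain the same information: the exponent of a parsed literal records how many digits follow the decimal point. The sketch below uses that as a stand-in to compute arithmetic precision. For numerals in scientific notation it assumes precision is the negated Decimal exponent (so 1.5E2 gives -1), which is an assumption rather than the formal mapping. Decimal also accepts literals outside precisionDecimal's lexical space, so this is not a validator.

```python
from decimal import Decimal
from typing import Optional


def arithmetic_precision(literal: str) -> Optional[int]:
    """Digits significant to the right of the decimal point, or None if absent."""
    d = Decimal(literal)
    if not d.is_finite():
        return None  # INF, -INF and NaN have no arithmetic precision
    return -d.as_tuple().exponent


print(arithmetic_precision("5"))      # 0
print(arithmetic_precision("5.01"))   # 2
print(arithmetic_precision("5.010"))  # 3: trailing zeros are significant
```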
Value Space
Properties of
Values
numericalValue: a decimal, positiveInfinity, negativeInfinity, or notANumber
arithmeticPrecision: an integer or absent; absent if and only if numericalValue is positiveInfinity, negativeInfinity, or notANumber
sign: positive, negative, or absent; must be positive if numericalValue is positive or positiveInfinity, must be negative if numericalValue is negative or negativeInfinity, and must be absent if and only if numericalValue is notANumber
The sign property is redundant except when
numericalValue is zero; in other cases, the sign is fully determined by the
numericalValue.
Code optimization may well make it desirable to separate out the
sign and the absolute value of the
numericalValue, which will make implementation easier
but the verbal descriptions of such things as equality
and order somewhat more complicated.
As explained below, the lexical
representation of the value object whose numericalValue
is notANumber is NaN. Accordingly, in English text we
use NaN to refer to that value. Similarly we use INF
and −INF to refer to the two value objects whose numericalValue
is positiveInfinity and negativeInfinity, respectively. These three value objects
are also informally called not-a-number, positive infinity,
and negative infinity.
The latter two together are called
the infinities.
Equality and order for precisionDecimal are defined as follows:
Two numerical values
are ordered (or equal) as their numericalValue
values are ordered (or equal).
(This means
that two zeros are equal regardless of their
arithmeticPrecision or sign;
negative zeros are not ordered less than positive zeros.)
A numerical value n
is less than, equal to, or greater than
a precisionDecimal value v other than INF, −INF, or NaN
as n is less than, equal to, or greater than
the numericalValue of v.
(This comparison is necessary when comparing
values to upper and lower bounds.)
INF is equal only to itself, and is greater than
−INF and all numerical values.
−INF is equal only to itself, and is less than
INF and all numerical values.
NaN is incomparable with all values, including
itself.
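These rules can be modelled with decimal.Decimal, whose values already carry a sign, an exponent (precision) and the special values Infinity and NaN, and whose comparisons ignore the exponent and the sign of zero. The Python sketch below is such a model under that assumption; NaN is handled first because ordering comparisons involving a Decimal NaN raise an exception.

```python
from decimal import Decimal


def pd_compare(a: Decimal, b: Decimal) -> str:
    """Return '<', '=', '>' or '<>' (incomparable) per the rules above."""
    if a.is_nan() or b.is_nan():
        return "<>"  # NaN is incomparable with every value, itself included
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "="  # same numerical value; precision and sign of zero ignored


print(pd_compare(Decimal("-0"), Decimal("0.00")))        # =
print(pd_compare(Decimal("NaN"), Decimal("NaN")))        # <>
print(pd_compare(Decimal("INF"), Decimal("1E+999")))     # >
```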
Lexical Mapping
precisionDecimal's lexical space is the set of all
decimal numerals with or without a decimal
point, numerals in scientific (exponential) notation, and
the character strings INF,
+INF, -INF,
and NaN.
A constraining facet provided for this purpose can remove any one or two of the three subsets of
numerals, with corresponding reductions in
the value space. Using this facet
rather than pattern will change the canonical
mapping to ensure that the resulting datatype will still have canonical
representations of all its values.
Lexical Space
precisionDecimalRep | |
|
The lexical mapping and canonical mapping
for precisionDecimal are the following functions:
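Simple Type Definition for precisionDecimal
The simple type definition of precisionDecimal is present in every
schema. It has the following properties:
name: precisionDecimal
target namespace: http://www.w3.org/2001/XMLSchema
final: the empty set
variety: atomic
facets: a whiteSpace facet with value = collapse and fixed = true; a facet with the value {nodecimal, decimal, scientific}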
The description of canonical representations for float and double needs to be cleaned up.
RQ-140 (positive and negative zero in float and double)
Two zeros will be provided, similar to those in precisionDecimal.
float
float is patterned after the IEEE single-precision 32-bit floating point type
of IEEE 754-1985. The basic value space of
float consists of the values
m × 2^e, where m
is an integer whose absolute value is less than
2^24, and e is an integer
between -149 and 104, inclusive. In addition to the basic value space
described above, the value space
of float also contains the
following three special values:
positive and negative infinity and not-a-number
(NaN).
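Membership in the basic value space can be decided exactly. Write a nonzero rational as odd × 2^k; the smallest usable |m| comes from the largest allowed exponent e ≤ k, so it suffices to try e = min(k, 104). The Python sketch below (hypothetical function name) implements that test with fractions.Fraction.

```python
from fractions import Fraction


def in_float_basic_value_space(q: Fraction) -> bool:
    """True if q = m * 2**e with |m| < 2**24 and -149 <= e <= 104."""
    if q == 0:
        return True
    num, den = q.numerator, q.denominator
    if den & (den - 1):
        return False              # denominator must be a power of two
    k = -(den.bit_length() - 1)   # q = num * 2**k
    while num % 2 == 0:           # move factors of two from num into k
        num //= 2
        k += 1
    e = min(k, 104)               # largest allowed exponent not above k
    if e < -149:
        return False
    return abs(num) * 2 ** (k - e) < 2 ** 24


print(in_float_basic_value_space(Fraction(1, 10)))        # False
print(in_float_basic_value_space(Fraction(2**24 + 1)))    # False
print(in_float_basic_value_space(Fraction(1, 2**149)))    # True
```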
The order relation on float
is: x < y iff y - x is positive
for x and y in the value space.
Positive infinity is greater than all other non-NaN values.
NaN equals itself but is incomparable with (neither greater than nor less than)
any other value in the value space.
"Equality" in this Recommendation is defined to be "identity" (i.e., values that
are identical in the are equal and vice versa).
Identity must be used for the few operations that are defined in this Recommendation.
Applications using any of the datatypes defined in this Recommendation may use different
definitions of equality for computational purposes; IEEE 754-based computation systems
are examples. Nothing in this Recommendation should be construed as requiring that
such applications use identity as their equality relationship when computing.
Any value incomparable with the value used for the four bounding facets
(minInclusive, maxInclusive,
minExclusive, and maxExclusive) will be
excluded from the resulting restricted value space. In particular,
when "NaN" is used as a facet value for a bounding facet, since no other
float values are
comparable with it,
the result is a value space
that either has NaN as its only member (the inclusive cases) or is empty
(the exclusive cases). If any other value is used for a bounding facet,
NaN will be excluded from the resulting restricted value space;
adding NaN back in requires a union with the NaN-only space.
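The effect of the bounding facets on NaN can be sketched in Python as follows. This minimal sketch assumes that values are Python floats. satisfies_bound is a hypothetical helper and is not part of any schema processor's API. Following the rule above, NaN is comparable only with itself and, for schema purposes, equal to itself.

```python
import math
import operator

# Comparison applied by each bounding facet when neither operand is NaN.
_BOUND_OPS = {
    "minInclusive": operator.ge,
    "minExclusive": operator.gt,
    "maxInclusive": operator.le,
    "maxExclusive": operator.lt,
}

def satisfies_bound(value, bound, facet):
    """Does a float/double value satisfy the given bounding facet?"""
    value_nan, bound_nan = math.isnan(value), math.isnan(bound)
    if value_nan or bound_nan:
        # Incomparable pairs are excluded; NaN = NaN, so NaN survives only
        # an inclusive facet whose value is itself NaN.
        return value_nan and bound_nan and facet.endswith("Inclusive")
    return _BOUND_OPS[facet](value, bound)
```

With a NaN bound, only NaN passes an inclusive facet and nothing passes an exclusive one. With any other bound, NaN is always rejected.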
This datatype differs from the IEEE single-precision type in that there is only one
NaN and only one zero. This makes the equality and ordering of values in the data
space differ from those of the IEEE type only in that, for schema purposes, NaN = NaN.
A literal in the lexical space representing a
decimal number d maps to the normalized value
in the value space of float that is
closest to d in the sense defined by
; if d is
exactly halfway between two such values then the even value is chosen.
Lexical representation
float values have a lexical representation
consisting of a mantissa followed, optionally, by the character
E or e,
followed by an exponent. The exponent
must be an integer. The mantissa must be a
decimal number. The representations
for exponent and mantissa must follow the lexical rules for
integer and decimal. If the
E or e and
the following exponent are omitted, an exponent value of 0 is assumed.
The special values
positive
and negative infinity and not-a-number have lexical representations
INF, -INF and
NaN, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0
and INF are all legal literals for float.
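Below is a minimal Python sketch of a lexical check for float. The regular expression is an assumption assembled from the prose above. It takes the lexical rules for decimal to be an optional sign, then digits with an optional decimal point (at least one digit overall). It takes the rules for integer to be an optional sign followed by digits. It accepts only INF, -INF and NaN as special values. is_float_literal is a hypothetical name.

```python
import re

# Assumed pattern: decimal mantissa, then an optional E/e and integer exponent,
# or one of the special values INF, -INF, NaN.
_FLOAT_LEXICAL = re.compile(
    r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?|-?INF|NaN"
)

def is_float_literal(text):
    """True if text is a float literal under the assumed pattern."""
    return _FLOAT_LEXICAL.fullmatch(text) is not None
```

All of the example literals above are accepted. Strings such as "1e", "inf" or "+INF" are rejected.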
Canonical representation
The canonical representation for float is defined by
prohibiting certain options from the lexical representation.
NaN has the canonical form NaN. Infinity and
negative infinity have the canonical forms INF and
-INF respectively. Besides these special
values, the canonical form for float
is a mantissa, which is a decimal, followed by E
followed by an exponent, which is an integer. Leading zeroes and
the preceding optional + sign are prohibited in the
exponent. If the exponent is zero, it must be indicated by
E0. For the mantissa, the preceding optional
+ sign is prohibited and the decimal point is
required. Leading and trailing zeroes are prohibited subject to
the following: number representations must be normalized such that
there is a single non-zero digit to the left of the decimal
point and at least one digit to the right of the decimal point,
unless the value being represented is zero. The canonical form of
positive zero is 0.0E0; the canonical form of negative zero
is -0.0E0. Beyond the one required digit after the decimal point
in the mantissa, there must be as many, but only as many, additional
digits as are needed to uniquely distinguish the value from all other
values for the datatype after rounding.
Constraining facets
double
The double
datatype
is patterned after the
IEEE double-precision 64-bit floating point
type. The basic value space
of double consists of the values
m × 2^e, where m
is an integer whose absolute value is less than
2^53, and e is an
integer between -1075 and 970, inclusive. In addition to the basic
value space described above, the
value space of double also contains
the following
three
special values:
positive and negative infinity and not-a-number
(NaN).
The order relation on double
is: x < y iff y - x is positive
for x and y in the value space.
Positive infinity is greater than all other non-NaN values.
NaN equals itself but is incomparable with (neither greater than nor less than)
any other value in the value space.
"Equality" in this Recommendation is defined to be "identity" (i.e., values that
are identical in the value space are equal and vice versa).
Identity must be used for the few operations that are defined in this Recommendation.
Applications using any of the datatypes defined in this Recommendation may use different
definitions of equality for computational purposes; IEEE floating-point computation systems
are examples. Nothing in this Recommendation should be construed as requiring that
such applications use identity as their equality relationship when computing.
Any value incomparable with the value used for the four bounding facets
(minInclusive, maxInclusive,
minExclusive, and maxExclusive) will be
excluded from the resulting restricted value space. In particular,
when "NaN" is used as a facet value for a bounding facet, since no other
double values are
comparable with it,
the result is a value space
that either has NaN as its only member (the inclusive cases) or is empty
(the exclusive cases). If any other value is used for a bounding facet,
NaN will be excluded from the resulting restricted value space;
adding NaN back in requires a union with the NaN-only space.
This datatype differs from the IEEE double-precision type in that there is only one
NaN and only one zero. This makes the equality and ordering of values in the data
space differ from those of the IEEE type only in that, for schema purposes, NaN = NaN.
A literal in the lexical space representing a
decimal number d maps to the normalized value
in the value space of double that is
closest to d; if d is
exactly halfway between two such values then the even value is chosen.
This is the best approximation of d
(, ), which is more
accurate than the mapping required by .
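Python's float() can be used to observe the nearest-value, ties-to-even rule, because CPython converts decimal strings to the correctly rounded double. The sketch below builds the exact decimal literal halfway between a double and the next larger double. It then shows that the conversion picks the neighbour with the even significand. It assumes Python 3.9 or later (for math.nextafter). halfway_literal is a hypothetical helper.

```python
import math
from decimal import Decimal, localcontext

def halfway_literal(a):
    """Exact decimal literal for the midpoint between double a and its successor."""
    b = math.nextafter(a, math.inf)
    with localcontext() as ctx:
        # Far more digits than any double midpoint needs, so nothing is rounded.
        ctx.prec = 1100
        return str((Decimal(a) + Decimal(b)) / 2)
```

For instance, the literal exactly halfway between 1 and the next double maps back to 1.0, whose significand is even. The literal halfway above 1 + 2^-52 rounds up to 1 + 2^-51.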
Lexical representation
double values have a lexical representation
consisting of a mantissa followed, optionally, by the character "E" or
"e", followed by an exponent. The exponent must be
an integer. The mantissa must be
a decimal number. The representations
for exponent and mantissa must follow the lexical rules for
integer and
decimal. If the E or e and
the following exponent are omitted, an exponent value of 0 is assumed.
The special values
positive
and negative infinity and not-a-number have lexical representations
INF, -INF and
NaN, respectively.
Lexical representations for zero may take a positive or negative sign.
For example, -1E4, 1267.43233E12, 12.78e-2, 12, -0, 0
and INF
are all legal literals for double.
Canonical representation
The canonical representation for double is defined by
prohibiting certain options from the lexical representation.
NaN has the canonical form NaN. Infinity and
negative infinity have the canonical forms INF and
-INF respectively. Besides these special
values, the canonical form for double
is a mantissa, which is a decimal, followed by E
followed by an exponent, which is an integer. Leading zeroes and
the preceding optional + sign are prohibited in the
exponent. If the exponent is zero, it must be indicated by
E0. For the mantissa, the preceding optional
+ sign is prohibited and the decimal point is
required. Leading and trailing zeroes are prohibited subject to
the following: number representations must be normalized such that
there is a single non-zero digit to the left of the decimal
point and at least one digit to the right of the decimal point,
unless the value being represented is zero. The canonical form of
positive zero is 0.0E0; the canonical form of negative zero
is -0.0E0. Beyond the one required digit after the decimal point
in the mantissa, there must be as many, but only as many, additional
digits as are needed to uniquely distinguish the value from all other
values for the datatype after rounding.
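The canonical form for double can be sketched in Python. The sketch assumes that repr() of a Python float gives the shortest decimal string that converts back to the same double (true since Python 3.1). That matches the "as many, but only as many, additional digits" requirement. canonical_double is a hypothetical name. The sketch does not cover float, whose shortest round-trip digits differ.

```python
import math
from decimal import Decimal

def canonical_double(x):
    """Canonical lexical form of a double value (sketch)."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "INF" if x > 0 else "-INF"
    if x == 0:
        return "-0.0E0" if math.copysign(1.0, x) < 0 else "0.0E0"
    # Shortest round-tripping digits, with trailing zeros removed.
    sign, digits, exp = Decimal(repr(x)).normalize().as_tuple()
    exponent = exp + len(digits) - 1  # exponent with one digit before the point
    head = str(digits[0])
    tail = "".join(map(str, digits[1:])) or "0"  # at least one digit after the point
    return ("-" if sign else "") + f"{head}.{tail}E{exponent}"
```

For example, the literals -1E4, 1267.43233E12 and 12.78e-2 have the canonical forms -1.0E4, 1.26743233E15 and 1.278E-1.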
Constraining facets
duration
duration represents a duration of time. The value space of duration is a six-dimensional
space where the coordinates designate the Gregorian year, month, day,
hour, minute, and second components defined in § 5.5.3.2 of
ISO 8601, respectively. These components are ordered
in their significance by their order of appearance, i.e., as year,
month, day, hour, minute, and second.
All processors must support year values with a minimum of 4 digits (i.e.,
YYYY) and a minimum fractional second precision of
milliseconds or three decimal digits (i.e., s.sss).
However, processors may set an application-defined limit on the maximum number
of digits they are prepared to support in these two cases, in which
case that application-defined maximum number
must be clearly documented.
duration is a datatype that represents
durations of time. The concept of duration being captured is
drawn from those of ISO 8601, specifically
durations without fixed endpoints. For example,
15 days (whose most common lexical representation
in ISO 8601 is P15D) is
a duration value; 15 days beginning 12 July
1995 and 15 days ending 12 July 1995 are
not duration values. duration can provide addition and
subtraction operations between duration values and
between duration/dateTime value pairs,
and can be the result of subtracting
dateTime values. However, only addition to and subtraction from dateTime are required for XML Schema processing,
and these operations are defined elsewhere in this specification.
Simple Type Definition for duration
The simple type definition of duration is present in every
schema. It has the following properties:
of
durationhttp://www.w3.org/2001/XMLSchemaThe The empty setatomicabsent{a facet}TBDTBDabsentabsentThe empty sequence
The whiteSpace facet in the facets of the duration simple type definition
has the following properties:
Facet of the
collapsefalseThe empty sequence
Value Space
Durations can be modeled in at least two ways: as six-property tuples (similar to
the seven-property model used for other date/time datatypes) or as two-property tuples
(somewhat similar to the alternative one-property timeOnTimeline model especially useful for
order). For
durations, the latter is more convenient: duration values are two-property
tuples. (Note, however, that the
six-property model was implicitly used in Schema 1.0. The only effective difference to the user caused
by this change is in the canonical representations.) See
for more information on the seven-property model.
Properties of duration Values
month
second: must not be negative if month is positive, and
must not be positive if month is negative.
duration is partially ordered. Equality and order are defined in terms of those of
dateTime, and are determined by adding each pair of duration values
in turn to the following four dateTime values:
1696-09-01T00:00:00Z
1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-07-01T00:00:00Z
If all four resulting value pairs are
ordered the same way (less than, equal, or greater than), then the
original pair of values is ordered the same way; otherwise the original pair
is incomparable.
These four values are chosen so as to maximize
the possible differences in results that could occur, such as the difference when adding
P1M and P30D: 1697-02-01T00:00:00Z + P1M < 1697-02-01T00:00:00Z + P30D,
but 1903-03-01T00:00:00Z + P1M > 1903-03-01T00:00:00Z + P30D, so
that P1M <> P30D. If two duration values are ordered the same way
when added to each of these four dateTime values, they will retain the same order when added
to any other dateTime value, unless one result is within a leap-second and the other
either is also within it or is the beginning moment of the next second; in that case the two results will
be equal even though the original duration
values were not. Therefore, two duration values are
incomparable if and only if they can ever result in different orders when added to any dateTime value not within a leap-second.
This minor anomaly is the result of having duration unaware of leap-seconds while the
other date/time primitive datatypes are leap-second aware.
It turns out that under the definition just given, two
duration values are equal if and only if they are identical.
Two totally ordered datatypes are derived from duration
elsewhere in this specification.
There are many ways to implement duration,
some of which do not base the implementation on the two-component
model. This specification does not prescribe any particular
implementation, as long as the visible results are isomorphic to those
described herein.
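One such implementation is sketched below in Python. It uses the two-property (month, second) model and the four reference dateTime values given above. The sketch works under stated assumptions: it ignores leap-seconds and timezones, it takes seconds as whole numbers, and its month arithmetic pins the day to the last day of a shorter month. The names add_duration and compare_duration are hypothetical.

```python
import calendar
from datetime import datetime, timedelta

# The four reference dateTime values (all in timezone Z, treated here as naive).
REFERENCE_POINTS = [
    datetime(1696, 9, 1),
    datetime(1697, 2, 1),
    datetime(1903, 3, 1),
    datetime(1903, 7, 1),
]

def add_duration(start, months, seconds):
    """Add months first (pinning the day to the month's length), then seconds."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    shifted = start.replace(year=year, month=month_index + 1,
                            day=min(start.day, last_day))
    return shifted + timedelta(seconds=seconds)

def compare_duration(x, y):
    """Return '<', '=', '>' or '<>' (incomparable) for (months, seconds) pairs."""
    outcomes = set()
    for s in REFERENCE_POINTS:
        a, b = add_duration(s, *x), add_duration(s, *y)
        outcomes.add("<" if a < b else ">" if a > b else "=")
    return outcomes.pop() if len(outcomes) == 1 else "<>"
```

For example, compare_duration((1, 0), (0, 30 * 86400)) returns '<>' (P1M versus P30D). compare_duration((12, 0), (0, 364 * 86400)) returns '>' (P1Y versus P364D).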
Lexical representation
The lexical representation for duration is the
extended format PnYnMnDTnHnMnS, where
nY represents the number of years, nM the
number of months, nD the number of days, 'T' is the
date/time separator, nH the number of hours,
nM the number of minutes and nS the
number of seconds. The number of seconds can include decimal digits
to arbitrary precision.
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary
unsigned integer, i.e., an integer that
conforms to the pattern [0-9]+.
Similarly, the value of the Seconds component
allows an arbitrary unsigned decimal.
Following ISO 8601, at least one digit must
follow the decimal point if it appears. That is, the value of the Seconds component
must conform to the pattern [0-9]+(\.[0-9]+)?.
Thus, the lexical representation of
duration does not follow the alternative
format of § 5.5.3.2.1 of ISO 8601.
An optional preceding minus sign ('-') is
allowed, to indicate a negative duration. If the sign is omitted a
positive duration is indicated. See also .
For example, to indicate a duration of 1 year, 2 months, 3 days, 10
hours, and 30 minutes, one would write: P1Y2M3DT10H30M.
One could also indicate a duration of minus 120 days as:
-P120D.
Reduced precision and truncated representations of this format are allowed
provided they conform to the following:
If the number of years, months, days, hours, minutes, or seconds in any
expression equals zero, the number and its corresponding designator
may be omitted. However, at least one number and its designator
must be present.
The seconds part may have a decimal fraction.
The designator 'T' must
be absent if and only if all of the time items are absent.
The designator 'P' must always be present.
For example, P1347Y, P1347M and P1Y2MT2H are all allowed;
P0Y1347M and P0Y1347M0D are allowed. P-1347M is not allowed although
-P1347M is allowed. P1Y2MT is not allowed.
Lexical Space
The lexical representations of duration are
more or less based on the pattern:
PnYnMnDTnHnMnS
More precisely, the lexical space of duration is the set of character
strings that satisfy durationLexicalRep as defined by the following productions:
Lexical Representation Fragments
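duYearFrag ::= [0-9]+ 'Y'
duMonthFrag ::= [0-9]+ 'M'
duDayFrag ::= [0-9]+ 'D'
duHourFrag ::= [0-9]+ 'H'
duMinuteFrag ::= [0-9]+ 'M'
duSecondFrag ::= ([0-9]+ | [0-9]+ '.' [0-9]+) 'S'
duYearMonthFrag ::= (duYearFrag duMonthFrag?) | duMonthFrag
duTimeFrag ::= 'T' ((duHourFrag duMinuteFrag? duSecondFrag?) | (duMinuteFrag duSecondFrag?) | duSecondFrag)
duDayTimeFrag ::= (duDayFrag duTimeFrag?) | duTimeFrag
Lexical Representation
durationLexicalRep ::= '-'? 'P' ((duYearMonthFrag duDayTimeFrag?) | duDayTimeFrag)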
Thus, a durationLexicalRep consists of one or more of a duYearFrag,
duMonthFrag, duDayFrag, duHourFrag,
duMinuteFrag, and/or duSecondFrag, in order, with letters
P and T (and perhaps a -) where appropriate.
The durationLexicalRep production is equivalent to this regular expression
-?P(((([0-9]+Y([0-9]+M)?)|
( ([0-9]+M) ) )(([0-9]+D(T(([0-9]+H([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)|
( ([0-9]+M) ([0-9]+(\.[0-9]+)?S)?)|
( ([0-9]+(\.[0-9]+)?S) ) ))?)|
( (T(([0-9]+H([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)|
( ([0-9]+M) ([0-9]+(\.[0-9]+)?S)?)|
( ([0-9]+(\.[0-9]+)?S) ) )) ) )?)|
( (([0-9]+D(T(([0-9]+H([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)|
( ([0-9]+M) ([0-9]+(\.[0-9]+)?S)?)|
( ([0-9]+(\.[0-9]+)?S) ) ))?)|
( (T(([0-9]+H([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)|
( ([0-9]+M) ([0-9]+(\.[0-9]+)?S)?)|
( ([0-9]+(\.[0-9]+)?S) ) )) ) ) ) )
once you delete the
whitespace. (Redundant parentheses are shown as
ghosts; some find them helpful in reading the expression.)
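The following Python sketch checks a literal against the durationLexicalRep grammar and maps it to a (month, second) pair. The regular expression is a compact rewriting that uses lookaheads instead of the nested alternatives above. The first lookahead requires at least one component after P. The second requires at least one component after T. The mapping (12 months per year; 86400, 3600 and 60 seconds per day, hour and minute) is an assumption consistent with the two-property model. parse_duration is a hypothetical name.

```python
import re
from decimal import Decimal

DURATION_RE = re.compile(
    r"(-)?P(?=[0-9]|T[0-9])"                       # something must follow 'P'
    r"(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?"
    r"(?:T(?=[0-9])"                                 # something must follow 'T'
    r"(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+(?:\.[0-9]+)?)S)?)?"
)

def parse_duration(lexical):
    """Map a duration literal to (months, seconds); seconds is a Decimal."""
    match = DURATION_RE.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a duration literal: {lexical!r}")
    neg, years, months, days, hours, minutes, secs = match.groups()
    total_months = 12 * int(years or 0) + int(months or 0)
    total_seconds = (86400 * int(days or 0) + 3600 * int(hours or 0)
                     + 60 * int(minutes or 0) + Decimal(secs or 0))
    if neg:
        return -total_months, -total_seconds
    return total_months, total_seconds
```

For example, P1Y2M3DT10H30M maps to 14 months and 297000 seconds. P1Y2MT and P-1347M are rejected.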
The lexical mapping for duration is defined as follows:
Canonical mappings are not used during schema processing. They are provided in this specification
for the benefit of other users of these datatype definitions who may find them useful, and for other specifications
which might find it useful to reference them normatively.
The canonical mapping for duration is defined as follows:
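The sketch below shows one possible canonical mapping in Python. It is a hypothetical illustration, not necessarily the mapping this specification defines. It assumes that months split into years by 12, and that seconds split into days, hours and minutes by 86400, 3600 and 60. It also assumes that zero components are omitted and that the zero duration is written PT0S. canonical_duration is a hypothetical name.

```python
from decimal import Decimal

def canonical_duration(months, seconds):
    """Hypothetical canonical form for a (months, seconds) duration value."""
    seconds = Decimal(seconds)
    # months and seconds never have opposite signs, so one sign suffices.
    sign = "-" if months < 0 or seconds < 0 else ""
    months, seconds = abs(months), abs(seconds)
    years, months = divmod(months, 12)
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    date_part = "".join(f"{n}{unit}" for n, unit in
                        ((years, "Y"), (months, "M"), (int(days), "D")) if n)
    time_part = "".join(f"{n}{unit}" for n, unit in
                        ((int(hours), "H"), (int(minutes), "M")) if n)
    if secs:
        # normalize() drops trailing zeros; format "f" avoids exponent notation.
        time_part += format(secs.normalize(), "f") + "S"
    if not date_part and not time_part:
        return "PT0S"
    return sign + "P" + date_part + ("T" + time_part if time_part else "")
```

Under these assumptions, 14 months and 297000 seconds map to P1Y2M3DT10H30M. Minus 10368000 seconds maps to -P120D.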
Order relation on duration
In general, the order relation on duration
is a partial order since there is no determinate relationship between certain
durations such as one month (P1M) and 30 days (P30D).
The order relation
of two duration values x and
y is x < y iff s+x < s+y
for each qualified dateTime s
in the list below. These values for s cause the greatest deviations in the addition of
dateTimes and durations. Addition of durations to time instants is defined
in .
1696-09-01T00:00:00Z
1697-02-01T00:00:00Z
1903-03-01T00:00:00Z
1903-07-01T00:00:00Z
The following table shows the strongest relationship that can be determined
between example durations. The symbol <> means that the order relation is
indeterminate. Note that because of leap-seconds, a seconds field can vary
from 59 to 60. However, because of the way that addition is defined in
, they are still totally ordered.
Relation
P1Y:  > P364D,  <> P365D,  <> P366D,  < P367D
P1M:  > P27D,  <> P28D,  <> P29D,  <> P30D,  <> P31D,  < P32D
P5M:  > P149D,  <> P150D,  <> P151D,  <> P152D,  <> P153D,  < P154D
Implementations are free to optimize the computation of the ordering relationship. For example, the following table can be used to
compare durations of a small number of months against days.
Months:           1    2    3    4    5    6    7    8    9   10   11   12   13  ...
Minimum days:    28   59   89  120  150  181  212  242  273  303  334  365  393  ...
Maximum days:    31   62   92  123  153  184  215  245  276  306  337  366  397  ...
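A short Python sketch can reproduce the table above. For each starting month in a 400-year Gregorian cycle, it counts the days from the first of that month to the first of the month n months later. Restricting start dates to the first of the month is a simplifying assumption. month_day_bounds is a hypothetical name.

```python
import calendar

def span_days(year, month, n):
    """Days from the 1st of (year, month) to the 1st of the month n months later."""
    total = 0
    for _ in range(n):
        total += calendar.monthrange(year, month)[1]
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return total

def month_day_bounds(n, first_year=2000):
    """(minimum, maximum) days spanned by n months over a full 400-year cycle."""
    spans = [span_days(y, m, n)
             for y in range(first_year, first_year + 400)
             for m in range(1, 13)]
    return min(spans), max(spans)
```

For instance, month_day_bounds(1) gives (28, 31) and month_day_bounds(12) gives (365, 366), as in the table.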
Facet Comparison for durations
In comparing duration
values with minInclusive, minExclusive,
maxInclusive and maxExclusive facet values,
indeterminate comparisons should be considered as "false".
Totally ordered durations
Certain derived datatypes of durations can be guaranteed to have a total order. For
this, they must have fields from only one row in the list below and the time zone
must either be required or prohibited.
year, month
day, hour, minute, second
For example, a datatype could be defined to correspond to the
datatype Year-Month interval that required a four-digit
year field and a two-digit month field but required all other fields to be unspecified. Such a datatype could be defined as a restriction of duration and would have a total order.
Constraining facets
dateTime
dateTime values
may be viewed as objects with integer-valued
year, month, day, hour and minute properties,
a decimal-valued second property,
and a boolean timezoned property.
Each such object also has one decimal-valued
method or computed property, timeOnTimeline,
whose value is always a decimal
number; the values are dimensioned in seconds,
the integer 0 is 0001-01-01T00:00:00 and the value
of timeOnTimeline for other dateTime
values is computed using the Gregorian algorithm
as modified for leap-seconds.
The timeOnTimeline values form two related
"timelines", one for timezoned
values and one for non-timezoned values.
Each timeline is a copy of the
value space of decimal,
with integers given units of seconds.
dateTime represents
instants of time, optionally marked
with a particular timezone. Values representing
the same instant but having
different timezones are equal but not identical.
The of
dateTime is closely related
to the dates and times described in ISO 8601.
For clarity, the text above specifies a
particular origin point for the timeline.
It should be noted, however, that schema processors need not expose the
timeOnTimeline value to schema users, and there is no requirement that a
timeline-based implementation use the particular origin described here in
its internal representation.
Other interpretations of the value space which lead to the
same results (i.e., are isomorphic) are of course acceptable.
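The Python sketch below illustrates one such interpretation. It computes a timeOnTimeline-like value as the number of seconds since 0001-01-01T00:00:00, subtracting the timezone offset for timezoned values. It ignores leap-seconds and supports only years 1 to 9999 (the range of Python's datetime), so it is not a full implementation. time_on_timeline is a hypothetical name. Callers must track separately whether a value was timezoned, since the two timelines are distinct.

```python
from datetime import datetime
from decimal import Decimal

ORIGIN = datetime(1, 1, 1)  # 0001-01-01T00:00:00 is timeline value 0

def time_on_timeline(year, month, day, hour, minute, second, tz_minutes=None):
    """Seconds since 0001-01-01T00:00:00, ignoring leap-seconds.

    second is a Decimal in [0, 60); tz_minutes is the timezone offset in
    minutes, or None for an untimezoned value.
    """
    whole = int(second)
    delta = datetime(year, month, day, hour, minute, whole) - ORIGIN
    value = Decimal(delta.days * 86400 + delta.seconds) + (second - whole)
    if tz_minutes is not None:
        value -= tz_minutes * 60  # convert local time to UTC
    return value
```

For example, 2002-10-10T12:00:00-05:00 and 2002-10-10T17:00:00Z receive the same value.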
All timezoned times are Coordinated Universal Time
(UTC, sometimes called
"Greenwich Mean Time").
Coordinated Universal Time
(UTC) is an adaptation of International Atomic Time (TAI)
which closely approximates observed astronomical time by adding
leap-seconds to
selected days. A
leap-second is an additional second added
to the last day of December, June, October, or March,
when such an adjustment is deemed necessary by the
International Earth Rotation and Reference Systems Service
in order to keep UTC within 0.9 seconds
of observed astronomical time. When leap seconds are
introduced, the last minute in the day has more than
sixty seconds.
In theory leap seconds can also be removed from a
day, but this has not yet occurred.
Other timezones indicated in lexical representations
are converted to UTC
during conversion of literals to values.
"Local" or untimezoned times are presumed to be
the time in the timezone of some
unspecified locality as prescribed
by the appropriate legal authority;
currently there are no legally prescribed
timezones which are durations
whose magnitude is greater than 14 hours.
The value of each numeric-valued property
(other than timeOnTimeline) is limited to
the maximum value within the interval
determined by the next-higher property.
For example, the day value can never be 32,
and cannot even be 29 for month 02 and year 2002 (February 2002).
The date and time datatypes described in this recommendation were inspired
by ISO 8601. '0001' is the lexical
representation of the year 1 of the Common Era
(1 CE, sometimes written "AD 1" or "1 AD"). There
is no year 0, and '0000' is not a valid lexical representation.
'-0001' is the lexical representation of the year 1 Before
Common Era (1 BCE, sometimes written "1 BC").
Those using this (1.0) version of this Recommendation to
represent negative years should be aware that the interpretation of lexical
representations beginning with a '-' is likely to change in
subsequent versions.
makes no mention of the year 0; in
the form '0000' was disallowed and this recommendation disallows it as well.
However, , which became
available just as we were completing version
1.0, allows the form '0000', representing the year
1 BCE. A number of external commentators
have also suggested that '0000' be
allowed, as the lexical representation for 1 BCE,
which is the normal usage in
astronomical contexts.
It is the intention of the XML Schema
Working Group to allow '0000' as a lexical representation in the
dateTime, date, gYear, and
gYearMonth datatypes in a subsequent version
of this Recommendation. '0000' will be the lexical representation of 1
BCE (which is a leap year), '-0001' will become the lexical representation of 2
BCE (not 1 BCE as in this (1.0) version), '-0002' of 3 BCE, etc.
See the conformance note in which
applies to this datatype as well.
Value Space
uses the
, with no properties
except
permitted
to be absent. The property remains
.
Day-of-month Values
The day value
must be
no more than 30 if month
is one of 4, 6, 9, or 11;
no more than 28
if month is 2 and year
is not divisible by 4,
or is divisible by 100 but not by 400;
and no more than 29 if month
is 2 and year
is divisible by 400, or by 4 but not by 100.
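The day-of-month constraint translates directly into a small Python function. max_day is a hypothetical name. The sketch applies the divisibility rules to the year number exactly as stated above.

```python
def max_day(year, month):
    """Largest permitted day value for the given year and month."""
    if month in (4, 6, 9, 11):
        return 30
    if month == 2:
        leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
        return 29 if leap else 28
    return 31
```

For example, max_day(2002, 2) is 28, so a day value of 29 is not permitted for February 2002.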
Leap-second Values
The second value
must be
less than 60 if the timezone
is absent or if the remaining values
do not correspond to a dateTime at which a leap-second was introduced into UTC
by the responsible authorities;
if the
hour and minute in the specified timezone
allow a real leap-second, then the second value
must be less than 60 plus the number of
leap-seconds introduced on that date. (At
the time of publication of this specification, no
more than one leap-second has ever been introduced at
a time, and it appears unlikely that
more than one ever will be.
No negative leap-seconds have been
introduced, but if any should be introduced in future,
adding that negative number will result
in a value limit of 59 or lower.)
See the conformance note in which
applies to the and
values of this datatype.
Equality and order are as prescribed
in .
dateTime values are ordered
by their timeOnTimeline value.
Since the order of a dateTime
value having a timezone
with another value whose timezone
is absent is determined
by imputing timezones of both +14:00
and −14:00 to the untimezoned value, many such
combinations will be incomparable
because the two imputed
timezones yield different orders.
Although dateTime and other
types related to dates and times have only a partial order, it
is possible for datatypes derived from dateTime to have
total orders, if they are restricted (e.g. using the
facet) to the subset of values with, or
the subset of values without, timezones. Similar restrictions
on other date- and time-related types will similarly produce
totally ordered subtypes. Note, however, that
such restrictions do not affect the value shown, for a given
, in the facet.
Order and equality are essentially the same for
dateTime in this version of this specification as
they were in version 1.0. However, since dateTime values
now distinguish timezones, equal
values with different timezones
are not identical, and values with extreme
timezones may no longer be equal
to any value with a smaller timezone.
Lexical representation
The of dateTime consists of
finite-length sequences of characters of the form:
'-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?,
where
'-'? yyyy is a
four-or-more digit optionally negative-signed
numeral that represents the year; if more than four digits, leading zeros
are prohibited, and '0000' is prohibited
(see the Note above;
also note that a plus sign is not permitted);
the remaining '-'s are separators between
parts of the date portion;
the first mm is a two-digit
numeral that represents the month;
dd is a two-digit numeral
that represents the day;
'T' is a separator indicating that time-of-day follows;
hh is a two-digit numeral
that represents the hour; '24' is permitted if the
minutes and seconds represented are zero,
and the dateTime value so
represented is the first instant of the following day (the hour property of a
dateTime object in the
cannot have
a value greater than 23);
':' is a separator between parts of the time-of-day portion;
the second mm is a
two-digit numeral that represents the minute;
ss is a two-integer-digit numeral that represents the
whole seconds;
'.' s+ (if present) represents the
fractional seconds;
zzzzzz (if present) represents
the timezone (as described below).
For example, 2002-10-10T12:00:00-05:00 (noon on 10 October 2002, Central Daylight
Savings Time as well as Eastern Standard Time
in the U.S.) is 2002-10-10T17:00:00Z,
five hours later than 2002-10-10T12:00:00Z.
For further guidance on arithmetic with dateTimes and durations,
see .
Canonical representation
Except for trailing fractional zero digits in the seconds representation,
'24:00:00' time representations,
and timezone (for timezoned values), the mapping
from literals to values is one-to-one. Where there is more than
one possible representation, the canonical representation is as follows:
The 2-digit numeral representing
the hour must not be '24';
The fractional second string, if present,
must not end in '0';
for timezoned values, the timezone must be represented with
'Z' (all timezoned dateTime values are
UTC).
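The canonical rules above can be sketched in Python as follows. The sketch assumes four-digit positive years and no leap-seconds. Both limits come from Python's datetime, not from this specification. canonical_datetime is a hypothetical name. It converts timezoned values to UTC with a Z suffix, rewrites 24:00:00 as the first instant of the following day, and drops trailing fractional zeros.

```python
import re
from datetime import datetime, timedelta

# Simplified pattern for the sketch: four-digit years only, no validation of ranges.
_DATETIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?"
)

def canonical_datetime(lexical):
    """Canonical lexical form of a (valid) dateTime literal (sketch)."""
    match = _DATETIME.fullmatch(lexical)
    if match is None:
        raise ValueError(f"unsupported dateTime literal: {lexical!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction, tz = match.group(7) or "", match.group(8)
    if hour == 24:
        # '24:00:00' denotes the first instant of the following day.
        value = datetime(year, month, day) + timedelta(days=1)
    else:
        value = datetime(year, month, day, hour, minute, second)
    if tz is not None and tz != "Z":
        # Convert to UTC by removing the signed offset.
        offset = timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
        value = value - offset if tz[0] == "+" else value + offset
    fraction = fraction.rstrip("0")
    if fraction == ".":
        fraction = ""  # a fraction consisting only of zeros is dropped
    text = (f"{value.year:04d}-{value.month:02d}-{value.day:02d}"
            f"T{value.hour:02d}:{value.minute:02d}:{value.second:02d}{fraction}")
    return text + "Z" if tz is not None else text
```

For example, 2002-10-10T12:00:00-05:00 becomes 2002-10-10T17:00:00Z, and 1999-12-31T24:00:00 becomes 2000-01-01T00:00:00.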
Lexical Mappings
The lexical representations for dateTime are as follows:
Lexical Space
dateTimeLexicalRep--T ((::) |
) ?
Day-of-month Representations
Within a dateTimeLexicalRep, the day numeral must not
begin with the digit 3 or be 29
unless the value to
which it would map would satisfy the value constraint on
day values
(Constraint: Day-of-month Values) given above.
Leap-second Representations
Within a dateTimeLexicalRep, the seconds numeral must not
begin with the digit 6 unless the value to
which it would map, in conjunction with the rest of the values,
would satisfy the value constraint on leap-second values
(Constraint: Leap-second Values) given
above. Should a negative leap-second be declared, the permitted seconds numerals
are further limited to those which would
satisfy the even-tighter value constraint on second values.
The dateTimeLexicalRep production
is equivalent to this regular expression
once whitespace is removed, except
that the constraints above are not enforced.
-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])T(([01][0-9]|2[0-3]):[0-5][0-9]:([0-5][0-9]|60)(\.[0-9]+)?|24:00:00(\.0+)?)(Z|[+\-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?
Note that neither the dateTimeLexicalRep production
nor this regular
expression alone enforces the day-of-month and leap-second constraints given above.
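For experimentation, the following Python transcription writes the pattern with explicit grouping. It escapes '.', accepts 'Z' as a timezone, requires 00 minutes with a 14-hour offset, and allows only zero fractional seconds after 24:00:00. As noted above, the pattern alone does not enforce the day-of-month or leap-second constraints, so 2002-02-30T00:00:00 is accepted. matches_datetime_pattern is a hypothetical name.

```python
import re

DATETIME_LEXICAL = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"                        # year
    r"-(0[1-9]|1[0-2])"                                     # month
    r"-(0[1-9]|[12][0-9]|3[01])"                            # day
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:([0-5][0-9]|60)(\.[0-9]+)?"
    r"|24:00:00(\.0+)?)"                                    # time of day
    r"(Z|[+\-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"         # optional timezone
)

def matches_datetime_pattern(text):
    """True if text matches the pattern (value constraints are not checked)."""
    return DATETIME_LEXICAL.fullmatch(text) is not None
```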
The lexical mapping and canonical mapping
for dateTime are the following functions:
Timezones
Timezones are durations with (integer-valued) hour and minute properties
(with the hour magnitude limited to at most 14, and the minute magnitude
limited to at most 59, except that if the hour magnitude is 14, the minute
value must be 0); they may be both positive or both negative.
The lexical representation of a timezone is a string of the form:
(('+' | '-') hh ':' mm) | 'Z',
where
hh is a two-digit numeral
(with leading zeros as required) that
represents the hours,
mm is a two-digit
numeral that represents the minutes,
'+' indicates a nonnegative duration,
'-' indicates a nonpositive duration.
The mapping so defined is one-to-one, except that '+00:00',
'-00:00', and 'Z' all represent the same zero-length duration
timezone, UTC; 'Z' is its canonical
representation.
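The timezone rules above can be sketched in Python as a function that returns the offset in minutes. timezone_minutes is a hypothetical name. Raising ValueError for invalid input is a design choice of the sketch.

```python
import re

_TIMEZONE = re.compile(r"([+-])([0-9]{2}):([0-9]{2})")

def timezone_minutes(lexical):
    """Map 'Z', '+hh:mm' or '-hh:mm' to a signed offset in minutes."""
    if lexical == "Z":
        return 0
    match = _TIMEZONE.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a timezone literal: {lexical!r}")
    sign, hours, minutes = match.group(1), int(match.group(2)), int(match.group(3))
    # Hour magnitude at most 14, minutes at most 59, and exactly 00 when the hour is 14.
    if hours > 14 or minutes > 59 or (hours == 14 and minutes != 0):
        raise ValueError(f"timezone out of range: {lexical!r}")
    offset = hours * 60 + minutes
    return -offset if sign == "-" else offset
```

As the text says, 'Z', '+00:00' and '-00:00' all map to the same zero offset.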
When a timezone is added to a dateTime, the result is the date
and time "in that timezone". For example, 2002-10-10T12:00:00+05:00 is
2002-10-10T07:00:00Z and 2002-10-10T00:00:00+05:00 is 2002-10-09T19:00:00Z.
Order relation on dateTime
dateTime value objects on either
timeline are totally ordered by their timeOnTimeline
values; between the two timelines, dateTime
value objects are ordered by their
timeOnTimeline values when their timeOnTimeline values differ by more than
fourteen hours, with those whose difference is a duration of 14 hours or less
being incomparable.
In general, the
order
relation on dateTime
is a partial order since there is no determinate relationship between certain
instants. For example, there is no determinate ordering between
(a) 2000-01-20T12:00:00 and (b) 2000-01-20T12:00:00Z. Based on
timezones currently in use, (a) could vary from 2000-01-20T12:00:00+12:00 to
2000-01-20T12:00:00-13:00. It is, however, possible for this range to expand or
contract in the future, based on local laws. Because of this, the following
definition uses a somewhat broader range of indeterminate values:
+14:00..-14:00.
The following definition uses the notation S[year] to represent the year
field of S, S[month] to represent the month field, and so on. The notation (Q
& "-14:00") means adding the timezone -14:00 to Q, where Q did not
already have a timezone. This is a
logical explanation of the process. Actual
implementations are free to optimize as
long as they produce the same results.
The ordering between two dateTimes P
and Q is defined by the following algorithm:
A. Normalize P and Q. That is, if there is a timezone present, but
it is not Z, convert it to Z using the addition operation defined in
this specification. Thus, 2000-03-04T23:00:00+03:00
normalizes to 2000-03-04T20:00:00Z.
B. If P and Q either both have a time zone or both do not have a time
zone, compare P and Q field by field from the year field down to the
second field, and return a result as soon
as it can be determined. That is:
For each i in {year, month, day, hour, minute, second}
If P[i] and Q[i] are both
not specified, continue to the next i
If P[i] is not specified
and Q[i] is, or vice versa, stop and return
P <> Q
If P[i] < Q[i], stop and return P < Q
If P[i] > Q[i], stop and return P > Q
Stop and return P = Q
C. Otherwise, if P contains a time zone and Q does not, compare
as follows:
P < Q if P < (Q with time zone +14:00)
P > Q if P > (Q with time zone -14:00)
P <> Q otherwise, that is, if (Q with time zone
+14:00) <= P <= (Q with time zone -14:00)
D. Otherwise, if P does not contain a time zone and Q does, compare
as follows:
P < Q if (P with time zone -14:00) < Q.
P > Q if (P with time zone +14:00) > Q.
P <> Q otherwise, that is, if (P with
time zone +14:00) <= Q <= (P with time zone -14:00)
Examples:
Determinate:
2000-01-15T00:00:00 < 2000-02-15T00:00:00
2000-01-15T12:00:00 < 2000-01-16T12:00:00Z
Indeterminate:
2000-01-01T12:00:00 <> 1999-12-31T23:00:00Z
2000-01-16T12:00:00 <> 2000-01-16T12:00:00Z
2000-01-16T00:00:00 <> 2000-01-16T12:00:00Z
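The algorithm and examples above can be sketched in Python as follows. Each value is given as a naive datetime holding the date and time as written, plus an offset in minutes (or None when there is no timezone). The sketch relies on one assumption: for complete dateTime values, the field-by-field comparison of step B is equivalent to ordinary chronological comparison. compare_datetime is a hypothetical name.

```python
from datetime import datetime, timedelta

FOURTEEN_HOURS = timedelta(hours=14)

def to_utc(local, tz_minutes):
    """Step A: normalize a timezoned value to UTC (Z)."""
    return local - timedelta(minutes=tz_minutes)

def compare_datetime(p, p_tz, q, q_tz):
    """Return '<', '>', '=' or '<>' (indeterminate) for two dateTime values."""
    if (p_tz is None) == (q_tz is None):
        # Step B: both timezoned (compared in UTC) or both untimezoned.
        a = p if p_tz is None else to_utc(p, p_tz)
        b = q if q_tz is None else to_utc(q, q_tz)
        return "<" if a < b else ">" if a > b else "="
    if p_tz is not None:
        # Step C: Q with +14:00 is its earliest instant, with -14:00 its latest.
        a = to_utc(p, p_tz)
        if a < q - FOURTEEN_HOURS:
            return "<"
        if a > q + FOURTEEN_HOURS:
            return ">"
        return "<>"
    # Step D: P with -14:00 is its latest instant, with +14:00 its earliest.
    b = to_utc(q, q_tz)
    if p + FOURTEEN_HOURS < b:
        return "<"
    if p - FOURTEEN_HOURS > b:
        return ">"
    return "<>"
```

Applied to the examples above, the sketch returns '<' for the two determinate pairs and '<>' for the three indeterminate ones.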
Totally ordered dateTimes
Certain derived types from dateTime
can be guaranteed to have a total order. To
do so, they must require that a specific
set of fields are always specified, and
that remaining fields (if any) are always unspecified. For example, the date
datatype without time zone is defined to contain exactly year, month, and day.
Thus dates without time zone have a total order among themselves.
Constraining facets
time
time
represents an instant of time that recurs every day. The
of time is the space
of time of day values as defined in § 5.3 of
. Specifically, it is a set of zero-duration daily
time instances.
represents instants of time that recur at the same point in each
calendar day, or that occur in some arbitrary calendar day.
Since the lexical representation allows an optional time zone
indicator, time values are partially ordered because it may
not be possible to determine the order of two values one of which has a
time zone and the other does not. The order relation on
time values is the
using an arbitrary date. See also
. Pairs
of time values that both have or both lack time
zone indicators are totally ordered.
See the conformance note in which
applies to the seconds part of this datatype as well.
Value Space
uses the , with
, ,
and required
to be absent. remains
.
Leap-second Values
The value
must be
less than 60 if
is absent or if the remaining
values do not correspond to a time
at which, on some day, a leap-second has been introduced
into UTC by the responsible
authorities;
if
the hour and minute in the specified timezone
allow a real leap-second then the value
must be less than 60 plus the largest number of
leap-seconds introduced on any date. (At
the time of publication of this specification, the
largest number of leap-seconds ever introduced at one time
is 1, so the largest legal
value is 60. Historically, all leap-seconds have been introduced
in the last minute of June or December in UTC; no
more than one leap-second has ever been introduced at
a time, and it appears unlikely that more than one ever will be. No negative leap-seconds have been
introduced, but if any should be introduced in future,
adding that negative number will result
in a value limit of 59 or lower.)
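A hypothetical helper makes the rule concrete. It assumes, following the historical note above, that a leap second can only fall in the last minute of a UTC day, and it takes the largest number of leap seconds ever introduced at one time as a parameter (1 at the time of writing). Offsets are in minutes, and the result is an exclusive upper bound on the seconds value.

```python
from typing import Optional

MINUTES_PER_DAY = 24 * 60
LAST_UTC_MINUTE = 23 * 60 + 59

def seconds_upper_bound(hour: int, minute: int,
                        tz_minutes: Optional[int], max_leap: int = 1) -> int:
    """Exclusive upper bound on the seconds value of a time value.

    Without a timezone the value cannot be tied to UTC, so no leap second
    is allowed. With a timezone, a leap second is allowed only if the
    local hour and minute correspond to 23:59 UTC (assumption above).
    """
    if tz_minutes is None:
        return 60
    utc_minute = (hour * 60 + minute - tz_minutes) % MINUTES_PER_DAY
    return 60 + max_leap if utc_minute == LAST_UTC_MINUTE else 60
```

For example, 18:59:60-05:00 is accepted (18:59 at -05:00 is 23:59 UTC), while 12:00:60Z is not.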
See the conformance note in which
applies to the value of this datatype.
RQ-13 (time
zone crosses date line)
The "seven property model" rewrite of
date/time datatype descriptions includes
a carefully crafted definition of order
that ensures that for repeating datatypes (time, gDay, etc.), timezoned values
will be compared as though they are on the same calendar day
(local property values) so that in any given timezone,
the days start at local
00:00:00 and end immediately before local 24:00:00.
Days in timezones other than Z do not run from 00:00:00Z to 24:00:00Z.
Equality and order are as prescribed in
. values
(points in time in an arbitrary day) are ordered
taking into account their .
A calendar (or local time) day with an early
timezone begins earlier than the same calendar day with a later
timezone. Since the timezones allowed spread over 28 hours,
there are timezone pairs for which the same calendar day in the two
timezones forms two totally disjoint intervals: the earlier day ends before the
same day starts in the later timezone. The moments in time
represented by a single calendar day are spread over a 52-hour
interval, from the beginning of the day in the +14:00 timezone to the
end of that day in the −14:00 timezone.
Since the order of a value
having a
with another value whose
is absent is determined
by imputing timezones of both +14:00 and −14:00
to the untimezoned value, many such combinations will be
incomparable because the two imputed
timezones yield different orders. However,
for a given untimezoned value, there will
always be timezoned values at one or both
ends of the 52-hour interval that are
comparable
(because the interval of
incomparability
is only 28 hours wide).
Examples that show the difference from version 1.0 of this specification (see
for the notations):
A day is a calendar (or local
time) day in each timezone.
08:00:00+10:00 < 17:00:00+10:00
(just as 08:00:00Z has always been less than
17:00:00Z, but in version 1.0
08:00:00+10:00 > 17:00:00+10:00 )
A value in a calendar day with an early timezone
may precede every value in the same calendar day with a later timezone:
00:00:00+01:00 is less than every value with
Z
A calendar day with a very early timezone may be completely
disjoint from a calendar day with a very late timezone:
Each value with
+13:00 is less than every
value with −13:00
values do not always
convert to
in the same way as in 1.0, since a time
in a timezone may convert to
a time on
a different day (whereas time
conversions in version 1.0 wrapped around
by ignoring the day during conversion):
22:00:00Z > 03:00:00+05:00
(since 1979-12-31T03:00:00+05:00 is 1979-12-30T22:00:00Z,
not 1979-12-31T22:00:00Z; in the previous
version of this specification 22:00:00Z =
03:00:00+05:00)
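The difference between the two conversion rules can be reproduced with a small sketch. It handles timezoned values only (an untimezoned operand would additionally need the +14:00/-14:00 imputation described above) and represents a value as an (hour, minute, second, offset-in-minutes) tuple. With `wrap=True` it imitates the version 1.0 behaviour of reducing the converted time modulo one day. The function names are hypothetical.

```python
from typing import Tuple

DAY_SECONDS = 24 * 3600

def time_on_timeline(h: int, m: int, s: int, tz_minutes: int,
                     wrap: bool = False) -> int:
    """UTC seconds of a timezoned time, counted from 00:00:00Z of an
    arbitrary reference day. wrap=True reduces the result modulo one day,
    imitating the version 1.0 conversion that ignored the day."""
    t = h * 3600 + m * 60 + s - tz_minutes * 60
    return t % DAY_SECONDS if wrap else t

def compare_time(p: Tuple[int, int, int, int], q: Tuple[int, int, int, int],
                 wrap: bool = False) -> str:
    """Compare two timezoned time values; returns '<', '>' or '='."""
    a = time_on_timeline(*p, wrap=wrap)
    b = time_on_timeline(*q, wrap=wrap)
    return "<" if a < b else (">" if a > b else "=")
```

Without wrapping, 03:00:00+05:00 lands at 22:00Z on the previous day, so it compares below 22:00:00Z; with wrapping the two become equal, as in version 1.0.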
Lexical representation
The lexical representation for time is the left
truncated lexical representation for :
hh:mm:ss.sss with an optional
following time zone indicator. For example,
to indicate 1:20 pm for Eastern
Standard Time, which is 5 hours behind
Coordinated Universal Time (UTC),
one would write: 13:20:00-05:00. See also
.
Canonical representation
The canonical representation for time is defined
by prohibiting certain options from the
.
Specifically, either the time zone must
be omitted or, if present, the time zone must be Coordinated Universal
Time (UTC), indicated by "Z".
Additionally, the canonical representation for midnight is 00:00:00.
Lexical Mappings
The lexical representations for
are projections of
those of , as follows:
Lexical Space
timeLexicalRep((::) |
) ?
Leap-second Representations
An must not begin with the
digit 6 unless the value to
which it would map would satisfy the value
constraint on leap-second values given above.
The production
is equivalent to this
regular expression, once whitespace is
removed, except
that the regular expression does not
enforce the constraint just given:
(((([01][0-9])|(2[0-3])):[0-5][0-9]:(([0-5][0-9])|(60))(\.[0-9]+)?)|(24:00:00(\.[0-9]+)?))(Z|([+\-]((0[0-9])|(1[0-4])):[0-5][0-9]))?
Note that neither
the production
nor this regular
expression alone enforces the constraint
on given above.
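The whitespace-free expression, with its alternatives grouped, the `.` escaped and a `Z` timezone allowed, can be exercised with Python's `re` module. This checks lexical shape only. As noted above, it accepts leap-second literals such as 12:00:60 that the value constraint rejects. The name `is_time_literal` is hypothetical.

```python
import re

# Time expression with grouped alternatives; split across literals for width.
TIME_RE = re.compile(
    r"(((([01][0-9])|(2[0-3])):[0-5][0-9]:(([0-5][0-9])|(60))(\.[0-9]+)?)"
    r"|(24:00:00(\.[0-9]+)?))"
    r"(Z|([+\-]((0[0-9])|(1[0-4])):[0-5][0-9]))?"
)

def is_time_literal(s: str) -> bool:
    """True if s has the lexical shape of a time value (no value checks)."""
    return TIME_RE.fullmatch(s) is not None
```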
The lexical mapping and canonical mapping
for are the following functions:
Constraining facets
date
The
of date
consists of top-open intervals of
exactly one day in length on the timelines of
, beginning on the beginning moment of each day (in
each timezone), i.e. '00:00:00', up
to but not including '24:00:00' (which is
identical with
'00:00:00' of the next day).
date
represents top-open intervals of exactly one day in length on the timelines of
, beginning on the beginning moment of each day (in
each timezone), up to but not including the beginning
moment of the next day. For nontimezoned values, the top-open
intervals disjointly cover the nontimezoned timeline,
one per day. For timezoned
values, the intervals begin at every minute and therefore overlap.
A "date object" is an object with year,
month, and day properties just like those
of objects, plus
an optional timezone-valued
timezone property. (As with values of timezones are a
special case of durations.)
Just as a object corresponds to a point on one of the
timelines, a date object corresponds to an interval on one
of the two timelines as just described.
Timezoned date values track the starting moment of their day, as
determined by their timezone; said timezone is generally recoverable for
canonical representations.
The recoverable timezone is that duration which
is the result of subtracting the first moment (or any moment) of the timezoned
date from the first moment (or the
corresponding moment) UTC on the
same date. Recoverable timezones are
always durations between '+12:00' and
'-11:59'. This "timezone normalization"
(which follows automatically from the definition of the date) is explained more in
.
For example: the first moment of 2002-10-10+13:00 is 2002-10-10T00:00:00+13:00,
which is 2002-10-09T11:00:00Z, which is also the first moment of 2002-10-09-11:00.
Therefore 2002-10-10+13:00 is 2002-10-09-11:00;
they are the same interval.
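The normalization in this example can be sketched as follows, assuming timezones are given as offsets in minutes and years stay within Python's `datetime` range (1 to 9999). The helper names `first_moment_utc` and `normalize` are hypothetical. Moving the offset and the date by one day in the same direction leaves the first moment unchanged, which is why both forms denote the same interval.

```python
from datetime import date, datetime, timedelta
from typing import Tuple

def first_moment_utc(d: date, tz_minutes: int) -> datetime:
    """UTC instant at which the timezoned date d begins."""
    return datetime(d.year, d.month, d.day) - timedelta(minutes=tz_minutes)

def normalize(d: date, tz_minutes: int) -> Tuple[date, int]:
    """Move the timezone into -11:59..+12:00, shifting the date to match."""
    if tz_minutes > 12 * 60:
        return d - timedelta(days=1), tz_minutes - 24 * 60
    if tz_minutes < -(11 * 60 + 59):
        return d + timedelta(days=1), tz_minutes + 24 * 60
    return d, tz_minutes
```

`normalize(date(2002, 10, 10), 13 * 60)` gives `(date(2002, 10, 9), -660)`, i.e. 2002-10-09-11:00.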
For most timezones, either the first moment or last moment of the day (a
value, always
) will have a date portion
different from that of the date itself!
However, noon of that date (the midpoint of the interval) in that
(normalized) timezone will always have the same date portion as the
date itself, even when that noon point in time is normalized to
. For example,
2002-10-10+05:00 begins during 2002-10-09Z and 2002-10-10-05:00
ends during 2002-10-11Z, but noon of both 2002-10-10-05:00 and 2002-10-10+05:00
falls in the interval which is 2002-10-10Z.
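The begin, noon and end points in this example can be checked with a short sketch. The helper `interval_utc` is hypothetical and takes the offset in minutes. It returns the start, the midpoint and the exclusive end of a timezoned date's interval, all in UTC. Looping over every normalized offset from -11:59 to +12:00 confirms that noon always keeps the date portion of the date itself.

```python
from datetime import datetime, timedelta
from typing import Tuple

def interval_utc(year: int, month: int, day: int,
                 tz_minutes: int) -> Tuple[datetime, datetime, datetime]:
    """(start, noon, exclusive end) of a timezoned date, all in UTC."""
    start = datetime(year, month, day) - timedelta(minutes=tz_minutes)
    return start, start + timedelta(hours=12), start + timedelta(days=1)
```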
Value Space
uses the , with
, ,
and required
to be absent. remains
.
Day-of-month Values
The value must be
no more than 30 if
is one of 4, 6, 9, or 11, no more than 28
if is 2 and
is not divisible by 4,
or is divisible by 100 but not by 400,
and no more than 29 if
is 2 and
is divisible by 400, or by 4 but not by 100.
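Read as a function, the constraint gives the largest permitted day for a given year and month. In this minimal sketch the name `max_day` is hypothetical, and months the constraint does not mention are assumed to allow up to 31 days.

```python
def max_day(year: int, month: int) -> int:
    """Largest day-of-month permitted for a date in the given year and month."""
    if month in (4, 6, 9, 11):
        return 30
    if month == 2:
        leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
        return 29 if leap else 28
    return 31
```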
See the conformance note in which
applies to the year
value of this datatype
as well.
Equality and order are as prescribed in
.
In version 1.0 of this specification, values
did not retain a timezone explicitly, but for timezones not too far from
UTC, their timezone could be recovered based on
their value's first moment on the timeline. The
retains all timezones.
Examples that show the difference from version 1.0 (see
for the notations):
A day is a calendar (or local
time) day in each timezone,
including the timezones outside of +12:00 through -11:59 inclusive:
2000-12-12+13:00 < 2000-12-12+11:00
(just as 2000-12-12+12:00 has always been less than
2000-12-12+11:00, but in version 1.0
2000-12-12+13:00 > 2000-12-12+11:00 ,
since 2000-12-12+13:00's recoverable
timezone was −11:00)
Similarly:
2000-12-12+13:00 = 2000-12-11−11:00
(whereas under 1.0, as just
stated, 2000-12-12+13:00 = 2000-12-12−11:00)
Lexical representation
For the following discussion, let the
"date portion" of a
or date object be an object
similar to a or
date object, with similar year, month, and day properties, but no
others, having the same value for these properties as the original
or date object.
The of
date consists of finite-length
sequences of characters of the form:
'-'? yyyy '-' mm '-' dd zzzzzz?
where the date and optional timezone are represented exactly the
same way as they are for . The first moment of the
interval is that represented by:
'-'? yyyy '-' mm '-' dd 'T00:00:00' zzzzzz?
and the least upper bound of the interval is the timeline point represented
(noncanonically) by:
'-'? yyyy '-' mm '-' dd 'T24:00:00' zzzzzz?.
The of a date will always be
a duration between '+12:00' and
'-11:59'. Timezone lexical representations, as
explained for , can range from '+14:00' to '-14:00'.
The result is that literals of dates with very large or very
negative timezones will map to a "normalized" date value with a
different from that represented in the original
representation, and a matching difference
of +/- 1 day in the date itself.
Canonical representation
Given a member of the date, the
date portion of the canonical
representation (the entire representation
for nontimezoned values, and all but the
timezone representation for timezoned values)
is always the date portion
of the canonical
representation of the interval midpoint
(the representation,
truncated on the right to eliminate 'T' and all following characters).
For timezoned values, append the canonical
representation of the .
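Combining the timezone normalization with the midpoint rule gives a sketch of the canonical mapping. Assumptions: offsets are in minutes, years lie within Python's `datetime` range, and a zero offset is written "Z" while other offsets are written +hh:mm or -hh:mm (the canonical timezone representation itself is not reproduced in this section). The names `tz_repr` and `canonical_date` are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Optional

def tz_repr(tz_minutes: int) -> str:
    # Assumption: zero offset written "Z", others as +hh:mm or -hh:mm.
    if tz_minutes == 0:
        return "Z"
    sign = "+" if tz_minutes > 0 else "-"
    hours, minutes = divmod(abs(tz_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"

def canonical_date(d: date, tz_minutes: Optional[int] = None) -> str:
    if tz_minutes is None:                  # nontimezoned: the date itself
        return d.isoformat()
    if tz_minutes > 12 * 60:                # normalize into -11:59..+12:00
        d, tz_minutes = d - timedelta(days=1), tz_minutes - 24 * 60
    elif tz_minutes < -(11 * 60 + 59):
        d, tz_minutes = d + timedelta(days=1), tz_minutes + 24 * 60
    # Interval midpoint (local noon) expressed in UTC; keep its date portion.
    noon_utc = datetime(d.year, d.month, d.day, 12) - timedelta(minutes=tz_minutes)
    return noon_utc.date().isoformat() + tz_repr(tz_minutes)
```

For example, `canonical_date(date(2002, 10, 10), 13 * 60)` yields `"2002-10-09-11:00"`.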
Lexical Mappings
The lexical representations for
are projections of
those of , as follows:
Lexical Space
dateLexicalRep--?
Day-of-month Representations
Within a
,
a must not
begin with the digit 3 or be 29
unless the value to
which it would map would satisfy the value constraint on
values
(Constraint: Day-of-month Values) given above.
The production
is equivalent to this
regular expression,
except that it does not enforce
the constraint just noted:
\-?(([1-9][0-9][0-9][0-9]+)|(0[0-9][0-9][0-9]))\-((0[1-9])|(1[0-2]))\-((0[1-9])|([12][0-9])|(3[01]))(Z|((\+|\-)((0[0-9])|(1[0-4])):[0-5][0-9]))?
Note that neither
the production
nor this regular
expression alone enforces the constraint
on given above.
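The date expression, with its alternatives grouped, a `+` sign escaped and a `Z` timezone allowed, can be checked with Python's `re` module. As the note says, it accepts 2001-02-30, so the day-of-month constraint has to be checked separately. The name `is_date_literal` is hypothetical.

```python
import re

# Date expression with grouped alternatives; split across literals for width.
DATE_RE = re.compile(
    r"\-?(([1-9][0-9][0-9][0-9]+)|(0[0-9][0-9][0-9]))"
    r"\-((0[1-9])|(1[0-2]))"
    r"\-((0[1-9])|([12][0-9])|(3[01]))"
    r"(Z|((\+|\-)((0[0-9])|(1[0-4])):[0-5][0-9]))?"
)

def is_date_literal(s: str) -> bool:
    """True if s has the lexical shape of a date value (no value checks)."""
    return DATE_RE.fullmatch(s) is not None
```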
The lexical mapping and canonical mapping
for are the following functions:
gYearMonth
gYearMonth represents a specific Gregorian month in a specific
Gregorian year. The of
gYearMonth is the set of Gregorian calendar months as defined
in § 5.2.1 of . Specifically, it is a set
of one-month long, non-periodic instances, e.g. 1999-10 to represent the whole
month of 1999-10, independent of how many days this month has.
gYearMonth
represents specific whole Gregorian months in specific
Gregorian years.
Since the lexical representation allows an optional
time zone indicator, gYearMonth values are partially ordered
because it may not be possible to unequivocally determine the order of two
values one of which has a time zone and the other does not. If
gYearMonth values are considered as periods of time, the order
relation on gYearMonth values is the order relation on their
starting instants. This is discussed in .
See also . Pairs of
gYearMonth values that both have or both lack time zone indicators are
totally ordered.
Because month/year combinations in one calendar only rarely correspond
to month/year combinations in other calendars, values of this type
are not, in general, convertible to simple values corresponding to month/year
combinations in other calendars. This type should therefore be used
with caution in contexts where conversion to other calendars is desired.
See the conformance note in which
applies to the year part of this datatype
as well.
Value Space
uses the , with
, ,
, and required
to be absent. remains
.
See the conformance note in which
applies to the value of this datatype.
Equality and order are as prescribed in
.
In version 1.0 of this specification,
values did not
retain a timezone explicitly, but timezones not too far from
UTC could be recovered based on the
value's first moment on the timeline. The
simply retains all timezones.
An example that shows the difference from version 1.0 (see
for the notations):
A day is a calendar (or local time) day in
each timezone, including the timezones outside of +12:00 through
−11:59 inclusive:
2000-12+13:00 < 2000-12+11:00
(just as 2000-12+12:00 has always been less than 2000-12+11:00,
but in version 1.0 2000-12+13:00 >
2000-12+11:00 , since 2000-12+13:00's recoverable
timezone was −11:00)
Lexical Mappings
The lexical representation for gYearMonth is the reduced
(right truncated) lexical representation for :
CCYY-MM. No left truncation is allowed. An optional following time
zone qualifier is allowed. To accommodate year values outside the
range from 0001 to 9999, additional digits
can be added to the left of this representation and a
preceding "-" sign is allowed.
For example, to indicate the month of May 1999, one would write: 1999-05.
See also .
The lexical representations for
are projections of
those of , as follows:
Lexical Space
gYearMonthLexicalRep-?
The is equivalent to this regular expression:
\-?(([1-9][0-9][0-9][0-9]+)|(0[0-9][0-9][0-9]))\-((0[1-9])|(1[0-2]))(Z|((\+|\-)((0[0-9])|(1[0-4])):[0-5][0-9]))?
The lexical mapping and canonical mapping
for are the following functions:
Constraining facets
gYear
gYear represents a
Gregorian calendar year. The of
gYear is the set of Gregorian calendar years as defined in
§ 5.2.1 of . Specifically, it is a set of one-year
long, non-periodic instances
e.g. lexical 1999 to represent the whole year 1999, independent of
how many months and days this year has.
gYear
represents Gregorian calendar years.
Since the lexical representation allows an optional time zone
indicator, gYear values are partially ordered because it may
not be possible to unequivocally determine
the order of two values one of which has a
time zone and the other does not. If
gYear values are considered as periods of time, the order relation
on gYear values is the order relation on their starting instants.
This is discussed in . See also
. Pairs of
gYear values that both have or both lack time
zone indicators are totally ordered.
Because years in one calendar only rarely correspond to years
in other calendars, values of this type
are not, in general, convertible to simple values corresponding to years
in other calendars. This type should therefore be used with caution
in contexts where conversion to other calendars is desired.
See the conformance note in which
applies to the year part of this datatype
as well.
Value Space
uses the , with
, , ,
, and required
to be absent. remains
.
See the conformance note in which
applies to the value of this datatype.
Equality and order are as prescribed in
.
In version 1.0 of this specification,
values did not
retain a timezone explicitly, but timezones not too far from
UTC could be recovered based on the
value's first moment on the timeline. The
simply retains all timezones.
An example that shows the difference from version 1.0 (see
for the notations):
A day is a calendar (or local time) day in
each timezone, including the timezones outside of +12:00 through
−11:59 inclusive:
2000+13:00 < 2000+11:00
(just as 2000+12:00 has always been less than 2000+11:00,
but in version 1.0 2000+13:00 >
2000+11:00 , since 2000+13:00's recoverable
timezone was −11:00)
Lexical Mappings
The lexical representation for gYear is the reduced (right
truncated) lexical representation for : CCYY.
No left truncation is allowed. An optional following time
zone qualifier is allowed as for . To
accommodate year values outside the range from 0001 to 9999, additional
digits can be added to the left of this representation and a preceding
"-" sign is allowed.
For example, to indicate 1999, one would write: 1999.
See also .
The lexical representations for
are projections of
those of , as follows:
Lexical Space
gYearLexicalRep-?
The is equivalent to this regular expression:
\-?(([1-9][0-9][0-9][0-9]+)|(0[0-9][0-9][0-9]))(Z|((\+|\-)((0[0-9])|(1[0-4])):[0-5][0-9]))?
The lexical mapping and canonical mapping
for are the following functions:
Constraining facets
gMonthDay
gMonthDay is a Gregorian date that recurs, specifically a day of
the year such as the third of May. Arbitrary recurring dates are not
supported by this datatype. The of
gMonthDay is the set of calendar
dates, as defined in § 3 of . Specifically,
it is a set of one-day long, annually periodic instances.
represents whole calendar
days that recur at the same point in each calendar year, or that occur
in some arbitrary calendar year.
This datatype can be used, for example, to record
birthdays; an instance of the datatype could be used to say that
someone's birthday occurs on the 14th of September every year.
Since the lexical representation allows an optional time zone
indicator, gMonthDay values are partially ordered because it may
not be possible to unequivocally determine the order of two values one of which has a
time zone and the other does not. If
gMonthDay values are considered as periods of time,
in an arbitrary leap year, the order relation
on gMonthDay values is the order relation on their starting instants.
This is discussed in . See also
. Pairs of gMonthDay values that both have or both lack time zone indicators are totally ordered.
Because day/month combinations in one calendar only rarely correspond
to day/month combinations in other calendars, values of this type do not,
in general, have any straightforward or intuitive representation
in terms of most other calendars. This type should therefore be
used with caution in contexts where conversion to other calendars
is desired.
Value Space
uses the , with
, , , and required
to be absent. remains
.
Day-of-month Values
The value must be no more than 30 if
is one of 4, 6, 9, or 11, and no more than 29 if is 2.
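Because gMonthDay carries no year, February is allowed 29 days regardless of leap years. A hypothetical validity check for a (month, day) pair under this constraint:

```python
def is_valid_month_day(month: int, day: int) -> bool:
    """gMonthDay day-of-month constraint; --02-29 is allowed (no year)."""
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return False
    if month in (4, 6, 9, 11):
        return day <= 30
    if month == 2:
        return day <= 29
    return True
```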
Equality and order are as prescribed in
.
In version 1.0 of this specification, values
did not retain a timezone explicitly, but for timezones not too far from
UTC, their timezone could be recovered based on
their value's first moment on the timeline. The
retains all timezones.
An example that shows the difference from version 1.0 (see
for the notations):
A day is a calendar (or local time) day in each timezone,
including the timezones outside of +12:00 through −11:59 inclusive:
--12-12+13:00 < --12-12+11:00
(just as --12-12+12:00 has always been less than
--12-12+11:00, but in version 1.0
--12-12+13:00 > --12-12+11:00 , since
--12-12+13:00's recoverable
timezone was −11:00)
Lexical Mappings
The lexical representation for gMonthDay is the left
truncated lexical representation for : --MM-DD.
An optional following time
zone qualifier is allowed as for .
No preceding sign is allowed. No other formats are
allowed. See also .
The lexical representations for
are projections
of those of , as follows:
Lexical Space
gMonthDayLexicalRep---?
Day-of-month Representations
Within a , a must not
begin with the digit 3 or be 29
unless the value to
which it would map would satisfy the value constraint on
values
(Constraint: Day-of-month Values) given above.
The is equivalent to this regular
expression (note that it does
not enforce the constraint just mentioned):
\-\-((0[1-9])|(1[0-2]))\-((0[1-9])|([12][0-9])|(3[01]))(Z|((\+|\-)((0[0-9])|(1[0-4])):[0-5][0-9]))?
Note that neither
the production
nor this regular
expression alone enforces the constraint
on given above.
This datatype can be used to represent a
specific day in a month: to say, for example, that my birthday occurs
on the 14th of September every year.
The lexical mapping and canonical mapping
for are the following functions:
Constraining facets
gDay
gDay is a Gregorian day that recurs, specifically a day
of the month such as the 5th of the month. Arbitrary recurring days
are not supported by this datatype. The
of gDay is the space of a set of calendar
dates as defined in § 3 of . Specifically,
it is a set of one-day long, monthly periodic instances.
gDay represents
whole days within an arbitrary month: days that recur at the same
point in each (Gregorian) month. This datatype can
be used to represent a specific day of the month,
to indicate, for example, that an employee gets a paycheck on the 15th of each month. (Obviously, days
beyond 28 cannot occur in all months; they are nonetheless permitted, up to 31.)
Since the lexical representation allows an optional time zone
indicator, gDay values are partially ordered because it may
not be possible to unequivocally determine the order of two values one of
which has a time zone and the other does not. If
gDay values are considered as periods of time,
in an arbitrary month that has 31 days,
the order relation
on gDay values is the order relation on their starting instants.
This is discussed in . See also
. Pairs of gDay
values that both have or both lack time zone indicators are totally ordered.
Because days in one calendar only rarely
correspond to days in other calendars,
gDay values
do not, in general, have any straightforward or
intuitive representation in terms of most
non-Gregorian
calendars.
gDay
should therefore be used with caution in contexts where conversion to
other calendars is desired.
Value Space
uses the , with
, ,
, ,
and required to be
absent. remains
and
must be between 1 and 31 inclusive.
Equality and order are as prescribed in
. Since
values (days) are ordered by their first moments, it is possible
for apparent anomalies to appear in the order when
values
differ by at least 24
hours. (It is possible for
values to differ by up to 28 hours.)
Examples that may appear anomalous (see for the notations):
---15 < ---16 , but ---15−13:00 > ---16+13:00
---15−11:00 = ---16+13:00
---15−13:00 <> ---16 ,
because ---15−13:00 > ---16+14:00
and ---15−13:00 < ---16−14:00
Timezones do not cause wrap-around at the end of the month:
the last day of a
given month in timezone −13:00 may start after the first
day of the next month in timezone +13:00, as
measured on the global timeline,
but nonetheless
---01+13:00 < ---31−13:00 .
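The examples above can be reproduced by placing each gDay in a hypothetical 31-day reference month and comparing first moments in UTC, using the +14:00/-14:00 imputation when exactly one value lacks a timezone. This is a sketch under those assumptions. Offsets are in minutes, and the names are hypothetical.

```python
from typing import Optional

MINUTES_PER_DAY = 24 * 60
MAX_TZ = 14 * 60

def first_moment(day: int, tz_minutes: int) -> int:
    """Minutes (UTC) from the start of a hypothetical 31-day reference month."""
    return (day - 1) * MINUTES_PER_DAY - tz_minutes

def compare_gday(p_day: int, p_tz: Optional[int],
                 q_day: int, q_tz: Optional[int]) -> str:
    """Return '<', '>', '=' or '<>' for two gDay values (tz in minutes)."""
    if (p_tz is None) == (q_tz is None):
        a, b = first_moment(p_day, p_tz or 0), first_moment(q_day, q_tz or 0)
        return "<" if a < b else (">" if a > b else "=")
    if p_tz is None:                       # swap operands, then invert result
        r = compare_gday(q_day, q_tz, p_day, p_tz)
        return {"<": ">", ">": "<"}.get(r, r)
    p = first_moment(p_day, p_tz)
    if p < first_moment(q_day, MAX_TZ):    # earliest possible start of Q
        return "<"
    if p > first_moment(q_day, -MAX_TZ):   # latest possible start of Q
        return ">"
    return "<>"
```

Because the reference month is fixed, ---01+13:00 always precedes ---31-13:00: there is no wrap-around into a following month.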
Lexical representation
The lexical representation for gDay is the left
truncated lexical representation for : ---DD .
An optional following time
zone qualifier is allowed as for . No preceding sign is
allowed. No other formats are allowed. See also .
Lexical Mappings
The lexical representations for are
projections
of those of , as follows:
Lexical Space
gDayLexicalRep---?
The is equivalent to this regular expression:
\-\-\-((0[1-9])|([12][0-9])|(3[01]))(Z|((\+|\-)((0[0-9])|(1[0-4])):[0-5][0-9]))?
The lexical mapping and canonical mapping for are defined as follows:
Constraining facets
gMonth
gMonth is a Gregorian month that recurs every year.
The
of gMonth is the space of a set of calendar
months as defined in § 3 of . Specifically,
it is a set of one-month long, yearly periodic instances.
This datatype can be used to represent a
specific month, for example, to say that Thanksgiving falls in the month of
November.
gMonth
represents whole (Gregorian) months
within an arbitrary year: months that recur at the same point in
each year. It might be used, for example, to say in which
month annual Thanksgiving celebrations fall in different countries
(--11 in the United States, --10 in Canada, and possibly other months in
other countries).
Since the lexical representation allows an optional time zone
indicator, gMonth values are partially ordered because it may
not be possible to unequivocally determine the order of two values one of which has a
time zone and the other does not. If
gMonth values are considered as periods of time, the order relation
on gMonth values is the order relation on their starting instants.
This is discussed in . See also
. Pairs of gMonth
values that both have or both lack time zone indicators are totally ordered.
Because months in one calendar only rarely correspond
to months in other calendars, values of this type do not,
in general, have any straightforward or intuitive representation
in terms of most other calendars. This type should therefore be
used with caution in contexts where conversion to other calendars
is desired.
Value Space
uses the , with
, , , , and required
to be absent. remains
.
Equality and order are as prescribed in
.
In version 1.0 of this specification, values
did not retain a timezone explicitly, but for timezones not too far from
UTC, their timezone could be recovered based on
their value's first moment on the timeline. The
retains all timezones.
An example that shows the difference from version 1.0 (see
for the notations):
A month is a calendar (or local time) month in each timezone,
including the timezones outside of +12:00 through −11:59 inclusive:
--12+13:00 < --12+11:00
(just as --12+12:00 has always been less than --12+11:00, but in version 1.0
--12+13:00 > --12+11:00 , since --12+13:00's recoverable
timezone was −11:00)
Lexical Mappings
The lexical representation for gMonth is the left
and right truncated lexical representation for : --MM.
An optional following time
zone qualifier is allowed as for . No preceding sign is
allowed. No other formats are allowed. See also .
The lexical representations for are projections of
those of , as follows:
Lexical Space
gMonthLexicalRep--?
The is equivalent to this regular expression:
\-\-((0[1-9])|(1[0-2]))(Z|((\+|\-)((0[0-9])|(1[0-4])):[0-5][0-9]))?
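The expressions given for gYearMonth, gYear, gMonthDay, gDay and gMonth share their year, month, day and timezone pieces. The sketch below assembles grouped, escaped versions of them from those pieces, with a `Z` timezone allowed, and checks a few literals. The fragment names `YEAR`, `MONTH`, `DAY` and `TZ` are illustrative only. As with date, the expressions do not enforce the day-of-month constraint.

```python
import re

# Shared pieces of the five expressions (names are illustrative only).
YEAR = r"\-?(([1-9][0-9][0-9][0-9]+)|(0[0-9][0-9][0-9]))"
MONTH = r"((0[1-9])|(1[0-2]))"
DAY = r"((0[1-9])|([12][0-9])|(3[01]))"
TZ = r"(Z|((\+|\-)((0[0-9])|(1[0-4])):[0-5][0-9]))?"

PATTERNS = {
    "gYearMonth": re.compile(YEAR + r"\-" + MONTH + TZ),
    "gYear": re.compile(YEAR + TZ),
    "gMonthDay": re.compile(r"\-\-" + MONTH + r"\-" + DAY + TZ),
    "gDay": re.compile(r"\-\-\-" + DAY + TZ),
    "gMonth": re.compile(r"\-\-" + MONTH + TZ),
}

def matches(type_name: str, literal: str) -> bool:
    """True if literal has the lexical shape of the named datatype."""
    return PATTERNS[type_name].fullmatch(literal) is not None
```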
The lexical mapping and canonical mapping
for are defined as follows:
Constraining facets
hexBinary
hexBinary represents
arbitrary hex-encoded binary data. The of
hexBinary is the set of finite-length sequences of binary
octets.
Lexical Representation
hexBinary has a lexical representation where
each binary octet is encoded as a character tuple, consisting of two
hexadecimal digits ([0-9a-fA-F]) representing the octet code. For example,
"0FB7" is a hex encoding for the 16-bit integer 4023
(whose binary representation is 111110110111).
Canonical Representation
The canonical representation for hexBinary is defined
by prohibiting certain options from the
. Specifically, the lower case
hexadecimal digits ([a-f]) are not allowed.
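A minimal Python sketch of the lexical and canonical mappings described above reproduces the 0FB7 example. It validates the digit pairs explicitly before calling `bytes.fromhex`, because `fromhex` also tolerates whitespace between pairs. The function names are hypothetical.

```python
import re

HEX_PAIRS = re.compile(r"([0-9a-fA-F]{2})*")

def hex_to_octets(lexical: str) -> bytes:
    """Lexical mapping: each pair of hex digits becomes one octet."""
    if HEX_PAIRS.fullmatch(lexical) is None:
        raise ValueError(f"not a hexBinary literal: {lexical!r}")
    return bytes.fromhex(lexical)

def octets_to_canonical(octets: bytes) -> str:
    """Canonical mapping: upper-case hexadecimal digits only."""
    return octets.hex().upper()
```

`octets_to_canonical(hex_to_octets("0fb7"))` gives `"0FB7"`, and the two octets read as a big-endian integer give 4023.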
Constraining facets
base64Binary
base64Binary
represents Base64-encoded arbitrary binary data. The of
base64Binary is the set of finite-length sequences of binary
octets. For base64Binary data the
entire binary stream is encoded using the Base64
Alphabet in
.
The lexical forms of base64Binary values are limited to the 65 characters
of the Base64 Alphabet defined in , i.e., a-z,
A-Z, 0-9, the plus sign (+), the forward slash (/) and the
equal sign (=), together with the characters defined in as white space.
No other characters are allowed.
For compatibility with older mail gateways, suggests that
base64 data should have lines limited to at most 76 characters in length. This
line-length limitation is not mandated in the lexical forms of base64Binary
data and must not be enforced by XML Schema processors.
The lexical space of base64Binary is given by the following grammar
(the notation is that used in ); legal lexical forms must match
the Base64Binary production.
Note that this grammar requires the number of non-whitespace characters in the lexical
form to be a multiple of four, and for equals signs to appear only at the end of the
lexical form; strings which do not meet these constraints are not legal lexical forms
of base64Binary because they cannot successfully be decoded by base64
decoders.
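A sketch of a lexical check that leans on Python's `base64` module: XML whitespace is removed, then `b64decode(..., validate=True)` rejects characters outside the Base64 alphabet and `=` anywhere but the end, and the decoder itself rejects lengths that are not a multiple of four. Python's decoder does not check that the unused low-order bits of the final group are zero, which a strict reading of the grammar may require, so this sketch can be more permissive. The name `base64_to_octets` is hypothetical.

```python
import base64

XML_WHITESPACE = " \t\r\n"

def base64_to_octets(lexical: str) -> bytes:
    """Decode a base64Binary literal after removing XML whitespace."""
    compact = "".join(ch for ch in lexical if ch not in XML_WHITESPACE)
    try:
        return base64.b64decode(compact, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError(f"not a base64Binary literal: {lexical!r}") from exc
```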
The above definition of the lexical space is more restrictive than that
given in as regards whitespace -- this is not an issue
in practice. Any string compatible with the RFC can occur in
an element or attribute validated by this type, because the facet of this type is fixed
to collapse, which means that all leading and trailing whitespace
will be stripped, and all internal whitespace collapsed to single space
characters, before the above grammar is enforced.
The canonical lexical form of a base64Binary data value is the base64
encoding of the value which matches the Canonical-base64Binary production in the following
grammar:
For some values the canonical form defined above does not conform to
, which requires
breaking with linefeeds at appropriate intervals.
The length of a base64Binary value is the number of octets it contains.
This may be calculated from the lexical form by removing whitespace and padding characters
and performing the calculation shown in the pseudo-code below:
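A Python sketch of that calculation, assuming three steps: remove whitespace, strip the trailing `=` padding, then take floor(n × 3 / 4) of the remaining character count. The last step follows from each Base64 character carrying 6 bits. The input is assumed to be a valid base64Binary literal, and the function name is hypothetical.

```python
XML_WHITESPACE = " \t\r\n"

def base64_octet_length(lexical: str) -> int:
    """Number of octets encoded by a (valid) base64Binary literal."""
    lex2 = "".join(ch for ch in lexical if ch not in XML_WHITESPACE)  # drop whitespace
    lex3 = lex2.rstrip("=")                                           # drop padding
    return len(lex3) * 3 // 4  # each character carries 6 bits
```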
Note on encoding: explicitly references US-ASCII encoding. However,
decoding of base64Binary data in an XML entity is to be performed on the
Unicode characters obtained after character encoding processing as specified by
Constraining facets
anyURI
anyURI represents a Uniform Resource Identifier Reference
(URI). An anyURI value can be absolute or relative, and may
have an optional fragment identifier (i.e., it may be a URI Reference). This
type should be used to specify the intention that the value fulfills
the role of a URI as defined by , as amended by
.
The mapping from anyURI values to URIs is as
defined by the URI reference escaping procedure
defined in
Section 5.4 Locator Attribute
of (see also
Section 7 Character Encoding in URI References
of ). This means
that a wide range of internationalized resource identifiers can be specified
when an anyURI is called for, and still be understood as
URIs per , as amended by ,
where appropriate to identify resources.
Section 5.4 Locator Attribute
of requires that relative URI references be absolutized
as defined in before use. This is an XLink-specific
requirement and is not appropriate for XML Schema, since neither the
nor the
of the type are restricted to absolute URIs. Accordingly
absolutization must not be performed by schema processors as part of schema
validation.
Each URI scheme imposes specialized syntax rules for URIs in
that scheme, including restrictions on the syntax of allowed
fragment
identifiers. Because it is
impractical for processors to check that a value is a
context-appropriate URI reference, this specification follows the
lead of (as amended by )
in this matter: such rules and restrictions are not part of type validity
and are not checked by processors.
Thus in practice the above definition imposes only very modest obligations
on processors.
Lexical representation
The lexical space of anyURI is the set of
finite-length character sequences that, when the algorithm defined in
Section 5.4 of is applied to them, result in strings
that are legal URIs according to , as amended by
.
Spaces are, in principle, allowed in the lexical space
of anyURI; however, their use is highly discouraged
(unless they are encoded as %20).
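The escaping that lets internationalized identifiers be understood as URIs can be approximated with Python's standard `urllib.parse.quote`, which percent-encodes characters as UTF-8. This is a minimal sketch, not the normative XLink procedure: the helper name `escape_any_uri` and the exact set of characters left unescaped are assumptions. In line with the text above, it does not absolutize relative references.

```python
from urllib.parse import quote

# Characters passed through unchanged: reserved and unreserved URI characters,
# plus "%" (so existing escapes such as %20 survive) and "#" (fragment separator).
# This "safe" set is an assumption of the sketch, not taken from the spec.
_SAFE = ";/?:@&=+$,[]-_.!~*'()%#"


def escape_any_uri(lexical: str) -> str:
    """Hypothetical helper: percent-encode, as UTF-8, every character that may
    not appear literally in a URI reference (spaces, non-ASCII letters, ...).
    Relative references stay relative; no absolutization is performed."""
    return quote(lexical, safe=_SAFE)
```

For example, `escape_any_uri("my file.xml")` yields `my%20file.xml`, which is the encoded form the text recommends for spaces.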
Constraining facets
QName
QName represents
XML qualified names.
The value space of QName is the set of
tuples {namespace name,
local part},
where namespace name
is an anyURI
and local part is
an NCName.
The lexical space of QName is the set
of strings that match the
QName production of .
The mapping between literals in the and
values in the of QName requires
a namespace declaration to be in scope for the context in which QName
is used.
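Because the mapping from a literal to a {namespace name, local part} tuple depends on the namespace declarations in scope, a processor has to consult those bindings when reading a QName. The sketch below assumes the bindings are available as a plain dictionary (with the key `None` standing for the default namespace) and approximates NCName with ASCII characters only; `resolve_qname` is a hypothetical helper, not an API defined by this specification.

```python
import re

# ASCII-only approximation of the NCName production (an assumption of this sketch).
_NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
_QNAME_RE = re.compile(rf"(?:({_NCNAME}):)?({_NCNAME})")


def resolve_qname(lexical, in_scope):
    """Map a lexical QName to a (namespace name, local part) tuple.

    in_scope maps prefixes to namespace names; the key None holds the default
    namespace, if one is declared.
    """
    m = _QNAME_RE.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not a QName: {lexical!r}")
    prefix, local = m.groups()
    if prefix is None:
        # Unprefixed: use the default namespace, or no namespace if none is declared.
        return (in_scope.get(None), local)
    if prefix not in in_scope:
        raise ValueError(f"no namespace declaration in scope for prefix {prefix!r}")
    return (in_scope[prefix], local)
```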
Constraining facets
The use of length, minLength and maxLength
on datatypes derived from QName is
deprecated. Future versions of this specification may
remove these facets for this datatype.
NOTATION
NOTATION
represents the NOTATION attribute
type from . The value space
of NOTATION is the set of QNames
of notations declared in the current schema.
The lexical space of NOTATION is the set
of all names of notations
declared in the current schema (in the form of QNames).
enumeration facet value required for NOTATION
It is an error for NOTATION
to be used directly in a schema. Only datatypes that are
derived from NOTATION by
specifying a value for enumeration can be used
in a schema.
For compatibility (see ) NOTATION
should be used only on attributes
and should only be used in schemas with no
target namespace.
Constraining facets
The use of length, minLength and maxLength
on datatypes derived from NOTATION is
deprecated. Future versions of this specification may
remove these facets for this datatype.
Other Built-in Datatypes
This section gives conceptual definitions for all
datatypes
defined by this specification. The XML representation used to define
datatypes (whether
or ) is
given in section and the complete
definitions of the datatypes are provided in Appendix A
.
normalizedString
normalizedString
represents white space normalized strings.
The value space of normalizedString is the
set of strings that do not
contain the carriage return (#xD), line feed (#xA) or tab (#x9) characters.
The lexical space of normalizedString is the
set of strings that do not
contain the carriage return (#xD),
line feed (#xA)
or tab (#x9) characters.
The base type of normalizedString is string.
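In XML Schema the whiteSpace facet of normalizedString has the value replace, so a literal containing tabs or line breaks is brought into the lexical space by mapping each of them to a space before validation. A minimal sketch of that step and of the value-space test above; both function names are illustrative.

```python
def replace_whitespace(s: str) -> str:
    """Map #x9, #xA and #xD to #x20, one for one (whiteSpace="replace")."""
    return s.translate({0x9: " ", 0xA: " ", 0xD: " "})


def is_normalized_string(s: str) -> bool:
    """True if s contains no carriage return, line feed or tab character."""
    return not any(ch in s for ch in "\r\n\t")
```

Note that replace does not merge or trim spaces: `"a\tb\r\nc"` becomes `"a b  c"`, with two spaces where CR LF stood.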
Constraining facets
Derived datatypes
token
token
represents tokenized strings.
The value space of token is the
set of strings that do not
contain the
carriage return (#xD),
line feed (#xA) or tab (#x9) characters, that have no
leading or trailing spaces (#x20) and that have no internal sequences
of two or more spaces.
The lexical space of token is the
set of strings that do not contain the
carriage return (#xD),
line feed (#xA) or tab (#x9) characters, that have no
leading or trailing spaces (#x20) and that have no internal sequences
of two or more spaces.
The base type of token is normalizedString.
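token uses whiteSpace collapse: XML white space is first replaced by spaces, runs of spaces are reduced to one, and leading and trailing spaces are removed. The sketch below (illustrative names) restricts itself to the four XML white-space characters; Python's `str.split()` with no argument would be wrong here, because it also splits on characters such as U+00A0.

```python
import re

# Only #x20, #x9, #xA and #xD count as white space in XML.
_XML_WS = re.compile(r"[\x20\t\n\r]+")


def collapse_whitespace(s: str) -> str:
    """Apply whiteSpace="collapse": squeeze runs to one space, then trim."""
    return _XML_WS.sub(" ", s).strip(" ")


def is_token(s: str) -> bool:
    """True if s satisfies the value-space conditions for token given above."""
    return (
        re.search(r"[\t\n\r]", s) is None
        and not s.startswith(" ")
        and not s.endswith(" ")
        and "  " not in s
    )
```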
Constraining facets
Derived datatypes
language
language
represents natural language identifiers as defined by
.
The value space of language is the
set of all strings that are valid language identifiers as defined by
.
The lexical space of
language is the set of all strings that
conform to the pattern [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
.
The base type of language is token.
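The lexical pattern above can be checked directly with a regular expression. XML Schema patterns are implicitly anchored at both ends, so the sketch uses `re.fullmatch` rather than `re.search`; the function name is illustrative.

```python
import re

# The pattern quoted above; fullmatch supplies the implicit anchoring.
_LANGUAGE_RE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_language_literal(s: str) -> bool:
    """True if s is in the lexical space of language."""
    return _LANGUAGE_RE.fullmatch(s) is not None
```

This tests only the syntactic shape; whether a subtag is actually registered is outside the scope of the pattern.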
Constraining facets
NMTOKEN
NMTOKEN represents
the NMTOKEN attribute type
from . The value space of
NMTOKEN is the set of tokens that
match the Nmtoken production in
. The lexical space of
NMTOKEN is the set of strings that
match the Nmtoken production in
. The base type of
NMTOKEN is token.
For compatibility (see ) NMTOKEN
should be used only on attributes.
Constraining facets
Derived datatypes
NMTOKENS
NMTOKENS
represents the NMTOKENS attribute
type from . The value space
of NMTOKENS is the set of finite, non-zero-length sequences of
NMTOKENs. The lexical space
of NMTOKENS is the set of space-separated lists of tokens,
each of which is in the lexical space of
NMTOKEN. The item type of
NMTOKENS is NMTOKEN.
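Reading an NMTOKENS literal amounts to splitting it at XML white space and checking each item against the item type; an empty list is rejected because the value space contains only non-zero-length sequences. In this sketch the Nmtoken production is approximated by an ASCII subset of its name characters, which is an assumption, and `parse_nmtokens` is a hypothetical helper.

```python
import re

# ASCII subset of the Nmtoken production (NameChar+); an approximation.
_NMTOKEN_RE = re.compile(r"[A-Za-z0-9._:\-]+")


def parse_nmtokens(lexical: str) -> list:
    """Split an NMTOKENS literal into its items, validating each one."""
    items = [t for t in re.split(r"[\x20\t\n\r]+", lexical) if t]
    if not items:
        raise ValueError("NMTOKENS requires at least one token")
    for item in items:
        if _NMTOKEN_RE.fullmatch(item) is None:
            raise ValueError(f"{item!r} is not an NMTOKEN")
    return items
```

The same shape applies to IDREFS and ENTITIES, with the item check swapped for the respective item type.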
For compatibility (see )
NMTOKENS should be used only on attributes.
Constraining facets
Name
Name
represents XML Names.
The value space of Name is
the set of all strings that match the
Name production of
. The lexical space of
Name is the set of all strings that match
the Name production of
. The base type of Name
is token.
Constraining facets
Derived datatypes
NCName
NCName represents XML
"non-colonized" Names. The value space of
NCName is the set of all strings that match
the NCName production of
. The lexical space of
NCName is the set of all strings that match
the NCName production of
. The base type of
NCName is Name.
Constraining facets
Derived datatypes
ID
ID represents the
ID attribute type from
. The value space of
ID is the set of all strings that match
the NCName production in
. The lexical space
of ID is the set of all
strings that match the
NCName production in
.
The base type of ID is NCName.
For compatibility (see )
ID should be used only on attributes.
Constraining facets
IDREF
IDREF represents the
IDREF attribute type from
. The value space of
IDREF is the set of all strings that match
the NCName production in
. The lexical space
of IDREF is the set of
strings that match the
NCName production in
.
The base type of IDREF is NCName.
For compatibility (see ) this datatype
should be used only on attributes.
Constraining facets
Derived datatypes
IDREFS
IDREFS represents the
IDREFS attribute type from
. The value space of
IDREFS is the set of finite, non-zero-length sequences of
IDREFs.
The lexical space of IDREFS is the
set of space-separated lists of tokens, each of which is in the
lexical space of IDREF.
The item type of IDREFS is
IDREF.
For compatibility (see ) IDREFS
should be used only on attributes.
Constraining facets
ENTITY
ENTITY represents the
ENTITY attribute type from
. The value space of
ENTITY is the set of all strings that match
the NCName production in
and have been declared as an
unparsed entity in
a document type definition.
The lexical space of ENTITY is the set
of all strings that match the
NCName production in
.
The base type of ENTITY is NCName.
The value space of ENTITY is scoped
to a specific instance document.
For compatibility (see ) ENTITY
should be used only on attributes.
Constraining facets
Derived datatypes
ENTITIES
ENTITIES
represents the ENTITIES attribute
type from . The value space
of ENTITIES is the set of finite, non-zero-length sequences of
ENTITYs that have been declared as
unparsed entities
in a document type definition.
The lexical space of ENTITIES is the
set of space-separated lists of tokens, each of which is in the
lexical space of ENTITY.
The item type of ENTITIES is
ENTITY.
The value space of ENTITIES is scoped
to a specific instance document.
For compatibility (see ) ENTITIES
should be used only on attributes.
Constraining facets
integer
integer is
derived from decimal by fixing the
value of fractionDigits to be 0 and
disallowing the trailing decimal point.
This results in the standard
mathematical concept of the integer numbers. The
value space of integer is the infinite
set {...,-2,-1,0,1,2,...}. The base type of
integer is decimal.
Lexical representation
integer has a lexical representation consisting of a finite-length sequence
of decimal digits (#x30-#x39) with an optional leading sign. If the sign is omitted,
"+" is assumed. For example: -1, 0, 12678967543233, +100000.
Canonical representation
The canonical representation for integer is defined
by prohibiting certain options from the
lexical representation. Specifically, the optional preceding "+" sign is prohibited and leading zeroes are prohibited.
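The canonical rule can be illustrated by validating a literal against the lexical representation and printing the integer it denotes, which drops a "+" sign and leading zeroes. The explicit regular expression matters: Python's `int()` alone would also accept surrounding white space, underscores and non-ASCII digits. The function name is illustrative.

```python
import re

# Lexical representation: optional sign, then one or more ASCII decimal digits.
_INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def integer_canonical(lexical: str) -> str:
    """Return the canonical representation of an integer literal."""
    if _INTEGER_RE.fullmatch(lexical) is None:
        raise ValueError(f"{lexical!r} is not an integer literal")
    # str(int(...)) emits no "+" sign, no leading zeroes, and "0" for zero.
    return str(int(lexical))
```

For instance, `+0100` maps to `100` and `-0` maps to `0`.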
nonPositiveInteger
nonPositiveInteger is derived from
integer by setting the value of
maxInclusive to be 0. This results in the
standard mathematical concept of the non-positive integers.
The value space of nonPositiveInteger
is the infinite set {...,-2,-1,0}. The
base type of nonPositiveInteger is integer.
Lexical representation
nonPositiveInteger has a lexical representation consisting of
an optional preceding sign
followed by a finite-length sequence of decimal digits (#x30-#x39).
The sign may be "+" or may be omitted only for
lexical forms denoting zero; in all other lexical forms, the negative
sign ("-") must be present.
For example: -1, 0, -12678967543233, -100000.
Canonical representation
The canonical representation for nonPositiveInteger is defined
by prohibiting certain options from the
lexical representation.
In the canonical form for zero, the sign must be
omitted. Leading zeroes are prohibited.
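The sign rule for nonPositiveInteger literals can be expressed as a small check: any sign (or none) is acceptable for zero, while every other literal must begin with "-". A minimal sketch with an illustrative function name:

```python
import re


def is_non_positive_integer_literal(lexical: str) -> bool:
    """True if lexical is in the lexical space of nonPositiveInteger."""
    m = re.fullmatch(r"([+-]?)([0-9]+)", lexical)
    if m is None:
        return False
    sign, digits = m.groups()
    if int(digits) == 0:
        return True        # zero may be written 0, +0 or -0
    return sign == "-"     # every non-zero value needs the negative sign
```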
negativeInteger
negativeInteger is derived from
nonPositiveInteger by setting the value of
maxInclusive to be -1. This results in the
standard mathematical concept of the negative integers. The
value space of negativeInteger
is the infinite set {...,-2,-1}. The
base type of negativeInteger is nonPositiveInteger.
Lexical representation
negativeInteger has a lexical representation consisting of
a negative sign ("-") followed by a finite-length
sequence of decimal digits (#x30-#x39). For example: -1, -12678967543233, -100000.
Canonical representation
The canonical representation for negativeInteger is defined
by prohibiting certain options from the
lexical representation. Specifically, leading zeroes are prohibited.
Constraining facets
long
long is
derived from integer by setting the
value of maxInclusive to be 9223372036854775807
and minInclusive to be -9223372036854775808.
The base type of long is
integer.
Lexical representation
long has a lexical representation consisting
of an optional sign followed by a finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0,
12678967543233, +100000.
Canonical representation
The canonical representation for long is defined
by prohibiting certain options from the
lexical representation. Specifically,
the optional "+" sign is prohibited and leading zeroes are prohibited.
Constraining facets
Derived datatypes
int
int
is derived from long by setting the
value of maxInclusive to be 2147483647 and
minInclusive to be -2147483648. The
base type of int is long.
Lexical representation
int has a lexical representation consisting
of an optional sign followed by a finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0,
126789675, +100000.
Canonical representation
The canonical representation for int is defined
by prohibiting certain options from the
lexical representation. Specifically,
the optional "+" sign is prohibited and leading zeroes are prohibited.
Constraining facets
Derived datatypes
short
short is
derived from int by setting the
value of maxInclusive to be 32767 and
minInclusive to be -32768. The
base type of short is
int.
Lexical representation
short has a lexical representation consisting
of an optional sign followed by a finite-length sequence of decimal
digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0, 12678, +10000.
Canonical representation
The canonical representation for short is defined
by prohibiting certain options from the
lexical representation. Specifically,
the optional "+" sign is prohibited and leading zeroes are prohibited.
Constraining facets
Derived datatypes
byte
byte
is derived from short
by setting the value of maxInclusive to be 127
and minInclusive to be -128.
The base type of byte is
short.
Lexical representation
byte has a lexical representation consisting
of an optional sign followed by a finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0,
126, +100.
Canonical representation
The canonical representation for byte is defined
by prohibiting certain options from the
lexical representation. Specifically,
the optional "+" sign is prohibited and leading zeroes are prohibited.
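long, int, short and byte share integer's lexical representation and differ only in their minInclusive and maxInclusive values, so one table-driven check covers all four. The sketch below copies the bounds from the definitions above; `parse_signed` and `SIGNED_BOUNDS` are illustrative names.

```python
import re

# minInclusive / maxInclusive values from the definitions above.
SIGNED_BOUNDS = {
    "long": (-9223372036854775808, 9223372036854775807),
    "int": (-2147483648, 2147483647),
    "short": (-32768, 32767),
    "byte": (-128, 127),
}


def parse_signed(type_name: str, lexical: str) -> int:
    """Map a literal to its value, rejecting values outside the type's range."""
    if re.fullmatch(r"[+-]?[0-9]+", lexical) is None:
        raise ValueError(f"{lexical!r} is not an integer literal")
    value = int(lexical)
    low, high = SIGNED_BOUNDS[type_name]
    if not low <= value <= high:
        raise ValueError(f"{lexical!r} is out of range for {type_name}")
    return value
```

The canonical representation of the result is then simply `str(value)`, for example `"-32"` for the short literal `-0032`.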
Constraining facets
nonNegativeInteger
nonNegativeInteger is derived from
integer by setting the value of
minInclusive to be 0. This results in the
standard mathematical concept of the non-negative integers. The
value space of nonNegativeInteger
is the infinite set {0,1,2,...}. The base type of
nonNegativeInteger is integer.
Lexical representation
nonNegativeInteger has a lexical representation consisting of
an optional sign followed by a finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted,
the positive sign ("+") is assumed.
If the sign is present, it must be "+" except for lexical forms
denoting zero, which may be preceded by a positive ("+") or a negative ("-") sign.
For example:
1, 0, 12678967543233, +100000.
Canonical representation
The canonical representation for nonNegativeInteger is defined
by prohibiting certain options from the
lexical representation. Specifically,
the optional "+" sign is prohibited and leading zeroes are prohibited.
unsignedLong
unsignedLong is derived from
nonNegativeInteger by setting the value of
maxInclusive to be 18446744073709551615.
The base type of unsignedLong is
nonNegativeInteger.
Lexical representation
unsignedLong has a lexical representation consisting
of a finite-length sequence of decimal digits (#x30-#x39).
For example: 0,
12678967543233, 100000.
Canonical representation
The canonical representation for unsignedLong is defined
by prohibiting certain options from the
lexical representation. Specifically,
leading zeroes are prohibited.
Constraining facets
Derived datatypes
unsignedInt
unsignedInt is derived from
unsignedLong by setting the value of
maxInclusive to be 4294967295. The
base type of unsignedInt is
unsignedLong.
Lexical representation
unsignedInt has a lexical representation consisting
of a finite-length
sequence of decimal digits (#x30-#x39). For example: 0,
1267896754, 100000.
Canonical representation
The canonical representation for unsignedInt is defined
by prohibiting certain options from the
lexical representation. Specifically,
leading zeroes are prohibited.
unsignedByte
unsignedByte is derived from
unsignedShort by setting the value of
maxInclusive to be 255. The
base type of unsignedByte is
unsignedShort.
Lexical representation
unsignedByte has a lexical representation consisting
of a finite-length
sequence of decimal digits (#x30-#x39).
For example: 0,
126, 100.
Canonical representation
The canonical representation for unsignedByte is defined
by prohibiting certain options from the
lexical representation. Specifically,
leading zeroes are prohibited.
Constraining facets
positiveInteger
positiveInteger is derived from
nonNegativeInteger by setting the value of
minInclusive to be 1. This results in the standard
mathematical concept of the positive integer numbers.
The value space of positiveInteger
is the infinite set {1,2,...}. The base type of
positiveInteger is nonNegativeInteger.
Lexical representation
positiveInteger has a lexical representation consisting
of an optional positive sign ("+") followed by a finite-length
sequence of decimal digits (#x30-#x39).
For example: 1, 12678967543233, +100000.
Canonical representation
The canonical representation for positiveInteger is defined
by prohibiting certain options from the
lexical representation. Specifically, the
optional "+" sign is prohibited and leading zeroes are prohibited.
Constraining facets
yearMonthDuration
yearMonthDuration is a datatype derived from duration
by restricting its lexical
representations to instances of
yearMonthDurationLexicalRep. The value space of
yearMonthDuration
is therefore that of duration restricted to those values whose
seconds property is 0. This results in a duration datatype which is totally ordered.
The always-zero seconds is formally retained in order that
yearMonthDuration's (abstract) value space truly be a subset of that of duration.
An obvious implementation optimization is to ignore the zero and implement
yearMonthDuration values simply as integer values.
The Lexical Mapping
The lexical space is reduced from that of duration by
disallowing the day and time
fragments in the lexical representations.
The lexical mapping is that of duration restricted to the lexical space.
The Lexical
Representation
yearMonthDurationLexicalRep-? P
The regular expression -?P([0-9]+Y)?([0-9]+M)? matches
some strings that are not in the lexical space (for example, P and -P), but those strings are not in the lexical space of
duration either, so it serves as a relatively simple regular expression
that extracts, from the lexical space of duration,
those representations that are instances of yearMonthDurationLexicalRep.
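Following the implementation optimization mentioned above, a yearMonthDuration value can be held as a single signed count of months. The sketch below uses the regular expression just given and adds the one extra check it needs, rejecting `P` and `-P`, which match the expression but are not in the lexical space. The function name and the months-as-integer representation are assumptions of the sketch.

```python
import re

_YM_RE = re.compile(r"-?P([0-9]+Y)?([0-9]+M)?")


def year_month_months(lexical: str) -> int:
    """Return the value of a yearMonthDuration literal as a signed month count."""
    m = _YM_RE.fullmatch(lexical)
    # The regex also accepts "P" and "-P", which have neither component.
    if m is None or (m.group(1) is None and m.group(2) is None):
        raise ValueError(f"{lexical!r} is not a yearMonthDuration literal")
    years = int(m.group(1)[:-1]) if m.group(1) else 0   # strip the trailing "Y"
    months = int(m.group(2)[:-1]) if m.group(2) else 0  # strip the trailing "M"
    total = 12 * years + months
    return -total if lexical.startswith("-") else total
```

Representing values as integers also makes the total order noted above directly available through ordinary integer comparison.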
Canonical mappings are not used during schema processing. They are provided in this specification
for the benefit of other users of these datatype definitions who may find them useful, and for other specifications
which might find it useful to reference them normatively.
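The canonical mapping is that of duration restricted in its
range to the lexical space (which reduces its domain to omit any
values not in the value space).
The value whose months and
seconds are both zero has no canonical representation in this datatype since its
canonical representation in duration (PT0S)
is not in the
lexical space of yearMonthDuration.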
Constraining facets
yearMonthDuration has the following constraining facets:
pattern
enumeration
whiteSpace
minInclusive
minExclusive
maxInclusive
maxExclusive
dayTimeDuration
dayTimeDuration is a datatype derived from duration
by restricting its lexical
representations to instances of
dayTimeDurationLexicalRep. The value space of
dayTimeDuration
is therefore that of duration restricted to those values whose
months property is 0. This results in a duration datatype which is totally ordered.
The Lexical Space
The lexical space is reduced from that of duration by
disallowing the year and month
fragments in the lexical representations.
The lexical mapping is that of duration restricted to the lexical space.
The Lexical Representation
dayTimeDurationLexicalRep-? P
The regular expression -?P([0-9]+D)?(T([0-9]+H)?([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)? matches several
strings that are not in the lexical space (for example, P, PT and P1DT), but those strings are not in the lexical space of
duration either, so it serves as a relatively simple regular expression that extracts, from
the lexical space of duration,
those representations that are instances of dayTimeDurationLexicalRep.
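Symmetrically, a dayTimeDuration value can be handled as a signed number of seconds. This sketch applies the regular expression above (with named capture groups added) and then rejects the strings it over-accepts: those with no component at all, and those with a T not followed by any time component. `Decimal` keeps fractional seconds exact; the function name and the seconds representation are assumptions.

```python
import re
from decimal import Decimal

_DT_RE = re.compile(
    r"(-?)P(?:([0-9]+)D)?"
    r"(?:T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+(?:\.[0-9]+)?)S)?)?"
)


def day_time_seconds(lexical: str) -> Decimal:
    """Return the value of a dayTimeDuration literal as signed seconds."""
    m = _DT_RE.fullmatch(lexical)
    if m is None:
        raise ValueError(f"{lexical!r} is not a dayTimeDuration literal")
    sign, d, h, mi, s = m.groups()
    if d is None and h is None and mi is None and s is None:
        raise ValueError(f"{lexical!r} has no components")           # P, -P, PT
    if "T" in lexical and h is None and mi is None and s is None:
        raise ValueError(f"{lexical!r} has an empty time part")      # P1DT
    total = (Decimal(d or 0) * 86400 + Decimal(h or 0) * 3600
             + Decimal(mi or 0) * 60 + Decimal(s or 0))
    return -total if sign else total
```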
Canonical mappings are not used during schema processing. They are provided in this specification
for the benefit of other users of these datatype definitions who may find them useful, and for other specifications
which might find it useful to reference them normatively.
The canonical mapping is that of duration restricted
in its
range to the lexical space (which reduces its domain to omit any
values not in the value
space).
Constraining facets
dayTimeDuration has the following constraining facets:
pattern
enumeration
whiteSpace
minInclusive
minExclusive
maxInclusive
maxExclusive
Datatype components
The following sections provide full details on the properties and
significance of each kind of schema component involved in datatype
definitions. For each property, the kinds of values it is allowed to have are
specified. Any property not identified as optional is required to
be present; optional properties which are not present have
absent as their value.
Any property identified as having a set, subset or list
value may have an empty value unless this is explicitly ruled out: this is
not the same as absent.
Any property value identified as a superset or a subset of some set may
be equal to that set, unless a proper superset or subset is explicitly
called for.
For more information on the notion of datatype (schema) components,
see Schema Component Details
of .
Simple Type Definition
Simple Type definitions provide for:
Establishing the value space and lexical space
of a datatype, through
the combined set of facets specified
in the definition;
Attaching a unique name (actually a QName) to the
value space and lexical space.
In the case of primitive
datatypes,
identifying a datatype with its definition in this specification.
In the case of derived datatypes,
defining the datatype in terms of other datatypes.
Attaching a to the datatype.
The Simple Type Definition Schema Component
The Simple Type Definition schema component has the following properties:
Datatypes are identified by their {name}
and {target namespace}. Except
for anonymous datatypes (those with no {name}),
datatype definitions must be uniquely identified
within a schema.
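If {variety} is atomic
then the value space of the datatype defined will
be a subset of the value space of {base type definition}
(which is a subset of the value space
of {primitive type definition}).
If {variety} is list
then the value space of the datatype defined will
be the set of finite-length sequences of values from the
value space of {item type definition}.
If {variety} is union then the
value space of the datatype defined will be the
union of the value spaces of each datatype in
{member type definitions}.
If {variety} is atomic
then the {variety} of {base type definition}
must be atomic.
If {variety} is list
then the {variety} of {item type definition}
must be either atomic or union.
If {variety} is union
then {member type definitions}
must be a list of datatype definitions.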
The value of {facets} consists of the set of
constraining facets
specified directly in the datatype definition
unioned with the possibly empty set of {facets} of
{base type definition}.
The value of {fundamental facets} consists of the set of
fundamental
facets and their values.
If {final} is the empty set then the type can be used
in deriving other types; the explicit values restriction,
list and union prevent further derivations
by restriction, list and union
respectively.
XML Representation of Simple Type Definition Schema Components
The XML representation for a simple type definition schema component
is a &lt;simpleType&gt; element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the name attribute, if present,
otherwise absent
A subset of
{restriction, list, union}
corresponding to the sequence which is
the actual value of the
final attribute,
if present, otherwise the actual value of the
finalDefault attribute of the ancestor
schema element,
if present, and otherwise the
empty sequence. The constant restriction is present
in the set if and only if restriction or #all
is present in the sequence; similarly also
for list and union.
Although the finalDefault attribute of a schema
may include strings other than restriction,
list or union, those other values
are ignored in the determination of {final}.
The actual value of the
targetNamespace attribute
of the parent schema element information item.
A sequence whose one term is the
annotation
corresponding to the &lt;annotation&gt; child
element information item, if one is present, otherwise absent
A datatype can be
from a datatype or another
datatype by one of three means:
by restriction, by list or by union.
A can be added to a schema by deriving it from
another ordinary
either by direct derivation, explicit
construction as a list, or explicit
construction as a union.
A user-defined
can be directly derived from an ordinary
, or constructed from an
ordinary as a list, or constructed from a
sequence of ordinary s as a union.
Direct Derivation by restriction
The same value as that of the {variety} of {base type definition}.
The set of constraining facet components
resolved to by the facet children, merged with {facets}
from {base type definition}, subject to
the applicable facet constraints specified in .
The component resolved to by the actual value of the
base attribute
or the &lt;simpleType&gt; child element,
whichever is present.
An electronic commerce schema might define a datatype called
SKU
(the barcode number that appears on products) from the
string datatype by
supplying a value for the pattern facet.
In this case, SKU is the name of the new
datatype, string is
its base type definition
and a pattern
facet is introduced in the
direct derivation.
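Because the only facet added in this derivation is pattern, checking a SKU value reduces to a regular-expression match over the whole string (XML Schema patterns are implicitly anchored). The pattern value below, three digits, a hyphen and two upper-case letters, is an illustrative assumption, since the text does not state one; `is_valid_sku` is likewise a hypothetical name.

```python
import re

# Illustrative pattern facet value; the derivation's actual value is not given here.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")


def is_valid_sku(value: str) -> bool:
    """True if value satisfies the (assumed) pattern facet of SKU.

    The base type string places no further lexical restriction on the value,
    so the pattern introduced by the derivation is the only check.
    """
    return SKU_PATTERN.fullmatch(value) is not None
```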
Construction as a List
list
The component resolved to by the actual value of the
itemType attribute
or the &lt;simpleType&gt; child element,
whichever is present.
A list datatype must be
derived from an atomic or a union datatype,
known as the item type
of the list datatype.
This yields a datatype whose value space is composed of
finite-length sequences of values from the value space of the
item type and whose lexical space is
composed of space-separated lists of literals of the
item type.
A system might want to store lists of floating point values.
&lt;simpleType name='listOfFloat'&gt;
  &lt;list itemType='float'/&gt;
&lt;/simpleType&gt;
In this case, listOfFloat is the name of the new
datatype, float is its item type
and list is the
derivation method.
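As mentioned in ,
when a datatype is constructed as a list from another
datatype, the following
constraining facets can be used:
length
minLength
maxLength
pattern
enumeration
whiteSpace
regardless of the constraining facets that are applicable
to the datatype that serves as the item type
of the list.
For each of length,
minLength
and maxLength, the
unit of length is measured in number of list items.
The value of whiteSpace
is fixed to the value collapse.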
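The point that list lengths count items rather than characters can be shown with a short sketch: the literal is split at XML white space (consistent with whiteSpace being fixed to collapse) and only the number of resulting items is compared against length, minLength and maxLength. Function and parameter names are illustrative.

```python
import re


def list_items(lexical: str) -> list:
    """Split a list literal into item literals at runs of XML white space."""
    return [item for item in re.split(r"[\x20\t\n\r]+", lexical) if item]


def list_length_ok(lexical, length=None, min_length=None, max_length=None):
    """Check length/minLength/maxLength, counting list items, not characters."""
    n = len(list_items(lexical))
    if length is not None and n != length:
        return False
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    return True
```

So a listOfFloat literal such as `1.0 2.5 -3E2` has length 3, while a nine-digit single item still has length 1.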
Construction as a Union
union
The sequence of components resolved to by the
items in the actual value of the
memberTypes attribute, if any,
in order, followed by the components resolved to by the
&lt;simpleType&gt; children, if any, in order.
If {variety} is union for
any components resolved to above, then
the union component is replaced by its
{member type definitions}.
A datatype can be
from one or more ordinary
, or
other datatypes, known as the
of that datatype.
As an example, taken from a typical display-oriented text markup language,
one might want to express font sizes as an integer between 8 and 72, or with
one of the tokens "small", "medium" or "large". The
type definition below would accomplish that.
]]>
A header
this is a test
]]>
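The font-size union can be sketched as trying each member type in order and taking the first one that accepts the literal. The sketch assumes the bounds 8 and 72 are inclusive and that both member types collapse white space; `font_size_value` is a hypothetical name.

```python
import re

_XML_WS = re.compile(r"[\x20\t\n\r]+")


def font_size_value(lexical: str):
    """Validate a font size against the two member types, in order."""
    s = _XML_WS.sub(" ", lexical).strip(" ")  # both members collapse white space
    # First member type: an integer between 8 and 72 (bounds assumed inclusive).
    if re.fullmatch(r"\+?[0-9]+", s) and 8 <= int(s) <= 72:
        return int(s)
    # Second member type: one of the three listed tokens.
    if s in ("small", "medium", "large"):
        return s
    raise ValueError(f"{lexical!r} is not valid for either member type")
```

The value returned reflects which member type matched: an `int` for the numeric member, a `str` for the token member.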
As mentioned in ,
when a datatype is constructed as a union from other
datatypes, only the following
constraining facets can be used:
pattern
enumeration
regardless of the constraining facets that are
applicable to the datatypes that participate in the union.
Constraints on XML Representation of Simple Type Definition
Single Facet Value
Unless otherwise specifically allowed by this specification
( and
) any given constraining facet
can only be specified once within
a single derivation step.
itemType attribute or simpleType child
Either the itemType attribute or the
&lt;simpleType&gt; child of the &lt;list&gt; element
must be present, but not both.
base attribute or simpleType child
Either the base attribute or the
simpleType child of the
&lt;restriction&gt; element must be present, but not both.
memberTypes attribute or simpleType children
Either the memberTypes attribute of the
&lt;union&gt; element must be non-empty or
there must be at least one simpleType child.
Simple Type Definition Validation Rules
Facet Valid
A value in a value space is facet-valid with
respect to a constraining facet component
if and only if:
the value is facet-valid with respect to the particular
constraining facet as specified below.
Datatype Valid
A string is datatype-valid with respect to a datatype definition
if and only if:
it matches a literal in the lexical space
of the datatype, determined as follows:
if pattern is a member of {facets},
then the string must be pattern valid;
if pattern is not a member of {facets},
then
if {variety} is atomic then
the string must match a literal in the
lexical space of {base type definition}
if {variety} is list then the string must
be a sequence of space-separated tokens, each of which matches a literal in the
lexical space of {item type definition}
if {variety} is union then
the string must match a literal in the
lexical space of at least one member of {member type definitions}
the value denoted by the literal matched in the previous step
is a member of the value space of the datatype, as determined
by it being Facet Valid
with respect to each member of {facets} (except
for pattern).
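The two-step validation rule can be sketched over a simplified, hypothetical model of simple type definitions in which each facet other than pattern is a predicate on values. One deliberate simplification: this sketch checks the pattern in addition to (not instead of) the variety-specific lexical check, because step 2 still needs the value that the lexical mapping supplies. None of these names (`SimpleType`, `value_of`, `datatype_valid`) come from the specification.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

_XML_WS = re.compile(r"[\x20\t\n\r]+")


@dataclass
class SimpleType:
    """Hypothetical, heavily simplified model of a simple type definition."""
    variety: str                                           # "atomic", "list" or "union"
    to_value: Optional[Callable[[str], object]] = None     # atomic: literal -> value or None
    item_type: Optional["SimpleType"] = None               # list only
    member_types: Sequence["SimpleType"] = ()              # union only
    pattern: Optional[str] = None                          # the pattern facet, if any
    value_facets: Sequence[Callable[[object], bool]] = ()  # every other facet


def value_of(t: SimpleType, s: str):
    """Return the value s denotes if s is datatype-valid for t, else None."""
    # Step 1: s must match a literal in the lexical space of t.
    if t.pattern is not None and re.fullmatch(t.pattern, s) is None:
        return None
    if t.variety == "atomic":
        value = t.to_value(s)
    elif t.variety == "list":
        values = [value_of(t.item_type, item) for item in _XML_WS.split(s) if item]
        value = None if any(v is None for v in values) else values
    else:  # union: the first member type that accepts s supplies the value
        value = None
        for member in t.member_types:
            value = value_of(member, s)
            if value is not None:
                break
    # Step 2: the value must be facet-valid for each facet other than pattern.
    if value is None or not all(facet(value) for facet in t.value_facets):
        return None
    return value


def datatype_valid(t: SimpleType, s: str) -> bool:
    return value_of(t, s) is not None
```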
Constraints on Simple Type Definition Schema Components
applicable facets
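The constraining facets which are allowed
to be members of {facets} are dependent on
{base type definition} as specified in the following table:
list of atomic
If {variety} is list, then
the {variety} of {item type definition} must be atomic or
union.
no circular unions
If {variety} is union,
then
it is an error if {name}
and {target namespace} match {name}
and {target namespace} of any member of
{member type definitions}.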
The Simple Type Definition Hierarchy
The constraints just given serve, among other things, to ensure
that simple type definitions are properly placed in the schema type hierarchy by virtue of the setting
of their {base type definition}.
Simple Type Definition for anySimpleType
There is a simple type definition nearly equivalent to the simple version
of the ur-type definition present
in every schema by definition. It has the following properties:
Simple Type Definition of the Ur-Type
anySimpleTypehttp://www.w3.org/2001/XMLSchemathe ur-type definitionThe empty setnull
Built-in Simple Type Definitions
The definition of is
present in every schema. It has the following properties:
Simple Type Definition of anySimpleTypeThe of anySimpleTypehttp://www.w3.org/2001/XMLSchemaanyTypeThe empty setabsentabsentThe empty setThe empty setglobalabsentabsentThe empty sequence
The
definition of anySimpleType
is the root of the Simple Type Definition
hierarchy, and as such mediates between the other
simple type definitions,
which all eventually trace back to it via their
{base type definition} properties, and thus to the
definition of anyType,
which is
its {base type definition}.
The of is present in every schema. It has the
following properties:
Simple Type Definition of anyAtomicType of anyAtomicTypehttp://www.w3.org/2001/XMLSchemaThe empty setatomicabsentThe empty setThe empty setglobalabsentabsentThe empty sequence
Simple type definitions for all the built-in primitive datatypes,
namely , , , , ,
, , ,
, , ,
, , ,
, ,
,
are present by definition in every schema. All
are in the XML Schema namespace (http://www.w3.org/2001/XMLSchema),
have an atomic {variety}
with an empty {facets} (unless otherwise specified in this specification)
and anyAtomicType as their
{base type definition}, and themselves as
{primitive type definition}.
Similarly, simple type definitions for all the built-in derived
datatypes are present by definition in every schema, with properties
as specified in and as represented
in XML in .
Fundamental Facets
equal
Every value space supports the notion of equality,
with the following rules:
for any a and b in
the value space,
either a is equal to b,
denoted a = b, or a
is not equal to b, denoted a != b
there is no pair a and b
from the value space such that both
a = b and a != b
for all a in the value space,
a = a
for any a and b
in the value space,
a = b if and only if b = a
for any a, b and
c in the value space,
if a = b and
b = c, then a = c
for any a and b
in the value space,
if a = b, then a
and b cannot be distinguished
(i.e., equality is identity)
the value spaces of all
primitive datatypes are disjoint (they do not
share any values)
On every datatype, the operation Equal is defined in terms of the equality
property of the value space: for any values
a, b drawn from the
value space, Equal(a,b) is
true if a = b, and false otherwise.
Note that in consequence of the above:
given value spaces A and
B where
A and B are disjoint,
for every pair of values a from A
and b from B,
a != b
two values which are members of the value space
of the same datatype may always be
compared with each other
if a datatype T is
constructed by union from
A, B, ...
then the value space of T is the
union of the value spaces of its member types
A, B, ....
Some values in the value space of
T are also values in the value space
of A.
Other values in the value space of
T will be values in the value space
of B and so on.
Values in the value space of T
which are also in the value space of
A can be compared with other values in the
value space of A according
to the above rules. Similarly for values of type
T and B and all the other
member types.
if a datatype T' is
derived by restriction from an atomic datatype T
then the value space of T' is
a subset of the value space of T.
Values in the value spaces of
T and T' can be compared
according to the above rules
if datatypes T' and T'' are
derived by restriction from a
common atomic ancestor T then the
value spaces of T' and
T'' may overlap. Values in the
value spaces
of T' and T'' can be
compared according to the above rules
There is no schema component corresponding to the equal fundamental facet.
Fundamental Facets
RQ-24 (systematic approach to facets)
The decision that the four informational facets, each of which has only one property,
would be lumped into one facet having four properties was rescinded by the WG before it
made it into the text of this specification.
(Information
facets were called "fundamental facets" in the 1.0 version
of this specification.)
Each
fundamental facet is a schema component that
provides a limited piece of information about some aspect
of each datatype.
For example, is a
.
Most fundamental facets
are given a value
fixed with each primitive datatype's definition, and this value is not changed by
subsequent derivations (even when
it would perhaps be reasonable to expect an application to give a more accurate value based
on the &cfacet;s used to define the derivation). The
and facets
are exceptions to this rule; their values may change as a result of certain
derivations.
Schema components are identified by kind. "Fundamental facet"
is not a kind of component. Each kind of fundamental facet
(ordered,
bounded, etc.) is realized as
a separate kind of schema component.
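A fundamental facet can occur only
in the {fundamental facets} of a simple type definition, and this is the
only place where fundamental facet components
occur. The simple type definition
in whose {fundamental facets} a
fundamental facet component occurs is that
component's owner. Each kind of
fundamental facet component occurs (once) in each simple type definition's
{fundamental facets} set.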
The {value} of any fundamental facet component can always
be calculated from other properties of its owner.
Fundamental facets are not required for schema processing,
but some applications use them.
ordered
An
order relation on a value space
is a mathematical relation that imposes a
total order or a partial order on the
members of the value space.
A
value space, and hence a datatype, is said to be
ordered if there exists an order relation
defined for that
value space.
A partial order is an
order relation that is irreflexive, asymmetric and
transitive.
A partial order has the following properties:
for no a in the value space,
a &lt; a
(irreflexivity)
for all a and b
in the value space,
a &lt; b
implies not(b &lt; a)
(asymmetry)
for all a, b
and c in the value space,
a &lt; b and b &lt; c
implies a &lt; c
(transitivity)
The notation a &lt;&gt; b is used to indicate the
case when a != b and neither
a &lt; b nor b &lt; a.
For any values a and b
from different primitive value spaces,
a &lt;&gt; b.
When a &lt;&gt; b, a and b are incomparable; otherwise they are comparable.
A total order is a partial order
such that for no a and b
is it the case that a &lt;&gt; b.
A total order has all of the properties specified
above for partial order, plus
the following property:
for all a and b
in the value space,
either a &lt; b or b &lt; a
or a = b
The fact that this specification does not define an
order relation for some datatype does not
mean that some other application cannot treat that datatype as
being ordered by imposing its own order relation.
ordered provides for:
indicating whether an order relation is
defined on a value space, and if so,
whether that order relation is
a partial order or a total order
Some datatypes have a nontrivial order relation associated with
their value spaces (see ). (There is always a
trivial partial ordering wherein every value pair that is not
equal is incomparable, which could be associated with any value space.) The
ordered facet value is a "near-boolean": one of false,
partial, and total, as prescribed in
for primitive datatypes;
all datatypes derived from them by restriction inherit this value without change. The
value for a list datatype
is always false,
and the value for a union datatype is computed as described below.
A false value means no order is prescribed;
a total value
assures that the prescribed order is a total
order; a partial value means
that the prescribed order is a partial
order, but not (for the primitive type in question) a total order.
Derivation of new datatypes from datatypes
with partial orders may impose constraints which make the
effective ordering either a trivial order or a non-trivial total order,
but the value of the facet is not changed to
reflect this.
A value space, and hence a datatype, is said to be
ordered if this specification prescribes a non-trivial
order for that value space.
Some of the real-world datatypes which are the basis for those defined herein
are ordered in some applications, even though no order is prescribed for schema-processing
purposes. For example, boolean is sometimes ordered, and
string and list datatypes constructed from
ordered atomic datatypes are sometimes given lexical
orderings. They are not ordered for schema-processing purposes.
The ordered Schema Component
depends on ,
and
in the component in which a
component appears as a member of
.
When is ,
is inherited from
of .
For all types
is as specified in the table in .
When is ,
is false.
When is ,
is partial unless one of the
following:
If every member of
is derived from a common ancestor other than
the simple ur-type,
then is the same as that
ancestor's ordered facet
If every member of has a
of false for the ordered
facet, then is false
depends on
the owner's, ,
and .
the
owner's is atomic
the is
is as specified in the
table in .
is the owner's's .
the owner's is list
is false.
the owner's is union;
every member of the
owner's
has
atomic and has the same
is the same as the
component's in that
primitive
type definition's.
each member of the owner's has an
component in its
whose is false
is
false.
is
partial.
bounded
A value u in an ordered set U
is said to be an inclusive upper bound of a
value space V
(where V is a subset of U)
if and only if
for all v in V,
u >= v.
A value u in an ordered set U
is said to be an exclusive upper bound of a
value space V
(where V is a subset of U)
if and only if
for all v in V,
u > v.
A value l in an ordered set L
is said to be an inclusive lower bound of a
value space V
(where V is a subset of L)
if and only if
for all v in V,
l <= v.
A value l in an ordered set L
is said to be an exclusive lower bound of a
value space V
(where V is a subset of L)
if and only if
for all v in V,
l < v.
A datatype is bounded
if and only if
its value space has either an
inclusive upper bound or an exclusive upper bound
and either an inclusive lower bound or
an exclusive lower bound.
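The four bound definitions translate directly into predicates over a finite set of values. This Python sketch is illustrative. The function names are hypothetical, and ordinary numeric comparison stands in for the order relation.

```python
from typing import Collection

def is_inclusive_upper_bound(u, V: Collection) -> bool:
    return all(u >= v for v in V)   # u >= v for all v in V

def is_exclusive_upper_bound(u, V: Collection) -> bool:
    return all(u > v for v in V)    # u > v for all v in V

def is_inclusive_lower_bound(l, V: Collection) -> bool:
    return all(l <= v for v in V)   # l <= v for all v in V

def is_exclusive_lower_bound(l, V: Collection) -> bool:
    return all(l < v for v in V)    # l < v for all v in V

V = range(1, 11)  # the integers 1..10
print(is_inclusive_upper_bound(10, V), is_exclusive_upper_bound(10, V))  # True False
print(is_exclusive_upper_bound(11, V), is_exclusive_lower_bound(0, V))   # True True
```

A bound does not have to be a member of V: 11 is an exclusive upper bound of V even though it is not in V. This is why the definitions only require the bound to lie in the enclosing ordered set.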
The bounded facet provides for:
indicating whether a value space is bounded
Some ordered datatypes have the property that
there is one value greater than or equal to every other value, and
another that is less than or equal to every other value. (In the
case of derived datatypes, these two values are
not necessarily in the value space of the derived datatype,
but they must be in the value
space of the primitive datatype from which they have been derived.)
The bounded facet value is boolean and is
generally true for such bounded datatypes.
However, it will remain false when the mechanism for imposing
such a bound is difficult to detect, as, for example, when the
boundedness occurs because of derivation using a pattern
component.
The bounded Schema Component
depends on
the
owner's,
and
in the component in which a
component appears as a member of
.
When the
is ,
is as specified in the
table in . Otherwise, when the
owner's is atomic,
if one of or
and one of or
are among members of
the owner's set, then
is true;
otherwise is false.
When the owner's is list,
if or both of
and
are among , then
is true; else is false.
When the owner's is union,
if is true
for every member of and all members of
the
owner's
set and all of these share
a common ancestor, then is true;
otherwise is false.
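For the atomic and list varieties, the bounded rule above comes down to checking which facets are present. The Python sketch below makes several assumptions:

- the facets are given as a set of names;
- the facet names elided from the rules are minInclusive or minExclusive and maxInclusive or maxExclusive for atomic types, and length, minLength and maxLength for lists, as in XML Schema 1.0.

The function is hypothetical. It deliberately leaves out the primitive-table case and the union case.

```python
from typing import AbstractSet

def bounded(variety: str, facets: AbstractSet[str]) -> bool:
    """Sketch of the bounded rule for the atomic and list varieties only."""
    if variety == "atomic":
        has_lower = bool(facets & {"minInclusive", "minExclusive"})
        has_upper = bool(facets & {"maxInclusive", "maxExclusive"})
        return has_lower and has_upper
    if variety == "list":
        return "length" in facets or {"minLength", "maxLength"} <= facets
    raise ValueError("primitive and union cases are not covered by this sketch")

print(bounded("atomic", {"minInclusive", "maxExclusive"}))  # True
print(bounded("atomic", {"minInclusive", "pattern"}))       # False
print(bounded("list", {"maxLength"}))                       # False
```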
cardinality
Every
value space has associated with it the concept of
cardinality. Some value spaces
are finite, some are countably infinite while still others could
conceivably be uncountably infinite (although no value space
defined by this specification is uncountably infinite). A datatype is
said to have the cardinality of its
value space.
It
is sometimes useful to categorize value spaces
(and hence, datatypes) as to their cardinality. There are two
significant cases:
value spaces that are finite
value spaces that are countably infinite
The cardinality facet provides for:
indicating whether the
cardinality of a value space is
finite or countably infinite
Every value space has a specific number of members. This number can be characterized as
finite or infinite. (Currently there are no datatypes with infinite
value spaces larger than countable.) The cardinality facet value is
either finite or countably infinite and is generally finite for datatypes with
finite value spaces. However, it will remain countably infinite when the mechanism for
causing finiteness is difficult to detect, as, for example, when finiteness occurs because of a
derivation using a pattern component.
The cardinality Schema Component
depends on the owner's,
, and
in the component in which a
component appears as a member of
.
When is and
of
is finite, then is
finite.
When is and
of
is countably infinite and either of the following
conditions is true, then is
finite; else
is countably infinite:
one of , ,
is among ,
all of the following are true:
one of or
is among
one of or
is among
either of the following is true:
is among
is one of ,
, , ,
or or any type
from them
When the is
, is as specified in the
table in . Otherwise, when
the owner's is atomic,
is countably infinite unless any of the following
conditions are true, in which case is
finite:
the owner's's
is finite,
at least one of , ,
or is a member of the
owner's set,
all of the following are true:
one of or
is a member of the owner's set
one of or
is a member of the owner's set
either of the following is true:
is a member of the owner's set
is one of ,
, , ,
or
When the
owner's is list,
if or both of
and
are among members of the
owner's set
and the owner's's
is finite
then is finite;
otherwise is countably infinite.
When the owner's is union,
if 's is finite
for every member of the owner's set then
is finite,
otherwise
is countably infinite.
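The atomic, list and union cardinality rules above can be sketched as follows. This Python code rests on three assumptions:

- the facet names elided from the rules are those of XML Schema 1.0 (length, maxLength or totalDigits; a min* and a max* range facet; fractionDigits);
- the elided date-related primitives are date, gYearMonth, gYear, gMonthDay, gDay and gMonth;
- base and item are the already-computed cardinalities of the base type and the list item type.

The function names are hypothetical, and the primitive-table case is not modelled.

```python
from typing import AbstractSet, Iterable

FINITE, COUNTABLY_INFINITE = "finite", "countably infinite"
# Assumed (as in XML Schema 1.0) to be the date-related primitives elided above.
DATE_LIKE = {"date", "gYearMonth", "gYear", "gMonthDay", "gDay", "gMonth"}

def atomic_cardinality(base: str, facets: AbstractSet[str], primitive: str) -> str:
    if base == FINITE:
        return FINITE
    if facets & {"length", "maxLength", "totalDigits"}:
        return FINITE
    has_lower = bool(facets & {"minInclusive", "minExclusive"})
    has_upper = bool(facets & {"maxInclusive", "maxExclusive"})
    if has_lower and has_upper and ("fractionDigits" in facets or primitive in DATE_LIKE):
        return FINITE
    return COUNTABLY_INFINITE

def list_cardinality(facets: AbstractSet[str], item: str) -> str:
    length_limited = "length" in facets or {"minLength", "maxLength"} <= facets
    return FINITE if length_limited and item == FINITE else COUNTABLY_INFINITE

def union_cardinality(members: Iterable[str]) -> str:
    return FINITE if all(m == FINITE for m in members) else COUNTABLY_INFINITE

# A bounded decimal range stays countably infinite until fractionDigits is added.
print(atomic_cardinality(COUNTABLY_INFINITE, {"minInclusive", "maxInclusive"}, "decimal"))
print(atomic_cardinality(COUNTABLY_INFINITE, {"minInclusive", "maxInclusive", "fractionDigits"}, "decimal"))
```

The sketch follows the rules as written rather than the underlying mathematics. For example, a list restricted only by maxLength over a finite item type is reported as countably infinite, even though its value space is in fact finite.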
numeric
A datatype is said to be
numeric
if and only if
its values are conceptually quantities (in some
mathematical number system).
A datatype whose values
are not numeric is said to be
non-numeric.
The numeric facet provides for:
indicating whether a value space is numeric
Some value spaces are made up of things that
are conceptually numeric, others are
not. The numeric facet value indicates which are
considered numeric.
The numeric Schema Component
depends on the owner's,
, and
in the component
in which a component appears as a member of
.
When the is , is as specified in the
table in . Otherwise, when the owner's is atomic,
is inherited from
the owner's's of .
For all types
is as specified in the table in .
When the owner's is list,
is false.
When the
owner's is union,
if 's is true
for every member of the owner's set then
is true,
otherwise is false.
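The three non-primitive cases of the numeric rule can be summarized in a few lines. This is a hypothetical Python helper, not part of the specification. It takes the base type's numeric value (atomic case) or the members' numeric values (union case) as already computed, and it does not model the primitive table.

```python
from typing import Optional, Sequence

def numeric_facet(variety: str, base_numeric: bool = False,
                  member_numerics: Optional[Sequence[bool]] = None) -> bool:
    """Sketch of the numeric rule for non-primitive types."""
    if variety == "atomic":
        return base_numeric                 # inherited unchanged from the base type
    if variety == "list":
        return False                        # list types are never numeric
    if variety == "union":
        if not member_numerics:
            raise ValueError("a union needs at least one member type")
        return all(member_numerics)         # numeric only if every member is numeric
    raise ValueError(f"unknown variety: {variety}")

print(numeric_facet("union", member_numerics=[True, False]))  # False
```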
Constraining Facets
Constraining facets
are schema components whose values may be set or changed
during derivation (subject to facet-specific controls)
to control various aspects of the derived datatype. For example,
whiteSpace is a constraining facet.
Constraining facets are given a value as part of
the derivation
defining a datatype; a few
constraining facets have default values
that are also provided for datatypes.
Schema components are identified by kind. Constraining
is not a kind of component. Each kind of constraining facet
(whiteSpace,
length, etc.) is a separate kind of schema component.
length
RQ-147b (phase out length facet)
The WG is considering the ramifications of removing the length constraining facet, letting the schema document elements that currently set that
facet set both minLength and maxLength instead.
length is the number
of units of length, where units of length
varies depending on the type that is being derived from.
The value of
length must be a
nonNegativeInteger.
For string and datatypes derived from string,
length is measured in units of
characters as defined in the XML specification.
For anyURI, length is measured in units of
characters (as for string).
For hexBinary and base64Binary and datatypes derived from them,
length is measured in octets (8 bits) of binary data.
For datatypes derived by list,
length is measured in number of list items.
For string and datatypes derived from string,
length will not always coincide with "string length" as perceived
by some users or with the number of storage units in some digital representation.
Therefore, care should be taken when specifying a value for length
and in attempting to infer storage requirements from a given value for
length.
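The units of length described above can be illustrated with a short Python sketch. It assumes:

- the value is supplied in its lexical form;
- the value has already been whitespace-normalized;
- a list value is represented by its space-separated items.

The function name and type-name strings are hypothetical. Python's len() on a str counts Unicode code points, which corresponds to characters in the XML sense. Binary values are decoded first, so their lengths are octets rather than hex digits or base64 characters.

```python
import base64
import binascii

def facet_length(primitive: str, lexical: str, is_list: bool = False) -> int:
    """Length in the units used by length, minLength and maxLength (sketch)."""
    if is_list:
        return len(lexical.split())                     # number of list items
    if primitive in ("string", "anyURI"):
        return len(lexical)                             # characters (code points)
    if primitive == "hexBinary":
        return len(binascii.unhexlify(lexical))         # octets, not hex digits
    if primitive == "base64Binary":
        # Remove the single spaces that whitespace collapsing may leave behind.
        return len(base64.b64decode(lexical.replace(" ", ""), validate=True))
    raise ValueError(f"length is not measured for {primitive}")

print(facet_length("string", "h\u00e9llo"))           # 5
print(facet_length("string", "e\u0301"))              # 2, though a reader sees one character
print(facet_length("hexBinary", "0FB7"))              # 2
print(facet_length("base64Binary", "AQID"))           # 3
print(facet_length("integer", "1 2 3", is_list=True)) # 3
```

The second example shows why the note on "string length" matters. A base letter followed by a combining accent counts as two characters, although many users perceive it as one.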
The length facet provides for:
Constraining a value space
to values with a specific number of units of length,
where units of length
varies depending on the base type definition.
The following is the definition of a
datatype to represent product codes which must be
exactly 8 characters in length. By fixing the value of the
length facet we ensure that types derived from productCode can
change or set the values of other facets, such as pattern, but
cannot change the length.
]]>
The length Schema Component
If fixed is true, then types for which
the current type is the base type definition cannot specify a
value for length other than this facet's value.
XML Representation of length Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
length Validation Rules
Length Valid
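A value in a value space is facet-valid with
respect to length if and only if:
if the variety is atomic then
if the primitive type definition is string or anyURI, then the length of the value,
as measured in
characters,
must be equal to the value of length;
if the primitive type definition is hexBinary or base64Binary, then the length of
the value, as measured in octets of the binary data,
must be
equal to the value of length;
if the primitive type definition is QName or NOTATION, then any value is facet-valid.
if the variety is list,
then the length of the value, as measured in list items,
must be
equal to the value of length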
The use of length
on datatypes derived from QName and NOTATION
is deprecated. Future versions of this
specification may remove this facet for these datatypes.
Constraints on length Schema Components
length and minLength or maxLength
If length is a member of the facets then
It is an error for minLength to be a member of the facets
unless
the value of minLength <= the value of length and
there is a type definition from which this one is derived by
one or more restriction steps in which minLength has the same
value and length is not specified.
It is an error for maxLength to be a member of the facets
unless
the value of length <= the value of maxLength and
there is a type definition from which this one is derived by
one or more restriction steps in which maxLength has the same
value and length is not specified.
length valid restriction
It is an error if length
is among the members of the facets of the base type definition
and the value is
not equal to the value of the parent
length.
minLength
minLength is
the minimum number of units of length, where
units of length varies depending on the type that is being
derived from.
The value of minLength must be a nonNegativeInteger.
For string and datatypes derived from string,
minLength is measured in units of
characters as defined in the XML specification.
For hexBinary and base64Binary and datatypes derived from them,
minLength is measured in octets (8 bits) of binary data.
For datatypes derived by list,
minLength is measured in number of list items.
For string and datatypes derived from string,
minLength will not always coincide with "string length" as perceived
by some users or with the number of storage units in some digital representation.
Therefore, care should be taken when specifying a value for minLength
and in attempting to infer storage requirements from a given value for
minLength.
The minLength facet provides for:
Constraining a value space
to values with at least a specific number of units of length,
where units of length
varies depending on the base type definition.
The following is the definition of a
datatype which requires strings to have at least one character (i.e.,
the empty string is not in the
value space of this datatype).
]]>
The minLength Schema Component
If fixed is true, then types for which
the current type is the base type definition cannot specify a
value for minLength other than this facet's value.
XML Representation of minLength Schema Component
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
minLength Validation Rules
minLength Valid
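A value in a value space is facet-valid with
respect to minLength, determined as follows:
if the variety is atomic then
if the primitive type definition is string or
anyURI, then the
length of the value, as measured in
characters,
must be greater than or equal to
the value of minLength;
if the primitive type definition is hexBinary or base64Binary, then the
length of the value, as measured in octets of the binary data,
must be greater than or equal to
the value of minLength;
if the primitive type definition is QName or NOTATION, then
any value is facet-valid.
if the variety is list,
then the length of the value, as measured
in list items, must be greater than or equal
to the value of minLength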
The use of minLength
on datatypes derived from QName and NOTATION
is deprecated. Future versions of this
specification may remove this facet for these datatypes.
Constraints on minLength Schema Components
minLength <= maxLength
If both minLength and maxLength
are members of the facets, then the
value of minLength must be less than or equal to the
value of maxLength.
minLength valid restriction
It is an error if minLength
is among the members of the facets of the base type definition
and the value is
less than the value of the parent
minLength.
maxLength
maxLength is
the maximum number of units of length, where
units of length varies
depending on the type that is being derived from.
The value of maxLength must be a nonNegativeInteger.
For string and datatypes derived from string,
maxLength is measured in units of
characters as defined in the XML specification.
For hexBinary and base64Binary and datatypes derived from them,
maxLength is measured in octets (8 bits) of binary data.
For datatypes derived by list,
maxLength is measured in number of list items.
For string and datatypes derived from string,
maxLength will not always coincide with "string length" as perceived
by some users or with the number of storage units in some digital representation.
Therefore, care should be taken when specifying a value for maxLength
and in attempting to infer storage requirements from a given value for
maxLength.
The maxLength facet provides for:
Constraining a value space
to values with at most a specific number of units of length,
where units of length
varies depending on the base type definition.
The following is the definition of a
datatype which might be used to accept form input with an upper limit
to the number of characters that are acceptable.
]]>
The maxLength Schema Component
If fixed is true, then types for which
the current type is the base type definition cannot specify a
value for maxLength other than this facet's value.
XML Representation of maxLength Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
maxLength Validation Rules
maxLength Valid
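A value in a value space is facet-valid with
respect to maxLength, determined as follows:
if the variety is atomic then
if the primitive type definition is string or
anyURI, then the
length of the value, as measured in
characters,
must be less than or equal to
the value of maxLength;
if the primitive type definition is hexBinary or base64Binary, then the
length of the value, as measured in octets of the binary data,
must be less than or equal to the value of maxLength;
if the primitive type definition is QName or NOTATION, then
any value is facet-valid.
if the variety is list,
then the length of the value, as measured
in list items, must be less than or equal to the value of maxLength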
The use of maxLength
on datatypes derived from QName and NOTATION
is deprecated. Future versions of this
specification may remove this facet for these datatypes.
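Once a value's length has been measured in the appropriate units, the length, minLength and maxLength validation rules reduce to integer comparisons. The following is a hypothetical Python helper. None means that a facet is absent. Passing measured=None models the QName and NOTATION case, in which any value is facet-valid.

```python
from typing import Optional

def length_facets_valid(measured: Optional[int],
                        length: Optional[int] = None,
                        min_length: Optional[int] = None,
                        max_length: Optional[int] = None) -> bool:
    """Apply the length, minLength and maxLength validation rules (sketch)."""
    if measured is None:                  # QName / NOTATION: always facet-valid
        return True
    if length is not None and measured != length:
        return False
    if min_length is not None and measured < min_length:
        return False
    if max_length is not None and measured > max_length:
        return False
    return True

print(length_facets_valid(8, length=8), length_facets_valid(0, min_length=1))  # True False
```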
Constraints on maxLength Schema Components
maxLength valid restriction
It is an error if maxLength
is among the members of the facets of the base type definition
and the value is
greater than the value of the parent
maxLength.
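The length, minLength and maxLength valid-restriction constraints, together with the minLength <= maxLength constraint, can be checked by comparing the facets specified on a restriction step with those of its base. In this Python sketch, the facets are assumed to be given as dictionaries mapping facet names to integer values, and the function name is hypothetical.

```python
from typing import Dict, List

def length_restriction_errors(base: Dict[str, int], derived: Dict[str, int]) -> List[str]:
    """Report violations of the length-family restriction constraints (sketch)."""
    errors = []
    if "length" in base and "length" in derived and derived["length"] != base["length"]:
        errors.append("length differs from the base length")
    if "minLength" in base and "minLength" in derived and derived["minLength"] < base["minLength"]:
        errors.append("minLength is less than the base minLength")
    if "maxLength" in base and "maxLength" in derived and derived["maxLength"] > base["maxLength"]:
        errors.append("maxLength is greater than the base maxLength")
    if "minLength" in derived and "maxLength" in derived and derived["minLength"] > derived["maxLength"]:
        errors.append("minLength is greater than maxLength")
    return errors

print(length_restriction_errors({"maxLength": 50}, {"maxLength": 60}))
```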
pattern
pattern is a constraint on the value space
of a datatype which is achieved by
constraining the lexical space to literals
which match a specific pattern. The value of pattern must be a regular expression.
The pattern facet provides for:
Constraining a value space
to values that are denoted by literals which match a specific
regular expression.
The following is the definition of a
datatype which is a better representation of postal codes in the
United States, by limiting strings to those which are matched by
a specific regular expression.
]]>
The pattern Schema Component
XML Representation of pattern Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The value must be a valid
regular expression.
The actual value of the value [attribute]
The annotations corresponding to all the
element information items in the [children], if any.
Constraints on XML Representation of pattern
Multiple patterns
If multiple pattern element information items appear as
[children] of a restriction, the [value]s should
be combined as if they appeared in a single
regular expression as separate
branches.
It is a consequence of the schema representation constraint
Multiple patterns and of the rules for
restriction that
pattern facets specified on the same step in a type
derivation are ORed together, while
pattern facets specified on different steps of a type derivation
are ANDed together.
Thus, to impose two pattern constraints simultaneously,
schema authors may either write a single pattern which
expresses the intersection of the two patterns they wish to
impose, or define each pattern on a separate type derivation
step.
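The rule that patterns are ORed within a derivation step and ANDed across steps can be sketched as follows. This Python sketch is illustrative only. It uses Python's re module, whose syntax differs from the XML Schema regular-expression language, so real schema patterns may need translating. XML Schema patterns are implicitly anchored, which re.fullmatch approximates. The function name and the second restriction step are hypothetical.

```python
import re
from typing import Sequence

def pattern_valid(literal: str, steps: Sequence[Sequence[str]]) -> bool:
    """Each inner sequence holds the patterns given on one derivation step."""
    return all(
        any(re.fullmatch(p, literal) for p in step_patterns)  # ORed within a step
        for step_patterns in steps                              # ANDed across steps
    )

# Step 1: a US ZIP code, with or without the four-digit extension.
# Step 2 (hypothetical further restriction): the code must start with 9.
steps = [[r"[0-9]{5}", r"[0-9]{5}-[0-9]{4}"], [r"9.*"]]
print(pattern_valid("94043", steps))       # True
print(pattern_valid("94043-1351", steps))  # True
print(pattern_valid("10001", steps))       # False: fails the second step
```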
pattern Validation Rules
pattern valid
A literal in a lexical space is facet-valid with
respect to pattern
if and only if:
the literal is among the set of character sequences denoted by
the regular expression specified in the value of pattern.
enumeration
enumeration constrains the value space
to a specified set of values.
enumeration does not impose an order relation on the
value space it creates; the value of the ordered
property of the derived
datatype remains that of the datatype from which it is
derived.
The enumeration facet provides for:
Constraining a value space
to a specified set of values.
The following example is a datatype definition for a
datatype which limits the values
of dates to the three US holidays enumerated. This datatype
definition would appear in a schema authored by an "end-user" and
shows how to define a datatype by enumerating the values in its
value space. The enumerated values must be
type-valid literals for the base type.
]]>
The enumeration Schema Component
XML Representation of enumeration Schema Components
The XML representation for an schema
component is an element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The value must be
in the value space of the base type definition.
The actual value of the value [attribute]
The annotations corresponding to all the
element information items in the [children], if any.
Constraints on XML Representation of enumeration
Multiple enumerations
If multiple enumeration element information items appear
as [children] of a restriction, the
value of the enumeration
component should be the set of all such [value]s.
enumeration Validation Rules
enumeration valid
A value in a value space is facet-valid with
respect to enumeration
if and only if
the value is one of the values specified in the value of enumeration
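Because enumeration is checked against values rather than literals, two different literals that denote the same value are treated alike. The following is a minimal Python illustration for a decimal-based type. Python's standard decimal module stands in for the decimal value space, which is an assumption made for illustration only, and the function name is hypothetical.

```python
from decimal import Decimal
from typing import Iterable

def enumeration_valid(literal: str, enumerated: Iterable[str]) -> bool:
    """Compare by value: map every literal to a Decimal before testing membership."""
    allowed = {Decimal(e) for e in enumerated}
    return Decimal(literal) in allowed

print(enumeration_valid("1.50", ["1.5", "2", "2.5"]))  # True: 1.50 and 1.5 are the same value
print(enumeration_valid("3", ["1.5", "2", "2.5"]))     # False
```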
Constraints on enumeration Schema Components
enumeration valid restriction
It is an error if any member of the value is not in the
value space of the base type definition.
whiteSpace
whiteSpace constrains the value space
of types derived from string such that
the various behaviors
specified in Attribute Value Normalization
in the XML specification are realized. The value of
whiteSpace must be one of {preserve, replace, collapse}.
preserve: No normalization is done, the value is not changed (this is the
behavior required by XML for element content)
replace: All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return)
are replaced with #x20 (space)
collapse: After the processing implied by replace, contiguous
sequences of #x20's are collapsed to a single #x20, and leading and
trailing #x20's are removed.
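The three whiteSpace behaviors can be expressed as a small normalization function. This is a Python sketch, and the function name is hypothetical. As the text states, replace maps #x9, #xA and #xD to #x20, and collapse additionally squeezes runs of #x20 and trims them at both ends.

```python
import re

def normalize_whitespace(value: str, mode: str) -> str:
    """Apply the preserve, replace or collapse whiteSpace behavior (sketch)."""
    if mode == "preserve":
        return value                                   # no normalization
    replaced = re.sub(r"[\t\n\r]", " ", value)         # #x9, #xA, #xD -> #x20
    if mode == "replace":
        return replaced
    if mode == "collapse":
        return re.sub(r" +", " ", replaced).strip(" ")  # squeeze runs, trim ends
    raise ValueError(f"unknown whiteSpace value: {mode}")

sample = "\t a\n\nb  "
print(repr(normalize_whitespace(sample, "replace")))   # '  a  b  '
print(repr(normalize_whitespace(sample, "collapse")))  # 'a b'
```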
The notation #xA used here (and elsewhere in this specification) represents
the Universal Character Set (UCS) code point hexadecimal A (line feed), which is denoted by
U+000A. This notation is to be distinguished from &#xA;,
which is the XML character reference
to that same UCS code point.
whiteSpace is applicable to all atomic and list
datatypes. For all atomic
datatypes other than string (and types
derived by restriction from it) the value of whiteSpace is
collapse and cannot be changed by a schema author; for
string the value of whiteSpace is
preserve; for any type derived by restriction
from string
the value of whiteSpace can
be any of the three legal values. For all datatypes
derived by list the
value of whiteSpace is collapse and cannot
be changed by a schema author. For all datatypes
derived by union whiteSpace does not apply directly; however, the
normalization behavior of union types is controlled by
the value of whiteSpace on that one of the
member types against which the
union is successfully validated.
For more information on whiteSpace, see the
discussion on white space normalization in
Schema Component Details
in XML Schema Part 1: Structures.
The whiteSpace facet provides for:
Constraining a value space according to
the white space normalization rules.
The following example is the datatype definition for
the token built-in derived
datatype.
]]>
The whiteSpace Schema Component
If fixed is true, then types for which
the current type is the base type definition cannot specify a
value for whiteSpace other than this facet's value.
XML Representation of whiteSpace Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
whiteSpace Validation Rules
There are no validation rules associated with whiteSpace.
For more information, see the
discussion on white space normalization in
Schema Component Details
in XML Schema Part 1: Structures.
Constraints on whiteSpace Schema Components
whiteSpace valid restriction
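It is an error if whiteSpace
is among the members of the facets of the base type definition
and any of the following conditions is
true:
the value is replace or preserve
and the value of the parent whiteSpace
is collapse
the value is preserve
and the value of the parent whiteSpace
is replace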
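The two error conditions above amount to a rule that a restriction may keep or tighten whiteSpace but never loosen it, given the ordering preserve < replace < collapse. This short Python sketch uses a hypothetical function name.

```python
STRICTNESS = {"preserve": 0, "replace": 1, "collapse": 2}

def whitespace_restriction_ok(base: str, derived: str) -> bool:
    """A derived type may keep or tighten whiteSpace, never loosen it."""
    return STRICTNESS[derived] >= STRICTNESS[base]

print(whitespace_restriction_ok("preserve", "collapse"))  # True
print(whitespace_restriction_ok("collapse", "replace"))   # False
```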
maxInclusive
maxInclusive is the inclusive upper
bound of the value space
for a datatype with the ordered property. The value of
maxInclusive
must be equal to some value
in the value space of the base type.
The maxInclusive facet provides for:
Constraining a value space to values with a
specific inclusive upper
bound.
The following is the definition of a
datatype which limits values to integers less than or equal to
100, using maxInclusive.
]]>
The maxInclusive Schema Component
If fixed is true, then types for which
the current type is the base type definition cannot specify a
value for maxInclusive other than this facet's value.
XML Representation of maxInclusive Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The value must be equal to some value
in the value space of the base type definition.
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
maxInclusive Validation Rules
maxInclusive Valid
A value v
in an
is facet-valid with respect to
maxInclusive if
and only if one of the following is true:
if the
numeric property in
property
of the component in
is true, then the value
must be
numerically less than or
equal to ;
if the
numeric property in
property
of the component in
is false (i.e.,
is one of the date and time related
datatypes), then the value
must be
chronologically
less than or equal to ;
The component in
has a of true,
and
v is
numerically less than or equal to .
The component in
has a of false
(i.e., is
one of the date and time related datatypes), and v
is chronologically less than or equal to .
Constraints on maxInclusive Schema Components
minInclusive ≤ maxInclusive
It is an error for the value specified for
minInclusive to be greater than the value
specified for maxInclusive for the same datatype.
maxInclusive valid restriction
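It is an error if any of the following conditions
is true:
maxInclusive is among the members of
the facets of
the base type definition and the value is
greater than the value of
that maxInclusive.
maxExclusive is among the members of
the facets of
the base type definition and the value is
greater than or equal to the value of
that maxExclusive.
minInclusive is among the members of
the facets of
the base type definition and the value is
less than the value of
that minInclusive.
minExclusive is among the members of
the facets of
the base type definition and the value is
less than or equal to the value of
that minExclusive.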
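The four conditions above can be checked in turn against the base type's range facets. This Python sketch handles numeric values only, using the standard decimal module, and assumes the base facets are given as a dictionary. The function name and messages are hypothetical.

```python
from decimal import Decimal
from typing import Dict, List

def max_inclusive_restriction_errors(value: Decimal, base: Dict[str, Decimal]) -> List[str]:
    """Check a derived maxInclusive value against the base type's range facets (sketch)."""
    errors = []
    if "maxInclusive" in base and value > base["maxInclusive"]:
        errors.append("greater than base maxInclusive")
    if "maxExclusive" in base and value >= base["maxExclusive"]:
        errors.append("not less than base maxExclusive")
    if "minInclusive" in base and value < base["minInclusive"]:
        errors.append("less than base minInclusive")
    if "minExclusive" in base and value <= base["minExclusive"]:
        errors.append("not greater than base minExclusive")
    return errors

print(max_inclusive_restriction_errors(Decimal(101), {"maxInclusive": Decimal(100)}))
```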
maxExclusive
maxExclusive is the exclusive upper
bound of the value space
for a datatype with the ordered property. The value of
maxExclusive
must be equal to some value
in the value space
of the base type or
be equal to the value in the base type definition.
The maxExclusive facet provides for:
Constraining a value space to values with a
specific exclusive upper
bound.
The following is the definition of a
datatype which limits values to integers less than or equal to
100, using maxExclusive.
]]>
Note that the
value space of this datatype is identical to
the previous one (named 'one-hundred-or-less').
The maxExclusive Schema Component
If fixed is true, then types for which
the current type is the base type definition cannot specify a
value for maxExclusive other than this facet's value.
XML Representation of maxExclusive Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The value must be equal to some value
in the value space of the base type definition.
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
maxExclusive Validation Rules
maxExclusive Valid
A value v
in an
is facet-valid with respect to
maxExclusive if
and only if one of the following is true:
if the
numeric property in
property
of the component in
is true, then the
value must be numerically less than
;
if the
numeric property in
property
of the component in
is false (i.e.,
is one of the date and time related
datatypes), then the value must be chronologically
less than ;
The component in
has a of true,
and
v is numerically less than
.
The component in
has a of false (i.e.,
is one of the date and time related
datatypes), and v is chronologically
less than .
Constraints on maxExclusive Schema Components
maxInclusive and maxExclusive
It is an error for both maxInclusive
and maxExclusive
to be specified in the same derivation step of a datatype definition.
minExclusive <= maxExclusive
It is an error for the value specified for
minExclusive to be greater than the value
specified for maxExclusive for the same datatype.
maxExclusive valid restriction
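It is an error if any of the following conditions
is true:
maxExclusive is among the members of
the facets of
the base type definition and the value is
greater than the value of
that maxExclusive.
maxInclusive is among the members of
the facets of
the base type definition and the value is
greater than the value of
that maxInclusive.
minInclusive is among the members of
the facets of
the base type definition and the value is
less than or equal to the value of
that minInclusive.
minExclusive is among the members of
the facets of
the base type definition and the value is
less than or equal to the value of
that minExclusive.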
minExclusive
minExclusive is the exclusive lower
bound of the value space
for a datatype with the ordered property. The value of
minExclusive
must be equal to some value
in the value space
of the base type or
be equal to the value in the base type definition.
The minExclusive facet provides for:
Constraining a value space to values with a
specific exclusive lower
bound.
The following is the definition of a
datatype which limits values to integers greater than or equal to
100, using minExclusive.
]]>
Note that the
value space of this datatype is identical to the
following
one (named 'one-hundred-or-more').
The minExclusive Schema Component
If fixed is true, then types for which
the current type is the base type definition cannot specify a
value for minExclusive other than this facet's value.
XML Representation of minExclusive Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The value must be equal to some value
in the value space of the base type definition.
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
minExclusive Validation Rules
minExclusive Valid
A value in an
is facet-valid with respect to
if and only if:
if the
numeric property in
property
of the component in
is true, then the
value
must be
numerically greater than
;
if the
numeric property in
property
of the component in
is false (i.e.,
is one of the date and time related
datatypes), then the value
must be
chronologically
greater than ;
The component in
has a of true,
and the value is numerically greater than
.
The component in
has a of false (i.e.,
is one of the date
and time related datatypes), and the value is chronologically greater
than .
Constraints on minExclusive Schema Components
minInclusive and minExclusive
It is an error for both minInclusive
and minExclusive
to be specified for the same datatype.
minExclusive < maxInclusive
It is an error for the value specified for
minExclusive to be greater than or equal to the value
specified for maxInclusive for the same datatype.
minExclusive valid restriction
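It is an error if any of the following conditions
is true:
minExclusive is among the members of
the facets of
the base type definition and the value is
less than the value of
that minExclusive.
maxInclusive is among the members of
the facets of
the base type definition and the value is
greater than the value of
that maxInclusive.
minInclusive is among the members of
the facets of
the base type definition and the value is
less than the value of
that minInclusive.
maxExclusive is among the members of
the facets of
the base type definition and the value is
greater than or equal to the value of
that maxExclusive.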
minInclusive
minInclusive is the inclusive lower
bound of the value space
for a datatype with the ordered property. The value of
minInclusive
must be equal to some value
in the value space of the base type.
The minInclusive facet provides for:
Constraining a value space to values with a
specific inclusive lower
bound.
The following is the definition of a
datatype which limits values to integers greater than or equal to
100, using minInclusive.
]]>
The minInclusive Schema Component
If fixed is true, then types for which
the current type is the base type definition cannot specify a
value for minInclusive other than this facet's value.
XML Representation of minInclusive Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The value must be equal to some value
in the value space of the base type definition.
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
minInclusive Validation Rules
minInclusive Valid
A value in an
is facet-valid with respect to
if and only if:
if the
numeric property in
property
of the component in
is true, then the
value
must be
numerically greater than or equal to
;
if the
numeric property in
property
of the component in
is false (i.e.,
is one of the date and time related
datatypes), then the value
must be
chronologically
greater than or equal to ;
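For numeric datatypes, the four range-facet validation rules (minInclusive, minExclusive, maxInclusive, maxExclusive) are plain comparisons. This Python sketch covers only the numeric case and uses the standard decimal module. It does not handle the date and time datatypes, whose chronological order is only partial. The function name is hypothetical, and None means that a facet is absent.

```python
from decimal import Decimal
from typing import Optional

def range_valid(v: Decimal,
                min_inclusive: Optional[Decimal] = None,
                min_exclusive: Optional[Decimal] = None,
                max_inclusive: Optional[Decimal] = None,
                max_exclusive: Optional[Decimal] = None) -> bool:
    """Numeric case of the four range-facet validation rules (sketch)."""
    if min_inclusive is not None and v < min_inclusive:
        return False
    if min_exclusive is not None and v <= min_exclusive:
        return False
    if max_inclusive is not None and v > max_inclusive:
        return False
    if max_exclusive is not None and v >= max_exclusive:
        return False
    return True

# For integer values, maxInclusive 100 and maxExclusive 101 admit the same values.
print([range_valid(Decimal(n), max_inclusive=Decimal(100)) for n in (100, 101)])  # [True, False]
print([range_valid(Decimal(n), max_exclusive=Decimal(101)) for n in (100, 101)])  # [True, False]
```

As the earlier notes on the example datatypes point out, these pairs of facets define identical value spaces only over the integers. For a decimal base type, 100.5 would satisfy maxExclusive 101 but not maxInclusive 100.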
Constraints on minInclusive Schema Components
minInclusive < maxExclusive
It is an error for the value specified for
minInclusive to be greater than or equal to the value
specified for maxExclusive for the same datatype.
minInclusive valid restriction
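It is an error if any of the following conditions
is true:
minInclusive is among the members of
the facets of
the base type definition and the value is
less than the value of
that minInclusive.
maxInclusive is among the members of
the facets of
the base type definition and the value is
greater than the value of
that maxInclusive.
minExclusive is among the members of
the facets of
the base type definition and the value is
less than or equal to the value of
that minExclusive.
maxExclusive is among the members of
the facets of
the base type definition and the value is
greater than or equal to the value of
that maxExclusive.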
totalDigits
totalDigits controls the maximum number of values in the
value space of datatypes derived from decimal, by restricting it to
numbers that are expressible as i × 10^-n where i and n are integers such that
|i| < 10^totalDigits and 0
<= n <= totalDigits. The value of
totalDigits must be a positiveInteger.
totalDigits
restricts the magnitude and of values in the
value spaces
of and
and datatypes derived from them.
The effect must be described separately for the two primitive types.
For ,
if the of is
t, the effect is to require that values be equal to
i / 10^n, for some
integers i and n, with
|i| < 10^t
and
0 ≤ n ≤ t.
As a consequence, the values are expressible
using at most t digits in decimal notation.
For , values with of
nV and
of aP, if the
of is
t, the effect is to require that
(aP + 1 + ⌊log10(|nV|)⌋) ≤ t, for values other than
zero, NaN, and the infinities. This means in effect that values are
expressible in scientific notation
using at most t digits for the coefficient.
The value of totalDigits
must be
a positiveInteger.
The term totalDigits is chosen to reflect the fact that
it restricts the to those values that
can be represented lexically using at most
totalDigits digits in
decimal notation, or at most totalDigits digits
for the coefficient, in scientific notation.
Note that it does not restrict
the directly; a lexical
representation that adds
non-significant leading or trailing
zero digits is still permitted.
It also has no effect on the values
NaN, INF, and -INF.
The totalDigits Schema Component
If is true, then types for which
the current type is the must not specify a
value for other than
.
XML Representation of totalDigits Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
totalDigits Validation Rules
totalDigits Valid
A value in a is facet-valid with respect to
if and only if:
that value is expressible as i × 10^-n where
i and n are integers such that
|i| < 10^ and
0 <= n <= .
A value v
is facet-valid with respect to a facet with
a of t if and only
if one of the following is true:
v is a value with
of
positiveInfinity,
negativeInfinity, notANumber,
or zero.
v is a value
with of nV and of aP, and
v is not NaN, INF, -INF, or zero,
and (aP + 1 + ⌊log10(|nV|)⌋) ≤ t.
v is a value equal to
i / 10^n,
for some
integers i and n, with
|i| < 10^t
and
0 ≤ n ≤ t.
Constraints on totalDigits Schema Components
totalDigits valid restriction
It is an if
is among the members of
of
and is
greater than the of
the parent
.
It is an if the owner's
has a facet
among its
and
is
greater than the of
that facet.
fractionDigits
fractionDigits controls the size of the minimum
difference between values in the of
datatypes from decimal, by
restricting the to numbers that are
expressible as i × 10^-n where i and n are integers and 0 <= n <= fractionDigits. fractionDigits places an upper limit on the of values: if the of
fractionDigits = f, then the value space is
restricted to values equal to
i / 10^n for some integers
i and
n with
0 ≤ n ≤ f.
The value of
fractionDigits must be a
The term fractionDigits is chosen to reflect the fact that it
restricts the to those values that can be
represented lexically
in decimal notation using at most
fractionDigits digits
to the right of the decimal point. Note that it does not restrict
the directly; a
lexical representation that adds
non-significant leading or trailing zero digits is still permitted.
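A minimal sketch of the fractionDigits test, again assuming exact `Fraction` values; the name `fraction_digits_valid` is hypothetical. It relies on the observation that v can be written as i / 10^n with 0 ≤ n ≤ f exactly when v × 10^f is an integer.

```python
from fractions import Fraction


def fraction_digits_valid(v: Fraction, f: int) -> bool:
    """Is v == i / 10**n for integers i, n with 0 <= n <= f?"""
    # If such i and n exist, v * 10**f = i * 10**(f - n) is an integer;
    # conversely, if v * 10**f is an integer, take n = f.
    return (v * 10**f).denominator == 1


print(fraction_digits_valid(Fraction("36.4"), 1))   # True
print(fraction_digits_valid(Fraction("36.45"), 1))  # False
```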
The following is the definition of a
datatype which could be used to represent the magnitude
of a person's body temperature on the Celsius scale.
This definition would appear in a schema authored by an "end-user"
and shows how to define a datatype by specifying facet values which
constrain the range of the .
The fractionDigits Schema Component
If is true, then
types for which the current type is the must not
specify a value for other
than .
XML Representation of fractionDigits Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
fractionDigits Validation Rules
fractionDigits Valid
A value in a
is facet-valid with
respect to
if and only if
that value is expressible as i × 10^-n where
i and n
are integers and 0 <= n <= .
that value is equal to
i / 10^n for integer
i and
n, with
0 ≤ n ≤ .
Constraints on fractionDigits Schema Components
fractionDigits less than or equal to totalDigits
It is an for
the of
to be greater than that of
.
fractionDigits valid restriction
It is an if is among the members of of
and
is greater than the of
the parent facet.
maxScale
maxScale places an upper limit on the of values: if the
of maxScale =
m, then only values with
≤ m
are retained in
the .
As a consequence, every value in the value space will have
equal to
i / 10^n for some integers
i and n, with
n ≤ m.
The of
must be an .
If it is negative, the numeric values of the datatype are
restricted to multiples of 10 (or 100, or …).
The term maxScale is chosen to reflect the fact that it
restricts the to those values that can
be represented lexically in scientific notation using an integer
coefficient and a scale (or negative exponent) no greater than
. (It has nothing to do with the use of the
term scale to denote the radix or base of a
notation.) Note that does not restrict the
directly; a lexical representation
that adds non-significant leading or trailing zero digits, or that uses
a lower exponent with a non-integer coefficient, is still permitted.
The following is the definition of a user-defined
datatype which could be used to represent a floating-point decimal
datatype which allows seven decimal digits for the coefficient and
exponents between −95 and 96. Note that the scale is −1 times
the exponent.
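A sketch of the value-space check the prose describes, assuming the definition uses totalDigits = 7, minScale = −96 and maxScale = 95 (the scale being −1 times the exponent, so exponents run from −95 to 96). These facet values are inferred from the prose, not copied from the elided definition, and a `Decimal` exponent again stands in for −aP.

```python
from decimal import Decimal

# Assumed facet values, inferred from the prose above.
TOTAL_DIGITS = 7
MIN_SCALE = -96   # exponent at most 96
MAX_SCALE = 95    # exponent at least -95


def in_value_space(v: Decimal) -> bool:
    if not v.is_finite():
        # NaN and the infinities have no arithmetic precision, so the scale
        # facets accept them; totalDigits exempts them as well.
        return True
    scale = -v.as_tuple().exponent  # aP, read from the Decimal exponent
    if not MIN_SCALE <= scale <= MAX_SCALE:
        return False
    # totalDigits exempts zero; otherwise aP + 1 + floor(log10|nV|) <= 7.
    return v.is_zero() or scale + 1 + v.adjusted() <= TOTAL_DIGITS
```

For example, `Decimal("9999999E96")` is accepted, while `Decimal("1E97")` (scale −97) and `Decimal("12345678")` (eight digits) are not.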
The maxScale Schema Component
If is true, then
types for which the current type is the
&must; not specify a value for other
than .
XML Representation of maxScale Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
maxScale Validation Rules
maxScale Valid
A value v is facet-valid with
respect to if and only if one of the
following is true:
v has
less than or equal to the
of .
The of v
is absent.
Constraints on maxScale Schema Components
maxScale valid restriction
It is an if
is among the members of of
and
is greater than the of that .
minScale
minScale places a lower limit on
the of values.
If the of minScale
is m, then the value space is restricted to values with
≥ m.
As a consequence, every value in the value space will have
equal to
i / 10^n for some integers
i and n,
with n ≥ m.
The term minScale is chosen to reflect the fact that it
restricts the to those values that can
be represented lexically in exponential form using an integer
coefficient and a scale (negative exponent)
at least as large as minScale. Note that
it does not restrict the directly; a
lexical representation that adds additional leading zero digits,
or that uses a larger exponent (and a correspondingly smaller coefficient)
is still permitted.
The following is the definition of a user-defined
datatype which could be used to represent amounts in a decimal
currency; it corresponds to a SQL column definition of
DECIMAL(8,2). The effect is to allow values
between -999,999.99 and 999,999.99, with a fixed interval
of 0.01 between values.
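A minimal sketch, assuming the elided definition sets totalDigits to 8 and both minScale and maxScale to 2. These facet values are a hypothetical reconstruction, chosen because they reproduce the range and the fixed 0.01 interval described above. The sketch derives the bounds from the facets and then checks finite `Decimal` values, reading −exponent as aP.

```python
from decimal import Decimal

# Hypothetical facet values for the DECIMAL(8,2) example.
TOTAL_DIGITS = 8
MIN_SCALE = 2
MAX_SCALE = 2

# Largest value: eight nines with two of them after the decimal point.
LARGEST = Decimal(10**TOTAL_DIGITS - 1).scaleb(-MAX_SCALE)  # 999999.99
STEP = Decimal(1).scaleb(-MIN_SCALE)                        # 0.01


def fits_decimal_8_2(v: Decimal) -> bool:
    """Finite values only; special values are outside this sketch."""
    if not v.is_finite():
        raise ValueError("sketch handles finite values only")
    scale = -v.as_tuple().exponent  # arithmetic precision
    if not MIN_SCALE <= scale <= MAX_SCALE:
        return False
    return v.is_zero() or scale + 1 + v.adjusted() <= TOTAL_DIGITS
```

Because minScale constrains the value's arithmetic precision, `Decimal("5")` (precision 0) is rejected while `Decimal("5.00")` is accepted.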
The minScale Schema Component
If is true, then types for which
the current type is the
&must; not specify a
value for other than
.
XML Representation of minScale Schema Components
The XML representation for a schema
component is a element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:
The actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
minScale Validation Rules
minScale Valid
A value v is facet-valid with
respect to
if and only if one of the following is true:
v has
greater than or equal to
the
of .
The of v
is absent.
Constraints on minScale Schema Components
minScale less than or equal to maxScale
It is an for to
be greater than .
Note that it is not an error for to
be greater than .
minScale valid restriction
It is an if
is among the members of of
and
is less than the of that .
lexicalMappings
The facet is new; like the datatype , it has been added in version 1.1 of this specification.
The XML Schema Working Group is interested in feedback from users,
schema authors, and software developers on whether it is useful and
should be retained, or not.
It has been suggested that the facet be
made applicable also to other numeric types (,
, , and datatypes derived
from them); the Working Group is also interested in hearing the
community's views on this question.
The lexicalMappings facet
restricts the of datatypes in a controlled way. When the lexical
space is constrained using facets, it is possible
to produce datatypes for which some values have no canonical lexical
representations in the lexical space; when the
lexicalMappings facet is used, the
is automatically adjusted
appropriately.
The facet does not restrict the
directly; but if
scientific is not among the s in the
value, then the may be diminished. For
example, some values have lexical representations only
in scientific notation. If nodecimal is the only present
then only integer values with a
of zero have lexical
representations and
are hence in the .
A value v can be serialized
successfully for a datatype constrained by
if any of the following are true:
scientific is a member of , or
decimal is a member of
and v's
is not less than zero, or
v's
is equal to zero or is absent.
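The serialization rule above can be sketched as a small predicate. This assumes the property elided in the last two conditions is the value's arithmetic precision, represented here as an `int`, or `None` when it is absent. The function name is hypothetical.

```python
from typing import AbstractSet, Optional


def can_serialize(mappings: AbstractSet[str], ap: Optional[int]) -> bool:
    """ap is the value's arithmetic precision, or None when it is absent."""
    if "scientific" in mappings:
        return True  # scientific notation can express any precision
    if "decimal" in mappings and ap is not None and ap >= 0:
        return True  # decimal notation can show ap >= 0 fraction digits
    # Every mapping can represent values with precision zero or absent.
    return ap is None or ap == 0
```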
The lexicalMappings Schema Component
controls the usability of the
three partial lexical
mappings, , , and , by restricting
the .
If is true, then types
for which the current type is the
&must; not specify a value for
other than .
XML Representation of lexicalMappings Schema Components
The XML representation for a schema component is
a
element information item. The correspondences between the properties
of the information item and properties of the component are as
follows:
The set of s named in the list which is the actual value of the value [attribute]
The actual value of the fixed [attribute], if present, otherwise false
The annotations corresponding to all the
element information items in the [children], if any.
lexicalMappings Validation Rules
Lexical Representation Valid against lexicalMappings facet
A L is
facet-valid
with respect to if
and only if
any one of
the following is true:
nodecimal is a member of and L is a ,
or
decimal is a member of
and L is a , or
scientific is a member of and L is a ,
or
L is a .
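A sketch of this lexical check using regular expressions. It assumes the elided production names are, in order, noDecimalPtNumeral, decimalPtNumeral, scientificNotationNumeral and numericalSpecialRep from the numeral grammar in this document; the regex renderings and the function name are illustrative, not normative.

```python
import re
from typing import AbstractSet

# Regular-expression renderings of the numeral productions (assumed mapping).
NO_DECIMAL_PT = re.compile(r"[+-]?[0-9]+")
DECIMAL_PT = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)")
SCIENTIFIC = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)[eE][+-]?[0-9]+")
SPECIAL = {"INF", "+INF", "-INF", "NaN"}


def lexical_valid(lex: str, mappings: AbstractSet[str]) -> bool:
    if lex in SPECIAL:
        return True
    if "nodecimal" in mappings and NO_DECIMAL_PT.fullmatch(lex):
        return True
    if "decimal" in mappings and DECIMAL_PT.fullmatch(lex):
        return True
    return "scientific" in mappings and SCIENTIFIC.fullmatch(lex) is not None
```

`[0-9]` is used instead of `\d` because Python's `\d` also matches non-ASCII digits.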
Constraints on lexicalMappings Schema Components
lexicalMappings valid restriction
It is an if
is not a subset
of the of the
among the
of the
.
lexicalMappings Schema Component Inheritance
If during a derivation a new
is not prescribed, the new's has the same property values
as the parent's
. In any case, there is exactly one
facet in the
set
of each
from .
Derivation and the Derivation Hierarchy
Derivation is a simple concept.
A
is immediately derived from another if its
is the other. A
is derived from another if it is
from the other or (recursively) from a third
that is itself
from the other.
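The recursive definition of "derived" amounts to walking up the chain of base types. The sketch below assumes a hypothetical `SimpleTypeDef` class whose `base` field stands in for the base type definition property; identity comparison is used because component identity, not structural equality, is what matters here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class SimpleTypeDef:
    """Hypothetical stand-in for a simple type definition component."""
    name: str
    base: Optional["SimpleTypeDef"] = None  # None at the root of the hierarchy


def is_derived_from(t: SimpleTypeDef, other: SimpleTypeDef) -> bool:
    """True if t is immediately derived from other, or derived via a chain."""
    current = t.base
    while current is not None:
        if current is other:
            return True
        current = current.base
    return False
```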
The constraints on the various facets and the explicit descriptions of
immediate derivations for
the s defined in this specification ensure that
derivation puts the s
in a tree-structured hierarchy. This in turn induces a matching hierarchy
on the datatypes selected/defined by s. (A datatype
is derived from another if its corresponding is derived
from that of the other.)
The hierarchy is a subtree of the
type hierarchy of Simple and Complex Types described
in Part 1 of this specification. In the context
of that larger hierarchy, all derivations of s are by
restriction.
Various Kinds of s
The built-ins
are those defined in
,
and are a priori in every schema; they form the base of
the hierarchy. There are three
kinds of s (and
their corresponding datatypes):
and
are the two specials; all others, whether or
not, are ordinary.
The of specials is absent.
s
from the
are primitive.
Other built-ins are those which could have been introduced into schemas by
users (with a different ), but are
considered sufficiently useful that they are made
to ensure their availability everywhere with common names.
User-defineds (over and above the s) may be added to the hierarchy in any schema.
They may be added in three ways:
List
construction. The result is a whose
is
and whose is list.
Union
construction. The result is a whose
is
and whose is union.
Direct
derivation. The result is a whose
is a and
whose is the same as that of its
. All
directly deriveds have value
spaces and lexical
spaces that are subsets of those of their
s. All
s not
from the or the
are directly
derived from their s.
Ordinarys and their
associated datatypes are atomic, list, or union.
Atomics are
those derived from the ; their
is atomic and their
is not absent.
Lists
are those whose is list; they
are from the
and their is not absent.
Unions
are those whose is union; they
are derived from the and their
is not absent.
In each of the above, ,
, and
are absent
unless otherwise specified.
The Structure of
Ordinary Datatypes
The most basic datatypes are . These
begin with the datatypes. Their
value spaces are all disjoint
(at least for schema processing purposes) and are defined in this
specification. From these are
all of the other datatypes. (Note that
is not itself .)
Each datatype has a
consisting of sequences
(lists) of values from an datatype (the
item type); lexical
representations are space-separated representations of values
from the item type. These begin with
explicitly constructed
lists (which are
from ). From these are
all of the other
datatypes.
Only values from the item type which have
lexical
representations without internal whitespace
can be in the lists of values, since otherwise the sequence
of values in which they might occur would have no
.
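A sketch of how a list's lexical representation maps to a sequence of item values, assuming a hypothetical `parse_item` callable for the item type. It splits only on the XML whitespace characters (#x20, #x9, #xD, #xA), which is why an item whose own representation contains whitespace could never survive as a single list member.

```python
import re
from typing import Callable, List, TypeVar

T = TypeVar("T")


def parse_list(lexical: str, parse_item: Callable[[str], T]) -> List[T]:
    # Split on runs of XML whitespace only, dropping empty leading/trailing pieces.
    tokens = [tok for tok in re.split(r"[ \t\r\n]+", lexical) if tok]
    return [parse_item(tok) for tok in tokens]


print(parse_list(" 1 2\t3 ", int))  # [1, 2, 3]
```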
Similarly, each datatype
has a consisting of
values from any and all of a sequence of
datatypes (the member types). These begin
with explicitly constructed
unions (which are
from ). From these
are all of the other
datatypes.
Strings that are lexical
representations of values from more than one member type are
mapped to the represented value from the first such datatype in the
sequence of member types. Other values whose
lexical representations
are all intercepted in this way can only be represented
in elements by using an xsi:type attribute.
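The first-member-wins rule can be sketched as follows. Python's `int` and `float` are used only as hypothetical stand-ins for member types; they accept some strings (for example `"1_000"`) that are not in the corresponding XML Schema lexical spaces.

```python
from typing import Any, Callable, List, Tuple

# A member "parser" raises ValueError for strings outside its lexical space.
Parser = Callable[[str], Any]


def parse_union(lexical: str, members: List[Tuple[str, Parser]]) -> Tuple[str, Any]:
    """Map a string to a value of the first member type that accepts it."""
    for name, parse in members:
        try:
            return name, parse(lexical)
        except ValueError:
            continue  # not in this member's lexical space; try the next one
    raise ValueError(f"{lexical!r} is not valid for any member type")


members = [("integer", int), ("float", float)]
print(parse_union("12", members))   # ('integer', 12)
print(parse_union("1.5", members))  # ('float', 1.5)
```

Here `"12"` is always intercepted by the integer member, so the float value 12.0 has no lexical representation of its own in the union.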
The Shape of the Hierarchy
Auxiliary Components
Conformance
This specification describes two levels of conformance for
datatype processors. The first is
required of all processors. Support for the other will depend on the
application environments for which the processor is intended.
Minimally conforming processors
completely and correctly implement the and
.
Processors which accept schemas in the form of XML documents as described
in (and other relevant portions of
) are additionally said to provide
conformance to the XML Representation of Schemas,
and , when processing schema documents, completely and
correctly implement all
s
in this specification, and adhere exactly to the
specifications in (and other relevant portions of
) for mapping
the contents of such
documents to schema components
for use in validation.
By separating the conformance requirements relating to the concrete
syntax of XML schema documents, this specification admits processors
which validate using schemas stored in optimized binary representations,
dynamically created schemas represented as programming language data
structures, or implementations in which particular schemas are compiled
into executable code such as C or Java. Such processors can be said to
be minimally conforming
but not necessarily in conformance to
the XML Representation of Schemas.
Schema for Datatype Definitions (normative)
DTD for Datatype Definitions (non-normative)
Temporary Stuff (to be added elsewhere)
All
processors must support year values
with a minimum of 4 digits (i.e.,
YYYY) and a minimum fractional second precision of
milliseconds or three decimal digits (i.e., s.sss).
However, processors
may set an application-defined limit on the maximum number
of digits they are prepared to support in these two cases, in which
case that application-defined maximum number
must be clearly documented.
Derived
datatypes are those that are
defined in terms of other datatypes. Derived
datatypes are and
.
Built-up Value Spaces
Some datatypes, such as , describe well-known mathematically abstract
systems. Others, such as the date/time datatypes, describe real-life,
applied systems. Certain
of the systems described by datatypes, both abstract and
applied, have values in their value spaces most easily described as things having several properties, which in turn have values which are
in some sense primitive or are from the value spaces of simpler datatypes.
In this document, the arguments to functions are assumed to be call by
value unless explicitly noted to the contrary, meaning that if the argument is modified
during the processing of the algorithm, that modification is not reflected in the
outside world. On the other hand, the arguments to procedures are assumed
to be call by location, meaning that modifications are so reflected,
since that is the only way the processing of the algorithm can have any effect.
Properties always have values. An optional
property is permitted but not required to have the special
value absent.
Those values that are more primitive, and are used (among other things) herein to
construct object value spaces but which we do not explicitly define are described here:
A number (without precision) is an
ordinary mathematical number; see for a discussion of
ordinary versus precision-carrying numbers. The numbers generally used in describing datatypes are &decimal;s and &integer;s.
An enumerated constant is an
undefined thing whose only property is that it is unequal to any other
constants and to any member of any defined datatype.
(There are a few
constants which are specified by name to be members of the value space of more than
one primitive datatype. Such constants are differentiated by their name and
associated datatype; this is because members of the value space of distinct primitive
datatypes are always distinct.) Apart from that, constants are differentiated one
from the other by their name. They have no other inherent properties; their effect is defined
in the context in which they occur. Examples of constants are positiveInfinity and absent.
Numerical Values
The following standard operators are defined here in case the reader is unsure of their definition:
If m and n are numbers, then
m div n is the greatest integer less than or equal to
m / n.
If m and n are numbers, then
m mod n is
m − n × (m div n).
n div 1 is a convenient and short way of
expressing the greatest integer less than or equal to n.
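A small sketch of these operators using exact rational arithmetic, so that the floor is never affected by binary floating-point rounding. The function names mirror the operators but are otherwise hypothetical.

```python
import math
from fractions import Fraction


def div(m, n) -> int:
    """Greatest integer less than or equal to m / n (exact, via Fraction)."""
    return math.floor(Fraction(m) / Fraction(n))


def mod(m, n) -> Fraction:
    """m - n * (m div n)."""
    return Fraction(m) - Fraction(n) * div(m, n)


print(div(-7, 2), mod(-7, 2))  # -4 1
print(div(Fraction(7, 2), 1))  # 3
```

Note that div rounds toward negative infinity, so `-7 div 2` is −4 rather than −3, and the corresponding mod is non-negative for a positive divisor.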
Precision
Numbers are sometimes thought of as including both a numerical value and a
precision. Precision can be thought of as a band
plus or minus from the numerical value itself. For
example, five plus-or-minus two or two million to the
nearest thousand.
There is a smaller class of precision numbers which do
not require the plus-or-minus in order to indicate their
precision. They indicate their precision by the number of digits to the
right of the decimal point. 5.0 has precision plus-or-minus 0.05, but
5.00 has precision plus-or-minus 0.005.
There is also a kind of precision
where the plus-or-minus is expressed as a percentage (or other proportion) of
the numerical value, rather than an exact value: 15 plus-or-minus
10 percent or 15000 plus-or-minus 10 percent, where the
same percentage indicates a different absolute precision depending on the
size. This kind of precision is properly called geometric
precision; the absolute precision first described is properly called
arithmetic precision.
A close approximation to geometric precision also can, for some combinations
of numerical value and precision, be indicated without the
plus-or-minus: The precision is indicated by the total
number of digits (not counting leading zero digits). 5.0 has precision
plus-or-minus 1 percent but 5.00 has precision plus-or-minus one-tenth percent.
Geometric precision does not quite match the digit count: 5.0 and 50
both have precision plus-or-minus 1 percent, but 1.5 and 15 both have precision
of roughly plus-or-minus 3 percent. For various reasons we choose to call this digit-count
precision floating-point precision.
The datatype described in this specification
embodies both arithmetic and floating-point precision for numbers whose numerical
values are &decimal;s, with arithmetic precision describable simply
by the number of fraction digits. It turns out that for these particular precision
numbers, there is a relation between the arithmetic precision (expressed as the
number of fraction digits) and floating-point precision (expressed as the total
number of digits, excluding redundant leading zero digits). If a
is the arithmetic precision of a number whose
numerical value is n, then the floating-point precision is
⌊log10(|n × 10^a|) + 1⌋.
This formula, of course, doesn't work for numerical value zero. In that case, we find it convenient (and
consonant with established practice) to freeze floating-point precision at 1 and still allow various
arithmetic precision values.
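Evaluating ⌊log10(|n × 10^a|) + 1⌋ with floating-point logarithms is fragile near powers of ten. A minimal sketch can instead use the fact that n × 10^a is an integer when n carries arithmetic precision a, and count that integer's decimal digits. The function name is hypothetical.

```python
from fractions import Fraction


def floating_point_precision(n: Fraction, a: int) -> int:
    """Digit count for a number with numerical value n and arithmetic precision a."""
    if n == 0:
        return 1  # frozen at 1 for zero, per the convention above
    i = abs(n) * Fraction(10) ** a  # the integer coefficient of n at precision a
    if i.denominator != 1:
        raise ValueError("n has more than a fraction digits")
    # floor(log10(i) + 1) is the number of decimal digits of the integer i.
    return len(str(i.numerator))
```

For instance, 5.0 (n = 5, a = 1) gives 2, and one million to the nearest thousand (n = 10^6, a = −3) gives 4.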
One point needs to be made about the notations and the precisions they can
indicate. Ordinary decimal notation cannot indicate a
negative arithmetic precision (as in one million to the nearest thousand, which has −3 fraction digits);
this requires scientific notation: 1000E3 (or 1.000E6).
Exact Lexical Mappings
Numerals and Fragments Thereof
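digit ::= [0-9]
unsignedNoDecimalPtNumeral ::= digit+
signedDecimalPtNumeral ::= ('+' | '-') unsignedDecimalPtNumeral
noDecimalPtNumeral ::= ('+' | '-')? unsignedNoDecimalPtNumeral
fracFrag ::= digit+
unsignedDecimalPtNumeral ::= (unsignedNoDecimalPtNumeral '.' fracFrag?) | ('.' fracFrag)
unsignedFullDecimalPtNumeral ::= unsignedNoDecimalPtNumeral '.' fracFrag
decimalPtNumeral ::= ('+' | '-')? unsignedDecimalPtNumeral
unsignedScientificNotationNumeral ::= (unsignedNoDecimalPtNumeral | unsignedDecimalPtNumeral) ('e' | 'E') noDecimalPtNumeral
scientificNotationNumeral ::= ('+' | '-')? unsignedScientificNotationNumeral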
Some numerical datatypes include some or all of three non-numerical values:
positiveInfinity, negativeInfinity, and notANumber. Their lexical spaces
include non-numeral lexical representations for these non-numeric values:
Special Non-numerical Lexical Representations Used With Numerical Datatypes
numericalSpecialRep ::= 'INF' | '+INF' | '-INF' | 'NaN'
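A sketch of a lexical mapping for the numeral productions, returning both the numerical value and an arithmetic precision. It assumes the precision of a numeral is its count of fraction digits minus its exponent, which agrees with the precision discussion earlier (1000E3 is one million to the nearest thousand, precision −3). The special representations are not handled, and the names are hypothetical.

```python
import re
from fractions import Fraction
from typing import Tuple

# Matches noDecimalPtNumeral, decimalPtNumeral and scientificNotationNumeral.
NUMERAL = re.compile(
    r"(?P<sign>[+-]?)"
    r"(?:(?P<int>[0-9]+)(?:\.(?P<frac>[0-9]*))?|\.(?P<frac2>[0-9]+))"
    r"(?:[eE](?P<exp>[+-]?[0-9]+))?"
)


def parse_numeral(lex: str) -> Tuple[Fraction, int]:
    """Return (numerical value, arithmetic precision) for a numeral."""
    m = NUMERAL.fullmatch(lex)
    if m is None:
        raise ValueError(f"not a numeral: {lex!r}")
    int_part = m.group("int") or ""
    frac_part = m.group("frac") or m.group("frac2") or ""
    exp = int(m.group("exp") or 0)
    coeff = int(int_part + frac_part)  # all digits, decimal point removed
    value = Fraction(coeff, 10 ** len(frac_part)) * Fraction(10) ** exp
    if m.group("sign") == "-":
        value = -value
    return value, len(frac_part) - exp
```

With this sketch, `"5.00"` maps to (5, 2), and both `"1000E3"` and `"1.000E6"` map to (1000000, −3).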
Date/time Values
RQ-122 (define
dateTime value space)
Much of the material defining the various date/time datatypes is
found here and is or will be referenced in the sections defining each
individual date/time datatype. See e.g. .
There are several different primitive but
related datatypes defined in the specification which pertain to
various combinations of dates and times, and parts thereof. They
all use related value-space models, which are described in detail in
this section. It is not difficult for a casual reader of the
descriptions of the individual datatypes elsewhere in this
specification to misunderstand some of the details of just what the
datatypes are intended to represent, so more detail is presented here
in this section.
All of the value spaces for dates and times
described here represent moments or periods of time in Universal
Coordinated Time (UTC).
Universal
Coordinated Time (UTC)
is an adaptation of TAI which closely approximates UT1 by adding
leap-seconds to selected
days.
A
leap-second is an additional second added
to the last day of December, June, September, or March,
when such an adjustment is deemed necessary by the
International Earth Rotation and Reference Systems Service
in order to keep within 0.9 seconds
of observed astronomical time. When leap seconds are
introduced, the last minute in the day has more than
sixty seconds.
In theory leap seconds can also be removed from a
day, but this has not yet occurred.
Dates and Times in the Real World
Except for the tables of lengths of months
and occurrences of leap-seconds, this section is informative, not
normative.
There are various concepts involving dates (counting days) and
times (counting moments) that have developed over the millennia.
This section does not pretend to be a complete tutorial on the
history; it only discusses the methods which are necessary to
understand just which set of the possible reasonable choices has been
adopted for Schema date/time datatypes.
Seconds, Minutes, and Days
A day is, at least approximately, the time of one rotation of the
Earth about its axis with respect to the Sun. Each day is
divided into 24 hours; each hour into 60 minutes, and each minute
usually into 60 seconds. (The hedges in those
sentences are deliberate, and their resolution shows why one must be
careful to ensure that all users of Schema date/time datatypes are in
fact correctly using the same datatype.) For the purposes of
this section and the next, a day is always centered on
(i.e., has its noon at) the moment when the rotation of the Earth about its axis
places the Sun exactly (at least for UT1, and approximately for the
others) overhead (at its zenith)
at 0 degrees longitude.
Thus a day is (usually) 86400
(= 60 × 60 × 24) seconds.
Universal Time 1
(UT1) is real time: One day is (exactly,
or at least as close as can be astronomically measured) one revolution
of the Earth about its axis with respect to the Sun. The day is
divided into 86400 equal-length seconds, which may vary in length from
day to day. International Atomic Time (TAI or
Temps Atomique International) is time measured in seconds
as established by a collection of atomic clocks maintained by various
national standards agencies. The time counts that
Schema has chosen to represent are based on UTC: Universal Coordinated Time
(UTC) is an adaptation of TAI which closely approximates
UT1 by adding leap-seconds to
selected days. Relations
between them are as follows:
TAI seconds are all the same length, and there are exactly
86400 seconds in each day.
UT1 seconds vary in length, but there are exactly 86400
seconds each day. Days always have the sun at zenith at noon in
Greenwich, England. (As a historical note, the SI second on which TAI is based,
defined in 1967 in terms of the radiation frequency of cesium atoms,
was chosen to match the ephemeris second, which had been defined in 1956
in terms of the length of the year 1900.)
Noon of TAI days does not necessarily coincide with the Sun at the
zenith. In 1958, TAI was promulgated and synchronized with UT1.
Since then, the difference has been slowly increasing, with a given
number of seconds from that date measured in UT1 coming later than
that same number measured in TAI.
seconds are the same as TAI
seconds, but day boundaries are kept approximately in sync
with UT1 by adding an extra leap-second or so to a day
once in a while; therefore occasionally a day is not exactly 86400 seconds. In
1972,
was synchronized with TAI (and UT1) to lock them all together retroactively to the date when TAI was synchronized with
UT1. is now kept
within 0.9 seconds of UT1 by an international standards organization
which declares on an ad hoc basis when additional
leap-seconds are added (or subtracted, although the physical
situations that might require subtraction seem unlikely to
occur). As of 2003, the difference between TAI and UTC is 32
seconds. New leap-seconds are always added immediately preceding
midnight (when the Earth's rotation puts the Sun opposite the noon
zenith) at 0 degrees longitude (i.e., midnight in the timezone so
determined).
As of the writing of this specification, leap-seconds have been added
to at the end of each of the
following days (as identified by the Gregorian calendar, see ), and no future leap-seconds have been
announced:
Date
Number of Leap-seconds
Date
Number of Leap-seconds
1960-12-31
1.422818
1975-12-31
1
1961-07-31
0.224752
1976-12-31
1
1961-12-31
0.198288
1977-12-31
1
1963-10-31
0.8514208
1978-12-31
1
1963-12-31
0.0685152
1979-12-31
1
1964-03-31
0.217936
1981-06-30
1
1964-08-31
0.298288
1982-06-30
1
1964-12-31
0.258112
1983-06-30
1
1965-02-28
0.176464
1985-06-30
1
1965-06-30
0.258112
1987-12-31
1
1965-08-31
0.180352
1989-12-31
1
1965-12-31
0.158112
1990-12-31
1
1968-01-31
1.872512
1992-06-30
1
1971-12-31
3.814318
1993-06-30
1
1972-06-30
1
1994-06-30
1
1972-12-31
1
1995-12-31
1
1973-12-31
1
1997-06-30
1
1974-12-31
1
1998-12-31
1
Leap-seconds added prior to 1972-06-30 (when
's first post-adoption leap-second was
added) were inherited from previous standard
times. (Data in the table was derived from data provided by
the US Naval Observatory.)
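A sketch of how the integer leap-seconds listed in the table affect day lengths and the TAI−UTC offset. It assumes the well-known starting offset of 10 seconds at 1972-01-01, which is not stated in the table itself, and models no leap-seconds after 1998-12-31; the names are hypothetical.

```python
from datetime import date

# Integer leap-seconds from 1972-06-30 onward, as listed in the table above.
LEAP_SECOND_DAYS = frozenset(
    date.fromisoformat(d)
    for d in (
        "1972-06-30", "1972-12-31", "1973-12-31", "1974-12-31", "1975-12-31",
        "1976-12-31", "1977-12-31", "1978-12-31", "1979-12-31", "1981-06-30",
        "1982-06-30", "1983-06-30", "1985-06-30", "1987-12-31", "1989-12-31",
        "1990-12-31", "1992-06-30", "1993-06-30", "1994-06-30", "1995-12-31",
        "1997-06-30", "1998-12-31",
    )
)


def utc_day_length(day: date) -> int:
    """Number of UTC seconds in the given day."""
    return 86400 + (1 if day in LEAP_SECOND_DAYS else 0)


def tai_minus_utc(day: date) -> int:
    """TAI - UTC in seconds at the start of `day`, for days from 1972-01-01 on."""
    # Assumed: TAI - UTC was 10 s on 1972-01-01; each later leap-second adds 1.
    return 10 + sum(1 for d in LEAP_SECOND_DAYS if d < day)
```

This reproduces the 32-second difference quoted for 2003.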
There are inherently no precise measurements of the difference
between UT1 on the one hand and proleptic (i.e., used to measure times
prior to their adoption) TAI and on
the other before 1958, although they are known (by virtue of early
astronomical records) to differ from UT1 by several hours around year
0000. Users must be aware that they differ, if they deal with
extremely accurate measures over widely separated moments, and must be
sure they know which system is being used.
Schema date/time datatypes (except ) are
leap-second-aware; that is to say, they use rather than UT1 or TAI. is a special case; it is
not leap-second aware, but the algorithm for adding
durations to or subtracting them from other date/time datatypes
compensates.
Counting Days: Years and Months
Once one decides on how many seconds are in each day, one must also
count the days—and months and years. The standard used for
Schema date/time datatypes is the so-called Gregorian
calendar. Since days are (generally) 86400 seconds, and
one wants each year to correspond to one complete cycle of the Earth
around the Sun (which is not exactly a multiple of 86400 seconds), and
traditionally months have various numbers of days, the following
algorithm was chosen to determine which days fell in which months in
which years: Counting from an agreed-upon arbitrary day, years are
numbered consecutively, each year has 12 months (numbered 1 through
12, as well as named) within it, and each month has between 28 and 31
days (also numbered from 1), depending on the month and year according
to the following table:
Month
Number of Days
1 (January)
31
2 (February)
If the associated year is divisible by 400, or by 4 but not 100,
then 29; otherwise 28
3 (March)
31
4 (April)
30
5 (May)
31
6 (June)
30
7 (July)
31
8 (August)
31
9 (September)
30
10 (October)
31
11 (November)
30
12 (December)
31
For example, the three numbers (year, month, and day) for 20
January 2003 (2003-01-20) are 2003, 1, and 20 respectively.
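The month-length table can be sketched as a function over the proleptic Gregorian calendar. The name is hypothetical. Python's `%` returns a non-negative remainder for a positive divisor, so the same leap-year rule also applies to year 0 and to negative years.

```python
def days_in_month(year: int, month: int) -> int:
    """Days in a month of the proleptic Gregorian calendar (year 0 allowed)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    if month == 2:
        leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


print(days_in_month(2000, 2), days_in_month(1900, 2))  # 29 28
```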
RQ-123 (year 0000 in date/time datatypes)
The following rewrite includes allowing year 0000 (1 BCE) and
redefining all the lexical representations with negative years from
that specified in Schema 1.0, as warned in a Note in Schema 1.0
2E. A formal Note calling attention to this change elsewhere in
the "normative" part of this specification will be added.
The counts of years, months, and days were made official and locked
to real time by decree of (the Roman Catholic) Pope
Gregory in 1582 (from which comes the name
Gregorian). Since then, and somewhat even before,
days had been counted with reasonable historical accuracy so that the
Gregorian calendar algorithm can even be used proleptically, i.e., to
establish dates prior to its official adoption. By relatively
recent convention (it began to be adopted by astronomers during the
1800s), there is a year numbered zero; this makes calculating the
difference between two dates easier. The year called 1 of
the Common Era (1 CE, or 1 AD)
is numbered one; the preceding year is numbered zero, not minus
one. (Warning: The date using the proleptic Gregorian calendar
will not generally be the same for a given day as the date using the
Julian calendar which was in common use prior to the
adoption of the Gregorian calendar, nor will Gregorian years
before the Common Era (BCE, or
BC) be numbered the same as with the current standard
negative numbering.)
There are also standard schemes for numbering days without
reference to months and years. The most common is the modified Julian date
(MJD), which counts days from 17 Nov 1858
(1858-11-17). The older count is the Julian date (JD), which sets
its zero day exactly 2,400,000.5 days earlier than MJD. (JD
days begin at noon!) For the purpose of describing certain functions,
however, Schema counts seconds rather than days and arbitrarily places
its initial moment at the beginning of 1 Jan 1 CE (0001-01-01). (Since
a schema implementation need not expose this count, implementers are
free to use other base moments and/or to count by days, provided they
retain awareness of leap-seconds.)
Note that the JD day-counting scheme is not the same
as the Julian calendar which was supplanted by the
Gregorian calendar described above.
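Both day counts are easy to reproduce with Python's standard datetime module, whose date type uses the proleptic Gregorian calendar. The sketch below assumes dates taken at 00:00 UTC and ignores leap-seconds, which neither day count tracks; the helper names are hypothetical.

```python
from datetime import date

# Day 0 of the modified Julian date count.
MJD_EPOCH = date(1858, 11, 17)


def mjd(d: date) -> int:
    """Modified Julian date of d at 00:00 UTC (proleptic Gregorian dates)."""
    return (d - MJD_EPOCH).days


def jd_at_midnight(d: date) -> float:
    """Julian date at 00:00 UTC on d; JD days begin at noon, hence the .5."""
    return mjd(d) + 2400000.5
```

For example, 2000-01-01 is MJD 51544, and its midnight is JD 2451544.5.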
Timezones: When does a Day Start?
All of the preceding discussion applies to real
times at the Greenwich meridian, the meridian
where longitude is 0 degrees. Human society has found it
convenient to have noon all over the globe at least approximately when
the Sun is overhead—and more recently also to have moments numbered
the same in nearby localities, with the differences between separated
localities well-known. Thus the invention of timezones. A
timezone is a way of describing a local time by
specifying the number of hours and minutes which must be added to the
standard time to get the local time. The
standard time is selected to be that where noon is when
the Sun is exactly overhead at 0 degrees longitude; UTC is officially
locked to that particular timezone. Schema date/time datatypes (except
duration) are timezone-sensitive; that is to say, they retain
knowledge of a timezone if one is specified in a lexical
representation.
A moment in time is like a point on a line; the point does not
change if we change where we put zero on the line, but the number we
use to represent that point changes. Similarly, when one
specifies a moment in time, one can specify the same moment regardless
of which timezone one specifies, but the numbers one uses for year,
month, day, hour, minute, and second will be different.
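The point-on-a-line picture can be checked with Python's datetime module, where two timezone-aware values compare equal when they denote the same moment. The helper at_offset is hypothetical; offsets are given in minutes, the unit used for the timezone property in this draft.

```python
from datetime import datetime, timedelta, timezone


def at_offset(y: int, mo: int, d: int, h: int, mi: int, offset_minutes: int) -> datetime:
    """A local time at a fixed offset from UTC, with the offset in minutes."""
    return datetime(y, mo, d, h, mi, tzinfo=timezone(timedelta(minutes=offset_minutes)))


# One moment written in two timezones: the numbers differ, the moment does not.
eastern = at_offset(2003, 1, 20, 10, 0, -300)  # 2003-01-20T10:00:00-05:00
utc = at_offset(2003, 1, 20, 15, 0, 0)         # 2003-01-20T15:00:00Z
```

Here the local time 10:00 plus 5 hours gives the UTC time 15:00, consistent with the timezone being the amount added to standard time to get local time.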
The Seven-property Model
There are two distinct ways to model moments in time: either
by tracking their year, month, day, hour, minute and second (with
fractional seconds as needed), or by tracking
their time (measured generally in seconds or
days) from some starting moment. Each has
its advantages. The two are isomorphic;
the Gregorian calendar algorithm, modified for
leap-seconds,
is the isomorphism from the first to the
second and is one-to-one. For
definiteness, we choose to model the first
using five &integer; and one &decimal; properties. We superimpose
the second by providing one &decimal;-valued
function which gives the corresponding count of
seconds from zero (the time on the time line).
There is also a seventh property which
specifies the timezone. Values for the
six primary properties are always stored as
their local values (the values shown in the lexical
representations), rather than converted to
UTC.
Properties of Date/time Seven-property Models
year: an &integer;
month: an &integer; between 1 and 12 inclusive
day: an &integer; between 1 and 28, 29, 30, or 31 inclusive, possibly restricted further depending on month and year
hour: an &integer; between 0 and 23 inclusive, or 24 if both minute and second are zero
minute: an &integer; between 0 and 59 inclusive
second: a &decimal; greater than or equal to 0 and less than 61, always subject to a datatype-dependent leap-second restriction; must be less than 60 if timezone is absent
timezone: an &integer; between −840 and 840 inclusive
The model just described is called herein the
seven-property model for date/time
datatypes. It is used as is
for dateTime; all other date/time
datatypes except duration use the
same model, except that some of the six primary
properties are required to have the
value absent instead of being required
to have a numerical value. (An optional
property, like timezone,
is always permitted
to have the value absent.)
Values of timezone are limited to 14 hours,
which is 840 (= 60 × 14) minutes.
Leap-seconds are not permitted when timezone is
absent, because the presence
of a leap-second value together with particular
hour and minute values determines a
timezone (unique modulo timezones
plus or minus 12 hours), so
it might as well be explicit. (All date/time
datatypes that do not require second
to be absent also prohibit
hour and minute
from being absent.)
As of the time this specification was published,
leap-seconds (always one leap-second) have been introduced
by the responsible authorities at the end (in UTC)
of the following days:
1972-06-30
1972-12-31
1973-12-31
1974-12-31
1975-12-31
1976-12-31
1977-12-31
1978-12-31
1979-12-31
1981-06-30
1982-06-30
1983-06-30
1985-06-30
1987-12-31
1989-12-31
1990-12-31
1992-06-30
1993-06-30
1994-06-30
1995-12-31
1997-06-30
1998-12-31
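The list can be turned into the leap-second counts that later algorithms (such as timeOnTimeline) consult. This Python sketch is illustrative only: the names are hypothetical, the dates are transcribed from the list above (using 1979-12-31 for the year-end leap-second that follows 1978-12-31), and leap-seconds announced after publication are not included.

```python
from datetime import date

# Days (UTC) at whose end one leap-second was inserted, as listed above.
LEAP_SECOND_DATES = [date.fromisoformat(s) for s in (
    "1972-06-30", "1972-12-31", "1973-12-31", "1974-12-31", "1975-12-31",
    "1976-12-31", "1977-12-31", "1978-12-31", "1979-12-31", "1981-06-30",
    "1982-06-30", "1983-06-30", "1985-06-30", "1987-12-31", "1989-12-31",
    "1990-12-31", "1992-06-30", "1993-06-30", "1994-06-30", "1995-12-31",
    "1997-06-30", "1998-12-31",
)]


def leap_seconds_on(d: date) -> int:
    """Number of leap-seconds inserted at the end of day d (0 or 1 here)."""
    return LEAP_SECOND_DATES.count(d)


def leap_seconds_before(d: date) -> int:
    """Number of leap-seconds inserted strictly before the start of day d."""
    return sum(1 for ls in LEAP_SECOND_DATES if ls < d)
```

A leap-second at the end of 1998-12-31 is therefore counted for any moment from 1999-01-01 onward, but not for moments during 1998-12-31 itself.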
While calculating the time on the time line, property values from the
dateTime 1972-12-31T00:00:00 are used to fill in
for those that are absent, except
that if day is absent
but month is not, the largest permitted
day for that month is used. 1972-12-31T00:00:00
happens to permit both the maximum number
of days and the maximum number of seconds
(1972 is a leap year, and a leap-second was inserted at the end of
1972-12-31).
Values from any one date/time datatype using the seven-property
model (all except duration)
are ordered the same as their timeOnTimeline values,
except that if one value's timezone
is absent and the other's is not, and using maximum and minimum
timezone values for the one whose timezone
is actually absent
changes the resulting (strict)
inequality, the original two values are incomparable.
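The incomparability rule can be sketched in Python as follows. Each value is represented, hypothetically, by its position on the time line computed from its local property values with the timezone ignored, plus its timezone in minutes (None when absent). A value without a timezone is tried at both extreme timezones, −840 and +840 minutes; if the two results disagree, the values are incomparable.

```python
from typing import Optional

MAX_TZ = 840  # timezone values lie between -840 and 840 minutes


def compare(a_local: float, a_tz: Optional[int],
            b_local: float, b_tz: Optional[int]) -> Optional[int]:
    """Return -1, 0, or 1, or None when the two values are incomparable."""
    if (a_tz is None) == (b_tz is None):
        # Both or neither have a timezone: compare positions directly.
        a = a_local if a_tz is None else a_local - 60 * a_tz
        b = b_local if b_tz is None else b_local - 60 * b_tz
        return (a > b) - (a < b)
    # Exactly one timezone is absent: try both extreme timezones for it.
    if a_tz is None:
        results = {compare(a_local, tz, b_local, b_tz) for tz in (-MAX_TZ, MAX_TZ)}
    else:
        results = {compare(a_local, a_tz, b_local, tz) for tz in (-MAX_TZ, MAX_TZ)}
    return results.pop() if len(results) == 1 else None
```

For instance, a local noon without a timezone is incomparable with noon UTC on the same day, but is definitely earlier than a UTC time more than 14 hours later.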
Lexical Mappings
Each
lexical representation is made up
of certain date/time fragments, each of which
corresponds to a particular property of the datatype
value. They are defined by
the following productions.
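Date/time Lexical Representation Fragments
(In these productions, digit stands for any single decimal digit, 0 through 9.)
yearFrag ::= '-'? (([1-9] digit digit digit+) | ('0' digit digit digit))
monthFrag ::= ('0' [1-9]) | ('1' [0-2])
dayFrag ::= ('0' [1-9]) | ([12] digit) | ('3' [01])
hourFrag ::= ([01] digit) | ('2' [0-4])
minuteFrag ::= [0-5] digit
secondFrag ::= (([0-5] digit) | '60') ('.' digit+)?
endOfDayFrag ::= '24:00:00' ('.' '0'+)?
timezoneFrag ::= 'Z' | (('+' | '-') (('0' digit) | ('1' [0-4])) ':' minuteFrag)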
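The fragment productions can be transcribed into Python regular expressions for experimentation. The exact forms of hourFrag and secondFrag are hard to recover from this draft; the patterns below assume an hour of 24 is lexically allowed and that 60 is allowed as a leap-second value, matching the property table. The dictionary and helper are hypothetical.

```python
import re

# Hypothetical transcriptions of the fragment productions; digit is [0-9].
FRAGMENTS = {
    "yearFrag": r"-?(?:[1-9][0-9]{3,}|0[0-9]{3})",
    "monthFrag": r"0[1-9]|1[0-2]",
    "dayFrag": r"0[1-9]|[12][0-9]|3[01]",
    "hourFrag": r"[01][0-9]|2[0-4]",                  # assumes 24 is allowed
    "minuteFrag": r"[0-5][0-9]",
    "secondFrag": r"(?:[0-5][0-9]|60)(?:\.[0-9]+)?",  # assumes 60 for leap-seconds
    "endOfDayFrag": r"24:00:00(?:\.0+)?",
    "timezoneFrag": r"Z|[+-](?:0[0-9]|1[0-4]):[0-5][0-9]",
}


def matches(fragment: str, text: str) -> bool:
    """True when the whole of text is an instance of the named fragment."""
    return re.fullmatch(FRAGMENTS[fragment], text) is not None
```

The fragments alone accept combinations such as day 31 in February; such co-occurrence limits are constraints on the value, not on the individual fragments.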
Each fragment other than timezoneFrag defines a subset of the
lexical space of decimal;
the corresponding fragment lexical mapping is the
decimal lexical
mapping restricted to that subset. These fragment
lexical
mappings are combined separately for each date/time datatype (other
than duration) to make up
the complete lexical
mapping for that datatype. The yearFrag
mapping is
used to obtain the value of the year property,
the monthFrag mapping is used to obtain the value of the
month property, etc. Each datatype
which specifies some properties to be mandatorily
absent also does not permit the corresponding
lexical fragments in its lexical representations.
(The redundancy between Z, +00:00,
and -00:00,
and the possibility of trailing fractional 0
digits for secondFrag, are the only
redundancies preventing these mappings from being one-to-one.)
The following fragment canonical
mappings for each value-object
property are combined as appropriate to make the
canonical mapping for each date/time datatype (other
than duration):
Function Definitions
The more important functions and
procedures defined here are summarized in the
text. When there is a text summary, the name of the function in each is a
hot-link to the same name in the other. All other links
to these functions link to the complete definition in this section.
Generic Number-related Functions
The following functions are used with various numeric and date/time datatypes.
Auxiliary Functions for Operating on Numeral Fragments
digitValue(d) → a nonnegative &integer; less than ten
d: a single decimal digit
Maps each digit to its numerical value. Return
0 when d = 0,
1 when d = 1,
2 when d = 2,
etc.
digitSequenceValue(S) → a nonnegative &integer;
S: a finite sequence of &string;s, each term a single decimal digit
Maps a sequence of digits to the position-weighted sum of the terms' numerical values. Return the sum of
digitValue(S_i) × 10^(length(S) − i)
where i runs over the domain of S, numbered 1 through length(S).
fractionDigitSequenceValue(S) → a nonnegative &decimal;
S: a finite sequence of &string;s, each term a single decimal digit
Maps a sequence of digits to the position-weighted sum of the terms' numerical values, weighted appropriately for fractional digits. Return the sum of
digitValue(S_i) × 10^(−i)
where i runs over the domain of S, numbered 1 through length(S).
fractionFragValue(N) → a nonnegative &decimal;
N: a string of one or more decimal digits (the fractional part of a numeral)
Maps N to the appropriate fractional &decimal;. N is necessarily the left-to-right concatenation of a finite sequence S of &string;s, each term a single decimal digit. Return fractionDigitSequenceValue(S).
Generic Numeral-to-Number Lexical Mappings
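unsignedNoDecimalMap(N) → a nonnegative &integer;
N: an unsigned numeral with no decimal point
Maps N to its numerical value. N is the left-to-right concatenation of a finite sequence S of &string;s, each term a single decimal digit. Return digitSequenceValue(S).
noDecimalMap(N) → an &integer;
N: an optionally signed numeral with no decimal point
Maps N to its numerical value. N necessarily consists of an optional sign (+ or -) and then a &string; U that is an unsigned numeral with no decimal point. Return
−unsignedNoDecimalMap(U) when - is present, and
unsignedNoDecimalMap(U) otherwise.
unsignedDecimalPtMap(D) → a nonnegative &decimal;
D: an unsigned numeral containing a decimal point
Maps D to its numerical value. D necessarily consists of an optional &string; N of digits, a decimal point, and then an optional &string; F of digits. Return
unsignedNoDecimalMap(N) when F is not present,
fractionFragValue(F) when N is not present, and
unsignedNoDecimalMap(N) + fractionFragValue(F) otherwise.
decimalPtMap(N) → a &decimal;
N: an optionally signed numeral containing a decimal point
Maps N to its numerical value. N necessarily consists of an optional sign (+ or -) and then an instance U of an unsigned numeral containing a decimal point.
Return
−unsignedDecimalPtMap(U) when - is present, and
unsignedDecimalPtMap(U) otherwise.
scientificMap(N) → a &decimal;
N: a numeral in scientific notation
Maps N to its numerical value. N necessarily consists of an instance C of either a numeral with no decimal point or a numeral with a decimal point, either an e or an E, and then an instance E of a numeral with no decimal point. Return
decimalPtMap(C) × 10 ^ noDecimalMap(E)
when a . is present in N, and
noDecimalMap(C) × 10 ^ noDecimalMap(E)
otherwise.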
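The numeral-to-number mappings above can be sketched in Python. This is an illustration under stated assumptions, not a normative implementation: inputs are assumed to be already valid numerals, fractions.Fraction keeps fractional values exact, and the snake_case names merely mirror the functions defined above.

```python
from fractions import Fraction


def digit_sequence_value(s: str) -> int:
    # Sum of digitValue(S_i) * 10^(length(S) - i), with i numbered from 1.
    n = len(s)
    return sum(int(c) * 10 ** (n - i) for i, c in enumerate(s, start=1))


def fraction_frag_value(s: str) -> Fraction:
    # Sum of digitValue(S_i) * 10^(-i), with i numbered from 1.
    return sum((Fraction(int(c), 10 ** i) for i, c in enumerate(s, start=1)), Fraction(0))


def split_sign(n: str):
    """Return (is_negative, unsigned_part) for an optionally signed numeral."""
    if n[:1] in ("+", "-"):
        return n[0] == "-", n[1:]
    return False, n


def no_decimal_map(n: str) -> int:
    neg, u = split_sign(n)
    v = digit_sequence_value(u)
    return -v if neg else v


def unsigned_decimal_pt_map(d: str) -> Fraction:
    # Either side of the point may be empty ("5." and ".5"); empty parts count as 0.
    whole, _, frac = d.partition(".")
    return digit_sequence_value(whole) + fraction_frag_value(frac)


def decimal_pt_map(n: str) -> Fraction:
    neg, u = split_sign(n)
    v = unsigned_decimal_pt_map(u)
    return -v if neg else v


def scientific_map(n: str) -> Fraction:
    c, _, e = n.replace("E", "e").partition("e")
    coeff = decimal_pt_map(c) if "." in c else Fraction(no_decimal_map(c))
    return coeff * Fraction(10) ** no_decimal_map(e)
```

Because empty digit strings map to 0, unsigned_decimal_pt_map covers all three cases of unsignedDecimalPtMap with a single sum.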
Auxiliary Functions for Producing Numeral Fragments
digit(i) → a single decimal digit
i: an &integer; between 0 and 9 inclusive
Maps each &integer; between 0 and 9 to the corresponding digit. Return
0 when i = 0,
1 when i = 1,
2 when i = 2,
etc.
digitRemainderSeq(i) → a sequence of nonnegative &integer;s
i: a nonnegative &integer;
Maps each nonnegative &integer; to a sequence of &integer;s used by digitSeq to ultimately create an unsigned numeral. Return that sequence s for which
s_0 = i and
s_(j+1) = s_j div 10.
digitSeq(i) → a sequence of &integer;s where each term is between 0 and 9 inclusive
i: a nonnegative &integer;
Maps each nonnegative &integer; to a sequence of &integer;s used by unsignedNoDecimalPtCanonicalMap to create an unsigned numeral. Return that sequence s for which
s_j = digitRemainderSeq(i)_j mod 10.
lastSignificantDigit(s) → a nonnegative &integer;
s: a sequence of nonnegative &integer;s
Maps a sequence of nonnegative &integer;s to the smallest index whose following term is zero. Return the smallest nonnegative &integer; j such that
s_(j+1) is 0.
fractionDigitRemainderSeq(f) → a sequence of nonnegative &decimal;s
f: a nonnegative &decimal; less than 1
Maps each nonnegative &decimal; less than 1 to a sequence of &decimal;s used by fractionDigitSeq to ultimately create the fractional digits of a numeral. Return that sequence s for which
s_0 = f × 10, and
s_(j+1) = (s_j mod 1) × 10.
fractionDigitSeq(f) → a sequence of &integer;s where each term is between 0 and 9 inclusive
f: a nonnegative &decimal; less than 1
Maps each nonnegative &decimal; less than 1 to a sequence of &integer;s used by fractionDigitsCanonicalFragmentMap to ultimately create the fractional digits of a numeral. Return that sequence s for which
s_j = fractionDigitRemainderSeq(f)_j div 1.
fractionDigitsCanonicalFragmentMap(f) → a string of one or more decimal digits
f: a nonnegative &decimal; less than 1
Maps each nonnegative &decimal; less than 1 to a &string; used by unsignedDecimalPtCanonicalMap to create a numeral with a decimal point. Return
digit(fractionDigitSeq(f)_0) &concat; . . . &concat;
digit(fractionDigitSeq(f)_lastSignificantDigit(fractionDigitRemainderSeq(f))).
Generic Number to Numeral Canonical Mappings
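unsignedNoDecimalPtCanonicalMap(i) → an unsigned numeral with no decimal point
i: a nonnegative &integer;
Maps a nonnegative &integer; to an unsigned numeral with no decimal point, its canonical representation. Return
digit(digitSeq(i)_lastSignificantDigit(digitRemainderSeq(i))) &concat;
. . . &concat;
digit(digitSeq(i)_0). (Note
that the concatenation is in reverse order.)
noDecimalPtCanonicalMap(i) → an optionally signed numeral with no decimal point
i: an &integer;
Maps an &integer; to a numeral with no decimal point, its canonical representation. Return
- &concat; unsignedNoDecimalPtCanonicalMap(−i)
when i is negative, and
unsignedNoDecimalPtCanonicalMap(i) otherwise.
unsignedDecimalPtCanonicalMap(n) → an unsigned numeral with a decimal point
n: a nonnegative &decimal;
Maps a nonnegative &decimal; to an unsigned numeral with a decimal point, its canonical representation. Return unsignedNoDecimalPtCanonicalMap(n div 1) &concat;
. &concat; fractionDigitsCanonicalFragmentMap(n mod 1).
decimalPtCanonicalMap(n) → an optionally signed numeral with a decimal point
n: a &decimal;
Maps a &decimal; to a numeral with a decimal point, its canonical representation. Return
- &concat; unsignedDecimalPtCanonicalMap(−n)
when n is negative, and
unsignedDecimalPtCanonicalMap(n) otherwise.
unsignedScientificCanonicalMap(n) → an unsigned numeral in scientific notation
n: a nonnegative &decimal;
Maps a nonnegative &decimal; to an unsigned numeral in scientific notation, its canonical representation.
Return unsignedDecimalPtCanonicalMap(n / 10^(log(n) div 1)) &concat;
E &concat;
noDecimalPtCanonicalMap(log(n) div 1),
where log is the base-10 logarithm.
scientificCanonicalMap(n) → an optionally signed numeral in scientific notation
n: a &decimal;
Maps a &decimal; to a numeral in scientific notation, its canonical representation. Return
- &concat; unsignedScientificCanonicalMap(−n) when n is negative, and
unsignedScientificCanonicalMap(n) otherwise.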
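A Python sketch of the canonical mappings follows, using decimal.Decimal for exact arithmetic. The digit loops follow digitRemainderSeq, digitSeq, and fractionDigitRemainderSeq directly; for the exponent of the scientific form, the sketch substitutes Decimal.adjusted(), which equals floor(log10(n)) for positive n without floating-point rounding. All names are hypothetical.

```python
from decimal import Decimal


def unsigned_no_decimal_pt_canonical(i: int) -> str:
    # digitRemainderSeq: i, i div 10, ...; stop once the next remainder is 0.
    digits = []
    while True:
        digits.append(i % 10)  # digitSeq term
        i //= 10
        if i == 0:
            break
    return "".join(str(d) for d in reversed(digits))  # reverse order


def no_decimal_pt_canonical(i: int) -> str:
    return "-" + unsigned_no_decimal_pt_canonical(-i) if i < 0 else unsigned_no_decimal_pt_canonical(i)


def fraction_digits_canonical(f: Decimal) -> str:
    # fractionDigitRemainderSeq: s0 = f * 10, s(j+1) = (s(j) mod 1) * 10.
    digits = []
    s = f * 10
    while True:
        digits.append(int(s))  # fractionDigitSeq term: s div 1
        s = (s % 1) * 10
        if s == 0:
            break
    return "".join(str(d) for d in digits)


def decimal_pt_canonical(n: Decimal) -> str:
    if n < 0:
        return "-" + decimal_pt_canonical(-n)
    whole = int(n)  # n div 1
    return unsigned_no_decimal_pt_canonical(whole) + "." + fraction_digits_canonical(n - whole)


def scientific_canonical(n: Decimal) -> str:
    if n < 0:
        return "-" + scientific_canonical(-n)
    e = n.adjusted()  # floor(log10(n)) for n > 0, computed exactly
    return decimal_pt_canonical(n.scaleb(-e)) + "E" + no_decimal_pt_canonical(e)
```

Trailing fractional zeros disappear because the fraction loop stops as soon as the remainder is zero, so 12.50 and 12.5 share the canonical form 12.5.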
Lexical Mapping for Non-numerical Special Values Used With Numerical Datatypes
specialRepValue(S) → one of positiveInfinity, negativeInfinity, or notANumber
S: one of INF, +INF, -INF, or NaN
Maps the lexical
representations of special values used with some
numerical datatypes to those special values.
Return
positiveInfinity when S is
INF or +INF,
negativeInfinity when S is
-INF, and
notANumber when S is
NaN.
Canonical Mapping for Non-numerical Special Values Used With Numerical Datatypes
specialRepCanonicalMap(c) → a &string; that is one of INF, -INF, or NaN
c: one of positiveInfinity, negativeInfinity, or notANumber
Maps the special values used with some numerical datatypes to their canonical representations. Return
INF when c is positiveInfinity,
-INF when c is negativeInfinity, and
NaN when c is notANumber.
Auxiliary Functions for Reading Instances of &pD;
decimalPtPrecision(LEX) → an &integer;
LEX: a numeral containing a decimal point
Maps LEX onto an &integer;, presumably intended as the &integer;-valued arithmetic precision of a &pD; value; used in calculating that precision. LEX necessarily contains a decimal point (.) and may
optionally contain a following F consisting of some number
n of digits. Return
n when F is present, and
0 otherwise.
scientificPrecision(LEX) → an &integer;
LEX: a numeral in scientific notation
Maps LEX onto an &integer;, presumably intended as the &integer;-valued arithmetic precision of a &pD; value; used in calculating that precision. LEX necessarily contains a numeral C, with or without a decimal point, preceding an exponent indicator (E or e), and a following numeral E. Return
−noDecimalMap(E) when C contains no decimal point, and
decimalPtPrecision(C) − noDecimalMap(E)
otherwise.
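The two precision functions amount to "digits after the point, minus the exponent". A Python sketch (hypothetical names, valid finite numerals assumed) combines the cases, and Python's decimal module, which also preserves trailing zeros, offers a convenient cross-check: the precision equals the negated exponent of the parsed Decimal.

```python
from decimal import Decimal


def decimal_pt_precision(lex: str) -> int:
    # decimalPtPrecision: number of digits after the decimal point (0 if none).
    return len(lex.partition(".")[2])


def precision(lex: str) -> int:
    """Precision of a finite numeral (hypothetical helper combining the cases)."""
    coeff, sep, exp = lex.replace("E", "e").partition("e")
    base = decimal_pt_precision(coeff) if "." in coeff else 0
    # scientificPrecision subtracts the exponent; with no exponent, subtract 0.
    return base - (int(exp) if sep else 0)
```

For example, 1.230 has precision 3, while 1.23E3 has precision −1 (it is known only to the nearest ten).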
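&pD; Lexical Mapping
&pD;LexicalMap(LEX) → a &pD; value
LEX: a &pD; lexical representation
Maps LEX onto a complete &pD; value. Let pD be a complete &pD; value.
Set pD's numerical value to
noDecimalMap(LEX) when
LEX is a numeral with neither a decimal point nor an exponent,
decimalPtMap(LEX) when
LEX is a numeral with a decimal point but no exponent,
scientificMap(LEX) when
LEX is a numeral in scientific notation, and
specialRepValue(LEX) otherwise.
Set pD's arithmetic precision to
0 when LEX is a numeral with neither a decimal point nor an exponent,
decimalPtPrecision(LEX) when LEX
is a numeral with a decimal point but no exponent,
scientificPrecision(LEX) when LEX
is a numeral in scientific notation, and
absent otherwise.
Set pD's sign to
absent when LEX is NaN,
negative when
the first character of LEX is -, and
positive otherwise.
Return pD.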
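&odec; Lexical Mapping
&odec;LexicalMap(LEX) → a &odec; value
LEX: a &odec; lexical representation
Maps LEX onto a &odec; value. Let d be a &odec; value.
Set d to
noDecimalMap(LEX) when
LEX is a numeral with no decimal point, and
decimalPtMap(LEX) when
LEX is a numeral with a decimal point.
Return d.
&odec; Canonical Mapping
&odec;CanonicalMap(d) → a &string; matching the &odec; lexical representation
d: a &odec; value
Maps a &odec; value to its canonical representation.
If d is an integer, then return
noDecimalPtCanonicalMap(d).
Otherwise, return
decimalPtCanonicalMap(d).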
&pD; Canonical Mapping
&pD;CanonicalMap(pD) → a &string; matching the &pD; lexical representation
pD: a &pD; value
Maps a &pD; value to its canonical representation.
Let nV be the numerical value of pD.
Let aP be the arithmetic precision of pD.
If pD is one of NaN, INF, or -INF, then return
specialRepCanonicalMap(nV).
Otherwise, if nV is an integer and aP is zero and
1E-6 ≤ nV ≤ 1E6, then return
noDecimalPtCanonicalMap(nV).
Otherwise, if aP is greater than zero and
1E-6 ≤ nV ≤ 1E6, then let s be
decimalPtCanonicalMap(nV).
Let f be the number of fractional digits in s;
f will invariably be less than or equal to aP.
Return the concatenation of s with
aP − f
occurrences of the digit 0.
Otherwise, it will be the case that
nV is less than 1E−6 or greater than 1E6.
Let
s be
scientificCanonicalMap(nV),
m be the part of s which precedes the E,
n be the part of s which follows the E,
p be the integer denoted by n,
f be the number of fractional digits in m;
note that f will invariably be less than or equal to
aP + p, and
t be a string consisting of
aP + p − f
occurrences of the digit 0,
preceded by a decimal point if and only if
m contains no decimal point and
aP + p − f is
greater than zero.
Return the concatenation
m &concat; t &concat; E &concat; n.
duration-related Definitions
The following functions are primarily used with the duration datatype
and its derivatives.
Auxiliary duration-related Functions Operating on Representation Fragments
duYearFragmentMap(Y) → a nonnegative &integer;
Maps Y, the year fragment of a duration lexical representation, to an &integer;, intended as part of the value of the months property of a duration value. Y is necessarily a numeral N followed by the letter Y. Return unsignedNoDecimalMap(N).
duMonthFragmentMap(M) → a nonnegative &integer;
Maps M, the month fragment of a duration lexical representation, to an &integer;, intended as part of the value of the months property of a duration value. M is necessarily a numeral N followed by the letter M. Return unsignedNoDecimalMap(N).
duDayFragmentMap(D) → a nonnegative &integer;
Maps D, the day fragment of a duration lexical representation, to an &integer;, intended as part of the value of the seconds property of a duration value. D is necessarily a numeral N followed by the letter D. Return unsignedNoDecimalMap(N).
duHourFragmentMap(H) → a nonnegative &integer;
Maps H, the hour fragment of a duration lexical representation, to an &integer;, intended as part of the value of the seconds property of a duration value. H is necessarily a numeral N followed by the letter H. Return unsignedNoDecimalMap(N).
duMinuteFragmentMap(M) → a nonnegative &integer;
Maps M, the minute fragment of a duration lexical representation, to an &integer;, intended as part of the value of the seconds property of a duration value. M is necessarily a numeral N followed by the letter M. Return unsignedNoDecimalMap(N).
duSecondFragmentMap(S) → a nonnegative &decimal;
Maps S, the second fragment of a duration lexical representation, to a &decimal;, intended as part of the value of the seconds property of a duration value. S is necessarily a numeral N followed by the letter S. Return
unsignedDecimalPtMap(N) when . occurs
in N, and
unsignedNoDecimalMap(N) otherwise.
duYearMonthFragmentMap(YM) → a nonnegative &integer;
Maps YM, the year-and-month part of a duration lexical representation, into an &integer;, intended as part of the months property of a duration value. YM necessarily consists of an
instance Y of the year fragment and/or an instance M of
the month fragment. Let
y be duYearFragmentMap(Y) (or 0 if Y is not present) and
m be duMonthFragmentMap(M) (or 0 if M is not present).
Return 12 × y + m.
duTimeFragmentMap(T) → a nonnegative &decimal;
Maps T, the time part of a duration lexical representation, into a &decimal;, intended as part of the seconds property of a duration value. T necessarily consists of an instance
H of the hour fragment, and/or an instance M of
the minute fragment, and/or an instance S of
the second fragment. Let
h be duHourFragmentMap(H)
(or 0 if H is not present),
m be duMinuteFragmentMap(M)
(or 0 if M is not present), and
s be duSecondFragmentMap(S)
(or 0 if S is not present).
Return
3600 × h + 60 × m + s.
duDayTimeFragmentMap(DT) → a nonnegative &decimal;
Maps DT, the day-and-time part of a duration lexical representation, into a &decimal;, which is the potential value of the seconds property of a duration value. DT necessarily consists of an instance
D of the day fragment and/or an instance T of
the time fragment. Let
d be duDayFragmentMap(D)
(or 0 if D is not present) and
t be duTimeFragmentMap(T)
(or 0 if T is not present).
Return 86400 × d + t.
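The duration Lexical Mapping
durationMap(DUR) → a complete duration value
DUR: a duration lexical representation
Separates DUR into the month part and the seconds part,
then maps them into the months and seconds of the
duration value. DUR consists of possibly a
leading -, followed by
P and then an instance Y of
the year-month fragment
and/or an instance D of
the day-time fragment. Return a duration whose
months value is
0 if Y is not present,
−duYearMonthFragmentMap(Y) if
both - and Y are present, and
duYearMonthFragmentMap(Y) otherwise,
and whose
seconds value is
0 if D is not present,
−duDayTimeFragmentMap(D) if
both - and D are present, and
duDayTimeFragmentMap(D) otherwise.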
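The duration lexical mapping can be sketched in Python with a regular expression that captures the optional sign and each numeral. The pattern is an assumed transcription of the description above, adding the usual rules that at least one component must appear and that a T must be followed by a time component; the names are hypothetical.

```python
import re
from decimal import Decimal

# Hypothetical pattern: optional '-', 'P', year/month part, then day/time part.
DURATION_RE = re.compile(
    r"(-)?P(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?"
    r"(?:T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+(?:\.[0-9]*)?|\.[0-9]+)S)?)?"
)


def duration_map(lex: str):
    """Map a duration lexical form to its (months, seconds) pair."""
    m = DURATION_RE.fullmatch(lex)
    if m is None:
        raise ValueError(f"not a duration: {lex!r}")
    neg, y, mo, d, h, mi, s = m.groups()
    if not any((y, mo, d, h, mi, s)) or lex.endswith("T"):
        raise ValueError(f"not a duration: {lex!r}")
    months = 12 * int(y or 0) + int(mo or 0)  # duYearMonthFragmentMap
    seconds = (86400 * int(d or 0) + 3600 * int(h or 0)
               + 60 * int(mi or 0) + Decimal(s or 0))  # duDayTimeFragmentMap
    return (-months, -seconds) if neg else (months, seconds)
```

For example, P1Y2M3DT4H5M6.5S maps to 14 months and 273906.5 seconds.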
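The yearMonthDuration Lexical Mapping
yearMonthDurationMap(YM) → a complete yearMonthDuration value
YM: a yearMonthDuration lexical representation
Maps the lexical representation into the months of a
yearMonthDuration value. (A
yearMonthDuration's seconds is always
zero.) yearMonthDuration is a restriction of duration. YM necessarily consists of
an optional leading -, followed by
P and then an instance Y of
the year-month fragment. Return a yearMonthDuration whose
months value is
−duYearMonthFragmentMap(Y) if - is
present in YM and
duYearMonthFragmentMap(Y) otherwise, and
seconds value is (necessarily) 0.
The dayTimeDuration Lexical Mapping
dayTimeDurationMap(DT) → a complete dayTimeDuration value
DT: a dayTimeDuration lexical representation
Maps the lexical representation into the seconds of a
dayTimeDuration value. (A
dayTimeDuration's months is always
zero.) dayTimeDuration is a restriction of duration. DT necessarily
consists of possibly a leading -, followed by
P and then an instance D of
the day-time fragment. Return a dayTimeDuration whose
months value is (necessarily) 0, and
seconds value is
−duDayTimeFragmentMap(D) if - is
present in DT and
duDayTimeFragmentMap(D) otherwise.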
Auxiliary duration-related Functions
Producing Representation Fragments
duYearMonthCanonicalFragmentMap(ym) → a &string; matching the year-month fragment
ym: a nonnegative &integer;
Maps a nonnegative &integer;, presumably the absolute value of the months of a duration value, to a year-month fragment of a duration canonical representation. Let
y be ym div 12, and
m be ym mod 12.
Return
unsignedNoDecimalPtCanonicalMap(y) &concat; Y &concat; unsignedNoDecimalPtCanonicalMap(m) &concat; M
when neither y nor m is zero,
unsignedNoDecimalPtCanonicalMap(y) &concat; Y
when y is not zero but m is, and
unsignedNoDecimalPtCanonicalMap(m) &concat; M
when y is zero.
duDayCanonicalFragmentMap(d) → a &string; matching the day fragment
d: a nonnegative &integer;
Maps a nonnegative &integer;, presumably the day normalized value from the seconds of a duration value, to a day fragment of a duration canonical representation. Return
unsignedNoDecimalPtCanonicalMap(d) &concat; D
when d is not zero, and
the empty string when d is zero.
duHourCanonicalFragmentMap(h) → a &string; matching the hour fragment
h: a nonnegative &integer;
Maps a nonnegative &integer;, presumably the hour normalized value from the seconds of a duration value, to an hour fragment of a duration canonical representation. Return
unsignedNoDecimalPtCanonicalMap(h) &concat; H
when h is not zero, and
the empty string when h is zero.
duMinuteCanonicalFragmentMap(m) → a &string; matching the minute fragment
m: a nonnegative &integer;
Maps a nonnegative &integer;, presumably the minute normalized value from the seconds of a duration value, to a minute fragment of a duration canonical representation. Return
unsignedNoDecimalPtCanonicalMap(m) &concat; M
when m is not zero, and
the empty string when m is zero.
duSecondCanonicalFragmentMap(s) → a &string; matching the second fragment
s: a nonnegative &decimal;
Maps a nonnegative &decimal;, presumably the second normalized value from the seconds of a duration value, to a second fragment of a duration canonical representation. Return
unsignedNoDecimalPtCanonicalMap(s) &concat; S
when s is a non-zero integer,
unsignedDecimalPtCanonicalMap(s) &concat; S
when s is not an integer, and
the empty string when s is zero.
duTimeCanonicalFragmentMap(h, m, s) → a &string; matching the time fragment
h: a nonnegative &integer;; m: a nonnegative &integer;; s: a nonnegative &decimal;
Maps three nonnegative numbers, presumably the hour, minute, and second normalized values from a duration's seconds, to a time fragment of a duration canonical representation. Return
T &concat;
duHourCanonicalFragmentMap(h) &concat;
duMinuteCanonicalFragmentMap(m) &concat;
duSecondCanonicalFragmentMap(s)
when h, m, and s are not all zero, and
the empty string when all arguments are zero.
duDayTimeCanonicalFragmentMap(ss) → a &string; matching the day-time fragment
ss: a nonnegative &decimal;
Maps a nonnegative &decimal;, presumably the absolute value of the seconds of a duration value, to a day-time fragment of a duration canonical representation. Let
d be
ss div 86400,
h be
(ss mod 86400) div 3600,
m be
(ss mod 3600) div 60, and
s be
ss mod 60.
Return
duDayCanonicalFragmentMap(d) &concat;
duTimeCanonicalFragmentMap(h, m, s)
when ss is not zero, and
T0S when ss is zero.
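The duration Canonical Mapping
durationCanonicalMap(v) → a &string; matching the duration lexical representation
v: a complete duration value
Maps a duration's property values to duration fragments and combines the fragments into a complete duration lexical representation. Let
m be v's months,
s be v's seconds, and
sgn be - if m or
s is negative and
the empty string otherwise.
Return
sgn &concat; P &concat;
duYearMonthCanonicalFragmentMap(| m |) &concat;
duDayTimeCanonicalFragmentMap(| s |)
when neither m nor s is zero,
sgn &concat; P &concat;
duYearMonthCanonicalFragmentMap(| m |)
when m is not zero but s is, and
sgn &concat; P &concat;
duDayTimeCanonicalFragmentMap(| s |)
when m is zero.
The yearMonthDuration Canonical Mapping
yearMonthDurationCanonicalMap(ym) → a &string; matching the yearMonthDuration lexical representation
ym: a complete yearMonthDuration value
Maps a yearMonthDuration's months value to
a yearMonthDuration lexical representation. (The seconds value is necessarily zero and is ignored.) yearMonthDuration is a restriction of duration. Let
m be ym's months and
sgn be - if m is negative and
the empty string otherwise.
Return sgn &concat; P &concat;
duYearMonthCanonicalFragmentMap(| m |).
The dayTimeDuration Canonical Mapping
dayTimeDurationCanonicalMap(dt) → a &string; matching the dayTimeDuration lexical representation
dt: a complete dayTimeDuration value
Maps a dayTimeDuration's seconds value to
a dayTimeDuration lexical representation. (The months value is necessarily zero and is ignored.) dayTimeDuration is a restriction of duration. Let
s be dt's seconds and
sgn be - if s is negative and
the empty string otherwise.
Return sgn &concat; P &concat;
duDayTimeCanonicalFragmentMap(| s |).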
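The canonical mapping can be sketched in Python, splitting the seconds into days, hours, minutes, and seconds with div and mod exactly as duDayTimeCanonicalFragmentMap does. This is a hypothetical illustration using decimal.Decimal; the helper _num stands in for the two numeral canonical maps (integers without a decimal point, other values without trailing fractional zeros).

```python
from decimal import Decimal


def _num(x) -> str:
    # Canonical numeral for a nonnegative value.
    x = Decimal(x)
    if x == x.to_integral_value():
        return str(int(x))
    return format(x.normalize(), "f")


def du_year_month_fragment(ym: int) -> str:
    y, m = divmod(ym, 12)
    if y and m:
        return f"{y}Y{m}M"
    return f"{y}Y" if y else f"{m}M"


def du_time_fragment(h: int, m: int, s) -> str:
    if h == 0 and m == 0 and s == 0:
        return ""
    return ("T" + (f"{h}H" if h else "") + (f"{m}M" if m else "")
            + (_num(s) + "S" if s else ""))


def du_day_time_fragment(ss) -> str:
    ss = Decimal(ss)
    if ss == 0:
        return "T0S"
    d = int(ss // 86400)
    h = int((ss % 86400) // 3600)
    m = int((ss % 3600) // 60)
    s = ss % 60
    return (f"{d}D" if d else "") + du_time_fragment(h, m, s)


def duration_canonical_map(months: int, seconds) -> str:
    seconds = Decimal(seconds)
    sgn = "-" if months < 0 or seconds < 0 else ""
    m, s = abs(months), abs(seconds)
    if m and s:
        return sgn + "P" + du_year_month_fragment(m) + du_day_time_fragment(s)
    if m:
        return sgn + "P" + du_year_month_fragment(m)
    return sgn + "P" + du_day_time_fragment(s)
```

Note that the zero duration comes out as PT0S, since the months part is zero and the day-time fragment of zero seconds is T0S.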
Date/time-related Definitions
When adding and subtracting numbers from date/time properties, the
immediate results may not conform to the limits specified.
Accordingly, the following procedures are used to
normalize potential property values to
corresponding values that do conform to the appropriate limits.
Normalization is required when dealing with timezone changes (as when
converting between local and UTC values) and when
adding duration values to or subtracting them from
dateTime values.
Date/time Datatype Normalizing Procedures
normalizeMonth(yr, mo)
yr: an &integer;; mo: an &integer;
Normalizes month and year values to values that obey the appropriate constraints.
Add (mo − 1) div 12 to yr.
Set mo to
(mo − 1) mod 12 + 1.
normalizeDay(yr, mo, da)
yr, mo, da: &integer;s
Normalizes day, month, and year values to values that obey the appropriate constraints.
normalizeMonth(yr, mo).
Repeat until da is positive and not greater than
daysInMonth(yr, mo):
If da exceeds the upper limit daysInMonth(yr, mo) then:
Subtract that limit from da.
Add 1 to mo.
normalizeMonth(yr, mo).
If da is not positive then:
Subtract 1 from mo.
normalizeMonth(yr, mo).
Add the new upper limit daysInMonth(yr, mo) to da.
normalizeMinute(yr, mo, da, hr, mi)
yr, mo, da, hr, mi: &integer;s
Normalizes minute, hour, day, month, and year values to values that obey the appropriate constraints.
Add mi div 60 to hr.
Set mi to mi mod 60.
Add hr div 24 to da.
Set hr to hr mod 24.
normalizeDay(yr, mo, da).
lsiNormalizeSecond(yr, mo, da, hr, mi, se)
yr, mo, da, hr, mi: &integer;s; se: a &decimal;
Normalizes second, minute, hour, day, month, and year values to values that obey the appropriate
constraints. (This algorithm is leap-second insensitive.)
Add se div 60 to mi.
Set se to se mod 60.
normalizeMinute(yr, mo, da, hr, mi).
lssNormalizeSecond(yr, mo, da, hr, mi, se)
yr, mo, da, hr, mi: &integer;s; se: a &decimal;
Normalizes second, minute, hour, day, month, and year values to values that obey the appropriate
constraints. (This algorithm is leap-second sensitive.)
normalizeDay(yr, mo, da).
Add 60 × mi + 3600 × hr
to se.
Set mi and hr to zero.
Repeat until se is nonnegative and less than 86400
plus the number of leap-seconds
which occurred on the date in question, as specified by
the leap-second table (which depends on
yr, mo, and da):
If se equals or exceeds 86400 plus that leap-second count then:
Subtract (86400 plus that leap-second count) from se.
Add 1 to da.
normalizeDay(yr, mo, da).
If se is negative then:
Subtract 1 from da.
normalizeDay(yr, mo, da).
Add 86400 plus the leap-second count for the new date to se.
If se is less than 86340 then:
Set mi to se div 60.
Set se to se mod 60.
Otherwise (se is not less than 86340):
Set mi to 1439.
Subtract 86340 from se.
normalizeMinute(yr, mo, da, hr, mi).
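The leap-second-insensitive procedures can be sketched in Python. Python's floor division and remainder agree with div and mod here, including for negative values, and this sketch returns new values instead of updating its arguments in place. A daysInMonth helper is included, with February having 29 days in leap years and when the year is absent (None); all names are hypothetical.

```python
def days_in_month(y, m):
    """Last day of month m; y may be None (absent), which allows February 29."""
    if m in (4, 6, 9, 11):
        return 30
    if m == 2:
        leap = y is None or y % 400 == 0 or (y % 4 == 0 and y % 100 != 0)
        return 29 if leap else 28
    return 31


def normalize_month(yr, mo):
    # Python's // and % round toward negative infinity, like div and mod.
    return yr + (mo - 1) // 12, (mo - 1) % 12 + 1


def normalize_day(yr, mo, da):
    yr, mo = normalize_month(yr, mo)
    while not 1 <= da <= days_in_month(yr, mo):
        if da > days_in_month(yr, mo):
            da -= days_in_month(yr, mo)
            yr, mo = normalize_month(yr, mo + 1)
        else:
            yr, mo = normalize_month(yr, mo - 1)
            da += days_in_month(yr, mo)
    return yr, mo, da


def normalize_minute(yr, mo, da, hr, mi):
    hr += mi // 60
    mi %= 60
    da += hr // 24
    hr %= 24
    return (*normalize_day(yr, mo, da), hr, mi)
```

For example, subtracting a −05:00 timezone's 300 minutes from 2000-01-01T00:00 normalizes to 1999-12-31T19:00.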
The following rawlocal-value functions all have very similar algorithms.
RawLocal Properties of Date/time Seven-property Models
rawlocalYear(dt) → an &integer;
dt: a date/time value
Returns the rawlocal year value of dt, i.e., the local timezone year, as opposed to the UTC year. (This matters only near the year boundaries.) Let
yr be 1971 when dt's year is absent, and dt's year otherwise,
mo be 12 or dt's month, similarly,
da be daysInMonth(yr, mo) or dt's day, similarly,
hr be 0 or dt's hour, similarly, and
mi be 0 or dt's minute, similarly.
Add dt's timezone (when present) to mi.
normalizeMinute(yr, mo, da, hr, mi).
If dt's timezone or dt's year is absent,
return dt's year; otherwise, return yr.
rawlocalMonth(dt) → an &integer;
dt: a date/time value
Returns the rawlocal month value of dt, i.e., the local timezone month, as opposed to the UTC month. (This matters only near the month boundaries.) Let
yr be 1971 when dt's year is absent, and dt's year otherwise,
mo be 12 or dt's month, similarly,
da be daysInMonth(yr, mo) or dt's day, similarly,
hr be 0 or dt's hour, similarly, and
mi be 0 or dt's minute, similarly.
Add dt's timezone (when present) to mi.
normalizeMinute(yr, mo, da, hr, mi).
If dt's timezone or dt's month is absent,
return dt's month; otherwise, return mo.
rawlocalDay(dt) → an &integer;
dt: a date/time value
Returns the rawlocal day value of dt, i.e., the local timezone day, as opposed to the UTC day. Let
yr be 1971 when dt's year is absent, and dt's year otherwise,
mo be 12 or dt's month, similarly,
da be daysInMonth(yr, mo) or dt's day, similarly,
hr be 0 or dt's hour, similarly, and
mi be 0 or dt's minute, similarly.
Add dt's timezone (when present) to mi.
normalizeMinute(yr, mo, da, hr, mi).
If dt's timezone or dt's day is absent,
return dt's day; otherwise, return da.
rawlocalHour(dt) → an &integer;
dt: a date/time value
Returns the rawlocal hour value of dt, i.e., the local timezone hour, as opposed
to the UTC hour. Let
yr be 1971 when dt's year is absent, and dt's year otherwise,
mo be 12 or dt's month, similarly,
da be daysInMonth(yr, mo) or dt's day, similarly,
hr be 0 or dt's hour, similarly, and
mi be 0 or dt's minute, similarly.
Add dt's timezone (when present) to mi.
normalizeMinute(yr, mo, da, hr, mi).
If dt's timezone or dt's hour is absent,
return dt's hour; otherwise, return hr.
rawlocalMinute(dt) → an &integer;
dt: a date/time value
Returns the rawlocal minute value of dt, i.e., the local timezone minute, as opposed
to the UTC minute. Let
yr be 1971 when dt's year is absent, and dt's year otherwise,
mo be 12 or dt's month, similarly,
da be daysInMonth(yr, mo) or dt's day, similarly,
hr be 0 or dt's hour, similarly, and
mi be 0 or dt's minute, similarly.
Add dt's timezone (when present) to mi.
normalizeMinute(yr, mo, da, hr, mi).
If dt's timezone or dt's minute is absent,
return dt's minute; otherwise, return mi.
rawlocalSecond(dt) → a &decimal;
dt: a date/time value
Returns the rawlocal second value of dt, i.e., the local timezone second,
as opposed to the UTC second; however, for seconds, there is no difference. Return dt's second unchanged.
setDateTimeFromRawLocal(dt, rawYr, rawMo, rawDa, rawHr, rawMi, rawSe)
dt: a date/time value; rawYr, rawMo, rawDa, rawHr, rawMi: &integer;s; rawSe: a &decimal; (any of them may be absent)
Sets the properties of dt from the rawlocal values provided (the local timezone
values, as opposed to the UTC values). Absent values are given default values
for computation, but ultimately absent properties remain absent. Let
yr be rawYr when rawYr
is not absent, dt's year
when dt's year is not
absent
but rawYr is, and 1971 otherwise,
mo be rawMo, dt's
month, or 12, similarly,
da be rawDa, dt's
day, or daysInMonth(yr, mo), similarly,
hr be rawHr, dt's
hour, or 0, similarly,
mi be rawMi, dt's
minute, or 0, similarly, and
se be rawSe, dt's
second, or 0, similarly.
If dt's timezone is not absent, subtract it from mi and then apply
normalizeMinute(yr, mo, da, hr, mi).
Set dt's year to yr unless both rawYr and dt's year are absent,
dt's month to mo similarly, etc.
Date/time Auxiliary Functions
daysInMonth(y, m) → an &integer;
y: an &integer; (or absent); m: an &integer; between 1 and 12
Returns the number of the last day of the month
for any combination of year and month.
Return:
30 when m is 4, 6, 9, or 11,
29 when m is 2 and y is divisible by
400, or by 4 but not by 100, or is absent,
28 otherwise when m is 2, and
31 otherwise (m is 1, 3, 5, 7, 8, 10, or 12).
setDateTimeFromLocal(dt, rawYr, rawMo, rawDa, rawHr, rawMi, rawSe)
dt: a date/time value; rawYr, rawMo, rawDa, rawHr, rawMi: &integer;s; rawSe: a &decimal; (any of them may be absent)
Sets the properties of dt from the local values provided.
Absent values are given default values
for computation, but ultimately absent properties
remain absent. Let
yr be rawYr when rawYr
is not absent, dt's year
when dt's year is not
absent
but rawYr is, and 1971 otherwise,
mo be rawMo, dt's
month, or 12, similarly,
da be rawDa, dt's
day, or
daysInMonth(yr, mo),
similarly,
hr be rawHr, dt's
hour, or 0, similarly,
mi be rawMi, dt's
minute, or 0, similarly, and
se be rawSe, dt's
second, or 0, similarly.
Set dt's year to yr
unless both rawYr and dt's year are absent,
dt's month to mo
similarly, etc.
Time on Timeline for Date/time Seven-property
Model Datatypes
timeOnTimeline(dt) → a &decimal;
dt: a date/time value
Maps a date/time value to the &decimal;
representing its position on the time line. Let
yr be 1971
when dt's year is absent,
and dt's year − 1 otherwise,
mo be 12 or
dt's month, similarly,
da be
daysInMonth(yr + 1, mo) − 1
or (dt's day) − 1, similarly,
hr be 0 or
dt's hour, similarly,
mi be 0 or
dt's minute, similarly, and
se be 0 or
dt's second, similarly.
Subtract dt's timezone from mi
when the timezone is not absent.
(Year)
Set ToTl to
31536000 × yr.
(Leap-year days, month, and day)
Add 86400 ×
(yr div 400 −
yr div 100 +
yr div 4) to
ToTl.
Add 86400 × Sum_(m < mo) daysInMonth(yr + 1, m) to
ToTl.
Add 86400 × da to
ToTl.
(Leap-seconds)
Add the count of leap-seconds prior to
the dateTime value of dt (with
absent properties defaulted), taken
from the list of leap-seconds given above,
to ToTl.
(Hour, minute,
and second)
Add 3600 × hr +
60 × mi + se
to ToTl.
Return ToTl.
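A Python sketch of timeOnTimeline, under these assumptions: absent properties are passed as None, the timezone is given in minutes, and the leap-second count is supplied by a caller-provided function of (year, month, day), which defaults to no leap-seconds. The names are hypothetical. Without leap-seconds, the result for a midnight is 86400 times the number of days since 0001-01-01.

```python
from typing import Callable


def days_in_month(y: int, m: int) -> int:
    if m in (4, 6, 9, 11):
        return 30
    if m == 2:
        return 29 if (y % 400 == 0 or (y % 4 == 0 and y % 100 != 0)) else 28
    return 31


def time_on_timeline(year=None, month=None, day=None, hour=None, minute=None,
                     second=None, tz=None,
                     leap_seconds_before: Callable[[int, int, int], int] = lambda y, m, d: 0):
    """Seconds from 0001-01-01T00:00:00 (UTC) to the given value (sketch)."""
    yr = 1971 if year is None else year - 1  # absent year defaults to 1972
    mo = 12 if month is None else month
    da = days_in_month(yr + 1, mo) - 1 if day is None else day - 1
    hr = 0 if hour is None else hour
    mi = 0 if minute is None else minute
    se = 0 if second is None else second
    if tz is not None:
        mi -= tz  # local values are stored; shift to UTC
    total = 31536000 * yr
    total += 86400 * (yr // 400 - yr // 100 + yr // 4)  # leap-year days
    total += 86400 * sum(days_in_month(yr + 1, m) for m in range(1, mo))
    total += 86400 * da
    total += leap_seconds_before(yr + 1, mo, da + 1)
    total += 3600 * hr + 60 * mi + se
    return total
```

Because the timezone adjustment is applied linearly rather than by normalizing the date, the leap-second count in this sketch is looked up on the local date, a simplification that only matters within a day of a leap-second.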
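Partial Date/time Lexical Mappings
yearFragValue(YR) → an &integer;
YR: matches yearFrag
Maps a yearFrag, part of a date/time lexical representation,
onto an &integer;, presumably the year property of a date/time value. Return noDecimalMap(YR).
monthFragValue(MO) → an &integer;
MO: matches monthFrag
Maps a monthFrag, part of a date/time lexical representation,
onto an &integer;, presumably the month property of a date/time value. Return unsignedNoDecimalMap(MO).
dayFragValue(DA) → an &integer;
DA: matches dayFrag
Maps a dayFrag, part of a date/time lexical representation,
onto an &integer;, presumably the day property of a date/time value. Return unsignedNoDecimalMap(DA).
hourFragValue(HR) → an &integer;
HR: matches hourFrag
Maps an hourFrag, part of a date/time lexical representation,
onto an &integer;, presumably the hour property of a date/time value. Return unsignedNoDecimalMap(HR).
minuteFragValue(MI) → an &integer;
MI: matches minuteFrag
Maps a minuteFrag, part of a date/time lexical representation,
onto an &integer;, presumably the minute property of a date/time value. Return unsignedNoDecimalMap(MI).
secondFragValue(SE) → a &decimal;
SE: matches secondFrag
Maps a secondFrag, part of a date/time lexical representation,
onto a &decimal;, presumably the second property of a date/time value. Return
unsignedNoDecimalMap(SE) when no decimal point occurs in SE, and
unsignedDecimalPtMap(SE) otherwise.
timezoneFragValue(TZ) → an &integer;
TZ: matches timezoneFrag
Maps a timezoneFrag, part of a date/time lexical representation,
onto an &integer;, presumably the timezone property of a date/time value. TZ necessarily consists of either just Z, or
a sign (+ or -) followed by a two-digit hour numeral H, a colon, and an instance M of minuteFrag. Return
0 when TZ is Z,
−(unsignedNoDecimalMap(H) × 60 + unsignedNoDecimalMap(M))
when the sign is -, and
unsignedNoDecimalMap(H) × 60 + unsignedNoDecimalMap(M)
otherwise.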
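timezoneFragValue can be sketched in Python as follows. The regular expression is an assumed transcription of timezoneFrag (hours 00 to 14, minutes 00 to 59), and the function name is hypothetical.

```python
import re

TZ_RE = re.compile(r"Z|([+-])(0[0-9]|1[0-4]):([0-5][0-9])")


def timezone_frag_value(tz: str) -> int:
    """Map a timezone fragment to its offset from UTC in minutes."""
    m = TZ_RE.fullmatch(tz)
    if m is None:
        raise ValueError(f"not a timezone fragment: {tz!r}")
    if tz == "Z":
        return 0
    sign, hh, mm = m.groups()
    minutes = int(hh) * 60 + int(mm)
    return -minutes if sign == "-" else minutes
```

Z, +00:00, and -00:00 all map to 0, which is one of the redundancies that keeps the lexical mapping from being one-to-one.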
Lexical Mapping
dateTimeLexicalMap (LEX) → a complete value; LEX matches .
Maps a to a value.
LEX necessarily includes an instance Y of , an instance MO of , and an instance D of , hyphen-separated, and an instance H of , an instance MI of , and an instance S of , colon-separated, optionally followed by an instance T of .
Let dt be a complete value with all property values absent.
Set dt's to (T) when T is present and absent otherwise.
(dt, (Y), (MO), (D), (H), (MI), (S))
Return dt.
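The following Python sketch shows what dateTimeLexicalMap extracts from a literal. The regular expression is a simplified stand-in for the dateTime lexical grammar and is an assumption. It does not check day-of-month limits, the hour 24 case, or timezone ranges. The result is a plain tuple rather than a seven-property value, and the function name is hypothetical.

```python
import re
from decimal import Decimal

# Simplified stand-in for the dateTime lexical grammar (assumption): it does
# not check month/day limits, hour 24, or timezone ranges.
_DATETIME = re.compile(
    r"(-?(?:[1-9][0-9]{3,}|0[0-9]{3}))-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2}(?:\.[0-9]+)?)"
    r"(Z|[+-][0-9]{2}:[0-9]{2})?"
)


def datetime_lexical_map(lex: str):
    """Return (year, month, day, hour, minute, second, tz_minutes_or_None)."""
    m = _DATETIME.fullmatch(lex)
    if m is None:
        raise ValueError(f"not a dateTime literal: {lex!r}")
    y, mo, d, h, mi, s, tz = m.groups()
    if tz is None:
        offset = None                      # timezone absent
    elif tz == "Z":
        offset = 0
    else:
        minutes = int(tz[1:3]) * 60 + int(tz[4:6])
        offset = -minutes if tz[0] == "-" else minutes
    # Whole seconds stay integers; fractional seconds become decimals
    second = Decimal(s) if "." in s else int(s)
    return (int(y), int(mo), int(d), int(h), int(mi), second, offset)
```

Negative years and years longer than four digits pass through int() unchanged. For example, "-0001" becomes -1.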
Lexical Mapping
timeLexicalMap (LEX) → a complete value; LEX matches .
Maps a to a value.
LEX necessarily includes an instance H of , an instance M of , and an instance S of , colon-separated and optionally followed by an instance T of .
Let ti be a complete value with all property values absent.
Set ti's to (T) when T is present and absent otherwise.
(ti, absent, absent, absent, (H), (M), (S))
Return ti.
Lexical Mapping
dateLexicalMap (LEX) → a complete value; LEX matches .
Maps a to a value.
LEX necessarily includes an instance Y of , an instance M of , and an instance D of , hyphen-separated and optionally followed by an instance T of .
Let da be a complete value with all property values absent.
Set da's to (T) when T is present and absent otherwise.
(da, (Y), (M), (D), absent, absent, absent)
Return da.
Lexical Mapping
gYearMonthLexicalMap (LEX) → a complete value; LEX matches .
Maps a to a value.
LEX necessarily includes an instance Y of and an instance M of , hyphen-separated and optionally followed by an instance T of .
Let gYM be a value with all property values absent.
Set gYM's to (T) when T is present and absent otherwise.
(gYM, (Y), (M), absent, absent, absent, absent)
Return gYM.
Lexical Mapping
gYearLexicalMap (LEX) → a complete value; LEX matches .
Maps a to a value.
LEX necessarily includes an instance Y of , optionally followed by an instance T of .
Let gY be a value with all property values absent.
Set gY's to (T) when T is present and absent otherwise.
(gY, (Y), absent, absent, absent, absent, absent)
Return gY.
Lexical Mapping
gMonthDayLexicalMap (LEX) → a complete value; LEX matches .
Maps a to a value.
LEX necessarily includes an instance M of and an instance D of , hyphen-separated and optionally followed by an instance T of .
Let gMD be a value with all property values absent.
Set gMD's to (T) when T is present and absent otherwise.
(gMD, absent, (M), (D), absent, absent, absent)
Return gMD.
Lexical Mapping
gDayLexicalRepMap (LEX) → a complete value; LEX matches .
Maps a to a value.
LEX necessarily includes an instance D of , optionally followed by an instance T of .
Let gD be a complete value with all property values absent.
Set gD's to (T) when T is present and absent otherwise.
(gD, absent, absent, (D), absent, absent, absent)
Return gD.
Lexical Mapping
gMonthLexicalMap (LEX) → a complete value; LEX matches .
Maps a to a value.
LEX necessarily includes an instance M of , optionally followed by an instance T of .
Let gM be a value with all property values absent.
Set gM's to (T) when T is present and absent otherwise.
(gM, absent, (M), absent, absent, absent, absent)
Return gM.
Auxiliary Functions for Date/time Canonical Mappings
unsTwoDigitCanonicalFragmentMap (i) → matches ; i is a nonnegative integer less than 100.
Maps a nonnegative integer less than 100 onto an unsigned always-two-digit numeral.
Return (i div 10) & (i mod 10).
fourDigitCanonicalFragmentMap (i) → matches ; i is an integer whose absolute value is less than 10000.
Maps an integer between −10000 and 10000 (exclusive) onto an always-four-digit numeral.
Return - & (−i div 100) & (−i mod 100) when i is negative, and (i div 100) & (i mod 100) otherwise.
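The two auxiliary maps can be sketched in Python as follows. The sketch assumes that the operators in the Return clauses are integer div and mod, and that each two-digit piece is written as two decimal digits. For the nonnegative operands used here, Python's // and % give the same results as div and mod. The function names are hypothetical.

```python
def uns_two_digit_canonical(i: int) -> str:
    """Nonnegative integer below 100 -> always-two-digit numeral."""
    if not 0 <= i < 100:
        raise ValueError(i)
    # i div 10, then i mod 10, each rendered as one digit
    return str(i // 10) + str(i % 10)


def four_digit_canonical(i: int) -> str:
    """Integer with |i| < 10000 -> always-four-digit numeral ('-' if negative)."""
    if not -10000 < i < 10000:
        raise ValueError(i)
    if i < 0:
        # Unary minus binds tighter than // and %, so this is (-i) // 100
        return "-" + uns_two_digit_canonical(-i // 100) + uns_two_digit_canonical(-i % 100)
    return uns_two_digit_canonical(i // 100) + uns_two_digit_canonical(i % 100)
```

Splitting the value into hundreds and a remainder keeps every output four digits long. For example, 5 becomes "0005" and -123 becomes "-0123".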
Partial Date/time Canonical Mappings
yearCanonicalFragmentMap (y) → matches ; y is an integer.
Maps an integer, presumably the property of a value, onto a , part of a 's .
Return (y) when |y| > 9999, and (y) otherwise.
monthCanonicalFragmentMap (m) → matches ; m is an integer between 1 and 12 inclusive.
Maps an integer, presumably the property of a value, onto a , part of a 's .
Return (m).
dayCanonicalFragmentMap (d) → matches ; d is an integer between 1 and 31 inclusive (may be limited further depending on associated and ).
Maps an integer, presumably the property of a value, onto a , part of a 's .
Return (d).
hourCanonicalFragmentMap (h) → matches ; h is an integer between 0 and 23 inclusive.
Maps an integer, presumably the property of a value, onto a , part of a 's .
Return (h).
minuteCanonicalFragmentMap (m) → matches ; m is an integer between 0 and 59 inclusive.
Maps an integer, presumably the property of a value, onto a , part of a 's .
Return (m).
secondCanonicalFragmentMap (s) → matches ; s is a nonnegative decimal less than 70.
Maps a decimal, presumably the property of a value, onto a , part of a 's .
Return (s) when s is an integer, and (s div 1) & . & (s mod 1) otherwise.
timezoneCanonicalFragmentMap (t) → matches ; t is an integer between −840 and 840 inclusive.
Maps an integer, presumably the property of a value, onto a , part of a 's .
Return Z when t is zero, - & (−t div 60) & : & (−t mod 60) when t is negative, and + & (t div 60) & : & (t mod 60) otherwise.
Canonical Mapping
dateTimeCanonicalMap (dt) → matches ; dt is a complete value.
Maps a value to a .
Let DT be (dt's ) & - & (dt's ) & - & (dt's ) & T & (dt's ) & : & (dt's ) & : & (dt's ).
Return DT when dt's is absent, and DT & (dt's ) otherwise.
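The following compact Python sketch implements dateTimeCanonicalMap with the fragment maps folded into format specifications. It rests on several assumptions. The property values are passed as plain arguments, which is a hypothetical signature. The timezone is an offset in minutes, or None when absent. A positive offset is written with '+'. Fractional seconds are written without trailing zeros.

```python
from decimal import Decimal


def _year(y: int) -> str:
    # At least four digits; more only when |y| > 9999; '-' for negative years
    return ("-" if y < 0 else "") + f"{abs(y):04d}"


def _second(s) -> str:
    s = Decimal(s)
    if s == s.to_integral_value():
        return f"{int(s):02d}"
    # Assumption: the fractional part is written without trailing zeros
    whole, frac = format(s.normalize(), "f").split(".")
    return f"{int(whole):02d}.{frac}"


def _timezone(t: int) -> str:
    if t == 0:
        return "Z"
    sign = "-" if t < 0 else "+"
    hours, minutes = divmod(abs(t), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"


def datetime_canonical_map(year, month, day, hour, minute, second, tz=None):
    dt = (f"{_year(year)}-{month:02d}-{day:02d}"
          f"T{hour:02d}:{minute:02d}:{_second(second)}")
    # The timezone suffix is appended only when the offset is present
    return dt if tz is None else dt + _timezone(tz)
```

Because the output is canonical, distinct literals for the same value collapse to one string. For example, seconds given as 7.000 are written "07".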
Canonical Mapping
timeCanonicalMap (ti) → matches ; ti is a complete value.
Maps a value to a .
Let T be (ti's ) & : & (ti's ) & : & (ti's ).
Return T when ti's is absent, and T & (ti's ) otherwise.
Canonical Mapping
dateCanonicalMap (da) → matches ; da is a complete value.
Maps a value to a .
Let D be (da's ) & - & (da's ) & - & (da's ).
Return D when da's is absent, and D & (da's ) otherwise.
Canonical Mapping
gYearMonthCanonicalMap (ym) → matches ; ym is a complete value.
Maps a value to a .
Let YM be (ym's ) & - & (ym's ).
Return YM when ym's is absent, and YM & (ym's ) otherwise.
Canonical Mapping
gYearCanonicalMap (gY) → matches ; gY is a complete value.
Maps a value to a .
Return (gY's ) when gY's is absent, and (gY's ) & (gY's ) otherwise.
Canonical Mapping
gMonthDayCanonicalMap (md) → matches ; md is a complete value.
Maps a value to a .
Let MD be -- & (md's ) & - & (md's ).
Return MD when md's is absent, and MD & (md's ) otherwise.
Canonical Mapping
gDayCanonicalRepMap (gD) → matches ; gD is a complete value.
Maps a value to a .
Return
The following table shows the values of the fundamental facets
for each datatype.
ISO 8601 Date and Time Formats
ISO 8601 Conventions
The datatypes
, , ,
, , ,
, and
use lexical formats inspired by
.
Following , the lexical forms of
these datatypes can include only the characters #20 through #7F.
This appendix provides more detail on the ISO
formats and discusses some deviations from them for the datatypes
defined in this specification.
"specifies the representation of dates in the
proleptic Gregorian calendar and times and representations of periods of time".
The proleptic Gregorian calendar includes dates prior to 1582 (the year it came
into use as an ecclesiastical calendar).
It should be pointed out that the datatypes described in this
specification do not cover all the types of data covered by
, nor do they support all the lexical
representations for those types of data.
lexical formats are described using "pictures"
in which characters are used in place of decimal digits.
The allowed decimal digits are (#x30-#x39).
For the primitive datatypes , , , , , , and , these characters have the following meanings:
C -- represents a digit used in the thousands and hundreds components,
the "century" component, of the time element "year". Legal values are
from 0 to 9.
Y -- represents a digit used in the tens and units components of the time
element "year". Legal values are from 0 to 9.
M -- represents a digit used in the time element "month". The two
digits in a MM format can have values from 1 to 12.
D -- represents a digit used in the time element "day". The two digits
in a DD format can have values from 1 to 28 if the month value equals 2,
1 to 29 if the month value equals 2 and the year is a leap year, 1 to 30
if the month value equals 4, 6, 9 or 11, and 1 to 31 if the month value
equals 1, 3, 5, 7, 8, 10 or 12.
h -- represents a digit used in the time element "hour". The two digits
in a hh format can have values from 0 to
24.
If the value of the hour element is 24 then the values of the minutes
element and the seconds element must be 00 and 00.
m -- represents a digit used in the time element "minute". The two digits
in a mm format can have values from 0 to 59.
s -- represents a digit used in the time element "second". The two
digits in a ss format can have values from 0 to 60. In the formats
described in this specification, the whole number of seconds may be followed by decimal seconds to an arbitrary level of precision.
This is represented in the picture by "ss.sss". A value of 60 or more is
allowed only in the case of leap seconds.
Strictly speaking, a value of
60 or more is not sensible unless the month and day could
represent March 31, June 30, September 30, or December 31 in .
Because the leap second is added or subtracted as the last second of the day
in time, the long (or short) minute could occur at other times in local
time. In cases where the leap second is used with an inappropriate month and day, it, and any fractional seconds, should be considered as added to or subtracted from the following minute.
For all the information items indicated by the above characters, leading
zeros are required where indicated.
In addition to the above, certain characters are used as designators
and appear as themselves in lexical formats.
T -- is used as time designator to indicate the start of the
representation of the time of day in .
Z -- is used as time-zone designator, immediately (without a space)
following a data element expressing the time of day in Coordinated
Universal Time () in
, ,
, , ,
, , and .
In the lexical format for the following
characters are also used as designators and appear as themselves in
lexical formats:
P -- is used as the time duration designator, preceding a data element
representing a given duration of time.
Y -- follows the number of years in a time duration.
M -- follows the number of months or minutes in a time duration.
D -- follows the number of days in a time duration.
H -- follows the number of hours in a time duration.
S -- follows the number of seconds in a time duration.
The values of the
Year, Month, Day, Hour and Minutes components are not restricted but
allow an arbitrary integer. Similarly, the value of the Seconds component
allows an arbitrary decimal. Thus, the lexical format for
and datatypes derived from it
does not follow the alternative
format of § 5.5.3.2.1 of .
Truncated and Reduced Formats
supports a variety of "truncated" formats in
which some of the characters on the left of specific formats, for example,
the
century, can be omitted.
Truncated formats are, in
general, not permitted for the datatypes defined in this specification
with three exceptions. The datatype uses
a truncated format for
which represents an instant of time that recurs every day.
Similarly, the and
datatypes use left-truncated formats for .
The datatype uses a right and left truncated format for
.
also supports a variety of "reduced" or right-truncated
formats in which some of the characters to the right of specific formats,
such as the
time specification, can be omitted. Right truncated formats are also, in
general,
not permitted for the datatypes defined in this specification
with the following exceptions:
right-truncated representations of are used as
lexical representations for , ,
.
Deviations from ISO 8601 Formats
Sign Allowed
An optional minus sign is allowed immediately preceding, without a space,
the lexical representations for , ,
, , .
No Year Zero
The year "0000" is an illegal year value.
More Than 9999 Years
To accommodate year values greater than 9999, more than four digits are
allowed in the year representations of ,
, , and .
This follows
.
Time zone permitted
The lexical representations for the datatypes ,
, , ,
and permit an optional trailing time zone specification.
Adding durations to dateTimes
Given a S and a D, this
appendix specifies how to compute a E where E is the
end of the time period with start S and duration D, i.e. E = S + D. Such
computations are used, for example, to determine whether a
is within a specific time period. This appendix also addresses the addition of
s to the datatypes ,
, , and
, which can be viewed as a set of s.
In such cases, the addition is made to the first or starting
in the set.
This is a logical explanation of the process.
Actual implementations are free to optimize as long as they produce the same
results. The calculation uses the notation S[year] to represent the year
field of S, S[month] to represent the month field, and so on. It also depends on
the following functions:
fQuotient(a, b) = the greatest integer less than or equal to a/b
fQuotient(-1,3) = -1
fQuotient(0,3)...fQuotient(2,3) = 0
fQuotient(3,3) = 1
fQuotient(3.123,3) = 1
modulo(a, b) = a - fQuotient(a,b)*b
modulo(-1,3) = 2
modulo(0,3)...modulo(2,3) = 0...2
modulo(3,3) = 0
modulo(3.123,3) = 0.123
fQuotient(a, low, high) = fQuotient(a - low, high - low)
modulo(a, low, high) = modulo(a - low, high - low) + low
maximumDayInMonthFor(Y, M) =
31 if M = January, March, May, July, August, October, or December
30 if M = April, June, September, or November
29 if M = February AND (modulo(Y, 400) = 0 OR (modulo(Y, 100) != 0 AND modulo(Y, 4) = 0))
28 otherwise
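The helper functions translate directly into Python. Non-integer arguments are converted to Decimal so that examples such as modulo(3.123, 3) = 0.123 come out exactly, without binary floating-point error. maximum_day_in_month_for also normalizes months outside 1..12 with the (a, 1, 13) forms, so month 0 means December of the previous year. That normalization is an assumption and is not stated in the definition above. The function names are hypothetical.

```python
import math
from decimal import Decimal


def f_quotient(a, b):
    """Greatest integer less than or equal to a/b."""
    if isinstance(a, int) and isinstance(b, int):
        return a // b                      # Python's // already floors
    return math.floor(Decimal(a) / Decimal(b))


def modulo(a, b):
    return a - f_quotient(a, b) * b


def f_quotient_range(a, low, high):        # fQuotient(a, low, high)
    return f_quotient(a - low, high - low)


def modulo_range(a, low, high):            # modulo(a, low, high)
    return modulo(a - low, high - low) + low


def maximum_day_in_month_for(year, month):
    # Assumption: out-of-range months are first normalised into 1..12,
    # carrying into the year (month 0 is December of year - 1).
    m = modulo_range(month, 1, 13)
    y = year + f_quotient_range(month, 1, 13)
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    if m == 2 and (modulo(y, 400) == 0 or (modulo(y, 100) != 0 and modulo(y, 4) == 0)):
        return 29
    return 28
```

The (a, low, high) forms shift the range so that, for example, modulo(13, 1, 13) wraps back to 1 rather than to 0.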
Algorithm
Essentially, this calculation is equivalent to separating D into <year,month>
and <day,hour,minute,second> fields. The <year,month> is added to S.
If the day is out of range, it is pinned to be within range. Thus April
31 turns into April 30. Then the <day,hour,minute,second> is added. This
latter addition can cause the year and month to change.
Leap seconds are handled by the computation by treating them as overflows.
Essentially, a value of 60
seconds in S is treated as if it were a duration of 60 seconds added to S
(with a zero seconds field). All calculations
thereafter use 60 seconds per minute.
Thus the addition of either PT1M or PT60S to any dateTime will always
produce the same result. This is a special definition of addition which
is designed to match common practice, and -- most importantly -- be stable
over time.
A definition that attempted to take leap-seconds into account would need to
be constantly updated, and could not predict the results of additions performed by future implementations. The decision to introduce a leap second in
is the responsibility of the . They make periodic
announcements as to when
leap seconds are to be added, but this is not known more than a year in
advance. For more information on leap seconds, see .
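The addition procedure can be prototyped in Python. The sketch is limited to dateTimes with every field present, integer seconds, and no timezone. The names DT, Dur, max_day and add_duration are hypothetical. The day handling is an assumption rather than the normative step text. It follows the prose description: pin the start day into the new month, add the days plus the carry from the hours, then roll the month and year until the day fits. A seconds value of 60 overflows into the next minute, as described above.

```python
from dataclasses import dataclass


def max_day(year: int, month: int) -> int:
    # maximumDayInMonthFor, after normalising the month into 1..12
    # (month 0 is December of the previous year, and so on)
    m = (month - 1) % 12 + 1
    y = year + (month - 1) // 12
    if m in (1, 3, 5, 7, 8, 10, 12):
        return 31
    if m in (4, 6, 9, 11):
        return 30
    return 29 if (y % 400 == 0 or (y % 100 != 0 and y % 4 == 0)) else 28


@dataclass(frozen=True)
class DT:
    year: int
    month: int
    day: int
    hour: int = 0
    minute: int = 0
    second: int = 0   # integer seconds only in this sketch


@dataclass(frozen=True)
class Dur:
    # Negate every field to express a negative duration
    year: int = 0
    month: int = 0
    day: int = 0
    hour: int = 0
    minute: int = 0
    second: int = 0


def add_duration(s: DT, d: Dur) -> DT:
    # Months and years: modulo(temp, 1, 13) and fQuotient(temp, 1, 13)
    temp = s.month + d.month
    month = (temp - 1) % 12 + 1
    year = s.year + d.year + (temp - 1) // 12
    # Seconds, minutes, hours; Python's % and // floor like modulo/fQuotient
    temp = s.second + d.second
    second, carry = temp % 60, temp // 60
    temp = s.minute + d.minute + carry
    minute, carry = temp % 60, temp // 60
    temp = s.hour + d.hour + carry
    hour, carry = temp % 24, temp // 24
    # Days: pin the start day into the new month, then add and roll over
    day = min(max(s.day, 1), max_day(year, month)) + d.day + carry
    while True:
        if day < 1:
            day += max_day(year, month - 1)
            carry = -1
        elif day > max_day(year, month):
            day -= max_day(year, month)
            carry = 1
        else:
            break
        temp = month + carry
        month = (temp - 1) % 12 + 1
        year += (temp - 1) // 12
    return DT(year, month, day, hour, minute, second)
```

Pinning explains why adding P1M to 2000-01-31 gives 2000-02-29 rather than a date in March. Because seconds overflow into minutes at 60, PT1M and PT60S always give the same result.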
The following is the precise specification. These steps must be followed in
the same order. If a field in D is not specified, it is treated as if it were
zero. If a field in S is not specified, it is treated in the calculation as if it were the minimum allowed value in that field; however, after the calculation is concluded, the corresponding field in E is removed (set to unspecified).
Months (may be modified additionally below)
temp := S[month] + D[month]
E[month] := modulo(temp, 1, 13)
carry := fQuotient(temp, 1, 13)
Years (may be modified additionally below)
E[year] := S[year] + D[year] + carry
Zone
E[zone] := S[zone]
Seconds
temp := S[second] + D[second]
E[second] := modulo(temp, 60)
carry := fQuotient(temp, 60)
Minutes
temp := S[minute] + D[minute] + carry
E[minute] := modulo(temp, 60)
carry := fQuotient(temp, 60)
Hours
temp := S[hour] + D[hour] + carry
E[hour] := modulo(temp, 24)
carry := fQuotient(temp, 24)
Days
if S[day] > maximumDayInMonthFor(E[year], E[month])
A regular expression R is a sequence of characters that denote a set of strings L(R). When used to constrain a , a regular expression R asserts that only strings in L(R) are valid literals for values of that type.
Unlike some popular regular expression languages (including those
defined by Perl and standard Unix utilities), the regular
expression language defined here implicitly anchors all regular
expressions at the head and tail, as the most common use of
regular expressions in is to match entire literals.
For example, a datatype from such
that all values must begin with the character A (#x41) and end with the character
Z (#x5a) would be defined as follows:
In regular expression languages that are not implicitly anchored at the head and tail,
it is customary to write the equivalent regular expression as:
^A.*Z$
where "^" anchors the pattern at the head and "$" anchors at the tail.
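Implicit anchoring can be emulated in Python by using re.fullmatch in place of re.search. Python's re dialect is not the regular expression language defined here. For example, it has no block escapes or character class subtraction, and its . also matches carriage return. For that reason, this sketch only uses patterns on which the two agree for the sample inputs. The function names are hypothetical.

```python
import re


def xsd_style_match(pattern: str, literal: str) -> bool:
    # Implicitly anchored at head and tail, like the patterns defined here
    return re.fullmatch(pattern, literal) is not None


def unanchored_match(pattern: str, literal: str) -> bool:
    # Perl/Unix-style: the pattern may match anywhere unless ^ and $ are used
    return re.search(pattern, literal) is not None
```

Under implicit anchoring, A.*Z accepts "ABCZ" but rejects "xABCZ". An unanchored search needs ^A.*Z$ to behave the same way.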
In those rare cases where an unanchored match is desired, including
.* at the beginning and end of the regular expression will
achieve the desired results. For example, a datatype from string such that all values must contain at least 3 consecutive A (#x41) characters somewhere within the value could be defined as follows:
A regular expression is composed from zero or more branches, separated by | characters.
Regular Expression
regExp ::= branch ( '|' branch )*
For all branches S, and for all regular expressions T, valid regular expressions R are:
R -- Denoting the set of strings L(R) containing:
(empty string) -- the set containing just the empty string
S -- all strings in L(S)
S|T -- all strings in L(S) and all strings in L(T)
A branch consists of zero or more pieces, concatenated together.
Branch
branch ::= piece*
For all pieces S, and for all branches T, valid branches R are:
R -- Denoting the set of strings L(R) containing:
S -- all strings in L(S)
ST -- all strings st with s in L(S) and t in L(T)
A piece is an atom, possibly followed by a quantifier.
Piece
piece ::= atom quantifier?
For all atoms S and non-negative integers n, m such that n <= m, valid pieces R are:
R -- Denoting the set of strings L(R) containing:
S -- all strings in L(S)
S? -- the empty string, and all strings in L(S)
S* -- all strings in L(S?) and all strings st with s in L(S*) and t in L(S) (all concatenations of zero or more strings from L(S))
S+ -- all strings st with s in L(S) and t in L(S*) (all concatenations of one or more strings from L(S))
S{n,m} -- all strings st with s in L(S) and t in L(S{n-1,m-1}) (all sequences of at least n, and at most m, strings from L(S))
S{n} -- all strings in L(S{n,n}) (all sequences of exactly n strings from L(S))
S{n,} -- all strings in L(S{n}S*) (all sequences of at least n strings from L(S))
S{0,m} -- all strings st with s in L(S?) and t in L(S{0,m-1}) (all sequences of at most m strings from L(S))
S{0,0} -- the set containing only the empty string
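The recursive rows of the table can be executed directly for a finite language L(S). The Python sketch below computes L(S{n,m}) by following the S{n,m}, S{0,m} and S{0,0} rows. It is illustrative only and uses hypothetical names. It cannot enumerate S* or S{n,}, because those denote infinite sets.

```python
from itertools import product


def concat(left: set, right: set) -> set:
    """All strings st with s in left and t in right."""
    return {s + t for s, t in product(left, right)}


def repeat(lang: set, n: int, m: int) -> set:
    """L(S{n,m}) for a finite L(S), following the recursive rows of the table."""
    if not 0 <= n <= m:
        raise ValueError("need 0 <= n <= m")
    if m == 0:
        return {""}                                          # S{0,0}
    if n == 0:
        return concat({""} | lang, repeat(lang, 0, m - 1))   # S{0,m} = S? S{0,m-1}
    return concat(lang, repeat(lang, n - 1, m - 1))          # S{n,m} = S S{n-1,m-1}
```

For example, repeat({"a", "b"}, 1, 2) yields every sequence of one or two strings drawn from {"a", "b"}.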
The regular expression language in the Perl Programming Language
does not include a quantifier of the form
S{,m}, since it is logically equivalent to S{0,m}.
We have, therefore, left this logical possibility out of the regular
expression language defined by this specification.
A quantifier
is one of ?, *, +,
{n,m} or {n,}, which have the meanings
defined in the table above.
A metacharacter is either ., \, ?, *, +, {, }, (, ), [ or ].
These characters have special meanings in regular expressions, but can be escaped to form atoms that denote the sets of strings containing only themselves, i.e., an escaped metacharacter behaves like a normal character.
A normal character is any XML character that is not a metacharacter. In regular expressions, a normal character is an atom that denotes the singleton set of strings containing only itself.
Normal Character
Char ::= [^.\?*+()|#x5B#x5D]
Note that a normal character can be represented either as itself, or with a character reference.
Character Classes
A character class is an atom R that identifies a set of characters C(R). The set of strings L(R) denoted by a character class R contains one single-character string "c" for each character c in C(R).
Character Class
charClass |
|
A character class is either a or a
.
A character class expression is a character group surrounded by [ and ] characters. For all character groups G, [G] is a valid character class expression, identifying the set of characters C([G]) = C(G).
Character Class Expression
charClassExpr ::= '[' charGroup ']'
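A character group is either a positive character group, a negative character group, or a character class subtraction.
Character Group
charGroup ::= posCharGroup | negCharGroup | charClassSub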
A positive character group consists of one or more character ranges or character class escapes, concatenated together. A positive character group identifies the set of characters containing all of the characters in all of the sets identified by its constituent ranges or escapes.
Positive Character Group
posCharGroup ::= ( charRange | charClassEsc )+
For all character ranges R, all character class escapes E, and all positive character groups P, valid positive character groups G are:
G -- Identifying the set of characters C(G) containing:
R -- all characters in C(R)
E -- all characters in C(E)
RP -- all characters in C(R) and all characters in C(P)
EP -- all characters in C(E) and all characters in C(P)
A negative character group is a positive character group preceded by the ^ character. For all positive character groups P, ^P is a valid negative character group, and C(^P) contains all XML characters that are not in C(P).
Negative Character Group
negCharGroup ::= '^' posCharGroup
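A character class subtraction is a character class expression subtracted from a positive character group or negative character group, using the - character.
Character Class Subtraction
charClassSub ::= ( posCharGroup | negCharGroup ) '-' charClassExpr
For any positive character group or negative character group G, and any character class expression C, G-C is a valid character class subtraction, identifying the set of all characters in C(G) that are not also in C(C).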
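Because every construct in this subsection identifies a set of characters, the definitions can be modeled with Python sets. The sketch uses a hypothetical universe of printable ASCII characters in place of "all XML characters". That substitution only changes what a negative group's complement contains. All names are illustrative. The last line models [a-z-[aeiou]].

```python
# Hypothetical small universe standing in for "all XML characters"
UNIVERSE = frozenset(chr(c) for c in range(0x20, 0x7F))


def char_range(s: str, e: str) -> frozenset:
    """C(s-e): characters whose code points lie between those of s and e."""
    if ord(e) < ord(s):
        raise ValueError("code point of e must be >= code point of s")
    return frozenset(chr(c) for c in range(ord(s), ord(e) + 1))


def pos_group(*parts: frozenset) -> frozenset:
    """C(G) for a positive character group: the union of its parts."""
    return frozenset().union(*parts)


def neg_group(p: frozenset) -> frozenset:
    """C(^P): every character of the universe not in C(P)."""
    return UNIVERSE - p


def subtract(g: frozenset, c: frozenset) -> frozenset:
    """C(G-C): characters in C(G) that are not also in C(C)."""
    return g - c


# [a-z-[aeiou]] : lowercase ASCII consonants
consonants = subtract(pos_group(char_range("a", "z")), frozenset("aeiou"))
```

Subtraction is applied after the group is built, so [a-z-[aeiou]] keeps the 21 lowercase consonants.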
A character range R identifies a set of characters C(R) containing all XML characters with UCS code points in a specified range.
Character Range
charRange ::= seRange | XmlCharIncDash
seRange ::= charOrEsc '-' charOrEsc
charOrEsc ::= XmlChar | SingleCharEsc
XmlChar ::= [^\#x2D#x5B#x5D]
XmlCharIncDash ::= [^\#x5B#x5D]
A single XML character is a character range that identifies the set of characters containing only itself. All XML characters are valid character ranges, except as follows:
The [, ], - and \ characters are not valid character ranges;
The ^ character is only valid at the beginning of a positive character group if it is part of a negative character group;
The - character is a valid character range only at the beginning or end of a positive character group.
The grammar for character range as given above is ambiguous, but the second and third bullets above together remove the ambiguity.
A character range may also be written in the form s-e, identifying the set that contains all XML characters with UCS code points greater than or equal to the code point of s, but not greater than the code point of e.
s-e is a valid character range iff:
s is a single character escape, or an XML character;
s is not \;
If s is the first character in a character class expression, then s is not ^;
e is a single character escape, or an XML character;
e is not \ or [; and
The code point of e is greater than or equal to the code point of s.
The code point of a single character escape is the code point of the single character in the set of characters that it identifies.
Character Class Escapes
A character class escape is a short sequence of characters that identifies a predefined character class. The valid character class escapes are the single character escapes, the multi-character escapes, and the category escapes (including the block escapes).
Character Class Escape
charClassEsc ::= ( SingleCharEsc | MultiCharEsc | catEsc | complEsc )
A single character escape identifies a set containing only one character, usually because that character is difficult or impossible to write directly into a regular expression.
Single Character Escape
SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
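The valid single character escapes are (each escape followed by the set of characters C(R) it identifies):
\n -- the newline character (#xA)
\r -- the return character (#xD)
\t -- the tab character (#x9)
\\ -- the \ character
\| -- the | character
\. -- the . character
\- -- the - character
\^ -- the ^ character
\? -- the ? character
\* -- the * character
\+ -- the + character
\{ -- the { character
\} -- the } character
\( -- the ( character
\) -- the ) character
\[ -- the [ character
\] -- the ] character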
specifies a number of possible
values for the "General Category" property
and provides mappings from code points to specific character properties.
The set containing all characters that have property X can be identified with a category escape \p{X}. The complement of this set is specified with the category escape \P{X} ([\P{X}] = [^\p{X}]).
is subject to future revision. For example, the
mapping from code points to character properties might be updated.
All processors must support the character properties defined in the version of
that is current at the time this specification became a W3C
Recommendation. However, implementors are encouraged to support the
character properties defined in any future version.
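The following table specifies the recognized values of the "General Category" property, grouped by category (property -- meaning):
Letters
L -- All Letters
Lu -- uppercase
Ll -- lowercase
Lt -- titlecase
Lm -- modifier
Lo -- other
Marks
M -- All Marks
Mn -- nonspacing
Mc -- spacing combining
Me -- enclosing
Numbers
N -- All Numbers
Nd -- decimal digit
Nl -- letter
No -- other
Punctuation
P -- All Punctuation
Pc -- connector
Pd -- dash
Ps -- open
Pe -- close
Pi -- initial quote (may behave like Ps or Pe depending on usage)
Pf -- final quote (may behave like Ps or Pe depending on usage)
Po -- other
Separators
Z -- All Separators
Zs -- space
Zl -- line
Zp -- paragraph
Symbols
S -- All Symbols
Sm -- math
Sc -- currency
Sk -- modifier
So -- other
Other
C -- All Others
Cc -- control
Cf -- format
Co -- private use
Cn -- not assigned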
The properties mentioned above exclude the Cs property.
The Cs property identifies "surrogate" characters, which do not
occur at the level of the "character abstraction" that XML instance documents
operate on.
groups code points into a number of blocks
such as Basic Latin (i.e., ASCII), Latin-1 Supplement, Hangul Jamo,
CJK Compatibility, etc.
The set containing all characters that have block name X (with all white space stripped out) can be identified with a block escape \p{IsX}. The complement of this set is specified with the block escape \P{IsX} ([\P{IsX}] = [^\p{IsX}]).
Block Escape
IsBlock ::= 'Is' [a-zA-Z0-9#x2D]+
The following table specifies the recognized block names (for more
information, see the "Blocks.txt" file in ).
Start Code-End Code -- Block Name
#x0000-#x007F -- BasicLatin
#x0080-#x00FF -- Latin-1Supplement
#x0100-#x017F -- LatinExtended-A
#x0180-#x024F -- LatinExtended-B
#x0250-#x02AF -- IPAExtensions
#x02B0-#x02FF -- SpacingModifierLetters
#x0300-#x036F -- CombiningDiacriticalMarks
#x0370-#x03FF -- Greek
#x0400-#x04FF -- Cyrillic
#x0530-#x058F -- Armenian
#x0590-#x05FF -- Hebrew
#x0600-#x06FF -- Arabic
#x0700-#x074F -- Syriac
#x0780-#x07BF -- Thaana
#x0900-#x097F -- Devanagari
#x0980-#x09FF -- Bengali
#x0A00-#x0A7F -- Gurmukhi
#x0A80-#x0AFF -- Gujarati
#x0B00-#x0B7F -- Oriya
#x0B80-#x0BFF -- Tamil
#x0C00-#x0C7F -- Telugu
#x0C80-#x0CFF -- Kannada
#x0D00-#x0D7F -- Malayalam
#x0D80-#x0DFF -- Sinhala
#x0E00-#x0E7F -- Thai
#x0E80-#x0EFF -- Lao
#x0F00-#x0FFF -- Tibetan
#x1000-#x109F -- Myanmar
#x10A0-#x10FF -- Georgian
#x1100-#x11FF -- HangulJamo
#x1200-#x137F -- Ethiopic
#x13A0-#x13FF -- Cherokee
#x1400-#x167F -- UnifiedCanadianAboriginalSyllabics
#x1680-#x169F -- Ogham
#x16A0-#x16FF -- Runic
#x1780-#x17FF -- Khmer
#x1800-#x18AF -- Mongolian
#x1E00-#x1EFF -- LatinExtendedAdditional
#x1F00-#x1FFF -- GreekExtended
#x2000-#x206F -- GeneralPunctuation
#x2070-#x209F -- SuperscriptsandSubscripts
#x20A0-#x20CF -- CurrencySymbols
#x20D0-#x20FF -- CombiningMarksforSymbols
#x2100-#x214F -- LetterlikeSymbols
#x2150-#x218F -- NumberForms
#x2190-#x21FF -- Arrows
#x2200-#x22FF -- MathematicalOperators
#x2300-#x23FF -- MiscellaneousTechnical
#x2400-#x243F -- ControlPictures
#x2440-#x245F -- OpticalCharacterRecognition
#x2460-#x24FF -- EnclosedAlphanumerics
#x2500-#x257F -- BoxDrawing
#x2580-#x259F -- BlockElements
#x25A0-#x25FF -- GeometricShapes
#x2600-#x26FF -- MiscellaneousSymbols
#x2700-#x27BF -- Dingbats
#x2800-#x28FF -- BraillePatterns
#x2E80-#x2EFF -- CJKRadicalsSupplement
#x2F00-#x2FDF -- KangxiRadicals
#x2FF0-#x2FFF -- IdeographicDescriptionCharacters
#x3000-#x303F -- CJKSymbolsandPunctuation
#x3040-#x309F -- Hiragana
#x30A0-#x30FF -- Katakana
#x3100-#x312F -- Bopomofo
#x3130-#x318F -- HangulCompatibilityJamo
#x3190-#x319F -- Kanbun
#x31A0-#x31BF -- BopomofoExtended
#x3200-#x32FF -- EnclosedCJKLettersandMonths
#x3300-#x33FF -- CJKCompatibility
#x3400-#x4DB5 -- CJKUnifiedIdeographsExtensionA
#x4E00-#x9FFF -- CJKUnifiedIdeographs
#xA000-#xA48F -- YiSyllables
#xA490-#xA4CF -- YiRadicals
#xAC00-#xD7A3 -- HangulSyllables
#xE000-#xF8FF -- PrivateUse
#xF900-#xFAFF -- CJKCompatibilityIdeographs
#xFB00-#xFB4F -- AlphabeticPresentationForms
#xFB50-#xFDFF -- ArabicPresentationForms-A
#xFE20-#xFE2F -- CombiningHalfMarks
#xFE30-#xFE4F -- CJKCompatibilityForms
#xFE50-#xFE6F -- SmallFormVariants
#xFE70-#xFEFE -- ArabicPresentationForms-B
#xFEFF-#xFEFF -- Specials
#xFF00-#xFFEF -- HalfwidthandFullwidthForms
#xFFF0-#xFFFD -- Specials
The blocks mentioned above exclude the HighSurrogates,
LowSurrogates and HighPrivateUseSurrogates blocks.
These blocks identify "surrogate" characters, which do not
occur at the level of the "character abstraction" that XML instance documents
operate on.
is subject to future revision.
For example, the
grouping of code points into blocks might be updated.
All processors must support the blocks defined in the version of
that is current at the time this specification became a W3C
Recommendation. However, implementors are encouraged to support the
blocks defined in any future version of the Unicode Standard.
For example, the block escape for identifying the ASCII characters is \p{IsBasicLatin}.
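A block escape reduces to a code-point range test. The Python sketch below parses \p{IsX} and \P{IsX} using the character set of the IsBlock production. It looks up a hypothetical dictionary holding only a few rows copied from the block table above. A full implementation would need every row.

```python
import re

# A few rows copied from the block table; a full implementation would
# include every row of that table.
BLOCKS = {
    "BasicLatin": (0x0000, 0x007F),
    "Latin-1Supplement": (0x0080, 0x00FF),
    "Greek": (0x0370, 0x03FF),
    "Cyrillic": (0x0400, 0x04FF),
}

# \p{IsX} or \P{IsX}, with X drawn from [a-zA-Z0-9-] as in IsBlock
_BLOCK_ESC = re.compile(r"\\([pP])\{Is([a-zA-Z0-9-]+)\}")


def block_escape_matches(escape: str, ch: str) -> bool:
    m = _BLOCK_ESC.fullmatch(escape)
    if m is None:
        raise ValueError(f"not a block escape: {escape!r}")
    kind, name = m.groups()
    low, high = BLOCKS[name]           # KeyError for names not in this subset
    inside = low <= ord(ch) <= high
    # \P{IsX} is the complement of \p{IsX}
    return inside if kind == "p" else not inside
```

For example, "A" matches \p{IsBasicLatin}, and U+00E9 (é) matches \P{IsBasicLatin} because it falls in Latin-1Supplement.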
A multi-character escape provides a simple way to identify a commonly used set of characters:
\i -- the set of initial name characters, those matched by Letter | '_' | ':'
\I -- [^\i]
\c -- the set of name characters, those matched by NameChar
\C -- [^\c]
\d -- \p{Nd}
\D -- [^\d]
\w -- [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)
\W -- [^\w]
The language defined here does not
attempt to provide a general solution to "regular expressions" over
UCS character sequences. In particular, it does not easily provide
for matching sequences of base characters and combining marks.
The language is targeted at support of "Level 1" features as defined in
. It is hoped that future versions of this
specification will provide support for "Level 2" features.
Changes since version 1.0
Changes Already Made
The revision of (including the new derived datatypes
and ), in ,
and , has been approved.
Datatypes, Facets and Related Rewrites
The model of an abstract datatype has been made more precise and
explicit. has mostly been rewritten
and been formally blessed by the WG. Driving
this new text is not only a desire on the part of the WG to make it
more precise and explicit but also a specific formal
requirement to redo the handling of facets
(RQ-24 (systematic
facets)). The primary intent of this requirement was to
move the description of equality, identity, and order to .
RQ-24 (systematic
facets: separate identity and equality) directed that we provide
for equality that in some cases was different from identity.
Most datatypes still use identity as their equality, but the new
precisionDecimal and the redesigned date/time datatypes will
not. It is also intended that and
will use this capability to separate minus zero
from plus zero; they will be non-identical but equal (see RQ-140).
The of the
component for list datatypes is
now always false, reflecting the fact that datatypes are not ordered (except by the
trivial order), and hence cannot reasonably be bounded.
Units of length have been selected for all datatypes that are
permitted the length &cfacet;
(RQ-6
(length for [almost] all primitive
types)).
Numerical Datatypes
The datatype has been added. It
is intended to support the floating-point decimal datatypes
defined in the forthcoming version of IEEE 754. The
datatype differs from
in that values carry not only a
numeric value but also an
(arithmetic) precision.
Date/time Datatypes
RQ-2 (canonical rep of duration) resulted in the adoption
of a new two-property model for and the rewriting of . A few additional
changes have been proposed, and in versions of this specification that show adds and dels these changes are so
marked. In addition, two new derived datatypes ( and )
have been added in satisfaction of RQ-20 (ordered duration types).
RQ-122 (define
dateTime value space) has resulted in a revision of the value
space for all date/time datatypes (except , which was changed as a result of another
requirement). The most visible effect of this change was to
cause the values to retain knowledge of their timezone, which is
explained in the new material. The new version specifies a
seven-property model used uniformly for values in all of these
datatypes, described in
. Only has been rewritten to match this new generic approach;
the other date/time datatype descriptions will be rewritten in a
future draft. These rewrites and the new have not yet been approved by the WG; in versions
of this specification that show adds and dels, this material shows as
a proposed change.
In addition to the normative material, the
nonnormative was added to explain in more
detail the model of dates and times behind the seven-property model,
so that there will be no confusion about the handling of such things
as leap-seconds.
The seven property model rewrite of date/time datatype descriptions includes a carefully crafted definition of order
that ensures that for repeating datatypes (time, gDay, etc.), timezoned values will be compared as though they are on
the same "calendar day" ("local" property values) so that in any given timezone, the days start at "local" 00:00:00 and
end not quite including "local" 24:00:00. Days are not 00:00:00Z to 24:00:00Z in timezones other than Z. This covers
the requirements of RQ-13 (time zone crosses date line). In addition,
in satisfaction of RQ-123 (year 0000 in date/time datatypes), the lexical
representation 0000 for years is made legal and the mapping of values with negative years onto the timeline
has been changed to match. E.g., the year 0000 is 1 B.C.E., the year −0001 is 2 B.C.E., etc. (This is a
change from version 1.0 of this specification.)
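The ordering rule for repeating datatypes and the revised year mapping can be illustrated with a short Python sketch. This is one reading of the prose above, not the specification's normative algorithm. The helper names `compare_times` and `year_to_era` are hypothetical, and so is the shared reference day 1972-12-31 (the text only requires that both values sit on the *same* local calendar day). Time-of-day values are modelled with Python's `datetime.time`, which ignores leap seconds.

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical shared "calendar day": the text only requires that both
# repeating values be placed on the same local day, not which day it is.
REFERENCE_DAY = date(1972, 12, 31)


def compare_times(a: time, b: time) -> int:
    """Order two timezoned time values as instants on one local calendar day.

    Returns -1, 0 or 1. Each value keeps its own local hour/minute/second
    and is anchored to REFERENCE_DAY in its own timezone, so its day runs
    from local 00:00:00 up to, but not including, local 24:00:00.
    """
    if a.tzinfo is None or b.tzinfo is None:
        raise ValueError("this sketch only handles timezoned values")
    # datetime.combine keeps the tzinfo carried by the time argument,
    # and aware datetimes compare as points on the UTC timeline.
    da = datetime.combine(REFERENCE_DAY, a)
    db = datetime.combine(REFERENCE_DAY, b)
    return (da > db) - (da < db)


def year_to_era(year: int) -> str:
    """Label a year value under the revised mapping, where 0000 is 1 B.C.E."""
    if year >= 1:
        return f"{year} C.E."
    # 0 -> 1 B.C.E., -1 -> 2 B.C.E., ...: the B.C.E. number is 1 - year.
    return f"{1 - year} B.C.E."


if __name__ == "__main__":
    minus_five = timezone(timedelta(hours=-5))
    # 20:00-05:00 on the shared local day is 01:00Z on the following UTC day,
    # so it sorts after 02:00Z on the shared day.
    print(compare_times(time(20, 0, tzinfo=minus_five),
                        time(2, 0, tzinfo=timezone.utc)))  # 1
    print(year_to_era(int("0000")))   # 1 B.C.E.
    print(year_to_era(int("-0001")))  # 2 B.C.E.
```

Compare this with an approach that first reduces each value to a UTC time of day and wraps at midnight. That approach would see 01:00Z versus 02:00Z for the first pair and order it the other way. Anchoring both values to one local day avoids the wrap.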
Specific Outstanding Issues
In addition to the changes already made, the Working Group has decided on
a number of further changes which have not yet been reflected in this draft.
These are indicated throughout the text as issues, including more or less
detail on the intended resolution. The ones remaining in this draft are
summarized below, linked to their occurrence in the text above, where more
detail can be found, including links to the original requirement or other
point of origin.
Glossary (non-normative)
The listing below is for the benefit of readers of a printed version of this
document: it collects together all the definitions which appear in the
document above.
References
Normative
World Wide Web Consortium. XML Base.
Available at:
http://www.w3.org/TR/2001/REC-xmlbase-20010627/
IEEE. IEEE Standard for Binary Floating-Point Arithmetic.
See
http://standards.ieee.org/reading/ieee/std_public/description/busarch/754-1985_desc.html
World Wide Web Consortium. XML Linking Language (XLink).
Available at: &xlink;.
Note: only the URI reference escaping procedure defined in
Section 5.4 is normatively referenced.
Extensible
Markup Language (XML) 1.0, Third Edition, Tim Bray et al., eds., W3C,
4 February 2004. See
http://www.w3.org/TR/2004/REC-xml-20040204/
XML Schema Version 1.1 Part 1:
Structures. Available at: http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/
&xsdl;XML Schema Requirements, Ashok Malhotra and Murray Maloney, eds.,
W3C, 15 February 1999. See http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
World Wide Web Consortium. Namespaces in XML. Available at:
&xmlnsspec;
Tim Berners-Lee et al. RFC 2396: Uniform Resource Identifiers (URI):
Generic Syntax. 1998. Available at:
http://www.ietf.org/rfc/rfc2396.txt
RFC 2732: Format for Literal IPv6 Addresses in URL's. 1999.
Available at:
http://www.ietf.org/rfc/rfc2732.txt
N. Freed and N. Borenstein. RFC 2045: Multipurpose Internet Mail Extensions
(MIME) Part One: Format of Internet Message Bodies. 1996. Available at:
http://www.ietf.org/rfc/rfc2045.txt
H. Alvestrand, ed. RFC 3066: Tags for the Identification of Languages.
2001. Available at:
http://www.ietf.org/rfc/rfc3066.txt
William D Clinger. How to Read Floating Point Numbers Accurately.
In Proceedings of the Conference on Programming Language Design and
Implementation, pages 92-101.
Available at:
ftp://ftp.ccs.neu.edu/pub/people/will/howtoread.ps
The Unicode Consortium. The Unicode Character Database.
Available at:
http://www.unicode.org/Public/3.1-Update/UnicodeCharacterDatabase-3.1.0.html
Non-normative
XML Schema Requirements, Ashok Malhotra and Murray Maloney, eds.,
W3C, 15 February 1999. See http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
M. Dürst and M. Suignard.
Internationalized Resource Identifiers.
2002. Available at:
http://www.w3.org/International/iri-edit/draft-duerst-iri-08.txt
World Wide Web Consortium. Ruby Annotation. Available at:
http://www.w3.org/TR/2001/REC-ruby-20010531
World Wide Web Consortium. Hypertext Markup Language, version 4.01. Available at:
&html4;
World Wide Web Consortium. XML Schema Language: Part 0 Primer. Available at:
&primer;
Mark Davis. Unicode Regular Expression Guidelines, 1988.
Available at:
http://www.unicode.org/unicode/reports/tr18/
The Perl Programming Language. See
http://www.perl.com/pub/language/info/software.html
ISO (International Organization for Standardization). ISO/IEC
9075-2:1999, Information technology --- Database languages ---
SQL --- Part 2: Foundation (SQL/Foundation).
[Geneva]: International Organization for Standardization, 1999.
See
http://www.iso.ch/cate/d26197.html
International Earth Rotation Service (IERS).
See http://maia.usno.navy.mil
ISO (International Organization for Standardization).
Representations of dates and times, 1988-06-15.
ISO (International Organization for Standardization).
Representations of dates and times, draft revision, 1998.
ISO (International Organization for Standardization).
Representations of dates and times, second edition, 2000-12-15.
ISO (International Organization for Standardization).
Language-independent Datatypes. See
http://www.iso.ch/cate/d19346.html
World Wide Web Consortium. RDF Schema Specification.
Available at:
http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
Information about Leap Seconds.
Available at:
http://tycho.usno.navy.mil/leapsec.990505.html
World Wide Web Consortium.
Extensible Stylesheet Language (XSL).
Available at:
http://www.w3.org/TR/2001/REC-xsl-20011015/
Martin J. Dürst and François Yergeau, eds.
Character Model for the World Wide Web 1.0:
Fundamentals. World Wide Web Consortium Working Draft, 2001.
Available at: &charmod;
David M. Gay. Correctly Rounded Binary-Decimal and
Decimal-Binary Conversions. AT&T Bell Laboratories Numerical
Analysis Manuscript 90-10, November 1990.
Available at:
http://cm.bell-labs.com/cm/cs/doc/90/4-10.ps.gz
Acknowledgements (non-normative)
Along with the
editors thereof, the following have
contributed material to the first version
of this specification:
Asir S. Vedamuthu, webMethods, Inc
Mark Davis, IBM
Co-editor Ashok Malhotra's work on this specification from March 1999 until
February 2001 was supported by IBM, and from then
until May 2004 by Microsoft. Since July 2004 his work
on this specification has been supported by Oracle Corporation.
The XML Schema Working Group acknowledges with thanks the members
of other W3C Working Groups and industry experts in other
forums who have contributed directly or indirectly to the creation
of this document and its predecessor.
At the time this Working Draft is published, the members
in good standing of the XML Schema Working Group are:
Leonid Arbouzov, Sun Microsystems
Peter Chen, Bootstrap Alliance and LSU
Tony Cincotta, NIST
David Ezell, National Association of Convenience Stores (chair)
Sandy Gao, IBM
Andrew Goodchild, Distributed Systems Technology Centre (DSTC Pty Ltd)
Mary Holstege, Mark Logic
Kohsuke Kawaguchi, Sun Microsystems
Ashok Malhotra, Oracle Corporation
Noah Mendelsohn, IBM
Ravi Murthy, Oracle Corporation
Paul Pedersen, Mark Logic Corporation
Dave Peterson, Invited Expert
Anli Shundi, TIBCO Extensibility
C. M. Sperberg-McQueen, W3C (staff contact)
Hoylen Sue, Distributed Systems Technology Centre (DSTC Pty Ltd)
Henry S. Thompson, University of Edinburgh
Asir S. Vedamuthu, webMethods, Inc
Kongyi Zhou, Oracle Corp.
The XML Schema Working Group has benefited in its work from the
participation and contributions of a number of people who are no
longer members of the Working Group in good standing at the time
of publication of this Working Draft. Their names are given below.
In particular we note
with sadness the accidental death of Mario Jeckle shortly before
publication of the first Working Draft of XML Schema 1.1.
Affiliations given are those current at the time of their first work
with the WG.
Paula Angerstein, Vignette Corporation
Jim Barnette, Defense Information Systems Agency (DISA)
David Beech, Oracle Corp.
Gabe Beged-Dov, Rogue Wave Software
Laila Benhlima, Ecole Mohammadia d'Ingenieurs Rabat (EMI)
Doris Bernardini, Defense Information Systems Agency (DISA)
Paul V. Biron, Health Level Seven
Don Box, DevelopMentor
Allen Brown, Microsoft
Lee Buck, TIBCO Extensibility
Greg Bumgardner, Rogue Wave Software
Dean Burson, Lotus Development Corporation
Charles E. Campbell, Invited expert
Oriol Carbo, University of Edinburgh
Wayne Carr, Intel
Tyng-Ruey Chuang, Academia Sinica
David Cleary, Progress Software
Mike Cokus, MITRE
Dan Connolly, W3C (staff contact)
Ugo Corda, Xerox
Roger L. Costello, MITRE
Joey Coyle, Health Level Seven
Haavard Danielson, Progress Software
Josef Dietl, Mozquito Technologies
Kenneth Dolson, Defense Information Systems Agency (DISA)
Andrew Eisenberg, Progress Software
Rob Ellman, Calico Commerce
Tim Ewald, Developmentor
Alexander Falk, Altova GmbH
David Fallside, IBM
George Feinberg, Object Design
Dan Fox, Defense Logistics Information Service (DLIS)
Charles Frankston, Microsoft
Matthew Fuchs, Commerce One
Xan Gregg, TIBCO Extensibility
Paul Grosso, Arbortext, Inc
Martin Gudgin, DevelopMentor
Ernesto Guerrieri, Inso
Dave Hollander, Hewlett-Packard Company (co-chair)
Nelson Hung, Corel
Jane Hunter, Distributed Systems Technology Centre (DSTC Pty Ltd)
Michael Hyman, Microsoft
Renato Iannella, Distributed Systems Technology Centre (DSTC Pty Ltd)
Mario Jeckle, DaimlerChrysler
Rick Jelliffe, Academia Sinica
Marcel Jemio, Data Interchange Standards Association
Simon Johnston, Rational Software
Dianne Kennedy, Graphic Communications Association
Janet Koenig, Sun Microsystems
Setrag Khoshafian, Technology Deployment International (TDI)
Melanie Kudela, Uniform Code Council
Ara Kullukian, Technology Deployment International (TDI)
Andrew Layman, Microsoft
Dmitry Lenkov, Hewlett-Packard Company
Bob Lojek, Mozquito Technologies
John McCarthy, Lawrence Berkeley National Laboratory
Matthew MacKenzie, XML Global
Murata Makoto, Xerox
Eve Maler, Sun Microsystems
Murray Maloney, Muzmo Communication, acting for Commerce One
Lisa Martin, IBM
Jim Melton, Oracle Corp
Adrian Michel, Commerce One
Alex Milowski, Invited Expert
Don Mullen, TIBCO Extensibility
Chris Olds, Wall Data
Frank Olken, Lawrence Berkeley National Laboratory
Shriram Revankar, Xerox
Mark Reinhold, Sun Microsystems
Jonathan Robie, Software AG
Cliff Schmidt, Microsoft
John C. Schneider, MITRE
Eric Sedlar, Oracle Corp.
Lew Shannon, NCR
William Shea, Merrill Lynch
Jerry L. Smith, Defense Information Systems Agency (DISA)
John Stanton, Defense Information Systems Agency (DISA)
Tony Stewart, Rivcom
Bob Streich, Calico Commerce
William K. Stumbo, Xerox
Ralph Swick, W3C
John Tebbutt, NIST
Ross Thompson, Contivo
Matt Timmermans, Microstar
Jim Trezzo, Oracle Corp.
Steph Tryphonas, Microstar
Mark Tucker, Health Level Seven
Scott Vorthmann, TIBCO Extensibility
Priscilla Walmsley, XMLSolutions
Norm Walsh, Sun Microsystems
Cherry Washington, Defense Information Systems Agency (DISA)
Aki Yoshida, SAP AG