W3C

XML Schema 1.1 Part 1: Structures

W3C Working Draft 31 August 2006

This version:
http://www.w3.org/TR/2006/WD-xmlschema11-1-20060831/
Latest version:
http://www.w3.org/TR/xmlschema11-1/
Previous versions:
http://www.w3.org/TR/2006/WD-xmlschema11-1-20060330/ http://www.w3.org/TR/2005/WD-xmlschema11-1-20050224/ http://www.w3.org/TR/2004/WD-xmlschema11-1-20040716/
Editors:
Henry S. Thompson, University of Edinburgh <ht@inf.ed.ac.uk>
C. M. Sperberg-McQueen, World Wide Web Consortium <cmsmcq@w3.org>
Shudi (Sandy) Gao 高殊镝, IBM <sandygao@ca.ibm.com>
Noah Mendelsohn, IBM <noah_mendelsohn@us.ibm.com>
David Beech, Oracle Corporation (retired) <davidbeech@earthlink.net>
Murray Maloney, Muzmo Communications <murray@muzmo.com>

This document is also available in these non-normative formats: XML, XHTML with changes since version 1.0 marked, XHTML with changes since previous Working Draft marked, Independent copy of the schema for schema documents, Independent copy of the DTD for schema documents, Independent tabulation of components and microcomponents, and List of translations.


Abstract

XML Schema: Structures specifies the XML Schema definition language, which offers facilities for describing the structure and constraining the contents of XML documents, including those which exploit the XML Namespace facility. The schema language, which is itself represented in XML and uses namespaces, substantially reconstructs and considerably extends the capabilities found in XML document type definitions (DTDs). This specification depends on XML Schema 1.1 Part 2: Datatypes.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a Public Working Draft of XML Schema 1.1. It is here made available for review by W3C members and the public. It is intended to give an indication of the W3C XML Schema Working Group's intentions for this new version of the XML Schema language and our progress in achieving them. It attempts to be complete in indicating what will change from version 1.0, but does not specify in all cases how things will change.

This draft was published on 31 August 2006. The major changes since the previous draft are:

For those primarily interested in the changes since version 1.0, the Changes since version 1.0 (§G) appendix, which summarizes both changes already made and also those in prospect, with links to the relevant sections of this draft, is the recommended starting point. Accompanying versions of this document display in color all changes to normative text since version 1.0 and since the previous Working Draft.

Please send comments on this Working Draft to www-xml-schema-comments@w3.org (archive).

||

Although feedback based on any aspect of this specification is welcome, there are certain aspects of the design presented herein for which the Working Group is particularly interested in feedback. These are designated "priority feedback" aspects of the design, and identified as such in editorial notes at appropriate points in this draft.

||

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced by the W3C XML Schema Working Group as part of the W3C XML Activity. The goals of the XML Schema language version 1.1 are discussed in the Requirements for XML Schema 1.1 document. The authors of this document are the members of the XML Schema Working Group. Different parts of this specification have different editors.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

The English version of this specification is the only normative version. Information about translations of this document is available at http://www.w3.org/2003/03/Translations/byTechnology?technology=xmlschema.

Text in this document which does not now have Working Group consensus is marked: ||this way||.


Table of Contents

1 Introduction
    1.1 Introduction to Version 1.1
    1.2 Purpose
    1.3 Dependencies on Other Specifications
    1.4 Documentation Conventions and Terminology
2 Conceptual Framework
    2.1 Overview of XML Schema
    2.2 XML Schema Abstract Data Model
    2.3 Constraints and Validation Rules
    2.4 Conformance
    2.5 Names and Symbol Spaces
    2.6 Schema-Related Markup in Documents Being Validated
    2.7 Representation of Schemas on the World Wide Web
3 Schema Component Details
    3.1 Introduction
    3.2 Attribute Declarations
    3.3 Element Declarations
    3.4 Complex Type Definitions
    3.5 AttributeUses
    3.6 Attribute Group Definitions
    3.7 Model Group Definitions
    3.8 Model Groups
    3.9 Particles
    3.10 Wildcards
    3.11 Identity-constraint Definitions
    3.12 Assertions
    3.13 Notation Declarations
    3.14 Annotations
    3.15 Simple Type Definitions
    3.16 Schemas as a Whole
4 Schemas and Namespaces: Access and Composition
    4.1 Layer 1: Summary of the Schema-validity Assessment Core
    4.2 Layer 2: Schema Documents, Namespaces and Composition
    4.3 Layer 3: Schema Document Access and Web-interoperability
5 Schemas and Schema-validity Assessment
    5.1 Errors in Schema Construction and Structure
    5.2 Assessing Schema-Validity
    5.3 Missing Sub-components
    5.4 Responsibilities of Schema-aware Processors

Appendices

A Schema for Schema Documents (Structures) (normative)
B References (normative)
C Outcome Tabulations (normative)
    C.1 Validation Rules
    C.2 Contributions to the post-schema-validation infoset
    C.3 Schema Representation Constraints
    C.4 Schema Component Constraints
D Terminology for implementation-defined features
    D.1 Subset of the Post-schema-validation Infoset
    D.2 Terminology of schema construction
E Required Information Set Items and Properties (normative)
F Checklist of implementation-defined features
G Changes since version 1.0
    G.1 Changes already made
    G.2 Outstanding issues
H Implementing 'actually restricts'
I Checking content-type restriction
J Schema Components Diagram (non-normative)
K Glossary (non-normative)
L DTD for Schemas (non-normative)
M Analysis of the Unique Particle Attribution Constraint (non-normative)
N References (non-normative)
O Acknowledgements (non-normative)


1 Introduction

This document sets out the structural part (XML Schema: Structures) of the XML Schema definition language.

Chapter 2 presents a Conceptual Framework (§2) for XML Schemas, including an introduction to the nature of XML Schemas and an introduction to the XML Schema abstract data model, along with other terminology used throughout this document.

Chapter 3, Schema Component Details (§3), specifies the precise semantics of each component of the abstract model, the representation of each component in XML, with reference to a DTD and XML Schema for an XML Schema document type, along with a detailed mapping between the elements and attribute vocabulary of this representation and the components and properties of the abstract model.

Chapter 4 presents Schemas and Namespaces: Access and Composition (§4), including the connection between documents and schemas, the import, inclusion and redefinition of declarations and definitions and the foundations of schema-validity assessment.

Chapter 5 discusses Schemas and Schema-validity Assessment (§5), including the overall approach to schema-validity assessment of documents, and responsibilities of schema-aware processors.

The normative appendices include a Schema for Schema Documents (Structures) (normative) (§A) for the XML representation of schemas and References (normative) (§B).

The non-normative appendices include the DTD for Schemas (non-normative) (§L) and a Glossary (non-normative) (§K).

This document is primarily intended as a language definition reference. As such, although it contains a few examples, it is not primarily designed to serve as a motivating introduction to the design and its features, or as a tutorial for new users. Rather it presents a careful and fully explicit definition of that design, suitable for guiding implementations. For those in search of a step-by-step introduction to the design, the non-normative [XML Schema: Primer] is a much better starting point than this document.

next sub-section1.1 Introduction to Version 1.1

The Working Group has three main goals for this version of W3C XML Schema:

These goals are in tension with one another. The Working Group's strategic guidelines for changes between versions 1.0 and 1.1 can be summarized as follows:

  1. Support for versioning (acknowledging that this may be slightly disruptive to the XML transfer syntax at the margins)
  2. Support for co-occurrence constraints (which will certainly involve additions to the XML transfer syntax, which will not be understood by 1.0 processors)
  3. Bug fixes (unless in specific cases we decide that the fix is too disruptive for a point release)
  4. Editorial changes
  5. Design cleanup will possibly change behavior in edge cases
  6. Non-disruptive changes to type hierarchy (to better support current and forthcoming international standards and W3C recommendations)
  7. Design cleanup will possibly change component structure (changes to functionality restricted to edge cases)
  8. No significant changes in existing functionality
  9. No changes to XML transfer syntax except those required by version control hooks, co-occurrence constraints and bug fixes

The aim with regard to compatibility is that

previous sub-section next sub-section1.3 Dependencies on Other Specifications

The definition of XML Schema: Structures depends on the following specifications: [XML-Infoset], [XML-Namespaces 1.1], [XPath], [XPath 2.0], and [XML Schema: Datatypes].

See Required Information Set Items and Properties (normative) (§E) for a tabulation of the information items and properties specified in [XML-Infoset] which this specification requires as a precondition to schema-aware processing.

[XML Schema: Datatypes] defines some datatypes which depend on definitions in [XML 1.1] and [XML-Namespaces 1.1]; those definitions, and therefore the datatypes based on them, vary between version 1.0 ([XML 1.0], [XML-Namespaces 1.0]) and version 1.1 ([XML 1.1], [XML-Namespaces 1.1]) of those specifications. In any given schema-validity-·assessment· episode, the choice of the 1.0 or the 1.1 definition of those datatypes is implementation-defined.

Conforming implementations of this specification may provide either the 1.1-based datatypes or the 1.0-based datatypes, or both. If both are supported, the choice of which datatypes to use in a particular assessment episode should be under user control.

Note:  Implementations may provide the heuristic of using the 1.1 datatypes if the input is labeled as XML 1.1, and the 1.0 datatypes if the input is labeled 1.0. It should be noted however that the XML version number is not required to be present in the input to an assessment episode, and in any case the heuristic should be subject to override by users, to support cases where users wish to accept XML 1.1 input but validate it using the 1.0 datatypes, or accept XML 1.0 input and validate it using the 1.1 datatypes.
Note:  Some users will perhaps wish to accept only XML 1.1 input, or only XML 1.0 input. Conforming implementations of this specification which accept XML input may accept XML 1.0, XML 1.1, or both and may provide user control over which versions of XML to accept.

previous sub-section 1.4 Documentation Conventions and Terminology

The section introduces the highlighting and typography as used in this document to present technical material.

Aspects of this document which the Working Group are committed to changing, but where (all) changes are not yet in place, are signalled by the appearance of an Issue, with a link to the associated version 1.1 Requirement, for example:

All such issues are tabulated in Outstanding issues (§G.2).

Special terms are defined at their point of introduction in the text. For example [Definition:]  a term is something used with a special meaning. The definition is labeled as such and the term it defines is displayed in boldface. The end of the definition is not specially marked in the displayed or printed text. Uses of defined terms are links to their definitions, set off with middle dots, for instance ·term·.

Non-normative examples are set off in boxes and accompanied by a brief explanation:

Example
<schema targetNamespace="http://www.example.com/XMLSchema/1.0/mySchema">
And an explanation of the example.

The definition of each kind of schema component consists of a list of its properties and their contents, followed by descriptions of the semantics of the properties:

Schema Component: Example
{example property}
A Component component. Required.

An example property

References to properties of schema components are links to the relevant definition as exemplified above, set off with curly braces, for instance {example property}.

The correspondence between an element information item which is part of the XML representation of a schema and one or more schema components is presented in a tableau which illustrates the element information item(s) involved. This is followed by a tabulation of the correspondence between properties of the component and properties of the information item. Where context may determine which of several different components may arise, several tabulations, one per context, are given. The property correspondences are normative, as are the illustrations of the XML representation element information items.

In the XML representation, bold-face attribute names (e.g. count below) indicate a required attribute information item, and the rest are optional. Where an attribute information item has an enumerated type definition, the values are shown separated by vertical bars, as for size below; if there is a default value, it is shown following a colon. Where an attribute information item has a built-in simple type definition defined in [XML Schema: Datatypes], a hyperlink to its definition therein is given.

The allowed content of the information item is shown as a grammar fragment, using the Kleene operators ?, * and +. Each element name therein is a hyperlink to its own illustration.

Note: The illustrations are derived automatically from the Schema for Schema Documents (Structures) (normative) (§A). In the case of apparent conflict, the Schema for Schema Documents (Structures) (normative) (§A) takes precedence, as it, together with the ·Schema Representation Constraints·, provide the normative statement of the form of XML representations.
XML Representation Summary: example Element Information Item

<example
  count = integer
  size = (large | medium | small) : medium>
  Content: (all | any*)
</example>

Example Schema Component
Property
Representation
 
Description of what the property corresponds to, e.g. the value of the size [attribute]
 

References to elements in the text are links to the relevant illustration as exemplified above, set off with angle brackets, for instance <example>.

References to properties of information items as defined in [XML-Infoset] are notated as links to the relevant section thereof, set off with square brackets, for example [children].

Properties which this specification defines for information items are introduced as follows:

PSVI Contributions for example information items
[new property]
The value the property gets.

References to properties of information items defined in this specification are notated as links to their introduction as exemplified above, set off with square brackets, for example [new property].

The following highlighting is used for non-normative commentary in this document:

Note: General comments directed to all readers.

Within normative prose in this specification, the words may, should, must and must not are defined as follows:

may
Conforming documents and XML Schema-aware processors are permitted to but need not behave as described.
should
It is recommended that conforming documents and XML Schema-aware processors behave as described, but there can be valid reasons for them not to; it is important that the full implications be understood and carefully weighed before adopting behavior at variance with the recommendation.
must
Conforming documents and XML Schema-aware processors are required to behave as described; otherwise they are in error.
must not
Conforming documents and XML Schema-aware processors are forbidden to behave as described; if they do they are in error.

These definitions describe in terms specific to this document the meanings assigned to these terms by [IETF RFC 2119]. The specific wording follows that of [XML 1.1].

This specification provides a definition of error and of conformant processors' responsibilities with respect to errors in Schemas and Schema-validity Assessment (§5).

2 Conceptual Framework

This chapter gives an overview of XML Schema: Structures at the level of its abstract data model. Schema Component Details (§3) provides details on this model, including a normative representation in XML for the components of the model. Readers interested primarily in learning to write schema documents may wish to first read [XML Schema: Primer] for a tutorial introduction, and only then consult the sub-sections of Schema Component Details (§3) named XML Representation of ... for the details.

next sub-section2.1 Overview of XML Schema

An XML Schema is a set of components such as type definitions and element declarations. These can be used to assess the validity of well-formed element and attribute information items (as defined in [XML-Infoset]), and furthermore may specify augmentations to those items and their descendants. This augmentation makes explicit information which may have been implicit in the original document, such as normalized and/or default values for attributes and elements and the types of element and attribute information items. The input information set can also be augmented with information about the validity of the item, or about other properties described in this specification. [Definition:]  We refer to the augmented infoset which results from conformant processing as defined in this specification as the post-schema-validation infoset, or PSVI. Conforming processors may provide access to some or all of the PSVI, as described in Subset of the Post-schema-validation Infoset (§D.1). The mechanisms by which processors provide such access to the PSVI are neither defined nor constrained by this specification.

Issue (RQ-142i): Issue 2846 (RQ-142 PSVI properties), Issue 2822 (RQ-144 required properties)

Version 1.0 included several properties in the PSVI whose absence carried information (e.g. [type definition]), while at the same time not being completely clear about which PSVI properties, if any, were required. The Working Group intends to eliminate the former and clarify the latter.

Resolution:

For 142, which mandates that insofar as possible absence of a property should not in general signify, when it does explicit 'if-and-only-if' language is required, the effect is distributed throughout the PSVI sub-sub-sections in section 3.

The Working Group appears to be close to consensus (although no final decision has been made) on views which can be summarized thus:

  1. We should eliminate any dependency on the absence of specific properties (i.e. important situations should be describable and distinguishable in terms of properties and their values, without appeal to the absence of particular properties), or if this proves unfeasible in particular cases we should say explicitly that a property is present "if and only if" certain conditions apply. Any remaining "if" (if any) would be a true conditional, not an equivalence.
  2. Any specification of a class of processors (including ours) can require specific additional information not in the PSVI, though should note that interoperability is better if applications depend only on the properties present in the PSVI as we define it.
  3. In our own specification of processor classes, we should be explicit that processors may provide additional information. (Or alternatively be explicit that they must not -- but the chair believes the WG consensus was to allow it.)

For 144, a few general remarks here about flexible-but-firm conformance are wanted here; most of the new work should end up in section 4 and/or 5.

Schema-validity assessment has two aspects:

1 Determining local schema-validity, that is whether an element or attribute information item satisfies the constraints embodied in the relevant components of an XML Schema;
2 Synthesizing an overall validation outcome for the item, combining local schema-validity with the results of schema-validity assessments of its descendants, if any, and adding appropriate augmentations to the infoset to record this outcome.

Throughout this specification, [Definition:]  the word valid and its derivatives are used to refer to clause 1 above, the determination of local schema-validity.

Throughout this specification, [Definition:]   the word assessment is used to refer to the overall process of local validation, schema-validity assessment and infoset augmentation.

previous sub-section next sub-section2.2 XML Schema Abstract Data Model

        2.2.1 Type Definition Components
        2.2.2 Declaration Components
        2.2.3 Model Group Components
        2.2.4 Constraint Components
        2.2.5 Group Definition Components
        2.2.6 Annotation Components

This specification builds on [XML 1.1] and [XML-Namespaces 1.1]. The concepts and definitions used herein regarding XML are framed at the abstract level of information items as defined in [XML-Infoset]. By definition, this use of the infoset provides a priori guarantees of well-formedness (as defined in [XML 1.1]) and namespace conformance (as defined in [XML-Namespaces 1.1]) for all candidates for ·assessment· and for all ·schema documents·.

Just as [XML 1.1] and [XML-Namespaces 1.1] can be described in terms of information items, XML Schemas can be described in terms of an abstract data model. In defining XML Schemas in terms of an abstract data model, this specification rigorously specifies the information which must be available to a conforming XML Schema processor. The abstract model for schemas is conceptual only, and does not mandate any particular implementation or representation of this information. To facilitate interoperation and sharing of schema information, a normative XML interchange format for schemas is provided.

[Definition:]  Schema component is the generic term for the building blocks that comprise the abstract data model of the schema. [Definition:]   An XML Schema is a set of ·schema components·. There are 14 kinds of component in all, falling into three groups. The primary components, which may (type definitions) or must (element and attribute declarations) have names, are as follows:

  • Simple type definitions
  • Complex type definitions
  • Attribute declarations
  • Element declarations

The secondary components, are as follows:

  • Attribute group definitions
  • Identity-constraint definitions
  • Assertions
  • Model group definitions
  • Notation declarations

Finally, the "helper" components provide small parts of other components; they are not independent of their context:

  • Annotations
  • Model groups
  • Particles
  • Wildcards
  • Attribute Uses

The name [Definition:]  Component covers all the different kinds of component defined in this specification.

During ·validation·, [Definition:]  declaration components are associated by (qualified) name to information items being ·validated·.

On the other hand, [Definition:]  definition components define internal schema components that can be used in other schema components.

[Definition:]  Declarations and definitions may and in some cases must have and be identified by names, which are NCNames as defined by [XML-Namespaces 1.1].

[Definition:]  Several kinds of component have a target namespace, which is either ·absent· or a namespace name, also as defined by [XML-Namespaces 1.1]. The ·target namespace· serves to identify the namespace within which the association between the component and its name exists. In the case of declarations, this in turn determines the namespace name of, for example, the element information items it may ·validate·.

Note: At the abstract level, there is no requirement that the components of a schema share a ·target namespace·. Any schema for use in ·assessment· of documents containing names from more than one namespace will of necessity include components with different ·target namespaces·. This contrasts with the situation at the level of the XML representation of components, in which each schema document contributes definitions and declarations to a single target namespace.

·Validation·, defined in detail in Schema Component Details (§3), is a relation between information items and schema components. For example, an attribute information item may ·validate· with respect to an attribute declaration, a list of element information items may ·validate· with respect to a content model, and so on. The following sections briefly introduce the kinds of components in the schema abstract data model, other major features of the abstract model, and how they contribute to ·validation·.

2.2.1 Type Definition Components

The abstract model provides two kinds of type definition component: simple and complex.

[Definition:]  This specification uses the phrase type definition in cases where no distinction need be made between simple and complex types.

Type definitions form a hierarchy with a single root. The subsections below first describe characteristics of that hierarchy, then provide an introduction to simple and complex type definitions themselves.

2.2.1.1 Type Definition Hierarchy

[Definition:]  Except for a distinguished ·ur-type definition·, every ·type definition· is, by construction, either a ·restriction· or an ·extension· of some other type definition. The graph of these relationships forms a tree known as the Type Definition Hierarchy.

[Definition:]  The type definition used as the basis for an ·extension· or ·restriction· is known as the base type definition of that definition.

[Definition:]  ||A type defined with the same constraints as its ·base type definition·, or with more, is said to be a restriction|| ||A type defined by appropriate use of facets or declarations so as to validate a subset of what another type definition validates, with consistent PSVI outcomes, is a restriction of the other type||. ||The added constraints might include narrowed ranges or reduced alternatives. Members of a type, A, whose definition is a ·restriction· of the definition of another type, B, are always members of type B as well.||

Issue (RQ-17i):Issue 2820 (RQ-17 simplify restriction rules)

Version 1.0 made clear that the intention for derivation by restriction was that restrictions validated a subset of what their base validated. However, the constructive rules for what constituted valid content model restrictions for complex type definition not only failed to enforce this completely correctly, but also ruled out various cases which evidently should have been allowed. The Working Group has decided to shift to a much higher level statement of what constitutes a valid restriction, appealing directly to the subset requirement, in order to address these problems.

Resolution:

A major change in definition/presentation, with only modest changes in consequences for schemas and validity, will be made, by defining restriction for complex type definitions in terms of the desired result, that is that all members of a restricted type are members of its base type. In the normative part of the spec. this will be done by appeal to local validity.

"Clarifying: R restricts B: any EII that is locally valid [per R] must also be locally valid [per B], with side conditions on properties on terms you appeal to [to] get same child allowed by two content models." [-F2F 2004-03-12, section Subsumption (W3C-member-only link)]

A non-normative appendix will provide references to published algorithms for enforcing the constraint.

[Definition:]  A complex type definition which allows element or attribute content in addition to that allowed by another specified type definition is said to be an extension.

[Definition:]  A distinguished complex type definition, the ur-type definition, whose name is ||anyType||||rootType|| in the XML Schema namespace, is present in each ·XML Schema·, serving as the root of the type definition hierarchy for that schema.

||

[Definition:]  A further special complex type definition, whose name is anyType in the XML Schema namespace, is also present in each ·XML Schema·. The definition of anyType serves as default type definition for element declarations whose XML representation does not specify one.

||
2.2.1.2 Simple Type Definition

A simple type definition is a set of constraints on strings and information about the values they encode, applicable to the ·normalized value· of an attribute information item or of an element information item with no element children. Informally, it applies to the values of attributes and the text-only content of elements.

Each simple type definition, whether built-in (that is, defined in [XML Schema: Datatypes]) or user-defined, is a ·restriction· of its ·base type definition·. [Definition:]  The simple ur-type definition, a special ·restriction· of the ·ur-type definition·, whose name is anySimpleType in the XML Schema namespace is the root of the ·Type Definition Hierarchy· for the simple type definitions. The ·simple ur-type definition· is considered to have an unconstrained lexical space, and a value space consisting of the union of the value spaces of all the built-in primitive datatypes and the set of all lists of all members of the value spaces of all the built-in primitive datatypes. The built-in list datatypes all have the ·simple ur-type definition· as their ·base type definition·.

[Definition:]  There is a further special datatype called anyAtomicType, a ·restriction· of the ·simple ur-type definition·, which is the ·base type definition· of all the primitive built-in datatypes. It too is considered to have an unconstrained lexical space. Its value space consists of the union of the value spaces of all the built-in primitive datatypes.

The mapping from lexical space to value space is unspecified for items whose type definition is the ·simple ur-type definition·or ·anyAtomicType·. Accordingly this specification does not constrain processors' behavior in areas where this mapping is implicated, for example checking such items against enumerations, constructing default attributes or elements whose declared type definition is the ·simple ur-type definition·, checking identity constraints involving such items.

Note: The Working Group expects to return to this area in a future version of this specification.

[XML Schema: Datatypes] provides mechanisms for defining new simple type definitions by ·restricting· one of the built-in primitive or ordinary datatypes. It also provides mechanisms for constructing new simple type definitions whose members are lists of items themselves constrained by some other simple type definition, or whose membership is the union of the memberships of some other simple type definitions. Such list and union simple type definitions are also ·restrictions· of the ·simple ur-type definition·.

For detailed information on simple type definitions, see Simple Type Definitions (§3.15) and [XML Schema: Datatypes]. The latter also defines an extensive inventory of pre-defined simple types.

2.2.1.3 Complex Type Definition

A complex type definition is a set of attribute declarations and a content type, applicable to the [attributes] and [children] of an element information item respectively. The content type may require the [children] to contain neither element nor character information items (that is, to be empty), or to be a string which belongs to a particular simple type, or to contain a sequence of element information items which conforms to a particular model group, with or without character information items as well.

Each complex type definition other than the ·ur-type definition· is either

or

A complex type which extends another does so by having additional content model particles at the end of the other definition's content model, or by having additional attribute declarations, or both.

Note: This specification allows only appending, and not other kinds of extensions. This decision simplifies application processing required to cast instances from derived to base type. Future versions may allow more kinds of extension, requiring more complex transformations to effect casting.

For detailed information on complex type definitions, see Complex Type Definitions (§3.4).

2.2.2 Declaration Components

There are three kinds of declaration component: element, attribute, and notation. Each is described in a section below. Also included is a discussion of element substitution groups, which is a feature provided in conjunction with element declarations.

2.2.2.2 Element Substitution Group

In XML, the name and content of an element must correspond exactly to the element type referenced in the corresponding content model.

[Definition:]  Through the new mechanism of element substitution groups, XML Schemas provides a more powerful model supporting substitution of one named element for another. Any top-level element declaration can serve as the defining member, or head, for an element substitution group. Other top-level element declarations, regardless of target namespace, can be designated as members of the substitution group headed by this element. In a suitably enabled content model, a reference to the head ·validates· not just the head itself, but elements corresponding to any other member of the substitution group as well.

All such members must have type definitions which are either the same as the head's type definition or restrictions or extensions of it. Therefore, although the names of elements can vary widely as new namespaces and members of the substitution group are defined, the content of member elements is strictly limited according to the type definition of the substitution group head.

Note that element substitution groups are not represented as separate components. They are specified in the property values for element declarations (see Element Declarations (§3.3)).

2.2.2.4 Notation Declaration

A notation declaration is an association between a name and an identifier for a notation. For an attribute information item to be ·valid· with respect to a NOTATION simple type definition, its value must have been declared with a notation declaration.

For detailed information on notation declarations, see Notation Declarations (§3.13).

2.2.3 Model Group Components

The model group, particle, and wildcard components contribute to the portion of a complex type definition that controls an element information item's content.

2.2.3.2 Particle

A particle is a term in the grammar for element content, consisting of either an element declaration, a wildcard or a model group, together with occurrence constraints. Particles contribute to ·validation· as part of complex type definition ·validation·, when they allow anywhere from zero to many element information items or sequences thereof, depending on their contents and occurrence constraints.

The name [Definition:]  Term is used to refer to any of the three kinds of components which can appear in particles. All ·Terms· are themselves ·Annotated Components·. [Definition:]  A basic term is an Element Declaration or a Wildcard. [Definition:]  A basic particle is a Particle whose {term} is a ·basic term·.

[Definition:]  A particle can be used in a complex type definition to constrain the ·validation· of the [children] of an element information item; such a particle is called a content model.

Note: XML Schema: Structures ·content models· are similar to but more expressive than [XML 1.1] content models; unlike [XML 1.1], XML Schema: Structures applies ·content models· to the ·validation· of both mixed and element-only content.

Each content model, indeed each particle, denotes a set of sequences of element information items. Regarding that set of sequences as a language, the set of sequences recognized by a particle P may be written L(P). [Definition:]  A particle P is said to accept or recognize the members of L(P).

Note: The language accepted by a content model plays a role in determining whether an element information item is locally valid or not: if the appropriate content model does not accept the sequence of elements among its children, then the element information item is not locally valid. (Some additional constraints must also be met: not every sequence in L(P) is locally valid against P. See Principles of Validation against Groups (§3.8.4.2).)

No assumption is made, in the definition above, that the items in the sequence are themselves valid; only the expanded names of the items in the sequence are relevant in determining whether the sequence is accepted by a particle. Their validity does affect whether their parent is (recursively) valid as well as locally valid.

If a sequence S is a member of L(P), then it is necessarily possible to trace a path through the ·basic particles· within P, with each item within S corresponding to a matching particle within P. The sequence of particles within P corresponding to S is called the ·path· of S in P.

Note: This ·path· has nothing to do with [XPath] or XPath expressions. When there may otherwise be danger of confusion, the ·path· described here may be referred to as the ·match path· of S in P.

For detailed information on particles, see Particles (§3.9).

2.2.4 Constraint Components

2.2.4.1 Identity-constraint Definition

An identity-constraint definition is an association between a name and one of several varieties of identity-constraint related to uniqueness and reference. All the varieties use [XPath] expressions to pick out sets of information items relative to particular target element information items which are unique, or a key, or a ·valid· reference, within a specified scope. An element information item is only ·valid· with respect to an element declaration with identity-constraint definitions if those definitions are all satisfied for all the descendants of that element information item which they pick out.

For detailed information on identity-constraint definitions, see Identity-constraint Definitions (§3.11).

Note:  Identity constraints currently uses XPath 1.0. This may change in future working drafts of this specification to use XPath 2.0. Such change will not affect evaluation of identity constraints, given the XPath subset it uses.
2.2.4.2 Assertion

An assertion is a predicate associated with a type, which is checked for each instance of the type. Depending on their formulation, assertions are either required to be true of the instance, or required to be false. If an element or attribute information item fails to satisfy an assertion associated with a given type, then that information item is not locally ·valid· with respect to that type.

For detailed information on Assertions, see Assertions (§3.12).

Note:  Assertions are currently only allowed to be specified in complex types. It may be deemed useful to also include assertions in named model group definitions and/or attribute groups, or even simple types, if proved useful.

previous sub-section next sub-section2.3 Constraints and Validation Rules

The [XML 1.1] specification describes two kinds of constraints on XML documents: well-formedness and validity constraints. Informally, the well-formedness constraints are those imposed by the definition of XML itself (such as the rules for the use of the < and > characters and the rules for proper nesting of elements), while validity constraints are the further constraints on document structure provided by a particular DTD.

The preceding section focused on ·validation·, that is the constraints on information items which schema components supply. In fact however this specification provides four different kinds of normative statements about schema components, their representations in XML and their contribution to the ·validation· of information items:

Schema Component Constraint
[Definition:]  Constraints on the schema components themselves, i.e. conditions components must satisfy to be components at all. Located in the sixth sub-section of the per-component sections of Schema Component Details (§3) and tabulated in Schema Component Constraints (§C.4).
Schema Representation Constraint
[Definition:]  Constraints on the representation of schema components in XML beyond those which are expressed in Schema for Schema Documents (Structures) (normative) (§A). Located in the third sub-section of the per-component sections of Schema Component Details (§3) and tabulated in Schema Representation Constraints (§C.3).
Validation Rules
[Definition:]  Contributions to ·validation· associated with schema components. Located in the fourth sub-section of the per-component sections of Schema Component Details (§3) and tabulated in Validation Rules (§C.1).
Schema Information Set Contribution
[Definition:]  Augmentations to ·post-schema-validation infoset·s expressed by schema components, which follow as a consequence of ·validation· and/or ·assessment·. Located in the fifth sub-section of the per-component sections of Schema Component Details (§3) and tabulated in Contributions to the post-schema-validation infoset (§C.2).

The last of these, schema information set contributions, are not as new as they might at first seem. XML validation augments the XML information set in similar ways, for example by providing values for attributes not present in instances, and by implicitly exploiting type information for normalization or access. (As an example of the latter case, consider the effect of NMTOKENS on attribute white space, and the semantics of ID and IDREF.) By including schema information set contributions, this specification makes explicit some features that XML leaves implicit.

previous sub-section next sub-section2.4 Conformance

This specification describes three levels of conformance for schema aware processors. The first is required of all processors. Support for the other two will depend on the application environments for which the processor is intended.

[Definition:]  Minimally conforming processors must completely and correctly implement the ·Schema Component Constraints·, ·Validation Rules·, and ·Schema Information Set Contributions· contained in this specification.

[Definition:]  ·Minimally conforming· processors which accept schemas represented in the form of XML documents as described in Layer 2: Schema Documents, Namespaces and Composition (§4.2) are additionally said to be schema-document aware. Such processors must, when processing schema documents, completely and correctly implement all ·Schema Representation Constraints· in this specification, and must adhere exactly to the specifications in Schema Component Details (§3) for mapping the contents of such documents to ·schema components· for use in ·validation· and ·assessment·.

[Definition:]  A ·minimally conforming· processor which is not ·schema-document aware· is said to be a non-schema-document-aware processor.

Note: By separating the conformance requirements relating to the concrete syntax of XML schema documents, this specification admits processors which use schemas stored in optimized binary representations, dynamically created schemas represented as programming language data structures, or implementations in which particular schemas are compiled into executable code such as C or Java. Such processors can be said to be ·minimally conforming· but not necessarily ·schema-document aware·.

[Definition:]  Web-aware processors are network-enabled processors which are not only both ·minimally conforming· and ·schema-document aware·, but which additionally must be capable of accessing schema documents from the World Wide Web as described in Representation of Schemas on the World Wide Web (§2.7) and How schema definitions are located on the Web (§4.3.2). .

||
Note: In version 1.0 of this specification the class of ·schema-document aware· processors was termed "conformant to the XML Representation of Schemas". Similarly, the class of ·Web-aware· processors was called "fully conforming".
||
Note: Although this specification provides just these three standard levels of conformance, it is anticipated that other conventions can be established in the future. For example, the World Wide Web Consortium is considering conventions for packaging on the Web a variety of resources relating to individual documents and namespaces. Should such developments lead to new conventions for representing schemas, or for accessing them on the Web, new levels of conformance can be established and named at that time. There is no need to modify or republish this specification to define such additional levels of conformance.

See Schemas and Namespaces: Access and Composition (§4) for a more detailed explanation of the mechanisms supporting these levels of conformance.

previous sub-section next sub-section2.5 Names and Symbol Spaces

As discussed in XML Schema Abstract Data Model (§2.2), most schema components (may) have ·names·. If all such names were assigned from the same "pool", then it would be impossible to have, for example, a simple type definition and an element declaration both with the name "title" in a given ·target namespace·.

Therefore [Definition:]  this specification introduces the term symbol space to denote a collection of names, each of which is unique with respect to the others. A symbol space is similar to the non-normative concept of namespace partition introduced in [XML-Namespaces 1.1]. There is a single distinct symbol space within a given ·target namespace· for each kind of definition and declaration component identified in XML Schema Abstract Data Model (§2.2), except that within a target namespace, simple type definitions and complex type definitions share a symbol space. Within a given symbol space, names are unique, but the same name may appear in more than one symbol space without conflict. For example, the same name can appear in both a type definition and an element declaration, without conflict or necessary relation between the two.

Locally scoped attribute and element declarations are special with regard to symbol spaces. Every complex type definition defines its own local attribute and element declaration symbol spaces, where these symbol spaces are distinct from each other and from any of the other symbol spaces. So, for example, two complex type definitions having the same target namespace can contain a local attribute declaration for the unqualified name "priority", or contain a local element declaration for the name "address", without conflict or necessary relation between the two.

previous sub-section next sub-section2.6 Schema-Related Markup in Documents Being Validated

The XML representation of schema components uses a vocabulary identified by the namespace name http://www.w3.org/2001/XMLSchema. For brevity, the text and examples in this specification use the prefix xs: to stand for this namespace; in practice, any prefix can be used.

Issue (RQ-153i):Issue 3047 (RQ-153 XSD 1.1 namespace)

This specification must choose either to use the same namespace as XML Schema 1.0, or to use a different namespace, or to use more than one namespace. An explicit decision should be made.

XML Schema: Structures also defines several attributes for direct use in any XML documents. These attributes are in a different namespace, which has the namespace name http://www.w3.org/2001/XMLSchema-instance. For brevity, the text and examples in this specification use the prefix xsi: to stand for this latter namespace; in practice, any prefix can be used. All schema processors have appropriate attribute declarations for these attributes built in, see Attribute Declaration for the 'type' attribute (§3.2.7), Attribute Declaration for the 'nil' attribute (§3.2.7), Attribute Declaration for the 'schemaLocation' attribute (§3.2.7) and Attribute Declaration for the 'noNamespaceSchemaLocation' attribute (§3.2.7).

2.6.1 xsi:type

The Simple Type Definition (§2.2.1.2) or Complex Type Definition (§2.2.1.3) used in ·validation· of an element is usually determined by reference to the appropriate schema components. An element information item in an instance may, however, explicitly assert its type using the attribute xsi:type. The value of this attribute is a ·QName·; see QName Interpretation (§3.16.3) for the means by which the ·QName· is associated with a type definition.

2.6.2 xsi:nil

XML Schema: Structures introduces a mechanism for signaling that an element must be accepted as ·valid· when it has no content despite a content type which does not require or even necessarily allow empty content. An element may be ·valid· without content if it has the attribute xsi:nil with the value true. An element so labeled must be empty, but can carry attributes if permitted by the corresponding complex type.

3 Schema Component Details

next sub-section3.1 Introduction

The following sections provide full details on the composition of all schema components, together with their XML representations and their contributions to ·assessment·. Each section is devoted to a single component, with separate subsections for

  1. properties: their values and significance
  2. XML representation and the mapping to properties
  3. constraints on representation
  4. validation rules
  5. ·post-schema-validation infoset· contributions
  6. constraints on the components themselves

The sub-sections immediately below introduce conventions and terminology used throughout the component sections.

3.1.1 Components and Properties

Components are defined in terms of their properties, and each property in turn is defined by giving its range, that is the values it may have. This can be understood as defining a schema as a labeled directed graph, where the root is a schema, every other vertex is a schema component or a literal (string, boolean, decimal) and every labeled edge is a property. The graph is not acyclic: multiple copies of components with the same name in the same ·symbol space· must not exist, so in some cases re-entrant chains of properties will exist. Equality of components for the purposes of this specification is always defined as equality of names (including target namespaces) within symbol spaces.

Issue (RQ-125i):Issue 2837 (RQ-125 identity of anonymous types), Issue 2842 (RQ-134 inherited portions of content model)

Version 1.0 was deliberately reticent in stating identity conditions for components. With hindsight this was a mistake, and will be corrected.

Resolution:

Add {scope} property to type definition components which will either be the enclosing element declaration or "global", by analogy with element declarations {scope}. [For further context, see F2F 2004-03-12, section RQ-125 (W3C-member-only link).]

This change will solve the anonymous type equality problem by giving an unequivocal answer to the "who am I?" question for such types by way of the answer "Your identity is determined by your scope's identity."

Note: A schema and its components as defined in this chapter are an idealization of the information a schema-aware processor requires: implementations are not constrained in how they provide it. In particular, no implications about literal embedding versus indirection follow from the use below of language such as "properties . . . having . . . components as values".

Component properties are simply named values. Most properties have either other components or literals (that is, strings or booleans or enumerated keywords) for values, but in a few cases, where more complex values are involved, [Definition:]  a property value may itself be a collection of named values, which we call a property record.

[Definition:]  Throughout this specification, the term absent is used as a distinguished property value denoting absence. Again this should not be interpreting as constraining implementations, as for instance between using a null value for such properties or not representing them at all.

Any property not defined as optional is always present; optional properties which are not present are taken to have ·absent· as their value. Any property identified as a having a set, subset or list value may have an empty value unless this is explicitly ruled out: this is not the same as ·absent·. Any property value identified as a superset or subset of some set may be equal to that set, unless a proper superset or subset is explicitly called for. By 'string' in Part 1 of this specification is meant a sequence of ISO 10646 characters identified as legal XML characters in [XML 1.1].

Note: It is implementation-defined whether a schema processor uses the definition of legal character from [XML 1.1] or [XML 1.0].

3.1.2 XML Representations of Components

The principal purpose of XML Schema: Structures is to define a set of schema components that constrain the contents of instances and augment the information sets thereof. Although no external representation of schemas is required for this purpose, such representations will obviously be widely used. To provide for this in an appropriate and interoperable way, this specification provides a normative XML representation for schemas which makes provision for every kind of schema component. [Definition:]  A document in this form (i.e. a <schema> element information item) is a schema document. For the schema document as a whole, and its constituents, the sections below define correspondences between element information items (with declarations in Schema for Schema Documents (Structures) (normative) (§A) and DTD for Schemas (non-normative) (§L)) and schema components. All the element information items in the XML representation of a schema must be in the XML Schema namespace, that is their [namespace name] must be http://www.w3.org/2001/XMLSchema. Although a common way of creating the XML Infosets which are or contain ·schema documents· will be using an XML parser, this is not required: any mechanism which constructs conformant infosets as defined in [XML-Infoset] is a possible starting point.

Two aspects of the XML representations of components presented in the following sections are constant across them all:

  1. All of them allow attributes qualified with namespace names other than the XML Schema namespace itself: these appear as annotations in the corresponding schema component;
  2. All of them allow an <annotation> as their first child, for human-readable documentation and/or machine-targeted information.

3.1.3 The Mapping between XML Representations and Components

For each kind of schema component there is a corresponding normative XML representation. The sections below describe the correspondences between the properties of each kind of schema component on the one hand and the properties of information items in that XML representation on the other, together with constraints on that representation above and beyond those implicit in the Schema for Schema Documents (Structures) (normative) (§A).

The language used is as if the correspondences were mappings from XML representation to schema component, but the mapping in the other direction, and therefore the correspondence in the abstract, can always be constructed therefrom.

In discussing the mapping from XML representations to schema components below, the value of a component property is often determined by the value of an attribute information item, one of the [attributes] of an element information item. Since schema documents are constrained by the Schema for Schema Documents (Structures) (normative) (§A), there is always a simple type definition associated with any such attribute information item. [Definition:]  The phrase actual value is used to refer to the member of the value space of the simple type definition associated with an attribute information item which corresponds to its ·normalized value·. This will often be a string, but may also be an integer, a boolean, a URI reference, etc. This term is also occasionally used with respect to element or attribute information items in a document being ·validated·.

Many properties are identified below as having other schema components or sets of components as values. For the purposes of exposition, the definitions in this section assume that (unless the property is explicitly identified as optional) all such values are in fact present. When schema components are constructed from XML representations involving reference by name to other components, this assumption may be violated if one or more references cannot be resolved. This specification addresses the matter of missing components in a uniform manner, described in Missing Sub-components (§5.3): no mention of handling missing components will be found in the individual component descriptions below.

Forward reference to named definitions and declarations is allowed, both within and between ·schema documents·. By the time the component corresponding to an XML representation which contains a forward reference is actually needed for ·validation· an appropriately-named component may have become available to discharge the reference: see Schemas and Namespaces: Access and Composition (§4) for details.

3.1.4 White Space Normalization during Validation

Throughout this specification, [Definition:]  the initial value of some attribute information item is the value of the [normalized value] property of that item. Similarly, the initial value of an element information item is the string composed of, in order, the [character code] of each character information item in the [children] of that element information item.

The above definition means that comments and processing instructions, even in the midst of text, are ignored for all ·validation· purposes.

[Definition:]  The normalized value of an element or attribute information item is an ·initial value· whose white space, if any, has been normalized according to the value of the whiteSpace facet of the simple type definition used in its ·validation·:

preserve
No normalization is done, the value is the ·normalized value·
replace
All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space).
collapse
Subsequent to the replacements specified above under replace, contiguous sequences of #x20s are collapsed to a single #x20, and initial and/or final #x20s are deleted.

If the simple type definition used in an item's ·validation· is the ·simple ur-type definition·, then the ·normalized value· must be determined as in the preserve case above.

There are three alternative validation rules which may supply the necessary background for the above: Attribute Locally Valid (§3.2.4) (clause 3), Element Locally Valid (Type) (§3.3.4) (clause 3.1.3) or Element Locally Valid (Complex Type) (§3.4.4) (clause 1.2).

These three levels of normalization correspond to the processing mandated in XML for element content, CDATA attribute content and tokenized attributed content, respectively. See Attribute Value Normalization in [XML 1.1] for the precedent for replace and collapse for attributes. Extending this processing to element content is necessary to ensure a consistent ·validation· semantics for simple types, regardless of whether they are applied to attributes or elements. Performing it twice in the case of attributes whose [normalized value] has already been subject to replacement or collapse on the basis of information in a DTD is necessary to ensure consistent treatment of attributes regardless of the extent to which DTD-based information has been made use of during infoset construction.

Note: Even when DTD-based information has been appealed to, and Attribute Value Normalization has taken place, the above definition of ·normalized value· may mean further normalization takes place, as for instance when character entity references in attribute values result in white space characters other than spaces in their ·initial value·s.
Note: The values replace and collapse may appear to provide a convenient way to "unwrap" text (i.e. undo the effects of pretty-printing and word-wrapping). In some cases, especially highly constrained data consisting of lists of artificial tokens such as part numbers or other identifiers, this appearance is correct. For natural-language data, however, the whitespace processing prescribed for these values is not only unreliable but will systematically remove the information needed to perform unwrapping correctly. For Asian scripts, for example, a correct unwrapping process will replace line boundaries not with blanks but with zero-width separators or nothing. In consequence, it is normally unwise to use these values for natural-language data, or for any data other than lists of highly constrained tokens.

previous sub-section next sub-section