W3C

Associating Schemas with XML documents 1.0 (Third Edition)

W3C Working Group Note 09 October 2012

This version:
http://www.w3.org/TR/2012/NOTE-xml-model-20121009/
Latest version:
http://www.w3.org/TR/xml-model/
Previous version:
http://www.w3.org/TR/2011/NOTE-xml-model-20110811/
Editors:
Paul Grosso <paul@paulgrosso.name>
Jirka Kosek <jirka@kosek.cz>

This document is also available in these non-normative formats: XML and a version with differences between the Second and Third Editions marked.


Abstract

This document allows schemas using any schema definition language to be associated with an XML document by including one or more processing instructions with a target of xml-model in the document's prolog.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the third publication of this document as a Working Group Note. This document is a product of the XML Core Working Group as part of the W3C XML Activity. The English version of this specification is the only normative version. However, for translations of this document, see http://www.w3.org/2003/03/Translations/byTechnology?technology=xml-model.

The content of the first and second editions of this WG Note was reviewed by various audiences including ISO JTC1/SC34. This third edition differs from the second edition merely in an update of this Status and an addition to the references to reflect the publication of ISO/IEC 19757-11; the technical content is unchanged.

Please submit any comments on this document to xml-editor@w3.org; public archives are available.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

However, this specification is being jointly developed by the W3C and ISO/IEC JTC1/SC34. The technical content of this specification and that of ISO/IEC 19757-11 is identical.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Note: On 14 November 2012 we edited the document in place to fix a bug in the style sheet that shows the comparison with ISO/IEC 19757-11 above.

Table of Contents

1 Introduction
2 Conformance requirements
3 The xml-model processing instruction

Appendices

A Normative References
B Examples (Non-Normative)
C Suggested use of schematypens for determining schema language (Non-Normative)


1 Introduction

(This section is non-normative.)

There are several document schema definition languages in common use today that can be used to specify one or more validation processes performed against Extensible Markup Language (XML) documents. Some schema languages provide their own syntax for associating schemas with documents (DTD, W3C XML Schema) and some languages (RELAX NG, Schematron) do not provide schema association mechanisms at all. The purpose of this specification is to define a common, schema-agnostic syntax for associating schema documents written in any schema definition language with a given XML document.

This specification defines the syntax and processing expectations for an xml-model processing instruction. Such processing instructions associate one or more schemas with the XML document in which they are present. The associated schemas may be written in any schema definition language. Applications can use the associated schemas for any purpose, such as document validation, content completion in interactive editors, or creating models for data binding. The presence of an xml-model processing instruction is not in itself an instruction to any processor to validate the document, nor is it a statement that the document is not to be processed without validation. It is a declarative statement of a relationship between the document and one or more external schemas. This specification does not prescribe what, if anything, an application does with an xml-model processing instruction. The presence of an xml-model processing instruction referencing a DTD does not affect the validity of the document which contains it.

It should be noted that this specification is not meant as a replacement for other technologies that provide more general and indirect schema association features, such as NVDL and XProc. This specification is a complementary technology which can be used when it is necessary to store ad-hoc schema associations directly inside an XML document.

2 Conformance requirements

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words must, must not, should, should not and may in the normative parts of this document are to be interpreted as described in RFC 2119. These words do not appear in all uppercase letters in this specification [RFC2119].

Documents

A document is considered to conform to this specification if it satisfies all must-level criteria in this specification that apply to documents.

xml-model processors

XML defines an application as a software module which receives the information content of an XML document from an XML processor. [Definition: An xml-model processor is such an application which processes XML processing instructions in accordance with this specification.]

An xml-model processor may be part of a larger XML application, or may function independently. In either case, [Definition: an application is the consumer of the pseudo-attribute information defined in this specification.]

An xml-model processor is considered to be a conforming xml-model processor if it satisfies all must-level criteria in this specification that apply to xml-model processors; xml-model processors do not have to check or enforce any of the constraints on documents.

This specification is defined with reference to the vocabulary for XML provided by the XML Information Set [INFOSET] as well as the rules for parsing pseudo-attributes from a string as defined in the Associating Style Sheets with XML documents Recommendation [ASSOCSS].

The productions in this specification use the same notation as used in the XML Recommendation. Tokens in the grammar and terms used in this specification that are not defined in this specification are defined in the XML Recommendation [XML] or the Associating Style Sheets [ASSOCSS] Recommendation.

3 The xml-model processing instruction

[Definition: A processing instruction information item is said to be a potential xml-model processing instruction if it has the [target] property xml-model and it is in the [children] property of a document information item and appears before the element information item of the document information item's [children] property.]
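
The definition above can be illustrated with a minimal Python sketch. It assumes that the standard-library xml.dom.minidom module is an acceptable stand-in for an XML processor exposing the information set; the function name potential_xml_model_pis is hypothetical and is not defined by this specification.

```python
from xml.dom import Node, minidom


def potential_xml_model_pis(xml_text):
    """Return the content of each prolog PI whose target is xml-model.

    Hypothetical helper: only processing instructions that are children of
    the document node and precede the document element are collected.
    """
    document = minidom.parseString(xml_text)
    contents = []
    for child in document.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            break  # the document element ends the prolog
        if (child.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and child.target == "xml-model"):
            contents.append(child.data)
    return contents
```

A processing instruction with the same target that appears after the document element is not collected, because it does not precede the element information item.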

For such potential xml-model processing instructions, xml-model processors must report to the application the parsing result of invoking the rules for parsing pseudo-attributes from a string, using the processing instruction information item's [content] property as the string.

[Definition: A potential xml-model processing instruction is said to be an xml-model processing instruction if the parsing result is not an error when invoking the rules for parsing pseudo-attributes from a string, using the processing instruction information item's [content] property as the string.]

Documents must not contain processing instruction information items with the [target] property xml-model that are not xml-model processing instructions.

An xml-model processor must process all xml-model processing instructions properly and must pass on to the application the full parsing result for each xml-model processing instruction.

An xml-model processing instruction will match the following production:

Production for xml-model processing instruction
[1]   XmlModelPI   ::=   "<?xml-model" (S PseudoAtts)? - (Char* "?>" Char*) "?>"
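
As a rough illustration of obtaining a parsing result from the [content] property, the following Python sketch approximates the pseudo-attribute rules of [ASSOCSS] with a regular expression. It is a simplification under stated assumptions: names are matched with an ASCII-oriented approximation of the XML Name production, only character references and the five predefined entity references are expanded, and the function name parse_pseudo_attributes is hypothetical.

```python
import re

# The five predefined entity references and their replacement text.
_PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}
_REF = r"&(?:#[0-9]+|#x[0-9A-Fa-f]+|lt|gt|amp|quot|apos);"
_REF_RE = re.compile(_REF)
# name, optional whitespace, "=", optional whitespace, quoted value.
_PSEUDO_ATT = re.compile(
    r"([A-Za-z_:][\w.:-]*)\s*=\s*"
    r"(?:\"((?:[^\"<&]|" + _REF + r")*)\"|'((?:[^'<&]|" + _REF + r")*)')"
)
_WS = " \t\r\n"


def _expand(match):
    """Replace one character or predefined entity reference."""
    body = match.group(0)[1:-1]
    if body.startswith("#x"):
        return chr(int(body[2:], 16))
    if body.startswith("#"):
        return chr(int(body[1:]))
    return _PREDEFINED[body]


def parse_pseudo_attributes(content):
    """Return an ordered list of (name, value) pairs, or raise ValueError."""
    pairs = []
    rest = content.strip(_WS)
    while rest:
        match = _PSEUDO_ATT.match(rest)
        if match is None:
            raise ValueError("not a pseudo-attribute: %r" % rest)
        raw = match.group(2) if match.group(2) is not None else match.group(3)
        pairs.append((match.group(1), _REF_RE.sub(_expand, raw)))
        tail = rest[match.end():]
        if tail and tail[0] not in _WS:
            raise ValueError("pseudo-attributes must be separated by space")
        rest = tail.lstrip(_WS)
    return pairs
```

In this sketch a ValueError stands for an error parsing result, so a potential xml-model processing instruction whose content raises it would not be an xml-model processing instruction.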

Documents may specify the following pseudo-attributes on xml-model processing instructions, unless otherwise stated:

href

Specifies the location of the referenced schema. Documents must specify this pseudo-attribute. Documents must set the value to a string that matches the grammar for <IRI-reference> given in RFC 3987 [RFC3987].

type

Specifies the content type of the referenced schema. If unspecified, the xml-model processor should return a parsing result that would be identical to that when the value is given as application/xml. The value of this pseudo-attribute is advisory in that it is intended to be used by an application only when no other source of media type information becomes available during retrieval of the schema itself.
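
The defaulting and advisory behaviour described above might be sketched in Python as follows. Both function names are hypothetical, the parsing result is assumed to be available as a dictionary of pseudo-attribute names to values, and retrieved_type stands for whatever media type information (if any) was obtained while retrieving the schema.

```python
DEFAULT_TYPE = "application/xml"


def with_default_type(pseudo_atts):
    """Return a copy of the parsing result with the type default applied."""
    result = dict(pseudo_atts)
    result.setdefault("type", DEFAULT_TYPE)
    return result


def resolve_media_type(pseudo_atts, retrieved_type=None):
    """Prefer media type information obtained during retrieval.

    The type pseudo-attribute is only advisory, so it is consulted when
    retrieval yielded no media type.
    """
    if retrieved_type:
        return retrieved_type
    return with_default_type(pseudo_atts)["type"]
```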

schematypens

Specifies the namespace name of the schema language in which the referenced schema is written. The application can use this value when determining whether it can make use of the referenced schema.

charset

Specifies the character encoding for the referenced schema. If specified, documents must set the value to a valid character encoding name, which must be the name or alias labeled as "preferred MIME name" in the IANA Character Sets registry, if there is one, or the encoding's name, if none of the aliases are so labeled [IANACHARSET].

title

Gives the title (or other human readable description) of the referenced schema. If specified, documents may use any string as the value.

group

If, for any xml-model processing instruction, its group pseudo-attribute has a non-empty value, special rules for associating schemas apply as follows:

  1. By default, only schemas which do not have a group pseudo-attribute specified or schemas which have an empty value in the group pseudo-attribute on the corresponding xml-model processing instruction are treated as being associated with the XML document.

  2. An application may provide an interface for specifying a group name. If the group name is specified, only schemas which have the same value specified in the group pseudo-attribute on the corresponding xml-model processing instruction are considered to be associated with the XML document.
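
The two rules above can be sketched as a small selection function. This is an illustrative Python sketch, not a prescribed algorithm: the function name select_schemas is hypothetical, and each association is assumed to be a dictionary holding the pseudo-attributes of one xml-model processing instruction.

```python
def select_schemas(associations, group=None):
    """Return the associations in effect for an optional group name."""
    if group:
        # Rule 2: a group name was supplied, so only matching groups apply.
        return [a for a in associations if a.get("group") == group]
    # Rule 1: no group name, so only ungrouped (or empty-group) schemas apply.
    return [a for a in associations if not a.get("group")]


# Data taken from the "Alternative schema groups" example in Appendix B.
associations = [
    {"href": "xhtml-transitional.xsd"},
    {"href": "xhtml-strict.xsd", "group": "Strict"},
    {"href": "xhtml-strict-additional-constraints.sch", "group": "Strict"},
]
default_hrefs = [a["href"] for a in select_schemas(associations)]
strict_hrefs = [a["href"] for a in select_schemas(associations, "Strict")]
```

With this data, default_hrefs holds only xhtml-transitional.xsd, and strict_hrefs holds the two schemas in the Strict group.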

phase

Gives the phase name of the validation function for use with a Schematron schema. If specified, documents may use any string as the value. If specified, the xml-model processor should include this information in the parsing result (regardless of the language of the associated schema).

If the associated schema is a Schematron schema, and the parsing result includes the phase pseudo-attribute, then the application is expected to use the value of this pseudo-attribute as the phase name of the validation function (see ISO/IEC 19757-3, Section 6.1 [ISO/IEC 19757-3]).
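
One way an application might apply this expectation is sketched below in Python. The sketch assumes that a Schematron schema is recognized by the schematypens value suggested in Appendix C; other means of detection are equally possible, and the function name schematron_phase is hypothetical.

```python
# Namespace name suggested for Schematron in Appendix C.
SCHEMATRON_NS = "http://purl.oclc.org/dsdl/schematron"


def schematron_phase(pseudo_atts):
    """Return the phase name to use, or None when it does not apply."""
    if pseudo_atts.get("schematypens") != SCHEMATRON_NS:
        return None  # the phase is only meaningful for Schematron schemas
    return pseudo_atts.get("phase")
```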

To allow for extensibility, documents may specify other pseudo-attributes on xml-model processing instructions.

This specification provides a way to associate multiple schemas with a given XML document. Furthermore, there exist other ways certain schemas can be associated with a given XML document. Regardless of the association method, this specification does not prescribe the processing order when multiple schemas are associated with a given XML document.

In particular, this specification does not define the interaction of xml-model processing instructions with xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes, which provide hints for locating schemas in W3C XML Schema. Applications supporting both xml-model processing instructions and xsi:schemaLocation/xsi:noNamespaceSchemaLocation attributes may provide means for specifying which information takes precedence.

A Normative References

ASSOCSS
Associating Style Sheets with XML documents 1.0 (Second Edition). W3C, 28 October 2010. (See http://www.w3.org/TR/2010/REC-xml-stylesheet-20101028/.)
IANACHARSET
Character Sets. IANA, May 2007. (See http://www.iana.org/assignments/character-sets.)
INFOSET
XML Information Set, J. Cowan, R. Tobin. W3C, February 2004. (See http://www.w3.org/TR/xml-infoset/.)
ISO/IEC 19757-3
ISO/IEC 19757-3:2006. Information technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron, International Organization for Standardization and International Electrotechnical Commission. 2006.
ISO/IEC 19757-11
ISO/IEC 19757-11:2011. Information technology — Document Schema Definition Languages (DSDL) — Part 11: Schema association, International Organization for Standardization and International Electrotechnical Commission. 2011. (See http://standards.iso.org/ittf/PubliclyAvailableStandards/c054793_ISO_IEC_19757-11_2011.zip.)
RFC2119
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. IETF, March 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
RFC3987
Internationalized Resource Identifiers (IRIs), M. Dürst, M. Suignard. IETF, January 2005. (See http://www.ietf.org/rfc/rfc3987.txt.)
XML
Extensible Markup Language, T. Bray, J. Paoli, C. Sperberg-McQueen, E. Maler, F. Yergeau. W3C, November 2008. (See http://www.w3.org/TR/xml/.)

B Examples (Non-Normative)

Example: Multiple schemas associated
<?xml version="1.0"?>
<?xml-model href="http://www.docbook.org/xml/5.0/rng/docbook.rng"?>
<?xml-model href="http://www.docbook.org/xml/5.0/xsd/docbook.xsd"?>
<book xmlns="http://docbook.org/ns/docbook">
 …
</book>

Example: Alternative schema groups
<?xml-model href="xhtml-transitional.xsd"?>
<?xml-model href="xhtml-strict.xsd" 
   group="Strict"
   title="Check against strict document type"?>
<?xml-model href="xhtml-strict-additional-constraints.sch" 
   group="Strict"
   title="Check against strict document type complex constraints"?>

C Suggested use of schematypens for determining schema language (Non-Normative)

Use of a combination of schematypens and type allows for the identification of many widely used schema languages as shown in the following table.

Schema language           | type                                | schematypens
DTD                       | application/xml-dtd                 | unspecified*
W3C XML Schema            | unspecified* or application/xml     | http://www.w3.org/2001/XMLSchema
RELAX NG                  | unspecified* or application/xml     | http://relaxng.org/ns/structure/1.0
RELAX NG – compact syntax | application/relax-ng-compact-syntax | unspecified*
Schematron                | unspecified* or application/xml     | http://purl.oclc.org/dsdl/schematron
NVDL                      | unspecified* or application/xml     | http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0

* A value of “unspecified” above indicates that the corresponding pseudo-attribute is not specified on the xml-model processing instruction.
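
The table above can be turned into a simple lookup. The following Python sketch is one possible reading of the table, not a normative algorithm: the function name guess_schema_language and the returned labels are hypothetical, and the parsing result is assumed to be a dictionary in which an unspecified pseudo-attribute is simply absent.

```python
# Languages identified by media type alone (schematypens unspecified).
_BY_TYPE = {
    "application/xml-dtd": "DTD",
    "application/relax-ng-compact-syntax": "RELAX NG compact syntax",
}
# Languages identified by namespace name (type unspecified or application/xml).
_BY_NAMESPACE = {
    "http://www.w3.org/2001/XMLSchema": "W3C XML Schema",
    "http://relaxng.org/ns/structure/1.0": "RELAX NG",
    "http://purl.oclc.org/dsdl/schematron": "Schematron",
    "http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0": "NVDL",
}


def guess_schema_language(pseudo_atts):
    """Return a schema language label, or None when the table has no match."""
    media_type = pseudo_atts.get("type")
    namespace = pseudo_atts.get("schematypens")
    if namespace is None:
        return _BY_TYPE.get(media_type)
    if media_type in (None, "application/xml"):
        return _BY_NAMESPACE.get(namespace)
    return None
```

A result of None means only that the combination is not listed in the table; an application remains free to determine the schema language by other means.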
Example: Multiple schemas associated
<?xml version="1.0"?>
<?xml-model href="http://www.docbook.org/xml/5.0/rng/docbook.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.docbook.org/xml/5.0/xsd/docbook.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<book xmlns="http://docbook.org/ns/docbook">
 …
</book>