W3C

Associating Schemas with XML documents 1.0 (First Edition)

Editor's Draft 15 March 2010

This version:
http://www.w3.org/XML/2010/01/xml-model/
Previous version:
None
Editors:
Paul Grosso, PTC/Arbortext <pgrosso@ptc.com>
Jirka Kosek <jirka@kosek.cz>

This document is also available in these non-normative formats: XML.


Abstract

This document allows schemas using any schema definition language to be associated with an XML document by including one or more processing instructions with a target of xml-model in the document's prolog.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a product of the XML Core Working Group as part of the W3C XML Activity. The English version of this specification is the only normative version. However, for translations of this document, see http://www.w3.org/2003/03/Translations/byTechnology?technology=xml-model.

This current draft has no official status. It is submitted for general review by W3C members and the public at this time in anticipation of its being published as a Working Group Note (WG Note) at a future date.

Please submit any comments on this document to xml-editor@w3.org; public archives are available.

This is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Conformance requirements
3 The xml-model processing instruction

Appendices

A Normative References
B Examples (Non-Normative)
C Suggested use of schematypens for determining schema language (Non-Normative)


1 Introduction

(This section is non-normative.)

There are several document schema definition languages in common use today that can be used to specify one or more validation processes performed against Extensible Markup Language (XML) documents. Some schema languages provide their own syntax for associating schemas with documents (DTD, W3C XML Schema) and some languages (RELAX NG, Schematron) do not provide schema association mechanisms at all. The purpose of this specification is to define a common, schema-agnostic syntax for associating schema documents written in any schema definition language with a given XML document.

This specification defines the syntax and processing expectations for an xml-model processing instruction. Such processing instructions associate one or more schemas with the XML document in which they are present. The associated schemas may be written in any schema definition language. Applications can use the associated schemas for any purpose, including document validation, content completion in interactive editors, or creating models for data binding. Presence of an xml-model processing instruction is not in itself an instruction to any processor to validate the document, nor is it a statement that the document is not to be processed without validation. It is a declarative statement of a relationship between the document and one or more external schemas.
Editorial note: PBG2010-03-15
The previous two sentences address ISSUE-5

It should be noted that this specification is not meant as a replacement for other technologies, such as NVDL and XProc, that provide more general and indirect schema association features. This specification is a complementary technology that can be used when ad-hoc schema associations need to be stored directly inside an XML document.
Editorial note: PBG2010-03-15
This paragraph addresses ISSUE-4

2 Conformance requirements

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words must, must not, should, should not and may in the normative parts of this document are to be interpreted as described in RFC 2119 [RFC2119]. These words do not appear in all uppercase letters in this specification.

Documents

A document is considered to conform to this specification if it satisfies all must-level criteria in this specification that apply to documents.

xml-model processors

XML defines an application as a software module which receives the information content of an XML document from an XML processor. [Definition: An xml-model processor is such an application which processes XML processing instructions in accordance with this specification.]

An xml-model processor may be part of a larger XML application, or may function independently. In either case, [Definition: an application is the consumer of the pseudo-attribute information defined in this specification.]

An xml-model processor is considered to be a conforming xml-model processor if it satisfies all must-level criteria in this specification that apply to xml-model processors; xml-model processors do not have to check or enforce any of the constraints on documents.

This specification is defined with reference to the vocabulary for XML provided by the XML Information Set [INFOSET], as well as the rules for parsing pseudo-attributes from a string as defined in the Associating Style Sheets with XML documents Recommendation [ASSOCSS].

The productions in this specification use the same notation as used in the XML Recommendation. Tokens in the grammar and terms used in this specification that are not defined in this specification are defined in the XML Recommendation [XML] or the Associating Style Sheets [ASSOCSS] Recommendation.

3 The xml-model processing instruction

[Definition: A processing instruction information item is said to be a potential xml-model processing instruction if its [target] property is xml-model, it is in the [children] property of a document information item, and it appears before the element information item in that document information item's [children] property.]
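As an illustration only, the following Python sketch shows one way a processor built on the standard library's xml.dom.minidom might collect the [content] of each potential xml-model processing instruction. The function name potential_xml_model_pis is hypothetical. The sketch assumes that minidom keeps prolog processing instructions as children of the Document node (its expat-based builder does this), so the Document's child list stands in for the document information item's [children] property.

```python
from xml.dom import minidom


def potential_xml_model_pis(xml_text):
    """Return the [content] of each potential xml-model PI, in document order.

    Hypothetical helper: a PI qualifies only if its target is exactly
    "xml-model" and it is a child of the document node that precedes
    the document element.
    """
    doc = minidom.parseString(xml_text)
    contents = []
    for node in doc.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            break  # PIs after the document element are not candidates
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-model"):
            contents.append(node.data)
    return contents


sample = """<?xml version="1.0"?>
<?xml-stylesheet href="style.css" type="text/css"?>
<?xml-model href="a.rng"?>
<doc><?xml-model href="inside.rng"?></doc>
<?xml-model href="after.rng"?>"""

print(potential_xml_model_pis(sample))  # ['href="a.rng"']
```

Only the first xml-model processing instruction qualifies: the other two are inside the document element or after it, and the xml-stylesheet processing instruction has a different target.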

For such potential xml-model processing instructions, xml-model processors must report to the application the parsing result of invoking the rules for parsing pseudo-attributes from a string, using the processing instruction information item's [content] property as the string.

[Definition: A potential xml-model processing instruction is said to be an xml-model processing instruction if the parsing result is not an error when invoking the rules for parsing pseudo-attributes from a string, using the processing instruction information item's [content] property as the string.]
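The following Python sketch approximates the parsing step that separates xml-model processing instructions from merely potential ones. It is a simplification, not a faithful implementation of the ASSOCSS rules: names are limited to an ASCII subset of the XML Name production, only the five predefined entities are expanded (numeric character references are left untouched), and treating a repeated name as an error is an assumption. The function name parse_pseudo_attributes is hypothetical; a return value of None stands for "the parsing result is an error".

```python
import re
from xml.sax.saxutils import unescape

# One pseudo-attribute: a name, optional whitespace, "=", optional
# whitespace, then a double- or single-quoted value.
_PSEUDO_ATT = re.compile(
    r"""([A-Za-z_:][-A-Za-z0-9._:]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")
_WS = " \t\r\n"


def parse_pseudo_attributes(content):
    """Simplified approximation of the ASSOCSS pseudo-attribute rules.

    Returns a dict mapping names to values, or None for a parsing error.
    """
    result = {}
    pos = 0
    while True:
        # Skip whitespace separating pseudo-attributes.
        while pos < len(content) and content[pos] in _WS:
            pos += 1
        if pos == len(content):
            return result
        match = _PSEUDO_ATT.match(content, pos)
        if match is None:
            return None  # not of the form name="value" or name='value'
        name = match.group(1)
        raw = match.group(2) if match.group(2) is not None else match.group(3)
        if name in result:
            return None  # assumption: a repeated name is an error
        # Expand the five predefined entities only.
        result[name] = unescape(raw, {"&quot;": '"', "&apos;": "'"})
        pos = match.end()
        # A value must be followed by whitespace or the end of the string.
        if pos < len(content) and content[pos] not in _WS:
            return None


print(parse_pseudo_attributes('href="a.rng" title=\'A &amp; B\''))
# {'href': 'a.rng', 'title': 'A & B'}
print(parse_pseudo_attributes('href=a.rng'))  # None
```

Pseudo-attribute names the sketch does not recognize are kept in the result, which is consistent with this specification's allowance for other pseudo-attributes.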

Documents must not use processing instruction information items with the [target] property xml-model if they are not xml-model processing instructions.

An xml-model processor must process all xml-model processing instructions properly and must pass on to the application the full parsing result for each xml-model processing instruction.
Editorial note: PBG2010-03-15
The “shoulds” in this paragraph have been changed to “musts” as part of addressing ISSUE-2

An xml-model processing instruction will match the following production:

Production for xml-model processing instruction
[1]   XmlModelPI   ::=   "<?xml-model" (S PseudoAtts)? - (Char* "?>" Char*) "?>"

Documents may specify the following pseudo-attributes on xml-model processing instructions, unless otherwise stated:

href

Specifies the location of the referenced schema. Documents must specify this pseudo-attribute. Documents must set the value to a string that matches the grammar for <IRI-reference> given in RFC 3987 [RFC3987].

type

Specifies the content type of the referenced schema. If unspecified, the xml-model processor should return a parsing result identical to the one it would return if the value were given as application/xml. The value of this pseudo-attribute is advisory: it is intended to be used by an application only when no other source of media type information becomes available during retrieval of the schema itself.
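A minimal Python sketch of how an application might combine the default and the advisory nature of type. The function name effective_schema_type and the retrieved_media_type parameter (for example, a media type taken from an HTTP Content-Type header) are hypothetical, and the sketch folds the processor's default and the application's choice into one step for brevity.

```python
def effective_schema_type(pseudo_atts, retrieved_media_type=None):
    """Pick the media type an application might use for a referenced schema.

    pseudo_atts          -- dict of pseudo-attributes from one xml-model PI
    retrieved_media_type -- media type learned while retrieving the schema,
                            or None if retrieval supplied none (hypothetical)
    """
    if retrieved_media_type:
        # The type pseudo-attribute is advisory: media type information
        # obtained during retrieval takes precedence over it.
        return retrieved_media_type
    # An absent type is treated as if type="application/xml" were given.
    return pseudo_atts.get("type", "application/xml")


print(effective_schema_type({"href": "schema.xsd"}))  # application/xml
```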
Editorial note: PBG2010-03-15
The previous sentence addresses ISSUE-3

schematypens

Specifies the namespace name of the schema language in which the referenced schema is written. The application can use this value when determining whether it can make use of the referenced schema.
Editorial note: PBG2010-03-15
The “schematypens” pseudo-attribute has been added as part of addressing ISSUE-1

charset

Specifies the character encoding for the referenced schema. If specified, documents must set the value to a valid character encoding name, which must be the name or alias labeled as "preferred MIME name" in the IANA Character Sets registry, if there is one, or the encoding's name, if none of the aliases are so labeled [IANACHARSET].

title

Gives the title (or other human-readable description) of the referenced schema. If specified, documents may use any string as the value.

group

If, for any xml-model processing instruction, its group pseudo-attribute has a non-empty value, special rules for associating schemas apply as follows:

  1. By default, only schemas whose corresponding xml-model processing instruction does not specify a group pseudo-attribute with a non-empty value are treated as being associated with the XML document.

  2. An application may provide an interface for specifying a group name. If the group name is specified, only schemas which have the same value specified in the group pseudo-attribute on the corresponding xml-model processing instruction are considered to be associated with the XML document.
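The two rules above can be sketched in Python as a selection over already-parsed pseudo-attributes. The function name associated_schemas is hypothetical; group=None stands for "no group name has been specified through the application's interface", and group names are compared exactly as strings.

```python
def associated_schemas(pis, group=None):
    """Return the href of every schema associated with the document.

    pis   -- one dict of pseudo-attributes per xml-model PI, in document order
    group -- group name chosen through the application's interface,
             or None when no group name has been specified
    """
    selected = []
    for atts in pis:
        pi_group = atts.get("group", "")
        if group is None:
            # Rule 1: by default, only PIs without a non-empty group count.
            if pi_group == "":
                selected.append(atts.get("href"))
        elif pi_group == group:
            # Rule 2: with a group name, only PIs in that group count.
            selected.append(atts.get("href"))
    return selected


# Pseudo-attributes from the "Alternative schema groups" example.
pis = [
    {"href": "xhtml-transitional.xsd"},
    {"href": "xhtml-strict.xsd", "group": "Strict"},
    {"href": "xhtml-strict-additional-constraints.sch", "group": "Strict"},
]
print(associated_schemas(pis))  # ['xhtml-transitional.xsd']
print(associated_schemas(pis, "Strict"))
```

If no xml-model processing instruction has a non-empty group, the default rule selects every schema, so the sketch needs no separate check for whether the special rules apply.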

phase

Gives the phase name of the validation function for use with a Schematron schema. If specified, documents may use any string as the value. If specified, the xml-model processor should include this information in the parsing result (regardless of the language of the associated schema).

If the associated schema is a Schematron schema, and the parsing result includes the phase pseudo-attribute, then the application is expected to use the value of this pseudo-attribute as the phase name of the validation function (see ISO/IEC 19757-3, Section 6.1 [ISO/EIC 19757-3]).
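A minimal Python sketch of the application-side decision described above. It assumes, for simplicity, that the application recognizes a Schematron schema only through the schematypens value listed for Schematron in the appendix on schematypens; real applications may identify the schema language by other means. The function name schematron_phase is hypothetical, and None means "let the validator use its default phase".

```python
SCHEMATRON_NS = "http://purl.oclc.org/dsdl/schematron"


def schematron_phase(atts):
    """Return the phase name to request from a Schematron validator, or None."""
    if atts.get("schematypens") != SCHEMATRON_NS:
        # The processor reports phase for any schema, but it is only
        # meaningful when the associated schema is Schematron.
        return None
    return atts.get("phase")


print(schematron_phase({"href": "rules.sch", "schematypens": SCHEMATRON_NS,
                        "phase": "basic"}))  # basic
```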

To allow for extensibility, documents may specify other pseudo-attributes on xml-model processing instructions.
Editorial note: PBG2010-03-15
The “must not” in this paragraph has been changed to “may” as part of addressing ISSUE-2

This specification provides a way to associate multiple schemas with a given XML document, and other mechanisms also exist for associating certain schemas with an XML document. Regardless of the association method, this specification does not prescribe the processing order when multiple schemas are associated with a given XML document.

In particular, this specification does not define the interaction of xml-model processing instructions with the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes, which provide hints for locating schemas in W3C XML Schema. Applications supporting both xml-model processing instructions and xsi:schemaLocation/xsi:noNamespaceSchemaLocation attributes may provide a means for specifying which information takes precedence.

A Normative References

ASSOCSS
Associating Style Sheets with XML documents 1.0 (Second Edition). W3C, 4 December 2009. (See http://www.w3.org/XML/2009/12/xml-stylesheet/.)
IANACHARSET
Character Sets. IANA, May 2007. (See http://www.iana.org/assignments/character-sets.)
INFOSET
XML Information Set, J. Cowan, R. Tobin. W3C, February 2004. (See http://www.w3.org/TR/xml-infoset/.)
ISO/EIC 19757-3
ISO/IEC 19757-3:2006. Information technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron, International Organization for Standardization and International Electrotechnical Commission. 2006.
RFC2119
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. IETF, March 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
RFC3987
Internationalized Resource Identifiers (IRIs), M. Dürst, M. Suignard. IETF, January 2005. (See http://www.ietf.org/rfc/rfc3987.txt.)
XML
Extensible Markup Language, T. Bray, J. Paoli, C. Sperberg-McQueen, E. Maler, F. Yergeau. W3C, November 2008. (See http://www.w3.org/TR/xml/.)

B Examples (Non-Normative)

Example: Multiple schemas associated
<?xml version="1.0"?>
<?xml-model href="http://www.docbook.org/xml/5.0/rng/docbook.rng"?>
<?xml-model href="http://www.docbook.org/xml/5.0/xsd/docbook.xsd"?>
<book xmlns="http://docbook.org/ns/docbook">
 …
</book>
Example: Alternative schema groups
<?xml-model href="xhtml-transitional.xsd"?>
<?xml-model href="xhtml-strict.xsd" 
   group="Strict"
   title="Check against strict document type"?>
<?xml-model href="xhtml-strict-additional-constraints.sch" 
   group="Strict"
   title="Check against strict document type complex constraints"?>

C Suggested use of schematypens for determining schema language (Non-Normative)

Editorial note: PBG2010-03-15
This informative appendix has been added as part of addressing ISSUE-1

Combining schematypens and type makes it possible to identify many widely used schema languages, as shown in the following table.

Schema language             type                                  schematypens
DTD                         application/xml-dtd                   unspecified
W3C XML Schema              unspecified or application/xml        http://www.w3.org/2001/XMLSchema
RELAX NG                    unspecified or application/xml        http://relaxng.org/ns/structure/1.0
RELAX NG – compact syntax   application/relax-ng-compact-syntax   unspecified
Schematron                  unspecified or application/xml        http://purl.oclc.org/dsdl/schematron
NVDL                        unspecified or application/xml        http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0
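The table can be read as a lookup. The following Python sketch is one possible encoding of it; the function and table names are hypothetical, an absent pseudo-attribute is represented by a missing dictionary key, and None is returned for combinations the table does not cover.

```python
# Rows whose type is "unspecified or application/xml", keyed by schematypens.
NS_LANGUAGES = {
    "http://www.w3.org/2001/XMLSchema": "W3C XML Schema",
    "http://relaxng.org/ns/structure/1.0": "RELAX NG",
    "http://purl.oclc.org/dsdl/schematron": "Schematron",
    "http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0": "NVDL",
}

# Rows whose schematypens is unspecified, keyed by type.
TYPE_LANGUAGES = {
    "application/xml-dtd": "DTD",
    "application/relax-ng-compact-syntax": "RELAX NG compact syntax",
}

XML_TYPES = {None, "application/xml"}  # "unspecified or application/xml"


def schema_language(atts):
    """Guess the schema language from type and schematypens, or return None."""
    type_ = atts.get("type")
    ns = atts.get("schematypens")
    if ns is None:
        return TYPE_LANGUAGES.get(type_)
    if type_ in XML_TYPES:
        return NS_LANGUAGES.get(ns)
    return None


print(schema_language({"href": "docbook.rng",
                       "schematypens": "http://relaxng.org/ns/structure/1.0"}))
# RELAX NG
```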
Example: Multiple schemas associated, with schema languages identified by schematypens
<?xml version="1.0"?>
<?xml-model href="http://www.docbook.org/xml/5.0/rng/docbook.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.docbook.org/xml/5.0/xsd/docbook.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<book xmlns="http://docbook.org/ns/docbook">
 …
</book>