Associating Schemas with XML documents 1.0 (First Edition)

1 Introduction

(This section is non-normative.)

There are several document schema definition languages in common use today that can be used to specify one or more validation processes performed against Extensible Markup Language (XML) documents. Some schema languages provide their own syntax for associating schemas with documents (DTD, W3C XML Schema) and some languages (RELAX NG, Schematron) do not provide schema association mechanisms at all. The purpose of this specification is to define a common, schema-agnostic syntax for associating schema documents written in any schema definition language with a given XML document.

This specification defines the syntax and processing expectations for an xml-model processing instruction. Such processing instructions associate one or more schemas with the XML document in which they are present. The associated schemas may be written in any schema definition language. Applications can use the associated schemas for any purpose, such as document validation, content completion in interactive editors, or the creation of models for data binding. The presence of an xml-model processing instruction is not in itself an instruction to any processor to validate the document, nor is it a statement that the document is not to be processed without validation. It is a declarative statement of a relationship between the document and one or more external schemas.
Editorial note: PBG 2010-03-15
The previous two sentences address ISSUE-5

This specification is not meant as a replacement for other technologies, such as NVDL and XProc, that provide more general and indirect schema association features. It is a complementary technology that can be used when it is necessary to store ad-hoc schema associations directly inside an XML document.
Editorial note: PBG 2010-03-15
This paragraph addresses ISSUE-4

2 Conformance requirements

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words must, must not, should, should not and may in the normative parts of this document are to be interpreted as described in RFC 2119 [RFC2119]. These words do not appear in all uppercase letters in this specification.

Documents

A document is considered to conform to this specification if it satisfies all must-level criteria in this specification that apply to documents.

xml-model processors

XML defines an application as a software module which receives the information content of an XML document from an XML processor. [Definition: An xml-model processor is such an application which processes XML processing instructions in accordance with this specification.]

An xml-model processor may be part of a larger XML application, or may function independently. In either case, [Definition: an application is the consumer of the pseudo-attribute information defined in this specification.]

An xml-model processor is considered to be a conforming xml-model processor if it satisfies all must-level criteria in this specification that apply to xml-model processors; xml-model processors do not have to check or enforce any of the constraints on documents.

This specification is defined with reference to the vocabulary for XML provided by the XML Information Set [INFOSET] as well as the rules for parsing pseudo-attributes from a string as defined in the Associating Style Sheets with XML documents Recommendation [ASSOCSS].

The productions in this specification use the same notation as used in the XML Recommendation. Tokens in the grammar and terms used in this specification that are not defined in this specification are defined in the XML Recommendation [XML] or the Associating Style Sheets with XML documents Recommendation [ASSOCSS].

3 The xml-model processing instruction

[Definition: A processing instruction information item is said to be a potential xml-model processing instruction if it has the [target] property xml-model and it is in the [children] property of a document information item and appears before the element information item of the document information item's [children] property.]

For such potential xml-model processing instructions, xml-model processors must report to the application the parsing result of invoking the rules for parsing pseudo-attributes from a string, using the processing instruction information item's [content] property as the string.

[Definition: A potential xml-model processing instruction is said to be an xml-model processing instruction if the parsing result is not an error when invoking the rules for parsing pseudo-attributes from a string, using the processing instruction information item's [content] property as the string.]

Documents must not use processing instruction information items with the [target] property xml-model if they are not xml-model processing instructions.

An xml-model processor must process all xml-model processing instructions properly and must pass on to the application the full parsing result for each xml-model processing instruction.
Editorial note: PBG 2010-03-15
The “shoulds” in this paragraph have been changed to “musts” as part of addressing ISSUE-2
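
The following non-normative sketch illustrates the reporting step described above. It is written in Python and uses the standard library xml.dom.minidom module to reach the processing instructions that precede the document element. The function names parse_pseudo_attributes and xml_model_parsing_results are hypothetical. The pattern is only a simplified approximation of the rules for parsing pseudo-attributes from a string in [ASSOCSS]: it does not expand character or entity references, and keeping the first of several same-named pseudo-attributes is an assumption of this sketch.

```python
import re
from xml.dom import minidom

# Simplified pseudo-attribute syntax: name="value" or name='value'.
# Assumption: this approximates, but does not fully implement, [ASSOCSS].
_PSEUDO_ATT = re.compile(r"""([A-Za-z_][\w.\-]*)\s*=\s*(?:"([^"<]*)"|'([^'<]*)')""")
_WS = re.compile(r"\s*")


def parse_pseudo_attributes(content):
    """Return a dict of pseudo-attributes, or None when parsing fails."""
    result = {}
    pos = 0
    while True:
        ws_end = _WS.match(content, pos).end()
        if ws_end == len(content):
            return result
        if pos != 0 and ws_end == pos:
            return None  # pseudo-attributes must be separated by whitespace
        match = _PSEUDO_ATT.match(content, ws_end)
        if match is None:
            return None  # not an xml-model processing instruction
        value = match.group(2) if match.group(2) is not None else match.group(3)
        # Assumption of this sketch: the first occurrence of a name wins.
        result.setdefault(match.group(1), value)
        pos = match.end()


def xml_model_parsing_results(xml_text):
    """Report one parsing result per potential xml-model processing instruction."""
    dom = minidom.parseString(xml_text)
    results = []
    for node in dom.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            break  # only instructions before the document element qualify
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-model"):
            results.append(parse_pseudo_attributes(node.data))
    return results
```

In this sketch a result of None stands for a parsing error, so an application can distinguish potential xml-model processing instructions from actual ones.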

An xml-model processing instruction will match the following production:

Production for xml-model processing instruction

[1] XmlModelPI ::= "<?xml-model" (S PseudoAtts)? - (Char* "?>" Char*) "?>"

Documents may specify the following pseudo-attributes on xml-model processing instructions, unless otherwise stated:

href

Specifies the location of the referenced schema. Documents must specify this pseudo-attribute. Documents must set the value to a string that matches the grammar for <IRI-reference> given in RFC 3987 [RFC3987].

type

Specifies the content type of the referenced schema. If unspecified, the xml-model processor should return a parsing result that would be identical to that when the value is given as application/xml. The value of this pseudo-attribute is advisory in that it is intended to be used by an application only when no other source of media type information becomes available during retrieval of the schema itself.
Editorial note: PBG 2010-03-15
The previous sentence addresses ISSUE-3

schematypens

Specifies the namespace name of the schema language in which the referenced schema is written. The application can use this value when determining whether it can make use of the referenced schema.

Editorial note: PBG 2010-03-15
The “schematypens” pseudo-attribute has been added as part of addressing ISSUE-1

charset

Specifies the character encoding for the referenced schema. If specified, documents must set the value to a valid character encoding name, which must be the name or alias labeled as "preferred MIME name" in the IANA Character Sets registry, if there is one, or the encoding's name, if none of the aliases are so labeled [IANACHARSET].

title

Gives the title (or other human readable description) of the referenced schema. If specified, documents may use any string as the value.

group

If, for any xml-model processing instruction, its group pseudo-attribute has a non-empty value, special rules for associating schemas apply as follows:

By default, only schemas which do not have a specified group pseudo-attribute with a non-empty value on the corresponding xml-model processing instruction are treated as being associated with the XML document.
An application may provide an interface for specifying a group name. If the group name is specified, only schemas which have the same value specified in the group pseudo-attribute on the corresponding xml-model processing instruction are considered to be associated with the XML document.

phase

Gives the phase name of the validation function for use with a Schematron schema. If specified, documents may use any string as the value. If specified, the xml-model processor should include this information in the parsing result (regardless of the language of the associated schema).

If the associated schema is a Schematron schema, and the parsing result includes the phase pseudo-attribute, then the application is expected to use the value of this pseudo-attribute as the phase name of the validation function (see ISO/IEC 19757-3, Section 6.1 [ISO/IEC 19757-3]).
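
The rules given above for the group pseudo-attribute can be sketched as follows (non-normative). The function name select_associated is hypothetical, and each parsing result is assumed to be available to the application as a Python dictionary of pseudo-attribute names and values.

```python
def select_associated(parsing_results, group=None):
    """Return the parsing results whose schemas count as associated.

    With no group name, only instructions without a non-empty group
    pseudo-attribute are selected. With a group name, only instructions
    whose group pseudo-attribute has that exact value are selected.
    """
    if group:
        return [pi for pi in parsing_results if pi.get("group") == group]
    return [pi for pi in parsing_results if not pi.get("group")]


# Hypothetical parsing results with one ungrouped and two grouped schemas.
results = [
    {"href": "xhtml-transitional.xsd"},
    {"href": "xhtml-strict.xsd", "group": "Strict"},
    {"href": "xhtml-strict-additional-constraints.sch", "group": "Strict"},
]
default_hrefs = [pi["href"] for pi in select_associated(results)]
strict_hrefs = [pi["href"] for pi in select_associated(results, "Strict")]
```

Here default_hrefs contains only the ungrouped schema, while strict_hrefs contains the two schemas in the Strict group.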

To allow for extensibility, documents may specify other pseudo-attributes on xml-model processing instructions.
Editorial note: PBG 2010-03-15
The “must not” in this paragraph has been changed to “may” as part of addressing ISSUE-2

This specification provides a way to associate multiple schemas with a given XML document. Furthermore, there exist other ways certain schemas can be associated with a given XML document. Regardless of the association method, this specification does not prescribe the processing order when multiple schemas are associated with a given XML document.

In particular, this specification does not define the interaction of xml-model processing instructions with the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes, which provide hints for locating schemas in W3C XML Schema. Applications supporting both xml-model processing instructions and xsi:schemaLocation/xsi:noNamespaceSchemaLocation attributes may provide means for specifying which information takes precedence.

A Normative References

ASSOCSS: Associating Style Sheets with XML documents 1.0 (Second Edition). W3C, 4 December 2009. (See http://www.w3.org/XML/2009/12/xml-stylesheet/.)
IANACHARSET: Character Sets. IANA, May 2007. (See http://www.iana.org/assignments/character-sets.)
INFOSET: XML Information Set, J. Cowan, R. Tobin. W3C, February 2004. (See http://www.w3.org/TR/xml-infoset/.)
ISO/IEC 19757-3: ISO/IEC 19757-3:2006. Information technology — Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron, International Organization for Standardization and International Electrotechnical Commission. 2006.
RFC2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. IETF, March 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
RFC3987: Internationalized Resource Identifiers (IRIs), M. Dürst, M. Suignard. IETF, January 2005. (See http://www.ietf.org/rfc/rfc3987.txt.)
XML: Extensible Markup Language, T. Bray, J. Paoli, C. Sperberg-McQueen, E. Maler, F. Yergeau. W3C, November 2008. (See http://www.w3.org/TR/xml/.)

B Examples (Non-Normative)

Example: Multiple schemas associated

<?xml version="1.0"?>
<?xml-model href="http://www.docbook.org/xml/5.0/rng/docbook.rng"?>
<?xml-model href="http://www.docbook.org/xml/5.0/xsd/docbook.xsd"?>
<book xmlns="http://docbook.org/ns/docbook">
 …
</book>

Example: Alternative schema groups

<?xml-model href="xhtml-transitional.xsd"?>
<?xml-model href="xhtml-strict.xsd" 
   group="Strict"
   title="Check against strict document type"?>
<?xml-model href="xhtml-strict-additional-constraints.sch" 
   group="Strict"
   title="Check against strict document type complex constraints"?>

C Suggested use of `schematypens` for determining schema language (Non-Normative)

Editorial note: PBG 2010-03-15
This informative appendix has been added as part of addressing ISSUE-1

Use of a combination of schematypens and type allows for the identification of many widely used schema languages as shown in the following table.

Schema language	type	schematypens
DTD	`application/xml-dtd`	unspecified
W3C XML Schema	unspecified or `application/xml`	`http://www.w3.org/2001/XMLSchema`
RELAX NG	unspecified or `application/xml`	`http://relaxng.org/ns/structure/1.0`
RELAX NG – compact syntax	`application/relax-ng-compact-syntax`	unspecified
Schematron	unspecified or `application/xml`	`http://purl.oclc.org/dsdl/schematron`
NVDL	unspecified or `application/xml`	`http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0`

Example: Multiple schemas associated

<?xml version="1.0"?>
<?xml-model href="http://www.docbook.org/xml/5.0/rng/docbook.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.docbook.org/xml/5.0/xsd/docbook.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<book xmlns="http://docbook.org/ns/docbook">
 …
</book>
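
An application might apply the table above along the following lines (non-normative sketch). The function name guess_schema_language and the returned labels are hypothetical, and each parsing result is assumed to be a Python dictionary of pseudo-attribute names and values. A missing type is treated as application/xml, following the default described for the type pseudo-attribute.

```python
# Languages identified by media type alone (schematypens unspecified).
BY_TYPE = {
    "application/xml-dtd": "DTD",
    "application/relax-ng-compact-syntax": "RELAX NG compact syntax",
}

# Languages identified by namespace name (type unspecified or application/xml).
BY_NAMESPACE = {
    "http://www.w3.org/2001/XMLSchema": "W3C XML Schema",
    "http://relaxng.org/ns/structure/1.0": "RELAX NG",
    "http://purl.oclc.org/dsdl/schematron": "Schematron",
    "http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0": "NVDL",
}


def guess_schema_language(pi):
    """Return a schema language label, or None if the table gives no answer."""
    media_type = pi.get("type", "application/xml")
    namespace = pi.get("schematypens")
    if namespace is None:
        return BY_TYPE.get(media_type)
    if media_type == "application/xml":
        return BY_NAMESPACE.get(namespace)
    return None  # combination not covered by the table


relax_ng = guess_schema_language({
    "href": "http://www.docbook.org/xml/5.0/rng/docbook.rng",
    "schematypens": "http://relaxng.org/ns/structure/1.0",
})
```

Returning None for unlisted combinations leaves the decision to the application, which may fall back on other information such as the media type reported when the schema is retrieved.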


Editor's Draft 15 March 2010


