W3C

XML 1.1

W3C Working Draft 13 December 2001

This Version:
http://www.w3.org/TR/2001/WD-xml11-20011213/
Latest Version:
http://www.w3.org/TR/xml11/
Previous Version:
None
Editors:
John Cowan, Reuters <jcowan@reutershealth.com>

Abstract

This document describes XML 1.1, a deliverable of the XML Core Working Group as defined in the XML Blueberry Requirements. XML 1.1 was formerly known as XML Blueberry. This document takes the form of a series of alterations to the XML 1.0 Recommendation [XML1.0], and its numbered sections correspond to those of the XML 1.0 Recommendation. Sections of that Recommendation that do not appear in this document remain unchanged in XML 1.1.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This is a Working Draft of the XML Core Working Group (member only), for review by W3C members and other interested parties. This document has been produced as part of the XML Activity, and may eventually be advanced toward W3C Recommendation status.

Being a Working Draft document, this specification may be updated, replaced, or obsoleted by other documents at any time. The test cases described and referred to in this document may also be updated, replaced or obsoleted at any time. It is therefore inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR/.

This draft document will be considered by the W3C and its members according to W3C process. This document is made public for the purpose of receiving comments that inform the W3C membership and staff on issues likely to affect the implementation, acceptance, and adoption of XML 1.1.

While this and subsequent drafts of this specification will be written as a series of alterations to the XML 1.0 Recommendation to facilitate editing and review, it is likely that the final XML 1.1 Recommendation will take the form of an integral revision of the XML 1.0 specification.

Comments should be sent to www-xml-blueberry-comments@w3.org. This is the preferred method of providing feedback. Public comments and their responses can be accessed at http://lists.w3.org/Archives/Public/www-xml-blueberry-comments/.

Introduction

The W3C's XML 1.0 Recommendation was first issued in 1998, and despite the issuance of many errata culminating in a Second Edition of 2000, has remained (by intention) unchanged with respect to what is well-formed XML and what is not. This stability has been extremely useful for interoperability. However, the Unicode Standard on which XML 1.0 relies for character specifications has not remained static, evolving from version 2.0 to version 3.1 and beyond. Characters not present in Unicode 2.0 may already be used in XML 1.0 character data. However, they are not allowed in XML names such as element type names, attribute names, enumerated attribute values, processing instruction targets, and so on. In addition, some characters that should have been permitted in XML names were not, due to oversights and inconsistencies in Unicode 2.0.

The overall philosophy of names has changed since XML 1.0. Whereas XML 1.0 provided a rigid definition of names, wherein everything that was not permitted was forbidden, XML 1.1 names are designed so that everything that is not forbidden (for a specific reason) is permitted. Since Unicode will continue to grow past version 3.1, further changes to XML can be avoided by allowing almost any character, including those not yet assigned, in names.

In addition, XML 1.0 attempts to adapt to the line-end conventions of various modern operating systems, but discriminates against the convention used on IBM and IBM-compatible mainframes, which end lines with the NEL (next line, #x85) character. As a result, XML documents on mainframes are not plain text files according to the local conventions. XML 1.0 documents generated on mainframes must either violate the local line-end conventions, or employ otherwise unnecessary translation phases before parsing and after generation. Allowing straightforward interoperability is particularly important when data stores are shared between mainframe and non-mainframe systems (as opposed to being copied from one to the other). For completeness, the Unicode line separator character, #x2028, is also supported.

A new XML version, rather than a set of errata to XML 1.0, is being created because the changes affect the definition of well-formed documents. XML 1.0 processors must continue to reject documents that contain new characters in XML names or new line-end conventions. The distinction between XML 1.0 and XML 1.1 documents will be indicated by the version number information in the XML declaration at the start of each document.

2.3 Common Syntactic Constructs

Change production [2] to read:

 [2]    Char    ::=    [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Issue 18: Should we allow the ASCII control characters #x1-#x1F in the Char production? Not allowing them means that textual content that may contain such characters (but typically does not) needs to be specially encoded in XML documents, for example with Base64, to ensure the production of well-formed XML.
Issue 21: Should #x0 be allowed as well? Doing so would make it difficult for processors written in C (or C++ without the standard library) to represent XML data as ordinary strings.
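As an illustration only, production [2] as drafted can be checked with a few range comparisons. This minimal Python sketch assumes the input has already been decoded to code points; `is_xml11_char` and `all_chars_allowed` are hypothetical names, not part of any specification or library. Since Issues 18 and 21 are still open, the lower bound #x1 is the part most likely to change.

```python
def is_xml11_char(cp: int) -> bool:
    """Return True if code point cp matches production [2] as drafted."""
    return (
        0x1 <= cp <= 0xD7FF          # includes the C0 controls #x1-#x1F (Issue 18)
        or 0xE000 <= cp <= 0xFFFD    # excludes surrogates and #xFFFE, #xFFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def all_chars_allowed(text: str) -> bool:
    """Check every character of a decoded string against production [2]."""
    return all(is_xml11_char(ord(ch)) for ch in text)


print(is_xml11_char(0x1))     # True
print(is_xml11_char(0x0))     # False: #x0 is excluded (Issue 21)
print(is_xml11_char(0xFFFE))  # False
```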

Change production [3] to read:

 [3]    S    ::=    (#x9 | #x20 | #xA | #xD | #x85 | #x2028)+

Change the preceding text to read:

S (white space) consists of one or more space (#x20), tab (#x9), carriage return (#xD), line feed (#xA), next line (NEL, #x85), or Unicode line separator (#x2028) characters.
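A minimal Python sketch of production [3] follows; `is_s` is a hypothetical helper. It spells out the six characters explicitly because Python's `\s` and `str.isspace()` accept a wider set (for example #x2029 and #xA0), so they are not substitutes for S.

```python
import re

# Production [3] as drafted: one or more of #x9, #x20, #xA, #xD, #x85, #x2028.
S_PATTERN = re.compile(r"[\x09\x20\x0A\x0D\x85\u2028]+")


def is_s(text: str) -> bool:
    """Return True if the whole string matches production [3]."""
    return S_PATTERN.fullmatch(text) is not None


print(is_s(" \t\r\n"))     # True
print(is_s("\x85\u2028"))  # True: NEL and LINE SEPARATOR count as white space
print(is_s("\u2029"))      # False: PARAGRAPH SEPARATOR is not in [3]
```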

Change production [4], and add new production [4a]:

 [4]    NameStartChar    ::=   ":" | [A-Z] | "_" | [a-z] | [#xC0-#x02FF] |
        [#x0370-#x037D] | [#x037F-#x2027] | [#x202A-#x218F] | [#x2800-#xD7FF] |
        [#xE000-#xFDCF] | [#xFDE0-#xFFEF] | [#x10000-#x10FFFF]
 [4a]    NameChar    ::=   NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F]

Change production [5] to:

 [5]    Name    ::=   NameStartChar NameChar*

Insert the following three paragraphs after production 5:

The first character of a Name must be a NameStartChar, but any other characters are NameChars; this mechanism is used to prevent names from beginning with Latin (ASCII) digits or with basic combining characters. Almost all characters are permitted in names, except those which either are or reasonably could be used as delimiters. The intention is to be inclusive rather than exclusive, so that writing systems not yet encoded in Unicode can be used in XML names.

Document authors are encouraged to use names which are meaningful words or combinations of words in natural languages, and to avoid symbolic or whitespace characters in names. Note that COLON, HYPHEN-MINUS, FULL STOP (period), LOW LINE (underscore), and MIDDLE DOT are explicitly permitted.

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names because they are more useful as delimiters in contexts where XML names are used outside XML documents; excluding this group gives those contexts firm guarantees about what cannot be part of an XML name. The character #x037E, GREEK QUESTION MARK, is excluded because when W3C-normalized it becomes a semicolon, which could change the meaning of entity references.
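The GREEK QUESTION MARK rationale can be checked with Python's standard `unicodedata` module. This sketch uses Unicode NFC as a stand-in for W3C normalization, which is an assumption: [Charmod] builds on NFC but is not identical to it. The entity name `abc` is purely illustrative.

```python
import unicodedata

# U+037E GREEK QUESTION MARK has a canonical singleton decomposition to U+003B.
print(unicodedata.normalize("NFC", "\u037E") == ";")  # True

# If #x037E were a name character, "&abc\u037Exyz;" would be a single entity
# reference to a (hypothetical) entity named "abc\u037Exyz". After
# normalization it reads as a reference to "abc" followed by the text "xyz;".
before = "&abc\u037Exyz;"
after = unicodedata.normalize("NFC", before)
print(after)  # &abc;xyz;
```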

Change production [7] to:

 [7]    Nmtoken    ::=   NameChar+
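Productions [4], [4a], [5] and [7] can be transcribed into a small Python sketch. The helper names are hypothetical, and the ranges are copied from this draft, so they would need updating if the productions change. A Name must begin with a NameStartChar while an Nmtoken may begin with any NameChar, which is why `-1` is a valid Nmtoken but not a valid Name.

```python
# Ranges transcribed from productions [4] and [4a] as drafted (inclusive bounds).
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0x02FF),
    (0x0370, 0x037D),
    (0x037F, 0x2027),    # skips #x037E GREEK QUESTION MARK
    (0x202A, 0x218F),    # skips #x2028 and #x2029
    (0x2800, 0xD7FF),    # skips the symbol block #x2190-#x27FF
    (0xE000, 0xFDCF),
    (0xFDE0, 0xFFEF),
    (0x10000, 0x10FFFF),
]
EXTRA_NAME_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # 0-9
    (0xB7, 0xB7),        # MIDDLE DOT
    (0x0300, 0x036F),    # basic combining characters
]


def _in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch: str) -> bool:
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch: str) -> bool:
    return is_name_start_char(ch) or _in_ranges(ord(ch), EXTRA_NAME_RANGES)


def is_name(s: str) -> bool:
    """[5] Name ::= NameStartChar NameChar*"""
    return bool(s) and is_name_start_char(s[0]) and all(map(is_name_char, s[1:]))


def is_nmtoken(s: str) -> bool:
    """[7] Nmtoken ::= NameChar+"""
    return bool(s) and all(map(is_name_char, s))


print(is_name("-1"), is_nmtoken("-1"))  # False True
print(is_name("a\u037Eb"))              # False: GREEK QUESTION MARK
```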

2.8 Prolog and Document Type Declaration

Change "1.0" everywhere to "1.1".

Add the following paragraph:

XML 1.1 processors should accept XML 1.0 documents as well. If a document is well-formed or valid XML 1.0, it may be made well-formed or valid XML 1.1 respectively simply by changing the version number.
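A rough Python sketch of the "change the version number" upgrade, assuming the input is a decoded string that is already well-formed XML 1.0; `upgrade_to_xml11` is a hypothetical helper, and byte order marks and encoding declarations are out of scope. A document with no XML declaration gets one prepended, because section 4.3.4 treats an unlabeled entity as version 1.0.

```python
import re

# An XML declaration must start the document, and VersionInfo comes first in it.
XML_DECL = re.compile(r"""<\?xml\s+version\s*=\s*(["'])(.*?)\1""")


def upgrade_to_xml11(doc: str) -> str:
    """Relabel a well-formed XML 1.0 document as XML 1.1."""
    m = XML_DECL.match(doc)
    if m is None:
        # No XML declaration: the document would otherwise be read as 1.0.
        return '<?xml version="1.1"?>' + doc
    # Replace only the version number, keeping the original quote style.
    return doc[: m.start(2)] + "1.1" + doc[m.end(2):]


print(upgrade_to_xml11("<?xml version='1.0' encoding='UTF-8'?><a/>"))
# <?xml version='1.1' encoding='UTF-8'?><a/>
print(upgrade_to_xml11("<a/>"))
# <?xml version="1.1"?><a/>
```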

2.11 End-of-Line Handling

Replace the second paragraph with:

To simplify the tasks of applications, the characters passed to an application by the XML processor must be as if the XML processor normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating all of the following to a single #xA character: the two-character sequence #xD #xA; the two-character sequence #xD #x85; the single characters #x85 and #x2028; and any #xD that is not immediately followed by #xA or #x85.
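The translation rule can be expressed as a single regular-expression substitution. This is a minimal Python sketch assuming the entity has already been decoded to a `str`; `normalize_line_ends` is a hypothetical name. The order of the alternatives matters: at each position the two-character sequences are tried before a lone #xD, so #xD #xA and #xD #x85 each become one #xA rather than two.

```python
import re

# Alternatives are tried left to right at each position, so the two-character
# sequences #xD #xA and #xD #x85 are consumed before a lone #xD is considered.
LINE_END = re.compile(r"\r\n|\r\x85|\r|\x85|\u2028")


def normalize_line_ends(text: str) -> str:
    """Translate every line break listed in section 2.11 to a single #xA."""
    return LINE_END.sub("\n", text)


print(repr(normalize_line_ends("a\r\nb\r\x85c\rd\x85e\u2028f")))
# 'a\nb\nc\nd\ne\nf'
```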

2.13 W3C Normalization Checking [NEW]

XML processors must/should/may check whether their input documents are in W3C normalized form, as defined by [Charmod]. XML processors must not transform the input to be in normalized form. It is a fatal error/error/not an error for the document not to be in normalized form.

Issue 11: Must, should or may? Fatal error, error, or not an error? The W3C i18n group wants "must" and "fatal error"; this may be (a) too constraining or (b) too hard to implement at this stage.
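Because Issue 11 is unresolved, this Python sketch makes the severity a parameter. It approximates W3C normalization with Unicode NFC via `unicodedata.is_normalized` (Python 3.8 and later); [Charmod] builds on NFC but is not identical to it, so treat this as an approximation. `check_normalized` and `NormalizationError` are hypothetical names. As the text requires, the check only reports and never rewrites the input.

```python
import unicodedata


class NormalizationError(Exception):
    """Hypothetical error type; Issue 11 leaves the real severity open."""


def check_normalized(text: str, fatal: bool) -> bool:
    """Report whether text is in NFC without ever modifying it."""
    ok = unicodedata.is_normalized("NFC", text)
    if not ok and fatal:
        raise NormalizationError("input is not in normalized form")
    return ok


print(check_normalized("caf\u00e9", fatal=True))    # True: precomposed e-acute
print(check_normalized("cafe\u0301", fatal=False))  # False: e + COMBINING ACUTE ACCENT
```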

3.3.3 Attribute-Value Normalization 

Add #x2028 to the lists "(#x20, #xD, #xA, #x9)" and "(#xD, #xA or #x9)".
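A simplified Python sketch of the XML 1.0 attribute-value normalization algorithm with #x2028 added to its white space list. Assumptions: line ends have already been normalized as in section 2.11, character references are decoded, and general entity references are not supported at all (the sketch raises `NotImplementedError`); `normalize_attribute_value` is a hypothetical name. A literal #x2028 becomes a space, while a character reference such as `&#x2028;` contributes the referenced character itself, as XML 1.0 already specifies for character references.

```python
import re

# White space list of section 3.3.3 with #x2028 added as this draft requires.
WHITESPACE = {"\x20", "\x0D", "\x0A", "\x09", "\u2028"}
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def normalize_attribute_value(raw: str, is_cdata: bool = True) -> str:
    out = []
    pos = 0
    while pos < len(raw):
        m = CHAR_REF.match(raw, pos)
        if m:
            ref = m.group(1)
            cp = int(ref[1:], 16) if ref.startswith("x") else int(ref)
            out.append(chr(cp))  # a referenced character is appended as-is
            pos = m.end()
            continue
        ch = raw[pos]
        if ch == "&":
            raise NotImplementedError("entity references are not handled here")
        out.append(" " if ch in WHITESPACE else ch)  # literal white space -> #x20
        pos += 1
    value = "".join(out)
    if not is_cdata:
        # Non-CDATA: drop leading/trailing #x20 and collapse runs of #x20.
        value = " ".join(part for part in value.split(" ") if part)
    return value


print(repr(normalize_attribute_value("a\u2028b")))    # 'a b'
print(repr(normalize_attribute_value("a&#x2028;b")))  # 'a\u2028b'
```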

4.3.4 Version Information in Entities [NEW]

Each entity, including the document entity, can be separately declared as XML 1.0 or XML 1.1. The version declaration appearing in the document entity determines the version of the document as a whole. An XML 1.1 document may invoke XML 1.0 external entities, so that otherwise duplicated versions of external entities, particularly DTD external subsets, need not be maintained. It is an error for such entities not to be well-formed according to the rules of XML 1.0: only the line terminators and name characters of XML 1.0 are allowed.

Issue 15: May/Should/Must XML 1.1 parsers give a well-formedness/validity error when a 1.1-only character is used in a 1.0 entity, and does the answer depend on the version of the document entity?

XML 1.0 documents must not invoke XML 1.1 entities.

Issue 16: Is this the Right Thing?

If an entity (including the document entity) is not labeled with a version number, it is treated as if labeled as version 1.0.
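The labeling rules of this section can be sketched in Python as follows. This assumes decoded entity text and hypothetical helper names; it does not model the open question of Issue 15 (how a 1.1 processor should treat 1.1-only characters inside a 1.0 entity). VersionInfo is optional in a text declaration, so an external entity may legitimately carry only an encoding declaration and is then treated as 1.0.

```python
import re

# VersionInfo, when present, comes first in an XML or text declaration.
VERSION_INFO = re.compile(r"""<\?xml\s+version\s*=\s*(["'])(.*?)\1""")


def declared_version(entity_text: str) -> str:
    """Return the entity's version label, treating unlabeled entities as 1.0."""
    m = VERSION_INFO.match(entity_text)
    return m.group(2) if m else "1.0"


def check_invocation(document_entity: str, external_entity: str) -> None:
    """Reject a 1.0 document that invokes a 1.1 entity."""
    doc_version = declared_version(document_entity)
    entity_version = declared_version(external_entity)
    if doc_version == "1.0" and entity_version == "1.1":
        raise ValueError("XML 1.0 documents must not invoke XML 1.1 entities")


# Allowed: a 1.1 document invoking a 1.0 external entity.
check_invocation('<?xml version="1.1"?><doc/>', '<?xml version="1.0" encoding="UTF-8"?>')
try:
    check_invocation("<doc/>", '<?xml version="1.1" encoding="UTF-8"?>text')
except ValueError as err:
    print(err)  # XML 1.0 documents must not invoke XML 1.1 entities
```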

Appendix A References

Add the following normative references:

[XML1.0]
Tim Bray, Jean Paoli, C.M. Sperberg-McQueen, Eve Maler (editors), Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 6 October 2000.
[Charmod]
Character Model for the World Wide Web, W3C Working Draft, 28 September 2001.

Appendix B Character Classes [REMOVED]

Appendix B is to be removed in its entirety.