This document lists the design principles and requirements for the Blueberry revision of the XML Recommendation, a limited revision of XML 1.0 being developed by the World Wide Web Consortium's XML Core Working Group solely to address character set issues.
This is a W3C Working Draft produced as a deliverable of the XML Core WG according to its charter and the current XML Activity process. A list of current W3C working drafts and notes can be found at http://www.w3.org/TR.
This document is a work in progress representing the current consensus of the W3C XML Core Working Group. It is published for review by W3C members and other interested parties. Publication as a Working Draft does not imply endorsement by the W3C membership. Comments should be sent to email@example.com, which is an automatically and publicly archived email list.
The W3C's XML 1.0 Recommendation [XML] was first issued in 1998, and despite the issuance of many errata culminating in a Second Edition of 2000, has remained (by intention) unchanged with respect to what is well-formed XML and what is not. This stability has been extremely useful for interoperability. However, the Unicode Standard [Unicode] on which XML 1.0 relies has not remained static, evolving from version 2.0 to version 3.1. Characters present in Unicode 3.1 but not in Unicode 2.0 may be used in XML character data, but they are not allowed in XML names such as element type names, attribute names, enumerated attribute values, processing instruction targets, and so on. In addition, some characters that should have been permitted in XML names were not, due to oversights and inconsistencies in Unicode 2.0.
As a result, fully native-language XML markup is not possible in at least the following languages: Amharic, Burmese, Canadian aboriginal languages, Cherokee, Dhivehi, Hakka Chinese (Bopomofo script), Khmer, Minnan Chinese (Bopomofo script), Mongolian (traditional script), Oromo, Syriac, Tigre, and Yi, because the characters required to write these languages did not exist in Unicode 2.0. In addition, Chinese (particularly as used in Hong Kong) and Japanese can make use in XML names of only a subset of their complete character repertoires.
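The restriction on Chinese and Japanese names can be illustrated with a minimal Python sketch. It encodes only the Ideographic production of XML 1.0 ([#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]) and checks a few characters against it. The function name is hypothetical, and the sketch does not cover the rest of the Letter production that governs XML names.

```python
# Ranges taken from the Ideographic production of XML 1.0:
#   Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]
XML10_IDEOGRAPHIC_RANGES = (
    (0x4E00, 0x9FA5),
    (0x3007, 0x3007),
    (0x3021, 0x3029),
)


def is_xml10_ideographic(ch: str) -> bool:
    """Return True if the single character ch matches the Ideographic production."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML10_IDEOGRAPHIC_RANGES)


if __name__ == "__main__":
    samples = {
        "U+4E2D (CJK Unified Ideographs)": "\u4e2d",
        "U+3400 (Extension A, added in Unicode 3.0)": "\u3400",
        "U+20000 (Extension B, added in Unicode 3.1)": "\U00020000",
    }
    for label, ch in samples.items():
        print(label, is_xml10_ideographic(ch))
```

Ideographs from the extension blocks added after Unicode 2.0 fall outside these ranges, which is why they may appear in character data but not in names under XML 1.0.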
The point has been made that many of these languages can be written using other scripts, notably the Latin script, which makes transliterated native markup possible. However, exactly the same argument applies to many languages (for example, Greek) that were already fully encoded in Unicode 2.0. Discriminating against languages simply because their scripts were not encoded in Unicode 2.0 is inherently unjust. In addition, working with transliteration is far more painful for native readers and writers than working with the native script.
In addition, XML 1.0 attempts to adapt to the line-end conventions of various modern operating systems, but discriminates against the conventions used on IBM and IBM-compatible mainframes. As a result, XML documents on mainframes are not plain text files according to the local conventions. XML 1.0 documents generated on mainframes must either violate the local line-end conventions, or employ otherwise unnecessary translation phases before parsing and after generation. Allowing straightforward interoperability is particularly important when data stores are shared between mainframe and non-mainframe systems (as opposed to being copied from one to the other).
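The "otherwise unnecessary translation phases" can be sketched as follows. This is a minimal illustration, not part of the requirements. It assumes that the mainframe line end, once the text has been decoded to Unicode, appears as NEL (U+0085), which this document does not state explicitly. The function names are hypothetical.

```python
NEL = "\u0085"  # assumption: the mainframe line end after decoding to Unicode
LF = "\n"       # a line end that XML 1.0 processors already recognize


def to_xml10_line_ends(text: str) -> str:
    """Translation phase before parsing: replace every NEL with LF."""
    return text.replace(NEL, LF)


def to_local_line_ends(text: str) -> str:
    """Translation phase after generation: replace every LF with NEL."""
    return text.replace(LF, NEL)


if __name__ == "__main__":
    local_document = "<doc>\u0085  <item/>\u0085</doc>\u0085"
    portable = to_xml10_line_ends(local_document)
    # repr() makes the line-end characters visible
    print(repr(portable))
    print(repr(to_local_line_ends(portable)))
```

When a data store is shared rather than copied, neither form can be chosen once and for all, which is the interoperability problem the paragraph above describes.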
A new XML version, rather than a set of errata to XML 1.0, is being created because the change affects the definition of well-formed documents. XML 1.0 processors must continue to reject documents that contain new characters in XML names or new line-end conventions. It is presumed that the distinction between XML 1.0 and XML Blueberry will be indicated by the XML declaration.
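As a rough sketch of how a processor might use the XML declaration to tell the two versions apart, the Python fragment below reads the version value from the start of a document. The helper names are hypothetical, and the version label "1.1" is only a placeholder: this document does not say which label XML Blueberry will use. A document without an XML declaration is treated as XML 1.0.

```python
import re

# Matches the version pseudo-attribute at the very start of an XML declaration.
_XML_DECL_VERSION = re.compile(r"""^<\?xml\s+version\s*=\s*(["'])([^"']*)\1""")


def declared_version(document: str, default: str = "1.0") -> str:
    """Return the declared version, or the default if there is no declaration."""
    match = _XML_DECL_VERSION.match(document)
    return match.group(2) if match else default


def uses_blueberry_rules(document: str, blueberry_version: str = "1.1") -> bool:
    """Return True if the document declares the (placeholder) Blueberry version."""
    return declared_version(document) == blueberry_version


if __name__ == "__main__":
    for doc in ('<?xml version="1.0"?><doc/>', "<?xml version='1.1'?><doc/>", "<doc/>"):
        print(declared_version(doc), uses_blueberry_rules(doc))
```

An XML 1.0 processor would continue to apply the XML 1.0 rules to anything that does not declare the new version, so existing documents keep their current meaning.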
The XML 1.0 goals listed in section 1.1 of the XML Recommendation are reaffirmed.
XML Blueberry documents shall permit the full and straightforward use of writing systems supported by Unicode 3.1.
XML Blueberry documents shall permit the full and straightforward use of operating environments that support Unicode 3.1.
The changes required for XML 1.0 processors to also process XML Blueberry shall be as few and as small as possible.
XML Blueberry documents shall allow the use within XML names of all Unicode 3.1 characters, insofar as appropriate for XML.
XML Blueberry documents shall support the line-end conventions associated with Unicode 3.1, insofar as appropriate for XML.
The working group shall consider the issue of future updates to Unicode.
The working group shall consider the issue of W3C normalization as expressed in the W3C Character Model [CharMod].
In creating XML Blueberry, the working group shall not consider any revisions to XML 1.0 except those needed to accomplish these requirements.