XML Blueberry Requirements

W3C Working Draft 20 June 2001

This version:
Latest version:
Editor: John Cowan, Reuters


This document lists the design principles and requirements for the Blueberry revision of the XML Recommendation, a limited revision of XML 1.0 being developed by the World Wide Web Consortium's XML Core Working Group solely to address character set issues.

Status of this document

This is the first W3C Working Draft of this document, produced as a deliverable of the XML Core WG according to its charter and the current XML Activity process. A list of current W3C working drafts and notes can be found at http://www.w3.org/TR .

This document is a work in progress representing the current consensus of the W3C XML Core Working Group. It is published for review by W3C members and other interested parties. Publication as a Working Draft does not imply endorsement by the W3C membership. Comments should be sent to www-xml-blueberry-comments@w3.org, which is an automatically and publicly archived email list.

Table of Contents

1. Introduction
2. Design Principles
3. Requirements
4. References

1. Introduction

The W3C's XML 1.0 Recommendation [XML] was first issued in 1998 and, despite the issuance of many errata culminating in a Second Edition of 2000, has remained (by intention) unchanged with respect to what is well-formed XML and what is not. This stability has been extremely useful for interoperability. However, the Unicode Standard [Unicode] on which XML 1.0 relies has not remained static, evolving from version 2.0 to version 3.1. Characters present in Unicode 3.1 but not in Unicode 2.0 may be used in XML character data, but are not allowed in XML names such as element type names, attribute names, processing instruction targets, and so on. In addition, some characters that should have been permitted in XML names were not, due to oversights and inconsistencies in Unicode 2.0.

As a result, fully native-language XML markup is not possible in at least the following languages: Amharic, Burmese, Canadian aboriginal languages, Cantonese (Bopomofo script), Cherokee, Dhivehi, Khmer, Mongolian (traditional script), Oromo, Syriac, Tigre, Yi. In addition, Chinese, Japanese, Korean (Hangul script), and Vietnamese can make use of only a limited subset of their complete character repertoires.
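The gap described above can be illustrated with a minimal sketch. It assumes a small, hand-picked excerpt of the Letter ranges from Appendix B of [XML]; the table name XML10_LETTER_EXCERPT and the function in_xml10_excerpt are hypothetical and exist only for this illustration. The excerpt is not a complete name-character table, so a negative result is meaningful only for the sample characters chosen here, which to the best of our knowledge fall outside the full XML 1.0 table as well.

```python
# Illustrative excerpt of the XML 1.0 Letter ranges (Appendix B of [XML]).
# ASSUMPTION: this is a deliberately incomplete subset chosen for brevity;
# the real production lists many more ranges.
XML10_LETTER_EXCERPT = [
    (0x0041, 0x005A),  # Latin capital letters
    (0x0061, 0x007A),  # Latin small letters
    (0x3041, 0x3094),  # Hiragana
    (0x30A1, 0x30FA),  # Katakana
    (0x3105, 0x312C),  # Bopomofo (basic block)
    (0x4E00, 0x9FA5),  # CJK ideographs known to Unicode 2.0
    (0xAC00, 0xD7A3),  # Hangul syllables
]


def in_xml10_excerpt(ch: str) -> bool:
    """Return True if the single character ch falls in one of the excerpted ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML10_LETTER_EXCERPT)


# Sample characters: two that XML 1.0 names accept, three added to Unicode
# after version 2.0 and therefore usable only in character data.
SAMPLES = {
    "LATIN SMALL LETTER A": "\u0061",
    "CJK IDEOGRAPH U+4E2D": "\u4e2d",
    "ETHIOPIC SYLLABLE HA (Amharic)": "\u1200",
    "CHEROKEE LETTER A": "\u13a0",
    "CJK EXTENSION A U+3400": "\u3400",
}

if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        status = "allowed" if in_xml10_excerpt(ch) else "not allowed"
        print(f"U+{ord(ch):04X} {label}: {status} in names (per excerpt)")
```

The last sample shows why Chinese, Japanese, and Vietnamese are listed as only partly served: the ideographs known to Unicode 2.0 are accepted, while later extensions are not.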

Separately, XML 1.0 attempts to adapt to the line-end conventions of various modern operating systems, but discriminates against the convention used on IBM and IBM-compatible mainframes. XML 1.0 documents generated on mainframes must either violate the local line-end conventions, or employ otherwise unnecessary translation phases before and after XML parsing and generation.
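The line-end issue can be sketched as follows. The first function reflects the XML 1.0 rule that CR LF and a lone CR are both passed to the application as LF. The second is a hypothetical Blueberry-style extension: it assumes that the mainframe convention corresponds to the Unicode NEL character (U+0085), which this document does not state, and the function names are inventions for this example rather than part of any specification.

```python
import re

# XML 1.0 behaviour: the sequence CR LF and any lone CR become a single LF.
_XML10_LINE_END = re.compile("\r\n|\r")

# ASSUMPTION: a Blueberry-style processor would also accept NEL (U+0085),
# taken here to be the mainframe line-end character. Illustrative only.
_BLUEBERRY_LINE_END_SKETCH = re.compile("\r\n|\r|\x85")


def normalize_xml10(text: str) -> str:
    """Normalize line ends the way XML 1.0 describes; NEL is left untouched."""
    return _XML10_LINE_END.sub("\n", text)


def normalize_blueberry_sketch(text: str) -> str:
    """Hypothetical normalization that additionally maps NEL to LF."""
    return _BLUEBERRY_LINE_END_SKETCH.sub("\n", text)


if __name__ == "__main__":
    mainframe_text = "<doc>\x85  <item/>\x85</doc>\x85"
    # Under XML 1.0 the NEL characters survive as ordinary data.
    print(repr(normalize_xml10(mainframe_text)))
    # Under the sketched rule they are treated as line ends.
    print(repr(normalize_blueberry_sketch(mainframe_text)))
```

With a rule of this kind, mainframe-generated text would need no translation phase before parsing, which is the burden the paragraph above describes.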

A new XML version, rather than a set of errata to XML 1.0, is being created because the change affects the definition of well-formed documents: XML 1.0 processors must continue to reject documents that contain new characters in XML names or new line-end conventions. It is presumed that the distinction between XML 1.0 and XML Blueberry will be indicated by the XML declaration.
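A processor that supports both versions might select its rule set from the XML declaration roughly as sketched below. The version label "1.x-blueberry" is a placeholder invented for this example, since this document does not say what value the declaration will carry; the function names and the returned rule-set labels are likewise hypothetical.

```python
import re

# Matches the version pseudo-attribute at the very start of a document.
_XML_DECL = re.compile(r"""^<\?xml\s+version\s*=\s*(["'])([A-Za-z0-9_.:\-]+)\1""")

# ASSUMPTION: placeholder label; the real Blueberry version string is not
# defined by this requirements document.
HYPOTHETICAL_BLUEBERRY_VERSION = "1.x-blueberry"


def declared_version(document: str) -> str:
    """Return the declared version, defaulting to "1.0" when no declaration is present."""
    match = _XML_DECL.match(document)
    return match.group(2) if match else "1.0"


def select_rules(document: str) -> str:
    """Choose which name and line-end rules to apply to the document."""
    version = declared_version(document)
    if version == "1.0":
        return "xml10"        # XML 1.0 rules: new name characters are rejected
    if version == HYPOTHETICAL_BLUEBERRY_VERSION:
        return "blueberry"    # extended names and line ends would be accepted
    raise ValueError(f"unsupported XML version: {version}")


if __name__ == "__main__":
    print(select_rules('<?xml version="1.0"?><doc/>'))
    print(select_rules('<?xml version="1.x-blueberry"?><doc/>'))
    print(select_rules("<doc/>"))
```

Treating a missing declaration as version 1.0 keeps existing documents on the existing rules, in line with the requirement that XML 1.0 processing remain unchanged.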

2. Design Principles

  1. The XML 1.0 goals listed in section 1.1 of the XML Recommendation are reaffirmed.

  2. XML Blueberry documents shall permit the full use of writing systems supported by Unicode 3.1.

  3. XML Blueberry documents shall permit the full use of operating environments that support Unicode 3.1.

  4. The changes required for XML 1.0 processors to process XML Blueberry as well shall be as few and as small as possible.

3. Requirements

  1. XML Blueberry documents shall allow the use within XML names of all Unicode 3.1 characters, insofar as appropriate for XML.

  2. XML Blueberry documents shall support the line-end conventions associated with Unicode 3.1, insofar as appropriate for XML.

  3. The working group shall consider the issue of future updates to Unicode.

  4. The working group shall consider the issue of W3C normalization as expressed in the W3C Character Model [CharMod].

  5. In creating XML Blueberry, the working group shall not consider any revisions to XML 1.0 except those needed to accomplish these requirements.

4. References

[CharMod] W3C (World Wide Web Consortium). Character Model for the World Wide Web (work in progress). [Cambridge, MA]. http://www.w3.org/TR/charmod

[XML] W3C (World Wide Web Consortium). Extensible Markup Language (XML) Recommendation. Version 1.0, 2nd edition. [Cambridge, MA]. http://www.w3.org/TR/REC-xml

[Unicode] The Unicode Consortium. The Unicode Standard, Version 3.1. [Reading, MA: Addison-Wesley Developers Press, 2000]. http://www.unicode.org