W3C

Unicode block names for use in XSD regular expressions

W3C Working Group Note 9 June 2011

This version:
http://www.w3.org/TR/2011/NOTE-xsd-unicode-blocknames-20110609/
Latest version:
http://www.w3.org/TR/xsd-unicode-blocknames/
Editor:
C. M. Sperberg-McQueen, Black Mesa Technologies LLC <cmsmcq@blackmesatech.com>

Abstract

This document lists the names of the character blocks defined by Unicode and used in the block escapes of the regular expression language defined by XSD 1.0 and XSD 1.1.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a W3C Working Group Note as described in the World Wide Web Consortium Process Document. It lists the names of Unicode character blocks for use in block escapes in XSD regular expressions.

In its current state, this document lists the block names that have appeared in various versions of the Unicode database. Some of this material has appeared in working drafts of [XSD 1.1 Part 2: Datatypes]; some has not. This document is substantially complete in its current form; future updates, if any, may include changes made in later versions of the Unicode database.

Comments on this document should be sent to the W3C XML Schema comments mailing list, www-xml-schema-comments@w3.org (archive). Each email message should contain only one comment.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced by the W3C XML Schema Working Group as part of the W3C XML Activity. The authors of this document are the members of the XML Schema Working Group.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Named blocks

Appendices

A References
B Acknowledgements (non-normative)


1 Introduction

XSD (the XML Schema Definition Language) defines a notation for regular expressions to be used in the pattern facet of simple type definitions; this regular-expression language has also been used, with some modifications, in other specifications.

The character-class escapes of XSD regular expressions allow regular expressions to refer conveniently to all UCS characters which share values for certain properties; in particular, XSD provides category escapes, which allow reference to the "general category" property of an entry in the Unicode database, and block escapes, which identify characters on the basis of named UCS blocks.

Experience has shown that it is cumbersome to revise the relevant passages in [XSD 1.1 Part 2: Datatypes] each time the Unicode database is revised. The version-specific information about Unicode block names that appeared in earlier descriptions of XSD regular expressions has therefore been moved out of [XSD 1.1 Part 2: Datatypes] and into this document, where it can be updated more conveniently.

This document does not, however, contain any normative information. The normative specification of the XSD regular expression language is in [XSD 1.1 Part 2: Datatypes]. The normative specification of Unicode block names for any version of Unicode is in the Unicode database for that version.

Note: The definitions of single-character escapes and multi-character escapes do not vary from one version of Unicode to the next and are not recapitulated here; they are given in [XSD 1.1 Part 2: Datatypes]. The possible values of the General Category property (used in category escapes) are likewise stable across versions (there has been only one change over the history of the Unicode database), so they are not listed here either.

2 Named blocks

The Unicode database [Unicode Database] groups the code points of the Universal Character Set (UCS) into a number of blocks such as Basic Latin (i.e., ASCII), Latin-1 Supplement, Hangul Jamo, CJK Compatibility, etc. The block-escape construct allows regular expressions to refer to sets of characters by the name of the block in which they appear, using a ·normalized block name·.

[Definition:]   For any Unicode block, the normalized block name of that block is the string of characters formed by stripping out white space and underbar characters from the block name as given in [Unicode Database], while retaining hyphens and preserving case distinctions.
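The definition is mechanical enough to state as code. The following Python function is a minimal sketch, not part of XSD or of any particular processor; the name normalized_block_name is hypothetical, and treating Python's whitespace class as "white space" is an assumption.

```python
import re

def normalized_block_name(block_name: str) -> str:
    """Return the normalized block name for a Unicode block name.

    White space and underbars are removed; hyphens and letter case are kept.
    (Using Python's whitespace class for "white space" is an assumption.)
    """
    return re.sub(r"[\s_]", "", block_name)

print(normalized_block_name("Basic Latin"))          # BasicLatin
print(normalized_block_name("Latin-1 Supplement"))   # Latin-1Supplement
print(normalized_block_name("Greek and Coptic"))     # GreekandCoptic
```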

[Definition:]   A block escape expression denotes the set of characters in a given Unicode block. For any Unicode block B, with ·normalized block name· X, the set containing all characters defined in block B can be identified with the block escape \p{IsX} (using lower-case 'p'). The complement of this set is denoted by the block escape \P{IsX} (using upper-case 'P'). For all X, if X is a normalized block name recognized by the processor, then [\P{IsX}] = [^\p{IsX}].

For example, the block escape for identifying the ASCII characters is "\p{IsBasicLatin}".
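Because each block is a contiguous range of code points, a block escape can be rewritten as an ordinary character range. The sketch below translates \p{IsX} and \P{IsX} into character classes for Python's re module. The BLOCKS dictionary is a hypothetical four-entry excerpt of the table in this section, the function name is illustrative, and raising an error for an unrecognized name is an assumption of the sketch, since this document leaves that case open.

```python
import re

# Hypothetical excerpt of the block table; ranges are inclusive code points.
BLOCKS = {
    "BasicLatin": (0x0000, 0x007F),
    "Latin-1Supplement": (0x0080, 0x00FF),
    "GreekandCoptic": (0x0370, 0x03FF),
    "Emoticons": (0x1F600, 0x1F64F),
}

# 'Is' followed by Basic Latin letters, digits and hyphens, as the XSD grammar allows.
BLOCK_ESCAPE = re.compile(r"\\([pP])\{Is([A-Za-z0-9-]+)\}")

def block_escape_to_python(escape: str) -> str:
    """Translate a block escape into an equivalent Python `re` character class."""
    m = BLOCK_ESCAPE.fullmatch(escape)
    if m is None:
        raise ValueError(f"not a block escape: {escape!r}")
    polarity, name = m.groups()
    if name not in BLOCKS:
        # Assumption of this sketch: treat an unrecognized name as an error.
        raise KeyError(name)
    start, end = BLOCKS[name]
    # \P{IsX} is the complement of \p{IsX}, i.e. [^\p{IsX}].
    negate = "^" if polarity == "P" else ""
    return f"[{negate}\\U{start:08X}-\\U{end:08X}]"

print(block_escape_to_python(r"\p{IsBasicLatin}"))  # [\U00000000-\U0000007F]
print(bool(re.fullmatch(block_escape_to_python(r"\p{IsBasicLatin}"), "A")))          # True
print(bool(re.fullmatch(block_escape_to_python(r"\P{IsBasicLatin}"), "\u00e9")))     # True
print(bool(re.fullmatch(block_escape_to_python(r"\p{IsEmoticons}"), "\U0001F600")))  # True
```

Translating to a character range keeps the complement rule trivially true: \P{IsX} differs from \p{IsX} only by the leading "^" in the class.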

Note: Current versions of the Unicode database recommend that, whenever block names are matched, hyphens, underbars, and white space be dropped and letters be folded to a single case, so that both the string 'BasicLatin' and the string '-- basic LATIN --' match the block name "Basic Latin".
The handling of block names in XSD block escapes differs from this behavior in two ways. First, XSD normalized block names retain the hyphens in Unicode block names and preserve case distinctions. The normalized form of the block name 'Latin-1 Supplement', for example, is thus 'Latin-1Supplement', not 'latin1supplement' or 'LATIN1SUPPLEMENT'. Second, XSD processors are not required to perform any normalization at all on the block name as given in the block escape, so '\p{IsLatin-1Supplement}' will be recognized as a reference to the Latin-1 Supplement block, but '\p{Is Latin-1 supplement}' will not.
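The contrast drawn in this Note can be made concrete. In the sketch below, unicode_loose_key implements only the loose-matching rule as summarized in the Note (drop hyphens, underbars and white space; fold case), not the full rule in the Unicode Standard. The helper xsd_block_escape_names_block is hypothetical: it compares the name written in a block escape with the XSD normalized block name exactly, without normalizing the escape.

```python
import re

def unicode_loose_key(name: str) -> str:
    """Loose key as summarized in the Note: drop hyphens, underbars, white space; fold case."""
    return re.sub(r"[\s_-]", "", name).casefold()

def xsd_normalized(block_name: str) -> str:
    """XSD normalized block name: drop white space and underbars; keep hyphens and case."""
    return re.sub(r"[\s_]", "", block_name)

def xsd_block_escape_names_block(name_in_escape: str, block_name: str) -> bool:
    """True if \\p{Is<name_in_escape>} refers to the block; the escape text is not normalized."""
    return name_in_escape == xsd_normalized(block_name)

# Unicode-style loose matching accepts both spellings.
print(unicode_loose_key("-- basic LATIN --") == unicode_loose_key("Basic Latin"))  # True
# XSD keeps the hyphen and the case ...
print(xsd_normalized("Latin-1 Supplement"))  # Latin-1Supplement
# ... and does not normalize the name written in the escape.
print(xsd_block_escape_names_block("Latin-1Supplement", "Latin-1 Supplement"))    # True
print(xsd_block_escape_names_block(" Latin-1 supplement", "Latin-1 Supplement"))  # False
```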

Editorial Note: What happens when X is not a recognized block name? It's not an error, strictly speaking, since the grammar accepts any sequence of Basic Latin alphanumerics and hyphens.

The following table lists the block names in the versions of [Unicode Database] cited in References (§A). The normative authority for any given version is the Unicode database itself; in current versions, see the "Blocks.txt" file. The "Versions" column indicates which versions of [Unicode Database] have a block with the indicated name and endpoints; if the column is blank, all versions have such a block.

When these block names are used in block escapes, blanks and underbars should be removed and the letters "Is" should be prepended: the block name "Basic Latin" appears in a block escape as "\p{IsBasicLatin}".
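Tables like the one in this section are typically derived from the database's Blocks.txt file. The sketch below assumes the line layout of recent Blocks.txt files ("start..end; Block Name", with "#" comments); the older block files cited in References (§A) may be laid out differently and would need their own parser. The function name and the sample text are hypothetical.

```python
import re

# One data line of a recent Blocks.txt file, e.g. "0080..00FF; Latin-1 Supplement".
LINE = re.compile(r"([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+);\s*(.+)")

def block_escapes_from_blocks_txt(text: str) -> dict[str, tuple[int, int]]:
    """Map each block escape string (e.g. \\p{IsBasicLatin}) to its inclusive range."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding blanks
        if not line:
            continue
        m = LINE.fullmatch(line)
        if m is None:
            raise ValueError(f"unexpected line: {raw!r}")
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        name = re.sub(r"[\s_]", "", m.group(3))   # remove blanks and underbars
        table[f"\\p{{Is{name}}}"] = (start, end)  # prepend "Is" and wrap in \p{...}
    return table

SAMPLE = """# Illustrative excerpt, not a complete file
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0370..03FF; Greek and Coptic
"""

for escape, (start, end) in block_escapes_from_blocks_txt(SAMPLE).items():
    print(f"{escape}  #x{start:04X}-#x{end:04X}")
```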

Start–End | Block name | Versions
#x0000–#x007F | Basic Latin |
#x0080–#x00FF | Latin-1 Supplement |
#x0100–#x017F | Latin Extended-A |
#x0180–#x024F | Latin Extended-B |
#x0250–#x02AF | IPA Extensions |
#x02B0–#x02FF | Spacing Modifier Letters |
#x0300–#x036F | Combining Diacritical Marks |
#x0370–#x03FF | Greek | before 3.2.0 (i.e. 2.0.0 through 3.1.1)
#x0370–#x03FF | Greek and Coptic | 3.2.0 and later
#x0400–#x04FF | Cyrillic |
#x0500–#x052F | Cyrillic Supplementary | 3.2.0, 4.0.0
#x0500–#x052F | Cyrillic Supplement | 4.0.1 and later
#x0530–#x058F | Armenian |
#x0590–#x05FF | Hebrew |
#x0600–#x06FF | Arabic |
#x0700–#x074F | Syriac | 3.0.0 and later
#x0750–#x077F | Arabic Supplement | 4.1.0 and later
#x0780–#x07BF | Thaana | 3.0.0 and later
#x07C0–#x07FF | NKo | 5.0.0 and later
#x0800–#x083F | Samaritan | 5.2.0 and later
#x0840–#x085F | Mandaic | 6.0.0
#x0900–#x097F | Devanagari |
#x0980–#x09FF | Bengali |
#x0A00–#x0A7F | Gurmukhi |
#x0A80–#x0AFF | Gujarati |
#x0B00–#x0B7F | Oriya |
#x0B80–#x0BFF | Tamil |
#x0C00–#x0C7F | Telugu |
#x0C80–#x0CFF | Kannada |
#x0D00–#x0D7F | Malayalam |
#x0D80–#x0DFF | Sinhala | 3.0.0 and later
#x0E00–#x0E7F | Thai |
#x0E80–#x0EFF | Lao |
#x0F00–#x0FBF | Tibetan | 2.0.0 through 2.1.9
#x0F00–#x0FFF | Tibetan | 3.0.0 and later
#x1000–#x109F | Myanmar | 3.0.0 and later
#x10A0–#x10FF | Georgian |
#x1100–#x11FF | Hangul Jamo |
#x1200–#x137F | Ethiopic | 3.0.0 and later
#x1380–#x139F | Ethiopic Supplement | 4.1.0 and later
#x13A0–#x13FF | Cherokee | 3.0.0 and later
#x1400–#x167F | Unified Canadian Aboriginal Syllabics | 3.0.0 and later
#x1680–#x169F | Ogham | 3.0.0 and later
#x16A0–#x16FF | Runic | 3.0.0 and later
#x1700–#x171F | Tagalog | 3.2.0 and later
#x1720–#x173F | Hanunoo | 3.2.0 and later
#x1740–#x175F | Buhid | 3.2.0 and later
#x1760–#x177F | Tagbanwa | 3.2.0 and later
#x1780–#x17FF | Khmer | 3.0.0 and later
#x1800–#x18AF | Mongolian | 3.0.0 and later
#x18B0–#x18FF | Unified Canadian Aboriginal Syllabics Extended | 5.2.0 and later
#x1900–#x194F | Limbu | 4.0.0 and later
#x1950–#x197F | Tai Le | 4.0.0 and later
#x1980–#x19DF | New Tai Lue | 4.1.0 and later
#x19E0–#x19FF | Khmer Symbols | 4.0.0 and later
#x1A00–#x1A1F | Buginese | 4.1.0 and later
#x1A20–#x1AAF | Tai Tham | 5.2.0 and later
#x1B00–#x1B7F | Balinese | 5.0.0 and later
#x1B80–#x1BBF | Sundanese | 5.1.0 and later
#x1BC0–#x1BFF | Batak | 6.0.0
#x1C00–#x1C4F | Lepcha | 5.1.0 and later
#x1C50–#x1C7F | Ol Chiki | 5.1.0 and later
#x1CD0–#x1CFF | Vedic Extensions | 5.2.0 and later
#x1D00–#x1D7F | Phonetic Extensions | 4.0.0 and later
#x1D80–#x1DBF | Phonetic Extensions Supplement | 4.1.0 and later
#x1DC0–#x1DFF | Combining Diacritical Marks Supplement | 4.1.0 and later
#x1E00–#x1EFF | Latin Extended Additional |
#x1F00–#x1FFF | Greek Extended |
#x2000–#x206F | General Punctuation |
#x2070–#x209F | Superscripts and Subscripts |
#x20A0–#x20CF | Currency Symbols |
#x20D0–#x20FF | Combining Marks for Symbols | before 3.2.0 (i.e. 2.0.0 through 3.1.1)
#x20D0–#x20FF | Combining Diacritical Marks for Symbols | 3.2.0 and later
#x2100–#x214F | Letterlike Symbols |
#x2150–#x218F | Number Forms |
#x2190–#x21FF | Arrows |
#x2200–#x22FF | Mathematical Operators |
#x2300–#x23FF | Miscellaneous Technical |
#x2400–#x243F | Control Pictures |
#x2440–#x245F | Optical Character Recognition |
#x2460–#x24FF | Enclosed Alphanumerics |
#x2500–#x257F | Box Drawing |
#x2580–#x259F | Block Elements |
#x25A0–#x25FF | Geometric Shapes |
#x2600–#x26FF | Miscellaneous Symbols |
#x2700–#x27BF | Dingbats |
#x27C0–#x27EF | Miscellaneous Mathematical Symbols-A | 3.2.0 and later
#x27F0–#x27FF | Supplemental Arrows-A | 3.2.0 and later
#x2800–#x28FF | Braille Patterns | 3.0.0 and later
#x2900–#x297F | Supplemental Arrows-B | 3.2.0 and later
#x2980–#x29FF | Miscellaneous Mathematical Symbols-B | 3.2.0 and later
#x2A00–#x2AFF | Supplemental Mathematical Operators | 3.2.0 and later
#x2B00–#x2BFF | Miscellaneous Symbols and Arrows | 4.0.0 and later
#x2C00–#x2C5F | Glagolitic | 4.1.0 and later
#x2C60–#x2C7F | Latin Extended-C | 5.0.0 and later
#x2C80–#x2CFF | Coptic | 4.1.0 and later
#x2D00–#x2D2F | Georgian Supplement | 4.1.0 and later
#x2D30–#x2D7F | Tifinagh | 4.1.0 and later
#x2D80–#x2DDF | Ethiopic Extended | 4.1.0 and later
#x2DE0–#x2DFF | Cyrillic Extended-A | 5.1.0 and later
#x2E00–#x2E7F | Supplemental Punctuation | 4.1.0 and later
#x2E80–#x2EFF | CJK Radicals Supplement | 3.0.0 and later
#x2F00–#x2FDF | Kangxi Radicals | 3.0.0 and later
#x2FF0–#x2FFF | Ideographic Description Characters | 3.0.0 and later
#x3000–#x303F | CJK Symbols and Punctuation |
#x3040–#x309F | Hiragana |
#x30A0–#x30FF | Katakana |
#x3100–#x312F | Bopomofo |
#x3130–#x318F | Hangul Compatibility Jamo |
#x3190–#x319F | Kanbun |
#x31A0–#x31BF | Bopomofo Extended | 3.0.0 and later
#x31C0–#x31EF | CJK Strokes | 4.1.0 and later
#x31F0–#x31FF | Katakana Phonetic Extensions | 3.2.0 and later
#x3200–#x32FF | Enclosed CJK Letters and Months |
#x3300–#x33FF | CJK Compatibility |
#x3400–#x4DB5 | CJK Unified Ideographs Extension A | 3.0.0 through 3.1.1
#x3400–#x4DBF | CJK Unified Ideographs Extension A | 3.2.0 and later
#x4DC0–#x4DFF | Yijing Hexagram Symbols | 4.0.0 and later
#x4E00–#x9FFF | CJK Unified Ideographs |
#xA000–#xA48F | Yi Syllables | 3.0.0 and later
#xA490–#xA4CF | Yi Radicals | 3.0.0 and later
#xA4D0–#xA4FF | Lisu | 5.2.0 and later
#xA500–#xA63F | Vai | 5.1.0 and later
#xA640–#xA69F | Cyrillic Extended-B | 5.1.0 and later
#xA6A0–#xA6FF | Bamum | 5.2.0 and later
#xA700–#xA71F | Modifier Tone Letters | 4.1.0 and later
#xA720–#xA7FF | Latin Extended-D | 5.0.0 and later
#xA800–#xA82F | Syloti Nagri | 4.1.0 and later
#xA830–#xA83F | Common Indic Number Forms | 5.2.0 and later
#xA840–#xA87F | Phags-pa | 5.0.0 and later
#xA880–#xA8DF | Saurashtra | 5.1.0 and later
#xA8E0–#xA8FF | Devanagari Extended | 5.2.0 and later
#xA900–#xA92F | Kayah Li | 5.1.0 and later
#xA930–#xA95F | Rejang | 5.1.0 and later
#xA960–#xA97F | Hangul Jamo Extended-A | 5.2.0 and later
#xA980–#xA9DF | Javanese | 5.2.0 and later
#xAA00–#xAA5F | Cham | 5.1.0 and later
#xAA60–#xAA7F | Myanmar Extended-A | 5.2.0 and later
#xAA80–#xAADF | Tai Viet | 5.2.0 and later
#xAB00–#xAB2F | Ethiopic Extended-A | 6.0.0
#xABC0–#xABFF | Meetei Mayek | 5.2.0 and later
#xAC00–#xD7A3 | Hangul Syllables | before 3.2.0 (i.e. 2.0.0 through 3.1.1)
#xAC00–#xD7AF | Hangul Syllables | 3.2.0 and later
#xD7B0–#xD7FF | Hangul Jamo Extended-B | 5.2.0 and later
#xD800–#xDB7F | High Surrogates |
#xDB80–#xDBFF | High Private Use Surrogates |
#xDC00–#xDFFF | Low Surrogates |
#xE000–#xF8FF | Private Use | before 3.2.0 (i.e. 2.0.0 through 3.1.1)
#xE000–#xF8FF | Private Use Area | 3.2.0 and later
#xF900–#xFAFF | CJK Compatibility Ideographs |
#xFB00–#xFB4F | Alphabetic Presentation Forms |
#xFB50–#xFDFF | Arabic Presentation Forms-A |
#xFE00–#xFE0F | Variation Selectors | 3.2.0 and later
#xFE10–#xFE1F | Vertical Forms | 4.1.0 and later
#xFE20–#xFE2F | Combining Half Marks |
#xFE30–#xFE4F | CJK Compatibility Forms |
#xFE50–#xFE6F | Small Form Variants |
#xFE70–#xFEFE | Arabic Presentation Forms-B | 2.1.9 through 3.1.1
#xFE70–#xFEFF | Arabic Presentation Forms-B | 2.0.0 through 2.1.8, also 3.2.0 and later (i.e. not 2.1.9 through 3.1.1)
#xFEFF–#xFEFF | Specials | before 3.2.0 (i.e. 2.0.0 through 3.1.1)
#xFF00–#xFFEF | Halfwidth and Fullwidth Forms |
#xFFF0–#xFFFD | Specials | 2.1.9 through 3.1.1
#xFFF0–#xFFFF | Specials | 2.0.0 through 2.1.8, also 3.2.0 and later (i.e. not 2.1.9 through 3.1.1)
#x10000–#x1007F | Linear B Syllabary | 4.0.0 and later
#x10080–#x100FF | Linear B Ideograms | 4.0.0 and later
#x10100–#x1013F | Aegean Numbers | 4.0.0 and later
#x10140–#x1018F | Ancient Greek Numbers | 4.1.0 and later
#x10190–#x101CF | Ancient Symbols | 5.1.0 and later
#x101D0–#x101FF | Phaistos Disc | 5.1.0 and later
#x10280–#x1029F | Lycian | 5.1.0 and later
#x102A0–#x102DF | Carian | 5.1.0 and later
#x10300–#x1032F | Old Italic | 3.1.0 and later
#x10330–#x1034F | Gothic | 3.1.0 and later
#x10380–#x1039F | Ugaritic | 4.0.0 and later
#x103A0–#x103DF | Old Persian | 4.1.0 and later
#x10400–#x1044F | Deseret | 3.1.0 and later
#x10450–#x1047F | Shavian | 4.0.0 and later
#x10480–#x104AF | Osmanya | 4.0.0 and later
#x10800–#x1083F | Cypriot Syllabary | 4.0.0 and later
#x10840–#x1085F | Imperial Aramaic | 5.2.0 and later
#x10900–#x1091F | Phoenician | 5.0.0 and later
#x10920–#x1093F | Lydian | 5.1.0 and later
#x10A00–#x10A5F | Kharoshthi | 4.1.0 and later
#x10A60–#x10A7F | Old South Arabian | 5.2.0 and later
#x10B00–#x10B3F | Avestan | 5.2.0 and later
#x10B40–#x10B5F | Inscriptional Parthian | 5.2.0 and later
#x10B60–#x10B7F | Inscriptional Pahlavi | 5.2.0 and later
#x10C00–#x10C4F | Old Turkic | 5.2.0 and later
#x10E60–#x10E7F | Rumi Numeral Symbols | 5.2.0 and later
#x11000–#x1107F | Brahmi | 6.0.0
#x11080–#x110CF | Kaithi | 5.2.0 and later
#x12000–#x123FF | Cuneiform | 5.0.0 and later
#x12400–#x1247F | Cuneiform Numbers and Punctuation | 5.0.0 and later
#x13000–#x1342F | Egyptian Hieroglyphs | 5.2.0 and later
#x16800–#x16A3F | Bamum Supplement | 6.0.0
#x1B000–#x1B0FF | Kana Supplement | 6.0.0
#x1D000–#x1D0FF | Byzantine Musical Symbols | 3.1.0 and later
#x1D100–#x1D1FF | Musical Symbols | 3.1.0 and later
#x1D200–#x1D24F | Ancient Greek Musical Notation | 4.1.0 and later
#x1D300–#x1D35F | Tai Xuan Jing Symbols | 4.0.0 and later
#x1D360–#x1D37F | Counting Rod Numerals | 5.0.0 and later
#x1D400–#x1D7FF | Mathematical Alphanumeric Symbols | 3.1.0 and later
#x1F000–#x1F02F | Mahjong Tiles | 5.1.0 and later
#x1F030–#x1F09F | Domino Tiles | 5.1.0 and later
#x1F0A0–#x1F0FF | Playing Cards | 6.0.0
#x1F100–#x1F1FF | Enclosed Alphanumeric Supplement | 5.2.0 and later
#x1F200–#x1F2FF | Enclosed Ideographic Supplement | 5.2.0 and later
#x1F300–#x1F5FF | Miscellaneous Symbols And Pictographs | 6.0.0
#x1F600–#x1F64F | Emoticons | 6.0.0
#x1F680–#x1F6FF | Transport And Map Symbols | 6.0.0
#x1F700–#x1F77F | Alchemical Symbols | 6.0.0
#x20000–#x2A6D6 | CJK Unified Ideographs Extension B | 3.1.0, 3.1.1
#x20000–#x2A6DF | CJK Unified Ideographs Extension B | 3.2.0 and later
#x2A700–#x2B73F | CJK Unified Ideographs Extension C | 5.2.0 and later
#x2B740–#x2B81F | CJK Unified Ideographs Extension D | 6.0.0
#x2F800–#x2FA1F | CJK Compatibility Ideographs Supplement | 3.1.0 and later
#xE0000–#xE007F | Tags | 3.1.0 and later
#xE0100–#xE01EF | Variation Selectors Supplement | 4.0.0 and later
#xF0000–#xFFFFD | Private Use | 3.1.0, 3.1.1
#xF0000–#xFFFFF | Supplementary Private Use Area-A | 3.2.0 and later
#x100000–#x10FFFD | Private Use | 3.1.0, 3.1.1
#x100000–#x10FFFF | Supplementary Private Use Area-B | 3.2.0 and later

Note: The blocks mentioned above include the HighSurrogates, LowSurrogates, and HighPrivateUseSurrogates blocks. These blocks identify surrogate characters, which do not occur at the level of the character abstraction that XML instance documents operate on. For that reason, block escapes using these block names will never match any characters in an XML document.
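The reason can be checked directly against the Char production of XML 1.0, which excludes the range #xD800–#xDFFF. The sketch below (helper names are hypothetical) confirms that no character permitted in an XML document lies in any of the three surrogate blocks, and that a supplementary character, although encoded in UTF-16 as a surrogate pair, is a single character outside those blocks.

```python
# Surrogate blocks, as listed in the table above.
SURROGATE_BLOCKS = {
    "HighSurrogates": (0xD800, 0xDB7F),
    "HighPrivateUseSurrogates": (0xDB80, 0xDBFF),
    "LowSurrogates": (0xDC00, 0xDFFF),
}

def is_xml_char(cp: int) -> bool:
    """The Char production of XML 1.0."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def in_surrogate_block(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in SURROGATE_BLOCKS.values())

# No XML character falls in a surrogate block, so \p{IsHighSurrogates} etc. never match.
print(any(is_xml_char(cp)
          for lo, hi in SURROGATE_BLOCKS.values()
          for cp in range(lo, hi + 1)))  # False

# U+10400 is one character, even though UTF-16 encodes it as the pair D801 DC00.
ch = "\U00010400"
print(len(ch), ch.encode("utf-16-be").hex())  # 1 d801dc00
print(in_surrogate_block(ord(ch)))            # False
```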

As indicated in the "Versions" column, [Unicode Database] has been revised over time. Implementors of the XSD regular expression language are encouraged to support the block names defined in all versions of the Unicode Standard. When the implementation supports multiple versions of the Unicode database, and they differ in salient respects (e.g. different characters are assigned to a given block in different versions of the database), then it is implementation-defined which set of block definitions is used for any given assessment episode.
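One way to make the implementation-defined choice explicit is to key block definitions by Unicode version. The sketch below is hypothetical: it encodes only four rows of the table, represents each row's Versions entry as a first and last version (None standing for "and later"), and takes the version as a parameter. A processor that supports several versions at once might instead accept the union of their block names; this sketch does not attempt that.

```python
from typing import Optional

Version = tuple[int, int, int]

# (normalized block name, start, end, first version, last version or None for "and later")
ROWS: list[tuple[str, int, int, Version, Optional[Version]]] = [
    ("Greek",           0x0370, 0x03FF, (2, 0, 0), (3, 1, 1)),
    ("GreekandCoptic",  0x0370, 0x03FF, (3, 2, 0), None),
    ("HangulSyllables", 0xAC00, 0xD7A3, (2, 0, 0), (3, 1, 1)),
    ("HangulSyllables", 0xAC00, 0xD7AF, (3, 2, 0), None),
]

def blocks_for(version: Version) -> dict[str, tuple[int, int]]:
    """Return the block definitions in force for one Unicode version."""
    table = {}
    for name, start, end, first, last in ROWS:
        if first <= version and (last is None or version <= last):
            table[name] = (start, end)
    return table

print(blocks_for((3, 1, 1)))  # Greek, and HangulSyllables ending at #xD7A3
print(blocks_for((6, 0, 0)))  # GreekandCoptic, and HangulSyllables ending at #xD7AF
```

Passing the version explicitly keeps the choice visible at each assessment episode rather than buried in a global table.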

A References

Unicode Database
The Unicode Consortium. Unicode Character Database. Current version available at: http://www.unicode.org/Public/
Unicode Database 2.0.0
The Unicode Consortium. The Unicode Character Database, version 2.0.0. [n.p.]: The Unicode Consortium, 1996. List of components at http://www.unicode.org/versions/components-2.0.0.html. Character data at http://www.unicode.org/Public/2.0-Update/UnicodeData-2.0.14.txt. Blocks data at http://www.unicode.org/Public/2.0-Update/Blocks-1.txt.
Unicode Database 2.1.2
The Unicode Consortium. The Unicode Character Database, version 2.1.2. [n.p.]: The Unicode Consortium, 1998. List of components at http://www.unicode.org/versions/components-2.1.2.html. Character data at http://www.unicode.org/Public/2.1-Update/UnicodeData-2.1.2.txt. Blocks data as for 2.0.0.
Unicode Database 2.1.5
The Unicode Consortium. The Unicode Character Database, version 2.1.5. [n.p.]: The Unicode Consortium, 1998. List of components at http://www.unicode.org/versions/components-2.1.5.html. Character data at http://www.unicode.org/Public/2.1-Update2/UnicodeData-2.1.5.txt. Blocks data as for 2.0.0.
Unicode Database 2.1.8
The Unicode Consortium. The Unicode Character Database, version 2.1.8. [n.p.]: The Unicode Consortium, 1998. List of components at http://www.unicode.org/versions/components-2.1.8.html. Character data at http://www.unicode.org/Public/2.1-Update3/UnicodeData-2.1.8.txt. Blocks data as for 2.0.0.
Unicode Database 2.1.9
The Unicode Consortium. The Unicode Character Database, version 2.1.9. [n.p.]: The Unicode Consortium, 1999. List of components at http://www.unicode.org/versions/components-2.1.9.html. Character data at http://www.unicode.org/Public/2.1-Update4/UnicodeData-2.1.9.txt. Blocks data at http://www.unicode.org/Public/2.1-Update4/Blocks-2.txt.
Unicode Database 3.0.0
The Unicode Consortium. The Unicode Character Database, version 3.0.0. [n.p.]: The Unicode Consortium, 1999. List of components at http://www.unicode.org/versions/components-3.0.0.html. Character data at http://www.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.txt. Blocks data at http://www.unicode.org/Public/3.0-Update/Blocks-3.txt.
Unicode Database 3.0.1
The Unicode Consortium. The Unicode Character Database, version 3.0.1. [n.p.]: The Unicode Consortium, 2000. List of components at http://www.unicode.org/versions/components-3.0.1.html. Character data at http://www.unicode.org/Public/3.0-Update1/UnicodeData-3.0.1.txt. Blocks data as for 3.0.0.
Unicode Database 3.1.0
The Unicode Consortium. The Unicode Character Database, version 3.1.0. [n.p.]: The Unicode Consortium, 2001. List of components at http://www.unicode.org/versions/components-3.1.0.html. Character data at http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt. Blocks data at http://www.unicode.org/Public/3.1-Update/Blocks-4.txt.
Unicode Database 3.1.1
The Unicode Consortium. The Unicode Character Database, version 3.1.1. [n.p.]: The Unicode Consortium, 2001. List of components at http://www.unicode.org/versions/components-3.1.1.html. Character data and blocks data as for 3.1.0.
Unicode Database 3.2.0
The Unicode Consortium. The Unicode Character Database, version 3.2.0. [n.p.]: The Unicode Consortium, 2002. List of components at http://www.unicode.org/versions/components-3.2.0.html. Character data at http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.txt. Blocks data at http://www.unicode.org/Public/3.2-Update/Blocks-3.2.0.txt.
Unicode Database 4.0.0
The Unicode Consortium. The Unicode Character Database, version 4.0.0. [n.p.]: The Unicode Consortium, 2003. List of components at http://www.unicode.org/versions/components-4.0.0.html. Character data at http://www.unicode.org/Public/4.0-Update/UnicodeData-4.0.0.txt. Blocks data at http://www.unicode.org/Public/4.0-Update/Blocks-4.0.0.txt.
Unicode Database 4.0.1
The Unicode Consortium. The Unicode Character Database, version 4.0.1. [n.p.]: The Unicode Consortium, 2004. List of components at http://www.unicode.org/versions/components-4.0.1.html. Character data at http://www.unicode.org/Public/4.0-Update1/UnicodeData-4.0.1.txt. Blocks data at http://www.unicode.org/Public/4.0-Update1/Blocks-4.0.1.txt.
Unicode Database 4.1.0
The Unicode Consortium. The Unicode Character Database, version 4.1.0. [n.p.]: The Unicode Consortium, 2005. List of components at http://www.unicode.org/versions/components-4.1.0.html. Character data at http://www.unicode.org/Public/4.1.0/ucd/UnicodeData.txt. Blocks data at http://www.unicode.org/Public/4.1.0/ucd/Blocks.txt.
Unicode Database 5.0.0
The Unicode Consortium. The Unicode Character Database, version 5.0.0. [n.p.]: The Unicode Consortium, 2006. List of components at http://www.unicode.org/versions/components-5.0.0.html. Character data at http://www.unicode.org/Public/5.0.0/ucd/UnicodeData.txt. Blocks data at http://www.unicode.org/Public/5.0.0/ucd/Blocks.txt.
Unicode Database 5.1.0
The Unicode Consortium. The Unicode Character Database, version 5.1.0. [n.p.]: The Unicode Consortium, 2008. List of components at http://www.unicode.org/versions/components-5.1.0.html. Character data at http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt. Blocks data at http://www.unicode.org/Public/5.1.0/ucd/Blocks.txt. XML versions of the database at http://www.unicode.org/Public/5.1.0/ucdxml/.
Unicode Database 5.2.0
The Unicode Consortium. The Unicode Character Database, version 5.2.0. [n.p.]: The Unicode Consortium, 2009. List of components at http://www.unicode.org/versions/components-5.2.0.html. Character data at http://www.unicode.org/Public/5.2.0/ucd/UnicodeData.txt. Blocks data at http://www.unicode.org/Public/5.2.0/ucd/Blocks.txt. XML versions of the database at http://www.unicode.org/Public/5.2.0/ucdxml/.
Unicode Database 6.0.0
The Unicode Consortium. The Unicode Character Database, version 6.0.0. [n.p.]: The Unicode Consortium, 2010. List of components at http://www.unicode.org/versions/components-6.0.0.html. Character data at http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt. Blocks data at http://www.unicode.org/Public/6.0.0/ucd/Blocks.txt. XML versions of the database at http://www.unicode.org/Public/6.0.0/ucdxml/.
Unicode Regular Expression Guidelines
Mark Davis. Unicode Regular Expression Guidelines, 1988. Available at: http://www.unicode.org/unicode/reports/tr18/
Unicode Versions
Unicode Consortium. Enumerated Versions of The Unicode Standard, 2011. Available at: http://www.unicode.org/versions/enumeratedversions.html
XSD 1.0 Part 1: Structures
World Wide Web Consortium. XML Schema Part 1: Structures, ed. Henry Thompson et al. W3C Recommendation 2 May 2001. Available at: http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/
XSD 1.0 Part 2: Datatypes
World Wide Web Consortium. XML Schema Part 2: Datatypes, ed. Paul V. Biron and Ashok Malhotra. W3C Recommendation 2 May 2001. Available at: http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/
XSD 1.1 Part 1: Structures
World Wide Web Consortium. W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures, ed. Shudi (Sandy) Gao 高殊镝, C. M. Sperberg-McQueen, and Henry S. Thompson. W3C Working Draft 3 December 2009. Available at: http://www.w3.org/TR/xmlschema11-1
XSD 1.1 Part 2: Datatypes
World Wide Web Consortium. W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes, ed. David Peterson et al. W3C Working Draft 3 December 2009. Available at: http://www.w3.org/TR/xmlschema11-2

B Acknowledgements (non-normative)

This document was prepared by the W3C XML Schema Working Group. The members at the time of publication were: