| number | applies to | text | implementations | 
    
      | C001 | [S] [I] [C] | Specifications, software and content MUST
        NOT require or depend on a one-to-one correspondence between
        characters and the sounds of a language. | SSML | 
    
      | C002 | [S] [I] [C] | Specifications, software and content MUST
        NOT require or depend on a one-to-one mapping between
        characters and units of displayed text. | CSS, XSL-FO, SVG | 
    
      | C003 | [S] [I] [C] | Protocols, data formats and APIs MUST
        store, interchange or process text data in logical order. | everything that uses Unicode, very widely implemented | 
    
      | C075 | [I] | Independent of whether some implementation uses logical selection
        or visual selection, characters selected MUST be kept in logical order in storage. | many implementations, in particular editors; SVG | 
    
      | C004 | [S] | Specifications of protocols and APIs that involve selection of
        ranges SHOULD provide for discontiguous
        logical selections, at least to the extent necessary to support
        implementation of visual selection on screen on top of those
        protocols and APIs. | XPointer | 
    
      | C005 | [S] [I] | Specifications and software MUST NOT
        require nor depend on a single keystroke resulting in a single
        character, nor that a single character be input with a single
        keystroke (even with modifiers), nor that keyboards are the same all
        over the world. | DOM Events,... | 
    
      | C006 | [S] [I] | Software that sorts or searches text for users SHOULD do so on the basis of appropriate
        collation units and ordering rules for the relevant language and/or
        application. | XQuery, various OSes | 
    
      | C007 | [S] [I] | Where searching or sorting is done dynamically, particularly in a
        multilingual environment, the 'relevant language' SHOULD be determined to be that of the current
        user, and may thus differ from user to user. | XQuery, various OSes | 
    
      | C066 | [S] [I] | Software that allows users to sort or search text SHOULD allow the user to select alternative
        rules for collation units and ordering. | XQuery, various OSes | 
    
      | C008 | [S] [I] | Specifications and implementations of sorting and searching
        algorithms SHOULD accommodate text that
        contains any character in Unicode. | several implementations of UCA and others (C008 mainly warns about
        some old problem) | 
    
      | C009 | [S] [I] | Specifications, software and content MUST
        NOT require or depend on a one-to-one relationship between
        characters and units of physical storage. | XSLT, XQuery,... | 
    
      | C010 | [S] | When specifications use the term 'character' the specifications MUST define which meaning they intend. | XML | 
    
      | C067 | [S] | Specifications SHOULD use specific
        terms, when available, instead of the general term 'character'. | various specs | 
    
      | C013 | [S] [C] | Textual data objects defined by protocol or format specifications
        MUST be in a single character
        encoding. | HTML, XML,
        CSS,... | 
    
      | C014 | [S] | All specifications that involve processing of text MUST specify the processing of text according
        to the Reference
        Processing Model, namely: 
          Specifications MUST define text in
            terms of Unicode characters, not bytes or glyphs. For their textual data objects specifications MAY allow use of any character encoding
            which can be transcoded to a Unicode encoding form. Specifications MAY choose to
            disallow or deprecate some character encodings and to make others
            mandatory. Independent of the actual character encoding, the
            specified behavior MUST be the same
            as if the processing happened as follows:
            
              The character encoding of any textual data object received
                by the application implementing the specification MUST be determined and the data object
                MUST be interpreted as a
                sequence of Unicode characters - this MUST be equivalent to transcoding
                the data object to some Unicode
                encoding form, adjusting any character encoding label if
                necessary, and receiving it in that Unicode encoding
              form. All processing MUST take place
                on this sequence of Unicode characters. If text is output by the application, the sequence of
                Unicode characters MUST be
                encoded using a character encoding chosen among those allowed
                by the specification. If a specification is such that multiple textual data objects
            are involved (such as an XML document referring to external
            parsed entities), it MAY choose to
            allow these data objects to be in different character encodings.
            In all cases, the Reference
            Processing Model MUST be applied
            to all textual data objects. | HTML, CSS, XML, XSLT, XQuery,... | 
    
      | C070 | [S] | Specifications SHOULD NOT
        arbitrarily exclude code points from the full range of
        Unicode code
        points from U+0000 to U+10FFFF inclusive. | HTML, XML,
      CSS | 
    
      | C077 | [S] | Specifications MUST NOT allow code
        points above U+10FFFF. | HTML, XML,
      CSS | 
    
      | C079 | [S] | Specifications SHOULD NOT allow the
        use of code points reserved by Unicode for internal use. | discouraged by XML 1.1 | 
    
      | C078 | [S] | Specifications MUST NOT allow the use
        of surrogate code points. | HTML, XML,
      CSS | 
    
      | C015 | [S] | Specifications MUST either specify a
        unique character encoding, or provide character encoding
        identification mechanisms such that the encoding of text can be
        reliably identified. | HTML, XML, CSS,... | 
    
      | C016 | [S] | When designing a new protocol, format or API, specifications SHOULD require a unique character
      encoding. | DOM, IRI->URI conversion, some IETF protocols | 
    
      | C017 | [S] | When basing a protocol, format, or API on a protocol, format, or
        API that already has rules for character encoding, specifications
        SHOULD use rather than change these
        rules. | HTML->MIME, XML->MIME, RFC3023-based media types | 
    
      | C018 | [S] | When a unique character encoding is required, the character
        encoding MUST be UTF-8, UTF-16 or
      UTF-32. | DOM, IRIs, some IETF protocols | 
    
      | C020 | [S] | Specifications SHOULD avoid using the
        terms 'character set' and 'charset' to refer to a character encoding,
        except when the latter is used to refer to the MIME charset parameter or its IANA-registered
        values. The term 'character encoding', or
        in specific cases the terms 'character encoding
        form' or 'character encoding
        scheme', are RECOMMENDED. | lots of specs | 
    
      | C021 | [S] | If the unique encoding approach is not taken, specifications SHOULD require the use of the IANA charset
        registry names, and in particular the names identified in the
        registry as 'MIME preferred names', to
        designate character encodings in protocols, data formats and
      APIs. | recommended by XML | 
    
      | C022 | [S] [I] [C] | Character encodings that are not in the IANA registry SHOULD NOT be used, except by private
        agreement. | XML | 
    
      | C023 | [S] [I] [C] | If an unregistered character encoding is used, the convention of
        using 'x-' at the beginning of the name
        MUST be followed. | XML | 
    
      | C049 | [I] [C] | The character encoding of content SHOULD be chosen so that it maximizes the
        opportunity to directly represent characters (i.e. minimizes the need
        to represent characters by markup
        means such as character
        escapes) while avoiding obscure encodings that are unlikely to be
        understood by recipients. | wide practice on the Web | 
    
      | C034 | [C] | If facilities are offered for identifying character encoding,
        content MUST make use of them; where the facilities offered for
        character encoding identification include defaults (e.g. in XML 1.0
        [XML 1.0]), relying on such defaults is sufficient to satisfy this
        identification requirement. | wide (but not yet wide enough) practice on the Web | 
    
      | C024 | [I] [C] | Content and software that label text data MUST use one of the names required by the
        appropriate specification (e.g. the XML specification when editing
        XML text) and SHOULD use the MIME
        preferred name of a character encoding to label data in that
        character encoding. | wide practice | 
    
      | C025 | [I] [C] | An IANA-registered charset name MUST NOT be used to label text data in a
        character encoding other than the one identified in the IANA
        registration of that name. | wide practice | 
    
      | C026 | [S] | If the unique encoding approach is not chosen, specifications MUST designate at least one of the UTF-8 and
        UTF-16 encoding forms of Unicode as admissible character encodings
        and SHOULD choose at least one of UTF-8
        or UTF-16 as required encoding forms (encoding forms that MUST be supported by implementations of the
        specification). | XML | 
    
      | C027 | [S] | Specifications that require a default encoding MUST define either UTF-8 or UTF-16 as the
        default, or both if they define suitable means of distinguishing
      them. | XML | 
    
      | C028 | [S] | Specifications MUST NOT propose the
        use of heuristics to determine the encoding of data. | none known | 
    
      | C029 | [I] | Receiving software MUST
        determine the encoding of data from available information according
        to appropriate specifications. | widely implemented (although it could be better) | 
    
      | C030 | [I] | When an IANA-registered charset name
        is recognized, receiving software MUST
        interpret the received data according to the encoding associated with
        the name in the IANA registry. | widely implemented | 
    
      | C031 | [I] | When no charset is provided receiving software MUST adhere to the default character
        encoding(s) specified in the specification. | widely implemented | 
    
      | C035 | [S] | Specifications MUST define
        conflict-resolution mechanisms (e.g. priorities) for cases where
        there is multiple or conflicting information about character
      encoding. | HTML, XML | 
    
      | C033 | [I] | Software MUST completely implement the
        mechanisms for character encoding identification and conflict
        resolution. | browsers, XML parsers | 
    
      | C073 | [C] | Publicly interchanged content SHOULD
        NOT use code points in the private use area. | most Web pages | 
    
      | C076 | [C] | Content MUST NOT use a code point for
        any purpose other than that defined by its character encoding. | most Web pages | 
    
      | C038 | [S] | Specifications MUST NOT require the
        use of private use area characters with particular assignments. | most specs (bad exception that we are trying to avoid repeating:
        MathML 1.0) | 
    
      | C039 | [S] | Specifications MUST NOT require the
        use of mechanisms for defining agreements of private use code
      points. | all known specs | 
    
      | C040 | [S] [I] | Specifications and implementations SHOULD
        NOT disallow the use of private use code points by private
        agreement. | HTML, XML | 
    
      | C041 | [S] | Specifications MAY define markup
        to allow the transmission of symbols not in Unicode or to identify
        specific variants of Unicode characters. | SVG, MathML | 
    
      | C068 | [S] | Specifications SHOULD allow the
        inclusion of or reference to pictures and graphics where appropriate,
        to eliminate the need to (mis)use character-oriented mechanisms for
        pictures or graphics. | HTML, SVG | 
    
      | C042 | [S] | Specifications SHOULD NOT invent a new
        escaping mechanism if an appropriate one already exists. | XHTML, SVG, SMIL,... | 
    
      | C043 | [S] | The number of different ways to escape a character SHOULD be minimized (ideally to one). | CSS | 
    
      | C044 | [S] | Escape syntax SHOULD require either
        explicit end delimiters or a fixed number of characters in each
        character escape. Escape syntaxes where the end is determined by any
        character outside the set of characters admissible in the character
        escape itself SHOULD be avoided. | HTML, XML | 
    
      | C045 | [S] | Whenever specifications define character escapes that allow the
        representation of characters using a number, the number MUST represent the Unicode code point of the
        character and SHOULD be in hexadecimal
        notation. | HTML, XML, CSS | 
    
      | C046 | [S] | Escaped characters SHOULD be
        acceptable wherever their unescaped forms are; this does not preclude
        that syntax-significant
        characters, when escaped, lose their significance in the syntax. In
        particular, if a character is acceptable in identifiers and comments,
        then its escaped form should also be acceptable. | CSS, would have been ideal for XML | 
    
      | C047 | [I] [C] | Escapes SHOULD only be used when the
        characters to be expressed are not directly representable in the
        format or the character encoding of the document, or when the visual
        representation of the character is unclear. | most content on the Web | 
    
      | C048 | [I] [C] | Content SHOULD use the hexadecimal
        form of character escapes rather than the decimal form when there are
        both. | several implementations, lots of content | 
    
      | C050 | [S] | Specifications SHOULD exclude
        compatibility characters in the syntactic elements (markup,
        delimiters, identifiers) of the formats they define. | XML 1.0 | 
    
      | C011 | [S] | Specifications SHOULD NOT define a
        string as a 'byte string'. | all W3C specs | 
    
      | C012 | [S] | The 'character string' definition SHOULD be used by most specifications. | HTML, XML, XSLT,... | 
    
      | C051 | [S] [I] | The character
        string is RECOMMENDED as a basis for
        string indexing. | XSLT, XQuery | 
    
      | C052 | [S] [I] | A code
        unit string MAY be used as a basis
        for string indexing if this results in a significant improvement in
        the efficiency of internal operations when compared to the use of character
        string. | DOM | 
    
      | C071 | [S] [I] | Grapheme
        clusters MAY be used as a basis for
        string indexing in applications where user interaction is the primary
        concern. | not too much implemented yet | 
    
      | C074 | [S] | Specifications that define indexing in terms of grapheme clusters
        MUST either: a) define grapheme clusters
        in terms of default grapheme clusters as defined in Unicode Standard
        Annex #29, Text Boundaries [UTR
        #29], or b) define specifically how tailoring is applied to the
        indexing operation. | not too much implemented yet | 
    
      | C072 | [S] [I] | The use of byte
        strings for indexing is NOT
        RECOMMENDED. | all W3C specs | 
    
      | C053 | [S] | Specifications that need a way to identify substrings or point
        within a string SHOULD provide ways
        other than string indexing to perform this operation. | regular expressions,... | 
    
      | C054 | [I] [C] | Users of specifications (software developers, content developers)
        SHOULD whenever possible prefer ways
        other than string indexing to identify substrings or point within a
        string. | XSLT? | 
    
      | C055 | [S] | Specifications SHOULD understand and
        process single characters as substrings, and treat indices as
        boundary positions between counting units, regardless of the
        choice of counting units. | XSLT/XQuery (for first part) | 
    
      | C056 | [S] | Specifications of APIs SHOULD NOT
        specify single characters or single 'units of
        encoding' as argument or return types. | DOM | 
    
      | C057 | [S] | When the positions between the units are counted for string
        indexing, starting with an index of 0 for the position at the start
        of the string is the RECOMMENDED
        solution, with the last index then being equal to the number of
        counting units in the string. | many examples in programming languages, unfortunately not XSLT | 
    
      | C062 | [S] | Since specifications in general need both a definition for their
        characters and the semantics associated with these characters,
        specifications SHOULD include a
        reference to the Unicode Standard, whether or not they include a
        reference to ISO/IEC 10646. | many specs | 
    
      | C063 | [S] | A generic reference to the Unicode Standard MUST be made if it is desired that characters
        allocated after a specification is published are usable with that
        specification. A specific reference to the Unicode Standard MAY be included to ensure that functionality
        depending on a particular version is available and will not change
        over time. | XML
        1.1 | 
    
      | C064 | [S] | All generic references to the Unicode Standard [Unicode]
        MUST refer to the latest version of the
        Unicode Standard available at the date of publication of the
        containing specification. | XML
        1.1 | 
    
      | C065 | [S] | All generic references to ISO/IEC 10646 [ISO/IEC
        10646] MUST refer to the latest
        version of ISO/IEC 10646 available at the date of publication of the
        containing specification. | XML
        1.1 |
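
The string indexing rows above (C009, C051, C052, C072, C057) distinguish several counting units for the same text. The following is a minimal Python sketch of that distinction, not part of the report itself; the helper name `counting_units` and the sample string are assumptions chosen for illustration. It counts one string as a character string (code points), as a UTF-16 code unit string, and as a UTF-8 byte string, and then lists index positions as boundaries starting at 0.

```python
def counting_units(text: str) -> dict:
    """Return the length of `text` in three different counting units.

    Hypothetical helper for illustration only.
    """
    return {
        # Character string: one unit per Unicode code point (C051).
        "code_points": len(text),
        # Code unit string: one unit per 16-bit UTF-16 code unit (C052).
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # Byte string: one unit per UTF-8 byte (not recommended, C072).
        "utf8_bytes": len(text.encode("utf-8")),
    }


# 'a', U+00E9 (e with acute), U+1D11E (musical symbol G clef, outside the BMP).
sample = "a\u00e9\U0001D11E"
units = counting_units(sample)

# C057: indices name the positions between counting units, starting at 0;
# the last index equals the number of counting units in the string.
boundaries = list(range(units["code_points"] + 1))

print(units)
print(boundaries)
```

The three counts differ for the same text, which is why C009 forbids assuming a one-to-one relationship between characters and units of physical storage, and why a specification that offers indexing has to say which counting unit it uses.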