
Public Last Call #2 Comments
Character Model for the World Wide Web 1.0
W3C Working Draft 30 April 2002

Only comments that are publicly visible are listed below.
Such comments sometimes contain links to other items that are not publicly visible.

Useful links

Character Model: Last Call #2 WD | Last Call #1 WD

Related documents: Comment submission form | Public Last Call #1 Comments

Related mail archives: www-i18n-comments

Last Call Comments

# | I D S (see key) | From | For | Ref | Description
C018 | Na Na N | Dan Connolly | - | 4.4 | Example unclear
  • Comment (received 2002-04-30) -- split/concat example looks good, I think (LCI-191)

    I'm not quite sure I understand it... maybe it's just my browser's poor rendering... but that string, 'cz', just looks like a normal ASCII string. I don't see how the accent thingy pops up when you delete the 'z'.

  • Decision: Not applicable.

    Rationale: We have classified this as 'not applicable' because you have not asked for any actual changes to the document. Your problem with understanding the cz example may have been due to rendering issues, but we doubt that, because the (non-combining) cedilla that we used in the draft to represent the combining cedilla is part of iso-8859-1, which has been rendered correctly since the very first browsers. We suspect that you may have overlooked the cedilla, a little low hook after the z. We will try to take your suggestion for a test into consideration for our next phase.

  • Our response (sent 2003-12-11) -- Notification
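
The effect discussed in this comment can be reproduced with a short sketch. It assumes Python's standard `unicodedata` module and uses the real combining cedilla (U+0327) where the draft displayed a non-combining one; the helper name `is_nfc` is illustrative only.

```python
import unicodedata

CEDILLA = "\u0327"  # COMBINING CEDILLA


def is_nfc(text):
    """Return True if text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


# 'c', then 'z' carrying a combining cedilla. Unicode has no precomposed
# z-with-cedilla, so this three-code-point string is already NFC.
original = "cz" + CEDILLA

# Deleting only the 'z' leaves the cedilla attached to the 'c'.
after_delete = original.replace("z", "")

print(is_nfc(original))
print(is_nfc(after_delete))
# Normalizing the result yields the precomposed c-with-cedilla (U+00E7).
print(unicodedata.normalize("NFC", after_delete) == "\u00e7")
```

The deletion looks harmless, yet it turns normalized text into unnormalized text, which is the point the split/concat example was making.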

C023 | E A S | Jeremy Carroll | - | 4.4 | 'normalization-sensitive' unclear
  • Comment (received 2002-05-14) -- normalization-sensitive unclear

    I found the definition of normalization-sensitive in section 4.4 unclear.

  • Decision: Accepted.

  • Decision: Add examples (i) of operations which are not normalization-sensitive, and (ii) illustrating what we mean by inputs and outputs.

  • Our response (sent 2003-05-01) -- Notification
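
As an informal illustration of the kind of examples requested here (not the examples that were added to the specification), the sketch below compares two operations on the same text in NFC and NFD. The sample string and function names are assumptions made for this illustration.

```python
import unicodedata

# The same text in composed (NFC) and decomposed (NFD) form.
composed = unicodedata.normalize("NFC", "caf\u00e9 au lait")
decomposed = unicodedata.normalize("NFD", composed)


def code_point_length(text):
    """Normalization-sensitive: the output depends on the input's form."""
    return len(text)


def count_spaces(text):
    """Not normalization-sensitive for this input: U+0020 is unaffected."""
    return text.count(" ")


print(code_point_length(composed), code_point_length(decomposed))
print(count_spaces(composed), count_spaces(decomposed))
# Code-point-by-code-point comparison is also normalization-sensitive.
print(composed == decomposed)
```

Here the inputs are the two forms of the string and the outputs are the numbers and the boolean printed; an operation is sensitive when those outputs change with the form of the input.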

C024 | Na Na S | Jeremy Carroll | - | 3.2 | Is UTF-7 a unicode encoding form?
  • Comment (received 2002-05-14) -- UTF-7

    Is UTF-7 a unicode encoding form? (I am pretty ignorant about UTF-7 but I believe it exists and is a UCS).

  • Decision: Not applicable.

    We have classified this comment as 'not applicable', because it is only a question.

    Our answer is yes and no: UTF-7 can be considered a Unicode encoding form, or not. It is a Unicode encoding form to the extent that it encodes a sequence of Unicode characters. However, it does not map a character to an identifiable sequence of bytes, and it has a number of other rather undesirable properties. It was designed for use in very special cases such as email, but has largely been replaced by UTF-8 and is no longer recommended for use, to the extent that we decided that the most adequate way to handle it in the Character Model was to ignore it completely.
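
The remark about identifiable byte sequences can be made concrete with a small sketch, assuming the `utf-7` codec that ships with Python. The variable names are illustrative.

```python
E_ACUTE = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Encode the character once, and encode a run of two of them.
one = E_ACUTE.encode("utf-7")
two = (E_ACUTE * 2).encode("utf-7")

print(one)
print(two)

# The run is not simply the single encoding repeated, because UTF-7 packs
# consecutive non-ASCII characters into one shared base64 block.
print(one + one == two)

# Both byte strings nevertheless decode to the same two characters.
print((one + one).decode("utf-7") == two.decode("utf-7"))
```

Because the bytes contributed by a character depend on its neighbours, there is no fixed per-character byte sequence to look for, unlike in UTF-8.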

C025 | T A N | Olu Ibidunni | - | 3.1.5 | In 4th example, should 'o' be 'ö'?
  • See also the following comments: C020 C154

  • Comment (received 2002-05-14) -- Typo

    The last sentence of the fourth example in section 3.1.5 has the character 'o'; should it be 'ö' instead?

  • Decision: Accepted.

  • Our response (sent 2003-02-17) -- Notification

  • Notification sent but bounced. See bounced note.

C026 | E A N | Ian Jacobs | - | 3.1.3 | mapping between character codes and units of displayed text
  • See also the following comments: C096

  • Comment (received 2002-05-24) -- mapping between character codes and units of displayed text

    In '[S] [I] Specifications and software MUST NOT assume a one-to-one mapping between character codes and units of displayed text.', does 'character codes' mean characters or character codes or both? Please clarify.

  • Decision: Accepted.

  • Decision: Change 'character codes' to 'characters'.

  • Our response (sent 2003-02-17) -- Notification

C027 | Na Na O | Joseph Reagle | XML Sig WG | Various | XML Sig WG comments
C028 | Na Na N | Jeremy Carroll | RDF Core WG | Various | Endorsement from RDF Core
  • Comment (received 2002-05-27) -- Endorsement from RDF Core

    For the sections 3.4, 4, 6, 9, C, D RDF Core endorses the last call working draft. We have found earlier drafts helpful in identifying how best to meet our responsibilities to RDF users world wide. (However, we do not intend to address all the requirements of these sections in the version of the RDF recommendations currently in working draft).

  • Decision: Not applicable.

  • Rationale: We thank you for your endorsement. We have classified this comment as 'not applicable' because it does not suggest or imply any changes. We would like to note that the Character Model is written so as to make clear that specifications do not have to follow all the requirements, just those that apply in their specific case.

  • Our response (sent 2003-02-13) -- Notification

C029 | Na Na N | Jeremy Carroll | RDF Core WG | 2 | breadth of scope
  • Comment (received 2002-05-27) -- breadth of scope

    Concerning sections 1 and 2 RDF Core is concerned that the scope of charmod is overly broad. In particular, there appears to be no acknowledgement that some languages being defined by W3C working groups may not be intended as web languages and hence not have a need to address internationalization issues. There may be an implicit (and false) assumption that all W3C recommendations specify (only) web languages with processing models.

  • Our response (sent 2002-05-27) -- Re: breadth of scope

  • Comment (received 2002-05-28) -- RE: breadth of scope

  • Decision: Not applicable.

  • Rationale: We have classified this comment as 'not applicable', because it is too general. Each CharMod requirement applies only where applicable. For example, if a specification doesn't deal with sorting, then requirements related to sorting do not apply. Likewise, specifications that don't deal with text (e.g. a bitmap format) would not have any applicable requirements (except e.g. for textual comments and other metainformation embedded in the format). We would also like to point out that the term 'processing model' is interpreted very broadly here. Even if a specification does not have an explicitly defined processing model, it implicitly defines how to process (e.g. match) characters. As an example, RDF conforms to the processing model on the level of the abstract syntax, by virtue of the fact that the abstract syntax is expressed in Unicode, and on the level of RDF/XML, by virtue of being based on XML.

  • Our response (sent 2003-02-13) -- Notification

C030 | E N N | Jeremy Carroll | RDF Core WG | 3.5 | non-universality of processing model
  • Comment (received 2002-05-27) -- non-universality of processing model

    For the section 3.5 RDF Core WG notes that the language is somewhat offputting for us as specification developers given that our specification explicitly does not have a processing model. We have no particular suggestions about this, nor would we object if the I18N WG chose not to address this issue.

  • Our response (sent 2002-05-27) -- Re: non-universality of processing model

  • Comment (received 2002-05-28) -- RE: non-universality of processing model

  • Decision: Noted.

    Rationale: We have classified this comment as 'Noted', because it did not contain any suggestions for changes.

    However, in order to address the misunderstanding that we think this comment exposes, we have added some text (just before C014):

    "Also, while this document uses the term Reference *Processing* Model and describes its properties in terms of processing, the model also applies to specifications that do not explicitly define a processing model."

    We hope that this clarifies the situation for RDF: Even if there is no processing model for RDF, on the level of text processing, RDF conforms to the Charmod Reference Processing Model because of the way the abstract syntax is defined in terms of Unicode characters and because of the way XML is used.

  • Our response (sent 2003-02-13) -- Notification

C031 | S P N | Jeremy Carroll | RDF Core WG | 8 | no dependency on IRI draft
  • See also the following comments: C059 C170

  • Comment (received 2002-05-27) -- no dependency on IRI draft

    The main concern of the RDF Core WG is section 8. Any normative section of charmod MUST NOT depend on the IETF IRI draft, which is not finished and is not yet stable. We draw attention to 'SHOULD use Internationalized Resource Identifiers (IRI) [I-D IRI]'. The IRI draft is only a draft, the reference to it is not normative, and the strength of this SHOULD dependency appears excessive ('not optional'). In particular, the IRI draft does not adequately address IRI equality (not merely functional equivalence in retrieval). Moreover, the bidi section presents a learning curve which developers are unlikely to want to climb before IRI has consensus around it. We have found the text in XLink section 5.4 and XML Erratum 26 adequately clear for some of the IRI questions, particularly those that are most pressing for RDF, and believe that charmod should merely:

    - reiterate such text;

    - reiterate the early uniform normalization model for the IRIs when regarded as Unicode strings

  • Decision: Partially accepted.

    Rationale: Our plan is that the IRI ID, referenced in this section, will have been submitted for Proposed Standard by the time CharMod moves to the next stage. IRI equality is fully addressed in the latest IRI ID version.

  • Our response (sent 2003-02-13) -- Notification

C032 | Na Na O | Jeremy Carroll | RDF Core WG | Various | Overview of RDF Core feedback
  • Comment (received 2002-05-27) -- Overview of RDF Core feedback

    The RDF Core WG has provided feedback concerning the following sections of charmod:

    > 1. Introduction

    > 2. Conformance

    > 3.4 Strings

    > 3.5 Reference Processing Model

    > 4. Early Uniform Normalization

    > 6. String Identity Matching

    > 8. Character Encoding in URI References

    > 9. Referencing the Unicode Standard

    > A.2 Other References

    > C. Composing Characters

    > D. Resources for Normalization

    [...]

    RDF Core makes no comments on the other sections.

  • This comment lists the sections that have been commented on by the RDF Core WG. Please see the specific comments listed below.

  • This comment has been split into the following comments: C028 C029 C030 C031

C033 | E P N | Ian Jacobs | - | 3.1.6 | Use of word 'byte'
  • Comment (received 2002-05-28) -- Use of word 'byte'

    Proposal: Change 'the word bytes is generally considered to mean 8-bit bytes' to 'the word bytes is generally considered to mean 8 bits.'

  • Decision: Improve the sentence.

  • Decision: Partially accepted.

  • Rationale for 'Partially accepted': We think we can improve on the suggested wording.

  • Our response (sent 2003-02-17) -- Notification

C034 | S A S | Joseph Reagle | XML Sig WG | 3.6.3 | Private Use Code Points: Disagreement with our approach
C035 | S A N | Joseph Reagle | XML Sig WG | Various | 'All W3C specs must conform.'
  • Comment (received 2002-05-24) -- Re: 2nd Last Call for the Character Model for the WWW

    Please do not state 'All W3C specs must conform.' I think you should:

    a. state in the STATUS that the intent is that this will be used for W3C specifications.

    b. state, 'any spec wishing to conform must ...' and how, when, and what unforeseen exceptions might be permitted becomes a matter of W3C policy -- perhaps following Ian's suggestion.

  • Decision: Accepted

  • Rationale: We originally rejected this comment. Later, after extensive discussions, we were instructed by W3C Management that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and we removed the relevant language. We were also instructed to request a finding from the TAG corresponding to the text that we removed.

C036 | E A N | Joseph Reagle | XML Sig WG | 3.1.3 | Define 'logical order'
C037 | E A N | Joseph Reagle | XML Sig WG | 4.1.1 | Character 'í' hard to distinguish from 'i', particularly when italicized
C038 | E A N | Jim Melton | XML Query WG | 2 | Conformance of new vs. old specs
  • See also the following comments: C051 C088 C089 C135

  • Comment (received 2002-05-31) -- Conformance of new vs. old specs

    Section 2, 'Conformance', contains the following statements:

    [S] Every W3C specification MUST:

    1. conform to the requirements applicable to specifications,

    2. specify that implementations MUST conform to the requirements applicable to software, and

    3. specify that content created according to that specification MUST conform to the requirements applicable to content.

    [S] If an existing W3C specification does not conform to the requirements in this document, then the next version of that specification SHOULD be modified in order to conform.

    It seems strange that 'Every...specification MUST...conform to the requirements', but that existing specifications that do not conform 'SHOULD be modified'. While we assume that the intent is to require conformance by *new* specifications without mandating updates to existing specifications solely for conformance reasons, the wording is certainly surprising and could be made clearer.

  • Decision: Accepted

    You point out a clear inconsistency, which we fixed a while ago. We were later told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and we removed the relevant language altogether. We have been instructed to request a finding from the TAG corresponding to the text that we removed. We will make sure that, if relevant, the inconsistency you pointed out does not reappear.

C039 | E A N | Jim Melton | XML Query WG | 3.1.5 | Determining relevant language for sorting
  • Comment (received 2002-05-31) -- determining relevant language for sorting

    Section 3.1.5, 'Units of collation', contains the statement 'Note that, where searching or sorting is done dynamically, particularly in a multilingual environment, the 'relevant language' should be determined to be that of the current user, and may thus differ from user to user.'

    While we agree that the user's language is frequently the most reasonable choice to be used in determining a collation to be used for various operations, it is most certainly not *always* the best choice. This is particularly true when massive amounts of data have been placed into a repository of some sort (e.g., a database) using the semantics of the data itself. For example, database systems often enhance retrieval performance through the use of special structures ('indexes') that are created long in advance of knowing what user might be retrieving the data. In such cases, it might be determined that the 'best' default language is the language of the data, or of the repository, or of some other entity.

    To ensure that the statement in Section 3.1.5 is interpreted to allow this situation, the phrase 'should be determined' should ;^) be replaced with 'SHOULD be determined'. By formalizing the term ('SHOULD'), the Character Model document properly recognizes that some applications require the ability to have different defaults.

  • Decision: Accepted

  • Our response (sent 2003-05-01) -- Notification
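
To show why the choice of 'relevant language' matters, here is a deliberately tiny sketch in which the same three words sort differently under two hand-written orderings. The key functions and word list are hypothetical simplifications; a real system would rely on a full collation library rather than tables like these.

```python
# Swedish places the letters å, ä, ö after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

# German dictionary ordering sorts umlauts with their base vowels.
GERMAN_FOLD = str.maketrans("äöü", "aou")


def german_key(word):
    """Sort key that treats ä, ö, ü like a, o, u."""
    return word.lower().translate(GERMAN_FOLD)


def swedish_key(word):
    """Sort key by alphabet position; only handles letters in the table."""
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()]


words = ["zon", "öl", "ost"]

print(sorted(words, key=german_key))
print(sorted(words, key=swedish_key))
```

An index built in advance has to commit to one of these orderings, which is why the language of the data or of the repository can be a better default than the language of whichever user happens to query it.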

C040 | E A N | Jim Melton | XML Query WG | 3.1.7 | How to avoid use of the term 'character'?
  • See also the following comments: C004 C138 C166

  • Comment (received 2002-05-31) -- How to avoid use of the term 'character'?

    Section 3.1.7, 'Summary', contains a paragraph that reads: '[S] When specifications use the term 'character' it MUST be clear which of the possible meanings they intend. [S] Specifications SHOULD avoid the use of the term 'character' if a more specific term is available.'

    This paragraph would be considerably more useful if it either contained a list of the possible meanings or contained a pointer to another location in the document that provides such a list.

  • Decision: Accepted

  • Decision: We'll add clarification.

  • Our response (sent 2003-05-01) -- Notification

C041 | S A N | Jim Melton | XML Query WG | 3.2 | Proprietary charset identifiers
  • See also the following comments: C139

  • Comment (received 2002-05-31) -- proprietary charset identifiers

    Section 3.2, 'Digital Encoding of Characters', list element 4, contains the phrase '... is identified by an IANA charset identifier.'

    In fact, there are a great many CESes that are identified by charset identifiers that are not assigned by IANA at all, but that are 'created' by proprietary means (e.g., corporations). The Character Model specification must not prohibit the use of CESes identified by charset identifiers assigned through other means.

    To correct this, simply change '...is identified by an IANA charset identifier.' to '...is identified by a unique identifier, such as an IANA charset identifier.'

  • Decision: Rejected.

  • See our response (below) for our rationale.

  • Our response (sent 2002-06-05) -- Re: proprietary charset identifiers

    [...] Please tell us, at your earliest convenience, whether you are satisfied with our decision or not. If not, please provide additional rationale.

  • Comment (received 2002-06-13) -- Re: proprietary charset identifiers

    [...] I assure you that I am not satisfied with the decision [...]

  • Decision: Accepted.

  • Note: We've made the requested change and will ask the XML Query WG whether they have further concerns about section 3.6.2:

    [S] If the unique encoding approach is not taken, specifications SHOULD mandate the use of the IANA charset registry names [...]

    If they do have such concerns, they need to raise a separate comment.

    After some discussion (see the last message on this from Jim Melton) we have decided to accept the comment as it was made. We have changed "... is identified by an IANA charset identifier." to "... is identified by a unique identifier, such as an IANA charset identifier."

    However, our exchange suggests that the XML Query WG may also not be okay with some of the wording in Section 3.6.2, which (among other things) says: "[S] If the unique encoding approach is not taken, specifications SHOULD mandate the use of the IANA charset registry names [...]". If this is the case, please indicate so as soon as possible, or we will have to assume that this is okay with you.

C042 | S A | Jim Melton | XML Query WG | 4.4 | Discussion of subsequent items
  • Comment (received 2002-05-31) -- DISCUSSION OF SUBSEQUENT ITEMS

    Section 4.4, 'Responsibility for Normalization', is the section in which the Query WG is most interested. Intense discussions have taken place in the past over the subject of when normalization should, should not, must, or must not be performed, and what components of an environment have responsibilities in that area. Implementers of data repositories that might contain vast quantities of data (e.g., database systems) have expressed particular concerns about this, observing that some applications involve the need to store data very quickly, but retrieve it in a less urgent fashion, while other applications place severe demands on retrieval but have fewer constraints on storing data. In other words, the demands of users of applications, not a rigid policy, must govern *some* aspects of the decision about when normalization is performed and by whom.

  • Decision: Accepted.

C043 | S A | Jim Melton | XML Query WG | 4.4 | Objection to prohibition against receiver from normalizing text
  • Comment (received 2002-05-29) -- Objection to prohibition against receiver from normalizing text

    Section 4.4, 'Responsibility for Normalization', specifies one requirement (on web content) that states '[C] In order to conform to this specification, all text content on the Web MUST be in include-normalized form and SHOULD be in fully-normalized form.' While this statement expresses a clearly desirable situation, it is 100% guaranteed that the web will *never* contain only include-normalized text. The character model MUST (irony noted) recognize that fact and give guidance for dealing with such data. A preferred alternative, currently prohibited by the Character Model document, is to allow a *consuming* application to do the normalization. [The Character Model] currently prohibits such action, as addressed in the next comment. This prohibition is unreasonable and we believe that its inclusion will dramatically inhibit adoption of the Character Model by products and implementors.

  • Decision: Accepted.

C044 | S A | Jim Melton | XML Query WG | 4.4 | Prohibition against normalizing suspect text
  • Comment (received 2002-05-31) -- Prohibition against normalizing suspect text

    In Section 4.4, 'Responsibility for Normalization', another requirement (on implementations and on specifications) states: '[S] [I] A text-processing component that receives suspect text MUST NOT perform any normalization-sensitive operations unless it has first confirmed through inspection that the text is in normalized form, and MUST NOT normalize the suspect text. Private agreements MAY, however, be created within private systems which are not subject to these rules, but any externally observable results MUST be the same as if the rules had been obeyed.'

    We do not object to the observation that normalization-sensitive operations are best performed on normalized text. However, the requirement (as stated) clearly prohibits a consuming application from normalizing non-normalized text that it receives. We are quite opposed to this prohibition for a variety of reasons.

    One problem is that the requirement doesn't make it clear what the consuming application's behavior must be, but it seems reasonable to conclude that the consuming application must reject the un-normalized text. Of course, such an application that silently rejects such text is unlikely to be considered user-friendly, so we might guess that an error can be raised in some manner. But that makes the application very unhelpful in general, since users (e.g., of web browsers) often wish to access text regardless of how rigidly it conforms to the Character Model's requirements.

    The 'Private agreements' clause starts off in a promising manner, but then requires that the results of such agreements remain unhelpful to users of the applications, since the application is not allowed to produce 'observable results' based on handling un-normalized text.

    So, what can be done to support applications (and specifications) that must deal with text that cannot always be guaranteed to be normalized? We very much want certain classes of applications to be allowed to do normalization on un-normalized text and we are willing to participate in discussions that identify those classes of applications.

    A rather cynical way out of this dilemma that can be imagined is for an application (e.g., a database management system) to 'read' suspect text and then 'create' brand new normalized text that just happens to be character-for-character identical to the un-normalized text it received. That obviously implies degenerating into games just to get around conformance requirements; instead, we must fix the specifications and requirements themselves.

  • Decision: Accepted.
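
The behaviour the commenter asks for, a consuming application that normalizes suspect text instead of rejecting it, might look like the sketch below. The function name and return convention are assumptions for illustration; nothing here is taken from the Character Model itself.

```python
import unicodedata


def accept_suspect_text(text, form="NFC"):
    """Normalize received text and report whether it had to be changed.

    Returns a (normalized_text, was_changed) pair so the caller can log
    or flag input that did not arrive in the expected form.
    """
    normalized = unicodedata.normalize(form, text)
    return normalized, normalized != text


# Decomposed input: 'e' followed by COMBINING ACUTE ACCENT, twice.
received = "re\u0301sume\u0301"
clean, changed = accept_suspect_text(received)

print(clean == "r\u00e9sum\u00e9", changed)
```

Returning the flag alongside the text lets an application stay helpful to its user while still recording that the sender did not deliver normalized content.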

C045 | S A | Jim Melton | XML Query WG | 4.4 | Prohibition against interim unnormalized states
  • Comment (received 2002-05-31) -- Prohibition against interim unnormalized states

    Section 4.4, 'Responsibility for Normalization', contains a requirement that states: '[I] A text-processing component which modifies text and performs normalization-sensitive operations MUST behave as if normalization took place after each modification, so that any subsequent normalization-sensitive operations always behave as if they were dealing with normalized text.'

    We believe that many implementors, on grounds of performance considerations, disagree with the requirement that normalization take place after each operation. While we recognize that the Note following the quoted requirement suggests a way to ease the performance issue (using what we call 'local normalization'), we believe that a couple of good examples will help ease implementors' concerns. More importantly, we believe that some application requirements are best satisfied by allowing such (normalization-sensitive) operations on text that has not yet been proven to be normalized. A requirement that such operations on such text result in fully-normalized text poses the unreasonable additional burden of doing complete normalization of possibly huge text without a clear application need to do so.

    We urge the document's editors to modify the requirements to recognize the facts that some applications require the ability to use and manipulate un-normalized text and that operations on such text need not necessarily be normalized. In XQuery, we have a normalize function to allow applications to *force* normalization when they require normalized text and we believe that this is a useful way forward.

  • Decision: Accepted.
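
A short sketch of why modification can leave text in an interim unnormalized state: two strings that are each in NFC can concatenate into a string that is not. The helper names are hypothetical, and for simplicity the sketch renormalizes the whole result rather than only the region around the join, which is what the 'local normalization' idea would do to limit the cost.

```python
import unicodedata


def is_nfc(text):
    """Return True if text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def append_and_normalize(buffer, addition):
    """Append, then renormalize so later operations see NFC text."""
    return unicodedata.normalize("NFC", buffer + addition)


buffer = "cafe"
addition = "\u0301"  # a lone COMBINING ACUTE ACCENT

# Each piece is NFC on its own, but the plain concatenation is not.
print(is_nfc(buffer), is_nfc(addition), is_nfc(buffer + addition))

# Normalizing after the modification restores the invariant.
print(append_and_normalize(buffer, addition) == "caf\u00e9")
```

An application that defers this step keeps unnormalized text around between operations, which is the freedom the comment argues some applications need, with an explicit normalize call available when normalized text is required.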

C046 | Na Na O | David Fallside | XMLP WG | Various | XMLP WG response to Charmod LC#2
C047 | Na Na O | Tim Bray | - | Various | Comments on Character Model
C048 | Na Na O | Yin Leng Husband | WSArch WG | Various | WSArch WG review of Charmod LC #2
C049 | Na Na O | Norman Walsh | TAG | Various | TAG comments on Character Model for the World Wide Web 1.0
C050 | S A S | Cliff Schmidt | Microsoft | Various | CharMod restricts closed systems
  • Comment (received 2002-06-06) -- CharMod restricts closed systems

    In the effort to improve interoperability of text exchange across open applications on the Web, the Character Model should not restrict the ability for closed systems to leverage the Web and Web-based technologies. The term 'closed system', as used in this document, refers to a system designed to support organizations communicating among themselves based on a contract into which all parties have explicitly entered.

    The background of this spec states that 'the Web may be seen as a single, very large application...rather than as a collection of small independent applications.' Based on this premise, it is understandable why the CharMod spec chooses to require early normalization in a single canonical form. However, the Web and technologies that have developed to support the Web have also provided enormous value to closed systems, including intranet and extranet scenarios. The relationship between the evolution of the World Wide Web and its use in closed/private systems has been a mutually beneficial one. Private systems have benefited from the efficiencies of applying Web-developed standards and tools, which has in-turn increased the demand and support for these Web-enabling components. The current Character Model spec threatens to break this relationship by forcing restrictions on tools that are commonly used in closed systems, in order to exclusively support the goals of the open system Web.

    It is apparent that the I18N WG has solid reasons for preferring Normalization Form C as an interchange format for the Web; however, it is not likely to be the optimal choice for all applications. There are many legacy systems (both applications and operating systems) that use a decomposed character normalization. It will be difficult for organizations to justify why they should adopt CharMod-based technologies (such as XML 1.1 over XML 1.0), which require transcoding to a less optimal normalization form with no benefit for their closed system. This is likely to lead to fractured use of technologies such as XML 1.0/1.1.

    Finally, XML plays an important role as a data-interchange format in scalable, loosely coupled systems. The Character Model reduces XML to a format applicable only to natural language communication in one particular normalization form. This is unfortunate considering that vastly more bytes of machine-to-machine XML are transmitted than are people-to-people or people-machine bytes.

    The restrictions mandated by the Character Model limit the use of the Web and Web-based technologies for a large base of users. While supporting the vision for the Web as a 'single, very large application', the limitations to other uses of the Web do not appear to support the Character Model's goal to 'facilitate the use of the Web by all people'.

  • Decision: Accepted.

  • Decision: Changed to overall SHOULD in 4.4.

  • Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg

  • Comment (received 2003-03-15) -- Satisfied

C051 | E A S | Cliff Schmidt | Microsoft | 2 | Inconsistent/Redundant Requirements for W3C Spec Conformance
  • See also the following comments: C038 C088 C089 C135

  • Comment (received 2002-06-06) -- Inconsistent/Redundant Requirements for W3C Spec Conformance

    '[S] Every W3C specification MUST:

    1. conform to the requirements applicable to specifications,

    2. specify that implementations MUST conform to the requirements applicable to software, and

    3. specify that content created according to that specification MUST conform to the requirements applicable to content.

    [S] If an existing W3C specification does not conform to the requirements in this document, then the next version of that specification SHOULD be modified in order to conform.'

    CONCERN: Stating that all specs 'MUST' conform, but that non-conforming specs 'SHOULD' be modified appears to be inconsistent.

    RECOMMENDATION: The conformance model should only apply to future specs (including future versions of current specs), instead of specifying different conformance levels for existing and future versions.

  • Decision: Accepted.

  • Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg

  • Comment (received 2003-03-15) -- Satisfied

C052 | S P S | Cliff Schmidt | Microsoft | 2 | W3C Spec Conformance
  • Comment (received 2002-06-06) -- W3C Spec Conformance

    '[S] Every W3C specification MUST:

    1. conform to the requirements applicable to specifications,

    2. specify that implementations MUST conform to the requirements applicable to software, and

    3. specify that content created according to that specification MUST conform to the requirements applicable to content.

    [S] If an existing W3C specification does not conform to the requirements in this document, then the next version of that specification SHOULD be modified in order to conform.'

    CONCERN: The CharMod's requirement that all specs and related implementations conform to the entire CharMod spec will force non-NFC based applications to perform round trip transcoding to/from NFC in order to use the Web, even in closed system scenarios (e.g. extranets). This will also affect intranet scenarios as corporate systems are forced to jump through hoops in order to satisfy text processors (such as XML parsers) that are required to reject non-NFC text. The costs certainly outweigh the benefits for closed systems. However, it is clear that a recommended conformance level would improve open system interoperability.

    RECOMMENDATION: Replace conformance paragraph and included list with the following sentence:

    '[S] Future W3C specifications (including future versions of existing specifications) MUST reference this specification as W3C recommended guidance for interoperable Web applications.'

  • Decision: Partially accepted. The concerns about NFC have been addressed in section 4.4.

  • Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg

  • Comment (received 2003-03-15) -- Satisfied

C053 | E P S | Cliff Schmidt | Microsoft | 3.5 | Full Range of Unicode Code Points Not Allowed in XML
  • Comment (received 2002-06-06) -- Full Range of Unicode Code Points Not Allowed in XML

    '[S] Specifications SHOULD allow the use of the full range of Unicode code points from U+0000 to U+10FFFF inclusive; code points above U+10FFFF MUST NOT be used.'

    CONCERN: If this is truly a goal for text on the Web, users should understand why XML is unable to achieve this. As a high profile W3C spec, readers are likely to notice the inconsistent message. Does this mean that I18N believes that XML (1.1 or some later version) should support the characters 0x0-0x1F?

    RECOMMENDATION: If XML 1.1 is unable to achieve this goal, the Character Model spec should either remove this requirement or explain the discrepancy.

  • Decision: Partially accepted.

  • Rationale for 'Partially accepted': It is up to each specification to provide the specific reason(s) for deviating from a 'SHOULD' requirement. We have however amended the text to read: 'Specifications SHOULD not arbitrarily exclude characters from the full range of Unicode code points from U+0000 to U+10FFFF inclusive;'.

  • Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg

  • Comment (received 2003-03-15) -- Satisfied
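
    The discrepancy raised in this comment can be made concrete with a minimal sketch that compares the full code point range quoted above with the `Char` production of XML 1.0. The function name `is_xml10_char` is hypothetical, and the ranges are taken from the XML 1.0 Recommendation rather than from CharMod.

```python
# Upper bound of the Unicode code space, as quoted in the requirement above.
MAX_CODE_POINT = 0x10FFFF

def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= MAX_CODE_POINT
    )

# Code points in U+0000..U+001F that XML 1.0 does not admit.
excluded_c0 = [cp for cp in range(0x20) if not is_xml10_char(cp)]
```

    Under these assumptions, every C0 control except tab, line feed and carriage return is excluded, which is the range the comment asks about.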

C054ERSCliff Schmidt
Microsoft
4.2.3Definition of 'Fully-Normalized'
  • Comment (received 2002-06-06) -- Definition of 'Fully-Normalized'

    'Text is fully-normalized if:

    1. the text is in a Unicode encoding form, is include-normalized and none of the constructs comprising the text begin with a composing character or a character escape representing a composing character; or

    2. the text is in a legacy encoding and, if it were transcoded to a Unicode encoding form by a normalizing transcoder, the resulting text would satisfy clause 1 above.'

    CONCERN: Based on previous definitions, 'Unicode-normalized' may be a more precise term than 'Unicode encoding form' (if the implication is that full normalization requires include-normalization, which requires Unicode normalization as defined in 4.2.1).

    RECOMMENDATION: Refer to text that is 'Unicode-normalized' (possibly linked to the definition [in 4.2.1]), instead of 'Unicode encoding form'.

  • Decision: Partially accepted.

  • Rationale: Checking reveals that we could go either way; change record to partially accepted, but no change.

  • Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg

  • Comment (received 2003-03-15) -- Satisfied
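
    A rough sketch of clause 1 of the quoted definition may help readers follow the discussion. It is an approximation only: it ignores include-normalization and character escapes, and it assumes that a non-zero canonical combining class is an acceptable stand-in for CharMod's 'composing character', which is not the same definition. The function name `looks_fully_normalized` is hypothetical.

```python
import unicodedata

def looks_fully_normalized(text: str) -> bool:
    """Approximate clause 1: NFC text that does not start with a combining mark."""
    # The text must already be in NFC (stand-in for Unicode-normalized).
    if unicodedata.normalize("NFC", text) != text:
        return False
    # Assumption: a non-zero canonical combining class approximates
    # "composing character" for the purpose of this illustration.
    return not (text and unicodedata.combining(text[0]) != 0)
```

    For example, text starting with U+0327 COMBINING CEDILLA is itself in NFC but fails this check, because appending it to text ending in 'c' would yield a non-normalized result.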

C055SRSCliff Schmidt
Microsoft
4.4Mandating NFC for All Web Content
  • Comment (received 2002-06-06) -- Mandating NFC for All Web Content

    '[C] In order to conform to this specification, all text content on the Web MUST be in include-normalized form and SHOULD be in fully-normalized form.

    [S] Specifications of text-based formats and protocols MUST, as part of their syntax definition, require that the text be in normalized form.'

    CONCERN: This restriction is currently applied to 'all text content on the Web' when only content intended for interoperability in open systems will necessarily benefit from it. Although the first requirement for content producers ([C]) could be interpreted to have no impact on intranet scenarios, the second requirement for specifications ([S]) will impact the tools that intranet scenarios have been depending on.

    RECOMMENDATION: Replace above text with text similar to:

    '[C] In order to conform to this specification, all text content on the Web intended for consumption by foreign systems MUST be in include-normalized form and SHOULD be in fully-normalized form.

    [S] Specifications of text-based formats and protocols MUST, as part of their syntax definition, reference the above requirement.'

  • Decision: Rejected. Note, however, the relaxation of the language in section 4.4.

  • Rationale: 'Foreign systems' is undefined and undefinable.

  • Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg

  • Comment (received 2003-03-15) -- Satisfied

C056SPSCliff Schmidt
Microsoft
4.4Text-Processors MUST Perform Normalization Checking
  • Comment (received 2002-06-06) -- Text-Processors MUST Perform Normalization Checking

    '[S] [I] A text-processing component that receives suspect text MUST NOT perform any normalization-sensitive operations unless it has first confirmed through inspection that the text is in normalized form, and MUST NOT normalize the suspect text. Private agreements MAY, however, be created within private systems which are not subject to these rules, but any externally observable results MUST be the same as if the rules had been obeyed.'

    CONCERN: This requirement will force technologies such as XML parsers to be tied to the latest list of NFC disallowed diacritic characters in order to check normalization. Additionally, in some cases ('MAYBE' cases) NFC checks require text processors to scan backwards through a text stream in order to confirm normalization status. This will require major architectural changes for any processors designed to break a text stream into separate smaller windows for efficient processing, because no previously processed buffer can be thrown away until it is no longer needed to confirm the validity of any diacritic code points at the start of the next buffer. Text processors that expand character entities today at least have the ability to note the ‘&’ flag. It is also worth noting that optimizers of normalization checks will observe that all code points < 0x341 are always allowable. This would result in non-English based texts being disproportionately impacted by normalization checks. Finally, this requirement forces the redefinition of XML to allow for only NFC text.

    RECOMMENDATION: Replace the above text with text similar to:

    '[S] [I] Text-processing components MAY include an option to verify that suspect text is in normalized form. Text-processing components MUST NOT normalize the suspect text without specific direction.'

  • Decision: Partially accepted.

    Rationale: Try to add a note explaining that in the base case, only a one-character lookahead is needed. In the long term, try to move material about 'composing' characters to UAX 15.

  • Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg

  • Comment (received 2003-03-15) -- Satisfied
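
    The buffer-boundary concern in this comment can be illustrated with a minimal sketch. It checks NFC by comparing text with its normalized form, which is a simple but not an optimized approach; the helper name `is_nfc` and the sample string are assumptions made for illustration.

```python
import unicodedata

def is_nfc(text: str) -> bool:
    """Return True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text

# 'c' followed by U+0327 COMBINING CEDILLA; NFC composes the pair to U+00E7.
stream = "fac\u0327ade"

# Hypothetical buffer split that separates the base letter from its mark.
first, second = stream[:3], stream[3:]

each_buffer_ok = is_nfc(first) and is_nfc(second)   # buffers checked in isolation
whole_stream_ok = is_nfc(stream)                    # the stream checked as a whole
```

    Each buffer passes when checked in isolation, while the stream as a whole does not, so a checker has to carry some state across the boundary.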

C057SPSCliff Schmidt
Microsoft
4.4Content Producers and Proxies
  • Comment (received 2002-06-06) -- Content Producers and Proxies

    'NOTE: As an optimization, it is perfectly acceptable for a system to define the producer to be the actual producer (e.g. a small device) together with a remote component (e.g. a server serving as a kind of proxy) to which normalization is delegated. In such a case, the communications channel between the device and proxy server is considered to be internal to the system, not part of the Web. Only data normalized by the proxy server is to be exposed to the Web at large, as shown in the illustration below:'

    CONCERN: Although this note seems to allow closed systems to define their own boundaries, these systems will still be prevented from leveraging technologies based on CharMod, without first normalizing.

    RECOMMENDATION: If the CharMod spec continues to mandate that all text processors must check normalization, this note should point out that such processors could not be used between devices and their proxies.

  • Decision: Partially accepted.

  • Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg

  • Comment (received 2003-03-15) -- Satisfied

C058ERSCliff Schmidt
Microsoft
4.4Web Repositories
  • Comment (received 2002-06-06) -- Web Repositories

    'A similar case would be that of a Web repository receiving content from a user and noticing that the content is not properly normalized. If the user so requests, it would certainly be proper for the repository to normalize the content on behalf of the user, the repository becoming effectively part of the producer for the duration of that operation.'

    CONCERN: As noted in issue, 'Content Producers and Proxies', this scenario is not possible if XML is to be used between the user and the Web repository. This scenario also seems to imply that users may very liberally interpret the boundaries of content production. Could one also claim that the transfer from one repository to another repository was also contained within the realm of content production? What if the second repository was the final destination for the content; does this mean the content never in fact needed to be normalized?

    This scenario seems to encourage users to get around normalization requirements where impractical or inappropriate, yet leaves them with a confusing message and no legal tools to work with (since tools such as XML parsers will only accept normalized text anyway).

    RECOMMENDATION: Delete this paragraph.

  • Decision: Rejected.

  • Rationale: Defining the boundary of content production is outside the scope of this specification.

  • Our response (sent 2003-03-06) -- Notified and discussed during FTF mtg

  • Comment (received 2003-03-15) -- Satisfied

C059SANCliff Schmidt
Microsoft
8IRIs
  • See also the following comments: C031 C170

  • Comment (received 2002-06-06) -- IRIs

    '[S] W3C specifications that define protocol or format elements (e.g. HTTP headers, XML attributes, etc.) which are to be interpreted as URI references (or specific subsets of URI references, such as absolute URI references, URIs, etc.) SHOULD use Internationalized Resource Identifiers (IRI) [I-D IRI] (or an appropriate subset thereof).'

    CONCERN: Although other W3C specifications support goals similar to those of the IRI proposal, we hesitate to endorse this section of the CharMod spec until the IRI draft has undergone further review.

    RECOMMENDATION: Considering the W3C practice to keep the maturity level of a technical report within one level of any technical report on which it depends, the Character Model should not be considered for Recommendation until the IRI proposal has reached RFC status. Although the precedent has typically referred to W3C dependencies, it seems reasonable that any dependency on a spec outside the W3C should be judged by criteria at least as strong as those imposed on the W3C.

  • Decision: Accepted.

    Our plan is that the IRI ID, referenced in this section, will have been submitted for Proposed Standard by the time CharMod moves to the next stage. IRI equality is fully addressed in the latest IRI ID version.

C060NaNaNDavid Fallside
XMLP WG
2XML Protocol LC#2 review question on implementation testing
  • Comment (received 2002-05-30) -- XML Protocol LC#2 review question on implementation testing

    In reviewing the Charmod LC#2, the XML Protocol WG has a request in relation to implementation conformance.

    - Section 2 Conformance, 3rd paragraph, last sentence

    - '[S] [I] [C] In order to conform to this document, specifications MUST NOT violate any requirements preceded by [S], software MUST NOT violate any requirements preceded by [I], and content MUST NOT violate any requirements preceded by [C].'

    - The XML Protocol WG has produced a protocol specification, which makes various testable assertions, as well as a Collection of Tests each showing whether an assertion is implemented in the protocol processor. As such, the tests do not check for conformance to other specifications (e.g. XML 1.0). The XML Protocol WG asks the I18n WG to comment on whether the [I] implementation requirements of Charmod apply to the XMLP Test Collection.

  • Decision: Not applicable

  • Rationale: We have classified this comment as 'not applicable', because it is a question, not a comment leading to a potential change of the Character Model. The test suite should test for CharMod-related requirements in the specification(s) being tested. The tests should conform to [C] requirements (except where they are wrong on purpose). If the test collection includes code, then that should also conform to [I] requirements.

C061ENaNDavid Fallside
XMLP WG
1.1'All W3C specifications must conform to this document'
  • Comment (received 2002-05-30) -- XMLP WG response to Charmod LC#2

    Goals and Scope, second last paragraph

    'All W3C specifications must conform to this document (see section 2 Conformance).'

    This statement seems too comprehensive and probably needs qualification. What about existing W3C specifications or a W3C specification whose status is very close to LC (e.g. XMLP's)?

    Suggest: 'All W3C specifications published after [a certain date or event such as this Charmod becoming a recommendation] must conform to this document (... etc).'

  • Decision: Rejected

  • Rationale for 'Rejected': This para states the general principle and refers to section 2 for details. The various requirements will come into force once the CharMod spec becomes a REC.

  • New decision: Not applicable

    Rationale: We have classified this comment as 'not applicable' because we have been told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and have removed the relevant language from section 2. We still define conformance to CharMod. We have been instructed to request a finding from the TAG corresponding to the text that we removed. So CharMod will be enforced by the fact of being a REC, coupled with an eventual TAG finding and ongoing reviews of relevant specs by the I18N WG.

C062NaNaCDavid Fallside
XMLP WG
4.4'XML protocol need not normalize application payloads or check to insure that they are normalized'
  • Comment (received 2002-05-30) -- XMLP WG response to Charmod LC#2

    Responsibility for Normalization, 8th paragraph, 1st sentence

    '[S] Specifications of text-based formats and protocols MUST, as part of their syntax definition, require that the text be in normalized form'

    In our previous response to the Charmod WD review [1], we said

    'The XML Protocol processor will defer to applications any normalization (early or late) that may be required for sending and/or receiving application payload(s).'

    We received confirmation from i18n WG that that is acceptable:

        > XML protocol need not normalize application payloads

        > or check to insure that they are normalized

        Correct.

    The XMLP WG would like to see clarification of this [S] requirement in relation to the payload's normalization or lack thereof.

  • Decision: We shall write to the XMLP WG explaining why CharMod does not require XMLP to be responsible for the N11N status of the payload.

C063NaNaCDavid Fallside
XMLP WG
4.4Is text to be normalized when forwarded?
  • Comment (received 2002-05-30) -- XMLP WG response to Charmod LC#2

    Responsibility for Normalization, 8th paragraph, 1st sentence

    '[S] Specifications of text-based formats and protocols MUST, as part of their syntax definition, require that the text be in normalized form'

    In a previous review, we asked the question: 'May intermediaries re-send payloads (either normalized or un-normalized) untouched, even though they may change the protocol envelope?' From this requirement, do we take it that re-sent text is to be normalized when forwarded? However, please see comment [C062].

  • Decision: We shall write to the XMLP WG explaining why CharMod does not require XMLP to be responsible for the N11N status of the payload.

C064ERCDavid Fallside
XMLP WG
4.4Give the reason(s) for prohibition against normalizing suspect text
  • Comment (received 2002-05-30) -- XMLP WG response to Charmod LC#2

    Responsibility for Normalization, 9th paragraph, 1st sentence

    '[S] [I] A text-processing component that receives suspect text MUST NOT perform any normalization-sensitive operations unless it has first confirmed through inspection that the text is in normalized form, and MUST NOT normalize the suspect text'

    It would be helpful to give the reason(s) for the prohibition against normalizing the suspect text.

  • Decision: Rejected.

  • Rationale: Covered by section 4.1.

C065NaNaCDavid Fallside
XMLP WG
4.4Does rejected un-normalized text have to be normalized before it is returned to sender?
  • Comment (received 2002-05-30) -- XMLP WG response to Charmod LC#2

    Responsibility for Normalization, 9th paragraph, 1st sentence

    '...and MUST NOT normalize the suspect text'

    In a previous review, we asked the question: 'If un-normalized text is rejected and returned to sender, does it have to be normalized before transmission?' From this requirement, do we take it that rejected un-normalized text is to remain un-normalized when returned?

  • Decision: We shall write to the XMLP WG explaining that this should be handled in the same manner as a payload that is not well-formed.

C066NaNa David Fallside
XMLP WG
4.4Specifications that define a mechanism for producing a document SHOULD require that the final output be normalized
  • Comment (received 2002-05-30) -- XMLP WG response to Charmod LC#2

    Responsibility for Normalization, last [S] requirement

    '[S] Specifications that define a mechanism (for example an API or a defining language) for producing a document SHOULD require that the final output of this mechanism be normalized.'

    Is a document text-based format? Is this requirement covered by the earlier one - '[S] Specifications of text-based formats and protocols MUST, as part of their syntax definition, require that the text be in normalized form.'? However, the earlier requirement is stronger ('MUST') than this one ('SHOULD').

  • Q: Is a document text-based format? A: Yes. [We must clarify that we mean the textual parts of documents]

  • Q: Is this requirement covered by the earlier one? A: No. There may not be a spec for the output document, as in the case of plain text, or the spec may not (yet) require N11N.

C067SPSTim Bray
-
3.1.5Collation
  • Comment (received 2002-05-30) -- Comments on Character Model

    [S] [I] Software that sorts or searches text for users MUST do so on the basis of appropriate collation units and ordering rules for the relevant language and/or application.

    Hmm, there are cases where you just don't know the language, and even if you do, is this a requirement in the general case for things like XQuery? I think there are scenarios where it's reasonable to say a particular module shall order things by Unicode character number order and that's all there is to it. I think this should be rewritten to say that IF strings are being collated, they MUST be collated EITHER in the order appropriate to the language they're in, or if that's not possible by Unicode character number.

  • Decision: Partially accepted.

  • Note: We'll change the 'MUST' to a 'SHOULD'.

  • Our response (sent 2003-05-01) -- Notification

  • Comment (received 2003-05-01) -- Satisfied
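
    The difference between the two orderings discussed in this comment can be seen in a short sketch. It shows only ordering by Unicode code point, which is what Python's default string comparison does; language-appropriate collation would need a locale or collation library and is not shown. The word list is an assumption chosen for illustration.

```python
# Sample words; "\u00c4pfel" starts with U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS.
words = ["zebra", "\u00c4pfel", "apple", "Zoo"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# The first code point of each word, to make the ordering visible.
first_points = [hex(ord(w[0])) for w in by_code_point]
```

    Here the word starting with U+00C4 sorts after 'zebra', and 'Zoo' sorts before 'apple', which is rarely what a reader of German or English expects.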

C068SPSTim Bray
-
3.6Unique Character Encoding
  • See also the following comments: C114

  • Comment (received 2002-05-30) -- Comments on Character Model

    [S] When designing a new protocol, format or API, specifications SHOULD mandate a unique character encoding.

    No. If the format is in XML and has likely usage scenarios which include creation by humans, this is a good enough reason to just go by the XML rules. For example, I habitually compose XML documents in ISO-8859-1, which suits my needs as a user of European languages. I see no reason whatsoever why a specification should invalidate either my habits or those of a Japanese author who wants to use some flavor of JIS.

    OK, I guess this argument could fall under the exception clause of SHOULD, but I'd go so far as to add

    [S] When designing an XML-based protocol which is apt to be authored by humans, specifications MUST NOT limit the use of character encodings beyond the rules provided by XML.

  • Decision: Partially accepted.

  • Rationale: We have added: "[S] When basing a protocol, format, or API on a protocol, format, or API that already has rules for character encoding, specifications SHOULD use rather than change these rules." and have added XML as an example. As said elsewhere, we prefer not to have requirements specific to a particular format. Also, the 'authored by humans' part is not necessarily true; in general, humans care about the actual text and about the tools they use, not about encodings.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-24) -- Satisfied

C069EPSTim Bray
-
3.6.2Admissibility of UTF-*
  • Comment (received 2002-05-30) -- Comments on Character Model

    The paragraph beginning

    '[S] If the unique encoding approach is not chosen, specifications MUST designate at least one of the UTF-8 and UTF-16 encoding forms of Unicode as admissible... '

    is fine, but if the format uses XML, then XML's rules cover this and in fact require that UTF-8 and -16 are both admissible; which takes priority over the language here and this should be noted.

  • Decision: Partially accepted.

  • Note: Covered by our edit resulting from C114 and your previous comment C068.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-24) -- Satisfied

C070SNSTim Bray
-
4Early Uniform Normalization
  • Comment (received 2002-05-30) -- Comments on Character Model

    I am unable to develop an intelligent opinion as to the cost-benefit trade-off of Early Uniform Normalization and will remain unable to do so without hard information as to the cost. For example, if there was a C-language library available unencumbered by licensing issues which had a memory footprint smaller than say 10k and which ran at I/O speeds, you could reasonably argue that this is a cost effectively equal to zero. On the other hand, if E.U.N. requires a memory footprint of 256K or, worse, understanding and linking to the entire ICU library (blecch), the cost is likely to be unacceptable in a large class of applications.

    There's a normalizer demo at Unicode.org referenced from Appendix D, which suggests that a few hundred lines of Java suffice, but I haven't had time to build the tables or to really think about whether they are being done in the best possible way.

    I think my blockage on this point will be shared by the AC members who will eventually be asked to express an opinion on E.U.N. So I think somebody owes the world the gift of a few quantitative research results on these numbers.

  • Decision: Noted.

    We agree with the sentiment. Refer to some mails by Mark about cost of checking/normalizing. Doing normalization really early (when data is input or converted) is usually very cheap because it can be done by design (e.g. keyboards with dead keys, conversion from a specific legacy encoding).

  • Decision: Noted.

    We agree that this is an important consideration. Please refer to some earlier mails by Mark Davis about cost of checking/normalizing. Doing normalization really early (when data is input or converted) is usually very cheap because it can be done by design (e.g. keyboards with dead keys, conversion from a specific legacy encoding). Normalization is indeed best run at i/o speed, but this should be human input speed rather than network i/o speed.

    A general normalization algorithm needs significantly more than 10KB footprint. But there is quite a wide range of possible tradeoffs between speed and footprint.

    We have added references to implementations and additional material in Appendix D, Resources for Normalization (http://www.w3.org/International/Group/charmod-edit/#sec-n11n-resources). There is also an FAQ at http://www.unicode.org/faq/normalization.html. An implementation that I (MD) did for just *checking* NFC came in under 50KB (in C). Mark reported 110KB for actual normalization to NFC (in Java).

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-24) -- Satisfied

C071ERDTim Bray
-
6Bit-by-bit identity
  • Comment (received 2002-05-30) -- Comments on Character Model

    List item 4. 'Testing for bit-by-bit identity.'

    <pedantry intensity='severe'>This may be the way you do it but I think it's the wrong way to talk about it. The point about Unicode is that it says a character is a thingie identified by number which has a bunch of properties. At the end of the day, what you want people to do is to normalize the data in computer storage to a series of non-negative integers and when testing for equality, if you have two sequences of non-negative integers which are equal in length and pairwise equal in value, then you have equality. It is conceivable in theory that the integer values are stored differently in two parts of the same program; and in practice, who knows what lurks inside a Perl 'scalar', and what really happens when perl processes the '==' operator? So I think that item 4 should say the strings are pairwise numerically equal by code point and leave it at that.</pedantry>

  • Decision: Rejected.

  • Rationale: What actually happens in the various programming languages we know is that they all require care to make sure that the encoding is really the same. There is no C function to automatically compare multibyte and wide-character representations, and so on. We think that it is much better to be too specific to make sure implementers don't forget anything, rather than too abstract.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2003-05-01) -- Dissatisfied
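
    The two views in this exchange can be compared in a minimal sketch: the same text stored in two encodings differs bit by bit, yet is pairwise equal by code point once each copy is decoded with its own encoding. The sample text and variable names are assumptions made for illustration.

```python
text = "\u00e7a"  # U+00E7 LATIN SMALL LETTER C WITH CEDILLA followed by 'a'

# The same text held in two different encodings.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Bit-by-bit comparison across different encodings fails.
same_bytes = utf8_bytes == utf16_bytes

# Decode each byte sequence with its own encoding, then compare code points.
points_a = [ord(ch) for ch in utf8_bytes.decode("utf-8")]
points_b = [ord(ch) for ch in utf16_bytes.decode("utf-16-le")]
same_code_points = points_a == points_b
```

    A byte-level test is therefore only meaningful once both strings are known to be in the same encoding, which is the care the rationale refers to.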

C072SRDTim Bray
-
9Referencing Unicode
  • Comment (received 2002-05-30) -- Comments on Character Model

    [S] Since specifications in general need both a definition for their characters and the semantics associated with these characters, specifications SHOULD include a reference to the Unicode Standard, whether or not they include a reference to ISO/IEC 10646.

    Change SHOULD to MUST. There's no excuse for doing a spec that talks about this stuff without referencing Unicode. Among other things, it's easy to buy the Unicode spec, and the spec is useful; neither of these things are true about the ISO version.

  • Decision: Rejected.

  • Rationale: We do not think that a MUST is appropriate for this matter. Please see our answer to comment C128.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2003-05-01) -- Dissatisfied

C073SASTim Bray
-
3.1.3[S] Protocols, data formats and APIs MUST store, interchange or process text data in logical order
  • Comment (received 2002-05-30) -- Comments on Character Model

    '[S] Protocols, data formats and APIs MUST store, interchange or process text data in logical order' - shouldn't that be [S] [I] - software should do this too? In fact, arguably this should be [S] [I] [C].

  • Decision: Accepted.

  • Our response (sent 2003-05-01) -- Notification

  • Comment (received 2003-05-01) -- Satisfied

C074SPCTim Bray
-
2[S] [I] and [C]
  • Comment (received 2002-05-30) -- Comments on Character Model

    Mind you, it seems that the boundaries between [S] [I] and [C] are pretty fuzzy. If I were editing this thing, I'd just drop the whole notation and rely on getting the normative language right about what must be done, relying on the spec/data/software authors to follow the normative language that reasonably applies to them.

  • Decision: Partially accepted. Add to section 2 some mention of the 'cascade' of responsibilities from [S] to [I] to [C]. [We have not yet made such an edit]

C075EASTim Bray
-
3.1.6Backward octets
  • Comment (received 2002-05-30) -- Comments on Character Model

    There is a problem in the phrase beginning 'also known as octets'... it seems backward; the reason we talk about 'octets' is that some bytes *used to be* non-8-bit; the fact that they're all 8-bit now means that the term 'octet' is probably a bit redundant. Perhaps the wording is correct but my brain obstinately insists on reading it backward so a little editorial cleanup is in order.

  • Decision: Accepted.

  • Our response (sent 2003-05-01) -- Notification

  • Comment (received 2003-05-01) -- Satisfied

C076ERSTim Bray
-
3.7Absence of explicit end delimiters makes Charmod non-compliant
  • Comment (received 2002-05-30) -- Comments on Character Model

    The bullet point beginning '[S] Escape syntax SHOULD either require explicit end delimiters' is fine, but the charmod document itself doesn't actually comply per section 1.3's description of the U+hhhh notation. It might be elegant to cite the containing document as an example of non-compliance :)

  • Decision: Rejected

  • Rationale for 'Rejected': See below

  • Our response (sent 2002-06-18) -- Re: Absence of explicit end delimiters makes Charmod non-compliant

    [...] The U+hhhh notation does not appear to have any delimiters, but because U+hhhh is used as a word in free-flowing text, the usual word delimiters (space, punctuation) function as delimiters. Also, U+hhhh is not actually any escape syntax, because it is not intended to stand in directly for a character, but to talk about a character on a meta-level. In both aspects, this is similar to character names (e.g. LATIN UPPER CASE LETTER A).

    Please tell us whether you are satisfied or not with this decision at your earliest convenience.

  • Comment (received 2002-06-18) -- Re: Absence of explicit end delimiters makes Charmod non-compliant

    Of course, thank you.
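
    For readers unfamiliar with the requirement under discussion, the following sketch shows an escape that carries an explicit end delimiter. It uses an HTML numeric character reference, decoded with Python's standard `html.unescape`; a fixed-length Python `\uXXXX` literal is shown for contrast. Both choices are assumptions made for illustration and are not taken from CharMod.

```python
import html

# Numeric character reference: '&#x' starts the escape, ';' explicitly ends it,
# so the following letters "cole" cannot be mistaken for more hex digits.
with_delimiter = html.unescape("&#xE9;cole")

# Fixed-length escape: Python's \u form always takes exactly four hex digits,
# so no end delimiter is needed before "cole".
fixed_length = "\u00e9cole"
```

    Both forms yield the same six-character string; the U+hhhh notation discussed above differs in that it names a character rather than standing in for it.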

C077EPSTim Bray
-
4.2.2Bold legacy encoding
  • Comment (received 2002-05-30) -- Comments on Character Model

    [L]ist item '2' uses the term 'legacy encoding', since it's defined shouldn't it be in bold?

  • Decision: Partially accepted.

  • Rationale for 'Partially accepted': This is not the first instance of the term.

  • Decision: Add a link to the definition in section 1.2.

  • Our response (sent 2003-05-01) -- Notification

  • Comment (received 2003-05-01) -- Satisfied

C078EASTim Bray
-
4.2.2or the absence thereof
  • Comment (received 2002-05-30) -- Comments on Character Model

    [In] 4.2.2 (second NOTE) [and] 4.2.3 (first NOTE) the phrase '(or the absence thereof)' baffles me no matter how many times I read it... please clarify a bit.

  • Decision: Accepted.

  • Our response (sent 2003-05-01) -- Notification

  • Comment (received 2003-05-01) -- Satisfied

C079ERSTim Bray
-
4.4'[C] In order to conform to this specification, all text content on the web MUST ...'
  • Comment (received 2002-05-30) -- Comments on Character Model

    '[C] In order to conform to this specification, all text content on the web MUST...' er, shouldn't this be [I] as well, since a lot of that content is produced by software? But see my comment to 3.1.3 above [since split into C073 and C074].

  • Decision: Rejected.

  • Rationale: Covered by other requirements.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-24) -- Satisfied

C080EANYin Leng Husband
WSArch WG
1.1'must conform to these provisions'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Goals and Scope, last paragraph

    'Since other W3C specifications will be based on some of the provisions of this document, without repeating them, software developers implementing W3C specifications must conform to these provisions.'

    Unclear what 'these provisions' (end of sentence) are since the first part of the sentence refers to only 'some of the provisions'. That is, should software developers implementing W3C specifications conform to some or all of these provisions?

  • Decision: Accepted

  • Decision: Remove the sentence.

  • Our response (sent 2003-02-17) -- Notification

C081EANYin Leng Husband
WSArch WG
1.2'covers the widest possible range'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Background, 3rd paragraph, 2nd bullet

    'covers the widest possible range,'

    Unicode covers the widest possible range of what? Characters? Languages? Scripts? Writing notations?

  • Decision: Accepted

  • Decision: Remove this bullet.

  • Our response (sent 2003-02-17) -- Notification

C082EANYin Leng Husband
WSArch WG
1.2'a way of referencing characters independent of the encoding of a resource'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Background, 3rd paragraph, 3rd bullet

    'provides a way of referencing characters independent of the encoding of a resource,'

    Unclear what the 'resource' is. What is the relationship between the characters being referenced and the 'resource'?

    Is this the intent? - 'provides a way to reference characters independent of the encoding of the characters,'

  • Decision: Accepted

  • Decision: Change 'a resource' to 'the text'.

  • Our response (sent 2003-02-17) -- Notification

C083EANYin Leng Husband
WSArch WG
1.2'Unicode now serves as a common reference'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Background, 4th paragraph, last sentence

    'Unicode now serves as a common reference for W3C specifications and applications.'

    Unclear what sort of 'reference' is meant.

    Is this the intent? - 'Unicode now serves as a common reference character set for W3C specifications and applications'

  • Decision: Accepted

  • Decision: Make the sentence: 'Unicode now serves as a common reference for W3C specifications and applications.' more like the previous two sentences. Additionally, remove 'common reference'.

  • Our response (sent 2003-02-17) -- Notification

C084ERSYin Leng Husband
WSArch WG
1.2'Use of control codes for various purposes'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Background, 8th paragraph, last bullet

    'Use of control codes for various purposes (e.g. bidirectionality control, symmetric swapping, etc.).'

    It would be useful to have links to reference material that explain the issues.

    E.g. 'Use of control codes for various purposes (e.g. bidirectionality control [Unicode Standard 13.2], symmetric swapping [Unicode Standard 13.3], etc.).'

  • Decision: Rejected

  • Rationale for 'Rejected': This is a Background section and the level of detail proposed would be out of place.

  • Our response (sent 2003-02-17) -- Notification

  • Comment (received 2003-02-18) -- Satisfied

C085EANYin Leng Husband
WSArch WG
1.2 'such properties'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Background, 9th paragraph, 1st sentence

    'It should be noted that such properties also exist in legacy encodings (where legacy encoding is taken to mean any character encoding not based on Unicode), and in many cases have been inherited by Unicode in one way or another from such legacy encodings.'

    Unclear what 'such properties' are. The previous sentence talks about 'aspects of Unicode' with no mention of 'properties'.

    Is this the intent? - 'It should be noted that such aspects also exist in legacy encodings (where legacy encoding is taken to mean any character encoding not based on Unicode), and in many cases have been inherited by Unicode in one way or another from such legacy encodings.'

  • Decision: Accepted

  • Our response (sent 2003-02-17) -- Notification

C086EANYin Leng Husband
WSArch WG
2Inconsistent usage of term 'requirements'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Conformance, 1st NOTE, 1st sentence

    'RFC 2119 makes it clear that requirements that use SHOULD are not optional ...'

    Inconsistent usage of term 'requirements'. The first paragraph of this Conformance section makes a distinction between 'requirements' and 'recommendations'. It says that 'requirements are expressed using the key words 'MUST', ... etc.'. This NOTE talks of 'requirements that use SHOULD ...'

  • Decision: Accepted

  • Decision: Replace our 1st para with this para from RFC 2119:

    The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and 'OPTIONAL' in this document are to be interpreted as described in RFC 2119.

  • Our response (sent 2003-02-17) -- Notification

C087NaNaSYin Leng Husband
WSArch WG
2How will conformance be enforced?
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Conformance, 3rd Paragraph, last sentence

    '[S] [I] [C] In order to conform to this document, specifications MUST NOT violate any requirements preceded by [S], software MUST NOT violate any requirements preceded by [I], and content MUST NOT violate any requirements preceded by [C].'

    How will conformance be enforced? Are the conformance requirements in this document testable for violations?

  • Decision: Not applicable

  • Rationale: We have classified this as 'Not applicable', because you have asked questions rather than suggesting changes to the document. Our answers are as follows:

  • Decision:

    Q: How will conformance be enforced?

    A: Through the usual W3C Process.

    Q: Are the conformance requirements in this document testable for violations?

    A: Because this is an architectural specification, it is not possible to test the requirements automatically. The conformance requirements are testable by human beings. For some specific [S], [I] and [C] it is possible to write automated tests for some of the requirements in some contexts (such as a specific specification).

  • Comment (received 2004-02-07) -- Satisfied.

C088SASYin Leng Husband
WSArch WG
2'MUST conform' vs 'SHOULD be modified in order to conform'
  • See also the following comments: C038 C051 C089 C135

  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Conformance, 5th Paragraph, 1st sentence

    '[S] If an existing W3C specification does not conform to the requirements in this document, then the next version of that specification SHOULD be modified in order to conform'

    This lowered (to SHOULD) conformance requirement seems to contradict that in the preceding paragraph which states that '[S] Every W3C specification MUST conform to the requirements applicable to specifications, ...'

  • Decision: Accepted

    You point out a clear inconsistency, which we fixed a while ago. We were later told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and we removed the relevant language altogether. We have been instructed to request a finding from the TAG corresponding to the text that we removed. We will make sure that, if relevant, the inconsistency you pointed out will not reappear.

  • Comment (received 2004-02-07) -- Satisfied.

C089EASYin Leng Husband
WSArch WG
2How is 'the next version of that specification [to] be modified in order to conform'?
  • See also the following comments: C038 C051 C088 C135

  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Conformance, 5th Paragraph, 1st sentence

    '[S] If an existing W3C specification does not conform to the requirements in this document, then the next version of that specification SHOULD be modified in order to conform'

    Current wording says that in order to conform, the next version is to be modified, i.e. without stating nature of modification.

    Is this the intent? - '[S] If an existing W3C specification does not conform to the requirements in this document, then the next version of that specification SHOULD be modified so that it then becomes conformant.'

  • Decision: Accepted.

    You point out a clear inconsistency, which we fixed a while ago. We were later told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and we removed the relevant language altogether. We have been instructed to request a finding from the TAG corresponding to the text that we removed. We will make sure that, if relevant, the inconsistency you pointed out will not reappear.

  • Comment (received 2004-02-07) -- Satisfied.

C090EANYin Leng Husband
WSArch WG
2Way unclear
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Conformance, 6th Paragraph, last sentence

    '[I] Where this specification contains a procedural description, it MUST be understood as a way to specify the desired external behavior. Implementations MAY use other ways of achieving the same results, as long as observable behavior is not affected.'

    'way' in the first sentence refers to 'a way to specify' whereas in the second sentence, the 'other ways' are 'ways of achieving' what is specified. Also current wording 'as long as observable behavior is not affected' is probably not the correct requirement.

    Is this the intent? - '[I] Where this specification contains a procedural description, it MUST be understood as a way to specify the desired external behavior. Implementations MAY use different means of achieving the same results, as long as observable behavior is as described.'

  • Decision: Accepted

  • Our response (sent 2003-02-17) -- Notification

C091ENaSYin Leng Husband
WSArch WG
3.1.1Would be helpful to define 'featural syllabary'
  • See also the following comments: C093

  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Introduction, 2nd EXAMPLE, 1st sentence

    'Korean Hangul is a featural syllabary ...'

    Would be helpful to define 'featural syllabary' and explain distinction between a 'syllabary' and 'featural syllabary'. The 1st and 2nd examples give the impression that the distinction is in arranging 'into square syllabic blocks'.

  • Decision: Not Applicable.

  • Rationale for 'Not Applicable': We decided to simplify the text by removing definitions such as abugida, abjad, etc.

  • Our response (sent 2003-02-17) -- Notification

  • Comment (received 2003-02-18) -- Satisfied

C092EPSYin Leng Husband
WSArch WG
3.1.1'combines symbols for individual sounds of the language'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Introduction, 2nd EXAMPLE, 1st sentence

    '... that combines symbols for individual sounds of the language ...'

    Are these 'individual sounds of the language' phonemes or syllables?

    Is this the intent? - '... that combines symbols for individual phonemes [or syllables] of the language ...'

  • Decision: Partially accepted

  • Rationale for 'Partially accepted': This section (3.1.1) is introductory. We don't want to use the term 'phoneme' before the next section (3.1.2), where it is introduced.

  • Decision: Change 'into square syllabic blocks' to 'into square blocks, each of which represents a syllable'.

  • Our response (sent 2003-02-17) -- Notification

  • Comment (received 2003-02-18) -- Satisfied

C093ENaSYin Leng Husband
WSArch WG
3.1.1'Indic scripts are abugidas'
  • See also the following comments: C006

  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Introduction, 3rd EXAMPLE, 1st sentence

    'Indic scripts are abugidas.'

    Would be helpful to indicate definition of 'abugidas' explicitly. E.g. 'Indic scripts are abugidas where each consonant letter carries an inherent vowel that is eliminated or replaced using semi-regular or irregular ways to combine consonants and vowels into clusters.'

  • Decision: Not Applicable.

  • Rationale for 'Not Applicable': We decided to simplify the text by removing definitions such as abugida, abjad, etc.

  • Our response (sent 2003-02-17) -- Notification

  • Comment (received 2003-02-18) -- Satisfied

C094ENaSYin Leng Husband
WSArch WG
3.1.1'Arabic script is an example of an abjad'
  • See also the following comments: C093

  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Introduction, 4th EXAMPLE, 1st sentence

    'Arabic script is an example of an abjad.'

    Would be helpful to indicate definition of 'abjad' explicitly. E.g. 'Arabic script is an example of an abjad where short vowel sounds are typically not written at all.'

  • Decision: Not Applicable.

  • Rationale for 'Not Applicable': We decided to simplify the text by removing definitions such as abugida, abjad, etc.

  • Our response (sent 2003-02-17) -- Notification

  • Comment (received 2003-02-18) -- Satisfied

C095EANYin Leng Husband
WSArch WG
3.1.1'Usages'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Introduction, 2nd last paragraph, 1st sentence

    'The developers of W3C specifications, and the developers of software based on those specifications, are likely to be more familiar with usages they have experienced and less familiar with the wide variety of usages in an international context.'

    In both instances of 'usages', it is unclear 'usages' of what are intended.

  • Decision: Accepted

  • Decision: Clarify the meaning of 'usages'.

  • Our response (sent 2003-02-17) -- Notification

C096EANYin Leng Husband
WSArch WG
3.1.3'Characters' vs 'character codes'
  • See also the following comments: C026

  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Units of visual rendering, 3rd paragraph, 1st sentence

    '[S] [I] Specifications and software MUST NOT assume a one-to-one mapping between character codes and units of displayed text.'

    Inconsistency issue? This sentence speaks of mapping between 'character codes' whereas the third sentence of the first paragraph of 3.1.3 (There is not a one-to-one correspondence between characters and glyphs) speaks of mapping between 'characters', not 'character codes'. Also, in all the other 3.1.x sections, the [S][I] requirements are about non one-to-one correspondence between 'characters', not 'character codes'.

  • Decision: Accepted

  • Our response (sent 2003-02-17) -- Notification
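
The requirement quoted in this comment can be illustrated with a minimal sketch, assuming Python's standard `unicodedata` module. The helper `count_display_units` is hypothetical: it only attaches combining marks to the preceding base character, which is much simpler than real text segmentation, and it says nothing about glyph selection or ligatures.

```python
import unicodedata

def count_display_units(text: str) -> int:
    """Rough count: each character with canonical combining class 0 starts a new unit.

    This is an illustrative simplification, not a grapheme cluster algorithm.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

s = "c\u0327"  # LATIN SMALL LETTER C followed by COMBINING CEDILLA
code_points = len(s)            # number of characters in the string
units = count_display_units(s)  # number of base-plus-marks units
```

Here the string holds two characters but is normally displayed as a single unit, so code that assumes one character per displayed unit would miscount it.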

C097EANYin Leng Husband
WSArch WG
3.1.3Logical order
  • See also the following comments: C036

  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Units of visual rendering, 5th paragraph, 3rd sentence

    'The Unicode Standard [Unicode] requires that characters be stored and interchanged in logical order.'

    Would be helpful to define 'logical order' or to provide link to reference material such as Unicode Standard, Section 2.2 where it is defined.

  • Decision: Accepted

  • Our response (sent 2003-02-17) -- Notification

C098EANYin Leng Husband
WSArch WG
3.1.5 'In Thai the sequence U+0E44 U+0E01 must be sorted as if it was written U+0E01 U+0E44.'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Units of collation, 5th EXAMPLE, 1st sentence

    'In Thai the sequence U+0E44 U+0E01 must be sorted as if it was written U+0E01 U+0E44.'

    Would be helpful to show the actual glyphs for U+0E44 and U+0E01.

  • Decision: Accepted

  • Our response (sent 2003-02-17) -- Notification
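
The Thai example in this comment can be sketched as a sort key that swaps a prefix vowel with the consonant that follows it. This is only an illustration under stated assumptions: the function name `thai_reorder_key` is hypothetical, the vowel range U+0E40 to U+0E44 and the consonant range U+0E01 to U+0E2E are taken from the Unicode Thai block, and a real implementation would rely on a full collation algorithm rather than code point comparison.

```python
# Thai vowels that are written before the consonant they follow in pronunciation.
PREFIX_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}

def is_thai_consonant(ch: str) -> bool:
    return "\u0E01" <= ch <= "\u0E2E"

def thai_reorder_key(s: str) -> str:
    """Swap each prefix vowel with the consonant that immediately follows it."""
    chars = list(s)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREFIX_VOWELS and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return "".join(chars)

words = ["\u0E02\u0E32", "\u0E44\u0E01"]
by_code_point = sorted(words)
by_reordered_key = sorted(words, key=thai_reorder_key)
```

With plain code point order the word beginning with U+0E44 sorts last; with the reordering key it is compared as U+0E01 U+0E44 and sorts first.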

C099ENaSYin Leng Husband
WSArch WG
3.1.7'Character' and 'text' are defined circularly
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Summary, 1st paragraph, 2nd and 3rd sentences

    'In the context of the digital representations of text, a character can be defined informally as a small logical unit of text. Text is then defined as sequences of characters.'

    'Character' and 'text' are defined circularly.

  • Decision: Not applicable

  • Rationale for 'Not applicable': The circularity is intentional. The para in question says that the definition is informal.

  • Our response (sent 2003-02-17) -- Notification

  • Comment (received 2003-02-18) -- Satisfied

C100SRSYin Leng Husband
WSArch WG
3.6.2UTF-8 or UTF-16 as a default encoding form
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character encoding identification, 9th paragraph, 2nd sentence

    '[S] Specifications MAY define either UTF-8 or UTF-16 as a default encoding form (or both if they define suitable means of distinguishing them), but they MUST NOT use any other character encoding as a default.'

    Since specifications 'MUST NOT use any other character encoding as a default' other than 'either UTF-8 or UTF-16' should the beginning of the sentence be '[S] Specifications MUST define either UTF-8 or UTF-16 as a default encoding form... ' ?

  • Decision: Rejected.

  • Rationale for 'Rejected': The provision of defaults is not compulsory.

  • Our response (sent 2003-02-17) -- Notification

  • Comment (received 2003-02-18) -- Satisfied
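
The quoted requirement allows both UTF-8 and UTF-16 as defaults only if there is a suitable means of distinguishing them. One possible means, assumed here purely for illustration, is a byte order mark at the start of the data. The function `decode_with_default` is hypothetical and is not taken from the specification.

```python
import codecs

def decode_with_default(data: bytes) -> str:
    """Decode as UTF-16 when a UTF-16 byte order mark is present, else as UTF-8."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")   # the codec consumes the byte order mark
    return data.decode("utf-8-sig")    # strips an optional UTF-8 signature

from_utf16 = decode_with_default("\u00E9".encode("utf-16"))
from_utf8 = decode_with_default("\u00E9".encode("utf-8"))
```

The check is deterministic rather than a guess based on content, which is why it is shown here as a candidate distinguishing mechanism and not as a heuristic.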

C101NaNaOYin Leng Husband
WSArch WG
3.6.2'Specifications MUST NOT propose the use of heuristics to determine the encoding of data'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character encoding identification, 9th paragraph, last sentence

    '[S] Specifications MUST NOT propose the use of heuristics to determine the encoding of data.'

    It would be helpful to either give examples of the undesirable 'heuristics' or the reasons for banning 'use of heuristics'. Would the absence of a BOM in UTF-8 encoding be considered use of heuristics for identifying encoding?

  • This comment has been split into the following comments: C133 C134

C102SANYin Leng Husband
WSArch WG
3.6.2'On interfaces to other protocols, software SHOULD support conversion'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character encoding identification, 12th paragraph, last sentence

    '[I] On interfaces to other protocols, software SHOULD support conversion ...'

    In the phrase 'to other protocols', which is the base protocol that the 'other protocols' are being distinguished from?

    Is this the intent? - '[I] On interfaces to protocols, software SHOULD support conversion ...'

  • Decision: Accepted.

  • Decision: Remove the sentence.

  • Our response (sent 2003-02-17) -- Notification

C103SANYin Leng Husband
WSArch WG
3.6.2'between Unicode encoding forms' or 'to Unicode encoding forms' or both?
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character encoding identification, 12th paragraph, last sentence

    '[I] On interfaces to other protocols, software SHOULD support conversion between Unicode encoding forms as well as any other necessary conversions.'

    Should it be 'between Unicode encoding forms' or 'to Unicode encoding forms' or 'both between and to Unicode encoding forms'?

  • Decision: Accepted.

  • Decision: Remove the sentence.

  • Our response (sent 2003-02-17) -- Notification

C104EASYin Leng Husband
WSArch WG
3.7'instances of the language' vs 'the language'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character Escaping, 1st paragraph, 3rd sentence

    'There is also a need, often satisfied by the same or similar mechanisms, to express characters not directly representable in the character encoding of instances of the language.'

    Why 'instances of the language' and not just 'the language' ?

  • Decision: Not applicable

  • Rationale for 'Not applicable': Languages don't have character encodings inherently associated with them. Language instances do.

  • Comment (received 2003-02-18) -- Re: Your comments on the Character Model [C080-C086, C090-C100, C102-C105, C107-C111]

    It is basically my own lack of understanding of the rationale, i.e. why don't languages have inherently associated character encodings whereas language instances do. It boils down to not knowing the difference between a language and a language instance.

    I suggest that additional explanation or examples be given for the sentence in question.

  • Decision: Accepted. We understand the concern.

    Action: Text has been reworded 'There is also a need, often satisfied by the same or similar mechanisms, to express characters not directly representable in the character encoding chosen for a particular document or program (an instance of the markup or programming language).'

  • Our response (sent 2003-02-17) -- Notification

  • Comment (received 2003-02-18) -- Your comments on the Character Model [C080-C086, C090-C100, C102-C105, C107-C111]

  • Our response (sent 2003-05-07) -- RE: Your comments on the Character Model [C080-C086, C090-C100, C102-C105, C107-C111]

  • Comment (received 2003-06-23) -- Satisfied
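
The reworded sentence can be illustrated with a short sketch: a document instance encoded in US-ASCII cannot hold 'ç' directly, so the character is expressed with a numeric character reference instead. The example assumes Python's built-in `xmlcharrefreplace` error handler; the sample word is arbitrary.

```python
text = "fa\u00E7ade"  # contains U+00E7, which US-ASCII cannot represent

# Characters outside the chosen encoding are written as decimal character references.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
```

The markup language is the same in both cases; only the encoding chosen for this particular instance forces the escape.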

C105EANYin Leng Husband
WSArch WG
3.7'a language's syntax, which is itself expressed as characters represented at the character encoding level'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character Escaping, 1st paragraph, last sentence

    ' ... a language's syntax, which is itself expressed as characters represented at the character encoding level.'

    Why is a language's syntax expressed as characters 'represented at the character encoding level' and not just as characters in the sense of abstract symbols?

  • Decision: Accepted

  • Our response (sent 2003-02-17) -- Notification

C106EASYin Leng Husband
WSArch WG
3.7'Escape syntaxes where the end is determined by a character outside the set of characters admissible in the character escape itself SHOULD be avoided'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character Escaping, 4th [S] requirement, 2nd and last sentences

    'Escape syntaxes where the end is determined by a character outside the set of characters admissible in the character escape itself SHOULD be avoided. ... Forms like SPREAD's &UABCD; [SPREAD] or XML's &#xhhhh;, where the character escape is explicitly terminated by a semicolon, are much better.'

    The examples of good forms ('where the character escape is explicitly terminated by a semicolon') in the last sentence seem to exhibit the characteristics ('where the end is determined by a character outside the set of characters admissible in the character escape itself') of escape syntaxes that SHOULD be avoided.

  • Decision: Accepted

  • We have replaced "Escape syntaxes where the end is determined by a character outside the set of characters admissible in the character escape itself SHOULD be avoided." with "Escape syntaxes where the end is determined by any character outside the set of characters admissible in the character escape itself SHOULD be avoided." Although this change is minimal, it should now be clear that this refers to cases where almost any arbitrary character can terminate an escape. Strictly speaking, the ';' in the examples is part of the escape (part of the text that gets replaced), whereas in other cases the terminating character itself is not replaced (old octal notations often work that way).

  • Comment (received 2004-02-07) -- Satisfied.
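
The difference between the two kinds of escape syntax can be sketched as follows. The `&#xhhhh;` form is the XML syntax quoted in the comment; the `~x` form is a hypothetical syntax invented here to stand in for escapes that end at the first non-hexadecimal character.

```python
import re

TERMINATED = re.compile(r"&#x([0-9A-Fa-f]+);")   # explicit ';' is part of the escape
UNTERMINATED = re.compile(r"~x([0-9A-Fa-f]+)")   # hypothetical: ends at any non-hex character

def expand(pattern: re.Pattern, text: str) -> str:
    """Replace each escape matched by the pattern with the character it names."""
    return pattern.sub(lambda m: chr(int(m.group(1), 16)), text)

terminated_result = expand(TERMINATED, "&#xE9;ab")    # U+00E9 followed by "ab"
unterminated_result = expand(UNTERMINATED, "~xE9ab")  # "ab" is swallowed as hex digits
```

In the unterminated form, the letters that follow the intended escape happen to be valid hexadecimal digits, so the parser reads a different character than the author meant.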

C107EANYin Leng Husband
WSArch WG
3.7'Escaped characters SHOULD be acceptable wherever unescaped characters are'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character Escaping, 6th [S] requirement, 1st sentence

    '[S] Escaped characters SHOULD be acceptable wherever unescaped characters are; ...'

    What are 'unescaped characters'? Any character not expressed in the escaping mechanism? Seems to say that escaped characters SHOULD be acceptable wherever a character is acceptable (since a character normally is not expressed in the escaping mechanism).

    Is this the intent? - '[S] Escaped characters SHOULD be acceptable wherever their unescaped forms are; ...'

  • Decision: Accepted

  • Decision: Clarify the sentence

  • Our response (sent 2003-02-17) -- Notification

C108EANYin Leng Husband
WSArch WG
3.7'escaped characters SHOULD be acceptable in identifiers and comments'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character Escaping, 6th [S] requirement, last sentence

    'In particular, escaped characters SHOULD be acceptable in identifiers and comments...'

    What if the identifier syntax is defined to be of a set that does not include the character which is escaped?

  • Decision: Accepted

  • Decision: Clarify the sentence

  • Our response (sent 2003-02-17) -- Notification

C109EANYin Leng Husband
WSArch WG
4.2.3'Many languages will benefit from defining more boundaries'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Fully-normalized text, 5th paragraph, last sentence

    'Many languages will benefit from defining more boundaries...'

    It would be helpful to give examples of the 'more boundaries'.

  • Decision: Accepted

  • Decision: Clarify the sentence

  • Our response (sent 2003-02-17) -- Notification

C110EANYin Leng Husband
WSArch WG
4.3.1Missing 'nor' clause
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    General Examples, 3rd paragraph, 1st sentence

    'The string suc¸on (U+0073 U+0075 U+0063 U+0327 U+006F U+006E), where U+0327 is the COMBINING CEDILLA, encoded in a Unicode encoding form, is neither ...'

    The string ... 'is not ...' because there is no 'nor' alternative.

  • Decision: Accepted

  • Our response (sent 2003-02-17) -- Notification

C111EANYin Leng Husband
WSArch WG
4.3.1'not include-normalized' vs 'not Unicode-normalized'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    General Examples, 5th paragraph, 1st sentence

    '...the string suc¸on (U+0073 U+0075 U+0063 U+0327 U+006F U+006E) which is not include-normalized ('c¸' is replaceable by 'ç').'

    Should it be this? - '...the string suc¸on (U+0073 U+0075 U+0063 U+0327 U+006F U+006E) which is not Unicode-normalized ('c¸' is replaceable by 'ç').'

  • Decision: Accepted

  • Our response (sent 2003-02-17) -- Notification
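
The string discussed in C110 and C111 can be checked with a minimal sketch, assuming that 'Unicode-normalized' refers to Unicode Normalization Form C and using Python's standard `unicodedata` module.

```python
import unicodedata

# s, u, c, COMBINING CEDILLA, o, n
decomposed = "\u0073\u0075\u0063\u0327\u006F\u006E"

# NFC replaces 'c' + U+0327 with the precomposed U+00E7.
composed = unicodedata.normalize("NFC", decomposed)

is_already_normalized = (decomposed == composed)
```

The comparison is false for this string, which matches the statement that the sequence 'c' plus combining cedilla is replaceable by 'ç'.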

C112NaNaOKarl Dubost
QA WG
VariousQA Review for Charmod
C113NaNaOMark Scardina
XSL WG
VariousXSL WG Comments on Character Model WD
C114SPSNorman Walsh
TAG
3.6Specifications SHOULD NOT add rules for character encoding beyond what is provided in XML
  • See also the following comments: C068

  • Comment (received 2002-06-04) -- TAG comments on Character Model for the World Wide Web 1.0

    We believe that specifications SHOULD NOT add rules for character encoding beyond what is provided in XML. They MUST NOT restrict character sets beyond what XML allows. In other words, the TAG disagrees with the current wording of the recommendation at the beginning of section 3.6 that says a specification should mandate a unique encoding. We believe a specification must not mandate a single encoding to the exclusion of UTF8/16.

    For some machine-to-machine routing protocol, we accept that restricting the encoding to UTF8/16 would be acceptable. But for specifications designed for editing by humans (such as MathML), we believe that this restriction should not be imposed.

  • Decision: Partially accepted.

  • Decision: Clarify our intent (When building on top of a pre-existing spec such as XML, this is a good enough reason to 'escape' the SHOULD).

  • Additional comments: It is unclear whether the comment is trying to address only XML, or is more general. It mentions XML several times, but is worded as if it may also apply to things outside XML. We think that having a single encoding can be beneficial in many cases, and that XML on this point should not restrict things outside XML. We think that for XML, the considerations given in the comment (protocol vs. document) are important.

    We have added the following text:

    >>>> [S] When basing a protocol, format, or API on a protocol, format, or API that already has rules for character encoding, specifications SHOULD use rather than change these rules.

    EXAMPLE: An XML-based format should use the existing XML rules for choosing and determining the character encoding of external entities, rather than invent new ones. >>>>

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-19) -- Satisfied. TBray: Might be worthwhile to point to RFC3470.
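
The EXAMPLE in the added text can be illustrated with a sketch in which an XML-based format leaves encoding detection to the XML parser. It assumes Python's standard `xml.etree.ElementTree`; the element name `p` is arbitrary.

```python
import xml.etree.ElementTree as ET

# The encoding declaration tells the XML parser how to decode the bytes.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>\xe9</p>'

# Without a declaration or byte order mark, XML rules default to UTF-8.
utf8_doc = b"<p>\xc3\xa9</p>"

latin1_text = ET.fromstring(latin1_doc).text
utf8_text = ET.fromstring(utf8_doc).text
```

Both documents yield the same character without the format defining any encoding rules of its own.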

C115NaNaOChris Lilley
TAG
VariousTAG comments on Character Model for the World Wide Web 1.0
C116ERNChris Lilley
TAG
VariousNumbered conformance requirements
C117EPSChris Lilley
TAG
VariousThe use, within the spec, of images of characters
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    Please at least link to an accessible representation of 'foreign' characters rather than merely providing raster images of them. The text of this specification does not conform to itself, since it uses bytes (pixels) to represent Unicode characters. It's also less than optimal wrt WAI guidelines. Appendix B is a lot better. But if the concern is to ensure correct rendering on legacy browsers, at least provide a link to the actual Unicode sample, as characters and markup.

  • Decision: Partially accepted.

  • Rationale for 'Partially accepted': We have carefully reexamined the use of images, character numbers (U+...), character names, and actual characters, and made some corrections.

    We have based the choice of which means to use in each case on the amount of general support for the characters in question (Latin-1 being supported from the start of the Web, whereas Plane 2 is not yet widely available anywhere), and on the importance of visual, logical, or numerical information for the point being made, and have tried to make sure that there are two or more means of representation where appropriate.

    We would like to point out that to some extent, we have to deal with a bootstrap problem. As an example, both the Unicode Standard and the SVG spec use bitmap images as a way to 'ground' one technology in another.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-19) -- Satisfied.

  • Comment (received 2004-01-26) -- Chris Lilley: In general you went further than I expected, but please provide text that can be linked to for remaining text in bitmaps.

  • We have added an Appendix containing the text. You can link to the appropriate text by clicking on an image. This is explained in chapter 1.

C118SPSChris Lilley
TAG
VariousXML 1.0 and 1.1 are non conforming
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    Much of this document is a statement of existing good design practice. Many existing W3C specifications implement large parts of it. This is good. Care should be taken with MUSTs which make W3C Recs non-conforming. For example, XML 1.0 and 1.1 are non conforming.

  • Decision: Partially accepted.

  • Decision: Attempt to clarify terminology such as 'conforming'; Improve text about code points in section 3.5.

  • Rationale for 'Partially accepted': We have attempted to clarify terminology such as "conforming" (i.e. to indicate that preexisting technology only 'SHOULD' conform even when new technology 'MUST'; but this is now to some extent obsolete due to the fact that the application of Charmod to other specs will not be defined by Charmod itself, but rather by a TAG finding (we hope)).

    We have improved text in various instances where we thought that there might be a problem. We never had the intention to make XML 1.0 or XML 1.1 non-conforming. We would be very glad to reexamine and fix any specific instance where you think that we (still) are saying that XML is not conforming if you can point out such specific instances to us.

    On the other hand, we wrote Charmod so that it not only applies to XML, but also to other, potentially new formats. We therefore tried to make sure to indicate best practice for such cases even if these might not always be exactly the same as what XML (to quite some extent for historical reasons) is doing. A typical example would be the use of both decimal and hexadecimal escape syntaxes in HTML and XML.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-19) -- Satisfied.
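
The coexistence of decimal and hexadecimal escape syntaxes mentioned in the rationale can be seen in a small sketch, assuming Python's standard `html` and `xml.etree.ElementTree` modules.

```python
import html
import xml.etree.ElementTree as ET

# HTML: both references name U+00E9.
html_decimal = html.unescape("&#233;")
html_hex = html.unescape("&#xE9;")

# XML: the parser resolves both forms to the same character.
xml_text = ET.fromstring("<p>&#233;&#xE9;</p>").text
```

Two spellings for the same character are harmless for a parser but give authors and tools one more variation to handle.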

C119SANChris Lilley
TAG
VariousSplit the document in two
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    These sections (collectively 'character 101'):

    3 Characters.

    5 Compatibility and Formatting Characters.

    6 String Identity Matching.

    7 String Indexing

    9 Referencing the Unicode Standard and ISO/IEC 10646

    taken as a group, are great, in general, and should be collected together with appropriate introductory and reference material as a separate document and moved to Proposed Rec once it exits Last Call. There is already a large body of existing implementation of these concepts in W3C Recs.

    Section 4 Early Uniform Normalization is very important, but affects a lot of specifications and needs, I believe, a CR period as does section 8 Character Encoding in URI References (perhaps - there is some experience for the latter). Thus, I suggest splitting the document so that sections 4 and 8 can move to CR without delaying sections 3,5,6,7,9 which are needed as a Rec ASAP!

  • Original decision: Rejected.

  • Rationale: The proposed approach would result in a lot more work, as all the chapters have been written as part of a single document. Note also that other chapters (including 6 String Identity Matching and 7 String Indexing) would have to be dropped from such an early document, as both depend on Early Uniform Normalization. This would, effectively, leave only chapters 3 Characters and 9 Referencing the Unicode Standard and ISO/IEC 10646 (chapter 5 Compatibility and Formatting Characters consists mainly of a link to an external document). The comment overstates the urgency of getting chapters such as 3 Characters to REC status.

  • New decision: Accepted.

    Rationale: We originally rejected this comment, but we have recently re-examined it, and we are putting together a plan for splitting the document. The sections on string indexing and on string matching depend to a certain extent on normalization, and so we are not completely sure of the final structure of the document at this stage.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-19) -- TAG is [very] happy with the WG's disposition re: comment C119. RF: Note that it hasn't been split yet. TB, TBL, RF: Leave open until this is actually done.

  • Split is now effective.

C120SPSChris Lilley
TAG
3.1.5Remove parts dealing with collation and sorting
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    The portions about collation and sorting (for example 3.1.5 Units of collation) are sparse, vague, and anecdotal which contrasts strangely with the MUSTs; this section should be removed and returned for further work to produce a separate architectural specification on collation that has crisp, well thought out conformance criteria. The maturity of the collation parts does not match that of the 'character 101', normalization and URI reference parts.

  • Decision: Partially accepted.

  • Rationale for 'Partially accepted': We have modified the normative statements (changing from 'MUST' to 'SHOULD' and some wording changes). We disagree that the section on collation/sorting does not match the maturity of the other sections.

    In the context of Section 3.1, Perceptions of Characters, the fact that units of collation are different from other units, and the various issues, are important and well established. The text as well as the examples have been carefully chosen to show the range of phenomena. We do not see the need for a separate architectural document on collation and related issues; there are already an ISO standard and a Unicode Technical Standard, as well as many implementations, for user-oriented sorting/collation.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-26) -- Satisfied

C121NaNaOChris Lilley
TAG
3.1.3Units of visual rendering
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    'Logical selection looks like this:'

    There should be a requirement after that

    [S][I] Specifications of protocols and APIs that involve selection of ranges MUST provide for contiguous logical selections.

    Having defined the terms 'logical selection mode' and 'visual selection mode', please use them rather than the highly ambiguous 'discontiguous selections' and 'contiguous selections', so in fact that should be

    [S][I] Specifications of protocols and APIs that involve selection of ranges MUST provide for text selection in logical selection mode.

    Also, should there not be something about copying that selection and pasting it somewhere else, that what you get is the logical selection?

    Similarly in the next part, I suggest rewording to remove the ambiguous phrase:

    [S] Specifications of protocols and APIs that involve selection of ranges SHOULD provide for text selection in logical selection mode, at least to the extent necessary to support implementation of visual selection on screen on top of those protocols and APIs.

    It's not clear that this is such a strong requirement and it complicates processing, especially on handheld devices. Perhaps weaken to MAY? And say what happens when this funky visual selection gets copied and pasted - do you get a set of separate logical selections (if so how delimited)? A single visually ordered selection (yuk)? Something else?

    Otherwise, the weaker requirement for contiguous visual selection is likely to merely encourage the use of visual storage or the disposal of logical storage once the visual result has been generated. Which would lead to text copied from visually contiguous (logically discontiguous) selections being stored in visual order. Which is to be avoided.

    It would be a good idea to tie into WAI concerns by noting that accessibility tools, which access the DOM, should be able to get at logically ordered text and to know which parts are selected.

  • This comment has been split into the following comments: C174 C175 C176 C177 C178 C179

C122EASChris Lilley
TAG
3.5Specifications MUST be defined in terms of Unicode characters, not bytes or glyphs
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    '[S] Specifications MUST be defined in terms of Unicode characters, not bytes or glyphs.'

    Yes in general, but not exclusively. The MUST should be retained, but the scope of the statement tightened up. Specifications *when they talk about characters* MUST be defined in terms of Unicode characters, not bytes or glyphs. Specifications are allowed to talk about bytes if that is what they are representing (e.g., PNG, which defines a byte stream) or glyphs (for example the SVG glyph element, which is very clearly defined as a glyph because that is its purpose in life), although its 'unicode' attribute is, indeed, defined in terms of (a string of) Unicode characters.

  • Decision: Accepted.

    We have added a preliminary qualifying sentence: "All specifications that involve processing of text MUST specify the processing of text according to the Reference Processing Model, namely:"

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-26) -- Satisfied

C123EASChris Lilley
TAG
3.5Is XML non-conforming?
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    '[S] Specifications SHOULD allow the use of the full range of Unicode code points from U+0000 to U+10FFFF inclusive; code points above U+10FFFF MUST NOT be used.'

    So XML is not conforming, since it disallows for example U+0000 ?

  • Decision: Accepted.

  • Decision: Change section 2 to state that if a specification satisfies the conditions laid down in RFC 2119 ('[...] there [...] exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.'), it shall be considered to be conforming.

    For general issues, see #C118. We have reworded the sentence in question; it now reads: "Specifications SHOULD not arbitrarily exclude characters from the full range of Unicode code points from U+0000 to U+10FFFF inclusive; code points above U+10FFFF MUST NOT be allowed."

    We also have added a note:

    "NOTE: Despite the prohibition against arbitrarily excluding characters, specifications will typically exclude Unicode ranges such as surrogate and non-character code points. On the other hand, it would be an example of an arbitrary decision to exclude characters above the Basic Multilingual Plane, or limit the characters to ASCII or Latin-1 repertoire."

    We do not think that the exclusion of U+0000 in XML 1.1, or of the C0 range in XML 1.0, is arbitrary; it was done for very clear reasons.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-26) -- Satisfied
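
    The code point ranges named in the reworded sentence and its note for C123 can be illustrated with a minimal sketch. The function names are hypothetical, and the policy shown (permit everything in U+0000 to U+10FFFF except surrogates and noncharacters) is only one example of a non-arbitrary exclusion; a format such as XML may exclude further code points for its own reasons.

```python
MAX_CODE_POINT = 0x10FFFF  # upper limit of the Unicode code space


def is_valid_code_point(cp: int) -> bool:
    """True if cp lies in the Unicode code space U+0000..U+10FFFF."""
    return 0 <= cp <= MAX_CODE_POINT


def is_surrogate(cp: int) -> bool:
    """True for the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_typically_allowed(cp: int) -> bool:
    """Hypothetical policy: any valid code point that is neither a
    surrogate nor a noncharacter. Note that it does not stop at the
    Basic Multilingual Plane, ASCII or Latin-1."""
    return (
        is_valid_code_point(cp)
        and not is_surrogate(cp)
        and not is_noncharacter(cp)
    )
```

    Under this policy a supplementary-plane code point such as U+1F600 is accepted, while U+D800, U+FFFE and anything above U+10FFFF are not.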

C124NaNaOChris Lilley
TAG
3.6.2Character encoding identification
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    '[S] If the unique encoding approach is not chosen, specifications MUST designate at least one of the UTF-8 and UTF-16 encoding forms of Unicode as admissible encodings and SHOULD choose at least one of UTF-8 or UTF-16 as mandated encoding forms (encoding forms that MUST be supported by implementations of the specification).'

    Does that mean that, for example, saying UTF-8 is allowed and UTF-16 is disallowed and an encoding declaration is not required, is okay?

    Needs a little more on encodings that are a group of similar but not identical encodings, for example shift-jis.

    'Because of the layered Web architecture (e.g. formats used over protocols), there may be multiple and at times conflicting information about character encoding. [S] Specifications MUST define conflict-resolution mechanisms (e.g. priorities) for cases where there is multiple or conflicting information about character encoding.'

    Yes. Better though to not define such layering; the XML MIME RFC messed this up by allowing the charset and the xml encoding declaration to differ and for the former to take precedence; this requires 'save as' to rewrite the XML otherwise it is no longer well formed.... better to require any transcoders to leave XML alone or to know how to rewrite the encoding declaration if they change the encoding.

    'Certain encodings are more or less associated with certain languages (e.g. Shift-JIS with Japanese); trying to support a given language or set of customers may mean that certain encodings have to be supported.'

    The corollary should be clearly stated: do not assume that 'everyone' supports a favored but non-mandated encoding; 'every parser I know supports Latin-1/Shift-JIS' is not true.

  • This comment has been split into the following comments: C182 C183 C184 C185

C125SPSChris Lilley
TAG
3.6.33.6.3 contradictory
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    '[S] Specifications SHOULD NOT provide mechanisms for agreement on private use code points between parties and MUST NOT require the use of such mechanisms. '

    svg glyph with a unicode='&#xFE00;' is that a private agreement (and hence in contravention)? If you disallow it, though, you break the following

    '[S] [I] Specifications and implementations SHOULD be designed in such a way as to not disallow the use of private use code points by private arrangement.'

    and in practice, disallowing it would merely encourage mapping glyphs to the ascii code range whereas they should use the correct unicode code point or, if none, the PUA. Related point, avoid using character mechanisms for things that are not characters ('pi' fonts). Use small inline graphics instead.

  • Decision: Accepted.

    We agree with your concern about e.g. an svg glyph with an attribute unicode="&#xFE00;". We have changed the text somewhat, please check. However, we would like to point out that this svg mechanism is not designed for agreement on private use characters, it is designed for rendering of characters in general. It can be used for *rendering* of private-use characters, which may be appropriate or necessary in some cases.

    It could also be misused to completely change the rendering of some text (in the case of Chinese or Japanese easily to an extent that would completely change the meaning of the visually appearing text). While the use for private use characters could be checked, the use for completely changing the rendering could obviously not be checked by an SVG implementation.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-26) -- Satisfied

C126ENaSChris Lilley
TAG
3.7Should XML allow NCRs everywhere?
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    '[S] Escaped characters SHOULD be acceptable wherever unescaped characters are; this does not preclude that a syntax-significant character, when escaped, loses its significance in the syntax. In particular, escaped characters SHOULD be acceptable in identifiers and comments.'

    XML should allow NCRs everywhere, for example inside element and attribute names?

  • We have classified this as "Not applicable", because it was a question.

    Our answer is: Yes, in an ideal world, or if we ever got to redo XML, it would be preferable to allow NCRs e.g. in element and attribute names, because this leads to a more clearly layered encoding model. Indeed the I18N WG at one time was in contact with Jon Bosak and others (including members of the respective ISO committee) to investigate the possibility of such a change. As explained under #C118, this does not mean that XML is non-conformant, nor that it should be changed. But it is important to note this experience for any new formats. We would also like to note that CSS and Java do it this way.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-01-26) -- Satisfied

C127SRNChris Lilley
TAG
8Say that the IRI form is used in the document instance and the hexified URI form when it goes over the wire
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    '[S] W3C specifications MUST define when the conversion from IRI references to URI references (or subsets thereof) takes place, in accordance with Internationalized Resource Identifiers (IRI) [I-D IRI].'

    Why not go further and say that the IRI form is used in the document instance and the hexified URI form when it goes over the wire? It would be bad if different XML namespaces defined different processing here.

  • Decision: Rejected.

  • Rationale: We do not want to preclude the direct use of IRIs by wire protocols. Whether to use URIs or IRIs is defined by the wire protocol in question. HTTP currently specifies the use of URIs; a new version of HTTP (if ever needed) or some other protocol may use IRIs. Similar considerations apply to document formats: some document formats may allow IRIs in some 'slots', whereas others don't.

  • Our response (sent 2004-01-16) -- Notification
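
    The IRI-to-URI conversion discussed in C127 (the 'hexified' form) can be sketched roughly as follows. This is a simplified illustration, not the full mapping defined by the IRI specification: the helper name `iri_to_uri` and the example address are hypothetical, the sketch assumes non-ASCII characters occur only outside the host name, and it does not perform any normalization first.

```python
from urllib.parse import quote

# Characters left untouched: URI reserved and unreserved punctuation, plus
# "%" so that existing percent-escapes are not escaped a second time.
# ASCII letters and digits are never escaped by quote().
_SAFE = "/:?#[]@!$&'()*+,;=%-._~"


def iri_to_uri(iri: str) -> str:
    """Percent-escape the UTF-8 bytes of every character not listed in _SAFE.

    Simplification: host names containing non-ASCII characters would need
    separate treatment, which this sketch does not attempt.
    """
    return quote(iri, safe=_SAFE, encoding="utf-8")


uri = iri_to_uri("http://example.org/r\u00e9sum\u00e9")
```

    Here each 'é' (U+00E9) becomes the two escaped UTF-8 bytes %C3%A9. Where in a system this step happens is exactly what the rationale above leaves to the protocol or format in question.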

C128ERSChris Lilley
TAG
9Referencing the Unicode Standard and ISO/IEC 10646
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    'Conformance to Unicode implies conformance to ISO/IEC 10646, see [Unicode 3.0] Appendix C.

    [S] Since specifications in general need both a definition for their characters and the semantics associated with these characters, specifications SHOULD include a reference to the Unicode Standard, whether or not they include a reference to ISO/IEC 10646. By providing a reference to The Unicode Standard implementers can benefit from the wealth of information provided in the standard and on the Unicode Consortium Web site.'

    That is a bit weak. Say explicitly that a reference to 10646 without a reference to Unicode implies no character semantics, no bidi processing, no character case information, etc. Also, since one is a strict superset of the other, provide a rationale why a specification should ever provide a reference to 10646, since a reference to Unicode covers exactly the same CCS?

  • Decision: Rejected.

  • Rationale: The current language is the result of careful deliberation and compromise. The situation is not as simple as you describe it. ISO 10646 and Unicode are as good as the other at giving the "LATIN SMALL LETTER A" the semantics of 'latin small letter a'. Also, ISO 10646 actually contains a normative reference to Unicode's bidi algorithm, and some other stuff in Unicode.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-02-07) -- Satisfied.

C129ERNSteven Pemberton
HTML WG
3.6.3'private agreements don't scale on the web'
  • Comment (received 2002-07-03) -- 'private agreements don't scale on the web'

    Well, they can scale. The real problem is that they don't interoperate.

  • Decision: Rejected.

  • Rationale: We believe that private agreements indeed do not scale on the Web. The text already contains the explanation why this is so: "Code points from different private agreements may collide. Also a private agreement, and therefore the meaning of the code points, can quickly become lost." (slight editorial changes from the LC version) The collision problem already exists for two private agreements, and very quickly increases with the number of agreements.

C130EANSteven Pemberton
HTML WG
4.3.1Readability of tables
  • See also the following comments: C017

  • Comment (received 2002-07-03) -- Readability of tables

    The second table in this section is much more readable than the first, by using '-' instead of 'N', since '-' is visually different from 'Y'. Recommend using same for first table.

  • Decision: Accepted.

  • Our response (sent 2003-05-01) -- Notification

C131EANSteven Pemberton
HTML WG
VariousSpelling
  • Comment (received 2002-07-03) -- Spelling

    Incorrect spellings: occuring, reponsibilities.

    English spellings (US is mandated for W3C specs): recognise, standardisation, normalised, organisation, behaviour.

  • Decision: Accepted.

  • Our response (sent 2003-05-01) -- Notification

C132EPNSteven Pemberton
HTML WG
3.3Give example of transcoding
  • Comment (received 2002-07-03) -- Give example of transcoding

  • Comment (received 2002-07-05) -- Re: Give example of transcoding

    It would be useful if 3.3 gave an example of where transcoding is used, since this is a frequently misunderstood point with regards to XML and HTML. People (and some UAs) think that the encoding also specifies the repertoire/CCS.

    Something along the lines of:

    'For example, in XML and HTML, documents are always in Unicode, but they may be delivered to a user agent in an encoding for another coded character set (indicated by the encoding attribute in XML, and the HTTP content-type header in HTML). The user agent then transcodes the characters of the incoming document stream into Unicode code points. For example, a document delivered with encoding iso-8859-2 may contain the string 'ő&#x0151;' where the first character (LATIN SMALL LETTER O WITH DOUBLE ACUTE) is at code point 0xf5 in iso-8859-2. This will be transcoded so that there will be two identical characters at (Unicode) code point 0x0151 in the document as processed by the user agent.'

  • Decision: Partially accepted. Add an example of transcoding.

  • Rationale: Steven's example is too HTML-specific, and doesn't match with what we say, namely that transcoders don't resolve NCRs.

  • Comment (received 2003-05-01) -- Re: Your comments on the Character Model [C130, C131]

  • Our response (sent 2003-05-08) -- RE: Your comments on the Character Model [C130, C131]

  • Comment (received 2003-05-09) -- RE: Your comments on the Character Model [C130, C131]

  • Our response (sent 2004-02-03) -- RE: Your comments on the Character Model [C130, C131]
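
    The transcoding example proposed in C132, together with the point made in the rationale that transcoders do not resolve NCRs, can be sketched as follows. The function name is hypothetical, and `html.unescape` is used only as a stand-in for whatever later parsing step of a given format resolves numeric character references.

```python
import html


def transcode_to_unicode(data: bytes, charset: str) -> str:
    """Transcode a byte stream in the given charset to Unicode characters.

    Only bytes are converted; markup such as '&#x0151;' passes through
    untouched, because escapes are resolved by a later processing step.
    """
    return data.decode(charset)


# Byte 0xF5 is LATIN SMALL LETTER O WITH DOUBLE ACUTE in iso-8859-2;
# it is followed here by the same character written as an NCR.
raw = b"\xf5&#x0151;"

text = transcode_to_unicode(raw, "iso-8859-2")

# A separate step (assumed here to be HTML-style parsing) resolves the NCR.
resolved = html.unescape(text)
```

    After transcoding, `text` holds U+0151 followed by the eight literal characters of the escape; only after the second step does `resolved` contain two identical U+0151 characters.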

C133EASYin Leng Husband
WSArch WG
3.6.2'Specifications MUST NOT propose the use of heuristics to determine the encoding of data'
  • See also the following comments: C158 C169

  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character encoding identification, 9th paragraph, last sentence

    '[S] Specifications MUST NOT propose the use of heuristics to determine the encoding of data.'

    It would be helpful to either give examples of the undesirable 'heuristics' or the reasons for banning 'use of heuristics'.

  • Decision: Accepted

  • We have added explanatory text as follows: "Examples of heuristics include the use of statistical analysis of byte (pattern) frequencies or character (pattern) frequencies. Heuristics are bad because they will not work consistently across different implementations. Well-defined instructions of how to unambiguously determine a character encoding, such as those given in XML 1.0 [XML 1.0], Appendix F, are not considered heuristics."

  • Comment (received 2004-02-07) -- Satisfied.

C134ENaSYin Leng Husband
WSArch WG
3.6.2'Specifications MUST NOT propose the use of heuristics to determine the encoding of data'
  • Comment (received 2002-05-31) -- WSArch WG review of Charmod LC #2

    Character encoding identification, 9th paragraph, last sentence

    '[S] Specifications MUST NOT propose the use of heuristics to determine the encoding of data.'

    Would the absence of a BOM in UTF-8 encoding be considered use of heuristics for identifying encoding?

  • Decision: Not applicable

  • We have classified this comment as 'not applicable', because it is a question.

    Note: It depends on the context. For example, in XML, if both a BOM and an encoding declaration are absent, the entity must be encoded using UTF-8. We do not consider this to amount to the use of heuristics, as the correct behavior is fully specified and deterministic. In other contexts, the detection of the absence of a BOM might be used as part of general 'sniffing', which we would say amounts to the use of heuristics. But we do not know of such a case, and in general, UTF-8 should work even without a BOM.

  • Comment (received 2004-02-07) -- Satisfied.
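
    The distinction drawn in C134 between heuristics and a fully specified rule can be illustrated with a small sketch. It is deliberately incomplete: the function name is hypothetical, it assumes the entity carries no encoding declaration and no external encoding information, and it ignores UTF-32 and the other cases covered by XML 1.0 Appendix F.

```python
import codecs


def encoding_from_bom(data: bytes) -> str:
    """Return an encoding name using fixed rules only, never statistics.

    Assumption: there is no encoding declaration and no external label,
    so the absence of a BOM deterministically means UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # No BOM: the default is defined by rule, it is not a guess.
    return "utf-8"
```

    Every implementation following the same rules reaches the same answer for the same bytes, which is what separates this from frequency-based sniffing.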

C135SANMark Scardina
XSL WG
2XSL WG Comments on Character Model WD
  • See also the following comments: C038 C051 C088 C089

  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    '[S] Every W3C specification MUST conform to the requirements applicable to specifications, specify that implementations MUST conform to the requirements applicable to software, and specify that content created according to that specification MUST conform to the requirements applicable to content. [S] If an existing W3C specification does not conform to the requirements in this document, then the next version of that specification SHOULD be modified in order to conform.'

    Why is it that every spec MUST but subsequent specs only SHOULD meet this requirement? Is the intent here to permit current non-conforming specs to maintain backwards compatibility in future releases? This is an XSL requirement.

  • Decision: Accepted.

    You point out a clear inconsistency, which we fixed a while ago. We have later been told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and have removed the relevant language altogether. We have been instructed to request a finding from the TAG corresponding to the text that we removed. We will make sure that, if relevant, the inconsistency you pointed out will not reappear.

  • Our response (sent 2003-02-13) -- Notification

C136EANMark Scardina
XSL WG
3.1.3XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    '[S] Protocols, data formats and APIs MUST store, interchange or process text data in logical order.'

    This appears to be a higher level of conformance than necessary for interoperability. Why do internals need to be dictated as long as there is external conformance?

  • Decision: Partially accepted.

  • Rationale: As elsewhere in the document, the intent is not to prescribe internals, only external behavior. But this may include protocols (e.g. http), data formats (e.g. XML,...), and APIs (e.g. the DOM), and it may include storage (mostly data formats), interchange (mostly protocols), and processing (mostly APIs).

    To make absolutely clear that the Character Model only addresses observable behavior, we have changed the following sentence in the introduction: "Where this specification contains a procedural description, it is to be understood as a way to specify the desired external behavior. Implementations can use other means of achieving the same results, as long as observable behavior is not affected." to "Where this specification places requirements on processing, it is to be understood as a way to specify the desired external behavior. Implementations can use other means of achieving the same results, as long as observable behavior is not affected."

  • Our response (sent 2003-02-13) -- Notification

C137SANMark Scardina
XSL WG
3.1.5XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    'Note that, where searching or sorting is done dynamically, particularly in a multilingual environment, the 'relevant language' should be determined to be that of the current user, and may thus differ from user to user.'

    Suggest the 'should' become 'SHOULD' to bring it to the level of a recommendation.

  • Decision: Accepted.

  • Our response (sent 2002-05-01) -- Notification

C138EANMark Scardina
XSL WG
3.1.7XSL WG Comments on Character Model WD
  • See also the following comments: C004 C040 C166

  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    '[S] When specifications use the term 'character' it MUST be clear which of the possible meanings they intend. [S] Specifications SHOULD avoid the use of the term 'character' if a more specific term is available.'

    In 3.1.7 it is stated that specifications must make it clear 'which of the possible meanings' of the word 'character' is intended. But it's not explicit what the 'possible meanings' are. Where do we read that one of the possible meanings is 'a Unicode code point'?

    There should be examples of this as the spec itself is an offender here. While it could be explicitly stated in line with their own requirement, we read it that the possible meanings were 'Units of aural rendering', 'Units of visual rendering', 'Units of input', 'Units of collation' and 'Units of storage'.

    It is very difficult to conform to the second requirement in 3.1.7, as is illustrated by the fact that the Character Model document itself fails to conform to it: see the immediately following section heading.

  • Decision: Accepted. Add clarification / examples.

  • Our response (sent 2002-05-01) -- Notification

C139SANMark Scardina
XSL WG
3.2XSL WG Comments on Character Model WD
  • See also the following comments: C041

  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    'A CES, together with the CCSes it is used with, is identified by an IANA charset identifier. Given a sequence of bytes representing text and a charset identifier, one can in principle unambiguously recover the sequence of characters of the text.'

    There are legal identifiers other than IANA's. We should not be restricted to these.

  • Decision: Accepted.

  • We have accepted this comment and have changed "... is identified by an IANA charset identifier." to "... is identified by a unique identifier, such as an IANA charset identifier." in Section 3.2.

    We suspect that you might also have some problems with some of the wording in Section 3.6.2, which (among other things) says: "[S] If the unique encoding approach is not taken, specifications SHOULD mandate the use of the IANA charset registry names [...]"; if this is the case, please indicate so as soon as possible, or we will have to assume that this is okay with you.

  • Our response (sent 2003-02-13) -- Notification
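
    The statement quoted in C139, that a byte sequence plus a charset identifier lets one recover the characters, can be illustrated with a short sketch. It relies on Python's own codec registry and its aliases, which is an assumption standing in for the IANA registry or any other source of unique identifiers; the helper name is hypothetical.

```python
import codecs


def same_charset(label_a: str, label_b: str) -> bool:
    """True if both labels resolve to the same codec in Python's registry."""
    return codecs.lookup(label_a).name == codecs.lookup(label_b).name


# The same byte yields different characters under different identifiers,
# which is why the identifier has to be unambiguous.
byte = b"\xf5"
as_latin1 = byte.decode("iso-8859-1")  # LATIN SMALL LETTER O WITH TILDE
as_latin2 = byte.decode("iso-8859-2")  # LATIN SMALL LETTER O WITH DOUBLE ACUTE
```

    Different labels may name the same encoding (aliases), but a single label must never name two encodings, otherwise the recovery described in Section 3.2 stops being unambiguous.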

C140ENaWMark Scardina
XSL WG
3.5XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    '[S] Specifications MAY allow use of any character encoding which can be transcoded to Unicode for its text entities.

    [S] Specifications MAY choose to disallow or deprecate some encodings and to make others mandatory. Independent of the actual encoding, the specified behavior MUST be the same as if the processing happened as follows:

    The encoding of any text entity received by the application implementing the specification MUST be determined and the text entity MUST be interpreted as a sequence of Unicode characters - this MUST be equivalent to transcoding the entity to some Unicode encoding form, adjusting any character encoding label if necessary, and receiving it in that Unicode encoding form.

    All processing MUST take place on this sequence of Unicode characters.

    If text is output by the application, the sequence of Unicode characters MUST be encoded using an encoding chosen among those allowed by the specification.

    [S] If a specification is such that multiple text entities are involved (such as an XML document referring to external parsed entities), it MAY choose to allow these entities to be in different character encodings. In all cases, the Reference Processing Model MUST be applied to all entities.'

    It may be less confusing to have these requirements separated with a clarifying sentence, breaking these out under a clarifying context. Is the intent to forbid entity representation of non-Unicode characters?

  • Our response (sent 2002-07-23) -- Re: XSL WG Comments on Chairacter Model WD

  • Comment (received 2002-07-23) -- Re: XSL WG Comments on Chairacter Model WD

  • Our response (sent 2002-07-24) -- Re: XSL WG Comments on Chairacter Model WD

  • Comment (received 2002-07-25) -- Re: XSL WG Comments on Chairacter Model WD

  • Comment (received 2002-07-30) -- Re: XSL WG Comments on Chairacter Model WD

  • Our response (sent 2002-09-03) -- RE: Please clarify XSL WG comment (issue 146) on Character Model WD

    [...] Please note that we have asked you for clarification on three of your comments [...]

  • Comment (received 2002-09-10) -- Character Model Comments Clarifications

  • Our response (sent 2002-09-24) -- Re: Character Model Comments Clarifications

  • Comment (received 2002-09-24) -- RE: Character Model Comments Clarifications

    [...] As to the 'one paragraph' comment, my apologies as in my 'cut and pasting' of the WD for our discussion, the paragraphs got lost. Thus the resulting comment.

  • Our response (sent 2002-10-07) -- RE: Character Model Comments Clarifications

    Many thanks for the comment above. Unfortunately, this doesn't really help us understand your original comment. To make progress on this issue, can I suggest that you, or somebody else from the XSL WG, take the original comment (e.g. at http://www.w3.org/International/Group/2002/charmod-lc/#C140), and replace the sentence 'It may be less confusing to have these requirements separated with a clarifying sentence, breaking these out under a clarifying context.' with something more detailed, explaining which requirements are meant (i.e. some of those cited, all of those cited,...), where to break, what to clarify in particular, and so on.

  • Comment (received 2002-10-07) -- RE: Character Model Comments Clarifications

    Martin, the original comment is no longer relevant once the original text was reviewed based upon your answer. Please close it.
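
    The three steps of the Reference Processing Model quoted in C140 (interpret input as Unicode characters, process those characters, encode the output) can be sketched as below. The function name is hypothetical and uppercasing is only a placeholder for 'processing'; as the draft itself notes, an implementation may work differently internally as long as the observable behavior is the same.

```python
def process_text_entity(data: bytes, input_charset: str, output_charset: str) -> bytes:
    """Sketch of the three steps of the Reference Processing Model."""
    # Step 1: determine the encoding (given here as a parameter) and
    # interpret the entity as a sequence of Unicode characters.
    text = data.decode(input_charset)

    # Step 2: all processing takes place on Unicode characters.
    # Uppercasing is an arbitrary placeholder operation.
    processed = text.upper()

    # Step 3: encode the output in an encoding allowed by the specification.
    return processed.encode(output_charset)


# Byte 0xF5 in iso-8859-2 is U+0151; its uppercase form is U+0150.
result = process_text_entity(b"\xf5", "iso-8859-2", "utf-8")
```

    The placeholder operation never sees byte 0xF5; it only sees the character U+0151, so the same result is obtained whatever encoding the input arrived in.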

C141EANMark Scardina
XSL WG
3.7XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    'Certain guidelines apply to content developers, as well as to software that generates content: ... [I] [C] Choose an encoding for the document that maximizes the opportunity to directly represent characters and minimizes the need to represent characters by markup means such as character escapes. In general, if the first encoding choice is not satisfactory, Unicode is the next best choice, for its large character repertoire and its wide base of support.'

    The last bullet immediately before the section heading of section 4 seems strange. Grammatically, it is hard to parse, and is in the imperative mood which is not used elsewhere. Semantically, the statement that 'If the first encoding choice is not satisfactory, Unicode is the next best choice' seems very odd. Surely (a) Unicode is always the first choice, and (b) Unicode is not an encoding? Also the term 'satisfactory' is far too vague for a specification. We also question the appropriateness of these 'guidelines' in the spec body. They seem more appropriate for a note or appendix.

  • Decision: Accepted.

  • Our response (sent 2002-05-01) -- Notification

C142NaNaNMark Scardina
XSL WG
3.7XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    We have a concern about the guideline preventing new character escaping syntax.

  • Decision: Not applicable.

    Rationale: We have classified this comment as "not applicable", because the comment is too general to give any idea of what is wrong with the document or what we should do to fix it. We note that we think that "Specifications MUST NOT invent a new escaping mechanism if an appropriate one already exists." leaves enough room for new escaping syntaxes should an appropriate one not yet exist.

  • Our response (sent 2003-02-13) -- Notification

C143ERCMark Scardina
XSL WG
4.4XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    '[C] In order to conform to this specification, all text content on the Web MUST be in include-normalized form and SHOULD be in fully-normalized form.'

    The impacts of this requirement on XSLT and other infoset 'pipeline' type processes are still unclear to us.

    For instance, XSLT and many other specifications are designed around an infoset 'pipeline' so that various processes can transform, augment, or otherwise manipulate content. A final step in a pipeline often involves serialization of the infoset. It appears to us that serialization of an infoset according to the Character Model may result in either significant manipulation of the data within that infoset (resulting in a loss of data fidelity) or failure to serialize. In either case, an upstream process such as an XSLT transformation cannot trust that its output can be successfully processed further on in the pipeline, without adopting normalization rules at the infoset level as well. The practical inability to limit normalization to text content on the Web concerns us. The implications of this are not adequately discussed in the Character Model spec.

    Without a clear idea of the implications of the Character Model upon the tendency to rely on the XML Information Set instead of upon text for composing processes within a system, we cannot agree to the mandate for normalization.

  • Decision: Rejected.

  • Rationale: The Character Model makes no distinction between different representations of text, e.g. between text represented using the Infoset and text in a file.

C144NaNaOMark Scardina
XSL WG
4.4XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    '[S] [I] A text-processing component that receives suspect text MUST NOT perform any normalization-sensitive operations unless it has first confirmed through inspection that the text is in normalized form, and MUST NOT normalize the suspect text . Private agreements MAY, however, be created within private systems which are not subject to these rules, but any externally observable results MUST be the same as if the rules had been obeyed.'

    The exception for private agreements is crippled by the observable results restriction thus when all is said and done any suspect text will always remain.

    Section 4.4 appears to require that XML be changed to disallow the use of a composing character as the first character in an entity. This change would be backwards incompatible. XSL WG specifications such as XSLT and XPath must continue to work with all XML well-formed documents.

    Since the contents of an XML text node are 'suspect text' (there is nothing to prevent use of a composing character as the first character in a text node), section 4.4 appears to be saying that XPath must disallow operations such as substring() unless the text is inspected and found to be normalized. We do not believe that users want to pay the high cost of this feature.

  • This comment has been split into the following comments: C187 C188 C189

C145NaRCMark Scardina
XSL WG
4.4XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    '[I] A text-processing component which modifies text and performs normalization-sensitive operations MUST behave as if normalization took place after each modification, so that any subsequent normalization-sensitive operations always behave as if they were dealing with normalized text.'

    The fourth requirement in section 4.4 is labelled [I], but XPath implementations have to do what the XPath specification says, so this is actually an [S] requirement. The implication of this requirement is that functions such as concat() should perform normalization. This is both expensive and backwards-incompatible, we will have to examine whether it is something where the benefits exceed the costs. This also seems to violate the self-imposed limitation to only require conformance to observable behaviors. How XPaths are handled within an XSLT Processor should not be the subject of this spec as long as the results are conformant.

  • Decision: Rejected.

  • Rationale: [See forthcoming mail from MD]
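
    The cost concern in C143 to C145 rests on the fact that concatenating two normalized strings need not yield a normalized string. A minimal sketch in Python, assuming Unicode NFC as a stand-in for the normalization forms defined in the draft; the helper name is_nfc and the sample strings are illustrative and do not come from the specification:

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


# Both operands are individually in NFC (assumed sample data).
left = "cafe"
right = "\u0301 au lait"  # begins with COMBINING ACUTE ACCENT

joined = left + right  # plain concatenation, as concat() would do

print(is_nfc(left), is_nfc(right))  # each operand passes the check
print(is_nfc(joined))               # the result no longer does

# Normalizing after the modification restores the invariant.
repaired = unicodedata.normalize("NFC", joined)
print(repaired == "caf\u00e9 au lait")
```

    The second operand starts with a combining character, which is the situation the comments above describe for entities and text nodes.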

C146ERCMark Scardina
XSL WG
4.4XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    '[S] Specifications of text-based languages and protocols SHOULD define precisely the construct boundaries necessary to obtain a complete definition of full-normalization . These definitions MUST include at least the boundaries between markup and character data as well as entity boundaries (if the language has any include mechanism) and SHOULD include any other boundary that may create denormalization when instances of the language are processed.'

    The requirement (still in 4.4) about defining construct boundaries is very unclear when applied to a language that performs dynamic manipulation of strings.

  • Our response (sent 2002-08-26) -- Please clarify XSL WG comment (issue 146) on Character Model WD

  • Our response (sent 2002-09-03) -- RE: Please clarify XSL WG comment (issue 146) on Character Model WD

    [...] Please note that we have asked you for clarification on three of your comments [...]

  • Comment (received 2002-09-10) -- Character Model Comments Clarifications

  • Our response (sent 2002-09-24) -- Re: Character Model Comments Clarifications

  • Comment (received 2002-09-24) -- RE: Character Model Comments Clarifications

  • Our response (sent 2002-09-24) -- Re: Character Model Comments Clarifications

  • Decision: Rejected.

  • Rationale: The requirement that a language has to be clear about the boundaries of its syntactic constructs was designed in particular so that simple applications of XSLT (where text nodes,... are treated as units and not modified, but potentially concatenated) can produce normalized output from normalized input easily. You are right that this conformance criterion doesn't deal with dynamic operations. This is dealt with later in the spec (same subsection).

    Our response (sent 2002-09-24) --

C147EA Mark Scardina
XSL WG
4.4XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    '[S] Specifications MUST document any security issues related to normalization.'

    The requirement 'Specifications MUST document any security issues related to normalization.' is untestable on its face and should be detailed.

  • Decision: Accepted.

C148EANMark Scardina
XSL WG
8XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Chairacter Model WD

    '[S] [I] Forms of string matching other than identity matching SHOULD be performed as if the following steps were followed: Steps 1 to 3 for string identity matching. Matching the strings in a way that is appropriate to the application.'

    It is unclear whether the procedure for string identity matching in section 6 establishes a requirement for expansion of %HH escapes in URIs, especially when comparing namespace URIs, where such expansion has not traditionally been performed. Section 8 should give guidance on this.

  • Decision: Accepted.

    Note that full guidance is given in the newest version of the IRI spec, which is referenced by Section 8. This explicitly does not require %HH escapes to be 'expanded', and therefore does not conflict with current implementation practice for namespace URIs.

  • Our response (sent 2003-02-13) -- Notification
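
    The practice referred to in C148 can be illustrated with a short Python sketch. It assumes that namespace names are compared as character strings without expanding %HH escapes; the URIs and variable names are made up for illustration:

```python
from urllib.parse import unquote

# Two hypothetical namespace names that differ only in escaping.
ns_escaped = "http://example.org/ns/caf%C3%A9"
ns_literal = "http://example.org/ns/caf\u00e9"

# Identity matching as traditionally done for namespace names:
# the strings are compared as they stand, with no expansion.
same_namespace = ns_escaped == ns_literal

# Expanding the escapes first would give a different answer,
# which is why the specification needs to say which one applies.
same_after_expansion = unquote(ns_escaped) == ns_literal

print(same_namespace, same_after_expansion)
```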

C150ERDC. M. Sperberg-McQueen
-
VariousThe term 'UCS' vs. the term 'Unicode'
  • Comment (received 2002-07-12) -- The term 'UCS' vs. the term 'Unicode'

    Sec. 1.1 says, inter alia, 'In this document, Unicode is used as a synonym for the Universal Character Set.' I believe the term 'UCS' would be better, because it is clearer and less subject to misconstruction.

    It is clearer because the term 'Unicode' may reasonably be used to denote (a) the consortium of that name, (b) the Universal Character Set defined by ISO/IEC 10646 and by the Unicode Standard, (c) the UCS taken together with the additional rules defined by the Unicode Standard, which Unicode does NOT share with ISO/IEC 10646, and (d) the Unicode Standard itself. Despite the explicit statement that in the character model spec the term 'Unicode' is used in sense (b), I suspect the common use, elsewhere, of the term in senses (a), (d), and especially (c), will necessarily color readers' perceptions of the meaning of the text.

    The term 'UCS' is also less likely to convey to casual readers that it is really the Unicode Standard, not ISO/IEC 10646, which counts. It is true, as you have pointed out from time to time, that the Unicode Consortium and the responsible ISO/IEC technical committee have worked well for some time now in keeping the two standards aligned. I applaud that fact and the role some of you have individually played in making it happen. But I remember too the years in which the two organizations threatened to burden the world with two different and incompatible universal character sets, and the roles some of you played then, and I am unwilling that any W3C specification should risk conveying the idea that if the two standards should diverge, the Web or the W3C would naturally side with one or the other party.

    It would not be appropriate to use the term 'ISO/IEC 10646' (or just '10646' for short) to refer to the UCS. It is also not appropriate to use the term 'Unicode'.

    Please reconsider and use the neutral and unambiguous term 'UCS'.

  • Decision: Rejected.

  • Rationale for 'Rejected': The word 'Unicode' is almost universally used in this sense, including by Production 2 of the XML specification.

  • Decision: Review all instances of the word 'Unicode', to ensure they are used consistently.

  • Note: There are two problems with the use of the word 'Unicode' in the Note in section 3.3:

    a) we mean the Unicode Standard,

    b) Unicode is not an encoding.

  • Our response (sent 2003-02-12) -- Notification

  • Comment (received 2003-02-12) -- Dissatisfied

C151EPDC. M. Sperberg-McQueen
-
A.2ANSI X3.4 is missing
  • Comment (received 2002-07-12) -- ANSI X3.4 is missing

    The spec refers several times to ASCII. In the context of a specification defining a character model, I assume that this term is used in its proper and narrow sense to denote the coded character set defined by American national standard ANSI X3.4. That American national standard should be included among the non-normative references.

  • Decision: Partially accepted.

  • Decision: Cite ISO 646 (International Reference Version), rather than ANSI X3.4, and link to it from the text.

  • Rationale for 'Partially accepted': Where a national and an international standard define the same matter, use of the latter is preferable.

  • Our response (sent 2003-02-12) -- Notification

  • Comment (received 2003-02-12) -- Dissatisfied

C152SASC. M. Sperberg-McQueen
-
3.1.5Spanish 'ch' is not a letter sequence
  • Comment (received 2002-07-12) -- Spanish 'ch' is not a letter sequence

    Section 3.1.5 says 'EXAMPLE: In traditional Spanish sorting, the letter sequences 'ch' and 'll' are treated as atomic collation units. Although Spanish sorting, and to some extent Spanish everyday use, treat 'ch' as a single unit, current digital encodings treat it as two letters, and keyboards do the same (the user types 'c', then 'h').'

    This is not what I learned in grade school. Sra. Robles was quite clear, and rather strict about it (and so of course I am sure that she is right and your informants must be wrong).

    I believe the paragraph would be more accurate and clearer if it read 'EXAMPLE: In traditional Spanish sorting, the character sequences 'ch' and 'll' are treated as single letters and as atomic collation units. Although Spanish sorting, and to some extent Spanish everyday use, treat 'ch' as a single unit, current digital encodings treat it as two characters, and keyboards do the same (the user types 'c', then 'h').'

    I don't know of any digital encoding whose specification provides any definition of 'letter', and thus I find it surprising and confusing to read that most such encodings treat 'ch' as two letters: I don't believe that any character set specifications or encodings can meaningfully be said to treat ANYTHING as ANY number of 'letters', since 'letter' is a concept foreign to their universe of discourse.

  • Decision: Accepted.

  • Our response (sent 2003-02-12) -- Notification

  • Comment (received 2003-02-12) -- Satisfied
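
    The distinction between characters and collation units in C152 can be sketched in Python. This is a simplified illustration, assuming the traditional Spanish ordering in which 'ch' follows 'c' and 'll' follows 'l'; the names ALPHABET and collation_key are hypothetical, and accented vowels are not handled:

```python
import re

# Traditional Spanish collation units in order (simplified assumption).
ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
    "ll", "m", "n", "\u00f1", "o", "p", "q", "r", "s", "t", "u", "v",
    "w", "x", "y", "z",
]
RANK = {unit: index for index, unit in enumerate(ALPHABET)}

# Prefer the two-character units, then fall back to single characters.
UNIT = re.compile(r"ch|ll|.", re.DOTALL)


def collation_key(word: str) -> list:
    """Split a word into collation units and map each to its rank."""
    units = UNIT.findall(word.lower())
    # Units outside the table sort after everything else.
    return [RANK.get(unit, len(RANK)) for unit in units]


words = ["cuna", "chico", "dama", "llama", "luz", "cosa"]

by_code_point = sorted(words)
by_collation_unit = sorted(words, key=collation_key)

print(by_code_point)
print(by_collation_unit)
```

    The user still types and stores 'c' followed by 'h' as two characters; only the sort key treats the pair as one unit.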

C153EASC. M. Sperberg-McQueen
-
3.1.5Counting languages
  • Comment (received 2002-07-12) -- Counting languages

    The paragraph which reads 'EXAMPLE: In most languages, the letter 'æ' is sorted as two consecutive collation units: 'a' and 'e'' [I wonder if that aesc will come through this HTML form ...] might be improved if the 'most languages' were changed to 'some languages'. In Old English, Old Norse, Norwegian, Danish, and Swedish, I believe that aesc is treated as a single collation unit; I don't know of any languages in which aesc occurs in native words which sorts it in the way you describe. Are you counting all the other languages in Western Europe as languages in which aesc is sorted as 'ae'? (Note that it does not matter whether the languages which sort aesc as 'ae' outnumber the others or not: the point to be made is that they exist. The term 'most' brings in an element of quantitative comparison which is distracting -- do Flemish and Dutch count as one language, or two, in this tally? -- and unnecessary. Hence my suggestion to eliminate 'most'.)

  • Decision: Accepted.

  • Our response (sent 2003-02-12) -- Notification

  • Comment (received 2003-02-12) -- Satisfied

C154TASC. M. Sperberg-McQueen
-
3.1.5For 'o' read '&ouml;'
  • See also the following comments: C020 C025

  • Comment (received 2002-07-12) -- For 'o' read '&ouml;'

    In 'In German certain applications treat the letter 'o' as if it were the sequence 'oe'' the quoted letter should be an o with an umlaut, surely? (Or else what applications are you thinking of?)

  • Decision: Accepted.

  • Our response (sent 2003-02-12) -- Notification

  • Comment (received 2003-02-12) -- Satisfied

C155SANC. M. Sperberg-McQueen
-
3.1.5User control of collation, foreign matter
  • Comment (received 2002-07-12) -- User control of collation, foreign matter

    Thank you for specifying that '[S] [I] Software that allows users to sort or search text SHOULD allow the user to select alternative rules for collation units and ordering.' I am glad to see that you are not requiring that software use the collation rules of the user's language whether the user wants it to or not.

    The requirement '[S] [I] When sorting and searching in the context of a particular language, it MUST be possible to deal gracefully with strings being compared that contain Unicode characters not normally associated with that language' appears to be unenforceably vague: the word 'gracefully' seems impossible to define sharply enough to allow objective determinations of whether a given spec or implementation conforms with this requirement or not. I would be loath to lose the word 'gracefully' from the text, but I don't believe it belongs in a conformance requirement.

  • Decision: Accepted.

  • We have replaced "[S] [I] When sorting and searching in the context of a particular language, it MUST be possible to deal gracefully with strings being compared that contain Unicode characters not normally associated with that language." with "[S] [I] Specifications and implementations of sorting and searching algorithms SHOULD accommodate all characters in the Unicode set." The change from 'MUST' to 'SHOULD' is not due to this comment, but due to other comments.

  • Our response (sent 2003-02-13) -- Notification

C156EASC. M. Sperberg-McQueen
-
3.5Deriving specs from specs, building specs on specs
  • Comment (received 2002-07-12) -- Deriving specs from specs, building specs on specs

    In the sentence 'NOTE: All specifications that derive from the XML 1.0 specification [XML 1.0] automatically inherit this Reference Processing Model', I think the phrase 'derive from' is not quite right. One would not say that the XHTML spec 'derives from' the XML spec: it references it, uses it, builds a language on it, builds on it, cites it normatively (does it?), but it is specs like XML 1.1 which 'derive from' (stand in the relation of genealogical descent to) the XML 1.0 spec. Perhaps an acceptable wording would be 'all specifications which define applications of the XML 1.0 specification ...'?

  • Decision: Accepted.

  • Our response (sent 2003-02-12) -- Notification

  • Comment (received 2003-02-12) -- Satisfied

C157SASC. M. Sperberg-McQueen
-
3.6Always reliable identification is a chimaera
  • See also the following comments: C168

  • Comment (received 2002-07-12) -- Always reliable identification is a chimaera

    The requirement '[S] Specifications MUST either specify a unique encoding, or provide character encoding identification mechanisms such that the encoding of text can always be reliably identified' is, I think, too strong. I do not believe that any identification mechanism can ALWAYS guarantee the correct identification of an encoding; if I am right, this requirement guarantees that no specification ever written has ever conformed, and no specification will ever conform, to the character model specification. Malicious users, incompetent users, ignorance or indifference on the part of those responsible for servers, and transcoders which understandably do not touch the internal labels on the data they transcode, can combine to defeat any labeling or encoding-identification scheme ever devised. Even the W3C server has been known, from time to time, to serve documents with the wrong character-encoding identification.

    Please weaken this requirement so that it is achievable, or else XML 1.1 and every other spec now under development by the W3C will be blocked by this unrealistic counsel of perfection. The identification mechanisms of XML 1.0 are pretty good, if I say so myself. But they do not come close to meeting the requirement stated here. I think you've set the bar too high.

  • Our response (sent 2002-07-11) -- Re: Always reliable identification is a chimaera

  • Decision: Accepted.

  • Decision: Remove the word 'always'.

  • Our response (sent 2003-02-12) -- Notification

  • Comment (received 2003-02-12) -- Satisfied

C158EANC. M. Sperberg-McQueen
-
3.6.2Heuristics
  • See also the following comments: C133 C169

  • Comment (received 2002-07-12) -- Heuristics

    The spec says '[S] Specifications MUST NOT propose the use of heuristics to determine the encoding of data.' Is it your intent to outlaw the heuristics given by the XML 1.0 spec? If not, I believe the wording of this requirement is incorrect, and should be changed to something specifying that heuristics should be used only in the absence of usable labels. If it is your intent to disallow the heuristics defined in XML 1.0, I disagree, and believe the MUST NOT should be changed to something weaker, preferably a generic health warning.

  • Our response (sent 2002-07-11) -- Re: Heuristics

  • Comment (received 2002-07-13) -- Re: Heuristics

  • Decision: Accepted.

  • We have added explanatory text as follows: "Examples of heuristics include the use of statistical analysis of byte (pattern) frequencies or character (pattern) frequencies. Heuristics are bad because they will not work consistently across different implementations. Well-defined instructions of how to unambiguously determine a character encoding, such as those given in XML 1.0 [XML 1.0], Appendix F, are not considered heuristics."

  • Our response (sent 2003-02-13) -- Notification
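
    The difference between statistical guessing and well-defined instructions can be sketched as follows. This Python fragment covers only a subset of the byte signatures listed in XML 1.0 Appendix F; the function name and the returned labels are illustrative, and a real parser would go on to read the encoding declaration:

```python
# Ordered so that longer signatures are tested before their prefixes.
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "UCS-4 big-endian, with BOM"),
    (b"\xff\xfe\x00\x00", "UCS-4 little-endian, with BOM"),
    (b"\xef\xbb\xbf", "UTF-8, with BOM"),
    (b"\xfe\xff", "UTF-16 big-endian, with BOM"),
    (b"\xff\xfe", "UTF-16 little-endian, with BOM"),
    (b"\x00\x3c\x00\x3f", "UTF-16 big-endian, no BOM"),
    (b"\x3c\x00\x3f\x00", "UTF-16 little-endian, no BOM"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible, read the declaration"),
]


def sniff_encoding_family(data: bytes) -> str:
    """Map the first bytes of an XML entity to an encoding family.

    Every implementation applying the same table to the same bytes
    gets the same answer, unlike frequency-based guessing.
    """
    for prefix, family in SIGNATURES:
        if data.startswith(prefix):
            return family
    return "UTF-8 assumed, no BOM or declaration"


sample_utf16 = "<?xml version='1.0'?>".encode("utf-16-le")
sample_utf8 = b"\xef\xbb\xbf<doc/>"

print(sniff_encoding_family(sample_utf16))
print(sniff_encoding_family(sample_utf8))
```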

C159NaNaOC. M. Sperberg-McQueen
-
3.7fixed-length escapes
  • Comment (received 2002-07-12) -- fixed-length escapes

    In contemplating the rule '[S] Escape syntax SHOULD either require explicit end delimiters or mandate a fixed number of characters in each character escape' I am uncertain whether you intend to outlaw the kinds of escapes defined by section 6.3 of ISO 2022 or not. ISO 2022 defines some fixed-length and some variable-length escape sequences, in which certain classes of characters are defined as final characters. These final characters might be viewed as explicit end delimiters, but they are not solely delimiters. They are part of the escape sequence and cannot be disregarded in establishing the meaning of the escape sequence.

    I don't think I have a strong preference for making escape sequences of this kind legal or illegal here, but I think it probably needs to be clearer whether they are legal or not.

    In the same rule, 'Escape syntaxes where the end is determined by a character outside the set of characters admissible in the character escape itself SHOULD be avoided' is a good provision, but at first glance it seemed to be saying that the terminating semicolon of entity and character references (which is 'a character outside the set of characters admissible in the character escape itself') was being deprecated. I think rephrasing might help, though I have not been able to draft a better alternative.

  • Our response (sent 2002-07-12) -- Re: fixed-length escapes

  • Comment (received 2002-07-13) -- Re: fixed-length escapes

  • This comment has been split into the following comments: C180 C181

C160EASC. M. Sperberg-McQueen
-
4.1.2For 'insure' read 'ensure'
  • Comment (received 2002-07-12) -- For 'insure' read 'ensure'

    In the phrase 'it is insured that normalization is uniform' I believe 'insured' should be spelled 'ensured'. (Usage among native speakers of English is not completely uniform, but some prefer to reserve the verb 'insure' for financial transactions involving payments of fees in exchange for financial protection against risks, and to use 'ensure' when the meaning is 'to make certain' or 'to entail'.)

  • Decision: Accepted.

  • Our response (sent 2003-02-12) -- Notification

  • Comment (received 2003-02-12) -- Satisfied

C161EANChris Haynes
-
3.7Recommend Unicode for character escapes?
  • Comment (received 2002-07-12) -- Recommend Unicode for character escapes?

    Section 3.7 - Character escapes - contains the paragraph:

    [S] Whenever specifications define character escapes that allow the representation of characters using a number the number SHOULD be in hexadecimal notation.

    I rather expected it to continue:

    ... and SHOULD represent the Unicode code point of the character.

    I wondered if the Reference Processing Model (3.5) applied an implied mandate to use Unicode, but it seems to permit other encodings to be used, and therefore does not seem to supply the 'default' for 3.7.

    Have I missed something?

  • Our response (sent 2002-07-13) -- Re: Recommend Unicode for character escapes?

    I think this is a typical example of where it was just absolutely obvious to us, but where it makes a lot of sense to say things explicitly. I would actually prefer to change the 'should' to a 'must':

    ... and MUST represent the Unicode code point of the character.

  • Decision: Accepted.

  • Our response (sent 2003-02-17) -- Notification
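
    A character escape of the kind discussed in C161 can be sketched in Python. The example assumes the XML-style syntax '&#x' hexadecimal digits ';', where the number is a Unicode code point and the semicolon is the explicit end delimiter; the names NUMERIC_ESCAPE and expand_escapes are hypothetical:

```python
import re

# Hexadecimal numeric escape with an explicit end delimiter.
NUMERIC_ESCAPE = re.compile(r"&#x([0-9A-Fa-f]+);")


def expand_escapes(text: str) -> str:
    """Replace each escape with the character at that code point.

    chr() raises ValueError for numbers above U+10FFFF, so values
    outside the Unicode code space are rejected rather than mapped.
    """
    return NUMERIC_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


escaped = "caf&#xE9; &#x4E2D;&#x6587;"
expanded = expand_escapes(escaped)

print(expanded)
```

    Because the number is a code point and not a byte value in the document's encoding, the escape means the same thing whatever encoding the document is stored in.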

C162NaNaNKarl Dubost
QA WG
2Conformance
  • Comment (received 2002-06-18) -- QA Review for Charmod

    [...] I would like to know how you plan to enforce the use of charmod in other specifications: by process, pubrules, charters? We face the same question in the QA WG.

  • Decision: Not applicable.

  • Rationale: We have classified this comment as 'not applicable' because it does not make any suggestions re. changes of the specification. We have been told that it is inappropriate for a W3C spec to directly enforce requirements on other specifications, and have removed the relevant language from section 2. We still define conformance to CharMod. We have been instructed to request a finding from the TAG corresponding to the text that we removed. So CharMod will be enforced by the fact of being a REC, coupled with an eventual TAG finding and ongoing reviews of relevant specs by the I18N WG. As we understand, many of the requirements on other specs that the QA WG is looking at are much more procedural in nature, whereas the requirements in CharMod are more technical. Therefore, different considerations may apply to your work.

C163NaNaNKarl Dubost
QA WG
2Testable Assertions/Requirements
  • Comment (received 2002-06-18) -- QA Review for Charmod

    I found interesting the way you have declared the rules of Conformance for your specifications. I would like to know if there's a plan for a Test Suite or at least Examples and Techniques to demonstrate your technologies.

    For example in the first statement (Testable assertion?), I had difficulty to define a binary test case, is it possible to have testable examples for each rule in a separate document. It will help people understand the statement you have defined.

  • Our response (sent 2002-06-20) -- Re: QA Review for Charmod

    Binary tests are very difficult in many cases, or have to be worked out individually for each spec (e.g. XML, CSS,...).

  • Decision: Not applicable.

  • Rationale: We have classified this as 'Not applicable', because you are just asking about our plans, not suggesting changes to the document. The Character Model is an architectural specification, and it is therefore difficult if not impossible to create binary tests. If we had an automatic test to see whether another specification conforms to the character model, that would indeed be great, but it is obvious that this is impossible.

    In some cases, tests can be worked out for individual specifications that conform to the character model (e.g. XML, CSS,...), but those would be part of the test suite for that spec. For some aspects of the character model, or some material we reference, there are already tests, e.g. for NFC. Regarding examples and techniques, the text already contains many examples where we found they are necessary to clarify the specification, and we have added more examples as a result of last call comments. We also expect that for passing CR, we will have to provide a list of other specifications that follow the various provisions in Charmod, and such a list will provide a wealth of examples.

C164ERSKarl Dubost
QA WG
3.1.2QA Review for Charmod
  • Comment (received 2002-06-18) -- QA Review for Charmod

    '[S] [I] Specifications and software MUST NOT assume that there is a one-to-one correspondence between characters and the sounds of a language.'

    Here I clearly understand that there is not *necessarily* a correspondence, but there could be sometimes. So it means for me that in the set of all possible values, some will be false, and so invalidate the one-to-one relationship.

    I don't know if it's better, but you will tell me:

    '[S] [I] Specifications and software MUST allow the correspondence between one phoneme and multiple characters when necessary.'

    Because for me it seems more testable and comprehensible for a developer or a specification editor.

    As a general rule, even if it seems valid, avoid the use of MUST NOT when it's possible to define a MUST.

  • Our response (sent 2002-06-20) -- Re: QA Review for Charmod

    This would be wrong because there are also multiple phonemes - one character and multiple phonemes - multiple characters. Excluding the one-to-one case looked easier.

  • Decision: Rejected

  • Rationale for 'Rejected': See our earlier response above.

  • Our response (sent 2003-05-01) -- Notification

  • Comment (received 2003-05-05) -- Satisfied

C165EANKarl Dubost
QA WG
3.1.3QA Review for Charmod
  • Comment (received 2002-06-18) -- QA Review for Charmod

    '[S] Protocols, data formats and APIs MUST store, interchange or process text data in logical order.'

    Why only S (Specifications) ? Do you mean:

    '[S] (Specifications of?) Protocols, data formats and APIs MUST store, interchange or process text data in logical order.'

    or

    '[I] Protocols, data formats and APIs MUST store, interchange or process text data in logical order.'

  • Our response (sent 2002-06-20) -- Re: QA Review for Charmod

    I think it can be [S/I/C]. Specifications don't store or interchange data, but they specify how it's done.

  • Decision: Accepted.

  • Our response (sent 2003-05-01) -- Notification

C166EPSKarl Dubost
QA WG
3.1.7QA Review for Charmod
  • See also the following comments: C004 C040 C138

  • Comment (received 2002-06-18) -- QA Review for Charmod

    '[S] When specifications use the term 'character' it MUST be clear which of the possible meanings they intend.'

    Ambiguous definition, can you clarify? What's the meaning of 'be clear' in this context. The answer will depend on the people.

    Do you mean?

    '[S] When specifications use the term 'character', the specifications MUST define the possible meanings they intend.'

  • Our response (sent 2002-06-20) -- Re: QA Review for Charmod

    Yes. But 'possible meanings' -> 'meaning'.

  • Decision: Partially accepted.

  • Our response (sent 2003-05-01) -- Notification

  • Comment (received 2003-05-05) -- Satisfied

C167ERSKarl Dubost
QA WG
VariousQA Review for Charmod
  • Comment (received 2002-06-18) -- QA Review for Charmod

    When you have multiple rules, please make it a list to be more readable.

  • Comment (received 2002-09-23) -- FW: Query about CharMod last call comment

  • Decision: Rejected.

    Rationale: The current layout makes the prose easier to read. Note that we have marked the start and end of each statement.

  • Our response (sent 2003-05-01) -- Notification

  • Comment (received 2003-05-05) -- Satisfied

C168SANC. M. Sperberg-McQueen
XML Schema WG
3.6Reliability of character encoding identification
  • See also the following comments: C157

  • Comment (received 2002-07-12) -- Reliability of character encoding identification

    Section 3.6 specifies that '[S] Specifications MUST either specify a unique encoding, or provide character encoding identification mechanisms such that the encoding of text can always be reliably identified.'

    The XML Schema WG believes that this requirement, as formulated, is not met by any existing specifications and is unlikely ever to be met by any. Document producers, software implementors, and server administrators, working alone or in concert, have innumerable opportunities to render character-set labels false out of malice, ignorance, or indifference; if character-set labels are false, the encoding of the text can only rarely be reliably identified.

    The word 'always' seems to suggest that encoding identification mechanisms must function even in the case of hostile users or misconfigured servers; that's not possible. Either the i18n WG should lower its expectations or it should express its expectations more clearly.

    We believe a more correct standard would be to require that specifications provide mechanisms to ensure that it is POSSIBLE to get things right, or to ensure that with correct operation / under normal circumstances character encodings are reliably and correctly identified.

    N.B. This comment is substantially similar to C157 and to comment 3.13 of our comments on the previous last-call draft.

  • Decision: Accepted.

  • We have removed the word 'always'. The intent is to require specifications to make it possible to reliably identify character encodings.

  • Our response (sent 2003-02-13) -- Notification

C169EANC. M. Sperberg-McQueen
XML Schema WG
3.6.2Heuristics considered useful
  • See also the following comments: C133 C158

  • Comment (received 2002-07-12) -- Heuristics considered useful

    The rule '[S] Specifications MUST NOT propose the use of heuristics to determine the encoding of data' appears to mean that the XML 1.0 Recommendation does not conform to the character model spec, since in its Appendix F it proposes the use of heuristics for recognizing the character encoding being used well enough to bootstrap and read the XML declaration and any encoding declaration included within it. If this is the intent, we believe this rule should be scaled back to something more like what was in the first last-call spec: 'Specifications MUST NOT require or encourage the use of unreliable heuristics.' If this is not the intent, we believe the rule needs to be rewritten to be clearer.

    Either way, it would be useful to define 'heuristics'; without a definition, it's hard to know exactly what constraint is intended to be expressed by this rule.

    N.B. this comment is similar but not identical to C158.

  • Decision: Accepted.

  • We have accepted this comment. We have added explanatory text as follows: "Examples of heuristics include the use of statistical analysis of byte (pattern) frequencies or character (pattern) frequencies. Heuristics are bad because they will not work consistently across different implementations. Well-defined instructions of how to unambiguously determine a character encoding, such as those given in XML 1.0 [XML 1.0], Appendix F, are not considered heuristics."

  • Our response (sent 2003-02-13) -- Notification
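
The distinction drawn in the added text, between statistical guessing and well-defined instructions, can be illustrated with a small sketch. The Python below is loosely modelled on the byte-pattern table in XML 1.0 Appendix F. The function name and the returned labels are assumptions made for illustration, the table is abbreviated, and this is not a complete or normative implementation.

```python
# Abbreviated signature table. Patterns that share a prefix are listed
# longest first so that the more specific one is tried first.
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "UCS-4 big-endian (BOM)"),
    (b"\xff\xfe\x00\x00", "UCS-4 little-endian (BOM)"),
    (b"\xef\xbb\xbf", "UTF-8 (BOM)"),
    (b"\xfe\xff", "UTF-16 big-endian (BOM)"),
    (b"\xff\xfe", "UTF-16 little-endian (BOM)"),
    (b"\x00\x00\x00<", "UCS-4 big-endian (no BOM)"),
    (b"<\x00\x00\x00", "UCS-4 little-endian (no BOM)"),
    (b"\x00<\x00?", "UTF-16 big-endian (no BOM)"),
    (b"<\x00?\x00", "UTF-16 little-endian (no BOM)"),
    (b"<?xm", "ASCII-compatible: read the encoding declaration"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC family: read the encoding declaration"),
]


def sniff_xml_encoding_family(data: bytes) -> str:
    """Return a label for the encoding family implied by the first bytes.

    Every implementation that applies the same table to the same bytes
    gets the same answer, which is what separates this from a heuristic.
    """
    for signature, label in SIGNATURES:
        if data.startswith(signature):
            return label
    # No signature and no XML declaration: XML 1.0 treats the entity as UTF-8.
    return "UTF-8 (default)"
```

The result only bootstraps the parser far enough to read the encoding declaration, if there is one. No byte-frequency or character-frequency statistics are involved.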

C170SPNC. M. Sperberg-McQueen
XML Schema WG
8Converting to RFC-2396-style URIs
  • See also the following comments: C031 C059

  • Comment (received 2002-07-12) -- Converting to RFC-2396-style URIs

    We note with some alarm that section 8 of the specification no longer contains an account of any algorithm for converting internationalized resource identifiers into uniform resource identifiers as defined by RFC 2396 and widely implemented. We believe that the algorithm which was presented in the first last-call draft should be restored, in order to allow other W3C specifications to refer to it as needed. (Note that the XML Schema 1.0 Recommendation does refer readers to section 8 of this document for information about conversion problems -- information which this version of the document no longer provides.) We do not believe the reference to section 2.2.5 of RFC 2718 serves as an adequate substitute: RFC 2718 is informational, not normative, and its account of the algorithm presupposes more familiarity with the family of URI specifications than it is reasonable to assume, even of writers of W3C specifications.

  • Decision: Partially accepted.

    Rationale: Our plan is that the IRI Internet-Draft, referenced in this section, will have been submitted for Proposed Standard by the time CharMod moves to the next stage (CR). Conversion from IRIs to URIs is fully addressed in the IRI spec, and is needed there, and should therefore not be duplicated in charmod.

    The reference to RFC 2718 is informative only. To make this clearer, we have moved it out of the actual conformance criterion (C060) into a separate sentence reading "This is in accordance with Guidelines for new URL Schemes [rfc2718] Section 2.2.5.". In any case, that part of that section speaks about new schemes and things such as XPointer, not about 'IRI slots' such as anyURI.

  • Our response (sent 2003-02-13) -- Notification
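
For readers unfamiliar with the conversion under discussion, the core step can be sketched as follows: each non-ASCII character is encoded as UTF-8 and each resulting byte is written as a %HH escape. The function name is hypothetical, and the sketch deliberately ignores everything else a full conversion may involve (for example normalization of the input and special treatment of host names); the IRI specification referenced in the rationale is the authority.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII characters of an IRI as UTF-8 bytes.

    ASCII characters, including existing %HH escapes, are left untouched.
    """
    pieces = []
    for ch in iri:
        if ord(ch) < 0x80:
            pieces.append(ch)
        else:
            # One %HH escape per UTF-8 byte of the character.
            pieces.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(pieces)


print(iri_to_uri("http://example.org/r\u00e9sum\u00e9"))
# http://example.org/r%C3%A9sum%C3%A9
```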

C171SACC. M. Sperberg-McQueen
XML Schema WG
4Early uniform normalization
  • Comment (received 2002-07-12) -- Early uniform normalization

    We note that this version of the character model specification does a much better job of motivating the choice of early uniform normalization than did the previous version, and we congratulate the i18n WG on this important improvement to the coherence and plausibility of the character model.

    That said, we continue to have reservations about the Draconian rules enunciated in connection with early uniform normalization. We note that a number of working groups have expressed strong objections, and we urge the i18n WG to continue working toward consensus, and not to attempt to move forward until there is something much more like consensus within the community than we believe there is at present.

  • Decision: Accepted.

C172EASC. M. Sperberg-McQueen
-
9The spelling of ISO's name
  • Comment (received 2002-07-12) -- The spelling of ISO's name

    For 'International Organisation for Standardisation' read 'International Organization for Standardization' (or so says their Web site, http://www.iso.ch, and their documents).

  • Decision: Accepted.

  • Our response (sent 2003-05-01) -- Notification

  • Comment (received 2003-05-02) -- Satisfied

C173NaNaWRick Jelliffe
-
3.7fixed-length escapes
  • Comment (received 2002-07-13) -- Re: fixed-length escapes

    [...] To 'escape' a character means to allow it to be used with a different significance. So in C '\' is the delimiter character, and '\\' is the escaped delimiter.

    In XML, the only escaping is provided by CDATA marked sections (and perhaps by comments and PIs). In SGML, you could optionally have a 'markup suppression' character that acted as an escape too.

    An entity reference is not an 'escape'. To keep on calling it an 'escape' loses a valuable distinction, and can only promote confusion, because it lumps together references and real escapes.

  • Withdrawn.

C174ERSChris Lilley
TAG
3.1.3Units of visual rendering
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    Having defined the terms 'logical selection mode' and 'visual selection mode', please use them rather than the highly ambiguous 'discontiguous selections' and 'contiguous selections' [...]

  • Decision: Rejected.

  • Rationale for 'Rejected': We have rejected this comment. Distinct concepts require distinct terms. Logical selection can lead to both contiguous and discontiguous selections. Visual selection also can lead to both contiguous and discontiguous selections if you select from the end of a line to the start of the next. Logical/visual selection indicates the principle by which the program is working; contiguous/discontiguous selection indicates the visible results of this inner working.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-02-07) -- Satisfied

C175ERDChris Lilley
TAG
3.1.3Units of visual rendering
C176SRDChris Lilley
TAG
3.1.3Units of visual rendering
C177ERSChris Lilley
TAG
3.1.3Units of visual rendering
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    Similarly in the next part, I suggest rewording to remove the ambiguous phrase:

    [S] Specifications of protocols and APIs that involve selection of ranges SHOULD provide for text selection in logical selection mode, at least to the extent necessary to support implementation of visual selection on screen on top of those protocols and APIs.

  • Decision: Rejected.

  • Rationale: The original paragraph is about visual selection, not logical. Visual selection requires discontiguous logical ranges and the requirement is for protocols and APIs to provide the latter.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-02-07) -- Satisfied.

C178SRDChris Lilley
TAG
3.1.3Units of visual rendering
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    It's not clear that this is such a strong requirement and it complicates processing, especially on handheld devices. Perhaps weaken to MAY? And say what happens when this funky visual selection gets copied and pasted - do you get a set of separate logical selections (if so how delimited)? A single visually ordered selection (yuk)? Something else?

    Otherwise, the weaker requirement for contiguous visual selection is likely to merely encourage the use of visual storage or the disposal of logical storage once the visual result has been generated. Which would lead to text copied from visually contiguous (logically discontiguous) selections being stored in visual order. Which is to be avoided.

  • Decision: Rejected.

  • Rationale: First, visual storage and visual selection are independent of each other. We think it's important that protocols and APIs SHOULD support discontiguous logical ranges so that implementations MAY implement visual selection if they wish. This is in particular relevant for technologies such as XPointer. We do not think that this will lead to the use of visual ordering inside the selection. In situations such as cut/paste without special support, the visual selection is usually copied as a sequence of segments, all internally in logical order. The sequence of segments and other things may be implementation-dependent, and in advanced applications, the overall result may depend on where the insertion is made.

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-02-07) -- Dissatisfied: "The TAG resolved to pushback on the grounds of misunderstanding. I18N WG seem to have missed the point we were making, or we were not clear enough.

    We were concerned about implementations *only* providing visual presentation and visual ordering, while the I18N WG seemed to take the comments as requiring only logical selection. This is not the case.

    It seems that we all agree that implementations should provide logical selection; and charmod asks for visual selection as well, assuming a logical backing store."

C179EASChris Lilley
TAG
3.1.3Units of visual rendering
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    It would be a good idea to tie into WAI concerns by noting that accessibility tools, which access the DOM, should be able to get at logically ordered text and to know which parts are selected.

  • Decision: Accepted.

  • We have mentioned accessibility (and interoperability in general) just before "[S] Protocols, data formats and APIs MUST store, interchange or process text data in logical order."

  • Our response (sent 2004-01-16) -- Notification

  • Comment (received 2004-02-07) -- Satisfied.

C180ERNC. M. Sperberg-McQueen
-
3.7fixed-length escapes
  • Comment (received 2002-07-12) -- fixed-length escapes

    In contemplating the rule '[S] Escape syntax SHOULD either require explicit end delimiters or mandate a fixed number of characters in each character escape' I am uncertain whether you intend to outlaw the kinds of escapes defined by section 6.3 of ISO 2022 or not. ISO 2022 defines some fixed-length and some variable-length escape sequences, in which certain classes of characters are defined as final characters. These final characters might be viewed as explicit end delimiters, but they are not solely delimiters. They are part of the escape sequence and cannot be disregarded in establishing the meaning of the escape sequence.

    I don't think I have a strong preference for making escape sequences of this kind legal or illegal here, but I think it probably needs to be clearer whether they are legal or not.

  • Our response (sent 2002-07-12) -- Re: fixed-length escapes

  • Comment (received 2002-07-13) -- Re: fixed-length escapes

  • Decision: Rejected.

  • Rationale: We do not think it is necessary to explicitly exclude this kind of escape sequence, because we do not think that anybody would actually want to use anything like it. There is an amazingly wide variety of escape sequence syntaxes, but we have never seen anything that even gets close. While completely distinguishing good and bad escape syntaxes has some appeal, we want to keep a certain practical touch to our document and want to keep it readable, and want to give the reader enough breathing room that they can actually think about the issues at hand (because they need to; the Character Model cannot just be applied mechanically).

  • Our response (sent 2003-02-13) -- Notification

C181EANC. M. Sperberg-McQueen
-
3.7fixed-length escapes
  • See also the following comments: C106

  • Comment (received 2002-07-12) -- fixed-length escapes

    [...] 'Escape syntaxes where the end is determined by a character outside the set of characters admissible in the character escape itself SHOULD be avoided' is a good provision, but at first glance it seemed to be saying that the terminating semicolon of entity and character references (which is 'a character outside the set of characters admissible in the character escape itself') was being deprecated. I think rephrasing might help, though I have not been able to draft a better alternative.

  • Decision: Accepted.

    We have replaced "Escape syntaxes where the end is determined by a character outside the set of characters admissible in the character escape itself SHOULD be avoided." with "Escape syntaxes where the end is determined by any character outside the set of characters admissible in the character escape itself SHOULD be avoided." Although this change is minimal, it should now be clear that this refers to cases where almost any arbitrary character can terminate an escape. Strictly speaking, the ';' in the examples is part of the escape (part of the text that gets replaced), whereas in other cases the terminating character itself is not replaced (old octal notations often work that way).

  • Our response (sent 2003-02-13) -- Notification
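
The difference between the three kinds of escape syntax discussed here can be shown with a short sketch. The first pattern uses an explicit end delimiter (as in XML character references), the second a fixed number of hex digits, and the third an open-ended run of hex digits that stops at whatever non-hex character happens to follow. The pattern names and the `expand` helper are assumptions made for illustration only.

```python
import re

# Explicit end delimiter: '&#x', hex digits, then ';' (the ';' is consumed).
DELIMITED = re.compile(r"&#x([0-9A-Fa-f]+);")
# Fixed length: '\u' followed by exactly four hex digits.
FIXED = re.compile(r"\\u([0-9A-Fa-f]{4})")
# Open-ended (hypothetical syntax): '\x' followed by as many hex digits
# as happen to be present; any other character ends the escape.
OPEN_ENDED = re.compile(r"\\x([0-9A-Fa-f]+)")


def expand(pattern: re.Pattern, text: str) -> str:
    """Replace every escape matched by `pattern` with the character it names."""
    return pattern.sub(lambda m: chr(int(m.group(1), 16)), text)


# All three inputs are meant to spell U+00E9 followed by "cole".
print(expand(DELIMITED, "&#xE9;cole") == "\u00e9cole")    # True
print(expand(FIXED, "\\u00E9cole") == "\u00e9cole")       # True
# The open-ended form swallows the 'c' of "cole" as a third hex digit
# and yields U+0E9C followed by "ole" instead.
print(expand(OPEN_ENDED, "\\xE9cole") == "\u00e9cole")    # False
```

With the open-ended form, the meaning of the escape depends on the text that follows it, which is the situation the reworded sentence advises against.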

C182NaNaNChris Lilley
TAG
3.6.2Character encoding identification
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    '[S] If the unique encoding approach is not chosen, specifications MUST designate at least one of the UTF-8 and UTF-16 encoding forms of Unicode as admissible encodings and SHOULD choose at least one of UTF-8 or UTF-16 as mandated encoding forms (encoding forms that MUST be supported by implementations of the specification).'

    Does that mean that, for example, saying UTF-8 is allowed and UTF-16 is disallowed and an encoding declaration is not required, is okay?

  • Answer: Yes.

  • Decision: We have classified this as "Not applicable", because it was a question.

    Our answer is "yes". This should be understood in light of our comments to C118. It is not meant to change the rules of specific existing formats or protocols, but to give guidance to new formats or protocols.

  • Our response (sent 2004-01-16) -- Notification

C183EPNChris Lilley
TAG
3.6.2Character encoding identification
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    Needs a little more on encodings that are a group of similar but not identical encodings, for example shift-jis.

  • Decision: Rejected.

  • Rationale: We have rejected this comment, because this is already mentioned. But as a result of other editing, the relevant note is now in a very prominent position just after the opening paragraph. If you think this is not enough, please provide concrete suggestions on what you think is missing.

  • Our response (sent 2004-01-16) -- Notification

C184NaNaNChris Lilley
TAG
3.6.2Character encoding identification
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    'Because of the layered Web architecture (e.g. formats used over protocols), there may be multiple and at times conflicting information about character encoding. [S] Specifications MUST define conflict-resolution mechanisms (e.g. priorities) for cases where there is multiple or conflicting information about character encoding.'

    Yes. Better though to not define such layering; the XML MIME RFC messed this up by allowing the charset and the xml encoding declaration to differ and for the former to take precedence; this requires 'save as' to rewrite the XML otherwise it is no longer well formed.... better to require any transcoders to leave XML alone or to know how to rewrite the encoding declaration if they change the encoding.

  • Decision: Not applicable.

  • Rationale: We decided to reject this comment, in the sense that we are not dealing with this issue in the current version. However, we will note this issue for an eventual future version of the document.

    We would like to point out that we do not introduce layering, we just point out that it exists. On the specific point of RFC 3023, we can agree that some adjustments may be needed, but we think that this is for the IETF process to decide. Going as far as disallowing a charset parameter in a protocol does not seem appropriate, because it would restrict implementation and deployment too much. In general, saying something like "don't allow too many ways of specifying the character encoding" seems like a good idea, but it is too general to be helpful for actual specification designers, and providing more detailed advice and examples seems difficult at this point.
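
A conflict-resolution mechanism of the kind the quoted requirement asks for can be as simple as a fixed priority order. The sketch below follows the order the commenter attributes to the XML MIME RFC, where the transport-level charset parameter takes precedence over the in-document encoding declaration. The function, its parameter names, and the UTF-8 fallback are illustrative assumptions, not text taken from any specification.

```python
from typing import Optional


def resolve_encoding(
    transport_charset: Optional[str],
    declared_encoding: Optional[str],
    fallback: str = "UTF-8",  # assumed fallback, for illustration only
) -> str:
    """Pick one encoding label from possibly conflicting sources.

    Sources are consulted in priority order; the first one that is
    present wins, and later sources are ignored even if they disagree.
    """
    for candidate in (transport_charset, declared_encoding):
        if candidate:
            return candidate
    return fallback


# The transport label wins, so a document saved to disk without it would
# later be read using its (different) internal declaration.
print(resolve_encoding("iso-8859-1", "utf-8"))  # iso-8859-1
print(resolve_encoding(None, "utf-8"))          # utf-8
```

The second call shows the 'save as' problem raised in the comment: once the transport information is gone, the same bytes resolve to a different encoding unless the declaration is rewritten.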

C185EANChris Lilley
TAG
3.6.2Character encoding identification
  • Comment (received 2002-05-27) -- Comments on charmod from Chris

    'Certain encodings are more or less associated with certain languages (e.g. Shift-JIS with Japanese); trying to support a given language or set of customers may mean that certain encodings have to be supported.'

    The corollary should be clearly stated: do not assume that 'everyone' supports a favored but non-mandated encoding; 'every parser I know supports Latin-1/Shift-JIS' is not true.

  • Decision: Accepted.

    We added a note.

C186SANDeborah Goldsmith
Apple
4.4Apple comments on Character Model
  • Comment (received 2002-07-23) -- Apple comments on Character Model

    Apple shares Microsoft's concern that the Character Model is overly strict with respect to forcing NFC conformance on all Web-based applications. Apple generally follows the philosophy of 'be lenient in what you accept and strict in what you produce,' which seems to us to have proven a good model historically. Therefore, we would prefer a Character Model that recommends that content producers generate NFC, but that does not require that applications (including XML parsers and other content producers and proxies) must reject anything that is not NFC.

  • Decision: Accepted.

  • Our response (sent 2002-08-20) -- Re: Apple comments on Character Model

C187SPCMark Scardina
XSL WG
4.4XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Character Model WD

    '[S] [I] A text-processing component that receives suspect text MUST NOT perform any normalization-sensitive operations unless it has first confirmed through inspection that the text is in normalized form, and MUST NOT normalize the suspect text. Private agreements MAY, however, be created within private systems which are not subject to these rules, but any externally observable results MUST be the same as if the rules had been obeyed.'

    The exception for private agreements is crippled by the observable results restriction thus when all is said and done any suspect text will always remain.

  • Decision: Clarify that 'externally observable' refers to both inputs and outputs.

  • Our response (sent 2002-08-26) -- Please clarify XSL WG comment (issue 187) on Character Model WD

  • Our response (sent 2002-09-03) -- RE: Please clarify XSL WG comment (issue 146) on Character Model WD

    [...] Please note that we have asked you for clarification on three of your comments [...]

  • Comment (received 2002-09-10) -- Character Model Comments Clarifications

C188NaNCMark Scardina
XSL WG
4.4XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Character Model WD

    '[S] [I] A text-processing component that receives suspect text MUST NOT perform any normalization-sensitive operations unless it has first confirmed through inspection that the text is in normalized form, and MUST NOT normalize the suspect text. Private agreements MAY, however, be created within private systems which are not subject to these rules, but any externally observable results MUST be the same as if the rules had been obeyed.'

    Section 4.4 appears to require that XML be changed to disallow the use of a composing character as the first character in an entity. This change would be backwards incompatible. XSL WG specifications such as XSLT and XPath must continue to work with all XML well-formed documents.

  • Decision: Noted.

C189NaRCMark Scardina
XSL WG
4.4XSL WG Comments on Character Model WD
  • Comment (received 2002-06-28) -- XSL WG Comments on Character Model WD

    '[S] [I] A text-processing component that receives suspect text MUST NOT perform any normalization-sensitive operations unless it has first confirmed through inspection that the text is in normalized form, and MUST NOT normalize the suspect text. Private agreements MAY, however, be created within private systems which are not subject to these rules, but any externally observable results MUST be the same as if the rules had been obeyed.'

    Since the contents of an XML text node are 'suspect text' (there is nothing to prevent use of a composing character as the first character in a text node), section 4.4 appears to be saying that XPath must disallow operations such as substring() unless the text is inspected and found to be normalized. We do not believe that users want to pay the high cost of this feature.

  • Decision: Rejected.

  • Rationale: See MD's mail when available.
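
To make the inspection step in the quoted rule concrete, the sketch below shows what checking text before a normalization-sensitive operation such as a substring might look like. It uses Python's standard `unicodedata` module. The helper names are hypothetical, the leading-character test approximates 'composing character' by a non-zero canonical combining class (an assumption that is narrower than the Character Model's own definition), and the choice to raise an error on failure is likewise only illustrative.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if the text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def starts_with_combining(text: str) -> bool:
    """True if the first character has a non-zero combining class."""
    return bool(text) and unicodedata.combining(text[0]) != 0


def checked_substring(text: str, start: int, end: int) -> str:
    """Take a substring only after confirming the input is normalized.

    The input is inspected but never normalized, mirroring the rule.
    """
    if not is_nfc(text) or starts_with_combining(text):
        raise ValueError("suspect text is not normalized")
    return text[start:end]


print(is_nfc("\u00e7"))                   # True: precomposed c with cedilla
print(is_nfc("c\u0327"))                  # False: c + combining cedilla
print(starts_with_combining("\u0327a"))   # True
```

The normalization comparison in `is_nfc` touches the whole string, which gives a rough idea of the per-operation cost the comment objects to.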

C191SANDan Chiba
-
3.6.2'x-' prefix on charset names
  • Our response (sent 2002-10-02) -- Re: 'x-' prefix on charset names

    What you are saying is that you are in a situation where you can't respect the SHOULD in the first sentence nor the SHOULD NOT in the second sentence, and it's unclear which one of them is stronger. I propose that we have a look at this in the WG and make clear that in such a case, using x- is better than not using x-.

  • Comment (received 2002-10-24) -- Re: 'x-' prefix on charset names

    Yes, that is precisely what I meant. Thank you very much for taking my question to WG.

  • Decision: Accepted.

    We added: "C023[S][I][C] If an unregistered character encoding is used, the convention of using 'x-' at the beginning of the name MUST be followed."

Key

The possible values of Impact are:

The possible values of Decision are:

The possible values of Status are:

Colours:

Impact: red cell if still not assigned.

Decision: red cell if still not assigned.

Status: red cell if still not assigned.

Status: orange cell if Closed but needs moving to Notified.

Status: yellow cell if Notified but needs moving to one of Satisfied, Dissatisfied, or Withdrawn.


Misha Wolf, WG chair
Martin J. Dürst, W3C staff contact & IG chair
last revised $Date: 2006/03/20 17:26:44 $ by $Author: rishida $
