Only comments that are publicly visible are listed below.
Such comments sometimes contain links to other items that are not publicly visible.
| LC002 | E | A | N | Tom Milo | - | B | Re: Character Model: Two new documents and Last Call
Behdad Esfahbod suggests: لالایی - a persian word meaning 'lullaby', pronounced /laalaayee/ Najib Tounsi suggests: تتلألأ - an arabic word which means "it
shines", "it sparkles". Where the subject "it" is of female gender.
Pronounce: ta ta la' la'
Discussed: Discussed: Decision: Accepted. We used the Persian word suggested above.
|
| LC003 | S | N | N | Markus Scherer | - | Overall | charmod vs. UTF-16/32
Decision: Noted and deferred. We agree with the first part of the sentence, but we do not yet have enough consensus to talk about the BOM in this version of the document. |
| LC004 | S | N | N | Markus Scherer | - | Overall | charmod vs. UTF-16/32
Decision: Noted and deferred. We agree, but we do not yet have enough consensus to talk about the BOM in this version of the document. |
| LC005 | S | N | N | Markus Scherer | - | Overall | charmod vs. UTF-16/32
Decision: Noted and deferred. We agree, but we do not yet have enough consensus to talk about the BOM in this version of the document. |
| LC006 | S | R | N | Markus Scherer | - | 6.2 | charmod vs. UTF-16/32
Discussed: Decision: Rejected The 'character string'
provides a good balance between user requirements (ideally
count in terms of grapheme clusters) and implementation
requirements (count in terms of code units). Also, it takes
into account that specifications (in particular those related
to XML) are written in terms of characters, not code units. We would like to point out that we have carefully listed the
alternatives and the reasons for when to use them in C052 and
C071,..., so that readers of the Character Model (writers of
specifications) should be able to make the best decision on
their own. Although we understand performance concerns about calculating
string length, we haven't heard any complaints about this e.g.
from implementers of XSLT. Also, in cases where it should really
become a bottleneck, e.g. finding a certain character position
in an extremely long string encoded in UTF-16 (or for that
matter e.g. in UTF-8), there are techniques for optimization
(e.g. building an index of every 1000th character position
for a 1M-long string, to be used to speed up subsequent
indexing operations). Also, strings in general are not as easy to use as they may
seem. For some interesting background, please see
http://www.joelonsoftware.com/articles/fog0000000319.html. |
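The indexing technique mentioned in the LC006 response can be sketched roughly as follows. This is only an illustration, not text from the Character Model: the function names (`to_utf16_units`, `build_index`, `code_unit_offset`) and the default step of 1000 are assumptions made for the example, and the input is assumed to be well-formed UTF-16. The index stores the code unit offset of every step-th character, so a later lookup only scans forward from the nearest checkpoint.

```python
import struct


def to_utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def _advance(units, i):
    """Step over one character: two units if units[i] is a high surrogate."""
    return i + (2 if 0xD800 <= units[i] <= 0xDBFF else 1)


def build_index(units, step=1000):
    """Record the code unit offset of every step-th character."""
    index = []
    char_pos = 0
    i = 0
    while i < len(units):
        if char_pos % step == 0:
            index.append(i)
        i = _advance(units, i)
        char_pos += 1
    return index


def code_unit_offset(units, index, char_pos, step=1000):
    """Return the code unit offset of character number char_pos (0-based)."""
    i = index[char_pos // step]          # jump to the nearest checkpoint
    for _ in range(char_pos % step):     # then scan at most step - 1 characters
        i = _advance(units, i)
    return i
```

With a step of 1000, finding a character position in a very long string costs one list lookup plus a scan of fewer than 1000 characters, instead of a scan from the start of the string.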
| LC007 | E | A | N | fantasai | - | 3.2 | poor example of multi-letter phonemes
|
| LC008 | E | A | N | fantasai | - | 3.3 | describing 'logical' order
|
| LC009 | S | R | N | Frank Ellermann | - | 4.5 | C069
Discussed: Discussed Decision: Rejected We have decided to reject your comment, but would like to thank
you for making it, because it has helped us get more clarity
on what exactly we should say. We agree that C069, as it was written, at least in some interpretations,
would have prohibited ASCII art and ASCII smilies, and potentially even
Unicode smilies and so on. While we do not think that ASCII art and
ASCII smilies are necessarily a good idea, and in particular there
are accessibility issues, we note that there is quite a widespread
practice, and that with respect to accessibility, it is the expertise
of a separate group, and a separate spec, that is most qualified to
decide this (WCAG 1.0 has some techniques that mention ASCII art,
but doesn't prohibit it outright). So we decided to defer the question
of what to say about ASCII art and so on, and decided to remove
C069, and insert a much more specific conformance requirement into
the spec, placed somewhat earlier after the Note after C073: >>>>>>>>
C076 [C] Content MUST NOT use a code point for any purpose other than
that defined by its character encoding.
This prohibits the construction of fonts that misuse e.g. iso-8859-1
to represent different scripts, characters, or symbols than what is
actually encoded in iso-8859-1.
>>>>>>>> This is the major misuse that we tried to address with C069, in
somewhat too general a fashion. In an ASCII smiley, a ')' is still a ')' as defined
in ASCII; it is just used in a different way than usual, and neither
the character model nor Unicode says how characters can and cannot be used. |
| LC010 | Na | R | N | Frank Ellermann | - | 4.6 | C048
Discussed: Decision: Rejected We have taken the assumption that your comment asks for removing
C048 to avoid problems with browsers such as Netscape 3.x and 4.x.
Under that assumption, we have rejected your comment.
We would like to note that not only do these browsers not deal with hexadecimal character references, they are also very bad at dealing with character references in general according to the reference processing model. In particular, for Netscape 4.x, one has to label a document as UTF-8 in order for arbitrary
(decimal!) character references to take effect. Given the very poor, if not non-existent, support for the very basics of the Character Model in those browser versions, we do not feel that it is appropriate to remove C048, which otherwise is undisputed. In addition, browser statistics (see e.g.
http://www.w3schools.com/browsers/browsers_stats.asp) show that the percentage of these browsers is declining steadily and has reached very low numbers. We would also like to note that C048 is only a SHOULD, so this still allows the use of decimal numeric character references in situations where backwards compatibility with such kinds of browsers is really important, e.g. in intranet environments with very slow upgrade cycles. Please note that the wording of C048 has changed to
"Content SHOULD use the hexadecimal form of character escapes
rather than the decimal form when there are both."
to avoid saying anything about the relative preference of named character entities vs. numeric character references.
But this should be only marginally related to your comment. |
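As an illustration of the two forms that C048 distinguishes, the sketch below writes one character as both a decimal and a hexadecimal numeric character reference. The helper names `hex_reference` and `decimal_reference` are hypothetical and chosen only for this example. The hexadecimal form reads the same way as the usual U+XXXX notation for code points. Python's standard `html.unescape` is used to check that both forms decode to the same character.

```python
import html


def hex_reference(ch):
    """Hexadecimal numeric character reference, e.g. '&#xE9;' for U+00E9."""
    return "&#x%X;" % ord(ch)


def decimal_reference(ch):
    """Decimal numeric character reference, e.g. '&#233;' for U+00E9."""
    return "&#%d;" % ord(ch)


def decode(reference):
    """Resolve a character reference using the standard library."""
    return html.unescape(reference)
```

For U+00E9, `hex_reference` yields `&#xE9;` and `decimal_reference` yields `&#233;`; both resolve to the same character.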
| LC011 | E | A | S | Tim Bray | - | 1.2 | Review of WD-charmod-20040225
|
| LC012 | E | N | S | Tim Bray | - | 1.2 | Review of WD-charmod-20040225
Decision: Noted We have classified this comment as 'noted', which means
that we acknowledge the point, but don't think that a change
to the specification is necessary. APIs often require more detailed specifications than protocols
or formats:
- APIs are often used on lower-granularity units than protocols and
formats.
- APIs often work on a single machine, and trade (cross-architecture)
interoperability for efficiency.
- Protocols and formats often only move data, whereas APIs
manipulate data. For I18N, this means that more details e.g. re. Unicode may have
to be specified for APIs than for protocols and formats. This is
explained in detail a couple paragraphs later. See also clarification |
| LC013 | E | N | S | Tim Bray | - | 1.2 | Review of WD-charmod-20040225
Decision: Noted We have classified this comment as 'noted'. This means that while it raises a valid point, we have decided not to change the specification. The fact that the Web can be seen as a single, very large application (in the sense that data flows through all the pieces without any total boundaries) is indeed very important in particular for the use of Unicode as a common reference point in the Character Model. Without such a reference, binary data would be exchanged without the chance to be able to compare two text strings (e.g. if they are in incompatible encodings). This also increases the requirement for Web-wide agreements on things such as counting characters,...
So this is indeed relevant to i18n, and is to quite some extent actually explained before and after the text in question. See also clarification |
| LC014 | E | P | S | Tim Bray | - | 1.2 | Review of WD-charmod-20040225
|
| LC015 | E | P | S | Tim Bray | - | 1.3 | Review of WD-charmod-20040225
Discussed: see notes RI finds only one other use of 'producer' and 'recipient'. Discussed: see notes Decision: Partially Accepted We removed the first paragraph and note in section 3.1, since those definitions were not needed for this document. We will use your proposed text for the Normalization document, where these definitions are needed. |
| LC016 | E | A | S | Tim Bray | - | 2 | Review of WD-charmod-20040225
|
| LC017 | E | A | S | Tim Bray | - | 2 | Review of WD-charmod-20040225
|
| LC018 | E | A | S | Tim Bray | - | 2 | Review of WD-charmod-20040225
|
| LC019 | E | A | S | Tim Bray | - | 3.1 | Review of WD-charmod-20040225
|
| LC020 | E | A | S | Tim Bray | - | 3.3 | Review of WD-charmod-20040225
|
| LC021 | E | A | S | Tim Bray | - | 3.3 | Review of WD-charmod-20040225
|
| LC022 | E | A | S | Tim Bray | - | 4.1 | Review of WD-charmod-20040225
|
| LC023 | E | A | S | Tim Bray | - | 4.3 | Review of WD-charmod-20040225
|
| LC024 | S | A | S | Tim Bray | - | 4.3 | Review of WD-charmod-20040225
Discussed: Discussion: Decision: Accepted. We have split the requirement into two, making it a MUST NOT for surrogates, and a SHOULD NOT for other stuff.
The main reason for the distinction is that surrogates are the biggest area, and therefore the easiest to exclude. From there on, it's a bit of a slippery slope, with a decreasing return on investment. An example would be the U+??FFFE and U+??FFFF code points at the end of each plane. They are clearly not allowed, but a spec might want to make its own decision on whether to formally disallow them, based on efficiency considerations. See also clarification |
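The two-tier split described in the LC024 response could be checked along the following lines. This is a hedged sketch only: the function names and the result labels ("forbidden", "discouraged", "ok") are assumptions made for the example and do not come from the specification. The second tier here covers only the U+??FFFE and U+??FFFF code points named in the response, not everything a specification might choose to exclude.

```python
def is_surrogate(cp):
    """True for the surrogate code points U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_plane_end_noncharacter(cp):
    """True for U+??FFFE and U+??FFFF, the last two code points of each plane."""
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFF) in (0xFFFE, 0xFFFF)


def check_code_point(cp):
    """Classify a code point according to the two-tier rule sketched above."""
    if is_surrogate(cp):
        return "forbidden"      # corresponds to the MUST NOT tier
    if is_plane_end_noncharacter(cp):
        return "discouraged"    # corresponds to the SHOULD NOT tier
    return "ok"
```

The surrogate test is a single range comparison, which matches the remark that surrogates are the easiest area to exclude.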
| LC025 | S | R | S | Tim Bray | - | 4.3 | Review of WD-charmod-20040225
Discussed: Decision: Rejected. We have decided to reject this comment. The argument about having to choose between Java/C#-friendly and C/C++-friendly has been countered on www-tag: in terms of programming, an explicit decoding step has to be used anyway, e.g. in Java to deal with endianness issues, and interoperability and speed are not increased by adding more encodings because, in the general case, all encodings have to be addressed. Also, we note that the recent focus on abstract representations should make it possible, e.g., to pass data directly as characters between two Java programs or processes. In addition, we note that we do not know of any technology that currently allows exactly UTF-8 and UTF-16 but nothing else (as opposed to XML, which allows lots of other encodings).
This would mean that it would be impossible to show implementation experience for such a combination. This seems to be in accordance with a well-known (at least in the IETF) saying for spec design:
"zero, one, many". In the case of (the next version of) CSS, this wouldn't really apply, because CSS, at least currently, like XML allows a wide range of character encodings. Also, it is very ASCII-heavy, more so on average than XML, so that
UTF-16 is less important. See also clarification |
| LC026 | E | N | S | Tim Bray | - | various | Review of WD-charmod-20040225
|
| LC027 | S | P | S | Tim Bray | - | 4.4.2 | Review of WD-charmod-20040225
Discussed: see notes Decision: Partially accepted We felt that the first part of the sentence had meaning and value, but removed the second part [ " and SHOULD implement them in such a way that they are easy to use (for instance in HTTP servers)" ]. See also clarification |
| LC028 | E | A | S | Tim Bray | - | 4.4.2 | Review of WD-charmod-20040225
|
| LC029 | E | P | S | Tim Bray | - | 4.4.2 | Review of WD-charmod-20040225
|
| LC030 | E | P | S | Tim Bray | - | 4.6 | Review of WD-charmod-20040225
Discussed: see notes Decision: Partially accepted. This is a formal definition of an escape, rather than a statement of purpose, so we feel that point 3 is fine. We did, however, change 'character codes' to 'encoded characters'. See also clarification |
| LC031 | E | A | S | Tim Bray | - | 4.6 | Review of WD-charmod-20040225
|
| LC032 | E | R | S | Tim Bray | - | 4.6 | Review of WD-charmod-20040225
Discussed: Discussed: Decision: Rejected Charmod does
not deprecate everything but Unicode (although it shows a clear and
intentional preference) and also the earlier requirement for hex
escapes (C045) applies to specifications that define escape syntaxes,
while this one applies to content (and implementations that generate
content). See also clarification |
| LC033 | E | R | S | Tim Bray | - | 4.6 | Review of WD-charmod-20040225
Discussed: see notes Decision: Rejected Charmod does not deprecate everything but Unicode (although it shows a clear and intentional preference). C049 includes things like using iso-8859-1 or windows-1252 for western European languages, or shift_jis,... for Japanese, and so on. See also clarification |
| LC034 | E | A | S | Tim Bray | - | 6.2 | Review of WD-charmod-20040225
|
| LC035 | S | A | S | Tim Bray | - | 7 | Review of WD-charmod-20040225
Discussed: see notes Decision: Accepted We have accepted this comment. As a result of this and other comments, we have split the character model again, creating a separate part that only deals with IRIs. We plan to move that to CR, and only proceed to PR when the IRI spec has further proceeded in the IETF (e.g. is published as a Proposed Standard RFC). We would like to note that the IRI spec recently has made quite some progress, IETF last call has successfully ended, and IESG approval may be close. See also clarification |
| LC036 | E | R | S | Tim Bray | - | 4.6 | Review of WD-charmod-20040225
Discussed: Decision: Rejected Not all specs need a reference to Unicode (or to 10646, for that matter). An example would be the xml:base spec, which doesn't involve characters except indirectly through XML and URIs. However, it is difficult to clearly define when a spec does or does not depend on character definitions and semantics. If C062 changed to a MUST, it would need to have a qualifier (e.g. "if the spec depends on character definitions and semantics...") which would make the MUST clause untestable. Therefore, this should remain a SHOULD, providing an escape hatch for specs that legitimately do not require a Unicode reference. See also clarification |
| LC037 | E | A | N | Susan Lesch | W3C Communications Team | Overall | Background color of images
|
| LC038 | E | A | N | Susan Lesch | W3C Communications Team | Overall | XML Spec XSLT
Discussed: see notes Decision: Accepted: but we fixed the XSLT we are using, rather than upgrade to the current version of XMLSpec, because we have added numerous extensions and don't feel we have the time or need to redo things. |
| LC039 | E | A | N | Susan Lesch | W3C Communications Team | 4.3 | Minor editorial
|
| LC040 | T | A | N | Susan Lesch | W3C Communications Team | 4.3 | Typo
|
| LC041 | T | A | N | Susan Lesch | W3C Communications Team | 2 | Typos
|
| LC042 | E | P | N | Susan Lesch | W3C Communications Team | A | URIs in citations
Our response (sent 2004-18-03) -- Re: URIs in citations Discussed: see notes Decision: Partially-accepted We already link the title text, and we don't think
URIs should not be linked. However, there are some titles that have not been linked that we noticed thanks to this comment. We have fixed them. |
| LC043 | S | P | S | Dan Connolly | - | 3.2 | conformance to "software MUST NOT assume" measurable?
Discussed: see notes Decision: Partially accepted We have changed the wording from: C001 [S][I][C] Specifications, software and content MUST NOT
>assume that there is< a one-to-one correspondence between characters
and the sounds of a language. to C001 [S][I][C] Specifications, software and content MUST NOT
>require or depend on< a one-to-one correspondence between characters
and the sounds of a language. and have made the same change for C002 and C003. This avoids the issue that specifications, implementations, and content don't really make 'assumptions'. As for conformance, we would like to first point out that all the conformance criteria in the Character Model are predicated on whether a given criterion actually applies to a given technology. So technology that does not deal with the auditory representation of language (i.e. most W3C specifications) is not affected by this criterion.
Technology that is affected (e.g. VoiceXML and in particular SSML) can be checked. If SSML, for example, tried to do text-to-speech conversion by defining a format for a table that would only associate single phonemes with single characters, it would very clearly not conform to the character model. But as you can check at http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/#S3.1.9,
SSML definitions of written-to-spoken correspondence using the <phoneme> element allow definitions on whole words or larger pieces of text, so it is conformant. With this example, I hope that we have shown that conformance of specifications can indeed be checked. To be even more concrete, one could easily collect a series of examples (starting with those mentioned in the spec, such as "thing"), where there is not a one-to-one correspondence between characters and phonemes, and check whether specs, implementations,... that deal with such correspondences can handle them. |
| LC044 | S | R | D | Dan Connolly | - | 3.7 | define 'character' once and for all
Discussed: see notes Decision: Rejected The definition for 'character'
currently available in the document ("a character can be defined informally as a small logical unit of text") is too fuzzy to be directly useful in other specifications. Having a single, very precise, definition of 'character' is not really feasible, because different kinds of specifications may need different definitions.
Also, in C067, we advise to use more specific terms if available.
The wide range of ways to look at the phenomenon of a 'character', and to define the term 'character', should become obvious to the reader after reading Section 3 of the Character Model. Discussed Decided to leave as dissatisfied. |
| LC045 | S | A | S | Martin Dürst | - | 8 | Last call comment on Charmod (Fundamentals)
|
| LC046 | S | A | S | Dan Connolly | - | 4.6 | appropriate mechanism exists... says who?
Discussed: Discussed: Decision: Accepted Changed "Specifications MUST NOT invent a new escaping mechanism if an appropriate one already exists." to "Specifications SHOULD NOT invent a new escaping mechanism if an appropriate one already exists." This can indeed only be observed by humans looking at a specification and comparing it with known pre-existing escaping mechanisms, and this will include some judgement. However, we think that it is better to have this conformance criterion to make such judgement explicit rather than to have spec writers come up with new mechanisms all the time. |
| LC047 | S | A | S | Dan Connolly | - | 7 | IRI section needs too much testing to go in Fundamentals
Discussion: Our response (sent 2004-07-22) -- [Urgent] Request for additional feedback (was: Clarification sought: CharMod LC053) Request for TAG to comment on our plans to split the document and advance this Fundamentals part straight to PR. Discussed Decision: Accepted. We split off section 7 'Character Encoding in URI References' into a separate document titled "Character Model for the World Wide Web 1.0: Resource Identifiers", and will advance the remainder of the Fundamentals document to PR. The new document will go to CR and will be advanced with the IRI Internet-Draft. |
| LC048 | S | P | N | Dominique Hazaël-Massieux | - | Overall | Support for DanC's comment re conformance
|
| LC049 | S | A | N | Frank Ellermann | - | 4.6 | C049
|
| LC050 | E | A | N | Philippe Le Hégaret | - | 7 | IRI Reference should be normative
Discussion: Discussion: question is: does the CharMod text require it to be a normative reference? Decision: Accepted We moved ID-IRI reference from non-normative to normative references. Updated document links to draft-duerst-iri-10.txt. Added following note: "[NOTE: This reference should be taken to point to the RFC once the IRI draft has progressed to that stage.]" Note that as a result of other comments, we have moved the section about IRIs to a separate document. |
| LC051 | S | A | N | Chris Lilley | TAG | 4.5 | Pi fonts and PUA
Decision: Accepted. Discussion: Discussed Decision: Accepted We have added some new text: >>>>>>>>
C076 [C] Content MUST NOT use a code point for any purpose other than
that defined by its character encoding.
This prohibits the construction of fonts that misuse e.g. iso-8859-1
to represent different scripts, characters, or symbols than what is
actually encoded in iso-8859-1.
>>>>>>>> This is just after C073, which says that content on the Web
SHOULD NOT use the PUA. With C076 being a MUST and C073 a SHOULD,
it is clear that if symbols not encoded in Unicode have to be represented,
they have to go into the PUA rather than into some assigned or reserved
area. On the other hand, we have removed C069 because it was too general
and covered e.g. things like ASCII art, which is an issue of
use of characters rather than encoding of characters. |
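For readers who want to test whether a code point falls in the PUA discussed in the LC051 response, a minimal sketch follows. The three ranges are the private use areas defined by Unicode; the names `PRIVATE_USE_RANGES` and `is_private_use` are hypothetical and introduced only for this example.

```python
# Private use areas defined by Unicode: one in the BMP, plus planes 15 and 16
# (each plane excludes its final two noncharacter code points).
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_private_use(cp):
    """True if the code point cp lies in one of the private use areas."""
    return any(low <= cp <= high for low, high in PRIVATE_USE_RANGES)
```

A content checker could use such a test to flag PUA code points for review, since C073 is only a SHOULD, while treating misuse of assigned code points under C076 as an error.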
| LC052 | E | A | N | Chris Lilley | TAG | 3.3 | C004 ambiguous
|
| LC053 | S | A | N | Chris Lilley | TAG | 7 | Please remove IRIs
|
| LC054 | E | P | S | Karl Dubost | QA | 3.2 | KD-001
Discussion: note that this is linked to LC055, LC056 and LC068 Decision: Partially accepted. For 3.2, C001; 3.3, C002; 3.4, C005; 3.6, C009: replaced "MUST NOT assume" with "MUST NOT require or depend on". We have changed the wording from: "C001 [S][I][C] Specifications, software and content MUST NOT >>assume that there is<< a one-to-one correspondence between characters and the sounds of a language." to "C001 [S][I][C] Specifications, software and content MUST NOT >>require or depend on<< a one-to-one correspondence between characters and the sounds of a language." This avoids the issue that specifications, implementations, and content don't really make 'assumptions'. As for conformance, we would like to first point out that all the conformance criteria in the Character Model are predicated on whether a given criterion actually applies to a given technology. So technology that does not deal with the auditory representation of language (i.e. most W3C
specifications) is not affected by this criterion. Technology that is affected (e.g. VoiceXML and in particular SSML) can be checked. If SSML, for example, tried to do text-to-speech conversion by defining a format for a table that would only associate single phonemes with single characters, it would very clearly not conform to the character model. But as you can check at http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/#S3.1.9, SSML definitions of written-to-spoken correspondence using the <phoneme> element allow definitions on whole words or larger pieces of text, so it is conformant. With this example, I hope that we have shown that conformance of specifications can indeed be checked. To be even more concrete, one could easily collect a series of examples (starting with those mentioned in the spec, such as "thing"), where there is not a one-to-one correspondence between characters and phonemes, and check whether specs, implementations,... that deal with such correspondences can handle them. As for implementability, there are a lot of text-to-speech engines, and a lot of speech detection engines, that do not require or depend on a one-to-one correspondence, so it is very clear that this can be implemented. As for your point of "If the software implements only this language because it's a specific use for only this language", yes, such software would not conform to the character model. From the viewpoint of the character model, this would be on purpose; in the age of the World Wide Web, it is a bad idea to create software that can handle only one language, and it is a bad idea to create software that has language-related issues hard-coded when it can easily be made configurable. |
| LC055 | E | P | S | Karl Dubost | QA | 3.3 | KD-002
Discussion: Discussion: Decision: Partially accepted. Our reply is basically the same as that for LC054. We replaced "MUST NOT assume" with "MUST NOT require or depend on".
We note that this is testable with very simple examples, some of which can be found in the spec itself.
Implementations dealing with only a single language may not conform to the character model, and that is by design; it's the goal of the character model to make sure that specs and software can deal with as many languages as possible. |
| LC056 | E | P | S | Karl Dubost | QA | 3.4 | KD-003
Discussion: Discussion: Decision: Partially accepted Our reply is basically the same as that for LC054. We replaced "MUST NOT assume" with "MUST NOT require or depend on".
We note that this is testable with very simple examples, some of which can be found in the spec itself.
Implementations dealing with only a single language may not conform to the character model, and that is by design; it's the goal of the character model to make sure that specs and software can deal with as many languages as possible. |
| LC057 | E | R | S | Karl Dubost | QA | 3.5 | KD-004
Discussion: Decision: Rejected You write:
===> What's happening if you implement all western languages but not asian because the context of applications do not make it necessary. Do I still have to implement everything? If not how can I be conformant? As we have already explained in our responses to LC054-56, the goal of the character model is to cover as many languages/scripts/characters as possible. On the WWW, you never know what input you get.
If an implementation blows up just because it is unable to do anything with Asian characters, that would be very bad. Please note that we do not require any particular sort order for any character; simply sorting 'unknown' characters by code point would be okay. |
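The fallback mentioned in the LC057 response, sorting 'unknown' characters by code point, might look like the sketch below. The weight table `KNOWN_WEIGHTS` is entirely hypothetical and stands in for whatever language-specific ordering an implementation actually supports; the function name `sort_key` is likewise an assumption for this example.

```python
# Hypothetical tailored ordering for the few characters this
# implementation "knows"; illustrative values only.
KNOWN_WEIGHTS = {"a": 1, "\u00e4": 2, "b": 3, "z": 4}


def sort_key(text):
    """Sort key: known characters by weight, all others by code point.

    Known characters sort before unknown ones, and unknown characters
    are ordered among themselves by code point, so no input is rejected.
    """
    return [
        (0, KNOWN_WEIGHTS[ch]) if ch in KNOWN_WEIGHTS else (1, ord(ch))
        for ch in text
    ]
```

Calling `sorted(words, key=sort_key)` then handles strings in any script without failing, even though only a handful of characters have a tailored order.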
| LC058 | E | P | S | Karl Dubost | QA | 3.6 | KD-005
Discussion: Discussion: Decision: Partially accepted Our reply is basically the same as that for LC054. We replaced "MUST NOT assume" with "MUST NOT require or depend on".
We note that this is testable with very simple examples, some of which can be found in the spec itself.
Implementations dealing with only a single language may not conform to the character model, and that is by design; it's the goal of the character model to make sure that specs and software can deal with as many languages as possible. |
| LC059 | E | A | S | Karl Dubost | QA | 3.7 | KD-006
|
| LC060 | E | A | S | Karl Dubost | QA | 4.4.1 | KD-007
|
| LC061 | E | A | S | Karl Dubost | QA | 4.4.2 | KD-008
Discussion: Decision: Accepted Replaced "Specifications MAY define either UTF-8 or UTF-16 as a default encoding form (or both if they define suitable means of distinguishing them), but they MUST NOT use any other character encoding as a default." with "Specifications that mandate a default encoding MUST define either UTF-8 or UTF-16 as the default, or both if they define suitable means of distinguishing them." |
| LC062 | E | A | S | Karl Dubost | QA | 4.4.2 | KD-009
|
| LC063 | E | A | S | Karl Dubost | QA | 4.4.2 | KD-010
|
| LC064 | S | A | S | Karl Dubost | QA | 4.5 | KD-011
Discussion: Discussed Decision: Accepted We have removed C069, which was too general,
because we don't want to discuss the use of characters so much as the
encoding of characters. We have added new text, just after C073 >>>>>>>>
C076 [C] Content MUST NOT use a code point for any purpose other than
that defined by its character encoding.
This prohibits the construction of fonts that misuse e.g. iso-8859-1
to represent different scripts, characters, or symbols than what is
actually encoded in iso-8859-1.
>>>>>>>> in order not to lose the main issue for which C069 was originally
introduced. |
| LC065 | E | R | S | Karl Dubost | QA | 6.1 | KD-012
|
| LC066 | E | A | S | Karl Dubost | QA | 8 | KD-013
|
| LC067 | E | R | S | Karl Dubost | |