Comments on Unicode Technical Report #36

Version reviewed

http://www.unicode.org/draft/reports/tr36/tr36.html Revision 1.20

Main reviewer

Richard Ishida ishida@w3.org

Notes

These are so far mostly personal comments, NOT on behalf of the I18N WG. Where the WG has expressed agreement this is noted in the table.

Comments

ID Location Comment Ed./Subs. Discussion threads
1 2

Endorsed by the i18n WG:

We disagree with the use of the term 'homograph' to refer to "Where two different strings are essentially identical in most fonts at all sizes". The traditional use of homograph is to refer to two words that *are* identical in terms of the letters used. Introducing this new meaning will make it difficult to refer to the previous meaning.

Perhaps it would be better to call them 'pseudo-homographs', or some such.

S -
2 2 s/all visually confusables/all visually confusable strings/ E -
3 2.1 "Certain parts of domain names are still required to be in ASCII". This and the remainder of the paragraph seem to suggest that TLDs will always be in ASCII - this is not decided. Suggest you soften that, eg. with 'are currently still required', or add a comment at the end that this may need revisiting later. E/S -
4 2.1, para starting "Similarly, for most scripts" May make it clearer to note that no precomposed characters exist for any part of the combinations in the example. E -
5 2.1, para starting "This focus on"

"Thus ToASCII("paypaI.com") = "paypaI.com" (where the 'I' is a capital 'i')." This is a very complicated sentence. Why not just:

Thus ToASCII("paypaI.com") (where the 'I' is a capital 'i') produces no change.

E -
6 2.1, 1st note

"two or more domain names would need to be registered"

Clarify that this is by the person making the registration.

E -
7 2.1 Should the last few paras (starting from "Some of the problems") fall under the title of "Recommendations"? E
8 2.2, last para 'whole-script spoofs' is a term that doesn't convey useful meaning to me; I would have preferred something like 'whole-label spoofs'. E
9 2.3, 1st para Suggest append "such as punctuation" to "Spoofing with characters entirely within one script, or using characters that are common across scripts,". Otherwise it can sound confusingly as if this is the same as what you described in the previous section. (I know it says characters, not glyphs, but you have to catch the import of that.) E
10 2.3, table, last column I think you should continue the "instead of..." on other rows, or describe the contents on a row by row basis. (The comments are a good addition, though - this was much more confusing in the previous version.) E
11 2.4, 1st para s/These are characters that should be visually distinguishable/These are characters or sequences of characters that should be visually distinguishable/ E
12 General It strikes me that numbering the examples in tables sequentially throughout the document would make it easier to refer to specific examples. E
14 2.5 Seems strange to have some user recommendations called out for this topic but not in other sections (eg. at the end of 2.1) E
15 2.5 s/Use digits as little as possible in host names/Use digits as little as possible in host names containing right-to-left characters/ E
16 2.5 s/reading correctly IRIs/reading IRIs correctly/ E
17 2.7 wrt "Turning away from IDN", does IDNA not allow these characters? E
18 2.8 The objective of section 2.8 should be spelled out more clearly in the opening paragraph. As best I can tell, this is a non-exhaustive collection of topics that provide useful detail to support the recommendations in 2.10. It took some time for me to understand the approach here and conclude that this was not the recommendations section, and that I didn't have to worry about what applied to whom. E
18 2.8 The internal section numbering is incorrect. E
19 2.10 The recommendations for handling bidi and right-to-left domain names are missing and could well be overlooked. Please move or copy them here. E
20 A.1, idmod Are we sure that none of the phonetic characters excluded are used in writing languages (eg. African languages)? S
21 A.1, idmod

Endorsed by the i18n WG:

We are not convinced that being archaic or liturgical is a suitable justification for exclusion. Although it is unlikely that these would be a good choice for a domain name due to the difficulty of users typing them in or viewing them, perhaps this list should confine itself to problematic cases due to visual similarity, etc.

S
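
Illustration for comment 5: the point of the quoted sentence is that ToASCII leaves an all-ASCII name untouched, so the spoofed name survives conversion exactly as typed. The following is a minimal sketch, assuming Python's built-in `idna` codec (an implementation of the RFC 3490 operations) is an acceptable stand-in for the ToASCII function the report refers to. The variable names are illustrative only and do not come from the report.

```python
# Sketch only: Python's built-in "idna" codec (RFC 3490) stands in for the
# report's ToASCII operation. Variable names are hypothetical.
spoof = "paypaI.com"   # ends in capital 'I' (U+0049)
real = "paypal.com"    # ends in small 'l' (U+006C)

# ToASCII applied to a name that is already all ASCII.
encoded = spoof.encode("idna")

# The conversion produces no change: the capital 'I' is preserved.
unchanged = encoded == spoof.encode("ascii")

# The two names are still different strings, however similar they look.
distinct = spoof != real

print(unchanged)
print(distinct)
```

Because the conversion is a no-op for ASCII input, it cannot by itself flag the lookalike; that is consistent with the simpler wording suggested in comment 5.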

Version: $Id: Overview.html,v 1.2 2005/07/06 13:17:20 rishida Exp $