Comments on "Requirements for Japanese Text Layout" (2) from Eric Muller on 2010-08-26 (public-i18n-cjk@w3.org from July to September 2010)

From: Eric Muller <emuller@adobe.com>
Date: Thu, 26 Aug 2010 14:49:25 -0700
To: public-i18n-cjk@w3.org
Message-ID: <4C76E165.8070002@adobe.com>
  Second set of comments.

Eric.

---
§3.1.1, item a, note 1.

I am not sure if the term "approximate number" is known to English
speakers who learn Japanese, but it's certainly not known to "ordinary"
English speakers. Maybe "number range" rather than "approximate
number", together with a translation of the text of figure 67, would help
understanding.

---
§3.1.1: the special treatment is mentioned only for vertical mode. It 
seems that it could apply just as well in horizontal text. It may be 
worth rephrasing the first paragraph:

The space usually added after IDEOGRAPHIC COMMA "、" and the space 
before and after KATAKANA MIDDLE DOT "・" are omitted, in principle, for 
cosmetic reasons in the following cases.

to

The space usually added after IDEOGRAPHIC COMMA "、" and the space 
before and after KATAKANA MIDDLE DOT "・" are generally omitted in some 
cases, in vertical (but not in horizontal) text:

---

§3.7.4, note 1: "Also, the handling of inter-character spaces between 
these math symbols and adjacent characters is described in Appendix A 
Character Classes <http://www.w3.org/TR/jlreq/#character-classes-en> as 
a complete table, in accordance with the concept of character class, 
described in 3.9 About Character Classes 
<http://www.w3.org/TR/jlreq/#en-subheading2_9>."

First, Appendix A is only a description of the character classes, so it 
seems that the reference was meant to be to Appendix B, "Spacing between 
characters".

Second, table 2 does not include classes 17 and 18 at all. Similarly,
tables 4, 5, 6 and 7 do not include those classes. In fact, there does
not seem to be any description of the inter-character spaces between 
math and other characters outside 3.7.4, so the note is misleading.

---
§3.7.4, item b., note 1. It seems that the reference should be to figure 
175 rather than 174.

---

In Table 2 (Spacing between characters), the entry for cl-02 closing 
brackets / line end refers to note 2, which says:

    The preferred spacing between closing brackets (cl-02)
    <http://www.w3.org/TR/jlreq/#cl-02> and the line end is a half em.
    The alternative is to set solid (see 3.1.9 Positioning of Closing
    Brackets, Full Stops, Commas and Middle Dots at Line End
    <http://www.w3.org/TR/jlreq/#en-subheading2_1_9>).

The corresponding entries in tables 4, 5, 6 are "1/2=0", "<blank>" and 
"1/2=0" respectively.

To facilitate the correlation with tables 4, 5 and 6, it may be worth
rephrasing note 2, replacing "The alternative is" by "An alternative, used
in JIS X 4051, is".


---

Similarly for cl-05 middle dot / line end and note 4:

    The preferred spacing between middle dots (cl-05)
    <http://www.w3.org/TR/jlreq/#cl-05> and the line end is a quarter
    em. The alternative is to set solid.

The corresponding entries in table 4, 5, 6 are "note 5", "<blank>" and 
"<blank>" respectively.

To facilitate the correlation with tables 4, 5 and 6, it may be worth
rephrasing note 4, replacing "The alternative is" by "An alternative, used
in JIS X 4051 and in books, is".

Furthermore, note 5 in table 4 is:

    Table 4, and only Table 4, allows the preceding and trailing
    conditional quarter em space accompanying middle dots (cl-05)
    <http://www.w3.org/TR/jlreq/#cl-05> to be reduced to leave no space.
    The priority order is the third.

Wouldn't it be simpler to have "1/4-0" in table 4, and not have note 5? 
This would give the same organization as for cl-02/line end.

---
Similarly for cl-06, cl-07 / line end and note 6:

-> An alternative, used in JIS X 4051, is to use a half em after a
full stop and to set commas solid.

---
Table 2, note 10:

    When two adjacent characters belong to the same simple-ruby
    character complex (cl-22) <http://www.w3.org/TR/jlreq/#cl-22> run,
    set them according to the method explained in 3.3.5 Positioning of
    Mono-ruby with Respect to Base Characters
    <http://www.w3.org/TR/jlreq/#en-subheading2_3_5>. When two adjacent
    characters belong to two distinct simple-ruby character complex
    runs, set them solid.

If two adjacent characters belong to the same simple-ruby character 
complex, doesn't that imply a group ruby, and therefore the pointer 
should be to 3.3.6?

---
Tables 4, 5, 6, note 1, called for cl-05/cl-05. The note gives the
behavior for tables 4 and 5, but not for table 6. I suppose that
"Tables 4 and 5 allow..." should be "Tables 4, 5 and 6 allow..."


Same thing for note 2. Furthermore, "Tables 4 and 5 allow the quarter em 
space accompanying the trailing middle dot (cl-05) 
<http://www.w3.org/TR/jlreq/#cl-05> to be reduced, to leave no space as 
a minimum" can be a bit misleading, as the half em on the full stop 
remains. Maybe "Tables 4, 5 and 6 allow the quarter em space
accompanying the trailing middle dot (cl-05) 
<http://www.w3.org/TR/jlreq/#cl-05> to be reduced to zero, leaving only 
the half em space accompanying the leading full stop (cl-06)."

Note 3: the behavior of tables 4 and 5 is not specified. I suppose that
in table 4, both the half em after the comma and the quarter em before 
the middle dot can be reduced to 0. Table 5 being about JIS, the space 
is only a quarter em before the middle dot, and I suppose this can be 
reduced to 0.

---
Appendix E.

I understand that to achieve a given line width by expansion:
- the western word spaces are enlarged up to 1/2 em
- then the 2nd step 1/4 em spaces are expanded to 1/2 em
- then the 3rd step 0 em spaces are expanded to 1/4 em
- then the 4th step 0 em spaces are expanded

What is not clear is whether there is a limit to the expansion of the 
4th step 0 em spaces, and if so, what is expanded after this limit is 
reached.

Maybe an example will help: consider <ideo, ideo, comma, ideo, ideo>.
That's normally 5em; how are the spaces distributed if that line is
justified to a width of 10em?
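
To make the question concrete, here is a rough sketch of the procedure as I read it from the list above. It is not taken from the document: the names `distribute`, `gap_steps` and `room` are hypothetical, the equal split within a step is my assumption, and the step assigned to each gap in the example is purely illustrative (it is not derived from Table 7).

```python
from fractions import Fraction

def distribute(deficit, gap_steps, room):
    """Spread `deficit` (in em) over the gaps of a line, one step at a time.

    gap_steps: the step number (1 to 4) of each gap, in line order.
    room: maps a step to how much one gap of that step may grow (in em),
          or to None when no limit is known.
    Returns (growth per gap, deficit still unassigned).
    """
    growth = [Fraction(0)] * len(gap_steps)
    remaining = Fraction(deficit)
    for step in sorted(set(gap_steps)):
        if remaining == 0:
            break
        members = [i for i, s in enumerate(gap_steps) if s == step]
        # Assumption: gaps of the same step share the deficit equally.
        share = remaining / len(members)
        if room[step] is not None:
            share = min(share, Fraction(room[step]))
        for i in members:
            growth[i] = share
        remaining -= share * len(members)
    return growth, remaining

# Hypothetical line with four gaps and 5 em to absorb.
# Case 1: no limit on the 4th step, so it absorbs whatever is left.
growth, left = distribute(5, [4, 3, 3, 4], {3: Fraction(1, 4), 4: None})
# growth is 9/4, 1/4, 1/4, 9/4 em and left is 0

# Case 2: a made-up 1/4 em limit on the 4th step leaves 4 em unassigned.
growth, left = distribute(5, [4, 3, 3, 4], {3: Fraction(1, 4), 4: Fraction(1, 4)})
# left is 4
```

The second case is the one the text does not seem to cover: if the 4th step has a limit, something else has to take the remaining 4 em.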

---

The class cl-08 inseparable characters is made of

U+2014 — EM DASH
U+2026 … HORIZONTAL ELLIPSIS
U+2025 ‥ TWO DOT LEADER

U+3033 〳 VERTICAL KANA REPEAT MARK UPPER HALF
U+3034 〴 VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF
U+3035 〵 VERTICAL KANA REPEAT MARK LOWER HALF

The treatment of those characters in line breaking is described by note 
5 on table 3:

    There is no line break opportunity between consecutive inseparable
    characters (cl-08) <http://www.w3.org/TR/jlreq/#cl-08> of the same
    kind. If two consecutive inseparable characters (cl-08)
    <http://www.w3.org/TR/jlreq/#cl-08> are of different kinds, a line
    break opportunity exists between them. For example, a line shall not
    be broken between two consecutive EM DASH "―"EM DASH "―" followed by
    HORIZONTAL ELLIPSIS "…".

and the treatment of those characters in space expansion is described by 
note 4 on table 7:

    A third order opportunity exists for inter-character space
    expansion, to take up to a maximum of a quarter em space, with
    respect to the corresponding character size, between two consecutive
    inseparable characters (cl-08) <http://www.w3.org/TR/jlreq/#cl-08>
    which are of different kinds.

It seems to me that the intent is that the sequences of characters:
- multiple U+2014
- multiple U+2026
- multiple U+2025
- <3033, 3035>
- <3034, 3035>
be treated as if they were a single character (i.e. no linebreak or 
character expansion in them).

The situation then looks a lot more similar to that of the class "unit 
symbols".

I would suggest redefining class 08 along those lines, i.e. as a class
of sequences of characters.

Also, I am wondering if the last two sequences should not be treated as 
cl-09 rather than cl-08.
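
The sequence-based reading suggested above can be sketched as follows. This is only an illustration of my proposal, not of the current text of the document: `group_cl08` is a hypothetical name, and the code assumes that runs of one of the three punctuation members of cl-08, and the two kana repeat mark pairs, each form a single unit with no line break or space expansion inside.

```python
# The three punctuation members of cl-08 listed above.
RUN_CHARS = {"\u2014", "\u2026", "\u2025"}
# The two-part vertical kana repeat marks: <3033, 3035> and <3034, 3035>.
REPEAT_PAIRS = {("\u3033", "\u3035"), ("\u3034", "\u3035")}

def group_cl08(text):
    """Split `text` into units; line breaks would only be considered between units."""
    units = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in RUN_CHARS:
            # Extend over all immediately following copies of the same character.
            j = i + 1
            while j < len(text) and text[j] == ch:
                j += 1
            units.append(text[i:j])
            i = j
        elif tuple(text[i:i + 2]) in REPEAT_PAIRS:
            units.append(text[i:i + 2])
            i += 2
        else:
            units.append(ch)
            i += 1
    return units

# Two em dashes followed by an ellipsis give two units, with one
# potential break point between them.
units = group_cl08("\u2014\u2014\u2026")
```

With this grouping, a run of different kinds still leaves a boundary between the units, which matches the behavior described in note 5 on table 3.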

---
Table 7, note 10: "A fourth order opportunity exists for inter-character 
space expansion...". In the table itself, the color of the cell that 
refers to note 10 is blue, i.e., third order. Which is right: the note
text or the cell color?


---
§3.8.4, items c and d both refer to "bunrikinshi", but this term is not
defined and is not used anywhere else. Appendix G has "buri kinshi", but 
this is the only occurrence, and it is also weakly defined. It seems 
that the only definition is provided by Table 7.

---
§3.9.2, Characters as reference marks (cl-20):

    Characters which are inside verification seal (those are characters
    inside a verification seal that appear in the line just after the
    item applicable for reference marks
    <http://www.w3.org/TR/jlreq/#reference-marks> of notes
    <http://www.w3.org/TR/jlreq/#note>)

There is no other occurrence of the words "verification" and "seal",
which makes it a bit difficult to understand this text (what's a
"verification seal"? how does it relate to reference marks of notes?).
Also, a reference to §3.1.9, item j, may help.

---
Table 5, the legend has "(The example that it followed the regulation of
JIS X 4051)" -> "(according to JIS X 4051)"

---
Table 6, the legend has "(The example being done in the book and so on)"
-> "(often used in books and similar materials)"

---
Received on Thursday, 26 August 2010 21:52:51 UTC