WD-css3-text-20021024 substantive comments from Etan Wexler on 2002-11-27 (www-style@w3.org from November 2002)

From: Etan Wexler <ewexler@stickdog.com>
Date: 27 Nov 2002 23:00 +0000
To: www-style@w3.org
Message-ID: <WD-css3-text-20021024-comments@d20021127.etan.wexler>
Following are comments on the Working Draft, "CSS3 module:
text", <http://www.w3.org/TR/2002/WD-css3-text-20021024>.



2. Introduction


"In both CSS1 and CSS2, text formatting has been limited to
simple effects like for example: text decoration, text alignment
and character spacing."

Change "character" to "grapheme cluster".


"- wide-cell glyph (e.g. Han) which is the n-th character in the
text run"

Change "character" to "glyph".


"- narrow-cell glyph (e.g. Roman) which is the n-th glyph in the
text run"

Change "Roman" to "Latin".


"Many typographical properties in East Asian typography depends
on the fact that a character is typically rendered as either a
wide or narrow character."

Change the second occurrence of "character" to "glyph".


"Spacing between these characters in the diagrams is usually
symbolic"

Change "characters" to "glyphs".



[Section 3.3 was accidentally skipped.]



3.5. Script character classification: the 'text-script' property


Why has the name 'text-script' been chosen when XSL uses
'script'?


"For example, line breaking or text justification behaviors
depend on the 'dominant' script of the textual content of an
element."

Why do quote marks delimit "dominant"?


"Use the first character descendant, after any reordering due to
character direction and bi-directionality, which has an
unambiguous script identifier to determine the dominant script
of the element's content."

Reordering in the bidirectional algorithm affects the glyphs but
does not alter the character sequence.  The phrase "after any
reordering due to character direction and bi-directionality"
thus changes nothing and should be eliminated.
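The rule only looks at the stored (logical) character sequence, so a sketch of it needs no bidi step at all. Below is a minimal Python illustration. Python's standard unicodedata module does not expose the Unicode Script property, so the hypothetical helper script_of() guesses a script from the first word of the character name, using a deliberately tiny table of ISO 15924 codes. A real user agent would use the Script property data from UTR #24.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping from the first word of a Unicode
# character name to an ISO 15924 code. A real user agent would consult the
# Unicode Script property (Scripts.txt, see UTR #24) instead of names.
NAME_PREFIX_TO_ISO15924 = {
    "LATIN": "Latn",
    "GREEK": "Grek",
    "CYRILLIC": "Cyrl",
    "HEBREW": "Hebr",
    "ARABIC": "Arab",
    "HIRAGANA": "Hira",
    "KATAKANA": "Kana",
    "THAI": "Thai",
}


def script_of(ch):
    """Return an ISO 15924 code for ch, or None if its script is ambiguous."""
    name = unicodedata.name(ch, "")  # "" for unnamed characters such as controls
    return NAME_PREFIX_TO_ISO15924.get(name.split(" ", 1)[0])


def dominant_script(text, default="Latn"):
    """First character with an unambiguous script decides; else the default."""
    # Characters are visited in logical (storage) order. Bidi reordering
    # changes only the display order of glyphs, so it plays no part here.
    for ch in text:
        script = script_of(ch)
        if script is not None:
            return script
    return default
```

Digits, spaces and punctuation map to None, so dominant_script("123 \u05e9\u05dc\u05d5\u05dd abc") yields 'Hebr'. Text with no script-bearing characters falls back to 'Latn', not 'Latin'.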


"In the absence of any textual components with a clear script
identifier (or no textual content at all), the computed value is
'Latin'."

The value 'Latin' is not given in ISO 15924 and is thus not
valid CSS.  Use the value 'Latn', with no "i".


"<script>
A script definition in conformance with [ISO15924]."

The value is to be a script identifier (or "specifier", in XSL
language), not a script definition.  A script definition
consists of prose and intangible history and usage.  A script
identifier is a machine-readable and relatively short string.

I prefer Unicode Technical Report #24, "Script Names"
(<http://www.unicode.org/unicode/reports/tr24/>), over ISO
15924.  The Unicode script names are in actual English.  I
understand that for compatibility with XSL, ISO 15924 must be
the reference.  Due to the change in property name, however, the
desire to retain compatibility is in doubt.

What is the lexical form of <script>?  I assume that it is an
identifier, but somebody might assume that it is a string. 
Explicitness is needed for interoperability.



4.1. Text alignment: the 'text-align' property


"<string>"
"If set on other elements, it will be treated as 'start'."

I suggest the revision "If set on other elements, the computed
value is 'start'."



4.2. Justification: the 'text-justify' property


"It affects the text layout only if 'text-align' is set to
'justify'. That way, UA's that do not support this property will
still render the text as fully justified"

If the 'text-justify' values were allowed for 'text-align' as
meaning "justify in this manner", an extra declaration could be
used for fallback.  This would mean writing the following, for
example.

    text-align: justify;
    text-align: newspaper;

That would be instead of the following.

    text-align: justify;
    text-justify: newspaper;

This presents a trivial difference to the author (the reduction
in length slightly favoring my proposal).  Where the difference
really matters is in implementations, which would not have to
carry an extra property on each element.
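The fallback works because CSS requires a user agent to ignore a declaration whose value it does not recognize. A minimal Python sketch of that rule for one property within one rule follows. The two value sets stand in for an older user agent and for one implementing the proposal. Both sets, and the function name, are illustrative assumptions.

```python
# Values that a hypothetical CSS2-era user agent accepts for 'text-align'.
CSS2_TEXT_ALIGN = {"left", "right", "center", "justify"}

# Values a user agent implementing the proposal might accept: the CSS2
# values plus 'text-justify' keywords named in the draft (an assumption).
PROPOSED_TEXT_ALIGN = CSS2_TEXT_ALIGN | {
    "inter-ideograph", "inter-cluster", "distribute", "kashida", "newspaper",
}


def winning_value(declarations, accepted_values):
    """Return the value that wins for one property inside one rule.

    A declaration with an unrecognized value is ignored; among the
    accepted declarations, the last one wins.
    """
    winner = None
    for value in declarations:
        if value in accepted_values:
            winner = value
    return winner


# text-align: justify; text-align: newspaper;
RULE = ["justify", "newspaper"]
```

An older user agent drops 'newspaper' and keeps 'justify'. A user agent that implements the proposal lets 'newspaper' win. No second property needs to be stored per element in either case.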


"Scripts using space between word without connector
(Latin-based, Hebrew, etc...) and symbol characters."

What scripts are "Latin-based"?  "Greek-based" would include
Latin, Greek, Cyrillic, and Coptic.


"However, if the kashida-space property has a non zero value it
is recommended to use kashida elongation for Arabic text."

The property is called 'text-kashida-space', although
'kashida-space' seems preferable to me.


"The concept of a word is script dependent, the exact algorithm
is determined by the user agent."

Change to "The script guides what constitutes a word, although
the user agent determines the exact algorithm."


"At minimum, justification is expected to occur at each white
space boundary."

Does this intend to include "zero width space" and the
explicit-width spaces?


"The diagram below illustrates this mode, by showing how the
characters are laid out in the last two lines of an element"

Change "characters" to "glyphs".


"The threshold value may be related to the column width (in
number of characters)."

Change to "The threshold value may be related to the ratio of
column width to font size."


"Mixed character layout in the last two lines of a newspaper
justified element"

Change "character" to "glyph".


"In CSS3 a value of 'letter-spacing: 0' no longer inhibits
spacing-out of words for justification."

Why is this?  A person setting 'letter-spacing' to '0' has
clearly chosen something besides 'auto'.


"most script groups (except Hindi)"

Hindi is not a script group or script.  Was the intent to except
baseline-connected Indic scripts?


"Mixed character layout in the last two lines of a distribute
justified element"

Change "character" to "glyph".


"inter-cluster
Plays the same role as inter-ideograph but for South Eastern
Asian scripts. That is letter spacing only occurs for clusters
belonging to those scripts. A cluster is defined as a group of
characters formatted as a single unit."

Change to the following and append a reference to Unicode
Technical Report #29, "Text Boundaries"
(<http://www.unicode.org/unicode/reports/tr29/>).

"inter-cluster
This is the Southeast Asian counterpart to 'inter-ideograph'.
That is, letter spacing only occurs between script-defined
grapheme clusters."


"Plays the same role as inter-ideograph but for Arabic through
the Kashida effect. That is, no letter spacing occurs for other
scripts."

Change to "This is the Arabic counterpart to 'inter-ideograph'.
Letter spacing may be increased between Arabic letters, the
extra space being filled by kashida. No letter spacing occurs
for other scripts."


[table]

Change "Latin" to a more inclusive term or at least make a note
similar to the one for "Devanagari".



4.3. Last line alignment: the 'text-align-last' property


"However, if the 'text-align' property is set to the value
'justify', the last line will be aligned to the start of the
inline progression."

The 'auto' value should allow the user agent to justify the last
line if it passes a threshold determined by the user agent.



4.4. Minimum and maximum font size: the 'min-font-size' and
'max-font-size' properties


"Value: <font-size> | auto"
"Computed value: <font-size>"

The <font-size> value must be absolute.  What would
"min-font-size: smaller" mean?


"'auto' means that the user agent determine the minimum readable
font-size for the media."

Capitalize "auto".


"For example, a value is 9px is recommended for Latin scripts."

Change to "For example, a value of '9px' is recommended for the
Latin script."


"'auto' means that there is no limit."

Capitalize "auto".



4.5. Additional compression: The 'text-justify-trim' property


"the blank space within the character area itself may be reduced
without affecting the appearance of the glyph"

Change to "the blank space within the glyphs themselves may be
reduced without affecting the appearance of the filled parts of
glyphs".


"Character layout with punctuation and Kana compression"

Change "Character" to "Glyph".



4.6. Kashida effect: the 'text-kashida-space' property


This property really wants to be called 'kashida-space'.


"Kashida is a typographic effect used in Arabic writing systems
that allows character elongation at some carefully chosen points
in Arabic."

Change to "Kashida is a typographic effect used in Arabic
writing systems that allows glyph elongation at some carefully
chosen points."


"This property can be used with any justification style where
kashida expansion is used (currently text-justify: auto,
kashida, distribute and newspaper)."

Change to "This property has a visible effect with any
justification style where kashida expansion is allowed
(currently 'text-justify' of 'auto', 'kashida', 'distribute' and
'newspaper')."



5. Indentation: the 'text-indent' property


We still lack a graceful way to achieve hanging indents.


"User agents should render this indentation as blank space."

This will be misinterpreted.  Change to "User agents should
render this indentation without any of the element's normally
positioned content."



6.1. Types of line breaking


"Finally, the Unicode character: U+200B ZERO WIDTH SPACE can be
inserted in such scripts to specify an explicit line breaking
opportunity."

Change to "To specify an explicit line breaking opportunity, the
character U+200B ZERO WIDTH SPACE can be inserted in documents
of Thai and similar scripts."


'A number of levels of line-breaking "strictness" can be used in
Japanese typography.'

Why is "strictness" in quote marks?


"In addition, hyphenation is controlled by 'word-break-inside'."

How does this relate to the XSL hyphenation model?


"All these properties are also available through the
'word-break' short hand property."

Change to "The 'word-break' shorthand property sets
'word-break-CJK' and 'word-break-inside'."  Move into a separate
paragraph.



6.2. Line breaking: the 'line-break' property


"it is recommended that breaks between small katakana and
hiragana characters be allowed"

Change "katakana and hiragana" to "kana".


"In Japanese, a set of line breaking restrictions is referred to
as "Kinsoku". JIS X-4051 [JIS-X-4051] is a popular source of
reference for this behavior using the strict set of rules. This
architecture involves character classification into line
breaking behavior classes. Those classes are then analyzed in a
two dimensional behavior table where each row-column position
represents a pair action to be taken at the occurrence of these
classes. For example, given a closing character class and an
opening character class, the intersection in that table of these
two classes (the first character belonging to the opening class
and the second belonging to the closing class) will indicate no
line breaking opportunity. The rules described by JIS X-4051
have been superseded by the Unicode Technical Report #14
mentioned earlier."

The majority of this paragraph appears superfluous.  Change to
the following and add a reference link for Unicode Technical
Report #14.

"In Japanese, a set of line breaking restrictions is referred to
as "Kinsoku". JIS X-4051 [JIS-X-4051] is a popular source of
reference for this behavior using the strict set of rules. The
rules described by JIS X-4051 have been superseded by the
Unicode Technical Report #14."




6.3. Word breaking: the 'word-break-CJK', 'word-break-inside'
properties and the shorthand 'word-break' property


"Keeps non-CJK scripts together (according to their own rules),
while Hangul and CJK (including the Korean Hanja characters)
break everywhere or according to the rules of the 'line-break'
mode."

Add "ideographs" after "Hangul and CJK".

What determines whether 'line-break' is obeyed?


"Same as 'normal' for CJK and Hangul"

Add "ideographs" after "CJK".


"CJK and Hangul are kept together. This option should only be
used in the context of CJK used in small clusters like in the
Korean writing system."

Add "ideographs" after both occurrences of "CJK".


"All word-break related properties are first reset to their
initial values (all 'normal')."

Change "All word-break related properties" to "The properties
'word-break-CJK' and 'word-break-inside'".  Link the property
names to the respective definitions.




7. Text Wrapping, White-space Control and Text Overflow


The focus on line feed (U+000A) as the only line break character
is specific to XML, to the detriment of CSS.  Choosing the word
"linefeed" for property names is one thing; a slight misnomer
can be accommodated.  Limiting implementation behavior to dealing
with line feed only is another thing, and a bad one at that.
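To make the point concrete: Unicode UAX #14 already names mandatory line breaks other than LINE FEED. These are CR LF, CR, NEL (U+0085), VT, FF, LINE SEPARATOR and PARAGRAPH SEPARATOR. Below is a Python sketch of a pre-processing step that folds them into U+000A before any white-space handling. Folding them at all, and this exact set, are assumptions; the draft specifies neither.

```python
import re

# Mandatory line breaks per Unicode UAX #14: the CR LF pair first, then CR,
# NEL (U+0085), VT (U+000B), FF (U+000C), LINE SEPARATOR (U+2028) and
# PARAGRAPH SEPARATOR (U+2029). Mapping them to LF is an assumed policy.
_MANDATORY_BREAK = re.compile("\r\n|[\r\u0085\u000b\u000c\u2028\u2029]")


def normalize_line_breaks(text):
    """Map every mandatory line break to a single LINE FEED (U+000A)."""
    return _MANDATORY_BREAK.sub("\n", text)
```

With this step in place, properties named after line feeds would still behave sensibly for content that came from non-XML sources.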



7.1. Text wrapping: the 'wrap-option' property


"The best line-breaking opportunity is determined in priority by
the existence of preserved line-feed characters (U+000A), or by
the line-breaking algorithm controlled by the 'line-break' and
word-break' properties."

Change "'line-break' and word-break'" to "'line-break',
'word-break-CJK' and 'word-break-inside'".


"independently of 'line-break' and word-break' properties."

Change "'line-break' and word-break'" to "'line-break',
'word-break-CJK' and 'word-break-inside'".



7.2. White-space control: the 'linefeed-treatment',
'white-space-treatment', 'all-space-treatment' properties and
the 'white-space' shorthand property


"The white-space set is determined by the XML [XML1.0]
specification"

This binding to XML 1.0 works to the detriment of CSS, which
will have a hard time accommodating non-XML languages, or even
later revisions of XML.


"Line feed characters are rendered as one of the following
characters: a space character, a zero width space character
(U+200B), or no character (i.e. not rendered)."

Change to "Line feed characters are either rendered as a space
character (U+0020), rendered as a zero width space character
(U+200B), or not rendered."


"The choice of the resulting character is conditioned by the
script property of the characters preceding and following the
line feed character."

Add a reference to Unicode Technical Report #24, "Script Names",
after "property".


"A sequence of white space characters without any line feed
characters is rendered as a single space character."

Change to "A sequence of white-space characters without any line
feed characters is rendered as a single space character
(U+0020)."


"A sequence of white space characters with one or more line feed
character is rendered similarly to a single line feed
character."

Change to "A sequence of white-space characters with one or more
line feed characters is rendered as a single line feed
character."
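Below is a minimal Python sketch of the two collapsing rules as worded in the proposed changes. It assumes the XML 1.0 white-space set (SPACE, TAB, CARRIAGE RETURN, LINE FEED) that the draft cites. The function name is illustrative only.

```python
import re

# The XML 1.0 'S' production: SPACE, TAB, CARRIAGE RETURN, LINE FEED.
_XML_WHITE_SPACE_RUN = re.compile("[ \t\r\n]+")


def collapse_white_space(text):
    """Collapse each white-space run according to the two rules above."""
    def replace(match):
        # A run containing a line feed becomes one line feed;
        # any other run becomes one SPACE (U+0020).
        return "\n" if "\n" in match.group(0) else " "
    return _XML_WHITE_SPACE_RUN.sub(replace, text)
```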


"In determining how to convert a LINE FEED character a user
agent should consider the following cases, whereby the script of
characters on either side of the LINE FEED determines the choice
of the replacement."

Change to "In determining how to convert a line feed character,
a user agent should consider the following cases, whereby the
scripts of characters preceding and following the line feed
determine the choice of the replacement."


"If the characters preceding and following the LINE FEED
character belong to a script in which the SPACE character is
used as a word separator, the LINE FEED character should be
converted into a SPACE character."

Change to "If the characters preceding and following the line
feed character belong to a script in which the space character
(U+0020) is used as a word separator, the line feed character
should be converted into a space character."


"If none of the conditions in (1) through (3) are true, the LINE
FEED character should be converted into a SPACE character."

Change to "If none of the conditions in (1) through (3) are
true, the line feed character should be converted into a space
character (U+0020)."


"When white-space characters are collapsed for rendering
purpose, the style applied to the collapsed set is the one that
would be applied to first white-space character of the set."

Change to "When white-space characters are collapsed for
rendering purposes, the style applied to the replacement
character is the style that would be applied to the first
white-space character of the original sequence."
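The revised wording attaches the style to the single replacement character. Below is a Python sketch over a list of (character, style) pairs. The plain strings used as styles are placeholders for computed styles, and the function name is illustrative.

```python
from itertools import groupby

# XML 1.0 white space, as in the draft.
WHITE_SPACE = {" ", "\t", "\r", "\n"}


def collapse_styled(chars):
    """Collapse white-space runs in a list of (character, style) pairs.

    Each run becomes one character (a line feed if the run holds one,
    otherwise a SPACE). That replacement takes the style of the first
    white-space character of the original run.
    """
    result = []
    for is_white, run in groupby(chars, key=lambda pair: pair[0] in WHITE_SPACE):
        run = list(run)
        if not is_white:
            result.extend(run)
            continue
        replacement = "\n" if any(ch == "\n" for ch, _ in run) else " "
        result.append((replacement, run[0][1]))
    return result
```

For [("a", "plain"), (" ", "bold"), ("\t", "italic"), ("b", "plain")], the run collapses to a single (" ", "bold").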


"Linefeed characters are transformed for rendering purpose into
one of the following characters: a space character, a zero width
space character (U+200B), or no character (i.e. not rendered)."

Change to "The user agent either transforms each line feed
character to a space character (U+0020), transforms each line
feed character to a zero width space character (U+200B), or
removes the line feed characters."


"The choice of the resulting character is conditioned by the
script property of the characters preceding and following the
line feed character in the same line flow elements part of the
same block element."

Add a reference to the previously defined algorithm and clean up
the end of the sentence, which makes no sense.


"Linefeed characters are ignored. i.e. they are transformed for
rendering purpose into no character."

Change to "Line feed characters are ignored. They are removed
and are not rendered."


"White-space characters, when rendered as an advance width, use
the width of the space character (U+0020)."

Add "the glyph normally used for" before "the space".


"White space characters, except for linefeeds, are ignored. i.e.
they are transformed for rendering purpose into no character."

Change to "White-space characters, except for line feed
characters, are ignored. They are removed and are not rendered."


"All white space characters are rendered as intended (advance
width). The treatment of linefeeds is not determined by this
property."

Change to "White-space characters other than line feed are
rendered as they are (with advance width)."


"All white-space characters are rendered as intended."

Change to "All white-space characters are rendered as they are."


"The tab character (U+0009) is rendered as the smallest non-zero
number of spaces necessary to line characters up along tab stops
that are every 8 characters."

Change to "The tab character (U+0009) is rendered as the
smallest non-zero number of spaces necessary to reach or exceed
the next tab stop.  Tab stops occur in the inline progression
direction at points corresponding to multiples of eight times
the width of the glyph normally used for space (U+0020)."
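Under this wording, the rendering of a tab depends only on two things: the current inline position and the width of the space glyph. Below is a Python sketch, with positions and widths in any consistent unit. The function name is illustrative.

```python
import math


def tab_spaces(x, space_width, tab_size=8):
    """Number of spaces a TAB (U+0009) renders as at inline position x.

    Tab stops lie at multiples of tab_size * space_width. The tab becomes
    the smallest non-zero number of spaces that reaches or passes the
    first tab stop strictly after x.
    """
    interval = tab_size * space_width
    next_stop = (math.floor(x / interval) + 1) * interval
    return math.ceil((next_stop - x) / space_width)
```

Suppose positions are counted in whole space widths, as with monospaced text. Then the result matches Python's str.expandtabs(8). With proportional text, for example x = 25 and a space width of 10, the next stop is 80 and the tab renders as 6 spaces.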


"The definition of the property values are established by
referring to the individual white-space properties set as
follows"

Change to "The definitions of the property values are
established by the following table, which shows the settings of
the constituent properties".



7.3. Text overflow: the 'text-overflow-mode',
'text-overflow-ellipsis' properties and the shorthand
'text-overflow' property


"Text overflow deals with the situation where some textual
content is clipped when it overflows the element's box in its
text advance direction as determined by the writing-mode
property value."

Change "text advance direction" to "inline progression
direction".


"This situation may only occur when the 'overflow' property has
the values: hidden, scroll and auto (in the latter case only
when the UA behavior results in content scrolling)."

Change to "This situation occurs only when the 'overflow'
property has the value 'hidden', 'scroll' or 'auto' (in the
latter case only when the user agent introduces a scrolling
mechanism)."


"The hint is typically an ellipsis character "...", although the
actual character representation may vary. An image may also be
substituted. "

Change to "The hint is typically a horizontal ellipsis character
(U+2026), although the hint may be some other string or even an
image."


"If both hints should appear, only the 'after' hint is
rendered."

Change "should appear" to "are enabled".


"The text-overflow is divided in properties:
'text-overflow-mode' that controls the presentation of hint
characters, 'text-overflow-ellipsis' that controls the values of
the hint characters presented at the box boundaries and a
shorthand property: 'text-overflow'."

Change to "Control over text overflow is divided among
properties: 'text-overflow-mode' controls the presence and
position of the hint, while 'text-overflow-ellipsis' controls
what constitutes the hint.  The shorthand property
'text-overflow' sets the other text overflow properties."


"Name: text-overflow-mode"
"Applies to: all block-level elements"

What happens with inline-block elements?


"an ellipsis string is inserted at each box boundaries where a
text overflow occurs. The values of these ellipsis strings is
determined by the 'text-overflow-ellipsis' property."

Change to "A visual hint is inserted at each box boundary where
text overflow occurs. The 'text-overflow-ellipsis' property
determines the content of the hint."


"The insertions take place at the boundary of the last full
glyph representation of a line of text."

Please clarify.


"similar to 'ellipsis', but the insertions take place at the
boundary of the last full glyph representation of a word within
the line of text."

Change to "A visual hint is inserted at each box boundary where
text overflow occurs. The 'text-overflow-ellipsis' property
determines the content of the hint.  The insertions take place
after the last word that entirely fits on the line."
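The quoted wording leaves the exact cut point open. Below is a Python sketch of one possible reading, under a monospace assumption (one column per character) and a one-character hint. Without by_word, the cut follows the last whole glyph that fits beside the hint. With by_word, as for the value quoted above as "similar to 'ellipsis'", the cut follows the last whole word. The function and its parameters are hypothetical.

```python
def truncate_with_hint(line, box_width, hint="\u2026", by_word=False):
    """Fit `line` into `box_width` columns, appending `hint` on overflow.

    Assumes one column per character (a monospace simplification).
    """
    if len(line) <= box_width:
        return line  # no overflow, so no hint
    room = box_width - len(hint)
    if room <= 0:
        return hint[:box_width]  # assumed behavior when only the hint fits
    kept = line[:room]
    if by_word:
        # Keep only words that fit entirely: if the cut lands inside a word,
        # back up to the last space (or keep nothing if no space precedes it).
        if line[room] != " ":
            cut = kept.rfind(" ")
            kept = kept[:cut] if cut != -1 else ""
        kept = kept.rstrip()
    return kept + hint
```

Take "The quick brown fox" in an 8-column box. The glyph-based reading gives "The qui\u2026" and the word-based reading gives "The\u2026".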


"The hint characters only replace textual information. If the
clipping occurs on a replaced element, standard clipping
occurs."

Change to "The overflow hints are active only for textual
content. That is, the user agent must not render an overflow
hint when only replaced content overflows."


"will result on no ellipsis shown for its content (because it
has a specified width and furthermore the text wrapping occurs
in the 'hidden' overflow area of its parent element)."

Change to "will result in the absence of an overflow hint
(because the element has a specified width)."


"In other words, the text-overflow-mode only affects the textual
content of a block element which participate in its own inline
flow."

Please clarify.  What is a block element which participates in
its own inline flow?


"Name: text-overflow-ellipsis
Value: [<ellipsis-end> | <uri> [, <ellipsis-after> | <uri>]?]"

Why is the comma needed?  Change the production to
"<ellipsis>{1,2}".  Define <ellipsis> as [ <string> | <uri> ]. 
Change the following prose as appropriate.


"Applies to: all block-level elements"

What happens to inline-block elements?



8.1. Letter spacing: the 'letter-spacing' property


"This property specifies spacing behavior between text
characters."

Change "text characters" to "grapheme clusters" and add a
reference to Unicode Technical Report #29, "Text Boundaries".


"However, this value allows the user agent to alter the space
between characters in order to justify text."

Change "characters" to "grapheme clusters".


"This value indicates inter-character space in addition to the
default space between characters."

Change to "This value indicates spacing added between grapheme
clusters in addition to the default spacing between grapheme
clusters."


"The value is added to the advance width of each spacing
character (as opposed to combining character) or group of
characters that are clustered in single grapheme unit (like in
Thai, Khmer, etc.), including the last character of the element.
Characters which are joined together by effect of applying a
cursive font to them, or by standard typography rules (Arabic
script, Northern Indian scripts like Devanagari) have the valued
added to the normal advance width of each spacing characters.
Combining characters (not spacing) do not get any letter-spacing
effect, only the combination of the base character and its
combining characters does."

Eliminate all of this, as it is implied by the suggested prior
use of the term "grapheme cluster".
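Defining the spacing per grapheme cluster does imply the quoted rules. A base character and its combining marks form one cluster, so the marks never receive spacing of their own. Below is a Python sketch. The clustering is a rough stand-in for UAX #29 (base character plus following marks of general category M*, with no Hangul or ZWJ rules), and advance_of is a hypothetical per-character advance function.

```python
import unicodedata


def grapheme_clusters(text):
    """Rough clustering: a base character plus any following combining
    marks (general category M*). Real UAX #29 segmentation has more rules."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def letter_spaced_advance(text, advance_of, letter_spacing):
    """Total advance of `text` with 'letter-spacing' applied.

    The spacing is added once per grapheme cluster, including the last
    one, as the quoted draft text says.
    """
    natural = sum(advance_of(ch) for ch in text)
    return natural + letter_spacing * len(grapheme_clusters(text))
```

Consider "e\u0301a" ("e", a combining acute, then "a"). It has two clusters, so a spacing of 2 is added twice, not three times.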


"For justification purposes, user agents should minimize effect
on letter-spacing as much as possible (priority to word-spacing
expansion/compression as opposed to character-spacing
expansion/compression)."

Change to "For justification purposes, user agents should
minimize alteration of spacing within words.  The priority
should be to alter spacing between words."


"The justification algorithm may further modify the
inter-character spacing, but only in text where there is no
other opportunities to distribute the extra spacing (such as
single word on a line, ideographic text)."

Change to "The justification algorithm may further modify the
spacing between grapheme clusters, but only in text (such as
single word on a line or ideographic text) where there is no
other opportunity to distribute the extra spacing."
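Below is a Python sketch of that priority. It assumes an all-or-nothing policy: word gaps absorb all extra space when the line has any, and a user agent could instead blend the two. The function name is illustrative.

```python
def distribute_justification(extra, word_gaps, cluster_gaps):
    """Split `extra` inline space between word gaps and cluster gaps.

    Spacing between grapheme clusters changes only when the line offers
    no word gaps (a single word, or ideographic text without spaces).
    Returns (space added per word gap, space added per cluster gap).
    """
    if word_gaps > 0:
        return extra / word_gaps, 0.0
    if cluster_gaps > 0:
        return 0.0, extra / cluster_gaps
    return 0.0, 0.0  # a single grapheme cluster: nothing to stretch
```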


"Because of the visual disruptive effect of modifying
letter-spacing on writing systems which use joined characters,
like for example Arabic, the usage of this property is
discouraged in those cases."

Change to "Because of the visually disruptive effect of
modifying this spacing in writing systems, such as Arabic, which
use joined glyphs, the usage of this property is discouraged in
those cases."


"There are cases like Japanese or Chinese writing systems where
justification will change all letter-spacing effects as there is
no other opportunity in the line to expand or compress the
character content in order to fit the line span."

Change to "There are cases, like in Japanese and Chinese writing
systems, where justification will change all spacing between
grapheme clusters, as there is no other opportunity in the line
to expand or compress the textual content in order to fit the
line."


"Character spacing algorithms are user agent-dependent. For
example, the spacing will not occur necessarily between all
characters, but instead between each glyph that constitutes
either a letter or a cluster unit."

Change to "The user agent determines the exact algorithm for
spacing between grapheme clusters."


"Furthermore this property should not be used for scripts and/or
fonts that link characters together (cursive fonts for Roman
scripts, all Arabic cases, Indic scripts with headline like
Devanagari, etc...). Character spacing may also be influenced by
justification (see the 'text-align' property)."

Change to "Furthermore, this property should not be set to a
<length> for scripts and/or fonts that ligate glyphs with
connecting strokes; such scripts and fonts include cursive Latin
fonts, Arabic, and Devanagari. Spacing between grapheme clusters
may also be influenced by justification (see the 'text-align'
property)."


"In this example, the space between characters in blockquote
elements is increased by '0.1em'."

Change "characters" to "grapheme clusters".


"In the following example, the user agent is requested not to
alter inter-character space"

Change "inter-character space" to "spacing within words".


"When the resultant space between two characters is not the same
as the default space, user agents should not use ligatures."

Change to "When the resultant spacing is not the default, user
agents should not use ligatures."



8.2. Word spacing: the 'word-spacing' property


"If there are no characters, the user agent doesn't have to
create an additional character advance width."

Change to "If there are no word-separating characters, the user
agent doesn't have to create an additional advance width between
words."


"There is no inter-word space. All white-space characters are
treated like zero-length characters."

Change to "Word-separating white-space characters are rendered
with a width of zero."


"Determining word boundary is typically done by detecting white
space characters. There are however many scripts and writing
systems that do not separate their words by any character (like
Japanese, Chinese, Thai, etc...), detecting word boundaries in
these cases require dictionary based algorithms that may not be
supported by all user agents."

Change to "Determining word boundaries is typically done by
detecting white-space characters. There are, however, many
scripts and writing systems that do not separate their words by
any character; such scripts and writing systems include
Japanese, Chinese, and Thai. Detecting word boundaries in these
systems requires dictionary-based algorithms that user agents
may choose not to support."
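Dictionary-based segmentation can be as simple as greedy longest matching. Below is a Python sketch using a toy English dictionary, with text written without spaces to mimic an unsegmented script such as Thai. The dictionary and the function name are illustrative. Production segmenters use far larger dictionaries and smarter search.

```python
def longest_match_segment(text, dictionary):
    """Greedy longest-match word segmentation.

    At each position, take the longest dictionary word that matches; if
    none matches, emit a single character and move on.
    """
    max_len = max(map(len, dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                break
        else:
            length = 1  # unknown character: treat it as a one-character word
        words.append(text[i:i + length])
        i += length
    return words
```

With the dictionary {"the", "cat", "sat", "on", "mat"}, the string "thecatsatonthemat" splits into the, cat, sat, on, the, mat.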



[Sections after section 8.2 could not be reviewed before the
deadline.]
Received on Wednesday, 27 November 2002 20:00:41 UTC