Re: [css-text] Control characters

On 06/27/2014 05:27 AM, Jonathan Kew wrote:
> On 27/6/14 09:49, Koji Ishii wrote:
>
>>> Of course, you still need to define how those control characters
>>> are rendered, erroneous or not.
>>
>> Yes, this is the text we have now[1]. Your quick review would be
>> invaluable to us; please let us know if you have any comments.
>>
>>> Control characters (Unicode class Cc) other than tab (U+0009), line
>>> feed (U+000A), and carriage return (U+000D) are ignored for the
>>> purpose of rendering. (As required by [UNICODE], unsupported
>>> Default_ignorable characters must also be ignored for rendering.)
>
> IMO, it would be better to require the presence of spurious control
> characters (i.e. other than tab, linefeed, return) to be rendered
> visibly - e.g. as "hexbox" glyphs or inverse-colored ^X sequences -
> rather than ignored.
>
> The presence of such characters within the text degrades functionality
> by interfering with operations such as search, indexing, copy/paste to
> other environments, etc. Their presence is typically the result of
> broken authoring tools/workflows, but as long as browsers ignore them
> for rendering, authors generally remain unaware that their data is bad,
> and readers will usually be unaware that their searches, etc., may be
> missing content they would have expected to match.
>
> I realize that making stray control characters visible will result in
> some pages (containing bad text) looking "worse" from an aesthetic
> point of view, but I don't believe this is such a widespread and
> serious problem that we should give up the battle and accept that the
> Web will forever hide these errors and leave the problem of polluted
> data unaddressed. If browser vendors would agree to make the CCs
> visible, and include this in the relevant specs, there'll be a spate
> of bug reports - as we've seen when we had them rendered as hexboxes
> in Firefox - but these can be redirected to the sites/authors concerned,
> and there will be significant pressure on authors and tool vendors to
> fix the underlying problems.
>
> Although there'd no doubt be some short-term discontent, I think this
> would be significantly better for the long-term health of the web.
> Our concern should not -only- be to optimize the display of (a small
> minority of badly-authored) web pages of today; we should also be
> concerned for the quality and usability of web data in the future.

Thanks for your comments and concerns. The CSSWG has reviewed this issue
and, after also considering feedback from other implementors, has resolved
to make this change.

The minutes recording the resolution are here:
   http://lists.w3.org/Archives/Public/www-style/2014Oct/0259.html

The change has been checked into the Editor's Draft and should make its
way to /TR shortly. The new text reads:

   # Control characters (Unicode category Cc) other than tab (U+0009),
   # line feed (U+000A), and carriage return (U+000D) must be rendered
   # as a visible glyph and otherwise treated as any other character
   # of the Other Symbols (So) general category and Common script.
   # The UA may use a glyph provided by a font specifically for the
   # control character, substitute the glyphs provided for the
   # corresponding symbol in the Control Pictures block, generate a
   # visual representation of its codepoint value, or use some other
   # method to provide an appropriate visible glyph. As required by
   # [UNICODE], unsupported Default_ignorable characters must be
   # ignored for rendering.
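
For illustration only, here is a minimal Python sketch of one way a UA
might choose a visible stand-in under the new text. It assumes the
"substitute a Control Pictures glyph" option for C0 controls and DELETE
(U+2400..U+241F and U+2421), and falls back to a textual rendering of the
code point for C1 controls, which have no Control Pictures symbol. The
helper names (visible_form, make_controls_visible) and the <U+XXXX> fallback
format are hypothetical. The sketch covers glyph selection only; it does not
model treating these characters as So/Common for line breaking or font
selection.

```python
import unicodedata

# Cc characters the spec text exempts from the visible-glyph requirement.
PASSTHROUGH = {"\t", "\n", "\r"}


def visible_form(ch: str) -> str:
    """Return a visible stand-in for one character (hypothetical helper)."""
    if ch in PASSTHROUGH or unicodedata.category(ch) != "Cc":
        return ch  # not a control character the new rule applies to
    cp = ord(ch)
    if cp <= 0x1F:
        # C0 controls map directly onto U+2400..U+241F (Control Pictures).
        return chr(0x2400 + cp)
    if cp == 0x7F:
        # DELETE has its own symbol: U+2421 SYMBOL FOR DELETE.
        return "\u2421"
    # C1 controls (U+0080..U+009F): no Control Pictures symbol exists, so
    # fall back to a visual representation of the code point value.
    return f"<U+{cp:04X}>"


def make_controls_visible(text: str) -> str:
    """Apply visible_form to every character of a string."""
    return "".join(visible_form(ch) for ch in text)
```

For example, make_controls_visible("Hello\x07world") yields "Hello␇world"
(U+2407 SYMBOL FOR BELL), so a stray BEL in the source data becomes
visible instead of silently disappearing.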

Let us know if there are any errors or if you have further suggestions
for improvement.

Thanks!

~fantasai

Received on Thursday, 23 October 2014 00:49:00 UTC