Re: [css-text] Control characters

On 27/6/14 09:49, Koji Ishii wrote:

>> Of course, you still need to define how those control characters
>> are rendered, erroneous or not.
>
> Yes, this is the text we have now[1]. Your quick review is invaluable
> to us; please let us know if you have any comments.
>
>> Control characters (Unicode class Cc) other than tab (U+0009), line
>> feed (U+000A), and carriage return (U+000D) are ignored for the
>> purpose of rendering. (As required by [UNICODE], unsupported
>> Default_ignorable characters must also be ignored for rendering.)

IMO, it would be better to require spurious control characters (i.e.,
those other than tab, line feed, and carriage return) to be rendered
visibly, e.g. as "hexbox" glyphs or inverse-colored ^X sequences,
rather than ignored.
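To make the two suggested renderings concrete, here is a rough
plain-text sketch in Python. It is only an illustration under stated
assumptions, not a proposed implementation: the function names are
hypothetical, and the bracketed hex string merely stands in for a real
hexbox glyph (which a browser would draw as a small box containing the
code point). Following the quoted spec text, it treats every character
of Unicode general category Cc except tab, line feed, and carriage
return as spurious. C1 controls (U+0080..U+009F) have no conventional
caret form, so the sketch falls back to the hex form for them.

```python
import unicodedata

# Controls the quoted spec text exempts: tab, line feed, carriage return.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}

def is_spurious_control(ch: str) -> bool:
    """True for general category Cc other than tab/LF/CR."""
    return unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS

def caret_notation(ch: str) -> str:
    """Caret form for C0 controls and DEL, e.g. U+0007 -> '^G'."""
    cp = ord(ch)
    if cp < 0x20:
        return "^" + chr(cp + 0x40)
    if cp == 0x7F:
        return "^?"
    # C1 controls have no conventional caret form; use the hex stand-in.
    return hexbox(ch)

def hexbox(ch: str) -> str:
    """Plain-text stand-in for a hexbox glyph: the code point in hex."""
    return "[%04X]" % ord(ch)

def make_visible(text: str, style=caret_notation) -> str:
    """Replace each spurious control with a visible representation."""
    return "".join(style(ch) if is_spurious_control(ch) else ch for ch in text)

print(make_visible("bad\x07text"))          # bad^Gtext
print(make_visible("bad\x07text", hexbox))  # bad[0007]text
```

Tab, line feed, and carriage return pass through unchanged, so only the
stray characters become visible.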

The presence of such characters within the text degrades functionality
by interfering with operations such as search, indexing, and copy/paste
to other environments. They are typically the result of broken
authoring tools or workflows, but as long as browsers ignore them for
rendering, authors generally remain unaware that their data is bad, and
readers usually do not realize that their searches may be missing
content they would have expected to match.
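A small Python sketch of the search problem, assuming (for illustration
only) a naive substring search over the stored text; actual browser
find-in-page and search-engine behavior may differ. The sample string
and function name are hypothetical.

```python
import unicodedata

# Controls the quoted spec text keeps: tab, line feed, carriage return.
KEEP = {"\t", "\n", "\r"}

def rendered(text: str) -> str:
    """What a reader sees if other Cc characters are ignored for rendering."""
    return "".join(
        ch for ch in text
        if not (unicodedata.category(ch) == "Cc" and ch not in KEEP)
    )

# Hypothetical stored text with a stray U+0001 left by a broken tool.
stored = "interop\x01erability"
query = "interoperability"

print(rendered(stored) == query)  # True: looks identical on screen
print(query in stored)            # False: a naive substring search misses it
```

The page looks correct, yet the word cannot be found, and nothing on
screen tells either the author or the reader why.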

I realize that making stray control characters visible will make some
pages (those containing bad text) look "worse" from an aesthetic point
of view, but I don't believe the problem is so widespread and serious
that we should give up the battle, accept that the Web will forever
hide these errors, and leave the problem of polluted data unaddressed.
If browser vendors agree to make control characters visible, and this
is included in the relevant specs, there will be a spate of bug reports
(as we saw when we had them rendered as hexboxes in Firefox), but these
can be redirected to the sites/authors concerned, and there will be
significant pressure on authors and tool vendors to fix the underlying
problems.

Although there would no doubt be some short-term discontent, I think
this would be significantly better for the long-term health of the web.
Our concern should not -only- be to optimize the display of today's
web pages, specifically the small minority that are badly authored; we
should also be concerned for the quality and usability of web data in
the future.

JK

Received on Friday, 27 June 2014 09:27:42 UTC