[css-text] Control characters

60 messages.

[css-text] Control characters
James Clark   Thu, 20 Mar 2014 10:00:53 +0700

www-style > March 2014 > 0000.html

Received on Thursday, 20 March 2014 03:01:41 UTC

Sent to: www-style@w3.org.

CSS Text says:

> Control characters (Unicode class Cc) other than tab (U+0009), line feed (U+000A), and carriage return (U+000D) are ignored for the purpose of rendering.

(This is a change from CSS 2.1, which says they are rendered as usual.) I was wondering what the thinking is here. This requirement conflicts with Unicode (see http://www.unicode.org/faq/unsup_char.html) in a couple of ways:

1. In addition to 0x9, 0xA and 0xD, Unicode gives characters 0xB (VT), 0xC (FF) and 0x85 (NEL) the White_Space property. Characters with the White_Space property are supposed to be rendered as a visible but blank space. (Of these, HTML includes only 0xC as a space character.)

2. Other control characters are supposed to be rendered normally (i.e. displayed with a missing glyph if not available in the font).

James
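To make the conflict concrete, here is a minimal Python sketch (standard library only; the helper name `describe` is illustrative, not from any spec) that reports the general category and white-space status of the characters James lists. All of them are class Cc, yet several are also white space in Unicode's sense. Note that Python's `str.isspace()` only approximates the White_Space property (it also returns True for U+001C to U+001F), but it agrees with White_Space for the samples used here.

```python
import unicodedata

# The three controls CSS Text keeps, the three extra White_Space controls
# James mentions (VT, FF, NEL), and one ordinary C0 control for contrast.
SAMPLES = ["\t", "\n", "\r", "\x0b", "\x0c", "\x85", "\x01"]


def describe(ch: str) -> tuple[str, str, bool]:
    """Return (code point label, general category, whitespace flag) for ch."""
    # str.isspace() approximates Unicode White_Space; it matches it for SAMPLES.
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isspace())


if __name__ == "__main__":
    for ch in SAMPLES:
        print(describe(ch))
```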
Re: [css-text] Control characters
"Robert O'Callahan"   Thu, 20 Mar 2014 12:57:47 +0800

Received on Thursday, 20 March 2014 04:58:24 UTC

Sent to: jjc@jclark.com
Copied to: www-style@w3.org, jfkthame@gmail.com.

On Thu, Mar 20, 2014 at 11:00 AM, James Clark <jjc@jclark.com> wrote:
> CSS Text says:
>
>> Control characters (Unicode class Cc) other than tab (U+0009), line feed (U+000A), and carriage return (U+000D) are ignored for the purpose of rendering.
>
> [...]

We had a discussion about this a while back within Mozilla; some people like the idea of displaying control characters so that such 'soft errors' in pages can be more easily detected and fixed.

We ended up defining an internal CSS property '-moz-control-character-visibility:visible|hidden', with initial value hidden, but we set it to visible for devtools, plain text files, the contents of text inputs, view-source, etc. We could easily standardize that if other people are interested.

Rob
-- 
Jtehsauts tshaei dS,o n" Wohfy Mdaon yhoaus eanuttehrotraiitny eovni le atrhtohu gthot sf oirng iyvoeu rs ihnesa.r"t sS?o Whhei csha iids teoa stiheer :p atroa lsyazye,d 'mYaonu,r "sGients uapr,e tfaokreg iyvoeunr, 'm aotr atnod sgaoy ,h o'mGee.t" uTph eann dt hwea lmka'n? gBoutt uIp waanndt wyeonut thoo mken.o w
Re: [css-text] Control characters
Jonathan Kew   Thu, 20 Mar 2014 14:10:52 +0000

Received on Thursday, 20 March 2014 16:00:39 UTC

Sent to: www-style@w3.org
Copied to: robert@ocallahan.org, jjc@jclark.com.

On 20/3/14 04:57, Robert O'Callahan wrote:
> [...]
> We ended up defining an internal CSS property
> '-moz-control-character-visibility:visible|hidden', with initial value
> hidden, but we set it to visible for devtools, plain text files, the
> contents of text inputs, view-source, etc. We could easily standardize
> that if other people are interested.

For some further discussion, see comments (arguing both for and against such a change) in relevant Mozilla bugs, such as:

https://bugzilla.mozilla.org/show_bug.cgi?id=757521
https://bugzilla.mozilla.org/show_bug.cgi?id=909344
https://bugzilla.mozilla.org/show_bug.cgi?id=947588
https://bugzilla.mozilla.org/show_bug.cgi?id=963252

JK
Re: [css-text] Control characters
Zack Weinberg   Thu, 20 Mar 2014 17:11:15 -0400

Received on Thursday, 20 March 2014 21:11:38 UTC

Sent to: jjc@jclark.com
Copied to: www-style@w3.org.

On Wed, Mar 19, 2014 at 11:00 PM, James Clark <jjc@jclark.com> wrote:

I have no opinion on most of this, but ...

> 1. In addition to 0x9, 0xA and 0xD, Unicode gives characters 0xB (VT), 0xC
> (FF) and 0x85 (NEL) the White_Space property. Characters with the
> White_Space property are supposed to be rendered as a visible but blank
> space. (Of these, HTML includes only 0xC as a space character.)

For compatibility with legacy content naively converted to UTF-n, U+0085 (and, indeed, the entire C1 controls block) need to be interpreted as graphic characters per Windows-1252, instead of as control characters.

zw
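Zack's point about naive conversion can be illustrated with Python's built-in codecs (a sketch; the sample bytes are made up). Decoding a Windows-1252 byte stream as if it were ISO-8859-1 (`latin-1`) turns byte 0x85 into the C1 control U+0085, whereas the intended `cp1252` table gives U+2026 HORIZONTAL ELLIPSIS. Because `latin-1` maps every byte one-to-one, the damage can be undone by re-encoding and decoding again.

```python
# Hypothetical legacy content: "Wait" followed by a Windows-1252 ellipsis byte.
legacy_bytes = b"Wait\x85"

# Naive conversion: treat the bytes as ISO-8859-1, so 0x85 becomes U+0085 (NEL).
naive = legacy_bytes.decode("latin-1")

# Correct conversion: use the Windows-1252 table, so 0x85 becomes U+2026.
correct = legacy_bytes.decode("cp1252")

# latin-1 maps bytes 0x00-0xFF one-to-one, so the naive text can be repaired.
repaired = naive.encode("latin-1").decode("cp1252")

print(repr(naive), repr(correct), repaired == correct)
```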
Re: [css-text] Control characters
James Clark   Fri, 21 Mar 2014 08:55:00 +0700

Received on Friday, 21 March 2014 01:55:48 UTC

Sent to: jfkthame@gmail.com
Copied to: www-style@w3.org, robert@ocallahan.org.

After reading those Mozilla bugs, and thinking some more, I suggest the following:

1. Render control characters U+0080-U+009F normally (i.e. show boxes if there is no available glyph).
2. Treat U+000C (form feed), in addition to U+0009, U+000A and U+000D, as whitespace.
3. Ignore other control characters for the purposes of rendering (as in the current spec).

Reasoning:

1. The most likely reason for a document containing C1 control characters is that they are left over from conversion from one of the Windows 8-bit legacy encodings. Note that HTML treats numeric character references to characters in this range specially [1]. This is a deviation from Unicode, which requires a U+0085 to be rendered as blank space if there is no available glyph; however, U+0085 as a whitespace character (NEL) typically only results from a conversion from EBCDIC, which is almost certainly much less common than the Windows legacy case.

2. HTML [2] and Unicode both treat form feed as a whitespace character. It is also still occasionally used as a whitespace character in real life; for example, GNU Emacs has a set of commands that work on "pages", which by default are separated by form feeds (e.g. C-x [ and C-x ] will move backwards and forwards by pages), and formatted ASCII output uses form feed to separate pages. Unicode also treats U+000B (vertical tab) as white space, as does JavaScript; HTML doesn't (although it does treat it slightly differently from other control characters [3]). However, I have never seen U+000B intentionally used as whitespace.

3. Other control characters with code points less than U+0020 are more likely to be random crap, which the user won't be helped by seeing (though it would be useful to show them in some contexts such as view-source).

[1] http://www.w3.org/html/wg/drafts/html/master/single-page.html#tokenizing-character-references
[2] http://www.w3.org/html/wg/drafts/html/master/single-page.html#space-character
[3] http://www.w3.org/html/wg/drafts/html/master/single-page.html#preprocessing-the-input-stream

James

On Thu, Mar 20, 2014 at 9:10 PM, Jonathan Kew <jfkthame@gmail.com> wrote:
> On 20/3/14 04:57, Robert O'Callahan wrote:
> [...]
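James's three proposed rules can be restated as a small classifier. The Python sketch below is one possible reading of the proposal, not text from any specification; the `Treatment` names are invented for illustration, and characters outside class Cc are simply treated as ordinary text because the proposal does not cover them.

```python
import unicodedata
from enum import Enum


class Treatment(Enum):
    WHITESPACE = "whitespace"  # handled by normal white-space processing
    RENDER = "render"          # drawn normally, e.g. as a missing-glyph box
    IGNORE = "ignore"          # dropped for rendering purposes


# Rule 2: form feed joins tab, line feed and carriage return as whitespace.
PROPOSED_WHITESPACE = {"\t", "\n", "\r", "\x0c"}


def proposed_treatment(ch: str) -> Treatment:
    if ch in PROPOSED_WHITESPACE:
        return Treatment.WHITESPACE
    # Rule 1: C1 controls U+0080-U+009F are rendered normally.
    if "\x80" <= ch <= "\x9f":
        return Treatment.RENDER
    # Rule 3: every other control (remaining C0 controls and U+007F) is ignored.
    if unicodedata.category(ch) == "Cc":
        return Treatment.IGNORE
    # Not a control character: outside the scope of the proposal.
    return Treatment.RENDER
```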
Re: [css-text] Control characters
fantasai   Sat, 10 May 2014 15:07:58 -0700

Received on Saturday, 10 May 2014 22:08:31 UTC

Sent to: zackw@panix.com
Copied to: www-style@w3.org, annevk@annevk.nl.

On 03/20/2014 02:11 PM, Zack Weinberg wrote:
> On Wed, Mar 19, 2014 at 11:00 PM, James Clark <jjc@jclark.com> wrote:
>
> I have no opinion on most of this, but ...
>
>> 1. In addition to 0x9, 0xA and 0xD, Unicode gives characters 0xB (VT), 0xC
>> [...]
>
> For compatibility with legacy content naively converted to UTF-n,
> U+0085 (and, indeed, the entire C1 controls block) need to be
> interpreted as graphic characters per Windows-1252, instead of as
> control characters.

Should this be handled at the render layer or at the encoding layer? (The latter is only possible, of course, if there are Unicode equivalents.)

~fantasai
Re: [css-text] Control characters
Zack Weinberg   Sat, 10 May 2014 18:17:01 -0400

Received on Saturday, 10 May 2014 22:17:23 UTC

Sent to: fantasai.lists@inkedblade.net
Copied to: www-style@w3.org, annevk@annevk.nl.

On Sat, May 10, 2014 at 6:07 PM, fantasai <fantasai.lists@inkedblade.net> wrote:
> On 03/20/2014 02:11 PM, Zack Weinberg wrote:
>> For compatibility with legacy content naively converted to UTF-n,
>> U+0085 (and, indeed, the entire C1 controls block) need to be
>> interpreted as graphic characters per Windows-1252, instead of as
>> control characters.
>
> Should this be handled at the render layer or at the encoding layer?

I believe HTML5 does this at the encoding layer, so we should do the same. (Also, as far as I know, all characters encoded by Windows-1252 but not ISO-8859-1 do have Unicode equivalents.)

zw
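Handling this at the encoding layer means the decoder, not the renderer, maps bytes 0x80-0x9F to their Windows-1252 characters. The Python sketch below assumes the WHATWG Encoding Standard's behaviour for windows-1252, where the five bytes that Python's `cp1252` codec leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) decode to the same-numbered C1 controls; `decode_windows_1252` is a hypothetical helper name. Checking every byte in the range shows that each byte Windows-1252 actually defines maps to a character outside the C1 block, consistent with Zack's remark, while those five bytes still yield C1 controls.

```python
def decode_windows_1252(data: bytes) -> str:
    """Decode bytes as windows-1252, WHATWG-style (sketch)."""
    chars = []
    for b in data:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # Python's cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
            # the WHATWG index maps them to U+0081, U+008D, ... instead.
            chars.append(chr(b))
    return "".join(chars)


# Bytes in 0x80-0x9F that still come out as C1 control code points.
c1_survivors = [
    hex(b) for b in range(0x80, 0xA0)
    if "\x80" <= decode_windows_1252(bytes([b])) <= "\x9f"
]

if __name__ == "__main__":
    print(decode_windows_1252(b"\x80\x85\x93\x94"))
    print(c1_survivors)  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```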
Re: [css-text] Control characters
Koji Ishii   Fri, 27 Jun 2014 04:30:04 +0000

Received on Friday, 27 June 2014 04:30:41 UTC

Sent to: zackw@panix.com
Copied to: fantasai.lists@inkedblade.net, www-style@w3.org, annevk@annevk.nl.

I did some tests by inserting control characters into the DOM directly, so that we could separate encoding issues from rendering issues. The test is here [1] if you want to see for yourself, but in short, 0x80-0x9F is handled at the encoding layer and therefore we don't have to worry about it in CSS Text. A few issues were found by the test, but that's a different topic.

So I think we do not require any changes to the spec on this point. Please let us know of any concerns.

In case you're interested in the test results, in terms of rendering:
* IE11 does not render Cc except 0x0B.
* Firefox (Win/Mac), Chrome (Win/Mac), and Safari do not render Cc at all.

In terms of letter-spacing:
* IE11 and Firefox (Win/Mac) apply letter spacing to Cc.
* Chrome (Win/Mac) and Safari do not apply letter spacing to Cc.

[1] http://jsbin.com/quciq/

/koji

On May 11, 2014, at 7:17, Zack Weinberg <zackw@panix.com> wrote:
> [...]
Re: [css-text] Control characters
Anne van Kesteren   Fri, 27 Jun 2014 08:00:14 +0200

Received on Friday, 27 June 2014 06:00:42 UTC

Sent to: kojiishi@gluesoft.co.jp
Copied to: zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On Fri, Jun 27, 2014 at 6:30 AM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
> [...], but in short, 0x80-0x9F is handled at encoding layer and therefore we don’t have to worry about in CSS Text.

I don't follow this. U+0080 to U+009F are in Unicode and can be present in a document. Quite trivially so with utf-8. It's correct that windows-1252 would map the bytes of the same number to different code points, but I'm not sure how that would affect CSS.

-- 
http://annevankesteren.nl/
Re: [css-text] Control characters
Koji Ishii   Fri, 27 Jun 2014 06:22:55 +0000

Received on Friday, 27 June 2014 06:23:27 UTC

Sent to: annevk@annevk.nl
Copied to: zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

>> [...], but in short, 0x80-0x9F is handled at encoding layer and therefore we don’t have to worry about in CSS Text.
>
> I don't follow this. U+0080 to U+009F are in Unicode and can be
> present in a document. Quite trivially so with utf-8. It's correct
> that windows-1252 would map the bytes of the same number to different
> code points, but I'm not sure how that would affect CSS.

There was a proposal to interpret C1 characters as graphic characters [1][2]. What I meant was that we should reject this proposal, because the mapping is done at the encoding layer, and once Unicode C1 characters appear in the DOM, they should be handled as control characters.

[1] http://lists.w3.org/Archives/Public/www-style/2014Mar/0490.html
[2] http://lists.w3.org/Archives/Public/www-style/2014Mar/0501.html

/koji
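Anne's and Koji's point can be checked quickly: with UTF-8 there is no legacy mapping table in play, so a C1 control survives decoding unchanged and reaches the DOM as a Cc character. A minimal Python illustration (the byte string is made up):

```python
import unicodedata

# U+0085 encoded in UTF-8 is the two-byte sequence C2 85.
data = b"A\xc2\x85B"
text = data.decode("utf-8")

# The decoded string holds the C1 control itself, not a Windows-1252 character.
print([f"U+{ord(ch):04X}" for ch in text], unicodedata.category(text[1]))
```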
Re: [css-text] Control characters
Anne van Kesteren   Fri, 27 Jun 2014 08:52:15 +0200

Received on Friday, 27 June 2014 06:52:42 UTC

Sent to: kojiishi@gluesoft.co.jp
Copied to: zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On Fri, Jun 27, 2014 at 8:22 AM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
> [...] once Unicode C1 characters appear in the DOM, they should be handled as control characters.

Sounds like we are on the same page then. Of course, you still need to define how those control characters are rendered, erroneous or not.

-- 
http://annevankesteren.nl/
Re: [css-text] Control characters
Koji Ishii   Fri, 27 Jun 2014 08:49:59 +0000

Received on Friday, 27 June 2014 08:50:31 UTC

Sent to: annevk@annevk.nl
Copied to: zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

>> [...] once Unicode C1 characters appear in the DOM, they should be handled as control characters.
>
> Sounds like we are on the same page then.

Thank you for the confirmation.

> Of course, you still need to
> define how those control characters are rendered, erroneous or not.

Yes, this is the text we have now [1]. Your quick review is invaluable for us; please let us know if you have any comments.

> Control characters (Unicode class Cc) other than tab (U+0009), line feed (U+000A), and carriage return (U+000D) are ignored for the purpose of rendering. (As required by [UNICODE], unsupported Default_ignorable characters must also be ignored for rendering.)

[1] http://dev.w3.org/csswg/css-text/#white-space-processing

/koji
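Read literally, the control-character clause of the draft text amounts to a filter applied before rendering. A Python sketch of just that clause (the helper name is hypothetical; it does not model the Default_Ignorable sentence or any white-space collapsing):

```python
import unicodedata

# The three controls the draft keeps for white-space processing.
KEPT_CONTROLS = {"\t", "\n", "\r"}


def strip_for_rendering(text: str) -> str:
    """Drop Unicode class Cc characters other than tab, LF and CR."""
    return "".join(
        ch for ch in text
        if ch in KEPT_CONTROLS or unicodedata.category(ch) != "Cc"
    )
```

For example, `strip_for_rendering("H\x01e\tl\x85lo\n")` returns `"He\tllo\n"`: U+0001 and U+0085 are dropped, while the tab and the final line feed are kept.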
Re: [css-text] Control characters
Jonathan Kew   Fri, 27 Jun 2014 10:27:13 +0100

Received on Friday, 27 June 2014 09:27:42 UTC

Sent to: kojiishi@gluesoft.co.jp, annevk@annevk.nl
Copied to: zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On 27/6/14 09:49, Koji Ishii wrote:
>> Of course, you still need to define how those control characters
>> are rendered, erroneous or not.
>
> Yes, this is the text we have now[1]. Your quick review is invaluable
> for us, please let us know if any.
>
>> Control characters (Unicode class Cc) other than tab (U+0009), line
>> feed (U+000A), and carriage return (U+000D) are ignored for the
>> purpose of rendering. (As required by [UNICODE], unsupported
>> Default_ignorable characters must also be ignored for rendering.)

IMO, it would be better to require the presence of spurious control characters (i.e. other than tab, linefeed, return) to be rendered visibly - e.g. as "hexbox" glyphs or inverse-colored ^X sequences - rather than ignored.

The presence of such characters within the text degrades functionality by interfering with operations such as search, indexing, copy/paste to other environments, etc. Their presence is typically the result of broken authoring tools/workflows, but as long as browsers ignore them for rendering, authors generally remain unaware that their data is bad, and readers will usually be unaware that their searches, etc., may be missing content they would have expected to match.

I realize that making stray control characters visible will result in some pages (containing bad text) looking "worse" from an aesthetic point of view, but I don't believe this is such a widespread and serious problem that we should give up the battle and accept that the Web will forever hide these errors and leave the problem of polluted data unaddressed. If browser vendors would agree to make the CCs visible, and include this in the relevant specs, there'll be a spate of bug reports - as we've seen when we had them rendered as hexboxes in Firefox - but these can be redirected to the sites/authors concerned, and there will be significant pressure on authors and tool vendors to fix the underlying problems.

Although there'd no doubt be some short-term discontent, I think this would be significantly better for the long-term health of the web. Our concern should not -only- be to optimize the display of (a small minority of badly-authored) web pages of today; we should also be concerned for the quality and usability of web data in the future.

JK
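Jonathan's alternative, rendering stray controls visibly, could be approximated in plain text as follows. This Python sketch is only an illustration: browsers draw hexboxes as glyphs rather than text, and the bracketed `[0085]` form and the function names are assumptions made for the example. C0 controls and DEL use conventional caret notation (^A for U+0001, ^? for U+007F); C1 controls have no caret form, so they get the hex label.

```python
import unicodedata


def visible_control(ch: str) -> str:
    """Return a visible text stand-in for a single control character."""
    cp = ord(ch)
    if cp < 0x20 or cp == 0x7F:
        # Caret notation flips bit 0x40: U+0001 -> ^A, U+001B -> ^[, U+007F -> ^?.
        return "^" + chr(cp ^ 0x40)
    # C1 controls: a textual stand-in for a hexbox glyph.
    return f"[{cp:04X}]"


def make_controls_visible(text: str) -> str:
    """Replace every Cc character except tab, LF and CR with a visible form."""
    return "".join(
        visible_control(ch)
        if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r"
        else ch
        for ch in text
    )
```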
Re: [css-text] Control characters
Anne van Kesteren   Fri, 27 Jun 2014 11:54:55 +0200

Received on Friday, 27 June 2014 09:55:23 UTC

Sent to: kojiishi@gluesoft.co.jp
Copied to: zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On Fri, Jun 27, 2014 at 10:49 AM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
> Yes, this is the text we have now[1]. Your quick review is invaluable for us, please let us know if any.
>
>> Control characters (Unicode class Cc) other than tab (U+0009), line feed (U+000A), and carriage return (U+000D) are ignored for the purpose of rendering. (As required by [UNICODE], unsupported Default_ignorable characters must also be ignored for rendering.)

Yeah, something like that is fine. Although your specification claims to use RFC 2119, so you might want to make that a MUST.

However, explicitly requiring a hexbox as Jonathan wants is also fine, as long as we end up with one or the other being required, and stop allowing both. I mostly care about this finally being defined :-)

> [1] http://dev.w3.org/csswg/css-text/#white-space-processing

-- 
http://annevankesteren.nl/
Re: [css-text] Control characters
Brad Kemper   Fri, 27 Jun 2014 08:18:46 -0700

Received on Friday, 27 June 2014 15:19:16 UTC

Sent to: jfkthame@gmail.com
Copied to: kojiishi@gluesoft.co.jp, annevk@annevk.nl, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

> On Jun 27, 2014, at 2:27 AM, Jonathan Kew <jfkthame@gmail.com> wrote:
>
> On 27/6/14 09:49, Koji Ishii wrote:
> [...]
>
> IMO, it would be better to require the presence of spurious control characters (i.e. other than tab, linefeed, return) to be rendered visibly - e.g. as "hexbox" glyphs or inverse-colored ^X sequences - rather than ignored.
>
> [...]

I disagree with the notion that we should use ugly and confusing rendering of unintentional characters as a weapon for punishing/scolding authors. If UNICODE says the characters should be ignored, then let's ignore them, and don't render them. It is not our place to use the threat of bad rendering to coerce authors into fixing or preventing encoding errors. We should be forgiving of the problems, instead of trying to make them worse.
Re: [css-text] Control characters
Jonathan Kew   Fri, 27 Jun 2014 16:50:01 +0100

Received on Friday, 27 June 2014 15:50:27 UTC

Sent to: brad.kemper@gmail.com
Copied to: kojiishi@gluesoft.co.jp, annevk@annevk.nl, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On 27/6/14 16:18, Brad Kemper wrote:
>> On Jun 27, 2014, at 2:27 AM, Jonathan Kew <jfkthame@gmail.com> wrote:
>> [...]
>
> I disagree with the notion that we should use ugly and confusing
> rendering

Then create beautiful and clear glyphs for them! :)

> of unintentional characters as a weapon for
> punishing/scolding authors.

What is "ugly and confusing", IMO, is when browsers display the data

<U+0048 U+0001 U+0065 U+0002 U+006C U+0003 U+006C U+0004 U+006F>

such that it appears to read "Hello", yet when a user searches for the string "Hello" they'll fail to find it; it will be indexed separately; it will be mangled by screen-readers; etc., etc.

> If UNICODE says the characters should be
> ignored, then let's ignore them, and don't render them.

Control characters are NOT considered default-ignorable in Unicode. If you search for Default_Ignorable_Code_Point in http://www.unicode.org/Public/UCD/latest/ucd/DerivedCoreProperties.txt, you'll see that neither C0 nor C1 controls are included.

> It is not our
> place to use the threat bad rendering to coerce authors into fixing
> or preventing encoding errors. We should be forgiving of the
> problems, instead of trying to make them worse.

This isn't "trying to make them worse". It's trying to encourage and facilitate the creation of cleaner data by making irregularities visible.

JK
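Jonathan's "Hello" example is easy to reproduce. In this Python sketch, dropping the C0 controls (as a renderer that ignores them would) yields text that looks like "Hello", while a plain substring search over the underlying data still fails:

```python
# The code points from the message: "Hello" with U+0001-U+0004 interleaved.
text = "\u0048\u0001\u0065\u0002\u006C\u0003\u006C\u0004\u006F"

# What a renderer that ignores these C0 controls would display.
displayed = "".join(ch for ch in text if ch >= "\x20")

print(displayed == "Hello")  # True
print("Hello" in text)       # False
```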
Re: [css-text] Control characters
"Tab Atkins Jr."   Fri, 27 Jun 2014 10:55:12 -0700

Received on Friday, 27 June 2014 17:55:59 UTC

Sent to: jfkthame@gmail.com
Copied to: brad.kemper@gmail.com, kojiishi@gluesoft.co.jp, annevk@annevk.nl, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On Fri, Jun 27, 2014 at 8:50 AM, Jonathan Kew <jfkthame@gmail.com> wrote:
> What is "ugly and confusing", IMO, is when browsers display the data
>
> <U+0048 U+0001 U+0065 U+0002 U+006C U+0003 U+006C U+0004 U+006F>
>
> such that it appears to read "Hello", yet when a user searches for the
> string "Hello" they'll fail to find it; it will be indexed separately; it
> will be mangled by screen-readers; etc., etc.

The same happens with a bunch of invisible non-control characters, though. Slip a ZWNJ somewhere in there and you'll get the same effect.

You might have a consistent policy about these things that dictates that the control characters are bad but other invisible characters are fine, though.

~TJ
Re: [css-text] Control characters
Jonathan Kew   Fri, 27 Jun 2014 21:57:19 +0100

Received on Friday, 27 June 2014 20:57:43 UTC

Sent to: jackalmage@gmail.com
Copied to: brad.kemper@gmail.com, kojiishi@gluesoft.co.jp, annevk@annevk.nl, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On 27/6/14 18:55, Tab Atkins Jr. wrote:
> On Fri, Jun 27, 2014 at 8:50 AM, Jonathan Kew <jfkthame@gmail.com> wrote:
>> What is "ugly and confusing", IMO, is when browsers display the data
>> [...]
>
> The same happens with a bunch of invisible non-control characters,
> though. Slip a ZWNJ somewhere in there and you'll get the same
> effect.

That's not necessarily true. ZWNJ (and a number of other normally-invisible characters) are defined to be "default ignorable", so processes such as searching that base their behavior on Unicode character properties should be able to ignore them appropriately. And there are legitimate uses for ZWNJ as part of encoded text, and (some of the time) it'll visibly affect rendering in specific, desired ways.

There are, of course, plenty of cases where authors can use valid content (lіkе thіѕ, perhaps) in confusing ways; we can't really do much about that.

But the C0/C1 control characters - apart from a few exceptions like newline - do not have any legitimate use as part of text on the web; their defined control functions such as <start of text> or <end of transmission block> are provided by entirely different levels of the platform.

> You might have a consistent policy about these things that dictates
> that the control characters are bad but other invisible characters are
> fine, though.

Indeed. Other invisible characters are encoded because they have specific roles to play in representing text, such as controlling directionality (OK, although other HTML/CSS approaches may be preferable), joining behavior, etc. The control characters are bad, except those whose control function is actually relevant within the web platform.

JK
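The distinction Jonathan draws can be sketched as a search that skips default-ignorable characters but not controls. Python's standard library does not expose the Default_Ignorable_Code_Point property, so the set below is a small, hand-picked subset for illustration only; a real implementation would load the full list from DerivedCoreProperties.txt. The function name is hypothetical.

```python
# Illustrative SUBSET of Default_Ignorable_Code_Point characters (not complete).
DEFAULT_IGNORABLE_SUBSET = {
    "\u00AD",  # SOFT HYPHEN
    "\u200B",  # ZERO WIDTH SPACE
    "\u200C",  # ZERO WIDTH NON-JOINER (ZWNJ)
    "\u200D",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\uFEFF",  # ZERO WIDTH NO-BREAK SPACE
}


def contains_ignoring_default_ignorables(haystack: str, needle: str) -> bool:
    """Substring search that skips the default-ignorable characters above."""
    cleaned = "".join(ch for ch in haystack if ch not in DEFAULT_IGNORABLE_SUBSET)
    return needle in cleaned


print(contains_ignoring_default_ignorables("Hel\u200Clo", "Hello"))  # True
print(contains_ignoring_default_ignorables("Hel\u0001lo", "Hello"))  # False
```

Because control characters are not default-ignorable, a property-driven search like this one still misses the U+0001 case, which is the behaviour Jonathan describes.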
Re: [css-text] Control characters
Brad Kemper   Sat, 28 Jun 2014 21:33:42 -0700

Received on Sunday, 29 June 2014 04:34:14 UTC

Sent to: jfkthame@gmail.com
Copied to: jackalmage@gmail.com, kojiishi@gluesoft.co.jp, annevk@annevk.nl, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On Jun 27, 2014, at 1:57 PM, Jonathan Kew <jfkthame@gmail.com> wrote:
> That's not necessarily true. ZWNJ (and a number of other normally-invisible characters) are defined to be "default ignorable", so processes such as searching that base their behavior on Unicode character properties should be able to ignore them appropriately.
>
> [...]
> But the C0/C1 control characters - apart from a few exceptions like newline - do not have any legitimate use as part of text on the web; their defined control functions such as <start of text> or <end of transmission block> are provided by entirely different levels of the platform.

Then why not have the control characters ignored when searching for text too?
Re: [css-text] Control characters
Jonathan Kew   Sun, 29 Jun 2014 08:11:02 +0100

Received on Sunday, 29 June 2014 07:11:24 UTC

Sent to: brad.kemper@gmail.com
Copied to: jackalmage@gmail.com, kojiishi@gluesoft.co.jp, annevk@annevk.nl, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On 29/6/14 05:33, Brad Kemper wrote:
>
> On Jun 27, 2014, at 1:57 PM, Jonathan Kew <jfkthame@gmail.com>
> wrote:
>
>> That's not necessarily true. ZWNJ (and a number of other
>> normally-invisible characters) are defined to be "default
>> ignorable", so processes such as searching that base their behavior
>> on Unicode character properties should be able to ignore them
>> appropriately.
>>
>> [...]
>
>> But the C0/C1 control characters - apart from a few exceptions like
>> newline - do not have any legitimate use as part of text on the
>> web; their defined control functions such as <start of text> or
>> <end of transmission block> are provided by entirely different
>> levels of the platform.
>
> Then why not have the control characters ignored when searching for
> text too?

They don't have the default-ignorable property.

Now, I suppose we could specify (somewhere - though I don't see how this would fall within the scope of CSS) that text processes such as searching, sorting, indexing, etc., within the web platform should base their behavior *not* on the (normative) Unicode character properties, but on something else that we specify independently. But IMO this would be a *REALLY* bad idea. There's a standard; we should follow it.

This isn't just about behavior within the web platform, but also consistency and interoperability with text processing in other environments. The more closely we all keep to the relevant standards, the better for everyone.

JK
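To illustrate the distinction drawn here, a rough Python sketch (not from the thread). Python's `unicodedata` module does not expose the Default_Ignorable_Code_Point property, so the hypothetical `fold_for_search` helper uses general category Cf as an approximate stand-in; the real property is derived in the UCD file DerivedCoreProperties.txt and is not identical to Cf. A property-based search ignores ZWNJ (Cf) but, as noted above, has no basis for ignoring a control character (Cc).

```python
import unicodedata

def fold_for_search(text: str) -> str:
    # Approximation of "ignore default-ignorable code points": drop format
    # characters (general category Cf), such as ZWNJ U+200C. This is NOT
    # the exact Default_Ignorable_Code_Point property.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

with_zwnj = "Hel\u200clo"   # ZWNJ is Cf
with_ctrl = "Hel\u0001lo"   # U+0001 is Cc

print("Hello" in fold_for_search(with_zwnj))  # True
print("Hello" in fold_for_search(with_ctrl))  # False
```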
Re: [css-text] Control characters
Koji Ishii   Sun, 29 Jun 2014 11:51:35 +0000

Received on Sunday, 29 June 2014 11:52:09 UTC

Sent to: jfkthame@gmail.com
Copied to: brad.kemper@gmail.com, jackalmage@gmail.com, annevk@annevk.nl, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

> They don't have the default-ignorable property.

Interesting. From the data, they were default-ignorable until Unicode 5.0, and then Unicode removed them in 5.1. I guess we need to learn why Unicode did so if we are going to spend more effort on this topic.

> Now, I suppose we could specify (somewhere - though I don't see how this would fall within the scope of CSS) that text processes such as searching, sorting, indexing, etc., within the web platform should base their behavior *not* on the (normative) Unicode character properties, but on something else that we specify independently. But IMO this would be a *REALLY* bad idea. There's a standard; we should follow it.
>
> This isn't just about behavior within the web platform, but also consistency and interoperability with text processing in other environments. The more closely we all keep to the relevant standards, the better for everyone.

First of all, CSS defines rendering, so searching, sorting, indexing, etc. are out of scope. Second, as far as I understand, there’s nothing in Unicode stating that higher-level protocols should render control characters, though it may not recommend against it either. In that case, we’re not violating the normative Unicode character properties at all.

By the way, my personal +1 is to Brad. Searching and copying text in browsers sometimes bothers me too, so I share your concern, but improving search is a more appropriate way to address what you want to solve than changing the rendering of control characters. W3C does not have such a spec today, but you could suggest that W3C create a spec for text-izing HTML content, which should help interoperable behavior for searching, sorting, indexing, etc.

/koji
Re: [css-text] Control characters
Anne van Kesteren   Sun, 29 Jun 2014 13:56:28 +0200

Received on Sunday, 29 June 2014 11:56:55 UTC

Sent to: kojiishi@gluesoft.co.jp
Copied to: jfkthame@gmail.com, brad.kemper@gmail.com, jackalmage@gmail.com, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On Sun, Jun 29, 2014 at 1:51 PM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
> By the way, my personal +1 is to Brad.

Could you elaborate a bit on why you agree? We render U+FFFD for instance (and not doing so would be bad). Why would we want to hide other code points that could potentially indicate something went wrong?

--
http://annevankesteren.nl/
Re: [css-text] Control characters
Koji Ishii   Sun, 29 Jun 2014 14:06:42 +0000

Received on Sunday, 29 June 2014 14:07:19 UTC

Sent to: annevk@annevk.nl
Copied to: jfkthame@gmail.com, brad.kemper@gmail.com, jackalmage@gmail.com, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

>> By the way, my personal +1 is to Brad.
>
> Could you elaborate a bit on why you agree? We render U+FFFD for
> instance (and not doing so would be bad). Why would we want to hide
> other code points that could potentially indicate something went
> wrong?

There are two perspectives in my mind.

It was said that changing the rendering of control characters can solve searching, sorting, indexing, etc., but I do not think changing the rendering solves these issues at all. My point is only that if that is the issue, we should try to solve it directly rather than change the rendering. There are a lot of issues other than control characters that prevent search from working properly. On that point, I +1 Brad. It’s a separate issue, good to pursue, but it does not help to determine whether we should display control characters or not.

U+FFFD has a completely separate story, so it’s hard to compare. It’s So (Symbol, Other), not Cc. Its use is defined in considerable detail in Unicode 6.3, "3.9 Unicode Encoding Forms"[1], "5.22 Best Practice for U+FFFD Substitution"[2], and in UTR#36 Unicode Security Considerations[3].

As to whether control characters should be displayed or not, I was actually fine either way, with a weak preference not to display them just because that’s the existing behavior and I did not find good enough reasons to change. But, hey, thanks to your e-mail, by reading the Unicode spec again to write the above paragraph, I found this text in "5.21 Ignoring Characters in Processing":

> Surrogate code points, private-use characters, and control characters are not given the Default_Ignorable_Code_Point property. To avoid security problems, such characters or code points, when not interpreted and not displayable by normal rendering, should be displayed in fallback rendering with a fallback glyph.

So I changed my opinion; I’m still not sure if Unicode recommends that all non-Default_Ignorable_Code_Point characters be displayed, but at least "Surrogate code points, private-use characters, and control characters" should be displayed. I would still like to double-check with the UTC whether our understanding is correct, and whether there are more characters that should or should not be displayed. So, thank you for asking!!

[1] http://www.unicode.org/versions/Unicode6.3.0/ch03.pdf
[2] http://www.unicode.org/versions/Unicode6.3.0/ch05.pdf
[3] http://www.unicode.org/reports/tr36/

/koji
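A hedged Python sketch of the Unicode 6.3 section 5.21 guidance quoted above, added for illustration only. The hypothetical `needs_fallback_glyph` helper treats surrogate (Cs), private-use (Co) and control (Cc) code points as needing a fallback glyph when the font cannot display them, exempting tab, LF and CR on the assumption that CSS white-space processing interprets them. The loop prints the general categories of the characters discussed, including So for U+FFFD.

```python
import unicodedata

# General categories named in the Unicode 6.3 section 5.21 passage.
FALLBACK_CATEGORIES = {"Cs", "Co", "Cc"}  # surrogate, private use, control

def needs_fallback_glyph(ch: str, font_has_glyph: bool) -> bool:
    """True if ch is a surrogate, private-use or control code point that is
    not interpreted (tab, LF and CR are assumed to be handled by CSS
    white-space rules) and that the font cannot display."""
    if ch in "\t\n\r":
        return False
    return unicodedata.category(ch) in FALLBACK_CATEGORIES and not font_has_glyph

for ch in ["\ufffd", "\u0001", "\ue000", "\ud800", "\u200c"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))
# Prints: U+FFFD So / U+0001 Cc / U+E000 Co / U+D800 Cs / U+200C Cf
```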
Re: [css-text] Control characters
Brad Kemper   Sun, 29 Jun 2014 09:43:51 -0700

Received on Sunday, 29 June 2014 16:44:20 UTC

Sent to: annevk@annevk.nl
Copied to: kojiishi@gluesoft.co.jp, jfkthame@gmail.com, jackalmage@gmail.com, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

> On Jun 29, 2014, at 4:56 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
>
>> On Sun, Jun 29, 2014 at 1:51 PM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
>> By the way, my personal +1 is to Brad.
>
> Could you elaborate a bit on why you agree? We render U+FFFD for
> instance (and not doing so would be bad). Why would we want to hide
> other code points that could potentially indicate something went
> wrong?

Speaking for myself, I don't think it is (or should be) our purview to intentionally break the rendering of Web pages, however laudable you might think the reason is. We should be prioritizing intended rendering, not educating through such a blunt weapon. I think the way text/html is forgiving of unclosed tags and unquoted attributes and so on is a better model for how we should deal with these sorts of problems. Be forgiving; don't degrade legacy content.
Re: [css-text] Control characters
Brad Kemper   Sun, 29 Jun 2014 09:52:08 -0700

Received on Sunday, 29 June 2014 16:52:37 UTC

Sent to: kojiishi@gluesoft.co.jp
Copied to: jfkthame@gmail.com, jackalmage@gmail.com, annevk@annevk.nl, zackw@panix.com, fantasai.lists@inkedblade.net, www-style@w3.org.

On Jun 29, 2014, at 4:51 AM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:
>> Now, I suppose we could specify (somewhere - though I don't see how this would fall within the scope of CSS) that text processes such as searching, sorting, indexing, etc., within the web platform should base their behavior *not* on the (normative) Unicode character properties, but on something else that we specify independently. But IMO this would be a *REALLY* bad idea. There's a standard; we should follow it.
>>
>> This isn't just about behavior within the web platform, but also consistency and interoperability with text processing in other environments. The more closely we all keep to the relevant standards, the better for everyone.
>
> First of all, CSS defines rendering, so searching, sorting, indexing, etc. are out of scope. Second, as far as I understand, there’s nothing in Unicode stating that higher-level protocols should render control characters, though it may not recommend against it either. In that case, we’re not violating the normative Unicode character properties at all.

I agree. If there is a browser that can't find a word because it has an invisible control character in it, then it should be handled by filing a bug report to the browser maker for its silly usability problem, but it is not in scope for CSS to dictate that.
Re: [css-text] Control characters
Boris Zbarsky   Sun, 29 Jun 2014 15:58:44 -0400

Received on Sunday, 29 June 2014 19:59:12 UTC

Sent to: www-style@w3.org.

On 6/29/14, 12:52 PM, Brad Kemper wrote:
> If there is a browser that can't find a word because it has an invisible control character in it, then it should be handled by filing a bug report to the browser maker for its silly usability problem

Can you name a piece of software in which text search works the way you claim you want it to work in browsers? Do text editors do that? Word processors? Internet search engines? Anything else?

-Boris
Re: [css-text] Control characters
Brad Kemper   Mon, 30 Jun 2014 12:22:53 -0700

Received on Monday, 30 June 2014 19:23:22 UTC

Sent to: bzbarsky@MIT.EDU
Copied to: www-style@w3.org, www-style@w3.org.

On Jun 29, 2014, at 12:58 PM, Boris Zbarsky <bzbarsky@MIT.EDU> wrote:
>
>> On 6/29/14, 12:52 PM, Brad Kemper wrote:
>> If there is a browser that can't find a word because it has an invisible control character in it, then it should be handled by filing a bug report to the browser maker for its silly usability problem
>
> Can you name a piece of software in which text search works the way you
> claim you want it to work in browsers? Do text editors do that? Word
> processors? Internet search engines? Anything else?

I don't know, but I also largely don't care. I will readily cede your implied point that it might not be what software developers are used to, or that it might not be what is normally available from their code libraries or built-in functions. My viewpoint is informed by how I think users (first) and web authors (second) are best served. It is not based on whether it is easy for implementors, or whether other software might suffer the same sort of problems.
Re: [css-text] Control characters
Boris Zbarsky   Mon, 30 Jun 2014 15:27:55 -0400

Received on Monday, 30 June 2014 19:28:25 UTC

Sent to: brad.kemper@gmail.com
Copied to: www-style@w3.org, www-style@w3.org.

On 6/30/14, 3:22 PM, Brad Kemper wrote:
> I will readily cede your implied point that it might not be what software developers are used to

No, you didn't get my point. My point is that if you write a word on your website and your browser shows it as a word but Google's spider doesn't think it's a word, you will be unhappy. Your website's users will similarly be unhappy when they try and copy/paste the word into their word processor or mail client, and so forth.

That is to say, there is a tension here between browsers fixing up broken sites for their users and web sites playing nice with the larger text-processing ecosystem that exists in the world, and it's possible to actually make things worse for users and authors by covering up issues that would completely break other tools they rely on.

> My viewpoint is informed by how I think users (first) and web authors (second) are best served.

So is mine.

-Boris
Re: [css-text] Control characters
Ambrose LI   Mon, 30 Jun 2014 16:06:57 -0400

Received on Monday, 30 June 2014 20:08:05 UTC

Sent to: bzbarsky@mit.edu
Copied to: brad.kemper@gmail.com, www-style@w3.org, www-style@w3.org.

2014-06-30 15:27 GMT-04:00 Boris Zbarsky <bzbarsky@mit.edu>:
> No, you didn't get my point. My point is that if you write a word on your
> website and your browser shows it as a word but Google's spider doesn't
> think it's a word, you will be unhappy.

These things already exist. It’s called Javascript-generated content.

--
cheers,
-ambrose <http://gniw.ca>
Re: [css-text] Control characters
Boris Zbarsky   Mon, 30 Jun 2014 18:51:14 -0400

Received on Monday, 30 June 2014 22:51:43 UTC

Sent to: ambrose.li@gmail.com
Copied to: brad.kemper@gmail.com, www-style@w3.org, www-style@w3.org.

On 6/30/14, 4:06 PM, Ambrose LI wrote:
> 2014-06-30 15:27 GMT-04:00 Boris Zbarsky <bzbarsky@mit.edu>:
>> No, you didn't get my point. My point is that if you write a word on your
>> website and your browser shows it as a word but Google's spider doesn't
>> think it's a word, you will be unhappy.
>
> These things already exist. It’s called Javascript-generated content.

1) Search engine spiders nowadays run JS.

2) Website authors typically know about this gotcha, and more importantly know when they have JS-generated content.

-Boris
Re: [css-text] Control characters
Brad Kemper   Mon, 30 Jun 2014 23:39:52 -0700

Received on Tuesday, 1 July 2014 06:40:20 UTC

Sent to: bzbarsky@MIT.EDU
Copied to: www-style@w3.org, www-style@w3.org.

> On Jun 30, 2014, at 12:27 PM, Boris Zbarsky <bzbarsky@MIT.EDU> wrote:
>
> No, you didn't get my point. My point is that if you write a word on your website and your browser shows it as a word but Google's spider doesn't think it's a word, you will be unhappy.

I think if Google's spider is that broken, then Google should fix it. It does Google no good to avoid recognizing a word because it has an unintentional control character in the middle of it.

> Your website's users will similarly be unhappy when they try and copy/paste the word into their word processor or mail client, and so forth.

The browser already changes copied content as a result of text-transform. I don't see why it wouldn't leave out from copying characters that are known to be mistakes.

> That is to say, there is a tension here between browsers fixing up broken sites for their users and web sites playing nice with the larger text-processing ecosystem that exists in the world,

OK

> and it's possible to actually make things worse for users and authors by covering up issues that would completely break other tools they rely on.

I think it is possible to discard mistakes without breaking other tools.
Re: [css-text] Control characters
Boris Zbarsky   Tue, 01 Jul 2014 03:07:43 -0400

Received on Tuesday, 1 July 2014 07:08:16 UTC

Sent to: brad.kemper@gmail.com
Copied to: www-style@w3.org, www-style@w3.org.

On 7/1/14, 2:39 AM, Brad Kemper wrote:
> I think if Google's spider is that broken, then Google should fix it.

So you think all software in the world should be changed to deal with this particular brand of broken content, I get it. How likely do you think it is in practice?

> The browser already changes copied content as a result of text-transform.

Firefox certainly doesn't do that.

-Boris
Re: [css-text] Control characters
Rafał Pietrak   Tue, 01 Jul 2014 09:14:39 +0200

Received on Tuesday, 1 July 2014 07:15:23 UTC

Sent to: www-style@w3.org.

On 01.07.2014 08:39, Brad Kemper wrote:
[-----------------]
>> Your website's users will similarly be unhappy when they try and copy/paste the word into their word processor or mail client, and so forth.
> The browser already changes copied content as a result of text-transform. I don't see why it wouldn't leave out from copying characters that are known to be mistakes.

I can make a "sworn statement" here:
1. My bank (one of the *.pl ones) placed a hidden character within the account numbers of past (e.g. my earlier) bank transfers.
2. My browser (Epiphany) copied those hidden characters on C-C.
3. But the copied account number was invalid for that same bank when pasted into a new bank-transfer panel.

As a user, I was seriously p-off.

I'd rather see them all... *at least* replaced by an ordinary space (which is normally what happens with the [:newline:] character).

-R
Re: [css-text] Control characters
Brad Kemper   Tue, 1 Jul 2014 07:57:03 -0700

Received on Tuesday, 1 July 2014 14:57:37 UTC

Sent to: bzbarsky@MIT.EDU
Copied to: www-style@w3.org, www-style@w3.org.

On Jul 1, 2014, at 12:07 AM, Boris Zbarsky <bzbarsky@MIT.EDU> wrote:
>
>> On 7/1/14, 2:39 AM, Brad Kemper wrote:
>> I think if Google's spider is that broken, then Google should fix it.
>
> So you think all software in the world should be changed to deal with this particular brand of broken content, I get it.

That is a mischaracterization. I merely said that it is beyond our purview to make sure Google can find words if they have bad characters in them, and beyond our purview to highlight authors' mistakes, just as we do not dictate that implementors should highlight misspellings. If a character is obviously a mistake, if there is no question it is unintentional, and if it is easy to hide it or remove it, then we should.

> How likely do you think it is in practice?
>
>> The browser already changes copied content as a result of text-transform.
>
> Firefox certainly doesn't do that.

Good, and good to know. Thanks. WebKit does, and I think IE does too. I imagine Chrome does too.

Anyway, my preference is that the browser completely filters out the unintentional control characters, so that they don't interfere with copying or searching. But that is an implementation detail.
Re: [css-text] Control characters
Brad Kemper   Tue, 1 Jul 2014 08:03:13 -0700

Received on Tuesday, 1 July 2014 15:03:45 UTC

Sent to: rafal@ztk-rp.eu
Copied to: www-style@w3.org, www-style@w3.org.

> On Jul 1, 2014, at 12:14 AM, Rafał Pietrak <rafal@ztk-rp.eu> wrote:
>
> On 01.07.2014 08:39, Brad Kemper wrote:
> [-----------------]
>>> Your website's users will similarly be unhappy when they try and copy/paste the word into their word processor or mail client, and so forth.
>> The browser already changes copied content as a result of text-transform. I don't see why it wouldn't leave out from copying characters that are known to be mistakes.
>
> I can make a "sworn statement" here:
> 1. My bank (one of the *.pl ones) placed a hidden character within the account numbers of past (e.g. my earlier) bank transfers.
> 2. My browser (Epiphany) copied those hidden characters on C-C.

I wouldn't like that either. I don't think hidden unintentional control characters should be included in copied text.

> 3. But the copied account number was invalid for that same bank when pasted into a new bank-transfer panel.
>
> As a user, I was seriously p-off.
>
> I'd rather see them all... *at least* replaced by an ordinary space (which is normally what happens with the [:newline:] character).

That would be better than drawing a box or a question mark or something. But I'd rather it just got stripped out of the content, and thus not present in what you copy. If an author wants to be educated as to where the characters are, there are probably other tools to display them more obviously before they are published to the Web. Boris's implication is that there are many such applications.
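The two copy-time policies in this exchange (stripping stray control characters, or showing them as ordinary spaces) can be sketched in Python as below. This is an illustration only: the helper name `clean_copied_text` and the sample account number are hypothetical, and it assumes the stray character is a Cc control other than tab, LF or CR.

```python
import unicodedata

ALLOWED = {"\t", "\n", "\r"}

def clean_copied_text(text: str, replacement: str = "") -> str:
    """Replace Cc characters other than tab, LF and CR with `replacement`.

    replacement=""  models stripping them out of the copied content;
    replacement=" " models rendering them as ordinary spaces.
    """
    return "".join(
        replacement if unicodedata.category(ch) == "Cc" and ch not in ALLOWED else ch
        for ch in text
    )

account = "12\u00033456\u00037890"  # hypothetical number with stray U+0003
print(clean_copied_text(account))       # 1234567890
print(clean_copied_text(account, " "))  # 12 3456 7890
```

Stripping yields a value that survives validation when pasted; replacing with spaces keeps the error visible but may still need the user to delete the spaces.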
[css-text] Rendering of control characters
Behdad Esfahbod   Wed, 24 Sep 2014 16:46:07 +0300

Received on Wednesday, 24 September 2014 13:46:38 UTC

Sent to: www-style@w3.org, www-style@w3.org, fantasai@inkedblade.net.

Hi,

Currently CSS Text Module Level 3 says [0]:

"""
Control characters (Unicode class Cc) other than tab (U+0009), line feed (U+000A), and carriage return (U+000D) are ignored for the purpose of rendering.
"""

AFAIU the Unicode standard says that those should be rendered like any other characters except for the ones that are considered whitespace. This also lines up closely with CSS2.1 [1]:

"""
Control characters other than U+0009 (tab), U+000A (line feed), U+0020 (space), and U+202x (bidi formatting characters) are treated as characters to render in the same way as any normal character.
"""

I'm asking, because this disparity requires implementations to rewrite such characters before passing text to a shaping engine like HarfBuzz. This makes me wonder, why the deviation from Unicode? If there are no good explanations, can we change that please?

[0] http://www.w3.org/TR/css-text-3/#white-space-processing
[1] http://www.w3.org/TR/CSS2/text.html#ctrlchars

Cheers,
--
behdad
http://behdad.org/
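A rough Python sketch of the kind of pre-shaping rewrite described here, under the draft rule quoted above: Cc characters other than tab, LF and CR are dropped before the text reaches a shaper, and an index map is kept so positions in the shaped run can be mapped back to the source text. The function name and the index-map approach are illustrative assumptions; this does not call HarfBuzz and does not reflect any particular browser's code.

```python
import unicodedata

def prepare_for_shaping(text: str) -> tuple[str, list[int]]:
    """Drop Cc characters (except tab, LF, CR) and return the filtered
    string plus, for each output index, the index of its source character."""
    out_chars: list[str] = []
    source_index: list[int] = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
            continue  # ignored for rendering under the draft rule
        out_chars.append(ch)
        source_index.append(i)
    return "".join(out_chars), source_index

shaped_input, mapping = prepare_for_shaping("a\u0007b\tc")
print(repr(shaped_input))  # 'ab\tc'
print(mapping)             # [0, 2, 3, 4]
```

The extra bookkeeping is the cost the message points at: if controls were simply passed through and rendered like other characters, the text could go to the shaper unchanged.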
Re: [css-text] Rendering of control characters
Glenn Adams   Wed, 24 Sep 2014 08:04:06 -0600

Received on Wednesday, 24 September 2014 14:04:54 UTC

Sent to: behdad@behdad.org
Copied to: www-style@w3.org, www-style@w3.org, fantasai@inkedblade.net.

On Wed, Sep 24, 2014 at 7:46 AM, Behdad Esfahbod <behdad@behdad.org> wrote:
> Hi,
>
> Currently CSS Text Module Level 3 says [0]:
>
> """
> Control characters (Unicode class Cc) other than tab (U+0009), line feed
> (U+000A), and carriage return (U+000D) are ignored for the purpose of
> rendering.
> """
>
> AFAIU the Unicode standard says that those should be rendered like any
> other characters except for the ones that are considered whitespace. This
> also lines up closely with CSS2.1 [1]:
>
> """
> Control characters other than U+0009 (tab), U+000A (line feed), U+0020
> (space), and U+202x (bidi formatting characters) are treated as characters
> to render in the same way as any normal character.
> """
>
> I'm asking, because this disparity requires implementations to rewrite such
> characters before passing text to a shaping engine like HarfBuzz. This makes
> me wonder, why the deviation from Unicode? If there are no good explanations,
> can we change that please?

In the recent F2F, it was resolved to change this to display Cc characters (other than HT, LF, CR). So I believe this problem has been resolved.

>
> [0] http://www.w3.org/TR/css-text-3/#white-space-processing
> [1] http://www.w3.org/TR/CSS2/text.html#ctrlchars
>
> Cheers,
> --
> behdad
> http://behdad.org/
Re: [css-text] Rendering of control characters
Behdad Esfahbod   Wed, 24 Sep 2014 17:11:55 +0300

Received on Wednesday, 24 September 2014 14:12:30 UTC

Sent to: glenn@skynav.com
Copied to: www-style@w3.org, www-style@w3.org, fantasai@inkedblade.net.

On 14-09-24 05:04 PM, Glenn Adams wrote:
>
> In the recent F2F, it was resolved to change this to display Cc characters
> (other than HT, LF, CR). So I believe this problem has been resolved.

Ah, great. Thanks! I'll continue procrastinating on more issues till they resolve themselves :).

Are there meeting notes online about this?

--
behdad
http://behdad.org/
Re: [css-text] Rendering of control characters
Simon Sapin   Wed, 24 Sep 2014 15:43:20 +0100

Received on Wednesday, 24 September 2014 14:43:45 UTC

Sent to: www-style@w3.org.

On 24/09/14 15:11, Behdad Esfahbod wrote:
> On 14-09-24 05:04 PM, Glenn Adams wrote:
>>
>> In the recent F2F, it was resolved to change this to display Cc characters
>> (other than HT, LF, CR). So I believe this problem has been resolved.
>
> Ah, great. Thanks! I'll continue procrastinating on more issues till they
> resolve themselves :).
>
> Are there meeting notes online about this?
>

The discussion last May seems to have reached another conclusion:
http://lists.w3.org/Archives/Public/www-style/2014Jun/0167.html

> - Issue 72 about control characters will be tested a bit more, but
> likely will result in no change.

See also: http://dev.w3.org/csswg/css-text-3/issues-lc-2013#issue-72

--
Simon Sapin
Re: [css-text] Rendering of control characters
Glenn Adams   Wed, 24 Sep 2014 10:26:05 -0600

Received on Wednesday, 24 September 2014 16:26:52 UTC

Sent to: behdad@behdad.org
Copied to: www-style@w3.org, www-style@w3.org, fantasai@inkedblade.net.

On Wed, Sep 24, 2014 at 8:11 AM, Behdad Esfahbod <behdad@behdad.org> wrote:
> On 14-09-24 05:04 PM, Glenn Adams wrote:
> >
> > In the recent F2F, it was resolved to change this to display Cc characters
> > (other than HT, LF, CR). So I believe this problem has been resolved.
>
> Ah, great. Thanks! I'll continue procrastinating on more issues till they
> resolve themselves :).
>
> Are there meeting notes online about this?
>

The resolution is recorded in IRC logs at [1]. I would expect minutes to be published soon.

[1] http://log.csswg.org/irc.w3.org/css/2014-09-08/#e469848

>
> --
> behdad
> http://behdad.org/
>
Re: [css-text] Rendering of control characters
Simon Sapin   Wed, 24 Sep 2014 17:59:40 +0100

Received on Wednesday, 24 September 2014 17:00:08 UTC

Sent to: www-style@w3.org.

On 24/09/14 17:26, Glenn Adams wrote:
> On Wed, Sep 24, 2014 at 8:11 AM, Behdad Esfahbod <behdad@behdad.org
> <mailto:behdad@behdad.org>> wrote:
>
> > On 14-09-24 05:04 PM, Glenn Adams wrote:
> > >
> > > In the recent F2F, it was resolved to change this to display Cc characters
> > > (other than HT, LF, CR). So I believe this problem has been resolved.
> >
> > Ah, great. Thanks! I'll continue procrastinating on more issues
> > till they resolve themselves :).
> >
> > Are there meeting notes online about this?
>
> The resolution is recorded in IRC logs at [1]. I would expect minutes to
> be published soon.
>
> [1] http://log.csswg.org/irc.w3.org/css/2014-09-08/#e469848

Ah, yes, that’s more recent. Please ignore my other message!

--
Simon Sapin
Re: [css-text] Control characters
fantasai   Wed, 22 Oct 2014 20:48:31 -0400

Received on Thursday, 23 October 2014 00:49:00 UTC

Sent to: jfkthame@gmail.com, kojiishi@gluesoft.co.jp, annevk@annevk.nl
Copied to: zackw@panix.com, www-style@w3.org, www-style@w3.org.

On 06/27/2014 05:27 AM, Jonathan Kew wrote:
> On 27/6/14 09:49, Koji Ishii wrote:
>
>>> Of course, you still need to define how those control characters
>>> are rendered, erroneous or not.
>>
>> Yes, this is the text we have now[1]. Your quick review is invaluable
>> for us, please let us know if any.
>>
>>> Control characters (Unicode class Cc) other than tab (U+0009), line
>>> feed (U+000A), and carriage return (U+000D) are ignored for the
>>> purpose of rendering. (As required by [UNICODE], unsupported
>>> Default_ignorable characters must also be ignored for rendering.)
>
> IMO, it would be better to require the presence of spurious control
> characters (i.e. other than tab, linefeed, return) to be rendered
> visibly - e.g. as "hexbox" glyphs or inverse-colored ^X sequences -
> rather than ignored.
>
> The presence of such characters within the text degrades functionality
> by interfering with operations such as search, indexing, copy/paste to
> other environments, etc. Their presence is typically the result of
> broken authoring tools/workflows, but as long as browsers ignore them
> for rendering, authors generally remain unaware that their data is bad,
> and readers will usually be unaware that their searches, etc., may be
> missing content they would have expected to match.
>
> I realize that making stray control characters visible will result in
> some pages (containing bad text) looking "worse" from an aesthetic
> point of view, but I don't believe this is such a widespread and
> serious problem that we should give up the battle and accept that the
> Web will forever hide these errors and leave the problem of polluted
> data unaddressed. If browser vendors would agree to make the CCs
> visible, and include this in the relevant specs, there'll be a spate
> of bug reports - as we've seen when we had them rendered as hexboxes
> in Firefox - but these can be redirected to the sites/authors concerned,
> and there will be significant pressure on authors and tool vendors to
> fix the underlying problems.
>
> Although there'd no doubt be some short-term discontent, I think this
> would be significantly better for the long-term health of the web.
> Our concern should not -only- be to optimize the display of (a small
> minority of badly-authored) web pages of today; we should also be
> concerned for the quality and usability of web data in the future.

Thanks for your comments and concerns. The CSSWG has reviewed this
issue, and, after reviewing also other implementors' feedback, has resolved
to make this change. The minutes to the resolution are here:
  http://lists.w3.org/Archives/Public/www-style/2014Oct/0259.html

The change has been checked into the Editor's Draft and should make its
way to /TR shortly. The new text reads:

# Control characters (Unicode category Cc) other than tab (U+0009),
# line feed (U+000A), and carriage return (U+000D) must be rendered
# as a visible glyph and otherwise treated as any other character
# of the Other Symbols (So) general category and Common script.
# The UA may use a glyph provided by a font specifically for the
# control character, substitute the glyphs provided for the
# corresponding symbol in the Control Pictures block, generate a
# visual representation of its codepoint value, or use some other
# method to provide an appropriate visible glyph. As required by
# [UNICODE], unsupported Default_ignorable characters must be
# ignored for rendering.

Let us know if there are any errors or if you have further suggestions
for improvement. Thanks!

~fantasai
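To make the new rule concrete, here is a minimal Python sketch for illustration only. It treats every general-category Cc character except tab, line feed, and carriage return as needing a visible glyph, then picks one of the strategies the text allows. The Control Pictures mapping (U+2400 plus the code point for U+0000..U+001F, and U+2421 for DEL) is standard Unicode. The `[U+XXXX]` fallback for C1 controls is an assumed stand-in for "a visual representation of its codepoint value", and the function names are hypothetical.

```python
import unicodedata

# Cc characters the rule exempts from visible rendering.
EXEMPT = {"\t", "\n", "\r"}


def needs_visible_glyph(ch: str) -> bool:
    """True if ch is a Cc control character that must be drawn visibly."""
    return unicodedata.category(ch) == "Cc" and ch not in EXEMPT


def visible_form(ch: str) -> str:
    """Return a printable stand-in for ch (hypothetical UA strategy)."""
    if not needs_visible_glyph(ch):
        return ch
    cp = ord(ch)
    if cp <= 0x1F:
        # Control Pictures: U+2400 SYMBOL FOR NULL .. U+241F SYMBOL FOR UNIT SEPARATOR.
        return chr(0x2400 + cp)
    if cp == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    # C1 controls (U+0080..U+009F) have no Control Pictures symbol,
    # so fall back to showing the code point value.
    return f"[U+{cp:04X}]"


def render(text: str) -> str:
    """Apply visible_form to every character of text."""
    return "".join(visible_form(ch) for ch in text)
```

For example, `render("a\x0bb")` turns the vertical tab into U+240B SYMBOL FOR VERTICAL TABULATION. Because the test is on the Cc category, VT, FF, and NEL are also drawn visibly, as the new text requires.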
RE: [css-text] Control characters
Greg Whitworth   Sat, 15 Nov 2014 00:11:08 +0000

Received on Saturday, 15 November 2014 00:11:37 UTC

Sent to: www-style@w3.org, www-style@w3.org.

> Thanks for your comments and concerns. The CSSWG has reviewed this
> issue, and, after reviewing also other implementors' feedback, has resolved
> to make this change.

Just so I can make a decision concerning a livesite bug that we have had
reported regarding showing the box for the control character, does
Blink/Webkit/Gecko plan to implement this? We currently haven't fully
implemented this but fell into it due to the font selected, but it does
bring up the issue that a user will think that a UA is broken if one UA is
showing boxes while others are not. Interestingly, Gecko removed this from
their implementation due to this very problem[1].

Thanks in advance,
Greg

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=947588
Re: [css-text] Control characters
Jonathan Kew   Sat, 15 Nov 2014 10:45:29 +0000

Received on Saturday, 15 November 2014 10:45:57 UTC

Sent to: www-style@w3.org.

On 15/11/14 00:11, Greg Whitworth wrote:
>> Thanks for your comments and concerns. The CSSWG has reviewed this
>> issue, and, after reviewing also other implementors' feedback, has
>> resolved to make this change.
>
> Just so I can make a decision concerning a livesite bug that we have
> had reported regarding showing the box for the control character,
> does Blink/Webkit/Gecko plan to implement this? We currently haven't
> fully implemented this but fell into it due to the font selected, but
> it does bring up the issue that a user will think that a UA is broken
> if one UA is showing boxes while others are not. Interestingly, Gecko
> removed this from their implementation due to this very problem[1].

We did (after considerable debate). But given the recent CSS WG
agreement on this, I expect us to revert that change.

Just filed https://bugzilla.mozilla.org/show_bug.cgi?id=1099557.

JK
Re: [css-text] Control characters
"Robert O'Callahan"   Sun, 16 Nov 2014 01:11:14 +1300

Received on Saturday, 15 November 2014 12:11:43 UTC

Sent to: jfkthame@gmail.com
Copied to: www-style@w3.org.

On Sat, Nov 15, 2014 at 11:45 PM, Jonathan Kew <jfkthame@gmail.com> wrote:
> On 15/11/14 00:11, Greg Whitworth wrote:
>
>>> Thanks for your comments and concerns. The CSSWG has reviewed this
>>> issue, and, after reviewing also other implementors' feedback, has
>>> resolved to make this change.
>>
>> Just so I can make a decision concerning a livesite bug that we have
>> had reported regarding showing the box for the control character,
>> does Blink/Webkit/Gecko plan to implement this? We currently haven't
>> fully implemented this but fell into it due to the font selected, but
>> it does bring up the issue that a user will think that a UA is broken
>> if one UA is showing boxes while others are not. Interestingly, Gecko
>> removed this from their implementation due to this very problem[1].
>
> We did (after considerable debate). But given the recent CSS WG agreement
> on this, I expect us to revert that change.
>
> Just filed https://bugzilla.mozilla.org/show_bug.cgi?id=1099557.

I think we should coordinate schedules here. I don't want Firefox to be
the only browser doing this; I'd like to ship this change around the same
time another major browser does.

Rob
-- 
oIo otoeololo oyooouo otohoaoto oaonoyooonoeo owohooo oioso oaonogoroyo owoiotoho oao oboroootohoeoro oooro osoiosotoeoro owoiololo oboeo osouobojoeocoto otooo ojouodogomoeonoto.o oAogoaoiono,o oaonoyooonoeo owohooo osoaoyoso otooo oao oboroootohoeoro oooro osoiosotoeoro,o o‘oRoaocoao,o’o oioso oaonosowoeoroaoboloeo otooo otohoeo ocooouoroto.o oAonodo oaonoyooonoeo owohooo osoaoyoso,o o‘oYooouo ofooooolo!o’o owoiololo oboeo oiono odoaonogoeoro ooofo otohoeo ofoioroeo ooofo ohoeololo.
Re: [css-text] Control characters
Jonathan Kew   Sat, 15 Nov 2014 13:20:03 +0000

Received on Saturday, 15 November 2014 13:20:32 UTC

Sent to: www-style@w3.org
Copied to: robert@ocallahan.org.

On 15/11/14 12:11, Robert O'Callahan wrote:
> On Sat, Nov 15, 2014 at 11:45 PM, Jonathan Kew <jfkthame@gmail.com> wrote:
>
>> On 15/11/14 00:11, Greg Whitworth wrote:
>>
>>>> Thanks for your comments and concerns. The CSSWG has reviewed this
>>>> issue, and, after reviewing also other implementors' feedback, has
>>>> resolved to make this change.
>>>
>>> Just so I can make a decision concerning a livesite bug that we have
>>> had reported regarding showing the box for the control character,
>>> does Blink/Webkit/Gecko plan to implement this? We currently haven't
>>> fully implemented this but fell into it due to the font selected, but
>>> it does bring up the issue that a user will think that a UA is broken
>>> if one UA is showing boxes while others are not. Interestingly, Gecko
>>> removed this from their implementation due to this very problem[1].
>>
>> We did (after considerable debate). But given the recent CSS WG
>> agreement on this, I expect us to revert that change.
>>
>> Just filed https://bugzilla.mozilla.org/show_bug.cgi?id=1099557.
>
> I think we should coordinate schedules here. I don't want Firefox to
> be the only browser doing this; I'd like to ship this change around the
> same time another major browser does.

Agreed, that would be good.

So Greg, can you comment on schedule at all?

How about Webkit or Blink folk?

JK
RE: [css-text] Control characters
Greg Whitworth   Sat, 15 Nov 2014 20:36:15 +0000

Received on Saturday, 15 November 2014 20:36:44 UTC

Sent to: jfkthame@gmail.com, www-style@w3.org, robert@ocallahan.org, robert@ocallahan.org.

> Agreed, that would be good.
>
> So Greg, can you comment on schedule at all?

I can't, because this isn't a huge priority for us at the moment, but I
saw the potential for the problem that Gecko hit and was hoping we could
synchronize (as best as possible) when this change is made in all UAs.

Now that IE on Windows 10 has flags, I don't see why we couldn't all put
this implementation behind a flag and then, once all of us have it there,
remove the flag. I can get a rough timeline for IE to discuss at Sydney,
and then we can possibly target a quarter for all major UAs to have it
shipped behind the flag, and subsequently shipped on by default.

I also think this is a change that we should alert web developers to, so
that they have time to address their sites and remove any control
characters that are in their markup. While I understand the need for this
and agree with the change, end users will not, and we should allow a
removal period for web devs before making their sites ugly when they
currently work fine (e.g. I can imagine the phone calls from clients asking
what the web dev did to add square boxes to the site without their
approval).

Do we have agreement to discuss this at Sydney, with rough timelines for
having this behind a flag? Then, based on that, we can target a quarter to
ship it on by default. Again, I understand that prioritization of this
feature will vary by UA, especially based on costing; that's why I think
it best to have this information ready for discussion at Sydney.

Thoughts?

> How about Webkit or Blink folk?
>
> JK
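Greg's point about giving web developers time to remove stray control characters from their markup could be supported by a simple content check. The following is a minimal Python sketch, not a tool any vendor announced. The function names are hypothetical, it scans already-decoded text for Cc characters other than tab, LF, and CR, and it does not decode character references such as `&#x8;` in markup. The last lines illustrate Jonathan's earlier point that a hidden control character silently breaks search.

```python
import unicodedata

# The only Cc characters that keep their whitespace role under CSS Text.
ALLOWED = {"\t", "\n", "\r"}


def is_stray_control(ch: str) -> bool:
    """True for any Cc character other than tab, LF and CR."""
    return unicodedata.category(ch) == "Cc" and ch not in ALLOWED


def find_stray_controls(text: str):
    """Yield (line, column, code point) for each stray control (1-based positions)."""
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if is_stray_control(ch):
                yield lineno, col, ord(ch)


def strip_stray_controls(text: str) -> str:
    """Remove stray control characters, keeping tab, LF and CR."""
    return "".join(ch for ch in text if not is_stray_control(ch))


if __name__ == "__main__":
    page_text = "Annual report\nfoo\x08bar figures"
    for lineno, col, cp in find_stray_controls(page_text):
        print(f"line {lineno}, column {col}: U+{cp:04X}")  # line 2, column 4: U+0008
    # A hidden U+0008 makes a plain substring search miss the word.
    print("foobar" in page_text)                        # False
    print("foobar" in strip_stray_controls(page_text))  # True
```

Stripping is only one possible cleanup; an author might instead replace each character with what the broken authoring tool originally meant, which this sketch cannot know.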
[css-text] Control Characters Roll Call on implementation
Greg Whitworth   Thu, 10 Sep 2015 20:21:05 +0000

Received on Thursday, 10 September 2015 20:21:35 UTC

Sent to: dbaron@dbaron.org, dino@apple.com, tabatkins@google.com, www-style@w3.org, www-style@w3.org.

Hello everyone,

As I brought up at recent F2F, and I plan to bring up again at TPAC, we all agreed[1] to implement behind a flag the breaking change stated in CSS Text L3 to render control characters.

# Control characters (Unicode category Cc) other than tab (U+0009), line feed (U+000A), and carriage return (U+000D) must be rendered as a visible glyph and
# otherwise treated as any other character of the Other Symbols (So) general category and Common script. The UA may use a glyph provided by a font specifically
# for the control character, substitute the glyphs provided for the corresponding symbol in the Control Pictures block, generate a visual representation of its
# codepoint value, or use some other method to provide an appropriate visible glyph. As required by [UNICODE], unsupported Default_ignorable characters
# must be ignored for rendering.

We have implemented this in Microsoft Edge for all C0 and if it did not already have a hex box in the selected font we are using U+25AF (White Vertical Rectangle). One thing of note is that we noticed all browsers throw away the C1 control characters during HTML parsing so this replacement isn't possible. If the other UAs can confirm this we should adjust the language in the text to only change the C0 control characters and ensure that your HTML parser is disposing of the C1 block.

Here are bugs for the dev work in each browser:
Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1099557
Webkit: Can't seem to find one (Dean, can you please get me one)
Blink: Can't find one (Tab, can you please get me one)
Microsoft: Implemented behind a flag, I'll provide an email once it's available in an insider preview

I want to **reiterate** that while this is not a monumental change, this is actually an important practice for all of us to work together on a breaking change and potentially increase this to clean up the web platform in the future.

Please let me know where your development status is and if you expect to have the development done by October (TPAC). Remember, it does not have to ship, just be completed.

From there we can start the PR machines and tooling to let authors know of this breaking change.

Thanks,
Greg

[1] http://logs.csswg.org/irc.w3.org/css/2015-02-08/#e520447
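The fallback Greg describes for Edge can be sketched as follows: use the font's own glyph for a C0 control when the selected font has one, otherwise draw U+25AF WHITE VERTICAL RECTANGLE. This is an illustrative Python sketch, not Edge's code. `font_coverage` is a hypothetical stand-in for a real font's character map, and only C0 (U+0000..U+001F) is handled, matching the message.

```python
# U+25AF WHITE VERTICAL RECTANGLE, the fallback glyph named in the message.
WHITE_VERTICAL_RECTANGLE = "\u25AF"

# Tab, LF and CR keep their normal whitespace behaviour.
EXEMPT = {0x09, 0x0A, 0x0D}


def glyph_for(ch: str, font_coverage: set) -> str:
    """Choose what to draw for ch.

    font_coverage: hypothetical set of code points the selected font maps
    to a glyph (a real UA would query the font's cmap instead).
    """
    cp = ord(ch)
    is_c0_control = cp <= 0x1F and cp not in EXEMPT
    if not is_c0_control:
        return ch  # not a C0 control: shaped as usual
    if cp in font_coverage:
        return ch  # the font supplies its own visible (e.g. hex box) glyph
    return WHITE_VERTICAL_RECTANGLE


if __name__ == "__main__":
    covered = {0x01}  # pretend the font has a glyph for U+0001 only
    print([glyph_for(c, covered) for c in "\x01\x02\tA"])
```

Note that this sketch passes C1 characters through untouched, which is exactly the gap Jonathan raises in the next message.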
Re: [css-text] Control Characters Roll Call on implementation
Jonathan Kew   Thu, 10 Sep 2015 21:42:27 +0100

www-style > September 2015 > 0000.html

Received on Thursday, 10 September 2015 20:43:00 UTC

Show in list: by dateby threadby subjectby author

Link to this message in this page.

Sent to: gwhit@microsoft.com, dbaron@dbaron.org, dino@apple.com, tabatkins@google.com, www-style@w3.org, www-style@w3.org.

On 10/9/15 21:21, Greg Whitworth wrote:
> One thing of note is that we noticed all
> browsers throw away the C1 control characters during HTML parsing so
> this replacement isn't possible. If the other UAs can confirm this we
> should adjust the language in the text to only change the C0 control
> characters and ensure that your HTML parser is disposing of the C1
> block.

Note that even if the HTML parser disposes of C1 controls, it's still
possible to put them into the document via script. Therefore, I think it's
important for the rendering engine to handle these properly as well (i.e.
render them as hexboxes or similar, rather than using glyphs from some
fallback font that happens to have abused the codepoints).

JK
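Jonathan's point can be made concrete. C1 controls have general category Cc just like C0 controls, so a renderer that keys on the category, rather than relying on the HTML parser to have removed them, also covers text inserted by script. Below is a small Python sketch with a hypothetical function name. It sorts a character into the three Cc ranges Unicode defines: U+0000..U+001F, U+007F, and U+0080..U+009F.

```python
import unicodedata


def control_block(ch: str):
    """Return 'C0', 'DEL', 'C1', or None if ch is not a Cc character."""
    if unicodedata.category(ch) != "Cc":
        return None
    cp = ord(ch)
    if cp <= 0x1F:
        return "C0"
    if cp == 0x7F:
        return "DEL"
    # The only remaining Cc code points are U+0080..U+009F.
    return "C1"


if __name__ == "__main__":
    # U+0085 NEXT LINE is a C1 control; U+00A0 NO-BREAK SPACE is not Cc.
    for ch in "\x01\x7f\x85\xa0":
        print(f"U+{ord(ch):04X}: {control_block(ch)}")
    # Every Cc code point in Unicode falls in one of the three ranges.
    total = sum(1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Cc")
    print(total)  # 65
```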
Re: [css-text] Control Characters Roll Call on implementation
"Tab Atkins Jr."   Thu, 10 Sep 2015 16:15:28 -0700

www-style > September 2015 > 0000.html

Received on Thursday, 10 September 2015 23:16:15 UTC

Show in list: by dateby threadby subjectby author

Link to this message in this page.

Sent to: gwhit@microsoft.com
Copied to: dbaron@dbaron.org, dino@apple.com, tabatkins@google.com, www-style@w3.org, www-style@w3.org.

On Thu, Sep 10, 2015 at 1:21 PM, Greg Whitworth <gwhit@microsoft.com> wrote:
> Hello everyone,
>
> As I brought up at recent F2F, and I plan to bring up again at TPAC, we all agreed[1] to implement behind a flag the breaking change stated in CSS Text L3 to render control characters.
>
> # Control characters (Unicode category Cc) other than tab (U+0009), line feed (U+000A), and carriage return (U+000D) must be rendered as a visible glyph and
> # otherwise treated as any other character of the Other Symbols (So) general category and Common script. The UA may use a glyph provided by a font specifically
> # for the control character, substitute the glyphs provided for the corresponding symbol in the Control Pictures block, generate a visual representation of its
> # codepoint value, or use some other method to provide an appropriate visible glyph. As required by [UNICODE], unsupported Default_ignorable characters
> # must be ignored for rendering.
>
> We have implemented this in Microsoft Edge for all C0 and if it did not already have a hex box in the selected font we are using U+25AF (White Vertical Rectangle). One thing of note is that we noticed all browsers throw away the C1 control characters during HTML parsing so this replacement isn't possible. If the other UAs can confirm this we should adjust the language in the text to only change the C0 control characters and ensure that your HTML parser is disposing of the C1 block.
>
> Here are bugs for the dev work in each browser:
> Mozilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1099557
> Webkit: Can't seem to find one (Dean, can you please get me one)
> Blink: Can't find one (Tab, can you please get me one)
> Microsoft: Implemented behind a flag, I'll provide an email once it's available in an insider preview
>
> I want to **reiterate** that while this is not a monumental change, this is actually an important practice for all of us to work together on a breaking change and potentially increase this to clean up the web platform in the future.
>
> Please let me know where your development status is and if you expect to have the development done by October (TPAC). Remember, it does not have to ship, just be completed.

I've been poking at Emil for a while, and he just sent an Intent To
Implement <https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/uIc2ZvLQOHw>
and a bug <https://code.google.com/p/chromium/issues/detail?id=530342>.
He'd already implemented this a while back (predating our agreement to
do it together) but had it reverted, so the code is already ready to
go. Assuming we target Chrome 47 as planned, it'll hit stable on
December 1.

~TJ
Re: [css-text] Control Characters Roll Call on implementation
Jonathan Kew   Fri, 11 Sep 2015 17:01:53 +0100

www-style > September 2015 > 0000.html

Received on Friday, 11 September 2015 16:02:22 UTC

Show in list: by dateby threadby subjectby author

Link to this message in this page.

Sent to: jackalmage@gmail.com, gwhit@microsoft.com
Copied to: dbaron@dbaron.org, dino@apple.com, tabatkins@google.com, www-style@w3.org, www-style@w3.org.

On 11/9/15 00:15, Tab Atkins Jr. wrote:
> I've been poking at Emil for a while, and he just sent an Intent To
> Implement <https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/uIc2ZvLQOHw>
> and a bug <https://code.google.com/p/chromium/issues/detail?id=530342>.
> He'd already implemented this a while back (predating our agreement to
> do it together) but had it reverted, so the code is already ready to
> go. Assuming we target Chrome 47 as planned, it'll hit stable on
> December 1.

The corresponding Gecko change in
https://bugzilla.mozilla.org/show_bug.cgi?id=1099557 should appear in
Nightly builds in a day or two, and is currently aimed to ship in Firefox 43
(mid-December). So that will coordinate well with Chrome 47.

JK
Re: [css-text] Control Characters Roll Call on implementation
"Myles C. Maxfield"   Mon, 14 Sep 2015 14:02:19 -0700

www-style > September 2015 > 0000.html

Received on Monday, 14 September 2015 21:02:54 UTC

Show in list: by dateby threadby subjectby author

Link to this message in this page.

Sent to: jfkthame@gmail.com
Copied to: jackalmage@gmail.com, gwhit@microsoft.com, dbaron@dbaron.org, dino@apple.com, tabatkins@google.com, www-style@w3.org, www-style@w3.org.

> On Sep 11, 2015, at 9:01 AM, Jonathan Kew <jfkthame@gmail.com> wrote:
>
> On 11/9/15 00:15, Tab Atkins Jr. wrote:
>
>> I've been poking at Emil for a while, and he just sent an Intent To
>> Implement <https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/uIc2ZvLQOHw>
>> and a bug <https://code.google.com/p/chromium/issues/detail?id=530342>.
>> He'd already implemented this a while back (predating our agreement to
>> do it together) but had it reverted, so the code is already ready to
>> go. Assuming we target Chrome 47 as planned, it'll hit stable on
>> December 1.
>
> The corresponding Gecko change in https://bugzilla.mozilla.org/show_bug.cgi?id=1099557 should appear in Nightly builds in a day or two, and is currently aimed to ship in Firefox 43 (mid-December). So that will coordinate well with Chrome 47.
>
> JK

I've created a WebKit bug for this at https://bugs.webkit.org/show_bug.cgi?id=149128

--
Myles
RE: [css-text] Control Characters Roll Call on implementation
Greg Whitworth   Thu, 17 Sep 2015 03:11:53 +0000

Received on Thursday, 17 September 2015 03:12:42 UTC

Sent to: mmaxfield@apple.com, mmaxfield@apple.com, jfkthame@gmail.com, simonp@opera.com, jackalmage@gmail.com, dbaron@dbaron.org, dino@apple.com, tabatkins@google.com
Copied to: www-style@w3.org, www-style@w3.org.

I just left this message on Blink-dev, but think it's worth putting here as well so we all have a reference to a rough timeline of events:

--------------------------------------

The minutes are spotty on this from the Sydney face to face, but regarding the timeline, "around November" was for having it behind a flag. Coincidentally, a rough plan to have it ship around November was suggested as well, since Apple normally ships around then.

That said, I don't want to put the cart before the horse and discuss shipping dates until we can ascertain the actual impact of this change. Additionally, this is our first (that I'm aware of) coordinated release of a breaking change, so I want to ensure that we blast the PR trumpets so that as many web devs as possible are aware of this change. Even though we plan to test it in various ways to get feedback, it would be good to get web developer feedback as well.

So basically a rough timeline looks like this:

TPAC 2015: All UAs have code in their browsers behind flag (off by default)
TPAC 2015 - Summer or Fall 2016: PR from all UAs devrel, tooling, etc regarding breaking change
Now - Early 2016: UAs do internal testing, testing via dev channels (if available), testing with third parties and report back any compat issues found to www-style thread
Summer or Fall 2016: Find shipping date that can overlap as many UAs as possible as not to make it so that one UA has to carry the burden of "bugs"

Greg
RE: [css-text] Control Characters Roll Call on implementation
Greg Whitworth   Thu, 17 Sep 2015 17:08:43 +0000

Received on Thursday, 17 September 2015 17:09:12 UTC

Sent to: mmaxfield@apple.com, mmaxfield@apple.com, jfkthame@gmail.com
Copied to: jackalmage@gmail.com, dbaron@dbaron.org, dino@apple.com, tabatkins@google.com, www-style@w3.org, www-style@w3.org.

>> The corresponding Gecko change in https://bugzilla.mozilla.org/show_bug.cgi?id=1099557 should appear in Nightly builds in a day or two,
>> and is currently aimed to ship in Firefox 43 (mid-December). So that will coordinate well with Chrome 47.
>> JK

>> I've created a WebKit bug for this at https://bugs.webkit.org/show_bug.cgi?id=149128
>> -- Myles

Awesome thanks to you both!!
Re: [css-text] Control Characters Roll Call on implementation
Jonathan Kew   Thu, 17 Sep 2015 18:47:34 +0100

Received on Thursday, 17 September 2015 17:48:04 UTC

Sent to: gwhit@microsoft.com, mmaxfield@apple.com, mmaxfield@apple.com
Copied to: jackalmage@gmail.com, dbaron@dbaron.org, dino@apple.com, tabatkins@google.com, www-style@w3.org, www-style@w3.org.

On 17/9/15 18:08, Greg Whitworth wrote:
>>> The corresponding Gecko change in https://bugzilla.mozilla.org/show_bug.cgi?id=1099557 should appear in Nightly builds in a day or two,
>>> and is currently aimed to ship in Firefox 43 (mid-December). So that will coordinate well with Chrome 47.
>>> JK
>
>>> I've created a WebKit bug for this at https://bugs.webkit.org/show_bug.cgi?id=149128
>>> -- Myles
>
> Awesome thanks to you both!!

You're welcome.

Note, however, that the timeline suggested in your message yesterday:

# TPAC 2015: All UAs have code in their browsers behind flag (off by
# default)
# TPAC 2015 - Summer or Fall 2016: PR from all UAs devrel, tooling,
# etc regarding breaking change
# Now - Early 2016: UAs do internal testing, testing via dev channels
# (if available), testing with third parties and report back any compat
# issues found to www-style thread
# Summer or Fall 2016: Find shipping date that can overlap as many UAs
# as possible as not to make it so that one UA has to carry the burden
# of "bugs"

is substantially different from what I understood Blink to be aiming at
(shipping at the beginning of Dec), and followed for Gecko (FF43, ships
mid-Dec). Should we be holding back from those planned dates? E.g.
shipping this on Nightly and Developer Edition only for the time being?

Tab, can you confirm the timeline for this in the Blink world?

JK
RE: [css-text] Control Characters Roll Call on implementation
Greg Whitworth   Mon, 21 Sep 2015 15:54:05 +0000

Received on Monday, 21 September 2015 15:54:39 UTC

Sent to: jfkthame@gmail.com, mmaxfield@apple.com, mmaxfield@apple.com
Copied to: jackalmage@gmail.com, dbaron@dbaron.org, dino@apple.com, tabatkins@google.com, www-style@w3.org, www-style@w3.org.

> On 17/9/15 18:08, Greg Whitworth wrote:
>>>> The corresponding Gecko change in https://bugzilla.mozilla.org/show_bug.cgi?id=1099557 should appear in Nightly builds in a day or two,
>>>> and is currently aimed to ship in Firefox 43 (mid-December). So that will coordinate well with Chrome 47.
>>>> JK
>>
>>>> I've created a WebKit bug for this at https://bugs.webkit.org/show_bug.cgi?id=149128
>>>> -- Myles
>>
>> Awesome thanks to you both!!
>
> You're welcome.
>
> Note, however, that the timeline suggested in your message yesterday:
>
> # TPAC 2015: All UAs have code in their browsers behind flag (off by
> # default)
> # TPAC 2015 - Summer or Fall 2016: PR from all UAs devrel, tooling,
> # etc regarding breaking change
> # Now - Early 2016: UAs do internal testing, testing via dev channels
> # (if available), testing with third parties and report back any compat
> # issues found to www-style thread
> # Summer or Fall 2016: Find shipping date that can overlap as many UAs
> # as possible as not to make it so that one UA has to carry the burden
> # of "bugs"
>
> is substantially different from what I understood Blink to be aiming at
> (shipping at the beginning of Dec), and followed for Gecko (FF43, ships
> mid-Dec). Should we be holding back from those planned dates? E.g.
> shipping this on Nightly and Developer Edition only for the time being?
>
> Tab, can you confirm the timeline for this in the Blink world?
>
> JK

It's fine if they ship on stable, but they should still be behind a flag that is off by default. We need to ensure there is little to no substantial compat risk, and then ship with it on by default in stable builds after giving web developers the heads up of the breaking change (we're thinking late 2016).

Greg
Re: [css-text] Control Characters Roll Call on implementation
fantasai   Mon, 21 Sep 2015 13:49:55 -0400

www-style > September 2015 > 0000.html

Received on Monday, 21 September 2015 17:50:30 UTC

Show in list: by dateby threadby subjectby author

Link to this message in this page.

Sent to: www-style@w3.org.

On 09/21/2015 11:54 AM, Greg Whitworth wrote:
>
> It's fine if they ship on stable, but they should still be behind
> a flag that is off by default. We need to ensure there is little
> to no substantial compat risk, and then ship with it on by default
> in stable builds after giving web developers the heads up of the
> breaking change (we're thinking late 2016).

If Chrome and Firefox are comfortable doing a coordinated release
in December, I don't think we need to hold them back. If they're not,
we can of course go with a longer timeline to loop in Microsoft and
Apple.

~fantasai
Re: [css-text] Control Characters Roll Call on implementation
Jonathan Kew   Tue, 22 Sep 2015 08:48:44 +0100

www-style > September 2015 > 0000.html

Received on Tuesday, 22 September 2015 07:49:30 UTC

Show in list: by dateby threadby subjectby author

Link to this message in this page.

Sent to: www-style@w3.org, jackalmage@gmail.com.

On 21/9/15 18:49, fantasai wrote:
> On 09/21/2015 11:54 AM, Greg Whitworth wrote:
>>
>> It's fine if they ship on stable, but they should still be behind
>> a flag that is off by default. We need to ensure there is little
>> to no substantial compat risk, and then ship with it on by default
>> in stable builds after giving web developers the heads up of the
>> breaking change (we're thinking late 2016).
>
> If Chrome and Firefox are comfortable doing a coordinated release
> in December, I don't think we need to hold them back. If they're not,
> we can of course go with a longer timeline to loop in Microsoft and
> Apple.

Tab: can we get an update on the Chrome team's plans here, please? As of
now, we're on track to ship this (i.e. ON-by-default) in Firefox on the
release channel in December; but will put it back behind an OFF-by-default
flag if the longer timeline is preferred.

JK
Re: [css-text] Control Characters Roll Call on implementation
"Tab Atkins Jr."   Wed, 23 Sep 2015 16:54:07 -0700

www-style > September 2015 > 0000.html

Received on Wednesday, 23 September 2015 23:54:54 UTC

Show in list: by dateby threadby subjectby author

Link to this message in this page.

Sent to: jfkthame@gmail.com
Copied to: www-style@w3.org.

On Tue, Sep 22, 2015 at 12:48 AM, Jonathan Kew <jfkthame@gmail.com> wrote:
> On 21/9/15 18:49, fantasai wrote:
>> On 09/21/2015 11:54 AM, Greg Whitworth wrote:
>>> It's fine if they ship on stable, but they should still be behind
>>> a flag that is off by default. We need to ensure there is little
>>> to no substantial compat risk, and then ship with it on by default
>>> in stable builds after giving web developers the heads up of the
>>> breaking change (we're thinking late 2016).
>>
>> If Chrome and Firefox are comfortable doing a coordinated release
>> in December, I don't think we need to hold them back. If they're not,
>> we can of course go with a longer timeline to loop in Microsoft and
>> Apple.
>
> Tab: can we get an update on the Chrome team's plans here, please? As of
> now, we're on track to ship this (i.e. ON-by-default) in Firefox on the
> release channel in December; but will put it back behind an OFF-by-default
> flag if the longer timeline is preferred.

I just talked to our implementor - the plan right now is for it to ship in
Chrome 47 (early Dec) flagged off, then flag on in 48 (mid-Jan).

~TJ
RE: [css-text] Control Characters Roll Call on implementation
Greg Whitworth   Wed, 20 Apr 2016 18:53:00 +0000

www-style > April 2016 > 0000.html

Received on Wednesday, 20 April 2016 18:53:30 UTC

Show in list: by dateby threadby subjectby author

Link to this message in this page.

Sent to: jackalmage@gmail.com, jfkthame@gmail.com, smfr@me.com
Copied to: www-style@w3.org.

>> On 21/9/15 18:49, fantasai wrote:
>>> On 09/21/2015 11:54 AM, Greg Whitworth wrote:
>>>> It's fine if they ship on stable, but they should still be behind a
>>>> flag that is off by default. We need to ensure there is little to no
>>>> substantial compat risk, and then ship with it on by default in
>>>> stable builds after giving web developers the heads up of the
>>>> breaking change (we're thinking late 2016).
>>>
>>> If Chrome and Firefox are comfortable doing a coordinated release in
>>> December, I don't think we need to hold them back. If they're not, we
>>> can of course go with a longer timeline to loop in Microsoft and
>>> Apple.
>>
>> Tab: can we get an update on the Chrome team's plans here, please? As
>> of now, we're on track to ship this (i.e. ON-by-default) in Firefox on
>> the release channel in December; but will put it back behind an
>> OFF-by-default flag if the longer timeline is preferred.
>
> I just talked to our implementor - the plan right now is for it to ship in Chrome
> 47 (early Dec) flagged off, then flag on in 48 (mid-Jan).
>
> ~TJ

Following up here: this is currently behind a flag in Firefox/MS Edge and doesn't seem to be in Chrome. I'm currently not sure about Safari; any updates there?

Remember that we had a tentative agreement to get this behind a flag by November of 2015 and then determine a ship timeframe that worked for all vendors.

Thanks,
Greg