This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 12400 - Inconsistent treatment of combining characters beginning text run
Summary: Inconsistent treatment of combining characters beginning text run
Status: REOPENED
Alias: None
Product: HTML Checker
Classification: Unclassified
Component: General
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Michael[tm] Smith
QA Contact: qa-dev tracking
URL: http://webkeys.platonix.co.il/layouts...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-30 08:00 UTC by Shai Berger
Modified: 2015-08-23 06:58 UTC

See Also:


Attachments
explanation and exemplification of problem (1.56 KB, text/html)
2011-03-30 08:00 UTC, Shai Berger

Description Shai Berger 2011-03-30 08:00:46 UTC
Created attachment 973
explanation and exemplification of problem

Hi,

I need to present a base character with a combining character,
while emphasizing the combining character and demoting the base
character to "background" status. In other words, I want to present
a combination with different styles for the combined parts.

According to the validator, a text run should not begin with a combining
character. This makes it impossible for me to get what I want, as
:first-letter applies a style to the combination as a whole (obviously).

I was able to "work around" the validation issue by inserting a
zero-width character as first in the run, between my base character
and the combining one. Since this is clearly cheating, I think
the validator should have caught me.
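To make the two cases concrete, here is a minimal sketch of such a check. It is not the validator's actual code: it assumes "combining character" means any code point whose Unicode general category starts with M, and it assumes U+200B ZERO WIDTH SPACE as the zero-width character (the report does not say which one was used). The function names are hypothetical.

```python
import unicodedata


def is_combining(ch: str) -> bool:
    """True if ch is a Unicode mark (general category Mn, Mc, or Me)."""
    return unicodedata.category(ch).startswith("M")


def starts_with_combining(run: str) -> bool:
    """True if a non-empty text run begins with a combining mark."""
    return bool(run) and is_combining(run[0])


QAMATS = "\u05b8"  # HEBREW POINT QAMATS, a Hebrew diacritic (category Mn)
ZWSP = "\u200b"    # ZERO WIDTH SPACE (category Cf); assumed for illustration

print(starts_with_combining(QAMATS))         # True: the run would be flagged
print(starts_with_combining(ZWSP + QAMATS))  # False: the workaround slips through
```

A check that looks only at the first code point of a run cannot tell the workaround apart from ordinary text.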

I've tested a page with the workaround (the URL for this bug) on Chrome 10,
Firefox 3.6 and Internet Explorer 9. Chrome and Firefox render it the way
I want, IE was confused and did not render the combining character at all.

Actually, I'm not sure if this is a problem in the validator or the HTML5 spec,
but I thought filing it here was a good way to push it towards the right people.
Comment 1 Michael[tm] Smith 2013-04-21 01:32:44 UTC
(In reply to comment #0)
> Actually, I'm not sure if this is a problem in the validator or the HTML5
> spec,
> but I thought filing it here was a good way to push it towards the right
> people.

I believe that the validator is conforming here to the behavior required by the HTML5 spec -- or maybe by other specs that the HTML5 normatively references. I think the error message for this case is coming from the HTML parser code that the validator uses, not the validator code itself.

One way you can check that is, do "View source" on your test file in Firefox, which uses the same HTML parser code as the validator. Firefox's View-source feature will mark in red any parsing problems it finds. So if you see that it's flagging the same problem that the validator is reporting, then you know it's due to that HTML parser code.

And that HTML parser code attempts to conform to the HTML5 spec. So if you'd like to see a spec change around this, please consider posting the details to either public-html@w3.org or whatwg@whatwg.org
Comment 2 Shai Berger 2013-04-21 06:37:24 UTC
Since this bug was filed, several things have changed. One of them is the page URL (updated). Another is that the issue was discussed in #13502 (added as "see also"), where it was concluded that both the use case (a combined character with separate styling for the separate parts) and the implementation (a text run starting with a combining character) are valid; further, they have always been valid.

According to comment 6 there (https://www.w3.org/Bugs/Public/show_bug.cgi?id=13502#c6), the validator's behavior is intentional and implements charmod-norm; according to the later discussion, charmod-norm does not apply to HTML. No change to the visible HTML spec was needed to fix this (charmod-norm was never referenced in the first place), but a comment to this effect was added to the document source.

So -- in the first case, where text runs do begin with combining characters, the validator's behavior is not conforming to the HTML5 spec or any normative reference.

In the second case, where a combining character follows a zero-width character, I'd expect a warning not because it violates a spec -- but because it makes no sense.
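A warning for the second case could be sketched roughly as follows. This is only an illustration, not validator code: the set of zero-width characters and the function name are assumptions, and a real check would need to allow scripts where a joiner before a mark is legitimate.

```python
import unicodedata

# Assumed set of zero-width characters; chosen for illustration only.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def marks_after_zero_width(text: str) -> list[int]:
    """Return the indices of combining marks directly preceded by a zero-width character."""
    return [
        i
        for i in range(1, len(text))
        if text[i - 1] in ZERO_WIDTH
        and unicodedata.category(text[i]).startswith("M")
    ]


# HEBREW LETTER SHIN, ZERO WIDTH SPACE, HEBREW POINT QAMATS
print(marks_after_zero_width("\u05e9\u200b\u05b8"))  # [2]
```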

Following the resolution of #13502, I changed the referenced web application to produce text runs that begin with combining characters, and now it gets all these warnings from the validator. Viewing the source of the page does not show the relevant characters in red -- they are all in orange (as entity references); this strengthens the claim that the warning comes from the validator and not the HTML parser.

Thanks,
Shai.
Comment 3 Michael[tm] Smith 2013-06-26 18:11:28 UTC
(In reply to comment #2)
> discussed in bug 13502, where it was concluded that both the use case (a
> combined character with separate styling for the separate parts) and the
> implementation (a text run starting with a combining character) are valid;
> further, they have always been valid.
> 
> According to comment 6 there
> (https://www.w3.org/Bugs/Public/show_bug.cgi?id=13502#c6), the
> validator's behavior is intentional and implements charmod-norm; according
> to the later discussion, charmod-norm does not apply to HTML. No change to
> the visible HTML spec was needed to fix this (charmod-norm was never
> referenced in the first place), but a comment to this effect was added
> to the document source.
> 
> So -- in the first case, where text runs do begin with combining characters,
> the validator's behavior is not conforming to the HTML5 spec or any
> normative reference.

CCing Henri Sivonen, who's way more familiar with this than me...
Comment 4 Henri Sivonen 2013-06-27 14:28:26 UTC
Do all browsers support styling combining characters separately from the base character?
Comment 5 Shai Berger 2013-06-27 14:43:37 UTC
It is currently supported badly by Firefox and Chrome (used Chromium) on Linux; I suspect it is not supported by IE, and I don't know about others (but they all use Webkit now, don't they?...)

By "supported badly" I mean that applying separate styling to combining characters does render a separately styled character, but it is usually moved a little off the place it should be (less than a whole character width, but some).

I wasn't able to test this much on IE, as I'm not a Windows user. In the little tests I was able to do, IE failed to display the combining characters; I can't be sure if this was a problem with the feature or a font lacking the combining characters I used (Hebrew diacritics). 

Also, I'm not sure the others behave the same in this respect on all platforms.
Comment 6 Henri Sivonen 2013-08-16 09:46:41 UTC
(In reply to comment #5)
> It is currently supported badly by Firefox and Chrome (used Chromium) on
> Linux; I suspect it is not supported by IE

Sounds like this isn't interoperably supported by major browsers then! I suggest WONTFIX in the validator.
Comment 7 Shai Berger 2013-08-16 10:00:30 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > It is currently supported badly by Firefox and Chrome (used Chromium) on
> > Linux; I suspect it is not supported by IE
> 
> Sounds like this isn't interoperably supported by major browsers then! I
> suggest WONTFIX in the validator.

I was under the impression that the validator is supposed to express the W3C standards, not the current state of implementations.

Was I wrong?
Comment 8 Henri Sivonen 2013-08-16 10:16:17 UTC
The standards should be realistic about interop.
Comment 9 Michael[tm] Smith 2013-08-16 10:21:44 UTC
resolving wontfix per comment 6
Comment 10 Shai Berger 2013-08-16 10:53:21 UTC
(In reply to comment #8)
> The standards should be realistic about interop.

Then you should change the standards, overriding the conclusion reached in ticket 13502 (and the intentions of UNICODE, as detailed there).

To clarify: The problem is that the validator rejects something that the standards allow, on the ground that the browsers don't implement it well; effectively undermining the standards committees.

I find this decision unreasonable.

(I looked for some guidelines on whether it is OK for me to reopen a bug when I disagree with the decision, and couldn't find any; please accept my apology if this is a faux pas).
Comment 11 Michael[tm] Smith 2013-08-16 17:23:34 UTC
(In reply to comment #10)
> (In reply to comment #8)
> > The standards should be realistic about interop.
> 
> Then you should change the standards, overriding the conclusion reached in
> ticket 13502 (and the intentions of UNICODE, as detailed there).

If you want to have a change made to a particular spec, this bug is not the place to do it.

> To clarify: The problem is that the validator rejects something that the
> standards allow,

The validator doesn't "reject" it -- it emits a warning, not an error. A warning is appropriate here, given comment 5.

> on the ground that the browsers don't implement it well;

The fact that browsers don't implement it well is the reason the validator is emitting a warning. The warning is there to let users know that something they might think will work correctly actually might not work as they expect.

> effectively undermining the standards committees.
> 
> I find this decision unreasonable.
> 
> (I looked for some guidelines on whether it is OK for me to reopen a bug
> when I disagree with the decision, and couldn't find any; please accept my
> apology if this is a faux pas).

The reason I moved the bug to "resolved" is that in comments here you've already heard from the maintainers of the validator, neither of whom is planning to take any action on the bug.
Comment 12 Shai Berger 2013-08-17 06:55:35 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #8)
> > > The standards should be realistic about interop.
> > 
> > Then you should change the standards, overriding the conclusion reached in
> > ticket 13502 (and the intentions of UNICODE, as detailed there).
> 
> If you want to have a change made to a particular spec, this bug is not the
> place to do it.
> 

It was Henri who suggested there was something wrong with the spec. I would like to see it implemented as it stands.

> > To clarify: The problem is that the validator rejects something that the
> > standards allow,
> 
> The validator doesn't "reject" it -- it emits a warning, not an error. A
> warning is appropriate here, given comment 5.
> 
> > on the ground that the browsers don't implement it well;
> 
> The fact that browsers don't implement it well is the reason the validator
> is emitting a warning. The warning is there to let users know that something
> they might think will work correctly actually might not work as they
> expect.
> 

I see. In that case, perhaps the warning message can be made clearer.

> 
> The reason I moved the bug to "resolved" is that in comments here you've
> already heard from the maintainers of the validator, neither of whom is
> planning to take any action on the bug.

Will you accept a patch to this effect (make it clearer that the warning is about current implementation in browsers)?
Comment 13 Michael[tm] Smith 2013-08-17 13:03:53 UTC
(In reply to comment #12)
> > If you want to have a change made to a particular spec, this bug is not the
> > place to do it.
> 
> It was Henri who suggested there was something wrong with the spec. I would
> like to see it implemented as it stands.

So then I guess what you should probably do is file bugs in the Mozilla, Chrome, and WebKit bug trackers at least.

> > The fact that browsers don't implement it well is the reason the validator
> > is emitting a warning. The warning is there to let users know that something
> > they might think will work correctly actually might not work as they
> > expect.
> 
> I see. In that case, perhaps the warning message can be made clearer.

Yeah, maybe so.

> > The reason I moved the bug to "resolved" is that in comments here you've
> > already heard from the maintainers of the validator, neither of whom is
> > planning to take any action on the bug.
> 
> Will you accept a patch to this effect (make it clearer that the warning is
> about current implementation in browsers)?

We'll review any patch submitted, and if it's an improvement, it'll likely end up getting checked in. But if you have some suggestion for improved wording, you can just post it here and we can see if Henri's agreeable to it and go from there.
Comment 14 Shai Berger 2013-09-13 14:15:26 UTC
Chromium bug: https://code.google.com/p/chromium/issues/detail?id=290906
Firefox bug: https://bugzilla.mozilla.org/show_bug.cgi?id=916102

Thanks for the suggestions.

For the improved text: Instead of the current "Text run starts with a composing character.", how about:

"Separate styling for composing characters is not supported well by current browsers. If no separate styling is intended, avoid starting text runs with composing characters."

(in case this "standard but not supported well" is the whole sense of warnings, then perhaps this is not the place to add it, but in a more general place in the validator output).

Thanks again,

Shai.