<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>13502</bug_id>
          
          <creation_ts>2011-08-01 16:02:11 +0000</creation_ts>
          <short_desc>Text run starting with composing character should be valid</short_desc>
          <delta_ts>2013-04-21 06:37:24 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>LC1 HTML5 spec</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>VERIFIED</bug_status>
          <resolution>FIXED</resolution>
          
          <see_also>https://www.w3.org/Bugs/Public/show_bug.cgi?id=12400</see_also>
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>a11y</keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Shai Berger">shai</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>ayg</cc>
    
    <cc>eoconnor</cc>
    
    <cc>hsivonen</cc>
    
    <cc>ian</cc>
    
    <cc>kennyluck</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>xn--mlform-iua</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>51907</commentid>
    <comment_count>0</comment_count>
    <who name="Shai Berger">shai</who>
    <bug_when>2011-08-01 16:02:11 +0000</bug_when>
    <thetext>This is a continuation of bug #12400, which I have filed against the W3C validator. According to the validator, the sequence

&lt;h2 class=&quot;ddd&quot;&gt;&lt;span&gt;&amp;#x05de;&lt;/span&gt;&amp;#x0592;&lt;/h2&gt;

is invalid, because &quot;Text run starts with a composing character&quot;. In this sequence, 05de is the Hebrew Letter Mem, but 0592 is the composing character &quot;Hebrew Accent Segol&quot; (three dots displayed on top of the letter).

I remember finding that in the spec before, but now I can&apos;t. In fact, a Google search limited to the dev.w3.org site finds no references to &quot;text run&quot; that relate to HTML, and no references to &quot;composing character&quot; at all.

As discussed in #12400, Chrome, Firefox and Opera have no issue with this, and display the text as intended -- with different styles for the letter and the accent. Internet Explorer 9 does not. An attachment to said bug,
http://www.w3.org/Bugs/Public/attachment.cgi?id=973, is an HTML file exemplifying and explaining the issue.

So -- does the current html5 spec allow text runs beginning with composing characters?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>52018</commentid>
    <comment_count>1</comment_count>
    <who name="Aryeh Gregor">ayg</who>
    <bug_when>2011-08-02 22:21:01 +0000</bug_when>
    <thetext>I&apos;m going to bet that font support won&apos;t reliably permit styling a letter separately from its diacritics, in general.  Things like color, maybe, but I&apos;d be very surprised if you could get bold/italics/font-face/etc. to work reliably.  So I don&apos;t know how much sense it makes to allow this.  Something should specify how it&apos;s rendered if authors do it, though.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>53002</commentid>
    <comment_count>2</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2011-08-04 05:05:54 +0000</bug_when>
    <thetext>mass-moved component to LC1</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55627</commentid>
    <comment_count>3</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-08-22 22:42:56 +0000</bug_when>
    <thetext>Henri, what do you want the spec to say here? Should we have a section similar to &quot;Requirements relating to bidirectional-algorithm formatting characters&quot; that requires authors to not have lone combining characters?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57327</commentid>
    <comment_count>4</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2011-09-26 07:13:22 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; Henri, what do you want the spec to say here?

The validator&apos;s behavior is based on a draft of charmod. The basic assumption is that styling different parts of a grapheme cluster differently is not supported and if someone wants to discuss a combining character in isolation, they should combine it with U+0020.

If people who work on the text shaping subsystems of browsers actually want to support styling different parts of a grapheme cluster differently, the basic assumption needs revisiting. I don&apos;t work on text shaping. You should ask people who do.

If people who work on the relevant parts of rendering engines want to treat this as a supported feature, the validator should get out of the way.

Does different styling for a part of a grapheme cluster make sense for any other properties than color (and opacity, which is analogous to tweaking the alpha channel)? What use cases are there for coloring different parts of the grapheme cluster differently?

(Given that, according to the reporter, IE9 doesn&apos;t support what&apos;s attempted in the test case, it&apos;s not totally crazy for the validator to whine about this.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57329</commentid>
    <comment_count>5</comment_count>
    <who name="Shai Berger">shai</who>
    <bug_when>2011-09-26 07:52:24 +0000</bug_when>
    <thetext>(In reply to comment #4)
&gt; 
&gt; The validator&apos;s behavior is based on a draft of charmod.

For those of us who are not &quot;in&quot; the process, can you elaborate on what that is and where we can find it?

&gt; 
&gt; Does different styling for a part of grapheme cluster make sense for any other
&gt; properties than color (and opacity, which is analogous to tweaking the alpha
&gt; channel)? What use cases are there for coloring different parts of the grapheme
&gt; cluster differently?
&gt; 

The general use case for what I&apos;m asking is trying to emphasize one part of the grapheme cluster. This can make some sense even when discussing French accents in an educational setting, but it makes a lot of sense in writing systems where vowels show up as combining characters -- I&apos;m aware of Hebrew, Arabic and Thai, but there may be more.

Given this rationale, it would probably make sense to use, besides (generalized) color, the font-weight property. I can&apos;t come up with anything else that makes sense in general.

Thanks,
Shai.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57338</commentid>
    <comment_count>6</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2011-09-26 09:17:48 +0000</bug_when>
    <thetext>(In reply to comment #5)
&gt; (In reply to comment #4)
&gt; &gt; 
&gt; &gt; The validator&apos;s behavior is based on a draft of charmod.
&gt; 
&gt; For those of us who are not &quot;in&quot; the process, can you elaborate on what that is
&gt; and where we can find it?

Oops. I meant charmod-norm:
http://www.w3.org/TR/charmod-norm/</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57339</commentid>
    <comment_count>7</comment_count>
    <who name="Shai Berger">shai</who>
    <bug_when>2011-09-26 10:28:26 +0000</bug_when>
    <thetext>(In reply to comment #6)
&gt; 
&gt; Oops. I meant charmod-norm:
&gt; http://www.w3.org/TR/charmod-norm/

Ironically, that document suggests (http://www.w3.org/TR/charmod-norm/#sec-Restrictions) SVG fonts as an alternative method to achieve the effects which are the subject of this bug. There&apos;s only one major browser which doesn&apos;t support SVG...</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57375</commentid>
    <comment_count>8</comment_count>
    <who name="Aryeh Gregor">ayg</who>
    <bug_when>2011-09-26 22:03:50 +0000</bug_when>
    <thetext>(In reply to comment #4)
&gt; If people who work on the relevant parts of rendering engines want to treat
&gt; this as a supported feature, the validator should get out of the way.

Comment #0 says it already works in Chrome, Firefox, and Opera.  (I didn&apos;t test it myself.)

&gt; What use cases are there for coloring different parts of the grapheme
&gt; cluster differently?

http://en.wikipedia.org/wiki/File:Example_of_biblical_Hebrew_trope.svg

The image highlights the diacritical marks to distinguish them from the main letters, and gives them different colors to distinguish vowels from cantillation marks.  I&apos;ve also seen the vowel mark sheva bolded independent of the letter it&apos;s under to signify that it&apos;s a sheva na instead of a sheva nach, a distinction that matters for pronunciation but which traditional Hebrew orthography doesn&apos;t make.

More theoretically, I could definitely imagine that it would be useful occasionally to emphasize a specific vowel mark in a Hebrew word.  Vowelized Hebrew can sometimes have three or four marks per letter, especially Biblical Hebrew.  If you&apos;re contrasting two words that differ only in diacritics, you need to actively draw the reader&apos;s attention to the difference if you want them to spot it.  I haven&apos;t personally seen this done, though.

Suffice it to say, there are definitely use-cases in Hebrew to be able to color or bold diacritics separately from the letter they&apos;re on.  It&apos;s not needed for normal typography or anything, though, more of a &quot;nice to have&quot; thing.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57376</commentid>
    <comment_count>9</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-09-26 22:14:18 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: see diff below
Rationale: I&apos;ve explicitly made the spec disallow isolated combining characters. If the use case is just colouring accents, then IMHO CSS should support that directly. It doesn&apos;t make any sense to style different parts of a combined character differently, since per Unicode, there is only one grapheme cluster involved; indeed, there might only be one glyph from the font being rendered, e.g. if the combination corresponds to a precomposed character, or if a ligature exists for that combination.

(The question of how such things should render is a CSS one.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57377</commentid>
    <comment_count>10</comment_count>
    <who name="">contributor</who>
    <bug_when>2011-09-26 22:14:25 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r6590.
Check-in comment: Explicitly disallow combining characters at the start of text nodes.
http://html5.org/tools/web-apps-tracker?from=6589&amp;to=6590</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57389</commentid>
    <comment_count>11</comment_count>
    <who name="Aryeh Gregor">ayg</who>
    <bug_when>2011-09-27 00:24:54 +0000</bug_when>
    <thetext>Test case:

data:text/html,&lt;!doctype html&gt;
&lt;span style=font-size:7em&gt;
&lt;span style=color:blue&gt;&amp;%23x05de;&lt;/span&gt;&amp;%23x0592;
&amp;%23x05de;&amp;%23x0592;
&lt;/span&gt;

In both Firefox 8.0a2 and Chrome 15 dev on Ubuntu 11.04, this displays two identical grapheme clusters.  The base glyph in the first (right-hand) cluster is blue while the associated diacritic is black, but the display is otherwise unaffected, exactly as desired.  Opera 11.50 displays the diacritic in the first cluster as a box, refusing to combine it with the different-colored character.

This demonstrates that two major browsers already behave as desired in the cases we&apos;re interested in.  It&apos;s useful functionality, and there&apos;s no reason for the spec to make it invalid.  It might be that there are some cases where styling the diacritic differently from the base character makes no sense, but in some cases it does -- don&apos;t throw out the baby with the bathwater.  If you can identify specific markup that definitely doesn&apos;t make sense, make that specific markup invalid.

What does &quot;If the use case is just colouring accents, then IMHO CSS should support that directly&quot; mean?  I gave two real-world use-cases in comment 8, and both of them require being able to style some diacritics on a letter differently than others.  A CSS property like diacritic-color or whatever would not serve the use-cases.  It has to be possible to identify individual diacritics to style, and the only way to do that is to put tags in the markup.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57392</commentid>
    <comment_count>12</comment_count>
    <who name="Shai Berger">shai</who>
    <bug_when>2011-09-27 07:22:01 +0000</bug_when>
    <thetext>(In reply to comment #11)
&gt; Test case:
&gt; 
&gt; data:text/html,&lt;!doctype html&gt;
&gt; &lt;span style=font-size:7em&gt;
&gt; &lt;span style=color:blue&gt;&amp;%23x05de;&lt;/span&gt;&amp;%23x0592;
&gt; &amp;%23x05de;&amp;%23x0592;
&gt; &lt;/span&gt;
&gt; 
&gt; [...] Opera 11.50 displays the diacritic in the
&gt; first cluster as a box, refusing to combine it with the different-colored
&gt; character.
&gt; 
&gt; This demonstrates that two major browsers already behave as desired in the
&gt; cases we&apos;re interested in.

This is quite odd. While I see the same results for your test case (unsurprisingly, I&apos;m also running Ubuntu 11.04 and Opera 11.51), Opera does render diacritics with different color and font-weight in my example document (http://www.w3.org/Bugs/Public/attachment.cgi?id=973 which I have already linked above). I have tried to play a little with the data: test to make it more like my code, with no luck.  However, as my test document shows, the desired behavior is actually supported in all major browsers except IE.

&gt; It&apos;s useful functionality, and there&apos;s no reason
&gt; for the spec to make it invalid.  It might be that there are some cases where
&gt; styling the diacritic differently from the base character makes no sense, but
&gt; in some cases it does -- don&apos;t throw out the baby with the bathwater.  If you
&gt; can identify specific markup that definitely doesn&apos;t make sense, make that
&gt; specific markup invalid.
&gt; 

I agree, of course.

&gt; What does &quot;If the use case is just colouring accents, then IMHO CSS should
&gt; support that directly&quot; mean?  I gave two real-world use-cases in comment 8, and
&gt; both of them require being able to style some diacritics on a letter
&gt; differently than others. A CSS property like diacritic-color or whatever would
&gt; not serve the use-cases.  It has to be possible to identify individual
&gt; diacritics to style, and the only way to do that is to put tags in the markup.

I agree. As an example, in the Hebrew word &amp;#1502;&amp;#1489;&amp;#1513;&amp;#1500; (&quot;mevashel&quot;, cooking) proper voweling puts three different &quot;diacritics&quot; on the third letter (&amp;#1513;&amp;#1473;&amp;#1468;&amp;#1461;  a diacritic dot on the top right marking the letter as &quot;shin&quot; and not &quot;sin&quot;, a point in the middle that is like doubling a consonant in English, and the pair of dots at the bottom which are the vowel e). This is a common everyday word, not some contrived biblical example.

As a side point, the phrasing of the correction in the patch still allows for the &quot;cheating&quot; method I used to pacify the validator: make the text node begin with an RLM followed immediately by a combining character. All major browsers (except IE) then combine the combining character with the last character of the preceding text node, where that is possible.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57394</commentid>
    <comment_count>13</comment_count>
    <who name="Shai Berger">shai</who>
    <bug_when>2011-09-27 07:34:39 +0000</bug_when>
    <thetext>(In reply to comment #12)
&gt; 
&gt; [...] &amp;#1502;&amp;#1489;&amp;#1513;&amp;#1500; [...] &amp;#1513;&amp;#1473;&amp;#1468;&amp;#1461; 

Of course, those looked like letters when I edited the comment... my apologies.

data:text/html,&lt;!doctype html&gt;&amp;%231502;&amp;%231489;&amp;%231513;&amp;%231500;

data:text/html,&lt;!doctype html&gt;&amp;%231513;&amp;%231473;&amp;%231468;&amp;%231461;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57459</commentid>
    <comment_count>14</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-09-28 00:44:58 +0000</bug_when>
    <thetext>The Editor is correct, for the following reasons:

(1) Semantics must win over styling - as the editor has stated.
(2) When we count in the semantics, then browsers - contrary to what Aryeh is claiming - do not support combining characters that begin a text node very well at all.

Example:
  
While UAs will present/render 

    &lt;b&gt;accént&lt;/b&gt;

 as a single word, most of them, including VoiceOver, Internet Explorer, Firefox, Opera (Webkit is the exception) will present

    &lt;b&gt;acc&lt;span&gt;e&lt;/span&gt;&amp;#x301;nt&lt;/b&gt;

as two words.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57465</commentid>
    <comment_count>15</comment_count>
      <attachid>1032</attachid>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-09-28 02:01:02 +0000</bug_when>
    <thetext>Created attachment 1032
Effects when a text node begins with a combining character

For convenience, a data URI  of the attachment (doesn&apos;t work in IE):
  http://tinyurl.com/combining-char-in-text-node-st

The attachment file tests the CSS effects as well as the semantic effect of beginning a text node with a combining character.

The test shows

*  That it *is* possible - even in Firefox and IE - to *visually* get the effect that Aryeh and Shai are after. However, in order to make it work, one must apply display:inline-block on the &apos;base character&apos;, which in turn causes the word to be treated as 2 or 3 words instead of as a single word. (This affects word break and other things.)

* That the same effect that is seen in Firefox and IE (due to the application of span{display:inline-block;}), can also be seen in Opera.

* That for Webkit, the test appears to be visually successful. However, if you test it in VoiceOver, you hear much the same thing as you can see in Firefox, IE and Opera: the word is split up.

* That very similar conceptual problems occur if one tries to add the acute accent via CSS generated content.

PS: I should say that I have tried exactly the same thing that Aryeh and Shai describe in a Russian text where I wanted to add the acute accent to show word stress. As it was important to me that users could search and find words without having to type the accent, I ended up with some kind of :hover effect. (Today, Webkit excels in this regard - if you search for &apos;accent&apos;, then you will also find &apos;accént&apos; and &apos;acce&amp;#x0301;nt&apos;.)

PPS: It can actually be a bad idea to merely place a &lt;span&gt; in the middle of a word even without any styling: acc&lt;b&gt;e&lt;/b&gt;nt. Reason: this appears to have the effect of making the word unfindable in IE (at least IE8). In other words, if you search for &apos;accent&apos; with IE&apos;s Find-in-window feature, you won&apos;t find the word.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57486</commentid>
    <comment_count>16</comment_count>
    <who name="Aryeh Gregor">ayg</who>
    <bug_when>2011-09-28 18:56:48 +0000</bug_when>
    <thetext>All that sounds like browsers&apos; word-breaking/find/etc. being buggy.  It doesn&apos;t justify making markup like this invalid.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57498</commentid>
    <comment_count>17</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-09-29 01:50:33 +0000</bug_when>
    <thetext>(In reply to comment #16)
&gt; All that sounds like browsers&apos; word-breaking/find/etc. being buggy.  It doesn&apos;t
&gt; justify making markup like this invalid.

Well, you used as an argument that two browsers support the suggested behaviour. Since that is not true - since in every user agent tested there is at least one situation where a composed character that has been &quot;split up&quot; with mark-up is interpreted as separate characters - one cannot use that as an argument either.

But I agree that there is nothing wrong with mark-up around graphemes, per se, if it would not have these side effects. (The claim that such mark-up could affect normalization does not sound convincing to me.) My main issue is that browsers have a road ahead of them when it comes to properly implementing word-break/find/etc.

Btw, my objection to this being legal is similar to my scepticism towards the &lt;wbr&gt; tag: The &lt;wbr&gt; tag too has the effect of making words be treated as separate words, but without the author&apos;s proper awareness of the effect.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57502</commentid>
    <comment_count>18</comment_count>
    <who name="Shai Berger">shai</who>
    <bug_when>2011-09-29 11:43:30 +0000</bug_when>
    <thetext>There is a point that was evoked for me by Leif&apos;s earlier message: a significant distinction between the sets of combining characters under discussion. I&apos;ve sort of mentioned it in passing before, but I think it&apos;s a point that should be made more central.

Some of the characters I wish to emphasize separately from their base are indeed diacritics; such is the case for, e.g., 05C1 &quot;Hebrew Point Shin Dot&quot;. Anyone who can object to &quot;acce&lt;b&gt;&amp;#x0301;&lt;/b&gt;nt&quot; should also object to the equivalent with Shin Dot.

However, characters in the range 05B0--05BC (inclusive) are not diacritics in any sense but visual; they are our vowels. True, we tend to avoid using them in writing, and we have partial replacements for some of them in some contexts, but still: These are the vowels. The vowel &apos;e&apos;, in particular, has no replacement in any context in Hebrew; the only way to write it down is a combining character.

The change introduced by the editor makes the Hebrew equivalent of &quot;acc&lt;b&gt;e&lt;/b&gt;nt&quot; invalid. This seems to agree with Leif&apos;s PPS comment, and yet, I don&apos;t think the correct way to promote such a change (even if it is desired) is by enforcing it first on specific languages.

(as I noted before, the situation for Arabic and Thai is similar to the one in Hebrew: Vowels are combining characters; I cannot say much about the frequency of use of vowels and their possible replacements in those languages).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57525</commentid>
    <comment_count>19</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-09-29 21:25:27 +0000</bug_when>
    <thetext>(In reply to comment #18)

&gt; Anyone who can object to &quot;acce&lt;b&gt;&amp;#x0301;&lt;/b&gt;nt&quot; should also object to the
&gt; equivalent with Shin Dot.
&gt; 
&gt; However, characters in the range 05B0--05BC (inclusive) are not diacritics in
&gt; any sense but visual; they are our vowels.

How is that an argument? There is no such thing as &quot;right to have styled vowels&quot; ... ;-)

Besides, even if disallowed in HTML, you can get all you need via CSS. I even (re)discovered that IE and Firefox do not need that display:inline-block hack. All they need is that the &quot;base&quot; character and the combining character differ with regard to their respective font-weight values. Also, when using CSS, Find-in-Page tends to work a little better in IE and Firefox than otherwise. For Opera, I was unable to style the accent differently from the base character - but at least I was able to hold its hand: http://tinyurl.com/6yk2m9b

&lt;rant&gt;Each writing script has its advantages and disadvantages. For instance, Hebrew text runs are shorter than Latin runs, since there are no vowels there (and even if you have vowels, the text length doesn&apos;t increase).  As a user of the Latin script where I must write vowels, I feel discriminated against - for instance on Twitter!  It is even </thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57618</commentid>
    <comment_count>20</comment_count>
    <who name="Shai Berger">shai</who>
    <bug_when>2011-10-01 21:22:17 +0000</bug_when>
    <thetext>(In reply to comment #19)
&gt; (In reply to comment #18)
&gt; 
&gt; &gt; Anyone who can object to &quot;acce&lt;b&gt;&amp;#x0301;&lt;/b&gt;nt&quot; should also object to the
&gt; &gt; equivalent with Shin Dot.
&gt; &gt; 
&gt; &gt; However, characters in the range 05B0--05BC (inclusive) are not diacritics in
&gt; &gt; any sense but visual; they are our vowels.
&gt; 
&gt; How is that an argument? There is no such thing as &quot;right to have styled
&gt; vowels&quot; ... ;-)
&gt; 

There is in Latin scripts... 

&gt; Beside, even if disallowed in HTML, you can get all you need via CSS. [...]
&gt; For Opera, I was unable to style the accent different from the base character -
&gt; but at least I was able to to hold its hand: http://tinyurl.com/6yk2m9b
&gt; 

1) This example relies on moving the combining character to a css &quot;content&quot; text run (which, then, starts with a combining character). It turns semantics into presentation, and assumes that an invalid HTML text run will still be a valid CSS text run.

2) This example doesn&apos;t work in Chromium (I mean the actual code, not just the redirect). It can probably be fixed to work there too, but I fear the specter of browser-specific code.

3) Since the graphic capability is, as you say, present in all browsers (I didn&apos;t check IE myself); and since nobody is seriously contemplating to forbid the marking of single letters in a word via markup; why, then, is it so important to forbid it for symbols which are combining characters?

I actually found an answer for this question in the charmod-norm draft (http://www.w3.org/TR/charmod-norm, linked earlier by Henri). It is required there that fully-normalized text does not include text-runs which begin with a combining character, because when such text-runs are concatenated (appended) to another text-run, normalization may change the characters involved or their order. As an example, &quot;acce&quot;+&quot;&amp;#x301;nt&quot; should normalize into &quot;acc</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57620</commentid>
    <comment_count>21</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-10-02 04:12:38 +0000</bug_when>
    <thetext>(In reply to comment #20)
&gt; (In reply to comment #19)
&gt; &gt; (In reply to comment #18)

&gt; &gt; &gt; Anyone who can object to &quot;acce&lt;b&gt;&amp;#x0301;&lt;/b&gt;nt&quot; should also object to the
&gt; &gt; &gt; equivalent with Shin Dot.
&gt; &gt; &gt; 
&gt; &gt; &gt; However, characters in the range 05B0--05BC (inclusive) are not diacritics in
&gt; &gt; &gt; any sense but visual; they are our vowels.
&gt; &gt; 
&gt; &gt; How is that an argument? There is no such thing as &quot;right to have styled
&gt; &gt; vowels&quot; ... ;-)
&gt; 
&gt; There is in Latin scripts... 

Out of curiosity, would you also like to be able to put emphasis on the vowels, like this: &amp;#x5d3;&lt;strong style=&quot;color:red;&quot; class=&apos;kamatz&apos;&gt;&amp;#x5b8;&lt;/strong&gt;&amp;#x5bc;&amp;#x5d2; ?

&gt; &gt; Beside, even if disallowed in HTML, you can get all you need via CSS. [...]
&gt; &gt; For Opera, I was unable to style the accent different from the base character -
&gt; &gt; but at least I was able to to hold its hand: http://tinyurl.com/6yk2m9b
&gt; &gt; 
&gt; 
&gt; 1) This example relies on moving the combining character to a css &quot;content&quot;
&gt; text run (which, then, starts with a combining character). It turns semantics
&gt; into presentation, and assumes that an invalid HTML text run will still be a
&gt; valid CSS text run.

It is nothing new that it is entirely possible to both enhance and clutter up the user experience of consuming the underlying mark-up with the help of CSS.

It is also not - in theory - *necessary* to let the CSS content begin with a combining character. You might instead replace the entire content of the element - base letter and diacritics. And, in fact, that is probably what you should do. Then you ought to avoid the problem.

Actually, for Webkit, you don&apos;t need CSS generated content at all - you can instead rely on :first-letter. (In reality a CSS bug, of course.)  Well, at least I was able to do so in this demo - which also contains a colored Hebrew vowel, colored in Firefox, Webkit/Chrome and Opera: 

http://tinyurl.com/6xw4rcm 

(In IE I could not get it to work properly, so I instead made sure that it did not work at all.)

That said: You have a point. Because, when one adds the diacritic via CSS, then browsers must either:

 a) ignore the CSS from a &apos;semantic&apos; point of view - that is: not 
     disturb the reader with the CSS content, but treat it as 
     decoration only. Since the combining letters are just &quot;colorizing&quot; 
     of the base letters, this works fine. (Not?)
     Often a) is perceived as the way CSS should work.
 b) combine text in mark-up and text in CSS, into a meaningful/-less whole
 c) replace entire content with new content - which must then (of course) be read as normal text

Options b) and c) are &quot;on your side&quot; in the sense that - really, contrary to a common perception (see a)), there is not supposed to be any *functional* difference between adding these - or other - characters via CSS or via mark-up. There is a principal difference, though: It is possible to disable/ignore the CSS, and then things will fall back to &quot;normal&quot;. It is in line with the &apos;progressive enhancement&apos; philosophy to enhance stuff with CSS, while keeping the unstyled mark-up functional in and by itself - without any styling.

&gt; 2) This example doesn&apos;t work in Chromium (I mean the actual code, not just the
&gt; redirect). It can probably be fixed to work there too, but I fear the specter
&gt; of browser-specific code.

I have not tested in Chromium - neither in the browser nor in the OS. But I have tested in Chrome - the browser, and it did work then. If it doesn&apos;t work (perfectly), then that might be a font issue, I guess - as fonts are a thing that I think varies on different platforms.

&gt; 3) Since the graphic capability is, as you say, present in all browsers (I
&gt; didn&apos;t check IE myself); and since nobody is seriously contemplating to forbid
&gt; the marking of single letters in a word via markup; why, then, is it so
&gt; important to forbid it for symbols which are combining characters?

Because we then ensure that it is possible to fall back to something that works. If you add mark-up around combining characters, then it breaks from the start - at least that is the situation today. But if you only add mark-up around &apos;logical characters&apos;, then, if the styling layer creates problems, one can fall back to the unstyled layer.

&gt; I actually found an answer for this question in the charmod-norm draft
&gt; (http://www.w3.org/TR/charmod-norm [ snip ]
&gt; &quot;acceB&quot;+&quot;Ant&quot; may normalize into &quot;acceABnt&quot;.
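
A minimal Python sketch of the quoted point, using my own example strings (they are an assumption, not taken from the charmod-norm draft): two text runs can each be in NFC on their own while their concatenation is not, which is what happens when the second run starts with a combining character.

```python
import unicodedata

# Hypothetical example strings: a run ending in a base letter, followed
# by a run that starts with U+0301 COMBINING ACUTE ACCENT.
first_run = "acce"
second_run = "\u0301nt"


def is_nfc(text):
    # True when NFC normalization leaves the text unchanged.
    return unicodedata.normalize("NFC", text) == text


joined = first_run + second_run
normalized = unicodedata.normalize("NFC", joined)

print(is_nfc(first_run), is_nfc(second_run), is_nfc(joined))
print(len(joined), len(normalized))
```

Each run passes the check alone, but normalizing the joined text composes the base letter and the accent into a single code point, so the boundary between the two runs no longer exists in the normalized form.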

&gt; [snip] [But: ] (&quot;When data
&gt; transfer on the Web remained mostly unidirectional (from server to browser),
&gt; and where the main purpose was to render documents, the use of Unicode without
&gt; specifying additional details was sufficient&quot;. This still describes HTML, as
&gt; far as I am aware).

What that document says in the next sentences is true, though: It is no longer as unidirectional as you say.

Frankly, &quot;out of the box&quot;, I am not really able to evaluate what that document says - I can only use common sense. And the fact is that it matters for a fragment URI whether it points to an @id value that is normalized or not: if it points to id=&quot;t</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57626</commentid>
    <comment_count>22</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-10-02 07:18:16 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially Accepted
Change Description: see diff given below
Rationale: I spoke with Mark Davis, who informed me that I was wrong. Unicode does intend to allow combining characters to be styled differently. So I&apos;ve reverted the earlier change.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57627</commentid>
    <comment_count>23</comment_count>
    <who name="">contributor</who>
    <bug_when>2011-10-02 07:20:28 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r6611.
Check-in comment: Allow combining characters wherever, per Mark Davis.
http://html5.org/tools/web-apps-tracker?from=6610&amp;to=6611</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57629</commentid>
    <comment_count>24</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2011-10-02 15:09:59 +0000</bug_when>
    <thetext>(In reply to comment #22)

&gt; Rationale: I spoke with Mark Davis, who informed me that I was wrong. Unicode
&gt; does intend to allow combining characters to be styled differently.

OK. This seems to be correct. For documentation, here are some quotes from Unicode 6.0:

]]
5.11 Editing and Selection
   [ snip ]
Nonlinear Boundaries. Use of nonlinear boundaries divides any stacked element into parts. For example, picking a point halfway across a lam + meem ligature can represent the division between the characters. One can either allow highlighting with multiple rectangles or use another method such as coloring the individual characters.
   [ snip ]
In most editing systems, the code point is the smallest addressable item, so the selection and assignment of properties (such as font, color, letterspacing, and so on) cannot be done on any finer basis than the code point. Thus the accent on an </thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>57932</commentid>
    <comment_count>25</comment_count>
    <who name="Shai Berger">shai</who>
    <bug_when>2011-10-07 07:36:07 +0000</bug_when>
    <thetext>Hi,

I waited a little for possible feedback from some other interested parties; it is apparently not forthcoming.

Thank you all for the enlightening discussion, and for the resolution.</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>1032</attachid>
            <date>2011-09-28 02:01:02 +0000</date>
            <delta_ts>2011-09-28 02:01:02 +0000</delta_ts>
            <desc>Effects when a text node begins with a combining character</desc>
            <filename>combining-chars-first-in-text-node.html</filename>
            <type>text/html;charset=UTF-8</type>
            <size>2062</size>
            <attacher name="Leif Halvard Silli">xn--mlform-iua</attacher>
            
              <data encoding="base64">77u/PCFET0NUWVBFIGh0bWw+PGh0bWwgY2xhc3M9Im0wemlsbGEiID48bWV0YSBjaGFyc2V0PSJV
VEYtOCIgLz4KPHRpdGxlPmNvbWJpbmluZyBjaGFyczwvdGl0bGU+CjxzdHlsZT4KaHRtbHtmb250
LWZhbWlseTpzYW5zLXNlcmlmfQpoMSxkdHt0ZXh0LWFsaWduOmNlbnRlcn0KZGx7bWF4LXdpZHRo
OjY0MHB4O21hcmdpbjphdXRvO30KZGR7Y29sb3I6Z3JlZW47Zm9udC1zaXplOjUwcHg7fWRke21h
cmdpbjphdXRvO3RleHQtYWxpZ246Y2VudGVyO2JhY2tncm91bmQtY29sb3I6bGlnaHRncmVlbn0K
ZGQubmFycm93e3dpZHRoOjEyMXB4O2JhY2tncm91bmQtY29sb3I6bGlnaHRibHVlfQoKLmxldHRl
ciwubGV0dGVyOmZpcnN0LWxldHRlciB7Y29sb3I6cmVkO2hlaWdodDowLjVlbTt9Ci8qIG1vemls
bGEg4oCUIGxpa2UgSUUg4oCUIG5lZWRzIHRoZSBsZXR0ZXIgdG8gYmUgaW5saW5lLWJsb2NrICAq
LwoubVwwemlsbGEgKi5sZXR0ZXIge2Rpc3BsYXk6LW1vei1pbmxpbmUtYm94O30KZHR7ZmxvYXQ6
bGVmdDtiYWNrZ3JvdW5kOndoaXRlO21hcmdpbi1yaWdodDotMzMlO3dpZHRoOjMzJTt0ZXh0LWFs
aWduOmxlZnQ7Zm9udDoxNnB4LzEgc2Fucy1zZXJpZjt9CmR0Lm5hcnJvd3tmbG9hdDpyaWdodDtt
YXJnaW4tcmlnaHQ6MDsgYmFja2dyb3VuZDpiZWlnZX0KCioubGV0dGVyLWNzc3tjb2xvcjpyZWQ7
fQoqLmxldHRlci1jc3M6Zmlyc3QtbGV0dGVyIHtkaXNwbGF5OmlubGluZS1ibG9jaztjb2xvcjpi
bHVlIWltcG9ydGFudDt9CioubGV0dGVyLWNzczphZnRlciB7Y29udGVudDoiXDAwMDMwMSAiO2Nv
bG9yOmdyZWVuIWltcG9ydGFudDt9Cjwvc3R5bGU+CjwhLS1baWYgSUVdPjxzdHlsZT4KLmxldHRl
cntkaXNwbGF5OmlubGluZS1ibG9ja30KPC9zdHlsZT48IVtlbmRpZl0tLT4KPC9oZWFkPjxib2R5
Pgo8aDE+QWNjdXRlIGFjY2VudCB2YXJpYW50czwvaDE+CjxkbD4KPGR0PlJFRkVSRU5DRTogcHJl
Y29tcG9zZWQgbGV0dGVyIHdpdGggYWNjdXRlIGFjY2VudDwvZHQ+CjxkZD5hY2PDqW50PC9kZD4K
PGR0IGNsYXNzPSJuYXJyb3ciPlJFRkVSRU5DRTogbmFycm93IGJsb2NrIHZhcmlhbnQ8L2R0Pgo8
ZGQgY2xhc3M9Im5hcnJvdyIgPmFjY8OpbnQ8L2RkPgo8ZHQ+UkVGRVJFTkNFOiBhbiB1bnN0eWxl
ZCBzcGFuIGluIHRoZSBtaWRkbGUgb2YgdGhlIHdvcmQ8L2R0Pgo8ZGQ+YWNjPHNwYW4+ZTwvc3Bh
bj5udDwvZGQ+CjxkZCBjbGFzcz0ibmFycm93Ij5hY2M8c3Bhbj5lPC9zcGFuPm50PC9kZD4KCgo8
ZHQ+Q29tYmluaW5nIGNoYXJhY3RlciBkb2VzIDxzdHJvbmc+bm90PC9zdHJvbmc+IGJlZ2luIHRl
eHQgbm9kZS48L2R0Pgo8ZGQ+YWNjZSYjeDMwMTtudDwvZGQ+CjxkdCBjbGFzcz0ibmFycm93Ij5u
YXJyb3cgYmxvY2sgdmFyaWFudDwvZHQ+CjxkZCBjbGFzcz0ibmFycm93IiA+YWNjZSYjeDMwMTtu
dDwvZGQ+CjxkdD5Db21iaW5pbmcgY2hhcmFjdGVyIDxzdHJvbmc+ZG9lczwvc3Ryb25nPiBiZWdp
biB0ZXh0IG5vZGU8L2R0Pgo8ZGQ+YWNjPHNwYW4gY2xhc3M9ImxldHRlciIgPmU8L3NwYW4+JiN4
MzAxO250PC9kZD4KPGR0IGNsYXNzPSJuYXJyb3ciPm5hcnJvdyBibG9jayB2YXJpYW50PC9kdD4K
PGRkIGNsYXNzPSJuYXJyb3ciID5hY2M8c3BhbiBjbGFzcz0ibGV0dGVyIiA+ZTwvc3Bhbj4mI3gz
MDE7bnQ8L2RkPgo8ZGQ+YWNjPHNwYW4gY2xhc3M9ImxldHRlciIgPmU8L3NwYW4+zIFudDwvZGQ+
CjxkdCBjbGFzcz0ibmFycm93Ij5uYXJyb3cgYmxvY2sgdmFyaWFudDwvZHQ+CjxkZCBjbGFzcz0i
bmFycm93IiA+YWNjPHNwYW4gY2xhc3M9ImxldHRlciIgPmU8L3NwYW4+zIFudDwvZGQ+CgoKPGR0
PkNvbWJpbmluZyBjaGFyYWN0ZXIgdmlhIDxzdHJvbmc+Q1NTIDo6YWZ0ZXJ7Y29udGVudDoqO308
L3N0cm9uZz4gPC9kdD4KPGRkPmFjYzxzcGFuIGNsYXNzPSJsZXR0ZXItY3NzIiA+ZTwvc3Bhbj5u
dDwvZGQ+CjxkdCBjbGFzcz0ibmFycm93Ij5uYXJyb3cgYmxvY2sgdmFyaWFudDwvZHQ+CjxkZCBj
bGFzcz0ibmFycm93IiA+YWNjPHNwYW4gY2xhc3M9ImxldHRlci1jc3MiID5lPC9zcGFuPm50PC9k
ZD4KPC9kbD4KCg==
</data>

          </attachment>
      

    </bug>

</bugzilla>