This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 13098 - Clarify whether <wbr> has the same effect as the zero-width space character
Summary: Clarify whether <wbr> has the same effect as the zero-width space character
Status: RESOLVED NEEDSINFO
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: All All
Importance: P4 enhancement
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://dev.w3.org/html5/spec/text-lev...
Whiteboard:
Keywords:
Depends on: 9097
Blocks: 13120
Reported: 2011-06-30 15:33 UTC by Leif Halvard Silli
Modified: 2011-08-04 05:15 UTC

Description Leif Halvard Silli 2011-06-30 15:33:19 UTC
PROBLEM: 

<WBR> and SOFT HYPHEN are equivalent, but HTML5 fails to point this out.


THE EQUIVALENCE STATEMENT:

   (1) HTML5 defines <wbr> as "a line break opportunity". This seems fully equivalent to Unicode's definition of the semantics of the SOFT HYPHEN character as a "conditional" or "discretional" hyphen.
http://unicode.org/reports/tr14/#SoftHyphen
http://www.unicode.org/charts/PDF/U0080.pdf 

   (2) It does also seem compatible with HTML4's definition: "The soft hyphen tells the user agent where a line break can occur." 
http://www.w3.org/TR/html401/struct/text#h-9.3.3

   (3) Furthermore, at least one Unicode expert has already identified <wbr> as a SOFT HYPHEN:
http://www.cs.tut.fi/~jkorpela/HTML4.0/comments.html

   (4) Given the above, and because SOFT HYPHEN is invisible by default, representing it with the <wbr> element is very similar to representing it with the &shy; or &#xad; references: they are all simply authoring aids for the problem that the soft hyphen might be invisible in the code.

   (5) And unlike, for example, <br>, there is no white-space-affecting difference between representing SOFT HYPHEN as the element (<wbr>) or as the character (directly typed or as a character reference). Thus there is a strong link between the SOFT HYPHEN character and the <wbr> element.



THEREFORE:

Please clarify that the <wbr> element represents (or is synonymous with) the SOFT HYPHEN, and explicitly state that authors may use a character reference instead of <wbr> if the author is only after a visual representation in the code.


POSITIVE EFFECTS OF RECTIFYING THIS BUG:

 # It would fulfill the need for a succinct explanation of what <wbr> is, because, as a result of its former obsolete/non-standard status, many authors/developers do need such an explanation. For example, HTML4 has an entire section (3 paragraphs) about hyphenation, including the soft hyphen, in which it doesn't mention <wbr> a single time:  http://www.w3.org/TR/html401/struct/text#h-9.3.3 And the XHTML 1.x specifications are not any different.

 # It would make the section on the <wbr> element authoritative, which could serve to remove the current confusion and misinformation about which Unicode character it represents. For example, Wikipedia, at the time this bug was filed, incorrectly stated that <wbr>, quote: "performs the same function as zero-width space (U+200B)": http://en.wikipedia.org/w/index.php?title=HTML_element&oldid=433185609#wbr
Another instance of the same misunderstanding: http://www.princexml.com/bb/viewtopic.php?f=5&t=14

 # It would clarify to authors that there is an alternative to using <wbr>. Such a thing would be useful since, due to <wbr>'s previous obsolete/non-standard status, the <wbr> element is not as well supported as - simply put - UNICODE.  (For example, it seems that PrinceXML does not support <wbr> yet, whereas it has full and excellent support for &shy;. In addition, there are many authoring tools which do not support HTML5 yet, and therefore do not offer users a way to insert <wbr/>, but which nevertheless support UNICODE, and thus allow inserting the SOFT HYPHEN character or a character reference for it.)

 # It would clarify to parser engine developers as well as to authoring tool developers that they should treat SOFT HYPHEN and <wbr> the same way. For example, if an authoring tool is supposed to have a single SOFT HYPHEN insertion menu, then the tool developer should be aware of the many options he/she has when it comes to allowing the user to insert it.
Comment 1 Philip Jägenstedt 2011-06-30 15:56:15 UTC
They're not equivalent from a parsing perspective. If you have a <wbr> tag in the markup, that should result in an HTMLElement in the DOM, not a text node as when you use &shy; or the actual Unicode character.
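
The parsing difference described above can be sketched with Python's standard html.parser module. This is only an illustration, not a browser DOM: the NodeLister class and the list_nodes helper are hypothetical names introduced here, and the flat event list merely approximates which DOM nodes a browser would create.

```python
from html.parser import HTMLParser


class NodeLister(HTMLParser):
    """Record a flat list of (kind, value) events, roughly mirroring DOM nodes."""

    def __init__(self):
        # convert_charrefs=True decodes &shy; and friends into text data.
        super().__init__(convert_charrefs=True)
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append(("element", tag))

    def handle_data(self, data):
        self.nodes.append(("text", data))


def list_nodes(markup):
    """Parse a markup fragment and return the recorded events."""
    parser = NodeLister()
    parser.feed(markup)
    parser.close()
    return parser.nodes


# <wbr> shows up as an element between two text runs ...
with_wbr = list_nodes("super<wbr>man")
# ... while &shy; stays inside a single text run as U+00AD.
with_shy = list_nodes("super&shy;man")

print(with_wbr)
print(with_shy)
```

Under these assumptions the <wbr> variant yields three events (text, element, text), while the &shy; variant yields one text event containing the soft hyphen character.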
Comment 2 Leif Halvard Silli 2011-06-30 17:36:02 UTC
(In reply to comment #1)

HTML5's current problem is that confusion exists in the wild (see the links I mentioned above) and that no account is made of the soft hyphen character's role in hyphenation.

Otherwise, you are right.

However, similar to how it often is of no importance to the author whether a line break inside a <pre> element is made with a <br> or with the line feed character, for practical purposes it usually does not matter whether <wbr> or the soft hyphen character is used.

> should result in a HTMLElement in the DOM, not a text node as
> when use &shy; or the actual unicode character.

You make it seem as if &shy; is represented in the DOM differently from the directly typed soft hyphen character.

However, when I check with Live DOM Viewer (or a built-in DOM inspector of the browser), then &shy;, &#xad; and the directly typed character each seem to result in the same kind of text node: a text node containing an invisible (and even hard to select!) soft hyphen character.

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1049
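
The same observation can be checked outside a browser. The following is a minimal sketch using Python's html.unescape, which follows the HTML5 character reference rules; the decimal form &#173; is added here for completeness and does not appear in the comment above.

```python
import html

SOFT_HYPHEN = "\u00ad"

# Four spellings of the same character in markup.
spellings = {
    "named reference": "&shy;",
    "hex reference": "&#xad;",
    "decimal reference": "&#173;",
    "directly typed": "\u00ad",
}

# Decode each spelling the way an HTML5 tokenizer would resolve references.
decoded = {label: html.unescape(text) for label, text in spellings.items()}
all_same = all(value == SOFT_HYPHEN for value in decoded.values())

print(all_same)
```

All four spellings decode to the single code point U+00AD, which matches what the DOM viewer shows: one text node, whichever spelling the author used.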
Comment 3 Aryeh Gregor 2011-06-30 20:43:13 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: <wbr> is not equivalent to soft hyphen.  According to your UAX#14 link,

"""
Unlike U+2010 hyphen, which always has a visible rendition, the character U+00AD soft hyphen (shy) is an invisible format character that merely indicates a preferred intraword line break position. If the line is broken at that point, then whatever mechanism is appropriate for intraword line breaks should be invoked, just as if the line break had been triggered by another hyphenation mechanism, such as a dictionary lookup.
"""
http://unicode.org/reports/tr14/#SoftHyphen

This is not how <wbr> behaves: it never inserts a hyphen.  Consider this test-case in Firefox and Chrome:

http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1050

<wbr> breaks with no hyphen, while &shy; breaks with a hyphen.  Thus the two are different in existing implementations, and we can't require them to be the same.  (IE/Opera have a different interpretation of <wbr>, but it still doesn't match &shy;.)

It's possible that <wbr> is the same as &zwsp;, but the latter might have other effects that aren't coming to mind.
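
The behavioural difference can be modelled with a toy line breaker. This is a sketch under stated assumptions, not how any browser lays out text: break_once is a hypothetical helper, width is counted in characters, and U+200B is used only as a stand-in for "a break opportunity that draws no hyphen", which is how <wbr> behaves in the Firefox/Chrome test case above.

```python
SHY = "\u00ad"   # SOFT HYPHEN: a hyphen is drawn if the line breaks here
ZWSP = "\u200b"  # stand-in for <wbr> in this model: break without a hyphen


def visible(text):
    """Return the text without the invisible break-opportunity characters."""
    return text.replace(SHY, "").replace(ZWSP, "")


def break_once(text, width):
    """Break text into at most two lines at the last opportunity that fits."""
    if len(visible(text)) <= width:
        return [visible(text)]
    best = None
    for i, ch in enumerate(text):
        if ch not in (SHY, ZWSP):
            continue
        # A soft hyphen needs one extra column for the hyphen it draws.
        needed = len(visible(text[:i])) + (1 if ch == SHY else 0)
        if needed <= width:
            best = i
    if best is None:
        return [visible(text)]  # no usable opportunity: overflow
    head = visible(text[:best]) + ("-" if text[best] == SHY else "")
    return [head, visible(text[best + 1:])]


shy_lines = break_once("super" + SHY + "califragilistic", 8)
wbr_lines = break_once("super" + ZWSP + "califragilistic", 8)

print(shy_lines)
print(wbr_lines)
```

In this model the soft hyphen variant ends its first line with a hyphen, while the other variant breaks at the same position without one.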
Comment 4 Leif Halvard Silli 2011-06-30 22:59:57 UTC
(In reply to comment #3)

A most helpful and educational reply. Thanks for the clear analysis. Btw, in addition to Firefox and Chrome, Konqueror, W3m and Lynx support <wbr> as well.  IE8 only supports <wbr> in Quirks-Mode.  Opera does not seem to support it at all.  But IE, Firefox, Opera and Webkit all support the zero-width space character, in all modes. (But Konqueror as well as the text browsers have problems with the directly typed zero-width space character.) 

To what extent <wbr> is useful, given its rather poor support, is not clear to me - one could perhaps just as well obsolete it. Zero-width space has particularly many synonymous ways in which it can be represented, and this in itself might be a reason to obsolete it:
     as <wbr>,
     as the directly typed character,
     as the 2 flavours of numeric character references,
     as 5 different named character references - citing the named char ref table:
             NegativeMediumSpace;      U+0200B
             NegativeThickSpace;       U+0200B
             NegativeThinSpace;        U+0200B
             NegativeVeryThinSpace;    U+0200B
             ZeroWidthSpace;           U+0200B

     It seems like Firefox and Webkit trunk supports all these ways.
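
The character-reference spellings listed above can be verified with Python's html.unescape, which uses the HTML5 named character reference table. This is a minimal sketch; it says nothing about <wbr> itself, only about the references.

```python
import html

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# The five named references quoted from the named char ref table.
named = [
    "NegativeMediumSpace;",
    "NegativeThickSpace;",
    "NegativeThinSpace;",
    "NegativeVeryThinSpace;",
    "ZeroWidthSpace;",
]
# The two flavours of numeric reference: decimal and hexadecimal.
numeric = ["&#8203;", "&#x200b;"]

named_ok = all(html.unescape("&" + name) == ZWSP for name in named)
numeric_ok = all(html.unescape(ref) == ZWSP for ref in numeric)

print(named_ok, numeric_ok)
```

Every spelling decodes to the same code point, U+200B.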
 
> It's possible that <wbr> is the same as &zwsp;, but the latter might have other
> effects that aren't coming to mind.


I renamed and reopened this bug, for the following reasons:

(1) It does indeed seem like the synonymous character is the zero-width space character. However, this does not change the issue very much: there is still confusion out there about what <wbr> represents - even you and I are not 100% certain.  The spec should therefore clarify what character <wbr> is synonymous with.  Such a clarification would not only be useful to web authors like myself, but it would also be useful and important when we are making the HTML5 test suite: if both <wbr> and &#x200b; are meant to work the same way, then it would make sense to have parallel test cases.

(2) Also, the spec should point out that <wbr>/zerowidthspace is *not* the same as the soft hyphen. Even Wikipedia explains that <wbr>/zwsp is related to - but different from - SHY: "Its semantics and HTML implementation are comparable to but different from the soft hyphen." See: http://en.wikipedia.org/wiki/Zero-width_space (An important effect of the differences between ZeroWidthSpace and Soft Hyphen is that zero-width space breaks the word so that it looks like several words [the name "word-break" is thus slightly misleading - as it is a *space* character], whereas SHY breaks the word so that it still looks like a single word.)

(3) The spec should clarify whether the <wbr> element is in any way recommended over a character (reference) representation, or if it is entirely up to the author.
Comment 5 Leif Halvard Silli 2011-07-01 00:52:16 UTC
Discussion of <wbr> versus the zero width space character found here:

http://krijnhoetmer.nl/irc-logs/whatwg/20100329
Comment 6 Aryeh Gregor 2011-07-01 17:23:58 UTC
Seems reasonable to consider defining it as equivalent to zwsp.  I'm not sure if there's any reason not to.  I'll point out that I use <wbr> in one application instead of &zwsp; so that it doesn't get copy-pasted along with the content -- I want line breaks to be allowed at certain points so a table doesn't stretch, but don't want invisible characters to be copied along if I post it someplace else.
Comment 7 Leif Halvard Silli 2011-07-01 22:57:03 UTC
(In reply to comment #6)

W.r.t. "so that table don't stretch, but don't want invisible characters":  are you certain that it is a <wbr> you need? Personally, in most cases, I would rather like to have an imaginary <shy> element. 

In other words: I don't feel that you justify <wbr> very well. It sounds more like a "we have it, so now we must defend it".

Because, in most cases, when a word is broken up like that, I would want a hyphen to be inserted, so that the reader can see that it is one word and not several words. (In English you use far fewer compound words - so perhaps it is easier for users of the English language to be satisfied with the effect of <wbr>.)

So, while I understand the beneficial side-effects of <wbr>, <wbr> is far from ideal when it comes to its primary effect. 

Hyphenation can be done via the hyphen-minus character, the soft-hyphen character or via hyphenation dictionaries. I imagine that a <shy> would have become something in between: like &shy; it would overrule the hyphenation dictionary (if any). But like a hyphenation dictionary, it would not/should not leave invisible traces in the text that would be copied.

I feel that the case for <wbr> in the first place would be easier to defend if we had a <shy> element. So perhaps you could open a bug for that? As is, <wbr> will also be used when it is not at all appropriate.
Comment 8 Leif Halvard Silli 2011-07-02 09:13:04 UTC
(In reply to comment #7)
> (In reply to comment #6)

Note that the spec has this usecase - which seems congruent with the zero width space character:

]] something which, for effect, is written as one long word [[

Whereas your usecase seems to be a single, long word which, for non-linguistic (read: parser technical) reasons should be treated as individual words (rather than as the single word that it in reality is) separated by the <wbr> element.
Comment 9 Aryeh Gregor 2011-07-03 17:11:26 UTC
(In reply to comment #7)
> W.r.t. "so that table don't stretch, but don't want invisible characters":  are
> you certain that it is a <wbr> you need? Personally, in most cases, I would
> rather like to have a imaginary <shy> element. 

My use-case is a table of browser tests which has HTML in the cells, with one column for input, one for the output required by the spec, and one for output produced by the current browser.  Browsers typically don't break at angle brackets, so I have my script insert <wbr> in supporting browsers to let it break before < and after >.  However, if I copy-paste the HTML (which I often do), I don't want invisible characters changing the meaning unpredictably.  It will possibly add text nodes where there weren't any, changing behavior of the algorithms I'm using.

I don't claim this is a common use-case, but it's why I use <wbr> on that page.
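
A rough sketch of that use-case, in Python rather than the page's actual script (which is not shown in this bug): escape_with_breaks and copied_text are hypothetical helpers, and copied_text only approximates a plain-text copy by dropping tags and decoding references.

```python
import html
import re


def escape_with_breaks(source, marker="<wbr>"):
    """Escape HTML source for display, allowing a break before '<' and after '>'."""
    escaped = html.escape(source, quote=False)
    escaped = escaped.replace("&lt;", marker + "&lt;")
    escaped = escaped.replace("&gt;", "&gt;" + marker)
    return escaped


def copied_text(display_markup):
    """Approximate a plain-text copy: drop tags, then decode references."""
    return html.unescape(re.sub(r"<[^>]*>", "", display_markup))


sample = "<p>one<b>two</b></p>"

# With <wbr> elements, the copied text is the original source again.
wbr_copy = copied_text(escape_with_breaks(sample))
# With zero width space characters, invisible characters travel along.
zwsp_copy = copied_text(escape_with_breaks(sample, marker="\u200b"))

print(wbr_copy == sample)
print(zwsp_copy == sample)
```

Under these assumptions the <wbr> variant round-trips to the original source, while the zero width space variant leaves U+200B characters in the copied text.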

> In other words: I don't feel that you justify <wbr> very well. It sounds more
> like a "we have it, so now we must defend it".

Yes, the barrier to retain existing features is much lower than the barrier to accept new features.
Comment 10 Aryeh Gregor 2011-07-03 17:17:39 UTC
On some additional review, it seems clear that <wbr> has a different meaning from zero-width space.  It's supposed to override the CSS white-space property, so it's supposed to work in <nobr> or <pre>, for instance.

*** This bug has been marked as a duplicate of bug 9097 ***
Comment 11 Leif Halvard Silli 2011-07-04 10:15:45 UTC
(In reply to comment #10)
> On some additional review, it seems clear that <wbr> has a different meaning
> from zero-width space. 

* The purpose of this bug report is to make HTML5 state how it is supposed to be.
* "It seems" is not good enough, unless you can point to spec text.
* Based on your previous statements, you include such things as the fact that <wbr> (unlike <br>) does not result in a white-space character when copied to a (non-html) clipboard, as part of the "different meaning". This, however, is not specified anywhere.

> It's supposed to override the CSS white-space property,
> so it's supposed to work in <nobr> or <pre>, for instance.

You are citing Maciej's zealot reading as presented in Bug 9350. My comment: http://www.w3.org/Bugs/Public/show_bug.cgi?id=9350#c8

The Editor has already added a link in Bug 9350 to Bug 9711 (now reopened). And Bug 9711 suggests that there should be no particular overriding - <wbr> should instead be treated just like ZeroWidthSpace is treated.

> *** This bug has been marked as a duplicate of bug 9097 ***

I reopened, making it dependent on bug 9097. If bug 9097 is solved in accordance with the bug filer's original request, then this bug can be closed, because then it has been clarified that <wbr> *is* equal to zero width space.  However, if bug 9097 is *not* solved in accordance with the bug filer's original request, then HTML5 needs to explain the likeness and difference between <wbr> and ZeroWidthSpace, as requested by this bug.
Comment 12 Leif Halvard Silli 2011-07-06 16:42:46 UTC
Documentation of more confusion in the wild:

(*) Sitepoint: http://reference.sitepoint.com/html/wbr
     Site point's reference (sic) says: """ The wbr element’s purpose is to
suggest/hint to the browser where within a word/phrase would be the most
appropriate point for it to be broken (indicated with a hyphen) in the event
that the browser viewport or containing element is reduced in size such that
wrapping occurs. """
    When it compares it to a hyphen, then it has misunderstood it. <wbr> splits
a word into two words. 

(*) W3Schools http://www.w3schools.com/html5/tag_wbr.asp  
    Quote: 

    """ The <wbr> tag defines where in a word it would be ok to add a line-break """

    Clearly, W3Schools too does not recognize that it splits the word into *two* words.
Comment 13 Aryeh Gregor 2011-07-06 21:08:03 UTC
Status: Additional Information Needed
Change Description: no spec change
Rationale: This bug, as originally filed and then amended, postulated that <wbr> and zwsp are equivalent, and asked for that to be clarified.  However, per the current spec, they are not equivalent.  Making them equivalent is what bug 9097 requests, and I've left that open for Hixie to review the evidence you provided.

Could you please explain what this bug is supposed to do in addition to bug 9097?  If 9097 is fixed, <wbr> and zwsp will be defined as equivalent as requested.  If 9097 is wontfixed, this bug is invalid because <wbr> and zwsp are defined to *not* be equivalent: e.g., as now, <wbr> will be defined to create a break even in <pre>.  So I don't get what this bug is asking for now.
Comment 14 Michael[tm] Smith 2011-08-04 05:15:59 UTC
mass-move component to LC1