13108 – Add &zwsp; as named character reference for zero width space (U+200B)

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	All All

Importance:	P3 enhancement
Target Milestone:	Unsorted
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

URL:	http://dev.w3.org/html5/spec/named-ch...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2011-07-01 01:25 UTC by Leif Halvard Silli
Modified:	2014-07-28 22:00 UTC
CC List:	11 users

See Also:

Attachments

Description Leif Halvard Silli 2011-07-01 01:25:38 UTC

REQUEST:
&zwsp; as a named character reference for the Zero Width Space Character (U+200B)  should be obligatory for all HTML5 parsers to support

STATUS:
# Internet Explorer 6 - 9 support &zwsp;
# Many web authors seem to think &zwsp; *is* a character reference, see 
   http://www.google.no/search?hl=nn&q=%26zwsp; and 
   http://www.w3.org/Bugs/Public/show_bug.cgi?id=13098#c3
# Many others seem to wonder why &zwsp; *isn't* a named character reference:
   Ian seems to be one of those: http://krijnhoetmer.nl/irc-logs/whatwg/20100329
   Jukka has said things like that: http://www.webmasterkb.com/Uwe/Forum.aspx/dhtml/2648/Zero-width-space-still-unsafe
   Several at TPAC 2007-11-09 were positive: http://lists.w3.org/Archives/Public/public-html/2007Nov/0158.html

NOTES ABOUT IE's SUPPORT: 
# IE does *not* support omitting the semicolon for this character reference. Hence it would not be necessary to require parsers to support the semicolon-less form. (HTML5 already distinguishes between references for which parsers must support the semicolon-less form and references for which they need not.)
# IE supports &zwsp; in both no-quirks and quirks mode. 

TEST CASE:
# http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1051

POSITIVE EFFECTS:
# All browsers would behave the same way
# Less author confusion. Many web authors test in only one web browser and thus believe &zwsp; to be a character reference already
# The point has been made for a long time already that it would be very logical to support &zwsp;
# It is short and easy to remember.

NEGATIVE EFFECTS:
# There are already 5 named character references for U+200B in HTML5.
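
The existing names can be listed mechanically. The following is a minimal sketch that assumes Python's standard html.entities.html5 table mirrors the HTML5 named character reference list; it is only an illustration and not part of the spec tooling discussed in this bug:

```python
from html.entities import html5

# html5 maps reference names (without the leading "&") to their expansions.
ZWSP = "\u200b"

# Collect every named character reference that expands to exactly U+200B.
zwsp_names = sorted(name for name, value in html5.items() if value == ZWSP)

for name in zwsp_names:
    print("&" + name)

# The name requested in this bug is not in the table.
print("zwsp; defined:", "zwsp;" in html5)
```

If the assumption holds, the loop prints the existing references for U+200B (including &ZeroWidthSpace;) and the last line reports that zwsp; is not defined.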

Comment 1 Henri Sivonen 2011-07-01 07:24:55 UTC

IIRC, we specifically decided not to include IE's bidi formatting named characters. I don't remember what the rationale was, though.

If this is added to the spec, I think we should add all of IE's bidi formatting named characters instead of waiting for them to be proposed one by one.

Comment 2 David Carlisle 2011-07-01 08:48:48 UTC

As noted in the negative effects there is already a name for this (ZeroWidthSpace) although admittedly that's a bit long for zero width space:-)

The other four names are not really to be used; they are just there for historical legacy compatibility reasons in an xml context (where removing an entity definition can introduce fatal errors to a document)

Is there actually a use case for using 200B as opposed to 200C (zwnj) or 200D (zwj) or is it mainly just that IE supports it so people will try to use it? (IE supporting it may in fact be a good enough reason).

I think it's important that the XML and HTML entity sets stay in sync now we've finally got them aligned so if this is added we need to do a second edition of the xml entities spec.

http://www.w3.org/TR/2010/REC-xml-entity-names-20100401/

although we probably need to do a 2nd edition of that in any case, the editor's edition already contains information about the new Arabic mathematical alphabets in the 1EE00 block and other characters added at Unicode 6.x.

http://www.w3.org/2003/entities/2007doc/Overview.html#changes20100401

As a general rule the Math WG has always resisted adding new names, as the potential introduction of fatal xml parse errors is a high price to pay for what is essentially a cosmetic and deprecated xml feature anyway. However the tradeoffs in an HTML context are different, so I'm not necessarily totally opposed to adding new characters that are requested by the HTML WG.

Comment 3 Anne 2011-07-01 11:14:53 UTC

Search for "pdf" in http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-March/014125.html to see prior discussion. I think it ended there last time we discussed this.

Comment 4 Ian 'Hixie' Hickson 2011-07-26 03:35:03 UTC

This is basically up to the people maintaining unicode.xml. The spec generator script just generates the list of named char refs from that automatically.

Comment 5 David Carlisle 2011-07-26 08:41:21 UTC

(In reply to comment #4)
> This is basically up to the people maintaining unicode.xml.

Which is basically me:-)

> The spec generator
> script just generates the list of named char refs from that automatically.

The Math WG has traditionally tried very hard not to add any new names (or remove any old ones) and almost certainly if a request had come for this name to the Math WG as a comment on MathML or the XML Entities specs then it would have been rejected on those grounds.

Not adding any new names would still be our preference. However, having finally achieved a consistent set of entity names across html and xml languages, maintaining that consistency is important, and we recognise that the HTML user base is somewhat larger than MathML's, so we would not block any requests for change if instructed by the HTML WG.

I don't think I can change the status of this report under the HTML WG policies, but I'd suggest that it be closed as won't fix, with the usual note that it can be raised as an issue and re-opened if that is thought necessary.

Comment 6 Anne 2011-07-26 14:19:02 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document: <http://dev.w3.org/html5/decision-policy/decision-policy.html>.

Status: Rejected
Change Description: no spec change
Rationale: See comment 5. Instead of entity references use a numeric reference or simply use the actual character directly.

Comment 7 Michael[tm] Smith 2011-08-04 05:34:17 UTC

mass-move component to LC1

Comment 8 fantasai 2014-07-22 15:34:13 UTC

Reopening to add some more information and correct a few misconceptions. Also, given both the usability win and the existing implementation in IE, I'd like to see this fixed. (I can say I've mistakenly used &zwsp; multiple times, expecting it to work.)

Henri Sivonen wrote:
> IIRC, we specifically decided not to include IE's bidi formatting named
> characters. I don't remember what the rationale was, though.
>
> If this is added to the spec, I think we should add all of IE's bidi formatting
> named characters instead of waiting for them to be proposed one by one.

I think this is a fair, but invalid, concern. ZWSP is not related to bidi at all.

David Carlisle wrote:
> Is there actually a use case for using 200B as opposed to 200C (zwnj) or
> 200D (zwj) or is it mainly just that IE supports it so people will try
> to use it? (IE supporting it may in fact be a good enough reason).

Yes, they are in fact quite different:
  ZWSP - Breaks a word (and therefore also Arabic joining) with no visible space.
  ZWJ  - Not a word break. Forces joining behavior.
  ZWNJ - Not a word break, but breaks joining.

Unless you are writing in a shaped script like Arabic, using ZWNJ or ZWJ is not useful. However, ZWSP provides an invisible break opportunity, like <wbr>.
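
For reference, the three characters compared above can be told apart by code point and Unicode name. This is a small sketch using only Python's standard library; the references used (&#x200B;, &zwnj;, &zwj;) are existing forms, and the variable names are just for illustration:

```python
import html
import unicodedata

# Existing references for the three zero-width characters discussed above.
refs = ("&#x200B;", "&zwnj;", "&zwj;")

# Decode each reference and look up the official Unicode character name.
decoded = {ref: html.unescape(ref) for ref in refs}

for ref, ch in decoded.items():
    print(f"{ref:10} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Note that ZWNJ and ZWJ have short named references while ZWSP is reached here through a numeric reference.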

Comment 9 David Carlisle 2014-07-22 15:50:45 UTC

Note that U+200B can already be accessed as &#x200b; or &ZeroWidthSpace;, so adding &zwsp; as an alias for that isn't really adding any new functionality. Even if it were added, the fallback on legacy systems that don't define it is sufficiently bad (printing "&zwsp;" in HTML, or making the entire document ill-formed in XML) that the advice to users would still be to use one of the existing forms.

Thus my preference would be not to add a name (we have only added one name since 1998) but as I said in comment #2 if the consensus is to add it I would of course update unicode.xml so it worked its way into mathml and html and the xml entities specs.
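
The difference between the existing forms and an undefined name can be sketched with Python's html.unescape, used here only as a stand-in for a consumer that follows the HTML5 reference table; real browsers and XML parsers are not being tested by this snippet:

```python
import html

# Numeric and existing named references both decode to U+200B.
numeric = html.unescape("a&#x200b;b")
named = html.unescape("a&ZeroWidthSpace;b")

# An undefined name is left in the text as-is, which is the HTML fallback
# described above (the literal "&zwsp;" shows up in the output).
unknown = html.unescape("a&zwsp;b")

print(repr(numeric))
print(repr(named))
print(repr(unknown))
```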

Comment 10 Ian 'Hixie' Hickson 2014-07-28 22:00:45 UTC

Closing per comment 5 and comment 9.

Note that you can just write "<wbr>" in HTML. It's actually fewer characters than the proposed "&zwsp;", and does the same as &#x200b; or &ZeroWidthSpace;.