This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 12539 - The numeric references to produce the glyphs in the third column should use the characters listed in the second column. lang (and aliases) list (correctly) U+27E8, but the glyph is produced by #9001 (U+2329) which is not in normal form C and generates va
Summary: The numeric references to produce the glyphs in the third column should use th...
Status: CLOSED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: Edward O'Connor
QA Contact: HTML WG Bugzilla archive list
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Duplicates: 17170
Depends on: 19489
Blocks:
Reported: 2011-04-22 13:03 UTC by contributor
Modified: 2013-05-14 21:04 UTC
CC List: 9 users


Description contributor 2011-04-22 13:03:12 UTC
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html
Section: http://www.whatwg.org/specs/web-apps/current-work/#named-character-references-table

Comment:
The numeric references that produce the glyphs in the third column should use
the characters listed in the second column. lang (and its aliases) correctly
lists U+27E8, but the glyph is produced by #9001 (U+2329), which is not in
Normalization Form C and generates validation errors in (e.g.) validator.nu.

Posted from: 80.177.31.128
User agent: Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1 Firefox/4.0.1
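
The normalization claim in the report can be checked with a short Python sketch. This is an illustration added for readers, not something from the original report; it assumes only the standard unicodedata module, and the helper name is_nfc is hypothetical.

```python
import unicodedata


def is_nfc(text):
    """Return True if text is unchanged by Unicode normalization to form C."""
    return unicodedata.normalize("NFC", text) == text


# Code point emitted by the multipage table (decimal reference #9001).
old_lang = chr(9001)
# Code point listed in the second column of the table for lang.
new_lang = "\u27e8"

# What NFC normalization turns the old code point into.
nfc_of_old = unicodedata.normalize("NFC", old_lang)

print("U+%04X is NFC: %s" % (ord(old_lang), is_nfc(old_lang)))
print("U+%04X normalizes to U+%04X" % (ord(old_lang), ord(nfc_of_old)))
print("U+%04X is NFC: %s" % (ord(new_lang), is_nfc(new_lang)))
```

U+2329 has a canonical decomposition to another code point, so a checker that requires NFC flags it, whereas U+27E8 passes unchanged.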
Comment 1 David Carlisle 2011-04-22 13:04:58 UTC
(This was posted by me; adding myself to CC.)
Comment 2 Michael[tm] Smith 2011-04-23 11:55:33 UTC
This error is not in the single-page spec source; it appears only in the multipage version, which is generated by a Python script.

The spec source at http://svn.whatwg.org/webapps/index and http://www.whatwg.org/specs/web-apps/current-work/index have this:

<tr id=entity-LeftAngleBracket><td> <code title="">LeftAngleBracket;</code> </td> <td> U+027E8 </td> <td> <span class=glyph title="">&lang;</span> </td>

So it seems that the &lang; is being changed into &#9001; by the Python script that generates the multipage version. That script does nothing special of its own with entities, so I think the cause of the bug must be in some Python library that the script relies on.
Comment 3 Ian 'Hixie' Hickson 2011-07-14 23:49:10 UTC
Reassigning to Philip since he runs the script in question.
Comment 4 Michael[tm] Smith 2011-08-04 05:15:31 UTC
mass-move component to LC1
Comment 5 Michael[tm] Smith 2011-11-20 14:32:52 UTC
(In reply to comment #3)
> Reassigning to Philip since he runs the script in question.

Philip? Do you think you'll have any time soon to look into this?
Comment 6 Michael[tm] Smith 2011-12-20 19:45:29 UTC
The workaround I'm using in the makefile that generates the author view is this:

$(PERL) -pi -e "s/#9001;/#x27E8;/g"
$(PERL) -pi -e "s/#9002;/#x27E9;/g"
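
For anyone who would rather do the same post-processing in Python, the two Perl substitutions above could be sketched roughly as follows. This is an assumption-laden illustration, not the code used in the makefile or the splitter; the names REPLACEMENTS and fix_angle_bracket_refs are hypothetical.

```python
import re

# Decimal references to the HTML 4 code points, mapped to the hex
# references used in the Perl workaround.
REPLACEMENTS = {
    "#9001;": "#x27E8;",
    "#9002;": "#x27E9;",
}

# One pattern that matches either of the old references literally.
_PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))


def fix_angle_bracket_refs(html):
    """Replace the old angle-bracket references with the HTML5 ones."""
    return _PATTERN.sub(lambda match: REPLACEMENTS[match.group(0)], html)


sample = '<span class=glyph title="">&#9001;</span> <span>&#9002;</span>'
print(fix_angle_bracket_refs(sample))
```

Like the Perl version, this only rewrites numeric references that are still present in the output; it does nothing if the generator has already emitted the raw characters.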
Comment 7 Anne 2011-12-21 10:15:01 UTC
If this is a problem on the WHATWG copy, it is probably my problem. Or foolip, but I don't think he has time to work on this.
Comment 8 Michael[tm] Smith 2011-12-21 10:23:48 UTC
As far as I can tell, it's a Python bug or maybe an lxml bug. I think the simplest way to deal with it would be to have anolis and/or the splitter script do the s/#9001;/#x27E8;/g and s/#9002;/#x27E9;/g -- or to run some post-processing script (perl or sed or python or whatever) on the anolis/splitter output to do it.
Comment 9 David Carlisle 2011-12-21 10:30:56 UTC
(In reply to comment #8)
> As far as I can tell, it's a Python bug or maybe an lxml bug. I think the simplest
> way to deal with it would be to have anolis and/or the splitter script do the
> s/#9001;/#x27E8;/g and s/#9002;/#x27E9;/g -- or to run some post-processing script
> (perl or sed or python or whatever) on the anolis/splitter output to do it.


"Bug" is probably a bit harsh; it is probably fairer to say that you're processing the HTML(5) spec with an HTML4 parser. But it comes to the same thing: those entity references get the old/wrong values.
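
The difference between the two entity tables is easy to see in Python's standard library, which ships both. The sketch below is only an illustration of the old versus new values; it does not claim to reproduce what lxml or the splitter does internally, and the helper name compare is hypothetical.

```python
from html.entities import html5, name2codepoint


def compare(name):
    """Return (HTML 4 code point, HTML5 code point) for an entity name.

    The name is given without the leading '&' or trailing ';'.
    name2codepoint holds the HTML 4 definitions; html5 holds the HTML5
    named character references, keyed with the trailing semicolon.
    """
    return name2codepoint[name], ord(html5[name + ";"])


for name in ("lang", "rang"):
    old, new = compare(name)
    print("&%s;  HTML 4: U+%04X  HTML5: U+%04X" % (name, old, new))
```

Any tool that resolves &lang; against the HTML 4 table and then serializes the result will therefore emit the old code point, which matches the symptom in the multipage output.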
Comment 10 Anne 2011-12-21 10:40:31 UTC
Hixie: btw, did you say the link-fixup.js script was no longer included? Because it does seem problematic if that is no longer updated. I think when foolip patched the splitter he forgot to put that file in the right place. I can make sure it will be included again if I'm indeed correct about this. Let me know.
Comment 11 Anne 2011-12-21 10:52:30 UTC
I think I fixed it by emitting the character directly instead of going through entities. Let me know if this is correct. http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html
Comment 12 Anne 2011-12-21 11:23:47 UTC
It's still broken. The problem is either in Python or libxml. I suppose I could do post-processing on named-character-references.html although that feels somewhat sucky. Anyone know the appropriate bash? Note that it is no longer emitting entity references but the real characters.
Comment 13 Anne 2012-01-02 12:27:25 UTC
The problem is with lxml (specifically using it once for single-page and then using the output of that for multi-page). James made a fix in Anolis so when Hixie generates a new copy it should go fine.
Comment 14 Anne 2012-01-02 12:28:21 UTC
Hixie, it would still be useful if you could answer comment 10, though.
Comment 15 Ian 'Hixie' Hickson 2012-01-13 00:13:00 UTC
link-fixup.js is a static script. I now symlink my own copy in, so it's no longer an issue.
Comment 16 Mathias Bynens 2012-05-30 09:05:02 UTC
*** Bug 17170 has been marked as a duplicate of this bug. ***
Comment 17 Edward O'Connor 2012-10-03 15:51:26 UTC
I will address this along with <https://www.w3.org/Bugs/Public/show_bug.cgi?id=14430> and <https://www.w3.org/Bugs/Public/show_bug.cgi?id=18232>, as I believe they are all caused by the same underlying problem.
Comment 18 Edward O'Connor 2012-10-12 21:54:35 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: https://github.com/w3c/html/commit/573ee3cd9b07533b66ff6fa6ca8b6eaf2a27d8bf
Rationale: Fixed.
Comment 19 Edward O'Connor 2013-05-14 18:54:58 UTC
Status: Accepted
Change Description: https://github.com/w3c/html-tools/commit/5f9f4a20b520da183084bb6a0c28cdac869ba0cc
Rationale: Fixed (again, as the fix for bug 20702 broke this).
Comment 20 David Carlisle 2013-05-14 21:04:40 UTC
Thanks, closing. Confirmed fixed in (at least)
http://www.w3.org/html/wg/drafts/html/master/syntax.html#named-character-references