This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 18232 - `entities.json` uses invalid syntax and has incorrect content
Summary: `entities.json` uses invalid syntax and has incorrect content
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: Sam Ruby
QA Contact: HTML WG Bugzilla archive list
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-18 17:46 UTC by contributor
Modified: 2012-10-08 19:03 UTC
CC List: 8 users

See Also:


Attachments

Description contributor 2012-07-18 17:46:22 UTC
This was cloned from bug 17490 as part of operation convergence.
Originally filed: 2012-06-14 19:29:00 +0000

================================================================================
 #0   contributor@whatwg.org                          2012-06-14 19:29:07 +0000 
--------------------------------------------------------------------------------
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html
Multipage: http://www.whatwg.org/C#named-character-references
Complete: http://www.whatwg.org/c#named-character-references

Comment:
`entities.json` uses invalid syntax and has incorrect content

Posted from: 78.20.165.163 by mathias@qiwi.be
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1173.0 Safari/537.1
================================================================================
 #1   Mathias Bynens                                  2012-06-14 19:30:30 +0000 
--------------------------------------------------------------------------------
Created attachment 1144
Valid, working version

Based on http://mathias.html5.org/tests/html/named-character-references/data.json
================================================================================
 #2   Mathias Bynens                                  2012-06-15 07:27:19 +0000 
--------------------------------------------------------------------------------
http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json 
currently has the following format:

    {
      "&AElig": { "codepoints": [0x000C6], "characters": "\u00C6" },
      …
    }

However, hexadecimal integer literals (although valid in JavaScript) aren’t
allowed in JSON.

The easiest solution would be to use the numerical value in decimal notation
instead, e.g. `198` instead of `0x000C6`.
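As a minimal sketch of the decimal-notation fix, assuming the generator starts from hex digit strings such as `000C6` (the form in the example above), Python's `int(..., 16)` followed by `json.dumps` yields a conformant entry. `decimal_entry` is a hypothetical helper, not part of the spec tooling.

```python
import json

def decimal_entry(name: str, hex_codepoints: list[str]) -> str:
    """Build one JSON entry with codepoints written as decimal integers.

    Hypothetical helper: hex_codepoints holds hex digit strings like "000C6".
    """
    codepoints = [int(h, 16) for h in hex_codepoints]  # "000C6" -> 198
    characters = "".join(chr(cp) for cp in codepoints)
    # json.dumps only ever emits decimal numbers, so the result is valid JSON
    return json.dumps({name: {"codepoints": codepoints, "characters": characters}})

print(decimal_entry("&AElig", ["000C6"]))
```

Here `codepoints` comes out as `[198]`, the decimal value suggested above.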

Another solution would be to make the `codepoints` property an array of strings
instead of hexadecimal integers.
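The string alternative can be sketched the same way. The comment does not fix a spelling for the strings, so this assumes values such as `"0x000C6"`; consumers would convert them back with `int(s, 16)`, which accepts the `0x` prefix. `parse_string_codepoints` is a hypothetical name.

```python
import json

# Entry using the string form of codepoints (assumed "0x..." spelling).
entry_text = '{"&AElig": {"codepoints": ["0x000C6"], "characters": "\\u00C6"}}'

def parse_string_codepoints(text: str) -> dict[str, list[int]]:
    """Parse entries whose codepoints are hex strings and return them as ints."""
    data = json.loads(text)  # valid JSON: the hex values are quoted strings
    return {name: [int(cp, 16) for cp in info["codepoints"]]
            for name, info in data.items()}

print(parse_string_codepoints(entry_text))
```

The trade-off is that every consumer has to do this conversion, whereas decimal integers are usable directly.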

(You can check for JSON conformance using a tool like http://jsonlint.com/.)
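For a local check instead of an online validator, Python's standard `json` module can serve as a quick stand-in. It does accept a few non-standard tokens such as `NaN`, but it rejects hex literals. The sketch below uses a hypothetical `is_valid_json` helper on the current and decimal forms of one entry.

```python
import json

def is_valid_json(text: str) -> bool:
    """Return True if text parses with Python's standard JSON parser."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

hex_form = '{"&AElig": {"codepoints": [0x000C6], "characters": "\\u00C6"}}'
decimal_form = '{"&AElig": {"codepoints": [198], "characters": "\\u00C6"}}'

print(is_valid_json(hex_form))      # 0x... is not a JSON number
print(is_valid_json(decimal_form))
```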
================================================================================
 #3   Mathias Bynens                                  2012-06-15 07:41:41 +0000 
--------------------------------------------------------------------------------
Possible fix for `entity-processor-json.py`:

Replace:

    codes = '0x' + value[1:6] + ', 0x' + value[7:]

With:

    codes = str(int(value[1:6], 16)) + ', ' + str(int(value[7:], 16))

And replace:

    codes = '0x' + value[1:]

With:

    codes = str(int(value[1:], 16))
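The replacements above can be exercised in isolation. This sketch assumes the source ids look like `U000C6` or `U0003C-020D2` (a `U`, five hex digits, and optionally a one-character separator plus five more), which is what the `value[1:6]` / `value[7:]` slicing implies; `codepoints_from_id` and `json_entry` are hypothetical names, not functions from `entity-processor-json.py`. Instead of splicing decimal strings together by hand, it lets `json.dumps` serialize the list, which also escapes characters above U+FFFF as surrogate pairs.

```python
import json

def codepoints_from_id(value: str) -> list[int]:
    """Return decimal codepoints for an id like 'U000C6' or 'U0003C-020D2'.

    Assumed layout (inferred from the value[1:6] / value[7:] slicing):
    'U', five hex digits, then optionally one separator and five more.
    """
    if len(value) > 6:
        return [int(value[1:6], 16), int(value[7:], 16)]
    return [int(value[1:], 16)]

def json_entry(name: str, value: str) -> str:
    """Serialize one entity with json.dumps instead of hand-built strings."""
    codepoints = codepoints_from_id(value)
    characters = "".join(chr(cp) for cp in codepoints)
    # json.dumps writes integers in decimal and escapes non-ASCII characters,
    # using surrogate pairs for code points above U+FFFF.
    return json.dumps({name: {"codepoints": codepoints, "characters": characters}})

print(json_entry("&AElig", "U000C6"))
print(json_entry("&nvlt;", "U0003C-020D2"))
```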
================================================================================
 #4   Mathias Bynens                                  2012-06-16 07:14:21 +0000 
--------------------------------------------------------------------------------
Heads up: both http://www.whatwg.org/specs/web-apps/current-work/entities.json and http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json still show the old, invalid version.
================================================================================
Comment 1 David Carlisle 2012-07-19 10:52:37 UTC
A JSON format was also requested in bug 17994. As noted there, there is a version available from

http://www.w3.org/2003/entities/2007/htmlmathml.json

This differs from the version that has been added to the spec in that it doesn't provide the values as integers, just as character strings (although it could do both if that is useful?), and it more clearly distinguishes the entities without semicolons (which is useful for XML use, as they aren't valid there).
Bug 17994 can probably be closed in favour of this bug, as there is now a JSON link in the spec.
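Two points in this comment are easy to illustrate: integer codepoints can be recovered from a character string, and names without a trailing semicolon can be filtered out for XML use. The sketch assumes a dict keyed by names such as `&AElig;` / `&AElig` with a `characters` field, as in the spec's file; it does not assume anything about the layout of `htmlmathml.json`, and the helper names are hypothetical.

```python
def codepoints_from_characters(chars: str) -> list[int]:
    """Recover integer codepoints from a character string (one per code point)."""
    return [ord(c) for c in chars]

def xml_usable(entities: dict[str, dict]) -> dict[str, dict]:
    """Keep only names ending in ';' (semicolon-less forms aren't valid in XML)."""
    return {name: info for name, info in entities.items() if name.endswith(";")}

sample = {
    "&AElig;": {"characters": "\u00C6"},
    "&AElig": {"characters": "\u00C6"},
    "&nvlt;": {"characters": "<\u20D2"},
}

for name, info in xml_usable(sample).items():
    print(name, codepoints_from_characters(info["characters"]))
```

Because Python strings are sequences of code points, `ord` on each character gives the same integers the spec's `codepoints` field carries, including multi-codepoint entities like `&nvlt;`.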
Comment 3 Edward O'Connor 2012-10-03 01:19:31 UTC
I'll look at this at the same time as <https://www.w3.org/Bugs/Public/show_bug.cgi?id=14430>.
Comment 4 Mathias Bynens 2012-10-03 06:52:49 UTC
(In reply to comment #3)
> I'll look at this at the same time as
> <https://www.w3.org/Bugs/Public/show_bug.cgi?id=14430>.

See comment #2 — you could just merge in the new versions of the files mentioned there. Problem solved.
Comment 5 Sam Ruby 2012-10-08 19:03:38 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If
you are satisfied with this response, please change the state of
this bug to CLOSED. If you have additional information and would
like the Editor to reconsider, please reopen this bug. If you would
like to escalate the issue to the full HTML Working Group, please
add the TrackerRequest keyword to this bug, and suggest title and
text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this
document:   http://dev.w3.org/html5/decision-policy/decision-policy-v2.html

Status: Accepted
Change Description:
https://github.com/w3c/html/commit/ad9564f1a335d0601427637879a0eafb7f0aecce
Rationale: accepted WHATWG change

Additional comments:

Probable original source for entity-processor-json.py:

http://damowmow.com/temp/entity-processor-json.txt

Unclear where to find unicode.xml, choosing to parse boilerplate/entities.inc instead.

Output produced matches the following, modulo sort order:

http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json
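One way to confirm a match "modulo sort order" is to parse both files and compare the resulting objects, since Python dict equality ignores key order. `same_modulo_order` and `canonical` are hypothetical helpers, and the two strings are inline samples, not the real downloads. Writing the output with `json.dumps(..., sort_keys=True)` would also make the files byte-comparable.

```python
import json

def same_modulo_order(a_text: str, b_text: str) -> bool:
    """True if two JSON documents hold the same data, ignoring key order."""
    return json.loads(a_text) == json.loads(b_text)

def canonical(text: str) -> str:
    """Re-serialize with sorted keys so equal data yields identical text."""
    return json.dumps(json.loads(text), sort_keys=True)

generated = '{"&amp;": {"codepoints": [38], "characters": "&"}, "&AElig": {"codepoints": [198], "characters": "\\u00C6"}}'
published = '{"&AElig": {"characters": "\\u00C6", "codepoints": [198]}, "&amp;": {"characters": "&", "codepoints": [38]}}'

print(same_modulo_order(generated, published))
print(canonical(generated) == canonical(published))
```

Array order still matters in this comparison, which is the desired behaviour: the order of codepoints within one entity is significant.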