<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>18232</bug_id>
          
          <creation_ts>2012-07-18 17:46:22 +0000</creation_ts>
          <short_desc>`entities.json` uses invalid syntax and has incorrect content</short_desc>
          <delta_ts>2012-10-08 19:03:38 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>HTML5 spec</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          <see_also>https://www.w3.org/Bugs/Public/show_bug.cgi?id=12539</see_also>
    
    <see_also>https://www.w3.org/Bugs/Public/show_bug.cgi?id=14430</see_also>
          <bug_file_loc>http://www.whatwg.org/specs/web-apps/current-work/#named-character-references</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Sam Ruby">rubys</assigned_to>
          <cc>davidc</cc>
    
    <cc>eoconnor</cc>
    
    <cc>ian</cc>
    
    <cc>mathias</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>rubys</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>70930</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2012-07-18 17:46:22 +0000</bug_when>
    <thetext>This was cloned from bug 17490 as part of operation convergence.
Originally filed: 2012-06-14 19:29:00 +0000

================================================================================
 #0   contributor@whatwg.org                          2012-06-14 19:29:07 +0000 
--------------------------------------------------------------------------------
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html
Multipage: http://www.whatwg.org/C#named-character-references
Complete: http://www.whatwg.org/c#named-character-references

Comment:
`entities.json` uses invalid syntax and has incorrect content

Posted from: 78.20.165.163 by mathias@qiwi.be
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1173.0 Safari/537.1
================================================================================
 #1   Mathias Bynens                                  2012-06-14 19:30:30 +0000 
--------------------------------------------------------------------------------
Created attachment 1144
Valid, working version

Based on http://mathias.html5.org/tests/html/named-character-references/data.json
================================================================================
 #2   Mathias Bynens                                  2012-06-15 07:27:19 +0000 
--------------------------------------------------------------------------------
http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json 
currently has the following format:

    {
      &quot;&amp;AElig&quot;: { &quot;codepoints&quot;: [0x000C6], &quot;characters&quot;: &quot;\u00C6&quot; },
      …
    }

However, hexadecimal integer literals (although valid in JavaScript) aren’t
allowed in JSON.

The easiest solution would be to use the numerical value in decimal notation
instead, e.g. `198` instead of `0x000C6`.

Another solution would be to make the `codepoints` property an array of strings
instead of hexadecimal integers.

(You can check for JSON conformance using a tool like http://jsonlint.com/.)
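
The same check can be run locally. The following is a minimal sketch, assuming Python's standard `json` module; the two strings are cut-down, hypothetical stand-ins for a single entry (only the `codepoints` property), not the real file contents, and `is_valid_json` is a helper name made up for this example.

```python
import json

# Hypothetical one-property objects modelled on the format quoted above.
hex_form = '{"codepoints": [0x000C6]}'   # hexadecimal literal, as in the current file
decimal_form = '{"codepoints": [198]}'   # same value in decimal notation


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


print(is_valid_json(hex_form))
print(is_valid_json(decimal_form))
```

The hexadecimal form is rejected by the parser, while the decimal form is accepted.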
================================================================================
 #3   Mathias Bynens                                  2012-06-15 07:41:41 +0000 
--------------------------------------------------------------------------------
Possible fix for `entity-processor-json.py`:

Replace:

    codes = &apos;0x&apos; + value[1:6] + &apos;, 0x&apos; + value[7:]

With:

    codes = str(int(value[1:6], 16)) + &apos;, &apos; + str(int(value[7:], 16))

And replace:

    codes = &apos;0x&apos; + value[1:]

With:

    codes = str(int(value[1:], 16))
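
Put together, the two replacements could look roughly like the sketch below. This is not the actual script: the function name `decimal_codes` is hypothetical, and the input shape (a one-character prefix followed by five hex digits, optionally a separator and a second group of hex digits, e.g. `U000C6` or `U0223D-00331`) is an assumption inferred from the slices above. The hyphen test used to pick a branch is likewise an assumption.

```python
import json


def decimal_codes(value):
    """Turn a code point id into a comma-separated list of decimal integers.

    Assumed input shape (not confirmed by the script itself):
    "U000C6" for one code point, "U0223D-00331" for two.
    """
    if '-' in value:
        # Two code points: five hex digits, a separator, then the rest.
        return str(int(value[1:6], 16)) + ', ' + str(int(value[7:], 16))
    # One code point: everything after the one-character prefix.
    return str(int(value[1:], 16))


# The result can be dropped straight into a JSON array.
codes = decimal_codes('U0223D-00331')
print(json.loads('[' + codes + ']'))
```

Because the output contains only decimal digits, commas and spaces, wrapping it in square brackets gives a valid JSON array.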
================================================================================
 #4   Mathias Bynens                                  2012-06-16 07:14:21 +0000 
--------------------------------------------------------------------------------
Heads up: both http://www.whatwg.org/specs/web-apps/current-work/entities.json and http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json still show the old, invalid version.
================================================================================</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>71147</commentid>
    <comment_count>1</comment_count>
    <who name="David Carlisle">davidc</who>
    <bug_when>2012-07-19 10:52:37 +0000</bug_when>
    <thetext>A JSON format was also requested in bug 17994. As noted there, a version is available from


http://www.w3.org/2003/entities/2007/htmlmathml.json

This differs from the version that has been added to the spec in that it doesn&apos;t provide the values as integers, only as character strings (although it could provide both if that is useful), and it more clearly distinguishes the entities without semicolons (which is useful for XML use, as they aren&apos;t valid there).
Bug 17994 can probably be closed in favour of this bug, as there is now a JSON link in the spec.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>74111</commentid>
    <comment_count>2</comment_count>
    <who name="Mathias Bynens">mathias</who>
    <bug_when>2012-09-20 06:18:30 +0000</bug_when>
    <thetext>http://www.whatwg.org/specs/web-apps/current-work/entities.json
and http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json have been updated and are now valid JSON.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>75186</commentid>
    <comment_count>3</comment_count>
    <who name="Edward O&apos;Connor">eoconnor</who>
    <bug_when>2012-10-03 01:19:31 +0000</bug_when>
    <thetext>I&apos;ll look at this at the same time as &lt;https://www.w3.org/Bugs/Public/show_bug.cgi?id=14430&gt;.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>75196</commentid>
    <comment_count>4</comment_count>
    <who name="Mathias Bynens">mathias</who>
    <bug_when>2012-10-03 06:52:49 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; I&apos;ll look at this at the same time as
&gt; &lt;https://www.w3.org/Bugs/Public/show_bug.cgi?id=14430&gt;.

See comment #2 — you could just merge in the new versions of the files mentioned there. Problem solved.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>75609</commentid>
    <comment_count>5</comment_count>
    <who name="Sam Ruby">rubys</who>
    <bug_when>2012-10-08 19:03:38 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If
you are satisfied with this response, please change the state of
this bug to CLOSED. If you have additional information and would
like the Editor to reconsider, please reopen this bug. If you would
like to escalate the issue to the full HTML Working Group, please
add the TrackerRequest keyword to this bug, and suggest title and
text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this
document:   http://dev.w3.org/html5/decision-policy/decision-policy-v2.html

Status: Accepted
Change Description:
https://github.com/w3c/html/commit/ad9564f1a335d0601427637879a0eafb7f0aecce
Rationale: accepted WHATWG change

Additional comments:

Probable original source for entity-processor-json.py:

http://damowmow.com/temp/entity-processor-json.txt

It is unclear where to find unicode.xml, so I chose to parse boilerplate/entities.inc instead.

The output produced matches the following, modulo sort order:

http://www.whatwg.org/specs/web-apps/current-work/multipage/entities.json</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>