<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>10802</bug_id>
          
          <creation_ts>2010-09-29 11:05:00 +0000</creation_ts>
          <short_desc>Limit the number of identical items on the list of active formatting elements by removing previous duplicates when adding new items</short_desc>
          <delta_ts>2010-10-15 22:56:28 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>pre-LC1 HTML5 spec (editor: Ian Hickson)</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P1</priority>
          <bug_severity>critical</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Henri Sivonen">hsivonen</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>ian</cc>
    
    <cc>james</cc>
    
    <cc>jonas</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>w3c</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>39924</commentid>
    <comment_count>0</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-09-29 11:05:00 +0000</bug_when>
    <thetext>Please add the scheme described in http://lists.w3.org/Archives/Public/public-html/2010Sep/0163.html to the spec.

I&apos;ll suggest specific values for the tunable constants when I&apos;ve analyzed the data Philip kindly provided on this topic. (I&apos;m filing this bug now in order to have it on file before the deadline.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40117</commentid>
    <comment_count>1</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-30 09:26:25 +0000</bug_when>
    <thetext>See the comment in bug 10801. I&apos;m skeptical about specifying a specific algorithm here.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40999</commentid>
    <comment_count>2</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-10-12 07:48:48 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Did Not Understand Request
Change Description: no spec change
Rationale: please see bug 10801 comment 1, but s/stack/list/.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41221</commentid>
    <comment_count>3</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-10-13 12:50:25 +0000</bug_when>
    <thetext>Philip ran an instrumented parser over 422814 pages that parsed successfully.
Here&apos;s an analysis of that data:

maxNonFontDuplicates (cutoff: 0.999000)
0.9422: &lt;= 0
0.9868: &lt;= 1
0.9928: &lt;= 2
0.9953: &lt;= 3
0.9965: &lt;= 4
0.9971: &lt;= 5
0.9975: &lt;= 6
0.9980: &lt;= 7
0.9983: &lt;= 8
0.9986: &lt;= 9
0.9987: &lt;= 10
0.9989: &lt;= 11
Max: 7687

maxFontDuplicates (cutoff: 0.999000)
0.9468: &lt;= 0
0.9826: &lt;= 1
0.9890: &lt;= 2
0.9918: &lt;= 3
0.9933: &lt;= 4
0.9943: &lt;= 5
0.9950: &lt;= 6
0.9956: &lt;= 7
0.9960: &lt;= 8
0.9966: &lt;= 9
0.9969: &lt;= 10
0.9973: &lt;= 11
0.9975: &lt;= 12
0.9977: &lt;= 13
0.9978: &lt;= 14
0.9980: &lt;= 15
0.9981: &lt;= 16
0.9982: &lt;= 17
0.9982: &lt;= 18
0.9985: &lt;= 19
0.9986: &lt;= 20
0.9986: &lt;= 21
0.9987: &lt;= 22
0.9987: &lt;= 23
0.9988: &lt;= 24
0.9988: &lt;= 25
0.9988: &lt;= 26
0.9989: &lt;= 27
0.9989: &lt;= 28
0.9990: &lt;= 29
Max: 6829

This means that when a non-&lt;font&gt; formatting element was added to the list of active formatting elements, on 94% of pages there was not already an identical element (same element name and all attribute names and values matching) on the list *after the latest marker, if any*. On 99% of pages, there were 2 or fewer duplicates already on the list (after the latest marker, if any). The worst case seen was 7687 duplicates.

In the case of &lt;font&gt; duplicates, on 99% of pages, there were 3 or fewer duplicates already on the list (after the latest marker if any). The worst case seen was 6829 duplicates.
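
A minimal sketch of how these thresholds can be read off the cumulative tables above. The numbers are copied from the tables; the names NON_FONT, FONT and smallest_limit are hypothetical and only serve the illustration.

```python
from bisect import bisect_left

# Cumulative fractions copied from the tables above.
# Index i holds the fraction of pages with i or fewer duplicates.
NON_FONT = [0.9422, 0.9868, 0.9928, 0.9953, 0.9965, 0.9971,
            0.9975, 0.9980, 0.9983, 0.9986, 0.9987, 0.9989]
FONT = [0.9468, 0.9826, 0.9890, 0.9918, 0.9933, 0.9943,
        0.9950, 0.9956, 0.9960, 0.9966, 0.9969, 0.9973,
        0.9975, 0.9977, 0.9978, 0.9980, 0.9981, 0.9982,
        0.9982, 0.9985, 0.9986, 0.9986, 0.9987, 0.9987,
        0.9988, 0.9988, 0.9988, 0.9989, 0.9989, 0.9990]


def smallest_limit(cumulative, target):
    # The tables are non-decreasing, so a binary search finds the first
    # duplicate count whose cumulative fraction reaches the target.
    index = bisect_left(cumulative, target)
    # None means the (truncated) table never reaches the target.
    return index if index != len(cumulative) else None
```

With a target of 0.99, this gives 2 for the non-font table and 3 for the font table, matching the figures quoted in the text.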

The worst cases are really crazy, so it makes sense to pick some limits. Furthermore, very low limits take care of the vast majority of cases. I&apos;d be inclined not to differentiate between &lt;font&gt; and non-&lt;font&gt;, and to simply allow a maximum of two identical elements already on the list when adding a third.

Again, please see http://lists.w3.org/Archives/Public/public-html/2010Sep/0163.html for how to deal with removing duplicates.

I think it would make sense to put the limit in the spec, because it would suck if an HTML5-compliance scoring site like http://html5test.com/ put 4 identical formatting start tags in a test case and called an implementation non-conforming.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41227</commentid>
    <comment_count>4</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-10-13 13:48:32 +0000</bug_when>
    <thetext>I did some further testing. I implemented my suggestions from this bug and bug 10801. I then extracted from Philip&apos;s data a list of pages that exceeded the limits, and loaded 24 such pages both in the build with the limits in place and in another browser. I saw no breakage in the build with the limits.

My choice of 24 pages wasn&apos;t random. I tried to pick pages where I could guess from the URL that they were unlikely to be filth I don&apos;t want to see.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41250</commentid>
    <comment_count>5</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-10-13 18:34:13 +0000</bug_when>
    <thetext>Could you elaborate on what limit you would like to see specified?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41358</commentid>
    <comment_count>6</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-10-14 11:59:44 +0000</bug_when>
    <thetext>(In reply to comment #5)
&gt; Could you elaborate on what limit you would like to see specified?

I thought I covered this in comment 3.

If, before adding an element to the list of active formatting elements, there are already more than 2 duplicates (after the last marker, if any) of the element about to be added to the list, remove the earliest one. Then proceed with adding the element that you were about to add to the list. (AFAICT, &quot;more than 2&quot; can only be &quot;3&quot;.)
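
The rule above can be sketched roughly as follows. This is only an illustration, not spec text: the names MARKER, MAX_EARLIER_DUPLICATES and push_formatting_element are hypothetical, and elements are assumed to be modelled as (name, attributes) tuples so that equality means the same element name and the same attribute names and values.

```python
# Hypothetical sentinel standing in for a marker entry on the list.
MARKER = object()

# Assumed limit: at most this many duplicates may already be present
# without triggering a removal.
MAX_EARLIER_DUPLICATES = 2


def push_formatting_element(active, element):
    # Only entries after the last marker (if any) are considered.
    start = 0
    for index, entry in enumerate(active):
        if entry is MARKER:
            start = index + 1

    # Positions of entries identical to the element about to be added.
    duplicates = [i for i in range(start, len(active))
                  if active[i] == element]

    # A non-empty slice means there are more than 2 duplicates already,
    # so the earliest one is removed before the new element is added.
    if duplicates[MAX_EARLIER_DUPLICATES:]:
        del active[duplicates[0]]

    active.append(element)
```

Pushing the same element repeatedly onto an empty list therefore never leaves more than three copies on it, and copies before a marker are not counted.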

Additionally, please edit the AAA: In step #1 of the AAA, if the first &quot;If there is no such node&quot; check is true, abort the AAA and process the token according to the rules for &quot;any other end tag token&quot;.

(I could be persuaded that &quot;more than 2&quot; above should be &quot;more than 1&quot; instead.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41445</commentid>
    <comment_count>7</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-10-15 22:56:04 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: Reluctantly and obtusely concurred with reporter&apos;s comments.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41446</commentid>
    <comment_count>8</comment_count>
    <who name="">contributor</who>
    <bug_when>2010-10-15 22:56:28 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r5638.
Check-in comment: Add in some hard-coded limits for dealing with unclosed formatting elements to limit the explosive growth of the list of formatting elements in commonly-seen cases.
http://html5.org/tools/web-apps-tracker?from=5637&amp;to=5638</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>