This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 10802 - Limit the number of identical items on the list of active formatting elements by removing previous duplicates when adding new items
Summary: Limit the number of identical items on the list of active formatting elements...
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All All
Importance: P1 critical
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-29 11:05 UTC by Henri Sivonen
Modified: 2010-10-15 22:56 UTC
CC List: 7 users

See Also:


Description Henri Sivonen 2010-09-29 11:05:00 UTC
Please add the scheme described in http://lists.w3.org/Archives/Public/public-html/2010Sep/0163.html to the spec.

I'll suggest specific values for the tunable constants when I've analyzed the data Philip kindly provided on this topic. (I'm filing this bug now in order to have it on file before the deadline.)
Comment 1 Ian 'Hixie' Hickson 2010-09-30 09:26:25 UTC
See the comment in bug 10801. I'm skeptical about specifying a specific algorithm here.
Comment 2 Ian 'Hixie' Hickson 2010-10-12 07:48:48 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Did Not Understand Request
Change Description: no spec change
Rationale: please see bug 10801 comment 1, but s/stack/list/.
Comment 3 Henri Sivonen 2010-10-13 12:50:25 UTC
Philip ran an instrumented parser over 422814 pages that parsed successfully.
Here's an analysis of that data:

maxNonFontDuplicates (cutoff: 0.999000)
0.9422: <= 0
0.9868: <= 1
0.9928: <= 2
0.9953: <= 3
0.9965: <= 4
0.9971: <= 5
0.9975: <= 6
0.9980: <= 7
0.9983: <= 8
0.9986: <= 9
0.9987: <= 10
0.9989: <= 11
Max: 7687

maxFontDuplicates (cutoff: 0.999000)
0.9468: <= 0
0.9826: <= 1
0.9890: <= 2
0.9918: <= 3
0.9933: <= 4
0.9943: <= 5
0.9950: <= 6
0.9956: <= 7
0.9960: <= 8
0.9966: <= 9
0.9969: <= 10
0.9973: <= 11
0.9975: <= 12
0.9977: <= 13
0.9978: <= 14
0.9980: <= 15
0.9981: <= 16
0.9982: <= 17
0.9982: <= 18
0.9985: <= 19
0.9986: <= 20
0.9986: <= 21
0.9987: <= 22
0.9987: <= 23
0.9988: <= 24
0.9988: <= 25
0.9988: <= 26
0.9989: <= 27
0.9989: <= 28
0.9990: <= 29
Max: 6829
This means that when adding a non-<font> formatting element to the list of active formatting elements, on 94% of pages there was never an identical element (element name and all attribute names and values matching) already on the list *after the latest marker if any*. On 99% of pages, there were at most 2 duplicates already on the list (after the latest marker if any). The worst case seen was 7687 duplicates.

In the case of <font> duplicates, on 99% of pages, there were 3 or fewer duplicates already on the list (after the latest marker if any). The worst case seen was 6829 duplicates.
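Both tables are cumulative: each row gives the fraction of pages whose per-page maximum number of pre-existing duplicates was at most that value. A minimal sketch of how such a table, and the smallest limit that covers a target fraction of pages, could be derived. It assumes a hypothetical non-empty input list holding one maximum-duplicate count per page; Philip's actual instrumentation and data format are not shown in this bug.

```python
from collections import Counter


def cumulative_coverage(max_duplicates_per_page):
    """Map each count k to the fraction of pages whose maximum number of
    duplicates already on the list was <= k (a cumulative distribution).
    Assumes a non-empty input list (hypothetical format)."""
    total = len(max_duplicates_per_page)
    counts = Counter(max_duplicates_per_page)
    coverage = {}
    running = 0
    for k in range(max(max_duplicates_per_page) + 1):
        running += counts.get(k, 0)
        coverage[k] = running / total
    return coverage


def smallest_limit(max_duplicates_per_page, target):
    """Smallest k such that at least `target` of all pages never had more
    than k duplicates already on the list; None if target > 1."""
    for k, fraction in cumulative_coverage(max_duplicates_per_page).items():
        if fraction >= target:
            return k
    return None
```

With toy numbers (not Philip's data), `smallest_limit([0, 0, 0, 1, 2, 7], 0.8)` returns 2, since 5 of the 6 pages never exceeded 2 duplicates. A long tail like the 7687 maximum barely moves the low rows, which is why a very small limit already covers most pages.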

The worst cases are really crazy, so it makes sense to pick some limits. Furthermore, very low limits take care of the vast majority of cases. I'd be inclined not to differentiate between <font> and non-<font>, and simply to allow a maximum of two identical elements already on the list when adding a third.

Again, please see http://lists.w3.org/Archives/Public/public-html/2010Sep/0163.html for how to deal with removing duplicates.

I think it would make sense to put the limit in the spec, because it would suck if an HTML5-compliance scoring site like http://html5test.com/ put 4 identical formatting start tags in a test case and called an implementation non-conforming.
Comment 4 Henri Sivonen 2010-10-13 13:48:32 UTC
I did some further testing. I implemented my suggestion from this bug and bug 10801. Then I extracted a list of pages that exceeded the limits from Philip's data. I loaded 24 such pages both in the build with the limits in place and in another browser. I saw no breakage in the build with the limits.

My choice of 24 pages wasn't random. I tried to pick pages where I could guess from the URL that they were unlikely to be filth I don't want to see.
Comment 5 Ian 'Hixie' Hickson 2010-10-13 18:34:13 UTC
Could you elaborate on what limit you would like to see specified?
Comment 6 Henri Sivonen 2010-10-14 11:59:44 UTC
(In reply to comment #5)
> Could you elaborate on what limit you would like to see specified?

I thought I covered this in comment 3.

If, before adding an element to the list of active formatting elements, there are already more than 2 duplicates (after the last marker if any) of the element about to be added, remove the earliest one. Then proceed with adding the element that you were about to add to the list. (AFAICT, "more than 2" can only be "3".)
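A minimal Python sketch of the proposed rule, assuming hypothetical `Element` and `MARKER` stand-ins rather than any real parser's data structures. "Identical" follows comment 3 (same element name and same attribute names and values). `max_existing` is 2 as proposed here; 1 would give the stricter variant.

```python
from dataclasses import dataclass, field

MARKER = None  # hypothetical stand-in for a marker on the list


@dataclass
class Element:  # hypothetical minimal element: name plus attributes
    name: str
    attributes: dict = field(default_factory=dict)


def identical(a, b):
    """Same element name and the same attribute names and values."""
    return a.name == b.name and a.attributes == b.attributes


def push_with_limit(active, element, max_existing=2):
    """Append `element` to the list of active formatting elements, first
    removing the earliest identical entry after the last marker (if any)
    when more than `max_existing` identical entries are already there."""
    # Locate the start of the region after the last marker, if any.
    start = 0
    for i in range(len(active) - 1, -1, -1):
        if active[i] is MARKER:
            start = i + 1
            break
    duplicates = [i for i in range(start, len(active))
                  if identical(active[i], element)]
    if len(duplicates) > max_existing:
        del active[duplicates[0]]  # drop the earliest duplicate
    active.append(element)
```

Because at most one entry is removed per push, the list never holds more than `max_existing + 1` identical entries after a marker, which matches the observation that "more than 2" can only be "3".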

Additionally, please edit the AAA (adoption agency algorithm): in step 1 of the AAA, if the first "If there is no such node" check is true, abort the AAA and process the token according to the rules for "any other end tag token".
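A rough Python sketch of the requested change to step 1 only. It assumes hypothetical `Element` and `MARKER` stand-ins, and it abstracts the remaining AAA steps and the "any other end tag" rules into hypothetical callbacks. The lookup searches backwards from the end of the list and stops at the last marker.

```python
from dataclasses import dataclass, field

MARKER = None  # hypothetical stand-in for a marker on the list


@dataclass
class Element:  # hypothetical minimal element: name plus attributes
    name: str
    attributes: dict = field(default_factory=dict)


def find_formatting_element(active, tag_name):
    """Last entry between the end of the list and the last marker (or the
    start of the list) whose name matches the end tag; None if absent."""
    for entry in reversed(active):
        if entry is MARKER:
            break
        if entry.name == tag_name:
            return entry
    return None


def adoption_agency(active, tag_name, any_other_end_tag, run_rest_of_aaa):
    """Dispatch for step 1 as proposed; both callbacks are hypothetical."""
    formatting_element = find_formatting_element(active, tag_name)
    if formatting_element is None:
        # Proposed behavior: abort the AAA and use the generic rules.
        return any_other_end_tag(tag_name)
    return run_rest_of_aaa(formatting_element)
```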

(I could be persuaded that "more than 2" above should be "more than 1" instead.)
Comment 7 Ian 'Hixie' Hickson 2010-10-15 22:56:04 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment.

Status: Accepted
Change Description: see diff given below
Rationale: Reluctantly and obtusely concurred with reporter's comments.
Comment 8 contributor 2010-10-15 22:56:28 UTC
Checked in as WHATWG revision r5638.
Check-in comment: Add in some hard-coded limits for dealing with unclosed formatting elements to limit the explosive growth of the list of formatting elements in commonly-seen cases.
http://html5.org/tools/web-apps-tracker?from=5637&to=5638