This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 10801 - Limit the number of iterations in the loops in the AAA
Summary: Limit the number of iterations in the loops in the AAA
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All
OS: All
Importance: P1 critical
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-29 11:02 UTC by Henri Sivonen
Modified: 2010-10-19 07:27 UTC
CC: 8 users

See Also:



Description Henri Sivonen 2010-09-29 11:02:06 UTC
WebKit currently limits the number of iterations in the loops in the AAA. Gecko has corresponding patch waiting to land.

For interop, the spec should have normative limits here, and they should be as low as compat with existing content permits.

I'll suggest specific limits when I've analyzed the data Philip kindly provided on this topic.
Comment 1 Ian 'Hixie' Hickson 2010-09-30 09:18:31 UTC
I considered this when first writing the parser (it's not just the AAA where you really want limits; it's actually reasonable to put limits on the depth of the stack in general), but I was of the opinion then that that would fall under the "hardware limitations" clause and be left up to the UAs, since the limits would grow over time.

Also, there are different ways to handle limits. The most clever way to handle an infinite depth of <font>s, for example, is probably to drop some of the <font>s from the middle of the stack, preferably those with no attributes, or at least with no event handlers and no IDs.

It might make sense to suggest places for limits and how to handle them, though, especially if it is important for interop. If you have good suggestions for what to do there, please reopen the bug. I'd be happy to consider them.


EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Did Not Understand Request
Change Description: no spec change
Rationale: Closing temporarily pending further data.
Comment 2 Henri Sivonen 2010-10-13 12:32:22 UTC
(In reply to comment #1)
> I considered this when first writing the parser (it's not just the AAA where
> you really want limits; it's actually reasonable to put limits on the depth
> of the stack in general), but I was of the opinion then that that would fall
> under the "hardware limitations" clause and be left up to the UAs, since the
> limits would grow over time.

It's not clear that we'd want to increase the limits over time: supporting ever crazier legacy content yields diminishing returns as time goes on.

> Also, there are different ways to handle limits. The most clever way to handle
> an infinite depth of <font>s, for example, is probably to drop some of the
> <font>s from the middle of the stack, preferably those with no attributes, or
> at least with no event handlers and no IDs.

The way both Gecko and WebKit currently handle AAA limits is that the inner loop and the outer loop each have a cap on the maximum number of iterations per loop entry. If the cap is hit, the code exits the loop without any fancy cleanup.

> It might make sense to suggest places for limits and how to handle them,
> though, especially if it is important for interop. If you have good suggestions
> for what to do there, please reopen the bug. I'd be happy to consider them.

Unless there's a good reason to do something more fancy than what Gecko and WebKit do, I suggest picking a number for the max iterations of the outer loop per entry to the loop and another number for the max iterations of the inner loop per loop entry.

Philip ran an instrumented parser over 422814 pages that parsed successfully. Here's an analysis of that data:

maxInnerLoopEntries (cutoff: 0.999900)
0.9110: <= 0
0.9940: <= 1
0.9989: <= 2
0.9995: <= 3
0.9997: <= 4
0.9998: <= 5
0.9998: <= 6
0.9999: <= 7
0.9999: <= 8
Max: 114

maxOuterLoopEntries (cutoff: 0.999900)
0.0506: <= 0
0.9110: <= 1
0.9677: <= 2
0.9813: <= 3
0.9889: <= 4
0.9953: <= 5
0.9978: <= 6
0.9989: <= 7
0.9994: <= 8
0.9996: <= 9
0.9998: <= 10
0.9998: <= 11
0.9999: <= 12
0.9999: <= 13
Max: 88

This means that 5% of the pages didn't run the AAA at all. 99.53% of the pages ran the outer loop 5 or fewer times. The largest number of outer loop iterations was 88.

In Gecko and WebKit, the limits are 10 iterations for the inner loop and 10 iterations for the outer loop. It seems to me it would be safe to pick e.g. 8 iterations for the outer loop and 3 for the inner.
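The capping scheme described above can be sketched as follows. This is an illustrative Python model, not actual Gecko or WebKit code; the names are hypothetical, and the condition callables stand in for the spec's real loop conditions, which depend on the stack of open elements and the list of active formatting elements.

```python
# Illustrative sketch of per-entry iteration caps on the AAA loops;
# names and structure are hypothetical, not taken from browser source.
OUTER_CAP = 8  # suggested max outer-loop iterations per entry
INNER_CAP = 3  # suggested max inner-loop iterations per entry

def capped_adoption_agency(should_continue_outer, should_continue_inner):
    """Return (outer_iterations_run, hit_outer_cap).

    When a cap is hit, the loop is simply exited and control falls
    through to the code after it, with no extra cleanup.
    """
    outer = 0
    hit_cap = False
    while should_continue_outer(outer):
        if outer == OUTER_CAP:
            hit_cap = True  # cap hit: bail out, no fancy cleanup
            break
        outer += 1
        inner = 0
        while should_continue_inner(inner):
            if inner == INNER_CAP:
                break  # same pattern, independently, for the inner loop
            inner += 1
    return outer, hit_cap

# A pathological page whose unbounded AAA would loop indefinitely is
# simply truncated at the cap:
iters, capped = capped_adoption_agency(lambda n: True, lambda n: False)
```

Pages that finish within the caps are unaffected, which fits the data above: a cap of 8 covers the outer-loop needs of 99.94% of the sampled pages.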
Comment 3 Henri Sivonen 2010-10-13 13:48:15 UTC
I did some further testing. I implemented my suggestion from this bug and bug 10802. Then I extracted a list of pages that exceeded the limits from Philip's data, and loaded 24 such pages in the build with the limits in place and in another browser. I saw no breakage in the build with the limits.

My choice of 24 pages wasn't random. I tried to pick pages where I could guess from the URL that they were unlikely to be filth I don't want to see.
Comment 4 Maciej Stachowiak 2010-10-13 17:43:31 UTC
I think defining limits is a good idea. It's true that hardware gets faster over time. However, if neither loop of the AAA is limited, there is content whose parse time grows quadratically with its size and that may hang the parser. This has a few consequences:

1) It's not sufficiently Web-compatible to implement AAA with no limits; there will be occasional pages that hang for minutes at a time.

2) Faster hardware will not allow the limits to be substantially increased, because of the quadratic growth problem. There will always be plausible-sized pages that can cause a hang at higher limits.

Since browsers have to have limits and will likely need them indefinitely, it's better for long-term interop if browsers agree on the limits and if the spec reflects that agreement. Since it's unlikely the limits can be increased a great deal, and since raising them quickly runs into diminishing returns as far as better compatibility goes, it is not very helpful to retain the freedom to raise the limits later.
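The quadratic-growth point can be made concrete with a toy cost model (hypothetical numbers, not measurements from any browser): suppose the i-th mismatched end tag in a pathological page would drive up to i outer-loop iterations when uncapped, because the structures the loop walks grow with the document. Total work is then quadratic in page size without a cap, but linear with one.

```python
# Toy cost model of uncapped vs. capped AAA work; the linear-growth
# assumption for per-tag cost is illustrative, not measured.
def total_aaa_work(num_mismatched_tags, cap=None):
    """The i-th mismatched end tag costs up to i outer-loop iterations
    uncapped; a cap truncates each tag's cost to at most `cap`."""
    work = 0
    for i in range(1, num_mismatched_tags + 1):
        work += i if cap is None else min(i, cap)
    return work
```

Under the quadratic model, doubling the work budget (faster hardware) only buys about 41% more page size, which is why faster hardware cannot raise the limits substantially.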
Comment 5 Ian 'Hixie' Hickson 2010-10-13 18:34:14 UTC
In practice, the limits have in fact been increasing over time (once or twice a decade currently). However, we could always put the limit in the spec and increase it there, I guess.

Could you elaborate on what limit you would like to see specified? What should happen when the loop count has been exceeded? In particular, I'm interested in what you think should happen if the loop is aborted after some changes have been made but not all the changes that would be made if it was run to completion.

Is the goal to simply stop creating elements? How would this handle a page like the ASCII art colour Tux constructed with nothing but one opening <font> per character, or pages that use nested <em>s for "semantic" tag clouds?
Comment 6 Henri Sivonen 2010-10-14 11:49:20 UTC
(In reply to comment #5)
> Could you elaborate on what limit you would like to see specified? What should
> happen when the loop count has been exceeded?

I thought I covered these in comment 2.

I'm suggesting that the outer loop be run at most for 8 iterations per entry into the outer loop. I'm suggesting that the inner loop be run at most for 3 iterations per entry into the inner loop. When the limit is hit, I'm suggesting running the code after the loop without any more fancy action than that.

> Is the goal to simply stop creating elements? How would this handle a page like
> the ASCII art colour Tux constructed with nothing but one opening <font> per
> character, or pages that use nested <em>s for "semantic" tag clouds?

Is either of these a problem when the end tag matches the current element?
Comment 7 Maciej Stachowiak 2010-10-14 17:51:09 UTC
(In reply to comment #5)
> In practice, the limits have in fact been increasing over time (once or twice a
> decade currently). However, we could always put the limit in the spec and
> increase it there, I guess.

AFAIK WebKit is the only browser to implement some form of the Adoption Agency Algorithm prior to the HTML5 standard parser. We have only been around about a decade, and I don't think we have a history of increasing the limits or a desire to do so in the future. In general, the limits have changed from (effectively) infinity to finite values. I do think the values may have changed to higher ones in the rewrite to the HTML5 parser, but otherwise the trend is in the other direction.

> 
> Could you elaborate on what limit you would like to see specified? What should
> happen when the loop count has been exceeded? In particular, I'm interested in
> what you think should happen if the loop is aborted after some changes have
> been made but not all the changes that would be made if it was run to
> completion.

Henri's explanation seems adequate; perhaps Adam can add more detail.

> 
> Is the goal to simply stop creating elements? How would this handle a page like
> the ASCII art colour Tux constructed with nothing but one opening <font> per
> character, or pages that use nested <em>s for "semantic" tag clouds?

I don't believe the AAA is invoked at all in these cases; only mismatched end tags invoke it. So the limits would not kick in. Deep self-nesting can be a problem in its own right, but it scales linearly, not quadratically, and can occur even in valid content, so it seems more reasonable to let the general "machine limits" clause handle it.
Comment 8 James Graham 2010-10-14 20:46:36 UTC
I think that we would like normative limits here too. I'm not sure how much benefit there is from picking 3/8 rather than 10/10; I would be interested to see timings for the algorithm with various caps before picking the exact values.
Comment 9 Adam Barth 2010-10-14 21:20:23 UTC
> Henri's explanation seems adequate; perhaps Adam can add more detail.

Henri's description sounds fine.

> I think that we would like normative limits here too. I'm not sure how much
> benefit there is from picking 3/8 rather than 10/10; I would be interested to
> see timings for the algorithm with various caps before picking the exact
> values.

Based on the data presented earlier in this thread, 3/8 sounds fine.  We picked 10/10 just because 10 was a round number and let us run our AAA "stress test" within the time allotted by the regression test harness.
Comment 10 Ian 'Hixie' Hickson 2010-10-19 03:18:13 UTC
For the record, by "the limits" I meant the various limits that UAs have implemented in their HTML parsers.
Comment 11 Ian 'Hixie' Hickson 2010-10-19 07:25:16 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: 

I've implemented the limits as described, but to be honest I'm having great trouble visualising what effect these limits would actually have, so I don't really have a good handle on whether this is a good change or not.
Comment 12 contributor 2010-10-19 07:27:07 UTC
Checked in as WHATWG revision r5642.
Check-in comment: specify specific limits for AAA; let me know what pages break
http://html5.org/tools/web-apps-tracker?from=5641&to=5642