Bug 17871 - Element attributes should not be required to be stored in an ordered list
Element attributes should not be required to be stored in an ordered list
Status: REOPENED
Product: WebAppsWG
Classification: Unclassified
Component: DOM
unspecified
Other other
: P3 normal
: ---
Assigned To: Anne
public-webapps-bugzilla
:
: 13912 17176 (view as bug list)
Depends on:
Blocks: 13913
  Show dependency treegraph
 
Reported: 2012-07-18 07:09 UTC by contributor
Modified: 2014-02-13 11:39 UTC (History)
13 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description contributor 2012-07-18 07:09:35 UTC
This was was cloned from bug 17176 as part of operation convergence.
Originally filed: 2012-05-25 07:58:00 +0000
Original reporter: Divye <divye_kapoor@hotmail.com>

================================================================================
 #0   Divye                                           2012-05-25 07:58:34 +0000 
--------------------------------------------------------------------------------
Summary:
The Element specification [1] requires that the list of attributes be an _ordered_ list. However, this poses multiple issues with regards to the treatment of .innerHTML and .outerHTML, cross browser compatibility, anecdotal user expectations [2][3] and potentially performance. Specifically, the specification leaves undefined the algorithm used for ordering the keys in this ordered list. This causes the output of .innerHTML, .outerHTML to be browser dependent which in and of itself isn't a bad thing, but it causes a type of non-determinism that prevents writing a cross browser assertHTMLEquals(...) function in unit testing frameworks very cumbersome. (Details are mentioned below)

Minimal Test Case:
HTML: 
<div id="real"><a id="foo" class="blah" href="#">link</a></div>
<div id="temp"></div>

JS:
function f() {
var anchorElement = document.getElementById('foo');
anchorElement.id = 'foochanged';
return document.getElementById('real').innerHTML;
}

function g() {
var anchorElement = document.getElementById('foo');
anchorElement.removeAttribute('id');
anchorElement.setAttribute('id', 'foochanged');
return document.getElementById('real').innerHTML;
}

var tempDiv = document.getElementById('temp');
var expectedHTML = '<a id="foochanged" class="blah" href="#">link</a>';
tempDiv.innerHTML = expectedHTML;

Case1:
assertEquals(tempDiv.innerHTML, f()); 

Case 2:
assertEquals(tempDiv.innerHTML, g()); 


I guess "id" is a bad attribute to choose for this example due to it's special nature and implementation, but any other suitable attribute might suffice.

Discussion:
As per the spec, no normalization is mandatory before serialization to .innerHTML. (See references and links below for further info) Therefore, it is valid for both Case 1 and Case 2 to fail in a conforming browser. However, due to the drive to use the DOM parsing algorithm to generate instances of type HTMLElement and then serializing them, it is very likely that Case 1 will pass but Case 2 won't because Case 1 would have the id attribute value simply replaced while Case 2 would do a remove and an append to the attribute list, potentially causing a reordering of the attribute values and thus changing the rendered .innerHTML output.

Motivation:
The motivation for filing this bug is not pedantic speculation but a real world test case. I have significant amounts of test code of the form:
HTML: 
<body>
<... large complex DOM ...>
</body>

JS:
testXDoesY() {
   X(a,b,c,d,e);
   assertHTMLEquals("Expected DOM Structure", document.body);
}

testXDoesZ() {
   X(b,c,d,e,f);
   assertHTMLEquals("Expected DOM Strucuture", document.body);
}

Notes:
The assertHTMLEquals function here is the one implemented by the Closure library here:
http://closure-library.googlecode.com/svn-history/r27/trunk/closure/goog/docs/closure_goog_testing_asserts.js.source.html#line465

This is the JUnit documentation of the function:
http://www.jsunit.net/jsdoc/GLOBALS.html#!s!assertHTMLEquals
Note that the definition of "standardizing" is insufficient because of the issues pointed out above.


These tests work fine when run using a server side JS container that spoofs browser manipulation of the DOM in a deterministic manner but they fail when run on real browsers using Webdriver tests because FF and Chrome render .innerHTML with the attributes in different orders and they are in conformance with the spec when doing so. This unfortunate reality breaks .innerHTML as a means of writing JS tests that don't depend on implementation but just on state transformations on the DOM. The use of the DOM API to validate the HTML structure is extremely cumbersome because that would imply writing hundreds of brittle asserts (one for each element, attribute, text node etc.) which would make the tests really opaque to someone reading them.  An obvious workaround would be to implement an HTML(5) parser in JS to parse the output of .innerHTML in both cases and then validate the two with the attribute order ignored but that is exactly against the intent of the DOM Parsing spec (since it has version skew and doesn't support any of the browser goodies and it is extremely heavyweight). Another alternative would be to create a Document instance, use loadXML and then walk the tree and for each node collect the attributes, sort and compare them but this is ugly, works only for XHTML and does the same work twice for each assert: DOM -> String -> PseudoDOM and IMHO should not be encouraged. See [3] for the kinds of hacks to be done for supporting this type of functionality in IE.


Suggested wording:
The Element specification [1] should require the following:
(a) The list of attributes on an Element is an _unordered, indexed_ list.
(b) When rendering the .innerHTML/.outerHTML string, two distinct DOM elements with the same structure and attributes MUST render to the same .innerHTML string (with a deterministic ordering of attributes)

(a) is a required fix in wording to reflect the nature of the API since Element.attributes only requires the ability to be indexed and does not specify order as per the current spec.
(b) addresses the issue of determinism in the output of .innerHTML. The resultant consequences are discussed in Trade Offs.

Trade Offs

* Use of an ordered list at all times requires a maintenance cost to be paid at parse time. However, setAttribute(...), hasAttribute(...) and getAttribute(...) become O(log N) functions instead of the O(N) functions mandated today by the spec. However, given the skew of the attributes actually accessed by JS and the attributes parsed during page load, this is unlikely to be an exciting prospect.

* Re-ordering the attributes lazily during the evaluation of .innerHTML. Suggested wording (a) allows the UA to reorder the attributes when .innerHTML is accessed, but that breaks for loops iterating over attributes of an element and then accessing .innerHTML.
eg.
for (var i = 0; i < a.attributes.length; i++) {
   a.innerHTML;
}

* The third and IMHO the better option is to take a slight performance hit while rendering .innerHTML by actually sorting the attributes before appending them to the string. This retains the desirable property of determinism in the output in a cross platform manner without imposing too much of a performance penalty. 


Related information that I found useful while composing this bug report:
* The resolution of https://www.w3.org/Bugs/Public/show_bug.cgi?id=11204 required that .innerHTML and .outerHTML be built on parsing the DOM Parsing Algorithms defined here: http://html5.org/specs/dom-parsing.html

* The actual serialization algorithm is defined here:
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm
"While the exact order of attributes is UA-defined, and may depend on factors such as the order that the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialize an element's attributes in the same order."
(Note: There is no mention of other elements with the same DOM structure being serialized to the same string.)

* The definition of Element that indicates that the attribute list must be ordered:
http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-element-attribute
"Elements also have an ordered attribute list. Unless explicitly given when an element is created, its attribute list is empty. An element has an attribute A if A is in its attribute list."

* The IDL description of Element:
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm

* The DOMConfiguration object's canonicalize definition:
http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030609/DOM3-Core.html#core-DOMConfiguration
"Canonicalize the document according to the rules specified in [Canonical XML]. Note that this is limited to what can be represented in the DOM. In particular, there is no way to specify the order of the attributes in the DOM."

References:
[1] http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-element-attribute
[2] http://stackoverflow.com/questions/1591841/how-do-i-get-html-attribute-order-to-be-consistent-when-testing-in-javascript
[3] http://stackoverflow.com/questions/7474710/can-i-load-an-entire-html-document-into-a-document-fragment-in-internet-explorer

--
Apologies for the length of the bug report. Also, this is my first time at filing bugs with the W3C so if I can do something better for the next time, please do let me know.
================================================================================
 #1   Divye                                           2012-05-25 08:22:36 +0000 
--------------------------------------------------------------------------------
The super short summary of the bug is:
1) There's a mistake in the spec by requiring that Element.attributes be an
ordered list instead of just an indexable list since the ordering function is
undefined.
2) The fact that we're now trying to use an unordered Element.attributes during
serialization poses difficulties in reliably comparing for equality the
.innerHTML of 2 identical DOMFragments in different parts of the DOM of the
page since the rendered strings will not match because even the same tags and
attributes can be rendered into different .innerHTML strings based on how they
ended up being constructed (page load time or through JS etc.).

(2) is a problem for testing because there doesn't seem to be a way to
"standardize" HTML even within the same browser and prevents the development of
a cross browser assertHTMLEquals function where one of the arguments is a well
defined "expected" string because a DOM dump from Chrome and FF yield strings
that differ in attribute ordering for identical DOM structures constructed in
different manners (eg. 2 DOMElements one constructed at Page load or through JS
attribute modifications, other through .innerHTML = string yield different
.innerHTML even on the same browser).
================================================================================
 #2   Simon Pieters                                   2012-05-25 08:33:41 +0000 
--------------------------------------------------------------------------------
We can't remove indexing of .attributes for Web compat. Even the named getter for attributes needs to know the order because there can be multiple attributes with the same name. This means that we have to specify the order for .attributes. DOM4 says it's an ordered list. The HTML spec should say what the order should be in parsing and serialization.
================================================================================
 #3   Aryeh Gregor                                    2012-05-28 09:14:55 +0000 
--------------------------------------------------------------------------------
*** Bug 17217 has been marked as a duplicate of this bug. ***
================================================================================
 #4   Aryeh Gregor                                    2012-05-28 09:22:49 +0000 
--------------------------------------------------------------------------------
And for web compat, the order of the attributes created by the HTML parser needs to be the same as the order in the markup, right?  We can't say they're put in, e.g., alphabetical order -- that would surely break the web.

(In reply to comment #0)
> Minimal Test Case:
> HTML: 
> <div id="real"><a id="foo" class="blah" href="#">link</a></div>
> <div id="temp"></div>
> 
> JS:
> function f() {
> var anchorElement = document.getElementById('foo');
> anchorElement.id = 'foochanged';
> return document.getElementById('real').innerHTML;
> }

DOM4 says this leaves the id attribute in the same position as it was.

> function g() {
> var anchorElement = document.getElementById('foo');
> anchorElement.removeAttribute('id');
> anchorElement.setAttribute('id', 'foochanged');
> return document.getElementById('real').innerHTML;
> }

DOM4 says this moves the id attribute to the end of the attribute list.

> var tempDiv = document.getElementById('temp');
> var expectedHTML = '<a id="foochanged" class="blah" href="#">link</a>';
> tempDiv.innerHTML = expectedHTML;
> 
> Case1:
> assertEquals(tempDiv.innerHTML, f()); 
> 
> Case 2:
> assertEquals(tempDiv.innerHTML, g()); 

The behavior of these two asserts is only undefined because the HTML parser's behavior is undefined.  If the parser orders the attribute list in the same order as the markup, and the serializer serializes them in DOM order, case 1 must pass and case 2 must fail.  This is probably how it should be.

> These tests work fine when run using a server side JS container that spoofs
> browser manipulation of the DOM in a deterministic manner but they fail when
> run on real browsers using Webdriver tests because FF and Chrome render
> .innerHTML with the attributes in different orders and they are in conformance
> with the spec when doing so.

What are cases where they serialize the attributes in different orders?  That's the bug -- it should always be the same.

> The use of the DOM API to validate the HTML
> structure is extremely cumbersome because that would imply writing hundreds of
> brittle asserts (one for each element, attribute, text node etc.) which would
> make the tests really opaque to someone reading them.

Actually, you could just write an assertHtmlEquals() function that wraps all the checks.  I've done this for standards tests (at least Range/Selection).

> Apologies for the length of the bug report. Also, this is my first time at
> filing bugs with the W3C so if I can do something better for the next time,
> please do let me know.

Generally it's better to keep it as concise as possible, and emphasize concrete pieces of HTML/JS, how the spec says they behave, how browsers treat them, and how you want them to behave, and why you want them to behave that way.  Longer and more detailed is better than leaving out details, though, so this bug report is very good for a first one.  Thanks!  :)
================================================================================
 #5   Henri Sivonen                                   2012-05-29 11:57:08 +0000 
--------------------------------------------------------------------------------
Web compatibility requires that when there are attributes with the same name the first attribute of a given name wins. In addition, at least some older versions of the NPAPI version of Flash Player required the order in which attributes of the embed element are passed to the plug-in to match the source order of attributes.

As far as I can tell, there isn't a good reason to believe that the iteration order of attributes or the serialization order of attributes matters for Web compatibility. After all, neither Trident nor Gecko preserves the order of attributes. It appears that Trident shuffles all attributes according to the implementation details of a hash function of some kind and Gecko doesn't preserve the order of attributes that are legacy presentational hints relative to attributes that are not legacy presentational hints.
================================================================================
 #6   Aryeh Gregor                                    2012-05-31 09:38:13 +0000 
--------------------------------------------------------------------------------
Interesting.  But we should still want to specify some kind of order, right?  There's no reason for this not to be interoperable.  In that case, is there any reason to not spec it as just being the same order as markup?  It sounds like the constraints you mention are compatible with that requirement, and it's the most obvious and simple way to do it.  In fact, it's so obvious and simple that that's how I thought browsers all worked until you just educated me right now.  :)
================================================================================
 #7   Henri Sivonen                                   2012-05-31 15:09:57 +0000 
--------------------------------------------------------------------------------
If Trident and Gecko get away with non-obvious attribute orders, it means the order doesn't really matter. It would be a shame to require Trident and Gecko to do something that doesn't really matter but that would either require them to change data structures in a fundamental way or to complicate their existing data structures by adding book-keeping data about the original order.

(In many cases, it's easy to incorrectly hypothesize that Gecko keeps the attribute order, so Gecko isn't a strong point of evidence in the direction that order doesn't matter. However, I think Trident is a strong piece of evidence, since it makes it very obvious that the order isn't preserved, so no script that has even superficially being tested with Trident can rely on the iteration order of attributes.)
================================================================================
 #8   Aryeh Gregor                                    2012-06-01 08:45:29 +0000 
--------------------------------------------------------------------------------
Hmm, okay.  As long as attribute order round-trips through innerHTML, I guess it's not a big problem for me.
================================================================================
Comment 1 Ian 'Hixie' Hickson 2012-07-26 23:39:55 UTC
Is this really an HTML bug?
Comment 2 Divye 2012-07-31 18:09:11 UTC
Well this is a bug regarding an inconsistency between the HTML and the DOM spec. The HTML Element spec requires order but the algorithm specified for innerHTML blatantly disregards order. So either we drop the Attribute ordering requirement from the HTML spec altogether (which makes it inconsistent with the DOM spec and probably breaks current web pages that depend on attributes[0] syntax) or we keep the attribute ordering (which makes it inconsistent with current browser behavior) but then have it round trip through innerHTML. The second option will cause a modification to browser datastructures but is required so that client JS can work efficiently with the output of .innerHTML of an element (otherwise doing anything with the output of .innerHTML will require a full fledged DOM parser in JS).

In both cases, the HTML specs are inconsistent and/or have underspecification with regards to other specs. So, I would consider it so.
Comment 3 Ian 'Hixie' Hickson 2012-10-08 19:27:58 UTC
I don't think using innerHTML in tests is sane. So I wouldn't recommend doing that. The order of attributes in valid HTML markup is not meaningful, and so you should not be comparing HTML markup for equality by string comparison, since you will have false negatives. It's like comparing numbers for equality by string comparison; you'll miss cases like "01" and "1.0" being the same number.

As Henri says, browsers don't keep a predictable order, so we're not required to expose one by compatibility; and in fact they get some good performance optimisations from not doing so, so I don't think it's a good idea to require a predictable order. Stable order, sure, but not predictable.

So I think this is a bug in DOM Core.
Comment 4 Anne 2012-10-09 14:22:30 UTC
Ian, the problem is that looping through .attributes would not be well-defined and people depend on the order there. I think innerHTML might serialize them in alphabetical order?
Comment 5 Edward O'Connor 2012-10-12 22:07:37 UTC
*** Bug 17176 has been marked as a duplicate of this bug. ***
Comment 6 Anne 2012-12-13 10:25:35 UTC
The definitions of getAttribute() and setAttribute() depend on attributes being ordered. The attributes member of Element depends on there being an order. Even Maps in JavaScript have a defined order. I think we should stick with attributes having an order.

See also: http://krijnhoetmer.nl/irc-logs/whatwg/20121213#l-339
Comment 7 Henri Sivonen 2012-12-13 10:39:35 UTC
bz, are you OK with the requirement of attributes having a prescribed iteration order? Considering that Trident and Gecko have gotten away with their current behaviors, Iā€™m skeptical that not preserving order is a Real Problem that merits changing Trident and Gecko.
Comment 8 Boris Zbarsky 2012-12-13 15:59:57 UTC
Changing the attribute order setup in Gecko would probably have unacceptable performance implications unless we completely rearchitect how we do presentational hints.  I don't believe we have any plans of doing either one of these things in any measurable timeframe.

So no, I guess I'm not OK with it.  ;)

As for setAttribute, say, depending on attribute order, that only matters if you have two attributes already set that have the same "name", right?  Since Gecko only changes the order for preshints, it only does so for names that do not contain a ':'.  Does setAttributeNS actually allow you to set an attribute with a non-null namespace but a name that doesn't contain ':'?  That's the only way there could possibly be a problem here.  And even then you'd have to have someone actually doing that, on an HTML element, with a localName that matches a preshint name.  Which no one does in practice, quite honestly.
Comment 9 Simon Pieters 2012-12-14 09:07:31 UTC
It seems more likely that people depend on the order with .attributes[index] or innerHTML.
Comment 10 Boris Zbarsky 2012-12-14 09:11:09 UTC
More likely, I agree.

How likely, unclear; we haven't had much in the way of bug reports about either one.
Comment 11 Anne 2012-12-14 11:03:01 UTC
Is there any other "set" in the platform where order is exposed, but the set is not ordered per se? JavaScript objects maybe? But I believe in hindsight we do not really like it there either.
Comment 12 Boris Zbarsky 2012-12-14 16:13:36 UTC
> Is there any other "set" in the platform where order is exposed, but the set is
> not ordered per se? 

The supported property names of HTMLFormControlsCollection, for which the spec doesn't define an order?  ;)  Same thing for the supported property names of Document.  Note that interop for those is poor anyway (e.g. Gecko doesn't even enumerate the latter, and doesn't enumerate the former until very recently).

Every single WebIDL prototype: you can enumerate the methods/attributes, but the order is not defined and differs across browsers (though this sometimes leads to compat issues).

There are probably others....
Comment 13 Ian 'Hixie' Hickson 2012-12-14 22:57:13 UTC
I think technically for WebIDL interfaces the order is the source order in the spec, at least, there's a comment in the 2D canvas spec's source to the effect that I need to keep certain members in a particular order for compat reasons.

I'm all for _requiring_ that UAs have some unpredictability in the sort order for attributes, e.g. by salting their hash differently on each run or some such, but that may be too far out there for y'all. (I remember requiring that WebSocket clients send their headers in a random order for similar reasons. :-) )
Comment 14 Boris Zbarsky 2012-12-15 01:11:08 UTC
> I think technically for WebIDL interfaces the order is the source order in the
> spec

That's what UAs do in practice, but it completely falls down once you start doing partial interfaces and "A implements B", since those are inherently not ordered with respect to each other.

> e.g. by salting their hash differently on each run 

You just assumed attributes are stored in a hashtable...  ;)
Comment 15 Ian 'Hixie' Hickson 2012-12-15 16:45:55 UTC
I just meant that as an example. So long as there's some randomness introduced, I don't really mind how it's done. The goal is just to make it so that authors can't depend on any one browser's ordering.
Comment 16 Divye 2012-12-17 20:23:06 UTC
Why is the status of this bug RESOLVED WONTFIX?
There is active discussion on these issues and AFAICT, the underlying issue has not been resolved.
Comment 17 Anne 2013-09-06 15:14:01 UTC
So it seems this comes down to comment 8 which basically means enshrining an architecture issue of a particular browser in the platform. In almost every case where we have non-deterministic behavior we wish it was deterministic. Should that not be here the case too? Even if that is half a decade away from reality? The DOM Standard has called for Attr to no longer inherit from Node for a while too, and that has not happened yet (nor has it been confirmed that it can not happen).
Comment 18 Boris Zbarsky 2013-09-06 15:55:03 UTC
Note that Trident also does not preserve attribute order, right?
Comment 19 Travis Leithead [MSFT] 2013-09-06 16:38:15 UTC
Correct, and every few releases when we add new supported attributes, it tends to re-shuffle the order, and yes we do get the occasional report (mostly from internal test cases written using the innerHTML string compare bad practice) that our attribute order change for innerHTML, etc., broke something. Rarely (if ever?) have we got this report from content in the wild, so I'm also skeptical that forcing an order on attribute serialization would solve a real problem vs. the effort required in our implementation.
Comment 20 Anne 2014-02-13 11:38:09 UTC
*** Bug 13912 has been marked as a duplicate of this bug. ***