17176 – Element attributes should not be required to be stored in an ordered list, .innerHTML remains unspecified

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED DUPLICATE of bug 17871

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Edward O'Connor
QA Contact:	HTML WG Bugzilla archive list

Duplicates (1):	17217

Reported:	2012-05-25 07:58 UTC by Divye
Modified:	2012-10-12 22:07 UTC
CC List:	8 users

Description Divye 2012-05-25 07:58:34 UTC

Summary:
The Element specification [1] requires that the list of attributes be an _ordered_ list. However, this poses multiple issues with regard to the treatment of .innerHTML and .outerHTML, cross-browser compatibility, anecdotal user expectations [2][3] and potentially performance. Specifically, the specification leaves undefined the algorithm used for ordering the keys in this ordered list. This makes the output of .innerHTML and .outerHTML browser-dependent, which in and of itself isn't a bad thing, but it introduces a type of non-determinism that makes writing a cross-browser assertHTMLEquals(...) function in unit testing frameworks very cumbersome. (Details are mentioned below.)

Minimal Test Case:
HTML: 
<div id="real"><a id="foo" class="blah" href="#">link</a></div>
<div id="temp"></div>

JS:
function f() {
  var anchorElement = document.getElementById('foo');
  anchorElement.id = 'foochanged';
  return document.getElementById('real').innerHTML;
}

function g() {
  var anchorElement = document.getElementById('foo');
  anchorElement.removeAttribute('id');
  anchorElement.setAttribute('id', 'foochanged');
  return document.getElementById('real').innerHTML;
}

var tempDiv = document.getElementById('temp');
var expectedHTML = '<a id="foochanged" class="blah" href="#">link</a>';
tempDiv.innerHTML = expectedHTML;

Case1:
assertEquals(tempDiv.innerHTML, f()); 

Case 2:
assertEquals(tempDiv.innerHTML, g()); 


I guess "id" is a bad attribute to choose for this example due to its special nature and implementation, but any other suitable attribute might suffice.

Discussion:
As per the spec, no normalization is mandatory before serialization to .innerHTML. (See references and links below for further info) Therefore, it is valid for both Case 1 and Case 2 to fail in a conforming browser. However, due to the drive to use the DOM parsing algorithm to generate instances of type HTMLElement and then serializing them, it is very likely that Case 1 will pass but Case 2 won't because Case 1 would have the id attribute value simply replaced while Case 2 would do a remove and an append to the attribute list, potentially causing a reordering of the attribute values and thus changing the rendered .innerHTML output.

Motivation:
The motivation for filing this bug is not pedantic speculation but a real world test case. I have significant amounts of test code of the form:
HTML: 
<body>
<... large complex DOM ...>
</body>

JS:
function testXDoesY() {
   X(a,b,c,d,e);
   assertHTMLEquals("Expected DOM Structure", document.body);
}

function testXDoesZ() {
   X(b,c,d,e,f);
   assertHTMLEquals("Expected DOM Structure", document.body);
}

Notes:
The assertHTMLEquals function here is the one implemented by the Closure library here:
http://closure-library.googlecode.com/svn-history/r27/trunk/closure/goog/docs/closure_goog_testing_asserts.js.source.html#line465

This is the JsUnit documentation of the function:
http://www.jsunit.net/jsdoc/GLOBALS.html#!s!assertHTMLEquals
Note that the definition of "standardizing" is insufficient because of the issues pointed out above.


These tests work fine when run using a server side JS container that spoofs browser manipulation of the DOM in a deterministic manner but they fail when run on real browsers using WebDriver tests because FF and Chrome render .innerHTML with the attributes in different orders and they are in conformance with the spec when doing so. This unfortunate reality breaks .innerHTML as a means of writing JS tests that don't depend on implementation but just on state transformations on the DOM.

The use of the DOM API to validate the HTML structure is extremely cumbersome because that would imply writing hundreds of brittle asserts (one for each element, attribute, text node etc.) which would make the tests really opaque to someone reading them.

An obvious workaround would be to implement an HTML(5) parser in JS to parse the output of .innerHTML in both cases and then validate the two with the attribute order ignored but that is exactly against the intent of the DOM Parsing spec (since it has version skew and doesn't support any of the browser goodies and it is extremely heavyweight).

Another alternative would be to create a Document instance, use loadXML and then walk the tree and for each node collect the attributes, sort and compare them but this is ugly, works only for XHTML and does the same work twice for each assert: DOM -> String -> PseudoDOM and IMHO should not be encouraged. See [3] for the kinds of hacks to be done for supporting this type of functionality in IE.


Suggested wording:
The Element specification [1] should require the following:
(a) The list of attributes on an Element is an _unordered, indexed_ list.
(b) When rendering the .innerHTML/.outerHTML string, two distinct DOM elements with the same structure and attributes MUST render to the same .innerHTML string (with a deterministic ordering of attributes)

(a) is a required fix in wording to reflect the nature of the API since Element.attributes only requires the ability to be indexed and does not specify order as per the current spec.
(b) addresses the issue of determinism in the output of .innerHTML. The resultant consequences are discussed in Trade Offs.

Trade Offs

* Use of an ordered list at all times requires a maintenance cost to be paid at parse time. In exchange, setAttribute(...), hasAttribute(...) and getAttribute(...) become O(log N) functions instead of the O(N) functions mandated today by the spec. However, given the skew between the attributes actually accessed by JS and the attributes parsed during page load, this is unlikely to be an exciting prospect.

* Re-ordering the attributes lazily during the evaluation of .innerHTML. Suggested wording (a) allows the UA to reorder the attributes when .innerHTML is accessed, but that breaks for loops iterating over attributes of an element and then accessing .innerHTML.
eg.
for (var i = 0; i < a.attributes.length; i++) {
   a.innerHTML;
}

* The third and IMHO the better option is to take a slight performance hit while rendering .innerHTML by actually sorting the attributes before appending them to the string. This retains the desirable property of determinism in the output in a cross platform manner without imposing too much of a performance penalty. 
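
A minimal sketch of the third option, written as plain JavaScript over a hypothetical node shape ({ tag, attrs, children }) rather than real DOM objects. The names serializeSorted, escapeAttr and escapeText are made up for illustration, the escaping is simplified, and void elements are ignored. It only shows that sorting attribute names at serialization time makes the output independent of how the attribute list was built:

```javascript
// Hypothetical node shape (an assumption, not a DOM API):
//   { tag: string, attrs: [[name, value], ...], children: [node | string] }

// Simplified escaping for illustration; the real serialization rules cover more cases.
function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

function escapeText(value) {
  return String(value).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Serializes a node, emitting attributes sorted by name instead of list order.
function serializeSorted(node) {
  if (typeof node === 'string') {
    return escapeText(node);
  }
  // Copy before sorting so the node's own attribute order is left untouched.
  var attrs = node.attrs.slice().sort(function (a, b) {
    return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0;
  });
  var attrText = attrs.map(function (pair) {
    return ' ' + pair[0] + '="' + escapeAttr(pair[1]) + '"';
  }).join('');
  return '<' + node.tag + attrText + '>' +
    node.children.map(serializeSorted).join('') +
    '</' + node.tag + '>';
}

// Same attributes, different list order (e.g. built by parsing vs. by script).
var parsed = { tag: 'a', attrs: [['id', 'foochanged'], ['class', 'blah'], ['href', '#']], children: ['link'] };
var scripted = { tag: 'a', attrs: [['class', 'blah'], ['href', '#'], ['id', 'foochanged']], children: ['link'] };

console.log(serializeSorted(parsed));   // <a class="blah" href="#" id="foochanged">link</a>
console.log(serializeSorted(scripted)); // <a class="blah" href="#" id="foochanged">link</a>
```

Because the sort works on a copy, iterating over the attribute list while serializing (the loop in the second option) would not observe any reordering.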


Related information that I found useful while composing this bug report:
* The resolution of https://www.w3.org/Bugs/Public/show_bug.cgi?id=11204 required that .innerHTML and .outerHTML be built on the DOM Parsing algorithms defined here: http://html5.org/specs/dom-parsing.html

* The actual serialization algorithm is defined here:
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm
"While the exact order of attributes is UA-defined, and may depend on factors such as the order that the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialize an element's attributes in the same order."
(Note: There is no mention of other elements with the same DOM structure being serialized to the same string.)

* The definition of Element that indicates that the attribute list must be ordered:
http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-element-attribute
"Elements also have an ordered attribute list. Unless explicitly given when an element is created, its attribute list is empty. An element has an attribute A if A is in its attribute list."

* The IDL description of Element:
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm

* The DOMConfiguration object's canonicalize definition:
http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030609/DOM3-Core.html#core-DOMConfiguration
"Canonicalize the document according to the rules specified in [Canonical XML]. Note that this is limited to what can be represented in the DOM. In particular, there is no way to specify the order of the attributes in the DOM."

References:
[1] http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-element-attribute
[2] http://stackoverflow.com/questions/1591841/how-do-i-get-html-attribute-order-to-be-consistent-when-testing-in-javascript
[3] http://stackoverflow.com/questions/7474710/can-i-load-an-entire-html-document-into-a-document-fragment-in-internet-explorer

--
Apologies for the length of the bug report. Also, this is my first time at filing bugs with the W3C so if I can do something better for the next time, please do let me know.

Comment 1 Divye 2012-05-25 08:22:36 UTC

The super short summary of the bug is:
1) There's a mistake in the spec by requiring that Element.attributes be an
ordered list instead of just an indexable list since the ordering function is
undefined.
2) The fact that we're now trying to use an unordered Element.attributes during
serialization poses difficulties in reliably comparing for equality the
.innerHTML of 2 identical DOMFragments in different parts of the DOM of the
page since the rendered strings will not match because even the same tags and
attributes can be rendered into different .innerHTML strings based on how they
ended up being constructed (page load time or through JS etc.).

(2) is a problem for testing because there doesn't seem to be a way to
"standardize" HTML even within the same browser and prevents the development of
a cross browser assertHTMLEquals function where one of the arguments is a well
defined "expected" string because a DOM dump from Chrome and FF yield strings
that differ in attribute ordering for identical DOM structures constructed in
different manners (eg. 2 DOMElements one constructed at Page load or through JS
attribute modifications, other through .innerHTML = string yield different
.innerHTML even on the same browser).

Comment 2 Simon Pieters 2012-05-25 08:33:41 UTC

We can't remove indexing of .attributes for Web compat. Even the named getter for attributes needs to know the order because there can be multiple attributes with the same name. This means that we have to specify the order for .attributes. DOM4 says it's an ordered list. The HTML spec should say what the order should be in parsing and serialization.

Comment 3 Aryeh Gregor 2012-05-28 09:14:55 UTC

*** Bug 17217 has been marked as a duplicate of this bug. ***

Comment 4 Aryeh Gregor 2012-05-28 09:22:49 UTC

And for web compat, the order of the attributes created by the HTML parser needs to be the same as the order in the markup, right?  We can't say they're put in, e.g., alphabetical order -- that would surely break the web.

(In reply to comment #0)
> Minimal Test Case:
> HTML: 
> <div id="real"><a id="foo" class="blah" href="#">link</a></div>
> <div id="temp"></div>
> 
> JS:
> function f() {
> var anchorElement = document.getElementById('foo');
> anchorElement.id = 'foochanged';
> return document.getElementById('real').innerHTML;
> }

DOM4 says this leaves the id attribute in the same position as it was.

> function g() {
> var anchorElement = document.getElementById('foo');
> anchorElement.removeAttribute('id');
> anchorElement.setAttribute('id', 'foochanged');
> return document.getElementById('real').innerHTML;
> }

DOM4 says this moves the id attribute to the end of the attribute list.

> var tempDiv = document.getElementById('temp');
> var expectedHTML = '<a id="foochanged" class="blah" href="#">link</a>';
> tempDiv.innerHTML = expectedHTML;
> 
> Case1:
> assertEquals(tempDiv.innerHTML, f()); 
> 
> Case 2:
> assertEquals(tempDiv.innerHTML, g()); 

The behavior of these two asserts is only undefined because the HTML parser's behavior is undefined.  If the parser orders the attribute list in the same order as the markup, and the serializer serializes them in DOM order, case 1 must pass and case 2 must fail.  This is probably how it should be.
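
The DOM4 behaviour described in this reply can be modelled without a browser. The sketch below is an assumption-laden illustration: the attribute list is a plain array of [name, value] pairs, and createAttributeList, setAttribute, removeAttribute and serializeAttributes are hypothetical helpers, not DOM methods. It assumes the parser stores attributes in markup order and the serializer emits them in list order:

```javascript
// Hypothetical model of an element's ordered attribute list (not a real DOM API).
function createAttributeList(pairs) {
  // Each entry is a [name, value] pair kept in insertion order.
  return pairs.map(function (pair) { return [pair[0], pair[1]]; });
}

// Models setAttribute: replace the value in place if the name exists,
// otherwise append a new entry to the end of the list.
function setAttribute(list, name, value) {
  for (var i = 0; i < list.length; i++) {
    if (list[i][0] === name) {
      list[i][1] = value;
      return;
    }
  }
  list.push([name, value]);
}

// Models removeAttribute: drop the entry with the given name, if any.
function removeAttribute(list, name) {
  for (var i = 0; i < list.length; i++) {
    if (list[i][0] === name) {
      list.splice(i, 1);
      return;
    }
  }
}

// Serializes the list in its current order (no escaping; illustration only).
function serializeAttributes(list) {
  return list.map(function (pair) {
    return pair[0] + '="' + pair[1] + '"';
  }).join(' ');
}

// f(): the value is changed in place, so the position is kept.
var viaF = createAttributeList([['id', 'foo'], ['class', 'blah'], ['href', '#']]);
setAttribute(viaF, 'id', 'foochanged');

// g(): remove followed by set, so id ends up last.
var viaG = createAttributeList([['id', 'foo'], ['class', 'blah'], ['href', '#']]);
removeAttribute(viaG, 'id');
setAttribute(viaG, 'id', 'foochanged');

console.log(serializeAttributes(viaF)); // id="foochanged" class="blah" href="#"
console.log(serializeAttributes(viaG)); // class="blah" href="#" id="foochanged"
```

Under those assumptions the string for f() matches the expected markup and the string for g() does not, which is the pass/fail split described above.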

> These tests work fine when run using a server side JS container that spoofs
> browser manipulation of the DOM in a deterministic manner but they fail when
> run on real browsers using Webdriver tests because FF and Chrome render
> .innerHTML with the attributes in different orders and they are in conformance
> with the spec when doing so.

What are cases where they serialize the attributes in different orders?  That's the bug -- it should always be the same.

> The use of the DOM API to validate the HTML
> structure is extremely cumbersome because that would imply writing hundreds of
> brittle asserts (one for each element, attribute, text node etc.) which would
> make the tests really opaque to someone reading them.

Actually, you could just write an assertHtmlEquals() function that wraps all the checks.  I've done this for standards tests (at least Range/Selection).
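
As a rough sketch of what such a wrapper could look like (this is not the code used in the Range/Selection tests; domEquals, attributeKeys and the mock helpers are hypothetical names), the function below walks two trees and compares node type, name, value, attributes and children, ignoring attribute order. It reads only nodeType, nodeName, nodeValue, attributes and childNodes, so it should work on real DOM nodes as well as on the plain-object stand-ins used here:

```javascript
// Collects a node's attributes as sorted, unambiguous keys so order is ignored.
function attributeKeys(node) {
  var keys = [];
  var attrs = node.attributes || [];
  for (var i = 0; i < attrs.length; i++) {
    keys.push(JSON.stringify([attrs[i].name, attrs[i].value]));
  }
  return keys.sort();
}

// Returns true if the two trees have the same structure, attributes and text.
function domEquals(a, b) {
  if (a.nodeType !== b.nodeType || a.nodeName !== b.nodeName ||
      a.nodeValue !== b.nodeValue) {
    return false;
  }
  var aKeys = attributeKeys(a);
  var bKeys = attributeKeys(b);
  if (aKeys.length !== bKeys.length) {
    return false;
  }
  for (var i = 0; i < aKeys.length; i++) {
    if (aKeys[i] !== bKeys[i]) {
      return false;
    }
  }
  var aKids = a.childNodes || [];
  var bKids = b.childNodes || [];
  if (aKids.length !== bKids.length) {
    return false;
  }
  for (var j = 0; j < aKids.length; j++) {
    if (!domEquals(aKids[j], bKids[j])) {
      return false;
    }
  }
  return true;
}

// Plain-object stand-ins for DOM nodes, used only to exercise the function.
function mockElement(name, attrs, children) {
  return {
    nodeType: 1,
    nodeName: name,
    nodeValue: null,
    attributes: attrs.map(function (p) { return { name: p[0], value: p[1] }; }),
    childNodes: children || []
  };
}

function mockText(text) {
  return { nodeType: 3, nodeName: '#text', nodeValue: text, childNodes: [] };
}

var expected = mockElement('A', [['id', 'foochanged'], ['class', 'blah'], ['href', '#']], [mockText('link')]);
var actual = mockElement('A', [['class', 'blah'], ['href', '#'], ['id', 'foochanged']], [mockText('link')]);

console.log(domEquals(expected, actual)); // true
```

A test helper built on this would compare the live tree against a tree parsed from the expected markup, rather than comparing two .innerHTML strings.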

> Apologies for the length of the bug report. Also, this is my first time at
> filing bugs with the W3C so if I can do something better for the next time,
> please do let me know.

Generally it's better to keep it as concise as possible, and emphasize concrete pieces of HTML/JS, how the spec says they behave, how browsers treat them, and how you want them to behave, and why you want them to behave that way.  Longer and more detailed is better than leaving out details, though, so this bug report is very good for a first one.  Thanks!  :)

Comment 5 Henri Sivonen 2012-05-29 11:57:08 UTC

(In reply to comment #4)
> And for web compat, the order of the attributes created by the HTML parser
> needs to be the same as the order in the markup, right?

Web compatibility requires that when there are attributes with the same name the first attribute of a given name wins. In addition, at least some older versions of the NPAPI version of Flash Player required the order in which attributes of the embed element are passed to the plug-in to match the source order of attributes.
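
The "first attribute of a given name wins" rule can be illustrated with a small sketch. The helper name dropDuplicateAttributes is hypothetical, and the input is assumed to be a list of [name, value] pairs in source order with names already lowercased:

```javascript
// Keeps only the first occurrence of each attribute name, preserving source order.
// Assumes names are already normalized (e.g. lowercased) by the caller.
function dropDuplicateAttributes(pairs) {
  var seen = Object.create(null);
  var result = [];
  for (var i = 0; i < pairs.length; i++) {
    var name = pairs[i][0];
    if (seen[name]) {
      continue; // a later duplicate is ignored
    }
    seen[name] = true;
    result.push(pairs[i]);
  }
  return result;
}

// Models markup such as <a href="#first" class="blah" href="#second">.
var kept = dropDuplicateAttributes([
  ['href', '#first'],
  ['class', 'blah'],
  ['href', '#second']
]);

console.log(JSON.stringify(kept)); // [["href","#first"],["class","blah"]]
```

Note that this constraint only fixes which duplicate survives; it says nothing about the order in which the surviving attributes are later iterated or serialized.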

As far as I can tell, there isn't a good reason to believe that the iteration order of attributes or the serialization order of attributes matters for Web compatibility. After all, neither Trident nor Gecko preserves the order of attributes. It appears that Trident shuffles all attributes according to the implementation details of a hash function of some kind and Gecko doesn't preserve the order of attributes that are legacy presentational hints relative to attributes that are not legacy presentational hints.

Comment 6 Aryeh Gregor 2012-05-31 09:38:13 UTC

Interesting.  But we should still want to specify some kind of order, right?  There's no reason for this not to be interoperable.  In that case, is there any reason to not spec it as just being the same order as markup?  It sounds like the constraints you mention are compatible with that requirement, and it's the most obvious and simple way to do it.  In fact, it's so obvious and simple that that's how I thought browsers all worked until you just educated me right now.  :)

Comment 7 Henri Sivonen 2012-05-31 15:09:57 UTC

(In reply to comment #6)
> Interesting.  But we should still want to specify some kind of order, right? 
> There's no reason for this not to be interoperable.  In that case, is there any
> reason to not spec it as just being the same order as markup?  It sounds like
> the constraints you mention are compatible with that requirement, and it's the
> most obvious and simple way to do it.  In fact, it's so obvious and simple that
> that's how I thought browsers all worked until you just educated me right now. 
> :)

If Trident and Gecko get away with non-obvious attribute orders, it means the order doesn't really matter. It would be a shame to require Trident and Gecko to do something that doesn't really matter but that would either require them to change data structures in a fundamental way or to complicate their existing data structures by adding book-keeping data about the original order.

(In many cases, it's easy to incorrectly hypothesize that Gecko keeps the attribute order, so Gecko isn't a strong point of evidence in the direction that order doesn't matter. However, I think Trident is a strong piece of evidence, since it makes it very obvious that the order isn't preserved, so no script that has even superficially been tested with Trident can rely on the iteration order of attributes.)

Comment 8 Aryeh Gregor 2012-06-01 08:45:29 UTC

Hmm, okay.  As long as attribute order round-trips through innerHTML, I guess it's not a big problem for me.

Comment 9 contributor 2012-07-18 07:09:38 UTC

This bug was cloned to create bug 17871 as part of operation convergence.

Comment 10 Edward O'Connor 2012-10-12 22:07:37 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: No spec change.
Rationale: This is a bug in the DOM spec, which is being tracked in bug 17871.

*** This bug has been marked as a duplicate of bug 17871 ***