Bug 14037 - Should XMLDocument be standardized?
Should XMLDocument be standardized?
Status: RESOLVED FIXED
Product: WebAppsWG
Classification: Unclassified
Component: DOM Parsing and Serialization
unspecified
PC All
: P2 critical
: ---
Assigned To: Ian 'Hixie' Hickson
public-webapps-bugzilla
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-09-06 04:16 UTC by Boris Zbarsky
Modified: 2011-11-24 17:40 UTC (History)
20 users (show)

See Also:


Attachments
The script in question, prettyprinted, just to make sure we're on the same page (625.04 KB, text/plain)
2011-09-06 18:19 UTC, Boris Zbarsky
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Boris Zbarsky 2011-09-06 04:16:34 UTC
All of Gecko, WebKit, and Presto have a window.XMLDocument.  It has pretty different behavior in the three in some ways, but its mere existence is sniffed for by some libraries (e.g. Sarissa).  See https://bugzilla.mozilla.org/show_bug.cgi?id=684671#c2 for the gory details.

Should this object exist?  If so, should it be an interface, or just some random object?  If the former, what things should live on this interface, what does it inherit from, etc.

http://html5.org/specs/dom-parsing.html defines an XMLDocument inheriting from Document, which seems to not be web-compatible if onreadystatechange has a typical webidl setter on Document.prototype.
Comment 1 Ian 'Hixie' Hickson 2011-09-06 04:35:48 UTC
(Note: We'd like to get this fixed within the next two weeks so that Gecko doesn't have to back out the patch unifying event handlers.)

cc'ed usual suspects; please cc more.

The only option I can see is to make window.XMLDocument an object that isn't a Document. Do people depend on Document objects having XMLDocument somewhere in their prototype?

It would help to have a better idea of what people are doing with this object.
Comment 2 Alexey Proskuryakov 2011-09-06 05:10:32 UTC
In WebKit, window.XMLDocument === window.Document. We've had this behavior
since 2006, and to the best of my knowledge, the compatibility story is
impeccable.

While I don't have hard data on whether XMLDocument is necessary at all, I'm
pretty sure that it is. So, I suggest speccing WebKit behavior.
Comment 3 Cameron McCormack 2011-09-06 05:21:05 UTC
I think the problem with that is that WebKit does not yet have accessor properties for IDL attributes.  When it does, the site Boris mentions in the Mozilla bug will break because of Document.prototype.onreadystatechange.
Comment 4 Simon Pieters 2011-09-06 08:20:06 UTC
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1130

In Opera I get
log: true
log: true
log: false
log: false
Comment 5 Anne 2011-09-06 08:24:20 UTC
For Opera it does not matter much what we end up with here.
Comment 6 Boris Zbarsky 2011-09-06 12:13:06 UTC
Yeah, the WebKit behavior is not compatible with having a setter for onreadystatechange on Document.prototype.  So I really doubt we want to spec it.

Not only that, but WebKit does not have load() on the XMLDocument it exposes, so the compat story is not "impeccable".  For example, the Sarissa library is in fact broken in WebKit as a result.  See https://sourceforge.net/tracker/?func=detail&aid=2149289&group_id=75155&atid=543074 for example.
Comment 7 Alexey Proskuryakov 2011-09-06 16:30:30 UTC
> Not only that, but WebKit does not have load() on the XMLDocument it exposes

We haven't been implementing document.load because it was non-standard. Now that it's standardized, it's not actually on XMLDocument, so I'm not sure how this affects the decision on whether to make XMLDocument an alias for Document.

Again, aliasing XMLDocument to Document worked really great for WebKit, and alternative solutions seem pretty scary.

Note that Sarissa does that aliasing itself for engines that don't have XMLDocument, although it's unclear to me if that actually affects the library behavior in any way:

if(typeof XMLDocument == "undefined" && typeof Document !="undefined"){ XMLDocument = Document; }
Comment 8 Boris Zbarsky 2011-09-06 16:35:33 UTC
> Now that it's standardized, it's not actually on XMLDocument

It's not clear that that's web-compatible.  Again, some libraries expect it to be on XMLDocument.

> Again, aliasing XMLDocument to Document worked really great for WebKit

Depending on how you define "great", sure.  As long as it doesn't involve compat with deployed content.

> Note that Sarissa does that aliasing itself for engines that don't have
> XMLDocument

Is load() on Document in the cases that don't have XMLDocument (that being just IE)?
Comment 9 Boris Zbarsky 2011-09-06 16:44:40 UTC
> Note that Sarissa does that aliasing itself 

The version used by virginamerica does not, btw.
Comment 10 Boris Zbarsky 2011-09-06 16:46:50 UTC
What it _does_ have is some UA sniffing for "applewebkit" which makes it use sync XHR in WebKit-only codepaths in some cases while using load() in other codepaths.
Comment 11 Alexey Proskuryakov 2011-09-06 17:16:45 UTC
> > Again, aliasing XMLDocument to Document worked really great for WebKit
> 
> Depending on how you define "great", sure.  As long as it doesn't involve
> compat with deployed content.

That's the primary metric for me, and hopefully for standardization, too. If you are aware of any sites that break in WebKit due to this aliasing, please do tell.
Comment 12 Boris Zbarsky 2011-09-06 17:33:16 UTC
My point was that compat with content was what WebKit currently fails on, forcing people to create various workarounds for it.

That is, sites work _despite_ the WebKit behavior, not because of it.

I gave you a link to at least one case where the WebKit behavior causes problems for someone trying to author a site.
Comment 13 Alexey Proskuryakov 2011-09-06 18:00:06 UTC
If I'm thinking of the same link (one to Sarissa bug tracker), it's about not having document.load, which I still don't see as relevant for this discussion.

1. xmlDocument = Sarissa.getDomDocument();
2. xmlDocument.async = false;
3. xmlDocument.load('xml/contents.xml');

> > Now that it's standardized, it's not actually on XMLDocument
>
> It's not clear that that's web-compatible.  Again, some libraries expect it to
> be on XMLDocument.

Are you aware of any sites (or at least libraries) that would break in practice?
Comment 14 Boris Zbarsky 2011-09-06 18:18:01 UTC
> it's about not having document.load

Sarissa looks for .load explicitly on XMLDocument.prototype if XMLDocument is defined.  If it's not there, it will fail, on the codepaths that use load().

In particular, the code looks like this, in the version of Sarissa used by virginamerica:

        if (window.XMLDocument) {
...
            XMLDocument.prototype._sarissa_load = XMLDocument.prototype.load;
            XMLDocument.prototype.load = function (sURI) {
...
                        this._sarissa_load(sURI);


If XMLDocument == Document, then .load needs to be on Document.prototype to make that code work.  Are you willing to put it there?

> Are you aware of any sites (or at least libraries) that would break

Sarissa, for one; see code above.  Is that really so unclear?
Comment 15 Boris Zbarsky 2011-09-06 18:19:32 UTC
Created attachment 1027 [details]
The script in question, prettyprinted, just to make sure we're on the same page
Comment 16 Alexey Proskuryakov 2011-09-06 18:35:50 UTC
OK, I now have a blemish on WebKit's otherwise impeccable history with this approach: http://www.edugis.nl uses an older version of Sarissa that doesn't have this special case for WebKit, so it fails to load a document.

Still, do you expect any other option to have a better compatibility story?
Comment 17 Boris Zbarsky 2011-09-06 19:03:46 UTC
Well, that's a much more interesting question.  That's what this bug is about....

Do sites depend on XMLDocument.prototype having Document on the prototype chain?

If we made XMLDocument just an empty interface not inheriting from Document or Node or anything, and left load() in it, then the Sarissa code would fail to hook load() on Document objects, right?
Comment 18 Boris Zbarsky 2011-09-06 19:20:19 UTC
For what it's worth, I suspect the most web-compatible solution is to have XMLDocument inherit from Document, put load() on XMLDocument, and have the onreadystatechange getter/setter from Document.prototype silently return null and no-op respectively when |this| is not a document object...
Comment 19 Ian 'Hixie' Hickson 2011-09-06 21:33:44 UTC
Unless we make that a generic behaviour (have the getters and setters of _all_ IDL attributes, or at least all on* event handler IDL attributes, act in this way when called on objects they aren't expecting), I'd be quite reluctant to do that. It's a really weird quirky behaviour.

Do people depend on Document objects having XMLDocument somewhere in their prototype or on Document === XMLDocument? Or could we make XMLDocument a non-Node-related object?
Comment 20 Boris Zbarsky 2011-09-07 02:50:47 UTC
The library in question depends on the document objects returned from createDocument() having XMLDocument in their prototype chain, as far as I can tell.  Certainly without that being true its attempts to hook document.load by changing XMLDocument.prototype.load won't work.  Not like those work in WebKit anyway, of course.

I could live with all the on* attributes silently doing nothing on bogus |this|, if we have to go there...  It feels really ugly, but I'm not sure what else here is web-compatible. :(
Comment 21 Ian 'Hixie' Hickson 2011-09-07 22:34:40 UTC
It's weird but at least it would be consistent across the platform, instead of a landmine waiting for people to step on.

heycam, any opinion on that?

I don't really see what else we can do given the constraints we're working with here.

Ms2ger, annevk: If we go for XMLDocument.load() rather than XMLDocumentLoader.load(), where should it go? DOM Core, DOM Parsing, or Web Apps 1.0?
Comment 22 Ms2ger 2011-09-08 08:07:30 UTC
WA1 sounds good to me.
Comment 23 Jonas Sicking 2011-09-08 08:13:08 UTC
I think changing the behavior for all on* properties to behave differently from every single other DOM function and attribute is creating a much larger landmine.

A better approach here seems to me to do nothing in the spec space about this problem for now. At least nothing normative. Instead we can in Gecko intentionally break the spec in this one instance and not throw. Whenever someone hits the non-throwing-but-should-throw code-path we'll put a warning in the developer console.

I would imagine that in a year or two we can remove this exception from Gecko. If that's not the case, we can re-evaluate then.

It might be that other browsers need to add a similar exception. So it might make sense to add an informative note in the spec to aid with this. But this doesn't need to affect the long-term plan here.
Comment 24 Boris Zbarsky 2011-09-08 12:34:22 UTC
Note that that still leaves the question of what to do about XMLDocument open....
Comment 25 Ian 'Hixie' Hickson 2011-09-10 04:33:41 UTC
So the situation here is a little changed, in that onreadystatechange is now a special attribute for other compat reasons. So it's less of an issue to make it even more special here if we want to go that route. We can also remove it from XMLDocument entirely, or something else like that. Do pages actually depend on readystatechange on XMLDocument actually doing something useful? (I assume they do, based on the Sarissa code.)

What do we want to do here? Have Document === HTMLDocument === SVGDocument, have XMLDocument inherited from Document, have doc.load() only appear on XMLDocument, and have createDocument() return an XMLDocument and all other document creation mechanisms return a regular Document? Does that sound right to everyone?
Comment 26 Boris Zbarsky 2011-09-10 04:38:17 UTC
> (I assume they do, based on the Sarissa code.)

Sarissa doesn't seem to, really (note that it's setting it on the _prototype_, not on the document itself!).

So as long as XMLDocument === Document or XMLDocument inherits from Document we'll need some sort of special-casing for the onreadystatechange setter at least, right?
Comment 27 Alexey Proskuryakov 2011-09-10 05:57:35 UTC
It seems that XMLDocument===Document results in a less convoluted Web platform. Maybe it's not a big deal, but finding your document that's loaded from .svg or .xhtml file not implement XMLDocument would be surprising.
Comment 28 Ian 'Hixie' Hickson 2011-09-14 22:03:35 UTC
(In reply to comment #26)
> 
> Sarissa doesn't seem to, really (note that it's setting it on the _prototype_,
> not on the document itself!).

There's no code that later tries to use it correctly?


> So as long as XMLDocument === Document or XMLDocument inherits from Document
> we'll need some sort of special-casing for the onreadystatechange setter at
> least, right?

I suppose so.

We still have to have XMLDocument.load be something explicit, so the XMLDocumentLoader thing still has to be changed to XMLDocument, right?


(In reply to comment #27)
> It seems that XMLDocument===Document results in a less convoluted Web platform.
> Maybe it's not a big deal, but finding your document that's loaded from .svg or
> .xhtml file not implement XMLDocument would be surprising.

That's the way it is currently, per spec. The load() method's interface is only implemented by the Document objects that come from createDocument(). Is that not required for compat? If not, then we can just make XMLDocument === Document as well.

So the changes to the spec being proposed here are:

* Document.onreadystatechange setter is made magical in that it does nothing if the context object is not a Document instance.
* Document === HTMLDocument === XMLDocument (=== SVGDocument), rather than the current "implements" strategy with its separate XMLDocument interface that inherits from Document.
* XMLDocumentLoader.load() is moved to Document and XMLDocumentLoader is dropped.

bz: If you agree, I suggest trying to implement that and reporting back.
Comment 29 Ian 'Hixie' Hickson 2011-09-14 23:38:34 UTC
ap points out that having Document.load() may be too dangerous for compat (because it would put a "load" member into the scope of event handler attributes with a higher precedence than the global scope "load"), so instead I would go back to more or less what bz suggested in comment 18:

* Document.onreadystatechange setter is made magical in that it does nothing if the context object is not a Document instance.
* Document === HTMLDocument (=== SVGDocument), rather than the current "implements" strategy.
* XMLDocument continues to inherit from Document.
* createDocument() starts returning XMLDocument instead of Document. (Regular XML documents, i.e. those with a browsing context, keep returning a regular Document object.)
* XMLDocumentLoader.load() is moved to XMLDocument and XMLDocumentLoader is dropped.
Comment 30 Boris Zbarsky 2011-09-16 18:09:06 UTC
> There's no code that later tries to use it correctly?

My admittedly cursory skim didn't show any.

> We still have to have XMLDocument.load be something explicit, so the
> XMLDocumentLoader thing still has to be changed to XMLDocument, right?

I believe so, yes.  Clearly Alexey disagrees.

> The load() method's interface is only implemented by the Document objects that
> come from createDocument(). Is that not required for compat?

load() needs to be on things returned from createDocument for sure.

Whether it's ok to have it on other things as well is unclear. I would be ok with just having it on return values of createDocument.  In Gecko it does make implementation simpler to not worry about what happens when load() is called on documents that are attached to a Window.  Of course we could just make it throw in that situation.

> bz: If you agree, I suggest trying to implement that and reporting back.

We're implementing item 1 from your list already.

We have no current plans for item 2; it requires significant rearchitecture and is a low priority for us....

Item 3 probably needs more of a spec for load() behavior than we have right now.
Comment 31 Boris Zbarsky 2011-09-16 18:10:17 UTC
The comment 29 plan, sounds fine to me modulo the perennial "how to treat document interfaces in general" issue.
Comment 32 Ian 'Hixie' Hickson 2011-09-19 22:48:18 UTC
Ok. Do you need anything specced in a hurry for comment 29's plan, or are you good to go?
Comment 33 Boris Zbarsky 2011-09-21 03:07:45 UTC
I don't think I need anything specced in a hurry at this point except the first bullet point from comment 29.  And even then the hurry no longer matters; we're shipping that change no matter what for now, since we see no other alternatives.
Comment 34 contributor 2011-10-20 23:05:11 UTC
Checked in as WHATWG revision r6718.
Check-in comment: Make HTMLDocument === Document.
http://html5.org/tools/web-apps-tracker?from=6717&to=6718
Comment 35 contributor 2011-10-20 23:07:27 UTC
Checked in as WHATWG revision r6719.
Check-in comment: Move XMLDocumentLoader.load to XMLDocument.
http://html5.org/tools/web-apps-tracker?from=6718&to=6719
Comment 36 Ian 'Hixie' Hickson 2011-10-20 23:10:22 UTC
Done (see diffs above for the first and third of these; second is a no-op):
> * Document === HTMLDocument (=== SVGDocument), rather than the current
> "implements" strategy.
> * XMLDocument continues to inherit from Document.
> * XMLDocumentLoader.load() is moved to XMLDocument and XMLDocumentLoader is
> dropped.

Still to do:
> * Document.onreadystatechange setter is made magical in that it does nothing
> if the context object is not a Document instance.

heycam, what do I need to do for this one?


> * createDocument() starts returning XMLDocument instead of Document. (Regular
> XML documents, i.e. those with a browsing context, keep returning a regular
> Document object.)

Anne, this is a change needed in DOM Core. (Probably you'll want to move XMLDocument from DOM Parsing to DOM Core too. It's a trivial 1-line IDL definition, so this should be easy.)
Comment 37 Ian 'Hixie' Hickson 2011-10-20 23:19:47 UTC
zcorpan points out that I made a mistake in the patch: I put HTMLDocument into Document instead of Window and made it return the document instead of the interface! Oops. So something else to do:

* Make Window.HTMLDocument === Document, and remove Document.HTMLDocument.
Comment 38 Anne 2011-10-21 04:25:16 UTC
Updated the DOM and DOM Parsing as per comment 36.
Comment 39 contributor 2011-10-21 05:46:21 UTC
Checked in as WHATWG revision r6725.
Check-in comment: Fix HTMLDocument's definition. Oops.
http://html5.org/tools/web-apps-tracker?from=6724&to=6725
Comment 40 Ian 'Hixie' Hickson 2011-10-21 05:50:35 UTC
Ok all that's left to do now is:
> * Document.onreadystatechange setter is made magical in that it does nothing
> if the context object is not a Document instance.

I'm not sure what the right way to specify this is.
Comment 41 Cameron McCormack 2011-10-21 06:06:26 UTC
(In reply to comment #40)
> Ok all that's left to do now is:
> > * Document.onreadystatechange setter is made magical in that it does nothing
> > if the context object is not a Document instance.
> 
> I'm not sure what the right way to specify this is.

I have not yet added the feature to Web IDL that would allow specifying this.  Will update the bug when I have done so.
Comment 42 Simon Pieters 2011-10-21 08:49:17 UTC
(In reply to comment #39)
> Checked in as WHATWG revision r6725.
> Check-in comment: Fix HTMLDocument's definition. Oops.
> http://html5.org/tools/web-apps-tracker?from=6724&to=6725

It's still not quite right.

+ readonly attribute <code><a href=#document>Document</a></code> <a href=#htmldocument title=HTMLDocument>HTMLDocument</a>;

The Document interface object is not an instance of Document. WebIDL says "Interface objects are always function objects.", so I guess s/Document/Function/ above.
Comment 43 Cameron McCormack 2011-10-21 16:55:26 UTC
(In reply to comment #42)
> It's still not quite right.
> 
> + readonly attribute <code><a href=#document>Document</a></code> <a
> href=#htmldocument title=HTMLDocument>HTMLDocument</a>;
> 
> The Document interface object is not an instance of Document. WebIDL says
> "Interface objects are always function objects.", so I guess
> s/Document/Function/ above.

That's also not going to do exactly the right thing, since you want to have "HTMLDocument" be an own data property on window, rather than an accessor property on Window.prototype, which is what the above attribute will do.

I suggest simply writing in prose the requirement to have a writable, configurable, non-enumerable property named "HTMLDocument" on window whose value is the Document interface object.
Comment 44 Ian 'Hixie' Hickson 2011-10-25 05:39:43 UTC
(In reply to comment #41)
> 
> I have not yet added the feature to Web IDL that would allow specifying this. 
> Will update the bug when I have done so.

Thanks, let me know.


(In reply to comment #42)
> 
> The Document interface object is not an instance of Document. WebIDL says
> "Interface objects are always function objects.", so I guess
> s/Document/Function/ above.

Function is not the same as "a function object" (actually they're more or less unrelated, technically). Function is an interface defined in the HTML spec that is a callback.

Anyway the spec now uses "object" as the type returned.


(In reply to comment #43)
> 
> That's also not going to do exactly the right thing, since you want to have
> "HTMLDocument" be an own data property on window, rather than an accessor
> property on Window.prototype, which is what the above attribute will do.

Why do we want that? Does it matter?


> I suggest simply writing in prose the requirement to have a writable,
> configurable, non-enumerable property named "HTMLDocument" on window whose
> value is the Document interface object.

I can do that, sure.
Comment 45 contributor 2011-10-27 20:57:39 UTC
Checked in as WHATWG revision r6774.
Check-in comment: Tweak definition to heycam's wording.
http://html5.org/tools/web-apps-tracker?from=6773&to=6774
Comment 46 Cameron McCormack 2011-10-29 22:17:28 UTC
I've added [LeninentThis] to make invalid this values result in a no-op when invoking the getter or setter of a property corresponding to an IDL attribute.

http://dev.w3.org/2006/webapi/WebIDL/#LenientThis
http://dev.w3.org/2006/webapi/WebIDL/#dfn-attribute-getter
http://dev.w3.org/2006/webapi/WebIDL/#dfn-attribute-setter
Comment 47 Boris Zbarsky 2011-10-30 03:02:22 UTC
Do we want to return undefined or null?  Gecko currently does null, but it's not that hard to change.
Comment 48 Cameron McCormack 2011-10-30 03:17:16 UTC
undefined seems better to me if we need to extend the use of [LenientThis] to attributes whose type doesn't allow null.  We can of course return null regardless of the type, but undefined is less type specific I think.
Comment 49 Ian 'Hixie' Hickson 2011-10-30 16:51:50 UTC
Thanks! This should now be completely fixed as discussed in comment 29. Obviously don't hesitate to reopen if I made a mistake or if it turns out we need other changes.
Comment 50 contributor 2011-10-30 16:53:56 UTC
Checked in as WHATWG revision r6786.
Check-in comment: Make document.prototype.onreadystatechange not throw, for compat reasons.
http://html5.org/tools/web-apps-tracker?from=6785&to=6786