Re: some technical thoughts about incremental improvements to forms from Lachlan Hunt on 2006-09-05 (www-forms@w3.org from September 2006)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Wed, 06 Sep 2006 01:04:30 +1000
To: Dave Raggett <dsr@w3.org>
CC: www-forms@w3.org
Message-ID: <44FD91FE.1090805@lachy.id.au>
Dave Raggett wrote:
> For text/html, IE applies special parsing rules for HTML elements
> because unfortunately there are many websites with malformed markup. For 
> non HTML elements it honors the /> syntax, unlike Firefox and
> Opera. I don't have a Mac and hence wasn't able to test Safari.

I had thought this behaviour, which only applies to elements within an 
<xml> element or those with namespace prefixes, was all called XML Data 
Islands, and I was using that term to refer to both.  It seems I was 
wrong about that terminology.

XML Data Islands [1] is the markup that appears within an <xml> element, 
whereas elements with namespace prefixes and xmlns attributes are 
called Custom Tags [2].  However, that's just terminology and my 
statements against them still apply.

From now on, I'll be referring to both collectively as IE's pseudo-XML 
(or equivalent).

> <?xml version="1.0" encoding="utf-8"?>

That triggers quirks mode in IE.

> <html xmlns="http://www.w3.org/1999/xhtml"
> xmlns:f="http://example.com/eforms" xml:lang="en" lang="en">

The way IE handles the xmlns:f attribute is insane.  It actually 
removes the xmlns:f attribute from the DOM, generates an SGML PI (as 
opposed to an XML PI) that looks like the following, and inserts it 
immediately before the first usage of the prefix.

<?xml:namespace prefix = f ns = "http://example.com/eforms" />

This was taken from the .innerHTML representation of your sample 
document [3].  However, the PI is strangely not visible in the DOM.

You get exactly the same result if you insert that PI into the markup 
yourself immediately before the first usage of the prefix, instead of 
using an xmlns:f attribute.  i.e. if you were to copy the innerHTML 
representation into a file, the result would be identical.
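
As a rough illustration of the rewriting described above, here is a toy 
Python model of it.  It is only a sketch based on the observed 
.innerHTML output, not IE's actual algorithm; the function name 
ie_style_serialization and the shortened sample markup are made up for 
the example.

```python
import re

def ie_style_serialization(markup: str, prefix: str, ns: str) -> str:
    """Toy model: drop xmlns:<prefix> and put a namespace PI before the first prefixed tag."""
    # Remove the xmlns:prefix="..." attribute together with its leading whitespace.
    attr = re.compile(r'\s+xmlns:' + re.escape(prefix) + r'\s*=\s*"[^"]*"')
    stripped = attr.sub("", markup, count=1)
    # Find the first start-tag that uses the prefix, e.g. <f:model ...>.
    first_use = re.search(r"<" + re.escape(prefix) + r":", stripped)
    if first_use is None:
        return stripped  # prefix never used: nothing to insert
    pi = '<?xml:namespace prefix = {} ns = "{}" />'.format(prefix, ns)
    i = first_use.start()
    return stripped[:i] + pi + stripped[i:]

# Hypothetical, shortened version of the sample document.
sample = (
    '<html xmlns:f="http://example.com/eforms">'
    '<body><f:model id="form">hello</f:model></body></html>'
)
result = ie_style_serialization(sample, "f", "http://example.com/eforms")
print(result)
```

In this model the PI ends up directly in front of <f:model>, and the 
xmlns:f attribute no longer appears on the html element.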

> <bind id="b1" name="fred"/>

IE treats any unknown element without a namespace prefix as an empty 
element.  In this case, it makes no difference whether you include the 
slash or not; it simply gets ignored.

e.g. For unknown elements that aren't pseudo-XML, like this:

<foo>content</foo>

"FOO" and "/FOO" are both treated as distinct empty elements, rather 
than the start- and end-tags for the same element.  The DOM looks like this:

   FOO
   #text: content
   /FOO
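
The flat structure above can be imitated with a small Python sketch 
built on the standard library's html.parser.  This is an assumption-laden 
toy: the class FlatUnknownElements and the helper flat_nodes are 
hypothetical names, and the model treats every tag as unknown instead of 
consulting a list of real HTML elements.

```python
from html.parser import HTMLParser

class FlatUnknownElements(HTMLParser):
    """Toy model: start- and end-tags of unknown elements become separate sibling nodes."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; the DOM shown above uses uppercase.
        self.nodes.append(tag.upper())

    def handle_endtag(self, tag):
        # The end-tag is recorded as its own empty element named "/FOO".
        self.nodes.append("/" + tag.upper())

    def handle_startendtag(self, tag, attrs):
        # The trailing slash is ignored, so "<bind/>" is one empty element.
        self.handle_starttag(tag, attrs)

    def handle_data(self, data):
        self.nodes.append("#text: " + data)

def flat_nodes(markup):
    """Return the flat list of sibling nodes for the given markup."""
    parser = FlatUnknownElements()
    parser.feed(markup)
    parser.close()
    return parser.nodes

print(flat_nodes("<foo>content</foo>"))
print(flat_nodes('<bind id="b1" name="fred"/>'))
```

The first call yields three siblings (FOO, the text node, /FOO), and the 
second yields a single BIND node whether or not the slash is present.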

> <f:model id="form">hello</f:model>

That's a "Custom Tag".

> <f:field ref="form/name1">Given Name</f:field>
> <f:field ref="form/name2">Family Name</f:field>
> 
> <f:submit id="submit" ref="form"/>
> 
> This form is written in custom XML.

No, it's not XML when handled by IE as text/html.  XML in HTML is 
*undefined* and this is nothing more than a useless proprietary 
extension that happens to share some similarities in syntax with XML. 
Do not make the mistake of thinking that it is XML; there are many 
significant differences, particularly in relation to well-formedness 
(see below) and the handling of namespaces (see above).

> IE6 gives the following DOM for the body element:
> 
> [snip - sample DOM]

IE's DOM is often significantly broken.  Well-formedness errors are not 
fatal for pseudo-XML; they're treated similarly to the way such errors 
are treated in HTML.

With badly nested elements, it produces a DOM where a node doesn't even 
appear in its parent's childNodes list.  It's kind of confusing; Hixie 
explains it in more detail [4].

> The original case is preserved if the element has a namespace, e.g.
> 
>        <h:bind id="b1" name="fred"/>
> 
> where h has to have been bound to a namespace URI.

It doesn't have to have been bound to a namespace for IE to treat it as 
pseudo-XML; it just has to have a prefix.  Where the prefix is not 
defined, it just generates a PI like this:

<?xml:namespace prefix = h />
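
To make the prefix rule concrete, here is a small Python sketch of it.  
It only mirrors the behaviour described in this message; the function 
names namespace_pi and pi_for_tag and the bindings dictionary are 
hypothetical and do not correspond to anything in IE itself.

```python
def namespace_pi(prefix, ns=None):
    """Format the PI; the ns part is left out when the prefix is unbound."""
    if ns is None:
        return "<?xml:namespace prefix = {} />".format(prefix)
    return '<?xml:namespace prefix = {} ns = "{}" />'.format(prefix, ns)

def pi_for_tag(tag_name, bindings):
    """Return the PI for a prefixed tag name, or None when there is no prefix."""
    prefix, sep, _local = tag_name.partition(":")
    if not sep:
        return None  # no prefix: not treated as pseudo-XML
    # An undeclared prefix still counts; it simply has no namespace URI.
    return namespace_pi(prefix, bindings.get(prefix))

bindings = {"f": "http://example.com/eforms"}  # only the f prefix is declared
print(pi_for_tag("h:bind", bindings))
print(pi_for_tag("f:model", bindings))
print(pi_for_tag("bind", bindings))
```

Under these assumptions, h:bind gets a PI without an ns part, f:model 
gets one with the declared URI, and the unprefixed bind gets none.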

> Firefox and Opera treat "/>" as if it was ">". Firefox also forces all 
> elements to uppercase. Opera doesn't. It seems that browser developers 
> aren't particularly thorough in reverse engineering IE's behavior in 
> parsing well formed markup.

Why should they copy IE in this case?  IE's pseudo-XML nonsense is not 
widely used or *depended upon* for anything in the real world (even 
though it sneaks into the garbage generated by MS Office) and is not 
defined anywhere.

> My point is that if all browsers honored the /> syntax for non HTML 
> elements delivered as text/html and preserved the case, it would make it 
> that much easier to deploy mixed markup documents.

My point is that the whole idea of embedding XML in HTML is nonsense and 
should have no part in any transition from HTML to XML.  I'll be 
explaining this last point more in a future post.

[1] http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/xml.asp
[2] http://msdn.microsoft.com/workshop/author/dhtml/overview/customtags.asp
[3] http://software.hixie.ch/utilities/js/live-dom-viewer/
[4] http://ln.hixie.ch/?start=1037910467&count=1

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Tuesday, 5 September 2006 15:05:15 UTC