10692 – Fix coercion to Infoset for HTML5 to correctly preserve xmlns attributes

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED INVALID

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	pre-LC1 HTML5 spec (editor: Ian Hickson)
Version:	unspecified
Hardware:	PC Linux

Importance:	P3 normal
Target Milestone:	LC
Assignee:	contributor
QA Contact:	HTML WG Bugzilla archive list

URL:	http://www.w3.org/2010/02/rdfa/track/...

Reported:	2010-09-23 03:41 UTC by Manu Sporny
Modified:	2010-12-04 22:40 UTC
CC List:	7 users

Description Manu Sporny 2010-09-23 03:41:45 UTC

This bug is related to RDFa Working Group ISSUE-3:

http://www.w3.org/2010/02/rdfa/track/issues/3

RDFa uses the xmlns: pattern to declare prefix mappings, so it is important that namespace information declared in non-XML mode HTML5 documents is mapped to an Infoset correctly. To ensure this mapping is performed correctly, the "Coercing an HTML DOM into an infoset" rules defined in [HTML5] must be extended. The spec-ready text can be found here:

http://www.w3.org/TR/rdfa-in-html/#preserving-namespaces-via-coercion-to-infoset

Note that this text is not a replacement for the current Coercion to Infoset rules, but an addition to the current Coercion to Infoset rules in the HTML5 specification. The goal is to preserve the old behavior of placing the xmlns: value in a non-namespaced tuple and to add behavior on top of it that creates the correct Infoset namespace tuple.

In other words, the change does not break backward compatibility, but is meant to ensure that xmlns: values can be extracted in the same way in Javascript and DOM environments without regard for whether or not the source document was an HTML5 document or an XHTML5 document.

Any solution that achieves this desired effect would be an acceptable resolution to this bug.

Comment 1 Anne 2010-09-23 07:30:57 UTC

Would this not break pages that use

html[xmlns] { background:green }

?

Comment 2 Henri Sivonen 2010-09-23 07:37:14 UTC

I think requesting that HTML5 be "fix[ed]" mischaracterizes the problem. It *might* be that the HTML5 side needs *changing* in order to paper over the problem RDF-in-XHTML TF caused by using the xmlns syntax while fully intending RDFa to be deployed in text/html.

When xmlns="foo" is parsed on an HTML element in text/html, it is an attribute with local name xmlns in no namespace. Mapping this into a namespace declaration would substantially alter the nature of the attribute in the infoset, so while such mapping might be desirable from the RDFa point of view, it clearly wouldn't be a "fix" or be "correct" from the point of view of retaining the meaning of the DOM as converted into an Infoset.

Comment 3 Simon Pieters 2010-09-23 08:11:25 UTC

Doesn't RDFa's prefixes="" solve this problem?

Comment 4 Ian 'Hixie' Hickson 2010-09-29 06:19:28 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: The infoset mapping is intended to be the minimum required to make the output of the HTML parser usable in an XML pipeline. The HTML parser doesn't support "xmlns", and therefore it would be inappropriate to have the infoset map "xmlns" attributes to something namespace-related in this way.

Fundamentally, using "xmlns" and "xmlns:*" attributes in text/html is bogus, anyway.

Comment 5 Manu Sporny 2010-10-04 01:25:41 UTC

(In reply to comment #1)
> Would this not break pages that use
> 
> html[xmlns] { background:green }
> 
> ?

The short answer is that I hope it doesn't break pages and if it does, there is a problem with this proposal. I don't, however, see how the proposed text would create this problem - partly because there may be a miscommunication in the proposal and partly because I'm not a CSS expert.

The intent isn't to break pages that depended on the previous behavior outlined by the HTML5 spec expressed by your example above. The goal is to preserve the old behavior outlined by the HTML5 spec and add an additional mechanism that would give Infoset-based processors more accurate information to work with when it comes to xmlns declarations.

That is, 

html[xmlns] { background:green }

should still work as it did before based on the proposal. Am I missing some technical nuance that would cause it not to work?

Comment 6 Manu Sporny 2010-10-04 01:38:51 UTC

(In reply to comment #2)
> I think requesting that HTML5 be "fix[ed]" mischaracterizes the problem. It
> *might* be that the HTML5 side needs *changing* in order to paper over the
> problem RDF-in-XHTML TF caused by using the xmlns syntax while fully intending
> RDFa to be deployed in text/html.

The problem exists whether or not RDFa is there. So, let's assume that RDFa does not exist. There are going to be plenty of people that are publishing text/html that includes SVG, MathML, XHTML declarations and other intranet applications that utilize xmlns:. When these documents are converted to an Infoset, it would be nice if the xmlns information was not mis-characterized upon conversion.

> When xmlns="foo" is parsed on an HTML element in text/html, it is an attribute
> with local name xmlns in no namespace. Mapping this into a namespace
> declaration would substantially alter the nature of the attribute in the
> infoset, so while such mapping might be desirable from the RDFa point of view,
> it clearly wouldn't be a "fix" or be "correct" from the point of view of
> retaining the meaning of the DOM as converted into an Infoset.

The proposal states that the exact same thing that happens now would happen when the proposal is applied. That is, the behavior you describe above would be preserved - an attribute with a local name "xmlns" in no namespace with a value of "foo" would appear in the Infoset.

To give a more in-depth example, if the following attribute was provided:

xmlns:foo="http://example.org/bar#"

There would be two attributes created in the converted Infoset:

(NULL_NAMESPACE, "xmlns:foo", "http://example.org/bar#") AND
(http://www.w3.org/2000/xmlns/, "foo", "http://example.org/bar#")

All old code would continue to operate, /and/ the xmlns: declarations would be carried into the Infoset in a lossless way.

If this is not performed, the library that is searching for that term still has to find it and would look in NULL_NAMESPACE to search all of the "xmlns:*" patterns. That is, two code paths would have to be created for Infoset applications that want to find xmlns: attributes.
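The dual mapping described above can be sketched in JavaScript roughly as follows. This is only an illustration of the proposal, not text from the HTML5 specification and not how any parser is known to behave; the function name coerceAttribute and the tuple field names (namespace, localName, value) are hypothetical.

```javascript
// Hypothetical sketch of the proposed dual mapping; not part of any spec or library.
var XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/";

// Takes one attribute as parsed from text/html (always in no namespace) and
// returns the list of Infoset attribute tuples the proposal would produce.
function coerceAttribute(name, value) {
  // Existing behavior: keep the attribute unchanged, in no namespace.
  var tuples = [{ namespace: null, localName: name, value: value }];

  // Proposed addition: for "xmlns:foo", also emit a tuple in the XMLNS namespace.
  var marker = "xmlns:";
  if (name.indexOf(marker) === 0 && name.length > marker.length) {
    tuples.push({
      namespace: XMLNS_NAMESPACE,
      localName: name.substring(marker.length),
      value: value
    });
  }
  return tuples;
}

var result = coerceAttribute("xmlns:foo", "http://example.org/bar#");
console.log(JSON.stringify(result, null, 2));
```

An attribute such as id="p1" would yield only the single no-namespace tuple, so code that relies on the current behavior would see no difference.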

Comment 7 Manu Sporny 2010-10-04 01:48:00 UTC

(In reply to comment #3)
> Doesn't RDFa's prefixes="" solve this problem?

I'm assuming that you mean prefix=""? If so, unfortunately it doesn't. While we are recommending that people not use xmlns: when designing documents for HTML5 and HTML4, the fact is that some people may still do so and in order to ensure that Infoset-based RDFa processors have a simple code-path to follow, we must ensure that the xmlns: values are translated correctly from an HTML5 DOM into an Infoset.

However, as stated in a previous response to Henri - the problem isn't restricted to RDFa-based documents. It applies to any XHTML document that is served as text/html that contains xmlns declarations.

Comment 8 Manu Sporny 2010-10-04 02:19:47 UTC

(In reply to comment #4)
> EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
> satisfied with this response, please change the state of this bug to CLOSED. If
> you have additional information and would like the editor to reconsider, please
> reopen this bug.

Re-opening the bug as I'd like the editor to reconsider based on the response to his comments, as well as those sent to others in this bug thread.

> Rationale: The infoset mapping is intended to be the minimum required to make
> the output of the HTML parser usable in an XML pipeline. The HTML parser
> doesn't support "xmlns", and therefore it would be inappropriate to have the
> infoset map "xmlns" attributes to something namespace-related in this way.

While the HTML parser does not support XML namespaces, the intent of the Infoset mapping is to perform a reasonable translation of an HTML5 DOM into an Infoset so that it may be used in an XML pipeline. The preservation of the semantic meaning of the xmlns attributes should be considered when performing this translation.

It is true that it would be wrong to assert that the duty of translation to an Infoset is the responsibility of the HTML parser layer of the Web stack. However, I don't know where the HTML5 parser stops and the XML API begins in the Web stack envisioned by HTML5. I'm mostly concerned with how a Javascript application would access the attributes declared by the xmlns:xyz mechanism.

My understanding is that you would need two code paths in a Javascript application to find attributes declared by xmlns: - one to deal with the HTML5 DOM case and another to deal with the XHTML5 Infoset case. This proposal attempts to unify the two code paths into one code path by translating the HTML5 DOM attributes (with no namespace) into an XHTML5 Infoset (with namespaced attributes).

While you state that it would be "inappropriate to have the infoset map "xmlns" attributes to something namespace-related", you also state that "if the XML API doesn't support attributes in no namespace that are named "xmlns", attributes whose names start with "xmlns:", or attributes in the XMLNS namespace, then the tool may drop such attributes." - this presumes that the code doing the translation understands the difference between supporting attributes in no namespace or attributes in the XMLNS namespace. 

You also state that "The tool may annotate the output with any namespace declarations required for proper operation." - which seems to indicate that the tool understands how to map non-namespaced items in the HTML5 DOM to namespaced items in the Infoset.

I don't necessarily care how this is done as long as the Javascript that is executed on the document, intended to find "xmlns:*" mappings, doesn't have to have two code paths depending on if the document is in HTML5-mode or XHTML5-mode.

> Fundamentally, using "xmlns" and "xmlns:*" attributes in text/html is bogus,
> anyway.

Yes, but people are doing it - even if it is bogus, developers expect the same code to work between XHTML5 and HTML5.

Comment 9 Anne 2010-10-04 09:00:07 UTC

If you change the generated-DOM so that xmlns attributes are put in a namespace html[xmlns] will no longer match as that only selects xmlns attributes not in a namespace.

I think we have explained this problem enough by now. Being ignorant about it whenever the year passes by starts to get annoying.

http://lists.w3.org/Archives/Public/public-html/2009Aug/0359.html

Comment 10 Simon Pieters 2010-10-04 10:00:59 UTC

I thought this bug was about changing the coercion rules. However, statements about javascript/DOM confuse the issue, since the coercion rules are not implemented in browsers (and aren't intended to be and can't be for compat), so do not affect javascript/DOM in browsers.

Is this intended to affect non-browsers that implement the coercion rules, or is this intended to affect browsers also?

Comment 11 Anne 2010-10-04 10:23:55 UTC

Comment 1 and comment 9 are incorrect per comment 0. Sorry about that. I missed that you wanted to generate additional attributes.

I do not think it is a good idea to have one attribute declaration generate two attributes. That makes the model extremely confusing. And given how much confusion namespaces have caused in XML I do not think it is a good idea to just import that baggage into HTML.

Comment 12 Ian 'Hixie' Hickson 2010-11-11 22:56:13 UTC

Status: Did Not Understand Request
Change Description: no spec change
Rationale: I really have no idea what this bug is asking for at this point. It seems to be based on some fundamental misunderstandings of how the Web platform works. I recommend finding me on IRC (#whatwg on Freenode or #html-wg on W3net) to explain what the actual intent is.

Comment 13 Manu Sporny 2010-12-04 22:40:31 UTC

(In reply to comment #10)
> I thought this bug was about changing the coercion rules. However, statements
> about javascript/DOM confuse the issue, since the coercion rules are not
> implemented in browsers (and aren't intended to be and can't be for compat), so
> do not affect javascript/DOM in browsers.
> 
> Is this intended to affect non-browsers that implement the coercion rules, or
> is this intended to affect browsers also?

It was intended to affect both; however, it's now clear to me that I misunderstood how the Coercion to Infoset rules are used in browser environments. My bad.

This all came about because I was trying to make sure that Henri's concerns about RDFa in HTML5 were addressed. However, now I think that this is a non-issue, and will continue to think so until Henri tells me otherwise.

I was under the mistaken impression that the Coercion to Infoset rules somehow affected how the xmlns: declarations were translated into attribute names and prefixes available via the DOM. That is, given the following markup:

----------------------------------------------------------
<!DOCTYPE html>
<html>
  <head>
    <title>An HTML5 Document</title>
  </head>
  <body>
    <h1>Example</h1>
    <p id="p1" xmlns:foo="http://example.org/foo#">
       This is an example HTML5 document.
    </p>
    <p id="end">The end.
    </p>

<script type="text/javascript">
var p1 = document.getElementById("p1");
var end = document.getElementById("end");
var attrs = p1.attributes;
for(var i = attrs.length - 1; i >= 0; i--) 
{
   var p = document.createElement("p");
   p.innerHTML = "prefix: " + attrs[i].prefix + ", " + 
      attrs[i].name + " = " + attrs[i].value;
   document.body.insertBefore(p, end);
}
</script>

  </body>
</html>
----------------------------------------------------------

The page above being loaded into an HTML5-capable browser would display this in the page somewhere:

prefix: null, xmlns:foo = http://example.org/foo#

I was under the impression that an XHTML-capable browser executing similar code:

----------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>An XHTML1 Document</title>
  </head>
  <body>
    <h1>Example</h1>
    <p id="p1" xmlns:foo="http://example.org/foo#">
       This is an example XHTML1 document.
    </p>
    <p id="end">The end.
    </p>

<script type="text/javascript">
var p1 = document.getElementById("p1");
var end = document.getElementById("end");
var attrs = p1.attributes;
for(var i = attrs.length - 1; i >= 0; i--) 
{
   var p = document.createElement("p");
   p.innerHTML = "prefix: " + attrs[i].prefix + ", " + 
      attrs[i].name + " = " + attrs[i].value;
   document.body.insertBefore(p, end);
}
</script>

  </body>
</html>
----------------------------------------------------------

would have produced this:

prefix: xmlns, foo = http://example.org/foo#

but instead, it produces this:

prefix: xmlns, xmlns:foo = http://example.org/foo#

I could have sworn that I ran this test when Henri first raised it two years ago and I got different results at that time. Previously, I had stated:

I don't necessarily care how this is done as long as the Javascript that is
executed on the document, intended to find "xmlns:*" mappings, doesn't have to
have two code paths depending on if the document is in HTML5-mode or
XHTML5-mode.

After re-running the tests above, I'm convinced that two code paths for finding all "xmlns:*" values are not necessary. 
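For what it's worth, a single code path along these lines seems to be enough. This is a minimal sketch that assumes, as the test results above suggest, that the attribute's name is reported as "xmlns:foo" in both HTML5 and XHTML modes; the helper name collectPrefixMappings is made up for this example, and the sample attribute list is a plain-object stand-in for p1.attributes.

```javascript
// Hypothetical helper; works on any array-like list of objects with
// `name` and `value` properties, such as element.attributes.
function collectPrefixMappings(attrs) {
  var mappings = {};
  var marker = "xmlns:";
  for (var i = 0; i < attrs.length; i++) {
    var name = attrs[i].name;
    // Only names of the form "xmlns:<prefix>" declare a prefix mapping.
    if (name.indexOf(marker) === 0 && name.length > marker.length) {
      mappings[name.substring(marker.length)] = attrs[i].value;
    }
  }
  return mappings;
}

// Stand-in for p1.attributes from the test pages above.
var sampleAttrs = [
  { name: "id", value: "p1" },
  { name: "xmlns:foo", value: "http://example.org/foo#" }
];
console.log(collectPrefixMappings(sampleAttrs));
```

In a browser, the same function could be called as collectPrefixMappings(p1.attributes), since it only relies on length, name, and value.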

Apologies for the confusion. I'm marking the issue as INVALID and setting its state to CLOSED.