This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25168 - Should XML Serialization be allowed to produce invalid XML?
Status: RESOLVED FIXED
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: DOM Parsing and Serialization
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Travis Leithead [MSFT]
QA Contact: public-webapps-bugzilla
 
Reported: 2014-03-26 22:56 UTC by Travis Leithead [MSFT]
Modified: 2014-04-03 17:57 UTC

Description Travis Leithead [MSFT] 2014-03-26 22:56:27 UTC
Today in an HTML document,
  createElement("first:last")
will create an HTMLElement node with prefix = null and localName = "first:last".

Serializing this node as XML according to the spec today (which matches IE and Firefox, and soon Chrome) will generate the following invalid XML:
  <first:last xmlns="http://www.w3.org/1999/xhtml"/>

This is invalid (when round-tripped through DOMParser) because the prefix "first" is not defined. The XML parser does not know that "first:last" should be interpreted as a localName only.
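
To illustrate why the parser trips on this, here is a minimal sketch of how a namespace-aware XML parser treats the tag name: it splits at the first colon and then needs a declaration for the prefix. The helper names (splitQName, isPrefixBound) and the Set of declared prefixes are hypothetical, invented for this illustration and not taken from any spec or browser API.

```javascript
// Hypothetical helper: split a serialized tag name the way a
// namespace-aware XML parser would, at the first colon.
function splitQName(qualifiedName) {
  const colon = qualifiedName.indexOf(":");
  if (colon === -1) {
    return { prefix: null, localName: qualifiedName };
  }
  return {
    prefix: qualifiedName.slice(0, colon),
    localName: qualifiedName.slice(colon + 1),
  };
}

// Returns true when the prefix (if any) can be resolved.
// `declaredPrefixes` is a Set of prefixes bound by xmlns:* attributes
// in scope; the "xml" prefix is always bound.
function isPrefixBound(qualifiedName, declaredPrefixes) {
  const { prefix } = splitQName(qualifiedName);
  return prefix === null || prefix === "xml" || declaredPrefixes.has(prefix);
}

// The serialized output above declares only a default namespace,
// so no prefixes are in scope when "first:last" is parsed.
console.log(isPrefixBound("first:last", new Set()));
```

The DOM knows that "first:last" is a single localName, but that information is lost once the name is written out as text.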

There are two ways to avoid serializing invalid XML fragments:
1) Do not allow the serializer to emit localNames (for elements or attributes) that could not have been created in an XML environment. This would involve changing the actual element or attribute localNames, which would pose a web compatibility problem. For example, "first:last" could be serialized as "first_last" instead. (An underscore is preferred to a hyphen because a hyphen is the character that distinguishes a Custom Element name for a web component.)
2) Fail to serialize when the output would be invalid.

#2 above seems to have too great a potential to break web compatibility--it's a pretty big hammer to apply to the API in the event of a validation issue, though it could be useful for programmatic validation of a DOM. Personally, I don't prefer this option.

#1 seems feasible, though it could change the names of various elements and/or attributes, so it's not without side effects.

If, in fact, we think that the XMLSerializer should always produce valid XML, then I would prefer an escaping approach to minimize the back-compat impact on calling APIs. Otherwise, we should agree that the serializer is allowed to produce invalid XML and proceed with that understanding.
Comment 1 Simon Pieters 2014-03-27 11:22:06 UTC
If we go with escaping, we should probably use the same rules as http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#coercing-an-html-dom-into-an-infoset
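
For reference, a minimal sketch of that escaping rule as I read it: any character not allowed in a local name is replaced by an uppercase "U" followed by the six uppercase hexadecimal digits of its code point. The function names are hypothetical, and the character ranges are my transcription of the XML 1.0 Name production with the colon left out, so treat them as an assumption to check against the XML spec.

```javascript
// NameStartChar ranges from the XML 1.0 Name production, minus ":"
// (transcribed by hand for this sketch; verify before relying on it).
const NAME_START_RANGES = String.raw`A-Z_a-z\u{C0}-\u{D6}\u{D8}-\u{F6}\u{F8}-\u{2FF}\u{370}-\u{37D}\u{37F}-\u{1FFF}\u{200C}-\u{200D}\u{2070}-\u{218F}\u{2C00}-\u{2FEF}\u{3001}-\u{D7FF}\u{F900}-\u{FDCF}\u{FDF0}-\u{FFFD}\u{10000}-\u{EFFFF}`;
// Additional characters allowed after the first position.
const NAME_CHAR_EXTRA = String.raw`\-.0-9\u{B7}\u{300}-\u{36F}\u{203F}-\u{2040}`;

const NAME_START = new RegExp(`^[${NAME_START_RANGES}]$`, "u");
const NAME_CHAR = new RegExp(`^[${NAME_START_RANGES}${NAME_CHAR_EXTRA}]$`, "u");

// "U" plus the code point as six uppercase hex digits, e.g. ":" -> "U00003A".
function escapeChar(ch) {
  return "U" + ch.codePointAt(0).toString(16).toUpperCase().padStart(6, "0");
}

// Hypothetical helper: map a local name to one an XML parser accepts.
function coerceLocalName(name) {
  let out = "";
  let first = true;
  for (const ch of name) { // for...of iterates by code point
    const allowed = (first ? NAME_START : NAME_CHAR).test(ch);
    out += allowed ? ch : escapeChar(ch);
    first = false;
  }
  return out;
}

console.log(coerceLocalName("first:last")); // firstU00003Alast
```

Under these rules "first:last" would come out as "firstU00003Alast" rather than "first_last", which avoids colliding with an existing element that really is named "first_last".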
Comment 2 Travis Leithead [MSFT] 2014-03-27 22:03:27 UTC
Makes sense, thanks for the tip.

I just noticed today that the innerHTML/outerHTML APIs specify throwing behavior for nodes in XML Documents that don't meet the rules outlined in:

http://www.w3.org/html/wg/drafts/html/master/single-page.html#xml-fragment-serialization-algorithm

I think the right plan moving forward is to embed these throwing rules into the algorithm, made conditional on a flag. Then serializeToString would not set the throwing flag, to be consistent with the way it works today, but innerHTML/outerHTML would. Then, if we want to, we could extend serializeToString to accept a flag that enables the throwing behavior. This way applications that want strict serialization via this mechanism can get it.
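
Roughly, the flag-conditional shape could look like the sketch below. This is only an illustration over plain objects, not the spec algorithm: the option name requireWellFormed, the node shape ({ prefix, localName, children }), and the ASCII-only name check are all assumptions made for the example, and namespaces, attributes, and text nodes are left out.

```javascript
// Simplified ASCII-only check that also rejects ":"; a real
// implementation would test against the full XML Name production.
const SIMPLE_NAME = /^[A-Za-z_][A-Za-z0-9_.-]*$/;

// Hypothetical serializer over plain objects shaped like
// { prefix, localName, children }.
function serializeElement(node, options = {}) {
  const { requireWellFormed = false } = options;

  // The throwing rule only applies when the caller sets the flag.
  if (requireWellFormed && !SIMPLE_NAME.test(node.localName)) {
    throw new Error(
      `Cannot serialize element name "${node.localName}" as well-formed XML`
    );
  }

  const name = node.prefix
    ? `${node.prefix}:${node.localName}`
    : node.localName;
  const inner = (node.children || [])
    .map((child) => serializeElement(child, options))
    .join("");
  return inner === "" ? `<${name}/>` : `<${name}>${inner}</${name}>`;
}

const node = { prefix: null, localName: "first:last" };

// serializeToString-style call: flag not set, output emitted as-is.
console.log(serializeElement(node));

// innerHTML/outerHTML-style call: flag set, so the bad name throws.
try {
  serializeElement(node, { requireWellFormed: true });
} catch (e) {
  console.log(e.message);
}
```

Both entry points share one algorithm, so the only difference between them is whether the flag is passed.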
Comment 3 Travis Leithead [MSFT] 2014-04-03 17:57:30 UTC
OK. I believe this commit covers all the cases. We now have a dynamically-switchable algorithm to throw or not to throw based on the flag.

Current setting matches the behavior of browsers and of the former spec by not throwing for serializeToString, and throwing for the inner/outerHTML getters (on non-well-formed DOM).

https://dvcs.w3.org/hg/innerhtml/rev/f3d96628e2b5