<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>25168</bug_id>
          
          <creation_ts>2014-03-26 22:56:27 +0000</creation_ts>
          <short_desc>Should XML Serialization be allowed to produce invalid XML?</short_desc>
          <delta_ts>2014-04-03 17:57:30 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebAppsWG</product>
          <component>DOM Parsing and Serialization</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Travis Leithead [MSFT]">travil</reporter>
          <assigned_to name="Travis Leithead [MSFT]">travil</assigned_to>
          <cc>mike</cc>
          <cc>www-dom</cc>
          <cc>zcorpan</cc>
          
          <qa_contact>public-webapps-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>102967</commentid>
    <comment_count>0</comment_count>
    <who name="Travis Leithead [MSFT]">travil</who>
    <bug_when>2014-03-26 22:56:27 +0000</bug_when>
    <thetext>Today in an HTML document,
  createElement(&quot;first:last&quot;)
will create an HTMLElement node with prefix = null and localName = &quot;first:last&quot;.

An XML Serialization according to the spec today (and matching IE/Firefox and soon Chrome) will generate the following invalid XML:
  &lt;first:last xmlns=&quot;http://www.w3.org/1999/xhtml&quot;/&gt;

This is invalid (when round-tripped through DOMParser) because the prefix &quot;first&quot; is not bound to any namespace. The XML parser has no way of knowing that &quot;first:last&quot; should be interpreted as a localName only.

There are two ways to avoid serializing invalid XML fragments:
1) Do not allow the serializer to emit localNames (for elements or attributes) that could not have been created in an XML environment. This would involve changing the actual element or attribute localNames, which carries a web compatibility risk. For example, &quot;first:last&quot; could be serialized as &quot;first_last&quot; instead. (An underscore is preferred to a hyphen, since the hyphen is the character that marks a Custom Element name for web components.)
2) Fail to serialize when the output would be invalid.

#2 above seems like it would have too great a potential to break web compatibility--it&apos;s a pretty big hammer to apply to the API in the event of a validation issue. Though it could be useful for programmatic validation of a DOM. Personally, I don&apos;t prefer this option.

#1 seems feasible, though it could change the names of various elements and/or attributes, so it&apos;s not without side effects.
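
As a rough illustration of #1, here is a minimal sketch of the underscore substitution. The helper name escapeLocalName is hypothetical (it is not part of any spec or browser API), and it only handles the colon case from the example above:

```javascript
// Hypothetical helper for illustration only; not a spec or browser API.
// Replaces every colon in a localName with an underscore so that the
// serialized name can no longer be mistaken for a prefixed name.
function escapeLocalName(localName) {
  return localName.split(":").join("_");
}

console.log(escapeLocalName("first:last")); // prints "first_last"
```

Note that a substitution like this is lossy: &quot;first:last&quot; and &quot;first_last&quot; would serialize to the same name.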

If, in fact, we think that the XMLSerializer should always produce valid XML, then I would prefer an escaping approach to minimize the back-compat impact on calling APIs. Otherwise, we should agree to allow the serializer to produce invalid XML and have that understanding.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>102993</commentid>
    <comment_count>1</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2014-03-27 11:22:06 +0000</bug_when>
    <thetext>If we go with escaping, we should probably use the same rules as http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#coercing-an-html-dom-into-an-infoset</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103052</commentid>
    <comment_count>2</comment_count>
    <who name="Travis Leithead [MSFT]">travil</who>
    <bug_when>2014-03-27 22:03:27 +0000</bug_when>
    <thetext>Makes sense, thanks for the tip.

I just noticed today that the innerHTML/outerHTML APIs specify throwing behavior for nodes in XML Documents that don&apos;t meet the rules outlined in:

http://www.w3.org/html/wg/drafts/html/master/single-page.html#xml-fragment-serialization-algorithm

I think the right plan moving forward is to embed these throwing rules into the algorithm, made conditional on a flag. Then serializeToString would not set the throwing flag to be consistent with the way it works today, but innerHTML/outerHTML would. Then, if we want to, we could extend the capability of serializeToString to allow passing a flag to enable the throwing behavior. This way applications that want the strict serialization via this mechanism can get it.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103364</commentid>
    <comment_count>3</comment_count>
    <who name="Travis Leithead [MSFT]">travil</who>
    <bug_when>2014-04-03 17:57:30 +0000</bug_when>
    <thetext>OK. I believe this commit covers all the cases. We now have a dynamically-switchable algorithm to throw or not to throw based on the flag.

The current setting matches the behavior of browsers and of the former spec: serializeToString does not throw, while the inner/outerHTML getters throw (on a non-well-formed DOM).
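
To make the flag concrete, here is a minimal sketch of the localName check only. The function name serializeLocalName and the ASCII-only name pattern are assumptions made for illustration; the algorithm in the commit is the authority and covers more than element names:

```javascript
// Illustrative sketch only. The function name and the simplified,
// ASCII-only name pattern are assumptions, not text from the spec.
function serializeLocalName(localName, requireWellFormed) {
  // Rough stand-in for "is a name with no colon in it".
  const looksLikeNCName = /^[A-Za-z_][A-Za-z0-9._-]*$/.test(localName);
  if (requireWellFormed) {
    if (!looksLikeNCName) {
      throw new Error("not well-formed: " + localName);
    }
  }
  // Without the flag, the name is emitted unchanged.
  return localName;
}

// serializeToString-style call: flag not set, so nothing is thrown.
console.log(serializeLocalName("first:last", false)); // prints "first:last"

// innerHTML/outerHTML-style call: flag set, so the bad name throws.
try {
  serializeLocalName("first:last", true);
} catch (e) {
  console.log("threw: " + e.message); // prints "threw: not well-formed: first:last"
}
```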

https://dvcs.w3.org/hg/innerhtml/rev/f3d96628e2b5</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>