<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>29831</bug_id>
          
          <creation_ts>2016-09-20 12:59:27 +0000</creation_ts>
          <short_desc>[FO31] fn:transform and serialization to string</short_desc>
          <delta_ts>2016-12-16 19:55:24 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Functions and Operators 3.1</component>
          <version>Candidate Recommendation</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Abel Braaksma">abel.braaksma</reporter>
          <assigned_to name="Michael Kay">mike</assigned_to>
          <cc>tim</cc>
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>127436</commentid>
    <comment_count>0</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-09-20 12:59:27 +0000</bug_when>
    <thetext>When serializing to a string, what serialization-options are applicable? And in the case of a 1.0 processor, does disable-output-escaping come into play (provided a processor supports it)?

Consider:

&lt;xsl:value-of select=&quot;&apos;&amp;lt;br&gt;&apos;&quot; disable-output-escaping=&quot;true&quot; /&gt;

The output would be &quot;&lt;br&gt;&quot;.

and: &lt;xsl:value-of select=&quot;&apos;&amp;lt;br&gt;&apos;&quot; disable-output-escaping=&quot;false&quot; /&gt;

The output would be &quot;&amp;lt;br&gt;&quot;

Does the returned string contain &apos;&lt;&apos;, &apos;b&apos;, &apos;r&apos;, &apos;&gt;&apos; (&quot;&lt;br&gt;&quot;, and therefore illegal XML), or does the string contain &apos;&amp;&apos;, &apos;l&apos;, &apos;t&apos;, &apos;;&apos;, &apos;b&apos;, &apos;r&apos;, &apos;&gt;&apos; (&quot;&amp;lt;br&gt;&quot;, and therefore legal XML)?

And with respect to other output options:
- if a non-UTF encoding is specified, do we return character references for the characters that encoding cannot represent?
- if character maps are specified, are they applied (leading, again, to potentially illegal XML)?
- if HTML is specified, do we return the string for HTML, or a parsable XML string?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127452</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-09-20 18:53:55 +0000</bug_when>
    <thetext>Looking at the spec, I think it is as clear as it needs to be.

Firstly, it&apos;s clear how you request a serialized result, and it&apos;s clear that serialization-params specified in the request take precedence over those specified in xsl:output.

There is a slight mismatch because we require the serialized result as a string. The serialization spec says:

Note:
Serialization is only defined in terms of encoding the result as a stream of octets. However, a serializer MAY provide an option that allows the encoding phase to be skipped, so that the result of serialization is a stream of Unicode characters. The effect of any such option is implementation-defined, and a serializer is not required to support such an option.


and we might perhaps refer to that note. If the serializer does not support serialization to a string, then the stream of octets can always be decoded as a string. It might be worth mentioning that there are two possible ways of doing this: either skip the encoding phase (in which case escaping of non-encodable characters probably doesn&apos;t happen), or decode the octet stream (in which case you&apos;re probably left with character references for unencodable characters).
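
A minimal sketch of those two routes, using Python's codecs as a stand-in for a real serializer. The variable names are illustrative, and the US-ASCII target encoding is an assumption chosen so that the character cannot be encoded:

```python
# Sketch only: Python's codecs stand in for a real XSLT/XQuery serializer.
# Assumption: the requested encoding is US-ASCII, which cannot represent U+0416.
text = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE

# Route 1: skip the encoding phase; the characters are returned as they are.
skipped = text

# Route 2: encode to octets, writing unencodable characters as numeric
# character references, then decode the octets back into a string.
octets = text.encode("ascii", errors="xmlcharrefreplace")
decoded = octets.decode("ascii")

print(len(skipped), len(decoded))  # prints "1 7"
```

Python writes the reference in decimal rather than hexadecimal form, but the point is the same: the second route leaves a seven-character reference where the first leaves a single character.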

Support for disable-output-escaping in XSLT has always been optional, and remains so for the fn:transform function. It&apos;s a deprecated feature and I don&apos;t think we need to say anything about it: if you use it, you&apos;re not guaranteed interoperability.

I don&apos;t see any reason to restrict the output methods available. HTML and JSON make perfectly good sense, for example.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127463</commentid>
    <comment_count>2</comment_count>
    <who name="Abel Braaksma">abel.braaksma</who>
    <bug_when>2016-09-21 00:16:56 +0000</bug_when>
    <thetext>Re comment#1:

I think you are right. I now vaguely remember having seen that paragraph before, and perhaps even asking this question before (which in itself may be an indication that a Note could be helpful).

I used d-o-e as an example. I would have done better using character-maps as an example (they have similar semantics but are interoperable). 

You write &quot;However, a serializer MAY provide an option that allows the encoding phase to be skipped&quot;. 

That suggests that applying character maps (which happens before encoding) has an effect. It also suggests that the output method has an effect (which, I agree, makes sense; otherwise one should use &quot;raw&quot; if only a tree is needed).

All in all: everything *except* (optionally) encoding takes place and the result is given back as a string (that is, a series of characters, not octets). If encoding takes place, you may get the string &quot;&amp;#x416;&quot; (7 characters) instead of the string &quot;Ж&quot; (Cyrillic Zhe, 1 character).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127550</commentid>
    <comment_count>3</comment_count>
    <who name="Tim Mills">tim</who>
    <bug_when>2016-09-27 15:24:55 +0000</bug_when>
    <thetext>See section 4, &quot;Phases of Serialization&quot;, of the Serialization spec (the note under point 5).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>127563</commentid>
    <comment_count>4</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2016-09-27 20:03:45 +0000</bug_when>
    <thetext>The WG decided that there was scope here for editorial clarification, but no substantive error in the spec.

I have explained the point about serializing to a character string by reference to the fn:serialize function, where the same considerations apply.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>