This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 28540 - [FO30] [FO31] defaults for serialization parameters with fn:serialize not defined
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2015-04-22 22:48 UTC by Abel Braaksma
Modified: 2016-12-16 19:55 UTC

Description Abel Braaksma 2015-04-22 22:48:04 UTC
The function fn:serialize (section 14.7.3 in FO31 and 14.9.3 in FO30) does not list the defaults for the serialization parameters. It stands to reason that it takes defaults similar to those defined in XSLT for xsl:output, but I couldn't find this referenced in the text.

The reference to XDM 3.0 and 3.1 respectively does not solve this issue either, specifically, that specification says:

   There are a number of parameters that influence how serialization is 
   performed. Host languages MAY allow users to specify any or all of these 
   parameters, but they are not REQUIRED to be able to do so. However, the 
   host language specification MUST specify how the values of all applicable 
   parameters are to be determined.

The crux is in the last sentence, which I read as meaning that the host language (XPath F&O) should define these defaults, especially since no defaults are mentioned in XDM.

I figured for a moment that booleans should default to "no", method to "xml", but for instance escape-uri-attributes should probably default to "yes" and there are other enumerations with unclear defaults, like encoding and omit-xml-declaration (default for a file is probably "no", as with XSLT, but default for inlining, as the most common use-case, should probably be "yes").

Perhaps any and all of these settings are supposed to be implementation dependent (which would make calls to this function non-interoperable unless you provide a fully filled set of serialization options), but if so, I think we should say so in the spec.

Alternatively, if the idea is that the host language of XPath (i.e., XSLT or XQuery) should define these defaults, then I think we should state that explicitly and specify the defaults in the host languages.
Comment 1 Abel Braaksma 2015-04-22 22:59:29 UTC
> The reference to XDM 3.0 and 3.1

I meant Serialization 3.0 and 3.1.

Note also that, in addition to the MUST in the previous quote, there is a MAY regarding defaults for the serialization-parameters XDM tree. I read it as subordinate to that MUST: the following quote applies only to the additional mechanism of the supplied serialization-parameters tree, which MAY have defaults other than those covered by the quote in comment#0:

   The host language MAY provide additional mechanisms for overriding the 
   values of any serialization parameters specified through the mechanism 
   defined in this section, as well as additional mechanisms for specifying 
   the values of any serialization parameters whose values are absent after 
   applying the mechanism defined in this section.
Comment 2 Abel Braaksma 2015-04-22 23:01:21 UTC
(from a comment by MKay in bug 28537 it appears that I should post these as FO31, reclassified it as such)
Comment 3 Michael Kay 2015-05-05 21:55:59 UTC
I don't think the defaults should depend on which host language fn:serialize() is called from. It's generally bad form for the result of a function to depend on who is calling it, unless there is very strong justification. Also, the XSLT rules are pretty messy.

I was first inclined to use the defaults defined in Appendix C.1 of the XQuery spec. However, I think these leave too many things implementation-defined. The result of fn:serialize() will typically be processed by the application, so it really needs predictability, e.g. of whether there is going to be an XML declaration.

Also, Appendix C.1 declines to give defaults for parameters that are not applicable to the XML output method. This raises the question: what values do these parameters take if a different method is chosen? I have therefore given defaults for all parameters.

So I would suggest the following:

allow-duplicate-names      no
byte-order-mark            no
cdata-section-elements     empty
doctype-public             none
doctype-system             none
encoding                   utf-8 (the result is a string, not a sequence of octets, but this still affects the encoding option in the XML declaration)
escape-uri-attributes      yes
html-version               5
include-content-type       yes
indent                     no
item-separator             tab
json-node-output-method    xml
media-type                 depends on the chosen method
method                     xml
normalization-form         none
omit-xml-declaration       no
standalone                 no
suppress-indentation       empty
undeclare-prefixes         no
use-character-maps         empty
version                    1.0
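
As an illustration only, the list above can be modeled as a lookup table with caller-supplied values layered on top. This is a Python sketch, not XQuery and not any implementation's code: the names PROPOSED_DEFAULTS and effective_parameters are hypothetical, the values are transcribed as plain strings, "tab" is written as a tab character, and media-type is left out because the proposal only says it depends on the chosen method.

```python
# Proposed fn:serialize defaults, transcribed from the list in this comment.
# Values are kept as the strings used there ("none", "empty", "yes", "no").
PROPOSED_DEFAULTS = {
    "allow-duplicate-names": "no",
    "byte-order-mark": "no",
    "cdata-section-elements": "empty",
    "doctype-public": "none",
    "doctype-system": "none",
    "encoding": "utf-8",
    "escape-uri-attributes": "yes",
    "html-version": "5",
    "include-content-type": "yes",
    "indent": "no",
    "item-separator": "\t",  # "tab" in the proposal
    "json-node-output-method": "xml",
    "method": "xml",
    "normalization-form": "none",
    "omit-xml-declaration": "no",
    "standalone": "no",
    "suppress-indentation": "empty",
    "undeclare-prefixes": "no",
    "use-character-maps": "empty",
    "version": "1.0",
}


def effective_parameters(explicit):
    """Overlay caller-supplied parameters on the proposed defaults."""
    # media-type is accepted but has no fixed default in this model.
    unknown = set(explicit) - set(PROPOSED_DEFAULTS) - {"media-type"}
    if unknown:
        raise ValueError(f"unknown serialization parameters: {sorted(unknown)}")
    return {**PROPOSED_DEFAULTS, **explicit}
```

With a complete table like this, a caller who sets only one parameter still gets a fully determined parameter set, which is the predictability argued for above.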
Comment 4 Josh Spiegel 2015-05-05 23:25:05 UTC
I do not want the change proposed in comment 3.  It is too late to mandate defaults for these parameters since fn:serialize is already part of a W3C recommendation:
http://www.w3.org/TR/xpath-functions-30/#func-serialize

Adding specific defaults would force implementations to make backwards-incompatible changes when they adopt 3.1 and thus violates requirement 2.2.1:

  "XQuery 3.1 MUST be backward compatible with [XQuery 3.0]."
  http://www.w3.org/TR/xquery-31-requirements/#backward-compatibility

For example, an XQuery implementation is currently allowed to select a default for omit-xml-declaration in the static context (Appendix C.1) and for fn:serialize.  An implementation may have reasonably selected "yes" as the default in both cases.  For this implementation, the change proposed in comment 3 would:

(1) Cause fn:serialize to return different results in 3.1
(2) Create an undesirable inconsistency between the default behavior of fn:serialize and the default behavior when serializing the result of a query
Comment 5 Abel Braaksma 2015-05-06 02:41:20 UTC
I don't see the backward compatibility issue. If previously XQuery specified implementation-defined, it already means that different implementations may have different defaults. Specifying a default removes that ambiguity.

Also, I think that both XQuery and XSLT can override the defaults of XPath, and such an override could itself be specified as "implementation defined", possibly with a *should* or *may* pointing to the XPath default.

The point here is to define a default for XPath on itself, outside of the context of XQuery or XSLT.

On comment#3:

> encoding               utf-8

I would opt for utf-16 here; it seems more appropriate given the likely internal encoding of the string, which most platforms hold as utf-16 in memory.

> item-separator         tab

Wouldn't a space be closer to the principle of least surprise, as it is already the default for tokens and combining sequences?

> omit-xml-declaration   no

If we assume further string processing (comparing, searching, etc.), "yes" seems more appropriate. For an in-memory string representation, I see little use for an XML declaration. Also, consider the scenario where you output a sequence of atomic values: an XML declaration as the default seems out of place there.

My 2p, incl. above changes, would then be:

allow-duplicate-names      no
byte-order-mark            no
cdata-section-elements     empty
doctype-public             none
doctype-system             none
encoding                   utf-16
escape-uri-attributes      yes
html-version               5
include-content-type       yes
indent                     no
item-separator             &#x20;
json-node-output-method    xml
media-type                 depends on the chosen method
method                     xml
normalization-form         none
omit-xml-declaration       yes
standalone                 omit (prefer over no)
suppress-indentation       empty
undeclare-prefixes         no
use-character-maps         empty
version                    1.0
Comment 6 Josh Spiegel 2015-05-08 15:22:30 UTC
I checked 4 implementations from different companies:

   3 evaluate fn:serialize(<e/>) as "<e/>"

   1 evaluates fn:serialize(<e/>) as "<?xml version="1.0" encoding="UTF-8"?><e/>"

Forcing a default value for "omit-xml-declaration" will cause a backwards compatibility issue for somebody.  This is just one example.
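
To make the divergence concrete, here is a minimal Python model (not XQuery, and not taken from any of the implementations surveyed) that reproduces the two observed strings from a single switch. The function name serialize_empty_element is hypothetical; the declaration text is copied from the second result above.

```python
# XML declaration exactly as it appears in the one diverging result.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'


def serialize_empty_element(name, omit_xml_declaration):
    """Model serializing an empty element with or without the declaration."""
    body = f"<{name}/>"
    return body if omit_xml_declaration else XML_DECLARATION + body
```

Whichever value is mandated, at least one of the two groups of implementations has to change what fn:serialize(<e/>) returns.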

----

Consider this expression:

   (1, 2, 3)

Serializing the result as XML in XQuery 1.0 would produce the string:

   "1 2 3"

Consider this 3.0 expression:

   fn:serialize((1, 2, 3))

The least surprising behavior is that it evaluates to the same string.  Under Mike's proposal, in XQuery 3.1 it would not.  

Forcing the defaults for fn:serialize will create inconsistencies like this.  In general, I want the defaults I use in the static context and in XQJ to be the same as the defaults I use for fn:serialize().
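
The item-separator inconsistency can be sketched in a few lines of Python. This is a simplified model that covers atomic values only; serialize_atomics is a hypothetical name, and the single-space default is assumed from the "1 2 3" result quoted above.

```python
def serialize_atomics(items, item_separator=" "):
    """Join the string values of atomic items with the given separator."""
    return item_separator.join(str(item) for item in items)


# Default (space) matches the query-result serialization shown above.
same_as_query_result = serialize_atomics((1, 2, 3))
# A tab default, as proposed in comment 3, yields a different string.
with_tab_default = serialize_atomics((1, 2, 3), item_separator="\t")
```

Here same_as_query_result is "1 2 3", while with_tab_default contains tab characters between the digits.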
Comment 7 Michael Kay 2015-05-08 16:02:12 UTC
>3 evaluate fn:serialize(<e/>) as "<e/>"

>1 evaluates fn:serialize(<e/>) as "<?xml version="1.0" encoding="UTF-8"?><e/>"

I understand the compatibility argument, but the fact that different implementations do different things, in my mind, adds to the importance of defining the defaults so that users get interoperable behaviour. (I don't really care very much what the defaults are, that's a second-order discussion.)

We could address the compatibility concern by suggesting that implementations that have provided different defaults in previous product releases could provide a configuration switch to retain the previous behaviour.
Comment 8 Josh Spiegel 2015-05-08 17:24:42 UTC
I don't see why the specifications would treat the serialization defaults in the static context any differently than the defaults for fn:serialize.  The reality is that they both impact the predictability of an implementation.  I think it is more important that the default serialization behavior is consistent within an implementation than it is that serialization behaves consistently across implementations. 

Making changes to the specifications that cause implementations to ship with compatibility configuration switches would be unfortunate.  It puts the burden on existing users to discover and set the switch or makes the implementation non-conformant by default.
Comment 9 Christian Gruen 2015-05-11 22:26:03 UTC
I share some of Josh's concerns when it comes to introducing defaults that haven't existed in earlier versions of XQuery. On the other hand, I would have preferred defaults to be defined from the very beginning, so I have a hard time deciding one way or the other.

However, these would be my favorites when defaults are to be chosen:

> encoding               utf-8

This would also be my favorite. One of the reasons is that most of today's XML documents are encoded as UTF-8, so to me it seems to be the most obvious output format as well (consider, for example, that the results may also be stored in files or simply written to the command line).

> item-separator         tab

I think the most unobtrusive choice would be not to define any character by default (i.e., to have it absent). If a character is to be assigned, though, the newline character (&#xA;) could be another alternative. It is already used in the adaptive method if the item-separator is absent. As a second alternative, I would also prefer a space over a tab.

> standalone             no

Similar to Abel, I would prefer 'omit' as default.

> omit-xml-declaration   no

I would also vote for 'yes' as the default value. The XML declaration only makes sense if the XML method is used for serializing a single document. Moreover, if UTF-8 is defined as the default encoding, and 'omit' is chosen as the default for 'standalone', the XML declaration would be superfluous in the default case.
Comment 10 Josh Spiegel 2015-05-12 15:53:45 UTC
The working group discussed this:

The specification will clarify that the default serialization parameters used by fn:serialize are implementation defined.  

When the second parameter is a map, any parameter not specified by the map will have a specific default value.  The defaults used in this case will be the defaults proposed in Comment 9 by Christian.  

Mike will write a proposal along these lines.
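
The two-tier rule described above might be sketched as follows. This is a Python model under stated assumptions, not the text of the eventual proposal: the names MAP_DEFAULTS and resolve_parameters are hypothetical, only the four parameters discussed in Comment 9 are included, and a call that supplies no map is modeled as None.

```python
# Fixed defaults that apply only when the second argument is a map.
# Values follow the preferences stated in Comment 9.
MAP_DEFAULTS = {
    "encoding": "utf-8",
    "item-separator": None,  # absent
    "standalone": "omit",
    "omit-xml-declaration": "yes",
}


def resolve_parameters(params, implementation_defaults):
    """Return the effective parameters for one fn:serialize call."""
    if params is None:
        # No map supplied: the defaults remain implementation-defined.
        return dict(implementation_defaults)
    # Map supplied: entries missing from the map take the fixed defaults.
    return {**MAP_DEFAULTS, **params}
```

Under this model, passing an empty map is enough to opt in to the fixed defaults, while existing calls that pass no map keep whatever behaviour the implementation already had.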
Comment 11 Christian Gruen 2015-05-20 12:54:29 UTC
A related note (I'm adding it here, because I'm not sure if it has already been discussed): A default should also be specified for the indent option of fn:xml-to-json ('false'?).