This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The function fn:serialize (section 14.7.3 in FO31 and 14.9.3 in FO30) does not list the defaults for the serialization parameters. It stands to reason that it takes defaults similar to those defined in XSLT for xsl:output, but I couldn't find this referenced in the text.
The reference to XDM 3.0 and 3.1 respectively does not solve this issue either. Specifically, that specification says:
> There are a number of parameters that influence how serialization is performed. Host languages MAY allow users to specify any or all of these parameters, but they are not REQUIRED to be able to do so. However, the host language specification MUST specify how the values of all applicable parameters are to be determined.
The crux is in the last sentence, which I understand to mean that the host language (XPath F&O) should define these defaults, especially since no defaults are mentioned in XDM.
I figured for a moment that booleans should default to "no" and method to "xml", but escape-uri-attributes, for instance, should probably default to "yes", and there are other enumerations with unclear defaults, like encoding and omit-xml-declaration (the default for a file is probably "no", as with XSLT, but the default for inlining, as the most common use case, should probably be "yes").
Perhaps any and all of these settings are supposed to be implementation-dependent (which would make calls to this function non-interoperable unless you provide a fully filled set of serialization options), but if so, I think we should say so in the spec.
Or perhaps the idea is that the host language of XPath (i.e., XSLT or XQuery) should define these defaults. If so, I think we should emphasize that as well, and specify the defaults in the host languages.
> The reference to XDM 3.0 and 3.1
I meant Serialization 3.0 and 3.1.
Note also that, in addition to the MUST in the previous quote, there is also a MAY regarding the defaults of the serialization XDM tree. I understand it as subject to the MUST in the previous quote: the following quote applies only to the additional mechanism of the provided serialization settings XDM tree, which MAY have defaults other than those in the quote in comment#0:
> The host language MAY provide additional mechanisms for overriding the values of any serialization parameters specified through the mechanism defined in this section, as well as additional mechanisms for specifying the values of any serialization parameters whose values are absent after applying the mechanism defined in this section.
(from a comment by MKay in bug 28537 it appears that I should post these as FO31, reclassified it as such)
I don't think the defaults should depend on which host language fn:serialize() is called from. It's generally bad form for the result of a function to depend on who is calling it, unless there is very strong justification. Also, the XSLT rules are pretty messy.
I was first inclined to use the defaults defined in Appendix C.1 of the XQuery spec. However, I think these leave too many things implementation-defined. The result of fn:serialize() will typically be processed by the application, so it really needs predictability, e.g. of whether there is going to be an XML declaration.
Also, Appendix C.1 declines to give defaults for parameters that are not applicable to the XML output method. This raises the question: what values do these parameters take if a different method is chosen? I have therefore given defaults for all parameters.
So I would suggest the following:
  allow-duplicate-names    no
  byte-order-mark          no
  cdata-section-elements   empty
  doctype-public           none
  doctype-system           none
  encoding                 utf-8 (the result is a string, not a sequence of octets, but this still affects the encoding option in the XML declaration)
  escape-uri-attributes    yes
  html-version             5
  include-content-type     yes
  indent                   no
  item-separator           tab
  json-node-output-method  xml
  media-type               depends on the chosen method
  method                   xml
  normalization-form       none
  omit-xml-declaration     no
  standalone               no
  suppress-indentation     empty
  undeclare-prefixes       no
  use-character-maps       empty
  version                  1.0
I do not want the change proposed in comment 3. It is too late to mandate defaults for these parameters since fn:serialize is already part of a W3C Recommendation:
http://www.w3.org/TR/xpath-functions-30/#func-serialize
Adding specific defaults would force implementations to make backwards-incompatible changes when they adopt 3.1 and thus violates requirement 2.2.1: "XQuery 3.1 MUST be backward compatible with [XQuery 3.0]."
http://www.w3.org/TR/xquery-31-requirements/#backward-compatibility
For example, an XQuery implementation is currently allowed to select a default for omit-xml-declaration in the static context (Appendix C.1) and for fn:serialize. An implementation may have reasonably selected "yes" as the default in both cases. For this implementation, the change proposed in comment 3 would:
(1) Cause fn:serialize to return different results in 3.1
(2) Create an undesirable inconsistency between the default behavior of fn:serialize and the default behavior when serializing the result of a query
I don't see the backward compatibility issue. If XQuery previously specified implementation-defined, that already means that different implementations may have different defaults. Specifying a default removes that ambiguity.
Also, I think that both XQuery and XSLT can override the defaults of XPath, and such an override could itself be specified as "implementation defined", possibly with a *should* or *may* pointing to the XPath default. The point here is to define a default for XPath by itself, outside of the context of XQuery or XSLT.
On comment#3:
> encoding utf-8
I would opt for utf-16 here; it seems more appropriate to the likely internal encoding of the string, which most platforms will have as utf-16 in memory.
> item-separator tab
Wouldn't a space be closer to the principle of least surprise, as it is already the default for tokens and combining sequences?
> omit-xml-declaration no
If we assume further string processing, comparing, searching etc., "yes" seems more appropriate. For an in-memory string representation, I see little use for an XML declaration. Also, consider the scenario where you output a sequence of atomic values: an XML declaration as default seems out of place then.
My 2p, incl. the above changes, would then be:
  allow-duplicate-names    no
  byte-order-mark          no
  cdata-section-elements   empty
  doctype-public           none
  doctype-system           none
  encoding                 utf-16
  escape-uri-attributes    yes
  html-version             5
  include-content-type     yes
  indent                   no
  item-separator           (a single space)
  json-node-output-method  xml
  media-type               depends on the chosen method
  method                   xml
  normalization-form       none
  omit-xml-declaration     yes
  standalone               omit (prefer over no)
  suppress-indentation     empty
  undeclare-prefixes       no
  use-character-maps       empty
  version                  1.0
I checked 4 implementations from different companies:
3 evaluate fn:serialize(<e/>) as "<e/>"
1 evaluates fn:serialize(<e/>) as "<?xml version="1.0" encoding="UTF-8"?><e/>"
By forcing the default value of "omit-xml-declaration", it will cause a backwards compatibility issue for somebody. This is just one example.
----
Consider this expression:
(1, 2, 3)
Serializing the result as XML in 1.0 would produce the string: "1 2 3"
Consider this 3.0 expression:
fn:serialize((1, 2, 3))
The least surprising behavior is that it evaluates to the same string. Under Mike's proposal, in XQuery 3.1 it would not. Forcing the defaults for fn:serialize will create inconsistencies like this.
In general, I want the defaults I use in the static context and in XQJ to be the same as the defaults I use for fn:serialize().
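As an illustration of how a caller can sidestep the differing defaults described above, the sketch below pins omit-xml-declaration explicitly, using the XQuery 3.0 form in which the second argument is an output:serialization-parameters element. The variable name $params is only illustrative, and only two parameters are set; anything left out still falls back to whatever the implementation chooses.

```xquery
xquery version "3.0";

(: Serialization namespace; declared explicitly in case the processor
   does not predeclare the "output" prefix. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

(: Pin the parameters whose defaults differ between implementations. :)
let $params :=
  <output:serialization-parameters>
    <output:method value="xml"/>
    <output:omit-xml-declaration value="yes"/>
  </output:serialization-parameters>
return fn:serialize(<e/>, $params)
```

With the declaration suppressed explicitly, a conforming processor should return the element without an XML declaration, whatever its own default for omit-xml-declaration is.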
> 3 evaluate fn:serialize(<e/>) as "<e/>"
> 1 evaluates fn:serialize(<e/>) as "<?xml version="1.0" encoding="UTF-8"?><e/>"
I understand the compatibility argument, but the fact that different implementations do different things, in my mind, adds to the importance of defining the defaults so that users get interoperable behaviour. (I don't really care very much what the defaults are; that's a second-order discussion.)
We could address the compatibility concern by suggesting that implementations that have provided different defaults in previous product releases could provide a configuration switch to retain the previous behaviour.
I don't see why the specifications would treat the serialization defaults in the static context any differently than the defaults for fn:serialize. The reality is that they both impact the predictability of an implementation. I think it is more important that the default serialization behavior is consistent within an implementation than it is that serialization behaves consistently across implementations. Making changes to the specifications that cause implementations to ship with compatibility configuration switches would be unfortunate. It puts the burden on existing users to discover and set the switch or makes the implementation non-conformant by default.
I share some of Josh's concerns about introducing defaults that didn't exist in earlier versions of XQuery. On the other hand, I would have preferred it if defaults had been defined from the very beginning, so I have a hard time deciding one way or the other. However, these would be my favorites if defaults are to be chosen:
> encoding utf-8
This would also be my favorite. One of the reasons is that most of today's XML documents are encoded as UTF-8, so to me it seems to be the most obvious output format as well (consider, e.g., that the results may also be stored in files, or simply output on the command line).
> item-separator tab
I think the most unobtrusive choice would be not to define any character by default (i.e., to have it absent). If a character is to be assigned, though, the newline character (&#xA;) could be another alternative. It is already used in the adaptive method if the item-separator is absent. As a second alternative, I would also prefer a space to a tab.
> standalone no
Like Abel, I would prefer 'omit' as the default.
> omit-xml-declaration no
I would also vote for 'yes' as the default value. The XML declaration only makes sense if the XML method is used for serializing a single document. Moreover, if UTF-8 is defined as the default encoding, and if 'omit' is chosen as the default for 'standalone', the XML declaration would be superfluous in the default case.
The working group discussed this: The specification will clarify that the default serialization parameters used by fn:serialize are implementation defined. When the second parameter is a map, any parameter not specified by the map will have a specific default value. The defaults used in this case will be the defaults proposed in Comment 9 by Christian. Mike will write a proposal along these lines.
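A rough sketch of what this decision would mean for callers, assuming the XQuery 3.1 map form of the second argument. The three calls and the parameter values in them are chosen purely for illustration and are not taken from the proposal.

```xquery
xquery version "3.1";

(: In the map form, boolean parameters are xs:boolean values,
   not the strings "yes"/"no". :)
(
  (: One-argument form: defaults stay implementation-defined. :)
  serialize(<e/>),

  (: Map with a single entry: under the decision above, every
     parameter absent from the map takes a specific default. :)
  serialize(<e/>, map { "indent": false() }),

  (: Map that pins the contested parameters explicitly, so the
     result does not depend on any default for them. :)
  serialize(
    (1, 2, 3),
    map {
      "method": "xml",
      "omit-xml-declaration": true(),
      "item-separator": " "
    }
  )
)
```

Only the first call remains sensitive to the implementation's own configuration. The second depends on the specified defaults, and the third on neither for the parameters it sets.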
A related note (I'm adding it here, because I'm not sure if it has already been discussed): A default should also be specified for the indent option of fn:xml-to-json ('false'?).