This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 28140 - [FO31] Options of fn:serialize
Summary: [FO31] Options of fn:serialize
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-04 22:04 UTC by Christian Gruen
Modified: 2016-12-16 19:55 UTC

Description Christian Gruen 2015-03-04 22:04:20 UTC
The fn:serialize function is used a lot in practice. However, our users report back to us that the declaration of serialization parameters is perceived as inconvenient, and the namespace URI is often forgotten.

It would be great if the last function signature of fn:serialize could be extended such that it accepts both element(output:serialization-parameters)? and map(*):

1) Existing syntax:

  fn:serialize(map { 'year': 1984 },
    <output:serialization-parameters
      xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
      <output:method value="json"/>
    </output:serialization-parameters>
  )

2) Map syntax:

  fn:serialize(map { 'year': 1984 }, map { 'method': 'json' })
Comment 1 Michael Kay 2015-03-10 15:53:47 UTC
The WG was favourably disposed and directed the editor to prepare a proposal.
Comment 2 Michael Kay 2015-03-10 18:14:45 UTC
So, how to do it?

Doing some examples is not difficult. The four "boxed" examples in Serialization §3.1 become:

map {
  "method" : "xml",
  "version" : 1.0,
  "indent" : true()
}

map {
  "cdata-section-elements" : ["heading", "Q{http://example.org/book}footnote"]
}

map {
  "method" : "html"
}

map {
  "method" : "Q{http://example.org/ext}jsp",
  "use-character-maps" : map {
     "&#xAB;" : "&lt;%",
     "&#xBB;" : "%&gt;"
  }
}

This suggests the following decisions:

* boolean values represented as booleans, not as "yes" and "no"

* lists represented as arrays, not as whitespace-separated strings

* QNames: we should accept bare names for names in no namespace, and for namespaced names, either an EQName as a string, or an xs:QName value. This applies equally to keys and values.

* Character maps: a nested map

The next question is how to specify it. We could either define a mapping directly to the serialization parameters defined in the table at the start of serialization section 3, or we could define a transformation to the XML format for serialization parameters. The latter has attractions because it automatically extends when new parameters are added, but I think it's going to be very long-winded and difficult to get to grips with, so I'm inclined to do the former.

Rather than defining the format within the fn:serialize function, I think it makes sense to rename Serialization 3.1 as "Setting serialization parameters by means of an XDM document or element node", and to add a new section 3.2, "Setting serialization parameters by means of an XDM map". I will go on to draft the content of that section.
Comment 3 Christian Gruen 2015-03-10 19:44:29 UTC
Thanks for presenting the first examples. This solution is more elaborate than I originally expected it to be. A simple variant would have been to represent all values as strings, similar to the way the parameters are specified in the query prolog:

  declare option output:indent "no";

But your proposal is obviously more powerful, in particular when it comes to the definition of parameters like use-character-maps.
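
For comparison, the all-strings style used in the query prolog might look like the following complete query. This is only a sketch: the sample element is made up, and the namespace declaration is included for clarity even though the output prefix is normally predeclared.

```xquery
xquery version "3.1";

(: Usually predeclared; repeated here so the binding is visible. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

(: Every value is a plain string, including the yes/no flag. :)
declare option output:method "xml";
declare option output:indent "no";

(: Hypothetical query body whose result is serialized with the options above. :)
<root><child/></root>
```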
Comment 4 Michael Kay 2015-03-11 11:03:29 UTC
The following rules are proposed:

Representing Serialization Parameters as a Map

The fn:serialize() function will accept serialization parameters either using the current representation (defined in Serialization 3.1) or using a new representation defined here.

The general principles of the representation are as follows:

The serialization parameters are represented as a map with one entry for each parameter. The representation is designed with two objectives: (a) it should be easy to construct programmatically in XPath; (b) it should be easy to construct as the result of applying the fn:json-doc() function to a JSON document. The latter constraint means that the format must not be dependent on use of data types such as xs:QName that are not expressible in JSON, and it should allow lists to be represented as arrays rather than sequences.

The key of the entry is a string. For standard serialization parameters (such as "method" and "indent") this is the local name of the parameter. For extended serialization parameters, the key is a string in EQName format (for example "Q{http://www.vendor.com/ns}indent-spaces").

The value part of the entry is represented according to the following rules:

* Where the value is a string (for example doctype-public or item-separator), then an atomic value, where the parameter value is the result of casting the supplied atomic value to a string. Note: The value supplied will normally be an instance of xs:string, but it may also be xs:untypedAtomic, and in some cases it may be convenient to supply an instance of xs:anyURI (say for doctype-system) or xs:decimal (say for version).

* Where the required value is one of an enumerated set of values, then an instance of xs:string; except:

** where the enumerated set is ("yes", "no"), then an instance of xs:boolean in which "true" represents "yes" and "false" represents "no". 

** where the enumerated set is ("yes", "no", "omit"), then an instance of xs:boolean? in which "true" represents "yes", "false" represents "no", and the empty sequence represents "omit". 

* Where the required value is a decimal, then an instance of xs:decimal

* Where the required value is an expanded QName, then the representation may be any of the following:

** an instance of xs:QName
** an xs:string in the form of an NCName representing a name in no namespace
** an xs:string in the form of an EQName

* Where the required value is a list of expanded QNames, then either a sequence or an array, whose members represent these names using any of the three representations described in the previous list item.

* Where the required value is a list of (character C, string S) pairs (that is, for the use-character-maps parameter), a map of type map(xs:string, xs:string) containing one entry for each pair, where the key is C represented as an xs:string instance of length 1, and the associated value is S as an instance of xs:string. 

Error handling works as with the existing serialization parameters (this means it's an error if the value for a parameter is invalid, but it's not an error to specify parameters that aren't defined in the spec.)
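
As a rough illustration of the rules above, the following XQuery sketch builds one parameter map that touches several of the value categories. The sample element, the ex prefix and its namespace URI are invented for the example. Whether a given processor accepts every representation proposed here (for instance arrays or EQName strings for name lists) is an assumption, so the sketch sticks to an xs:QName value for the name list.

```xquery
xquery version "3.1";

declare namespace ex = "http://example.org/book";  (: hypothetical namespace :)

let $doc :=
  <ex:chapter>
    <ex:footnote>1 &lt; 2</ex:footnote>
  </ex:chapter>

let $params := map {
  "method": "xml",                                    (: QName-valued; an NCName string means no namespace :)
  "encoding": "UTF-8",                                (: string-valued parameter :)
  "indent": false(),                                  (: yes/no becomes xs:boolean :)
  "omit-xml-declaration": true(),                     (: yes/no becomes xs:boolean :)
  "standalone": (),                                   (: yes/no/omit: the empty sequence stands for "omit" :)
  "cdata-section-elements": xs:QName("ex:footnote"),  (: list of expanded QNames, here a single xs:QName :)
  "use-character-maps": map { "&#xAB;": "&lt;%" }     (: single character mapped to its replacement string :)
}

return serialize($doc, $params)
```

The keys are the local names of the standard parameters, so no namespace URI has to be spelled out for them.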
Comment 5 Michael Kay 2015-03-13 16:09:26 UTC
Note also that fn:transform has the option to accept serialization parameters as a map (and not as a node tree!) This should be aligned so that serialization parameters can be supplied in the same way to both functions, and also to fn:put.
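
A sketch of what that alignment could look like: the same parameter map is handed to fn:serialize and to fn:transform. The option names stylesheet-text, source-node, delivery-format and serialization-params, and the output key of the result map, are assumptions based on the fn:transform option list and are not taken from this thread. The stylesheet and input document are invented for the example.

```xquery
xquery version "3.1";

(: Hypothetical identity-like stylesheet, supplied inline as text. :)
let $xslt := '<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:copy-of select="."/></out>
  </xsl:template>
</xsl:stylesheet>'

(: One map, intended to be usable by both functions. :)
let $params := map {
  "method": "xml",
  "indent": true(),
  "omit-xml-declaration": true()
}

let $input := document { <doc><item>1</item></doc> }

return (
  serialize($input, $params),
  transform(map {
    "stylesheet-text": $xslt,
    "source-node": $input,
    "delivery-format": "serialized",      (: assumed option: ask for a serialized result :)
    "serialization-params": $params       (: assumed option name for the parameter map :)
  })?output
)
```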
Comment 6 Michael Kay 2015-03-25 12:18:18 UTC
The following recasts the proposed rules to take account of the general rules for option parameters, which were defined in response to bug #28196.

Representing Serialization Parameters as a Map

The fn:serialize() function will accept serialization parameters either using the current representation (defined in Serialization 3.1) or using a new representation defined here.

The *option parameter conventions* apply.

The option values are represented according to the following rules:

* Where the serialization parameter value is a string (for example doctype-public or item-separator), then the required type is xs:string. Note: because the function conversion rules apply, the value can also be supplied as xs:untypedAtomic or xs:anyURI.

* Where the serialization parameter value is defined as an enumerated set of values, then the required type is xs:string; except:

** where the enumerated set is ("yes", "no"), then the required type is xs:boolean in which "true" represents "yes" and "false" represents "no". 

** where the enumerated set is ("yes", "no", "omit"), then the required type is xs:boolean? in which "true" represents "yes", "false" represents "no", and the empty sequence represents "omit". 

* Where the serialization parameter value is decimal [for example, the html-version parameter], then the required type is xs:double. Note: because the function conversion rules apply, the value can also be supplied as (for example) xs:integer, xs:decimal, or xs:untypedAtomic.

* Where the serialization parameter value is an expanded QName, then the required type is a union type with member types xs:QName and xs:string. The value may be supplied as any of the following:

** an instance of xs:QName
** an xs:string in the form of an NCName representing a name in no namespace
** an xs:string in the form of an EQName

* Where the serialization parameter value is a list of expanded QNames, then the required type is item()*. The list of values may be supplied as either a sequence or an array, whose members represent these names using any of the three representations described in the previous list item.

* Where the required value is a list of (character C, string S) pairs (that is, for the use-character-maps parameter), a map of type map(xs:string, xs:string) containing one entry for each pair, where the key is C represented as an xs:string instance of length 1, and the associated value is S as an instance of xs:string. The *option parameter conventions* do NOT apply to this map.
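
The effect of the function conversion rules mentioned in the notes above can be sketched as follows. The sample document and the chosen parameter values are made up, and the sketch assumes a processor that applies the conversions as described in this proposal.

```xquery
xquery version "3.1";

let $params := map {
  "method": "html",
  "html-version": 5,                                   (: xs:integer supplied where a numeric value is required :)
  "doctype-system": xs:anyURI("about:legacy-compat"),  (: xs:anyURI supplied where xs:string is required :)
  "encoding": xs:untypedAtomic("UTF-8")                (: xs:untypedAtomic supplied where xs:string is required :)
}

return serialize(
  <html><head><title>t</title></head><body><p>Hi</p></body></html>,
  $params
)
```

None of the three typed values matches the required type exactly; each relies on promotion or casting under the function conversion rules.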
Comment 7 Christian Gruen 2015-03-25 12:55:54 UTC
Thanks, Michael, for adding more details. Two questions:

> * Where the serialization parameter value is decimal [for example, the html-version parameter]

In my understanding, version numbers are not necessarily decimal numbers (think e.g. of version strings with more than two numbers, such as "5.0.1"). Maybe strings would be more flexible?

> * Where the serialization parameter value is a list of expanded QNames, then the required type is item()*. The list of values may be supplied as either a sequence or an array


> * Where the required value is a list of (character C, string S) pairs [...] The *option parameter conventions* do NOT apply to this map.

Does this mean that function conversion does not apply here? Is there any particular reason for it?
Comment 8 Michael Kay 2015-05-05 17:48:49 UTC
The proposal in comment 6 was accepted; the editor was given discretion on how to handle the concerns expressed in comment 7.

Concerning the first comment "version numbers are not necessarily decimal numbers", the proposal uses type xs:decimal only for a parameter defined in the Serialization spec to be of type xs:decimal, and this is currently true only for "html-version". The "version" parameter is defined to be a string. I shall point this out more clearly in a Note.

Concerning the second comment, regarding use of the function conversion rules for the character map, I agree, it probably makes sense to use them. It's easier for an implementation to treat xs:untypedAtomic and xs:string as equivalent than to treat them differently.