Bug 9302 - [XQuery11] How are output declarations processed?
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 3.0
Version: Working drafts
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL: http://www.w3.org/TR/xquery-11/#id-se...
Reported: 2010-03-23 20:15 UTC by Henry Zongaro
Modified: 2010-07-20 15:35 UTC

Description Henry Zongaro 2010-03-23 20:15:40 UTC
According to section 2.2.4 of the XQuery 1.1 working draft,[1] "A serialization parameter that is not applicable to the chosen output method must be ignored, except that if its value is not a valid value for that parameter, the error may be reported.

A processor that is performing serialization must report a serialization error if the values of any serialization parameters (other than any that are ignored under the previous paragraph) are incorrect."

There's no indication how parameters are to be interpreted by a processor that is performing serialization.

In most cases, one can rely on intuition to decide how an output declaration translates to a serialization parameter, but in some cases that's not possible.  For example, the use-character-maps serialization parameter[2] is "A list of pairs, possibly empty, with each pair consisting of a single Unicode character and a string of Unicode characters."  It's not clear how one writes that in an output declaration.  I think that needs to be specified.

In the case of cdata-section-elements and suppress-indentation, the value of the parameter is a list of expanded QNames.  Presumably the value of the output declaration for those two parameters should be a whitespace-separated list of lexical QNames.  But if the prefix of one such lexical QName is not declared in the in-scope namespaces, I would expect err:XPST0081[3] to be reported, rather than a serialization error.  I think how lexical QNames are handled in the value of an output declaration needs to be specified, at least.
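
To make the expected behaviour concrete, here is a minimal Python sketch of resolving such a list against a set of in-scope namespaces. The names resolve_qname_list and UndeclaredPrefixError are hypothetical, and treating unprefixed names as being in a caller-supplied default namespace is an assumption, not something any draft states.

```python
class UndeclaredPrefixError(Exception):
    """Stands in for err:XPST0081 (hypothetical class name)."""


def resolve_qname_list(value, in_scope_namespaces, default_namespace=""):
    """Expand a whitespace-separated list of lexical QNames.

    Returns a list of (namespace URI, local name) pairs.
    """
    expanded = []
    for lexical in value.split():
        prefix, sep, local = lexical.partition(":")
        if not sep:
            # Assumption: unprefixed names use the supplied default namespace.
            expanded.append((default_namespace, lexical))
        elif prefix in in_scope_namespaces:
            expanded.append((in_scope_namespaces[prefix], local))
        else:
            # Reported as a static error, not as a serialization error.
            raise UndeclaredPrefixError(
                "XPST0081: prefix %r is not declared" % prefix)
    return expanded
```

With {"my": "http://example.org"} in scope, "my:cdata-elem my:other-elem" resolves to two expanded names, while "zz:elem" raises the error standing in for XPST0081.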

[1] http://www.w3.org/TR/2009/WD-xquery-11-20091215/#id-serialization
[2] http://www.w3.org/XML/Group/qtspecs/specifications/xslt-xquery-serialization-11/html/Overview.html#serparam
[3] http://www.w3.org/TR/xquery-11/#ERRXPST0081
Comment 1 Henry Zongaro 2010-03-23 20:17:55 UTC
Sorry, the reference for [2] should have been:

[2] http://www.w3.org/TR/xslt-xquery-serialization-11/#serparam
Comment 2 Jonathan Robie 2010-05-04 14:42:45 UTC
I think this is best resolved by:

1. Specifying a format for a list of pairs in a character map. This may be a bit of a bike shed, but I think a string literal encoded in Python's map syntax would make sense here:

declare option output:use-character-maps "{ 
  "«" : "<%", 
  "»" : "%>",
  "§" : '"'
 }"; 

Or we could delimit the pairs in some other way, e.g. using [] or ().


declare option output:use-character-maps "{
  ["«" ,"<%"], 
  ["»" , "%>"], 
  ["§" , '"'] 
}"; 


2. Raising err:XPST0081 if the prefix of a lexical QName in a serialization parameter is not defined.
Comment 3 Michael Kay 2010-05-04 14:54:41 UTC
The value in declare option has to be a string literal, so it can't contain embedded quotes unless they are escaped. One solution to this might be to generalize "declare option" so the value can be something other than a string literal, for example (1,2,3), ("abc", "def"), or <a x="y"/>. This would give us more syntactic possibilities to play with, though I'm not sure what the right answer would be.
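
As an illustration of the escaping involved, here is a small Python sketch that turns arbitrary text into a double-quoted XQuery string literal. It relies on the XQuery rule that a quote inside a literal is escaped by doubling it and that a literal ampersand must be written as an entity reference; the function name is hypothetical.

```python
def xquery_string_literal(text):
    """Wrap text in double quotes as an XQuery string literal."""
    # Escape ampersands first, then double any embedded double quotes.
    escaped = text.replace("&", "&amp;").replace('"', '""')
    return '"' + escaped + '"'
```

Applied to the map syntax proposed in comment 2, every embedded double quote would have to be doubled, which shows how quickly that form becomes hard to read.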

Another possibility is for the value of the use-character-maps property to be the URL of an XML document that contains the character map using some well-defined XML vocabulary.
Comment 4 Jonathan Robie 2010-05-11 21:58:47 UTC
The WG has agreed to allow output declarations to specify the URL of a file containing serialization options, using the same format as XSL. Any serialization option can be specified this way. This is the only way we will support declarations for character maps at present. Other options can still be declared using the existing syntax.
Comment 5 Michael Kay 2010-05-11 23:26:07 UTC
I'm reopening this because the resolution in comment #4 still leaves some detail to be defined. Saying that character maps should use "the same format as XSLT" isn't really a full specification. The XSLT rules allow character maps to appear in stylesheet modules and there are complex rules for dealing with duplicate mappings based on import precedence and the like; character maps can also be built up hierarchically from smaller character maps. This seems over-engineered for XQuery (which doesn't have facilities for overriding other rather more important things, such as functions). I think a simple map of the form:

<x:character-map>
  <x:output-character character="char" string="string"/>
  ...
</x:character-map>

would do the job just fine. Since this syntax isn't identical to the XSLT syntax (character maps are not named and there is no use-character-maps attribute), I would think that a namespace other than the XSLT namespace should be used.
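
A rough Python sketch of how a processor might read and apply such a flat map follows. The namespace URI bound to the x prefix is made up for the example, and rejecting duplicate entries is an assumption (the flat format has no import precedence to resolve them).

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the proposal leaves the "x" prefix unbound.
X_NS = "http://example.org/character-map"


def load_character_map(xml_text):
    """Return a dict mapping single characters to replacement strings."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for entry in root.findall("{%s}output-character" % X_NS):
        char = entry.get("character")
        if char is None or len(char) != 1:
            raise ValueError("character must be a single character")
        if char in mapping:
            # Assumption: duplicates are an error in this flat format.
            raise ValueError("duplicate mapping for %r" % char)
        mapping[char] = entry.get("string", "")
    return mapping


def apply_character_map(text, mapping):
    """Replace each mapped character; other characters pass through."""
    return "".join(mapping.get(ch, ch) for ch in text)
```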
Comment 6 Henry Zongaro 2010-05-12 12:55:54 UTC
I agree with Michael's comment #5.  I would suggest that we specify this in the serialization specification, so that any host language including XQuery can easily include it by reference.
Comment 7 Jonathan Robie 2010-05-12 12:58:36 UTC
(In reply to comment #6)
> I agree with Michael's comment #5.  I would suggest that we specify this in
> the serialization specification, so that any host language including XQuery can
> easily include it by reference.

That makes sense to me. To me, it's important to have one mechanism that works for both XSLT and XQuery. I don't think we should force users to define their serialization options once for one language, then turn around and define the same options again for the other language. And many users will use both languages.
Comment 8 Michael Kay 2010-05-12 13:09:03 UTC
(In reply to comment #7)

> That makes sense to me. To me, it's important to have one mechanism that works
> for both XSLT and XQuery.

I agree: if we define this mechanism, I would want to allow it to be used in XSLT, perhaps by means of a new xsl:output attribute such as external-character-map="uri", or perhaps by an attribute on xsl:character-map itself.
Comment 9 Henry Zongaro 2010-06-08 14:17:39 UTC
I agreed in comment 5 to propose changes to the Serialization 1.1 draft to allow serialization parameters to be specified in the form of an XML document, which I will refer to as a parameters document below.  I'd like consensus on the approach before specifying it in great detail.  To that end I propose the following changes to Serialization 1.1:

We define a schema for serialization parameters.  An instance might look like this:

<ser:parameters xmlns:ser="http://www.w3.org/2010/xslt-xquery-serialization"
       ser-rec-version="1.1">
  <ser:method>xml</ser:method>
  <ser:indent>yes</ser:indent>
  <ser:cdata-section-elements xmlns:my="http://example.org">
     my:cdata-elem my:other-elem
  </ser:cdata-section-elements>
  <ser:use-character-maps>
    <ser:character-map character="~" map-string=">"/>
    <ser:character-map character="!" map-string="&lt;"/>
  </ser:use-character-maps>
</ser:parameters>

Of course, many of these are simple types, so we might decide that the values of such parameters should be specified in "value" attributes this way:

<ser:parameters xmlns:ser="http://www.w3.org/2010/xslt-xquery-serialization"
       ser-rec-version="1.1">
  <ser:method value="xml"/>
  <ser:indent value="yes"/>
  <ser:omit-xml-declaration value="no"/>
  <ser:use-character-maps>
    <ser:character-map character="~" map-string=">"/>
    <ser:character-map character="!" map-string="&lt;"/>
  </ser:use-character-maps>
</ser:parameters>

Or make them attributes of ser:parameters itself:

<ser:parameters xmlns:ser="http://www.w3.org/2010/xslt-xquery-serialization"
       method="xml" indent="yes" omit-xml-declaration="no"
       ser-rec-version="1.1">
  <ser:use-character-maps>
    <ser:character-map character="~" map-string=">"/>
    <ser:character-map character="!" map-string="&lt;"/>
    <ser:character-map character="&#xFFFD;" map-string="Oh, oh!"/>
  </ser:use-character-maps>
</ser:parameters>

I would recommend against the third of these choices.  Implementers will likely want to tailor the behaviour of some parameters - the indent parameter, for instance, by specifying whether to use tabs or the number of whitespace characters to add at each level of indentation.  Placing such implementation-specific settings in an attribute that modifies a parameter's own element would make them easier to see than if they were jumbled in amongst a long list of attributes on the ser:parameters element itself.

If we decide against the third choice, I propose the content of the ser:parameters element would be an "all" group, with an optional element for each parameter (minOccurs="0" maxOccurs="1").  We can allow for wildcard elements for implementation-defined parameters and wildcard attributes that an implementation may use to modify the behaviour of a parameter - all to the extent such things are permitted by the Serialization specification today.
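
The occurrence constraint of such an "all" group (each parameter element at most once) can be illustrated without a schema processor. The Python sketch below only counts children in the serialization namespace used in the examples above; the function name is hypothetical, and elements in other namespaces are skipped on the assumption that they are implementation-defined extensions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SER_NS = "http://www.w3.org/2010/xslt-xquery-serialization"


def repeated_parameters(xml_text):
    """Return the sorted local names of ser:* children that occur twice or more."""
    root = ET.fromstring(xml_text)
    prefix = "{%s}" % SER_NS
    counts = Counter(
        child.tag for child in root if child.tag.startswith(prefix))
    return sorted(tag[len(prefix):] for tag, n in counts.items() if n > 1)
```

An empty result means the document respects maxOccurs="1" for every standard parameter; it says nothing about the validity of the values themselves.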

XSLT and XQuery differ in their default values for serialization parameters.  Therefore, I would recommend against specifying default values in the schema.

Then, we can specify that a host language may use a parameters document (or an Infoset) to specify the values of serialization parameters as if by doing the following:
(1) validating the parameters document against the schema for serialization parameters (and possibly against other implementation-defined schemata);
(2) constructing an instance of the XDM from the resulting PSVI; and
(3) evaluating XPath expressions of the form /ser:parameters/ser:method and so forth to determine the values of the serialization parameters, with appropriate settings of the components of the static and dynamic contexts.

Host languages would be permitted to override any or all of those serialization parameter settings in any host language dependent way and supply values for serialization parameters whose values were not determined by that process.
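
To give a feel for the overall flow, here is a simplified Python sketch that reads a parameters document in the "value" attribute form and lets the host language supply defaults and overrides. It skips schema validation (step 1), since the Python standard library has no XSD validator, and walks the element tree directly instead of building an XDM instance. The function name and the host_defaults and host_overrides arguments are hypothetical.

```python
import xml.etree.ElementTree as ET

SER_NS = "http://www.w3.org/2010/xslt-xquery-serialization"
NS = {"ser": SER_NS}


def read_parameters(xml_text, host_defaults=None, host_overrides=None):
    """Return a dict of serialization parameter names to values."""
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}parameters" % SER_NS:
        raise ValueError("not a ser:parameters document")

    # Host-language defaults come first; the document may replace them.
    params = dict(host_defaults or {})
    for child in root:
        if not child.tag.startswith("{%s}" % SER_NS):
            continue  # implementation-defined extension, ignored here
        name = child.tag.split("}", 1)[1]
        if name == "use-character-maps":
            params[name] = {
                m.get("character"): m.get("map-string")
                for m in child.findall("ser:character-map", NS)
            }
        elif "value" in child.attrib:
            params[name] = child.get("value")

    # Host-language overrides win over whatever the document said.
    params.update(host_overrides or {})
    return params
```

Keeping defaults outside the document matches the recommendation above not to put default values in the schema, since XSLT and XQuery would pass different host_defaults.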
Comment 10 Henry Zongaro 2010-07-06 13:06:59 UTC
The story thus far:  at the joint XQuery/XSL call of 8 June 2010,[4] the working groups decided that a "file format" similar to the following example (the second example of comment #9) would be preferred.

<ser:parameters xmlns:ser="http://www.w3.org/2010/xslt-xquery-serialization"
       ser-rec-version="1.1">
  <ser:method value="xml"/>
  <ser:indent value="yes"/>
  <ser:omit-xml-declaration value="no"/>
  <ser:use-character-maps>
    <ser:character-map character="~" map-string=">"/>
    <ser:character-map character="!" map-string="&lt;"/>
  </ser:use-character-maps>
</ser:parameters>

A more formal proposal and ensuing discussion began with [5].

Member-only links:
[4] http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Jun/0034.html
[5] http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Jun/0151.html
Comment 11 Henry Zongaro 2010-07-20 15:35:45 UTC
From the draft minutes of the XQuery and XSL F2F meeting:[6]   (All links are member-only)

DECIDED We adopt http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Jul/0159.html as the solution to
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9302, modulo the
following corrections:

The schema seems to have errors:

   Errors in proposed schema for serialization parameters
   http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Jul/0164.html

This pattern is too loose - there's a better RE in the XML spec:

   <xs:simpleType name="encoding-string-type">
     <xs:restriction base="xs:string">
       <xs:pattern value="[&#x21;-&#x7F;]*"/>
     </xs:restriction>
   </xs:simpleType>
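
For comparison, the expression in the XML spec is presumably the EncName production, [A-Za-z] ([A-Za-z0-9._] | '-')*. A small Python sketch of the difference follows (the variable and function names are hypothetical, and Python regex syntax is used rather than XML Schema regex syntax):

```python
import re

# Loose pattern from the proposed schema: any run of characters #x21-#x7F.
LOOSE_ENCODING = re.compile("[\x21-\x7f]*")

# EncName production from XML 1.0: a letter, then letters, digits, ".", "_" or "-".
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_encoding_name(value):
    """True if value matches the XML 1.0 EncName production."""
    return ENC_NAME.fullmatch(value) is not None
```

The loose pattern accepts the empty string and strings such as "!!!", both of which EncName rejects.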

Is the hyphen behind the 9 correct here? It looks suspicious ...

   <xs:simpleType name="pubid-char-string-type">
     <xs:restriction base="xs:string">
       <xs:pattern value="([- \r\n\ta-zA-Z0-9-'()+,./:=?;!*#@$_%])*"/>
     </xs:restriction>
   </xs:simpleType>
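
One way to sidestep the question is to put the hyphen last in the character class, where it can only be read as a literal. The Python sketch below follows the PubidChar production of XML 1.0 (#x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]); note that this production does not include the tab character, which the pattern above allows. The names are hypothetical and the syntax is Python's, not XML Schema's.

```python
import re

# PubidChar from XML 1.0, with the literal hyphen placed last in the class.
PUBID_CHARS = re.compile(r"[ \r\na-zA-Z0-9'()+,./:=?;!*#@$_%-]*")


def is_pubid_char_string(value):
    """True if every character of value is a PubidChar."""
    return PUBID_CHARS.fullmatch(value) is not None
```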

The spec needs examples that compile against the schema.

Mike Kay has a version that parses here - he hasn't fixed the patterns:

http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Jul/0165.html

"output:parameter-document =":

   should be no =

ACTION Henry to fix problems with the schema and add examples that
parse against the schema. Henry to change 'ser' prefix to 'output'

DECISION - move serialization options (XQuery C3) to the static
context. Module scope. Implementations are allowed to overwrite or
augment. Consistency rules column needs to be filled in.

[6] http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Jul/0174.html