RE: [Serial] I18N WG last call comments

Hello Henry,

[this is a reply to
http://lists.w3.org/Archives/Public/public-qt-comments/2004Jun/0038.html]

We have looked at your code examples below in detail. The examples
you give look reasonable, but what we are concerned about
is cases where text is not put together programmatically,
but just concatenated, e.g. in an example such as

<p>Document creation date: <xsl:sequence
          select="hz:format(xs:date('2004-12-21'), 'y-m-d')"/>.</p>
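
As a sketch of the concern: going by the output Henry reports below
for the same hz:format call, the paragraph above would come out as

<p>Document creation date: 2004 - 12 - 21.</p>

One possible call-site workaround, assuming the separator attribute of
xsl:value-of as defined in XSLT 2.0 (it is not something the examples
in this thread use), would be

<p>Document creation date: <xsl:value-of
          select="hz:format(xs:date('2004-12-21'), 'y-m-d')"
          separator=""/>.</p>

which should yield "2004-12-21" with no inserted spaces, but only
because the author remembered to ask for it.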

Overall, I think that the convention of using a space between
strings, inherited from SGML NMTOKENS and IDREFS, should not be the
default in XQuery and XSLT to concatenate strings. Either there
should be a function, e.g. called stringify-tokens, to handle
cases such as "red green blue", which I guess would make the
first of your examples

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 version="2.0">
   <xsl:template match="/">
     <elem colors="{stringify-tokens(data(elem/@colors))}"/>
   </xsl:template>
</xsl:stylesheet>

or alternatively, the data model should be able to distinguish,
based on schema information, between tokens such as NMTOKENS and IDREFS
and plain strings that don't need spaces when concatenated.

Simply defining that all strings behave like tokens because some
strings are tokens doesn't seem to make sense at all.
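
A minimal sketch of how such a function might be defined on top of the
existing string-join; the my: prefix, its namespace URI, and the
function name are hypothetical and not part of any draft:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 xmlns:my="http://www.example.org/my"
                 version="2.0"
                 exclude-result-prefixes="my">
   <!-- Hypothetical helper: atomizes its argument and joins the
        resulting values with a single space, i.e. what is currently
        the default behaviour. -->
   <xsl:function name="my:stringify-tokens">
     <xsl:param name="tokens"/>
     <xsl:sequence
        select="string-join(for $t in data($tokens)
                            return string($t), ' ')"/>
   </xsl:function>

   <xsl:template match="/">
     <elem colors="{my:stringify-tokens(elem/@colors)}"/>
   </xsl:template>
</xsl:stylesheet>

With a helper of this kind, the space would appear only where the
stylesheet explicitly asks for token-style output.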

Regards,   Martin.


At 12:39 04/06/09, Henry Zongaro wrote:
 >
 >Hi, Martin.
 >
 >     In [1], you wrote:
 >
 >Martin Duerst wrote on 2004-05-24 05:31:53 AM:
 >> At 17:52 04/05/06 +0100, Michael Kay wrote:
 >> >The places where XSLT/XQuery use space as a default separator are all
 >> >associated with converting a typed value to the string value of a node,
 >> >and are therefore closely associated with this XML Schema convention for
 >> >representing lists. Of course we can't totally control how the facility
 >> >is used, but we do provide a string-join function that allows any
 >> >separator to be used in the lexical representation of a sequence, so we
 >> >are not imposing any constraints on users.
 >>
 >> Would it be possible for you to write the following three examples:
 >>
 >> - An example (such as above with "red", "green", "blue", but with the
 >>    actual code) where these are e.g. NMTOKENS, and where the
 >>    serialization with spaces makes sense.
 >
 >Assume the following input document, where the type of the colors
 >attribute is xs:NMTOKENS.
 >
 ><elem colors="red   green  blue"/>
 >
 >and the following stylesheet:
 >
 ><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 >                version="2.0">
 >  <xsl:template match="/">
 >    <xsl:sequence select="data(elem/@colors)"/>
 >  </xsl:template>
 ></xsl:stylesheet>
 >
 >The result of serialization will be the following external general parsed
 >entity.
 >
 ><?xml version="1.0" encoding="UTF-8"?>red green blue
 >
 >That entity might be subsequently referenced in the content of an element
 >that has the simple type xs:NMTOKENS.  If the PSVI that results is used to
 >construct an instance of the XPath/XQuery Data Model, the typed value of
 >the element would be a sequence of three values of type xs:NMTOKEN;
 >without the spaces, the typed value would be a sequence of a single value
 >of type xs:NMTOKEN:  "redgreenblue".
 >
 >
 >Compare that with the result of the following stylesheet, where the rules
 >for evaluating an attribute value template (section 5.5 of the last call
 >draft of XSLT 2.0) state that each atomized value in the sequence that
 >results from evaluating each XPath expression will be converted to a
 >string, and separated by a space:
 >
 ><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 >                version="2.0">
 >  <xsl:template match="/">
 >    <elem colors="{data(elem/@colors)}"/>
 >  </xsl:template>
 ></xsl:stylesheet>
 >
 >Result:
 >
 ><?xml version="1.0" encoding="UTF-8"?><elem colors="red green blue"/>
 >
 >Again, if that serialized entity is assessed against a schema in which the
 >colors attribute has type xs:NMTOKENS, the typed value of the attribute
 >will be a sequence of three values of type xs:NMTOKEN.
 >
 >
 >Similarly, the result of the following stylesheet, where the rules for
 >constructing complex content (section 5.6.1 of XSLT 2.0) describe how a
 >text node is created from the sequence of atomic values that results from
 >evaluating the xsl:sequence instruction:
 >
 ><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 >                version="2.0">
 >  <xsl:template match="/">
 >    <elem><xsl:sequence select="data(elem/@colors)"/></elem>
 >  </xsl:template>
 ></xsl:stylesheet>
 >
 >Result:
 >
 ><elem>red green blue</elem>
 >
 >> - An example with e.g. strings used as intermediate text in a
 >>    formatting-like operation (a la printf in C), where inserting spaces
 >>    would happen, but would not be desired.
 >
 >Is this the kind of example you're looking for?  I've used an XPath
 >expression to perform a simple date formatting operation, constructing the
 >result as a sequence of strings.
 >
 ><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 >                xmlns:hz="http://www.example.org"
 >                xmlns:xs="http://www.w3.org/2001/XMLSchema"
 >                version="2.0"
 >                exclude-result-prefixes="hz xs">
 >  <xsl:function name="hz:format">
 >    <xsl:param name="date" as="xs:date"/>
 >    <xsl:param name="format" as="xs:string"/>
 >
 >    <xsl:sequence
 >       select="
 >         for $c in
 >           (for $i in (1 to string-length($format))
 >            return substring($format, $i, 1))
 >         return
 >           if ($c = 'y') then
 >             get-year-from-date($date)
 >           else if ($c = 'd') then
 >             get-day-from-date($date)
 >           else if ($c = 'm') then
 >             get-month-from-date($date)
 >           else if ($c = 'M') then
 >             ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
 >              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')
 >             [get-month-from-date($date)]
 >           else
 >             $c"/>
 >  </xsl:function>
 >
 >  <xsl:template match="/">
 >    <doc>
 >      <v1>
 >       <xsl:sequence
 >         select="hz:format(xs:date('2004-12-21'), 'y-m-d')"/>
 >      </v1>
 >      <v2>
 >       <xsl:sequence
 >         select="hz:format(xs:date('2004-12-31'), 'M d, y')"/>
 >      </v2>
 >    </doc>
 >  </xsl:template>
 ></xsl:stylesheet>
 >
 >This stylesheet will produce the following result, which is probably not
 >what was intended.
 >
 ><doc><v1>2004 - 12 - 21</v1><v2>Dec   31 ,   2004</v2></doc>
 >
 >> - The previous example with the above 'string-join' function used to
 >>    avoid the problems with spaces.
 >
 >If I change the definition of hz:format to add in a reference to
 >string-join, specifying '' as the separator,
 >
 >  <xsl:function name="hz:format">
 >    <xsl:param name="date" as="xs:date"/>
 >    <xsl:param name="format" as="xs:string"/>
 >
 >    <xsl:sequence
 >      select="string-join(
 >                for $c in
 >                  (for $i in (1 to string-length($format))
 >                     return substring($format, $i, 1))
 >                return
 >                  ...
 >              , '')"/>
 >  </xsl:function>
 >
 >the result will be:
 >
 ><doc><v1>2004-12-21</v1><v2>Dec 31, 2004</v2></doc>
 >
 >Thanks,
 >
 >Henry
 >[1]
 >http://lists.w3.org/Archives/Public/public-qt-comments/2004May/0053.html
 >------------------------------------------------------------------
 >Henry Zongaro      Xalan development
 >IBM SWS Toronto Lab   T/L 969-6044;  Phone +1 905 413-6044
 >mailto:zongaro@ca.ibm.com 

Received on Tuesday, 26 October 2004 06:15:32 UTC