I18n WG comments on XSLT 2.0 and XQuery 1.0 Serialization

Version reviewed

http://www.w3.org/TR/2005/WD-xslt-xquery-serialization-20050404/

Main reviewer

Felix Sasaki fsasaki@w3.org

Notes

So far, these are personal comments and are NOT made on behalf of the I18N WG.

Comments

ID Location Comment Additional information Accepted
1 Sec. 2, point 3 "each separated by a single space" Inserting a space may not be the right thing to do, in particular for Chinese, Japanese, Thai, ..., which do not put spaces between words. This has to be checked very carefully. Based on Martin's review, comment [5]. See Bugzilla objected
2 Sec. 3, serialization parameter 'encoding' Given that this is already required for the XML output method, we think it's highly desirable to make the requirement for support for UTF-8 and UTF-16 general (including text output). Based on Martin's review, comment [6]. See Bugzilla accepted
3 Sec. 3, 'encoding' Here or for each individual output method, something should be said about the BOM. As for the byte-order-mark parameter in sec. 3, you say "If the concept of a Byte Order Mark is not meaningful in connection with the value of the encoding parameter, the byte-order-mark parameter is ignored." We think that, in sec. 3 or for each output method, you could spell out what "meaningful" means as follows:
  • XML/XHTML: UTF-16: BOM required; UTF-8: may be used.
  • HTML/text: UTF-16: BOM recommended; UTF-8: may be used.
Based on Martin's review, comment [7]. See Bugzilla accepted
4 Sec. 3, 'encoding' The respective sections for the individual output methods (5.1.2, 6.1.2, 7.4.2, 8.1.2) should say that for UTF-16, endianness is implementation-dependent (or implementation-defined). Based on Martin's review, comment [8]. See Bugzilla accepted
5 Sec. 3, 'include-content-type' Please explain in more detail in this section or in the sections for XHTML (6.1.13) / HTML (7.4.13) why this parameter is necessary. It seems that it may be better to always include a respective <meta> element in XHTML / HTML. Based on Martin's review, comment [11]. See Bugzilla announced
6 Sec. 4, point 2a You define URI-escaping in terms of XLink. We propose to refer to section 3.1 of the IRI specification (RFC 3987) instead, because XLink in that version lacks a procedure for normalization to NFC, which might be a necessary step in the escaping procedure; in our understanding, that procedure is a mapping from IRI to URI. Incidentally, the working draft for XLink 1.1 http://www.w3.org/TR/2005/WD-xlink11-20050428/#link-locators normatively references IRI. See Bugzilla -
7 Sec. 5.1.2, XML output method, encoding "When outputting a newline character in the instance of the data model, the serializer is free to represent it using any character sequence that will be normalized to a newline character by an XML parser, unless a specific mapping for the newline character is provided in a character map: see 9 Character Maps." This should probably say that for interoperability, it is better to avoid x85 and x2028. See sec. 2.11 of XML 1.1 for further information. Based on Martin's review, comment [17]. See Bugzilla rejected
8 Sec. 5.1.5, XML output method, omit-xml-declaration The interplay between omit-xml-declaration and the standalone parameter might disallow producing XML documents that are in an encoding other than UTF-8 or UTF-16 and have no XML declaration. Nevertheless, this should be possible, e.g. if XML is served over HTTP with a corresponding charset parameter. With XML 1.1, the XML declaration is mandatory anyway, no matter what the values of omit-xml-declaration and standalone are. Based on Martin's review, comment [18]. See Bugzilla rejected
9 Sec. 6.1.12 and Sec. 7.4.12, Note starting: "This escaping is deliberately confined to non-ASCII characters ...". There are certain ASCII characters that are not allowed in URIs, namely "<", ">", '"', space, "{", "}", "|", "\", "^", and "`". They should be escaped. Based on Martin's review, comment [32]. See Bugzilla rejected; confirmation from i18n-wg needed
10 Sec. 7.3 "When outputting a sequence of whitespace characters in the data model, within an element where whitespace is treated normally, (but not in elements such as pre and textarea) the html output method may represent it using any character sequence that will be treated as whitespace by an HTML user agent." We need to check whether this (which allows replacement of whitespace including linebreaks by whitespace not including linebreaks and vice-versa) is okay for Chinese, Japanese, Thai, ... (languages without spaces between words). This has to be checked extremely carefully. Based on Martin's review, comment [19]. See Bugzilla accepted
11 Sec. 8.1.13 The text should talk about "include-content-type" instead of "escape-uri-attributes". See Bugzilla -
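
The escaping discussed in comments 6 and 9 can be illustrated with a minimal Python sketch. It is not the full algorithm of RFC 3987 section 3.1: it assumes the IRI is already available as a Unicode string, applies NFC normalization, and then percent-encodes the UTF-8 bytes of every non-ASCII character and, optionally, of the ASCII characters listed in comment 9. It also ignores special handling of host names. The names iri_to_uri, DISALLOWED_ASCII and escape_disallowed_ascii are hypothetical and do not come from any of the specifications under review.

```python
import unicodedata

# ASCII characters that comment 9 lists as not allowed in URIs.
DISALLOWED_ASCII = set('<>" {}|\\^`')


def iri_to_uri(iri: str, escape_disallowed_ascii: bool = True) -> str:
    """Hypothetical helper: NFC-normalize, then percent-encode as UTF-8."""
    # Step suggested in comment 6: normalize to NFC before escaping, so that
    # canonically equivalent inputs produce the same escaped output.
    normalized = unicodedata.normalize("NFC", iri)
    out = []
    for ch in normalized:
        needs_escape = ord(ch) > 0x7F or (
            escape_disallowed_ascii and ch in DISALLOWED_ASCII
        )
        if needs_escape:
            # One %HH triplet per UTF-8 byte of the character.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


# "e" followed by U+0301 (combining acute) and precomposed U+00E9 are
# canonically equivalent, so both escape to the same URI.
print(iri_to_uri("http://example.org/cafe\u0301"))  # http://example.org/caf%C3%A9
print(iri_to_uri("http://example.org/caf\u00e9"))   # http://example.org/caf%C3%A9
```

Without the normalization step, the two inputs above would escape to different byte sequences, which is the concern behind comment 6. Setting escape_disallowed_ascii to False leaves characters such as space and "<" untouched, which corresponds to the behaviour that comment 9 objects to.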

Version: $Id: xq-xt-serialization-review.html,v 1.7 2005/05/25 03:18:24 fsasaki Exp $