1307 2005-05-05 09:37:03 +0000 [XQuery] Line Endings 2005-09-29 10:45:41 +0000 1 1 1 Unclassified XPath / XQuery / XSLT XQuery 1.0 Last Call drafts PC Windows XP CLOSED FIXED grammar P2 normal --- 1 mike scott_boag public-qt-comments oldest_to_newest

3538 0 mike 2005-05-05 09:37:03 +0000

[XQuery] Line Endings

There are two places in XQuery where line endings are normalized: (a) in the content of direct element constructors, and (b) in string literals.

Within attribute content, the spec refers to the rules for attribute value normalization in XML 3.3.3. These start with the rule "All line breaks must have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way." This can be read as implying that XQuery also normalizes line endings in attribute content (which means that CRLF in attribute content is turned into a single space, not into two spaces).

(Aside: The XML rules for attribute value normalization depend on the type of the attribute. I think it would be helpful if we specify that the algorithm is executed on the basis that the attribute type is CDATA. If it isn't, then schema validation will take care of it.)

There are contexts in XQuery where line endings are *not* normalized. For example, they are not normalized in a CDATA section, a direct comment constructor, or a direct processing instruction constructor. Since the syntax of each of these constructs is explicitly designed to mimic XML, it seems odd that the handling of line endings here should differ from XML.

Further, XQuery doesn't normalize line endings appearing in the middle of an ordinary expression (for example, between "declare" and "function"). This doesn't matter unless XML 1.1-style line endings are used.
Currently, if an XQuery processor decides to follow the XML 1.1 profile, then it allows and normalizes a NEL character appearing in element content or in a string literal; allows and doesn't normalize it in a CDATA section, comment, or PI; and disallows it in the middle of an expression.

I can't see any good reason why XQuery doesn't do the same as XML, and specify that all line endings in the query are normalized according to the XML 1.0 or 1.1 rules irrespective of the syntactic context, before any lexical or syntactic analysis of the query starts.

Michael Kay

3169 1 scott_boag 2005-05-17 19:53:09 +0000

I can't see any technical issue for the grammar in doing uniform line ending normalization, except the need for pre-processing of the VersionDecl in the case of XQuery, which has to be done in any event. One has to assume the encoding is known for XPath, so that isn't an issue.

Since the normalization occurs essentially out-of-band to the syntax parsing process, I don't think there has to be any effect on the rest of the document. Of course, a real world parser would not do two passes... it's just cleaner to specify it this way.

I suggest a new section immediately above the section on whitespace, where we pretty much do the same as the XML specifications. I'm not very happy with how the XML 1.0 vs. XML 1.1 wording is done in the first paragraph... this would be easier if we had a proper XML 1.1 named feature, or the like. Any suggestions on ways to better handle this would be much appreciated.

=========
A.2.2 End-of-Line Handling

The [XPath/XQuery] processor MUST behave as if it normalized all line breaks on input, before parsing. The normalization should be done according to the choice to support [XML 1.0], or [XML 1.1] lexical processing.

A.2.2.1 XML 1.0 End-of-Line Handling

For [XML 1.0] processing, all of the following MUST be translated to a single #xA character:

1. the two-character sequence #xD #xA
2. any #xD character that is not immediately followed by #xA.

A.2.2.2 XML 1.1 End-of-Line Handling

For [XML 1.1] processing, all of the following MUST be translated to a single #xA character:

1. the two-character sequence #xD #xA
2. the two-character sequence #xD #x85
3. the single character #x85
4. the single character #x2028
5. any #xD character that is not immediately followed by #xA or #x85.

(XQuery-only) The characters #x85 and #x2028 cannot be reliably recognized and translated until the VersionDecl declaration (if present) has been read.
===========

2740 2 chamberl 2005-06-11 02:59:02 +0000

I have deleted references to line-ending normalization from XQuery Sections 3.1.1 (Literals) and 3.7.1.3 (Dir. Elem. Constructor--Content, Rule 1a). Scott, after you have inserted the new section on global end-of-line handling into Appendix A, you can close this bug report.

--Don

5177 3 scott_boag 2005-07-22 20:47:40 +0000

This work has been completed.
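
For illustration only, the translation rules proposed in A.2.2.1 and A.2.2.2 above could be sketched as a single pre-parse pass over the query text. This is a minimal sketch, not text from the spec or from any processor: the function name `normalize_line_endings` and the `xml11` flag are hypothetical, and the sketch assumes the query has already been decoded to a Unicode string and that the XML 1.0 vs. XML 1.1 choice is already known (it does not attempt the VersionDecl pre-read mentioned in the XQuery-only note).

```python
import re

# XML 1.0 rules: #xD #xA, or any #xD not followed by #xA, becomes #xA.
# The two-character alternative is listed first so that it wins over lone #xD.
_XML10_EOL = re.compile("\r\n|\r")

# XML 1.1 rules: additionally #xD #x85, lone #x85 (NEL) and #x2028 (LS).
# Two-character sequences are again tried before the single characters.
_XML11_EOL = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_line_endings(query: str, xml11: bool = False) -> str:
    """Translate every line break in `query` to a single #xA.

    Hypothetical helper: `xml11` selects the XML 1.1 rule set; the
    default applies the XML 1.0 rule set.
    """
    pattern = _XML11_EOL if xml11 else _XML10_EOL
    return pattern.sub("\n", query)


# Usage: run once over the whole query, before any lexical analysis.
text = normalize_line_endings('<a b="x\r\ny"/>')
```

Because the pass runs before tokenization, a CRLF inside attribute content reaches attribute value normalization as a single #xA, which is what makes it end up as one space rather than two.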