This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 1307 - [XQuery] Line Endings
Summary: [XQuery] Line Endings
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQuery 1.0
Version: Last Call drafts
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Scott Boag
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard: grammar
Keywords:
Depends on:
Blocks:
 
Reported: 2005-05-05 09:37 UTC by Michael Kay
Modified: 2005-09-29 10:45 UTC
0 users

See Also:


Attachments

Description Michael Kay 2005-05-05 09:37:03 UTC
[XQuery] Line Endings

There are two places in XQuery where line endings are normalized: (a) in the
content of direct element constructors, and (b) in string literals.

Within attribute content, the spec refers to the rules for attribute value
normalization in XML 3.3.3. These start with the rule "All line breaks must have
been normalized on input to #xA as described in 2.11 End-of-Line Handling, so
the rest of this algorithm operates on text normalized in this way." This can be
read as implying that XQuery also normalizes line endings in attribute content
(which means that CRLF in attribute content is turned into a single space, not
into two spaces).

(Aside: The XML rules for attribute value normalization depend on the type of
the attribute. I think it would be helpful if we specify that the algorithm is
executed on the basis that the attribute type is CDATA. If it isn't, then schema
validation will take care of it.)
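The CRLF point and the CDATA aside can be illustrated together. Below is a minimal Python sketch of the XML 1.0 section 3.3.3 algorithm that handles literal whitespace only (entity and character references, which the full algorithm also expands, are ignored). The function name `normalize_attribute` and its `is_cdata`/`eol_first` flags are hypothetical, chosen for illustration.

```python
def normalize_attribute(value: str, is_cdata: bool = True,
                        eol_first: bool = True) -> str:
    """Sketch of XML 1.0 attribute-value normalization (section 3.3.3).

    Only literal whitespace is handled; entity and character references,
    which the full algorithm also expands, are ignored here.
    """
    if eol_first:
        # End-of-line handling (XML 1.0 section 2.11): CRLF and lone CR -> LF.
        value = value.replace("\r\n", "\n").replace("\r", "\n")
    # Each whitespace character (#x20, #xD, #xA, #x9) becomes one #x20.
    value = "".join(" " if ch in "\t\n\r" else ch for ch in value)
    if is_cdata:
        return value
    # Non-CDATA types only: drop leading/trailing spaces and collapse runs.
    return " ".join(part for part in value.split(" ") if part)


print(repr(normalize_attribute("a\r\nb")))                      # 'a b'
print(repr(normalize_attribute("a\r\nb", eol_first=False)))     # 'a  b'
print(repr(normalize_attribute("  x\r\n y ", is_cdata=False)))  # 'x y'
```

Without the end-of-line step, the CRLF survives as two separate whitespace characters and yields two spaces, which is the discrepancy the report describes.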

There are contexts in XQuery where line endings are *not* normalized. For
example, they are not normalized in a CDATA section, a direct comment
constructor, or a direct processing instruction constructor. Since the syntax of
each of these constructs is explicitly designed to mimic XML, it seems odd that
the handling of line endings here should differ from XML.

Further, XQuery doesn't normalize line endings appearing in the middle of an
ordinary expression (for example, between "declare" and "function"). This doesn't
matter unless XML 1.1-style line endings are used. Currently, if an XQuery
processor decides to follow the XML 1.1 profile, then it allows and normalizes a
NEL character appearing in element content or in a string literal; allows and
doesn't normalize it in a CDATA section, comment, or PI; and disallows it in the
middle of an expression.

I can't see any good reason why XQuery doesn't do the same as XML, and specify
that all line endings in the query are normalized according to the XML 1.0 or
1.1 rules irrespective of the syntactic context, before any lexical or syntactic
analysis of the query starts.
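To make the NEL inconsistency concrete: the whitespace production S that XQuery borrows from XML covers only #x20, #x9, #xD and #xA, so a NEL between two keywords does not separate them unless it has already been normalized. The Python sketch below applies XML 1.1 end-of-line normalization to the whole query before splitting on S. `toy_tokens` is a hypothetical whitespace-only tokenizer for illustration, not a real XQuery lexer.

```python
import re
from typing import List

# Whitespace production S, as in XML: (#x20 | #x9 | #xD | #xA)+
_S = re.compile("[ \t\r\n]+")

# XML 1.1 line endings; the two-character sequences are listed first so
# they match as a unit before a lone CR can.
_EOL_11 = re.compile("\r\n|\r\u0085|[\r\u0085\u2028]")


def toy_tokens(query: str, normalize_first: bool) -> List[str]:
    """Hypothetical whitespace-only tokenizer used for illustration."""
    if normalize_first:
        query = _EOL_11.sub("\n", query)
    return [tok for tok in _S.split(query) if tok]


q = "declare\u0085function"
print(toy_tokens(q, normalize_first=False))  # ['declare\x85function']
print(toy_tokens(q, normalize_first=True))   # ['declare', 'function']
```

With the pre-pass, NEL behaves like any other line break in every syntactic context, which is the uniformity the report asks for.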

Michael Kay
Comment 1 Scott Boag 2005-05-17 19:53:09 UTC
I can't see any technical issue with the grammar in doing uniform line ending
normalization, except the need for pre-processing of the VersionDecl in the case
of XQuery, which has to be done in any event. One has to assume the encoding is
known for XPath, so that isn't an issue.

Since the normalization occurs essentially out-of-band to the syntax parsing
process, I don't think there has to be any effect on the rest of the document.
Of course, a real-world parser would not do two passes... it's just cleaner to
specify it this way.
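The "as if" wording lets an implementation fold normalization into its input reader instead of making a separate pass. A minimal Python sketch, assuming XML 1.0 rules and input that arrives as text chunks; the function name `normalize_stream_xml10` is hypothetical. The only subtlety is a #xD at the end of a chunk, which may be the first half of a CRLF pair split across the boundary.

```python
from typing import Iterable, Iterator


def normalize_stream_xml10(chunks: Iterable[str]) -> Iterator[str]:
    """Single-pass XML 1.0 end-of-line normalization over text chunks.

    A CR at the end of one chunk may be the first half of a CRLF pair
    that straddles the boundary, so it is held back until the next
    non-empty chunk (or end of input) decides its fate.
    """
    pending_cr = False
    for chunk in chunks:
        if not chunk:
            continue
        if pending_cr:
            # The held CR becomes LF; drop a following LF so CRLF -> LF.
            yield "\n"
            if chunk[0] == "\n":
                chunk = chunk[1:]
            pending_cr = False
        if chunk.endswith("\r"):
            pending_cr = True
            chunk = chunk[:-1]
        yield chunk.replace("\r\n", "\n").replace("\r", "\n")
    if pending_cr:
        # A CR at end of input is a lone CR.
        yield "\n"


print(repr("".join(normalize_stream_xml10(["a\r", "\nb\r", "\rc"]))))
# 'a\nb\n\nc'
```

The result matches normalizing the concatenated input in one go, which is why specifying a separate conceptual pass costs implementations nothing.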

I suggest a new section immediately above the section on whitespace, where we
pretty much do the same as the XML specifications.  I'm not very happy with how
the XML 1.0 vs. XML 1.1 wording is done in the first paragraph... this would be
easier if we had a proper XML 1.1 named feature, or the like.  Any suggestions
on ways to better handle this would be much appreciated.

=========
A.2.2 End-of-Line Handling

The [XPath/XQuery] processor MUST behave as if it normalized all line breaks on
input, before parsing. The normalization should be done according to whether the
processor has chosen to support [XML 1.0] or [XML 1.1] lexical processing.

A.2.2.1 XML 1.0 End-of-Line Handling

For [XML 1.0] processing, all of the following MUST be translated to a single
#xA character:

   1. the two-character sequence #xD #xA
   2. any #xD character that is not immediately followed by #xA.
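The two XML 1.0 rules translate directly into string replacement, provided the CRLF rule runs first. A minimal Python sketch; the function name is hypothetical.

```python
def normalize_eol_xml10(text: str) -> str:
    # Rule 1: CR LF -> LF. This must run first, or the pair would
    # become two LFs.
    text = text.replace("\r\n", "\n")
    # Rule 2: every CR still present was not part of a CRLF pair in the
    # input, so it is a lone CR and becomes LF.
    return text.replace("\r", "\n")


print(repr(normalize_eol_xml10("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
```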

A.2.2.2 XML 1.1 End-of-Line Handling

For [XML 1.1] processing, all of the following MUST be translated to a single
#xA character:

   1. the two-character sequence #xD #xA
   2. the two-character sequence #xD #x85
   3. the single character #x85
   4. the single character #x2028
   5. any #xD character that is not immediately followed by #xA or #x85.
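The five XML 1.1 rules can be applied in a single regular-expression pass. A minimal Python sketch (function name hypothetical): because Python's `re` tries alternatives left to right, the two-character sequences take priority over a lone #xD at the same position, which is what gives rule 5 its "not immediately followed by" meaning.

```python
import re

# Rules 1 and 2 first, then rules 3-5 as a character class.
_EOL_11 = re.compile("\r\n|\r\u0085|[\r\u0085\u2028]")


def normalize_eol_xml11(text: str) -> str:
    return _EOL_11.sub("\n", text)


print(repr(normalize_eol_xml11("a\r\u0085b\u2028c")))  # 'a\nb\nc'
```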

(XQuery-only) The characters #x85 and #x2028 cannot be reliably recognized and
translated until the VersionDecl declaration (if present) has been read.
===========
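One plausible reading of the XQuery-only note, and an assumption here, is that the encoding declaration in the VersionDecl governs how bytes decode into characters: byte 0x85 is NEL in ISO-8859-1 but HORIZONTAL ELLIPSIS (U+2026) in windows-1252, so normalization can only run after decoding. The Python sketch below reflects that ordering. `_VERSION_DECL`, `decode_query` and `load_query_xml11` are hypothetical, and the regex is a deliberate simplification: it ignores comments and assumes an ASCII-compatible prefix, so UTF-16 input would need BOM detection first (not shown).

```python
import re

# Hypothetical, simplified matcher for a leading VersionDecl such as
#   xquery version "1.0" encoding "windows-1252";
# It ignores comments and assumes an ASCII-compatible byte prefix.
_VERSION_DECL = re.compile(
    rb'\s*xquery\s+version\s+(["\'])[^"\']*\1'
    rb'(?:\s+encoding\s+(["\'])([^"\']*)\2)?\s*;'
)

# XML 1.1 end-of-line rules, two-character sequences first.
_EOL_11 = re.compile("\r\n|\r\u0085|[\r\u0085\u2028]")


def decode_query(data: bytes, default: str = "utf-8") -> str:
    """Decode using the VersionDecl encoding if present, else a default."""
    m = _VERSION_DECL.match(data)
    encoding = m.group(3).decode("ascii") if m and m.group(3) else default
    return data.decode(encoding)


def load_query_xml11(data: bytes) -> str:
    # Decode first: only then is it known whether byte 0x85 is NEL.
    return _EOL_11.sub("\n", decode_query(data))


latin1 = b'xquery version "1.0" encoding "iso-8859-1"; "a\x85b"'
cp1252 = b'xquery version "1.0" encoding "windows-1252"; "a\x85b"'
print("a\nb" in load_query_xml11(latin1))       # True: 0x85 was NEL
print("a\u2026b" in load_query_xml11(cp1252))   # True: 0x85 was an ellipsis
```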
Comment 2 Don Chamberlin 2005-06-11 02:59:02 UTC
I have deleted references to line-ending normalization from XQuery Sections 
3.1.1 (Literals) and 3.7.1.3 (Dir. Elem. Constructor--Content, Rule 1a). Scott, 
after you have inserted the new section on global end-of-line handling into 
Appendix A, you can close this bug report.
--Don
Comment 3 Scott Boag 2005-07-22 20:47:40 UTC
This work has been completed.