This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The XQuery test file

  Queries/XQuery/Expressions/Operators/CompExpr/GenComprsn/GenCompEq/generalexpression12.xq

has what is apparently a malformed comment:

  (: operand2 = empty sequence)
                              ^
  (:*******************************************************:)

and both the test parser applet and Saxon report a parse error on this file.

I was going to raise a bug against the test suite, but the grammar for comments is

  [154] Comment ::= "(:" (CommentContents | Comment)* ":)"
  [155] CommentContents ::= (Char+ - (Char* ':)' Char*))

Note that [155] matches any string of characters that does not include an _end_ marker. As such, it would appear that

  operand2 = empty sequence)
  (:*******************************************************

matches CommentContents, and so

  (: operand2 = empty sequence)
  (:*******************************************************:)

is a valid (single) comment.

It would appear that [155] is in error and should forbid a comment start rather than a comment end:

  [155] CommentContents ::= (Char+ - (Char* '(:' Char*))
                                             ^^

which would then force a nested (: to be parsed as a nested Comment, as allowed by [154]. (The test file would then be in error, and "(: operand2 = empty sequence)" would need to be changed to "(: operand2 = empty sequence:)".)

David
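To see why the two readings of the test file differ, here is a small Python sketch of two hypothetical scanners (not taken from Saxon or any other XQuery processor). comment_end_flat models the reading above, in which a (: inside the comment is just CommentContents under the published [155], so the comment closes at the first :). comment_end_nested models the behaviour the proposed change is meant to force, in which every (: opens a nested comment and every :) closes one. Both return the index just past the closing :), or None if the comment is unterminated; the input is an abbreviated copy of the test text.

```python
from typing import Optional


def comment_end_flat(text: str, start: int = 0) -> Optional[int]:
    """Published [155] reading: a '(:' inside the comment is ordinary
    CommentContents, so the comment closes at the first ':)'."""
    if not text.startswith("(:", start):
        return None
    close = text.find(":)", start + 2)
    return None if close == -1 else close + 2


def comment_end_nested(text: str, start: int = 0) -> Optional[int]:
    """Nesting reading: '(:' always opens a nested Comment and ':)' always
    closes the innermost open one."""
    if not text.startswith("(:", start):
        return None
    depth, i = 1, start + 2
    while i < len(text):
        if text.startswith("(:", i):
            depth += 1
            i += 2
        elif text.startswith(":)", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i  # index just past the matching ':)'
        else:
            i += 1
    return None  # input exhausted: the outer comment was never closed


# Abbreviated copy of the text in generalexpression12.xq
src = "(: operand2 = empty sequence)\n(:*****:)"
print(comment_end_flat(src) == len(src))  # True: one comment spans both lines
print(comment_end_nested(src))            # None: unterminated outer comment
```

Under the nesting reading, the corrected text "(: operand2 = empty sequence:)" closes on the first line, as expected.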
Changing Component to XPath since this impacts both XQuery and XPath. /paulc
Good catch. But I think the new rule should continue to forbid comment-end delimiters; otherwise we risk recreating the mirror image of this problem, with a stray :) absorbed into CommentContents instead of a stray (:. I think:

  [155] CommentContents ::= (Char+ - (Char* ('(:' | ':)') Char*))

(Although I am not sure why the first token on the RHS is Char+ instead of Char*. By the current rule, (::) is not a legal comment, which seems restrictive.)
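The mirror-image risk can be checked by writing the subtraction in [155] as a predicate. This is a minimal Python sketch, assuming any Python character counts as a Char (the XML Char production is narrower); the function name and the three delimiter sets are hypothetical labels for the rules discussed so far.

```python
def is_comment_contents(s: str, forbidden: tuple) -> bool:
    """CommentContents ::= (Char+ - (Char* D Char*)) for every delimiter D
    in `forbidden`. Simplification: any Python character counts as a Char."""
    return len(s) > 0 and not any(d in s for d in forbidden)


ORIGINAL = (":)",)     # [155] as published
START_ONLY = ("(:",)   # proposal in the bug report
BOTH = ("(:", ":)")    # proposal in this comment

for s in [" a (: b ", " a :) b ", " plain text "]:
    print(repr(s), [is_comment_contents(s, f) for f in (ORIGINAL, START_ONLY, BOTH)])
# ' a (: b '     -> [True, False, False]
# ' a :) b '     -> [False, True, False]
# ' plain text ' -> [True, True, True]
```

Forbidding only (: lets a :) slip into the contents, which is the mirror image of the original bug; forbidding both delimiters rejects either one.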
Yes, I was going to send a follow-on saying to catch both (: and :). However, seeing as most regex engines don't (I think) have this "subtraction" syntax, and people might want to find comments with regex tools, it might be simpler to give an additive definition rather than a subtractive one.

Basically, CommentContents is a run of
  any character other than : or (
or
  : followed by not-(
or
  ( followed by not-:

That is:

  ([^:\(]|\([^:]|:[^\)])*

or in your EBNF syntax:

  (((Char - ":") - "(") | ("(" (Char - ":")) | (":" (Char - ")")))*

> (although i am not sure why the first token on the RHS is
> Char+ instead of Char*.

I noticed that; I suspect it's a hang-over from the earlier draft's (:: pragma syntax.

David

(I assume I don't need to open a bugzilla entry on the test suite for generalexpression12.xq.)
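As a sketch of the regex-search use case, the additive form can be placed between the two delimiters in Python's re module. This assumes the final character class is closed as :[^\)])*; the pattern name COMMENT is hypothetical, and the pattern only finds flat (unnested) comments, so it is a search aid rather than a parser.

```python
import re

# Additive CommentContents from this comment, wrapped in "(:" ... ":)".
# Finds flat comments only; nesting is not handled.
COMMENT = re.compile(r"\(:(?:[^:\(]|\([^:]|:[^\)])*:\)")

text = "let $x := 1 (: a comment :) return $x (: another :)"
print([m.group() for m in COMMENT.finditer(text)])
# ['(: a comment :)', '(: another :)']
```

On an abbreviated copy of the generalexpression12.xq text, the same pattern matches only the starred comment on the second line, because the (: there cannot be consumed as contents of the first.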
(In reply to comment #3)
>
> Basically CommentContents is a run of
> any character other than : or (
> or
> : followed by not-(
> or
> ( followed by not-:

You mean ": followed by not-)". (Which is what you say in the regex and EBNF.)

> That is:
>
> ([^:\(]|\([^:]|:[^\)])*
>
> or in your EBNF syntax
>
> (((Char - ":") - "(") | ("(" (Char - ":")) | (":" (Char - ")")))*

Note that these exclude a CommentContents ending in "(" or ":", which is not excluded by the current EBNF or by the EBNF proposed in comment #2.
> Note that these exclude a CommentContents ending in "(" or ":", which is not
> excluded by the current EBNF or the EBNF proposed in Comment #2.

Sorry, yes, that would be a (fixable) bug in my expressions. I'm not sure whether it's worth fixing, or whether to stick with the subtraction originally proposed.
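One way to weigh the question is a brute-force comparison. The Python sketch below checks the additive pattern from this thread, and a hypothetical repair that swaps in negative lookahead (a technique not proposed in the thread, and one that assumes an engine with lookahead support, such as Python's re), against the subtractive rule on every string of length 1 to 6 over the alphabet ( : ) a. The pattern names are illustrative. The lookahead version agrees everywhere; besides the trailing "(" and ":" cases, the thread's version also disagrees where a two-character alternative consumes the first character of a delimiter, as in ((:a or ::).

```python
import re
from itertools import product

# Additive form from the thread, with + to mirror Char+.
THREAD = re.compile(r"(?:[^:\(]|\([^:]|:[^\)])+")
# Hypothetical repair: '(' or ':' is allowed as long as the *next*
# character would not complete '(:' or ':)'. Lookahead consumes nothing.
LOOKAHEAD = re.compile(r"(?:[^:\(]|\((?!:)|:(?!\)))+")


def subtractive(s: str) -> bool:
    # CommentContents ::= (Char+ - (Char* ('(:' | ':)') Char*))
    return len(s) > 0 and "(:" not in s and ":)" not in s


def mismatches(pattern) -> list:
    # Every string of length 1..6 over an alphabet containing the
    # delimiter characters, where the pattern and the rule disagree.
    return [
        s
        for n in range(1, 7)
        for s in map("".join, product("(:)a", repeat=n))
        if (pattern.fullmatch(s) is not None) != subtractive(s)
    ]


print(mismatches(LOOKAHEAD))        # []
print(len(mismatches(THREAD)) > 0)  # True ('a(', '((:a' and '::)' among them)
```

A bounded check like this is evidence rather than proof, but the lookahead pattern is equivalent by construction: each "(" or ":" is accepted exactly when it does not begin a delimiter.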
Changed EBNF to:

  [85] CommentContents ::= (Char+ - (Char* ('(:' | ':)') Char*))