1882 – comment syntax

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XPath 2.0
Version:	Last Call drafts
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	Scott Boag
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2005-08-23 09:55 UTC by David Carlisle
Modified:	2005-09-29 12:55 UTC
CC List:	0 users

Description David Carlisle 2005-08-23 09:55:57 UTC

The XQuery test file

 Queries/XQuery/Expressions/Operators/CompExpr/GenComprsn/
                           GenCompEq/generalexpression12.xq

has what is apparently a malformed comment:

(:  operand2 = empty sequence)
                             ^
(:*******************************************************:)

and the test parser applet and Saxon both report a parse error on this file.
I was going to raise a bug against the test suite, but the grammar for comments
is:
[154] Comment         ::= "(:" (CommentContents | Comment)* ":)"
[155] CommentContents ::= (Char+ - (Char* ':)' Char*))

Note that [155] matches any string of characters that doesn't include an
_end_ marker. As such, it would appear that

  operand2 = empty sequence)
(:*******************************************************

matches CommentContents, and so

(:  operand2 = empty sequence)
(:*******************************************************:)

is a valid (single) comment. It would appear that [155] is in error and should
forbid a comment start rather than a comment end:

[155] CommentContents ::= (Char+ - (Char* '(:' Char*))
                                            ^^
          
which would then force a nested (: to be parsed as a nested Comment as allowed
by [154]. The test file would then be in error and would require

(:  operand2 = empty sequence)

changing to

(:  operand2 = empty sequence:)

David
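
The reading of [155] given in the description can be checked mechanically. The following Python sketch models CommentContents as a plain predicate on strings. The function names are hypothetical, and the long run of asterisks from the test file is shortened for the example.

```python
def contents_original(s):
    """[155] as written: Char+ minus any string containing ':)'."""
    return len(s) > 0 and ":)" not in s


def contents_proposed(s):
    """[155] as proposed in the description: Char+ minus any string containing '(:'."""
    return len(s) > 0 and "(:" not in s


# The two lines from the test file (asterisk run shortened for readability).
text = "(:  operand2 = empty sequence)\n(:*****:)"

# Strip the outermost "(:" and ":)" to get the candidate CommentContents.
body = text[2:-2]

print(contents_original(body))  # True: the whole text is one Comment under [154]
print(contents_proposed(body))  # False: the inner "(:" cannot be plain contents
```

Under the original rule the body is a single CommentContents, so the two lines together form one well-formed comment. Under the proposed rule the inner "(:" can no longer be absorbed as plain text.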

Comment 1 Paul Cotton 2005-08-29 16:39:34 UTC

Changing Component to XPath since this impacts both XQuery and XPath.

/paulc

Comment 2 C. M. Sperberg-McQueen 2005-08-30 16:33:30 UTC

Good catch.  But I think the new rule should continue to
forbid comment-end delimiters, otherwise we risk recreating
the mirror image of this problem.

I think:

[155] CommentContents ::= (Char+ - (Char* ('(:'|':)') Char*))

(although I am not sure why the first token on the RHS is
Char+ instead of Char*.  By the current rule, (::) is not
a legal comment, which seems restrictive.)
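
A minimal Python sketch of the two points in comment 2 follows. It assumes a left-to-right scanner that treats every "(:" and ":)" it meets as a delimiter, which is one plausible reading of the combined rule rather than something the thread spells out. The names contents_ok and comment_end are hypothetical.

```python
START_ONLY = ("(:",)      # markers forbidden by the rule in the description
BOTH = ("(:", ":)")       # markers forbidden by the rule in comment 2


def contents_ok(s, forbidden):
    """CommentContents as 'Char+ minus strings containing a forbidden marker'."""
    return len(s) > 0 and not any(marker in s for marker in forbidden)


# Mirror-image problem: if only "(:" is forbidden, a stray ":)" can hide
# inside CommentContents.
print(contents_ok(" a :) b ", START_ONLY))  # True
print(contents_ok(" a :) b ", BOTH))        # False


def comment_end(text, pos=0):
    """Return the index just past the Comment starting at pos, or None.

    Assumption: every "(:" opens a nested comment and every ":)" closes one.
    """
    if not text.startswith("(:", pos):
        return None
    depth = 0
    i = pos
    while i < len(text):
        if text.startswith("(:", i):
            depth += 1
            i += 2
        elif text.startswith(":)", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return None  # ran out of input before the outermost comment closed


nested = "(: a (: b :) c :)"
malformed = "(:  operand2 = empty sequence)\n(:*****:)"

print(comment_end(nested) == len(nested))  # True: one comment with a nested comment
print(comment_end(malformed))              # None: the outer comment never closes
```

With both markers forbidden in CommentContents, the malformed text from the test file is reported as unterminated, which matches the parse errors described in the original report.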

Comment 3 David Carlisle 2005-08-31 09:58:42 UTC

Yes, I was going to send a follow-on saying to catch both (: and :).

However, seeing as most regex engines don't (I think) have this "subtraction"
syntax, and people might want to find comments with regex tools, it might be
simpler to give an additive definition rather than a subtractive one.

Basically CommentContents is a run of 
any character other than : or (
or 
: followed by not-(
or
( followed by not-:

That is:

([^:\(]|\([^:]|:[^\)])*

or in your EBNF syntax

(((Char - ":") - "(") | ("(" (Char - ":")) | (":" (Char - ")")))*

> (although i am not sure why the first token on the RHS is
> Char+ instead of Char*.

I noticed that; I suspect it's a holdover from the earlier draft's (:: pragma
syntax.

David

(I assume I don't need to open a Bugzilla entry against the test suite for
generalexpression12.xq.)
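
As a rough illustration of the additive definition in comment 3, here is a Python sketch using the standard re module. The pattern is written to mirror the EBNF alternatives given there; the helper name is_additive_contents is hypothetical, and the sample strings are made up for the example.

```python
import re

# Pattern mirroring the three EBNF alternatives in comment 3:
#   a character other than ":" or "(",
#   or "(" followed by a character other than ":",
#   or ":" followed by a character other than ")".
ADDITIVE = re.compile(r"(?:[^:(]|\([^:]|:[^)])*")


def is_additive_contents(s):
    """True if the whole string matches the additive pattern."""
    return ADDITIVE.fullmatch(s) is not None


print(is_additive_contents(" operand2 = empty sequence "))  # True
print(is_additive_contents(" f(x) : y "))                   # True
print(is_additive_contents(" a (: b "))                     # False
print(is_additive_contents(" a :) b "))                     # False
```

Lone parentheses and colons are accepted, while both comment delimiters are rejected, using only constructs that ordinary regex tools support.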

Comment 4 Michael Dyck 2005-08-31 21:58:24 UTC

(In reply to comment #3)
>
> Basically CommentContents is a run of 
> any character other than : or (
> or 
> : followed by not-(
> or
> ( followed by not-:

You mean ": followed by not-)". (Which is what you say in the regex and EBNF.)

> That is:
> 
> ([^:\(]|\([^:]|:[^\)])*
> 
> or in your EBNF syntax
> 
> (((Char - ":") - "(") | ("(" (Char - ":")) | (":" (Char - ")")))*

Note that these exclude a CommentContents ending in "(" or ":", which is not
excluded by the current EBNF or the EBNF proposed in Comment #2.
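
The difference noted in comment 4 can be seen with a short Python sketch that compares the two definitions side by side. The function names and sample strings are hypothetical; the subtractive predicate follows the EBNF from comment 2.

```python
import re

# Additive pattern, mirroring the EBNF alternatives from comment 3.
ADDITIVE = re.compile(r"(?:[^:(]|\([^:]|:[^)])*")


def additive(s):
    return ADDITIVE.fullmatch(s) is not None


def subtractive(s):
    """EBNF from comment 2: Char+ minus strings containing '(:' or ':)'."""
    return len(s) > 0 and "(:" not in s and ":)" not in s


for sample in ["plain", "note (", "ratio 1:"]:
    print(repr(sample), subtractive(sample), additive(sample))
# 'plain' True True
# 'note (' True False
# 'ratio 1:' True False
```

A trailing "(" or ":" has no following character to pair with, so the additive pattern fails on it even though the string contains neither delimiter.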

Comment 5 David Carlisle 2005-08-31 22:08:32 UTC

> Note that these exclude a CommentContents ending in "(" or ":", which is not
> excluded by the current EBNF or the EBNF proposed in Comment #2.

Sorry, yes, that would be a (fixable) bug in my expressions.
I'm not sure whether it's worth fixing, or whether to stick with the
subtraction originally proposed.
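
For regex tools that support lookahead, one possible repair is sketched below in Python. This is an assumption for illustration and was not proposed in the thread. It replaces "( followed by not-:" with "( not followed by :", so that the following character is inspected but not consumed. The names are hypothetical.

```python
import re
from itertools import product

# Lookahead variant: each alternative consumes exactly one character.
LOOKAHEAD = re.compile(r"(?:[^:(]|\((?!:)|:(?!\)))+")

# Pair-based pattern mirroring the EBNF from comment 3, for comparison.
PAIRS = re.compile(r"(?:[^:(]|\([^:]|:[^)])*")


def subtractive(s):
    """EBNF from comment 2: Char+ minus strings containing '(:' or ':)'."""
    return len(s) > 0 and "(:" not in s and ":)" not in s


def with_lookahead(s):
    return LOOKAHEAD.fullmatch(s) is not None


# Exhaustively compare both definitions on every string of length 0 to 5
# over a small alphabet that includes the delimiter characters.
mismatches = [
    "".join(chars)
    for n in range(6)
    for chars in product("(:)a", repeat=n)
    if subtractive("".join(chars)) != with_lookahead("".join(chars))
]
print(mismatches)  # []

# A trailing "(" or ":" is now accepted.
print(with_lookahead("note ("), with_lookahead("ratio 1:"))  # True True
```

Because the lookahead does not consume the next character, this variant also avoids a second quirk of the pair-based pattern: PAIRS.fullmatch("((:x") succeeds, since "((" is consumed as one unit and the "(:" that follows goes unnoticed.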

Comment 6 Scott Boag 2005-09-27 16:43:03 UTC

Changed EBNF to:
[85] CommentContents ::= (Char+ - (Char* ('(:' | ':)') Char*))