This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 29501 - [xp31] syntactic problems around the colon of a MapConstructorEntry
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XPath 3.1
Version: Candidate Recommendation
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Jonathan Robie
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2016-02-23 16:34 UTC by Michael Kay
Modified: 2016-04-25 21:34 UTC

Description Michael Kay 2016-02-23 16:34:12 UTC
The list of non-terminal symbols in A.2.2 does not include colon, which is present in the syntax of MapConstructor.

It would seem to make sense to make it a delimiting terminal symbol, and to treat it like "." and "-" as an exception case (see last para of the section) where a symbol separator is needed if the character follows an NCName. Whether it also needs a symbol separator if it follows a prefixed QName is an open question.

These choices would invalidate a number of test cases recently added to the MapConstructor test set: for example MapConstructor-029 uses map{*:b:b} where not only the parser but also the human reader can otherwise be easily misled into an incorrect reading.
Comment 1 Josh Spiegel 2016-02-23 16:44:56 UTC
I agree that a constraint should be added to require a separator when ":" follows NCName. 

It seems ":" is already a delimiting symbol - it is identified in the list as "(colon)".  

However, I don't see why it should require a separator if ":" follows a QName.  If we don't require one, then MapConstructor-029 should still pass, i.e.

   map{*:b:b}

is equivalent to:

   map{(*:b) : b}
Comment 2 Michael Kay 2016-02-23 16:48:48 UTC
Benito pointed out that I didn't read carefully enough. The list does include ":" (written out as (colon), for some reason).

The other comment, about treating it like "." and "-", remains.

We do have a note in 3.11.1.1

Note:

In some circumstances, it is necessary to include whitespace before or after the colon to ensure that this grammar is correctly parsed; this arises for example when the MapKeyExpr ends with a name and the MapValueExpr starts with a name.

But it's a Note, it's very unspecific about what the exact rules are, and it's in the wrong place.
Comment 3 Michael Kay 2016-02-23 17:00:21 UTC
MapConstructor-029 is actually quite interesting, because the wildcard *:b is not one terminal symbol, but three (even though whitespace is not allowed). If it were one terminal symbol, then it would be chosen in preference to "*" by the "longest token" rule. But it isn't, so one could legitimately argue:

Given the input *:b:b

(a) the first token is unambiguously "*"

(b) the second token is unambiguously ":"

(c) the third token can be either the NCName "b" or the QName "b:b". Both are "consistent with the EBNF" in the sense that there is a grammatical production that accepts this token as a continuation - a Wildcard accepts the NCName, while a MapConstructorEntry accepts the QName. The QName is longer, so this should be chosen, meaning that the construct parses as map{*: (b:b)}
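
The argument in (a) to (c) can be reproduced with a small longest-match tokenizer. This is only a sketch under stated assumptions: the NCName pattern is an ASCII-only stand-in for the real XML definition, the token set is limited to the four kinds discussed here, and the function name tokenize is hypothetical. It does not check whether a token is "consistent with the EBNF"; it simply takes the longest candidate at each position, which is the outcome described above for this input.

```python
import re

# ASCII-only approximation of an NCName (assumption; the real XML
# definition allows many more characters).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"

# Candidate terminal symbols. Note that there is no wildcard token:
# "*:b" has to be assembled from "*", ":" and a name.
TOKEN_PATTERNS = [
    ("QName", re.compile(NCNAME + ":" + NCNAME)),
    ("NCName", re.compile(NCNAME)),
    ("star", re.compile(r"\*")),
    ("colon", re.compile(":")),
]


def tokenize(text):
    """Split text into (kind, lexeme) pairs, taking the longest match each time."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_match = None, None
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            if m and (best_match is None or m.end() > best_match.end()):
                best_kind, best_match = kind, m
        if best_match is None:
            raise ValueError("no token matches at position %d" % pos)
        tokens.append((best_kind, best_match.group()))
        pos = best_match.end()
    return tokens


print(tokenize("*:b:b"))
```

Under these assumptions the third token comes out as the QName b:b rather than the NCName b, which corresponds to the reading map{*: (b:b)}.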
Comment 4 Josh Spiegel 2016-02-23 17:47:56 UTC
Yes, it seems that simply requiring a separator when colon follows NCName also breaks this wildcard.  

  foo:*
Comment 5 Josh Spiegel 2016-03-01 19:36:56 UTC
Setting aside "how" for a moment, I would like the specification to clarify the meaning of these queries:

(1)  map{a:b:c}
(2)  map{*:b:c}
(3)  map{a:*:c}

The W3C applets currently identify these key expressions:

(1)  a:b
(2)  *:b
(3)  a:*

My implementation gives the same answers.
Comment 6 Michael Kay 2016-03-01 21:21:33 UTC
I agree that if these three expressions are allowed, then they should have the meanings described. I was very surprised by the discovery that the "longest token" rule currently leads to *:a:b parsing as * : a:b

But I think my preference is probably to impose restrictions requiring the use of whitespace - I think that reduces the risk of users writing something they didn't intend. Again, I'm not sure exactly what the rule should be. Part of it is probably that (like "-" and ".") ":" as a token is not allowed immediately after an NCName (including an NCName that is part of a QName). That leaves questions about ":" after "*".

We might find that the grammar works better if we define wildcards using composite tokens "*:" and ":*", or even if we make wildcards (like QNames) into single tokens.
Comment 7 Josh Spiegel 2016-03-01 21:55:34 UTC
> We might find that the grammar works better if we define wildcards using composite tokens "*:" and ":*", 

I like this.  It would be a simple fix and have a small editorial impact.

> or even if we make wildcards (like QNames) into single tokens.

My implementation arrives at the answers in comment 5 because it already does this.  I don't think this distinction mattered until map constructors were added (notice that ws:explicit is in effect for Wildcard).
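
To illustrate the single-token idea, here is a minimal sketch of a longest-match tokenizer in which *:NCName and NCName:* are terminal symbols in their own right. It is not taken from any actual implementation; the ASCII-only NCName pattern and the names tokenize and key_expression are assumptions made for the example, and whitespace is not handled.

```python
import re

# ASCII-only approximation of an NCName (assumption).
NCNAME = r"[A-Za-z_][A-Za-z0-9_]*"

# Terminal symbols, with both wildcard forms treated as single tokens.
TOKENS = [re.compile(p) for p in (
    r"\*:" + NCNAME,        # wildcard such as *:b
    NCNAME + r":\*",        # wildcard such as a:*
    NCNAME + ":" + NCNAME,  # QName such as a:b
    NCNAME,
    r"\*",
    ":",
)]


def tokenize(text):
    """Return the lexemes of text, taking the longest match at each position."""
    tokens, pos = [], 0
    while pos < len(text):
        matches = [m for m in (p.match(text, pos) for p in TOKENS) if m]
        if not matches:
            raise ValueError("no token matches at position %d" % pos)
        longest = max(matches, key=lambda m: m.end())
        tokens.append(longest.group())
        pos = longest.end()
    return tokens


def key_expression(entry):
    """First token of a map entry written without whitespace, e.g. 'a:b:c'."""
    return tokenize(entry)[0]


for entry in ("a:b:c", "*:b:c", "a:*:c"):
    print(entry, "->", key_expression(entry))
```

With wildcards as single tokens, the first token of each entry is a:b, *:b and a:* respectively, matching the key expressions listed in comment 5.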
Comment 8 Andrew Coleman 2016-04-22 08:31:08 UTC
At the meeting on 2016-04-19, the WG discussed the options presented in https://lists.w3.org/Archives/Public/public-xsl-query/2016Mar/0035.html (action A-635-03).

It was decided to adopt option 2b.  The change to appendix A, the applet and the test cases will be tracked by action A640-02.