6795 – [FO] ^ should not be allowed by XmlCharIncDash

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	REOPENED

Alias:	None

Product:	XML Schema
Classification:	Unclassified
Component:	Datatypes: XSD Part 2
Version:	1.0 only
Hardware:	All All

Importance:	P2 normal
Target Milestone:	REC
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Whiteboard:	11248
Keywords:	decided

Depends on:
Blocks:	5

Reported:	2009-04-09 07:16 UTC by Murata
Modified:	2018-10-09 00:23 UTC
CC List:	3 users

Description Murata 2009-04-09 07:16:36 UTC

^ should not be allowed by XmlCharIncDash

Because it is now allowed, "[^a]" can be interpreted as a posCharGroup.

charClassExpr -> '[' charGroup ']' 
  ->  '[' posCharGroup ']' 
  ->  '[' charRange charRange ']' 
  ->  '[' XmlCharIncDash XmlCharIncDash ']' 
  ->  '[' '^' 'a' ']'
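
To make the ambiguity concrete, here is a minimal Python sketch that lists the readings a purely grammatical parser could assign to a bracket expression if '^' is accepted as an ordinary XmlCharIncDash. The function name `readings` is hypothetical, and the sketch assumes the body contains only plain single characters (no ranges, escapes, or subtraction); it is an illustration, not the grammar from the spec.

```python
def readings(expr: str) -> list:
    """List possible derivations of a simple bracket expression when '^'
    is treated as an ordinary XmlCharIncDash (prose rule ignored).

    Assumption: the body holds only plain single characters.
    """
    if len(expr) < 3 or not (expr.startswith("[") and expr.endswith("]")):
        raise ValueError("expected a bracket expression such as '[^a]'")
    body = expr[1:-1]
    found = []
    # Reading 1: posCharGroup, each character a single-character charRange.
    found.append("posCharGroup matching any of " + repr(sorted(set(body))))
    # Reading 2: negCharGroup ::= '^' posCharGroup, possible only when the
    # body starts with '^' and something follows it.
    if body.startswith("^") and len(body) > 1:
        found.append("negCharGroup excluding any of " + repr(sorted(set(body[1:]))))
    return found


if __name__ == "__main__":
    for reading in readings("[^a]"):
        print(reading)
```

Under these assumptions "[^a]" yields two readings while "[ab]" yields one, which is the ambiguity reported here.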

Comment 1 Michael Kay 2009-04-09 07:28:33 UTC

Personal response:

I think this is covered by the prose rule in XML Schema Part 2:

"The ^ character is only valid at the beginning of a ·positive character group· if it is part of a ·negative character group·"

(and the subsequent note which points out that this rule is needed to disambiguate the grammar).

The revision to this text in XSD 1.1 Part 2 describes the rule more clearly:

"If the first character in a charGroup is '^', this is taken as indicating that the charGroup starts with a negCharGroup. A posCharGroup can itself start with '^' but only when it appears within a negCharGroup, that is, when the '^' is preceded by another '^'. "
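
As an illustration only, the 1.1 wording can be sketched as a small Python function. The name `classify_char_group` is hypothetical, and the sketch assumes it receives just the text between '[' and ']' and ignores character class subtraction and escapes.

```python
def classify_char_group(body: str) -> tuple:
    """Apply the XSD 1.1 prose rule to the text between '[' and ']'.

    Assumption: character class subtraction and escapes are not handled.
    """
    if body.startswith("^"):
        # A leading '^' always marks a negCharGroup. The remainder is its
        # posCharGroup, which may itself begin with '^' (as in "[^^a]").
        rest = body[1:]
        if not rest:
            raise ValueError("negCharGroup needs a non-empty posCharGroup")
        return ("negCharGroup", rest)
    if not body:
        raise ValueError("charGroup must not be empty")
    # Otherwise the group is positive; a '^' later in the body is literal.
    return ("posCharGroup", body)


if __name__ == "__main__":
    for sample in ("^a", "^^a", "a^"):
        print(sample, "->", classify_char_group(sample))
```

With this rule "[^a]" has exactly one reading, and a posCharGroup starts with '^' only when another '^' precedes it.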

Come to think of it, I'm not sure why this needs to be written in English - it feels like it could have been achieved in the BNF.

Comment 2 Murata 2009-04-09 08:14:03 UTC

(In reply to comment #1)
> Personal response:
> 
> I think this is covered by the prose rule in XML Schema Part 2:
> 
> "The ^ character is only valid at the beginning of a ·positive character group·
> if it is part of a ·negative character group·"

I had no ideas about this sentence.  But the rewrite in W3C XML Schema 1.1 
makes much more sense.

Comment 3 Michael Kay 2009-04-14 15:38:55 UTC

The joint WGs agreed today to resolve this as Invalid. If this resolution is acceptable, Murata-san, I would be grateful if you could mark it as CLOSED.

Michael Kay

Comment 4 Murata 2009-04-14 15:48:55 UTC

(In reply to comment #3)
> The joint WGs agreed today to resolve this as Invalid. If this resolution is
> acceptable, Murata-san, I would be grateful if you could mark it as CLOSED.

Well, although I agree that the prose in W3C XML Schema 1.0 is correct, 
I do not think that it is very understandable.  It may be succinct but is 
underspecified for non-native speakers.  Is it possible to incorporate 
that part of 1.1 into 1.0?

Comment 5 Michael Kay 2009-04-14 16:46:16 UTC

This is now addressing a weakness in XSD 1.0 Part 2, rather than in QT Functions and Operators. We could transfer the bug there if you like. However, I think that with the limited resources available to the XML Schema WG, the chances of this getting fixed in an erratum are small. The Schema WG made a conscious decision to put 1.0 bugs on hold until 1.1 is done, and with every year that passes, the value of "improving" the 1.0 spec becomes less. And there are many things more pressing than this problem - the 1.0 spec might not state the answer very elegantly, but I don't think it's ambiguous.

Comment 6 Murata 2009-04-14 17:39:57 UTC

(In reply to comment #5)

I do not trust extensions from those who do not care about defects in 
old versions.

> the 1.0 spec might not state
> the answer very elegantly, but I don't think it's ambiguous.

IMHO, it's understandable only when you understand the intention already.

Comment 7 Michael Kay 2009-04-15 09:17:11 UTC

I will reopen this bug and reallocate it to XML Schema 1.0 (assuming the system allows me to do that).

Comment 8 David Ezell 2009-06-12 15:52:48 UTC

(from the telcon minutes of 2009-06-12)

Mike Kay summarized the bug and suggested that the best solution would be to bring not just the passage objected to by Murata but all of 1.1's regex appendix into 1.0.

This would resolve both this issue and some other outstanding bugs and problems.

Some retrofitting will be required.

We will NOT change the Unicode baseline for XSD 1.0 -- it will remain Unicode 3.1

We will continue to allow support for new versions of Unicode (1.0 status quo already does)

We will re-synch the table of character names to Unicode 3.1

We will rephrase the paragraphs about changes to Unicode since 3.1, to tell the story from the 1.0 pov, not the 1.1 pov. (Editorial)

We will scour the text to remove references to the implementation option of supporting XML 1.1.

RESOLVED:  to resolve bug 6795 by adopting Michael Kay's change proposal (take the whole regex appendix into 1.0, modulo minor changes outlined above).