This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 24779 - Apparent improper escaping of characters, such as \#x5B in regex pattern docs.
Summary: Apparent improper escaping of characters, such as \#x5B in regex pattern docs.
Status: NEW
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.0/1.1 both
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: David Ezell
QA Contact: XML Schema comments list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-23 13:21 UTC by Tom
Modified: 2014-02-24 09:59 UTC

Description Tom 2014-02-23 13:21:25 UTC
In this doc...
    http://www.w3.org/TR/xmlschema11-2/

I notice the following types of escaping which seem incorrect...

Single Unescaped Character
[82]    SingleCharNoEsc    ::=    [^\#x5B#x5D]  /*  N.B.:  #x5B = '[', #x5D = ']'  */ 

Specifically, note the \#x5B. I believe the intent here is simply #x5B, in which case the backslash \ makes the documentation incorrect.

Note, I noticed the same issue in the 1.1 docs as well. It's easy to find these, just search for '\#'. Thanks.
Comment 1 Michael Kay 2014-02-24 00:03:27 UTC
I think the rule is correct; it means that SingleCharNoEsc is any single character except backslash, left square bracket, or right square bracket. (But the prose immediately following says that SingleCharEsc is any single character except left square bracket or right square bracket, which seems inconsistent and wrong).
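For readers following along, the reading given in this comment can be sketched as a small membership test. This is only an illustration: the helper name is_single_char_no_esc is hypothetical, and the sketch ignores any other constraints the spec places on legal XML characters.

```python
# Characters excluded by production [82] when it is read as EBNF:
# a literal backslash, #x5B ('[') and #x5D (']').
EXCLUDED = {"\\", "\x5B", "\x5D"}


def is_single_char_no_esc(ch: str) -> bool:
    """Hypothetical helper: True if ch fits SingleCharNoEsc as read in comment 1."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return ch not in EXCLUDED
```

Under this reading, is_single_char_no_esc("#") is true, because '#' only appears in the rule as part of the #xNN code point notation and is not itself excluded.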
Comment 2 Tom 2014-02-24 09:54:56 UTC
You are correct. I saw the backslash '\' character and expected an escaped backslash '\\' if a backslash had been intended. Instead, I read it as an escaped '#' character, as in '\#'. My mistake: this is EBNF describing how to construct a regex, not an example of one.

Flipping back and forth between reading EBNF and reading actual examples is a skill in itself for these particular sections. :)

I opened this bug, 24779, while my head was wrapped around the issue in bug 24780, which I subsequently opened (https://www.w3.org/Bugs/Public/show_bug.cgi?id=24780). In hindsight, with both of these issues being non-issues, my only suggestion is that a "Note:" be placed at the introduction to the regular expression section which clearly outlines this difference. I would even recommend showing a brief example that contrasts a regex definition with a related actual example.

The existing "Note:" of this kind occurs way out of context, at the start of the document, and it's not completely clear at that point what the concern is. Additionally, somebody wanting to jump straight to the regex section may benefit from a "Note:" they'd otherwise miss. When I read the currently placed "Note:", I had forgotten its importance by the time I got to the regex section, because it was easy to fall into the trap of reading some of the EBNF as actual regex examples being used to show sets of characters. Completely my fault in a strict sense, but this seems like a case where a small tweak to the spec may offer a decent payoff. I'm not certain what everyone has to deal with in deciding these sorts of things, so no worries if this doesn't meet any bars... all just a suggestion at this point.
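As a rough illustration of the kind of contrast suggested above, the sketch below compiles the production text as if it were itself a regex and compares it with a regex for the set the EBNF actually denotes. Python's re module is only a stand-in here, since its syntax is not the XSD regex language, and the names misread, intended and matches are made up for this example.

```python
import re

# The production text from the spec, misread as if it were itself a regex.
# As a Python pattern this excludes the characters '#', 'x', '5', 'B' and 'D'.
misread = re.compile(r"[^\#x5B#x5D]")

# What the EBNF denotes per comment 1: any character except '\', '[' and ']'.
intended = re.compile(r"[^\\\[\]]")


def matches(pattern: re.Pattern, ch: str) -> bool:
    """Return True if the single character ch is accepted by pattern."""
    return pattern.fullmatch(ch) is not None
```

With these definitions, matches(intended, "[") is false while matches(misread, "[") is true, and the letter 'x' is accepted by intended but rejected by misread, which is the trap described above.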
Comment 3 Tom 2014-02-24 09:59:30 UTC
(In reply to Michael Kay from comment #1)
> I think the rule is correct; it means that SingleCharNoEsc is any single
> character except backslash, left square bracket, or right square bracket.
> (But the prose immediately following says that SingleCharEsc is any single
> character except left square bracket or right square bracket, which seems
> inconsistent and wrong).

Sorry, I haven't used bugzilla in years and had posted this as a new comment, not a reply. This is a reply so please excuse the duplicate entry. 
