This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 5431 - Normal characters, character references
Summary: Normal characters, character references
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.0/1.1 both
Hardware: Macintosh All
Importance: P4 minor
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
URL:
Whiteboard: cluster: regex
Keywords: editorial, resolved
Depends on:
Blocks:
 
Reported: 2008-01-26 02:38 UTC by Dave Peterson
Modified: 2009-01-21 02:22 UTC

Description Dave Peterson 2008-01-26 02:38:10 UTC
The paragraph following the Normal Character ("Char") production in the RE appendix says "Note that a ·normal character· can be represented either as itself, or with a character reference."

Two problems: 

1.  '-' and '^' are ·normal characters·, but cannot always represent themselves in an RE.

2.  A character reference is a string of characters; the productions and accompanying semantics defining REs do not allow such a string to be interpreted other than matching each character autonymously.  Character references are used in productions, not REs.
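
Problem 1 can be illustrated with a minimal sketch. It uses Python's `re` module as a stand-in, which is an assumption: `re` is not the XSD regular expression dialect, but it treats '-' and '^' inside a character class in a comparable way. The helper name `matches` is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern`.

    Python's `re` is only a stand-in here; it is NOT the XSD regex dialect.
    """
    return re.fullmatch(pattern, text) is not None


# Between two characters in a class, '-' denotes a range, not itself.
print(matches("[a-z]", "m"))    # 'm' falls inside the range a..z
print(matches("[a-z]", "-"))    # a literal '-' is not in that range

# Escaping makes '-' stand for itself again.
print(matches(r"[a\-z]", "-"))

# At the start of a class, '^' negates the class instead of matching '^'.
print(matches("[^a]", "a"))     # 'a' is excluded by the negation

# Escaped, '^' is an ordinary member of the class.
print(matches(r"[\^a]", "^"))
print(matches(r"[\^a]", "a"))
```

In both cases the character only represents itself when it is escaped or placed where it cannot be read as an operator.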
Comment 1 Michael Kay 2008-01-26 10:25:15 UTC
As is revealed by following the hyperlink, the character references it is referring to are those (such as &#x5B;) used in XML, not those (such as #x5B) used in production rules.

It might be clearer to say something like: "Note: when regular expressions are written in an XML document, for example in the value attribute of the xs:pattern element, non-ASCII characters can be represented using XML entity or character references. For this reason, the regular expression syntax does not provide any way of representing characters using octal or hexadecimal character codes. The syntax defined here assumes that XML entity and character references have already been expanded."
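
A small sketch of the expansion order described in the proposed note, using Python's standard `xml.etree.ElementTree` parser. The helper `pattern_value` and the sample attribute values are hypothetical; the point is only that the XML parser expands references before any regular expression processor sees the value.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"


def pattern_value(value_attr_source: str) -> str:
    """Parse an xs:pattern element and return its value attribute as a
    regex processor would receive it, i.e. after XML reference expansion.

    `value_attr_source` is the attribute text as written in the document.
    """
    xml_text = f'<xs:pattern xmlns:xs="{XS_NS}" value="{value_attr_source}"/>'
    return ET.fromstring(xml_text).get("value")


# A non-ASCII character written as a character reference arrives expanded.
print(pattern_value("caf&#xE9;") == "caf\u00e9")   # True

# A reference to '[' also arrives expanded, so the regex layer sees an
# ordinary character class opener, not an escaped bracket.
print(pattern_value("&#x5B;ab]") == "[ab]")        # True
```

One consequence is that a character reference cannot serve as an escape for a regex metacharacter: by the time the pattern is interpreted, the reference and the literal character are indistinguishable.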
Comment 2 Dave Peterson 2008-01-27 02:40:23 UTC
(In reply to comment #1)
> As is revealed by following the hyperlink, the character references it is
> referring to are those (such as &#x5B;) used in XML, not those (such as #x5B)
> used in production rules.

You got me.  :-(  I'm embarrassed.  See following.

> It might be clearer to say something like: "Note: when regular expressions are
> written in an XML document, for example in the value attribute of the
> xs:pattern element, non-ASCII characters can be represented using XML entity or
> character references. For this reason, the regular expression syntax does not
> provide any way of representing characters using octal or hexadecimal character
> codes. The syntax defined here assumes that XML entity and character references
> have already been expanded."

Good start.  We also will need to comment that people using the mechanism in other situations (such as born-binary derivations in non-XML environments) will have to provide other solutions.
Comment 3 C. M. Sperberg-McQueen 2009-01-20 23:58:05 UTC
At its telcon of 19 December 2008, the XML Schema WG accepted a proposal
presented in 

  http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.dp081203.html

with amendments, as a resolution of this issue.  The changes have now
been integrated into the status-quo document, so I'm marking the issue 
resolved.

DaveP, you know what to do next.