This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6908 - Definition of pattern facet
Summary: Definition of pattern facet
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.1 only
Hardware: PC Windows NT
Importance: P2 minor
Target Milestone: ---
Assignee: David Ezell
QA Contact: XML Schema comments list
URL:
Whiteboard:
Keywords: resolved
Depends on:
Blocks:
 
Reported: 2009-05-14 10:44 UTC by Michael Kay
Modified: 2009-07-17 17:37 UTC

Description Michael Kay 2009-05-14 10:44:22 UTC
The definition of the pattern facet in 4.3.4 says: "[Definition:]   pattern is a constraint on the ·value space· of a datatype which is achieved by constraining the ·lexical space· to ·literals· which match each member of a set of patterns."

This seems to invite the incorrect reading that if there is more than one xs:pattern element in a restriction, the supplied literal must match each of them; whereas the detailed specification makes clear that it need only match one of them.

Suggestion (which also avoids the apparent cyclicity in the current definition):

"[Definition:]   pattern is a constraint on the ·value space· of a datatype which is expressed by restricting the ·lexical space· to ·literals· that match one or more regular expressions."
Comment 1 Dave Peterson 2009-05-14 15:48:12 UTC
(In reply to comment #0)
> The definition of the pattern facet in 4.3.4 says: "[Definition:]   pattern is
> a constraint on the ·value space· of a datatype which is achieved by
> constraining the ·lexical space· to ·literals· which match each member of a set
> of patterns."
> 
> This seems to invite the incorrect reading that if there is more than one
> xs:pattern element in a restriction, the supplied literal must match each of
> them; whereas the detailed specification makes clear that it need only match
> one of them.

I assume you mean "more than one <xs:pattern> element in a <xs:restriction> element".  But only one RE is added to the pattern component (i.e., its {value}, of course) by that restriction--the "or" of the REs in the <xs:pattern> elements.  (Yes, I know you understand that--but it's important to be aware of in what follows.)

> Suggestion (which also avoids the apparent cyclicity in the current
> definition):
> 
> "[Definition:]   pattern is a constraint on the ·value space· of a datatype
> which is expressed by restricting the ·lexical space· to ·literals· that match
> one or more regular expressions."

Take this literally.  "pattern", being defined, is a facet.  (It can't be an <xs:pattern> element--that always has only one RE in it.)  But the REs you are talking about are the content of one or more <xs:pattern> elements (presuming we're not in a born-binary situation).  It seems to me you're mixing apples and oranges, jumping from the facet component to the elements in a schema document that select that particular facet component.

Someone who doesn't understand what a pattern facet is and when reading the definition gets confused presumably already knows that the pattern facet arises from <xs:facet> elements, but doesn't know the semantics of that element type and has incorrectly assumed that each <xs:facet> element gives rise to a separate RE in the facet's {value}. 

Perhaps we could instead add a sentence, within or following the definition, saying that each derivation step adds at most one RE to the set.  That should make people realize, if they've made that wrong assumption, that they'd better find out what the real relation between the <xs:facet> elements and the RE in the {value} really is.

Your definition effectively says the pattern facet constrains by an OR of REs.  The prose then goes on to say, down in the details, that it constrains by an AND of REs.  I think this would cause a lot more confusion than the current wording.
Comment 2 Michael Kay 2009-05-14 16:05:21 UTC
I was trying to come up with a wording for the definition that did not give the impression it was a complete statement of the semantics, that avoided the use of "each" which is misleading to a casual reader, and that avoids using the word "pattern" in the definition of "pattern", which demands of the reader a rather deep knowledge of our esoteric typographical conventions.
Comment 3 Dave Peterson 2009-05-14 16:14:50 UTC
(In reply to comment #2)
> I was trying to come up with a wording for the definition that did not give the
> impression it was a complete statement of the semantics, that avoided the use
> of "each" which is misleading to a casual reader, and that avoids using the
> word "pattern" in the definition of "pattern", which demands of the reader a
> rather deep knowledge of our esoteric typographical conventions.

Well, just changing the final "patterns" to "regular expressions" gets rid of the circularity.

How is "each" misleading?  Would "all members" help?

Comment 4 Dave Peterson 2009-06-08 16:11:10 UTC
(In reply to comment #3)
> (In reply to comment #2)

> Well, just changing the final "patterns" to "regular expressions" gets rid of
> the circularity.
> 
> How is "each" misleading?  Would "all members" help?

Let's make this a formal proposal:

Change

[Definition:]   pattern is a constraint on the ·value space· of a datatype which is achieved by constraining the ·lexical space· to ·literals· which match each member of a set of patterns.

to:

[Definition:]   pattern is a constraint on the ·value space· of a datatype which is achieved by constraining the ·lexical space· to ·literals· which match all members of a set of regular expressions.

Comment 5 Michael Kay 2009-06-08 16:35:18 UTC
As I said in comment #2, the problem with "each" or "all members" is that the casual reader, who is not necessarily versed in the subtleties of the spec, may easily come to the incorrect conclusion that if a schema document defines a simple type with multiple xs:pattern elements, then the value has to satisfy them all. I wanted to use wording that would lead them to read further before leaping to this conclusion.

Try to put yourself in the position of a reader of the spec. They see a schema document with more than one xs:pattern element; they want to know what it means; they turn to section 4.3.4 which describes the pattern facet; and the first thing they read is that the value must match each pattern in a set of patterns (or regular expressions). Are they likely to read any further?
Comment 6 Pete Cordell 2009-06-08 17:02:31 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)

> > How is "each" misleading?  Would "all members" help?

I think "each" is supposed to map to "at least one of the members", e.g.

> [Definition:]   pattern is a constraint on the ·value space· of a datatype
> which is achieved by constraining the ·lexical space· to ·literals· which match
> all members of a set of regular expressions.

should be:

[Definition:]   pattern is a constraint on the ·value space· of a datatype which is achieved by constraining the ·lexical space· to ·literals· which match at least one member of a set of regular expressions.
Comment 7 Dave Peterson 2009-06-08 17:45:08 UTC
(In reply to comment #6)

> I think "each" is supposed to map to "at least one of the members", e.g.

> [Definition:]   pattern is a constraint on the ·value space· of a datatype
> which is achieved by constraining the ·lexical space· to ·literals· which match
> at least one member of a set of regular expressions.

No.  If there is more than one <pattern> element (each having an RE as content) in the XML defining a single restriction step, all of those REs will be "or"ed together to produce a single RE which will be placed in the set of REs which is the value of the pattern component.  Then if a subsequent restriction also uses a pattern, the single RE for that step is also placed in the set of REs for that subsequent restriction's pattern component.

The result of multiple pattern-based derivations is a set of REs, one for each derivation step.  The final datatype's lexical representations must satisfy the conditions added at each step, i.e., must satisfy all of the REs in the set.
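
The two-level rule described above (an "or" within one restriction step, an "and" across steps) can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names step_regex, derive and is_valid are hypothetical, a Python list stands in for the set of REs in the facet's {value}, and Python's re syntax is used in place of the XSD regular expression language, which differs in detail.

```python
import re

def step_regex(patterns):
    """OR together the REs taken from the <pattern> elements of ONE restriction step."""
    # Each alternative is wrapped in a non-capturing group so "|" binds correctly.
    return "|".join(f"(?:{p})" for p in patterns)

def derive(base_value, patterns):
    """Model the pattern facet {value} after one more restriction step.

    base_value is the list of REs inherited from the base type;
    exactly one new RE is added for this step.
    """
    return base_value + [step_regex(patterns)]

def is_valid(literal, facet_value):
    """A literal must match every RE in the set (one RE per derivation step)."""
    return all(re.fullmatch(rx, literal) is not None for rx in facet_value)

# Step 1: two <pattern> elements in a single restriction give ONE "or"ed RE.
step1 = derive([], [r"[a-z]+", r"[0-9]+"])

# Step 2: a further restriction adds a second RE; literals must now match both.
step2 = derive(step1, [r".{1,3}"])
```

Under this model "abc" and "123" are both accepted by step1 because each matches one alternative, while step2 rejects "abcd" because it fails the RE added by the second step.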

(In reply to comment #5)
> As I said in comment #2, the problem with "each" or "all members" is that the
> casual reader, who is not necessarily versed in the subtleties of the spec, may
> easily come to the incorrect conclusion that if a schema document defines a
> simple type with multiple xs:pattern elements, then the value has to satisfy
> them all. I wanted to use wording that would lead them to read further before
> leaping to this conclusion.
> 
> Try to put yourself in the position of a reader of the spec. They see a schema
> document with more than one xs:pattern element; they want to know what it
> means; they turn to section 4.3.4 which describes the pattern facet; and the
> first thing they read is that the value must match each pattern in a set of
> patterns (or regular expressions). Are they likely to read any further?

If a reader who is looking at an XML derivation representation having one or more <pattern> elements looks at the definition of the component and doesn't have sense enough to look at "XML Representation Summary: pattern Element Information Item", I'm not sure what we can do to help.

You *can't* just say "one or more" in the definition, and then later on say "we really meant all".  That will invite a whole lot more confusion.

Would a warning in the first paragraph help?

"Note:  An XML <restriction> containing more than one <pattern> element gives rise to a single regular expression in the set; this regular expression is an "or" of the regular expressions that are the content of the <pattern> elements."
Comment 8 David Ezell 2009-06-26 15:41:08 UTC
Resolution

Reword in terms of regular expressions:
[Definition:]  pattern is a constraint on the ·value space· of a datatype
which is achieved by constraining the ·lexical space· to ·literals· which match
each member of a set of regular expressions.

And add a note in the first paragraph:
"Note:  An XML <restriction> containing more than one <pattern> element gives
rise to a single regular expression in the set; this regular expression is an
"or" of the regular expressions that are the content of the <pattern>
elements."
Comment 9 C. M. Sperberg-McQueen 2009-07-15 21:09:06 UTC
The change described in comment 8 has now been integrated into the status-quo documents.  I am accordingly marking this issue resolved.

Michael, as the originator, if you would signal your acceptance or rejection of this resolution by closing or reopening the bug in the usual manner, it would be helpful.  If the working group does not hear otherwise from you in the next two weeks, we will assume that you are content with the disposition of this comment.