This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 12657 - Regex for durationLexicalRep in 3.3.7.2 Lexical Mapping
Summary: Regex for durationLexicalRep in 3.3.7.2 Lexical Mapping
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.1 only
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: David Ezell
QA Contact: XML Schema comments list
URL:
Whiteboard:
Keywords: decided
Depends on:
Blocks:
 
Reported: 2011-05-14 15:16 UTC by saasha
Modified: 2011-07-31 21:42 UTC

See Also:


Attachments
HTML fragment of XSD 1.1 datatypes spec, with diff markup (76.60 KB, text/html)
2011-06-03 01:00 UTC, C. M. Sperberg-McQueen

Description saasha 2011-05-14 15:16:37 UTC
Hello!

According to
http://www.w3.org/TR/xmlschema11-2/#nt-durationRep the regex should "satisfy all of the following three" requirements:

> the fields occur in the proper order.

> at least one field occurs.

> 'T' is not the final character

The regex given thereafter (on six lines) does not seem to satisfy the second requirement. It seems that five question marks need to be removed from the regex. In other words, replacing

|([0-9]+D)?(

by

|([0-9]+D)(

and the four occurrences of

|([0-9]+M)?(

by

|([0-9]+M)(

seems to solve the problem.

Regards!

Saaha,
Comment 1 Dave Peterson 2011-05-14 18:55:58 UTC
(In reply to comment #0)

> > at least one field occurs.

> The regex given thereafter (on six lines) does not seem to satisfy the second
> requirement. It seems that five question marks need to be removed from the
> regex. 

You have indeed found an oops! in the published regex, but the fix is not as drastic as you suggest.

Each line in the regex is intended to make one of the fields required, the first field in the regex.  It presumes all preceding fields are missing (or a preceding line would be matched), and it makes all following fields optional.  Thus, one of the six lines should match starting with the first actually present field.  Each first-in-the-line field of the regex should be required, i.e., not followed by a question mark.  There are two lines in which a spurious question mark on the first field is present and must be removed:  the line beginning '|([0-9]+D)?' should have that question mark removed; similarly for the line beginning '|([0-9]+M)?'.  @%#%& cut-and-paste errors.

Sigh.  You can't imagine how many times that regex has been checked.  :-(
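
The "first field required, later fields optional" design described in this comment can be sketched in Python. This is only an illustration: the helper name first_field_required and the three field sub-patterns are hypothetical and are not taken from the spec text. It also shows how a single spurious '?' on a leading field lets the empty string through.

```python
import re

def first_field_required(fields):
    """Build one alternative per field: that field required, later ones optional."""
    alternatives = []
    for i, first in enumerate(fields):
        rest = "".join(f"({f})?" for f in fields[i + 1:])
        alternatives.append(f"({first}){rest}")
    return "|".join(alternatives)

# Hypothetical sub-patterns for the year, month and day fields.
DATE_FIELDS = ["[0-9]+Y", "[0-9]+M", "[0-9]+D"]

# Every alternative starts with a required field.
good = re.compile(first_field_required(DATE_FIELDS))

# Same pattern with a spurious '?' after the leading field of the last
# alternative, i.e. '...|([0-9]+D)?' instead of '...|([0-9]+D)'.
bad = re.compile(first_field_required(DATE_FIELDS) + "?")

for text in ["", "3D", "2M3D", "1Y3D", "3D2M"]:
    print(repr(text), bool(good.fullmatch(text)), bool(bad.fullmatch(text)))
```

Only the empty string distinguishes the two patterns here: the spurious '?' makes the whole last alternative optional, so "at least one field occurs" is no longer enforced.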
Comment 2 David Ezell 2011-05-20 15:23:49 UTC
RESOLVED: resolve the bug as stated in the bug report.
Comment 3 Dave Peterson 2011-05-20 15:41:42 UTC
(In reply to comment #2)
> RESOLVED: resolve the bug as stated in the bug report.

Specifically: as stated in the original bug description, not just as in comment 1.  Comment 1 is correct that the two fixes are the only ones needed to satisfy "at least one field occurs", but in fact the other spurious question marks must also be removed to satisfy "'T' is not the final character".
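
A rough way to check the distinction drawn here is to build the six-line regex in three variants and try a few strings. The sketch below is a reconstruction under assumptions: the field sub-patterns (in particular the seconds pattern) and the helper duration_regex are illustrative, and the published text is modelled only from the fragments quoted in this report.

```python
import re

# Assumed field sub-patterns; only the M and D forms are quoted in this report.
Y, M, D, H = "([0-9]+Y)", "([0-9]+M)", "([0-9]+D)", "([0-9]+H)"
S = r"([0-9]+(\.[0-9]+)?S)"

def duration_regex(day_opt, top_min_opt, inner_min_opt):
    """Assemble the regex; each argument is '?' (as published) or '' (fixed)."""
    # Optional time part that follows a year, month or day field.
    inner_time = f"(T({H}{M}?{S}?|{M}{inner_min_opt}{S}?|{S}))?"
    return re.compile(
        f"-?P({Y}{M}?{D}?{inner_time}"
        f"|{M}{D}?{inner_time}"
        f"|{D}{day_opt}{inner_time}"
        f"|T({H}{M}?{S}?|{M}{top_min_opt}{S}?|{S}))"
    )

published = duration_regex("?", "?", "?")   # all five spurious question marks
two_fixes = duration_regex("", "", "?")     # only the two fixes from comment 1
all_fixes = duration_regex("", "", "")      # fix from the original description

for text in ["P", "PT", "P1YT", "PT5M", "P1Y2M3DT4H5M6.7S"]:
    print(text,
          bool(published.fullmatch(text)),
          bool(two_fixes.fullmatch(text)),
          bool(all_fixes.fullmatch(text)))
```

Under these assumptions the two-fix variant rejects "P" and "PT" but still accepts "P1YT", where 'T' is the final character; only the variant with all five question marks removed rejects it.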
Comment 4 C. M. Sperberg-McQueen 2011-06-03 01:00:27 UTC
Created attachment 993
HTML fragment of XSD 1.1 datatypes spec, with diff markup
Comment 5 C. M. Sperberg-McQueen 2011-06-03 01:11:33 UTC
A proposal to resolve this issue is on the server at

  http://www.w3.org/XML/Group/2004/06/xmlschema-2/datatypes.b12657.html
  (member-only link)

and also in an attachment added to this bug report.

The following details are given for the benefit of those who wish to check the correctness of the bug fix; others may skip them.

Three changes have been made in the source XML that generates the specification.

(1) The regex has been reformatted to have shorter lines to make it easier to identify the overall structure; at the same time, the optional hour/minute/second portion of durations with years, months, or days fields has been factored out so it appears only once.

(2) The bogus '?' following an initial D field has been deleted. 

(3) The bogus '?' following an initial M field in the hour/minute/second portion of the expression has been deleted.

To make it slightly easier to check the changes, I give them here in source form:

(1) and (2) are visible here; the first 'eg' element is the status-quo text, and the second is the proposed replacement.

<eg diff="del" dg="b12657">
-?P(&YY;&MM;?&DD;?&oHMS0;
   |&MM;&DD;?&oHMS0;
   |&DD;?&oHMS0;
   |T(&HH;&MM;?&SS;?
     |&MM;?&SS;?
     |&SS;))</eg>
<eg diff="add" dg="b12657">
-?P( ( ( &YY;&MM;?&DD;?
       | &MM;&DD;?
       | &DD;
       )
       &oHMS;
    )
  | (T ( &HH;&MM;?&SS;?
       | &MM;&SS;?
       | &SS;
       )
    )
  )
</eg>

(3) takes the form of correcting the definition of the 'oHMS' entity.  (At the same time, the old declaration has been renamed oHMS0, and existing references to &oHMS; have been changed to references to &oHMS0; because otherwise the old text would be silently corrected in several locations.)

<!--* oHMS:  optional time (hour minute seconds) segment *-->
<!ENTITY oHMS0 '(T(&HH;&MM;?&SS;?|&MM;?&SS;?|&SS;))?'>
<!ENTITY oHMS '(T ( &HH;&MM;?&SS;?
          | &MM;&SS;?
          | &SS;
          )
       )?'>

(If you're still reading, the editors thank you for your help in verifying the correctness of the change.)
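
For readers who want to test the proposed text mechanically, the following Python sketch expands the entity references in the replacement 'eg' element and the new oHMS declaration, strips the layout whitespace, and compiles the result. The declarations of the field entities YY, MM, DD, HH and SS are not quoted in this comment, so the values used for them here are assumptions; the helper expand is likewise hypothetical.

```python
import re

ENTITIES = {
    # Assumed field entities (their declarations are not shown in this report).
    "YY": "([0-9]+Y)",
    "MM": "([0-9]+M)",
    "DD": "([0-9]+D)",
    "HH": "([0-9]+H)",
    "SS": r"([0-9]+(\.[0-9]+)?S)",
    # Proposed oHMS declaration, copied from the comment above.
    "oHMS": """(T ( &HH;&MM;?&SS;?
          | &MM;&SS;?
          | &SS;
          )
       )?""",
}

# Proposed replacement text, copied from the second 'eg' element.
PROPOSED = """
-?P( ( ( &YY;&MM;?&DD;?
       | &MM;&DD;?
       | &DD;
       )
       &oHMS;
    )
  | (T ( &HH;&MM;?&SS;?
       | &MM;&SS;?
       | &SS;
       )
    )
  )
"""

def expand(text):
    """Replace &name; references until none remain, then drop layout whitespace."""
    while "&" in text:
        text = re.sub(r"&(\w+);", lambda m: ENTITIES[m.group(1)], text)
    return re.sub(r"\s+", "", text)

duration = re.compile(expand(PROPOSED))

for text in ["P1Y2M3DT4H5M6.5S", "-PT30M", "P", "PT", "P1DT"]:
    print(text, bool(duration.fullmatch(text)))
```

The whitespace in the source is purely for readability, which is why it is removed before compiling rather than matched.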
Comment 6 David Ezell 2011-06-03 15:43:40 UTC
RESOLVED: adopt the wording proposal as presented.
Comment 7 David Ezell 2011-07-28 15:52:45 UTC
This bug should be resolved in the CR at:
http://www.w3.org/TR/xmlschema11-2/

The WG appreciates the effort of the commenter in reporting this bug.  Please indicate your satisfaction with the resolution by marking it as CLOSED.

Thank you.