<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>1922</bug_id>
          
          <creation_ts>2005-08-31 15:26:24 +0000</creation_ts>
          <short_desc>&apos;x&apos; regex flag not entirely clear</short_desc>
          <delta_ts>2005-09-29 12:55:59 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Functions and Operators 1.0</component>
          <version>Last Call drafts</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows 2000</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Mary Holstege">holstege</reporter>
          <assigned_to name="Ashok Malhotra">ashok.malhotra</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>5607</commentid>
    <comment_count>0</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2005-08-31 15:26:24 +0000</bug_when>
    <thetext>Section 7.6.1.1 of F&amp;O says only this about the &apos;x&apos; flag:
&quot;x: If present, whitespace characters within the regular expression are ignored. 
By default, whitespace characters match themselves. This allows, for example, 
regular expressions to be broken up into lines for readability.&quot;

Our implementors ask for clarification of what &apos;ignored&apos; means. Here are some
cases:

fn:matches(&quot;helloworld&quot;, &quot;hello[ ]world&quot;, &quot;x&quot;)
   Error? (because [] is not a valid character set?) Or true()?
fn:matches(&quot;hello world&quot;, &quot;hello\ sworld&quot;, &quot;x&quot;)
   True or false? That is, is &apos;\ s&apos; == &apos;\s&apos;?
And so forth for spaces in other odd places:
&quot;(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)(a|b)\1 0&quot; 
     \1 followed by &apos;0&apos; or \10?
&quot;\p{ Lu}&quot; 
&quot;\p{L u}&quot; 
&quot;[a- ]&quot;
&quot;[a- z]&quot; 
&quot;hello\ &quot;
&quot;[ ^a]&quot;
&quot;[^ ]&quot;

We assume the appropriate semantic is to pre-strip all whitespace and then parse
the resulting regex; this is certainly simpler from an implementation 
standpoint, but &quot;ignore&quot; isn&apos;t entirely clear and could mean to ignore in 
matching, not parsing.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5622</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2005-08-31 21:36:15 +0000</bug_when>
    <thetext>I agree with Mary&apos;s suggestion: the &quot;x&quot; flag should cause all whitespace to be
stripped from the regex in an initial pass, and the semantics are then those of
the resulting regex after whitespace-removal.

Michael Kay</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>5635</commentid>
    <comment_count>2</comment_count>
    <who name="Liam R E Quin">liam</who>
    <bug_when>2005-09-01 00:43:26 +0000</bug_when>
    <thetext>The Perl documentation is useful here (it had to happen once).

The &quot;perldoc perlre&quot; page says,
[[
The &quot;/x&quot; modifier itself needs a little more explanation.  It tells the
regular expression parser to ignore whitespace that is neither backslashed
nor within a character class.  You can use this to break up
your regular expression into (slightly) more readable parts.  The &quot;#&quot;
character is also treated as a metacharacter introducing a comment,
just as in ordinary Perl code.  This also means that if you want real
whitespace or &quot;#&quot; characters in the pattern (outside a character class,
where they are unaffected by &quot;/x&quot;), that you&apos;ll either have to escape
them or encode them using octal or hex escapes.  Taken together, these
features go a long way towards making Perl&apos;s regular expressions more
readable. 
]]

I believe this is a sensible and appropriate definition, and
means that [ ] matches a single space (and also, for Perl,
that you can&apos;t put comments inside character classes).

It&apos;s not clear to me how to allow host-language comments
inside a regular expression, and I think that should be up
to the host language to specify, rather than using Perl&apos;s
# comments.  So just take the whitespace part of this.

Liam</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>6121</commentid>
    <comment_count>3</comment_count>
    <who name="Ashok Malhotra">ashok.malhotra</who>
    <bug_when>2005-09-13 21:32:35 +0000</bug_when>
    <thetext>On the joint 9/13 telcon the WGs agreed to change the explanation of the &apos;x&apos;
flag based on the Perl semantics as suggested by Liam.  Suggested replacement
text is below.  Please comment.

x: If present, whitespace characters in the regex are removed prior to matching
with two exceptions:  whitespace characters preceded by a backslash are not
removed and whitespace characters within character classes are not removed. 
This can be used, for example, to break up long regexes into readable lines.

Examples:
fn:matches(&quot;helloworld&quot;, &quot;hello world&quot;, &quot;x&quot;) returns true
fn:matches(&quot;helloworld&quot;, &quot;hello[ ]world&quot;, &quot;x&quot;) returns false
fn:matches(&quot;hello world&quot;, &quot;hello\ sworld&quot;, &quot;x&quot;) returns false
fn:matches(&quot;hello world&quot;, &quot;hello\sworld&quot;, &quot;x&quot;) returns true</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>6122</commentid>
    <comment_count>4</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2005-09-13 21:58:41 +0000</bug_when>
    <thetext>Actually, I don&apos;t believe our syntax allows backslash to be followed by a
whitespace character. There&apos;s little point in preserving the whitespace
character if it&apos;s illegal, so I suggest we strip it.

A character class (charClass) is either a charClassExpr or a charClassEsc. Since
charClassEsc embraces things like \P{IsCombiningDiacriticalMarks} I think that
it&apos;s only within a Character Class Expression (charClassExpr) that you wanted
whitespace to be preserved.

Michael Kay</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>6124</commentid>
    <comment_count>5</comment_count>
    <who name="Ashok Malhotra">ashok.malhotra</who>
    <bug_when>2005-09-14 13:26:48 +0000</bug_when>
    <thetext>Amended proposal based on Michael Kay observation that whitespace characters are
not allowed after the backslash and so we should strip them out if they do occur.

REVISED PROPOSAL

x: If present, whitespace characters in the regex are removed prior to matching
with one exception:  whitespace characters within character class expressions
(charClassExpr) are not removed. This can be used, for example, to break up long
regexes into readable lines.

Examples:
fn:matches(&quot;helloworld&quot;, &quot;hello world&quot;, &quot;x&quot;) returns true

fn:matches(&quot;helloworld&quot;, &quot;hello[ ]world&quot;, &quot;x&quot;) returns false

fn:matches(&quot;hello world&quot;, &quot;hello\ sworld&quot;, &quot;x&quot;) returns true</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>6125</commentid>
    <comment_count>6</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2005-09-14 13:50:22 +0000</bug_when>
    <thetext>Another useful example might be:

fn:matches(&quot;hello world&quot;, &quot;hello world&quot;, &quot;x&quot;) returns false

Michael Kay</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>6497</commentid>
    <comment_count>7</comment_count>
    <who name="Ashok Malhotra">ashok.malhotra</who>
    <bug_when>2005-09-27 15:29:53 +0000</bug_when>
    <thetext>The WGs decided on 9/27 to accept the proposal below to fix this bug.

x: If present, whitespace characters in the regex are removed prior to matching
with one exception:  whitespace characters within character class expressions
(charClassExpr) are not removed. This can be used, for example, to break up long
regexes into readable lines.
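
For illustration only (this is not part of the agreed spec text): a minimal
Python sketch of the stripping rule above. It assumes that whitespace means
space, tab, newline and carriage return, and that a backslash-escaped bracket
does not open or close a character class expression. The names
strip_x_whitespace and matches_x are hypothetical, and the Python re module
is used only as a rough stand-in for the XPath regex engine.

```python
import re

WHITESPACE = " \t\n\r"  # assumed definition of "whitespace characters"


def strip_x_whitespace(regex):
    """Drop whitespace outside [...] character class expressions."""
    out = []
    depth = 0        # current nesting depth of [...] expressions
    escaped = False  # True when the last kept character was an unescaped backslash
    for ch in regex:
        if depth == 0 and ch in WHITESPACE:
            continue  # removed, even directly after a backslash
        out.append(ch)
        if escaped:
            escaped = False  # this character is escaped, so it is never a bracket
        elif ch == "\\":
            escaped = True
        elif ch == "[":
            depth += 1
        elif ch == "]" and depth != 0:
            depth -= 1
    return "".join(out)


def matches_x(text, regex):
    """Rough stand-in for fn:matches(text, regex, "x") using Python's re."""
    return re.search(strip_x_whitespace(regex), text) is not None


print(matches_x("helloworld", "hello world"))      # True
print(matches_x("helloworld", "hello[ ]world"))    # False
print(matches_x("hello world", r"hello\ sworld"))  # True
print(matches_x("hello world", "hello world"))     # False
```

The four calls mirror the examples given with this proposal.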

Examples:
fn:matches(&quot;helloworld&quot;, &quot;hello world&quot;, &quot;x&quot;) returns true

fn:matches(&quot;helloworld&quot;, &quot;hello[ ]world&quot;, &quot;x&quot;) returns false

fn:matches(&quot;hello world&quot;, &quot;hello\ sworld&quot;, &quot;x&quot;) returns true

fn:matches(&quot;hello world&quot;, &quot;hello world&quot;, &quot;x&quot;) returns false</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>