<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>26278</bug_id>
          
          <creation_ts>2014-07-07 20:48:01 +0000</creation_ts>
          <short_desc>getElementText - no info about U+200E, U+200F</short_desc>
          <delta_ts>2014-07-16 20:09:16 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Browser Test/Tools WG</product>
          <component>WebDriver</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          <blocked>20860</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter name="Andrey Botalov">botalov.andrey</reporter>
          <assigned_to name="Browser Testing and Tools WG">public-browser-tools-testing</assigned_to>
          <cc>dburns</cc>
    
    <cc>mike</cc>
          
          <qa_contact name="Browser Testing and Tools WG">public-browser-tools-testing</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>108765</commentid>
    <comment_count>0</comment_count>
    <who name="Andrey Botalov">botalov.andrey</who>
    <bug_when>2014-07-07 20:48:01 +0000</bug_when>
<thetext>The Selenium atoms contain special handling for these characters: https://github.com/SeleniumHQ/selenium/blob/master/javascript/atoms/dom.js#L1208

This change seems to have been made in August 2013, after the algorithm for the W3C WebDriver spec was written. So I think the README.md of the WebDriver project should include a note that anyone who changes the remote end API should also file a bug against the WebDriver spec (or even file it before making the change), so that the WebDriver spec and Selenium don&apos;t get out of sync.

P.S.: Also, I haven&apos;t noticed any code near L1208 that removes \f and \v.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>109003</commentid>
    <comment_count>1</comment_count>
    <who name="David Burns :automatedtester">dburns</who>
    <bug_when>2014-07-16 13:32:28 +0000</bug_when>
    <thetext>(In reply to Andrey Botalov from comment #0)
&gt; 
&gt; P.S.: Also, I haven&apos;t noticed any code near L1208 that removes \f and \v.

Step 2 -&gt; 1 -&gt; 2nd bullet -&gt; 1 handles this scenario</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>109006</commentid>
    <comment_count>2</comment_count>
    <who name="David Burns :automatedtester">dburns</who>
    <bug_when>2014-07-16 13:53:18 +0000</bug_when>
    <thetext>https://dvcs.w3.org/hg/webdriver/rev/4e8c789c7f54</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>109025</commentid>
    <comment_count>3</comment_count>
    <who name="Andrey Botalov">botalov.andrey</who>
    <bug_when>2014-07-16 20:02:26 +0000</bug_when>
    <thetext>There are other whitespace and BiDi characters listed in http://www.unicode.org/Public/6.3.0/ucd/PropList.txt and http://en.wikipedia.org/wiki/Space_(punctuation)#Spaces_in_Unicode.

I think that if getElementText() should remove only \u200b, \u200e, \u200f, \v and \f from the string, then the spec should also include a note explaining what makes those characters special and why other invisible &quot;spaces&quot; shouldn&apos;t be removed.

I don&apos;t know much about Unicode, but IMO these &quot;spaces&quot; also look zero-width:
U+180E
U+200C
U+2060
U+061C
etc.
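
A minimal Python sketch, purely illustrative and not taken from the Selenium atoms or the spec (the helper name strip_narrow_set and both constants are hypothetical). It removes only the five characters named above and shows that the other zero-width characters listed just above pass through unchanged:

```python
# Hypothetical illustration only: not Selenium's or the spec's algorithm.
# The narrow set discussed in this bug: U+200B, U+200E, U+200F, \v, \f.
NARROW_SET = "\u200b\u200e\u200f\v\f"

# Other invisible characters listed above that the narrow set does NOT remove.
OTHER_INVISIBLES = "\u180e\u200c\u2060\u061c"

# With a third argument, str.maketrans maps each listed character to None,
# so str.translate deletes them.
_STRIP_TABLE = str.maketrans("", "", NARROW_SET)


def strip_narrow_set(text):
    """Return text with only the narrow set of characters removed."""
    return text.translate(_STRIP_TABLE)


sample = "a\u200bb\u200ec\u200fd\ve\ff" + OTHER_INVISIBLES
result = strip_narrow_set(sample)
print(repr(result))
```

The visible letters survive, and so do U+180E, U+200C, U+2060 and U+061C, which is exactly the gap a spec note would need to justify.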

I also found this line in the gecko-dev repository:
https://github.com/mozilla/gecko-dev/blob/master/browser/base/content/browser.js#L2205:

&gt; value = value.replace(/[\u00ad\u034f\u061c\u115f-\u1160\u17b4-\u17b5\u180b-\u180d\u200b\u200e-\u200f\u202a-\u202e\u2060-\u206f\u3164\ufe00-\ufe0f\ufeff\uffa0\ufff0-\ufff8]|\ud834[\udd73-\udd7a]|[\udb40-\udb43][\udc00-\udfff]/g, encodeURIComponent);

It seems that the implementation in Firefox is a bit more complicated.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>109026</commentid>
    <comment_count>4</comment_count>
    <who name="David Burns :automatedtester">dburns</who>
    <bug_when>2014-07-16 20:09:16 +0000</bug_when>
    <thetext>I would rather have this mimic the current implementation, which is what this bug was initially about.

Adding other spaces would need a new bug with specific use cases so that we can discuss them. I suggest raising this on the mailing list once you have filed that bug.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>