This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29530 - Proposal for Get Element Text
Summary: Proposal for Get Element Text
Status: REOPENED
Alias: None
Product: Browser Test/Tools WG
Classification: Unclassified
Component: WebDriver
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Browser Testing and Tools WG
QA Contact: Browser Testing and Tools WG
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 20860
Reported: 2016-03-14 17:20 UTC by clmartin@microsoft.com
Modified: 2016-09-19 23:10 UTC
CC List: 6 users

See Also:


Attachments

Description clmartin@microsoft.com 2016-03-14 17:20:27 UTC
"12.5 getElementText()"

John Jansen and I were reading the specification for getElementText(). He remembers discussing it at a face-to-face meeting but can't find the minutes. Below is what we think was agreed upon:

Get Element Text

GET /session/{session id}/element/{element id}/text

The Get Element Text command retrieves the textContent value of the given web element.

1. If the current top-level browsing context is no longer open, return error with error code no such window.
2. Handle any user prompts and return its value if it is an error.
3. Let element result be the result of getting a known element by UUID parameter element id.
4. If element result is a success, let element be element result's data.
Otherwise, return element result.
5. If element is stale, return error with error code stale element reference.
6. Let element text be the result of calling textContent (https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Node3-textContent) on the specified element.
7. Let body be a JSON Object with the "value" member set to the element text.
8. Return success with data body.
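
For illustration only, the eight steps above can be sketched in Python. This is a minimal sketch, not an implementation from any driver: the `session` dictionary, its keys (`window_open`, `prompt_result`, `elements`, `attached`, `textContent`), and the result shape are all hypothetical stand-ins. The "no such element" code for an unknown element id is also an assumption, since the proposal only says to return the element result.

```python
from typing import Any, Dict


def error(code: str) -> Dict[str, Any]:
    """Build an error result carrying a WebDriver error code."""
    return {"status": "error", "error": code}


def get_element_text(session: Dict[str, Any], element_id: str) -> Dict[str, Any]:
    # Step 1: the top-level browsing context must still be open.
    if not session.get("window_open", False):
        return error("no such window")

    # Step 2: if handling a user prompt produced an error, return it.
    prompt_result = session.get("prompt_result")  # hypothetical key
    if prompt_result is not None and prompt_result["status"] == "error":
        return prompt_result

    # Steps 3-4: look up the known element by its id.
    element = session.get("elements", {}).get(element_id)
    if element is None:
        return error("no such element")  # assumed error code

    # Step 5: a detached element is reported as stale.
    if not element["attached"]:
        return error("stale element reference")

    # Step 6: read the element's textContent (stored directly in this model).
    element_text = element["textContent"]

    # Steps 7-8: wrap the text in a body with a "value" member.
    return {"status": "success", "data": {"value": element_text}}
```

With a session such as `{"window_open": True, "elements": {"e1": {"attached": True, "textContent": "foobar"}}}`, calling `get_element_text(session, "e1")` yields a success result whose `data` is `{"value": "foobar"}`.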

What do you think?
Comment 1 Daniel Wagner-Hall 2016-03-15 03:00:04 UTC
As I recall, the reasons we bothered to specify our own algorithm were:
 * textContent will include any text which is hidden by CSS/other styling; i.e. <div>foo<span style="display: none;">bar</span></div>'s textContent will return "foobar" where we specify it to return "foo".
 * textContent doesn't perform any whitespace normalisation, so looks different to how the text will look in the browser; i.e. <div>foo  bar</div>'s textContent will return "foo  bar" where we specify it to return "foo bar".
 * We forcibly insert newlines at the start of new block-level elements; i.e. <div>foo<div>bar</div></div>'s textContent will return "foobar" where we specify it to return "foo\nbar".
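
The three differences above can be reproduced with a rough Python sketch built on the standard library's html.parser. It is only an approximation under stated assumptions: hiding is detected solely from an inline "display: none" style, only div and p are treated as block-level, and the markup is assumed to be well nested with no void elements. The names `TextExtractor`, `text_content`, and `rendered_text` are made up for this example and do not come from the WebDriver specification or Selenium.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "p"}  # assumed, deliberately tiny subset of block-level tags


class TextExtractor(HTMLParser):
    """Collects a textContent-like string and a rendered-like string."""

    def __init__(self):
        super().__init__()
        self.raw = []          # every text node, like textContent
        self.visible = []      # text outside hidden subtrees, plus block breaks
        self.stack = []        # per open element: did it start a hidden subtree?
        self.hidden_depth = 0  # number of open elements with display: none

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        hidden = "display:none" in style.replace(" ", "")
        self.stack.append(hidden)
        if hidden:
            self.hidden_depth += 1
        if tag in BLOCK_TAGS and self.hidden_depth == 0:
            self.visible.append("\n")  # newline at the start of a block

    def handle_endtag(self, tag):
        if self.stack and self.stack.pop():
            self.hidden_depth -= 1

    def handle_data(self, data):
        self.raw.append(data)
        if self.hidden_depth == 0:
            self.visible.append(data)


def _parse(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser


def text_content(html):
    return "".join(_parse(html).raw)


def rendered_text(html):
    text = "".join(_parse(html).visible)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.split("\n")]
    return "\n".join(line for line in lines if line)


hidden = '<div>foo<span style="display: none;">bar</span></div>'
print(repr(text_content(hidden)), repr(rendered_text(hidden)))  # 'foobar' 'foo'
```

Applied to the other two snippets, `rendered_text` gives "foo bar" for <div>foo  bar</div> and "foo\nbar" for <div>foo<div>bar</div></div>, while `text_content` gives "foo  bar" and "foobar".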

The thing we really wanted to defer to was much closer to innerText, but innerText isn't standardised (and wasn't supported in Firefox until very recently).
Comment 2 David Burns :automatedtester 2016-03-15 10:08:55 UTC
We discussed in Sapporo that we want innerText[1]. I know this isn't an official specification, but it gives us most of the end state that we want.

There is an outstanding bug[2] for Roc's spec[1] to be incorporated into the HTML spec.

[1] http://rocallahan.github.io/innerText-spec/
[2] https://github.com/whatwg/html/issues/465
Comment 3 clmartin@microsoft.com 2016-03-15 18:15:46 UTC
I just tested to verify: the major advantage of textContent is that all browsers support it and return the same value for it in most cases.

I tested some sample markup: https://jsfiddle.net/whqptr65/


foo.textContent returned exactly the same string in all three browsers:
Edge:
"\r\n            bar\r\n            \r\n                Foo bar\r\n            \r\n            foo\r\n        "
Chrome:
"\r\n            bar\r\n            \r\n                Foo bar\r\n            \r\n            foo\r\n        "
Firefox:
"\r\n            bar\r\n            \r\n                Foo bar\r\n            \r\n            foo\r\n        "

innerText failed in IE (not supported) and returned a different string in each of Edge, Chrome, and Firefox, as seen below for foo.innerText:

Edge:
"bar \r\n\r\nFoo bar\r\nfoo "
Chrome:
"bar\r\nFoo bar\r\n\r\nfoo"
Firefox:
"bar\r\n\r\nFoo bar\r\n\r\nfoo"

I would argue that, in this case, something that works the same way in all browsers is more valuable than something that behaves differently in each one, is unsupported in IE, and is not backed by a specification.
I would also argue that a tester knows which content they can ignore and which is valuable to them, so hidden elements can be worked around.
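
As a rough illustration of how a tester could work around the differences, the sketch below collapses all whitespace in the strings reported above. The `normalize` helper is hypothetical, not part of any WebDriver API, and it discards line structure entirely, which may be too lossy for some tests.

```python
# Strings copied from the results above.
text_content = (
    "\r\n            bar\r\n            \r\n                Foo bar"
    "\r\n            \r\n            foo\r\n        "
)
inner_text = {
    "Edge": "bar \r\n\r\nFoo bar\r\nfoo ",
    "Chrome": "bar\r\nFoo bar\r\n\r\nfoo",
    "Firefox": "bar\r\n\r\nFoo bar\r\n\r\nfoo",
}


def normalize(text):
    """Collapse every run of whitespace to one space and trim both ends."""
    return " ".join(text.split())


for browser, text in inner_text.items():
    print(browser, repr(normalize(text)))  # each prints 'bar Foo bar foo'

print(repr(normalize(text_content)))  # 'bar Foo bar foo'
```

After normalization, the three innerText results and the textContent result compare equal for this sample markup.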
Comment 4 John Jansen 2016-09-19 16:42:20 UTC
We should follow the HTML definition here. We need tests to make sure nothing is broken...
Comment 5 Simon Stewart 2016-09-19 16:43:12 UTC
We should avoid breaking existing selenium tests --- this method is used extensively, and we can "break the web tests" of many users if we're not extremely cautious.
Comment 6 juangj 2016-09-19 23:08:01 UTC
Reopened pending further discussion of the test results.

In summary, the only atoms tests that fail are these two tests about <title> elements: https://github.com/SeleniumHQ/selenium/blob/c10e8a955883f004452cdde18096d70738397788/javascript/webdriver/test/atoms/element_test.html#L151-L161

36 of roughly 800 tests from the Selenium Java suite failed, largely because of extra leading or trailing whitespace, or differing numbers of internal newlines.

For example, TextHandlingTest#testShouldHandleNestedBlockLevelElements fails:
Expected: is "Cheese\nSome text\nSome more text\nand also\nBrie"
     but: was "Cheese\n\nSome text\n\nSome more text\n\nand also\n\nBrie"
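
To show that this particular failure comes down to newline counts only, here is a small Python sketch that collapses consecutive newlines in the actual value. The `collapse_newlines` helper is hypothetical and is not something the Selenium suite does; it only demonstrates the nature of the mismatch.

```python
import re

# Values copied from the failing assertion above.
expected = "Cheese\nSome text\nSome more text\nand also\nBrie"
actual = "Cheese\n\nSome text\n\nSome more text\n\nand also\n\nBrie"


def collapse_newlines(text):
    """Replace each run of consecutive newlines with a single newline."""
    return re.sub(r"\n+", "\n", text)


print(actual == expected)                     # False
print(collapse_newlines(actual) == expected)  # True
```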
Comment 7 juangj 2016-09-19 23:10:35 UTC
We could also run this across a much broader suite of "real" tests if that seems helpful, though obviously Google isn't totally representative of WebDriver's user base.