This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
"12.5 getElementText()" John Jansen and I were reading the specification for getElementText() and he remembers discussing it at a face to face but can't find the minutes. Below is what we think was agreed upon: Get Element Text GET /session/{session id}/element/{element id}/text The Get Element Text command retrieves the textContent value of the given web element. 1. If the current top-level browsing context is no longer open, return error with error code no such window. 2. Handle any user prompts and return its value if it is an error. 3. Let element result be the result of getting a known element by UUID parameter element id. 4. If element result is a success, let element be element result's data. Otherwise, return element result. 5. If element is stale, return error with error code stale element reference. 6. Let element text be the result of calling <a href="https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Node3-textContent">textContent</a> on the specified element. 7. Let body be a JSON Object with the "value" memeber set to the element text. 8. Return success with data body. What do you think?
As I recall, the reasons we bothered to specify our own algorithm were:

* textContent will include any text which is hidden by CSS/other styling; i.e. <div>foo<span style="display: none;">bar</span></div>'s textContent will return "foobar" where we specify it to return "foo".
* textContent doesn't perform any whitespace normalisation, so it looks different to how the text will look in the browser; i.e. <div>foo    bar</div>'s textContent will return "foo    bar" (the run of spaces is preserved) where we specify it to return "foo bar".
* We forcibly insert newlines at the start of new block-level elements; i.e. <div>foo<div>bar</div></div>'s textContent will return "foobar" where we specify it to return "foo\nbar".

The thing we really wanted to defer to was much closer to innerText, but innerText isn't standardised (and wasn't supported in Firefox until very recently).
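The three differences can be reproduced with a small, deliberately simplified sketch. It is not the algorithm in the specification: the TextCollector class, the BLOCK_TAGS and VOID_TAGS sets, and the rule that only an inline style of display: none hides an element are assumptions chosen to keep the example short.

```python
import re
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "p"}                  # assumption: tiny subset of block-level elements
VOID_TAGS = {"br", "hr", "img", "input"}   # elements that never get an end tag


class TextCollector(HTMLParser):
    """Collects raw text (textContent-like) and visible text side by side."""

    def __init__(self):
        super().__init__()
        self.raw = []       # every text node, untouched
        self.visible = []   # text outside hidden elements, plus block breaks
        self.hidden = []    # one flag per open element: is it hidden?

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "")
        hidden = (bool(self.hidden) and self.hidden[-1]) or "display:none" in style
        self.hidden.append(hidden)
        if tag in BLOCK_TAGS and not hidden:
            self.visible.append("\n")   # newline at the start of a block

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.hidden:
            return
        hidden = self.hidden.pop()
        if tag in BLOCK_TAGS and not hidden:
            self.visible.append("\n")

    def handle_data(self, data):
        self.raw.append(data)
        if not (self.hidden and self.hidden[-1]):
            # Source whitespace (including newlines) becomes single spaces.
            self.visible.append(re.sub(r"\s+", " ", data))


def collect(html: str) -> TextCollector:
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector


def text_content(html: str) -> str:
    return "".join(collect(html).raw)


def rendered_text(html: str) -> str:
    joined = "".join(collect(html).visible)
    lines = (re.sub(r" +", " ", line).strip() for line in joined.split("\n"))
    return "\n".join(line for line in lines if line)
```

With these helpers, text_content('<div>foo<div>bar</div></div>') gives "foobar" while rendered_text gives "foo\nbar", matching the third bullet.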
We discussed in Sapporo that we want innerText[1]. I know this isn't an official specification, but it gives us most of the end state that we want. There is an outstanding bug[2] for Roc's spec[1] to be incorporated into the HTML spec.

[1] http://rocallahan.github.io/innerText-spec/
[2] https://github.com/whatwg/html/issues/465
Just tested to verify: the major advantage of textContent is that all browsers support it and return the same value for it in most cases. I tested some sample markup: https://jsfiddle.net/whqptr65/

textContent returned the exact same string for foo.textContent:

Edge: "\r\n bar\r\n \r\n Foo bar\r\n \r\n foo\r\n "
Chrome: "\r\n bar\r\n \r\n Foo bar\r\n \r\n foo\r\n "
Firefox: "\r\n bar\r\n \r\n Foo bar\r\n \r\n foo\r\n "

innerText failed in IE (not supported) and returned a different string in each of Chrome, Firefox and Edge, as seen below for foo.innerText:

Edge: "bar \r\n\r\nFoo bar\r\nfoo "
Chrome: "bar\r\nFoo bar\r\n\r\nfoo"
Firefox: "bar\r\n\r\nFoo bar\r\n\r\nfoo"

I would argue that in this case something that works the same way in all browsers is more valuable than something that behaves differently in each (and is unsupported in IE), not to mention that innerText is not a spec. I would also argue that a tester knows what content they can ignore and what is valuable to them, so hidden elements can be worked around.
We should follow the HTML definition here. We need tests to make sure nothing is broken...
We should avoid breaking existing Selenium tests. This method is used extensively, and we can "break the web tests" of many users if we're not extremely cautious.
Reopened pending further discussion of the test results.

In summary, the only atoms tests that fail are these two tests about <title> elements:
https://github.com/SeleniumHQ/selenium/blob/c10e8a955883f004452cdde18096d70738397788/javascript/webdriver/test/atoms/element_test.html#L151-L161

36 of roughly 800 tests from the Selenium Java suite failed, largely because of extra leading or trailing whitespace, or differing numbers of internal newlines. For example, TextHandlingTest#testShouldHandleNestedBlockLevelElements fails:

Expected: is "Cheese\nSome text\nSome more text\nand also\nBrie"
but: was "Cheese\n\nSome text\n\nSome more text\n\nand also\n\nBrie"
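One way to triage failures like the one above is to check whether the expected and actual strings still match once whitespace differences are ignored. The helper below is a hypothetical sketch for that purpose; it is not part of Selenium or of the specification, and the names normalise and same_modulo_whitespace are made up for the example.

```python
import re


def normalise(text: str) -> str:
    """Trim each line, drop empty lines, and collapse runs of spaces."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def same_modulo_whitespace(expected: str, actual: str) -> bool:
    """True when the two strings differ only in whitespace or blank lines."""
    return normalise(expected) == normalise(actual)


# Strings taken from the failing test quoted above.
expected = "Cheese\nSome text\nSome more text\nand also\nBrie"
actual = "Cheese\n\nSome text\n\nSome more text\n\nand also\n\nBrie"
whitespace_only = same_modulo_whitespace(expected, actual)
```

A failure for which this check is true falls into the "differing numbers of internal newlines" or "leading or trailing whitespace" category; a failure for which it is false deserves a closer look.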
We could also run this across a much broader suite of "real" tests if that seems helpful, though obviously Google isn't totally representative of WebDriver's user base.