WebDriver

1. Introduction
- 1.1 Intended Audience
- 1.2 Relationship of WebDriver API and Existing Specifications
- 1.3 Naming the Two Sides of the API
2. Commands and Responses
- 2.1 Command
  - 2.1.1 Attributes
- 2.2 Response
  - 2.2.1 Attributes
- 2.3 Processing Additional Fields on Commands and Responses
- 2.4 Error Codes
3. Browser Capabilities
- 3.1 Capabilities
  - 3.1.1 Attributes
  - 3.1.2 Methods
- 3.2 MutableCapabilities
  - 3.2.1 Methods
4. Sessions
- 4.1 Creating a Session
5. Navigation
- 5.1 Page Load Strategies
- 5.2 Navigation Commands
  - 5.2.1 Invalid SSL Certificates
- 5.3 Detecting When to Handle Commands
6. Controlling Windows
- 6.1 Defining "window" and "frame"
- 6.2 Window Handles
- 6.3 Iterating Over Windows
- 6.4 Closing Windows
- 6.5 Resizing and Positioning Windows
- 6.6 Scaling the Content of Windows
7. Where Commands Are Handled
- 7.1 Default Content
- 7.2 Switching Windows
- 7.3 Switching Frames
8. Running Without Window Focus
9. Elements
- 9.1 Attributes
- 9.2 Lists of WebElements
- 9.3 Element Location Strategies
10. Reading Element State
- 10.1 Determining Visibility
- 10.2 Determining Whether a WebElement Is Selected
- 10.3 Reading Attributes and Properties
- 10.4 Rendering Text
11. Executing Javascript
- 11.1 Javascript Command Parameters
  - 11.1.1 Attributes
- 11.2 Synchronous Javascript Execution
- 11.3 Asynchronous Javascript Execution
- 11.4 Reporting Errors
12. Cookies
13. Timeouts
14. User Input
- 14.1 Interaction directly with elements
- 14.2 High Level APIs: Clicking and Typing
  - 14.2.1 Methods
- 14.3 Low Level APIs
15. Modal Dialogs
- 15.1 window.alert, prompt and confirm
- 15.2 Modal Windows
16. Snapshots
- 16.1 Screen
- 16.2 Current Window
- 16.3 Element
17. Handling non-HTML Content
18. Extending the Protocol
A. Command Summary
B. Command Format
C. Thread Safety
D. Logging
E. Mapping to HTTP and JSON
F. Acknowledgements
G. References
- G.1 Normative references
- G.2 Informative references

1. Introduction

The WebDriver API aims to provide a synchronous API that can be used for a variety of use cases, though it is primarily designed to support automated testing of web apps.

1.1 Intended Audience

This specification is intended for implementors of the WebDriver API. It is not intended as light bedtime reading.

1.2 Relationship of WebDriver API and Existing Specifications

Where possible and appropriate, the WebDriver API references existing specifications. For example, the list of boolean attributes for elements is drawn from the HTML5 specification. When references are made, this specification will link to the relevant sections.

1.3 Naming the Two Sides of the API

The WebDriver API can be thought of as a client/server process. However, implementation details can mean that this terminology becomes confusing. For this reason, the two sides of the API are called the "local" and the "remote" ends.

Local: The user-facing API. Command objects are sent and Response objects are consumed by the local end of the WebDriver API. It can be thought of as being "local" to the user of the API.
Remote: The implementation of the user-facing API. Command objects are consumed and Response objects are sent by the remote end of the WebDriver API. The implementation of the remote end may be on a machine remote from the user of the local end.

There is no requirement that the local and remote ends be in different processes.

2. Commands and Responses

The WebDriver API is designed to be used both in-process and out-of-process. The IDL given in this specification and summarized in Appendix XXXX should be used as the basis for the user-facing API. When used out-of-process, the WebDriver API defines command/response objects that must be used. How these are encoded and transmitted between the browser being automated and the user of the API is left undefined, but a non-normative implementation of this as JSON over HTTP is given in appendix XXXX.

2.1 Command

A command represents a call to the remote end of the WebDriver API.

interface Command {
    attribute string     name;
    attribute dictionary parameters;
    attribute string     sessionId;
};

2.1.1 Attributes

name of type string: The case-sensitive name of the command to execute.
parameters of type dictionary: A map from parameter names to objects representing their values.
sessionId of type string: A reference to the session with which this command is associated.
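For illustration, a Command can be modelled as a plain record and serialized for out-of-process use. This is a minimal Python sketch; the class, the `to_json` helper and the use of JSON are assumptions (JSON only because a non-normative JSON-over-HTTP mapping is mentioned above), not part of this specification.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Command:
    """Hypothetical in-memory form of the Command interface."""
    name: str                                              # case-sensitive command name
    parameters: Dict[str, Any] = field(default_factory=dict)
    sessionId: Optional[str] = None                        # left empty for "newSession"

    def to_json(self) -> str:
        # Keys mirror the IDL attribute names.
        return json.dumps({
            "name": self.name,
            "parameters": self.parameters,
            "sessionId": self.sessionId,
        })


cmd = Command("get", {"url": "http://example.com/"}, "a1b2c3")
print(cmd.to_json())
```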

2.2 Response

A response represents the value returned from the remote end of the WebDriver API.

interface Response {
    readonly attribute string  sessionId;
    readonly attribute integer status;
    readonly attribute object  value;
};

2.2.1 Attributes

sessionId of type string, readonly: A reference to the session with which this response is associated.
status of type integer, readonly: The status code representing the success or failure of the method. Anything other than 0 indicates a failure of some kind.
value of type object, readonly: The return value of the method call. Its type is determined by the Command that has been executed. In this specification, each command definition makes clear what the expected return type is.
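A local end typically decodes a Response and checks its status before using the value. The Python sketch below is hypothetical (the `from_json` helper and JSON encoding are assumptions); the frozen dataclass stands in for the readonly attributes, and the success test follows the rule that any status other than 0 is a failure.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional

SUCCESS = 0  # "Success" in the Error Codes table


@dataclass(frozen=True)  # frozen mirrors the readonly IDL attributes
class Response:
    sessionId: Optional[str]
    status: int
    value: Any  # its type depends on the Command that was executed

    @property
    def succeeded(self) -> bool:
        # Anything other than 0 indicates a failure of some kind.
        return self.status == SUCCESS

    @classmethod
    def from_json(cls, text: str) -> "Response":
        data = json.loads(text)
        return cls(data.get("sessionId"), int(data["status"]), data.get("value"))


ok = Response.from_json('{"sessionId": "a1b2c3", "status": 0, "value": "win-1"}')
err = Response.from_json('{"sessionId": "a1b2c3", "status": 7, "value": "not found"}')
```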

2.3 Processing Additional Fields on Commands and Responses

Any Command or Response may contain fields in addition to those listed above. The content of these fields must be maintained, unaltered, by any intermediate processing nodes. There is no requirement to maintain the ordering of fields.

Note

This requirement exists to allow for extension of the protocol, and to allow implementors to decorate Commands and Responses with additional information, perhaps giving context to a series of messages or providing security information.
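As a sketch of what "maintained, unaltered" means for an intermediate node such as a multiplexer, the hypothetical Python relay below copies every field through untouched and only adds its own decoration. The `relay` function and the `x-relayed-by` key are invented for illustration.

```python
import copy
from typing import Any, Dict


def relay(message: Dict[str, Any], node_id: str) -> Dict[str, Any]:
    """Hypothetical intermediate node forwarding a Command or Response.

    All incoming fields, known or unknown, pass through unaltered; the node
    only adds its own (hypothetical) "x-relayed-by" field if it is absent.
    """
    forwarded = copy.deepcopy(message)          # never mutate the caller's message
    forwarded.setdefault("x-relayed-by", node_id)
    return forwarded


incoming = {"name": "get", "parameters": {"url": "http://example.com/"},
            "sessionId": "a1b2c3", "x-trace-id": "42"}
outgoing = relay(incoming, "hub-1")
```

Using `setdefault` means a second relay leaves an existing decoration alone, which keeps even the added field stable across hops.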

2.4 Error Codes

The WebDriver API indicates the success or failure of a command invocation via a status code on the Response object. The following values are used and have the following meanings:

Status Code	Summary	Detail
0	Success	The command executed successfully.
7	NoSuchElement	An element could not be located on the page using the given search parameters.
8	NoSuchFrame	A request to switch to a frame could not be satisfied because the frame could not be found.
9	UnknownCommand	The requested resource could not be found, or a request was received using an HTTP method that is not supported by the mapped resource.
10	StaleElementReference	An element command failed because the referenced element is no longer attached to the DOM.
11	ElementNotVisible	An element command could not be completed because the element is not visible on the page.
12	InvalidElementState	An element command could not be completed because the element is in an invalid state (e.g. attempting to click a disabled element).
13	UnknownError	An unknown server-side error occurred while processing the command.
15	ElementIsNotSelectable	An attempt was made to select an element that cannot be selected.
17	JavaScriptError	An error occurred while executing user-supplied JavaScript.
19	XPathLookupError	An error occurred while searching for an element by XPath.
21	Timeout	An operation did not complete before its timeout expired.
23	NoSuchWindow	A request to switch to a different window could not be satisfied because the window could not be found.
24	InvalidCookieDomain	An illegal attempt was made to set a cookie under a different domain than the current page.
25	UnableToSetCookie	A request to set a cookie's value could not be satisfied.
26	UnexpectedAlertOpen	A modal dialog was open, blocking this operation.
27	NoAlertOpenError	An attempt was made to operate on a modal dialog when one was not open.
28	ScriptTimeout	A script did not complete before its timeout expired.
29	InvalidElementCoordinates	The coordinates provided to an interactions operation are invalid.
30	IMENotAvailable	IME was not available.
31	IMEEngineActivationFailed	An IME engine could not be started.
32	InvalidSelector	Argument was an invalid selector (e.g. XPath/CSS).
33	SessionNotCreatedException	A new session could not be created.
34	MoveTargetOutOfBounds	The target for mouse interaction is not on the viewport and cannot be brought into the viewport.
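The table can be mirrored as an enumeration so that a local end can turn a numeric status into a readable error. This is a minimal Python sketch; `Status`, `WebDriverError` and `check_status` are hypothetical names, and reporting codes missing from the table as UnknownError is an assumption.

```python
from enum import IntEnum


class Status(IntEnum):
    # Values and names copied from the table above.
    Success = 0
    NoSuchElement = 7
    NoSuchFrame = 8
    UnknownCommand = 9
    StaleElementReference = 10
    ElementNotVisible = 11
    InvalidElementState = 12
    UnknownError = 13
    ElementIsNotSelectable = 15
    JavaScriptError = 17
    XPathLookupError = 19
    Timeout = 21
    NoSuchWindow = 23
    InvalidCookieDomain = 24
    UnableToSetCookie = 25
    UnexpectedAlertOpen = 26
    NoAlertOpenError = 27
    ScriptTimeout = 28
    InvalidElementCoordinates = 29
    IMENotAvailable = 30
    IMEEngineActivationFailed = 31
    InvalidSelector = 32
    SessionNotCreatedException = 33
    MoveTargetOutOfBounds = 34


class WebDriverError(Exception):
    def __init__(self, status: int, message: str = "") -> None:
        try:
            self.summary = Status(status).name
        except ValueError:
            # Assumption: codes not in the table are treated as UnknownError.
            self.summary = Status.UnknownError.name
        self.status = status
        super().__init__(f"{self.summary} ({status}): {message}")


def check_status(status: int, value: object = None) -> object:
    """Return the Response value on success, otherwise raise."""
    if status == Status.Success:
        return value
    raise WebDriverError(status, str(value))
```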

3. Browser Capabilities

Different browsers support different levels of various specifications. For example, some support SVG or the CSS Selector API, but only browsers that implement HTML5 will support LocalStorage. The WebDriver API provides a mechanism to query the supported capabilities of a browser. Each broad area of functionality within the WebDriver API has an associated capability string. Whether a particular capability must or may be supported — as well as fallback mechanisms for handling those cases where a capability is not supported — is discussed where the capability string is defined.

3.1 Capabilities

interface Capabilities {
    readonly attribute dictionary capabilities;
    boolean has (string capabilityName);
    (string or boolean or number)?    get (string capabilityName);
};

3.1.1 Attributes

capabilities of type dictionary, readonly: The underlying collection of capabilities, represented as a dictionary mapping strings to values which may be of type boolean, numerical or string.

3.1.2 Methods

get

Get the value of the key matching capabilityName in the underlying capabilities, or null if no value is defined.

Parameter	Type	Nullable	Optional	Description
capabilityName	`string`	✘	✘

Return type: (string or boolean or number), nullable

has

Queries the underlying capabilities to see whether the value is set. This will return true if the capabilities contain a key with the given capabilityName and the value of that key is defined. If the value is a boolean, this function will return that boolean value. If the value is null, an empty string or a 0 then this method will return false.

Parameter	Type	Nullable	Optional	Description
capabilityName	`string`	✘	✘

Return type: boolean

A Capabilities instance must be immutable. If a mutable Capabilities instance is required, then the MutableCapabilities must be used instead.
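The get and has rules above can be sketched in Python as follows. This is illustrative only: `None` stands in for null, and a read-only `MappingProxyType` view is one assumed way of keeping the instance immutable.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Optional, Union

CapabilityValue = Union[str, bool, int, float]


class Capabilities:
    """Sketch of the immutable Capabilities interface."""

    def __init__(self, values: Optional[Dict[str, CapabilityValue]] = None) -> None:
        self._values = dict(values or {})  # private copy; never modified afterwards

    @property
    def capabilities(self) -> Mapping[str, CapabilityValue]:
        # Read-only view: item assignment raises TypeError.
        return MappingProxyType(self._values)

    def get(self, capability_name: str) -> Optional[CapabilityValue]:
        # None plays the role of null when no value is defined.
        return self._values.get(capability_name)

    def has(self, capability_name: str) -> bool:
        value = self._values.get(capability_name)
        if isinstance(value, bool):
            return value                   # a boolean value is returned as-is
        if value is None or value == "" or value == 0:
            return False                   # null, empty string and 0 count as unset
        return True
```

The boolean check comes first because in Python `False == 0`, so the order of the tests matters.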

3.2 MutableCapabilities

interface MutableCapabilities : Capabilities {
    void set (string capabilityName, (string or boolean or number)? value);
};

3.2.1 Methods

set

Set the value of the given capabilityName to the given value. If the value is not a boolean, numerical type or string, a WebDriverException should be thrown.

Parameter	Type	Nullable	Optional	Description
capabilityName	`string`	✘	✘
value	`string or boolean or number`	✔	✘

Return type: void
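A minimal Python sketch of the set method's type check follows. For brevity it is standalone rather than extending a Capabilities class; `as_dict` is a hypothetical helper, and `None` stands in for the nullable value.

```python
from typing import Dict, Optional, Union

CapabilityValue = Union[str, bool, int, float]


class WebDriverException(Exception):
    """Raised when a capability value has an unsupported type."""


class MutableCapabilities:
    """Sketch of MutableCapabilities.set; get and has are omitted here."""

    def __init__(self) -> None:
        self._values: Dict[str, Optional[CapabilityValue]] = {}

    def set(self, capability_name: str, value: Optional[CapabilityValue]) -> None:
        # The value is nullable; otherwise it must be a boolean, number or string.
        # bool is a subclass of int in Python, so the int check also covers booleans.
        if value is not None and not isinstance(value, (str, int, float)):
            raise WebDriverException(
                f"{capability_name}: unsupported value type {type(value).__name__}")
        self._values[capability_name] = value

    def as_dict(self) -> Dict[str, Optional[CapabilityValue]]:
        return dict(self._values)
```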

4. Sessions

Non-normative summary: A session is equivalent to a single instantiation of a particular browser, including all child windows. The WebDriver API gives each session a UUID, stored as a string, that can be used to differentiate one session from another, allowing multiple browsers to be controlled on the same machine if needed and allowing sessions to be routed via a multiplexer. This ID is sent with every Command, returned with every Response, and stored in the sessionId field.

4.1 Creating a Session

The process for successfully creating a session is as follows:

1. The local end creates a new Capabilities or MutableCapabilities instance describing the desired capabilities for the session. The Capabilities object may be empty, but must be defined.
2. The local end creates a new Command with the "name" being "newSession" and the "parameters" containing an entry named "desiredCapabilities" whose value is the Capabilities instance from the previous step. An optional "requiredCapabilities" entry may also be created and populated with a Capabilities instance. The "sessionId" field should be left empty.
3. The Command is serialized and transmitted to the remote end.
4. The remote end examines the two Capabilities parameters and creates a new session matching as many of the capabilities in "desiredCapabilities" as possible and all of the capabilities in "requiredCapabilities". How the new session is created depends on the implementation of this specification. In the case of a browser automation framework, it is expected that a new instance of the browser is started if possible.
  - If any of the "requiredCapabilities" cannot be fulfilled by the new session, the remote end must quit the session and return the SessionNotCreatedException error code. The error message should list all unmet required capabilities, though only the first unmet required capability must be given.
5. The session must be assigned a UUID, which must be unique for each session (by definition). Generating the UUID may occur before the session is created. If the Command object had the "sessionId" field set, this value may be discarded in favour of the freshly generated UUID. Because of this, it is recommended that UUID generation be done on the remote end. If the UUID has already been used, a Response must be sent with the status code set to SessionNotCreatedException and the value being an explanation that the UUID has previously been used.
6. The remote end creates a new Response object.
  - The "sessionId" field is assigned the UUID associated with this session.
  - The session is described by filling a Capabilities instance with keys matching the parts of this specification that can be fulfilled. This is assigned to the "value" field of the Response. This field must be filled.
  - The "status" field is set to the Success status code (0).
7. The Response is transmitted or returned back to the local end.

There is no requirement for the local end to validate that some or all of the fields sent on the Capabilities associated with the Command match those returned in the Response.
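The remote-end steps can be sketched as a single handler. In this hypothetical Python model, `supported` maps each capability name to the values an imaginary browser can offer (first entry is its default); plain dictionaries stand in for Capabilities, and launching the browser is left out entirely.

```python
import uuid
from typing import Any, Dict, List, Set

SUCCESS = 0               # "Success" in the Error Codes table
SESSION_NOT_CREATED = 33  # "SessionNotCreatedException" in the same table

_used_session_ids: Set[str] = set()


def new_session(parameters: Dict[str, Any],
                supported: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Hypothetical remote-end handling of a "newSession" Command."""
    desired = parameters["desiredCapabilities"]          # must be defined, may be empty
    required = parameters.get("requiredCapabilities") or {}

    # Every required capability must be satisfiable. The check runs before any
    # browser is launched, so there is no session to quit in this sketch.
    unmet = [key for key, value in required.items()
             if value not in supported.get(key, [])]
    if unmet:
        return {"sessionId": None, "status": SESSION_NOT_CREATED,
                "value": "Unmet required capabilities: " + ", ".join(unmet)}

    # Any client-supplied "sessionId" is ignored; the UUID is generated here.
    session_id = str(uuid.uuid4())
    if session_id in _used_session_ids:
        return {"sessionId": None, "status": SESSION_NOT_CREATED,
                "value": "The session UUID has previously been used"}
    _used_session_ids.add(session_id)

    # Honour as many desired values as possible; required values take precedence.
    wanted = {**desired, **required}
    described: Dict[str, Any] = {}
    for key, options in supported.items():
        value = wanted.get(key)
        described[key] = value if value in options else options[0]

    return {"sessionId": session_id, "status": SUCCESS, "value": described}
```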

4.1.1 Capability Names

The following keys are to be used in the Capabilities instances.

browserName: The name of the desired browser as a string
browserVersion: The version number of the browser, given as a string
platformName: The OS that the browser is running on, matching any of the platform names given below.
platformVersion: The version of the OS that the browser is running on as a string.

4.1.1.1 Platform Names

These should be named in the style of enums in C-like languages.

ANDROID
IOS
LINUX
MAC
UNIX
WINDOWS

In addition "ANY" may be used to indicate the underlying OS is either unknown or does not matter. Implementors may add additional platform names.

4.1.2 Error Handling

The following status codes must be returned by the "newSession" command. Consult the table in section 2.4, Error Codes, for numerical values:

Success: The session was successfully created. The "value" field of the Response must contain a Capabilities object describing the session.
Timeout: The new session could not be created within the time allowed for command execution on the remote end. This time may be infinite. The "value" field of the Response should contain a string explaining that a timeout has occurred, but it may be left empty or filled with the empty string.
UnknownError: An unhandled error of some sort has occurred. The "value" field of the Response should contain a more detailed description of the error.

4.1.3 Remote End Matching of Capabilities

This section is non-normative.

The suggested order for comparing keys in the Capabilities instance when creating a session is:

browserName
browserVersion
platformName
platformVersion

For all comparisons, if the key is missing (as determined by a call to Capabilities.has()), that particular criterion shall not factor into the comparison.
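One way to apply the suggested order is to score each candidate browser as a tuple of per-key matches, so that earlier keys dominate later ones. This Python sketch is a non-normative illustration; `best_match` and the simplified `has` are hypothetical, and the key names follow the list in 4.1.1.

```python
from typing import Any, Dict, List, Optional

COMPARISON_ORDER = ["browserName", "browserVersion", "platformName", "platformVersion"]


def has(caps: Dict[str, Any], key: str) -> bool:
    # Simplified stand-in for Capabilities.has(): booleans as-is,
    # otherwise null, "" and 0 count as missing.
    value = caps.get(key)
    if isinstance(value, bool):
        return value
    return value not in (None, "", 0)


def best_match(desired: Dict[str, Any],
               candidates: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    # Missing keys do not factor into the comparison.
    keys = [k for k in COMPARISON_ORDER if has(desired, k)]

    def score(candidate: Dict[str, Any]) -> tuple:
        # Tuples compare element by element, so earlier keys dominate.
        return tuple(candidate.get(k) == desired[k] for k in keys)

    return max(candidates, key=score) if candidates else None
```

Ties are resolved in favour of the earliest candidate, because `max` returns the first maximal element.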

5. Navigation

Almost all usages of the WebDriver API begin by navigating to a particular URL. This section not only describes the commands used for navigation, but also describes when commands must be processed.

All WebDriver implementations must support navigating between different HTTP domains and between HTTPS and HTTP domains if the underlying browser supports this.

5.1 Page Load Strategies

conservative: The remote end must wait until all frames and iframes in the window currently being used to process commands that contain an HTML document are at "document.readyState == 'complete'" and there are no outstanding HTTP requests other than those caused by XMLHttpRequests. If a frame or iframe does not contain an HTML document, the remote end should wait until all HTTP traffic to that frame is complete.
normal: The remote end must wait until the frame currently handling commands reaches "document.readyState == 'complete'" or there are no more outstanding network requests other than XMLHttpRequests.
eager: The remote end must wait until the frame currently handling commands reaches "document.readyState == 'interactive' || document.readyState == 'complete'" or there are no more outstanding network requests.
none: The remote end does not do any checks to see if a page load is currently active.

All WebDriver implementations must support the normal and eager modes and should support the conservative and none modes. If no page loading strategy is chosen, then normal must be the default. In addition, implementors may add additional page loading strategies.
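The four strategies can be read as a predicate that the remote end polls before handling the next command. In this hypothetical Python sketch, `FrameLoadState` and its request counters are invented stand-ins for whatever the browser exposes; the conditions transcribe the definitions above, including their "or" clauses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameLoadState:
    """Hypothetical snapshot of one frame's loading state."""
    ready_state: str         # document.readyState: "loading", "interactive" or "complete"
    is_html: bool = True     # whether the frame contains an HTML document
    pending_http: int = 0    # outstanding HTTP requests, excluding XMLHttpRequests
    pending_xhr: int = 0     # outstanding XMLHttpRequests


def load_finished(strategy: str, current: FrameLoadState,
                  all_frames: List[FrameLoadState]) -> bool:
    """True once the remote end may stop waiting under the given strategy."""
    if strategy == "none":
        return True
    if strategy == "eager":
        return (current.ready_state in ("interactive", "complete")
                or current.pending_http + current.pending_xhr == 0)
    if strategy == "normal":
        return current.ready_state == "complete" or current.pending_http == 0
    if strategy == "conservative":
        for frame in all_frames:
            if frame.pending_http:                     # ignores XMLHttpRequests
                return False
            if frame.is_html and frame.ready_state != "complete":
                return False
        return True
    # Implementors may add strategies; this sketch rejects unknown names.
    raise ValueError(f"unknown page load strategy: {strategy}")
```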

5.2 Navigation Commands

Command Name	get
Parameters	"url" {string} The URL to be navigated to.
Return Value	None
Errors	TimedOutException if the page load takes too long as specified by the timeouts.

The "get" command is used to cause the browser under test to navigate to a new location, and is named after the HTTP verb. From a user's point of view, this is as if they had entered the "url" into the URL bar. When the command returns depends on the page load strategy that the user has selected, with the following exceptions when the strategy is not "none":

HTTP redirects must be automatically followed without first returning control to the user.
Control must return to the user on pages with an HTML META tag that would cause a refresh only if the refresh timeout is greater than 1 second.
Control must be returned to the user if any modal dialog box, such as those opened by window.onbeforeunload or window.alert, is opened at any point.
Control must be returned to the user if user credentials are requested by the browser. That is, if BASIC, DIGEST, NTLM or similar authentication is required. This does not include FORM-based authentication.
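The META refresh rule comes down to comparing the refresh delay with 1 second. The Python sketch below is hypothetical and deliberately simplified: it reads only the leading number of a `content` value such as `"5; url=/next"` and does not implement full refresh parsing.

```python
def meta_refresh_delay(content: str) -> float:
    # Simplified: take the number before the first ";" or "," separator.
    head = content.split(";", 1)[0].split(",", 1)[0].strip()
    return float(head)


def returns_control_on_meta_refresh(content: str) -> bool:
    # Refreshes of 1 second or less are followed like redirects;
    # longer ones return control to the user.
    return meta_refresh_delay(content) > 1
```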

5.2.1 Invalid SSL Certificates

Capability Name	Type
secureSsl	boolean

WebDriver implementations must support users accessing sites served via HTTPS. Access to such sites that use self-signed or otherwise invalid certificates, or whose certificate does not match the serving domain, must behave the same as if HTTPS were configured properly.

Note

The reason for this is that WebDriver instances are often used for testing. It's a sorry fact that many QA engineers and testers are asked to verify that apps work on sites that have insecure HTTPS configurations.

The exception to this requirement is when the Capabilities used to initialize the WebDriver session had the secureSsl capability set to true. In this case, implementations may choose to make accessing a site with a bad HTTPS configuration cause a WebDriverException to be thrown. If an implementation does so, the Capabilities describing the session must also set the secureSsl capability to true.
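A small Python sketch of this decision follows. The function names are hypothetical, and reporting secureSsl as false when it was requested but the implementation does not enforce it is an assumption, not something stated above.

```python
from typing import Any, Dict


class WebDriverException(Exception):
    """Raised when a bad HTTPS configuration is rejected."""


def session_secure_ssl(requested: Dict[str, Any], implementation_enforces: bool) -> bool:
    # Assumption: secureSsl is reported as true only if it was requested
    # AND this implementation chooses to reject bad certificates.
    return requested.get("secureSsl") is True and implementation_enforces


def navigate_https(session_caps: Dict[str, Any], certificate_valid: bool) -> None:
    if not certificate_valid and session_caps.get("secureSsl") is True:
        raise WebDriverException("insecure HTTPS configuration")
    # Otherwise the page loads exactly as if HTTPS were configured properly.
```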

5.3 Detecting When to Handle Commands

WebDriver instances must only process commands when the page loading strategy being used indicates that control would be returned to the user.

6. Controlling Windows

6.1 Defining "window" and "frame"

Within this specification, a window equates to anything that would be referred to as "window.top" in JavaScript. Put another way, within this spec browser tabs are counted as separate windows.

TODO: define "frame"

6.2 Window Handles

Each window has a "window handle" associated with it. This is an opaque string which is unique to the window. The suggested implementation is as a UUID. The "getWindowHandle" command can be used to obtain the window handle for the window that commands are currently acting upon:

Command Name	getWindowHandle
Parameters	"sessionId" {string} The key that identifies which session this request is for.
Return Value	string

6.3 Iterating Over Windows

Command Name	getWindowHandles
Parameters	"sessionId" {string} The key that identifies which session this request is for.
Return Value	Array.<string>

This array of returned strings must contain a handle for every window associated with the browser session and no others. In addition, at the time of collecting the window handles, the JavaScript expression "window.top.closed" must evaluate to false for each returned window.

The ordering of the handles is not defined, but should be determined by iterating over each top-level browser window and returning the tabs within that window before iterating over the tabs of the next top-level browser window. For example, in the diagram below, the window handles should be returned as the handles for: win1tab1, win1tab2, win2.

Figure: Two top-level windows. The first window has two tabs, labelled win1tab1 and win1tab2. The second window has only one tab, labelled win2.
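The suggested ordering is a nested walk: top-level windows first, then their tabs. This is a hypothetical Python model in which `Tab.closed` mirrors "window.top.closed".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tab:
    handle: str
    closed: bool = False  # mirrors window.top.closed


@dataclass
class TopLevelWindow:
    tabs: List[Tab] = field(default_factory=list)


def get_window_handles(windows: List[TopLevelWindow]) -> List[str]:
    # Emit each top-level window's tabs before moving on to the next window;
    # tabs whose window.top.closed is true are skipped.
    return [tab.handle for window in windows for tab in window.tabs if not tab.closed]


windows = [TopLevelWindow([Tab("win1tab1"), Tab("win1tab2")]),
           TopLevelWindow([Tab("win2")])]
```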

6.4 Closing Windows

Command Name	close
Parameters	"sessionId" {string} The key that identifies which session this request is for.
Return Value	None

The close command closes the window that commands are currently being sent to. If this means that a call to get the list of window handles returns an empty list, then this close command must be the equivalent of calling "quit". In all other cases, control must be returned to the calling process once the window has been closed or an alert is displayed by the closing window.

Once the window has closed, future commands must return an error NoSuchWindowException until a new window is selected for receiving commands.
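The close semantics can be sketched as bookkeeping on the remote end. `SessionWindows` is a hypothetical Python class; closing the last window triggers quit, and afterwards nothing is selected until the user switches windows.

```python
from typing import List, Optional

NO_SUCH_WINDOW = 23  # "NoSuchWindow" in the Error Codes table


class NoSuchWindowError(Exception):
    status = NO_SUCH_WINDOW


class SessionWindows:
    """Hypothetical remote-end bookkeeping for the "close" command."""

    def __init__(self, handles: List[str], current: Optional[str]) -> None:
        self.handles = list(handles)
        self.current = current
        self.quit_called = False

    def require_current(self) -> str:
        # Commands fail until a new window is selected for receiving commands.
        if self.current is None or self.current not in self.handles:
            raise NoSuchWindowError("no window selected; switch to a window first")
        return self.current

    def close(self) -> None:
        handle = self.require_current()
        self.handles.remove(handle)
        self.current = None
        if not self.handles:
            self.quit()  # closing the last window is equivalent to "quit"

    def quit(self) -> None:
        self.quit_called = True
```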

6.5 Resizing and Positioning Windows

Command Name	setWindowSize
Parameters	"sessionId" {string} The key that identifies which session this request is for. "windowHandle" {string} The handle referring to the window to resize. "width" {number} The new window width. "height" {number} The new window height.
Return Value	None
Errors	UnsupportedOperationException if the window could not be resized.
Command Name	getWindowSize
Parameters	"sessionId" {string} The key that identifies which session this request is for. "windowHandle" {string} The handle referring to the window whose size is requested.
Return Value	An object with two keys: "width" {number} The width of the specified window. "height" {number} The height of the specified window.
Command Name	maximizeWindow
Parameters	"sessionId" {string} The key that identifies which session this request is for. "windowHandle" {string} The handle referring to the window to resize.
Return Value	None
Errors	UnsupportedOperationException if the window could not be resized.
Command Name	fullscreenWindow
Parameters	"sessionId" {string} The key that identifies which session this request is for. "windowHandle" {string} The handle referring to the window to resize.
Return Value	None
Errors	UnsupportedOperationException if the window could not be resized.

Each of these commands accepts the window handles returned by "getWindowHandles" and "getWindowHandle". In addition, the window handle may be "current", in which case the window by which commands are currently being handled must be acted upon.

The "width" and "height" values refer to the "window.outerWidth" and "window.outerHeight" properties, respectively. For browsers that do not support these properties, they represent the width and height of the whole browser window, including window chrome and window resizing borders/handles.

After setWindowSize, the whole browser window must be left as if the restore button had been pressed, and must not be in the maximised state.

After maximizeWindow, the whole browser window must be left as if the maximise button had been pressed; it is not sufficient to leave the window "restored", but with the full screen dimensions.

If a request is made to resize a window to a size which cannot be performed (e.g. the browser has a minimum, or fixed window size), an UnsupportedOperationException must be thrown.
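The handle resolution and size check for "setWindowSize" can be sketched in Python as below. `WindowLimits` and the dictionaries are a hypothetical model of browser constraints, and raising NoSuchWindowException for an unknown handle is an assumption this section does not state.

```python
from dataclasses import dataclass
from typing import Dict, List


class UnsupportedOperationException(Exception):
    pass


class NoSuchWindowException(Exception):
    pass


@dataclass
class WindowLimits:
    min_width: int = 1
    min_height: int = 1
    resizable: bool = True  # False models a fixed window size


def resolve_handle(handle: str, current: str, handles: List[str]) -> str:
    # "current" means the window by which commands are currently handled.
    target = current if handle == "current" else handle
    if target not in handles:
        raise NoSuchWindowException(target)
    return target


def set_window_size(sizes: Dict[str, Dict[str, int]], limits: Dict[str, WindowLimits],
                    handle: str, current: str, width: int, height: int) -> None:
    target = resolve_handle(handle, current, list(sizes))
    lim = limits[target]
    if not lim.resizable or width < lim.min_width or height < lim.min_height:
        raise UnsupportedOperationException(f"cannot resize {target} to {width}x{height}")
    # Stored values correspond to window.outerWidth / window.outerHeight.
    sizes[target] = {"width": width, "height": height}
```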

6.6 Scaling the Content of Windows

TODO

7. Where Commands Are Handled

Web applications can be composed of multiple windows and/or frames. For a normal user, the context in which an operation is performed is obvious: it's the window or frame that currently has OS focus and which has just received user input. The WebDriver API does not follow this convention. There is an expectation that many browsers using the WebDriver API may be used at the same time on the same machine. This section describes how WebDriver tracks which window or frame is currently the context in which commands are being executed.

7.1 Default Content

WebDriver's default context is the equivalent of window.top.

7.2 Switching Windows

Command Name	switchToWindow
Parameters	"sessionId" {string} The key that identifies which session this request is for. "name" {string} The identifier used for a window.
Return Value	None
Errors	NoSuchWindowException if no matching window can be found

The "switchToWindow" command is used to select which window should currently be accepting commands. To determine which window to use, the "switchToWindow" command iterates over all windows. For each window, the following are compared with the "name" parameter, in this order:

A window handle, obtained from "getWindowHandles" or "getWindowHandle".
The window name, as defined when the window was opened (the value of "window.name")
The "id" attribute of the window.

If no windows match, then a "NoSuchWindowException" must be thrown, otherwise the "default content" of the first window to match will be selected for accepting commands.
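The matching loop can be sketched in Python as follows. `WindowInfo` is a hypothetical record, and skipping an empty window.name so that "" never matches is an assumption; the caller would then select the default content of the returned window.

```python
from dataclasses import dataclass
from typing import List, Optional


class NoSuchWindowException(Exception):
    pass


@dataclass
class WindowInfo:
    handle: str                       # opaque handle from getWindowHandles
    name: str = ""                    # value of window.name
    element_id: Optional[str] = None  # the window's "id" attribute, if any


def find_window(windows: List[WindowInfo], name: str) -> WindowInfo:
    for window in windows:
        # Compared in order: handle, window.name, then the "id" attribute.
        if name == window.handle:
            return window
        if window.name and name == window.name:
            return window
        if window.element_id is not None and name == window.element_id:
            return window
    raise NoSuchWindowException(name)
```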

When a new browser session is started by WebDriver and only a single window is present then the default content of that window becomes the "current" window. When more than one window is opened, the "current" window is undefined. Any commands that are executed at this point that require a window must throw an exception (TODO: Which exception? Ideally the same as if a window had just been closed). The correct way for a user to recover from this situation is to obtain the list of window handles and to "switch to" one of these.

7.3 Switching Frames

Command Name	switchToFrame
Parameters	"sessionId" {string} The key that identifies which session this request is for. "id" {?(string|number|!WebElement=)} The identifier used for a frame.
Return Value	None
Errors	NoSuchFrameException if no matching frame can be found

The "switchToFrame" command is used to select which frame within a window should be used for handling future commands. All frame switching is relative to the context in which commands are currently being handled. The "id" parameter can be a string, a number or an element. WebDriver implementations must determine which frame to select using the following algorithm:

If the "id" is a number the current context is set to the equivalent of the JS expression "window.frames[n]" where "n" is the number and "window" is the DOM window represented by the current context.
If the "id" is null, the current context is set to the default context.
If the "id" is a string:
1. If the JS expression "window.frames[id]" evaluated in the current context returns a window, where "id" is the value of the "id" parameter, the current context is set to that window.
2. Otherwise for each value of "window.frames" (referred to as "window"):
  1. If "window" has a "name" property or attribute equal to the "id" parameter, this becomes the current context.
  2. If "window" has an "id" property or attribute equal to the "id" parameter, this becomes the current context.
If the "id" represents a WebElement, and the corresponding DOM element represents a FRAME or an IFRAME, and the WebElement is part of the current context, the "window" property of that DOM element becomes the current context.

In all cases if no match is made a "NoSuchFrameException" must be thrown.
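The selection algorithm can be sketched over a toy frame tree. In this hypothetical Python model, `Frame.frames` stands for "window.frames" and `element` for the FRAME/IFRAME element. String step 1 is approximated by the name comparison, since named access on window.frames resolves by frame name; that simplification is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


class NoSuchFrameException(Exception):
    pass


@dataclass
class Frame:
    name: Optional[str] = None                          # "name" property/attribute
    id_attr: Optional[str] = None                       # "id" property/attribute
    frames: List["Frame"] = field(default_factory=list)  # window.frames, in order
    element: Optional[object] = None                    # the FRAME/IFRAME element


def switch_to_frame(top: Frame, current: Frame, frame_id: Any) -> Frame:
    """Return the new current context."""
    if frame_id is None:
        return top                                      # back to the default context
    if isinstance(frame_id, int) and not isinstance(frame_id, bool):
        if 0 <= frame_id < len(current.frames):
            return current.frames[frame_id]             # window.frames[n]
    elif isinstance(frame_id, str):
        for frame in current.frames:                    # name first, then id
            if frame.name == frame_id or frame.id_attr == frame_id:
                return frame
    else:
        # Treat anything else as a WebElement; it must be part of the current context.
        for frame in current.frames:
            if frame.element is not None and frame.element is frame_id:
                return frame
    raise NoSuchFrameException(repr(frame_id))
```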

Frame switching must succeed even if doing so would cross a security origin, or if JavaScript executing in window.top's context would otherwise not be able to access the frame being switched to.

8. Running Without Window Focus

All browsers must comply with the focus section of the [HTML5] spec. In particular, the requirement that the focused element within a top-level browsing context be independent of whether or not the top-level browsing context itself has system focus must be followed.

Note

This requirement is put in place to allow efficient machine utilization when using the WebDriver API to control several browsers independently on the same desktop.

9. Elements

One of the key abstractions of the WebDriver API is the WebElement interface. Each instance of this interface represents an Element as defined in the [DOM4] specification. Because the WebDriver API is designed to allow users to interact with apps as if they were actual users, the capabilities offered by the WebElement interface are somewhat different from those offered by the DOM Element interface.

Each WebElement instance must have an ID, which is distinct from the value of the DOM Element's "id" property. The ID for every WebElement representing the same underlying DOM Element must be the same. The IDs used to refer to different underlying DOM Elements must be unique.

interface WebElement {
    readonly attribute DOMString id;
};

9.1 Attributes

id of type DOMString, readonly: The WebDriver ID of this particular element. This should be a UUID.

Note

This requirement around WebElement IDs allows for efficient equality checks when the WebDriver API is being used out of process.
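One way a remote end might meet the ID rules is a registry keyed on the DOM node itself. This is a hypothetical Python sketch: nodes are assumed to use identity equality (the default for plain objects), and weak references let entries vanish when a node is garbage collected.

```python
import uuid
import weakref


class ElementRegistry:
    """Hypothetical table mapping DOM nodes to WebElement IDs.

    The same node always yields the same ID; distinct nodes get distinct UUIDs.
    """

    def __init__(self) -> None:
        self._ids: "weakref.WeakKeyDictionary[object, str]" = weakref.WeakKeyDictionary()

    def id_for(self, node: object) -> str:
        element_id = self._ids.get(node)
        if element_id is None:
            element_id = str(uuid.uuid4())
            self._ids[node] = element_id
        return element_id
```

With stable IDs, a local end can test two WebElements for equality by comparing their `id` strings, without a round trip to the browser.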

This section of the specification covers finding elements. Later sections deal with querying and interacting with these located elements. The primary interface used by the WebDriver API for locating elements is the SearchContext.

9.2 Lists of WebElements

The primary grouping of WebElement instances is an array of WebElement instances.

A reference to a WebElement is obtained via a SearchContext. The key interfaces are:

interface Locator {
    readonly attribute DOMString strategy;
    readonly attribute DOMString value;
};

Attributes

strategy of type DOMString, readonly: The name of the strategy that should be used to locate elements.
value of type DOMString, readonly: The value to pass to the element finding strategy.

interface SearchContext {
    WebElement[] findElements (Locator locator);
    WebElement   findElement (Locator locator);
};

Methods

findElement

Parameter	Type	Nullable	Optional	Description
locator	`Locator`	✘	✘

Return type: WebElement

findElements

Parameter	Type	Nullable	Optional	Description
locator	`Locator`	✘	✘

Return type: WebElement[]
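A SearchContext can be sketched as a dispatcher from a Locator's strategy name to a lookup function. In this hypothetical Python version, element IDs (strings) stand in for WebElements. Returning the first match from `find_element` and reporting an empty result as NoSuchElement are assumptions, as is using ValueError for an unknown strategy, since no error code for that case is given here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Locator:
    strategy: str
    value: str


class NoSuchElementError(Exception):
    status = 7  # "NoSuchElement" in the Error Codes table


class SearchContext:
    """Sketch: dispatches a Locator to a registered strategy function."""

    def __init__(self, strategies: Dict[str, Callable[[str], List[str]]]) -> None:
        # Each hypothetical strategy returns element IDs in document order.
        self._strategies = strategies

    def find_elements(self, locator: Locator) -> List[str]:
        strategy = self._strategies.get(locator.strategy)
        if strategy is None:
            raise ValueError(f"unsupported strategy: {locator.strategy}")
        return strategy(locator.value)

    def find_element(self, locator: Locator) -> str:
        matches = self.find_elements(locator)
        if not matches:
            raise NoSuchElementError(f"{locator.strategy}={locator.value!r}")
        return matches[0]
```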

9.3 Element Location Strategies

9.3.1 ARIA

This section is non-normative: It should be possible to find elements using their ARIA roles. It may be possible to find elements using their ARIA states and properties. All references to "ARIA" refer to [WAI-ARIA]

9.3.2 CSS Selectors

Capability Name	Type
cssSelectors	boolean

If a browser supports the CSS Selectors API ([SELECTORS-API]), it must support locating elements by CSS selector. If the browser does not support the CSS Selectors API, it may choose to implement locating by this mechanism. Elements must be returned in the same order as if "querySelectorAll" had been called. Compound selectors are allowed.

9.3.3 ECMAScript

Finding elements by ECMAScript is covered in the ECMAScript part of this spec.

9.3.4 Element ID

This strategy must be supported by all WebDriver implementations.

The HTML5 specification ([HTML5]) states that element IDs must be unique within their home subtree. Sadly, this uniqueness requirement is not always met. Consequently, this strategy is equally valid for finding a single element or groups of elements. In the case of finding a single WebElement, this must be functionally identical to a call to "document.getElementById()" from the Web DOM Core specification ([DOM4]). When finding multiple elements, this is equivalent to a CSS query of "#value", where "value" is the ID being searched for, with all "'" characters properly escaped.
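The single/multiple distinction can be illustrated with Python's `xml.etree.ElementTree` as a stand-in for the browser DOM (an assumption for illustration only): the multiple form keeps every duplicate in document order, and the single form takes the first, as getElementById does.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def find_elements_by_id(root: ET.Element, value: str) -> List[ET.Element]:
    # Document (pre-)order, like querySelectorAll("#value"); duplicates are kept.
    return [el for el in root.iter() if el.get("id") == value]


def find_element_by_id(root: ET.Element, value: str) -> Optional[ET.Element]:
    # Like document.getElementById(): first element in tree order, or None.
    matches = find_elements_by_id(root, value)
    return matches[0] if matches else None


doc = ET.fromstring('<html><body><p id="dup">one</p>'
                    '<div><span id="dup">two</span></div>'
                    '<p id="solo">three</p></body></html>')
```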

9.3.5 Link Text

This strategy must be supported by all WebDriver implementations.

The following algorithm must be used:

1. Look up all of the A elements on the page. The lookup should be done using [SELECTORS-API] but may use "document.getElementsByTagName()" from the Web DOM Core specification ([DOM4]). If the document being searched is a valid XHTML document, then this step must return all elements that the browser would consider an anchor tag.
2. An element matches if the value returned by getElementText for that element is equal to the search term passed in, compared case-sensitively.

9.3.6 Partial Link Text

This strategy must be supported by all WebDriver implementations.

The following algorithm must be used:

1. Look up all of the A elements on the page. The lookup should be done using the CSS Selectors API ([SELECTORS-API]) but may use "document.getElementsByTagName()" from [DOM4]. If the document being searched is a valid XHTML document, then this step must return all elements that the browser would consider an anchor tag.
2. An element matches if the search term is a case-sensitive substring of the value returned by getElementText for that element.
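The matching rules of the two link-text strategies can be sketched as follows, assuming the A elements have already been collected in document order and that each record's `text` field holds what getElementText would return. The record shape and function names are illustrative only.

```javascript
// Hypothetical model: each anchor is { href, text }, where `text` stands for the
// value getElementText would return for that A element. Anchors are in document order.

// Link Text: the rendered text must equal the search term exactly (case-sensitive).
function findByLinkText(anchors, term) {
  return anchors.filter((a) => a.text === term);
}

// Partial Link Text: the search term must occur somewhere in the rendered text (case-sensitive).
function findByPartialLinkText(anchors, term) {
  return anchors.filter((a) => a.text.includes(term));
}

const anchors = [
  { href: "/docs", text: "Read the docs" },
  { href: "/DOCS", text: "Read the DOCS" },
  { href: "/home", text: "Home" },
];
```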

9.3.7 XPath

All WebDriver implementations must support finding elements by XPath 1.0 [XPATH], as amended by section 3.3 of the [HTML5] specification. If no native support is present in the browser, a pure JavaScript implementation may be used. When called, the returned values must be equivalent to those obtained by calling the "evaluate" function from the DOM Level 3 XPath specification [DOM-LEVEL-3-XPATH] with the result type set to ORDERED_NODE_SNAPSHOT_TYPE (7).
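As a hedged sketch, the XPath lookup can be expressed with the DOM Level 3 XPath `evaluate(expression, contextNode, namespaceResolver, resultType, result)` method, copying the ordered snapshot into an array. The function name is hypothetical, and the document object is passed in as a parameter so that the sketch can also be exercised with a stub outside a browser.

```javascript
// ORDERED_NODE_SNAPSHOT_TYPE as named in the text above; its numeric value is 7.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Evaluate an XPath expression and copy the ordered snapshot into a plain array.
// `doc` is assumed to provide the DOM Level 3 XPath evaluate() method
// (in a browser, this would be `document`).
function findElementsByXPath(doc, expression, contextNode) {
  const snapshot = doc.evaluate(
    expression,
    contextNode,
    null, // no namespace resolver
    ORDERED_NODE_SNAPSHOT_TYPE,
    null // no existing result object to reuse
  );
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i)); // preserves document order
  }
  return nodes;
}
```

In a browser, this would be called as `findElementsByXPath(document, "//a", document)`.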

10. Reading Element State

10.1 Determining Visibility

The following algorithm is used to determine whether an element is displayed:

The element must have a height and width greater than 0px.
The element must not be visible if that element, or any of its ancestors, is hidden or has a display property of none.
OPTION and OPTGROUP elements are treated as special cases: they are considered shown if and only if the enclosing SELECT element is visible.
MAP elements are shown if and only if the image that uses the map is visible. AREA elements within a map are shown if the enclosing MAP is visible.
Any INPUT element of type "hidden" is not visible.
Any NOSCRIPT element must not be visible if JavaScript is enabled.
The element must not be visible if any ancestor in the element's transitive closure of offsetParents has a fixed size and the CSS style "overflow:hidden", and the element's location is not within the fixed size of that ancestor.

Command Name	isDisplayed
Parameters	"id" {string} The ID of the WebElement on which to operate.
Return Value	{boolean} Whether the element is displayed.
Errors	StaleElementReferenceException if the element referenced is no longer attached to the DOM
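A minimal sketch of part of the visibility algorithm, using a hypothetical plain-object element model in place of the DOM (`tagName`, `width`, `height`, `hidden`, `display`, `type`, `parent`). The MAP/AREA and overflow:hidden rules are omitted, so this illustrates the ordering of the rules rather than being a complete isDisplayed implementation.

```javascript
// Hypothetical element model: { tagName, width, height, hidden, display, type, parent }.
// `parent` is null for the root. MAP/AREA and overflow:hidden rules are not modelled.

// True if the element or any ancestor is hidden or has display: none.
function isHiddenByAncestry(el) {
  for (let node = el; node !== null; node = node.parent) {
    if (node.hidden || node.display === "none") return true;
  }
  return false;
}

function isDisplayed(el, { javascriptEnabled = true } = {}) {
  const tag = el.tagName.toUpperCase();
  // OPTION and OPTGROUP: shown if and only if the enclosing SELECT is visible.
  if (tag === "OPTION" || tag === "OPTGROUP") {
    let select = el.parent;
    while (select !== null && select.tagName.toUpperCase() !== "SELECT") select = select.parent;
    return select !== null && isDisplayed(select, { javascriptEnabled });
  }
  if (tag === "INPUT" && el.type === "hidden") return false;
  if (tag === "NOSCRIPT" && javascriptEnabled) return false;
  if (isHiddenByAncestry(el)) return false;
  // Finally, the element needs a non-zero size.
  return el.width > 0 && el.height > 0;
}
```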

10.2 Determining Whether a WebElement Is Selected

WebDriver determines whether a WebElement is selected using the following algorithm:

If the item is not "selectable", the WebElement is not selected. A selectable element is either an OPTION element or an INPUT element of type "checkbox" or "radio".
If the WebElement represents an INPUT element, call the "getProperty" method described above looking for the "checked" property. This indicates whether the element is selected.
Otherwise, call the "getProperty" method described above looking for the "selected" property. This indicates whether the element is selected.

Command Name	isSelected
Parameters	"id" {string} The ID of the WebElement on which to operate.
Return Value	{boolean} Whether the element is selected, according to the above algorithm.
Errors	StaleElementReferenceException if the element referenced is no longer attached to the DOM
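The selection algorithm reduces to a small function. In this sketch, the element is a hypothetical plain object whose `properties` field stands in for what the getProperty method would return.

```javascript
// Hypothetical model: { tagName, type, properties }, where `properties` stands in
// for the values the getProperty method would return.
function isSelected(el) {
  const tag = el.tagName.toUpperCase();
  const isCheckable = tag === "INPUT" && (el.type === "checkbox" || el.type === "radio");
  // Elements that are not selectable are never selected.
  if (tag !== "OPTION" && !isCheckable) return false;
  // INPUT elements report selection through "checked"; OPTION elements through "selected".
  const propertyName = tag === "INPUT" ? "checked" : "selected";
  return Boolean(el.properties[propertyName]);
}
```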

10.3 Reading Attributes and Properties

Although the [HTML5] specification is very clear about the difference between the properties and attributes of a DOM element, users frequently confuse the two. Because of this, the WebDriver API offers a single command ("getElementAttribute") that returns either the value of a DOM element's property or the value of its attribute. If a user wishes to refer specifically to an attribute or a property, they should evaluate JavaScript in order to be unambiguous. In this section, the "attribute" with name name shall refer to the result of calling the JavaScript "getAttribute" function on the element, with the following exceptions:

If, in the current rendering mode, the content attribute name reflects a boolean IDL attribute, as per the HTML specification, the value must be the string 'true' if that IDL attribute's value is true or the null value if the IDL attribute's value is false.
If the element is an OPTION element, name is "value", and there is no "value" attribute, then the text content of the OPTION element must be returned, in accordance with the [HTML401] specification, specifically the section on pre-selected options. The text content must be the result of calling the "getElementText" command on the OPTION element.
If the element is selectable, and name is "selected", or the element is an INPUT element of type "checkbox" or "radio" and name is "checked", return the string 'true' if the element is selected, and the null value otherwise.
If name is "style", the value returned must be serialized as defined in the [CSSOM-VIEW] spec. Notably, CSS property names must be cased as specified in section 6.5.1 of the [CSSOM-VIEW] spec.
- Consequently, it should be equivalent to obtaining the "cssText" property, with the additional constraint that the same value must be returned after a round trip through "executeScript". That is, the following pseudo-code must hold (where "driver" is a WebDriver instance and "element" is a WebElement):
```
// Read the serialized style, write it back through executeScript, and read it again.
var style = element.getAttribute('style');
driver.executeScript('arguments[0].style.cssText = arguments[1];', element, style);
var recovered = element.getAttribute('style');
assertEquals(style, recovered); // the round trip must not change the value
```
- Color property values must be standardized to rgba format, matching the regular expression `rgba\(\d+, \d+, \d+, (1|0(\.\d+)?)\)`.
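Normalizing computed colors to this form might look like the following sketch. It only understands `#rgb`, `#rrggbb`, `rgb(...)` and `rgba(...)` inputs (named colors and `hsl()` are out of scope), and the `toRgba` name is hypothetical. `RGBA_PATTERN` is the regular expression from the text, written as a JavaScript literal with anchors added.

```javascript
// The pattern from the text above, anchored, as a JavaScript RegExp.
const RGBA_PATTERN = /^rgba\(\d+, \d+, \d+, (1|0(\.\d+)?)\)$/;

// Normalize a few common color spellings to "rgba(r, g, b, a)".
function toRgba(color) {
  const value = color.trim().toLowerCase();
  let r, g, b;
  let a = 1;
  let m;
  if ((m = /^#([0-9a-f])([0-9a-f])([0-9a-f])$/.exec(value))) {
    // #rgb: each hex digit is doubled, so "f" becomes "ff".
    [r, g, b] = m.slice(1).map((h) => parseInt(h + h, 16));
  } else if ((m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/.exec(value))) {
    [r, g, b] = m.slice(1).map((h) => parseInt(h, 16));
  } else if ((m = /^rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*(?:,\s*([\d.]+)\s*)?\)$/.exec(value))) {
    [r, g, b] = m.slice(1, 4).map(Number);
    if (m[4] !== undefined) a = Number(m[4]); // explicit alpha; defaults to 1
  } else {
    throw new Error("Unsupported color: " + color);
  }
  return `rgba(${r}, ${g}, ${b}, ${String(a)})`;
}
```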

If the value is expected to be a URL (see the table below), return the property named name, i.e. a fully resolved URL. TODO: This doesn't feel like an exhaustive list.

Tag name	"name" value
A	href
IMG	src

If name is in the table below, and the above stages have not yielded a defined, non-null value, the value of the aliased property in the table should be returned:

Original property name	Aliased property name
class	className
readonly	readOnly

Note

These aliases provide the commonly used names for element properties.

Command Name	getElementAttribute
Parameters	"sessionId" {string} The key that identifies which session this request is for. "id" {string} The ID of the WebElement on which to operate. "name" {string} The name of the property or attribute to return.
Return Value	{string|null} The value returned by the above algorithm, coerced to a nullable string, or null if no value is defined.
Errors	StaleElementReferenceException if the element referenced is no longer attached to the DOM.
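The fallback order for OPTION values, URL-valued names and aliased names can be sketched as below. The element model (`attributes`, `properties`, `text`) is hypothetical, and the boolean, selected/checked and style rules are deliberately left out to keep the example short.

```javascript
// Hypothetical element model: { tagName, attributes, properties, text }
//   attributes: stands in for getAttribute(name); absent keys mean null
//   properties: stands in for DOM properties (e.g. fully resolved URLs)
//   text:       stands in for the result of getElementText
// Only the OPTION, URL and alias rules are modelled.

const URL_ATTRIBUTES = { A: "href", IMG: "src" };             // from the URL table
const ALIASES = { class: "className", readonly: "readOnly" }; // from the alias table

function own(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function getElementAttribute(el, name) {
  const tag = el.tagName.toUpperCase();
  const attr = own(el.attributes, name) ? el.attributes[name] : null;

  // OPTION without a "value" attribute falls back to its text.
  if (tag === "OPTION" && name === "value" && attr === null) return el.text;

  // URL-valued names return the fully resolved property rather than the raw attribute.
  if (URL_ATTRIBUTES[tag] === name && own(el.properties, name)) return el.properties[name];

  if (attr !== null) return attr;

  // Aliases are consulted only when the steps above produced no non-null value.
  if (own(ALIASES, name) && own(el.properties, ALIASES[name])) {
    const aliased = el.properties[ALIASES[name]];
    return aliased === null ? null : String(aliased);
  }
  return null;
}
```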

10.4 Rendering Text

All WebDriver implementations must support getting the visible text of a WebElement, with excess whitespace compressed.

The following definitions are used in this section:

Whitespace: Any text that matches the ECMAScript regular expression class \s.
Whitespace excluding non-breaking spaces: Any text that matches the ECMAScript regular expression [^\S\xa0].
Block level element: A block-level element is one which is not a table cell, and whose effective CSS display style is not in the set ['inline', 'inline-block', 'inline-table', 'none', 'table-cell', 'table-column', 'table-column-group']
Horizontal whitespace characters: Horizontal whitespace characters are defined by the ECMAScript regular expression [\x20\t\u2028\u2029].

The expected return value is roughly what a text-only browser such as Lynx would display. The algorithm for determining this text is as follows:

Let node be the element whose text is being retrieved, and let lines equal an empty array. Then:

For each child of node, at time of execution, in order:
1. Get node's effective CSS whitespace and text-transform styles, and then, if child is:
  - a node which is not visible, do nothing.
  - a [DOM4] text node, let text equal the nodeValue property of child. Then:
    1. Remove any zero-width spaces (\u200b), form feeds (\f) or vertical tabs (\v) from text.
    2. Canonicalize any recognized single newline sequence in text to a single newline (greedily matching (\r\n|\r|\n) to a single \n).
    3. If the parent's effective CSS whitespace style is 'normal' or 'nowrap', replace each newline (\n) in text with a single space character (\x20). Then, if the parent's effective CSS whitespace style is 'pre' or 'pre-wrap', replace each horizontal whitespace character with a non-breaking space character (\xa0); otherwise, replace each sequence of horizontal whitespace characters except non-breaking spaces (\xa0) with a single space character.
    4. Apply the parent's effective CSS text-transform style as per the CSS 2.1 specification ([CSS21]).
    5. If last(lines) ends with a space character and text starts with a space character, trim the first character of text.
    6. Append text to last(lines) in-place.
  - an element which is visible. If the element is a:
    - BR element: push '' to lines and continue.
    - block-level element and last(lines) is not '': push '' to lines.
    Then recurse depth-first to step 1 with node set to the current element.
  - If element is a TD element, or its effective CSS display style is 'table-cell', and last(lines) is not '', and last(lines) does not end with whitespace, append a single space character to last(lines). [Note: Most innerText implementations append a \t here.]
  - If element is a block-level element: push '' to lines.
For each line in lines trim any leading and trailing whitespace excluding non-breaking space characters.
Let s be lines.join('\n')
Trim any leading and trailing whitespace excluding non-breaking space characters from s.
Replace any non-breaking spaces (\xa0) with spaces (\x20) in s.
Return s.
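The per-text-node steps and the final joining steps above operate purely on strings, so they can be sketched directly. This sketch assumes the parent's effective styles are passed in as strings, starts `lines` as `[""]` so that last(lines) is always defined, and reads step 3 as applying the 'pre'/'pre-wrap' check after the newline replacement. The function names are hypothetical, and the element traversal itself is not shown.

```javascript
const HORIZONTAL_WS = /[\x20\t\u2028\u2029]/g;     // each horizontal whitespace character
const HORIZONTAL_WS_RUN = /[\x20\t\u2028\u2029]+/g; // runs of them (excludes \xa0)
const TRIM_EXCEPT_NBSP = /^[^\S\xa0]+|[^\S\xa0]+$/g; // leading/trailing whitespace except \xa0

// Steps 1-6 for a visible text node whose parent has the given effective styles.
function appendTextNode(lines, nodeValue, whiteSpace, textTransform) {
  if (lines.length === 0) lines.push("");
  let text = nodeValue.replace(/[\u200b\f\v]/g, ""); // 1. strip ZWSP, form feed, vertical tab
  text = text.replace(/\r\n|\r|\n/g, "\n");          // 2. canonicalize newlines
  if (whiteSpace === "normal" || whiteSpace === "nowrap") {
    text = text.replace(/\n/g, " ");                 // 3a. newlines become spaces
  }
  if (whiteSpace === "pre" || whiteSpace === "pre-wrap") {
    text = text.replace(HORIZONTAL_WS, "\xa0");      // 3b. preserve every space
  } else {
    text = text.replace(HORIZONTAL_WS_RUN, " ");     // 3c. collapse runs to one space
  }
  if (textTransform === "uppercase") text = text.toUpperCase(); // 4. text-transform
  else if (textTransform === "lowercase") text = text.toLowerCase();
  else if (textTransform === "capitalize") {
    text = text.replace(/(^|\s)(\S)/g, (_, sep, ch) => sep + ch.toUpperCase());
  }
  const last = lines[lines.length - 1];
  if (last.endsWith(" ") && text.startsWith(" ")) text = text.slice(1); // 5. avoid double space
  lines[lines.length - 1] = last + text;                               // 6. append in place
}

// Final steps: trim each line, join with "\n", trim, and turn \xa0 into plain spaces.
function finishLines(lines) {
  const s = lines.map((line) => line.replace(TRIM_EXCEPT_NBSP, "")).join("\n");
  return s.replace(TRIM_EXCEPT_NBSP, "").replace(/\xa0/g, " ");
}
```

For example, appending `"  Hello\r\n   world  "` (normal, no transform), then `" again"` (normal, uppercase), pushing `''` for a block-level element, and appending `"a  b"` (pre) yields `"Hello world AGAIN\na  b"`.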

11. Executing Javascript

Note

Open questions: What happens if a user's JS triggers a modal dialog? Blocking seems like a reasonable idea, but there is an assumption that WebDriver is not threadsafe. What happens to unhandled JS errors? Caused by a user's JS? Caused by JS on a page? How does a user of the API obtain the list of errors? Is that list cleared upon read?

If a browser supports JavaScript and JavaScript is enabled, it must set the "javascriptEnabled" capability to true, and it must support the execution of arbitrary JavaScript.

Capability Name	Type
javascriptEnabled	boolean

11.1 Javascript Command Parameters

The Argument type is defined as being {(number|boolean|DOMString|WebElement|dictionary|Array.<Argument>)?}

interface JavascriptCommandParameters {
    readonly attribute DOMString  script;
    readonly attribute Argument[] args;
};

11.1.1 Attributes

args of type array of Argument, readonly: The parameters to the function defined by script.
script of type DOMString, readonly: The JavaScript to execute, in the form of a Function body.

When executing Javascript, it must be possible to reference the args parameter using the function's arguments object. The arguments must be in the same order as defined in args. Each WebDriver implementation must preprocess the values in args using the following algorithm:

For each index, index in args, if args[index] is...

a number, boolean, DOMString, or null, then let args[index] = args[index].
an array, then recursively apply this algorithm to args[index] and assign the result to args[index].
a dictionary, then recursively apply this algorithm to each value in args[index] and assign the result to args[index].
a WebElement, then:
1. If the element's ID does not represent a DOMElement, or it represents a DOMElement that is no longer attached to the document's tree, then the WebDriver implementation must immediately abort the command and return a StaleElementReference error.
2. Otherwise, let args[index] be the underlying DOMElement.
Otherwise WebDriver implementations may throw an UnknownError indicating the index of the unhandled parameter (TODO: Should a more specific error be thrown?) but should attempt to convert the value into a dictionary.
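The preprocessing algorithm can be sketched as a recursive function. The registry mapping WebElement IDs to DOM nodes, the `{ webElementId: ... }` shape used to recognize a WebElement argument, and the error types are all assumptions made for illustration; unlike the text, this sketch does not report the index of an unhandled argument.

```javascript
// Hypothetical registry: Map from WebElement ID to { node, attached }.
// A WebElement argument is modelled as { webElementId: "..." } (an assumed shape).

class StaleElementReference extends Error {}

function isWebElement(value) {
  return value !== null && typeof value === "object" && typeof value.webElementId === "string";
}

function preprocessArgs(value, registry) {
  if (value === null || typeof value === "number" || typeof value === "boolean" || typeof value === "string") {
    return value; // primitives pass through unchanged
  }
  if (Array.isArray(value)) {
    return value.map((item) => preprocessArgs(item, registry)); // recurse into arrays
  }
  if (isWebElement(value)) {
    const record = registry.get(value.webElementId);
    if (record === undefined || !record.attached) {
      // Abort the whole command.
      throw new StaleElementReference("Element " + value.webElementId + " is stale");
    }
    return record.node; // substitute the underlying DOM element
  }
  if (typeof value === "object") {
    const result = {};
    for (const [key, item] of Object.entries(value)) {
      result[key] = preprocessArgs(item, registry); // recurse into dictionary values
    }
    return result;
  }
  // Anything else (undefined, functions, symbols): the text permits an UnknownError.
  throw new Error("UnknownError: unhandled argument type " + typeof value);
}
```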

11.2 Synchronous Javascript Execution

Command Name	executeScript
Parameters	"sessionId" {string} The key that identifies which session this request is for. "script" {string} The script to execute. "args" {Array.<Argument>} The script arguments.
Return Value	{Argument} The value returned by the script, or null.
Errors	JavascriptError if the executing script threw an exception. StaleElementReferenceException if a WebElement referenced is no longer attached to the DOM. UnknownError if an argument or the return value is of an unhandled type.

When executing JavaScript, the WebDriver implementations must use the following algorithm:

1. Let window be the Window object for WebDriver's current command context.
2. Let script be the DOMString from the command's script parameter.
3. Let fn be the Function object created by executing new Function(script).
4. Let args be the JavaScript array created by the pre-processing algorithm defined above.
5. Invoke fn.apply(window, args).
6. If step #5 threw, then:
  1. Let error be the thrown value.
  2. Set the command's response status to JavascriptError.
  3. Set the command's response value to a dictionary, dict.
  4. If error is an Error, then set a "message" entry in dict whose value is the DOMString defined by error.message.
  5. Otherwise, set a "message" entry in dict whose value is the DOMString representation of error.
7. Otherwise:
  1. Let result be the value returned by the function in step #5.
  2. Set the command's response status to Success.
  3. Let value be the result of the following algorithm:
    1. If result is:
      1. undefined or null, return null.
      2. a number, boolean, or DOMString, return result.
      3. a DOMElement, then return the corresponding WebElement for that DOMElement.
      4. an array or NodeList, then return the result of recursively applying this algorithm to each item of result.
      5. an object, then return the dictionary created by recursively applying this algorithm to each property in result.
  4. Set the command's response value to value.
8. Return the command response.
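The synchronous algorithm maps closely onto `new Function` and `Function.prototype.apply`. In this sketch, `window` can be any object used as `this`, `toWebElement` is a hypothetical hook that recognizes DOM elements, NodeList handling is omitted, and a syntax error in script is also reported as JavascriptError, a choice the text does not specify.

```javascript
// `toWebElement` is a hypothetical hook: it returns the WebElement representation of a
// DOM element, or undefined for anything that is not an element.
function executeScriptSketch(window, script, args, toWebElement = () => undefined) {
  let result;
  try {
    const fn = new Function(script); // the script is a function body
    result = fn.apply(window, args); // `this` is the window; args via `arguments`
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { status: "JavascriptError", value: { message } };
  }
  return { status: "Success", value: convertResult(result, toWebElement) };
}

// Convert the script's return value into something that can be sent back to the client.
function convertResult(result, toWebElement) {
  if (result === undefined || result === null) return null;
  if (typeof result === "number" || typeof result === "boolean" || typeof result === "string") return result;
  const element = toWebElement(result);
  if (element !== undefined) return element;
  if (Array.isArray(result)) return result.map((item) => convertResult(item, toWebElement));
  if (typeof result === "object") {
    const dict = {};
    for (const [key, value] of Object.entries(result)) dict[key] = convertResult(value, toWebElement);
    return dict;
  }
  throw new Error("UnknownError: unhandled return type " + typeof result);
}
```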

11.3 Asynchronous Javascript Execution

Command Name	executeAsyncScript
Parameters	"sessionId" {string} The key that identifies which session this request is for. "script" {string} The script to execute. "args" {Array.<Argument>} The script arguments.
Return Value	{Argument} The value returned by the script, or null.
Errors	JavascriptError if the executing script threw an exception. StaleElementReferenceException if a WebElement referenced is no longer attached to the DOM. Timeout if the callback is not called within the time specified by the "script" timeout. UnknownError if an argument or the return value is of an unhandled type.

When executing asynchronous JavaScript, the WebDriver implementation must use the following algorithm:

1. Let timeout be the value of the last "script" timeout command, or 0 if no such commands have been received.
2. Let window be the Window object for WebDriver's current command context.
3. Let script be the DOMString from the command's script parameter.
4. Let fn be the Function object created by executing new Function(script).
5. Let args be the JavaScript array created by the pre-processing algorithm defined above.
6. Let callback be a new Function object, and push callback onto the end of args.
7. Register a one-shot timer on window set to fire timeout milliseconds in the future.
8. Invoke fn.apply(window, args).
9. If step #8 threw, then:
  1. Let error be the thrown value.
  2. Set the command's response status to JavascriptError.
  3. Set the command's response value to a dictionary, dict.
  4. If error is an Error, then set a "message" entry in dict whose value is the DOMString defined by error.message.
  5. Otherwise, set a "message" entry in dict whose value is the DOMString representation of error.
10. Otherwise, the WebDriver implementation must wait for one of the following to occur:
  1. If the timer from step #7 fires, the WebDriver implementation must immediately set the command response status to Timeout and return.
  2. If the window fires an unload event, the WebDriver implementation must immediately set the command response status to JavascriptError and return with the error message set to "Javascript execution context no longer exists.".
  3. If the callback function is invoked, then:
    1. Let result be the first argument passed to callback.
    2. Set the command's response status to Success.
    3. Let value be the result of the following algorithm:
      1. If result is:
        1. undefined or null, return null.
        2. a number, boolean, or DOMString, return result.
        3. a DOMElement, then return the corresponding WebElement for that DOMElement.
        4. an array or NodeList, then return the result of recursively applying this algorithm to each item of result. WebDriver implementations should limit the recursion depth.
        5. an object, then return the dictionary created by recursively applying this algorithm to each property in result.
    4. Set the command's response value to value.
11. Return the command response.
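The asynchronous algorithm can be sketched with a Promise that settles on whichever happens first: the callback, the timer, or a thrown exception. Here `window` is any object used as `this` and `timeoutMs` plays the role of the "script" timeout; the unload case is not modelled, and result conversion is reduced to mapping undefined to null. The function name is hypothetical.

```javascript
function executeAsyncScriptSketch(window, script, args, timeoutMs = 0) {
  return new Promise((resolve) => {
    let settled = false;
    let timer;
    // Only the first outcome (callback, timer or exception) is reported.
    const finish = (response) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(response);
    };
    // The callback appended to the arguments; its first argument is the result.
    const callback = (result) =>
      finish({ status: "Success", value: result === undefined ? null : result });
    // One-shot timer implementing the "script" timeout.
    timer = setTimeout(() => finish({ status: "Timeout", value: null }), timeoutMs);
    try {
      const fn = new Function(script);
      fn.apply(window, [...args, callback]);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      finish({ status: "JavascriptError", value: { message } });
    }
  });
}
```

A script typically retrieves the callback with `arguments[arguments.length - 1]`.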

11.4 Reporting Errors

12. Cookies

13. Timeouts

This section describes how timeouts and implicit waits are handled within WebDriver.

The "timeouts" command sets the maximum amount of time for which a given kind of operation may run.

Command Name	timeouts
Parameters	"sessionId" {string} The key that identifies which session this request is for. "type" {string} The type of operation to set the timeout for. Valid values are: "implicit", "page load", "script". "ms" {number} The amount of time, in milliseconds, that time-limited commands are permitted to run.
Return Value	None
Errors	None

implicit - Set the amount of time the driver should wait when searching for elements. When searching for a single element, the driver should poll the page until an element is found or the timeout expires, whichever occurs first. When searching for multiple elements, the driver should poll the page until at least one element is found or the timeout expires; if the timeout expires first, it should return an empty list.
If this command is never sent, the driver must default to an implicit wait of 0ms.
page load - TODO(David, Simon): fill me in.
script - Set the amount of time the driver should wait for an asynchronous script (see executeAsyncScript) to invoke its callback.
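The implicit wait described above amounts to a polling loop. In this sketch, `findAll` is a hypothetical callback that searches the page once and returns an array, the 50 ms polling interval is an assumption (the text does not specify one), and `NoSuchElement` is an assumed error name for a failed single-element search.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Single element: poll until something matches or the implicit timeout expires.
async function findElementWithImplicitWait(findAll, implicitMs, pollMs = 50) {
  const deadline = Date.now() + implicitMs;
  for (;;) {
    const found = findAll();
    if (found.length > 0) return found[0];
    const remaining = deadline - Date.now();
    if (remaining <= 0) throw new Error("NoSuchElement"); // error name assumed
    await sleep(Math.min(pollMs, remaining));
  }
}

// Multiple elements: poll until at least one matches; on timeout, return an empty list.
async function findElementsWithImplicitWait(findAll, implicitMs, pollMs = 50) {
  const deadline = Date.now() + implicitMs;
  for (;;) {
    const found = findAll();
    if (found.length > 0) return found;
    const remaining = deadline - Date.now();
    if (remaining <= 0) return [];
    await sleep(Math.min(pollMs, remaining));
  }
}
```

With the default implicit wait of 0ms, both functions make exactly one attempt.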

14. User Input

There are two ways to interact with elements: directly or implicitly. The difference between the two is similar to the difference between "Do what I mean" and "Do as I say": the commands for directly interacting with elements express explicit intention for the desired outcome. For this kind of interaction, the implementation of this specification should take additional steps to ensure the interaction happens as the user intended (for example, by scrolling the element into the viewport). Implicit interaction differs by following the user's instructions without additional interpretation. Interaction is with the currently active element, as defined by the browser. The implementation maintains the current keyboard and mouse state in order to fulfill the user's instructions.

14.1 Interaction directly with elements

14.1.1 Interactable elements

Some user-input actions require the element to be interactable. The following conditions must be met for the element to be considered interactable:

The element must be visible, as defined in section 10.1.
The element must not be disabled:
- If the currently loaded document is HTML4, the element does not support the disabled attribute (according to the [HTML401] spec), or the disabled attribute is not set.
- If the currently loaded document is HTML5, the element is not disabled as defined in the [HTML5] spec.

14.1.2 Clicking

partial interface WebElement {
    void click ();
};

14.1.2.1 Methods

click

Clicks in the middle of the WebElement instance. The middle of the element is defined as the middle of the box returned by calling getBoundingClientRect on the underlying DOM Element, according to the [CSSOM-VIEW] spec. If the element is outside the viewport (according to the [CSS21] spec), the implementation should bring the element into view first. The implementation may invoke scrollIntoView on the underlying DOM Element. The element must be visible, as defined in section 10.1. See the note below for when the element is obscured by another element. Exceptions:

links (A elements): Clicking happens in the middle of the first bounding client rectangle. This is to overcome overflowing links where the middle of the bounding client rectangle does not actually fall on a clickable part of the link.
Select elements without the multiple attribute set: Clicking on the select element must open a selection menu. After clicking on an option, the selection menu must be closed. Clicking directly on an option element (without clicking on the select element first) must open a selection menu, as if the select element was clicked first. Ultimately, after clicking one of the options, the selection menu must be closed.
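The choice of click point can be sketched with plain rectangle objects standing in for the values returned by getBoundingClientRect() and getClientRects(). The element model and function names are hypothetical, and scrolling, visibility and obscuring checks are not modelled.

```javascript
// Rectangles are plain { left, top, width, height } objects (viewport coordinates).
function centerOf(rect) {
  return { x: rect.left + rect.width / 2, y: rect.top + rect.height / 2 };
}

// Hypothetical element model: { tagName, boundingRect, clientRects }.
function clickPoint(element) {
  if (element.tagName.toUpperCase() === "A" && element.clientRects.length > 0) {
    // Links: use the first client rectangle, so a link that wraps across lines is
    // clicked on its text rather than in the empty middle of the overall bounding box.
    return centerOf(element.clientRects[0]);
  }
  return centerOf(element.boundingRect);
}
```

For a link wrapped over two lines, the middle of the bounding box may fall between the two text fragments, which is exactly the case the link exception addresses.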

The possible errors for this command:

StaleElementReference if the given element is no longer attached to the DOM.
ElementNotVisible if the element is hidden and thus cannot be interacted with.
MoveTargetOutOfBounds if the element cannot be scrolled into view.

No parameters.

Return type: void

Note

As the goal is to emulate users as closely as possible, the implementation should not allow clicking on elements that are obscured by other elements. The implementation should try to scroll the element into view, but in case it is fully obscured, it should not be clickable.

14.1.3 Typing keys

A prerequisite for keys-based interaction with an element is that it is interactable (as defined earlier in this section). Typing into an element can take place if one of the following conditions is met:

The element is focusable as defined in the editing section of the [HTML5] spec.
The element can be the activeElement. In addition to focusable elements, this allows typing into the body element.
In an HTML5 document, the element is editable as a result of having its contentEditable attribute set or the containing document is in designMode.

Prior to any keyboard interaction, the element should be focused if it does not currently have the focus. This is the case if one of the following holds:

The element is not document.activeElement
The owner document of the element to be interacted with is not the focused document.

In case focusing is needed, the implementation must follow the focusing steps as described in the focus management section of the [HTML5] spec. The focus must not leave the element at the end of the interaction, other than as a result of the interaction itself (i.e. when the tab key is sent).

partial interface WebElement {
    void clear ();
    void sendKeys (string[] keysToSend);
};

14.1.3.1 Methods

clear

Clears the value of the element.

No parameters.

Return type: void

sendKeys

Sends a sequence of keyboard events representing the keys in the keysToSend parameter.

Caret positioning: If focusing was needed, after following the focusing steps, the caret must be positioned at the end of the text currently in the element. At the end of the interaction, the caret must be positioned at the end of the typed text sequence, unless the keys sent position it otherwise (e.g. using the LEFT key).

There are four different types of keys that are emulated:

Character literals - lower-case symbols.
Uppercase letters and symbols requiring the SHIFT key for typing.
Modifier keys
Special keys

The rest of this section details the values used to represent the different keys, as well as the expected behaviour for each key type.

Parameter	Type	Nullable	Optional	Description
keysToSend	`string[]`	✘	✘

Return type: void

When emulating user input, the implementation must generate the same sequence of events that would have been produced if a real user was sitting in front of the keyboard and typing the sequence of characters. In cases where there is more than one way to type this sequence, the implementation must choose one of the valid ways. For example, typing AB may be achieved by:

Holding down the Shift key
Pressing the letter 'a'
Pressing the letter 'b'
Releasing the Shift key

Alternatively, it can be achieved by:

Holding down the Shift key
Pressing the letter 'a'
Releasing the Shift key
Holding down the Shift key
Pressing the letter 'b'
Releasing the Shift key

Or by simply turning on the CAPS LOCK first. Since all methods are valid, any of them can be used.

The implementation may use the following algorithm to generate the events. If the implementation is using a different algorithm, it must adhere to the requirements listed below.

For each key, key in keysToSend, do

If key is a lower-case symbol:
1. If the Shift key is not pressed:
  1. Generate a sequence of key-down, key-press and key-up events with key as the character to emulate
2. else (The Shift key is pressed)
  1. let uppercaseKey be the upper-case character matching key
  2. Generate a sequence of key-down, key-press and key-up events with uppercaseKey as the character to emulate
Else if key is an upper-case symbol:
1. If the Shift key is not pressed:
  1. Generate a key-down event of the Shift key.
  2. Generate a sequence of key-down, key-press and key-up events with key as the character to emulate
  3. Generate a key-up event of the Shift key.
2. else (The Shift key is pressed)
  1. Generate a sequence of key-down, key-press and key-up events with key as the character to emulate
Else if key represents a modifier key:
1. let modifier be the modifier key represented by key
2. If modifier is currently held down:
  1. Generate a key-up event of modifier
3. Else:
  1. Generate a key-down event of modifier
4. Maintain this key state and use it to modify the input until it is pressed again.
Else if key represents the NULL key:
1. Generate key-up events of all modifier keys currently held down.
2. All modifier keys are now assumed to be released.
Else if key represents a special key:
1. Translate key to the special key it represents
2. Generate a sequence of key-down, key-press and key-up events for the special key.

Once keyboard input is complete, an implicit NULL key is sent unless the final character is the NULL key.
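The algorithm above can be sketched as a function that returns the events it would generate as `[type, key]` pairs instead of dispatching them. It assumes only SHIFT, CONTROL and ALT as modifiers, labels special keys by code point, and up-cases only letters when Shift is held (the shifted forms of digits and punctuation are not mapped). The function and constant names are hypothetical.

```javascript
// A subset of the code points from this section's key table.
const NULL_KEY = "\uE000";
const MODIFIERS = new Map([["\uE008", "Shift"], ["\uE009", "Control"], ["\uE00A", "Alt"]]);
// Symbols typed with Shift on the US 104-key Windows layout.
const SHIFTED_SYMBOLS = new Set('!$^*()+{}:?|~@#%_"&<>');

function isUpperCaseSymbol(ch) {
  return (ch >= "A" && ch <= "Z") || SHIFTED_SYMBOLS.has(ch);
}

function sendKeysEvents(keysToSend) {
  const events = [];
  const held = new Set(); // modifier names currently held down, in press order
  const press = (key) => events.push(["keydown", key], ["keypress", key], ["keyup", key]);

  const keys = Array.from(keysToSend.join(""));
  if (keys[keys.length - 1] !== NULL_KEY) keys.push(NULL_KEY); // implicit trailing NULL

  for (const key of keys) {
    if (key === NULL_KEY) {
      for (const modifier of held) events.push(["keyup", modifier]); // release everything
      held.clear();
    } else if (MODIFIERS.has(key)) {
      const modifier = MODIFIERS.get(key); // modifiers toggle
      if (held.has(modifier)) {
        events.push(["keyup", modifier]);
        held.delete(modifier);
      } else {
        events.push(["keydown", modifier]);
        held.add(modifier);
      }
    } else if (key >= "\uE000" && key <= "\uF8FF") {
      press("U+" + key.codePointAt(0).toString(16).toUpperCase()); // special key
    } else if (isUpperCaseSymbol(key)) {
      if (held.has("Shift")) {
        press(key); // user already holds Shift: no extra Shift events
      } else {
        events.push(["keydown", "Shift"]);
        press(key);
        events.push(["keyup", "Shift"]);
      }
    } else {
      press(held.has("Shift") ? key.toUpperCase() : key);
    }
  }
  return events;
}
```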

Any implementation must comply with these requirements:

For uppercase letters and symbols that require the Shift key to be pressed, there are two options:
- A single Shift key-down event is generated before the entire sequence of uppercase letters.
- Before each such letter or symbol, a Shift key-down event is generated. After each letter or symbol, a Shift key-up event is generated.
A user-specified Shift press implies capitalization of all following characters.
If a user-specified Shift press precedes uppercase letters and symbols, a second Shift key-down event must not be generated. In that case, a Shift key-up event must not be generated implicitly by the implementation.
The NULL key releases all currently held down modifier keys.
The state of all modifier keys must be reset at the end of each sendKeys call, and the appropriate key-up events must be generated.

Character types

The keysToSend parameter contains a mix of printable characters and pressable keys that aren't text. Pressable keys that aren't text are stored in the Unicode PUA (Private Use Area) code points, 0xE000-0xF8FF. The following table describes the mapping between PUA code points and keys:

Key	Code	Type
NULL	\uE000	NULL
CANCEL	\uE001	Special key
HELP	\uE002	Special key
BACK_SPACE	\uE003	Special key
TAB	\uE004	Special key
CLEAR	\uE005	Special key
RETURN	\uE006	Special key
ENTER	\uE007	Special key
SHIFT	\uE008	Modifier
LEFT_SHIFT	\uE008	Modifier
CONTROL	\uE009	Modifier
LEFT_CONTROL	\uE009	Modifier
ALT	\uE00A	Modifier
LEFT_ALT	\uE00A	Modifier
PAUSE	\uE00B	Special key
ESCAPE	\uE00C	Special key
SPACE	\uE00D	Special key
PAGE_UP	\uE00E	Special key
PAGE_DOWN	\uE00F	Special key
END	\uE010	Special key
HOME	\uE011	Special key
LEFT	\uE012	Special key
ARROW_LEFT	\uE012	Special key
UP	\uE013	Special key
ARROW_UP	\uE013	Special key
RIGHT	\uE014	Special key
ARROW_RIGHT	\uE014	Special key
DOWN	\uE015	Special key
ARROW_DOWN	\uE015	Special key
INSERT	\uE016	Special key
DELETE	\uE017	Special key
SEMICOLON	\uE018	Special key
EQUALS	\uE019	Special key
NUMPAD0	\uE01A	Special key
NUMPAD1	\uE01B	Special key
NUMPAD2	\uE01C	Special key
NUMPAD3	\uE01D	Special key
NUMPAD4	\uE01E	Special key
NUMPAD5	\uE01F	Special key
NUMPAD6	\uE020	Special key
NUMPAD7	\uE021	Special key
NUMPAD8	\uE022	Special key
NUMPAD9	\uE023	Special key
MULTIPLY	\uE024	Special key
ADD	\uE025	Special key
SEPARATOR	\uE026	Special key
SUBTRACT	\uE027	Special key
DECIMAL	\uE028	Special key
DIVIDE	\uE029	Special key
F1	\uE031	Special key
F2	\uE032	Special key
F3	\uE033	Special key
F4	\uE034	Special key
F5	\uE035	Special key
F6	\uE036	Special key
F7	\uE037	Special key
F8	\uE038	Special key
F9	\uE039	Special key
F10	\uE03A	Special key
F11	\uE03B	Special key
F12	\uE03C	Special key
META	\uE03D	Special key
COMMAND	\uE03D	Special key
ZENKAKU_HANKAKU	\uE040	Special key
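On the client side, these code points are conveniently wrapped in named constants and concatenated with ordinary text. The sketch below mirrors part of the table; the `Keys` object and `keyType` helper are hypothetical, and private-use code points missing from the table are treated as special keys, which is an assumption.

```javascript
// Named constants for part of the table above (values copied from the Code column).
const Keys = Object.freeze({
  NULL: "\uE000",
  BACK_SPACE: "\uE003",
  TAB: "\uE004",
  ENTER: "\uE007",
  SHIFT: "\uE008",
  CONTROL: "\uE009",
  ALT: "\uE00A",
  ESCAPE: "\uE00C",
  HOME: "\uE011",
  ARROW_LEFT: "\uE012",
  F1: "\uE031",
  META: "\uE03D",
});

// Classify one character of keysToSend according to the Type column.
function keyType(ch) {
  if (ch === Keys.NULL) return "NULL";
  if (ch === Keys.SHIFT || ch === Keys.CONTROL || ch === Keys.ALT) return "Modifier";
  // Remaining Private Use Area code points are treated as special keys.
  if (ch >= "\uE000" && ch <= "\uF8FF") return "Special key";
  return "Character";
}

// Type "cat", hold SHIFT while pressing HOME, release all modifiers, then press ENTER.
const keysToSend = ["cat", Keys.SHIFT, Keys.HOME, Keys.NULL, Keys.ENTER];
const keyTypes = Array.from(keysToSend.join("")).map(keyType);
```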

The keys considered upper-case symbols are either defined by the current keyboard locale or derived from the US 104-key Windows keyboard layout, which are:

A - Z
! $ ^ * ( ) + { } : ? | ~ @ # % _ " & < >

When the user input is emulated natively (see note below), the implementation should use the current keyboard locale to determine which symbols are upper case. In all other cases, the implementation must use the US 104-key Windows keyboard layout to determine those symbols.

The state of the physical keyboard must not affect emulated user input.

Internationalized input

Non-latin symbols: TBD

Complex scripts using Input Method Editor (IME): TBD

Note

User input should be emulated natively: the input events should be indistinguishable from those of a user sitting in front of a screen and keyboard and interacting with the browser. For that purpose, it is highly recommended that input events not be generated at the DOM level. Instead, emulated input events should originate from the browser's own event queue, just like other user input. This is the order of preference for methods to emulate user input:

Injection into the browser's event queue.
Sending OS-specific input messages to the browser's window. This has the disadvantage of being OS-specific.
Use of Javascript to inject events at the DOM level.
Use of the accessibility API. The disadvantage of this method, in addition to being OS-specific, is that the browser's window must be focused. As a result, multiple tests cannot run in parallel on the same desktop session.

Note

These input methods could be used to interact with the browser's chrome. However, the way to do so is not defined, as it is most likely to be implementation-specific.

14.2 High Level APIs: Clicking and Typing

This section deals with implicit interaction. In this kind of interaction, a user describes a series of input actions which the implementation should fulfill with little to no additional actions. This kind of interaction allows dragging and dropping, or combining keyboard and mouse actions.

interface Actions {
    void keyDown (Keys theKey);
    void keyUp (Keys theKey);
    void sendKeys (string[] keysToSend);
    void moveToElement (WebElement toElement);
    void moveByOffset (int xOffset, int yOffset);
    void clickAndHold ();
    void release ();
    void click ();
    void doubleClick ();
    void contextClick ();
};

14.2.1 Methods

click

Single-clicks the left mouse button, at the current mouse location.

No parameters.

Return type: void

clickAndHold

Clicks, without releasing, the left mouse button, at the current mouse location.

No parameters.

Return type: void

contextClick

Performs a context-click at the current mouse location.

No parameters.

Return type: void

doubleClick

Double-clicks the left mouse button at the current mouse location.

No parameters.

Return type: void

keyDown

Performs a modifier key press. The modifier key is not released; it is kept pressed for subsequent interactions. The only valid values for the theKey parameter are the keys defined as modifier keys in the previous section.

Parameter	Type	Nullable	Optional	Description
theKey	`Keys`	✘	✘	The modifier key to press.

Return type: void

keyUp

Performs a modifier key release. The only valid values for the theKey parameter are the keys defined as modifier keys in the previous section.

Parameter	Type	Nullable	Optional	Description
theKey	`Keys`	✘	✘	The modifier key to release.

Return type: void
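The keyDown and keyUp rules can be sketched as a small state tracker that only accepts modifier keys and keeps them held until they are explicitly released. This is a minimal sketch under assumptions: the `Keys` enum below is a hypothetical subset (the real key values and the set of modifier keys are defined in the previous section), and `ModifierState` is an illustrative name.

```python
from enum import Enum


class Keys(Enum):
    # Hypothetical subset; the actual key values and the set of modifier
    # keys are defined in the previous section, not here.
    SHIFT = "SHIFT"
    CONTROL = "CONTROL"
    ALT = "ALT"
    META = "META"
    ENTER = "ENTER"


MODIFIERS = frozenset({Keys.SHIFT, Keys.CONTROL, Keys.ALT, Keys.META})


class ModifierState:
    """Tracks which modifier keys are currently held down (hypothetical)."""

    def __init__(self):
        self.held = set()

    def key_down(self, key):
        if key not in MODIFIERS:
            raise ValueError(f"{key.name} is not a modifier key")
        self.held.add(key)  # stays pressed for subsequent interactions

    def key_up(self, key):
        if key not in MODIFIERS:
            raise ValueError(f"{key.name} is not a modifier key")
        # Releasing a key that is not held is undefined by this specification;
        # this sketch simply ignores it.
        self.held.discard(key)


state = ModifierState()
state.key_down(Keys.SHIFT)
state.key_down(Keys.CONTROL)
state.key_up(Keys.SHIFT)
print(sorted(k.name for k in state.held))  # ['CONTROL']
```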

moveByOffset

Moves the mouse cursor from its current position by the given offset. Either offset may be negative. If the end coordinates are outside the viewport, the viewport should be scrolled to bring them into view.

Parameter	Type	Nullable	Optional	Description
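xOffset	`int`	✘	✘	Horizontal offset from the current mouse position.
yOffset	`int`	✘	✘	Vertical offset from the current mouse position.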

Return type: void
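A minimal sketch of moveByOffset, assuming the mouse position is kept in page coordinates and the viewport is described by its top-left corner and size. The `Viewport` and `Pointer` classes are hypothetical. The method applies a possibly negative offset to the persistent position and reports whether the new position lies outside the viewport, which is when a scroll would be needed.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    # Assumed: page coordinates of the visible area's top-left corner and its size.
    left: int
    top: int
    width: int
    height: int

    def contains(self, x, y):
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


@dataclass
class Pointer:
    """Current mouse position in page coordinates; persists between commands."""
    x: int = 0
    y: int = 0

    def move_by_offset(self, dx, dy, viewport):
        """Apply a (possibly negative) offset; return True if a scroll is needed."""
        self.x += dx
        self.y += dy
        return not viewport.contains(self.x, self.y)


vp = Viewport(left=0, top=0, width=800, height=600)
p = Pointer(100, 100)
print(p.move_by_offset(-50, 20, vp))  # False: (50, 120) is already visible
print(p.move_by_offset(0, 700, vp))   # True: (50, 820) is below the viewport
```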

moveToElement

Moves the mouse cursor to the middle of toElement, after first scrolling toElement into view if it is outside the viewport. The middle of the element is calculated using getBoundingClientRect().

Parameter	Type	Nullable	Optional	Description
toElement	`WebElement`	✘	✘	The element to move the mouse cursor to.

Return type: void
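The middle of an element follows directly from the rectangle returned by getBoundingClientRect(): half the width from its left edge and half the height from its top edge. The sketch below models only the relevant DOMRect fields in a hypothetical `ClientRect` class. Because these values are relative to the viewport, the result is in viewport coordinates; no rounding is applied, since none is specified here.

```python
from dataclasses import dataclass


@dataclass
class ClientRect:
    """The DOMRect fields from getBoundingClientRect() used here, in CSS pixels."""
    left: float
    top: float
    width: float
    height: float


def element_middle(rect):
    """Return the (x, y) middle of an element's bounding client rect."""
    return (rect.left + rect.width / 2, rect.top + rect.height / 2)


print(element_middle(ClientRect(left=40, top=100, width=200, height=30)))  # (140.0, 115.0)
```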

release

Releases the previously held left mouse button at the current mouse location.

No parameters.

Return type: void

sendKeys

Sends a sequence of keys to the active element.

Parameter	Type	Nullable	Optional	Description
keysToSend	`string[]`	✘	✘	The sequence of keys to send to the active element.

Return type: void

The following conditions must hold for implementations of this high-level API:

Modifier keys are never implicitly released, once pressed.
To release all currently-held modifier keys, Keys.NULL must be used.
The results of using this API to generate sequences of actions that are not logically possible (such as releasing a modifier key twice) are undefined.
The current mouse position must persist for the lifetime of the browser. Because interaction commands are not batched together, one command should be able to rely on the mouse position set by another command.
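The modifier-key conditions above can be illustrated by replaying a key sequence: a modifier, once pressed, stays held across characters and across commands until Keys.NULL releases all of them. This is a minimal sketch under assumptions: the `Keys` enum is a hypothetical subset of the special keys defined earlier, `send_keys` is an illustrative pure function rather than the sendKeys method itself, and the input is a mixed list of `Keys` members and plain strings instead of the `string[]` of the interface.

```python
from enum import Enum


class Keys(Enum):
    # Hypothetical subset of the special keys defined earlier in this specification.
    NULL = "NULL"
    SHIFT = "SHIFT"
    CONTROL = "CONTROL"


MODIFIERS = {Keys.SHIFT, Keys.CONTROL}


def send_keys(keys_to_send, held=None):
    """Replay a key sequence, returning (typed, still_held).

    `typed` pairs each ordinary character with the modifiers held when it was
    typed. Modifiers are never released implicitly; only Keys.NULL clears
    them. The returned set carries held modifiers over to the next command.
    """
    held = set(held or ())
    typed = []
    for item in keys_to_send:
        if item is Keys.NULL:
            held.clear()    # release every currently held modifier
        elif item in MODIFIERS:
            held.add(item)  # stays pressed until Keys.NULL
        else:
            for ch in item:  # a plain string may contain several characters
                typed.append((ch, frozenset(held)))
    return typed, held


typed, held = send_keys(["a", Keys.SHIFT, "b", Keys.NULL, "c"])
print(typed, held)
```

Returning the held set, rather than resetting it per call, mirrors the requirement that state set by one command remains visible to the next.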

Mouse movement and scrolling: Despite the implicit nature of this API, some mouse movement actions still cause implicit scrolling. The alternative, providing a separate API to scroll the viewport and requiring the user to call it, would be inconvenient. When moving to an element, the implementation should use the same method of scrolling as is used for WebElement.click(). When moving by an offset, the implementation has more freedom to decide how much to scroll. In both cases, the implementation must not scroll if the target coordinates are already in the viewport.
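For the move-by-offset case, one possible policy is to scroll by the smallest amount that brings the target point into view, and by nothing when the point is already visible. The sketch below is only one choice the implementation's freedom allows, not a mandated algorithm; the function `scroll_delta` and its page-coordinate parameters are hypothetical. It does not model the element case, which reuses the scrolling behaviour of WebElement.click().

```python
def scroll_delta(x, y, vp_left, vp_top, vp_width, vp_height):
    """Return the smallest (dx, dy) scroll that brings page point (x, y) into view.

    Returns (0, 0) when the point is already inside the viewport, because the
    implementation must not scroll in that case.
    """
    def axis(pos, start, size):
        if pos < start:
            return pos - start                 # scroll back (negative delta)
        if pos >= start + size:
            return pos - (start + size) + 1    # scroll forward just enough
        return 0                               # already visible on this axis

    return axis(x, vp_left, vp_width), axis(y, vp_top, vp_height)


print(scroll_delta(50, 820, 0, 0, 800, 600))  # (0, 221)
print(scroll_delta(10, 10, 0, 0, 800, 600))   # (0, 0)
```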

Note

The methods described in this interface are a minimal set. An implementation may add helper methods for convenience. For example, dragAndDrop(WebElement source, WebElement target) could be a shorthand for calling moveToElement(source), clickAndHold(), moveToElement(target), release().
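A convenience helper of this kind needs nothing beyond the minimal methods. The sketch below composes a drag-and-drop from them. `RecordingActions` is a hypothetical stand-in that only records calls, using the method names from the interface, and the element arguments are placeholder strings rather than real WebElements.

```python
class RecordingActions:
    """Hypothetical stand-in for the Actions interface that records each call."""

    def __init__(self):
        self.calls = []

    def move_to_element(self, element):
        self.calls.append(("moveToElement", element))

    def click_and_hold(self):
        self.calls.append(("clickAndHold",))

    def release(self):
        self.calls.append(("release",))


def drag_and_drop(actions, source, target):
    """Convenience helper built only from the minimal Actions methods."""
    actions.move_to_element(source)  # scrolls source into view if needed
    actions.click_and_hold()         # press the left button on source
    actions.move_to_element(target)  # drag to the middle of target
    actions.release()                # drop


actions = RecordingActions()
drag_and_drop(actions, "source", "target")
print([call[0] for call in actions.calls])
# ['moveToElement', 'clickAndHold', 'moveToElement', 'release']
```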

14.3 Low Level APIs

TODO: Describe the commands for basic control of keyboard and mouse.

14.3.1 Mouse

14.3.2 Keyboard

14.3.2.1 IME

14.3.3 Touch

15. Modal Dialogs

This entire section should be considered non-normative.

The remote end must fail fast if any command is received while a modal dialog is open, unless that command handles the dialog.
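On the remote end, the fail-fast rule can be pictured as a check in front of command dispatch. This is a minimal sketch under assumptions: the exception `UnexpectedModalDialog`, the command names in `DIALOG_COMMANDS`, and the `dispatch` function are all hypothetical, and no particular error code is implied.

```python
class UnexpectedModalDialog(Exception):
    """Hypothetical error raised when a command arrives while a dialog is open."""


# Hypothetical names for commands that handle an open dialog; the real set
# is not defined here.
DIALOG_COMMANDS = frozenset({"acceptAlert", "dismissAlert", "getAlertText", "sendAlertText"})


def dispatch(command, dialog_open, handlers):
    """Fail fast if a dialog is open and the command does not handle it."""
    if dialog_open and command not in DIALOG_COMMANDS:
        raise UnexpectedModalDialog(f"{command} rejected: a modal dialog is open")
    return handlers[command]()


handlers = {"getTitle": lambda: "Example", "dismissAlert": lambda: None}
print(dispatch("getTitle", False, handlers))  # Example
```

Checking before the handler runs means the remote end never blocks waiting on a page that the dialog has frozen.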

WebDriver

W3C Working Draft 10 July 2012
