WebDriver F2F meeting July 2016 -- 13 Jul 2016

<JohnJansen> JohnJansen: present+

<AutomatedTester> chair AutomatedTester

<ato> https://www.w3.org/wiki/WebDriver/2016-July-F2F#Agenda

<ato> Scribe: ato

State of the union

AutomatedTester: We are mostly done with the writing of the specification into a format that is more precise and actionable.

jgraham: Citation needed on that count.

AutomatedTester: But we’ve come a long way.
... We have a client that can be used for testing, which allows us to write tests for the specification.
... It can speak to the HTTPDs that the vendors are producing and shipping.
... As far as I know no one is running the tests yet, but it gives us a good starting point.
... A key part that still needs to be written is the actions API, and that is one of the major topics for discussion today.
... Another is where we are unnecessarily divergent from the open source project.

JohnJansen: The tests, you speak of the Web Platform Tests?

AutomatedTester: Yes.

jimevans: As far as this WG is concerned, they are the only tests that matter.

ato: [elaborates a bit on the new tests in WPT]

jgraham: There are some tests there.

ato: We test the protocol and main loop extensively. It’s test complete.

ClayMartin: I have a new agenda item. Where should I add it?

ato: https://sny.no/2016/05/wdspec

brrian, jimevans: https://github.com/w3c/web-platform-tests/pull/2752

Agenda review

jgraham: We should review the agenda.

jimevans: I think the most important thing to get done is the actions.

Actions

<jgraham> https://github.com/jgraham/webdriver-actions

jgraham: There is some text in the spec that I think is not right.
... There are many implementations that are incompatible, and not conformant to the spec.
... I have written a draft of what I _think_ we were aiming for.

JohnJansen: Is it a PR?

jgraham: It’s two text documents
... (See link above)

<AutomatedTester> https://github.com/jgraham/webdriver-actions/commit/ae33aa579605ee215e8a0b3dc1b2182c3b6de074

ClayMartin: Can you mix and match per action item the input type?

jgraham: The idea is that each “track” can only represent one device type.

ClayMartin: So why does each action item also have a type?

jgraham: It’s a sub-type.

ClayMartin: Where do you specify the parallelisation?

jgraham: If all implementations were perfect they would run top-down and left-right.

brrian: Are the sources implicitly ordered?

jgraham: They are ordered by the natural ordering of the array.

samuong: Should it matter, the order?

jgraham: Yes.

samuong: Within the sequence, the order matters.
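
[Editorial sketch, not from the meeting: the ordering discussed above (one track per device, ticks run top-down and left-right, sources in array order) could look roughly like this. The field names `id`, `type` and `actions` are assumptions made for illustration and are not taken from jgraham's draft.]

```python
from itertools import zip_longest


def ticks(sources):
    """Group per-source action lists into ticks.

    Tick N holds the Nth action of every source, listed in the natural
    order of the `sources` array. Sources that have run out of actions
    are simply absent from later ticks.
    """
    columns = zip_longest(*(source["actions"] for source in sources))
    return [
        [(source["id"], action)
         for source, action in zip(sources, column)
         if action is not None]
        for column in columns
    ]
```

Given a keyboard source with two actions and a pointer source with one, the first tick would contain one action from each device and the second tick only the remaining keyboard action.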

jgraham: For example, you can compress this.
... There’s no real parallelism here.
... Mouse move is interesting.
... An open question.
... Let’s say you have a pointer that’s at point A, then you want to move it to some element.
... It has to move along some path.
... Along that path there will be other elements, and it is not a priori obvious what should happen here.
... The easiest thing to imagine is a mouse. A finger can teleport.
... On the elements you hover over, you might or might not see events.
... I think the implementation should calculate the path at the start of the action and divide it into sub-points. How many there should be is probably implementation-defined, and I don’t know the right number, but we should give a hint: one per requestAnimationFrame, for example.
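
[Editorial sketch, not from the meeting: one simple way to divide a pointer move into sub-points is straight-line interpolation. The function name and the choice of a straight line are assumptions; the number of steps is left to the caller, since the group had not settled on one.]

```python
def interpolate_path(start, end, steps):
    """Return `steps` evenly spaced points from `start` to `end`.

    The start point itself is excluded (the pointer is already there);
    the final point is exactly `end`.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps + 1)
    ]
```

With `steps=1` the pointer goes straight to the target, which corresponds to the "teleport" behaviour mentioned for a finger.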

samuong: What should a tick be?

jgraham: By default a tick happens as fast as you can process it.
... It is possible in the API to specify a pause duration.
... There should be a null device for specifying a pause so that you can spread out a mouse move.

ato: But what if you don’t have a duration?

ClayMartin: Yeah, there should be a default duration.

jgraham: A minimum duration is not a bad idea.
... If you haven’t specified a pause, maybe it should really teleport.

ato: But it’s not something the user would do.

jimevans: But for the open source project users it doesn’t matter.

ClayMartin: If you do the movement for everything, you could introduce intermittents.
... It’s doing the shortest path possible, and the path could trigger something that interrupts the sequence.

jimevans: Some content on the page could interfere with the movement.

JohnJansen: Didn’t we decide to defer some of this stuff?

jgraham: I think touch is actually conceptually easier.

JohnJansen: We don’t want to tell the browsers what to fire.
... We want to describe what the user wants to do.
... And then you expect the browser to fire the right events.

jgraham: There should be a note saying “if you don’t follow these other specs, then you probably shouldn’t try to implement this”.

samuong: pointerDown, pointerUp, pointerDown, pointerUp (double click), is the double click event fired?
... What does "primary" mean here?

jgraham: It’s basically a shorthand way of saying “fire normal mouse events”.

samuong: There was also sub-type, wouldn’t that tell you?

jgraham: It tells you the type of event.
... [explains how the pointer events spec works]

<AutomatedTester> https://w3c.github.io/pointerevents/#

JohnJansen: So we’re going to have a normative reference to this spec.
... Can’t we just delegate to it?

jgraham: Yeah, except it’s very hand wavy.

samuong: We have people in Blink using this for pointer events testing: If we’re not feeding in user input, if we’re just specifying the events, that’s not what we want.

ato: WebDriver is adding additional value here.
... mouseMove, keyboard layout/modifier keys

jimevans: There used to be a mandate that you use a OS level input.
... But we’ve stripped that out.
... I think what we want to say is that we want the browser to react as if this input occurred. This implies that certain DOM events get generated by the browser on certain elements.
... This is difficult, if not impossible, to specify for the reasons jgraham gave earlier.

JohnJansen: Double click could be controlled by the OS.

ato: But these are all emulated, virtual devices.

jgraham: If we implement at the level of DOM events, then the advantages are that it’s consistent across browsers.
... And that we actually know how to write that as a spec.
... It has the disadvantage that if you literally implement the spec, it gives you different behaviour.

samuong: Is anyone implementing this? In JS?

AutomatedTester: We do in Marionette.
... But we generate trusted events, so it’s not content DOM events.

ato: [explains about synthetic events in Gecko]

samuong: We have something similar.

brrian: We generate an appropriate platform event.
... We synthesise events; doing it at a lower level has problems.

ato: FirefoxDriver tried native input too.

jimevans: IEDriver same.

jgraham: There might be a hand-wavy way of doing this. “Generate platform events that eventually causes the following DOM events to be generated”

jimevans: That’s what I meant.

ato: There are three levels here: The spec describes the expected output you should expect in DOM after performing the actions, all the different UAs have different input stacks so we can’t specify that. Instead we have a more general abstraction that describes a more general input approach to this.

samuong: At Google they wanted to test tab completion (?)

jgraham: I think there’s a tension between the features needed to test a browser and those needed to test a content page.

JohnJansen: I want to test the ability to create a new tab, as a browser vendor.

samuong: As a browser vendor having it at the OS level is what they want.

jimevans: New tab/new window is something users want

ClayMartin: [explains how to do UI automation in edge]

jgraham: Why don’t we have a command to open a new window?

<AutomatedTester> scribe automatedtester

<AutomatedTester> ato: let me describe how we do things in Marionette

<AutomatedTester> ato: if you have right-click that will create a context menu

<AutomatedTester> ato: and then we have a command called set_context and switch to browser chrome

<AutomatedTester> ato: we use this context to test addons and Firefox UI testing. Update and localization testing.

<AutomatedTester> ato: we should be careful not to put ourselves into a state that webdriver can't return from

<AutomatedTester> JohnJansen: We have that with EdgeDriver and we want to test addons/chrome

<AutomatedTester> ato: If we describe that stuff is in other specs but as long as the end state is what we expect.

<AutomatedTester> brrian: What if there was a browser flag for cross platform handling?

<AutomatedTester> jgraham: my thinking is for now, this is the algorithm with the events that should be generated, but implementations may inject them at a higher level

<AutomatedTester> jgraham: we expect this to do the "hand waving" thing and try use pointer events.

<AutomatedTester> ato: [reads out http://w3c.github.io/webdriver/webdriver-spec.html#algorithms]

<AutomatedTester> jgraham: we should describe what we think it should do

<AutomatedTester> ClayMartin: there might be an interop bug

<AutomatedTester> jgraham: but then the interop bug is in a different specification and not in webdriver

<AutomatedTester> jgraham: there are reasons where we should describe what events we should do. [explains example with shift+key]

<AutomatedTester> jgraham: going back to mouse movement

<AutomatedTester> jgraham: an interesting implementation detail is how to handle pinch, we should asynchronously do each of the items on each finger interleaved

<AutomatedTester> ato: how would we describe the micromovements?

<AutomatedTester> jgraham: not sure what problems could come up as I haven't written this.

<AutomatedTester> jimevans: for the pointermove event specifically, is there any mileage in adding a duration for how long a tick takes (not a micromovement)?

<AutomatedTester> jimevans: as far as specify, we will need to have a default value for duration

<AutomatedTester> jgraham: we will have a pause(0) which is requestAnimationFrame duration

<AutomatedTester> jimevans: we could add a duration to pointer move and this is where the duration has a meaningful impact

<AutomatedTester> jgraham: and this would be different to pause which is wait for something or pad

<AutomatedTester> ClayMartin: for a move what data would we know? can I see the action

<AutomatedTester> jgraham: The default of the pointer for the start coordinates is 0,0

<AutomatedTester> jgraham: how do we describe coordinates? [draws a box]

<AutomatedTester> jgraham: if you pass in x,y that will be x, y of the viewport

<AutomatedTester> jgraham: if you pass in an element it would be the visible centre of the element

http://w3c.github.io/webdriver/webdriver-spec.html#dfn-pointer-interactable-element

<AutomatedTester> jgraham: if you pass in an element and x,y we should take the top/left of the element and move to x,y from that point
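
[Editorial sketch, not from the meeting: the three coordinate cases described above could be resolved as follows. `Rect` and `resolve_target` are hypothetical names, and the sketch assumes the rectangle passed in already describes the visible part of the element in viewport coordinates.]

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rect:
    """Visible bounding box of an element, in viewport coordinates."""
    left: float
    top: float
    width: float
    height: float


def resolve_target(x: Optional[float] = None,
                   y: Optional[float] = None,
                   element: Optional[Rect] = None) -> Tuple[float, float]:
    """Turn a MoveTo target into viewport coordinates."""
    if element is None:
        # Case 1: bare x, y are viewport coordinates.
        if x is None or y is None:
            raise ValueError("x and y are required without an element")
        return (x, y)
    if x is None and y is None:
        # Case 2: element only, so aim for its centre.
        return (element.left + element.width / 2,
                element.top + element.height / 2)
    # Case 3: element plus offset, measured from the element's top/left.
    return (element.left + (x or 0), element.top + (y or 0))
```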

<AutomatedTester> jimevans: [explains how this is similar to the OSS project]

<AutomatedTester> jgraham: should we do MoveBy or MoveTo?

<AutomatedTester> [group votes for MoveTo]

<AutomatedTester> scribe automatedtester

<AutomatedTester> jgraham: yesterday, Mozillians were discussing the following

<scribe> Scribe: AutomatedTester

<ato> AutomatedTester: (You need the colon.)

ahh: )

jgraham: we wondered if there should be a scroll to an element command for actions

automatedtester: how can we help prevent footguns for users? If there are test beds with 800x600 screens, how can we prevent their tests from randomly breaking?

ato: takeElementScreenshot doesn't screenshot to the element as requested by Microsoft in the past
... part of me is uncertain that we have specialisation in certain commands where we could have a generalisation on all commands for scrolling
... if we have separate command for scrolling, what would we do for normal commands. If we have a separate command we can see actions are more of a pipeline for commands

<JohnJansen> microserf

ato: we can then, in a later version, see about batching other commands via the pipeline

jgraham: if we move to a pipeline we would then need to work out a storage system

ato: we can use this to save on bandwidth between local and remote end points

automatedtester: back to scroll to element

ato: we should not hamper ourselves with our design if we wanted a pipeline later...

jgraham: scrolling is an interesting case, high level does scrolling implicitly

automatedtester: there is an end point from the OSS project called Location in view

JohnJansen: we don't necessarily want to just scroll the element into view, we might want to scroll X pixels

ato: we need to have the implicit scroll in high level

<JohnJansen> lukeis: yes, but I'd like to accomplish it without requiring any script execution.

jgraham: if I wanted a 2 finger touch, there is currently no way to make sure the elements are in view. I would need to run script and then do the actions

ClayMartin: Should we fire scroll event when moving element into view?

automatedtester: that is a different question. We need to see if we want the command before we see what events we want to do

ClayMartin: in touch we don't do mouse scroll

automatedtester: on those we are going to do flick type events

jimevans: I recognise the usefulness of an action for scrolling, is it something that we can add in L2?

jgraham: yes

JohnJansen: implicitly scroll?

jimevans: no, we defer to L2

yup: )

brrian: what would happen in interweaved actions and a scroll?

jgraham: [Described what could happen in that scenario]

ato: The document could do items to the page which could break things

samuong: in ChromeDriver we get all the coords at the beginning of the action so scrolling could cause issues
... should we get the coordinates at the beginning of the tick?

jgraham: yes, you could have a pinch zoom that would move things

RESOLUTION: defer scroll to L2

JohnJansen: high level commands can not be described in low level commands

jgraham: after the tick finished, should we add an event to the event loop? (postMessage or setTimeout) or it waits for an animation frame?

brrian: I want it so that it will yield to the event loop
... if it's a timer or requestAnimationFrame...

jgraham: vertical items in the actions should be done as fast as possible and then a vsync for the next horizontal item

samuong: what about dialogs? We stop the event loop if the alert happens

ato: it might cycle for the current vertical
... should we check for the dialogs at the beginning of each action?

jgraham: suppose I have 2 key press events, if the first one causes the alert we can't check for a dialog because we don't know whether the 2nd item has been processed
... because the event loop is paused and then you are...

samuong: if we are putting a lot of checks on each check then it could cause issues with the processing of the event loop
... if we have to return to the user we could take longer than the tick was supposed to happen

jgraham: [describing how, if an alert appears, we block the event loop]

ato: If I were to implement it in Marionette, we have a user prompt service that we can use, and then do the alert and check a global state

jgraham: you would need to check that state before moving on to the next tick and then dismiss the dialog before moving to the next tick
... we inject [keydown, pointerdown] into the event loop. We have to do something to say do the next tick e.g. setTimeout
... if we do [keydown, pointerdown, setTimeout] and the pointerdown causes the alert, we can't reach the setTimeout

ato: [writes pseudo code on whiteboard]
... for (let action of tick) { event.sendSyntheticpointerdown(action)} yield content.executeScript("window.setTimeout")

jgraham: but we can't reach the executeScript
... the pointerDown will cause the timeout to never be reached
... to deal with alerts, you would need to have an event (a non content event that isn't blocked by the event loop)
... the difference between this and what is in the spec. We can't always check in actions

ato: if I were implementing it, if this was in a thread I would check during it and then shut down the thread and abort processing the following steps

jimevans: at any point of the action sequence, we need to make sure we don't hang the driver
... how it is handled is totally up to the person writing webdriver code

jgraham: if the page injects something that does an alert it would be good to remove the current writing

ato: we need to have something that keeps track that the dialog has appeared

jimevans: bottom line, you either expect the dialog or not
... I am clicking on a button that will create an alert or not

jgraham: we should have a basic bit of text how to handle alerts not on each command.

ato: no, we need it on all commands
... there are special cases where we don't want to check for alerts

RESOLUTION: for commands that spin the event loop, prompt handling should be invoked if an alert appears at any time
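
[Editorial sketch, not from the meeting: one reading of the discussion above is that the driver checks for a dialog at tick boundaries, from outside the content event loop, and stops dispatching once prompt handling has run. All four names below are hypothetical, and whether to abort or continue after handling the prompt was not settled in the room; this sketch aborts.]

```python
def perform(ticks, dispatch, dialog_is_open, handle_prompt):
    """Dispatch ticks in order, checking for a dialog after each one.

    Returns the number of ticks that were dispatched. The dialog check
    is made by the driver itself, not by content script, so it still
    runs when an alert has blocked the page's event loop.
    """
    dispatched = 0
    for tick in ticks:
        for action in tick:
            dispatch(action)
        dispatched += 1
        if dialog_is_open():
            # Assumption: hand over to prompt handling and stop here.
            handle_prompt()
            break
    return dispatched
```

Note that this cannot tell which action inside a tick opened the dialog, which is the limitation jgraham describes for the [keydown, pointerdown] case.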

samuong: Should we remove element references and only use X, Y coords in actions?

jimevans: we need to be wary of backwards compat

jgraham: I prefer the idea of checking the coords before the start of the tick

ato: there might be a race condition or the other actions have meant to move it.

keyboards

JohnJansen: does this really matter?

jgraham: what is the keycode you get when you pass in a key? The options are:
... option 1: hard-code a 104-key keyboard and just use that keyboard.
... option 2: use the keyboard attached, but in a test farm there are not always keyboards
... option 3: defaults to a US keyboard but the ability to change

ato: we could add a new command, "setThisKeyboard"

jgraham: option 4: set a per session state for the keyboard

JohnJansen: for L1 we default to 104 US QWERTY keyboard

brrian: how would I input Japanese into a page

jgraham: that works, currently you send through unicode code points. We may not do the right keycode and they might be using IME

brrian: do we do it as a string?

jgraham: no, per unicode code point

brrian: not all Japanese strings are divisible into code points
... if you deliver it in some OSes like this, it might not handle this properly. There are dead keys that only activate when you press the next letter
... safaridriver uses grapheme cluster boundaries
... I would like the spec to say to split on these boundaries

ato: what would you do with ü?

brrian: We would send it as 2 code points

jgraham: what happens in the DOM for keycodes
... do you get 2 events or 1?

brrian: you get 1

[ato and jgraham discussing example of dead keys]

<ato> AutomatedTester: This key has passed on.

<ato> AutomatedTester: It is deceased.

<ato> AutomatedTester: It is no more.

<jgraham> http://unixpapa.com/js/testkey.html

ato: the keyUp is registered for the umlaut but not the keypress

<brrian> http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules

brrian: it would be better if the spec split on grapheme clusters

ato: I am unclear on the benefit here. Would the client split on grapheme cluster?

jgraham: no, you would send the unicode string in a decomposed form. e.g. "u.
... you would send over 2 code points
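
[Editorial sketch, not from the meeting: the "decomposed form" mentioned above can be illustrated with Unicode normalisation. In NFD, ü becomes the base letter followed by a combining diaeresis, i.e. two code points forming one grapheme cluster. The sketch only shows the decomposition; it does not attempt grapheme cluster segmentation.]

```python
import unicodedata

composed = "\u00fc"  # ü as a single precomposed code point
decomposed = unicodedata.normalize("NFD", composed)

# One user-perceived character, but two code points once decomposed.
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]
print(len(composed), len(decomposed), code_points)
```

A remote end that dispatches one key event per code point would therefore see two inputs for this string, whereas splitting on grapheme cluster boundaries would treat it as one.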

ato: this is different to how everyone is currently doing it

jgraham: I think we should investigate more and discuss again in Lisbon

<scribe> ACTION: brrian to come up with more use cases for splitting [recorded in http://www.w3.org/2016/07/13-webdriver-minutes.html#action01]

jgraham: for actions, and maybe sendKeys, what should [shift, a ] do?

ato: it should be an A
... should this only happen on sendKeys or should it work with Actions?

jgraham: there are 2 options
... 1) if you have modifier key pressed you get the next letter
... 2) if you can't get a char with a modifier, e.g. the modifier is pressed but we release it, do the other char, and then do the modifier

JohnJansen: [does an example with caps lock]

jimevans: for sendKeys, and only with shift modifier, you would send the string you wanted, and it should be string as is
... if you send shift + 1 results in !. In sendKeys we implicitly release the modifier
... [looks up some data in the OSS code base]
... the current OSS version does have a sendKeys actions end point

ato: what would happen in the current clients to handle this against the spec version

jimevans: most Selenium users are using A and not shift + a

ato: the language bindings might have to do a lot of extra work here

jimevans: that is fine, we are doing a non-trivial amount of work here
... I am not worried about backwards compat here in actions because it is "Do what I say". People will want to be combining keys where in sendKeys they will send the result

RESOLUTION: for keyUp/keyDown actions we won't do implicit conversion of shift. e.g. for shift + 7 we do 7 with shift modifier set
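
[Editorial sketch, not from the meeting: the resolution above could be read as "track modifier state, but never rewrite the key". The action shape and the `shiftKey` field name are assumptions chosen for illustration.]

```python
MODIFIERS = {"Shift", "Control", "Alt", "Meta"}


def key_events(actions):
    """Map keyDown/keyUp actions to events without implicit conversion.

    The key value is passed through untouched; only the modifier state
    is recorded, so Shift followed by "7" stays "7" rather than "&".
    """
    pressed = set()
    events = []
    for action in actions:
        kind, key = action["type"], action["value"]
        if key in MODIFIERS:
            if kind == "keyDown":
                pressed.add(key)
            else:
                pressed.discard(key)
        events.append(
            {"type": kind, "key": key, "shiftKey": "Shift" in pressed}
        )
    return events
```

This is the "do what I say" behaviour jimevans describes for actions, in contrast to sendKeys, where the user sends the resulting string.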

<samuong> scribe: samuong

new session

jgraham: want to have top-level capabilities for non-feature-matrix capabilities
... current api is designed for sauce labs/google-style use case with grid of test machines
... however for many use cases, having desired and required capabilities is confusing
... can we simplify this, can we standardize new session data?

ato: intermediary could handle selecting a host from the pool with correct features, but driver doesn't need to worry about matrix selection

<simonstewart> “I have opinions"

ato: e.g. proxy settings doesn't make sense as a "desired" capability

<simonstewart> ato is correct

<simonstewart> Until then :)

<simonstewart> The “new session” data is what’s required to successfully set up the browser instance

<simonstewart> So the proxy is required

<simonstewart> Whether it’s honoured or not is a different thing entirely

<simonstewart> Which is why they’re “desired” capabilities

<simonstewart> (services such as Sauce Labs and Browser Stack may choose to ignore the setting, for example)

<simonstewart> My personal view?

<simonstewart> We need a minimum set of routing data (browser, OS, version numbers) for intermediary nodes

<simonstewart> And then each browser can figure out what it wants to do with the data

<simonstewart> We also need to support multiple “profiles” (to use Mozilla’s term) in the same new session request

ato: distinction between desired and required capabilities is a CI-level concept that might be out-of-scope for the spec

<simonstewart> I disagree

<simonstewart> “desired” == optional, and can be ignored

<simonstewart> “required” == must be set, or fail the new session

<simonstewart> Is there a facetime audio number I can call into?

<simonstewart> Or a Hangout?

<simonstewart> The difference between the two is within scope for the spec

<simonstewart> And also what the current drivers do

<simonstewart> So we’re just speccing existing behaviour

<simonstewart> Which is good, right?

<simonstewart> Whoever is scribing, please continue to do so

jgraham: is that we should discuss this at lisbon

<simonstewart> That sentence makes no sense?

<simonstewart> We should discuss “new session” in Lisbon?

yes

<simonstewart> Thanks :)

<JohnJansen> ok, let's kill this. we need to discuss in lisbon with Simon

<simonstewart> I can dial in, if there’s a number I can call over wifi without SIP

<simonstewart> Or skype. Nothing I have has skype in :)

ato: there was some discussion about having a capability that gets returned to clients to allow feature detection for w3c features

jimevans: simon's email lays out the handshake
... this is consistent with how current bindings do it

<simonstewart> Feature detection is definitely the way forward

ato: only c#, not node.js

<simonstewart> The JS mob got into a horrible mess with assuming certain capabilities based on version numbers

ato: marionette returns a capability with marionette=true

jimevans: this probably isn't how it should be done

<simonstewart> I’m looking at Opera here in particular

<simonstewart> marionette saying it’s marionette is cool

<simonstewart> I’d expect browsers to report themselves

<simonstewart> Additional metadata is fine

jimevans: you can construct new session command that is valid for both dialects

<simonstewart> But saying “I’m level 1 compliant” is incredibly dangerous

jimevans: the response should tell you what dialect to continue speaking
... this can be done by looking at the status field

<simonstewart> for reference: https://lists.w3.org/Archives/Public/public-browser-tools-testing/2016JulSep/0001.html

<simonstewart> Search that for “handshake"

<simonstewart> Hahaha :)

<ClayMartin> https://www.irccloud.com/pastebin/FnoqIzbO/s4b

<simonstewart> Hang on. I need to download skype

<ClayMartin> you should be able to just call the phone number

<ClayMartin> or go here

<ClayMartin> https://join.microsoft.com/meet/clmartin/IA7XCEF4

<ClayMartin> that will join you in the browser

<ClayMartin> Skype for Business != Skype so don't bother downloading that I believe

<simonstewart> I’m not allowed to install the plugin for all users, and it won’t install for just me

<simonstewart> Switching to Chrome

<brrian> :|

<simonstewart> I need to install the plugin as root for my local user account

<simonstewart> JohnJansen: I have a bug report for you

<JohnJansen> i was against this from the start

<JohnJansen> :-)

<brrian> I can verify

<simonstewart> “The organiser will let you in soon"

<simonstewart> Apparently

<simonstewart> :)

<JohnJansen> simonstewart: are you on mute?

<simonstewart> I’m speaking

<simonstewart> Can you see me?

<jgraham> simonstewart: We can hear you now

<simonstewart> Can you hear me as I speak?

<jgraham> simonstewart: No, so there is some terrible hack going on here

<simonstewart> Ok

<simonstewart> https://www.youtube.com/watch?v=htobTBlCvUU

simonstewart: let's discuss this now instead of in lisbon

johnjansen: conversation so far is that this is configuration we want to set that's not related to the browser we get back

simonstewart: if you've got safari, proxy is set at os level, firefox is at browser level, edge is os-level
... some things that are browser-specific, some things are os-specific
... for every case, it's a case of "i want a session that fits in these parameters"
... if local end requests (e.g.) IE on linux, remote end can give back IE on windows
... response from new session command is the set of capabilities you've got (not what you asked for)
... in open-source project, proxy has been omitted, since it's hard for remote ends to sniff proxy settings
... but some tests might absolutely require certain proxies, otherwise session is not useful
... need to find balance, e.g. can't serialize entire browser profile and send back

jgraham: from my pov, makes sense to talk about desired/required capabilities, in terms of keys that you can have in new session command
... not clear that spec has to specify what intermediary nodes should do with that
... if nodes want to be compatible with each other, they should have a separate shorter spec

<simonstewart> I am listening jgraham

<simonstewart> Keep going, please :)

jgraham: this is the only thing that's specific to intermediary nodes
... current spec just says what capabilities are there, doesn't say how to resolve between desired/required capabilities

<simonstewart> Give me a signal when I should start replying

jgraham: this is a legitimate desire, but it's not useful to have a desired/required distinction for configuration that gets sent to browsers
... don't want to have to distinguish this in gecko driver

simonstewart: idea behind new session command is that the request is the allocation of the resource
... originally, everything was a desired capability
... and then users would inspect those capabilities and fail the test if the capabilities don't meet requirements
... it turned out that people believed that "desired" meant "required", which is unfortunate
... so some googlers (jleyba) pushed for required capabilities
... browser name is a good candidate for being in required capabilities
... preferences, proxies, etc. are good candidates for being in desired capabilities
... capabilities (local end -> remote end) is a list of requests

jgraham: i agree this is what the current system is, but this pushes complexity onto gecko driver and other drivers
... e.g. gecko driver needs to have code to handle the binary

simonstewart: this is simple, the binary is either there or not

jgraham: but we need different code depending on whether its desired or required

simonstewart: are there any other capabilities we can use as an example?

ato: we should only treat browser name, version, platform specially
... according to spec, we need to create a "third" capabilities object

simonstewart: binary is a browser-specific feature, it's common but not global (e.g. on android we need an android package)

jgraham: it should be interoperable for each of those common cases

ato: chrome has chromeoptions, gecko driver has something similar, no reason why chrome and firefox couldn't share some of those keys

simonstewart: user could request "a browser on windows"

claymartin: drivers and browsers have a one-to-one relationship
... selenium should handle desired/required capabilities, i don't get why servers should have to care about it
... why can't selenium handle the complexity, and only pass the needed capabilities?

simonstewart: because existing intermediate nodes would then need to track versions and features
... intermediary nodes need a base set of capabilities to do routing (which is already in spec)
... (browser name, version, os version)
... we also should have a way to specify ranges for versions
... could do a translation in the intermediary node, but then this would make them very complicated, and prone to breakage
... this limits browser vendors' ability to innovate and experiment

jgraham: i agree, but don't think that browser-specific information should end up in capabilities
... current system makes it hard to specify binary path depending on os

simonstewart: that's true, we need to allow differentiation at the os-level

jgraham: current design is poor

simonstewart: we shouldn't redesign this because there's a very large existing userbase, and we don't want to cause unnecessary churn
... it's hard for users to do updates, they've spent a lot of time and money to build webdriver tests
... we should keep changes to a minimum, although we can do tweaks
... e.g. we should allow specifying multiple version numbers
... intermediary nodes shouldn't need to care, they just need to select a vm or host to run on
... then ie driver, chromedriver, etc. can have their own config

jgraham: i'm not proposing we change anything with routing
... change would only be in the clients

simonstewart: but there are many clients

jgraham: but they have to update anyway, to be spec compliant
... my proposal is that we should have a set of keys that browsers agree on, so that it's easier to implement remote ends
... if it is necessary to have version-dependent fields (e.g. set a certain preference for firefox on linux version X), should be able to express that

simonstewart: but why can't this be separated into desired/required capabilities?

johnjansen: did we lose you?

<simonstewart> Can you still hear me?

<jgraham> No

<ato> simonstewart: I think the connection dropped.

<simonstewart> I was “removed from the meeting”

<brrian> simonstewart: skype became very cross and hung itself up

<simonstewart> Joining again

<simonstewart> Can you hear me?

<simonstewart> Of course, it always starts muted

<simonstewart> Can you hear me now?

jgraham: my point is that for things that aren't about matrix-selection, desired vs required requires extra code
... it's not clear because this isn't spec'd

simonstewart: spec should just say that if a required capability is not met, then it should fail

jgraham: i don't want to implement that

ato: existing drivers conflate desired and required capabilities

simonstewart: we should have it in the spec that new session should fail if a required capability is not met

jgraham: if i can't start the binary, and the binary is "desired", what should it do?

<ClayMartin> This is Sam speaking now

samuong: what if firefox driver gets a desired binary that points to a chrome binary

<ato> jimevans: (you are right)

simonstewart: if a driver doesn't know how to handle a desired capability, it's ok
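
The behaviour simonstewart describes (fail new session on an unmet required capability, tolerate unknown or unmet desired ones) could be sketched roughly as follows. This is only an illustration of that position, not spec text: `process_capabilities`, the `supported` table, `SessionNotCreated`, and the `exampleFlag` capability are hypothetical names invented for the example.

```python
# Hypothetical sketch of the required/desired split discussed above.

class SessionNotCreated(Exception):
    """Raised when a required capability cannot be satisfied."""


def process_capabilities(required, desired, supported):
    """Return the capabilities to apply, or raise if a required one is unmet.

    `supported` maps a capability name to a predicate saying whether this
    (hypothetical) driver can honour the requested value.
    """
    applied = {}
    for name, value in required.items():
        check = supported.get(name)
        if check is None or not check(value):
            raise SessionNotCreated(f"required capability not met: {name}")
        applied[name] = value
    for name, value in desired.items():
        check = supported.get(name)
        # Unknown or unsatisfiable desired capabilities are skipped, not fatal.
        if check is not None and check(value) and name not in applied:
            applied[name] = value
    return applied


supported = {
    "browserName": lambda v: v == "firefox",
    "exampleFlag": lambda v: isinstance(v, bool),
}

print(process_capabilities(
    {"browserName": "firefox"},
    {"unknownThing": 1, "exampleFlag": True},
    supported,
))
```

The extra branch for desired capabilities is the kind of per-driver code jgraham objects to having to implement.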

jgraham: this pushes complexity into the drivers

simonstewart: the other way pushes complexity into the clients
... we only need to specify browser name, version, etc. (to allow routing)

... and this should be specified

<ClayMartin> browser name, browser version, platform name, platform version

simonstewart: don't want to require intermediary nodes to do processing, they should pass through payloads without interpretation
... otherwise we need to specify how to transform blobs of data for keys that we don't know about
... blob of data from local end should make it to remote end?

jgraham: i don't disagree, but we disagree about how that blob should be structured

jimevans: the stuff that an intermediary node cares about is encapsulated in desired/required capabilities
... the stuff that a terminal node cares about is also encapsulated there, but i think james is saying it shouldn't be

simonstewart: my position is the opposite - you either have something that is optional, or it's mandated

jgraham: we don't do this now
... who implements this?

samuong: chromedriver doesn't

claymartin: i'd have to check

jimevans: ie driver doesn't

ato: distinction doesn't make sense - e.g. for the binary, either it runs or it doesn't

simonstewart: then it's not required
... if it's required and you can't run the binary, it should fail

jimevans: if path to binary is a desired capability, and the path doesn't exist, should we have a "reasonable default"?
... this would be inferred from the browser, version, platform
... but james is saying he doesn't want to have to implement that logic

brrian: we have use cases where we test different safari binaries
... if we get a required binary and it isn't the right version, it'll fail
... we don't use intermediate nodes, so we can't say "find me a node that has this binary"

simonstewart: intermediary node needs to look at browser name and platform, and spin up the appropriate node
... it's up to the intermediary node to decide when a set of required capabilities is overly specific

brrian: clarifying use case: no intermediary node, want to test stable and beta version of browser
... encode version number, binary for browser, in required capabilities

simonstewart: so it's perfectly legitimate if the new session command fails if the binary doesn't exist

jgraham: that's how firefox works too - it doesn't go and find a different binary to the one that was desired
... does anyone care about this use case?

simonstewart: example use case is when a grid auto-scales out to aws or another provider, and binaries are not in standard locations
... local ends need to sniff capabilities, and make sure that browser name and version meet the requirements, and fail if they don't
... current open source implementation doesn't work very well if you need to sniff out proxy, for example

claymartin: edge driver does fail if required caps are not met, but will continue if desired caps aren't met

jgraham: if someone requires an extension in edge, and it fails, what happens?

johnjansen: it would fail to create a session

automatedtester: if you pass in a binary location for edge (desired) and the location doesn't exist on endpoint, what happens?

simonstewart: if binary is desired but not present, then it falls back to other methods of finding the binary (e.g. browser name, browser version, platform name, platform version)

jimevans: or it could look in the registry for edge's location, or something
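
The fallback just described (use the requested binary if it exists, fail if it was required, otherwise fall back to other ways of locating the browser) might look something like this. It is a sketch under assumptions: `resolve_binary`, `default_locations`, and the paths are invented for illustration, and a real driver might consult the registry or other sources instead of a fixed list.

```python
# Hypothetical sketch of binary lookup with a fallback for "desired" paths.
import os


class SessionNotCreated(Exception):
    """Raised when no usable browser binary can be found."""


def resolve_binary(path, required, default_locations, exists=os.path.exists):
    """Pick the browser binary to launch.

    path: binary location sent by the client, or None.
    required: True if the client marked the path as required.
    default_locations: places this (hypothetical) driver would look otherwise.
    exists: injectable file check, so the logic can be exercised without disk.
    """
    if path is not None and exists(path):
        return path
    if required:
        # A required binary that cannot be used fails session creation.
        raise SessionNotCreated(f"required binary not found: {path}")
    # A missing desired binary is not fatal: fall back to known locations.
    for candidate in default_locations:
        if exists(candidate):
            return candidate
    raise SessionNotCreated("no usable browser binary found")


def fake_exists(p):
    # Pretend only one binary is installed on this machine.
    return p == "/opt/browser/stable"


print(resolve_binary("/missing/beta", False, ["/opt/browser/stable"], fake_exists))
```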

simonstewart: missing desires are not such a big deal

(simonstewart has an analogy about quitting work and hang gliding, and the nature of desire, but claims this isn't deeply philosophical)

jgraham: not sure this conversation is productive

<brrian> samuong++

johnjansen: the other thing we wanted to talk about is handshakes, versioning

now we're talking about the handshake described in https://lists.w3.org/Archives/Public/public-browser-tools-testing/2016JulSep/0001.html

jimevans: in a successful session creation, open source protocol has an integer status

ato: so in your c# code, it should check for that field, right?

jimevans: yes

simonstewart: if you are w3c compliant, you only send w3c responses, remote end should respond appropriately
... from a spec point of view, we shouldn't have to care about this, we just assume everyone's w3c-compliant
... there's enough info in the new session request and the response to determine whether we're speaking the open source dialect or the w3c protocol

jimevans: w3c doesn't send an integer status code in the response
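
A local end could use that difference to guess which dialect the remote end speaks, roughly as below. This is a sketch only: `detect_dialect` and the two sample response bodies are assumptions made for illustration, and the only rule it encodes is the one stated above (an integer status field indicates the open source protocol).

```python
# Hypothetical sketch: tell the two dialects apart from a new session response.
import json


def detect_dialect(new_session_body):
    """Guess the dialect from a successful new session response body."""
    payload = json.loads(new_session_body)
    status = payload.get("status")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(status, int) and not isinstance(status, bool):
        return "open-source"
    return "w3c"


# Illustrative bodies only; the exact response shapes are assumptions.
legacy_body = '{"status": 0, "sessionId": "abc", "value": {}}'
w3c_body = '{"value": {"sessionId": "abc", "capabilities": {}}}'

print(detect_dialect(legacy_body))
print(detect_dialect(w3c_body))
```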

ato: ok

johnjansen: we don't need to do anything

simonstewart: yes, we don't need to modify spec, only changes are in open source code

RRSAgent: please draft the minutes

RRSAgent: please track the action items

RRSAgent: please track action items

RRSAgent: stop
