See also: IRC log
<ato> Excellent question!
<ato> https://www.w3.org/2002/03/RRSAgent
<ato> Meeting: WebDriver F2F meeting July 2016
<ato> MikeSmith: Why would I not have access to https://www.w3.org/2016/07/13-webdriver-minutes.html,access?
<JohnJansen> hey lukeis
<ato> MikeSmith: nm, solved it!
<JohnJansen> JohnJansen: present+
<AutomatedTester> Chair: AutomatedTester
<ato> https://www.w3.org/wiki/WebDriver/2016-July-F2F#Agenda
<ato> Scribe: ato
AutomatedTester: We are mostly done with the writing of the specification into a format that is more precise and actionable.
jgraham: Citation needed on that count.
AutomatedTester: But we’ve come a long way.
... We have a client that can be used for testing, which allows
us to write tests for the specification.
... It can speak to the HTTPDs that the vendors are producing
and shipping.
... As far as I know no one is running the tests yet, but it
gives us a good starting point.
... A key part that still needs to be written is the actions API, and
that is one of the major topics for discussion today.
... Where are we unnecessarily divergent from the open source
project?
JohnJansen: The tests, you speak of the Web Platform Tests?
AutomatedTester: Yes.
jimevans: As far as this WG is concerned, they are the only tests that matter.
ato: [elaborates a bit on the new tests in WPT]
jgraham: There are some tests there.
ato: We test the protocol and main loop extensively. It’s test complete.
ClayMartin: I have a new agenda item. Where should I add it?
ato: https://sny.no/2016/05/wdspec
brrian, jimevans: https://github.com/w3c/web-platform-tests/pull/2752
jgraham: We should review the agenda.
jimevans: I think the most important thing to get done is the actions.
<jgraham> https://github.com/jgraham/webdriver-actions
jgraham: There is some text in the spec that I think is not right.
... There are many implementations that are incompatible, and
not conformant to the spec.
... I have written a draft of what I _think_ we were aiming
for.
JohnJansen: Is it a PR?
jgraham: It’s two text documents.
... (See link above)
<AutomatedTester> https://github.com/jgraham/webdriver-actions
<AutomatedTester> https://github.com/jgraham/webdriver-actions/commit/ae33aa579605ee215e8a0b3dc1b2182c3b6de074
ClayMartin: Can you mix and match per action item the input type?
jgraham: The idea is that each “track” can only represent one device type.
ClayMartin: So why does each action item also have a type?
jgraham: It’s a sub-type.
ClayMartin: Where do you specify the parallelisation?
jgraham: If all implementations were perfect they would run top-down and left-right.
brrian: Are the sources implicitly ordered?
jgraham: They are ordered by the natural ordering of the array.
samuong: Should it matter, the order?
jgraham: Yes.
samuong: Within the sequence, the order matters.
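A minimal sketch of the ordering described above, assuming each "track" is one input source with a single device type and an ordered list of actions, and that tick N consists of the Nth action from every track. The dictionary layout, the source ids, and the helper names `ticks` and `dispatch_order` are hypothetical and are not taken from the draft.

```python
from itertools import zip_longest

# Hypothetical shape: one track per input source; each track has exactly one
# device type, and its actions carry a sub-type in their own "type" field.
tracks = [
    {"id": "keyboard1", "type": "key",
     "actions": [{"type": "keyDown", "value": "a"},
                 {"type": "keyUp", "value": "a"}]},
    {"id": "mouse1", "type": "pointer",
     "actions": [{"type": "pointerDown", "button": 0},
                 {"type": "pointerUp", "button": 0},
                 {"type": "pause", "duration": 0}]},
]


def ticks(tracks):
    """Yield one list of (source id, action) pairs per tick.

    Tick N holds the Nth action of every track, in the natural order of
    the tracks array. Tracks that have run out contribute nothing.
    """
    for column in zip_longest(*(t["actions"] for t in tracks)):
        yield [(t["id"], action)
               for t, action in zip(tracks, column)
               if action is not None]


def dispatch_order(tracks):
    """Flatten the ticks: every source within a tick, then the next tick."""
    return [(source, action["type"])
            for tick in ticks(tracks)
            for source, action in tick]
```

Calling `dispatch_order(tracks)` lists the keyboard action before the pointer action within each tick, which is why the order of the sources array matters.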
jgraham: For example, you can compress this.
... There’s no real parallelism here.
... Mouse move is interesting.
... An open question.
... Let’s say you have a pointer that’s at point A, then you
want to move it to some element.
... It has to move along some path.
... Along that path there will be other elements, and it is not
a priori obvious what should happen here.
... The easiest thing to imagine is a mouse. A finger can
teleport.
... On the elements you hover over, you might or might not see
events.
... I think the implementation should calculate the path at the
start of the action, divide it into sub-points that are
probably implementation defined, and how many there should be I
don’t know; but we should give a hint. One per
requestAnimationFrame, for example.
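A rough sketch of dividing a pointer move into sub-points, assuming a straight-line path and a fixed frame interval. The function names, the linear interpolation, and the 16 ms default are illustrative assumptions; the discussion leaves the number of sub-points implementation defined.

```python
def steps_for(duration_ms, frame_ms=16):
    """Number of sub-points for a move: roughly one per animation frame.

    Assumption: whole milliseconds and a fixed frame interval. A move with
    no duration still produces one step (it jumps straight to the target).
    """
    return max(1, duration_ms // frame_ms)


def path_points(start, end, steps):
    """Return `steps` evenly spaced points ending exactly at `end`.

    The path is computed once, up front, from the start coordinates.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [(x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
            for i in range(1, steps + 1)]
```

For example, `path_points((0, 0), (100, 50), 4)` produces four points, the last of which is the target itself. Elements that lie under the intermediate points may or may not see events, which is the open question raised above.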
samuong: What should a tick be?
jgraham: By default a tick
happens as fast as you can process it.
... It is possible in the API to specify a pause
duration.
... There should be a null device for specifying a pause so
that you can spread out a mouse move.
ato: But what if you don’t have a duration?
ClayMartin: Yeah, there should be a default duration.
jgraham: A minimum duration is
not a bad idea.
... If you haven’t specified a pause, maybe it should really
teleport.
ato: But it’s not something the user would do.
jimevans: But for the open source project users it doesn’t matter.
ClayMartin: If you do the
movement for everything, you could introduce
intermittents.
... It’s doing the shortest path possible, and the path could
trigger something that interrupts the sequence.
jimevans: Some content on the page could interfere with the movement.
JohnJansen: Didn’t we decide to defer some of this stuff?
jgraham: I think touch is actually conceptually easier.
JohnJansen: We don’t want to tell
the browsers what to fire.
... We want to describe what the user wants to do.
... And then you expect the browser to fire the right
events.
jgraham: There should be a note saying “if you don’t follow these other specs, then you probably shouldn’t try to implement this”.
samuong: pointerDown, pointerUp,
pointerDown, pointerUp (double click), is the double click
event fired?
... What does "primary" mean here?
jgraham: It’s basically a shorthand way of saying “fire normal mouse events”.
samuong: There was also sub-type, wouldn’t that tell you?
jgraham: It tells you the type of
event.
... [explains how the pointer events spec works]
<AutomatedTester> https://w3c.github.io/pointerevents/#
JohnJansen: So we’re going to
have a normative reference to this spec.
... Can’t we just delegate to it?
jgraham: Yeah, except it’s very hand wavy.
samuong: We have people in Blink using this for pointer events testing: If we’re not feeding in user input, if we’re just specifying the events, that’s not what we want.
ato: WebDriver is adding
additional value here.
... mouseMove, keyboard layout/modifier keys
jimevans: There used to be a
mandate that you use a OS level input.
... But we’ve stripped that out.
... I think what we want to say is that we want the browser to
react as if this input occurred. This implies that certain DOM
events get generated by the browser on certain elements.
... This is difficult, if not impossible, to specify for the
reasons jgraham gave earlier.
JohnJansen: Double click could be controlled by the OS.
ato: But these are all emulated, virtual devices.
jgraham: If we implement at the
level of DOM events, then the advantages are that it’s
consistent across browsers.
... And that we actually know how to write that as a
spec.
... It has the disadvantage that if you literally implement the
spec, it gives you different behaviour.
samuong: Is anyone implementing this? In JS?
AutomatedTester: We do in
Marionette.
... But we generate trusted events, so it’s not content DOM
events.
ato: [explains about synthetic events in Gecko]
samuong: We have something similar.
brrian: We generate an
appropriate platform event.
... We synthesise events; doing it at a lower level has problems.
ato: FirefoxDriver tried native input too.
jimevans: IEDriver same.
jgraham: There might be a hand-wavy way of doing this. “Generate platform events that eventually cause the following DOM events to be generated.”
jimevans: That’s what I meant.
ato: There are three levels here: The spec describes the expected output you should expect in DOM after performing the actions, all the different UAs have different input stacks so we can’t specify that. Instead we have a more general abstraction that describes a more general input approach to this.
samuong: At Google they wanted to test tab completion (?)
jgraham: I think there’s a tension between the features needed to test a browser and those needed to test a content page.
JohnJansen: I want to test the ability to create a new tab, as a browser vendor.
samuong: As a browser vendor having it at the OS level is what they want.
jimevans: New tab/new window is something users want
ClayMartin: [explains how to do UI automation in edge]
jgraham: Why don’t we have a command to open a new window?
<AutomatedTester> scribe automatedtester
<AutomatedTester> ato: let me describe how we do things in Marionette
<AutomatedTester> ato: if you have right-click that will create a context menu
<AutomatedTester> ato: and then we have a command called set_context and switch to browser chrome
<AutomatedTester> ato: we use this context to test addons and Firefox UI testing. Update and localization testing.
<AutomatedTester> ato: we should be careful not to put ourselves into a state that WebDriver can't return from
<AutomatedTester> JohnJansen: We have that with EdgeDriver and we want to test addons/chrome
<AutomatedTester> ato: If we describe that stuff is in other specs but as long as the end state is what we expect.
<AutomatedTester> brrian: What if there was a browser flag for cross platform handling?
<AutomatedTester> jgraham: my thinking is for now, this is the algorithm with the events that should be generated, but an implementation may inject them at a higher level
<AutomatedTester> jgraham: we expect this to do the "hand waving" thing and try use pointer events.
<AutomatedTester> ato: [reads out http://w3c.github.io/webdriver/webdriver-spec.html#algorithms]
<AutomatedTester> jgraham: we should describe what we think it should do
<AutomatedTester> ClayMartin: there might be a interop bug
<AutomatedTester> jgraham: but then the interop bug is in a different specification and not in webdriver
<AutomatedTester> jgraham: there are reasons where we should describe what events we should do. [explains example with shift+key]
<AutomatedTester> jgraham: going back to mouse movement
<AutomatedTester> jgraham: an interesting implementation detail is how to handle pinch, we should asynchronously do each of the items on each finger interleaved
<AutomatedTester> ato: how would we describe the micromovements?
<AutomatedTester> jgraham: not sure what problems could come up as I havent written this.
<AutomatedTester> jimevans: for the pointermove event specifically, is there any mileage in adding a duration for how long a tick lasts (not a micromovement)?
<AutomatedTester> jimevans: as far as specifying goes, we will need to have a default value for duration
<AutomatedTester> jgraham: we will have a pause(0) which is requestAnimationFrame duration
<AutomatedTester> jimevans: we could add a duration to pointer move and this is where the duration has a meaningful impact
<AutomatedTester> jgraham: and this would be different to pause which is wait for something or pad
<AutomatedTester> ClayMartin: for a move what data would we know? can I see the action
<AutomatedTester> jgraham: The default of the pointer for the start coordinates is 0,0
<AutomatedTester> jgraham: how do we describe coordinates? [draws a box]
<AutomatedTester> jgraham: if you pass in x,y that will be x, y of the viewport
<AutomatedTester> jgraham: if you pass in an element it would be the centre of the visible part of the element
http://w3c.github.io/webdriver/webdriver-spec.html#dfn-pointer-interactable-element
<AutomatedTester> jgraham: if you pass in an element and x,y we should take the top/left of the element and move to x,y from that point
<AutomatedTester> jimevans: [explains how this is similar to the OSS project]
<AutomatedTester> jgraham: should we do MoveBy or MoveTo?
<AutomatedTester> [group votes for MoveTo]
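A small sketch of the three MoveTo cases discussed above, assuming viewport coordinates throughout. `Rect` and `resolve_move_target` are hypothetical names, and the element case uses the centre of the bounding box as a simplification (it assumes the element is fully visible rather than computing the visible part).

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Hypothetical stand-in for an element's box in viewport coordinates."""
    left: float
    top: float
    width: float
    height: float


def resolve_move_target(x=None, y=None, element=None):
    """Resolve a MoveTo target to absolute viewport coordinates.

    - x, y only:      the point (x, y) of the viewport
    - element only:   the centre of the element (simplified, see above)
    - element + x, y: offset (x, y) from the element's top-left corner
    """
    if element is None:
        if x is None or y is None:
            raise ValueError("need x and y when no element is given")
        return (x, y)
    if x is None and y is None:
        return (element.left + element.width / 2,
                element.top + element.height / 2)
    return (element.left + (x or 0), element.top + (y or 0))
```

Because the group chose MoveTo over MoveBy, every case resolves to an absolute position rather than a delta from the pointer's current location.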
<AutomatedTester> scribe automatedtester
<AutomatedTester> jgraham: yesterday, Mozillians were discussing the following
<scribe> Scribe: AutomatedTester
<ato> AutomatedTester: (You need the colon.)
ahh :)
jgraham: we wondered if there should be a scroll to an element command for actions
automatedtester: how can we help prevent footguns for users? If there are test beds with 800x600 screens, how can we prevent their tests from randomly breaking?
ato: takeElementScreenshot doesnt
screenshot to the element as requested by Microsoft in the
past
... part of me is uncertain that we have specialisation in
certain commands where we could have a generalisation on all
commands for scrolling
... if we have separate command for scrolling, what would we do
for normal commands. If we have a separate command we can see
actions are more of a pipeline for commands
<JohnJansen> microserf
ato: we can then, in a later version, see about batching other commands via the pipeline
jgraham: if we move to a pipeline we would then need to work out a storage system
ato: we can use this to save on bandwidth between local and remote end points
automatedtester: back to scroll to element
ato: we should not hamper ourselves with our design if we wanted a pipeline later...
jgraham: scrolling is an interesting case, high level does scrolling implicitly
automatedtester: there is a end point from the OSS project called Location in view
JohnJansen: we dont necessarily want to just scroll the element into view, we might want to scroll X pixels
ato: we need to have the implicit scroll in high level
<JohnJansen> lukeis: yes, but I'd like to accomplish it without requiring any script execution.
jgraham: if I wanted a 2 finger touch, there is currently no way to make sure the elements are in view. I would need to run script and then do the actions
ClayMartin: Should we fire scroll event when moving element into view?
automatedtester: that is a different question. We need to see if we want the command before we see what events we want to fire
ClayMartin: in touch we dont do mouse scroll
automatedtester: on those we are going to do flick type events
jimevans: I recognise the usefulness of an action for scrolling, is it something that we can add in L2?
jgraham: yes
JohnJansen: implicitly scroll?
jimevans: no, we defer to L2
yup :)
brrian: what would happen in interweaved actions and a scroll?
jgraham: [Described what could happen in that scenario]
ato: The document could do items to the page which could break things
samuong: in ChromeDriver we get
all the coords at the beginning of the action so scrolling
could cause issues
... should we get the coordinates at the beginning of the
tick?
jgraham: yes, you could have a pinch zoom that would move things
RESOLUTION: defer scroll to L2
JohnJansen: high level commands can not be described in low level commands
jgraham: after the tick finished, should we add an event to the event loop? (postMessage or setTimeout) or it waits for an animation frame?
brrian: I want it so that it will
yield to the event loop
... if its a timer or requestAnimationFrame...
jgraham: vertical items in the actions should be done as fast as possible and then a vsync for the next horizontal item
samuong: what about dialogs? We stop the event loop if the alert happens
ato: it might cycle for the
current vertical
... should we check for the dialogs at the beginning of each
action?
jgraham: suppose I have 2 key
press events, if the first 1 causes the alert we can't check
for a dialog because we dont know the 2nd item has been
processed
... because the event loop is paused and then you are...
samuong: if we are putting a lot
of checks on each check then it could cause issues with the
processing of the event loop
... if we have to return to the user we could take longer than
the tick was supposed to happen
jgraham: [describing how, if an alert appears, we block the event loop]
ato: If I were to implement it in Marionette, we have a user prompt service that we can use: do the alert and then check a global state
jgraham: you would need to check
that state before moving on to the next tick and then dismiss
the dialog before moving to the next tick
... we inject [keydown, pointerdown] into the event loop. We
have to do something to say do the next tick, e.g.
setTimeout
... if we do [keydown, pointerdown, setTimeout] and the
pointerdown causes the alert, we can't reach the
setTimeout
ato: [writes pseudo code on whiteboard]
... for (let action of tick) { event.sendSyntheticpointerdown(action) }
... yield content.executeScript("window.setTimeout")
jgraham: but we can't reach the
executeScript
... the pointerDown will cause the timeout to never be
reached
... to deal with alerts, you would need to have an event (a non
content event that isnt blocked by the event loop)
... the difference between this and what is in the spec. We
can't always check in actions
ato: if I were implementing it, and this was in a thread, I would check during it and then shut down the thread and abort processing the following steps
jimevans: at any point of the
action sequence, we need to make sure we don't hang the
driver
... how it is handled is totally up to the person writing
webdriver code
jgraham: if the page injects something that does an alert it would be good to remove the current writing
ato: we need to have something that keeps track that the dialog has appeared
jimevans: bottom line, you either
expect the dialog or not
... I am clicking on a button that will create a alert or
not
jgraham: we should have a basic bit of text how to handle alerts not on each command.
ato: no, we need it on all
commands
... there are special cases where we don't want to check for
alerts
RESOLUTION: for commands that spin the event loop, prompt handling should be invoked if an alert appears at any time
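A toy sketch of the idea behind this resolution, assuming the driver tracks prompt state outside the content event loop and checks it after every dispatched action. `FakeBrowser`, `perform`, and `UnexpectedAlertOpen` are hypothetical names for illustration, not a real driver API.

```python
class UnexpectedAlertOpen(Exception):
    """Raised when a user prompt appears while actions are being performed."""


class FakeBrowser:
    """Toy stand-in for a browser: records actions and may open a prompt."""

    def __init__(self, alert_on=None):
        self.alert_on = alert_on      # action name that triggers an alert
        self.prompt_open = False      # state kept outside the content loop
        self.dispatched = []

    def dispatch(self, action):
        self.dispatched.append(action)
        if action == self.alert_on:
            self.prompt_open = True


def perform(browser, ticks):
    """Dispatch actions tick by tick, stopping as soon as a prompt appears.

    The check does not rely on a content timer firing, because a blocked
    event loop would never reach it.
    """
    for tick in ticks:
        for action in tick:
            browser.dispatch(action)
            if browser.prompt_open:
                raise UnexpectedAlertOpen(f"prompt opened by {action}")
```

In this sketch the remaining actions are abandoned once the prompt is seen, so the driver never hangs waiting on a blocked page; what the prompt handler then does (dismiss, accept, report) is left out.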
samuong: Should we remove element references and only use X, Y coords in actions?
jimevans: we need to be wary of backwards compat
jgraham: I prefer the idea of checking the coords before the start of the tick
ato: there might be a race condition or the other actions have meant to move it.
JohnJansen: does this really matter?
jgraham: what is the keycode you
get when you pass in a key? The options are:
... option 1: hard code a 104-key keyboard and just use that
keyboard.
... option 2: use the keyboard attached, but in a test farm there are
not always keyboards
... option 3: default to a US keyboard but with the ability to
change
ato: we could add a new command, "setThisKeyboard"
jgraham: option 4: set a per session state for the keyboard
JohnJansen: for L1 we default to 104 US QWERTY keyboard
brrian: how would I input Japanese into a page
jgraham: that works, currently you send through unicode code points. We may not do the right keycode and they might be using IME
brrian: do we do it as a string?
jgraham: no, per unicode code point
brrian: not all japanese strings
are divisible to code points
... if you deliver it in some OSes like this, it might not
handle this properly. There are dead keys that only activate
when you press the next letter
... safaridriver uses grapheme cluster boundaries
... I would like the spec to say to split on these
boundaries
ato: what would you do with ü?
brrian: We would send it as 2 code points
jgraham: what happens in the DOM
for keycodes
... do you get 2 events or 1?
brrian: you get 1
[ato and jgraham discussing example of dead keys]
<ato> AutomatedTester: This key has passed on.
<ato> AutomatedTester: It is deceased.
<ato> AutomatedTester: It is no more.
<jgraham> http://unixpapa.com/js/testkey.html
ato: the keyUp is registered for the umlaut but not the keypress
<brrian> http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules
brrian: it would be better if the spec split on grapheme clusters
ato: I am unclear on the benefit here. Would the client split on grapheme cluster?
jgraham: no, you would send the
unicode string in a decomposed form. e.g. "u.
... you would send over 2 code points
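To make the code point versus grapheme cluster distinction concrete, here is a short sketch using Python's standard `unicodedata` module. `decomposed_code_points` and `simple_clusters` are hypothetical helper names, and `simple_clusters` is a deliberate simplification that only attaches combining marks to the preceding character; it is not a full implementation of the grapheme cluster boundary rules linked above.

```python
import unicodedata


def decomposed_code_points(text):
    """Code points of `text` in canonical decomposed form (NFD)."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", text)]


def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified approximation of grapheme clusters: it handles cases like
    u + COMBINING DIAERESIS, but not emoji sequences, Hangul jamo, etc.
    """
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch        # attach the mark to its base character
        else:
            clusters.append(ch)
    return clusters
```

For "\u00fc" (u with diaeresis) the decomposed form is two code points, `U+0075` followed by `U+0308`, yet it forms a single cluster. Splitting on clusters would keep the pair together; splitting on code points would deliver them separately.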
ato: this is different to how everyone is currently doing it
jgraham: I think we should investigate more and discuss again in Lisbon
<scribe> ACTION: brrian to come up with more use cases for splitting [recorded in http://www.w3.org/2016/07/13-webdriver-minutes.html#action01]
jgraham: for actions, and maybe sendKeys, what should [shift, a ] do?
ato: it should be an A
... should this only happen on sendKeys or should it work with
Actions?
jgraham: there are 2
options
... 1) if you have modifier key pressed you get the next
letter
... 2) if you can't get a char with a modifier, e.g. modifier
is pressed but we release, do the other char, and then do the
modifier
JohnJansen: [does an example with caps lock]
jimevans: for sendKeys, and only
with shift modifier, you would send the string you wanted, and
it should be string as is
... if you send shift + 1 results in !. In sendKeys we
implicitly release the modifier
... [looks up some data in the OSS code base]
... the current OSS version does have sendKeys actions end
point
ato: what would happen in the current clients to handle this against the spec version?
jimevans: most Selenium users are using A and not shift + a
ato: the language bindings might have to do a lot of extra work here
jimevans: that is fine, we are
doing a non-trivial amount of work here
... I am not worried about backwards compat here in actions
because it is "Do what I say". People will want to be combining
keys where in sendKeys they will send the result
RESOLUTION: for keyUp/keyDown actions we won't do implicit conversion of shift. e.g. for shift + 7 we do 7 with shift modifier set
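A minimal sketch of what this resolution implies, assuming key actions arrive as (action type, key) pairs. The helper `key_events` and the record fields are illustrative only; the point is that the key value is passed through unchanged while the modifier state is tracked alongside it.

```python
SHIFT = "Shift"  # illustrative name for the shift key value


def key_events(actions):
    """Turn keyDown/keyUp actions into event-like records.

    Modifier state is tracked, but the key itself is never rewritten:
    shift + "7" stays "7" with the shift flag set, and does not become "&".
    """
    shift_down = False
    events = []
    for kind, key in actions:
        if key == SHIFT:
            shift_down = (kind == "keyDown")
        events.append({"type": kind, "key": key, "shiftKey": shift_down})
    return events
```

Given the sequence keyDown Shift, keyDown 7, keyUp 7, keyUp Shift, the two events for "7" carry `shiftKey` set to true and the key remains "7". A sendKeys-style command, by contrast, would send the resulting string as is.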
<samuong> scribe: samuong
jgraham: want to have top-level
capabilities for non-feature-matrix capabilities
... current api is designed for the Sauce Labs/Google-style use
case with a grid of test machines
... however for many use cases, having desired and required
capabilities is confusing
... can we simplify this, can we standardize new session
data?
ato: intermediary could handle selecting a host from the pool with correct features, but driver doesn't need to worry about matrix selection
<simonstewart> “I have opinions"
ato: e.g. proxy settings doesn't make sense as a "desired" capability
<simonstewart> ato is correct
<simonstewart> Until then :)
<simonstewart> The “new session” data is what’s required to successfully set up the browser instance
<simonstewart> So the proxy is required
<simonstewart> Whether it’s honoured or not is a different thing entirely
<simonstewart> Which is why they’re “desired” capabilities
<simonstewart> (services such as Sauce Labs and Browser Stack may choose to ignore the setting, for example)
<simonstewart> My personal view?
<simonstewart> We need a minimum set of routing data (browser, OS, version numbers) for intermediary nodes
<simonstewart> And then each browser can figure out what it wants to do with the data
<simonstewart> We also need to support multiple “profiles” (to use Mozilla’s term) in the same new session request
ato: distinction between desired and required capabilities is a CI-level concept that might be out-of-scope for the spec
<simonstewart> I disagree
<simonstewart> “desired” == optional, and can be ignored
<simonstewart> “required” == must be set, or fail the new session
<simonstewart> Is there a facetime audio number I can call into?
<simonstewart> Or a Hangout?
<simonstewart> The difference between the two is within scope for the spec
<simonstewart> And also what the current drivers do
<simonstewart> So we’re just speccing existing behaviour
<simonstewart> Which is good, right?
<simonstewart> Whoever is scribing, please continue to do so
jgraham: is that we should discuss this at lisbon
<simonstewart> That sentence makes no sense?
<simonstewart> We should discuss “new session” in Lisbon?
yes
<simonstewart> Thanks :)
<JohnJansen> ok, let's kill this. we need to discuss in lisbon with Simon
<simonstewart> I can dial in, if there’s a number I can call over wifi without SIP
<simonstewart> Or skype. Nothing I have has skype in :)
ato: there was some discussion about having a capability that gets returned to clients to allow feature detection for w3c features
jimevans: simon's email lays out
the handshake
... this is consistent with how current bindings do it
<simonstewart> Feature detection is definitely the way forward
ato: only c#, not node.js
<simonstewart> The JS mob got into a horrible mess with assuming certain capabilities based on version numbers
ato: marionette returns a capability with marionette=true
jimevans: this probably isn't how it should be done
<simonstewart> I’m looking at Opera here in particular
<simonstewart> marionette saying it’s marionette is cool
<simonstewart> I’d expect browsers to report themselves
<simonstewart> Additional metadata is fine
jimevans: you can construct new session command that is valid for both dialects
<simonstewart> But saying “I’m level 1 compliant” is incredibly dangerous
jimevans: the response should
tell you what dialect to continue speaking
... this can be done by looking at the status field
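A hedged sketch of the feature detection described here, assuming (as an illustration only) that a legacy open source dialect response carries a top-level integer `status` field while a spec dialect response does not. The response shapes and the name `detect_dialect` are assumptions made for this example; the handshake in the referenced email is the authoritative description.

```python
def detect_dialect(response):
    """Guess which dialect the remote end speaks from a New Session response.

    Assumption for this sketch: only the legacy dialect includes a
    top-level integer "status" field in its JSON response.
    """
    if isinstance(response.get("status"), int):
        return "legacy"
    return "w3c"
```

The local end would send one New Session payload that is valid in both dialects, inspect the reply once, and then keep speaking whichever dialect the reply indicates, rather than inferring features from version numbers.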
<simonstewart> for reference: https://lists.w3.org/Archives/Public/public-browser-tools-testing/2016JulSep/0001.html
<simonstewart> Search that for “handshake"
<simonstewart> Hahaha :)
<ClayMartin> https://www.irccloud.com/pastebin/FnoqIzbO/s4b
<simonstewart> Hang on. I need to download skype
<ClayMartin> you should be able to just call the phone number
<ClayMartin> or go here
<ClayMartin> https://join.microsoft.com/meet/clmartin/IA7XCEF4
<ClayMartin> that will join you in the browser
<ClayMartin> Skype for Business != Skype so don't bother downloading that I believe
<simonstewart> I’m not allowed to install the plugin for all users, and it won’t install for just me
<simonstewart> Switching to Chrome
<brrian> :|
<simonstewart> I need to install the plugin as root for my local user account
<simonstewart> JohnJansen: I have a bug report for you
<JohnJansen> i was against this from the start
<JohnJansen> :-)
<brrian> I can verify
<simonstewart> “The organiser will let you in soon"
<simonstewart> Apparently
<simonstewart> :)
<JohnJansen> simonstewart: are you on mute?
<simonstewart> I’m speaking
<simonstewart> Can you see me?
<jgraham> simonstewart: We can hear you now
<simonstewart> Can you hear me as I speak?
<jgraham> simonstewart: No, so there is some terrible hack going on here
<simonstewart> Ok
<simonstewart> https://www.youtube.com/watch?v=htobTBlCvUU
simonstewart: let's discuss this now instead of in lisbon
johnjansen: conversation so far is that this is configuration we want to set that's not related to the browser we get back
simonstewart: if you've got
safari, proxy is set at os level, firefox is at browser level,
edge is os-level
... some things that are browser-specific, some things are
os-specific
... for every case, it's a case of "i want a session that fits
in these parameters"
... if local end requests (e.g.) IE on linux, remote end can
give back IE on windows
... response from new session command is the set of
capabilities you've got (not what you asked for)
... in open-source project, proxy has been omitted, since it's
hard for remote ends to sniff proxy settings
... but some tests might absolutely require certain proxies,
otherwise session is not useful
... need to find balance, e.g. can't serialize entire browser
profile and send back
jgraham: from my pov, makes sense
to talk about desired/required capabilities, in terms of keys
that you can have in new session command
... not clear that spec has to specify what intermediary nodes
should do with that
... if nodes want to be compatible with each other, they should
have a separate shorter spec
<simonstewart> I am listening jgraham
<simonstewart> Keep going, please :)
jgraham: this is the only thing
that's specific to intermediary nodes
... current spec just says what capabilities are there, doesn't
say how to resolve between desired/required capabilities
<simonstewart> Give me a signal when I should start replying
jgraham: this is a legitimate
desire, but it's not useful to have a desired/required
distinction for configuration that gets sent to browsers
... don't want to have to distinguish this in gecko driver
simonstewart: idea behind new
session command is that the request is the allocation of the
resource
... originally, everything was a desired capability
... and then users would inspect those capabilities and fail
the test if the capabilities don't meet requirements
... it turned out that people believed that "desired" meant
"required", which is unfortunate
... so some googlers (jleyba) pushed for required
capabilities
... browser name is a good candidate for being in required
capabilities
... preferences, proxies, etc. are good candidates for being in
desired capabilities
... capabilities (local end -> remote end) is a list of
requests
jgraham: i agree this is what the
current system is, but this pushes complexity onto gecko driver
and other drivers
... e.g. gecko driver needs to have code to handle the
binary
simonstewart: this is simple, the binary is either there or not
jgraham: but we need different code depending on whether its desired or required
simonstewart: are there any other capabilities we can use as an example?
ato: we should only treat browser
name, version, platform specially
... according to spec, we need to create a "third" capabilities
object
simonstewart: binary is a browser-specific feature, it's common but not global (e.g. on android we need an android package)
jgraham: it should be interoperable for each of those common cases
ato: chrome has chromeoptions, gecko driver has something similar, no reason why chrome and firefox couldn't share some of those keys
simonstewart: user could request "a browser on windows"
claymartin: drivers and browsers
have a one-to-one relationship
... selenium should handle desired/required capabilities, i
don't get why servers should have to care about it
... why can't selenium handle the complexity, and only pass the
needed capabilities?
simonstewart: because existing
intermediate nodes would then need to track versions and
features
... intermediary nodes need a base set of capabilities to do
routing (which is already in spec)
... (browser name, version, os version)
... we also should have a way to specify ranges for
versions
... could do a translation in the intermediary node, but then
this would make them very complicated, and prone to
breakage
... this limits browser vendors' ability to innovate and
experiment
jgraham: i agree, but don't think
that browser-specific information should end up in
capabilities
... current system makes it hard to specify binary path
depending on os
simonstewart: that's true, we need to allow differentiation at the os-level
jgraham: current design is poor
simonstewart: we shouldn't
redesign this because there's a very large existing userbase,
and we don't want to cause unnecessary churn
... it's hard for users to do updates, they've spent a lot of
time and money to build webdriver tests
... we should keep changes to a minimum, although we can do
tweaks
... e.g. we should allow specifying multiple version
numbers
... intermediary nodes shouldn't need to care, they just need
to select a vm or host to run on
... then ie driver, chromedriver, etc. can have their own
config
jgraham: i'm not proposing we
change anything with routing
... change would only be in the clients
simonstewart: but there are many clients
jgraham: but they have to update
anyway, to be spec compliant
... my proposal is that we should have a set of keys that
browsers agree on, so that it's easier to implement remote
ends
... if it is necessary to have version-dependent fields (e.g.
set a certain preference for firefox on linux version X),
should be able to express that
simonstewart: but why can't this be separated into desired/required capabilities?
johnjansen: did we lose you?
<simonstewart> Can you still hear me?
<jgraham> No
<ato> simonstewart: I think the connection dropped.
<simonstewart> I was “removed from the meeting"
<brrian> simonstewart: skype became very cross and hung itself up
<simonstewart> Joining again
<simonstewart> Can you hear me?
<simonstewart> Of course, it always starts muted
<simonstewart> Can you hear me now?
jgraham: my point is that for
things that aren't about matrix-selection, desired vs required
requires extra code
... it's not clear because this isn't spec'd
simonstewart: spec should just say that if a required capability is not met, then it should fail
jgraham: i don't want to implement that
ato: existing drivers conflate desired and required capabilities
simonstewart: we should have it in the spec that new session should fail if a required capability is not met
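The rule simonstewart proposes (fail the new session when a required capability is not met, tolerate unmet desired ones) might be sketched as follows. This is only an illustration of the position stated here, not spec text; `SessionNotCreated`, `resolve_capabilities`, and the key spellings used in the usage note are hypothetical.

```python
class SessionNotCreated(Exception):
    """Raised when a required capability cannot be satisfied (hypothetical name)."""


def resolve_capabilities(required, desired, supported):
    """Merge required and desired capabilities against what a remote end offers.

    `supported` maps capability names to the values the remote end can
    provide. Any unmet required capability aborts session creation;
    unmet or unknown desired capabilities are silently dropped.
    """
    for name, wanted in required.items():
        if supported.get(name) != wanted:
            raise SessionNotCreated(f"required capability not met: {name}")
    resolved = dict(required)
    for name, wanted in desired.items():
        if name not in resolved and supported.get(name) == wanted:
            resolved[name] = wanted
    return resolved
```

For example, a request that requires `{"browserName": "firefox"}` and desires `{"platformName": "linux"}` would still succeed on a macOS host under this sketch, with the platform preference dropped.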
jgraham: if i can't start the binary, and the binary is "desired", what should it do?
<ClayMartin> This is Sam speaking now
samuong: what if firefox driver gets a desired binary that points to a chrome binary
<ato> jimevans: (you are right)
simonstewart: if a driver doesn't know how to handle a desired capability, it's ok
jgraham: this pushes complexity into the drivers
simonstewart: the other way
pushes complexity into the clients
... we only need to specify browser name, version, etc. (to
allow routing)
... and this should be specified
<ClayMartin> browser name, browser version, platform name, platform version
simonstewart: don't want to
require intermediary nodes to do processing, they should pass
through payloads without interpretation
... otherwise we need to specify how to transform blobs of
data for keys that we don't know about
... blob of data from local end should make it to remote
end?
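The pass-through behaviour described above, where an intermediary node reads only the four routing fields and forwards everything else untouched, could be sketched like this. The flat payload shape, the key spellings in `ROUTING_KEYS`, and the `url`/`offers` node records are assumptions for illustration only.

```python
# Hypothetical spellings for browser name, browser version,
# platform name, and platform version.
ROUTING_KEYS = ("browserName", "browserVersion", "platformName", "platformVersion")


def route(payload, nodes):
    """Pick a node for a new session payload without interpreting it.

    Only the routing keys present in the payload are compared with what
    each node offers. The payload is returned as the same object, so
    keys the intermediary does not know about reach the remote end
    unchanged. Returns None when no node matches.
    """
    wanted = {key: payload[key] for key in ROUTING_KEYS if key in payload}
    for node in nodes:
        if all(node["offers"].get(key) == value for key, value in wanted.items()):
            return node["url"], payload
    return None
```

Because the payload is never copied or rewritten, vendor-specific keys do not need any transformation rules at the intermediary.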
jgraham: i don't disagree, but we disagree about how that blob should be structured
jimevans: the stuff that an
intermediary node cares about is encapsulated in
desired/required capabilities
... the stuff that a terminal node cares about is also
encapsulated there, but i think james is saying it shouldn't
be
simonstewart: my position is the opposite - you either have something that is optional, or it's mandated
jgraham: we don't do this
now
... who implements this?
samuong: chromedriver doesn't
claymartin: i'd have to check
jimevans: ie driver doesn't
ato: distinction doesn't make sense - e.g. for the binary, either it runs or it doesn't
simonstewart: then it's not
required
... if it's required and you can't run the binary, it should
fail
jimevans: if path to binary is a
desired capability, and the path doesn't exist, should we have
a "reasonable default"?
... this would be inferred from the browser, version,
platform
... but james is saying he doesn't want to have to implement
that logic
brrian: we have use cases where
we test different safari binaries
... if we get a required binary and it isn't the right version,
it'll fail
... we don't use intermediate nodes, so we can't say "find me a
node that has this binary"
simonstewart: intermediary node
needs to look at browser name and platform, and spin up the
appropriate node
... it's up to the intermediary node to decide when a set of
required capabilities are overly specific
brrian: clarifying use case: no
intermediary node, want to test stable and beta version of
browser
... encode version number, binary for browser, in required
capabilities
simonstewart: so it's perfectly legitimate if the new session command fails if the binary doesn't exist
jgraham: that's how firefox works
too - it doesn't go and find a different binary to the one that
was desired
... does anyone care about this use case?
simonstewart: example use case is
when a grid auto-scales out to aws or another provider, and
binaries are not in standard locations
... local ends need to sniff capabilities, and make sure that
browser name and version meet the requirements, and fail if they
don't
... current open source implementation doesn't work very well
if you need to sniff out proxy, for example
claymartin: edge driver does fail if required caps are not met, but will continue if desired caps aren't met
jgraham: if someone requires an extension in edge, and it fails, what happens?
johnjansen: it would fail to create a session
automatedtester: if you pass in a binary location for edge (desired) and the location doesn't exist on endpoint, what happens?
simonstewart: if binary is desired but not present, then it falls back to other methods of finding the binary (e.g. browser name, browser version, platform name, platform version)
jimevans: or it could look in the registry for edge's location, or something
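The fallback discussed here (use the requested binary if it exists, otherwise infer a location from browser name and platform unless the binary was required) might be sketched as below. The `binary` key, the `default_paths` lookup table, and the function name are hypothetical; a real driver could just as well consult the registry, as jimevans suggests.

```python
import os


def locate_binary(capabilities, default_paths, required=False):
    """Choose a browser binary for a new session.

    If the requested path exists it is used as-is. If it is missing and
    the binary was a required capability, session creation should fail,
    modelled here by raising FileNotFoundError. If it was only desired,
    fall back to a default keyed on (browser name, platform name).
    Returns None when no default is known.
    """
    path = capabilities.get("binary")
    if path is not None and os.path.isfile(path):
        return path
    if path is not None and required:
        raise FileNotFoundError(f"required binary not found: {path}")
    key = (capabilities.get("browserName"), capabilities.get("platformName"))
    return default_paths.get(key)
```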
simonstewart: missing desires are not such a big deal
(simonstewart has an analogy about quitting work and hang gliding, and the nature of desire, but claims this isn't deeply philosophical)
jgraham: not sure this conversation is productive
<brrian> samuong++
johnjansen: the other thing we wanted to talk about is handshakes, versioning
now we're talking about the handshake described in https://lists.w3.org/Archives/Public/public-browser-tools-testing/2016JulSep/0001.html
jimevans: in a successful session creation, open source protocol has an integer status
ato: so in your c# code, it should check for that field, right?
jimevans: yes
simonstewart: if you are w3c
compliant, you only send w3c responses, remote end should
respond appropriately
... from a spec point of view, we shouldn't have to care about
this, we just assume everyone's w3c-compliant
... there's enough info in the new session request and the
response to determine whether we're speaking the open source
dialect or the w3c protocol
jimevans: w3c doesn't send an integer status code in the response
ato: ok
johnjansen: we don't need to do anything
simonstewart: yes, we don't need to modify spec, only changes are in open source code
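The dialect check described in this exchange, where a local end looks for an integer status in the new session response, could be sketched as follows. The field name `status` and the response shapes in the usage note are assumptions for illustration; only the presence or absence of an integer status comes from the discussion.

```python
def speaks_w3c(new_session_response):
    """Guess the dialect from a decoded new session response body.

    The open source protocol includes an integer status on success;
    a W3C response does not, so the absence of one is taken to mean
    the remote end speaks the W3C protocol.
    """
    return not isinstance(new_session_response.get("status"), int)
```

A client might call `speaks_w3c({"status": 0, "sessionId": "abc", "value": {}})` and get `False`, then switch to the open source dialect for the rest of the session.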
RRSAgent: please draft the minutes
RRSAgent: please track the action items
RRSAgent: please track action items
RRSAgent: stop
This is scribe.perl Revision: 1.144 of Date: 2015/11/17 08:39:34
Check for newer version at http://dev.w3.org/cvsweb/~checkout~/2002/scribe/
Guessing input format: RRSAgent_Text_Format (score 1.00)
Found Scribe: ato
Inferring ScribeNick: ato
Found Scribe: AutomatedTester
Inferring ScribeNick: AutomatedTester
Found Scribe: samuong
Inferring ScribeNick: samuong
Scribes: ato, AutomatedTester, samuong
ScribeNicks: ato, AutomatedTester, samuong
Present: ato JimEvans JohnJansen ClayMartin jgraham brrian SamUong AutomatedTester
Got date from IRC log name: 13 Jul 2016
Guessing minutes URL: http://www.w3.org/2016/07/13-webdriver-minutes.html
People with action items: brrian
WARNING: Input appears to use implicit continuation lines. You may need the "-implicitContinuations" option.
[End of scribe.perl diagnostic output]