W3C

WebRTC WG F2F Santa Clara - Day 1/2

31 Oct 2011

Agenda

See also: IRC log

Attendees

Present - group participants
Harald_Alvestrand, Adam_Bergkvist, Dan_Burnett, Francois_Daoust, Dan_Druta, Christophe_Eyrignoux, Narm_Gadiraju, Vidhya_Gholkar, Stefan_Hakansson, Cullen_Jennings, Kangchan_Lee, Wonsuk_Lee, Kepeng_Li, Gang_Liang, Mahalingam_Mani, Anant_Narayanan, Eric_Rescorla, Youngsun_Ryu, Youngwan_So, Timothy_Terriberry, Rich_Tibbett, Justin_Uberti, Milan_Young
Present - observers
Adrian_Bateman, Robin_Berjon, Mauro_Cabuto, Suresh_Chitturi, Manyoung_Cho, Mohammed_Dadas, Shunan_Fan, Tatsuya_Hayashi, Dominique_Hazael-Massieux, Tatsuya_Igarashi, David_Yushin_Kim, Ingmar_Kliche, Dong-Young_Lee, Ileana_Leuca
(a few other observers attended the meeting)
Chair
Harald_Alvestrand, Stefan_Hakansson
Scribe
francois, Rich, burn, fluffy, anant, DanD

Contents

See also: Minutes of day 2/2


Stefan: [starting meeting. Reviewing agenda]

IETF Architecture Overview

Slides: RTCWEB Architecture (PDF)

hta: The goal for RTCWeb is real-time communication between browsers
... arbitrarily define that as within ~100ms
... Trying to drive a design by use cases. Must have a design that meets the priority use cases.
... we want to design general purpose functions.
... one use case we're looking at is the interworking with legacy systems. We're fairly sure we want to make that work.

hta: relays must be possible otherwise we don't have a universal solution.
... <goes through the basic architecture in his slide deck>

hta: All components (except RTCWeb implementing browsers) must be assumed evil.
... Keep trust to a minimum
... Need to look at mechanisms for establishing trust from a web page to a browser.
... data congestion control must also be a priority.
... RTP exists. We will use it.
... encrypt everything

<this is controversial>

hta: considering DTLS-SRTP key negotiation for that purpose.
... UI issues are important to the overall security.
... always fun to agree on codecs
... connection management: least controversial proposal is ROAP
... We expect innovation in what-connects-to-what
... ROAP does allow us to interconnect to SIP and XMPP based systems
... lots of other pieces, media buffering, muting, game control.

hta: a lot of that needs to be done in the browser.

burn: ...caveated by keeping in mind that we want to allow innovation.

hta: W3C has an Audio group defining interfaces for accessing audio data.
... hopefully we'll be able to use that but we need to confirm that down the line.
... All of this is captured in draft-ietf-rtcweb-overview-02.txt

DanD: We know web is beyond browsers. We do have the ability to execute web apps in non-browser UAs.

DanD: We need to ensure that a browser endpoint can communicate with a non-browser endpoint.

hta: We need communication to devices that are not browsers.
... We should not lose track of the browser use cases first and foremost.
... One principle is that as long as the other side obeys the interface then it doesn't matter what it is.

DanD: Another comment RE: interdependencies with other groups. One example is on the discovery of the capabilities on other devices.
... this might be a missing piece in our discussions to date.

anant: There are some capabilities in the proposal to negotiate.

fluffy: If we figure out how the protocols work for interoperability then we might get this legacy interworking.

Use-cases and Requirements

Slides: Use Cases and Requirements (odp format)
Draft: Web Real-Time Communication Use-cases and Requirements IETF draft

stefan: <goes through some of the key use cases in his presentation>

hta: regarding the Distributed Music Band use cases. We're going to need really low latency. Concert-mode? We also need to distinguish between voice and music where we will remove noise from the former that is not suitable for the latter.

francois: Perhaps we should try to stick to something simple since the really low latency issue is a problem.

stefan: It's in the use cases document anyway so we can discuss further on that.
... In the document there is a list of use cases where the discussion has died out.
... or not concluded.
... such use cases relate to different situations: E911, Recording, Emergency access, Security Camera, Large multi-party session, etc.
... these use cases could get added to the document if they get more support.
... draft-jesup. I think we should cover both unreliable and reliable data channels for WebRTC data.

stefan: draft-sipdoc. 4 requirements derived. I think this is covered by the current use cases document

<juberti> I agree, these data use cases should go into this doc.

<juberti> We only have one use case for data in the current doc.

stefan: draft-kaplan. Doesn't introduce new use cases but does put a lot more requirements on the document.
... Questions/comments on the use cases?

DanD: Observation: augmented reality is not covered.

<francois> Open issues on use cases and Req on WebRTC WG wiki

richt: we've been looking at that. We have the building blocks. Would be good to have a use case on this.

DanD: that's covered in some of these use cases but maybe something we could add

cullen: The ability to overlay a video stream on top of another would be good.

richt: you could do it with canvas

cullen: that has a big security implication.
... will talk about it later on.

DanD: plus video might come from an ad-serving service.

fluffy: Back to the 1-800-FEDEX use case. Anything we can provide to scope that out further?

stefan: not my area of specialty so feedback on this use case would be good.

fluffy: The use cases put emphasis on DTMF.

burn: I agree that DTMF is extremely important. We have to support DTMF.

stefan: let's take a break since we're waiting on next presenter.

Security requirements

Slides: Security requirements

ekr: IETF trying to work on threat models and security models. I don't think we're at the consensus level yet, but here are the directions.
... [showing slides]

ekr: Funny state: Browser threat model, browser protects you. It includes the notion that you're in an Internet cafe. Basic security technique is isolation.
... Site A and site B sandboxed.
... Browser acts as a trusted base.
... IETF adds the Internet threat model: "you hand the packets to the attacker to deliver".
... In the IETF oriented view of the universe, cryptography is the main technique.
... We can't force people to use cryptography all the time.
... We need a solid protection under the browser threat model, and the best we can on the Internet threat model
... 3 main issues: 1) access to "local devices" (use my camera, microphone)
... 2) Communications security. If we do our job right, we won't have to worry too much about that here.
... 3) consent to communications, ties in with CORS, WebSockets
... Starting with access to local devices:
... If you go to visit a malicious site, you have no idea where your video is going. It can bug you. Somehow we need the user to consent, but it's not clear when, how many times.
... One thing I do want to mention is that people make a distinction between sending video to a site and sending video to another peer, but from a technical perspective, they are the same.
... Permissions models: we need short-term permissions, e.g. clicking a button for Amazon customer service. Not a long-term permission.
... Until last night, I thought we needed long-term permissions.
... Tim indicated that he was not sure browsers will want to do this.
... Do you want to support long-term permissions? That's a question for the group

burn: why isn't this just a browser policy question?

cullen: the question here is: is it a requirement for the group?

burn: went through it in another group. Informed user consent is needed but can take the form of downloading the browser.

ekr: Then, there's the notion of per-peer permissions.
... Another example of the short-term case, showing an example of an injected ad.
... [thoughts on UI for short-term permissions]
... This has implications for the API.
... user clicks and calls Ford, but he's on Slashdot
... Dialog showing video call. There needs to be a non-maskable indicator of call status so that you know you're still on the call. You need to be consistently aware that the call is going on.
... Access to microphone/camera linked with call permission.
... Back to the example, Slashdot might need to have a say as well.
... [thoughts on UI for long-term permissions]
... Interface should be different. Possible: door hanger style UI. You want an action that is less easy for people to do during a call.
... There's a tension between convenience and security. It gives a lot of power to the site.
... That's an open question whether we want to support that or not.
... IETF has been assuming we want it, so that's great feedback to have if we actually don't
... [thoughts on peer-identity based permissions]

<juberti> I think we want to find a way to handle this. We don't want the web platform to miss something that will be present in native app platforms.

cullen: what's important to you is where is that going. Our media is going to a different place than the Web site. The identity is important.

burn: same issue in the Speech XG.

hta: Usually, you can read the form and find the address in the form, but sometimes the address is constructed by the JavaScript.

ekr: Partial digression on network attackers. If I'm in an Internet Cafe, and an attacker manages to inject an Iframe, he can bug my computer, redirecting the call to him. The attacker controls the network on HTTP.
... Assumption is that it's safe to authorize PokerWeb and then surf the Internet. It's basically the same on your Wifi if not secure enough.
... An open question is: should this facility be available on HTTP at all? Mandate HTTPS?
... e.g. an HTTPS page that loads jQuery through HTTP

DanD: not all the devices have the ability to securely preserve a token. That would be a good way to solve the problem.

ekr: [thoughts on consent for real-time peer-to-peer communication]
... From a protocol point of view, we have ICE. Remember that you cannot trust the JS.

burn: the point is you disabled security completely

ekr: I don't entirely agree that it's the same thing
... Transaction ID needs to be hidden from the JavaScript
... When I surf to HTTP gmail, any attacker can inject the JavaScript and redirect calls for him.
... In the context of SIP, we've already addressed most of the communications security issues.
... There's also the protocol attack issue, which hopefully should not be a real problem in the end.
... otherwise it's a security issue.
... Assuming a ROAP-style API is used, it would be good to hide security settings from the JavaScript.

AdamB: IDs might be owned by Facebook, and so on.

ekr: my view is: 3 basic scenarios. 1) Gmail to Gmail, Facebook to Facebook, etc. 2) Gmail to Facebook, etc. where you'll need federation of ID. 3) Identity separated from the service I use to make the call.
... I have some possible solutions for that. Happy to discuss.

Cullen: My position is a bit stronger. This group wants encrypted calls, but if you can't tell who the call is going to, that's useless.
... We need to take that into account.

hta: for many cases, I think it's quite ok to say that the call is encrypted to an identity and that this identity is verified by the fact that the guy I talk to presented himself.

cullen: I want to know the trust chain. If this call is being intercepted, I want to have some indication on that.

anant: slightly disagree with what Harald said.
... The federated use case.

burn: how do you know that things are going to the right person?

Anant: given that we have that use case in the document, we have to touch upon that issue.
... We want a completely peer-to-peer system in the end.

ekr: Is there a good way to bootstrap these systems? I think the answer is "yes".

Status and plans in the DAP WG

Stefan: wanted to know status of controlling camera and microphone.

robin: Hi. I'm chair of DAP. We need to figure out how we split the work on who does what.
... We haven't done a lot of work on Media Capture recently.
... One dividing line that could be useful: DAP could be picking up media capture very quickly, some interest from DAP side.
... We would do the simple thing that doesn't include streaming or any complex processing.
... Then hopefully this would be pluggable in what this group needs

burn: what do you mean without streams?

robin: you could not bind a video stream to some back channel, but you could do stuff such as video mail or recording.
... In the declarative style, most of it in the browser.

hta: main difference is who controls the UI.

Anant: If you're going to do programmatic access, important to agree on what they look like between groups. Another solution is you take care of declarative, and we handle programmatic way.

Anant: If you do it the programmatic way, we may end up with two APIs doing essentially the same thing

robin: heard feedback that some people wanted to do simple things immediately.

anant: cannot "simple" be done with pure declarative approach?

robin: not really.

anant: something we've discussed in Mozilla. Media type in the input, such as video/mp4. The browser prompts the user with a camera view. Nice property that it avoids having to deal with the security issues in a nice way.

robin: it would be useful if you had a demo you could show in DAP. We're meeting Thursday/Friday.

Adrian: Microsoft just joined DAP. One of our interests is media capture. API based on what getUserMedia is doing. WebRTC could build on top of this API. This way, we could split the work easily.

Anant: does that mean that you have use cases that require programmatic APIs?

Adrian: yes, in general we want developers to build their own experience.

Cullen: how do you deal with permissions?

Adrian: same way as other APIs

Cullen: agree with short-term, long-term permissions presented here?

Adrian: need to check, but didn't look wrong.

richt: in Opera, we agree that many use cases require getUserMedia but we want to decouple that from peer-to-peer connectivity. So agree to split things up.

Anant: can two groups work on the same spec?

Adrian: liaison explicit in the charter of WebRTC. Feasible for DAP to own the spec and go through the liaison.

richt: Peer-to-peer relies on a stream. We give you a stream and you deal with it.

Cullen: it's a bit more complex than that, because of the hardware support for compression, and permissions too.
... It sounds like DAP needs a permissions model as well and doesn't have one for the time being.
... We have all the permission problems that have to be enforced at the getUserMedia level.

richt: the barcode scanner, face recognition use cases haven't been taken up in the group.

cullen: I don't think anyone will disagree with these use cases

hta: want to make things more complicated ;)
... If you go on with the assumption that media is always sourced locally, you're in the bad corner.
... As long as it's a media stream, the current getUserMedia doesn't care where the stream is coming from. I look at it as a first and easy step.
... thinking about Web Introducers.

robin: That's a DAP deliverable. I'd rather not drag this spec in this discussion, although I agree it's a good way to make introductions.

Anant: the resources you get are not more privileged.

hta: I was more thinking about my computer getting access to your camera.
... We might want to explore deeper levels of complexity for passing streams around at a later stage.
... In terms of where things go, the WebRTC WG is chartered to get this thing done. The charter is written in such a way that if someone else does it, that's good!
... What I don't want to happen is one group that comes with a vocabulary that describes front camera, back camera, etc. and another group coming with one on camera orientation, in particular.

dom: can getUserMedia be split from WebRTC spec in general? Independently of where the final spec resides, that's something people are interested in seeing sooner rather than later.

cullen: I'm just wondering how much faster things will be if we split things out. Browser vendors in WebRTC already indicated their intention to implement the spec.

dom: Implementations of getUserMedia in Opera.

hta: there's one in Chrome too, but as part of WebRTC.

richt: we're going to push something out soon with getUserMedia.

burn: actually, it's a "super-subset".

Anant: if it's published as a separate spec, the use cases of getUserMedia are a subset of the WebRTC use cases.

cullen: what I worry about is totally changing the directions we're going to in something we're supposed to ship in a matter of months.

[discussion on Microsoft joining WebRTC]

dom: one way to have the IPR commitments that we want is to split the spec out.
... That means adding the SOTD, and accepting DAP's input.

Anant: if we start taking input from DAP, we're going to lose time.

dom: I don't think so, actually.
... nothing more than what we'll get with last call comments.

[further discussion on getUserMedia]

burn: this group wants to move forward very quickly. Others want it for other purposes. Is there a way to do something quickly that does not prevent other uses?

hta: getUserMedia returns a MediaStream, so MediaStream needs to be defined before getUserMedia

cullen: [back to hardware support for video compression]
... Lots of things are wrong and need to be fixed. We haven't focused on this right now. I'd like to see use cases that we're missing (yours are great, richt).

stefan: that's the direction I'd like to follow, yes.

burn: yes, would be good to have use cases to see what's missing.

richt: the only thing we get from going to DAP is extra IPR coverage and comments.

cullen: is there a way to get comments early on?

adrian: there's a lot of process involved to get comments sent to a group we're not participating in.

[discussion on IPR commitment]

robin: Nothing bad in splitting up and doing a joint deliverable.

dom: getting comments is something the group needs to do.

Suresh(RIM): so what happens to the draft in DAP's group?

robin: we'll kill it and keep the declarative one.

richt: it needs killing. Nothing happened on this spec for a year.

Stefan: so what do we need to do in the end?

dom: we need to ensure DAP agrees with that direction and then you need to split up the part.
... The key question is where you draw the line. The administrative side is easy.

Anant: Fine to reference WebRTC spec for definition of MediaStream?

dom: yes, but introduces a dependency in terms of timeline.
... Other question is editing.

cullen: I want someone with deep understanding of video

Adrian: we're happy to participate to make things easier since we're making things more complex to start with.

robin: ready to volunteer an editor?

Adrian: I think so.

burn: if requirements are separable, that may be good to separate them.

cullen: I think this group should agree on the mailing-list before things get done.

Stefan: we have had chairs discussions earlier on.

richt: all of the work is staying in WebRTC in the end.

robin: all you get is better IPR protection and better comments.

cullen: important to put it on the list, first time people will hear about it.

stefan: anyone objecting to have a joint deliverable?

PROPOSED RESOLUTION: split up getUserMedia and publish as joint deliverable with DAP WG.

cullen: worried that joint deliverables always take longer.

robin: one thing that is important is to specify which mailing-list takes discussions. We really should not have joint deliverable where discussion is split in groups. Smallest issues turn into a war when that happens.

<richt> proposal to RESOLUTION status: one/two week period for mailing list discussion. Resolution to be made on next conf. call. (?)

cullen: this whole thing is an integrated system. It's going to be very difficult to discuss this without discussing other ideas.

dom: I think the key issue is splitting the spec, not the joint deliverable.

robin: if we can't split the discussion, then we probably can't split the spec.

burn: question is: can we write WebRTC requirements for getUserMedia precisely enough for this virtual joint working group.

cullen: you'll need so many low-level details in getUserMedia

robin: two actions: one on splitting the spec, second on refining joint proposal.

<scribe> ACTION: anant to check how to split getUserMedia from the spec [recorded in http://www.w3.org/2011/10/31-webrtc-minutes.html#action01]

<trackbot> Created ACTION-8 - Check how to split getUserMedia from the spec [on Anant Narayanan - due 2011-11-07].

<scribe> ACTION: robin to draft a draft proposal for joint deliverable. [recorded in http://www.w3.org/2011/10/31-webrtc-minutes.html#action02]

<trackbot> Sorry, couldn't find user - robin

burn: Adrian, do you actually need to see something pulled out first before you can help out?

Adrian: we can help with splitting out the spec, I think.

burn: it's more a practical question, given the way editors work in WebRTC.

cullen: can someone send use cases on one of the mailing-lists?

<scribe> ACTION: tibbett to send new use cases on getUserMedia to webRTC mailing-list [recorded in http://www.w3.org/2011/10/31-webrtc-minutes.html#action04]

<trackbot> Created ACTION-9 - Send new use cases on getUserMedia to webRTC mailing-list [on Richard Tibbett - due 2011-11-07].

[discussion on DAP interaction over]

Access control model and privacy/security aspects

Slides: WebRTC: User Security and Privacy

anant: currently don't specify what happens with user permission when using getUserMedia
... UAs vary, so may not be appropriate to define a standard for permissions
... propose we write guidelines for browsers rather than something mandated

richt: this is definitely difficult to get right. The UA should provide opt-in.

francois: typically such SHOULD requirements aren't testable so they become guidelines in the end
... there is a way to make such informative statements

hta: browser differentiation is harmful to user. we have enough browser representation here to figure out where we have agreement and should have recommendations that reduce unnecessary differentiation

richt: we don't mention doorhangers because there is a lot more that can be done.

fluffy: we can say "browser needs to somehow do X" without specifying precisely how.
... if completely optional no one implements. we can learn from existing softphones, etc. I like the "check my hair" dialog, a UA where there is a popup that tells you you're sending video and who you're sending it to. PeerConnection could confirm that this is correct.

(UA = User Agent = Browser)

fluffy: e.g. JS can select the camera, provide the name of the contact that is displayed at the same time.
... can't check before connection happens, but later can cancel if PeerConnection learns name is wrong

anant: mandate requirements on UI but not how to do it.

burn: +1

anant: hta believes that an Opera user contacts a Chrome user, so differences could be confusing. right?

francois: some apps will use getUserMedia to send it, and others will use it for local purposes, so needs are different

anant: maybe app has to make clear what media will be used for.

francois: user might have consented to call in advance of using getUserMedia

anant: we can check for stored permission
... do we have consensus to lay out steps but not specify how?

(generally yes)

richt: not sure. we don't know what we need to show yet

anant: we know some things, like previewing video

richt: anything that doesn't affect interop should not be required

fluffy: where we need encrypted name we need to require this

richt: let's not bake in too quickly because we are still experimenting

fluffy: today we support encrypted media (but not yet required). problem would be like using TLS but not showing name of site.

anant: we need global identifiers

adambe: with p2p may not know all names in advance.

anant: UI for accepting and initiating calls may be very different

adambe: what about two people talking and a third joins. media streams already available.

fluffy: same problem if you have a single conversation moved from one endpoint to another

hta: good to discuss, but don't agree with cullen's request to mandate requirements. want to hear about stuff other than just names

anant: (returning to slides)
... do we allow apps to enumerate devices? no, would like for app to request what it needs (say, hints proposal).
... if user agrees, we invoke the success callback.
... user should always have complete control over what is transmitted, independent of what the app asks for

adambe: with proper hints you need to enumerate and can get same result. prefer hints approach

fluffy: every app I use for voice and video allows me to switch cameras and mics. how does that work?

anant: don't want app to choose switching, but want user to be able to switch
... UI has to be independent in UA independent of app

burn: in html speech we have notion of default mic. app doesn't choose, the user does via the chrome.

fluffy: yes, happens all the time. I'm using an existing crummy mic or camera, go find a better one and plug it in.

Tim: others want to know what's available in advance so you don't even present the option if it doesn't exist

anant: hints can solve this. some hints are compulsory, others optional.

francois: can't app just check?

anant: this way doesn't reveal info about user.

burn: failures give user info

richt: yes, hints are good. web app doesn't need to know which camera.

<richt> webapps provide a hint in the true sense of the word but the impl. can fallback to any camera if necessary (rather than fail).
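
For illustration, a minimal sketch of the hints-based call discussed above; the options-object form of getUserMedia, the hint names and the helper function are assumptions, not agreed API:

    // Hypothetical sketch: the options-object form and the hint names are
    // assumptions; the actual getUserMedia signature was still under discussion.
    navigator.getUserMedia(
      {
        audio: true,
        video: { facing: "front" }  // a hint: the browser may fall back to any camera rather than fail
      },
      function (stream) {
        // Success: the UA chose the devices (with user consent); the page
        // never learns which cameras exist beyond what it asked for.
        previewStream(stream);  // previewStream is a page-defined helper, not part of any API
      },
      function (error) {
        // A single, undifferentiated error avoids fingerprinting via "incapabilities".
        console.log("getUserMedia failed");
      }
    );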

anant: the comment was that UIs are best when they know what devices are available

francois: exposing capabilities is fingerprinting issue. Exposing "incapabilities" is as well.

anant: right, the key is the time it takes, so that the app can't tell whether a failure is due to an incapability or a user action.

hta: if you don't know what's available you can't distinguish between "you need more cameras to run this app" and "you need to allow me to use more cameras"

richt: we can't allow fingerprinting
... one error, regardless of how it fails

fluffy: when would you need a case where you'd rather have a failure than use a hint?
... would rather feed one camera into both than a failure

anant: (back to slides, showing early mockup)
... doorhanger hanging off info bar indicates that it's a web app rather than the browser. don't like this approach, but best so far.
... we have "hair check", live preview of camera before communication is active. can mute audio, click to share cameras
... webcam button on address bar gives you options to change cameras (in UI, part of browser)

adambe: what about webcam with microphone display in it

anant: we should allow it, but may be an advanced checkbox. want 95% of use cases to be handled

fluffy: we need to be able to change where sounds come from and where they go.
... we will see this more and more as you have more devices. "skype headset" and "facebook headset"

richt: what about tabbing implications. when you switch tabs need to know what happens

anant: will get to that
... (back to slides) default to what app asks for but users can always override
... preferences pane to control all

anant: mockup used one-time permission grant model
... we allow user to say "always allow example.org to access a/v"

tim: if the browser is on a phone in my pocket and permission has been given, the app could just turn it on in my pocket. accelerometer info can tell you that the person is walking (and may have the phone in a pocket).

richt: we will use some kind of visual and/or vibration to indicate

anant: we need something because users won't want to click every time at Facebook

richt: we could try to learn it based on user behavior

fluffy: from privacy standpoint, webex on your phone and laptop could do this today.
... it always starts with strong privacy position and eventually disappears to no privacy. better to have something only strong enough that it is still used
... indicators are probably more important than prevention
... anything stronger than this will be widely ignored.

richt: that is already in spec

anant: maybe should also have vibration or audio indication

stefan: how is this compared to geolocation

richt: we are 10, they are 2

adambe: like "watch position" but without user knowing

anant: need to let user know when a previously-given permission is now being used

fluffy: users hate apps that grab the device and turn on the indicator. it needs to be on when the device is actually used.

anant: in today's world we won't exclusively grab device that way anymore
... should web app be able to specify what type of access it needs?

richt: user should always be in control

francois: maybe app could say instead when it doesn't need long-term access

hta: option of granting long-term access only the second time you try it has worked well

anant: (back to slides) initially tried to tie permission grant to a time-frame and domain name.
... developers hated this. want permissions tied to user session not just domain
... could perhaps allow app itself to revoke a permission if it detects a change in user session.

fluffy: can JS app provide a user-identifying token, so can index using both user criteria and this token

anant: yes, as optional param in JS call. could try it.

richt: browser can handle this since it runs session.

anant: we don't know what's in cookie, so no.
... but most websites won't use it.

burn: financial sites will like this.

fluffy: bad guys don't care but helps good guys ==> okay

richt: if you injected script that just replays in different domain you can get permission easily

anant: how

richt: user-installed script

anant: yeah, but then you can do anything
... (back to slides, showing mockup of notification)
... one option is the entire tab pulses, with camera/mic control right on tab.

richt: we pin audio/video. user has to explicitly request keeping it.

hta: needs to be in spec
... switch tabs all the time and want my voice to be heard

anant: tricky across all UIs, including video phone

fluffy: something unspecified that irritates user is whether a video starts playing when you open a new tab. we should make this the same everywhere

anant: we browser vendors need to work this out.
... prefer default of not blocking audio/video just because you switched tabs. if new tab wants to start video, should ask user.

richt: but may be hard to tell which tab has audio/video

anant: if whole tab pulses it works
... (back to summary slide) what happens if device already in use by other app
... maybe can't tell which app is requesting access
... what is interaction for incoming call. assume signed in to service to receive call/audio

fluffy: yes, but others might want web apps that run in the background and have no bar (headless web apps)

hta: if headless web app reads sdp off disk and passed into PeerConnection, it should just work, with no browser connection.

anant: so we should allow headless apps and let browser determine how incoming call works.

fluffy: some chrome has to be involved when video is requested.

anant: yes. js can tell user about incoming call, but then need to get permission.

hta: gum (getUserMedia) should have enough info to identify where call is from
... apps will want "one button accept". can't avoid showing some chrome. would be better for that to be the doorhanger. neeed extra API call so web app calls receiver's browser and asks if they want to accept. then get doorhanger.

oops, previous speaker was anant

richt: (missed detailed example)

fluffy: sometimes want long-term approval to at least negotiate and reveal IP address. also a different mode where don't reveal IP address until user has accepted.
... first one allows you to deal with ICE slowness by doing ICE and acceptance in parallel.

francois: users won't understand this distinction.

fluffy: okay, then maybe don't need first case.

anant: we don't know how to implement incoming call.

richt: can do OS-level notification

anant: yes, but also want to give all the user controls when accepting call.
... other questions (not on slides)
... what about embedded iframes? we don't allow anything other than toplevel to do that. an iframe would have to pop up its own toplevel window to do this.

richt: what happens with geolocation?

anant: we don't do the same but would like to
... other use case is where ad is embedded in slashdot. In that case slashdot is accepting responsibility and you are giving permission to slashdot.

richt: iframes from different origin

anant: yes. if same origin we just let them through.

(general approval of this approach)

adambe: what about call-in widget you can add to page.

anant: can't avoid this.

adambe: could sandbox the iframe.

anant: problem is that user doesn't know it's a different site.
... with a new top bar the user can tell
... also, only allow long-term approval for https
... don't enforce https for all uses, but definitely if site wants long-term access

fluffy: what about mixed content
... will probably need more discussion. everyone will hate requiring https, but they may realize they need it.
... difficulty today requiring https is that many sites would break today. but with new sites where everything needs to be built from scratch, like with webrtc, we could require it now. we should consider it.
... but we need more info.

richt: could do TLS in JS, so that might take care of it

hta: that's giving JS direct access to TCP
... JS should not have this power!

Stages for moving to a Rec

Slides: W3C Recommendation Track

Moving on to Dan talking about W3C Recommendation practice
... [discussion on consensus, moving to First Public Working Draft, periodic publication]
... Good to reach out to groups with opinions early.
... On the "Candidate Recommendation" slide: at this stage you defend that the document meets the needs; at this point, you need to have a test suite that tests the spec, not the implementations.

anant: Is this code and what is it run against?

francois: There is another group trying to come up with generic test framework that can be used
... Should think about how to write a testable specification when you write the spec

Dan: great to have the spec wording be the same as the assertion code in tests

Can two implementations share code? If we have a good answer, perhaps OK, but ...

Sometimes there are single implementations of optional features

Dan: on to Proposed Recommendation slide

francois: This is stage where W3C members have their last chance to comment

Dan: On to Recommendation slide

anant: How do we deal with later version of spec for features we wanted in a later version ?

francois: Need to recharter WG, go through same processes,
... also a proposed edited rec to include errata (not very common)

Dan: On to addressing public comments

Harald: What's the process when we can't agree?

francois: The group is strongly encouraged to avoid such situations. Comments can get escalated as formal objection that goes up to W3C Director.

Dan: On to Status of WebRTC API draft slide

richard: should we stay at candidate for a year or so

Dan: better to have exit criteria, such as meeting this number of implementations

Dan: the two specs would be on their own timelines, apart from whatever the reference dependencies are

anant: If we are doing two specs, should we push out our dates beyond Q2 ?

francois: at the point we know we won't make it, we will need to update

Low Level Control

Slides: Low-level control

Moving to low-level control presentation by burn

Dan: original proposal for a low level API (link in slide 2) received limited discussion and little support from IETF's signaling API
... But there is some interest in a low level API
... Look at requirements document (IETF) by hadriel to drive discussion
... Hints vs Capabilities will be an interesting discussion
... Some discussion now but we should move it to list soon
... Existing requirements are not at the same level (they are higher level) than what we want for low-level hints and capabilities
... Browser UI requirements are things we've discussed and should move into the current document

Dan: Media properties are the interesting ones
... A2-1 a web API to learn what codecs a browser supports

anant: How does this relate to JS application-level decoders/encoders?

fluffy: that's independent of an API that exposes what codecs the browser takes

tim: the API can only be used after the user has consented, so there's already some trust in the app

fluffy: we should go through all of the requirements

<juberti> regarding fingerprinting, aren't we sending user-agent already

<derf> We've (jokingly) discussed replacing the user-agent with an empty string.

<juberti> i think there are enough implementation differences that fingerprinting can be done using existing apis.

juberti: need to be able to query browser capabilities so that JS can generate SDP on its own

(without user consent?)

<juberti> user consent is ok

<juberti> this would happen around the same time as camera access

<derf> But if you're going to have hardware codecs, capabilities can differ even with the same UA.

<juberti> the thought experiment here is whether it would be possible to fully implement signaling, except for telling the browser what the offer and answer are.

<juberti> (fully implement signaling in JS)

<ekr> there are a lot of fingerprinting mechanisms out there. is this really making it worse?

tim: but how can you restrict information if you want JS to encode/decode (eg: hardware support for some codecs at certain resolutions)

<derf> ekr: It clearly makes it worse. The question is, is it worth the price?

<juberti> I don't like having to expose a billion knobs to JS, but if we can give the browser a SDP blob from JS, that might allow a flexible but simple compromise.

harald: if you negotiate on the principle that SDP is generated independently of setting up media streams then you don't need permission - there are use cases for that

<juberti> to generate said blob, we need to know what the browser supports.

<derf> juberti: Sounds like you're asking to give the browser an ANSWER from JS, and you want an OFFER in order to generate it.

<derf> Or did I miss what you were really asking?

<juberti> derf: I want to generate an OFFER in JS. I send the offer to the remote side, and also tell my own browser about it. The remote side generates an ANSWER in JS from the OFFER, tells the browser about both, and sends the ANSWER back to the initiator. The initiator then plugs the received ANSWER into the browser, and media flows.

<derf> juberti: Okay. Why can't you do that with ROAP today?

<juberti> a) you can't generate the OFFER, since you don't know the browser caps. b) even if you could generate your own offer, there's no way to tell the local browser about it. lastly, the state machine for ROAP lives inside the browser, so the JS can only do what ROAP allows (i.e. no trickle candidates like Jingle)

<ekr> clarification: trickle candidates is candidates in pieces like with Jingle transport-info?

<juberti> ekr: exactly

<derf> juberti: fluffy is saying what I would have replied to you right now.
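
For illustration, a rough sketch of the flow juberti describes above; every name here (setLocalDescription, setRemoteDescription, getCapabilities) is a placeholder for "tell the browser about the offer/answer" and "query browser caps", not agreed API, and buildOffer, buildAnswer and signalingChannel are page-defined:

    // Caller side (sketch; placeholders only, not agreed API):
    var offer = buildOffer(pc.getCapabilities());  // JS generates the SDP offer itself
    pc.setLocalDescription(offer);                 // tell the local browser about it
    signalingChannel.send(offer);                  // app-defined transport to the remote peer
    
    // Callee side:
    signalingChannel.onmessage = function (offerFromCaller) {
      pc.setRemoteDescription(offerFromCaller);
      var answer = buildAnswer(offerFromCaller, pc.getCapabilities());
      pc.setLocalDescription(answer);
      signalingChannel.send(answer);
    };
    
    // Caller again: plug the received answer in, and media flows.
    signalingChannel.onmessage = function (answerFromCallee) {
      pc.setRemoteDescription(answerFromCallee);
    };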

dan: we jumped from A2-2 to A2-3, but they both look like they go together

fluffy: what is the use case for knowing codec properties? it only makes sense if you can control the properties

<Mani> would it be more appropriate to require that the capabilities described should be consistent with the capneg RFC5939 security properties?

adam: is A2-2/A2-3 a codec abstraction of some kind?

harald: you want to select the best possible codec for a given bandwidth requirement

harald: different for video and images etc.

richt: considering whether you can update the SDP proposal the browser sends to the JS directly through JavaScript

cullen: when we get to ROAP, we'll see that it's possible.

anant: in order for JavaScript to add things to SDP, it needs to be able to query.

cullen: if the browser supports stuff that it didn't say it supports, then it's only normal that you cannot use it.
... I think you're going to get that one way or the other, so not opposed to an API.

hta: we don't have an opaque proposal between browsers right now.

cullen: in the SIP proposal, you do

hta: cannot be used to setup the initial connection

<ekr> SIP isn't really opaque, it just looks opaque.

cullen: if we're trying to protect from fingerprinting, we need to know what kind of information we think we can reveal.

anant: hardware information is the critical key
... Easy to identify who the user is with some nuances on hardware capabilities.

hta: are we making it worse in a way that makes a difference, that's the question.

[exchanges about fingerprinting]

cullen: my guess is that even if fingerprinting was revealing that I'm using a MacBook Air, that's still a large set.

<ekr> there's a lot more uniqueness than that. For instance, window size, fonts, plugin support, etc.

<ekr> Important to distinguish between new capabilities that expose more information to the server versus capabilities that expose info to the peer.

burn: going through the requirements provides food for thought on issues that are relevant.

hta: looking at A2-4, in many scenarios, the application is the best place to know what can be cut off.
... e.g. stop sending video that's not crucial for this communication.

cullen: I would be very concerned if the congestion control loop was done in JavaScript.

hta: my thinking is that, in the case when the message is "no way to get more than 100Kb/s through", the app can react and select the streams it wants to send.
... then the browser can take it from there.

cullen: level of control in JavaScript is: on/off, framerate, bandwidth... slippery road.
... Where do we draw the line?
... Implementation experience will teach us a lot here.

hta: I very much agree with that.

anant: declarative approach could work, e.g. "please turn on the stream at this bitrate"
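
For illustration, a sketch of that division of labour (the browser measures and notifies, the app chooses what to keep); the onbandwidthestimate event, removeStream as used here and the stream name are assumptions only:

    // Purely hypothetical event: illustrates the split discussed above, where the
    // browser runs congestion control and the app only picks which streams to keep.
    pc.onbandwidthestimate = function (kbps) {
      if (kbps < 100) {
        pc.removeStream(slidesStream);  // drop the non-crucial video, keep the audio stream
      }
    };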

burn: moving on to level in audio streams, requirements A2-8 and A2-9

cullen: security implication I think. Attacker can detect volume, and could perhaps derive words from that.

[moving on to A3-x requirements]

<ekr> cullen: depends on granularity with which it is reported

cullen: getting the SSRC and CNAME is good. Setting is more of an issue.

hta: what if you negotiate the Payload Type value and then change it afterwards?
... I don't see a reason to allow an API to do something that is not useful.

burn: A3-4 is basically already possible.

anant: what does it mean to set the audio and video codecs of streams you receive?
... At the point of rendering, it's too late.

hta: taking A3-4, A3-5, A3-6, A3-7 and A3-8 together, it amounts to "the application must be able to configure a media stream across RTP sessions".
... I don't think these are the right approach, but I'd actually prefer to see a requirement like that.

<juberti> for receive codecs, you might choose to change the PT mapping.

<juberti> and you'd need to tell the media layer about that.

[discussion on A3-10 and A3-11, same in requirements although not as low-level]

anant: do we have use cases that we can map to these requirements? That would be useful.

burn: there was some general description that provided some context for these. I didn't want to read it here.

anant: it would be easier to get it into the spec if these requirements were motivated by actual use cases.
... We should get more specific about the level of extensibility we need.

burn: there is a list in section 3 of this document. It explains what the problems are

anant: not convinced by argument 6) (some Web application developers may prefer to make the decision of which codecs/media-properties to use).
... don't see why you need to involve the server at all.

hta: it's clear that we don't have general agreement on how this is phrased.
... let's wrap this up.

burn: Moving on to the hints API, last discussed on the mailing-list. Simple example is "audioType: 'spoken' or 'music'"
... question is which level of details.
... Agreement that this is needed.
... Question is do we need an API for that?

anant: new things will keep coming. Extensibility is needed.

cullen: agree.
... IANA registry could be used, I think.

burn: problem in other groups is knowing the IETF process. Won't be a problem here.

hta: we have to define some kind of namespaces for hints. Just one level, multiple levels, strings, tokenized, etc.

DanD: two things, structure and semantics.

burn: someone may want to propose finer granularity that you want to relate to other values.
... in the end, they are hints, so it doesn't matter so much. If you give something that is general, and something that is specific, you don't know what you're going to end up with.

adam: side comment that the hints should be an optional argument to addStream.

[agreed]

stefan: we should reuse MediaStreamHints object for getUserMedia

anant: true.

hta: having just one registry is probably ok. For video, you could have a hint saying low resolution.

burn: one registry makes sense.

anant: different object but same values
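
For illustration, a sketch of the shape just agreed: hints as an optional argument to addStream, with the same registry of values reused at capture time. The audioType value comes from the slide example; the MediaStreamHints shape and the getUserMedia form are assumptions:

    var hints = { audioType: "music" };            // e.g. don't apply voice-style noise suppression
    peerConnection.addStream(localStream, hints);  // hints as an optional second argument
    
    // Same hint values, different object, at capture time (per anant's comment above);
    // onSuccess and onError are page-defined callbacks:
    navigator.getUserMedia({ audio: { audioType: "music" } }, onSuccess, onError);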

burn: moving on to Statistics API.
... MediaStream.getStats()

DanD: where do you specify the timeframe for those statistics?
... maybe just "what the system knows".

<derf> burn: Just a nit... if your processingDelay is 20 ms, I expect your framerate is 50 fps.

cullen: agree. Maybe we can steal this from the IETF XRBLOCK WG

hta: the caller can always call the function twice and check the difference.
... just return the total, and the time you think it is at the time the function is called. Then it's easier to compute an average.

DanD: important for that to be extensible.

cullen: there needs to be some set of stats that are mandatory to support. Multiple layers of stats are possible.
... any structure you put in there is not really useful, you have to know the property.

hta: structure might buy you some namespace.
... Same property may be defined in different areas, so prefixing might be good.

burn: I'm not hearing any disagreement here.

hta: I note devil's in the details.
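
For illustration, the call-twice pattern hta describes, assuming getStats() returns cumulative totals plus the sampling time; the property names here are illustrative only:

    var first = stream.getStats();   // e.g. { timestamp: ..., bytesSent: ... } -- assumed shape
    setTimeout(function () {
      var second = stream.getStats();
      var seconds = (second.timestamp - first.timestamp) / 1000;
      var bitsPerSecond = 8 * (second.bytesSent - first.bytesSent) / seconds;
      console.log("average send rate over the interval: " + bitsPerSecond + " bit/s");
    }, 5000);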

burn: then, moving on to Capabilities API
... ROAP proposes to get an SDP blob back.
... getCapabilities() would return an SDP blob.
... It's using the syntax to represent capabilities

cullen: let's take fingerprinting off the table for a second. This seems to make sense, though it may not be the syntax you could dream about to list codecs you support.
... This seems to give you all the information.

anant: why do you need this info in advance?
... more reliable to wait until getUserMedia. No guarantee you'll get video when the call is made.

DanD: I would render a different UI if I know video is not available.

anant: you could do that later on.

cullen: lots of applications grey out the video when it's not available, for instance.
... use case for "video", not specific codec.

DanD: on a mobile device, I may present a widget on the screen if I know I have video support.

anant: I understand the argument. I don't like it because you need to gracefully handle the case when video is not available in any case.

Tim: the expectation is that it would be rare.

hta: you should be able to set a callback that "if capabilities change, I want to know"

cullen: right.
... First, is video available? Then, can someone come up with a use case for more detailed info?

[more discussion on fingerprinting, if you know when the camera comes in, you can correlate the user on Facebook and Google+, for instance]

burn: general interest in something like this, except getCapabilities early on and then callbacks.

anant: we can figure out later on if it's callback or event.
... we're going to try what Cullen suggests: simple audio/video, then if someone comes up with a use case for more, we'll add more.
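
For illustration, a sketch of that direction: a coarse capability query plus a change notification. getCapabilities() returning an SDP-style blob is the ROAP proposal mentioned above; the change callback name, the m=video check and updateUi are assumptions:

    var caps = peerConnection.getCapabilities();   // SDP-style blob, per the ROAP proposal
    updateUi(/m=video/.test(caps));                // updateUi is page-defined, e.g. grey out the video button
    
    // A callback for "if capabilities change, I want to know" (name assumed):
    peerConnection.oncapabilitieschanged = function () {
      updateUi(/m=video/.test(peerConnection.getCapabilities()));
    };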

DanD: good, but let's not restrict. Extensibility would be good, not to change the spec afterwards.

burn: suggests that the browser simply lies about more specific parameters.
... 3 APIs presented here. Who's gonna do this?

cullen: happy to work on the callback, with Anant's help.

burn: will work on the hints API

cullen: all three of them assigned to editors spec.

Data Streams

Slides: WebRTC Data Streams

juberti: There are use cases for unreliable data
... Need for the datachannel for mesh apps
... Encryption should be required for the data channel
... Design for DataStream should be similar to MediaStream
... there is no need for inheritance between DataStream and MediaStream
... We'll use the same flow as in MediaStream to attach to the peerConnection instead of an atomic flow

fluffy: I like this proposal. I think the priority needs to be addressed as people tend to set priority high.

juberti: We can keep it very high level with specific enumerations

fluffy: Trying to come up with some other prioritization ideas

anant: What is the use case for the readyToSend?

juberti: Application should have some notion of the flow state
... You need to know if you have buffer available

anant: we should align this with webSockets

fluffy: we need flow control for a large transfer

hta: the JS app has no concept of blocking

anant: What if the developer wants to block?

Adam: It can't

anant: API looks good
... How about security considerations?
... how do you know who's on the other side

fluffy: You would have been able to send this anyway

anant: what are the different attack possibilities? Should be captured

juberti: What's unique is that you can send it in peer to peer way. No server involved

hta: You said data must be encrypted
... being encrypted will take care of some concerns
... it would make more sense for it to have its own constructor and then be attached to a peerConnection

Milan: Question about ack

juberti: The choices considered for the wire protocol make it useful

Milan: Protocol has an ack and it doesn't need to be exposed
... an example with the ack would be useful to understand

juberti: I'll take it as an action point

Stefan: we can conclude this session

juberti: I'll have it updated and sent to the mailing list for review

fluffy: this is just the API proposal not the actual implementation, right?
... We're moving along with this until we figure out the implementation.

juberti: Requirements came from the wire protocol

fluffy: looks good. Can we build it?
... That's what I'm concerned about, and maybe we should relax our requirements

<francois> [ref possible alignment with Websockets, perhaps change "sendMessage" to "send"]

francois: there's a process called feature at risk
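
For illustration, a sketch of the proposed shape: a data channel with its own constructor, attached to the PeerConnection like a MediaStream, with a WebSocket-like send and a readiness signal for flow control. All names here (DataStream, the reliable/priority options, onreadytosend) are taken from the discussion or assumed, not from an agreed spec:

    var channel = new DataStream({ reliable: false, priority: "normal" });  // assumed constructor/options
    peerConnection.addStream(channel);   // attached with the same flow as a MediaStream
    
    channel.onreadytosend = function () {
      // Flow control: only write while buffer space is available (cf. WebSocket bufferedAmount).
      channel.send(JSON.stringify({ type: "gameState", x: 10, y: 20 }));
    };
    
    channel.onmessage = function (event) {
      console.log("peer says: " + event.data);
    };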

MediaStream

Slides: MediaStream slides (odp format)

[going through slides]

cullen: why do audio tracks precede?

adam: if the last track is not a video track, you can assume there's no video in there.
... there used to be 2 lists.

anant: the order doesn't have to correspond to anything.

cullen: there's another ordering in SDP.

anant: not related.

cullen: wondering whether that ordering could be the same.
... just strikes me as something weird.

DanD: think we should be explicit that the order does not have to match that of SDP

anant: the only people who have to worry about that are browser vendors; no need to expose it to users.

stefan: I liked it better when there were two different lists.

adam: it was easier to query whether there is audio or video.
... Moving on to definitions.
... MediaStream represents stream of media data. Do I need to go through it?

cullen: find this definition fascinating. Can you have stereo audio in two tracks? Is voice and video one track? audio and DTMF? No idea.

anant: a track is lowest you can go. Having 5.1 audio in one track looks weird.

<juberti> what about comfort noise?

<juberti> is that the same track as audio?

cullen: need some group for synchronization, but separate thing.

anant: getObjectURL function is on the MediaStream, right? When you assign a stream to a video element.

cullen: presumably, if I have a stream with 3 video tracks, I want to send it to 3 different video elements.

anant: media fragment could be used to select the track you're interested in.

DanD: as long as we all agree on what's inside, we're in good shape.
... This is a good start for a glossary.

cullen: let's say that graphic card has VP8 support. You can't assume that the clone happens before the decoding happens.

[discussion on gstreamer and tracks]

anant: I think gstreamer has two separate track-like things for stereo audio.

tim: surely, a 5.1 audio is one source for gstreamer.

adam: the motivation to remove the parallel between MediaStreamTrack and media track is that audio was a multiple list whereas video was an exclusive track.

hta: basically one MediaStreamTrack is one stream of audio.

cullen: stereo is two tracks, 5.1 is 6 tracks. That's very easy to deal with.

anant: you want to be able to disable audio tracks.

tim: how do I know which track is the rear right and so on?

DanD: technically, with 3D video, you'll want to sync those two tracks.

francois: 6 tracks for 5.1 audio means disabling audio is disabling 6 tracks.

anant: we can add a layer at MediaStream level.

burn: the real world allows both, combined or not.

cullen: question is: if something is jointly coded with multiple channels, is that one track?
... If that's one track with a bunch of channels, the fact that it could be represented as two tracks sounds like a complete disaster.
... We need some abstraction layer to ease the life of Web developers.

hta: in the case of 4 microphones, you want to send 4 tracks. With 6, you want to send 6 tracks.

anant: I think early implementations will only support one or two channels at most.

tim: there are plenty of places where we can get audio that is not one channel.

anant: right, from files, for instance.
... my preference is to stick to a MediaStreamTrack as the lowest thing.
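
For illustration, a sketch of the model being discussed: one track list with audio tracks first, each MediaStreamTrack individually disableable (so muting "5.1 as six tracks" means disabling six of them, as francois notes). The tracks/kind/enabled names follow the drafts of the time and are assumptions here:

    function muteAudio(stream) {
      for (var i = 0; i < stream.tracks.length; i++) {
        var track = stream.tracks[i];
        if (track.kind !== "audio") {
          break;                // audio tracks come first, so stop at the first non-audio track
        }
        track.enabled = false;  // disabling a track mutes it without removing it from the stream
      }
    }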

adam: moving on. An instance of a MediaStreamTrack can only belong to one MediaStream.

anant: noting that "track" is really not the same thing as a track in container formats, etc., so we need to be explicit in the doc about that, not to create additional confusion.

[meeting adjourned, discussion on MediaStream to be continued on day 2]

Summary of Action Items

[NEW] ACTION: anant to check how to split getUserMedia from the spec [recorded in http://www.w3.org/2011/10/31-webrtc-minutes.html#action01]
[NEW] ACTION: robin to draft a draft proposal for joint deliverable. [recorded in http://www.w3.org/2011/10/31-webrtc-minutes.html#action02]
[NEW] ACTION: tibbett to send new use cases on getUserMedia to webRTC mailing-list [recorded in http://www.w3.org/2011/10/31-webrtc-minutes.html#action04]
 
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.133 (CVS log)
$Date: 2011/11/09 07:01:01 $