See also: IRC log
See also: Minutes of day 2/2
Stefan: [starting meeting. Reviewing agenda]
Slides: RTCWEB Architecture (PDF)
hta: The goal for RTCWeb is real-time
communication between browsers
... arbitrarily define that as within ~100ms
... Trying to drive a design by use cases. Must have a design that
meets the priority use cases.
... we want to design general purpose functions.
... one use case we're looking at is the interworking with legacy
systems. We're fairly sure we want to make that work.
hta: relays must be possible
otherwise we don't have a universal solution.
... <goes through the basic architecture in his slide
deck>
hta: All components (except RTCWeb
implementing browsers) must be assumed evil.
... Keep trust to a minimum
... Need to look at mechanisms for establishing trust from a web
page to a browser.
... data congestion control must also be a priority.
... RTP exists. We will use it.
... encrypt everything
<this is controversial>
hta: considering DTLS-SRTP key
negotiation for that purpose.
... UI issues are important to the overall security.
... always fun to agree on codecs
... connection management: least controversial proposal is
ROAP
... We expect innovation in what-connects-to-what
... ROAP does allow us to interconnect to SIP and XMPP based
systems
... lots of other pieces, media buffering, muting, game
control.
hta: a lot of that needs to be done in the browser.
burn: ...caveated by keeping in mind that we want to allow innovation.
hta: W3C has an Audio group defining
interfaces for accessing audio data.
... hopefully we'll be able to use that but we need to confirm that
down the line.
... All of this is captured in
draft-ietf-rtcweb-overview-02.txt
DanD: We know web is beyond browsers. We do have the ability to execute web apps in non-browser UAs.
DanD: We need to ensure that a browser endpoint can communicate with a non-browser endpoint.
hta: We need communication to devices
that are not browsers.
... We should not lose track of the browser use cases first and
foremost.
... One principle is that as long as the other side obeys the
interface then it doesn't matter what it is.
DanD: Another comment RE:
interdependencies with other groups. One example is on the
discovery of the capabilities on other devices.
... this might be a missing piece in our discussions to date.
anant: There are some capabilities in the proposal to negotiate.
fluffy: If we figure out how the protocols work for interoperability then we might get this legacy interworking.
Slides: Use Cases and Requirements (odp format)
Draft: Web Real-Time Communication Use-cases and Requirements IETF draft
stefan: <goes through some of the key use cases in his presentation>
hta: regarding the Distributed Music Band use case: we're going to need really low latency. Concert mode? We also need to distinguish between voice and music, since the noise removal we apply to voice is not suitable for music.
francois: Perhaps we should try to stick to something simple since the really low latency issue is a problem.
stefan: It's in the use cases
document anyway so we can discuss further on that.
... In the document there are a list of use cases where the
discussion has died out.
... or not concluded.
... such use cases relate to different situations, E911, Recording,
Emergency access, Security Camera. Large multi-party session
etc.
... these use cases could get added to the document if they get
more support.
... draft-jesup. I think we should cover both unreliable and
reliable data channels for WebRTC data.
stefan: draft-sipdoc. 4 requirements derived. I think this is covered by the current use cases document
<juberti> I agree, these data use cases should go into this doc.
<juberti> We only have one use case for data in the current doc.
stefan: draft-kaplan. Doesn't
introduce new use cases but does put a lot more requirements on the
document.
... Questions/comments on the use cases?
DanD: Observation: augmented reality is not covered.
<francois> Open issues on use cases and Req on WebRTC WG wiki
richt: we've been looking at that. We have the building blocks. Would be good to have a use case on this.
DanD: that's covered in some of these use cases but maybe something we could add
cullen: The ability to overlay a video stream on top of another would be good.
richt: you could do it with canvas
cullen: that has a big
security implication.
... will talk about it
later on.
DanD: plus video might come from an ad-serving service.
fluffy: Back to the 1-800-FEDEX use case. Anything we can provide to scope that out further?
stefan: not my area of specialty so feedback on this use case would be good.
fluffy: The use cases puts emphasis on DTMF.
burn: I agree that DTMF is extremely important. We have to support DTMF.
stefan: let's take a break since we're waiting on next presenter.
Slides: Security requirements
ekr: The IETF is trying to work on threat
models and security models. I don't think we're at the consensus
level yet, but here are the directions.
... [showing slides]
ekr: Funny state: Browser threat
model, browser protects you. It includes the notion that you're in
an Internet cafe. Basic security technique is isolation.
... Site A and site B sandboxed.
... Browser acts as a trusted base.
... IETF adds the Internet threat model: "you hand the packets to
the attacker to deliver".
... In the IETF oriented view of the universe, cryptography is the
main technique.
... We can't force people to use cryptography all the time.
... We need a solid protection under the browser threat model, and
the best we can on the Internet threat model
... 3 main issues: 1) access to "local devices" (use my camera,
microphone)
... 2) Communications security. If we do our job right, we won't
have to worry too much about that here.
... 3) consent to communications, ties in with CORS,
WebSockets
... Starting with access to local devices:
... If you visit a malicious site, you have no idea where your
video is going. It can bug you. Somehow we need the user to
consent, but it's not clear when, or how many times.
... One thing I do want to mention is that people make a
distinction between sending video to a site and sending video to
another peer, but from a technical perspective, they are the
same.
... Permissions models: we need short-term permissions, click on a
button for an Amazon customer service. Not a long-term
permission.
... Until last night, I thought we needed long-term
permissions.
... Tim indicated that he was not sure browsers will want to do
this.
... Do you want to support long-term permissions? That's a question
for the group
burn: why isn't this just a browser policy question?
cullen: the question here is: is it a requirement for the group?
burn: went through it in another group. Informed user consent is needed but can take the form of downloading the browser.
ekr: Then, there's the notion of
per-peer permissions.
... Another example of the short-term case, showing an example of
an injected ad.
... [thoughts on UI for short-term permissions]
... This has implications for the API.
... user clicks and calls Ford, but he's on Slashdot
... Dialog showing video call. There needs to be a non-maskable
indicator of call status so that you know you're still on the call.
You need to be consistently aware that the call is going on.
... Access to microphone/camera linked with call permission.
... Back to the example, Slashdot might have to be able to say a
word.
... [thoughts on UI for long-term permissions]
... Interface should be different. Possible: door hanger style UI.
You want an action that is less easy for people to do during a
call.
... There's a tension between convenience and security. It gives a
lot of power to the site.
... That's an open question whether we want to support that or
not.
... The IETF has been assuming we want it, so that's great feedback
to have if we actually don't
... [thoughts on peer-identity based permissions]
<juberti> I think we want to find a way to handle this. We don't want the web platform to miss something that will be present in native app platforms.
cullen: what's important to you is where is that going. Our media is going to a different place than the Web site. The identity is important.
burn: same issue in the Speech XG.
hta: Usually, you can read the form and find the address in the form, but sometimes the address is constructed by the JavaScript.
ekr: Partial digression on network
attackers. If I'm in an Internet Cafe, and an attacker manages to
inject an Iframe, he can bug my computer, redirecting the call to
him. The attacker controls the network on HTTP.
... Assumption is that it's safe to authorize PokerWeb and then
surf the Internet. It's basically the same on your Wifi if not
secure enough.
... An open question is: should this facility be available on HTTP
at all? Mandate HTTPS?
... e.g. an HTTPS page that loads jQuery through HTTP
DanD: not all the devices have the ability to securely preserve a token. That would be a good way to solve the problem.
ekr: [thoughts on consent for
real-time peer-to-peer communication]
... From a protocol point of view, we have ICE. Remember that you
cannot trust the JS.
burn: the point is you disabled security completely
ekr: I don't entirely agree that it's the
same thing
... Transaction ID needs to be hidden from the JavaScript
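The consent check ekr describes could be sketched roughly like this: the browser, not the page's JavaScript, generates a random transaction ID for its STUN-style connectivity check, and media flows only if the remote peer echoes that exact ID back. All names and shapes below are illustrative (real ICE uses a CSPRNG and the STUN wire format, not these objects):

```javascript
// Hypothetical sketch: the transaction ID lives in browser-internal
// state only, so script injected by an attacker cannot forge consent.
function makeConsentCheck() {
  // Illustrative random ID; a real implementation uses a CSPRNG.
  const txId = Math.random().toString(16).slice(2) +
               Math.random().toString(16).slice(2);
  return {
    // What the browser puts on the wire to the candidate remote address.
    request: { type: "binding-request", txId },
    // Called with the peer's answer; true means consent is verified.
    verify(response) {
      return response.type === "binding-response" && response.txId === txId;
    },
  };
}

// A well-behaved peer simply echoes the transaction ID it received.
function honestPeer(request) {
  return { type: "binding-response", txId: request.txId };
}

const check = makeConsentCheck();
console.log(check.verify(honestPeer(check.request)));                   // true
console.log(check.verify({ type: "binding-response", txId: "guess" })); // false
```

The point of the design is that `verify` compares against state JavaScript never sees, which is why the minutes insist the transaction ID be hidden from the page.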
... When I surf to HTTP gmail, any attacker can inject the
JavaScript and redirect calls for him.
... In the context of SIP, we've already addressed most of the
communications security issues.
... There's also protocol attack issue which hopefully should not
be a real problem in the end.
... otherwise it becomes a security issue.
... Assuming a ROAP-style API is used, we will be able to hide
security settings from the JavaScript.
AdamB: IDs might be owned by Facebook, and so on.
ekr: my view is: 3 basic scenarios.
1) Gmail to Gmail, Facebook to Facebook, etc. 2) Gmail to Facebook,
etc. where you'll need federation of ID. 3) Identity separated from
the service I use to make the call.
... I have some possible solutions for that. Happy to discuss.
Cullen: My position is a bit
stronger. This group wants encrypted calls, but if you can't tell
who the call is going to, that's useless.
... We need to take that into account.
hta: for many cases, I think it's quite ok to say that the call is encrypted to an identity and that this identity is verified by the fact that the guy I talk to presented himself.
cullen: I want to know the trust chain. If this call is being intercepted, I want to have some indication on that.
anant: slightly disagree with what
Harald said.
... The federated use case.
burn: how do you know that things are going to the right person?
Anant: given that we have that use
case in the document, we have to touch upon that issue.
... We want a completely peer-to-peer system in the end.
ekr: Is there a good way to bootstrap these systems? I think the answer is "yes".
Stefan: wanted to know status of controlling camera and microphone.
robin: Hi. I'm chair of DAP. We need
to figure out how we split the work on who does what.
... We haven't done a lot of work on Media Capture recently.
... One dividing line that could be useful: DAP could be picking up
media capture very quickly, some interest from DAP side.
... We would do the simple thing that doesn't include streaming or
any complex processing.
... Then hopefully this would be pluggable in what this group
needs
burn: what do you mean without streams?
robin: you could not bind a video
stream to some back channel, but you could do stuff such as video
mail or recording.
... In the declarative style, most of it happens in the browser.
hta: main difference is who controls the UI.
Anant: If you're going to do programmatic access, important to agree on what they look like between groups. Another solution is you take care of declarative, and we handle programmatic way.
Anant: If you do it the programmatic way, we may end up with two APIs doing essentially the same thing
robin: heard feedback that some people wanted to do simple things immediately.
anant: cannot "simple" be done with pure declarative approach?
robin: not really.
anant: something we've discussed at Mozilla: a media type on the input element, such as video/mp4. The browser prompts the user with a camera view. A nice property is that it avoids having to deal with the security issues.
robin: it would be useful if you had a demo you could show in DAP. We're meeting Thursday/Friday.
Adrian: Microsoft just joined DAP. One of our interests is media capture. API based on what getUserMedia is doing. WebRTC could build on top of this API. This way, we could split the work easily.
Anant: does that mean that you have use cases that require programmatic APIs?
Adrian: yes, in general we want developers to build their own experience.
Cullen: how do you deal with permissions?
Adrian: same way as other APIs
Cullen: agree with short-term, long-term permissions presented here?
Adrian: need to check, but didn't look wrong.
richt: in Opera, we agree that many use cases require getUserMedia but we want to decouple that from peer-to-peer connectivity. So agree to split things up.
Anant: can two groups work on the same spec?
Adrian: liaison explicit in the charter of WebRTC. Feasible for DAP to own the spec and go through the liaison.
richt: Peer-to-peer relies on a stream. We give you a stream and you deal with it.
Cullen: that's a bit more complex
than that, because of the hardware support for compression, and
permissions too.
... It sounds like DAP needs a permissions model as well and doesn't
have one for the time being.
... We have all the permission problems that have to be enforced at
the getUserMedia level.
richt: the barcode scanner, face recognition use cases haven't been taken up in the group.
cullen: I don't think anyone will disagree with these use cases
hta: want to make things more
complicated ;)
... If you go on with the assumption that media is always sourced
locally, you're in the bad corner.
... As long as it's a media stream, the current getUserMedia
doesn't care where the stream is coming from. I look at it as a
first and easy step.
... thinking about Web Introducers.
robin: That's a DAP deliverable. I'd rather not drag this spec in this discussion, although I agree it's a good way to make introductions.
Anant: the resources you get are not more privileged.
hta: I was more thinking about my
computer getting access to your camera.
... We might want to explore deeper levels of complexity for
passing streams around at a later stage.
... In terms of where things go, the WebRTC WG is chartered to get
this thing done. The charter is written in such a way that if
someone else does it, that's good!
... What I don't want to happen is one group that comes with a
vocabulary that describes front camera, back camera, etc. and
another group coming with one on camera orientation, in
particular.
dom: can getUserMedia be split from WebRTC spec in general? Independently of where the final spec resides, that's something people are interested in seeing sooner rather than later.
cullen: I'm just wondering how much faster things will be if we split things out. Browser vendors in WebRTC already indicated their intention to implement the spec.
dom: Implementations of getUserMedia in Opera.
hta: there's one in Chrome too, but as part of WebRTC.
richt: we're going to push something out soon with getUserMedia.
burn: actually, it's a "super-subset".
Anant: if it's published as a separate spec, the use cases of getUserMedia are a subset of use cases.
cullen: what I worry about is totally changing the directions we're going to in something we're supposed to ship in a matter of months.
[discussion on Microsoft joining WebRTC]
dom: one way to have the IPR
commitments that we want is to split spec out.
... That means adding the SOTD, and accepting DAP's input.
Anant: if we start taking input from DAP, we're going to lose time.
dom: I don't think so,
actually.
... nothing more than what we'll get with last call comments.
[further discussion on getUserMedia]
burn: this group wants to move forward very quickly. Others want it for other purposes. Is there a way to do something quickly that does not prevent other uses?
hta: getUserMedia returns a MediaStream, so MediaStream needs to be defined before getUserMedia
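The dependency hta points out can be made concrete with a small Node-runnable mock of the 2011-era callback-style getUserMedia: the success callback hands the page a MediaStream, so MediaStream has to be specified first. Everything here (`fakeNavigator`, the option and track shapes) is illustrative, not the final API:

```javascript
// Illustrative MediaStream: just a holder for tracks in this sketch.
function MediaStream(tracks) {
  this.tracks = tracks; // e.g. [{ kind: "audio" }, { kind: "video" }]
}

// Hypothetical stand-in for the browser's navigator object.
const fakeNavigator = {
  getUserMedia(options, onSuccess, onError) {
    // A real browser would prompt the user here; this mock always grants.
    const tracks = [];
    if (options.audio) tracks.push({ kind: "audio" });
    if (options.video) tracks.push({ kind: "video" });
    if (tracks.length === 0) return onError(new Error("no media requested"));
    onSuccess(new MediaStream(tracks));
  },
};

fakeNavigator.getUserMedia(
  { audio: true, video: true },
  stream => console.log(stream.tracks.length), // 2
  err => { throw err; }
);
```

Whatever the final shape, the page only ever sees the resulting MediaStream object, which is why the two specs cannot be fully decoupled.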
cullen: [back to hardware support for
video compression]
... Lots of things are wrong and need to be fixed. We haven't
focused on this right now. I'd like to see use cases that we're
missing (yours are great, richt).
stefan: that's the direction I'd like to follow, yes.
burn: yes, would be good to have use cases to see what's missing.
richt: the only thing we get from going to DAP is extra IPR coverage and comments.
cullen: is there a way to get comments early on?
adrian: there's a lot of process involved in getting comments sent to a group we're not participating in.
[discussion on IPR commitment]
robin: Nothing bad in splitting up and doing a joint deliverable.
dom: getting comments is something the group needs to do.
Suresh(RIM): so what happens to the draft in DAP's group?
robin: we'll kill it and keep the declarative one.
richt: it needs killing. Nothing happened on this spec for a year.
Stefan: so what do we need to do in the end?
dom: we need to ensure DAP agrees
with that direction and then you need to split up the part.
... The key question is where you draw the line. The administrative
side is easy.
Anant: Fine to reference WebRTC spec for definition of MediaStream?
dom: yes, but introduces a dependency
in terms of timeline.
... Other question is editing.
cullen: I want someone with deep understanding of video
Adrian: we're happy to participate to make things easier since we're making things more complex to start with.
robin: ready to volunteer an editor?
Adrian: I think so.
burn: if requirements are separable, that may be good to separate them.
cullen: I think this group should agree on the mailing-list before things get done.
Stefan: we have had chairs discussions earlier on.
richt: all of the work is staying in WebRTC in the end.
robin: all you get is better IPR protection and better comments.
cullen: important to put it on the list, first time people will hear about it.
stefan: anyone objecting to have a joint deliverable?
PROPOSED RESOLUTION: split up getUserMedia and publish as joint deliverable with DAP WG.
cullen: worried that joint deliverables always take longer.
robin: one thing that is important is to specify which mailing-list takes discussions. We really should not have joint deliverable where discussion is split in groups. Smallest issues turn into a war when that happens.
<richt> proposal to RESOLUTION status: one/two week period for mailing list discussion. Resolution to be made on next conf. call. (?)
cullen: this whole thing is an integrated system. It's going to be very difficult to discuss this without discussing other ideas.
dom: I think the key issue is splitting the spec, not the joint deliverable.
robin: if we can't split the discussion, then we probably can't split the spec.
burn: question is: can we write WebRTC requirements for getUserMedia precisely enough for this virtual joint working group.
cullen: you'll need so much low-level details in getUserMedia
robin: two actions: one on splitting the spec, second on refining joint proposal.
<scribe> ACTION: anant to check how to split getUserMedia from the spec [recorded in http://www.w3.org/2011/10/31-webrtc-minutes.html#action01]
<trackbot> Created ACTION-8 - Check how to split getUserMedia from the spec [on Anant Narayanan - due 2011-11-07].
<scribe> ACTION: robin to draft a draft proposal for joint deliverable. [recorded in http://www.w3.org/2011/10/31-webrtc-minutes.html#action02]
<trackbot> Sorry, couldn't find user - robin
burn: Adrian, do you actually need to see something pulled out first before you can help out?
Adrian: we can help with splitting out the spec, I think.
burn: it's more a practical question, given the way editors work in WebRTC.
cullen: can someone send use cases on one of the mailing-lists?
<scribe> ACTION: tibbett to send new use cases on getUserMedia to webRTC mailing-list [recorded in http://www.w3.org/2011/10/31-webrtc-minutes.html#action04]
<trackbot> Created ACTION-9 - Send new use cases on getUserMedia to webRTC mailing-list [on Richard Tibbett - due 2011-11-07].
[discussion on DAP interaction over]
Slides: WebRTC: User Security and Privacy
anant: currently don't specify what
happens with user permission when using getUserMedia
... UAs vary, so may not be appropriate to define a standard for
permissions
... propose we write guidelines for browsers rather than something
mandated
richt: this is definitely difficult to get right. The UA should provide an opt-in.
francois: typically such SHOULD
requirements aren't testable so they become guidelines in the
end
... there is a way to make such informative statements
hta: browser differentiation is harmful to user. we have enough browser representation here to figure out where we have agreement and should have recommendations that reduce unnecessary differentiation
richt: we don't mention doorhangers because there is a lot more that can be done.
fluffy: we can say "browser needs to
somehow do X" without specifying precisely how.
... if completely optional no one implements. we can learn from
existing softphones, etc. I like the "check my hair" dialog, a UA
where there is a popup that tells you you're sending video and who
you're sending it to. PeerConnection could confirm that this is
correct.
(UA = User Agent = Browser)
fluffy: e.g. the JS can select the camera and
provide the name of the contact, which is displayed at the same
time.
... can't check before connection happens, but later can cancel if
PeerConnection learns name is wrong
anant: mandate requirements on UI but not how to do it.
burn: +1
anant: hta believes that when an Opera user contacts a Chrome user, the differences could be confusing. Right?
francois: some apps will use getUserMedia to send it, and others will use it for local purposes, so needs are different
anant: maybe app has to make clear what media will be used for.
francois: user might have consented to call in advance of using getusermedia
anant: we can check for stored
permission
... do we have consensus to lay out steps but not specify how?
(generally yes)
richt: not sure. we don't know what we need to show yet
anant: we know some things, like previewing video
richt: anything that doesn't affect interop should not be required
fluffy: where we need encrypted name we need to require this
richt: let's not bake in too quickly because we are still experimenting
fluffy: today we support encrypted media (but not yet required). problem would be like using TLS but not showing name of site.
anant: we need global identifiers
adambe: with p2p may not know all names in advance.
anant: UI for accepting and initiating calls may be very different
adambe: what about two people talking and a third joins. media streams already available.
fluffy: same problem if you have a single conversation moved from one endpoint to another
hta: good to discuss, but don't agree with cullen's request to mandate requirements. want to hear about stuff other than just names
anant: (returning to slides)
... do we allow apps to enumerate devices? no, would like for app
to request what it needs (say, hints proposal).
... if the user agrees, we invoke the success callback.
... user should always have complete control over what is
transmitted, independent of what the app asks for
adambe: with proper hints you don't need to enumerate and can get the same result. prefer hints approach
fluffy: every app i use for voice and video allows me to switch cameras and mics. how does that work
anant: don't want app to choose
switching, but want user to be able to switch
... UI has to be independent in UA independent of app
burn: in html speech we have notion of default mic. app doesn't choose, the user does via the chrome.
fluffy: yes, happens all the time. i'm using existing crummy mic or camera, go find a better one and plug it in.
Tim: others want to know what's available in advance so you don't even present the option if it doesn't exist
anant: hints can solve this. some hints are compulsory, others optional.
francois: can't app just check?
anant: this way doesn't reveal info about user.
burn: failures give user info
richt: yes, hints are good. web app doesn't need to know which camera.
<richt> webapps provide a hint in the true sense of the word but the impl. can fallback to any camera if necessary (rather than fail).
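richt's fallback behaviour could be sketched as a small selection function: an optional hint steers the choice, but the implementation falls back to any available camera rather than failing. The hint name (`facing`) and device shape are assumptions for illustration, not anything from a draft:

```javascript
// Hypothetical hint matching: prefer a camera matching the hint,
// but fall back to any camera rather than failing the request.
function pickCamera(cameras, hints = {}) {
  if (cameras.length === 0) return null;               // nothing to grant
  const preferred = cameras.find(c => c.facing === hints.facing);
  return preferred || cameras[0];                       // fall back, don't fail
}

const cams = [
  { id: "cam0", facing: "front" },
  { id: "cam1", facing: "back" },
];
console.log(pickCamera(cams, { facing: "back" }).id);     // "cam1"
console.log(pickCamera(cams, { facing: "infrared" }).id); // "cam0" (fallback)
console.log(pickCamera([], { facing: "back" }));          // null
```

The design choice matches the discussion: an unsatisfiable hint degrades gracefully instead of producing a distinguishable failure the page could observe.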
anant: the comment was that UIs are best when they know what devices are available
francois: exposing capabilities is fingerprinting issue. Exposing "incapabilities" is as well.
anant: right, the key is the timing, so that the app can't tell whether a failure was due to a missing capability or to a user action.
hta: if you don't know what's available you can't distinguish between "you need more cameras to run this app" and "you need to allow me to use more cameras"
richt: we can't allow
fingerprinting
... one error, regardless of how it fails
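richt's "one error, regardless of how it fails" can be shown as a mapping that collapses every internal failure reason into a single opaque error, so a page cannot fingerprint the user's device configuration from which error it receives. The reason strings and the error name are hypothetical:

```javascript
// Internal failure causes the UA knows about but must not reveal.
const INTERNAL_REASONS = ["user-denied", "no-device", "device-busy"];

// Every cause maps to the same public error; detail stays in the UA.
function toPublicError(internalReason) {
  if (!INTERNAL_REASONS.includes(internalReason)) {
    throw new Error("unknown internal reason: " + internalReason);
  }
  return { name: "PermissionDeniedError" }; // hypothetical name
}

console.log(toPublicError("no-device").name);   // "PermissionDeniedError"
console.log(toPublicError("user-denied").name); // "PermissionDeniedError"
```

Combined with anant's timing point above, the page sees neither a distinguishing error name nor a distinguishing delay.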
fluffy: when would you need a case
where you'd rather have a failure than use a hint?
... would rather feed one camera into both than a failure
anant: (back to slides, showing early
mockup)
... doorhanger hanging off info bar indicates that it's a web app
rather than the browser. don't like this approach, but best so
far.
... we have "hair check", live preview of camera before
communication is active. can mute audio, click to share
cameras
... webcam button on address bar gives you options to change
cameras (in UI, part of browser)
adambe: what about webcam with microphone display in it
anant: we should allow it, but may be an advanced checkbox. want 95% of use cases to be handled
fluffy: users need to be able to change where sounds
come from and where they go.
... we will see this more and more as you have more devices. "skype
headset" and "facebook headset"
richt: what about tabbing implications? when you switch tabs you need to know what happens
anant: will get to that
... (back to slides) default to what app asks for but users can
always override
... preferences pane to control all
anant: mockup used one-time
permission grant model
... we allow the user to say "always allow example.org to access
a/v"
tim: if browser on phone and in pocket and permission has been given, app could just turn it on in my pocket. accelerometer info can tell you that the person is walking (and may have in pocket).
richt: we will use some kind of visual and/or vibration to indicate
anant: we need something because users won't want to click every time on facebook
richt: we could try to learn it based on user behavior
fluffy: from privacy standpoint,
webex on your phone and laptop could do this today.
... it always starts with strong privacy position and eventually
disappears to no privacy. better to have something only strong
enough that it is still used
... indicators are probably more important than prevention
... anything stronger than this will be widely ignored.
richt: that is already in spec
anant: maybe should also have vibration or audio indication
stefan: how is this compared to geolocation
richt: we are 10, they are 2
adambe: like "watch position" but without user knowing
anant: need to let the user know that a previously-given permission is now being used
fluffy: users hate apps that grab the device and turn on the indicator. it needs to be on when the device is in use.
anant: in today's world we won't
exclusively grab device that way anymore
... should web app be able to specify what type of access it
needs?
richt: user should always be in control
francois: maybe app could say instead when it doesn't need long-term access
hta: option of granting long-term access only the second time you try it has worked well
anant: (back to slides) initially
tried to tie the permission grant to a time-frame and domain
name.
... developers hated this. want permissions tied to user session,
not just domain
... could perhaps allow app itself to revoke a permission if it
detects a change in user session.
fluffy: can the JS app provide a user-identifying token, so we can index using both user criteria and this token?
anant: yes, as optional param in JS call. could try it.
richt: browser can handle this since it runs session.
anant: we don't know what's in
cookie, so no.
... but most websites won't use it.
burn: financial sites will like this.
fluffy: bad guys don't care but helps good guys ==> okay
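The token idea fluffy raises could look like a permission store keyed by (origin, token) rather than origin alone, so a grant made in one user's session does not leak to another user of the same browser profile. The store shape and key scheme are assumptions for illustration:

```javascript
// Hypothetical permission store indexed by origin plus an optional
// app-supplied user token; an absent token gets its own bucket.
class PermissionStore {
  constructor() { this.grants = new Set(); }
  key(origin, token) { return origin + "|" + (token || ""); }
  grant(origin, token) { this.grants.add(this.key(origin, token)); }
  isGranted(origin, token) { return this.grants.has(this.key(origin, token)); }
}

const store = new PermissionStore();
store.grant("https://bank.example", "alice-session");
console.log(store.isGranted("https://bank.example", "alice-session")); // true
console.log(store.isGranted("https://bank.example", "bob-session"));   // false
console.log(store.isGranted("https://bank.example"));                  // false
```

As noted in the discussion, a malicious page could pass a fake token, so this helps honest sites without defending against dishonest ones.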
richt: if you injected script that just replays in different domain you can get permission easily
anant: how
richt: user-installed script
anant: yeah, but then you can do
anything
... (back to slides, showing mockup of notification)
... one option is the entire tab pulses, with camera/mic control
right on tab.
richt: we pin audio/video. user has to explicitly request keeping it.
hta: needs to be in spec
... switch tabs all the time and want my voice to be heard
anant: tricky across all UIs, including video phone
fluffy: something unspecified that irritates user is whether a video starts playing when you open a new tab. we should make this the same everywhere
anant: we browser vendors need to
work this out.
... prefer default of not blocking audio/video just because you
switched tabs. if new tab wants to start video, should ask
user.
richt: but may be hard to tell which tab has audio/video
anant: if whole tab pulses it
works
... (back to summary slide) what happens if device already in use
by other app
... maybe can't tell which app is requesting access
... what is interaction for incoming call. assume signed in to
service to receive call/audio
fluffy: yes, but others might want web apps that run in the background and have no bar (headless web apps)
hta: if a headless web app reads SDP off disk and passes it into PeerConnection, it should just work, with no browser connection.
anant: so we should allow headless apps and let browser determine how incoming call works.
fluffy: some chrome has to be involved when video is requested.
anant: yes. js can tell user about incoming call, but then need to get permission.
hta: gum (getUserMedia) should have
enough info to identify where the call is from
... apps will want "one button accept". can't avoid showing some
chrome. would be better for that to be the doorhanger. need an extra
API call so the web app calls the receiver's browser and asks if they
want to accept. then they get the doorhanger.
oops, previous speaker was anant
richt: (missed detailed example)
fluffy: sometimes want long-term
approval to at least negotiate and reveal IP address. also a
different mode where don't reveal IP address until user has
accepted.
... first one allows you to deal with ICE slowness by doing ICE and
acceptance in parallel.
francois: users won't understand this distinction.
fluffy: okay, then maybe don't need first case.
anant: we don't know how to implement incoming call.
richt: can do OS-level notification
anant: yes, but also want to give all
the user controls when accepting call.
... other questions (not on slides)
... what about embedded iframes? we don't allow anything other than
toplevel to do that. an iframe would have to pop up its own
toplevel window to do this.
richt: what happens with geolocation?
anant: we don't do the same but would
like to
... other use case is where ad is embedded in slashdot. In that
case slashdot is accepting responsibility and you are giving
permission to slashdot.
richt: iframes from different origin
anant: yes. if same origin we just let them through.
(general approval of this approach)
adambe: what about a call-in widget you can add to a page?
anant: can't avoid this.
adambe: could sandbox the iframe.
anant: problem is that user doesn't
know it's a different site.
... when there's a new top-level window, the user can tell
... also, only allow long-term approval for https
... don't enforce https for all uses, but definitely if site wants
long-term access
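The policy anant describes reduces to a simple rule: long-term grants only for HTTPS origins, while an HTTP page can still obtain a one-time grant. A hypothetical helper (names assumed, not from any spec) makes it explicit:

```javascript
// Hypothetical policy check: which kinds of permission grant a given
// origin may receive. Only secure origins qualify for long-term grants.
function allowedGrantKinds(originUrl) {
  const isHttps = new URL(originUrl).protocol === "https:";
  return isHttps ? ["one-time", "long-term"] : ["one-time"];
}

console.log(allowedGrantKinds("https://example.org/call")); // ["one-time","long-term"]
console.log(allowedGrantKinds("http://example.org/call"));  // ["one-time"]
```

This matches the rationale earlier in the minutes: over HTTP a network attacker can inject script, so a persistent grant to an HTTP origin is effectively a grant to the attacker.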
fluffy: what about mixed
content
... will probably need more discussion. everyone will hate
requiring https, but they may realize they need it.
... the difficulty with requiring https today is that many sites would
break. but with new sites where everything needs to be built from
scratch, like with webrtc, we could require it now. we should
consider it.
... but we need more info.
richt: could implement TLS in JS, so that might take care of it
hta: that's giving JS direct access
to TCP
... JS should not have this power!
Slides: W3C Recommendation Track
Moving on to Dan talking about W3C Recommendation practice
... [discussion on consensus, moving to First Public Working Draft, periodic publication]
... Good to reach out to groups with opinions
early.
... On the "Candidate Recommendation" slide: at this stage, you need to have a test suite that tests the
spec, not the implementations.
anant: Is this code, and what is it run against?
francois: There is another group
trying to come up with generic test framework that can be
used
... Should think about how to write a testable specification when
you write the spec
Dan: great to have the spec wording be the same as the assertion code in the tests.
Can two implementations share code? If there's a good answer, perhaps OK, but...
Sometimes there are single implementations of optional features.
Dan: on to Proposed Recommendation slide
francois: This is stage where W3C members have their last chance to comment
Dan: On to Recommendation slide
anant: How do we deal with features we want to defer to a later version of the spec?
francois: Need to recharter the WG, go
through the same processes.
... there is also a Proposed Edited Recommendation to include errata (not very common)
Dan: On to addressing public comments
Harald: What's the process when we can't agree?
francois: The group is strongly encouraged to avoid such situations. Comments can get escalated as formal objection that goes up to W3C Director.
Dan: On to Status of WebRTC API draft slide
richard: should we stay at Candidate Recommendation for a year or so?
Dan: better to have exit criteria, such as meeting a given number of implementations
Dan: the two specs are on their own timelines, apart from whatever their reference dependencies are
anant: If we are doing two specs, should we push out our dates beyond Q2 ?
francois: once we know we won't make it, we will need to update the charter
Slides: Low-level control
Moving to low-level control presentation by burn
Dan: the original proposal for a low
level API (link in slide 2) received limited discussion and little
support on the IETF signaling list
... But there is some interest in a low level API
... Look at requirements document (IETF) by hadriel to drive discussion
... Hints vs Capabilities will be an interesting discussion
... Some discussion now but we should move it to list soon
... The existing requirements are at a higher level
than what we want for low-level hints and capabilities
... Browser UI requirements are things we've discussed and should
move into the current document
Dan: Media properties are the
interesting ones
... A2-1 a web API to learn what codecs a browser supports
anant: How does this relate to JS application-level decoders/encoders?
fluffy: that's independent of an API that exposes what codecs the browser takes
tim: the API can only be used after the user has consented, so there's already some trust in the app
fluffy: we should go through all of the requirements
<juberti> regarding fingerprinting, aren't we sending user-agent already
<derf> We've (jokingly) discussed replacing the user-agent with an empty string.
<juberti> i think there are enough implementation differences that fingerprinting can be done using existing apis.
juberti: need to be able to query browser capabilities so that JS can generate SDP on its own
(without user consent?)
<juberti> user consent is ok
<juberti> this would happen around the same time as camera access
<derf> But if you're going to have hardware codecs, capabilities can differ even with the same UA.
<juberti> the thought experiment here is whether it would be possible to fully implement signaling, except for telling the browser what the offer and answer are.
<juberti> (fully implement signaling in JS)
<ekr> there are a lot of fingerprinting mechanisms out there. is this really making it worse?
tim: but how can you restrict information if you want JS to encode/decode (eg: hardware support for some codecs at certain resolutions)
<derf> ekr: It clearly makes it worse. The question is, is it worth the price?
<juberti> I don't like having to expose a billion knobs to JS, but if we can give the browser a SDP blob from JS, that might allow a flexible but simple compromise.
harald: if you negotiate on the principle that SDP is generated independently of setting up media streams then you don't need permission - there are use cases for that
<juberti> to generate said blob, we need to know what the browser supports.
<derf> juberti: Sounds like you're asking to give the browser an ANSWER from JS, and you want an OFFER in order to generate it.
<derf> Or did I miss what you were really asking?
<juberti> derf: I want to generate an OFFER in JS. I send the offer to the remote side, and also tell my own browser about it. The remote side generates an ANSWER in JS from the OFFER, tells the browser about both, and sends the ANSWER back to the initiator. The initiator then plugs the received ANSWER into the browser, and media flows.
<derf> juberti: Okay. Why can't you do that with ROAP today?
<juberti> a) you can't generate the OFFER, since you don't know the browser caps. b) even if you could generate your own offer, there's no way to tell the local browser about it. lastly, the state machine for ROAP lives inside the browser, so the JS can only do what ROAP allows (i.e. no trickle candidates like Jingle)
<ekr> clarification: trickle candidates is candidates in pieces like with Jingle transport-info?
<juberti> ekr: exactly
<derf> juberti: fluffy is saying what I would have replied to you right now.
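The offer/answer flow juberti describes could be modelled roughly as below, with the browser represented as a plain object so the round trip can be followed end to end. The method names (getCapabilities-style codec lists, setLocalDescription, setRemoteDescription) and object shapes are illustrative assumptions, not a proposed API:

```javascript
// Each "browser" knows its codecs and the descriptions JS has told it about.
function makeBrowser(codecs) {
  return {
    codecs,                      // what this browser can handle
    local: null, remote: null,   // descriptions the JS has set
    setLocalDescription(sdp) { this.local = sdp; },
    setRemoteDescription(sdp) { this.remote = sdp; },
  };
}

// JS on the initiator: build an OFFER from the browser's capabilities.
function createOffer(browser) {
  return { type: 'offer', codecs: browser.codecs.slice() };
}

// JS on the remote side: build an ANSWER by intersecting capabilities.
function createAnswer(browser, offer) {
  return {
    type: 'answer',
    codecs: offer.codecs.filter(c => browser.codecs.includes(c)),
  };
}

// Full round trip: offer goes out, answer comes back, and both sides
// end up agreeing on the negotiated codec set.
function negotiate(caller, callee) {
  const offer = createOffer(caller);
  caller.setLocalDescription(offer);     // "tell my own browser about it"
  callee.setRemoteDescription(offer);
  const answer = createAnswer(callee, offer);
  callee.setLocalDescription(answer);
  caller.setRemoteDescription(answer);   // media can now flow
  return answer.codecs;
}
```

The point of the sketch is requirement (a) above: `createOffer` cannot be written in JS without some way to learn the browser's capability list.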
dan: we jumped from A2-2 to A2-3, but they both look like they go together
fluffy: what is the use case for knowing codec properties? it only makes sense if you can control the properties
<Mani> would it be more appropriate to require that the capabilities described should be consistent with the capneg RFC5939 security properties?
adam: is A2-2/A2-3 a codec abstraction of some kind?
harald: you want to select the best possible codec for a given bandwidth requirement
harald: different for video and images etc.
richt: considering whether you can update the SDP proposal the browser sends to the JS directly through JavaScript
cullen: when we get to ROAP, we'll see that it's possible.
anant: in order for JavaScript to add things to SDP, it needs to be able to query.
cullen: if the browser supports stuff
that it didn't say it supports, then it's only normal that you
cannot use it.
... I think you're going to get that one way or the other, so not
opposed to an API.
hta: we don't have an opaque proposal between browsers right now.
cullen: in the SIP proposal, you do
hta: cannot be used to setup the initial connection
<ekr> SIP isn't really opaque, it just looks opaque.
cullen: if we're trying to protect from fingerprinting, we need to know what kind of information we think we can reveal.
anant: hardware information is the
critical key
... Easy to identify who the user is with some nuances on hardware
capabilities.
hta: are we making it worse in a way that makes a difference? that's the question.
[exchanges about fingerprinting]
cullen: my guess is that even if fingerprinting reveals that I'm using a MacBook Air, that's still a large set.
<ekr> there's a lot more uniqueness than that. For instance, window size, fonts, plugin support, etc.
<ekr> Important to distinguish between new capabilities that expose more information to the server versus capabilities that expose info to the peer.
burn: going through the requirements provides food for issues that are relevant.
hta: looking at A2-4, in many
scenarios, the application is the best place to know what can be
cut off.
... e.g. stop sending video that's not crucial for this
communication.
cullen: I would be very concerned if the congestion control loop was done in JavaScript.
hta: my thinking is that, in the case
when the message is "no way to get more than 100Kb/s through", the
app can react and select the streams it wants to send.
... then the browser can take it from there.
cullen: the level of control in
JavaScript is: on/off, framerate, bandwidth... a slippery slope.
... Where do we draw the line?
... Implementation experience will teach us a lot here.
hta: I very much agree with that.
anant: declarative approach could work, e.g. "please turn on the stream at this bitrate"
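The reaction hta describes (the browser reports an available-bandwidth estimate and the app declaratively picks which streams to keep sending) could be sketched like this; the stream names, per-stream bitrates, and priority scheme are assumptions for illustration:

```javascript
// Given a bandwidth budget from the browser's congestion control,
// keep streams in priority order (lower number = more important)
// until the budget runs out; the browser takes it from there.
function selectStreams(streams, availableKbps) {
  const keep = [];
  let budget = availableKbps;
  for (const s of [...streams].sort((a, b) => a.priority - b.priority)) {
    if (s.kbps <= budget) { keep.push(s.name); budget -= s.kbps; }
  }
  return keep;
}
```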
burn: moving on to the audio stream level requirements, A2-8 and A2-9
cullen: there's a security implication, I think. An attacker can detect volume, and could perhaps derive words from that.
[moving on to A3-x requirements]
<ekr> cullen: depends on granularity with which it is reported
cullen: getting the SSRC and CNAME is good. Setting is more of an issue.
hta: what if you negotiate the
Payload Type value and then change it afterwards?
... I don't see a reason to allow an API to do something that is
not useful.
burn: A3-4 is basically already possible.
anant: what does it mean to set the
audio and video codecs of streams you receive?
... At the point of rendering, it's too late.
hta: taking A3-4, A3-5, A3-6, A3-7,
A3-8 together, it amounts to "the application must be able to
configure a media stream across RTP sessions".
... I don't think that's the right approach, but I'd actually prefer to see a requirement phrased like that.
<juberti> for receive codecs, you might choose to change the PT mapping.
<juberti> and you'd need to tell the media layer about that.
[discussion on A3-10 and A3-11, same in requirements although not as low-level]
anant: do we have use cases that we can map to these requirements? That would be useful.
burn: there was some general description that provided context for these. I didn't want to read it here.
anant: it would be easier to get it
into the spec if these requirements were motivated by actual use
cases.
... We should get more specific about the level of extensibility we
need.
burn: there is a list in section 3 of this document. It explains what the problems are
anant: not convinced by argument 6)
(some Web application developers may prefer to make the decision of
which codecs/media properties to use).
... don't see why you need to involve the server at all.
hta: it's clear that we don't have
general agreement on how this is phrased.
... let's wrap this up.
burn: Moving on to the hints API, last
discussed on the mailing-list. A simple example is audioType:
'speech' vs. 'music'.
... question is which level of details.
... Agreement that this is needed.
... Question is do we need an API for that?
anant: new things will keep coming. Extensibility is needed.
cullen: agree.
... IANA registry could be used, I think.
burn: problem in other groups is knowing the IETF process. Won't be a problem here.
hta: we have to define some kind of namespaces for hints. Just one level, multiple levels, strings, tokenized, etc.
DanD: two things, structure and semantics.
burn: someone may want to propose
finer granularity that you want to relate to other values.
... in the end, they are hints, so it doesn't matter so much. If
you give something that is general, and something that is specific,
you don't know what you're going to end up with.
adam: side comment that the hints should be an optional argument to addStream.
[agreed]
stefan: we should reuse MediaStreamHints object for getUserMedia
anant: true.
hta: having just one registry is probably ok. For video, you could have a hint saying low resolution.
burn: one registry makes sense.
anant: different object but same values
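The single-registry idea discussed above could be sketched as follows; the registry contents, hint names, and allowed values are illustrative assumptions, not agreed entries:

```javascript
// One flat registry shared by addStream() and getUserMedia(), as
// discussed: one level, string-valued hints.
const HINT_REGISTRY = {
  audioType: ['speech', 'music'],
  videoResolution: ['low', 'high'],
};

// Hints are advisory: unknown names or values are silently dropped
// rather than raising errors, since the receiver may not honor them
// anyway.
function normalizeHints(hints) {
  const out = {};
  for (const [name, value] of Object.entries(hints || {})) {
    const allowed = HINT_REGISTRY[name];
    if (allowed && allowed.includes(value)) out[name] = value;
  }
  return out;
}
```

This matches adam's point that hints would be an optional argument, e.g. a hypothetical `pc.addStream(stream, normalizeHints({audioType: 'music'}))`.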
burn: moving on to Statistics
API.
... MediaStream.getStats()
DanD: where do you specify the
timeframe for those statistics?
... maybe just "what the system knows".
<derf> burn: Just a nit... if your processingDelay is 20 ms, I expect your framerate is 50 fps.
cullen: agree. Maybe we can steal this from the IETF XRBLOCK WG
hta: the caller can always call the
function twice and check the difference.
... just return totals, along with the time you think it is when
the function is called. Then it's easier to compute averages.
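hta's "call it twice" pattern reduces to a small derivation; the snapshot shape (`bytesSent` plus `timestamp` in milliseconds) is an assumption for illustration:

```javascript
// Given two getStats()-style snapshots of running totals, derive the
// average send bitrate over the interval between them.
function bitrateKbps(earlier, later) {
  const bytes = later.bytesSent - earlier.bytesSent;
  const seconds = (later.timestamp - earlier.timestamp) / 1000;
  return (bytes * 8) / seconds / 1000;   // kilobits per second
}
```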
DanD: important for that to be extensible.
cullen: there needs to be a set of
stats that are mandatory to support. Multiple layers of
stats are possible.
... any structure you put in there is not really useful on its own; you have
to know the property.
hta: structure might buy you some
namespacing.
... Same property may be defined in different areas, so prefixing
might be good.
burn: I'm not hearing any disagreement here.
hta: I note the devil's in the details.
burn: then, moving on to Capabilities
API
... ROAP proposes to get an SDP blob back.
... getCapabilities() would return an SDP blob.
... It's using the syntax to represent capabilities
cullen: let's take fingerprinting off
the table for a second. This seems to make sense, though it may not
be the syntax you could dream about to list codecs you
support.
... This seems to give you all the information.
anant: why do you need this info in
advance?
... more reliable to wait until getUserMedia. No guarantee you'll
get video when the call is made.
DanD: I would render a different UI if I know video is not available.
anant: you could do that later on.
cullen: lots of applications grey out
the video when not available, for instance.
... use case for "video", not specific codec.
DanD: on a mobile device, I may present a widget on the page if I know I have video support.
anant: I understand the argument. I don't like it because you need to gracefully handle the case when video is not available in any case.
Tim: the expectation is that it would be rare.
hta: you should be able to set a callback that "if capabilities change, I want to know"
cullen: right.
... First, is video available? Then, can someone come up with a use
case for more detailed info?
[more discussion on fingerprinting, if you know when the camera comes in, you can correlate the user on Facebook and Google+, for instance]
burn: general interest in something like this, except getCapabilities early on and then callbacks.
anant: we can figure out later on if
it's callback or event.
... we're going to try what Cullen suggests: simple audio/video,
then if someone comes up with a use case for more, we'll add
more.
DanD: good, but let's not restrict. Extensibility would be good, not to change the spec afterwards.
burn: suggests that the browser
simply lies about more specific parameters.
... 3 APIs presented here. Who's gonna do this?
cullen: happy to work on the callback, with Anant's help.
burn: will work on the hints API
cullen: all three of them go into the editors' draft of the spec.
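The simple audio/video granularity Cullen suggests can be checked directly against an SDP-style capability blob like the one `getCapabilities()` is proposed to return; the sample blob and function name here are illustrative, but the `m=` lines follow standard SDP:

```javascript
// Scan an SDP-style capability blob for media sections and report
// only the coarse "is audio/video available?" answer.
function availableMedia(sdpBlob) {
  const kinds = new Set();
  for (const line of sdpBlob.split(/\r?\n/)) {
    const m = line.match(/^m=(audio|video)\s/);
    if (m) kinds.add(m[1]);
  }
  return { audio: kinds.has('audio'), video: kinds.has('video') };
}
```

An app could then grey out its video button when `availableMedia(blob).video` is false, and re-check from the capabilities-changed callback discussed above.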
Slides: WebRTC Data Streams
juberti: There are use cases for
unreliable data
... Need for the datachannel for mesh apps
... Encryption should be required for the data channel
... Design for DataStream should be similar to MediaStream
... there is no need for inheritance between DataStream and
MediaStream
... We'll use the same flow as MediaStream to attach to the
PeerConnection, instead of an atomic flow
fluffy: I like this proposal. I think the priority needs to be addressed as people tend to set priority high.
juberti: We can keep it very high level with specific enumerations
fluffy: Trying to come up with some other prioritization ideas
anant: What is the use case for the readyToSend?
juberti: Application should have some
notion of the flow stage
... You need to know if you have buffer available
anant: we should align this with webSockets
fluffy: we need flow control for a large transfer
hta: the JS app has no concept of blocking
anant: What if the developer wants to block?
Adam: It can't
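Since JS cannot block, the WebSocket-style pattern implied by readyToSend and buffer availability could look like the sketch below. The channel interface here (`send`, `bufferedAmount`, `threshold`, `onready`) is a mock for illustration, not the proposed API:

```javascript
// Wrap a channel in a non-blocking sender: messages queue up in JS,
// and the queue drains whenever the channel has buffer space.
function makeThrottledSender(channel) {
  const queue = [];
  function drain() {
    while (queue.length && channel.bufferedAmount < channel.threshold) {
      channel.send(queue.shift());
    }
  }
  channel.onready = drain;   // fires when buffer space frees up
  return {
    send(msg) { queue.push(msg); drain(); },
    pending: () => queue.length,
  };
}
```

This is how a large transfer gets flow control without ever blocking the script, addressing fluffy's concern.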
anant: API looks good
... How about security considerations?
... how do you know who's on the other side
fluffy: You would have been able to send this anyway
anant: what are the different attack possibilities? Should be captured
juberti: What's unique is that you can send it in peer to peer way. No server involved
hta: You said data must be
encrypted
... being encrypted will take care of some concerns
... it would make more sense for DataStream to have its own
constructor and then be attached to a PeerConnection
Milan: Question about ack
juberti: The choices considered for the wire protocol make it useful
Milan: Protocol has an ack and it
doesn't need to be exposed
... an example with the ack would be useful to understand
juberti: I'll take it as an action point
Stefan: we can conclude this session
juberti: I'll have it updated and sent to the mailing list for review
fluffy: this is just the API proposal
not the actual implementation, right?
... We're moving along with this until we figure out the
implementation.
juberti: Requirements came from the wire protocol
fluffy: looks good. Can we build
it?
... That's what I'm concerned about, and maybe we should relax our
requirements
<francois> [ref possible alignment with Websockets, perhaps change "sendMessage" to "send"]
francois: there's a process called "feature at risk"
Slides: MediaStream slides (odp format)
[going through slides]
cullen: why do audio tracks come first?
adam: if the last track is not a
video track, you can assume there's no video in there.
... there used to be 2 lists.
anant: the order doesn't have to correspond to anything.
cullen: there's another ordering in SDP.
anant: not related.
cullen: wondering whether that
ordering could be the same.
... just strikes me as something weird.
DanD: think we should be explicit that the order does not have to match that of SDP
anant: the only people who have to worry about that is browser vendors, no need to be exposed to users.
stefan: I liked it better when there were two different lists.
adam: it was easier to query whether
there is audio or video.
... Moving on to definitions.
... MediaStream represents stream of media data. Do I need to go
through it?
cullen: I find this definition fascinating. Can you have stereo audio in two tracks? Are voice and video one track? Audio and DTMF? No idea.
anant: a track is lowest you can go. Having 5.1 audio in one track looks weird.
<juberti> what about comfort noise?
<juberti> is that the same track as audio?
cullen: need some group for synchronization, but separate thing.
anant: getObjectURL function is on the MediaStream, right? When you assign a stream to a video element.
cullen: presumably, if I have a stream with 3 video tracks, I want to send it to 3 different video elements.
anant: media fragment could be used to select the track you're interested in.
DanD: as long as we all agree on
what's inside, we're in good shape.
... This is a good start for a glossary.
cullen: let's say that graphic card has VP8 support. You can't assume that the clone happens before the decoding happens.
[discussion on gstreamer and tracks]
anant: I think gstreamer has two separate track-like things for stereo audio.
tim: surely, a 5.1 audio is one source for gstreamer.
adam: the motivation for removing the parallel between MediaStreamTrack and media track is that audio was a multi-entry list whereas video was an exclusive track.
hta: basically, one MediaStreamTrack is one stream of audio.
cullen: stereo is two tracks, 5.1 is 6 tracks. That's very easy to deal with.
anant: you want to be able to disable audio tracks.
tim: how do I know which track is the rear right and so on?
DanD: technically, with 3D video, you'll want to sync those two tracks.
francois: 6 tracks for 5.1 audio means disabling audio is disabling 6 tracks.
anant: we can add a layer at MediaStream level.
burn: the real world allows both, combined or not.
cullen: the question is: if something
is jointly coded with multiple channels, is that one
track?
... If that's one track with a bunch of channels, the fact that it
could also be represented as two tracks sounds like a complete
disaster.
... We need some abstraction layer to ease the life of Web
developers.
hta: in the case of 4 microphones, you want to send 4 tracks. With 6, you want to send 6 tracks.
anant: I think early implementations will only support one or two channels at most.
tim: there are plenty of places where we can get audio that is not one channel.
anant: right, from files, for
instance.
... my preference is to stick to a MediaStreamTrack as the lowest
thing.
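The model being converged on (each channel is its own MediaStreamTrack, so stereo is 2 audio tracks and 5.1 is 6, with a convenience layer at the MediaStream level to toggle them as a group, per anant) could be sketched like this; the object shapes are illustrative, not the draft API:

```javascript
// A stream is a flat list of single-channel tracks; disabling a kind
// at the stream level flips every track of that kind, addressing
// francois's point that "disabling audio" means disabling 6 tracks
// for 5.1 content.
function makeStream(tracks) {
  return {
    tracks,   // [{ kind: 'audio' | 'video', enabled: true }, ...]
    setKindEnabled(kind, enabled) {
      for (const t of tracks) if (t.kind === kind) t.enabled = enabled;
    },
  };
}
```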
adam: moving on. An instance of a MediaStreamTrack can only belong to one MediaStream.
anant: noting that "track" is really not the same thing as a track in container formats, etc., so we need to be explicit in the doc about that, not to create additional confusion.
[meeting adjourned, discussion on MediaStream to be continued on day 2]