Santa Clara F2F Summary

From Web Real-Time Communications Working Group Wiki

This is a summary of the WebRTC WG discussions in Santa Clara on 31 October and 1 November 2011. Please check the minutes for a more detailed report.


IETF Architecture Overview

See also: session minutes

Harald presents the IETF overview document that sets the context of the WebRTC effort.

The focus is on designs that meet the priority use cases. The Web browser may be trusted; everything else must be assumed evil. Data congestion control is key. RTP will be used. Whether everything needs to be encrypted is still controversial; the group is considering DTLS-SRTP key negotiation for that purpose.

For connection management, ROAP seems to be the least controversial proposal on the table, allowing innovation in what-connects-to-what. Processing audio data will hopefully be addressed by the W3C Audio WG.


Use cases and Requirements

See also: session minutes

Stefan goes through the list of identified use cases.

No very-low-latency requirement is derived from the Distributed Music Band use cases, which are there mainly to show the need to distinguish between voice and music when it comes to, e.g., echo cancellation.

There is some discussion on adding a use case on augmented reality, and on the ability to overlay a video stream on top of another. Stefan maintains a list of open issues on use cases on the Wiki.

Requirements derived from draft-jesup-rtcweb-data are considered. Both reliable and unreliable datagram streams very likely need to be supported.

Requirements from the sipdoc draft should already be covered by existing use cases.

draft-kaplan-rtcweb-api-reqs does not introduce new use cases but adds a lot of requirements. These are discussed later in the meeting, during the "Low level" session.


Security requirements

See also: session minutes

Eric Rescorla presents the RTCWEB security model, stressing the differences between the browser threat model and the Internet threat model, and the fact that our threat model is a combination of both. Three main issues to address:

  • access to local devices, where some form of informed user consent is needed
  • consent to communications, where the group will need to think about short-term and long-term permissions, whether to allow mixed content (which would enable MITM attacks), and user authentication
  • communications security, which should not be a problem if the above points are addressed properly, assuming something like ROAP


Status and plans in the DAP WG

See also: session minutes

Robin (chair of the DAP WG) presents the status of Media Capture in DAP. The declarative approach will be moved forward. The programmatic approach should end up being dropped in favor of something like getUserMedia. The overlap between the WebRTC WG and the DAP WG is good, but some DAP participants are interested in working on getUserMedia for non-peer-to-peer scenarios. Adrian Bateman (Microsoft) explains that Microsoft joined DAP to work on Media Capture in particular, and is not looking to work on peer-to-peer communications at this point.

A Media Capture API is in both group charters. The ownership of getUserMedia is in WebRTC currently. The idea of splitting getUserMedia out of the main WebRTC deliverable and working on the result as a joint deliverable in DAP and WebRTC is proposed. It is received with mixed reactions. Concerns are raised that this will delay things; these need to be balanced against the risk that potentially blocking comments would come late and be less detailed if WebRTC continues to work on the spec on its own. It is also unclear whether the split actually makes sense.

WebRTC editors will try out the split. Robin will prepare a proposal for turning getUserMedia into a joint DAP and WebRTC deliverable. No decision was taken on this particular point during the meeting.


Access control model and privacy/security aspects

See also: session minutes

Anant mentions that the current draft does not specify what happens in terms of getting user permission when a call to getUserMedia is issued. After further discussion, the general consensus is that the spec should lay out the steps but not specify how the User Agent needs to do it. However, we're still at the experimentation phase, and do not really know what we need to show the user yet.

To avoid fingerprinting issues, the Web app will not be allowed to enumerate available devices (e.g. cameras, microphones), but only to hint as to what it needs.

Anant shows some early UI mockups. Participants discuss short-term and long-term permissions and whether an app should be able to hint that it would like long-term access (or perhaps hint that it doesn't need long-term access).

Open questions include:

  • what happens if devices are already in use by another app? Again, to avoid fingerprinting, there should be only one kind of failure, and the Web app should not learn more about the cause.
  • what is the interaction for an incoming call?
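
The hint-only request and the single kind of failure discussed in this session can be illustrated with a minimal TypeScript sketch. This is not the getUserMedia API: the names MediaHint, Device, MediaResult and requestMedia are hypothetical and only model the idea that a Web app states what it needs and cannot tell why a request failed.

```typescript
// Hypothetical model, not the getUserMedia API.
type MediaHint = "audio" | "video";

interface Device {
  kind: MediaHint;
  inUse: boolean; // already taken by another app
}

type MediaResult =
  | { status: "granted"; granted: MediaHint[] }
  | { status: "failed" }; // deliberately carries no reason

// The app passes hints only; it never sees the device list.
function requestMedia(
  hints: MediaHint[],
  devices: Device[],
  userConsents: boolean
): MediaResult {
  if (hints.length === 0) return { status: "failed" };
  const allAvailable = hints.every((h) =>
    devices.some((d) => d.kind === h && !d.inUse)
  );
  // Missing device, busy device and user denial all look the same to the app.
  if (!userConsents || !allAvailable) return { status: "failed" };
  return { status: "granted", granted: [...hints] };
}
```

In this sketch a denied request, a busy camera and an absent microphone all produce the same value, so the result cannot be used to probe which devices exist.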

Web applications in an iframe coming from a different origin should not trigger the same interface but rather open a new window that ensures the user knows they have switched to another domain name.


Stages for moving to a REC

See also: session minutes

Dan Burnett reviews the different maturity levels of a specification, and the requirements to fulfill at each stage in order to move forward. We're at the very first stage, First Public Working Draft. At Last Call, the group says "We think we're done" and that's when external comments usually come in (they must be addressed). Candidate Recommendation exit criteria need to be chosen. At the very least, two implementations need to implement each feature of the spec. A test suite will need to be developed.


Low Level Control

See also: session minutes

Dan Burnett presents the original proposal for a low-level API. There is little support for a low-level signaling API in the IETF. The group reviews the API requirements identified in draft-kaplan.

In summary, while the requirements may not be addressed or may need to be refined, there is general interest in fleshing out APIs for hints, statistics and capabilities, although capabilities should not be done early on. The editors will work on such proposals.


Data Streams

See also: session minutes

Justin presents a proposal for DataStream, based on the jesup draft for underlying protocols. Both unreliable and reliable datagrams are supported.

The proposal looks good, although properties and functions should be more aligned with those of Web Sockets. Justin will work on an update (actually already done, see Justin's follow-up message).


MediaStreams

See also: session minutes (day 1) and session minutes (day 2)

Adam presents the status of MediaStream. The discussion raises the question of the exact definition of a MediaStream, and more specifically what a MediaTrack is (e.g. are "channels" tracks?)

The term "track" is not used in the same sense as in other contexts; that needs to be made explicit in the spec. The definition needs to be refined as well. Whether channels within tracks (e.g. left/right channel in a stereo track) need to be exposed to JavaScript remains an open question.

Also, MediaStream and MediaTracks should end up with a definition that is consistent with that used for the HTML5 MediaElement on one side, and should be easily mappable to underlying protocol objects on the other side.

The parent-child relationship between two streams upon cloning got removed to make it easy for developers to understand how enabling/disabling works. General agreement that there should be a way to delete tracks from a list (MediaStream track lists are immutable for the time being). The notion of "stopping" a local media stream needs to be properly defined.

For recording, the general feeling is that we should scrap the part on recording in the spec for the time being, and gather requirements for it that could be addressed later on.

Cloning, compositing and their effect on synchronization get reviewed. User authorization is tricky to handle when MediaStreams can be cloned and sent to different peers. It is hard to know at which level the permissions should be handled.


PeerConnection

See also: session minutes

Cullen presents the different features present in ROAP and how it fits with the JavaScript API the group is working on. The ROAP proposal is a first step and needs refinement. It deals with glare in a SIP-like fashion, but a solution based on random timeouts might be better.
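
To make the random-timeout idea concrete, here is a minimal TypeScript sketch of glare handling by random backoff. It is not taken from ROAP: the function names, the 0 to 2000 ms range and the injected random source are all assumptions made for illustration. Each side draws a random delay after detecting glare, and the side with the shorter delay re-sends its offer first.

```typescript
// Hypothetical sketch; names and timing values are illustrative, not from ROAP.
type Rng = () => number; // returns a value in [0, 1)

// Pick a random wait (in ms) before re-sending an offer after glare.
function backoffMs(rng: Rng, minMs = 0, maxMs = 2000): number {
  return minMs + Math.floor(rng() * (maxMs - minMs));
}

// Simulation helper: given both sides' delays, say who re-offers first.
// "retry" means the delays were equal, so glare would occur again
// and both sides need to draw new delays.
function resolveGlare(
  localDelayMs: number,
  remoteDelayMs: number
): "local" | "remote" | "retry" {
  if (localDelayMs === remoteDelayMs) return "retry";
  return localDelayMs < remoteDelayMs ? "local" : "remote";
}
```

In a real exchange each side knows only its own delay: the side whose timer fires first re-sends its offer, and the other side cancels its own timer when that offer arrives. resolveGlare only exists here to make the outcome easy to inspect.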

Discussion on passing STUN and TURN credentials in the JavaScript, and supporting DTMF.


Audio WG

See also: session minutes

Chris Rogers presents the Web Audio API, showing how it could be used in the context of WebRTC communications. Processing through the Web Audio API may operate at the channel level. Most of the processing is done in native code to reduce latency.

Echo cancellation is not yet included in the draft. A placement node allows an audio source to be positioned in a 3D space (addressing requirement F13). A new AudioLevel node likely needs to be added to address F14. The AudioGain node would address F15, and the mixer would de facto address F17.

Right now, there is no simple way to fire an event from a level filter, but such a gate could be added.
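
As a rough illustration of what such a gate could do, here is a minimal TypeScript sketch. It does not use the Web Audio API: rmsLevel and LevelGate are hypothetical names, and the use of RMS over a block of samples in [-1, 1] as the level measure is an assumption. The gate reports an event only when the level crosses a threshold.

```typescript
// Hypothetical sketch; not part of the Web Audio API draft.

// RMS level of one block of samples, each in [-1, 1].
function rmsLevel(samples: number[]): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Emits "open" or "close" only when the level crosses the threshold.
class LevelGate {
  private open = false;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Returns the event for this block, or null if the state did not change.
  process(samples: number[]): "open" | "close" | null {
    const above = rmsLevel(samples) >= this.threshold;
    if (above === this.open) return null;
    this.open = above;
    return above ? "open" : "close";
  }
}
```

A caller would feed successive blocks to process() and react to the returned events, for instance to highlight the active speaker.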

The discussion shifts to comparing the Web Audio proposal and the proposal from ROC at Mozilla, where most of the processing is done in JS, which triggers latency concerns (although the use of Web Workers may alleviate that concern). The Audio WG is not clear yet as to which API will move forward.

The WebRTC WG will send requirements to the Audio WG to ensure they get properly addressed.


Implementation status

See also: session minutes

The group discusses implementation updates from Ericsson, Google, Mozilla and Opera. Discussion covers implemented features and libraries used (e.g. libnice for ICE, implementation of DTLS-SRTP, libjingle, crypto key negotiation).


Incoming notifications

See also: session minutes

Dan Druta raises the issue of incoming calls and how to notify the user depending on whether the Web app that should receive the call is running, running in the background, running as a headless app, or not running at all. Web Notifications and Server-Sent Events seem the way to go to address most of the scenarios, but this will have to be investigated through running code. Dan Druta will look into that.


The next WebRTC WG call will take place at some point in December. Chairs will set up a poll.