WebRTC Next Version Use Cases

W3C First Public Working Draft

This version:
https://www.w3.org/TR/2018/WD-webrtc-nv-use-cases-20181211/
Latest published version:
https://www.w3.org/TR/webrtc-nv-use-cases/
Latest editor's draft:
https://w3c.github.io/webrtc-nv-use-cases/
Editor:
Bernard Aboba, Microsoft Corporation

Abstract

This document describes a set of use cases motivating the development of WebRTC Next Version (WebRTC-NV), as well as the requirements derived from those use cases.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was published by the Web Real-Time Communications Working Group as a First Public Working Draft. Comments regarding this document are welcome. Please send them to public-webrtc@w3.org (subscribe, archives).

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 February 2018 W3C Process Document.

1. Scope and Motivation

To motivate the development of WebRTC 1.0, the IETF RTCWEB WG developed [RFC7478]. This document describes use cases motivating the development of "WebRTC Next Version" (WebRTC-NV), and the requirements deriving from those use cases. The use cases fall into one of two categories: enhancements to use cases already covered in [RFC7478], and new use cases currently not implementable in WebRTC 1.0.

1.1 Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

2. Existing Use Cases

The use cases in this section improve upon use cases described in [RFC7478].

2.1 Multiparty online game with voice communications

[RFC7478] Section 2.3.12 describes a use case involving a multiparty online game with voice communications. In these scenarios, reducing the time to join the game and receive media is important. To minimize this delay, ICE enhancements are desirable, such as the ability to control candidate gathering and pruning. Also, “parallel forking” minimizes conference establishment time by allowing a participant to broadcast an Offer to a “room” abstraction (maintained on a server), with other room participants responding directly to the Offerer, avoiding a separate discovery step. It is desirable to allow media to be received from responders before the initiator receives an answer. In addition, the ability to impose a bandwidth limit across all mesh endpoints limits the buildup of queues that can affect audio quality or perceived latency in the game. Supporting this enhancement adds the following requirements:

   ----------------------------------------------------------------
   REQ-ID          DESCRIPTION
   ----------------------------------------------------------------
   N01             The user agent can control candidate gathering
                   and pruning, limiting the networks on which
                   candidates are gathered, the types of candidates,
                   etc.
   N02             The user agent must be able to support parallel
                   forking, including reuse of local ICE candidates
                   and local certificates with multiple Answerers.
   N03             The user agent must be able to impose a bandwidth
                   limit across mesh endpoints.
   N04             Early media must be supported.

Experience: This use case has been implemented by a gaming service utilizing [ORTC].

References:
  1. ORTC Issue 54
  2. ORTC Issue 603
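
As an illustration of the candidate pruning described in N01, the following minimal sketch filters ICE candidate lines by type. It assumes candidates are available as SDP candidate attribute strings; the helper names `candidateType` and `pruneCandidates` are hypothetical and are not part of any WebRTC API.

```typescript
// Candidate types defined by ICE.
type CandidateType = "host" | "srflx" | "prflx" | "relay";

const KNOWN_TYPES: CandidateType[] = ["host", "srflx", "prflx", "relay"];

// Extract the type from a candidate attribute line such as
// "candidate:1 1 udp 2122260223 192.0.2.1 54321 typ host".
function candidateType(line: string): CandidateType | undefined {
  const fields = line.trim().split(/\s+/);
  const typIndex = fields.indexOf("typ");
  if (typIndex < 0) return undefined;
  const value = fields[typIndex + 1];
  for (const known of KNOWN_TYPES) {
    if (known === value) return known;
  }
  return undefined;
}

// Hypothetical pruning helper: keep only candidates of the allowed types.
function pruneCandidates(lines: string[], allowed: CandidateType[]): string[] {
  return lines.filter((line) => {
    const type = candidateType(line);
    return type !== undefined && allowed.indexOf(type) >= 0;
  });
}
```

An application could, for example, pass `["relay"]` to keep only relayed candidates. Limiting the networks on which candidates are gathered is not covered by this sketch.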

2.2 Mobility

[RFC7478] Section 2.3.6 describes a simple communications service where the user changes access network during the session. This use case is enhanced by being able to re-route media over an alternate path (potentially taking network cost into account) without the need for signaling. Supporting this enhancement adds the following requirements:

   ----------------------------------------------------------------
   REQ-ID          DESCRIPTION
   ----------------------------------------------------------------
   N05             The ICE agent must be able to maintain multiple
                   candidate pairs and move traffic between them.
   N06             The ICE agent must be able to take the network
                   cost into account when considering re-routing.

References:
  1. Mailing list proposal
  2. Mailing list proposal
  3. ORTC Issue 583
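
One way to picture N05 and N06 together is a selection function over the candidate pairs that the ICE agent keeps alive. The sketch below is purely illustrative: the `PairInfo` shape, the numeric cost scale, and the tie-break on round-trip time are all assumptions, not behavior defined by this document or by ICE.

```typescript
// Hypothetical view of a candidate pair; field names are illustrative only.
interface PairInfo {
  id: string;
  connected: boolean;  // connectivity checks currently succeed
  networkCost: number; // lower is cheaper (assumed scale)
  rttMs: number;       // measured round-trip time in milliseconds
}

// Pick the connected pair with the lowest cost, breaking ties on RTT.
function selectPair(pairs: PairInfo[]): PairInfo | undefined {
  let best: PairInfo | undefined;
  for (const pair of pairs) {
    if (!pair.connected) continue;
    if (
      best === undefined ||
      pair.networkCost < best.networkCost ||
      (pair.networkCost === best.networkCost && pair.rttMs < best.rttMs)
    ) {
      best = pair;
    }
  }
  return best;
}
```

When the cheaper path stops passing connectivity checks, calling `selectPair` again moves traffic to the remaining connected pair without any signaling exchange.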

2.3 Video Conferencing with a Central Server

[RFC7478] Section 2.4.3.1 describes a use case involving Multiparty Video Communications with a central conferencing server. In such a use case, clients with disparate capabilities such as differing bandwidth availability, screen size and maximum displayable frame rate may participate in the same conference. In such a situation it is advantageous to support Scalable Video Coding (SVC). Encoding with temporal scalability is supported by several browsers today and is utilized by most centralized conferencing services.

It is expected that spatial scalability (supported by VP9 and AV1) will become more popular with time. In this use case, if the desired video codec is known beforehand and participants are muted by default (as in a very large meeting), it is desirable to allow new participants to start receiving immediately, without negotiation. Supporting this enhancement adds the following requirements:

   ----------------------------------------------------------------
   REQ-ID          DESCRIPTION
   ----------------------------------------------------------------
   N07             The user agent must be able to encode and decode
                   video utilizing temporal scalability and (if
                   supported by the chosen codec) spatial scalability.
   N08             A user agent can receive audio/video without a
                   corresponding sender.
   N09             It is possible to select the sending and/or
                   receiving codec without negotiation.
   N10             The user agent must be able to control
                   robustness (RTX, RED, FEC) applied to individual 
                   simulcast and SVC layers.

This use case has been implemented by conferencing services utilizing [ORTC], as well as proprietary additions to [WEBRTC10].
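
To make temporal scalability (N07) concrete, the sketch below assigns frames to temporal layers using a dyadic pattern, and shows which frames a central server could forward to a receiver that is limited to a lower frame rate. The dyadic pattern is an assumption chosen for illustration; encoders may use other layer structures, and the function names are hypothetical.

```typescript
// Temporal layer of a frame in a dyadic hierarchy with `layers` temporal layers.
// With 3 layers the pattern repeats every 4 frames: 0, 2, 1, 2.
function temporalLayer(frameIndex: number, layers: number): number {
  const period = 1 << (layers - 1);
  let position = frameIndex % period;
  if (position === 0) return 0; // base layer
  let layer = layers - 1;
  while (position % 2 === 0) {
    position /= 2; // each factor of two moves the frame down one layer
    layer--;
  }
  return layer;
}

// Indices of the frames forwarded to a receiver limited to `maxLayer`.
function framesToForward(frameCount: number, layers: number, maxLayer: number): number[] {
  const kept: number[] = [];
  for (let i = 0; i < frameCount; i++) {
    if (temporalLayer(i, layers) <= maxLayer) kept.push(i);
  }
  return kept;
}
```

With three layers, dropping the top layer halves the frame rate and forwarding only the base layer quarters it, which is how a server can serve clients with differing capabilities from a single encoded stream.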

3. New Use Cases

Several new use cases relate to scenarios that cannot be supported in [WEBRTC10].

3.1 File Sharing

Participants in a mesh exchange large files without disruption to audio/video sessions. It is also possible for a participant to send a large file to a user who is not currently online. Supporting this use case adds the following requirements:

   ----------------------------------------------------------------
   REQ-ID          DESCRIPTION
   ----------------------------------------------------------------
   N11             It must be possible for the user agent to
                   transfer large files as a single message.
   N12             The application must be able to signal backpressure
                   (flow control) when receiving data. It must also
                   receive a backpressure signal when sending data.
   N13             It must be possible for the user agent to transfer
                   data utilizing a congestion control algorithm
                   that does not compete aggressively with
                   audio/video communications.
   N14             It must be possible for the file exchange to
                   be supported by servers as well as user agents.
   N15             It must be possible to support data exchange
                   in a worker.

References:
  1. Mailing list discussion
  2. Mailing list discussion
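
The sending half of N12 can be approximated today by watching how much data is queued on a channel. The sketch below splits a file into chunks and stops queueing once a high-water mark is reached. `ChannelLike` is a hypothetical interface modeled on the `send()` method and `bufferedAmount` attribute of RTCDataChannel, and `PacedSender`, the chunk size, and the high-water mark are assumptions made for illustration. Chunking is itself a workaround; N11 asks for the ability to avoid it.

```typescript
// Minimal channel shape, modeled on RTCDataChannel's send() and bufferedAmount.
interface ChannelLike {
  bufferedAmount: number;
  send(data: Uint8Array): void;
}

// Hypothetical sender that stops queueing once the buffer reaches a high-water mark.
class PacedSender {
  private offset = 0;
  private readonly channel: ChannelLike;
  private readonly file: Uint8Array;
  private readonly chunkSize: number;
  private readonly highWaterMark: number;

  constructor(channel: ChannelLike, file: Uint8Array, chunkSize: number, highWaterMark: number) {
    this.channel = channel;
    this.file = file;
    this.chunkSize = chunkSize;
    this.highWaterMark = highWaterMark;
  }

  // Queue chunks until the file is done or the channel signals backpressure.
  // Call again once the buffer has drained. Returns true when all data is queued.
  pump(): boolean {
    while (this.offset < this.file.length && this.channel.bufferedAmount < this.highWaterMark) {
      const end = Math.min(this.offset + this.chunkSize, this.file.length);
      this.channel.send(this.file.subarray(this.offset, end));
      this.offset = end;
    }
    return this.offset >= this.file.length;
  }
}
```

In a browser, `pump()` would typically be called again from a `bufferedamountlow` event handler. The receiving half of N12 has no equivalent in this sketch.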

3.2 Internet of Things

An IoT sensor maintains a long-term connection and seeks to minimize power consumption. Some of the sensor’s data may need to be sent reliably and in order, while other sensors may provide data that can be sent unreliably and unordered, or in a partially reliable manner. This use case adds the following requirements:

   ----------------------------------------------------------------
   REQ-ID          DESCRIPTION
   ----------------------------------------------------------------
   N16             The application must be able to minimize ICE
                   connectivity checks.
   N17             The application must be able to control aspects
                   of the data transport (e.g. set the SCTP
                   heartbeat interval or turn it off, adjust RTO
                   values, etc.).
   N18             It must be possible to send arbitrary data
                   reliably, unreliably or partially reliably with
                   a specific maximum number of retransmissions
                   or a specific maximum timeout.
   N19             It must be possible to send arbitrary data
                   ordered or unordered.

Reference: Mailing list discussion
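
The delivery modes in N18 and N19 correspond to options that WebRTC 1.0 data channels already accept through RTCDataChannelInit (`ordered`, `maxRetransmits`, `maxPacketLifeTime`). The sketch below builds such an options object from a description of the desired reliability. The `Reliability` type and the `channelOptions` helper are hypothetical; the union type reflects that a retransmission limit and a lifetime limit cannot be set together.

```typescript
// Subset of data channel options; property names follow RTCDataChannelInit.
interface ChannelOptions {
  ordered: boolean;
  maxRetransmits?: number;
  maxPacketLifeTime?: number; // milliseconds
}

// Hypothetical description of the desired reliability. At most one limit applies.
type Reliability =
  | { kind: "reliable" }
  | { kind: "maxRetransmits"; count: number }
  | { kind: "maxLifetime"; ms: number };

function channelOptions(reliability: Reliability, ordered: boolean): ChannelOptions {
  switch (reliability.kind) {
    case "reliable":
      return { ordered };
    case "maxRetransmits":
      return { ordered, maxRetransmits: reliability.count };
    case "maxLifetime":
      return { ordered, maxPacketLifeTime: reliability.ms };
  }
}
```

A sensor reporting frequently refreshed readings might use `channelOptions({ kind: "maxRetransmits", count: 0 }, false)` for unreliable, unordered delivery, while configuration updates would use `{ kind: "reliable" }` with ordering enabled.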

3.3 Funny Hats

A communications service manipulates captured media prior to encoding and after decoding to provide effects including:

  1. Funny hats
  2. Background removal or blurring
  3. In-browser compositing
  4. Voice effects
  5. Stress detection

This use case adds the following requirements:

   ----------------------------------------------------------------
   REQ-ID          DESCRIPTION
   ----------------------------------------------------------------
   N20             The application must be able to obtain raw media
                   from the capture device.
   N21             The application must be able to insert processed
                   frames into the outgoing media path.
   N22             The application must be able to obtain decoded
                   media from the remote party.
   N23             It must be possible to efficiently share media
                   between the main thread and worker threads.
   N24             It must be possible to do efficient media
                   manipulation in worker threads.

References:
  1. Mailing list discussion
  2. Mailing list discussion
  3. Sharper Image Research
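
Once raw frames are available (N20), an effect such as background removal reduces to per-pixel work that can run in a worker (N24). The sketch below assumes a frame delivered as an RGBA byte buffer and a segmentation mask produced elsewhere; both the data layout and the `replaceBackground` helper are assumptions made for illustration.

```typescript
// Replace the background pixels of an RGBA frame with a solid color.
// `mask` holds one entry per pixel: non-zero means foreground (kept as-is).
function replaceBackground(
  rgba: Uint8ClampedArray,
  mask: Uint8Array,
  color: [number, number, number],
): Uint8ClampedArray {
  if (rgba.length !== mask.length * 4) {
    throw new RangeError("mask must have one entry per RGBA pixel");
  }
  const out = new Uint8ClampedArray(rgba); // copy, so the input frame is untouched
  for (let pixel = 0; pixel < mask.length; pixel++) {
    if (mask[pixel] !== 0) continue; // foreground pixel
    const base = pixel * 4;
    out[base] = color[0];
    out[base + 1] = color[1];
    out[base + 2] = color[2];
    // The alpha byte at out[base + 3] is left unchanged.
  }
  return out;
}
```

The copy made here is the kind of cost that N23 aims to avoid when frames move between the main thread and workers.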

3.4 Machine Learning

In a web game called “NameTheBird.com”, participants use their devices to provide audio and video observations of birds to the service, along with identifications for training purposes. This allows the service to identify birds from the provided audio and video and return this information to the users in real time.

The web application has a site-specific, federated-learning-based classifier for contextual object detection, user intent prediction and media manipulation, allowing it to augment the streams it receives and inject identifying or other supplemental information into the streams sent or received.

The shared classification models are trained on the birds found by the participants and are based on the feedback of the participants. Each client device's updates to the model are sent upstream to a shared model server, which pushes updates of the global model to the clients.

Implementation outline:

  1. Originating (raw) media streams are cloned for inference and training purposes, denoted “inference stream” and “training stream”, with the inference stream also being the media stream shared with peer(s). The cloning can occur at any time during a session.
  2. Inference stream: A site-specific classifier acts on the raw inference stream, with the result used to guide a custom encoder in the sender device and to send metadata to the server and peer devices outside the media stream. The encoder adds suitable augmentation, e.g. a sign reading “name this bird” hovering over the enlarged bird in the case of video enrichment, or enhanced bird song in the case of audio.
  3. Training stream: The model in training classifies the raw data and evaluates the classification using user feedback; this feedback loop is site-specific. The evaluation may be “online” or “offline”, where offline means that the training is done at a later stage on the recorded, encoded media set.
  4. Both the inference stream and the training stream may use payload protection, depending on the trust model for compute resources on the optional intermediate server side of the application.
  5. Both the inference stream and the training stream use a transport object for communicating with peers or servers. In some cases the communication can be a site-specific QUIC-based transport solution, in others an RTP-based one.

This use case adds the following requirements:

   ----------------------------------------------------------------
   REQ-ID          DESCRIPTION
   ----------------------------------------------------------------
   N20             The application must be able to obtain raw media
                   from the capture device.
   N21             The application must be able to insert processed
                   frames into the outgoing media path.
   N22             The application must be able to obtain decoded
                   media from the remote party.
   N23             It must be possible to efficiently share media
                   between the main thread and worker threads.
   N24             It must be possible to do efficient media
                   manipulation in worker threads.
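
The use case describes clients sending model updates to a shared model server, but not how the server combines them. One common choice in federated learning is a sample-weighted average of client weights; the sketch below assumes that approach. The `ClientUpdate` shape and the `aggregate` function are hypothetical.

```typescript
// One client's contribution: its locally trained weights and its sample count.
interface ClientUpdate {
  weights: number[];
  sampleCount: number;
}

// Sample-weighted average of client weights (an assumed aggregation rule).
function aggregate(updates: ClientUpdate[]): number[] {
  const first = updates[0];
  if (first === undefined) throw new Error("no updates to aggregate");
  const total = updates.reduce((sum, u) => sum + u.sampleCount, 0);
  if (total <= 0) throw new Error("total sample count must be positive");
  const result = first.weights.map(() => 0);
  for (const update of updates) {
    if (update.weights.length !== result.length) {
      throw new Error("all clients must report the same number of weights");
    }
    update.weights.forEach((weight, i) => {
      result[i] = (result[i] ?? 0) + (weight * update.sampleCount) / total;
    });
  }
  return result;
}
```

The server would push the returned vector back to clients as the new global model. Clients that observed more birds contribute proportionally more to it.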

3.5 Virtual Reality Gaming

A virtual reality gaming service utilizing a centralized conferencing server wants to synchronize data with media, using an existing Selective Forwarding Unit (SFU) to distribute the data. This use case adds the following requirements:

   ----------------------------------------------------------------
   REQ-ID          DESCRIPTION
   ----------------------------------------------------------------
   N25             The user agent must be able to send data synchronized
                   with audio and video.

Reference: Mailing list discussion
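
A simple way to think about N25 is that each data item carries the media timestamp it belongs to, and the receiver holds it until playout reaches that point. The sketch below models only the receiver-side hold-back queue. The `TimedData` shape, the millisecond timestamps, and the `SyncQueue` class are assumptions; how the timestamp travels through the SFU is out of scope here.

```typescript
// A data item tagged with the media time at which it should take effect.
interface TimedData<T> {
  mediaTimeMs: number;
  payload: T;
}

// Hypothetical receiver-side queue: holds data until playout reaches its timestamp.
class SyncQueue<T> {
  private pending: TimedData<T>[] = [];

  // Items may arrive out of order, so keep the queue sorted by media time.
  push(item: TimedData<T>): void {
    this.pending.push(item);
    this.pending.sort((a, b) => a.mediaTimeMs - b.mediaTimeMs);
  }

  // Return every payload due at or before the current playout time.
  release(playoutTimeMs: number): T[] {
    const due: T[] = [];
    while (this.pending.length > 0) {
      const head = this.pending[0];
      if (head === undefined || head.mediaTimeMs > playoutTimeMs) break;
      due.push(head.payload);
      this.pending.shift();
    }
    return due;
  }
}
```

A game client would call `release()` with the timestamp of each frame it renders, applying the returned game state at the same moment the matching audio and video are presented.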

3.6 Requirements

This section lists the requirements arising from the use cases catalogued in this document.

   ----------------------------------------------------------------
   REQ-ID        DESCRIPTION
   ----------------------------------------------------------------
   N01           The user agent can control candidate gathering
                 and pruning, limiting the networks on which
                 candidates are gathered, the types of candidates,
                 etc.
   N02           The user agent must be able to support parallel
                 forking, including reuse of local ICE candidates
                 and local certificates with multiple Answerers.
   N03           The user agent must be able to impose a bandwidth
                 limit across mesh endpoints.
   N04           Early media must be supported.
   N05           The ICE agent must be able to maintain multiple
                 candidate pairs and move traffic between them.
   N06           The ICE agent must be able to take the network
                 cost into account when considering re-routing.
   N07           The user agent must be able to encode and decode
                 video utilizing temporal scalability and (if
                 supported by the chosen codec) spatial scalability.
   N08           A user agent can receive audio/video without a
                 corresponding sender.
   N09           It is possible to select the sending and/or
                 receiving codec without negotiation.
   N10           The user agent must be able to control
                 robustness (RTX, RED, FEC) applied to individual
                 simulcast and SVC layers.
   N11           It must be possible for the user agent to
                 transfer large files as a single message.
   N12           The application must be able to signal backpressure
                 (flow control) when receiving data. It must also
                 receive a backpressure signal when sending data.
   N13           It must be possible for the user agent to transfer
                 data utilizing a congestion control algorithm
                 that does not compete aggressively with
                 audio/video communications.
   N14           It must be possible for the file exchange to
                 be supported by servers as well as user agents.
   N15           It must be possible to support data exchange
                 in a worker.
   N16           The application must be able to minimize ICE
                 connectivity checks.
   N17           The application must be able to control aspects
                 of the data transport (e.g. set the SCTP
                 heartbeat interval or turn it off, adjust RTO
                 values, etc.).
   N18           It must be possible to send arbitrary data
                 reliably, unreliably or partially reliably with
                 a specific maximum number of retransmissions
                 or a specific maximum timeout.
   N19           It must be possible to send arbitrary data
                 ordered or unordered.
   N20           The application must be able to obtain raw media
                 from the capture device.
   N21           The application must be able to insert processed
                 frames into the outgoing media path.
   N22           The application must be able to obtain decoded
                 media from the remote party.
   N23           It must be possible to efficiently share media
                 between the main thread and worker threads.
   N24           It must be possible to do efficient media
                 manipulation in worker threads.
   N25           The user agent must be able to send data synchronized
                 with audio and video.

A. References

A.1 Normative references

[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

A.2 Informative references

[ORTC]
Object RTC (ORTC) API for WebRTC. Robin Raymond. W3C. 26 June 2018 (work in progress). URL: https://w3c.github.io/ortc/
[RFC7478]
Web Real-Time Communication Use Cases and Requirements. C. Holmberg; S. Hakansson; G. Eriksson. IETF. March 2015. Informational. URL: https://tools.ietf.org/html/rfc7478
[WEBRTC10]
WebRTC 1.0: Real-time Communication Between Browsers. Adam Bergkvist; Daniel Burnett; Cullen Jennings; Anant Narayanan; Bernard Aboba; Taylor Brandstetter; Jan-Ivar Bruaroey. W3C. 27 September 2018. W3C Candidate Recommendation. URL: https://www.w3.org/TR/webrtc/