Web is Communications

TAD Summit Asia Spring 2021, Dominique Hazael-Massieux

slides

Transcript

Hello, my name is Dominique Hazael-Massieux and I'm here today to demonstrate how fundamental the Web is to communications, and more specifically its critical role as a platform for programmable communications.

Let me introduce myself quickly first - I work for W3C, the World Wide Web Consortium, the international non-profit organization responsible for standardizing many of the Web technologies you use on a daily basis, in particular (but not exclusively) in your Web browsers.

W3C is a community of organizations & individuals working together to ensure the Web grows in accordance with the values of universality that have always been at the core of its design, while integrating the technological improvements that emerge over time.

W3C accomplishes this mission through the work of its global Membership, with more than 430 organizations involved today, supported by a technical staff of around 60 people located in North America, Europe and Asia.

I am part of this technical staff, and among my various roles in W3C, I have been in charge of the standardization of WebRTC (Web real-time communications) and more generally, I am responsible for ensuring the Web fulfills the needs emerging from the communication ecosystem.

Naturally, when I claim that the Web plays a critical role in programmable communications, the foundation of that claim is WebRTC: as many of you will know, WebRTC is a collection of standardized protocols & JavaScript APIs that enable setting up audio-video communications between Web browsers and any other WebRTC endpoints, including gateways to non-WebRTC systems.

W3C is where the WebRTC JavaScript APIs have been designed and are still evolving to adapt to the dynamic needs of audio-video communications.

Earlier this year, WebRTC reached a very significant milestone in that development with its first release as a W3C standard, along with the release of the 75 IETF RFCs that define the underlying protocols & architecture, recognizing its nearly universal availability on billions of devices.

The impact of WebRTC has been felt in the industry for many years, but the past 14 months have highlighted its critical role to an extent none of us involved in its development had anticipated.

The COVID-19 pandemic has prevented many of us from using our most essential way of communicating with one another, through physical shared presence; a huge chunk of that communication need has migrated to the virtual space, and while audio/video real-time communications aren't yet quite as good as the real thing, this has brought significant resilience to the world's activities during one of the most challenging crises it has faced in the past decades.

And among all the ways of establishing audio-video communications, browser-based WebRTC has been the universal entry point, enabling anybody to take part in any of these discussions through just a click or a tap.

Browser makers and RTC solutions providers have all reported a huge uptick in the usage of browser-based WebRTC, including on mobile.

This has also meant we've been exposed to the many ways our existing real-time communications infrastructure needs to evolve to deal with the many new ways people need or want to use it.

Interfaces and user experiences that were optimized for office workers and technology-inclined users have shown their limitations when they need to be incorporated into much more diverse environments with much more diverse participants: in the classroom, in the doctor's office, in a sports stadium, in the theater, and many more.

And here again the Web has a critical role to play: anyone can build anything they want on the Web, and they only need a Web server to make their idea available to 4 billion users at the end of a link.

With WebRTC available everywhere, we are witnessing a swirl of new usages, customized to these different needs that have emerged with the pandemic and will likely stay and grow well after its end.

This capacity of the Web to host permission-less innovation has always been a key to its success and will hopefully again show its value in bringing all the innovation that real-time communications are calling for.

We know a lot of that innovation is already possible with the many APIs available in Web browsers today; there is also a set of known gaps in these APIs that we are working on addressing in W3C and which I'll cover next.

And there are likely many gaps we're not aware of, or that have not been prioritized based on our current understanding of the needs - I want to hear from you if there are needs in this space you're not seeing addressed.

But first, what are the gaps we know of and are working to address?

One area that has shown a need for improvement is screen sharing: while browser APIs already provide some of the basic features needed to share all or part of one's screen during a videoconference, we know that the user experience they enable is not optimal compared to what can be achieved with native applications, and we are considering a series of proposals that should improve that situation, in particular when it comes to sharing content from the browser itself.

Separately, with virtual communications gaining so much prevalence in so many aspects of our lives, the pandemic has seen increased interest in upgrading the security model of WebRTC.

WebRTC already provides full & robust encryption between end-points, ensuring no one can listen “on the wire”.

But with the increased surface of usage, interest in limiting what server components (SFUs) can get access to has risen again: the desire is that these server components get access to enough metadata to do their job of routing media streams to the right end points, but should no longer be able to read or change the media streams.

Making this possible requires improvements both in protocols and in APIs - the SFrame work happening in the IETF is a key element of the story at the protocol layer, while in W3C, the WebRTC Encoded Transform API is the hook that will allow encryption to be injected seamlessly in the client end points.
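
To make that split between routable metadata and protected media more concrete, here is a minimal TypeScript sketch of the kind of per-frame function an application might run inside an Encoded Transform. It is not SFrame: the XOR scrambling, the `protectFrame` name and the `clearHeaderBytes` parameter are all assumptions made for illustration, and a real service would rely on an authenticated cipher with properly managed keys.

```typescript
// Toy per-frame transform (illustrative only, NOT SFrame).
// The first `clearHeaderBytes` bytes are left readable, standing in for the
// metadata a server component would still need to route the stream.
// ASSUMPTION: XOR is a placeholder for a real cipher and offers no security.
function protectFrame(
  frame: Uint8Array,
  key: Uint8Array,
  clearHeaderBytes: number
): Uint8Array {
  if (key.length === 0) {
    throw new Error("key must not be empty");
  }
  const out = new Uint8Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    const byte = frame[i] ?? 0;
    if (i < clearHeaderBytes) {
      // Metadata stays in the clear.
      out[i] = byte;
    } else {
      // Payload is scrambled with a repeating key; applying the same
      // function again with the same key restores the original bytes.
      const keyByte = key[(i - clearHeaderBytes) % key.length] ?? 0;
      out[i] = byte ^ keyByte;
    }
  }
  return out;
}
```

In a browser, a function like this would typically be called from a worker that handles the `rtctransform` event and processes each encoded frame's data, with the worker attached to a sender or receiver through `RTCRtpScriptTransform`.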

Now, this leaves open a pretty major question: encryption is only as effective as the key management that supports it, and right now, that aspect has been left to application developers.

It's very likely we will want to expose new browser APIs that deal with key management: this is needed to isolate media streams from the end-point application code itself to enable WebRTC services that are cryptographically guaranteed to be firewalled from the content they transmit.

This will need to integrate with some of the identity and authentication mechanisms for the Web, including other technologies developed in W3C, such as Web Authentication and Decentralized Identifiers.

With what is available today, we want service providers to experiment in that space, so that we can hear from the services that want to bring these higher guarantees and get their help in designing that stronger framework.

The WebRTC Encoded Transform API I mentioned earlier opens the door to much more than just end-to-end encryption: it provides a way for WebRTC applications to apply efficient media processing on the streams before they get sent over a WebRTC connection - that covers popular features such as background removal or voice processing, and more generally any operation that can run on real-time video in the browser, which, combined with high-performance computing APIs such as WebAssembly and WebGPU, opens up a huge swath of opportunities.

And the spectrum of high-performance computing capabilities in browsers is about to expand.

W3C announced a couple of weeks ago an addition to these which we can expect to play a major role in the communications ecosystem: work on standardizing the Web Neural Network API (WebNN) is starting.

It will enable hardware-accelerated use of Machine Learning algorithms in the browser and make them usable client-side for many real-time media processing scenarios.

Machine Learning in the browser opens up low-latency smart noise reduction, codec optimization, object detection & recognition, automatic transcription and many more: making that kind of capability available for any communication platform developer on all browsers will reshape what we think of communication systems in the first place.

One domain where these Machine Learning capabilities are bound to have a particularly important impact is in empowering people with disabilities.

W3C has a long-established track record of work to ensure that Web applications can be used by everyone no matter their disabilities.

With virtual interactions becoming the norm in an increasing number of situations, ensuring access to people with disabilities is proving all the more critical and is indeed promising to expand opportunities for people for whom physical barriers have proved too hard to overcome.

W3C is actively developing guidance for real-time communication providers to make the best use of these opportunities for the benefit of all.

Empowering WebRTC service providers to adapt to the needs of their particular use cases & communities is an overall theme of the evolutions that are being designed in W3C.

That extends to supporting different architectural designs that can for instance build on different network optimizations.

We have thus started to provide new capabilities that let developers replace some of the architectural designs that currently come baked-in with the APIs.

WebTransport, for instance, learns from all the usage that WebRTC has seen for transfer of low-latency data and proposes a dedicated low-latency client-server network API that builds on HTTP/3 & QUIC.

This opens the way for instance to replace the RTP protocol that WebRTC uses for media transmission with new approaches, with possibly very different trade-offs in terms of congestion control or latency guarantees.

This also opens the door to scaling real-time transmission in one-to-many scenarios - in a move that will further close the gap between communication & broadcasting.

Likewise, our work on WebCodecs opens the black box of media encoders & decoders that are currently provided as-is in browsers, allowing fine-grained tuning of the encoding & decoding parameters that can be key to the optimizations needed to scale various communication scenarios.

Speaking of codecs, one of the capabilities that modern codecs such as AV1 are exposing is Scalable Video Coding which allows even more efficient transmission of videos at different resolutions and framerates to match the capabilities of the various endpoints and their networks.

These capabilities are being exposed to WebRTC browsers via our WebRTC SVC work.
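
As an illustration, the WebRTC SVC work identifies each layering scheme with a short scalability mode string such as `L1T3` or `L3T3_KEY`. The TypeScript sketch below only decodes such strings into layer counts; the `parseScalabilityMode` helper, its return shape and the layer limits it accepts are assumptions made for this example, not part of any browser API.

```typescript
// Decoded view of a scalability mode string (hypothetical shape).
interface SvcLayers {
  spatialLayers: number;           // number of resolutions carried
  temporalLayers: number;          // number of frame-rate tiers carried
  simulcastStyle: boolean;         // "S" modes: no inter-layer dependency
  keyFrameOnlyDependency: boolean; // "_KEY" modes: dependency on key frames only
}

// Hypothetical helper: parses strings such as "L1T3", "L2T1h", "S2T1"
// or "L3T3_KEY". Returns null for anything it does not recognize.
function parseScalabilityMode(mode: string): SvcLayers | null {
  const m = /^([LS])([1-3])T([1-3])(h)?(_KEY(_SHIFT)?)?$/.exec(mode);
  if (m === null) {
    return null;
  }
  return {
    spatialLayers: Number(m[2]),
    temporalLayers: Number(m[3]),
    simulcastStyle: m[1] === "S",
    keyFrameOnlyDependency: m[5] !== undefined,
  };
}
```

In browsers that implement this work, an application would request such a mode through the `scalabilityMode` member of an encoding when setting up a sender; a helper like the one above is only useful for reasoning about what a given mode implies.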

And given the many innovations happening around new codecs generally, we may come to a point where allowing communication providers to bring their own codecs becomes a necessity: the combination of WebCodecs and improved hardware-accelerated computing capabilities (in particular through WebGPU) may make this a practical option, although there remain significant challenges to make this work across the wide variety of contexts in which Web browsers run.

And these possibilities opened by broader improvements to what browsers can do illustrate a theme of where we expect a lot of innovation to emerge from: as browsers become more and more powerful while keeping their one-click magic, communication-based apps can build on that magic, and conversely, any kind of apps can tap into the human touch that real-time communications enable.

To give you a quick glimpse of some of these improvements that may seed the next generation of services in this space: Progressive Web Apps or PWA is a catch-all term that refers to the many additions to Web browsers that make Web apps fully integrated in their hosting operating system, in particular on mobile - this includes showing up on the home screen, support for notifications, running in the background - there is a lot of room for web-based communications apps to benefit from these additions, and to inspire new ones: for instance, making WebRTC-based calls work while the browser is in the background, integrating with the call-priority stack on phones, or showing up in the context of the dialer - there is no lack of ideas in this space, but we know we need more contributors to our work to make them happen.

As part of the W3C Web and Networks Interest Group, we are also exploring how the rapid evolution of network technologies (5G, edge computing, HTTP/3 and QUIC) is reshaping what Web applications can achieve and how they can best adapt to the wide variety of dynamic network conditions in which they need to operate.

Another area that is seeing a lot of rapid improvements is payments: our Web Payments stack enables smooth & secure payment checkout, for both physical and digital goods, which combined with WebRTC creates a new set of possibilities in human-assisted e-commerce.

And early work specialized on payments for digital goods & services opens up new ways of monetizing communication-based services, including through proposals of using streaming payments over time.

And I'll finish that glimpse of broader improvements that will have intersections with communications with one close to my heart, since I helped get the work started in W3C: the Immersive Web is our vision of making Web browsers a platform to deploy virtual and augmented reality experiences.

And as we learn of all the things that cannot be mediated simply through 2D audio and video, the possibilities that these immersive technologies open up in making our virtual communications more embodied, more physical, more local are in my mind one of the next frontiers that communications apps will need to cross.

And while there is already a lot of experimentation in this space, it's also clear there is a lot more work needed to make these experiences smooth enough to replace or complement their less immersive alternatives.

With the Web providing a single platform where these two universes can be combined seamlessly, we expect to see lots of rapid iterations by innovators of all kinds and sizes to make this new ecosystem emerge.

And hopefully that example is a good way to illustrate my claim: the richness of the Web platform, its openness allowing anyone to innovate without having to ask anyone else's permission, its low-friction model that makes any app available at the end of a click, have been key components of its success as a platform in general, and are bound to make it an ever more important platform for innovative communication apps & services.

If that claim resonates with you and you want to help build the next generation platform for communications, I hope you'll get in touch to get involved in our collaborative work.

Many thanks for your attention!