W3C

- DRAFT -

Media and Entertainment IG

07 Jul 2020

Agenda

Attendees

Present
Kaz_Ashimura, Chris_Needham, Huaqi_Shan, Kazuhiro_Hoya, Peipei_Guo, Rom_Smith, Takio_Yamaoka, Tatsuya_Igarashi, Chris_Cunningham, Barbara_Hochgesang, Will_Law, Yjun_Chen, Lally_Zhao, Francois_Daoust, Rijubrata_Bhaumik, John_Riviello, John_Simmons, So_Vang, Andreas_Tai, Anssi_Kostiainen, Gary_Katsevman, Paul_Adenot, Yash_Khandelwal, Kasar_Masood
Regrets
Chair
Chris, Igarashi
Scribe
cpn, kaz

Contents


Web Transport WG

<inserted> scribenick: cpn

Francois: We've been discussing Web Transport, creating a working group
... Could be useful for low latency, but needs input from media companies
... For scalable media streaming to millions of users, there need to be additional semantics so that CDNs can understand what the byte stream contains, to enable caching
... It would be useful for media companies to join discussion on WT API to bring their requirements for low latency streaming
... I encourage you to support the call for review, raise concerns if you have them, and when the WG is created, please bring requirements to the group
... Will is a proposed chair for the group, so there's good representation for media

Will: We're putting together a use case document, setting up a repo, this is ahead of chartering the WG
... We'll reach out to the IG to help put together the use cases

<Zakim> kaz, you wanted to ask tidoust for the resources

<tidoust> Proposed W3C Charter: WebTransport Working Group (until 2020-07-27)

Web Codecs

<tidoust> WebCodecs repo

Paul: I'll present a high level intro to Web Codecs
... We realised that on the web there are lots of ways to encode and decode audio and video media, possibly with multiple tracks
... First we had the media element, it takes a URL and magically renders the audio and video frames
... It's high level, some controls are available, but it's just for playback, and works well
... We then added Media Source Extensions, to be closer to the bytestream, but there's still lots of magic there
... It demuxes the packets and renders. It's good for a lot of purposes
... For decoding audio, we have decodeAudioData in Web Audio, for audio in memory, but this can't stream or touch the network or disk
... It's fine, but you may also want to decode progressively. Decoded audio is quite big, though not as big as video
... It decodes as fast as the machine can, not in real time
... There are lots of issues filed against Web Audio about this. There's no progressive decoding, so you can't, e.g., get just the first 30 seconds
... No progress feedback, useful as it takes a long time
... You need to demux and re-mux yourself. It's only useful really for small audio samples for playback
... Another way to decode A/V is the WebRTC RTCPeerConnection object; you can get a MediaStream with audio and video tracks in real time
... It does processing such as jitter buffering. Lots of magic, but some perceive this as not enough control
... There's no all-purpose API for advanced use cases, and lots of people complain, legitimately
... For encoding, it's worse. We have MediaRecorder. This works with media streams with audio and video tracks
... It is set up with a codec string and other parameters, which have been added in an ad-hoc fashion
... It will give you a blob of the contents. If you want to minimise latency between media hitting the encoder and being able to send it on the network, it's too high level; you have to transform the Blob to an ArrayBuffer
... It's for recording for offline use, e.g., recording a WebRTC call [both one-shot patterns are sketched below]
... Other ways to encode include RTCPeerConnection, but this is coupled to the transport, emits RTP packets
... It's lossy, and there are other concerns
... Looking at spec issue trackers over the years, we saw the need for something to solve all these issues
... Low-latency audio encoding and decoding for broadcasting, configurable encoding, maybe implementing a whole media playback stack
... so you have tight control over A/V sync, possibly using WebGL
... We tested a few approaches. I wondered if we could use MediaStream for this. But this is in a real-time clock domain
... Lots of work was done to break away from Media Stream, looking at other *** streams which can work offline
... The current iteration of the API is in Chromium behind a flag. It seems much simpler, relies on fewer concepts
... A few reasons for this. If we want to solve all use cases, we needed to be low level, and unopinionated on how to do things
... I expect libraries to build on this, e.g., WebGL, ThreeJS
... There's lots of control, with audio encode and decode, as a low level API. It has a nice integration with WASM
... It's an API shape that's not traditional for the web. Callbacks and events. Accepts ArrayBuffer objects. API principles that have been long established in native code
... The API is supposed to be able to encode/decode with a symmetric API
... It will hopefully work in the same way with audio and video, although some concepts work for one but not the other
... If we're successful, you should be able to make any app you want with this. There should be no limit, just like with a native API
... The level is as low as we can make it. Video frames and buffers on the one side, encoded packets on the other
... Out of scope for now is muxing and demuxing the media. This would have to be provided by web content authors
... This may seem to be a problem, but often these are memory rather than CPU bound
... There are other concerns, issues to deal with. More concretely, Chris can present the API itself
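For reference, a minimal sketch of the two one-shot patterns Paul contrasts with WebCodecs, using the existing Web Audio decodeAudioData and MediaRecorder APIs as they ship today; the URL, MIME string and timeslice are illustrative:

  // One-shot decode with Web Audio: the whole file must be in memory, and the
  // decoded result arrives all at once (no progressive decoding, no progress feedback).
  const audioCtx = new AudioContext();
  const response = await fetch('clip.ogg');                // illustrative URL
  const encoded = await response.arrayBuffer();
  const audioBuffer = await audioCtx.decodeAudioData(encoded);

  // Recording with MediaRecorder: configured with a codec string, output is delivered
  // as Blobs, adding latency and a Blob -> ArrayBuffer hop before data can go on the network.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  const recorder = new MediaRecorder(stream, { mimeType: 'video/webm;codecs=vp9,opus' });
  recorder.ondataavailable = async (event) => {
    const bytes = await event.data.arrayBuffer();          // Blob -> ArrayBuffer
    // ... send bytes over the network, or append to a file
  };
  recorder.start(1000);                                    // timeslice in ms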

Chris Cunningham's related presentation video

ChrisC: [shows slides]
... [Canvas setup example] How to paint a video frame? Transfer to a bitmap and render on a canvas
... [Decoder example] The constructor arguments include paint and output callbacks, an error callback.
... You configure with desired parameters, pass in encoded chunks, and the output callback is called with the decoded frames
... There's a reverse symmetry with the encoder APIs, frames to encoded chunks
... The IDL for our latest thinking is in Chrome's web codecs folder in blink
... We've worked hard on it this quarter, for an origin trial for M86, October
... We have the video decoder wired up in Worker and Window contexts. That's plumbed into our hardware and software decoders
... How to export YUV data from a video frame? It's not seen much in today's web APIs, which are RGB oriented
... There's a proposal to allow YUV access, and you can manipulate the pixels
... Painting directly to a canvas is supported
... Audio is less far along. We have a skeleton interface, and code under review to implement it
... We have this wired up to software decoders, further work for platform decoders
... We plan to have it all ready for origin trial in M86. The intent for this is to get feedback on the API shape.
... The performance may not be completely ready, but people can try out the proposed API and give feedback on whether it's viable
... One thing that's critical is to produce a spec. There's an explainer that's mostly up to date. We're moving away from using promises

<tidoust> WebCodecs Explainer
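A minimal sketch of the decode-and-paint flow described above, assuming the WebCodecs shape that later shipped (the 2020 draft rendered via ImageBitmap and details have since changed); demuxedPackets is a hypothetical, app-supplied source of already-demuxed packets:

  const canvas = document.querySelector('canvas')!;
  const ctx = canvas.getContext('2d')!;

  // Constructor takes output and error callbacks; no promises on the hot path.
  const decoder = new VideoDecoder({
    output: (frame: VideoFrame) => {
      ctx.drawImage(frame, 0, 0);   // paint the decoded frame to the canvas
      frame.close();                // release the underlying (possibly GPU-backed) memory
    },
    error: (e: DOMException) => console.error('decode error', e),
  });

  // Configure with the desired parameters, then feed encoded chunks.
  decoder.configure({ codec: 'vp09.00.10.08' });                 // illustrative codec string
  for (const { data, timestamp, isKey } of demuxedPackets) {     // demuxing is the app's job
    decoder.decode(new EncodedVideoChunk({
      type: isKey ? 'key' : 'delta',
      timestamp,                                                 // microseconds
      data,
    }));
  }
  await decoder.flush();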

ChrisC: We're working to have a spec draft uploaded soon. We welcome feedback on the spec as well as on the origin trial
... Any questions?

Will: How do you know what codecs are supported, with the codec string? How do we support our own codecs?

ChrisC: These are open questions. There's an open issue about Media Capabilities
... For configuration, if you supply an unsupported codec string, we'd immediately error out. I'd like to explore integration with Media Capabilities
... The codec string is the classic fully specified string, e.g., for VP9, but there's an open issue on whether this really makes sense. It comes from a containerised description of the media
... For the codec, you only need a subset, such as the profile, or the extradata for H.264
... The exact shape of the codec string may be relaxed
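For illustration, hedged examples of the fully specified codec strings under discussion; whether WebCodecs ultimately requires this level of detail was still open at the time, and avccBytes is a hypothetical Uint8Array holding the H.264 extradata:

  // Container-style codec strings, as used by MSE and Media Capabilities today.
  decoder.configure({ codec: 'vp09.00.10.08' });    // VP9, profile 0, level 1.0, 8-bit

  // For H.264, the decoder also needs the out-of-band extradata (avcC box),
  // passed as `description`.
  decoder.configure({ codec: 'avc1.64001f', description: avccBytes });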

Will: So this isn't about bringing your own codec?

ChrisC: That's right. We're not against that, but it's unclear what integration with Web Codecs would look like
... People doing that could write an interface matching our own, and fall back

Paul: You can do dependency injection. Do your own codec, and drop it in

Will: And you'd register your codec string?

ChrisC: We may not manifest it that way, because of security guarantees; we could consider a codec worklet. Nothing is defined yet
... You could swap in a software implementation

Paul: Registering WASM in the platform, we've been doing it for some time now. It's complex because of the way browsers are implemented, codecs are more sandboxed
... No explicit registration, but we can use duck typing as you have the same interface

Igarashi: We're interested in low latency video streaming
... What is the buffering model for this interface? Does it buffer packets, or is that done in the application?

ChrisC: The decode call can be made multiple times. The implementation manages queuing internally; there's no buffering of outputs, so the app would need to do that

Igarashi: Would the application have to buffer before feeding the decoder? Also, what about rendering?

ChrisC: We're happy to buffer, and you'd see the queue of buffered chunks decrease as it progresses. You could also buffer prior to decode if preferred
... There's an attribute that exposes how much is buffered, so you can use this as a backpressure signal
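A sketch of that backpressure pattern, assuming the decodeQueueSize attribute described above; the threshold and packet source are illustrative:

  const MAX_QUEUE_DEPTH = 10;   // illustrative threshold

  async function pump(decoder: VideoDecoder, packets: AsyncIterable<EncodedVideoChunk>) {
    for await (const chunk of packets) {
      // decodeQueueSize reports how many chunks the decoder is still holding,
      // so the app can stop feeding it when the implementation falls behind.
      while (decoder.decodeQueueSize > MAX_QUEUE_DEPTH) {
        await new Promise(resolve => requestAnimationFrame(resolve));   // crude wait; an event listener also works
      }
      decoder.decode(chunk);
    }
    await decoder.flush();
  }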

Igarashi: What is the relation between Web Transport and Web Codecs? Could you directly feed a stream from WebTransport to Web Codecs?

ChrisC: There's no tight integration between those specs, similar to how the network stack and the media stack in Chrome are separate
... It should be possible with little intervening code to feed the data through, but this will depend on if you've done packaging etc
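A hedged sketch of such glue code, assuming a WebTransport unidirectional stream where, purely for illustration, each read delivers one whole encoded frame; parseChunk is a hypothetical app function and decoder is a configured VideoDecoder:

  const transport = new WebTransport('https://example.com/media');   // illustrative URL
  await transport.ready;

  // Take the first incoming unidirectional stream and read bytes from it.
  const streams = transport.incomingUnidirectionalStreams.getReader();
  const { value: incoming } = await streams.read();
  const reader = incoming!.getReader();

  for (let result = await reader.read(); !result.done; result = await reader.read()) {
    // parseChunk() is a hypothetical app function: WebCodecs does no demuxing or
    // depacketizing, so the app must recover type/timestamp/payload itself.
    const { type, timestamp, data } = parseChunk(result.value);
    decoder.decode(new EncodedVideoChunk({ type, timestamp, data }));
  }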

Kaz: Thank you for presenting. A possible use case could include interconnection between video conference systems such as WebEx and Zoom, as a unified video conferencing system?

ChrisC: Yes, absolutely. It's one of the foremost use cases, with a combination of Web Codecs and Web Transport you can do customised conferencing

Larry: We're working on a video editor program, compositing multiple audio and video tracks with images and text
... Can Web Codecs allow these to be mixed into a video file?

ChrisC: Is that text rendered into the image?

Larry: Yes, we'd combine those into a new video

ChrisC: Yes, this should be possible. The primitive for the video encoder will be an image for the video frame. If you're constructing this yourself, you could manipulate the pixels to add text and re-encode
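A sketch of that flow with the API shape that later shipped: draw the decoded frame plus overlays onto a canvas, wrap the canvas as a new VideoFrame, and re-encode; the encoder configuration values are illustrative:

  const canvas = new OffscreenCanvas(1280, 720);
  const ctx = canvas.getContext('2d')!;

  const encoder = new VideoEncoder({
    output: (chunk, metadata) => { /* hand the EncodedVideoChunk to the app's muxer */ },
    error: (e) => console.error(e),
  });
  encoder.configure({ codec: 'vp09.00.10.08', width: 1280, height: 720, bitrate: 2_000_000 });

  function compositeAndEncode(frame: VideoFrame) {
    ctx.drawImage(frame, 0, 0, 1280, 720);     // the decoded frame as the background
    ctx.font = '48px sans-serif';
    ctx.fillText('Hello title', 40, 80);       // overlay text (or images) on top
    const composed = new VideoFrame(canvas, { timestamp: frame.timestamp });
    encoder.encode(composed);
    composed.close();
    frame.close();
  }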

Francois: I'd like to understand how I'd use this. Is rendering via Canvas, or can I connect to a video element? Also, one use case we're considering is media processing. Is a VideoFrame API suitable for video processing, e.g., if you want to do GPU based processing, or run an ML algorithm?
... Would all these APIs need to integrate with VideoFrame?

ChrisC: The current plan is to render via Canvas. An earlier version allowed rendering via the media element, but most users we talked to wanted manual control,
... e.g., for video editing or RTC use cases
... So we started with the most manual option, happy to explore other things in future
... We hope VideoFrame will serve a number of generic use cases: it will expose the YUV pixel arrangement, the coded width and height, and the planes, and this would perform well with WASM and JS
... It's an open question how this could be consumed by Web GPU. No clear answer yet, but we're having the conversation
... Want to have minimum number of copies, use GPU backed memory, avoid forcing a GPU to CPU copy if we can. In some cases it's unavoidable
... We'd love feedback on this

<inserted> gpuweb discussion
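For the processing use case, a sketch of raw pixel access as it eventually shipped (the 2020 proposal exposed planes somewhat differently); copyTo() hands the raw, typically YUV, data to JS or WASM, while drawImage() keeps a GPU path for display:

  async function readPixels(frame: VideoFrame): Promise<Uint8Array> {
    // format is e.g. 'I420' or 'NV12'; codedWidth/codedHeight give the full coded size.
    console.log(frame.format, frame.codedWidth, frame.codedHeight);

    const buffer = new Uint8Array(frame.allocationSize());
    // copyTo() writes all planes into the buffer and resolves with their layout
    // (offset and stride per plane), which WASM code can then index directly.
    const layout = await frame.copyTo(buffer);
    console.log(layout);                       // one {offset, stride} entry per plane
    return buffer;
  }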

Francois: Is HDR support part of this discussion, with YUV support?

ChrisC: There are some proposals for how to do that with Canvas, e.g., extended colour space
... The intent is to be able to paint HDR video in Canvas. Details of how that would work are still TBD, needs integration with the canvas folks

Will: What controls pacing? Is the output called with a precise timing, to maintain frame rate?

ChrisC: The example paints the frames faster than real time. The app would have to buffer outputs and sync with the audio

Will: Do we have enough timer accuracy to do that, e.g., for 29.97 fps?

ChrisC: There's nothing inherent in JS that should prevent that. With audio we already do deadline-sensitive audio rendering

Paul: It is clearly sufficient to reach the level of quality we have natively; rational frame rates like 29.97 fps are fine. You need to take into account the screen frame rate
... There are performance timers, which are fuzzed because of Spectre, but they are sufficiently precise
... You can get vsync information from rAF, use the video timestamps, decide yourself
... You have latency information for audio, but you'll know how much latency there is after the AudioContext
... You'll have to shift the video frames yourself to align with the audio
... We'll need to add display latency; it's known that different browsers (or the same browser on different platforms) have different latencies
... Audio latency can be lower than video latency, so you could break causality, with the audio coming before the video
... That's fixable
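A hedged sketch of that kind of scheduling: buffer decoded frames, then on each requestAnimationFrame tick paint the frame whose timestamp has been reached by the audio clock, shifted by the reported output latency (outputLatency support varies across browsers; the queue and microsecond timestamps are illustrative):

  const frameQueue: VideoFrame[] = [];          // filled by the VideoDecoder output callback

  function renderLoop(audioCtx: AudioContext, ctx: CanvasRenderingContext2D) {
    // Audio clock in microseconds, shifted by the reported output latency so the
    // video lines up with what is actually coming out of the speakers.
    const latency = audioCtx.outputLatency || audioCtx.baseLatency;   // seconds; support varies
    const audioTimeUs = (audioCtx.currentTime - latency) * 1e6;

    while (frameQueue.length > 1 && frameQueue[1].timestamp <= audioTimeUs) {
      frameQueue.shift()!.close();              // drop frames we are already late for
    }
    if (frameQueue.length && frameQueue[0].timestamp <= audioTimeUs) {
      ctx.drawImage(frameQueue[0], 0, 0);
      frameQueue.shift()!.close();
    }
    requestAnimationFrame(() => renderLoop(audioCtx, ctx));
  }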

Igarashi: Support for copy protected content? Prevent copying from the Canvas?

ChrisC: There's no integration with existing mechanisms, not discussed at all
... It hasn't been critical to any of the use cases we've prioritised so far. My naive hope is that introducing the decode API separate from content protection is a good first step

ChrisN: Video editing use cases? client-side video editing proposal

ChrisC: You might want to make a precise cut, decoding and re-encoding based on a subset; Web Codecs is up to meeting the needs of the MediaBlob proposal
... In the blink-dev thread on the Blob-based proposal, they explored the performance of JS-based muxing and demuxing, and found it's up to the task

Paul: In Firefox, demuxing of Ogg Theora or Opus is done in WASM. We do this for security, and also for performance, as we don't have to send the data to a different process
... Increasingly we don't want to run codec code in process
... Performance measurements we made showed this is in the noise, it's memory bound

<inserted> scribenick: kaz

ChrisN: how to proceed?

ChrisC: GitHub issues would be the best place for the discussion

Kaz: link for the repo?

<chcunningham> https://github.com/WICG/web-codecs

ChrisC: will put here

ChrisN: and slides?

<chcunningham> https://www.youtube.com/watch?v=nhTxJBgTywc&feature=youtu.be&list=PLNYkxOF6rcIBhuGsbO6t8-OBE5-fVPe7K&t=521

ChrisN: tx!
... really interesting and exciting discussion today

Paul: we have so many people from audio group also excited

ChrisN: a couple of open source projects also working on media editing in the browser
... becoming more and more possible these days
... audio/video non-linear editing

AOB

Kaz: there was another topic for today on the WoT use cases related to media handling, but I'd like to initiate the discussion about that by email instead given the time

Next meeting

ChrisN: will be in August
... detail to be sent out later

[adjourned]

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version (CVS log)
$Date: 2020/07/07 15:55:49 $