<inserted> scribenick: cpn
Francois: We've been discussing
Web Transport, creating a working group
... Could be useful for low latency, but needs input from media
companies
... For media scalable streaming, to millions of users, there
needs to be additional semantics for CDNs to understand what
the byte stream contains, to enable caching
... It would be useful for media companies to join discussion
on WT API to bring their requirements for low latency
streaming
... I encourage you to support the call for review, raise
concerns if you have them, and when the WG is created, please
bring requirements to the group
... Will is a proposed chair for the group, so good
representation for media
Will: We're putting together a
use case document, setting up a repo, this is ahead of
chartering the WG
... We'll reach out to the IG to help put together the use cases
<Zakim> kaz, you wanted to ask tidoust for the resources
<tidoust> Proposed W3C Charter: WebTransport Working Group (until 2020-07-27)
<tidoust> WebCodecs repo
Paul: I'll present a high level
intro to Web Codecs
... We realised that on the web there are lots of ways to
encode and decode audio and video media, possibly with multiple
tracks
... First we had the media element, it takes a URL and
magically renders the audio and video frames
... It's high level, some controls are available, but it's just
for playback, and works well
... We then added Media Source Extensions, to be closer to the
bytestream, but there's still lots of magic there
... It demuxes the packets and renders. It's good for a lot of
purposes
... For decoding audio, we have decodeAudioData in Web Audio,
for audio in memory, but this can't stream or touch the network
or disk
... It's fine, but you may also want to decode progressively.
Decoded audio is quite big, though not as big as video
... It decodes as fast as the machine can, not in real
time
... There's lots of issues filed against Web Audio about this.
There's no progressive decoding, e.g., so you can't get just
the first 30 seconds
... No progress feedback, useful as it takes a long time
... You need to demux and re-mux yourself. It's really only useful for small
audio samples for playback
... Another way to decode A/V is in the WebRTC PeerConnection
object, you can get a MediaStream with audio and video tracks
in real time
... Processing such as jitter buffer. Lots of magic, but some
perceive this as not enough control
... There's no all purpose API for advanced use cases, and lots
of people complain, legitimately
... For encoding, it's worse. We have MediaRecorder. This works
with media streams with audio and video tracks
... It is set up with a codec string and other parameters,
which have been added in an ad-hoc fashion
... It will give you a blob of the contents, which isn't ideal
if you want to minimise latency between media hitting the
encoder and being able to send it on the network
... It's high level; you transform the blob to an ArrayBuffer.
It's for recording for offline use, e.g., recording a WebRTC
call
... Other ways to encode include RTCPeerConnection, but this is
coupled to the transport, emits RTP packets
... It's lossy, and there are other concerns
... Looking at spec issue trackers over the years, we saw the
need for something to solve all these issues
... Low latency audio encoding and decoding for broadcasting,
configurable encoding, maybe implementing a whole media
playback stack
... so you have tight control over A/V sync, possibly using
WebGL
... We tested a few approaches. I wondered if we could use
MediaStream for this. But this is in a real-time clock
domain
... Lots of work was done to break away from Media Stream,
looking at other *** streams which can work offline
... The current iteration of the API is in Chromium behind a
flag. It seems much simpler, relies on fewer concepts
... A few reasons for this. If we want to solve all use cases,
we needed to be low level, and unopinionated on how to do
things
... I expect libraries to build on this, e.g., WebGL,
ThreeJS
... There's lots of control, with audio encode and decode, as a
low level API. It has a nice integration with WASM
... It's an API shape that's not traditional for the web.
Callbacks and events. Accepts ArrayBuffer objects. API
principles that have been long established in native code
... The API is supposed to be able to encode/decode with a
symmetric API
... It will hopefully work in the same way with audio and
video, although some concepts work for one but not the
other
... If we're successful, you should be able to make any app you
want with this. There should be no limit, like with a native
API
... The level is as low as we can make it. Video frames and
buffers on the one side, encoded packets on the other
... Out of scope for now is muxing and demuxing the media. This
would have to be provided by web content authors
... This may seem to be a problem, but often these are memory
rather than CPU bound
... There are other concerns, issues to deal with. More
concretely, Chris can present the API itself
Chris Cunningham's related presentation video
ChrisC: [shows slides]
... [Canvas setup example] How to paint a video frame? Transfer
to a bitmap and render on a canvas
... [Decoder example] The constructor arguments include an
output callback (which paints the frame) and an error callback.
... You configure with desired parameters, pass in encoded
chunks, and the output callback is called with the decoded
frames
... There's a reverse symmetry with the encoder APIs, frames to
encoded chunks
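[The decode flow described here might look roughly like the following; a hedged sketch assuming the trial's constructor-plus-configure shape, with the codec string, dimensions and canvas all illustrative rather than taken from the presentation]

```javascript
// Sketch of the decode flow: callbacks in the constructor, then
// configure() and decode(). Codec string and dimensions are illustrative.
const decoderInit = {
  // Called once per decoded frame; transfer to a bitmap and paint it.
  output: async (frame) => {
    const bitmap = await createImageBitmap(frame);
    const canvas = document.querySelector('canvas');
    canvas.getContext('2d').drawImage(bitmap, 0, 0);
  },
  error: (e) => console.error('decode error:', e),
};

const decoderConfig = {
  codec: 'vp09.00.10.08', // fully specified VP9 string (illustrative)
  codedWidth: 1280,
  codedHeight: 720,
};

// Guarded so the sketch is inert outside a browser shipping the trial.
if (typeof VideoDecoder !== 'undefined') {
  const decoder = new VideoDecoder(decoderInit);
  decoder.configure(decoderConfig);
  // Encoded chunks come from app-level demuxing (out of scope for
  // Web Codecs itself): demuxedChunks.forEach(c => decoder.decode(c));
}
```

[The encoder direction mirrors this shape: an output callback receives encoded chunks instead of frames]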
... The IDL for our latest thinking is in Chrome's web codecs
folder in blink
... We've worked hard on it this quarter, for an origin trial
for M86, October
... We have the video decoder wired up in Worker and Window
contexts. That's plumbed into our hardware and software
decoders
... How to export YUV data from a video frame? It's not seen
much in today's web APIs, which are RGB oriented
... There's a proposal to allow YUV access, and you can
manipulate the pixels
... Painting directly to a canvas is supported
... Audio is less far along. We have a skeleton interface, and
code under review to implement it
... We have this wired up to software decoders; further work is
needed for platform decoders
... We plan to have it all ready for origin trial in M86. The
intent for this is to get feedback on the API shape.
... The performance may not be completely ready, but people can
try out the proposed API and give feedback on whether it's
viable
... One thing that's critical is to produce a spec. There's an
explainer that's mostly up to date. We're moving away from
using promises
<tidoust> WebCodecs Explainer
ChrisC: We're working to have a
spec draft uploaded soon. We welcome feedback on the spec as
well as on the origin trial
... Any questions?
Will: How do you know what codecs are supported, with the codec string? How do we support our own codecs?
ChrisC: These are open questions.
There's an open issue about Media Capabilities
... For configuration, if you supply an unsupported codec
string, we'd immediately error out. I'd like to explore
integration with Media Capabilities
... The codec string is the classic VP9 fully specified, but
there's an open issue if this really makes sense. It comes from
a containerised description of the media
... For the codec, you only need a subset, such as the profile,
or the extra data for x264
... The exact shape of the codec string may be relaxed
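[To illustrate what "fully specified" means here: a hypothetical helper, not part of any spec, splitting a VP9 string of the `vp09.PP.LL.DD` form into profile, level and bit depth; a raw decoder may only need a subset such as the profile]

```javascript
// Hypothetical parser for a fully specified VP9 codec string,
// e.g. 'vp09.02.10.10' = profile 2, level 1.0, 10-bit.
function parseVp9CodecString(codec) {
  const m = /^vp09\.(\d{2})\.(\d{2})\.(\d{2})/.exec(codec);
  if (!m) return null; // not a VP9 string of this shape
  return {
    profile: Number(m[1]),
    level: Number(m[2]) / 10, // '10' means level 1.0, '41' means 4.1
    bitDepth: Number(m[3]),
  };
}
```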
Will: So this isn't about bringing your own codec?
ChrisC: That's right. We're not
against that, but it's unclear what integration with Web Codecs
would look like
... People doing that could write an interface matching our
own, and fall back
Paul: You can do dependency injection. Do your own codec, and drop it in
Will: And you'd register your codec string?
ChrisC: We may not surface it
that way, due to security guarantees. We could consider a codec
worklet. Nothing defined yet
... You could swap in a software implementation
Paul: Registering WASM in the
platform, we've been doing it for some time now. It's complex
because of the way browsers are implemented, codecs are more
sandboxed
... No explicit registration, but we can use duck typing as you
have the same interface
Igarashi: We're interested in low
latency video streaming
... What is the buffering model for this interface? Does it
buffer packets, or is that done in the application?
ChrisC: The decode call can be made multiple times. The implementation manages queuing internally; there's no buffering of outputs, so the app would need to do that
Igarashi: Would the application have to buffer before feeding the decoder? Also, what about rendering?
ChrisC: We're happy to buffer,
and you'd see the queue of buffered chunks decrease as it
progresses. You could also buffer prior to decode if
preferred
... There's an attribute that exposes how much is buffered, so
you can use this as a backpressure signal
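[A hedged sketch of using that attribute as a backpressure signal, assuming a `decodeQueueSize`-style counter as in the Chromium trial; the high-water mark and chunk source are invented for illustration]

```javascript
const MAX_QUEUE_DEPTH = 8; // illustrative high-water mark

// Feed the decoder only while its internal queue is shallow;
// otherwise stop and let decoding drain the queue.
function pumpDecoder(decoder, chunkSource) {
  while (decoder.decodeQueueSize < MAX_QUEUE_DEPTH) {
    const chunk = chunkSource.next();
    if (!chunk) return false; // source exhausted
    decoder.decode(chunk);
  }
  return true; // backpressure hit; call again later
}
```

[Any object with a `decodeQueueSize` counter and a `decode` method can stand in for the real decoder to exercise this loop offline]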
Igarashi: What is the relation between Web Transport and Web Codecs? Could you directly feed a stream from WT to Web Codecs?
ChrisC: There's no tight
integration between those specs, similar to how the network
stack and the media stack in Chrome are separate
... It should be possible with little intervening code to feed
the data through, but this will depend on if you've done
packaging etc
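[The "little intervening code" would be app-owned depacketization; the packet framing below (key-frame flag, timestamp, payload) is invented for illustration and is not part of either spec]

```javascript
// Map an app-defined transport packet to the init dictionary for an
// encoded chunk. Fields on `packet` are hypothetical.
function packetToChunkInit(packet) {
  return {
    type: packet.isKey ? 'key' : 'delta',
    timestamp: packet.timestampUs, // microseconds
    data: packet.payload,          // BufferSource with the encoded bytes
  };
}

// In a browser this would feed the decoder, e.g.:
// decoder.decode(new EncodedVideoChunk(packetToChunkInit(packet)));
```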
Kaz: Thank you for presenting. Could a possible use case be interconnecting video conference systems such as WebEx and Zoom into a unified video conferencing system?
ChrisC: Yes, absolutely. It's one of the foremost use cases, with a combination of Web Codecs and Web Transport you can do customised conferencing
Larry: We're working on a video
editor program, combining multiple audio and video tracks with
images and text
... Can Web Codecs allow these to be mixed into a video
file?
ChrisC: Is that text rendered into the image?
Larry: Yes, we'd combine those into a new video
ChrisC: Yes, this should be possible. The primitive for the video encoder will be an image for the video frame. If you're constructing this yourself, you could manipulate the pixels to add text and re-encode
Francois: I'd like to understand
how I'd use this. Is rendering via Canvas, or can I connect to
a video element? Also, one use case we're considering is media
processing. Is a VideoFrame API suitable for video processing,
e.g., if you want to do GPU based processing, or run an ML
algorithm?
... Would all these APIs need to integrate with VideoFrame?
ChrisC: The current plan is to
render via Canvas. An earlier version allowed rendering via
media element, but most users we talked to wanted manual
control,
... e.g., video editing or RTC use cases
... So we started with the most manual option, happy to explore
other things in future
... We hope VideoFrame will serve a number of generic use cases
that will allow YUV pixel arrangement, encoded width and
height, planes, and this would perform well with WASM and
JS
... It's an open question how this could be consumed by WebGPU.
No clear answer yet, but we're having the conversation
... Want to have minimum number of copies, use GPU backed
memory, avoid forcing a GPU to CPU copy if we can. In some
cases it's unavoidable
... We'd love feedback on this
<inserted> gpuweb discussion
Francois: Is HDR support part of this discussion, with YUV support?
ChrisC: There are some proposals
for how to do that with Canvas, e.g., extended colour
space
... The intent is to be able to paint HDR video in Canvas.
Details of how that would work are still TBD, needs integration
with the canvas folks
Will: What controls pacing? Is the output called with a precise timing, to maintain frame rate?
ChrisC: The example paints the frames faster than real time. The app would have to buffer outputs and sync with the audio
Will: Do we have enough timer accuracy to do that, e.g., for 29.97 fps?
ChrisC: There's nothing inherent in JS that should prevent that. With audio we do sensitive deadline based audio rendering
Paul: It's clearly sufficient to
reach the level of quality we have natively, e.g., rational
frame rates; 29.97 is fine. You need to take into account the
screen frame rate
... There are performance timers, which are fuzzed because of
Spectre, but they are sufficiently precise
... You can get vsync information from rAF, use the video
timestamps, decide yourself
... You have latency information for audio, but you'll know how
much latency there is after the AudioContext
... You'll have to shift the video frames yourself to align
with the audio
... We'll need to add display latency; it's known that
different browsers (or the same browser on different platforms)
have different latency
... Audio latency can be lower than video latency, so you could
break causality, with audio arriving before the video
... That's fixable
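[The compensation described above could be sketched as follows; a hedged illustration, with times in seconds and the 40 ms latency figure invented]

```javascript
// Delay (from now) at which to display a video frame so it lines up
// with when the matching audio actually reaches the speakers.
function videoDisplayDelay(frameTimestamp, audioContextTime, outputLatency) {
  // A frame stamped t should appear when audio stamped t is audible,
  // i.e. outputLatency after the audio clock reaches t.
  return frameTimestamp - audioContextTime + outputLatency;
}

// E.g. with 40 ms of output latency, a frame stamped 2.000 s against
// an audio clock at 1.500 s should be shown in 0.540 s, not 0.500 s.
```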
Igarashi: Support for copy protected content? Prevent copying from the Canvas?
ChrisC: There's no integration
with existing mechanisms, not discussed at all
... It hasn't been critical to any of the use cases we've
prioritised so far. My naive hope is that introducing the
decode API separate from content protection is a good first
step
ChrisN: Video editing use cases? client-side video editing proposal
ChrisC: You might want to make a
precise cut, decode and re-encode based on a subset. Web Codecs
should be up to meeting the needs of the MediaBlob proposal
... The blink-dev thread on the Blob based proposal explored
the performance of JS based muxing and demuxing, and found it's
up to the task
Paul: In Firefox, demuxing of Ogg
Theora or Opus is done in WASM. We do this for security, and
also for performance, as we don't have to send the data to a
different process
... Increasingly we don't want to run codec code in
process
... Performance measurements we made showed this is in the
noise, it's memory bound
<inserted> scribenick: kaz
ChrisN: how to proceed?
ChrisC: GitHub issues would be the best place for the discussion
Kaz: link for the repo?
<chcunningham> https://github.com/WICG/web-codecs
ChrisC: will put here
ChrisN: and slides?
<chcunningham> https://www.youtube.com/watch?v=nhTxJBgTywc&feature=youtu.be&list=PLNYkxOF6rcIBhuGsbO6t8-OBE5-fVPe7K&t=521
ChrisN: tx!
... really interesting and exciting discussion today
Paul: we have so many people from the audio group also excited
ChrisN: a couple of open source
projects also working on media editing on the browser
... becoming more and more possible these days
... audio/video non-linear editing
Kaz: there was another topic for today on the WoT use cases related to media handling, but I'd like to initiate the discussion about that by email instead given the time
ChrisN: will be in August
... detail to be sent out later
[adjourned]