<inserted> scribenick: cpn
Francois: We've been discussing
    Web Transport, creating a working group
    ... Could be useful for low latency, but needs input from media
    companies
    ... For scalable media streaming to millions of users, there
    needs to be additional semantics for CDNs to understand what
    the byte stream contains, to enable caching
    ... It would be useful for media companies to join discussion
    on WT API to bring their requirements for low latency
    streaming
    ... I encourage you to support the call for review, raise
    concerns if you have them, and when the WG is created, please
    bring requirements to the group
    ... Will is a proposed chair for the group, so good
    representation for media
Will: We're putting together a
    use case document, setting up a repo, this is ahead of
    chartering the WG
    ... We'll reach out to the IG to help put together the use cases
<Zakim> kaz, you wanted to ask tidoust for the resources
<tidoust> Proposed W3C Charter: WebTransport Working Group (until 2020-07-27)
<tidoust> WebCodecs repo
Paul: I'll present a high level
    intro to Web Codecs
    ... We realised that on the web there are lots of ways to
    encode and decode audio and video media, possibly with multiple
    tracks
    ... First we had the media element, it takes a URL and
    magically renders the audio and video frames
    ... It's high level, some controls are available, but it's just
    for playback, and works well
    ... We then added Media Source Extensions, to be closer to the
    bytestream, but there's still lots of magic there
    ... It demuxes the packets and renders. It's good for a lot of
    purposes
    ... For decoding audio, we have decodeAudioData in Web Audio,
    for audio in memory, but this can't stream or touch the network
    or disk
    ... It's fine, but you may also want to progressively decode.
    Decoded audio is quite big in memory, though not as big as video
    ... It decodes as fast as the machine can, not in real
    time
    ... There's lots of issues filed against Web Audio about this.
    There's no progressive decoding, e.g., so you can't get just
    the first 30 seconds
    ... No progress feedback, useful as it takes a long time
    ... You need to demux and re-mux yourself. It's really only useful
    for small audio samples for playback
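[A minimal sketch of the decodeAudioData pattern described above: the whole encoded file sits in memory, decoding is one-shot, and there is no streaming or progress feedback.]

```ts
// The whole encoded file has to be in memory up front; decoding is
// one-shot, as fast as the machine allows, with no progress feedback.
const ctx = new AudioContext();

async function decodeWholeFile(url: string): Promise<AudioBuffer> {
  const response = await fetch(url);            // you fetch the bytes yourself
  const encoded = await response.arrayBuffer(); // entire file in memory
  // No way to ask for "just the first 30 seconds" or to stream from disk.
  return ctx.decodeAudioData(encoded);
}
```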
    ... Another way to decode A/V is in the PeerConnection WebRTC
    object, you can get a MediaStream with audio and video tracks
    in real time
    ... Processing such as jitter buffer. Lots of magic, but some
    perceive this as not enough control
    ... There's no all purpose API for advanced use cases, and lots
    of people complain, legitimately
    ... For encoding, it's worse. We have MediaRecorder. This works
    with media streams with audio and video tracks
    ... It is set up with a codec string and other parameters,
    which have been added in an ad-hoc fashion
    ... It will give you a blob of the contents. If you want to
    minimise latency between media hitting the encoder and being
    able to send it on the network, it's not ideal
    ... It's high level; you have to transform the blob to an ArrayBuffer. It's for recording for
    offline use, e.g., recording a WebRTC call
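[A rough sketch of the MediaRecorder flow described above: configured up front with a codec string, it emits opaque Blobs that the app has to convert to ArrayBuffers itself; the mimeType shown is just an example.]

```ts
// MediaRecorder is configured up front with a codec string and emits
// opaque Blobs; the app has to turn those into ArrayBuffers itself.
function recordFor(stream: MediaStream, ms: number): Promise<Blob[]> {
  return new Promise((resolve) => {
    const recorder = new MediaRecorder(stream, {
      mimeType: "video/webm;codecs=vp9",   // example codec string
    });
    const chunks: Blob[] = [];
    recorder.ondataavailable = (e: BlobEvent) => chunks.push(e.data);
    recorder.onstop = () => resolve(chunks);
    recorder.start(1000);                  // timeslice: roughly one Blob per second
    setTimeout(() => recorder.stop(), ms);
  });
}
```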
    ... Other ways to encode include RTCPeerConnection, but this is
    coupled to the transport, emits RTP packets
    ... It's lossy, and there are other concerns
    ... Looking at spec issue trackers over the years, we saw the
    need for something to solve all these issues
    ... Low latency audio encoding and decoding for broadcasting,
    configurable encoding, maybe implementing a whole media
    playback stack
    ... so you have tight control over A/V sync, possibly using
    WebGL
    ... We tested a few approaches. I wondered if we could use
    MediaStream for this. But this is in a real-time clock
    domain
    ... Lots of work was done to break away from Media Stream,
    looking at other *** streams which can work offline
    ... The current iteration of the API is in Chromium behind a
    flag. It seems much simpler, relies on fewer concepts
    ... A few reasons for this. If we want to solve all use cases,
    we needed to be low level, and unopinionated on how to do
    things
    ... I expect libraries to build on this, e.g., WebGL,
    ThreeJS
    ... There's lots of control, with audio encode and decode, as a
    low level API. It has a nice integration with WASM
    ... It's an API shape that's not traditional for the web.
    Callbacks and events. Accepts ArrayBuffer objects. API
    principles that have been long established in native code
    ... The API is supposed to be able to encode/decode with a
    symmetric API
    ... It will hopefully work in the same way with audio and
    video, although some concepts work for one but not the
    other
    ... If we're successful, you should be able to make any app you
    want with this. There should be no limit, like with a native
    API
    ... The level is as low as we can make it. Video frames and
    buffers on the one side, encoded packets on the other
    ... Out of scope for now is muxing and demuxing the media. This
    would have to be provided by web content authors
    ... This may seem to be a problem, but often these are memory
    rather than CPU bound
    ... There are other concerns, issues to deal with. More
    concretely, Chris can present the API itself
Chris Cunningham's related presentation video
ChrisC: [shows slides]
    ... [Canvas setup example] How to paint a video frame? Transfer
    to a bitmap and render on a canvas
    ... [Decoder example] The constructor arguments include the output
    and error callbacks; the output callback paints the decoded frame.
    ... You configure with desired parameters, pass in encoded
    chunks, and the output callback is called with the decoded
    frames
    ... There's a reverse symmetry with the encoder APIs, frames to
    encoded chunks
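[A minimal sketch of the decode flow on the slides, assuming the VideoDecoder / EncodedVideoChunk shape currently behind the Chromium flag; names and details may change as the spec is drafted.]

```ts
// Decoder with output and error callbacks, configured with a codec
// string, then fed encoded chunks; each decoded frame is transferred
// to a bitmap and painted onto a canvas, as on the slides.
const canvas = document.querySelector("canvas")!;
const ctx2d = canvas.getContext("2d")!;

const decoder = new VideoDecoder({
  output: async (frame: VideoFrame) => {
    const bitmap = await createImageBitmap(frame); // transfer to a bitmap...
    ctx2d.drawImage(bitmap, 0, 0);                 // ...and render on the canvas
    frame.close();                                 // release the frame's memory
  },
  error: (e: DOMException) => console.error("decode error:", e.message),
});

decoder.configure({ codec: "vp09.00.10.08" });     // example VP9 codec string

// Called by the app's own demuxer for each encoded packet.
function onEncodedPacket(data: BufferSource, timestampUs: number, isKey: boolean) {
  decoder.decode(new EncodedVideoChunk({
    type: isKey ? "key" : "delta",
    timestamp: timestampUs,
    data,
  }));
}
```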
    ... The IDL for our latest thinking is in Chrome's web codecs
    folder in blink
    ... We've worked hard on it this quarter, for an origin trial
    for M86, October
    ... We have the video decoder wired up in Worker and Window
    contexts. That's plumbed into our hardware and software
    decoders
    ... How to export YUV data from a video frame? It's not seen
    much in today's web APIs which are RGB oriented
    ... There's a proposal to allow YUV access, and you can
    manipulate the pixels
    ... Painting directly to a canvas is supported
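[A sketch of the kind of YUV access being discussed, assuming a copyTo/allocationSize-style method on VideoFrame as in the current draft; the exact shape was still an open proposal at this point.]

```ts
// Copy the frame's planes into a buffer so JS or WASM code can inspect
// or manipulate the raw pixels (layout depends on frame.format, e.g. "I420").
async function readPixels(frame: VideoFrame): Promise<Uint8Array> {
  const size = frame.allocationSize();   // bytes needed for all planes
  const pixels = new Uint8Array(size);
  await frame.copyTo(pixels);            // YUV data copied into the buffer
  return pixels;
}
```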
    ... Audio is less far along. We have a skeleton interface, and
    code under review to implement it
    ... We have this wired up to software decoders, further work
    for platform decoders
    ... We plan to have it all ready for origin trial in M86. The
    intent for this is to get feedback on the API shape.
    ... It may be that the performance may not be completely ready,
    but people can try out the proposed API and feed back on if
    it's viable
    ... One thing that's critical is to produce a spec. There's an
    explainer that's mostly up to date. We're moving away from
    using promises
<tidoust> WebCodecs Explainer
ChrisC: We're working to have a
    spec draft uploaded soon. We welcome feedback on the spec as
    well as on the origin trial
    ... Any questions?
Will: How do you know what codecs are supported, with the codec string? How do we support our own codecs?
ChrisC: These are open questions.
    There's an open issue about Media Capabilities
    ... For configuration, if you supply an unsupported codec
    string, we'd immediately error out. I'd like to explore
    integration with Media Capabilities
    ... The codec string is the classic fully specified VP9 string, but
    there's an open issue about whether this really makes sense. It comes
    from a containerised description of the media
    ... For the codec, you only need a subset, such as the profile,
    or the extra data for x264
    ... The exact shape of the codec string may be relaxed
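[Integration with Media Capabilities is an open issue; as a sketch, an app could probe decode support today with navigator.mediaCapabilities.decodingInfo before handing the same codec string to configure.]

```ts
// Hypothetical pre-check: ask Media Capabilities about the same fully
// specified codec string before handing it to the decoder's configure().
async function probeDecodeSupport(codec: string): Promise<boolean> {
  const info = await navigator.mediaCapabilities.decodingInfo({
    type: "file",
    video: {
      contentType: `video/webm; codecs="${codec}"`, // e.g. "vp09.00.10.08"
      width: 1920,
      height: 1080,
      bitrate: 2_000_000,
      framerate: 30,
    },
  });
  return info.supported;
}
```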
Will: So this isn't about bringing your own codec?
ChrisC: That's right. We're not
    against that, but it's unclear what integration with Web Codecs
    would look like
    ... People doing that could write an interface matching our
    own, and fall back
Paul: You can do dependency injection. Do your own codec, and drop it in
Will: And you'd register your codec string?
ChrisC: We may not manifest it
    that way, for security guarantees; we could consider a codec
    worklet. Nothing is defined yet
    ... You could swap in a software implementation
Paul: Registering WASM in the
    platform, we've been doing it for some time now. It's complex
    because of the way browsers are implemented, codecs are more
    sandboxed
    ... No explicit registration, but we can use duck typing as you
    have the same interface
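[A sketch of the duck-typing idea: a JS/WASM decoder exposing the same surface as VideoDecoder can be swapped in without any registration. WasmVp9Decoder and wasmDecode are hypothetical names, not part of any proposal.]

```ts
// Anything that exposes the same surface as VideoDecoder can be dropped
// in where the native decoder would be used; no registration needed.
interface VideoDecoderLike {
  configure(config: { codec: string }): void;
  decode(chunk: EncodedVideoChunk): void;
  flush(): Promise<void>;
  close(): void;
}

// Hypothetical WASM-backed implementation of the same interface.
class WasmVp9Decoder implements VideoDecoderLike {
  constructor(private init: { output: (frame: VideoFrame) => void;
                              error: (e: Error) => void }) {}
  configure(config: { codec: string }): void { /* set up the WASM module */ }
  decode(chunk: EncodedVideoChunk): void {
    // this.init.output(wasmDecode(chunk));  // wasmDecode is the app's own codec
  }
  async flush(): Promise<void> { /* drain queued chunks */ }
  close(): void { /* free WASM memory */ }
}
```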
Igarashi: We're interested in low
    latency video streaming
    ... What is the buffering model for this interface? Does it
    buffer packets, or is that done in the application?
ChrisC: The decode call can be made multiple times. The implementation manages queuing internally, there's no buffering of outputs, so the app would need to do that
Igarashi: Would the application have to buffer before feeding to the decoder. Also, what about rendering?
ChrisC: We're happy to buffer,
    and you'd see the queue of buffered chunks decrease as it
    progresses. You could also buffer prior to decode if
    preferred
    ... There's an attribute that exposes how much is buffered, so
    you can use this as a backpressure signal
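[A sketch of using the queue-size attribute as a backpressure signal; decodeQueueSize is the attribute name in the Chromium prototype and may differ in the final spec.]

```ts
// Feed the decoder only while its internal queue is short; otherwise keep
// chunks in an app-side buffer and retry when the queue drains.
const MAX_QUEUE = 10;
const pending: EncodedVideoChunk[] = [];

function submit(decoder: VideoDecoder, chunk: EncodedVideoChunk) {
  if (decoder.decodeQueueSize < MAX_QUEUE) {
    decoder.decode(chunk);   // decoder has room
  } else {
    pending.push(chunk);     // backpressure: hold it in the app for now
  }
}
```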
Igarashi: What is the relation between Web Transport and Web Codecs? Could you directly feed a stream from WT to Web Codecs?
ChrisC: There's no tight
    integration between those specs, similar to how the network
    stack and the media stack in Chrome are separate
    ... It should be possible with little intervening code to feed
    the data through, but this will depend on if you've done
    packaging etc
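[A sketch of the "little intervening code" point, assuming the WebTransport draft's datagram reader; depacketize is a hypothetical app-defined helper that turns application packets into EncodedVideoChunk init data.]

```ts
// Read datagrams from a WebTransport session and hand them straight to a
// VideoDecoder; depacketize() is an app-defined (hypothetical) helper.
declare function depacketize(datagram: Uint8Array): EncodedVideoChunkInit;

async function pump(url: string, decoder: VideoDecoder) {
  const transport = new WebTransport(url);
  await transport.ready;
  const reader = transport.datagrams.readable.getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    decoder.decode(new EncodedVideoChunk(depacketize(value)));
  }
}
```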
Kaz: Thank you for presenting. Could a possible use case be interconnecting video conferencing systems such as WebEx and Zoom into a unified video conferencing system?
ChrisC: Yes, absolutely. It's one of the foremost use cases, with a combination of Web Codecs and Web Transport you can do customised conferencing
Larry: We're working on a video
    editor program, combining multiple audio and video sources with
    images and text
    ... Can Web Codecs allow these to be mixed into a video
    file?
ChrisC: Is that text rendered into the image?
Larry: Yes, we'd combine those into a new video
ChrisC: Yes, this should be possible. The primitive for the video encoder will be an image for the video frame. If you're constructing this yourself, you could manipulate the pixels to add text and re-encode
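[A sketch of the approach ChrisC describes: paint the decoded frame to a canvas, draw the text on top, construct a new VideoFrame from the canvas, and re-encode; constructing a VideoFrame from a canvas follows the current draft and may change.]

```ts
// Paint the decoded frame into an OffscreenCanvas, draw the text on top,
// wrap the canvas in a new VideoFrame, and pass it to the encoder.
function overlayAndEncode(frame: VideoFrame, text: string, encoder: VideoEncoder) {
  const canvas = new OffscreenCanvas(frame.displayWidth, frame.displayHeight);
  const ctx = canvas.getContext("2d")!;
  ctx.drawImage(frame, 0, 0);                          // original pixels
  ctx.font = "48px sans-serif";
  ctx.fillStyle = "white";
  ctx.fillText(text, 20, frame.displayHeight - 40);    // the overlaid caption
  const composed = new VideoFrame(canvas, { timestamp: frame.timestamp });
  encoder.encode(composed);
  composed.close();
  frame.close();
}
```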
Francois: I'd like to understand
    how I'd use this. Is rendering via Canvas, or can I connect to
    a video element? Also, one use case we're considering is media
    processing. Is a VideoFrame API suitable for video processing,
    e.g., if you want to do GPU based processing, or run an ML
    algorithm?
    ... Would all these APIs need to integrate with VideoFrame?
ChrisC: The current plan is to
    render via Canvas. An earlier version allowed rendering via
    media element, but most users we talked to wanted manual
    control,
    ... e.g., for video editing or RTC use cases
    ... So we started with the most manual option, happy to explore
    other things in future
    ... We hope VideoFrame will serve a number of generic use cases
    that will allow YUV pixel arrangement, encoded width and
    height, planes, and this would perform well with WASM and
    JS
    ... It's an open question how this could be consumed by Web
    GPU. No clear answer yet, but we're having the
    conversation
    ... Want to have minimum number of copies, use GPU backed
    memory, avoid forcing a GPU to CPU copy if we can. In some
    cases it's unavoidable
    ... We'd love feedback on this
<inserted> gpuweb discussion
Francois: Is HDR support part of this discussion, with YUV support?
ChrisC: There are some proposals
    for how to do that with Canvas, e.g., extended colour
    space
    ... The intent is to be able to paint HDR video in Canvas.
    Details of how that would work are still TBD, needs integration
    with the canvas folks
Will: What controls pacing? Is the output called with a precise timing, to maintain frame rate?
ChrisC: The example paints the frames faster than real time. The app would have to buffer outputs and sync with the audio
Will: Do we have enough timer accuracy to do that, e.g., for 29.97 fps?
ChrisC: There's nothing inherent in JS that should prevent that. With audio we do sensitive deadline based audio rendering
Paul: It is clearly sufficient to
    reach the level of quality we have natively, e.g., rational
    FPS, 29.97 is fine. You need to take into account the screen
    frame rate
    ... There are performance timers, which are fuzzed because of
    Spectre, but they are sufficiently precise
    ... You can get vsync information from rAF, use the video
    timestamps, decide yourself
    ... You have latency information for audio, but you'll know how
    much latency there is after the AudioContext
    ... You'll have to shift the video frames yourself to align
    with the audio
    ... We'll need to add display latency; it's known that
    different browsers (or the same browser on different platforms)
    have different latencies
    ... Audio latency can be lower than video latency, so you could
    break causality, audio comes before the video
    ... That's fixable
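[A sketch of the alignment Paul describes: drive presentation from requestAnimationFrame, estimate what the listener is currently hearing from the AudioContext latency figures, and paint the queued frame whose timestamp matches; displayLatencySec stands in for the display latency that is not yet exposed by the platform.]

```ts
// frames holds decoded VideoFrames in presentation order (timestamps in
// microseconds); displayLatencySec is a placeholder for the display
// latency that the platform does not yet expose.
const displayLatencySec = 0;

function startPresenting(audioCtx: AudioContext, frames: VideoFrame[],
                         ctx2d: CanvasRenderingContext2D) {
  function onVsync() {
    // Estimate the time of the audio currently reaching the listener.
    const audibleNow = audioCtx.currentTime - audioCtx.outputLatency;
    const targetUs = (audibleNow + displayLatencySec) * 1e6;
    // Drop frames we are already late for, then paint the current one.
    while (frames.length > 1 && frames[1].timestamp <= targetUs) {
      frames.shift()!.close();
    }
    if (frames.length > 0) ctx2d.drawImage(frames[0], 0, 0);
    requestAnimationFrame(onVsync);
  }
  requestAnimationFrame(onVsync);
}
```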
Igarashi: Support for copy protected content? Prevent copying from the Canvas?
ChrisC: There's no integration
    with existing mechanisms, not discussed at all
    ... It hasn't been critical to any of the use cases we've
    prioritised so far. My naive hope is that introducing the
    decode API separate from content protection is a good first
    step
ChrisN: Video editing use cases? client-side video editing proposal
ChrisC: You might want to make a
    precise cut, decode and re-encode based on a subset, Web Codecs
    is up to meeting the needs of the MediaBlob proposal
    ... In the blink-dev thread on the Blob-based proposal, they explored
    the performance of JS-based muxing and demuxing, and found it's
    up to the task
Paul: In Firefox, demuxing of Ogg
    Theora or Opus is done in WASM. We do this for security, also
    for performance as we don't have to send the data to a
    different process
    ... Increasingly we don't want to run codec code in
    process
    ... Performance measurements we made showed this is in the
    noise, it's memory bound
<inserted> scribenick: kaz
ChrisN: how to proceed?
ChrisC: GitHub issues would be the best place for the discussion
Kaz: link for the repo?
<chcunningham> https://github.com/WICG/web-codecs
ChrisC: will put here
ChrisN: and slides?
<chcunningham> https://www.youtube.com/watch?v=nhTxJBgTywc&feature=youtu.be&list=PLNYkxOF6rcIBhuGsbO6t8-OBE5-fVPe7K&t=521
ChrisN: tx!
    ... really interesting and exciting discussion today
Paul: we have many people from the audio group who are also excited
ChrisN: a couple of open source
    projects also working on media editing on the browser
    ... becoming more and more possible these days
    ... audio/video non-linear editing
Kaz: there was another topic for today on the WoT use cases related to media handling, but I'd like to initiate the discussion about that by email instead given the time
ChrisN: will be in August
    ... detail to be sent out later
[adjourned]