Meeting minutes
Camera Effect Status API
Bryant: The goal is to let developers monitor changes to platform blur, or in future, other effects
… We put it on VideoFrameMetadata so it can be frame accurate
… It's not for controlling platform effects, e.g., to turn them on or off
… Not for platforms that don't have those effects
… The API is a field on VideoFrameMetadata, a MediaEffectInfo object, from which BackgroundBlur inherits
… The object allows for future expansion, e.g., we might want to expose info about where the effect was implemented, or how intense the blur is
… Example usage: you implement a TransformStream so you can see each frame coming through. The metadata() method returns a dictionary; check whether backgroundBlur is present (see the sketch below)
… There's a PR against WebCodecs; feedback was that we want a W3C spec link, so I'm working on adding it to MediaCapture-Extensions and will update the PR to link to that
… We have links to the explainer
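A minimal sketch of the usage Bryant describes, assuming the proposed backgroundBlur member lands on VideoFrameMetadata (the exact shape of MediaEffectInfo is still being worked out in the PR):

```ts
// Sketch only: `backgroundBlur` is the proposed VideoFrameMetadata member, not a
// shipped API; here we only check its presence and pass frames through untouched.
function watchPlatformBlur(track: MediaStreamTrack): MediaStreamTrack {
  const processor = new MediaStreamTrackProcessor({ track });
  const generator = new MediaStreamTrackGenerator({ kind: 'video' });

  const inspector = new TransformStream({
    transform(frame: VideoFrame, controller) {
      // metadata() returns a dictionary; check whether the platform reported blur.
      const { backgroundBlur } = frame.metadata() as any;
      if (backgroundBlur !== undefined) {
        console.log('Platform blur reported for frame at', frame.timestamp);
      }
      controller.enqueue(frame);
    },
  });

  processor.readable.pipeThrough(inspector).pipeTo(generator.writable);
  return generator; // MediaStreamTrackGenerator is itself a MediaStreamTrack
}
```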
Mark: We have broader ambitions, but after getting feedback from people at Apple and at TPAC, we scoped it down to additional info on VideoFrameMetadata
… If we can get this out, we'd look at adding other effects, depending on developer interest
… Bryant today created a PR for MediaCapture-Extensions. Looking forward to feedback there
… We'll link that from the WebCodecs PR
Eugene: Seems in line with what we did for human face segmentation
… There seems little downside to exposing it
… Any web applications asking for it, e.g., Meet?
Mark: They're interested, part of conversation with them about preventing double effects
Bryant: Helping users know where the effects are coming from
Francois: In the proposed IDL there's a dictionary with a readonly member, I don't think that's allowed in WebIDL
… An application could apply the blur effect itself and set the value?
Mark: That sounds correct. We don't have changing the state of the video frame in scope, as the blur is applied beforehand, so setting it wouldn't do anything
Bryant: It's a typo
Eugene: The metadata() method returns a dictionary. You can set any fields, specced or not, but that doesn't change the frame itself
… The value comes from the capture pipeline or WebRTC stream, or when you create a frame via the constructor, you could have a way to amend metadata
Marcos: You could simplify the code using nullish coalescing
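Inside the transform above, the presence check could indeed be condensed with nullish coalescing (again treating backgroundBlur as the proposed, not-yet-shipped member):

```ts
// Default to "no platform blur" when the proposed entry is absent.
const platformBlur = (frame.metadata() as any).backgroundBlur ?? null;
const platformBlurApplied = platformBlur !== null;
```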
Chris: Are there other implementations interested in this?
Marcos: Once we have a concrete proposal, we could look from WebKit point of view
Chris: And should adding the registry entry be conditional on multi-implementer support?
Marcos: Is it shipping in Chrome?
Mark: It's work in progress
Marcos: There should be interest from another implementer
Mark: Ok, so as we move ahead we'll ask for a WebKit position
… As a side-effect of this work we're adding support for metadata in general, so it will be easier to ship metadata in future
… We're also interested in the face segmentation metadata, just not had time to get into it yet
Francois: Related to the human face segmentation metadata, the MediaCapture-Streams spec suggests you can set the value in the web app, or it can be set by the UA
… But right now, setting the data has no effect, and you can't pass it in to the VideoFrame constructor
Eugene: Both VideoFrameInit and VideoFrameBufferInit let you set it
Mark: For blur, I don't think it's meaningful, not sure for other metadata
Francois: The TransformStream could let you apply the blur and set the metadata
Mark: If you're creating a new VideoFrame, yes
Bernard: It's a general problem with these registry entries: they define data, but they don't imply any API can do anything with it; it's just a collection of data
… It just links to a spec, MCE, but the registry isn't a spec, just a pointer. Whether it's a coherent story is not something the registry requires
… For example, no implication that WebCodecs does anything with it
Eugene: It's just a way to surface additional info about the VideoFrame, so the app doesn't try to apply extra effects, or so it can use the face metadata or delivery timestamps
… WebCodecs doesn't depend on it, it's surfaced for apps to use
… We discussed adding it vs surfacing via a side channel, but we don't have any side channels
… Apps such as Teams or Zoom can do something useful with it
Bernard: The face data isn't actionable, so it's not useful for avoiding double action
Bryant: This is a different kind of double action, just avoiding repeat processing
Eugene: If you create a VideoFrame from a canvas, and you know you already did background blur, you can set it in VideoFrameInit
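A sketch of Eugene's point, assuming VideoFrameInit's metadata member accepts the proposed backgroundBlur entry (the value given for it here is a placeholder):

```ts
// The app drew its own blurred output to `canvas` and records that fact on the new
// frame. `backgroundBlur` is the proposed registry entry; `{}` is a placeholder value.
function markAppliedBlur(canvas: HTMLCanvasElement, originalFrame: VideoFrame): VideoFrame {
  return new VideoFrame(canvas, {
    timestamp: originalFrame.timestamp,   // required when constructing from a canvas
    metadata: {
      ...originalFrame.metadata(),        // keep whatever the UA already attached
      backgroundBlur: {},                 // the app's own record that blur is applied
    },
  });
}
```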
cpn: You could set this value on the VideoFrameInit but the browser implementation would not pay attention to that, it would just set the value. I'm wondering about cases where the metadata property could conflict with a property that the user agent might set itself
Eugene: The user-set value should override the implementation's
Bernard: You have a track from a camera, get a stream of video frames, and backgroundBlur is set on frames from that source
… The application could set the value, but it just messes up the signal it gets
Mark: Here's a use case. Suppose an app doesn't want to double blur. It detects whether blur is applied. If not, it applies its own
… Then, it wants to copy metadata from one frame to another, then update the blur to note the fact it applied its own
… It's not from the UA now, just something the app keeps track of
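Putting that together with the earlier sketches (blurToCanvas is a hypothetical app-side helper that draws a blurred copy of the frame onto a canvas; markAppliedBlur is the sketch above):

```ts
// Only blur when the platform hasn't, and record in the new frame's metadata that
// the app did it — purely the app's own bookkeeping, as Mark notes.
const avoidDoubleBlur = new TransformStream({
  transform(frame: VideoFrame, controller) {
    if ((frame.metadata() as any).backgroundBlur !== undefined) {
      controller.enqueue(frame);          // platform already blurred this frame
      return;
    }
    // `blurToCanvas` is a hypothetical app helper; see the note above.
    const blurred = markAppliedBlur(blurToCanvas(frame), frame);
    frame.close();                        // release the original frame
    controller.enqueue(blurred);
  },
});
```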
Bernard: And it doesn't get used in the WebCodecs encoder
Mark: Right, it's just for the application's bookkeeping
Media Capabilities MIME Types
Marcos: We've been investigating the MIME type processing in the MC API spec
… We found interop issues
… Chris has had a go at rewriting the algorithms, and Mark is reviewing
… As chairs we're doing a bit too much editing....
… But it's in a pretty good place, can take what we have and finish it off
… The tests that are there don't necessarily match the spec. Need to decide what to do with those
… We could feed them into ChatGPT to generate test cases, but we also need to decide which implementations should win
… How much interop pain will there be, how much breakage should we tolerate to get correct behaviour
… We don't want multiple MIME parsers, mimesniff should be the one used throughout the platform
… We take the input and pass it through mimesniff to parse it into a structure
… Take the component parts and do things with them
… How to deal with different behaviours, parameter handling, some implementations reject, and some ignore
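Roughly, the structure being discussed looks like the outline below, where parseMimeType stands in for mimesniff's "parse a MIME type" algorithm and impliesSingleCodec for a codec-registry lookup; neither is a web-exposed API, and the steps are illustrative rather than the PR's exact text:

```ts
// Outline only; helper functions are hypothetical stand-ins for spec algorithms.
interface ParsedMimeType {
  type: string;                      // e.g. "audio"
  subtype: string;                   // e.g. "webm"
  parameters: Map<string, string>;   // e.g. "codecs" → "opus"
}
declare function parseMimeType(input: string): ParsedMimeType | null;
declare function impliesSingleCodec(mimeType: ParsedMimeType): boolean;

function isValidContentType(input: string): boolean {
  const mimeType = parseMimeType(input);
  if (mimeType === null) return false;            // e.g. "audio/mpeg;" fails to parse
  for (const name of mimeType.parameters.keys()) {
    if (name !== 'codecs') return false;          // unexpected parameters rejected
  }
  const codecs = mimeType.parameters.get('codecs');
  if (codecs === undefined) {
    return impliesSingleCodec(mimeType);          // bare "video/webm" rejected
  }
  return codecs.split(',').length === 1;          // multiple codecs rejected
}
```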
cpn: I can show some examples.
… [shares screen]
… First example is a string that according to MIME sniff is not a valid MIME type string "audio/mpeg;". Chrome rejects on this, Firefox resolves on it, Safari also resolves on it but says it's not supported.
… Per the current spec, Chrome is doing the right thing, but what we're proposing in the PR is to use the parse MIME type algorithm, in which case Firefox and Safari are actually doing the right thing for that new algorithm.
… We need to decide what is the right way to do it.
… The second example is that we have language on "does the MIME type imply a single codec?". The example is "video/webm" or "video/mp4", which are just containers and don't say much about the contents.
… And can support multiple codecs.
… Same thing, implementations differ. In this example, the string should be rejected.
… Third example is multiple parameters in the MIME type string: 'audio/webm; codecs="opus"; foo="bar"'. Different behavior.
… Should be rejected per spec, but Chrome resolves.
… Final example is multiple codecs in the codecs parameters. Chrome resolves, but Firefox and Safari reject. Per spec, it should be rejected.
… Some variability in implementations. I'm trying to figure out what behavior we want to specify, and which implementations need to change to match whatever we end up with.
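For reference, the divergences can be reproduced with probes along these lines. The multiple-codecs string is illustrative, the video configuration fields besides contentType are required by the API and arbitrary here, and results currently differ by browser:

```ts
// Each probe either rejects or resolves with a supported flag; browsers disagree today.
const cases = [
  { audio: { contentType: 'audio/mpeg;' } },                          // trailing ";", invalid per mimesniff
  { video: { contentType: 'video/webm',                               // container only, no codecs
             width: 640, height: 480, bitrate: 1_000_000, framerate: 30 } },
  { audio: { contentType: 'audio/webm; codecs="opus"; foo="bar"' } }, // extra parameter
  { audio: { contentType: 'audio/webm; codecs="opus,vorbis"' } },     // multiple codecs (illustrative)
];

for (const config of cases) {
  navigator.mediaCapabilities
    .decodingInfo({ type: 'file', ...config })
    .then(info => console.log(config, '→ resolved; supported =', info.supported))
    .catch(err => console.log(config, '→ rejected:', err.name));
}
```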
markafoltz: These are good finds! Thanks for all the work.
… As editor, my next step would be to create test cases that exercise some of these so that we know where implementations are.
… My second comment is that we've been working on this PR for some time and it's quite complicated. I think that, as an editor, my goal is to make the PR self-consistent, with the acknowledgment that there may remain some issues to resolve later.
cpn: Even if that means that we end up with a different set of requirements.
markafoltz: We should file interop issues for these and link them from the spec. I want to land the structure because it is an improvement over the status quo.
cpn: The difficulty is that what we have in the PR is still not quite right. "MIME type support" can still return a failure case, even though it shouldn't because failures should have been captured earlier.
markafoltz: That's what I mean by self-consistent. It should not have bugs but may not reflect reality of implementations.
cpn: The actual check for support is pretty simple, we can probably delegate that to MIME Sniff.
markafoltz: The problem is that there are different steps for audio and video.
cpn: If we break up the WebRTC steps from the rest, perhaps that would simplify things.
markafoltz: The existing spec does not really explain whether the WebRTC codec requirement is a matter of support or of validation.
… That's another question we could ask. I'm inclined to keep what both implementations currently do (Firefox does not implement that part).
… My proposal is to update the validation to approximate what browsers currently do with some issues noting that there are interop issues.
Marcos: That seems fine. The only issue I had was around the internal slot attached to a dictionary. That seems like a premature optimization. Apart from that, I think it's generally looking good enough, with issues.
markafoltz: Some specs use slots a lot to pass data around, others don't.
Marcos: I use them quite a lot too. Generally, we only put them on interfaces, not on dictionaries.
… It doesn't save much in the spec to use an internal slot.
… Let's keep the spec as dumb and simple as possible so that it's easier to fix and update as needed.
markafoltz: I'll think about removing the slot.
cpn: And maybe I'll send a PR to Web Platform Tests with these cases for you to review.