W3C

– DRAFT –
Media WG Meeting

19 March 2024

Attendees

Present
Andy_Estes, Chris_Needham, Francois_Daoust, Greg_Freedman, Jer_Noble, Joey_Parrish, Marcos_Caceres, Mark_Foltz, Sun_Shin
Regrets
-
Chair
Chris, Marcos
Scribe
cpn, jernoble

Meeting minutes

<marcosc> cpn: agenda items, Spatial media, and EME/MSE registries

Spatial Media

Jer: We have seen some standardisation needs for spatial media, from Safari on spatial platforms
… One problem we faced was trying to enable stereo playback support, different video to the left and right eye
… Available for native playback, but not supported on the web within the bounds of the viewport
… No way to detect whether to deliver spatial video to the browser
… It's a display not a decode problem
… Video formats can be decoded but won't be displayed correctly, e.g., two videos side by side
… If a layered approach is used, you get a layered view
… It'll need resolving before anyone can do stereoscopic delivery to a spatially aware UA
… Another problem is motion safety metadata. Risk of making people ill if there's too much motion
… With native playback it's possible to include safety metadata with the video stream
… The native app could reduce the viewport so the effect is less strongly felt
… Not specified anywhere, so currently proprietary
… Related to this is work on flashing lights, released on Apple platforms
… We can identify scenes as having flashing lights, and protect people
… There are other places with platform features, e.g., 180-degree wide angle image viewing is only available in a fullscreen presentation
… There's no way to detect that, and it's difficult as that's not how CSS is set up. You can't ask what a CSS capability would be in a different mode
… Haven't found a good place to put it, so use UA string or other out of band solution
… For video with in-band captions, where do you put the captions in the Z order, so they don't interfere with the depth of the scene?
… For native playback, can deliver depth info in a metadata track, but there's no standard for it at the moment
… It should be easy to put immersive video in a media element and let the user control the viewpoint of the video
… Currently, use WebGL projections, but the video element should be capable, either native or custom controls, so you could implement your own pan+tilt controls
… There's no way to set up a soundscape for audio presentation. Not sure this is correct, as Web Audio allows HRTFs and impulse responses
… But custom work needed, it's not as simple as a single control knob
… Environment dimming, if watching in Vision OS you can dim the environment so it feels like you've turned down the lights in the room
… No web API for that, seems useful, e.g., also when presenting a spreadsheet
… That's a summary of the web API issues when we tried to enable immersive capabilities in a browser on a device like VisionOS
… Most important to solve immediately? Caption depth data and motion safety data
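
On the soundscape point above: Web Audio can spatialise a media element's audio with HRTF panning, but it takes explicit per-source graph work, and the listener pose has to be kept in sync separately, rather than turning a single control knob. A minimal sketch in TypeScript (the element selection and source position are just example values):

    // Route a <video> element's audio through an HRTF panner (sketch).
    const audioCtx = new AudioContext();
    const video = document.querySelector('video')!;
    const source = audioCtx.createMediaElementSource(video);

    // Place the source roughly 2 m in front of and slightly to the right of the listener.
    const panner = new PannerNode(audioCtx, {
      panningModel: 'HRTF',
      distanceModel: 'inverse',
      positionX: 0.5,
      positionY: 0,
      positionZ: -2,
    });

    source.connect(panner).connect(audioCtx.destination);

    // The listener pose (audioCtx.listener.positionX, forwardX, …) still has to be
    // updated by the page, e.g., from head tracking, which is part of the custom
    // work referred to above.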

Mark: To understand the use case, is this about playing non-immersive video in an immersive environment? Or immersive videos?

Jer: A web page itself is non-immersive, a 2D plane. It should be possible to embed stereo or 360 video content and you get a picture frame effect
… The lack of depth info made that impossible

Mark: So it's like mixed reality with 2D content mixed with 3D content

Jer: Yes. For 2D video, it still seems important to have depth information to render captions on the presentation

Mark: Could WebXR solve this, or is a different set of controls needed?

Jer: Yes, it's possible to build fully immersive presentation using WebXR. But that's like saying the audio element isn't needed as we have Web Audio
… A declarative way of doing something you could do in WebXR, but making it accessible if you're not a WebXR expert
… WebXR isn't really about media playback, it's about building blocks for immersive experiences
… So I'm not really suggesting fully immersive, but making use of the capabilities of the device, bigger viewport

Marcos: It could use environmental lighting as well, so it has privacy implications

Jer: Vision Pro has modes for media presentation, where the device can modify the passthrough view it presents to the user, such as light dimming
… Doesn't seem like something feasible without revealing detail about the environment the user is in

Francois: I'm trying to map where these features might fit, what's required
… Caption data could be an extension of WebVTT or TTML, something to be added there?
… What is needed for motion safety, could it be in-band metadata?

Jer: Those are format questions. Don't know if WebVTT is the right thing for depth information
… It's about describing the point that's visually closest to the user; that tells the UA where to put the captions, so they don't appear deeper than or inside something visual
… Is it just the deepest part, or more of a depth map? Don't know
… For motion safety, the same problem exists, what is the format - JSON, text, etc?

Chris: So there's a temporal aspect?

Jer: Yes, it can change frame by frame
… Motion safety is important to have a few frames in advance, to restrict the viewport so it's less immersive, for comfort and safety reasons
… For non-professionally captured media, there can be a lot of motion captured, so can be disconcerting to watch
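
One place such timed motion data could travel today is a kind="metadata" text track carrying JSON cues; a rough sketch follows. The payload shape (a per-range peakMotion value) is hypothetical, not a proposal, and cue boundaries only give range-level rather than frame-accurate, look-ahead signalling, so this only approximates the need described above:

    // Sketch: out-of-band motion metadata via a metadata text track (TypeScript).
    // The JSON payload shape (peakMotion, in arbitrary units) is hypothetical.
    const video = document.querySelector('video')!;
    const track = video.addTextTrack('metadata', 'motion-safety');

    track.addCue(new VTTCue(0, 4, JSON.stringify({ peakMotion: 0.1 })));
    track.addCue(new VTTCue(4, 9, JSON.stringify({ peakMotion: 0.8 })));

    track.oncuechange = () => {
      for (const cue of Array.from(track.activeCues ?? [])) {
        const { peakMotion } = JSON.parse((cue as VTTCue).text);
        // A presenter could shrink or fade the viewport here when motion is high.
        console.log('peak motion for the current range:', peakMotion);
      }
    };
    track.mode = 'hidden'; // fire cue events without rendering the cues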

Chris: Media encodings, things not currently supported?

Jer: HEVC, where information is stored in an additional layer. It's a delta on the original captured frame, to ship a stereo presentation more efficiently
… Formats that encode left and right eye separately. Google has a proposal, I can find the info, not sure if it's standardised. It tells how to interpret visually the signal from the encoder

Francois: Remembering the workshop on media production, Bruce Devlin talked about different forms of metadata
… Here you have frame-level metadata for motion, while captions may not be frame by frame; not the same need for precision and sync?

Jer: Two answers: should have the same cadence as the captions themselves. But there are frame by frame formats for depth info, so don't want to commit

<tidoust> Metadata in production workflows talk by Bruce Devlin

Jer: There may be other use cases for depth info that do require frame accuracy, for demos that push the boundaries, e.g., to do clipping correctly, place other pieces of the web page

Chris: Other vendor implementations?

Jer: Stereo video is relevant here. Other devices let the users pick the projection, and how to interpret the data from the encoder - horizontally or vertically divided
… There's not a good way to put that in the media file itself, so it relies on the person. They had the same problem, so the standardisation opportunity for them is the same
… Similar problem with depth info and captions, either put outside the viewport, or it looks uncomfortable if the depth info is wrong

Chris: Which of this would come to this WG, and which elsewhere?

Jer: I'm not thinking everything would come here. I've been asked how you can specify the layered approach in the HLS manifest. Hard to do the same as in a native app
… If I give you HEVC with a depth layer, you can't detect support for that today. That would be Media Capabilities
… Viewport controls could be an HTML question: given a 360 video stream, if you want to change the viewport angle you can only do it in WebGL. Could add that to the video element
… Display capabilities in CSS

Chris: Media Capabilities could be done here, we already have spatial audio

Jer: Could be a need for something more dynamic in Media Capabilities
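
For reference, Media Capabilities can already express spatial audio via the spatialRendering member of the audio configuration; what is missing is a video-side analogue for stereo or layered presentations. A sketch of the existing API in TypeScript, with the gap noted as a comment (the codec strings are just examples):

    // Sketch: probing decode support with Media Capabilities.
    async function probe(): Promise<void> {
      const info = await navigator.mediaCapabilities.decodingInfo({
        type: 'media-source',
        video: {
          contentType: 'video/mp4; codecs="hvc1.1.6.L123.B0"',
          width: 3840,
          height: 2160,
          bitrate: 20_000_000,
          framerate: 30,
          // There is currently no field here to ask "can this be presented as
          // stereo / with a depth layer?"; that is the gap being discussed.
        },
        audio: {
          contentType: 'audio/mp4; codecs="ec-3"',
          spatialRendering: true, // spatial audio is already expressible
        },
      });
      console.log(info.supported, info.smooth, info.powerEfficient);
    }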

Chris: Next steps?

Francois: There's an active immersive captions CG at W3C. Have you interacted with them?

<tidoust> Immersive Caption CG

Jer: I didn't know about them. For things that might belong in Media WG, could file issues on the WG or the relevant standards
… So taking the list and breaking them out into issues is a good next step

Chris: Happy to follow up from a MEIG perspective to bring this to other audiences

MSE and EME registries

<tidoust> Recommendations for accessible captions in 360 degree video

chrisn: Topic: Registries in EME and MSE

https://docs.google.com/presentation/d/1azcBi0C-Sw_bF6x-SIBdpLorLQLkSfdyKeWTnphka2Y/edit <- Chris's slides

* we previously discussed needing editors for those registries

* chrisn has taken on editor responsibilities for MSE Byte Stream Registry and WebM Byte Stream

* w3c/encrypted-media#524

* w3c/encrypted-media#526 These two issues update the registries to use the W3C registry track, as well as using correct registry names

* Proposal: move the MSE and EME registries to the W3C Registry track

* Currently published as notes, with normative content

* May currently be notes due to discussions about formats and codecs

marcosc: These may just be able to sit on Registry track if they only need to be updated occasionally

chrisn: The documents within the registries benefit from being on the Recommendation track since they do contain normative language

* The benefit would be that the documents would be covered under the patent policy

jernoble: presented the following slides as text:

* Spatial Media Standards Soft Spots

* - Stereo Video Support Detection

* - Motion Safety Metadata

* - Fullscreen-only Capability Detection

* - Caption Depth Data

* - 360º/180º Viewport Controls

* - Spatial Soundscapes

* - Environment Dimming Support

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics

Maybe present: Chris, chrisn, Francois, Jer, jernoble, Marcos, marcosc, Mark

All speakers: Chris, chrisn, Francois, Jer, jernoble, Marcos, marcosc, Mark

Active on IRC: cpn, jernoble, marcosc, mfoltzgoogle, tidoust