20:58:10 RRSAgent has joined #mediawg
20:58:14 logging to https://www.w3.org/2024/03/19-mediawg-irc
20:58:14 Zakim has joined #mediawg
20:58:17 RRSAgent, make logs public
20:58:28 cpn has joined #mediawg
20:58:29 Meeting: Media WG Meeting
20:58:53 Agenda: https://github.com/w3c/media-wg/blob/main/meetings/2024-03-19-Media_Working_Group_Teleconference-agenda.md
21:00:12 present+ Chris_Needham
21:00:53 present+ Francois_Daoust, Greg_Freedman, Joey_Parrish
21:01:29 present+ Sun_Shin
21:01:29 marcosc has joined #mediawg
21:02:12 present+ Marcos_Caceres
21:02:14 Chair: Chris, Marcos
21:02:15 scribe+ cpn
21:04:39 present+ Jer_Noble, Andy_Estes
21:06:58 cpn: Agenda items: Spatial media, and EME/MSE registries
21:07:34 TOPIC: Spatial Media
21:07:49 mfoltzgoogle has joined #mediawg
21:07:57 Present+ Mark_Foltz
21:08:20 Jer: We have seen some standardisation needs for spatial media, in Safari on Apple platforms
21:08:44 ... One problem we faced was trying to enable stereo playback support, delivering different video to the left and right eye
21:08:58 ... It's available for native playback, but not possible on the web within the bounds of the viewport
21:09:01 andy has joined #mediawg
21:09:09 ... There's no way to detect whether to deliver spatial video to the browser
21:09:15 ... It's a display problem, not a decode problem
21:09:35 ... Video formats can be decoded but won't be displayed correctly, e.g., two videos side by side
21:09:44 ... If a layered approach is used, you get a layered view
21:10:07 ... It'll need resolving before anyone can do stereoscopic delivery to a spatially aware UA
21:10:24 ... Another problem is motion safety metadata. There's a risk of making people ill if there's too much motion
21:10:36 ... With native playback it's possible to include safety metadata with the video stream
21:10:48 ... The native app could reduce the viewport so the effect is felt less strongly
21:10:58 ... This isn't specified anywhere, so it's currently proprietary
21:11:18 ... Related to this is work on flashing lights, released on Apple platforms
21:11:35 ... We can identify scenes as having flashing lights, and protect people
21:11:53 ... Other platform features have similar gaps: 180-degree wide-angle image viewing is only available in a fullscreen presentation
21:12:21 ... There's no way to detect that, which is difficult as it's not how CSS is set up; you can't ask what a CSS capability would be in a different mode
21:12:38 ... We haven't found a good place to put it, so you use the UA string or another out-of-band solution
21:13:05 ... For video with in-band captions, where do you put the captions in the Z order, so they don't interfere with the depth of the scene?
21:13:29 ... For native playback, you can deliver depth info in a metadata track, but there's no standard for it at the moment
21:13:47 ... It should be easy to put immersive video in a media element and let the user control the viewpoint of the video
21:14:21 ... Currently you use WebGL projections, but the video element should be capable of this, with either native or custom controls, so you could implement your own pan and tilt controls
21:14:48 ... There's no way to set up a soundscape for audio presentation. Not sure this is entirely correct, as Web Audio allows HRTFs and impulse responses
21:15:01 ... But custom work is needed; it's not as simple as a single control knob
21:15:34 ... Environment dimming: if watching in visionOS you can dim the environment so it feels like you've turned down the lights in the room
21:15:49 ... There's no web API for that, which seems useful, e.g., also when presenting a spreadsheet
21:16:20 ... That's a summary of the web API issues when we tried to enable immersive capabilities in a browser on a device like visionOS
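[Editor's note: as an aside on the soundscape point, the "custom work" Jer mentions might look like the following minimal sketch, using only standard Web Audio API calls. An HRTF PannerNode gives binaural spatialisation, but the author has to position each source (and the listener) manually; the URL and coordinates here are placeholders.]

```typescript
// Minimal sketch of the custom work needed today to spatialise one source
// with Web Audio. Standard API only; nothing platform-specific. Assumes
// the AudioContext is running (i.e. created after a user gesture).
const ctx = new AudioContext();

async function playSpatialised(url: string): Promise<void> {
  const response = await fetch(url);
  const buffer = await ctx.decodeAudioData(await response.arrayBuffer());

  const source = new AudioBufferSourceNode(ctx, { buffer });

  // HRTF panning gives a binaural rendering, but every source has to be
  // positioned by hand; there is no single "soundscape" control knob.
  const panner = new PannerNode(ctx, {
    panningModel: "HRTF",
    distanceModel: "inverse",
    positionX: 2, positionY: 0, positionZ: -1, // placeholder position
  });

  source.connect(panner).connect(ctx.destination);
  source.start();
}
```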
21:16:27 Xiaohan has joined #mediawg
21:16:44 ... Most important to solve immediately? Caption depth data and motion safety data
21:16:58 q+
21:17:32 Mark: To understand the use case, is this about playing non-immersive video in an immersive environment? Or immersive videos?
21:18:01 Jer: A web page itself is non-immersive, a 2D plane. It should be possible to embed stereo or 360 video content and get a picture frame effect
21:18:17 ... The lack of depth info made that impossible
21:18:33 Mark: So it's like mixed reality, with 2D content mixed with 3D content
21:18:53 Jer: Yes. For 2D video, it still seems important to have depth information to render captions on the presentation
21:19:13 Mark: Could WebXR solve this, or is a different set of controls needed?
21:19:43 Jer: Yes, it's possible to build a fully immersive presentation using WebXR. But that's like saying the audio element isn't needed as we have Web Audio
21:20:03 ... It's a declarative way of doing something you could do in WebXR, but it makes it accessible if you're not a WebXR expert
21:20:29 ... WebXR isn't really about media playback, it's about building blocks for immersive experiences
21:20:48 ... So I'm not really suggesting fully immersive, but making use of the capabilities of the device, a bigger viewport
21:21:04 Marcos: It could use environmental lighting as well, so it has privacy implications
21:21:58 Jer: Vision Pro has modes for media presentation, where the device can modify the point of view it passes through to the user's view, such as light dimming
21:22:22 ... It doesn't seem like something feasible without revealing detail about the environment the user is in
21:22:53 Francois: I'm trying to map where these features might fit, and what's required
21:22:58 gregwfreedman has joined #mediawg
21:23:15 ... Caption data could be an extension of WebVTT or TTML, something to be added there?
21:23:28 ... What is needed for motion safety, could it be in-band metadata?
21:23:51 Jer: Those are format questions. I don't know if WebVTT is the right thing for depth information
21:24:52 ... It's about describing where the point visually closest to the user is; that tells the UA where to put the captions, so the captions don't appear deeper than, or inside, something in the scene
21:25:20 ... Is it just the deepest part, or more of a depth map? I don't know
21:25:44 ... For motion safety, the same problem exists: what is the format - JSON, text, etc.?
21:26:04 Chris: So there's a temporal aspect?
21:26:10 Jer: Yes, it can change frame by frame
21:26:35 ... For motion safety it's important to have the metadata a few frames in advance, to restrict the viewport so it's less immersive, for comfort and safety reasons
21:27:01 ... For non-professionally captured media, there can be a lot of motion in the capture, so it can be disconcerting to watch
21:27:33 Chris: Are there media encodings not currently supported?
21:28:09 Jer: HEVC, where the information is stored in an additional layer. It's a delta on the original captured frame, to ship a stereo presentation more efficiently
21:28:52 ... There are also formats that encode the left and right eye separately. Google has a proposal, I can find the info, not sure if it's standardised. It tells you how to visually interpret the signal from the encoder
21:29:01 q?
21:29:14 ack tidoust
21:29:44 Francois: Remembering the workshop on media production, Bruce Devlin talked about different forms of metadata
21:30:14 ... Here you have frame-level metadata for motion, and captions may not be frame by frame; is there not the same need for precision and sync?
21:30:45 Jer: Two answers: it should have the same cadence as the captions themselves. But there are frame-by-frame formats for depth info, so I don't want to commit
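[Editor's note: one possible shape for the caption depth data discussed above, sketched with the standard TextTrack and VTTCue APIs. The JSON payload (nearestPointMeters) and the placeCaptionsAt helper are hypothetical; as Jer notes, no standard for depth metadata exists yet.]

```typescript
// Sketch only: assumes a hypothetical convention where a "metadata" text
// track carries JSON cues giving the scene's nearest point, at roughly
// caption cadence. The track plumbing below is standard web API.
const video = document.querySelector("video")!;
const depthTrack = video.addTextTrack("metadata", "scene-depth");

// Hypothetical payload: nearest scene point in metres for this time range,
// so a caption renderer can place captions in front of it.
depthTrack.addCue(new VTTCue(0, 4, JSON.stringify({ nearestPointMeters: 1.2 })));
depthTrack.addCue(new VTTCue(4, 9, JSON.stringify({ nearestPointMeters: 0.8 })));

// placeCaptionsAt is a stand-in for whatever a caption renderer would do
// with the value; no web API exists for this yet.
declare function placeCaptionsAt(depthMeters: number): void;

depthTrack.oncuechange = () => {
  const cues = depthTrack.activeCues;
  for (let i = 0; cues !== null && i < cues.length; i++) {
    const { nearestPointMeters } = JSON.parse((cues[i] as VTTCue).text);
    placeCaptionsAt(nearestPointMeters);
  }
};
```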
21:31:27 -> https://www.w3.org/2021/03/media-production-workshop/talks/bruce-devlin-metadata.html Metadata in production workflows talk by Bruce Devlin
21:32:28 Jer: There may be other use cases for depth info that do require frame accuracy, for demos that push the boundaries, e.g., to do clipping correctly, or to place other pieces of the web page
21:33:13 Chris: Are there other vendor implementations?
21:33:39 Jer: Stereo video is relevant here. Other devices let users pick the projection, and how to interpret the data from the encoder - horizontally or vertically divided
21:34:22 ... There's not a good way to put that in the media file itself, so it relies on the person. They had the same problem, so the standardisation opportunity for them is the same
21:34:55 ... It's a similar problem with depth info and captions: either put them outside the viewport, or it looks uncomfortable if the depth info is wrong
21:35:10 q?
21:36:26 Chris: Which of these would come to this WG, and which elsewhere?
21:37:10 Jer: I'm not thinking everything would come here. I've been asked about how you can specify the layered approach in the HLS manifest. It's hard to do the same as in a native app
21:37:38 ... If I give you HEVC with a depth layer, you can't detect whether it's supported today; that's a Media Capabilities question
21:38:09 ... Viewport controls could be an HTML question: given a 360 video stream, if you want to change the viewport angle you can currently only do it in WebGL. It could be added to the video element
21:38:59 ... Display capabilities would go in CSS
21:39:28 Chris: Media Capabilities could be done here; we already have spatial audio
21:39:55 Jer: There could be a need for something more dynamic in Media Capabilities
21:40:19 Chris: Next steps?
21:40:44 Francois: There's an active immersive captions CG at W3C. Have you interacted with them?
21:40:59 -> https://www.w3.org/community/immersive-captions/ Immersive Caption CG
21:41:32 Jer: I didn't know about them. For things that might belong in the Media WG, we could file issues on the WG repos or the relevant standards
21:42:58 ... So taking the list and breaking it out into issues is a good next step
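[Editor's note: to make the Media Capabilities point concrete, a sketch of what exists today versus what is missing. spatialRendering is a real member of the Media Capabilities audio configuration; the commented-out video-side member is hypothetical and only illustrates the detection gap Jer describes.]

```typescript
// What exists today vs. what is missing. `spatialRendering` is a real
// Media Capabilities audio configuration member; the commented-out
// video member below is hypothetical.
async function checkStereoSupport(): Promise<void> {
  const info = await navigator.mediaCapabilities.decodingInfo({
    type: "media-source",
    video: {
      contentType: 'video/mp4; codecs="hvc1.2.4.L153.B0"', // HEVC Main 10
      width: 3840,
      height: 2160,
      bitrate: 20_000_000,
      framerate: 30,
      // Hypothetical: "will a layered stereo signal be *displayed*
      // correctly, not merely decoded?" No such member exists today.
      // stereoscopic: "multiview-hevc",
    },
    audio: {
      contentType: 'audio/mp4; codecs="ec-3"',
      spatialRendering: true, // real member: is spatial audio rendered?
    },
  });
  console.log(info.supported, info.smooth, info.powerEfficient);
}
```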
21:43:17 Chris: Happy to follow up from a MEIG perspective to bring this to other audiences
21:43:28 Topic: MSE and EME registries
21:43:32 jernoble has joined #mediawg
21:43:45 -> https://w3c.github.io/immersive-captions-cg/360-captions/ Recommendations for accessible captions in 360 degree video
scribe+ jernoble
21:44:43 chrisn: Topic: Registries in EME and MSE
-> https://docs.google.com/presentation/d/1azcBi0C-Sw_bF6x-SIBdpLorLQLkSfdyKeWTnphka2Y/edit Chris's slides
21:45:02 * we previously discussed needing editors for those registries
21:45:34 * chrisn has taken on editor responsibilities for the MSE Byte Stream Format Registry and the WebM Byte Stream Format
21:47:51 * w3c/encrypted-media#524
21:49:53 * w3c/encrypted-media#526 These two issues update the registries to use the W3C registry track, as well as using correct registry names
21:50:22 * Proposal: move the MSE and EME registries to the W3C Registry track
21:50:51 * Currently published as Notes, with normative content
21:52:12 * May currently be Notes due to discussions about formats and codecs
21:53:01 marcosc: These may just be able to sit on the Registry track if they only need to be updated occasionally
21:54:11 chrisn: The documents within the registries benefit from being on the Recommendation track since they do contain normative language
21:54:33 * The benefit would be that the documents would be covered under the patent policy
21:58:11 jernoble: presented the following slides as text:
21:58:13 * Spatial Media Standards Soft Spots
21:58:51 * - Stereo Video Support Detection
21:58:55 * - Motion Safety Metadata
21:58:58 * - Fullscreen-only Capability Detection
21:59:05 * - Caption Depth Data
21:59:11 * - 360º/180º Viewport Controls
21:59:15 * - Spatial Soundscapes
21:59:16 * - Environment Dimming Support
21:59:52 rrsagent, draft minutes
21:59:54 I have made the request to generate https://www.w3.org/2024/03/19-mediawg-minutes.html cpn
22:00:14 rrsagent, make log public