20:58:10 RRSAgent has joined #mediawg
20:58:14 logging to https://www.w3.org/2024/03/19-mediawg-irc
20:58:14 Zakim has joined #mediawg
20:58:17 RRSAgent, make logs public
20:58:28 cpn has joined #mediawg
20:58:29 Meeting: Media WG Meeting
20:58:53 Agenda: https://github.com/w3c/media-wg/blob/main/meetings/2024-03-19-Media_Working_Group_Teleconference-agenda.md
21:00:12 present+ Chris_Needham
21:00:53 present+ Francois_Daoust, Greg_Freedman, Joey_Parrish
21:01:29 present+ Sun_Shin
21:01:29 marcosc has joined #mediawg
21:02:12 present+ Marcos_Caceres
21:02:14 Chair: Chris, Marcos
21:02:15 scribe+ cpn
21:04:39 present+ Jer_Noble, Andy_Estes
21:06:58 cpn: Agenda items: Spatial media, and EME/MSE registries
21:07:34 TOPIC: Spatial Media
21:07:49 mfoltzgoogle has joined #mediawg
21:07:57 Present+ Mark_Foltz
21:08:20 Jer: We have seen some standardisation needs for spatial media, in Safari on Apple platforms
21:08:44 ... One problem we faced was trying to enable stereo playback support, delivering different video to the left and right eye
21:08:58 ... It's available for native playback, but not possible on the web within the bounds of the viewport
21:09:01 andy has joined #mediawg
21:09:09 ... There's no way to detect whether to deliver spatial video to the browser
21:09:15 ... It's a display problem, not a decode problem
21:09:35 ... Video formats can be decoded but won't be displayed correctly, e.g., two videos side by side
21:09:44 ... If a layered approach is used, you get a layered view
21:10:07 ... It'll need resolving before anyone can do stereoscopic delivery to a spatially aware UA
21:10:24 ... Another problem is motion safety metadata. There's a risk of making people ill if there's too much motion
21:10:36 ... With native playback it's possible to include safety metadata with the video stream
21:10:48 ... The native app could reduce the viewport so the effect is felt less strongly
21:10:58 ... This isn't specified anywhere, so it's currently proprietary
21:11:18 ... Related to this is work on flashing lights, released on Apple platforms
21:11:35 ... We can identify scenes as having flashing lights, and protect people
21:11:53 ... Other platform features have similar gaps: 180-degree wide-angle image viewing is only available in a fullscreen presentation
21:12:21 ... There's no way to detect that, which is difficult as it's not how CSS is set up; you can't ask what a CSS capability would be in a different mode
21:12:38 ... We haven't found a good place to put it, so you use the UA string or another out-of-band solution
21:13:05 ... For video with in-band captions, where do you put the captions in the Z order, so they don't interfere with the depth of the scene?
21:13:29 ... For native playback, you can deliver depth info in a metadata track, but there's no standard for it at the moment
21:13:47 ... It should be easy to put immersive video in a media element and let the user control the viewpoint of the video
21:14:21 ... Currently you use WebGL projections, but the video element should be capable of this, with either native or custom controls, so you could implement your own pan and tilt controls
21:14:48 ... There's no way to set up a soundscape for audio presentation. Not sure this is entirely correct, as Web Audio allows HRTFs and impulse responses
21:15:01 ... But custom work is needed; it's not as simple as a single control knob
21:15:34 ... Environment dimming: if watching in visionOS you can dim the environment so it feels like you've turned down the lights in the room
21:15:49 ... There's no web API for that, which seems useful, e.g., also when presenting a spreadsheet
21:16:20 ... That's a summary of the web API issues when we tried to enable immersive capabilities in a browser on a device like visionOS
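[Editor's note: as an aside on the soundscape point, the "custom work" Jer mentions might look like the following minimal sketch, using only standard Web Audio API calls. An HRTF PannerNode gives binaural spatialisation, but the author has to position each source (and the listener) manually; the URL and coordinates here are placeholders.]

```typescript
// Minimal sketch of the custom work needed today to spatialise one source
// with Web Audio. Standard API only; nothing platform-specific. Assumes
// the AudioContext is running (i.e. created after a user gesture).
const ctx = new AudioContext();

async function playSpatialised(url: string): Promise<void> {
  const response = await fetch(url);
  const buffer = await ctx.decodeAudioData(await response.arrayBuffer());

  const source = new AudioBufferSourceNode(ctx, { buffer });

  // HRTF panning gives a binaural rendering, but every source has to be
  // positioned by hand; there is no single "soundscape" control knob.
  const panner = new PannerNode(ctx, {
    panningModel: "HRTF",
    distanceModel: "inverse",
    positionX: 2, positionY: 0, positionZ: -1, // placeholder position
  });

  source.connect(panner).connect(ctx.destination);
  source.start();
}
```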
21:16:27 Xiaohan has joined #mediawg
21:16:44 ... Most important to solve immediately? Caption depth data and motion safety data
21:16:58 q+
21:17:32 Mark: To understand the use case, is this about playing non-immersive video in an immersive environment? Or immersive videos?
21:18:01 Jer: A web page itself is non-immersive, a 2D plane. It should be possible to embed stereo or 360 video content and get a picture frame effect
21:18:17 ... The lack of depth info made that impossible
21:18:33 Mark: So it's like mixed reality, with 2D content mixed with 3D content
21:18:53 Jer: Yes. For 2D video, it still seems important to have depth information to render captions on the presentation
21:19:13 Mark: Could WebXR solve this, or is a different set of controls needed?
21:19:43 Jer: Yes, it's possible to build a fully immersive presentation using WebXR. But that's like saying the audio element isn't needed as we have Web Audio
21:20:03 ... It's a declarative way of doing something you could do in WebXR, but it makes it accessible if you're not a WebXR expert
21:20:29 ... WebXR isn't really about media playback, it's about building blocks for immersive experiences
21:20:48 ... So I'm not really suggesting fully immersive, but making use of the capabilities of the device, a bigger viewport
21:21:04 Marcos: It could use environmental lighting as well, so it has privacy implications
21:21:58 Jer: Vision Pro has modes for media presentation, where the device can modify the point of view it passes through to the user's view, such as light dimming
21:22:22 ... It doesn't seem like something feasible without revealing detail about the environment the user is in
21:22:53 Francois: I'm trying to map where these features might fit, and what's required
21:22:58 gregwfreedman has joined #mediawg
21:23:15 ... Caption data could be an extension of WebVTT or TTML, something to be added there?
21:23:28 ... What is needed for motion safety, could it be in-band metadata?
21:23:51 Jer: Those are format questions. I don't know if WebVTT is the right thing for depth information
21:24:52 ... It's about describing where the point visually closest to the user is; that tells the UA where to put the captions, so the captions don't appear deeper than, or inside, something in the scene
21:25:20 ... Is it just the deepest part, or more of a depth map? I don't know
21:25:44 ... For motion safety, the same problem exists: what is the format - JSON, text, etc.?
21:26:04 Chris: So there's a temporal aspect?
21:26:10 Jer: Yes, it can change frame by frame
21:26:35 ... For motion safety it's important to have the metadata a few frames in advance, to restrict the viewport so it's less immersive, for comfort and safety reasons
21:27:01 ... For non-professionally captured media, there can be a lot of motion in the capture, so it can be disconcerting to watch
21:27:33 Chris: Are there media encodings not currently supported?
21:28:09 Jer: HEVC, where the information is stored in an additional layer. It's a delta on the original captured frame, to ship a stereo presentation more efficiently
21:28:52 ... There are also formats that encode the left and right eye separately. Google has a proposal, I can find the info, not sure if it's standardised. It tells you how to visually interpret the signal from the encoder
21:29:01 q?
21:29:14 ack tidoust
21:29:44 Francois: Remembering the workshop on media production, Bruce Devlin talked about different forms of metadata
21:30:14 ... Here you have frame-level metadata for motion, and captions may not be frame by frame; is there not the same need for precision and sync?
21:30:45 Jer: Two answers: it should have the same cadence as the captions themselves. But there are frame-by-frame formats for depth info, so I don't want to commit
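[Editor's note: one possible shape for the caption depth data discussed above, sketched with the standard TextTrack and VTTCue APIs. The JSON payload (nearestPointMeters) and the placeCaptionsAt helper are hypothetical; as Jer notes, no standard for depth metadata exists yet.]

```typescript
// Sketch only: assumes a hypothetical convention where a "metadata" text
// track carries JSON cues giving the scene's nearest point, at roughly
// caption cadence. The track plumbing below is standard web API.
const video = document.querySelector("video")!;
const depthTrack = video.addTextTrack("metadata", "scene-depth");

// Hypothetical payload: nearest scene point in metres for this time range,
// so a caption renderer can place captions in front of it.
depthTrack.addCue(new VTTCue(0, 4, JSON.stringify({ nearestPointMeters: 1.2 })));
depthTrack.addCue(new VTTCue(4, 9, JSON.stringify({ nearestPointMeters: 0.8 })));

// placeCaptionsAt is a stand-in for whatever a caption renderer would do
// with the value; no web API exists for this yet.
declare function placeCaptionsAt(depthMeters: number): void;

depthTrack.oncuechange = () => {
  const cues = depthTrack.activeCues;
  for (let i = 0; cues !== null && i < cues.length; i++) {
    const { nearestPointMeters } = JSON.parse((cues[i] as VTTCue).text);
    placeCaptionsAt(nearestPointMeters);
  }
};
```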
21:31:27 -> https://www.w3.org/2021/03/media-production-workshop/talks/bruce-devlin-metadata.html Metadata in production workflows talk by Bruce Devlin
21:32:28 Jer: There may be other use cases for depth info that do require frame accuracy, for demos that push the boundaries, e.g., to do clipping correctly, or to place other pieces of the web page
21:33:13 Chris: Are there other vendor implementations?
21:33:39 Jer: Stereo video is relevant here. Other devices let users pick the projection, and how to interpret the data from the encoder - horizontally or vertically divided
21:34:22 ... There's not a good way to put that in the media file itself, so it relies on the person. They had the same problem, so the standardisation opportunity for them is the same
21:34:55 ... It's a similar problem with depth info and captions: either put them outside the viewport, or it looks uncomfortable if the depth info is wrong
21:35:10 q?
21:36:26 Chris: Which of these would come to this WG, and which elsewhere?
21:37:10 Jer: I'm not thinking everything would come here. I've been asked about how you can specify the layered approach in the HLS manifest. It's hard to do the same as in a native app
21:37:38 ... If I give you HEVC with a depth layer, you can't detect whether it's supported today; that's a Media Capabilities question
21:38:09 ... Viewport controls could be an HTML question: given a 360 video stream, if you want to change the viewport angle you can currently only do it in WebGL. It could be added to the video element
21:38:59 ... Display capabilities would go in CSS
21:39:28 Chris: Media Capabilities could be done here; we already have spatial audio
21:39:55 Jer: There could be a need for something more dynamic in Media Capabilities
21:40:19 Chris: Next steps?
21:40:44 Francois: There's an active immersive captions CG at W3C. Have you interacted with them?
21:40:59 -> https://www.w3.org/community/immersive-captions/ Immersive Caption CG
21:41:32 Jer: I didn't know about them. For things that might belong in the Media WG, we could file issues on the WG repos or the relevant standards
21:42:58 ... So taking the list and breaking it out into issues is a good next step
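[Editor's note: to make the Media Capabilities point concrete, a sketch of what exists today versus what is missing. spatialRendering is a real member of the Media Capabilities audio configuration; the commented-out video-side member is hypothetical and only illustrates the detection gap Jer describes.]

```typescript
// What exists today vs. what is missing. `spatialRendering` is a real
// Media Capabilities audio configuration member; the commented-out
// video member below is hypothetical.
async function checkStereoSupport(): Promise<void> {
  const info = await navigator.mediaCapabilities.decodingInfo({
    type: "media-source",
    video: {
      contentType: 'video/mp4; codecs="hvc1.2.4.L153.B0"', // HEVC Main 10
      width: 3840,
      height: 2160,
      bitrate: 20_000_000,
      framerate: 30,
      // Hypothetical: "will a layered stereo signal be *displayed*
      // correctly, not merely decoded?" No such member exists today.
      // stereoscopic: "multiview-hevc",
    },
    audio: {
      contentType: 'audio/mp4; codecs="ec-3"',
      spatialRendering: true, // real member: is spatial audio rendered?
    },
  });
  console.log(info.supported, info.smooth, info.powerEfficient);
}
```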
21:43:17 Chris: Happy to follow up from a MEIG perspective to bring this to other audiences
21:43:28 Topic: MSE and EME registries
21:43:32 jernoble has joined #mediawg
21:43:45 -> https://w3c.github.io/immersive-captions-cg/360-captions/ Recommendations for accessible captions in 360 degree video
scribe+ jernoble
21:44:43 chrisn: Topic: Registries in EME and MSE
-> https://docs.google.com/presentation/d/1azcBi0C-Sw_bF6x-SIBdpLorLQLkSfdyKeWTnphka2Y/edit Chris's slides
21:45:02 * we previously discussed needing editors for those registries
21:45:34 * chrisn has taken on editor responsibilities for the MSE Byte Stream Format Registry and the WebM Byte Stream Format
21:47:51 * w3c/encrypted-media#524
21:49:53 * w3c/encrypted-media#526 These two issues update the registries to use the W3C registry track, as well as using correct registry names
21:50:22 * Proposal: move the MSE and EME registries to the W3C Registry track
21:50:51 * Currently published as Notes, with normative content
21:52:12 * May currently be Notes due to discussions about formats and codecs
21:53:01 marcosc: These may just be able to sit on the Registry track if they only need to be updated occasionally
21:54:11 chrisn: The documents within the registries benefit from being on the Recommendation track since they do contain normative language
21:54:33 * The benefit would be that the documents would be covered under the patent policy
21:58:11 jernoble: presented the following slides as text:
21:58:13 * Spatial Media Standards Soft Spots
21:58:51 * - Stereo Video Support Detection
21:58:55 * - Motion Safety Metadata
21:58:58 * - Fullscreen-only Capability Detection
21:59:05 * - Caption Depth Data
21:59:11 * - 360º/180º Viewport Controls
21:59:15 * - Spatial Soundscapes
21:59:16 * - Environment Dimming Support
21:59:52 rrsagent, draft minutes
21:59:54 I have made the request to generate https://www.w3.org/2024/03/19-mediawg-minutes.html cpn
22:00:14 rrsagent, make log public