W3C

– DRAFT –
Media and Entertainment IG vF2F meeting - Day 2

27 October 2021

Attendees

Present
Adam_Dawidziuk, Amy_Huang, Andreas_Tai, Barbara_Hochgesang__Intel, Benjamin_De_Kosnik, Calvaris, Chris_Lorenzo__Comcast, Chris_Needham__BBC, Dong-Young_Lee__LGE, Dr._Rachel_Yager, Eero_Hakkinen, Francois_Daoust__W3C, Frode_Hernes, Gary_Katsevman__Brightcove, Geun_Hyung_Kim__Gooroomee, Giuseppe_Pascale, Hiroshi_Fujisawa__NHK, Hiroshi_Ota__Yahoo!_Japan, Hyojin_Song__LGE, Igarashi_Tatsuya__Sony, Jaroslaw_Kubiec__XPERI, Jeff_Jaffe__W3C, John_Riviello, Jon_Piesing__TP_Vision, Judy_Brewer__W3C, Karen_Myers__W3C, Kazuhiro_Hoya__JBA, Kaz_Ashimura__W3C, Kinji_Matsumura__NHK, Mark_Corl, Mark_Lomas__BBC, Martin_Wonsiewicz, Michael_Bergman__CTA_WAVE, Michael_Dolan__ATSC, Paul_Hearty__Samsung,_WAVE,_ATSC, Phillip_Maness__Xperi, riju, Rob_Wilson, Shannon_Janus, Takio_Yamaoka__Yahoo!_Japan, Tatsuya_Sato__NHK, Tomoaki_Mizushima__IRI, Will_Law__Akamai, Wojciech_Mycek, Wouter_van_Boesschoten, Yasushi_Minoya__Sony, Zachary_Cava
Regrets
-
Chair
ChrisN, Igarashi
Scribe
cpn, kaz

Meeting minutes

Introduction

https://docs.google.com/presentation/d/15-QdWc87IiUhlPOwER7Gxde1a2FvG5h7OYMva0eGc_s/edit <-- Chris's slides

ChrisN: This is the 2nd meeting of MEIG during TPAC
… Feel free to contact the group co-chairs for any questions about the group
… Our mission from the Charter is about use cases and requirements for better media app support on the web
… Resources: home page, charter, GitHub, etc
… The minutes will be published publicly later, notes taken on IRC
… Be aware of the Code of Conduct and Patent Policy

kaz: Queue management by Zoom's raise hand
… I'll add them to speaker queue on the IRC side

NHK's update on Hybridcast

https://www.w3.org/2011/webtv/wiki/images/5/52/NHK-update-MEIG-TPAC-2021.pdf <-- Sato-san's slides

ChrisN: I'd like to welcome Mr. Sato's presentation first

Sato: I'm from NHK, I'll present an update and issues from Hybridcast
… First, I'd like to explain our future vision of a web based broadcast platform
… We aim to make it possible to use any viewing environment and provide the same user experience regardless of device and transmission path
… Our goals are to make the user experience of broadcast and internet streaming seamless, with the same quality of service
… Two requirements. A TV and smartphone can be connected for remote control
… We're considering providing content to devices using W3C WoT technology

Seamless switching between broadcast and internet streaming

Sato: We're using an OS and platform independent HTML app
… A broadcast oriented managed application is used for presenting broadcast programmes and the content selection UI
… A broadcast independent application runs independently of broadcast services
… It's used for content selection and internet streaming
… It's possible to switch seamlessly between broadcast and internet streaming
… [Application demo]
… The initial screen shows live broadcast programs alongside on-demand programs
… You can go to live and on-demand programs or transition directly from internet to broadcast without returning to the home screen, and vice versa
… This application has a remaining issue: the transition between broadcast and internet streaming isn't as fast as switching between broadcast channels
… Requirements: low delay for video playback, to allow users to view video at the same time
… Firing events with precise timing accuracy, for dynamic ad insertion and programme-linked UI
… Reducing latency in online delivery. Would CMAF with WebTransport be a solution?
… Event firing in MSE playback is also an issue. Accuracy of event handling in JavaScript is affected by other processing on the device
… Hybridcast Connect allows you to connect your TV to your smartphone. It provides device discovery and a command interface
… The protocol uses open standards to connect the devices. In our current spec version, it's built on open standards such as DIAL
… and two-way communication using WebSocket
… Open and secure standards are desired for communication, e.g., HTTPS in local networks
… We're looking at new ways to present content using IoT devices
… Examples include presenting audio on smart speakers, or news on a smart mirror. You could change the color of your room lights, linked to the content
… An issue is that there's no established method to present content based on device characteristics. WoT is promising for this issue
… Our vision is an IoT based media framework
… It delivers broadcaster content to a device without a broadcast tuner, integrated broadcast and broadband services
… Connect with various IoT devices and internet services
… [Demo]
… Devices in this video are operated using Hybridcast Connect and Web of Things
… Thank you

ChrisN: Thank you for your presentation!

Kaz: I work for both MEIG and WoT WG
… You mentioned several issues on TV performance, timing mechanisms
… There was discussion on performance at the last MEIG meeting. Are you interested to join that discussion?

Sato: Yes, I am

Kaz: What kind of WoT Thing Description was used? Maybe you could provide input on the WoT side as well
… You could work with Endo-san for that purpose

Sato: I'd like to continue work on WoT, yes

Rob: Regarding IoT integration. Are you looking for a synchronisation mechanism for timed events?
… For example, synchronising room light changes, should that be synchronised to the media?

Sato: Currently we're using broadcast content timecode for synchronisation, but want to use the MTE mechanism

Rob: I wonder if there's interest to use DataCue, which we'll discuss next

ChrisN: I wanted to ask about secure connections
… There was a group named HTTPS in the Local Network CG
… Has there been any further activity or updates?
… One group I am involved in, Second Screen WG, is also working on secure discovery and connection protocols for devices on a local network
… I wonder if that could be a possible solution?

Igarashi: I'm co-Chair of the CG
… The current status is that the group is not very active, due to several issues including the COVID situation
… We've been working on the issue almost 4-5 years and discussed several possible solutions
… but have not got feedback. If you have any feedback, it would be welcome

Kaz: There's also interest in WoT discovery capability. Also decentralized identity. Need to continue discussion on those topics

<jake_> Concrete proposal: https://blog.filippo.io/how-plex-is-doing-https-for-all-its-users/

Jake: Plex has a solution that they have published
… Not sure how to submit a concrete proposal, but it's the solution I think of when this issue comes up
… I wonder if it's helpful? Plex is a local media server. They changed their servers to use HTTPS to continue interoperating with browsers
… The article describes how it works. I'm not sure how applicable it is to your use case

<kaz> ChrisN: The other point you mentioned is seamless switching, and low latency protocols for that purpose

<kaz> ... Do you have any data on the causes of delays, such as the need to buffer content, reinitialise the TV decoder, etc?

<kaz> Sato: The issue is that broadcast content and internet content come from different sources

<igarashi> here is the http local network cg github. https://github.com/httpslocal

<kaz> ... and causes the delay problem

Sato: The broadcast and streaming content are presented in separate documents in the web app

<kaz> ChrisN: I expect that HbbTV and DVB groups have looked into this problem.

<kaz> ... Those could be good places for further discussion

<kaz> Sato: Thanks

<igarashi> The CG has studied the Plex solution. One of the issues is that it requires TLS server certificates for a bunch of IoT devices.

<igarashi> The other issue is that it does not support ad hoc discovery of devices on the local network.

<kaz> ChrisN: if you'd like future meetings to focus on any of these topics, I'm happy to organize them

<kaz> ChrisN: any other questions?

<kaz> (none)

ChrisN: Thank you very much for presenting!

Media Timed Events update

https://docs.google.com/presentation/d/11DqQINDd-zmMvSmQafZ6S6ZL7a5vK3tLpfAHEixWudc/edit <-- ChrisN's slides

ChrisN: I'll give an update on the Media Timed Events project that we have been running for a while in MEIG
… [History and Background]
… As background, the HTML5 spec included a DataCue API, but it was removed in WHATWG HTML because it lacked interest from multiple browser vendors
… WebKit is the only mainstream browser to implement it so far
… HbbTV uses HTML DataCue for DASH events
… MSE issue #189, "add support for media-encoded events", requests in-band event support to be added to MSE
… The MEIG Media Timed Events TF started in 2018 following input from ATSC and DASH-IF
… CTA WAVE have proposed CMAF MSE Byte Stream Format spec, which includes a requirement for surfacing emsg events
… We have a WICG DataCue repo since 2018, https://github.com/wicg/datacue
… [Use Cases]
… I won't go into the details of each use case, but an example is a lecture recording with synchronized slideshow, or a video with synchronized map display
… Client side dynamic content insertion was raised in Mr Sato's presentation. From what I've heard elsewhere, server-side insertion is more commonly used, so I would like to know how much demand there is for an API for client-side insertion
… [Developer benefits]
… Apps today must either use VTTCue or custom app code
… For custom JavaScript code, the HTMLMediaElement timeupdate events are too coarse (250 ms), so to get accurate synchronization you need to poll the media element's currentTime from a higher resolution timer loop, but polling is expensive
… For VTTCue, you can't store data objects directly, only strings, so you need to serialize your data to JSON. Also, VTTCue is really intended for cueing rendering, so it has attributes that describe rendering properties, which aren't needed for timed metadata cues
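The two workarounds described above can be sketched as follows. This is an illustrative sketch only; the helper names (`pollMediaTime`, `packCueData`, `unpackCueData`) are hypothetical, and the commented usage assumes a browser environment.

```javascript
// 1. Polling workaround: timeupdate fires only every ~250 ms, so apps
// instead read currentTime from a requestAnimationFrame loop.
function pollMediaTime(video, onTime) {
  function tick() {
    onTime(video.currentTime);   // higher resolution than timeupdate
    requestAnimationFrame(tick); // but polling costs CPU on every frame
  }
  requestAnimationFrame(tick);
}

// 2. VTTCue workaround: cue text only carries strings, so data objects
// must be round-tripped through JSON.
function packCueData(data) {
  return JSON.stringify(data);
}
function unpackCueData(cue) {
  return JSON.parse(cue.text);
}

// Browser-only usage (hypothetical handler name):
// const cue = new VTTCue(10, 20, packCueData({ adBreak: true }));
// cue.onenter = () => handleMetadata(unpackCueData(cue));
```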
… [API proposal]
… Our API proposal consists of 3 parts:
… 1. A DataCue API, based on the existing WebKit implementation
… This has a value field that carries the data, in an arbitrary structure, and a type field that indicates what kind of data structure is used
… 2. Mappings for browser-generated timed metadata events, which could be events carried in a DASH manifest for browser-integrated DASH players, or events parsed from the media container in MSE
… 3. Extending the TextTrackCue interface to support unbounded cues
… In a live stream, we may know the cue start time but not the end time, and the end time can be updated later, when it becomes known
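The proposed API shape can be sketched with a plain-object model. This is not the WebKit implementation, just an illustration of the proposal's fields; in a supporting browser the construction would be `new DataCue(startTime, endTime, value, type)`, and the example type string and value structure are assumptions for illustration.

```javascript
// Plain-object model of the proposed DataCue shape (illustrative only).
class DataCueModel {
  constructor(startTime, endTime, value, type) {
    this.startTime = startTime;
    this.endTime = endTime; // may be Infinity for an unbounded live cue
    this.value = value;     // arbitrary structured data, not just a string
    this.type = type;       // identifies the data format carried in value
  }
}

// An emsg-style event: the type field tells the app how to interpret value.
// The type label and value fields here are assumptions, not a defined mapping.
const cue = new DataCueModel(
  30, 36,
  { scheme_id_uri: 'urn:example:ad-insertion', message_data: '...' },
  'org.mpeg.dash.emsg'
);

// An unbounded cue whose end time is not yet known:
const liveCue = new DataCueModel(100, Infinity, { chapter: 2 }, 'urn:example:chapter');
```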
… [In-band emsg event handling]
… We have had a good collaboration with the DASH-IF Events TF, which is defining interoperability guidance for DASH events
… There are lot of open questions for how to handle in-band emsg events in MSE, so work is needed to come up with a proposal
… I need your input and help to move this forward!
… [In-band emsg event subscription API]
… We have discussed the possible need for an event subscription API
… This would allow the web app to set the dispatch mode to on-receive or on-start,
… and allow the web app to tell the browser which events to surface to the app
… Some feedback from Safari WebKit is that they would only support the on-start dispatch mode, due to limitations of their media playback engine
… [In-band emsg event handling]
… Open questions. Is there still interest in this feature?
… Should we leave in-band event parsing to JavaScript?
… Editorial help is wanted to develop the explainer and the spec draft. If you're interested, please let me know. Your help is appreciated
… [TextTrackCue unbounded end time]
… In April 2021 we got changes to the HTML and WebVTT specs to support unbounded cues
… Thanks to the help from Rob Smith
… He also contributed test cases to Web Platform Tests
… Implementation bugs are filed, but code contributions are needed in browsers to implement the changes
… [Unbounded cues in WebVTT]
… The next topic we are looking at, is WebVTT file format. Do we need a syntax change to allow unbounded cues?
… There are two main use cases: timed metadata in live streams (chapters, etc.), and live captioning
… From that, we have the requirement to allow a cue to have an unbounded end time, update a previous cue, and maintain compatibility with existing WebVTT parsers
… [Unbounded cues in WebVTT: current status]
… WebVTT issue #496 discusses syntax options. For timed metadata, we concluded that no change was needed
… But there are still open questions, e.g., for live captioning or delivering live WebVTT over WebSockets there may still be a need for an unbounded cue syntax
… [Unbounded cues in WebVTT: current status]
… (diagram with Chapter 1 and Chapter 2 at the top)
… (and segment 1, segment 2 and segment 3 corresponding to those chapters)
… In segment 3, chapter 2 continues and its endTime is extended
… This way allows the media player to know the current chapter when it joins the live stream, as it's repeated in each segment
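The repetition scheme in the diagram can be modelled as a small merge step on the player side: when a later segment repeats a cue with the same id, the player extends the stored cue's end time instead of creating a duplicate. A minimal sketch, with hypothetical function and cue-id names:

```javascript
// Merge cues parsed from successive segments, keyed by cue id.
// A repeated cue with a later endTime extends the existing entry.
function mergeCues(store, incoming) {
  for (const cue of incoming) {
    const existing = store.get(cue.id);
    if (existing) {
      existing.endTime = Math.max(existing.endTime, cue.endTime);
    } else {
      store.set(cue.id, { ...cue });
    }
  }
  return store;
}

// Segments 1-3 as in the diagram: chapter 2 reappears with a later end time.
const store = new Map();
mergeCues(store, [{ id: 'chapter1', startTime: 0, endTime: 10 }]);
mergeCues(store, [{ id: 'chapter2', startTime: 10, endTime: 20 }]);
mergeCues(store, [{ id: 'chapter2', startTime: 10, endTime: 30 }]);
// store now holds two cues, with chapter2 extended to endTime 30
```

A player joining at segment 3 sees the chapter 2 cue immediately, which is the point of repeating it in each segment.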
… [Documents]
… This slide shows the various documents we've written: requirements, explainer, etc
… [Next meeting]
… The next Media Timed Events meeting is Monday, 15 Nov. 2021. Your participation is welcome
… Any questions?

Rob: Great summary. I have a couple of points to add
… You asked about the purpose of the API and is this still required?
… Are there common interfaces to access the data?

ChrisN: It depends on which part you mean
… It's possible to implement media parsing to extract events in JavaScript, as the media formats are well defined
… The other part is more for your own data objects that the application creates
… You can use VTTCue to handle those events but it's a bit inconvenient because it only handles strings and not data objects
… It may be considered as a developer optimization
… We have not had strong interest by browser vendors so far
… The other issue with VTTCue is the timing accuracy for event firing
… We made a spec change to HTML to increase the timing accuracy
… Before it could be delayed up to 250 ms, but we reduced that number to make it more accurate
… There is an implementation bug for Chrome, and some work was done, but I'm not really sure of the current status. We should follow it up

Rob: It's important to be able to recognise the type of data, and there's no place to put that in VTTCue, so you'd have to infer it from the data somehow

ChrisN: Yes, that is a key benefit of DataCue
… Safari's implementation uses that to communicate which data format is used,
… so the type field is important

Rob: The other thing we need is a place for cue identifiers

ChrisN: The id field is in TextTrackCue which DataCue inherits, but my example didn't show that
… For inband events, the id field mapping needs to be defined

Rob: Thanks

<cpn> Kaz: For automated live captioning in zoom, what mechanism are they using, and is it related to your proposed use case?

ChrisN: I don't know what they're using specifically, but related to the live captioning use case would be the need to align with a media timeline
… That may be more an issue for Timed Text WG, whereas in the Media Timed Events TF we've focused more on timed metadata events rather than captions
… We're collaborating on the common technical issues on unbounded cues

<RobSmith> The Zoom live captions are visibly correcting themselves in real time, e.g. happy -> happening

ChrisN: Any other comments?

(none)

ChrisN: Thank you for the discussion
… Feel free to contact us
… We'll follow up to continue the discussion for these topics
… Apologies that we went over time

Next meetings

ChrisN: Nov. 2, 1am UTC is the MiniApps joint meeting
… It's a follow-up from the TV application performance discussion from Monday's meeting
… That would be a different approach for Web technologies to handle media, so we want to have an exploratory discussion
… We'll also start a new activity on performance, your participation in that is also welcome
… We'll make an announcement on the mailing list
… Thank you all!
… I look forward to seeing you soon

[adjourned]

<jake_> thanks chris

Minutes manually created (not a transcript), formatted by scribe.perl version 147 (Thu Jun 24 22:21:39 2021 UTC).