13:48:48 RRSAgent has joined #me
13:48:48 logging to https://www.w3.org/2020/10/15-me-irc
13:48:54 Zakim has joined #me
13:49:25 Meeting: Media & Entertainment IG, Timed Text WG, Media WG Joint Meeting
13:49:50 Agenda: https://www.w3.org/2011/webtv/wiki/TPAC_2020_meeting#Thursday_15_October_2020:_Timed_Text_WG_.2F_Media_WG_.2F_Media_.26_Entertainment_IG_Joint_Meeting
13:53:05 present+ Chris_Needham
13:53:23 present+ Marusu_Takechi
13:55:26 present+ Germain_Souquet
13:56:00 present+ Tohru_Takiguchi
13:56:02 MTakechi has joined #me
13:57:49 present+ Satoshi_Fujitsu
13:58:28 takio has joined #me
13:58:51 germain has joined #me
13:59:08 present+ Franco_Ghilardi
13:59:14 present+ Takio_Yamaoka
13:59:30 present+ Francois_Daoust
13:59:34 present+ John_Riviello
13:59:38 tidoust has joined #me
13:59:48 present+ Eric_Carlson
13:59:51 present+ Mounir_Lamouri
13:59:56 present+ Nigel_Megitt
14:00:13 nigel has joined #me
14:00:28 present+ Francois_Daoust
14:00:36 fghilardi has joined #me
14:00:52 present+ John_Simmons
14:00:54 eric_carlson has joined #me
14:01:03 present+ Andreas_Tai
14:01:18 Will_Law has joined #me
14:01:20 Topic: https://www.w3.org/2011/webtv/wiki/TPAC_2020_meeting#Thursday_15_October_2020:_Timed_Text_WG_.2F_Media_WG_.2F_Media_.26_Entertainment_IG_Joint_Meeting
14:01:34 atai has joined #me
14:01:35 nigel has changed the topic to: Agenda: https://www.w3.org/2011/webtv/wiki/TPAC_2020_meeting#Thursday_15_October_2020:_Timed_Text_WG_.2F_Media_WG_.2F_Media_.26_Entertainment_IG_Joint_Meeting
14:01:43 Present+ Joey_Parrish
14:01:53 present+ Mark_Watson
14:01:53 markw_ has joined #me
14:01:58 johnsim has joined #me
14:02:05 Present+ Will Law
14:02:07 present+ markw
14:02:14 Present+ Andreas Tai
14:02:19 present+ johnsim
14:02:27 mounir has joined #me
14:02:35 sfujitsu has joined #me
14:02:36 atsushi has joined #me
14:02:38 present+ Gary_Katsevman
14:02:55 present+ Mounir_Lamouri
14:03:01 present+ Cyril_Concolato
14:03:21 present+
14:03:49 Joey_Parrish has joined #me
14:04:20 zacharycava has joined #me
14:04:39 peng has joined #me
14:04:49 scribe: tidoust
14:05:04 cyril has joined #me
14:05:44 Matt_Wolenetz has joined #me
14:06:05 cpn: 2 hours for this, we may not need the whole time. 3 main topics to cover. We'll start with an update from Nigel on Timed Text, audio description, then the TextTrackCue proposal. Finally, I'll give an update on where DataCue is.
14:06:29 q+ I'm available to answer MSE text track queries
14:06:39 ... There was the possibility to talk about TextTrack support in MSE. I don't know if anyone on the call is able and willing to talk about that today, feel free to chime in if you do!
14:06:59 ack Matt_Wolenetz
14:07:24 Matt_Wolenetz: I'm available if people have questions about TextTrack support in Media Source Extensions.
14:07:36 cpn: Excellent, maybe we'll do that near the end of the call then
14:07:47 Topic: Audio Description profile
14:08:15 scribenick: cpn
14:08:20 nigel: Chris introduced this topic as being from the Audio Description CG. That's where it originated. Since then, it has moved to the Rec track in the Timed Text WG.
14:08:31 ... Not a lot of progress since the document transitioned, I must say.
14:08:56 ... Additional energy from people would be most welcome.
14:09:30 ... [showing specification]
14:10:02 ... It's an exchange format for audio script, mixing instructions.
14:10:18 BarbaraH_ has joined #me
14:10:18 ... There are some examples in the specification. Based on TTML
14:10:48 ... [going through examples in the spec]
14:11:11 -> https://w3c.github.io/adpt/#intro-example Example documents
14:11:34 nigel: You can end up with complicated instructions to control the gain and so on.
14:11:52 ... It could be done on the client or server side to generate a separate audio track. That does not matter.
14:12:19 ... The sort of implementation that we've published here has some video and showcases the benefits of the approach.
14:12:22 plh has joined #me
14:12:32 ... One is to adjust the relative volume of the audio description compared to the main program.
14:12:52 ... Another is making this available to assistive technologies.
14:13:35 ... [goes through demonstration video]
14:14:11 ... All of the features that we're using here are in TTML2. This is a profiling activity.
14:14:22 ... We need a substantive part of it.
14:14:57 ... It appears that this should all be quite easy to get done and in standards space. But it just needs some energy, and I've had other priorities.
14:15:14 RobSmith has joined #me
14:15:39 cpn: My main question is implementation support. Presumably, all can be done through JS?
14:15:49 jhelman has joined #me
14:16:30 q+
14:16:46 nigel: Yes, this is what we use here, through the Web Speech API, and so on. It would be kind of nice if Web Audio finally got to Rec, but Web Speech is obviously not being worked on for now. There is a whole area of discussions to be had about the need to provide some speech fonts.
14:17:03 ... From a BBC perspective, it would be nice to be able to provide a BBC voice!
14:17:15 ... Something to think about.
14:17:34 ack plh
14:17:37 Louay_ has joined #me
14:17:46 Present+ Louay_Bassbouss
14:18:03 plh: Web Audio is done actually. Just wrapping things up, Rec should be by end of the year or just slightly afterwards.
14:18:13 cpn: What about a speech synthesis API?
14:18:32 plh: Not in scope of a Working Group. Not aware of any recent discussion on the topic, to be honest.
14:18:41 q+
14:18:42 cpn: Any other perspective, from implementors perhaps?
14:18:48 ack RobSmith
14:18:55 q+ to mention sync and potential native implementations
14:19:26 RobSmith: This strikes me as something similar to what we're doing with metadata. Isn't there going to be some latency issue as you need to download remote assets?
14:19:34 ... Don't you have synchronization issues?
14:20:11 nigel: Thanks for the question. That proxies the point I wanted to raise. Fetching resources is one potential issue, and the second is synchronization of playback.
14:20:13 hober has joined #me
14:20:22 present+
14:20:42 ... The sensitivity of timing for audio is relatively high. You may end up missing the beginning or having some dialogue at the end.
14:20:49 ... It still works pretty well.
14:21:34 ... [giving it a try with another demo]
14:22:01 q+
14:22:17 ... The browser code had to catch up. That illustrates the point very well. If you rely on more local processing or fetching of resources, you may run into problems. That's a good argument in favor of native support for the whole thing.
14:22:22 q- nigel
14:22:30 ack JohnRiv
14:23:16 JohnRiv: Similar to what we've been discussing in the Web Media API guidelines. What's the recommended way for the user agent to handle synchronization between video playback and Web Audio?
14:23:24 eric_carlson_ has joined #me
14:23:37 nigel: It's a real-world issue in some devices and a good question.
14:23:55 ... If we were to do it from a BBC perspective, we'd probably have to do it in a pre-rendered mode, on the server.
14:24:03 ... And we would lose some of the benefits of it.
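A minimal sketch of the kind of client-side approach nigel describes above, using the Web Audio and Web Speech APIs mentioned in the discussion. This is illustrative only: it is not the Adhere implementation and does not read the ADPT/TTML2 document; the function name and ducking level are assumptions.

  // Illustrative sketch (not the Adhere implementation): duck the programme audio
  // with Web Audio and speak a description with the Web Speech API.
  const video = document.querySelector('video');
  const ctx = new AudioContext();
  const programme = ctx.createMediaElementSource(video);
  const gain = ctx.createGain();
  programme.connect(gain).connect(ctx.destination);

  // 'duckTo' is an assumed value; in ADPT the gain changes come from the document itself.
  function speakDescription(text, duckTo = 0.3) {
    gain.gain.setTargetAtTime(duckTo, ctx.currentTime, 0.1);   // lower the programme audio
    const utterance = new SpeechSynthesisUtterance(text);
    utterance.onend = () =>
      gain.gain.setTargetAtTime(1.0, ctx.currentTime, 0.1);    // restore the level afterwards
    speechSynthesis.speak(utterance);
  }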
14:24:48 ... Additional energy to develop the spec would be welcome. People in the group would probably be willing to do some implementation work.
14:25:14 ... If there is just me pushing for it and nobody else, that probably should not happen. It just would be a shame.
14:25:25 cpn: What could we do to change that?
14:25:42 nigel: We have a number of companies represented on the call. Come and have a chat with me!
14:26:00 ... It does not need much, but it does need some.
14:26:47 Topic: Proposal for a new Text Track interface
14:26:57 scribenick: cpn
14:28:20 Tess: Eric and I first pitched this at FOMS a couple of years ago
14:28:37 ... iterated on it, got feedback at TPAC last year
14:30:00 ... we've made a lot of progress
14:30:13 ... I'll talk about the problem we're trying to solve.
14:30:31 ... Today, if you want to deliver captions to the browser, you can use WebVTT for delivery and display,
14:30:40 ... or you can do it all yourself
14:30:58 ... A lot of people do it themselves, not using WebVTT.
14:31:18 ... There are conversion costs: large libraries of content with subtitles would be expensive to convert to VTT while preserving fidelity
14:31:45 ... Storage costs. Also YouTube does dynamic caption generation, and the current API doesn't cater to that use case
14:32:03 ... Bespoke is costly. You have to write your own renderer and maintain it.
14:32:25 ... Accessibility costs, different jurisdictions have requirements, e.g., for user customization of caption display
14:32:36 ... VTT served to the browser gets that for free, from the device
14:32:43 ... You have to manage that yourself if you roll your own
14:32:58 ... Platforms have picture-in-picture, which works with built-in caption support
14:33:29 ... Performance costs. If you rely on the browser's built-in media stack, including captions, you're more likely to get frame-accurate display of captions
14:33:48 ... [Shows Mac OS accessibility preferences page]
14:34:03 ... At TPAC 2019, we proposed to split the built-in browser feature in half
14:34:14 ... to decouple the native display code from the VTT parser and processor
14:34:38 ... We came up with an abstract JSON model: generate a JSON blob and hand it to the browser
14:34:55 ... Should be straightforward to generate from common caption formats
14:35:00 present+ Kaz
14:35:03 ... [Shows JSON example]
14:35:15 ... It looks like a serialization of HTML in JSON.
14:35:23 ... Feedback was why not just use HTML?
14:35:33 ... Why not hand it a document fragment?
14:35:50 ... If you're already rolling your own caption support, you have code that generates HTML
14:36:05 ... This could be a cleaner way to hook into your existing code, and act as a polyfill for older browsers
14:36:46 Eric: We decided that we could take what we have in the spec now, and make some minor modifications to get where we want to go
14:37:06 ... In the current spec, TextTrackCue doesn't have a constructor.
14:37:35 ... So we defined a constructor taking start and end time, and a document fragment (rather than text, per the VTTCue constructor)
14:37:54 ... We move getCueAsHTML() from VTTCue down to the base class
14:38:10 q+
14:38:10 ... We want to make changes to HTML, CSS Pseudo-Elements, and WebVTT
14:38:57 ... We don't think it makes sense to allow anything in the document fragment, so we define some processing requirements for what's actually allowed
14:39:48 ... For the UA to apply the styles from the user's settings, the author needs to be able to identify the parts of the document fragment that represent the text of the cue and the cue background, so you can style these differently
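A minimal sketch of what the proposed constructor could look like in use, based only on the description above (start time, end time, and a DocumentFragment instead of VTT text). The interface is still a proposal, so treat the exact shape as an assumption; the cue content and times are illustrative.

  // Sketch of the proposed TextTrackCue constructor (not yet standardized):
  // start/end times plus a DocumentFragment, instead of a VTT text string.
  const video = document.querySelector('video');
  const track = video.addTextTrack('captions', 'English', 'en');
  track.mode = 'showing';

  const fragment = document.createDocumentFragment();
  const span = document.createElement('span');
  span.textContent = 'Hello, world';
  fragment.append(span);

  // Proposed: new TextTrackCue(startTime, endTime, fragment)
  const cue = new TextTrackCue(5.0, 9.5, fragment);
  track.addCue(cue);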
14:39:51 ack Joey_Parrish
14:40:16 Joey: Shaka player supports some smart TV platforms from before VTTCue, where TextTrackCue had a constructor
14:40:35 ... How would we be able to detect the new vs the very old TextTrackCue?
14:40:50 Tess: You could check for getCueAsHTML on the prototype, yes
14:41:10 Eric: We want to move the pseudo-elements defined in the WebVTT CSS extensions to CSS Pseudo-Elements
14:41:29 ... In WebVTT we need to make some minor changes - moving from derived class to base class
14:41:48 ... We're proposing a very limited subset of HTML to be allowed in the document fragment
14:42:12 ... Needs discussion. In the WebKit prototype, we allow br, div, img, p, rb, rt, rtc, ruby, and span elements
14:42:26 ... These are styled with the ::cue and ::cue-region pseudo-elements
14:42:52 ... We recognize the cue and cuebackground attributes
14:43:08 ... It's inserted in the shadow DOM under the video element, so it's not accessible to script
14:43:27 q+ to ask about the derivation of the semantic model equivalence in the HTML - what's a "subtitle" etc?
14:43:32 ... This is an extension to the web API we have now to try to make a more flexible arrangement for captions and subtitles
14:43:48 ... We can do it with small updates to HTML, CSS, WebVTT
14:44:04 q?
14:44:04 q+ to "ask about conversion costs: are they less with this approach"
14:44:44 Nigel: This is really exciting. The end result looks good, interesting approach
14:45:16 ... There seems to be a core of this, which is a semantic equivalence between concepts in HTML and things you want to customize via system settings
14:46:04 ... Whenever the subject of "what is a subtitle" comes up, we get different answers. What is the element to which it's reasonable to apply system level styling?
14:47:06 Eric: The system settings let a user customize the display in terms of the text: color, font, outline, etc, and the background around the text
14:47:12 ... This is based on requirements we have
14:47:28 ... We looked at the requirements and settings and tried to come up with the simplest model we could
14:48:14 ... It worked to define two different parts of the cue. Because it's based on tags in the document fragment, there's a lot of freedom for the author to define how they want the styles to be applied, or not be applied
14:48:55 ... We also use styles defined in CSS. When a user defines their style in the system settings, they pick a font size. That's the size to use for a cue unless the cue has a defined size
14:49:07 ... So it's an override for what's in the cue itself
14:49:33 ... The model gives flexibility to allow the user to define their needs, and to the author
14:50:09 Tess: Part of the goal is to make authoring as simple as we can, while making it adapt the display to regulatory requirements, and giving flexibility to achieve a desired layout
14:50:19 ... This seems to be the smallest change to HTML to achieve that
14:50:45 ... You have the ability to have your own styles, complex interplay. We think this is the right sweet spot.
14:51:23 Nigel: The change to HTML is to add cue and cuebackground attributes. Do these tell the customization mechanism where the cue and background are?
14:51:29 Eric: Exactly
14:52:01 ... We have a demo, but it doesn't work right now. We took some existing BBC videos where they currently render captions themselves
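To illustrate the attributes and the feature detection discussed above, a rough sketch follows. The attribute placement (cuebackground on a container, cue on the text element) is an assumption based on the description in the minutes, not spec text, and the detection test simply follows Tess's suggestion of checking for getCueAsHTML on the prototype.

  // Sketch only: a fragment using the proposed cuebackground/cue attributes so the UA
  // can tell which part is the cue background and which is the cue text.
  function makeCueFragment(html) {
    const template = document.createElement('template');
    template.innerHTML = html;            // limited to the allowed element subset
    return template.content;              // a DocumentFragment
  }

  const fragment = makeCueFragment(
    '<div cuebackground><span cue>Hello, world</span></div>'
  );

  // Distinguishing the proposed constructor from legacy TextTrackCue constructors
  // (e.g., on older smart TV platforms), as suggested in the discussion above:
  const supportsNewCue =
    typeof TextTrackCue === 'function' && 'getCueAsHTML' in TextTrackCue.prototype;

  if (supportsNewCue) {
    const track = document.querySelector('video').addTextTrack('captions');
    track.mode = 'showing';
    track.addCue(new TextTrackCue(5.0, 9.5, fragment));
  }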
14:52:30 ... By injecting scripting, I could override their JS-based rendering. I looked at the structure of the markup they use and modified it slightly by adding these attributes
14:52:59 ... Their polyfill uses a VTTCue, but I could use the new API and make the cues look the same as they do now
14:53:32 ... When rendered natively, when I change my local caption preferences, those are reflected in the way they're rendered, just by adding those attributes
14:53:34 q?
14:54:11 Nigel: How does it identify if a background is set? Every element has a default background colour, even if it's the default. How do you detect that?
14:55:15 Eric: We just use CSS, the fragment is in the Shadow DOM, but it behaves as if it were anywhere else in the DOM
14:55:28 Nigel: How do you know if there's something you mustn't override?
14:55:57 Eric: We generate a UA stylesheet based on the user settings. If the user has specified a precedence, we make it the most important rule and let CSS handle it
14:56:13 Tess: The UA !important always wins, so this is how it works
14:56:18 ack nigel
14:56:18 nigel, you wanted to ask about the derivation of the semantic model equivalence in the HTML - what's a "subtitle" etc?
14:56:43 ack Matt_Wolenetz
14:56:43 Matt_Wolenetz, you wanted to "ask about conversion costs: are they less with this approach"
14:56:48 Matt: I'd like to know more context around how document fragments would be easier to convert from legacy subtitle formats vs using VTT cues
14:57:29 Tess: We're looking at the existing ecosystem. People already have code that does this, generating HTML code to add to the page
14:58:03 ... It's a small change for existing libraries: instead of creating a VTTCue, generate the HTML and put it in the TextTrackCue constructor
14:58:25 ... There are caption formats that don't translate with full fidelity to VTT, so it's lossy. In those cases, conversion to HTML can be lossless
14:58:38 q+ to hear Tess and Eric's thoughts about second screen use cases when using HTML as an input
14:58:38 +1
14:59:32 Tess: Even if conversion can be lossless, there are also the storage costs for pre-converting these things. Either you convert on the fly, duplicating effort over time, or you double the storage needs
14:59:45 ... The storage costs can be prohibitive for some services
15:00:09 Matt: If there were a full fidelity conversion using VTTCue, would the use case be solved similarly?
15:00:27 Eric: The issue is that existing polyfills work by converting from a native format to a document fragment
15:00:36 sorry, but I have a conflicting joint meeting next (i18n+CSS). Let me follow the TTWG+Media joint meeting from the minutes later.
15:00:37 ... Then they make a VTTCue that's only used for timing purposes
15:00:53 ... When the cue events fire, they take the document fragment and insert it into the DOM and remove it from the DOM
15:01:25 ... In order for such a polyfill to switch to VTT, they'd have to write new code to generate VTT instead of a document fragment. So this seemed a better impedance match
15:01:47 Matt: I wondered if this would help with the storage and conversion problem.
15:01:49 q?
15:02:02 ack mounir
15:02:02 mounir, you wanted to hear Tess and Eric's thoughts about second screen use cases when using HTML as an input
15:02:34 Mounir: At TPAC two years ago, Mark Foltz and I met with people at the time. The use case is second screens.
15:02:46 ... With a simple format we could pass it with the video to a second screen.
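For context, a sketch of the existing polyfill pattern Eric describes: a VTTCue that carries no text and is only used so the browser fires enter/exit events at the right times, with the author-rendered markup added and removed on those events. The overlay element and payload are illustrative names, not taken from any particular library.

  // Sketch of today's common polyfill pattern (timing-only VTTCue plus DOM overlay).
  const video = document.querySelector('video');
  const track = video.addTextTrack('metadata');
  track.mode = 'hidden';                                        // fire cue events without native rendering
  const overlay = document.getElementById('caption-overlay');   // illustrative overlay element

  function addPolyfillCaption(start, end, html) {
    const cue = new VTTCue(start, end, '');                     // no VTT text, timing only
    cue.onenter = () => { overlay.innerHTML = html; };          // insert author-rendered markup
    cue.onexit = () => { overlay.innerHTML = ''; };             // remove it again
    track.addCue(cue);
  }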
15:02:51 are the slides posted somewhere
15:03:04 ... HTML makes it more difficult to do that. Have you considered this?
15:03:34 Eric: Would getCueAsHTML() not work?
15:04:09 Mounir: Having HTML as input means that in second screen scenarios (e.g., cast devices), the cast device doesn't know about HTML
15:04:41 ... Previously it would be easier to handle the JSON. Requiring the second screen to render HTML could be an issue as some screens cannot do that
15:05:19 Eric: It's a good question. There'll be trade-offs with any solution. One (not great) option is to render locally and send over an image. You could also convert it to some intermediate representation.
15:05:43 ... It seems to us that most of the uses won't require that simplification, it's a tradeoff that made sense in the big picture
15:06:25 Chris: How to continue the discussion?
15:06:27 igarashi has joined #me
15:06:46 Tess: We have an explainer, we can draft some spec language
15:07:07 ... That will help with clarifications. We could take the document to WICG to start with, if it matures we could bring it to the Media WG
15:07:45 Tess: One thing I like about this is that it's refactoring existing spec text, not lots of new spec text
15:08:15 ... I'm hoping the actual amount of spec changes needed is minimal
15:08:59 Nigel: There's also the Text Track CG.
15:09:09 q?
15:09:12 scribenick: tidoust
15:09:23 Topic: DataCue API proposal
15:10:21 cpn: This is an update on the WICG activity that we have around the DataCue API. It is an API proposal to allow apps to create timed metadata cues to be triggered on the media timeline.
15:10:30 ... It is consistent with the existing HTML TextTrack API.
15:10:43 ... This is also a proposal to expose in-band metadata tracks.
15:11:06 i/... It's an exchange format for audio script/-> https://w3c.github.io/adpt/ ADPT specification
15:11:16 ... We've been collecting use cases for this. The most straightforward is lecture recording with a slideshow where the timed metadata cues update the slides in sync with the video.
15:11:36 ... Also video synchronized with a map, which Rob has been working on (WebVMT).
15:11:54 i/... All of the features that we're using here are in TTML2/-> https://bbc.github.io/Adhere/ Adhere demonstrator of client side AD with ADPT
15:12:09 ... Then there's client-side dynamic content insertion where you may want to trigger some video overlay at some point in time, and the timed cues tell you when.
15:12:24 ... Playback metrics reporting is another use case to track how far playback has progressed.
15:12:43 ... Video playback with overlays, such as in sport events.
15:13:14 ... Also live linear programme events (BBC has "now" and "next" for instance)
15:13:34 ... Historically, the DataCue API was in HTML, implemented in WebKit, and then dropped from HTML.
15:13:53 JerNoble has joined #me
15:13:54 ... More recently, some interest to surface in-band tracks in MSE.
15:14:16 ... Strong interest from external communities such as CTA WAVE and DASH-IF to surface in-band tracks.
15:14:22 ... and in particular emsg boxes.
15:14:45 ... This was brought to the Media & Entertainment IG in 2018. Since then, we set up a WICG project to develop the incubation.
15:15:32 ... For the in-band timed metadata cues, the primary feedback that we got is interest for emsg boxes. But there are other formats as well.
15:16:16 ... One of the goals is to reduce scalability issues for distributors by allowing metadata to be distributed along with the audio/video, rather than having to scale servers.
15:16:46 ... It also makes it easier to integrate with the usual streaming server pipeline.
15:17:00 ... Additionally, apps may want to generate their own timed metadata cues.
15:17:24 ... What we haven't discussed is having some kind of API that can be used to surface in-band caption formats such as CTA 608/708.
15:17:52 ... There's some speculative text about how you might go about that, but that's not something that we have actually discussed in the incubation group.
15:18:06 ... If anybody's interested in that aspect, we'd be interested to discuss.
15:18:48 ... We're proposing this as a browser API because, as we talked to companies developing for embedded devices, they want to reduce the amount of parsing that they have to do in JS.
15:19:21 ... Having to do some extra work to extract segments duplicates some of the work that the user agent is already doing. Can we rely on the user agent instead?
15:20:14 ... And then there's the other argument around developer ergonomics. We heard this in the previous discussion with VTTCue. Since it has a constructor, you can use it to create your own timed metadata cues, using serialization/deserialization. But that's a workaround.
15:20:42 ... Three kinds of cues:
15:21:08 ... 1. Instantaneous: start and end time are equal. There is a slight issue right now where cues of that type may be missed.
15:21:46 ... 2. Cues with known start time and end time. Typically how captions work.
15:22:11 ... 3. Also cues with a known start time but an unknown duration, which may become known at a later point in time, or remain active indefinitely.
15:22:35 ... Example of a video with a map track, or captions in a live stream where you don't necessarily know when it's going to end.
15:22:48 ... In terms of the timing aspects, these are the 3 types we're looking at.
15:23:03 ... The proposal is to re-introduce the DataCue API, based on the WebKit implementation with one minor modification.
15:24:05 ... The actual data that is carried in the DataCue is in a value attribute, with a type field that tells you what type of value it is. This is useful in particular for the in-band case where the application will have to look at this.
15:24:43 ... The data field that used to exist seems no longer needed. Discussion is open on whether we deprecate/remove it.
15:24:59 ... Unrestricted end time to account for the type 3 cues mentioned above.
15:25:24 ... We have a PR open for review against the HTML spec related to that. I'd be interested in feedback on how best to move that forward.
15:26:12 ... Specific proposal, assuming that there is implementor support for it: standardize DataCue, probably in the Media WG once it's ready to transition out of incubation.
15:26:22 ... I already mentioned the unbounded cue duration.
15:26:23 https://github.com/whatwg/html/pull/5953
15:27:03 ... There was a previous change to HTML that we made to recommend firing cue events to within 20ms of the media timeline. My understanding is that Chrome has been working on that. Don't know about the exact current status.
15:27:19 Unbounded TextTrackCue pull request
15:28:02 ... And then we have a proposal to define a standardized mapping to media in-band events, in particular emsg boxes in ISO BMFF / CMAF. We're working on this in collaboration with DASH-IF, they have a processing model around such events.
15:29:12 ... The "Sourcing In-band Media Resource Tracks from Media Containers into HTML" document is referenced from HTML, but it is not being maintained and does not really have a natural home. In a way, that would be the logical place to document the standardized mapping, but it's not clear whether what it contains already is actually implemented.
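A sketch of the proposed DataCue shape as described above: the payload in a value attribute plus a type field, and an unrestricted end time for cues whose duration is unknown. The interface is still in incubation, so the constructor arguments are an assumption based on the WICG proposal and the WebKit implementation it builds on; the type strings and payloads are purely illustrative.

  // Sketch only: app-created timed metadata using the proposed DataCue.
  // Today this is often worked around by serializing JSON into a VTTCue's text.
  const video = document.querySelector('video');
  const track = video.addTextTrack('metadata');
  track.mode = 'hidden';

  // Assumed proposed shape: new DataCue(startTime, endTime, value, type)
  const slideCue = new DataCue(30.0, 60.0, { slideUrl: 'slide2.png' }, 'org.example.slides');
  track.addCue(slideCue);

  // A cue with a known start time but unknown duration (e.g., a live "now" programme event):
  const nowCue = new DataCue(120.0, Infinity, { title: 'News at Ten' }, 'org.example.programme');
  track.addCue(nowCue);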
15:29:24 ... Get to something that could be referenced more normatively?
15:29:39 ... I talked about short duration cues that could be missed.
15:30:09 ... The solution to that is to use cue enter/exit event handlers. This is fine when the app creates the cues. It's much more difficult when the user agent generates the cues.
15:30:24 ... And it's not clear how the application can be aware of new cues.
15:30:48 ... Then there needs to be some work about what we call on-receive processing for emsg cues.
15:31:28 ... If the app needs to load some resources linked to a cue, that suggests that there are two steps: a preparation phase and then a phase where the cue actually affects the page.
15:31:42 ... Early exposure would be good.
15:31:51 ... The group is looking at options for ways to handle that.
15:32:32 ... One is an event when the user agent parses a cue. Consistent with existing media libraries such as Shaka player.
15:32:57 ... Another would be to add an early dispatch hint that would make the cue fire a bit in advance.
15:33:05 ... That's all I wanted to present.
15:33:09 q+ to "discuss integration with MSE specification of DataCue API. Separately, Chrome has web_test evaluating 20 ms cue firing expectation now IIUC"
15:33:21 ack Matt_Wolenetz
15:33:21 Matt_Wolenetz, you wanted to "discuss integration with MSE specification of DataCue API. Separately, Chrome has web_test evaluating 20 ms cue firing expectation now IIUC"
15:33:26 JerNoble has joined #me
15:33:57 atai1 has joined #me
15:33:58 Matt_Wolenetz: I wasn't at TPAC last year. Not fully up to speed on this. Interested in the intersection with MSE.
15:34:39 ... For the emsg extraction, what MSE implementations might need to do is to populate cues from in-band messages; how would it determine the start and end times of emsg cues?
15:35:23 ... [going into details of earliest presentation time and timeline]
15:36:05 ... At what point do we know, interoperably, that we don't need to parse an emsg box because the time is already past?
15:36:47 ... Should we support both versions of emsg boxes?
15:37:06 ... I'm having trouble understanding how times work with movie times.
15:37:23 cpn: In terms of the versioning, both 0 and 1 would be of interest, I believe.
15:37:32 q+ on the timeline mapping piece
15:37:37 ... Timeline mapping varies between the two versions.
15:37:43 ack zacharycava
15:37:43 zacharycava, you wanted to comment on the timeline mapping piece
15:38:29 zacharycava: You're right, we're allowing messages before the timestamp gets defined. We should remove that.
15:38:59 ... Some association between the message and the chunk seems doable.
15:39:36 cpn: I think it's fair to say that we've been looking more at the message payload, but we haven't looked so much into how the timing aspects would need to work.
15:40:12 Matt_Wolenetz: The specification for MSE gives a lot of flexibility to implementations about how and when they parse things and start.
15:40:44 ... There might be interoperability issues depending on how and when we expose parsed cues
15:41:14 ... Are we in agreement about a potential MSE integration with emsg timestamp offset?
15:41:21 q+
15:41:49 ... If you're shifting audio/video by 10s, should we shift the emsg start time by 10s as well?
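As a rough sketch of the consumption side discussed above: for app-created cues, enter/exit handlers can be attached when the cue is constructed, but for cues generated by the user agent the app first has to notice the new track and its cues. The events used below (addtrack, cuechange) exist today; whether and how a UA would surface emsg cues on such a track is exactly the open question in the incubation, and the zero-duration limitation is the issue noted above.

  // Sketch only: observing a UA-exposed in-band metadata track and its cues.
  // Note: cuechange + activeCues can miss instantaneous (zero-duration) cues.
  const video = document.querySelector('video');
  video.textTracks.addEventListener('addtrack', (event) => {
    const track = event.track;
    if (track.kind !== 'metadata') return;
    track.mode = 'hidden';                       // enable cue events without rendering
    track.addEventListener('cuechange', () => {
      for (const cue of track.activeCues) {
        console.log('active cue', cue.startTime, cue.endTime);
      }
    });
  });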
15:43:03 ... My assumption would be to make it more consistent with MSE, but that might come as a surprise to applications with legacy content that wouldn't expect things to happen this way.
15:43:12 ... Feature detection is another big question here.
15:43:48 q+
15:43:49 ... All of the various types of cues. How can I understand which types are supported? Is there a registry? Is it arbitrary?
15:44:27 cpn: I know that DASH-IF is collecting info on the different types that are in use. Some of that is application-specific.
15:45:06 ... We would not necessarily expect user agents to understand the message formats. The user agent could hand over the unparsed message and the application would do the parsing.
15:45:29 ... Mounir was wary about adding another layer of parsing last year, which could trigger interoperability issues.
15:45:53 ... The user agent wouldn't need to parse the message type, and could just pass the message over to the application.
15:46:30 Matt_Wolenetz: In the proposal, there is a sample where the message is filtered by the UA. That presumes that there is some specification for the UA to do that.
15:46:44 cpn: I think that part may need updating. Part of the thought process we went through.
15:47:03 qq?
15:47:04 q?
15:47:08 ack JerNoble
15:47:27 JerNoble: Two cents about Matt's questions. Yes, you would want to time shift cues.
15:47:41 ... The ad insertion use case would require you to match the media timestamps.
15:48:39 ... The WebKit DataCue implementation does not support emsg boxes. It primarily comes from HLS. Different characteristics and different payload.
15:48:54 ... Just wanted to point out particularities of the WebKit implementation.
15:49:14 zacharycava: I second Jer's comment on shifting times.
15:50:29 ... I also wanted to provide context. At Hulu, we do actually implement extensions on top of MSE/EME. Something like DataCue is what we explored previously just so that we could expose in-band data without requiring JS code to handle the parsing. Very small payload for memory on low-memory devices.
15:50:41 ... I just want to say that there is a lot of practicality in this.
15:51:00 ... Especially in constrained environments.
15:51:02 q?
15:51:05 ack qach
15:51:08 q+
15:51:16 q- zacharycava
15:51:17 ack Matt_Wolenetz
15:52:31 Matt_Wolenetz: I would like to know where I can discuss feature detection. What DataCue types might need support? HLS-produced ones? ID3? emsg boxes? We might need something like "canPlayType". I imagine that this would be needed.
15:53:13 JerNoble: I'm entertained by the idea of exposing some feature detection mechanism for DataCue.
15:53:52 cpn: OK, thanks for the feedback, Matt. We'll be looking into that in the incubation group. I'd like to invite you to participate there.
15:54:21 Topic: TextTrack support in MSE
15:54:36 cpn: Anything that you could share about TextTrack support in MSE?
15:55:15 Matt_Wolenetz: [audio chopped]. We removed experimental support for TextTrack because of issues.
15:55:38 q+
15:55:55 ... Many sites have already developed their own out-of-band solutions for presenting the tracks. This is not necessarily a justification for not including it.
15:55:58 ack cyril
15:56:19 cyril: I'm wondering how this would interact with the new TextTrackCue proposal?
15:56:38 q+
15:56:57 ... I could imagine MSE handing over the unparsed cue to the app, the app doing the parsing, creating the DocumentFragment, and then the UA would render.
15:57:00 +1
15:57:00 ... Is that acceptable?
15:57:08 ack JerNoble
15:57:58 JerNoble: I don't know what the benefit is of using MSE instead of fetching raw text and doing the parsing on the side.
15:58:31 q+
15:58:36 q+
15:58:45 cyril: CMAF supports in-band tracks. Supporting this is beneficial. The new TextTrackCue proposal is also made to make the UA avoid doing the parsing.
15:59:06 JerNoble: Is your expectation that the UA would parse the TTML?
15:59:20 cyril: I don't think that the UA would have to parse the TTML at all.
15:59:32 JerNoble: Ah, got it, you're talking about TTML in MP4. I see.
15:59:37 ack Will_Law
15:59:59 Will_Law: We want MSE to do all the component parsing. That's the architecture we'd like to see.
16:00:03 q- nigel
16:00:53 cpn: Thank you everybody for joining today. The DataCue incubation continues in WICG. I may reach out to people directly. Thanks for the updates on TextTrackCue, and thank you Nigel for the part on Audio Description!
16:01:09 ... That's it for today. More meetings coming up next week, please check the schedule!
16:01:13 RRSAgent, draft minutes v2
16:01:13 I have made the request to generate https://www.w3.org/2020/10/15-me-minutes.html tidoust
16:08:43 rrsagent, make log public
16:09:52 present+ Rob_Smith
16:10:13 present+ Barbara_Hochgesang
16:10:23 present+ Zachary_Cava
16:10:34 present+ Jim_Helman
16:10:50 present+ Peng_Liu
16:11:05 present+ Philippe_Le_Hegaret
16:11:19 present+ Ali_Begen
16:11:42 present+ Pierre-Anthony_Lemieux
16:11:49 Chair: Chris
16:13:40 rrsagent, draft minutes
16:13:40 I have made the request to generate https://www.w3.org/2020/10/15-me-minutes.html cpn
16:39:46 rrsagent, draft minutes v2
16:39:46 I have made the request to generate https://www.w3.org/2020/10/15-me-minutes.html cpn
17:07:39 atai1 has left #me
21:09:13 atsushi_ has joined #me
21:16:11 Karen has joined #me
21:18:28 Karen has joined #me
23:44:00 Zakim has left #me