Media and Entertainment IG f2f meeting at TPAC 2019 in Fukuoka -- 16 Sep 2019

<kaz> scribenick: kaz

Welcome and introduction

Chris: Welcome to the M&E IG F2F.
... This is a one-day meeting
... [W3C Code of Ethics]
... [Media&Entertainment IG: Mission]
... We apply Web technology in general to media services.
... Importantly, we gather new use cases / requirements to drive the development of media on the web.
... The goal is to make the web a great platform for media of all kinds, interactive media experiences.
... [History of Major Initiatives]
... Group was formed in 2011, had influence in development of media support in HTML5, MSE, EME, one of our significant achievements.
... Now looking towards MSE and EME v.Next.
... [Charter]
... Scope covers almost everything related to media, the entire pipeline from production, distribution, consumption.
... Continuous experiences, mainly audio and video related.
... Increasingly we're seeing more interactive media, e.g., a recent workshop on games on the web.
... [Tasks]
... We identify requirements, start incubations to address those needs, we've run numerous task forces over the years, with success in developing features in the web platform.
... Another aspect is to review and give input to media-related deliverables from other media related groups. There's now a new Media WG, clearly we want to have a relationship with them.
... We coordinate with other media-related groups, with liaisons e.g., MPEG, HbbTV, etc.
... Internationalization and accessibility, etc., are very important.
... [Work Flow]

... New ideas or issues can be raised by members or via liaisons from SDOs.
... We typically set up a task force to focus on the issue: use cases, requirements and gap analysis.
... We're not chartered to write specifications, because this is an IG.
... The outputs could be a bug report to an existing spec, to promote a new feature for another WG, or a new feature in a CG that then gets standardised in a WG.
... With the current process, we're encouraged to work more at WICG, which is a focus for new feature development on the web platform. If the incubation is successful, it would transition to a WG.
... [WICG]
... There's a Discourse forum, where you can post issues or send ideas and get feedback from the industry and from implementers.
... Once your idea has enough implementer support, you can get a GitHub repo for your proposed feature.
... You'd typically write an explainer that describes the feature, the goals, the user need.
... [Contributing to HTML and DOM]
... We're interested in the <audio> and <video> elements, so under the WHATWG / W3C agreement, we would propose changes to these at WHATWG, or with help from the W3C HTML WG.
... [Task Forces]
... We've run a lot of task forces over the years, significantly the Media Pipeline TF which contributed to MSE and EME.
... We have two task forces now: the Media Timed Events TF, which is active, and the Cloud Browser API TF, which is dormant.

remote+ Kazuhiro_Hoya, Lei_Zhai

Chris: [Monthly conference calls]
... We've been running these for the last couple of years, various topics, looking at what to do next, reviewing media specifications. Reviewing the Media WG specs has generated some useful input.
... Also want to be looking ahead for new media related features.
... [Activities for 2020]
... We're entering a new phase of development of media support on the web, lots of topics in the Media WG.
... A couple of these are potential candidates for adoption into the WG. The IG should be looking closely, to ensure they meet our needs.
... [Activities for 2020 (cont)]
... One of the purposes of today's meeting is to set our priorities for next year.
... Open question: what are the main things to address, new features or capabilities?
... I would like to capture ideas and suggestions during the day.
... In the afternoon session, we'll have a more open discussion.
... [Schedule]
... There's a lot to cover. Does anyone have additions to the agenda?
... (none)
... [Resources]
... There are various links here, for your reference.
... We have a new co-Chair, Pierre. It's great to have you on board.

Pierre: I do consulting, and have been involved in standards since 2001 at IETF, SMPTE, ISO, and at W3C since 2011.
... I started with HTML WG, then TTWG.
... I'm the editor of IMSC and co-editor of TTML 1.0. Looking forward to helping here.
... Feel free to contact me and Chris (and Igarashi-san).

Chris: It's important to mention that Mark Vickers is in the process of stepping down as a co-chair

remote+ Mark_Vickers

Chris: Mark has been instrumental to the success of this group,
... and getting the Web as the platform for media as it is today, not just in the IG but also the HTML Media WG.
... Mark will continue to participate in the MEIG as an IE, supporting the co-chairs.

Mark_Vickers: I have loved working in this group, as one of the founding co-chairs.
... We've accomplished a lot, and there's more to do. I plan to stay involved as an Invited Expert.
... This group represents the best source of consolidated expertise in media, video and audio.
... Before HTML5 there wasn't a lot of support for media in the web and W3C, so there weren't a lot of media experts.
... We've been a source of reference for the Working Groups. What matters is making changes to the web,
... which means getting standards written at W3C, WHATWG, Khronos, and other related groups.
... We don't write the specs, but the IG is important for providing a consensus among media experts on issues,
... e.g., is a new API needed, is an API good enough, what's the priority among changes.
... And people look to this group to provide media expertise.
... That's the inward focus. The outward focus is to communicate with media companies, standards groups, regional groups not in W3C, on what W3C is doing.
... It takes a lot of work, we always need more help, but I'm glad to be able to help.
... There are three strong co-chairs.
... Pierre is coming in, representing leadership in the studio world. He's done Emmy award winning work on subtitles.
... I'm happy with how things are going, and encourage people to get involved. We're starting on HTML5 media 2.0, some of the specs are getting new versions, some are new.
... This is the time to expand media support. Please call on me any time, with my new email.
... Comcast, the company I've retired from, has a new representative. He's an experienced web person, John Riviello.

John: I'm here.

Chris: It's great to have you with us, thank you Mark.

Hybridcast update

slides

Ikeo: Welcome to Japan!
... I would like to talk about Hybridcast.
... [Today's outline]
... I will talk about the current status at Hybridcast. We started Hybridcast Connect last year, and we have a demo I would like to show you today.
... It will show some of the developments with Hybridcast test and verification.
... [Deployment status]
... [History of standardization and experiments]
... This is the history of Hybridcast standardisation. In 2019 we have the Hybridcast Connect Conformance Test for the Hybridcast service.
... Hybridcast Connect is deployed on some of the TV sets. After the slides, I'll give a demo.
... [Shipment of Hybridcast receivers]
... This shows the number of receivers. We have over 10 million now. By 2023 we expect to have 29 million.
... [Trial deployment "Hybridcast-Connect"]
... What is Hybridcast Connect? This has new functions and services which we standardised in 2018.
... The API is a TV device API, for example to launch TV programmes and get programme information.
... The new APIs are experimentally implemented in some TV sets.
... We brought an example here.
... A number of companies are involved and broadcasters are developing services.
... [Typical Sequence by additional APIs]
... The API has 5 functions: MediaAvailabilityAPI, ChannelsInfoAPI, StartAITAPI to launch a broadcaster application, TaskStatusAPI to check whether the Hybridcast browser is launched or not, and ReceiverStatusAPI to get the status of the browser, the TV channel, etc.
... [Hybridcast Connect demo]
... (brings a TV set in front of the screen)
... Today, we have two demonstrations. One is for emergency alerts, which tunes to the broadcaster's news programme on the TV.
... The other is for smooth guidance of catch-up video from a mobile device. This is similar to an OTT service. Broadcasters want to let users know if there's an emergency.
... [Supposed use cases demo (1)]
... (Kaz adds webcam to webex)
... In this case the broadcaster or service provider sends a push notification. The mobile device receives the message and launches the associated app.
... In the associated app, the user can press the notification button to control the TV device and launch the TV programme.

David: Does it communicate with accessibility, screen readers on iPhone or Android, so users who can't see know that it's popped up?

Ikeo: We'd like to do that, but this is just a demo implementation
... We have tried using an audio notification, similar to the iPhone application.
... This is the NHK application. The TV powers on and tunes to the NHK programme.

Pierre: Is this the API to the TV, or to the application on the TV?

Ikeo: The pairing needs to be done before getting the notification.

Igarashi: It depends on the TV's implementation.
... It's possibly implemented by the TV platform, but we can also implement the protocol in the TV application.

Ikeo: It's kind of the DIAL protocol with a REST API to launch and get information.

Chris: Does this require a native app on the phone, rather than a web app?
... Can it use the Push Notifications API?

Yongjun: Does this introduce additional latency over the remote control?

Igarashi: As in, remote control for playback, etc.?

Ikeo: [Typical Sequence by additional APIs]
... Explains the sequence of communication
... The inter-application protocols use websocket.

Yongjun: How much latency is there in the round trips?

Ikeo: It's TV-set dependent

Igarashi: Do the protocols support all the functions of the IR remote?

Ikeo: Some specific functions are supported

Igarashi: Arrow key navigation?

Ikeo: That's implemented in the inter-application communication API. You can send left and right commands.

Igarashi: Using this, the user can control the on-screen menu.

Ikeo: We would like to implement all the functions supported by the remote control, but there are too many.
... We have a volume API, but this is a security issue for TV manufacturers,
... e.g., to avoid unexpected change of volume.
... This is local network communication; all 5 APIs are implemented based on HTTP.
... We implemented a pairing protocol, which is standardised.

Sudeep: Mobile phones have infra-red sensors that can be used to control the TV.
... Is this about tuning the TV when you're in the vicinity, or from wherever you are?
... I wonder what value this adds over IR.

Pierre: The TV has to implement this specific API.
... It doesn't work with any web enabled TV, only with those that support this specific control API.

Igarashi: TV manufacturers have to implement the APIs.

Lin_Li: How many TVs could be controlled from one mobile phone, e.g., if a family has more than one TV?

Ikeo: This depends on the TV implementation, but more than two.
... We'd like to handle more than one TV, but TV vendors say hundreds of mobiles can't be connected, as it's hard to implement, so the standard says 2 or more.

Igarashi: If the user has multiple TVs, you can select which TV from the companion device?

Ikeo: Yes, the TV is used within a local network, and the user selects which would be the best one to connect to.

David_Fazio: For emergencies, does the system notify all the cellphones available in the area, which may belong to children, or is it just tied to one person?

Ikeo: The application keeps it stored in the session information, so the user doesn't have to select the TV again.

Pierre: The emergency notification itself is not included in the protocol, it's separate.

Ikeo: Right. These 5 APIs are implemented within the device, so it's a device API not a web API.

Chris: Are you looking at developing a secure protocol?

Ikeo: We have some solution in the pairing protocol, using encryption.

Chris: I notice the Second Screen CG is working on a secure protocol between user agents, so you're aware of that.

Ikeo: Right, we'd like to look at that.
... The problem is HTTPS in the local network, there's a CG working on that.
... [Supposed use cases demo (2)]
... This is implemented as a VOD service like Netflix or Amazon Prime Video
... The application selects a program from a list, it tunes to a channel and launches the Hybridcast application.
... Using just one API.
... (select a program on his smartphone)

<MarkVickers> Q: What would the Hybridcast group like from W3C? New specs? Changes to specs?

<MarkVickers> Thx

Ikeo: Send the message to launch the HTML5 app, using MSE and DASH.js

Igarashi: Can you also control playback, pause, forward/backward?

Ikeo: It can be done using websocket.

<MarkVickers> While it's always interesting to see what other groups are doing, we have to focus on our goals to drive changes into W3C and increase adoption of W3C standards outside of W3C.

Yongjun: Are these features available to third party content providers?

Ikeo: This app is provided by the broadcaster.

Yongjun: How many customers are using these features?

Ikeo: 30% of the TV sets include this feature

Chris: Mark asked: what would the Hybridcast group like from W3C? New specs, gap analysis?

Ikeo: We would like to combine some Hybridcast APIs to W3C standards,
... e.g., playback API, messages to control the video playback
... between a mobile and a TV

Kaz: Is that kind of like the formerly proposed TV Control API?

Ikeo: Yeah...

Chris: Or something like the Remote Playback API from the Second Screen WG?

Ikeo: Getting interoperability is always a problem. We have to select web standard APIs to include in Hybridcast.
... [Conformance Test and verification]
... We make conformance tests for Hybridcast Connect, to ensure interoperability. That's really important for us.
... [Conformance Test for Hybridcast-Connect]
... IPTV Forum Japan provides the Hybridcast Connect standard, and also a test kit as a mobile app and test tool.
... This is the overview.
... (shows a diagram)
... Emulator as the test environment.

Chris: Does this cover the Web application environment?

Ikeo: It's an end-to-end test, REST APIs, similar to the tests by HbbTV, etc.
... [Service Verification by MIC project and others]
... Service Verification is provided by the MIC, Ministry of Internal Affairs and Communications from the Japanese Government.
... In 2018 19 companies were in this organization, and 23 companies in 2019.
... Hybridcast Connect is a hot topic there.
... Thank you!

Chris: What other specific things to ask from W3C to address any gaps,
... e.g., relationship with the Web Platform Tests, etc.?

Ikeo: This is a big problem, as we require some Web API functions, and TV vendors sometimes want them and sometimes do not.

Media Timed Events in Hybridcast

slides

Ikeo: [Service Patterns with Hybridcast Connect]
... We'd like to realize programs using media timed events.
... Broadcasters in Japan need to trigger messages to switch to a broadcast service.
... Pattern 1: from mobile app to broadcasting on TV
... Pattern 2: from another app on the TV to broadcasting
... [Media Timed Events with Hybridcast-Connect]
... If a user is watching a video, we can send an MTE in the video resource, this is an alternative way of pushing a notification.
... JP broadcasters are interested in media timed events (MTE).
... We'd like to send some message to control the broadcasting service. We have some data in the broadcast signal, and we'd like to realize the same function as the trigger message.
... There are two possible choices.
... (MTE data in video resource + push emergency alert notification) to the smartphone
... Another option (MTE data in video resource to another app on TV)

Chris: Is this emsg, in ISO container in the DASH format?

Ikeo: Yes, it's DASH timed metadata

Igarashi: The upper pattern can be realized using a mobile specific API.
... But what about the bottom pattern, is the TV device at the bottom same as the one on the right?

Ikeo: Yes, in the case of Android platform, the mechanism is something like intent.
... We'd like to do this for all TV sets.

Igarashi: A general question about MTE, it's unclear why you want to embed events within the video resource.
... You could use Web Notifications or Web Sockets to deliver notifications.

Ikeo: The main reason is the cost of accessing the message API from this mobile device.

Igarashi: The cost of notification servers.

Ikeo: Right, but also the accuracy, as it can be time dependent.

Igarashi: Do we share the same requirements?

Yongjun: At which layer do you carry the events, in the media fragment or the manifest?

Igarashi: The event is in the manifest embedded event?

Yongjun: It's challenging to handle embedded events.

Ikeo: It depends on the needs,
... in case of out-of-band, MTE might be written in the manifest,
... and there would be possible delay.

Igarashi: With manifest events, you can control the frequency of update of the MPD, a few seconds.

Ikeo: It's related to cost of access transfer, trade-off of accuracy and cost.
... [shows another demo on MTE]
... (select an app on his mobile)
... Send a message using hybridcast-connect to the TV
... This is embedded event
... Emergency alert shown on the upper-right of the TV

Chris: Is the event intended to be synchronized with the media?

Ikeo: This is an emergency message, the broadcaster would want to control the switch timing after receiving the message,
... and the Hybridcast application on the TV can handle how to display it.
... [Usecases of MTE]
... Switch broadcasting service from OTT triggered by emergency message.
... Superimpose time-dependent metadata, e.g., weather icon and event information.

... New style of ad-insertion on a broadcasting service.
... [MediaTimedEvents demos]
... Demo implementations
... use case 1: switch live news program on broadcasting service from OTT service by emergency-warning message
... use case 2: super-impose a weather icon on the Internet video, using in-band messages

Chris: What are the timing requirements, how precise does it need to be?

Ikeo: We would like to show the picture on the warning, and
... it's important to avoid overlaps with the content (e.g., people's faces)

Igarashi: Depends on the apps, for some apps, accuracy is not important.

Ikeo: We have to consider the accuracy of timing for many cases, that's all.

Chris: Thank you for presenting!

Ikeo: BTW, We would like to control other devices, e.g., drones, from the app on TV.
... We joined the Web of Things group. We have some demos of home appliances.

Kaz: Demos are on the 1st floor, and Wednesday breakout.

Ikeo: We'd like to use MTE as the basis and integrate with the WoT

[break till 11am]

Media Timed Events Task Force

slides

Chris: This is an update from our Media Timed Events TF.
... [Topics]
... Scope includes in-band timed metadata and timed event support, specifically DASH emsg and ID3 in HLS.
... Also out-of band timed metadata, DASH MPD events.
... Also working with Spatial Data on the Web group, on Video Map Tracks, position and orientation data alongside video, e.g., from a drone.
... They want a similar API, so this isn't just for entertainment media.
... We've also been looking at synchronization requirements, input from Nigel from a Timed Text perspective.
... We want to improve the synchronization of DOM events triggered on the media timeline, looking at Text Track API events.
... The HTML spec allows a certain amount of tolerance, which we'd want to improve.
... There's related work at MPEG, Carriage of Web Resources in ISO BMFF, delivering web resources in the media container, can synchronise these with media playback.
... Some discussion with MPEG people on this, but that hasn't progressed too far at W3C in terms of API support. Hope to continue this conversation in the IG.
... [History]
... The TF started in 2018, Giri Mandyam from Qualcomm presented work at ATSC and MPEG on eventing.
... Cyril Concolato presented on the MPEG work.
... We then created the TF, and at TPAC last year we got approval to start an incubation for DataCue.
... We published a v1 of use cases and requirements document early this year.
... I'm now having conversations offline with browser implementers about DataCue.
... [Use cases for timed metadata and in-band events]
... Some MPEG-DASH specific use cases, coming from DASH-IF input.
... One is notifications to the media player, e.g., to refresh the MPD manifest.
... Another use case for getting metrics during playback, an HTTP request to an endpoint at a given time.
... ID3 tags for providing metadata about the content, can vary over time for chapterisation: title, artist, image URLs.
... Ad insertion cues: SCTE35, SCTE214-1, 2, 3.

David_Singer: Also, this simple use case: keeping a web page in synch with a media stream,
... a video of a presentation with synchronized slide deck.

Chris: Yes, this use case is in the explainer.

Pierre: The earlier presentation was about events not related to the media content. Is that covered here?

Chris: We don't cover the case of events in a multiplexed set of streams, as in broadcast.

Pierre: Are the cues tied to entire programme, or specific tracks within?
... What if someone removes one of the tracks?

Chris: These events relate to the entire program.

Pierre: Can remove part of the program and it's still relevant?

Chris: Right, but maybe not explicit in our document.
... [Recommendations]
... We want to allow web application to subscribe to event streams.
... Should that be per event type, or for all available events?
... Maybe some optimisation concern, can avoid doing the parsing if there's nothing subscribed.
... Something we can discuss as we move to implementation.
... We want to allow web applications to create timed event/timed metadata cues, so not just about the UA surfacing in-band events from the media.
... Including start time, end time and data payload
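
A minimal TypeScript sketch of the subscription idea described above, assuming per-event-type subscription. Every name here (TimedCue, CueRegistry, subscribe, hasSubscribers, deliver) is hypothetical and is not taken from DataCue or any other specification; it only illustrates how a parser could skip work for event types that nobody subscribed to, and how one cue shape (start time, end time, data payload) could serve both in-band and application-created cues.

```typescript
// Hypothetical names throughout; this is not the DataCue API.
interface TimedCue {
  type: string;      // event type, e.g. a scheme identifier
  startTime: number; // seconds on the media timeline
  endTime: number;   // seconds; Infinity when the end is not yet known
  data: unknown;     // payload, left opaque in this sketch
}

type CueHandler = (cue: TimedCue) => void;

class CueRegistry {
  private handlers = new Map<string, Set<CueHandler>>();

  // Registers a handler for one event type and returns an unsubscribe function.
  subscribe(type: string, handler: CueHandler): () => void {
    const existing = this.handlers.get(type);
    const set = existing || new Set<CueHandler>();
    if (!existing) this.handlers.set(type, set);
    set.add(handler);
    return () => {
      set.delete(handler);
      if (set.size === 0 && this.handlers.get(type) === set) {
        this.handlers.delete(type);
      }
    };
  }

  // A parser can call this first and skip decoding when it returns false.
  hasSubscribers(type: string): boolean {
    return this.handlers.has(type);
  }

  // Delivers a cue, whether surfaced from the media or created by the application.
  // Returns the number of handlers that were called.
  deliver(cue: TimedCue): number {
    const set = this.handlers.get(cue.type);
    if (!set) return 0;
    set.forEach((handler) => handler(cue));
    return set.size;
  }
}
```

Subscribing to all available events instead would amount to a single handler set with no type key, at the cost of losing the cheap hasSubscribers check per type.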

Igarashi: In the morning, we had some discussion on in-band message, and MPD events.
... W3C standards don't support type 1 players, so why do we need support for MPD events?
... Is the scope only in-band events?

Chris: There are some implementations that have type 1 players, e.g, HbbTV, which has specified DataCue API for exposing MPD events.
... W3C specs don't say anything about type 1 players.
... Regarding event triggering, use cases from DASH-IF suggest the need to see the events when cues are parsed from the media container by the UA,
... ahead of the trigger time on the media timeline, so that the application can prepare or fetch any resources needed.
... Another recommendation is support for DASH emsg. Also, we want to allow cues with unknown end time, e.g., for live content where a chapter starts but you don't know where it ends. This also is needed in the WebVMT use case.
... And finally, improving synchronization (within 20 msec on media timeline), related to time marches on in HTML.
... Also important for timed text, where aligning captions to shot changes is important.
... Ultimate goal being a frame accurate rendering capability.

David_Singer: Regarding event triggering, what you want is that when time changes such that the set of active cues changes, you get told about those changes.
... That covers the case where you join a live stream after the start of a cue, and random seeking, trick modes, etc.
... This is why cues with duration are easier to understand than events at a specific time, e.g., what happens if "time marches on" jumps over that time?
... It's hard to answer those questions for spike events.
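
The point about active cue sets can be sketched in TypeScript as follows. This is a simplified model under stated assumptions, not the HTML "time marches on" algorithm: a cue is treated as active when startTime <= t < endTime, an unknown end time is represented as Infinity, and all names (TimelineCue, activeAt, diffActive) are hypothetical.

```typescript
interface TimelineCue {
  id: string;
  startTime: number; // seconds
  endTime: number;   // seconds; Infinity for a cue whose end is not yet known
}

// Returns the ids of cues active at time t (startTime <= t < endTime).
function activeAt(cues: TimelineCue[], t: number): Set<string> {
  const active = new Set<string>();
  cues.forEach((c) => {
    if (c.startTime <= t && t < c.endTime) active.add(c.id);
  });
  return active;
}

// Compares the active sets before and after a time change. previousTime is
// null when playback is just starting, e.g. when joining a live stream.
function diffActive(cues: TimelineCue[], previousTime: number | null, currentTime: number) {
  const before = previousTime === null ? new Set<string>() : activeAt(cues, previousTime);
  const after = activeAt(cues, currentTime);
  const entered: string[] = [];
  const exited: string[] = [];
  after.forEach((id) => { if (!before.has(id)) entered.push(id); });
  before.forEach((id) => { if (!after.has(id)) exited.push(id); });
  return { entered, exited };
}

const exampleCues: TimelineCue[] = [
  { id: "chapter", startTime: 10, endTime: 20 },
  { id: "spike", startTime: 15, endTime: 15 },     // zero-duration event
  { id: "live", startTime: 30, endTime: Infinity } // end not yet known
];

diffActive(exampleCues, null, 35); // entered: ["live"], exited: []
diffActive(exampleCues, 12, 16);   // entered: [], exited: []  (the spike is never seen)
diffActive(exampleCues, 16, 25);   // entered: [], exited: ["chapter"]
```

In this model the zero-duration "spike" cue is never active, so a jump from 12 to 16 reports nothing for it, which is the kind of question that is hard to answer for events at a single instant.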

Chris: Some of the DASH events are a bit like that, e.g., refresh the manifest at a specific time.
... But you're right, I agree.

Igarashi: A requirement might be that an application cannot detect the difference between the expected and actual time of event firing,
... the delay depends on the implementation. Applications would know about the difference between specified timing and actual fired event.
... I agree we need to improve the timing (if possible), but maybe adding timestamps will also help.
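
As a rough illustration of the timestamp idea, the TypeScript sketch below computes how late a cue handler ran relative to the cue's start time. It assumes the application can read the media time at the moment the handler is dispatched and that the playback rate is positive; the names (FiredCue, firingDelayMs) are hypothetical and do not come from any specification.

```typescript
interface FiredCue {
  startTime: number; // seconds on the media timeline
}

// Returns the firing delay in wall-clock milliseconds.
// A positive result means the handler ran after the cue's start time.
// Assumes playbackRate > 0.
function firingDelayMs(cue: FiredCue, mediaTimeAtDispatch: number, playbackRate = 1): number {
  const mediaSecondsLate = mediaTimeAtDispatch - cue.startTime;
  return (mediaSecondsLate / playbackRate) * 1000;
}

// A cue at 12 s whose handler runs at media time 12.18 s is about 180 ms late
// at normal speed, well outside a 20 ms target.
const delay = firingDelayMs({ startTime: 12 }, 12.18);
```

With such a value available, an application could compensate for late delivery, for example by adjusting an animation that was meant to start at the cue's start time.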

Chris: [Current status]
... The use case and requirements document is almost complete, add some things based on today's discussion.
... The WICG DataCue explainer is an early draft, needs work.
... The API spec not started yet. Need more involvement from implementers.

Yongjun: Need to revise DASH or HLS spec, etc.?

Chris: I think these don't need to change. We need to have discussion about how to expose certain message cue types, e.g., ad insertions.
... Do we ask the UA to give structured data, or as raw binary data?
... So there's a question about how the different formats should be mapped to data structures for the web app.
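
To make the "structured data or raw binary" question concrete, here is a TypeScript sketch of what a web application has to do when it is handed raw bytes for a DASH emsg box. The field layout follows the emsg box (versions 0 and 1) as defined in MPEG-DASH to the best of the editor's understanding and should be checked against ISO/IEC 23009-1. The TypeScript names are hypothetical, strings are assumed to be single-byte characters, the input is assumed to be exactly one box with a 32-bit size field, and message_data is left as raw bytes because its meaning depends on the scheme.

```typescript
interface EmsgFields {
  version: number;
  schemeIdUri: string;
  value: string;
  timescale: number;
  presentationTime: number | null;      // version 1 only
  presentationTimeDelta: number | null; // version 0 only
  eventDuration: number;
  id: number;
  messageData: Uint8Array;              // scheme-specific payload, not decoded here
}

// Reads a null-terminated string of single-byte characters starting at `start`.
function readCString(view: DataView, start: number): { text: string; next: number } {
  let end = start;
  while (end < view.byteLength && view.getUint8(end) !== 0) end++;
  let text = "";
  for (let i = start; i < end; i++) text += String.fromCharCode(view.getUint8(i));
  return { text, next: Math.min(end + 1, view.byteLength) };
}

function parseEmsg(box: Uint8Array): EmsgFields {
  const view = new DataView(box.buffer, box.byteOffset, box.byteLength);
  const type = String.fromCharCode(
    view.getUint8(4), view.getUint8(5), view.getUint8(6), view.getUint8(7));
  if (type !== "emsg") throw new Error("not an emsg box: " + type);
  const version = view.getUint8(8);
  let p = 12; // skip size (4), type (4), version (1), flags (3)
  let schemeIdUri = "";
  let value = "";
  let timescale = 0;
  let eventDuration = 0;
  let id = 0;
  let presentationTime: number | null = null;
  let presentationTimeDelta: number | null = null;
  if (version === 0) {
    const scheme = readCString(view, p);
    const val = readCString(view, scheme.next);
    schemeIdUri = scheme.text;
    value = val.text;
    p = val.next;
    timescale = view.getUint32(p);
    presentationTimeDelta = view.getUint32(p + 4);
    eventDuration = view.getUint32(p + 8);
    id = view.getUint32(p + 12);
    p += 16;
  } else if (version === 1) {
    timescale = view.getUint32(p);
    // 64-bit value combined into a number; exact only up to 2^53 - 1.
    presentationTime = view.getUint32(p + 4) * 4294967296 + view.getUint32(p + 8);
    eventDuration = view.getUint32(p + 12);
    id = view.getUint32(p + 16);
    const scheme = readCString(view, p + 20);
    const val = readCString(view, scheme.next);
    schemeIdUri = scheme.text;
    value = val.text;
    p = val.next;
  } else {
    throw new Error("unsupported emsg version: " + version);
  }
  return {
    version, schemeIdUri, value, timescale, presentationTime,
    presentationTimeDelta, eventDuration, id, messageData: box.subarray(p)
  };
}
```

If the UA exposed the parsed fields instead, the application would only need to interpret messageData, which for ad insertion cues is itself a binary format that raises the same question one level down.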

Mark_Vickers: The issue of how to map particular data formats and how to present them is what the unfinished in-band data document was about.
... It's referenced from HTML5 but not completed, needs to be updated.
... I think along with this API, it would be good to have a plan to update that document, or maybe in another form to achieve the same goal.

<cpn> Sourcing In-band Media Resource Tracks from Media Containers into HTML

Chris: We have a reference from HTML to a draft document, and documents such as Media Fragment URI,
... not clear what the implementation status is.

<cpn> Media Fragments URI 1.0 (basic)

Mark_Vickers: Another question, DataCue was implemented in WebKit before the HTML5 spec was completed.
... But it was removed from HTML5 to be done later because it wasn't implemented in other codebases.
... Some years since then, still not implemented AFAIK. Would be interested to hear from Chromium and Firefox, is there a lack of clear requirements, or some other issue?

Chris: We have a session with the Media WG to discuss.

Mark_Vickers: That sounds like a right place.

Chris: [MPEG Carriage of Web Resources in ISO-BMFF Containers]
... Liaison received from MPEG. Identified that this has web architecture issues, so we sought TAG advice.
... Since then, work continued at MPEG, now at FDIS stage, as ISO/IEC FDIS 23001-15.

David_Singer: The liaison officer at MPEG can provide details.

Chris: This topic is welcome in the MEIG, want to continue conversation.

David_Singer: It would be good to have a workshop including the MEIG, MPEG, ECMA, security specialists, DVB, ATSC, etc,
... to get users of the technology at the same time at the same place.
... Haven't organised it yet.

Pierre: What is the use case?

David: There's two things: carriage of metadata text tracks, also packaging of the page and JS and HTML to display sync'd to video, want a single package of that.

Pierre: But what would be the actual business case?

David: For distributing rich content like a DVD that has navigation menu and content to display alongside the video during playback.

<MarkVickers> FYI, we have previously gotten permission from MPEG to host MPEG documents on the W3C member-only website. We could ask MPEG for permission to host the CMAF spec for this purpose.

Pierre: So the package is an ISO BMFF. BTW, the community draft is available.

Igarashi: A benefit of web resource embedded to ISO BMFF is to reduce distribution cost.
... By distributing the content in the media container you reduce the cost for the web servers.
... Could be beneficial for offline and broadcast distribution, as I understand.

Pierre: The offline case is weird to me, as offline web experience isn't great.

Igarashi: There are some offline use cases, and packaging of media and HTML needs to be addressed, for download and offline playback.

Chris: This is something we should follow up. It's not being actively worked on in our TF.
... [Browser support for DataCue]
... Pre-Chromium Edge has HTML 5.1 DataCue, using attribute ArrayBuffer data;
... Chrome: no support. Safari has an extended DataCue API with some type information.
... Firefox: no support.
... HbbTV: HTML 5.1 (8 Oct 2015 ED) DataCue with native handling of player specific DASH events, other events exposed to web apps.
... [Player support for DASH and HLS events]
... Shaka Player: shaka.Player.EmsgEvent, no internal handling of manifest refresh events.
... This shows there's need for native support, want to avoid parsing in JavaScript, helps with low latency.
... [Next steps]
... Breakout session on Wednesday about "DataCue API and time marches on in HTML" at 11:00am, also joint TTWG / Media WG meeting.

Mark_Watson: May have people on Wednesday who are aware of the algorithm.

Igarashi: Is this in scope of the Media WG?

Chris: Yes, it's a potential future topic for the Media WG, if the incubation is successful.
... It's only being discussed in MEIG and WICG for now.
... We're looking at the API design now.

Igarashi: What I remember from the previous TPAC, we were looking at WICG, but what would be the direction now?

Chris: I think we need more input from player libraries, to get more concrete feedback for the API.
... Also would be good to have more browser vendors, need to have wider discussion.
... [References]
... (links to resources)

Igarashi: Does WICG meet this week?

Chris: Yes, Thu/Fri, it's not on their agenda as we have a breakout session, and time with Media WG.

Igarashi: It's good timing to have discussion with them.
... Should ask the other participants about opinions as well, do we support proposing DataCue to WICG or Media WG?
... Need to get opinions from the IG members.

Pierre: When will our final report be available?
... Is more input needed?

Chris: I don't think more input is needed.
... If there's no objection, can be late November?

Igarashi: We have not specifically asked the MEIG for opinions.

Pierre: Can ask this when the report is ready for review.

Igarashi: The report itself is about requirements.

Chris: It's an IG Note, includes recommendations.

Pierre: If the report says something is missing and should be added, shouldn't it say that explicitly?

Chris: The solution design is being done by WICG.

Pierre: It could be more specific.

Chris: Our TF could continue editorial changes, everybody, please join in.

CTA WAVE update

slides

John: I'll give a quick update on CTA WAVE.
... [The Web Application Video Ecosystem Project]
... Aim is to improve how video is handled on the web.
... [Supporting a fragmented OTT world]
... Why? Fragmentation is expensive. Impacts content providers and device makers.
... [Brief history]
... CEA started the Global Internet Video Ecosystem project in 2015, then CEA became CTA, renamed project to WAVE.
... [Steering Committee]
... There's a steering committee and three task forces.
... First problem is how to prepare the content, and how to play back on all devices. CSTF is producing a content specification based on MPEG CMAF.
... Second problem is how to get high quality playback. DPCTF for testable requirements for common interop issues.
... Third problem is having a common playback platform universally available. HATF for the HTML5 API, delivering a reference application framework.
... Supporting these is a test suite that validates the specs.
... [WAVE bridges media standards & web standards]
... WAVE is made up of various specs from MPEG and W3C etc.
... [Current WAVE Membership]
... Many members from across the globe, overlapping with W3C members.
... [WAVE Content Spec & Published CMAF Media Profiles]
... [Media Profile Approval]
... There's an approval process for adding media profiles, criteria such as market relevance.
... New profiles are considered to be added annually.
... [WAVE Content Specification 2018 AMD 1 - Video Profiles]
... These are the latest profiles in the 2018 content spec.
... [WAVE Content Spec 2018 AMD 1 - Audio Profiles]
... Important to note that the content spec includes IMSC text and image CMAF bindings.
... [WAVE Programs and Live Linear Content]
... [Anticipated WAVE Content Spec 2019 Updates]
... The main update will be CMAF encoding for DASH and HLS interoperability.
... [Test Suite: Content Verification Tool]
... There is a DASH validator, which has a way to test WAVE requirements.

DASH-IF Validator

John: [CSTF - Specification Process]
... There's an annual specification, development of test infrastructure, conf calls every 2 weeks, annual F2F meeting.
... [Links]
... Links for resources;
... [HATF: HTML5 API Task Force]
... I am currently co-chair of the HTML5 API TF along with John Luther from JWPlayer.
... [What We Do in the HATF]
... We're defining a minimum set of standards for playback of audio and video, with emphasis on adaptive streaming.
... APIs are defined in other groups, we focus on the four main browser codebases, will move from Edge to Chromium Edge in 2020.
... [HATF Work Plan]
... All the work happens in the W3C Web Media API CG. Outputs are the Web Media API spec, and a test suite, an extension of Web Platform Tests.
... There are also guidelines for media app developers. We work with W3C groups on any gaps we identify.
... [HTML5 APIs: Reference Platform]
... We write tests in HTML5, extending Web Platform Tests. The tests can be run on mobile, PC, TV, set-top boxes, games consoles.
... [HATF Specs]
... The Web Media API snapshot documents the key APIs that are supported across all codebases, targeting the devices I mentioned. Updated annually.
... CTA and W3C co-publishing. We use the W3C GitHub.

Igarashi: What do you mean?

John: There's a W3C document that contains the spec, both organisations publish.

Igarashi: It's not a "W3C Recommendation" but "CG Report"

Chris: Should this document be on the W3C standards track?

Andreas: It's updated annually, so the standards track process might not be right.

Francois: We have agreement to co-publish, shared copyright, but it's a CG spec so not endorsed by W3C members.
... There will be discussion about W3C process during this week, for specs that need to evolve.

Alan: It's part of the plenary on Wednesday.

John: [Anticipated Web Media API 2019 Snapshot Updates]
... We expect to publish an update in December, update to ECMAScript 7,
... CSS snapshot 2018. Interestingly, this removes CSS Profiles which includes a TV profile.
... [HATF Testing Framework]

Web Media API Snapshot Test Suite

Mark_Vickers: FYI on referencing WAVE specs: ATSC references the WAVE WMAS as published by CTA, which is referencable. The W3C version of the WMAS spec, like all CG specs, includes boilerplate language that it should not be referenced.

John: [WMAS Testing Suite Updates]
... Updates being made now on automation, updates should be available in the next month or so.
... Making test suites available alongside the snapshots.
... [Links]
... [DPCTF]
... [Abstracted Device Playback Model]
... The DPCTF spec describes inputs and observations.
... [Spec Highlights and Outline Dec 2018]
... [Promises in Spec for 2019 and beyond]
... Items highlighted are being worked on for 2019 update.
... [Test Suite: RFPs]
... There's an RFP out to develop a test suite, closes at the end of this month.
... [Q&A]

Igarashi: Are you developing a specification for a type 1 player?
... Any room for W3C standardization?
... If you have any specific requirements, the MEIG can discuss that.

John: I can get more details on that.

Igarashi: What is the "Content Model Format"?

John: https://github.com/cta-wave/device-playback-task-force/issues/33

Chris: Is the testing work here contributing into the Web Platform Tests?

Mark_Vickers: It has always been the intention of WAVE to contribute back to W3C any new tests and also any changes to the W3C test runner. WAVE representatives met with the W3C test group at TPAC 2018. There was an issue opened on April 2, 2019: https://github.com/web-platform-tests/wpt/issues/16214. There was a PR entered on June 12, 2019: https://github.com/web-platform-tests/rfcs/pull/23

Pierre: Should we put that on the agenda for the afternoon?

All: OK

Frame accurate seeking and rendering

Frame Accurate Synchronization

Chris: We use GitHub in the IG to manage issues,
... most of the issues will be covered in the afternoon jointly with the other WGs.
... but one specific issue here about frame accurate synchronization and seeking.

Francois: [Related GitHub Issues]
... This topic relates to issues 4, 5, 21. I want to check what you might want to do with these, what gaps to highlight.
... The main issue is #4, frame accurate seeking of HTML5 MediaElement
... [Categories of Use cases]
... The discussion of use cases for frame accurate seeking fell into 2 different categories: frame accurate seeking and frame accurate rendering.
... [Main Seeking Use Cases]
... The main seeking use cases include non-linear video editing in a browser, this needs the ability to seek to a frame.
... This can be non-linear editing in your local browser, or can be cloud-based editing.
... Also mentioned were collaborative review scenarios where you want to annotate a precise frame.
... Another use case was evidence playback from a police body camera video, and you want to step the video frame by frame.
... [Seeking Gaps]
... Looking at technical gaps, the main one is that the media element currentTime is not precise enough to identify individual frames.
... Lots of discussion on this in the past few years. The main solution would be to use a rational number to be able to identify frame boundaries. Right now currentTime is a double, so subject to rounding errors.
... Another gap is that there's no way to seek to the next/prev frame in the generic case, it's not just about time, as you can use variable frame rate.
... [Main Rendering Use Cases]
... The main use cases are dynamic content insertion (media splicing), inserting ads for instance at a precise point in the video.
... Video overlays in WebRTC video, synchronised map animations.
... Subtitle rendering synchronization, want to align on the precise frame boundary with scene changes.
... Also synchronized playback across users/devices, e.g., a football match on devices in different rooms.

Igarashi: Requirements for time accuracy for seek timing?

Francois: That is rather a rendering issue. For seeking, the requirement is being able to stop at a precise frame.

Pierre: Sample accurate alignment and duration.
... The current web platform doesn't allow sample-accurate synchronization.

Francois: Some of these are in the Media Timed Events document that the IG published.
... [Rendering Gaps]
... currentTime is not precise enough to identify individual frames.
... In MSE, timestampOffset is also a double, so is not precise enough to identify frame boundaries.
... In general, it's hard to track media timeline with sufficient precision to do things on a frame by frame basis.
... In any case there is no mechanism to render non-media content aligned with a frame.
... There's no way to relate the rendering of a video frame with a local wall clock, so you don't know from currentTime when the frame rendering was completed.
... Looking at global synchronization, there's no way to tie rendering of a video frame to the local wall clock.
... The media timeline runs its own clock, doesn't follow the wall clock exactly.
... It's unclear whether synchronization will work in remote scenarios, such as the Presentation API, Remote Playback API.
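
One way an application might work around the wall clock gap today is to compare the position it reads from the player against the position a shared wall clock says it should be at, and correct the difference. The following is a minimal Python sketch of that idea only. The function names, the 40 ms tolerance, the 1 second seek threshold and the 5% rate nudge are all assumptions chosen for illustration; none of them come from the discussion or from any specification.

```python
def expected_position(wall_now, wall_start, media_start, rate=1.0):
    """Media position (seconds) implied by the shared wall clock."""
    return media_start + (wall_now - wall_start) * rate


def correction(actual_position, wall_now, wall_start, media_start,
               tolerance=0.04, seek_threshold=1.0):
    """Decide how to pull a drifting player back onto the shared timeline.

    Returns a (action, value) pair:
      ("none", 1.0)      drift is within tolerance, play at normal rate
      ("rate", r)        small drift, play at rate r until it is absorbed
      ("seek", position) large drift, jump straight to the expected position
    The thresholds are hypothetical and would need tuning per device.
    """
    target = expected_position(wall_now, wall_start, media_start)
    drift = actual_position - target  # positive means the player is ahead
    if abs(drift) <= tolerance:
        return ("none", 1.0)
    if abs(drift) > seek_threshold:
        return ("seek", target)
    return ("rate", 0.95 if drift > 0 else 1.05)


# Playback started at wall clock 100.0 s from media position 0.0 s.
print(correction(10.5, wall_now=110.0, wall_start=100.0, media_start=0.0))
```

Because the application cannot learn when a frame actually reached the screen, the drift it measures this way includes an unknown rendering delay, which is the gap described above.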

[break for lunch]

Chris: The purpose of this part of the agenda is to talk about what's needed next for media support on the web.
... The Media WG has several specs in scope, but we want to look beyond those to future requirements.
... Anything we identify can be followed up in the IG.

Francois: Continuing with the slides, the Media Timed Events TF made some recommendations, e.g., 20 milliseconds for TextTrackCue event firing.
... Also having a way to predict the wall clock time when the next frame will be rendered, as done in Web Audio API.
... [Rendering Gaps that Remain]
... Things not covered in the Media Timed Events document: currentTime is not precise enough,
... timestampOffset is not precise enough, and global synchronization scenarios.
... [Seeking Gaps that Remain]
... Seeking was not in scope for the Media Timed Events TF so those gaps I mentioned remain.
... [Next Steps?]
... What do we want to do? Having them in the MTE report doesn't automatically mean they'll turn into a standard.
... Who could lead the follow up on this? Should we write another document to explain the use cases, describe the gaps, make recommendations?
... Some things might need to go to WHATWG or the Media WG, who can do that?
... We have a breakout session on Wednesday on efficient audio/video processing.
... The concept of plugging into the media pipeline to process frames keeps coming up in different groups, e.g., WebRTC, Web Audio, Immersive Web, Machine Learning CG has use cases for processing video frames.
... So how do we do efficient processing in the media pipeline? This involves not copying bytes around, so preserving memory, the processing may happen on the CPU or GPU.
... Is there a common way that would work for all use cases?

Chris: Pierre, you mentioned production use cases?

Pierre: I put together a presentation that would dovetail well with this.
... Lots of professional assets are being made in the cloud.
... These things need to be manipulated directly in the cloud, and the most natural way to do that is through web applications.
... Professional media editing has stringent requirements on sample accuracy, color emission. I have a demo that was given at IBC, to give context.

Mark_Watson: The problem with currentTime is not so much the precision as the ambiguity. You can accurately identify a frame with double precision,
... but there are at least four ways of identifying a frame given a particular timestamp.
... If there isn't a frame at the given time, there needs to be a rule, which we could standardize, and then a double would be adequate.
... It's not quite the same with timestampOffset, where unless you have a rational timestamp you'll have either an overlap or a gap.
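
The ambiguity Mark_Watson describes can be illustrated with a minimal Python sketch. It assumes a constant frame rate held as an exact rational and shows three possible rules for turning a double timestamp into a frame index. The rule names and function names are hypothetical, and the minutes do not enumerate the four ways he has in mind.

```python
from fractions import Fraction
import math

NTSC = Fraction(30000, 1001)  # 29.97 fps as an exact rational


def frame_start(index: int, fps: Fraction) -> Fraction:
    """Exact start time, in seconds, of frame `index`."""
    return Fraction(index) / fps


def frame_index(t: float, fps: Fraction, rule: str) -> int:
    """Map a double timestamp to a frame index under an explicit rule."""
    position = Fraction(t) * fps  # Fraction(float) converts the double exactly
    if rule == "containing":  # the frame on screen at time t
        return math.floor(position)
    if rule == "next":  # the first frame starting at or after t
        return math.ceil(position)
    if rule == "nearest":  # the frame whose start is closest to t, ties go up
        return math.floor(position + Fraction(1, 2))
    raise ValueError(f"unknown rule: {rule}")


rules = ("containing", "next", "nearest")

# A timestamp in the middle of frame 100: the rules disagree.
mid = 3.35
print({r: frame_index(mid, NTSC, r) for r in rules})

# A timestamp meant to be exactly the start of frame 100 (1001/300 s).
# That value has no exact double representation, so the double lands just
# before or just after the true boundary and "containing" and "next" differ.
boundary = float(frame_start(100, NTSC))
print({r: frame_index(boundary, NTSC, r) for r in rules})
```

Once one rule is fixed, the double identifies the frame unambiguously, which is the point being made: the missing piece is the rule rather than extra precision.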

Chris: The need for a rational type came up in the Media Capabilities API discussion regarding frame rates.

Pierre: I think if we're going to touch this, we should do it right.

Mark_Watson: I support that, from experience with video players with timestamps that weren't proper rational numbers, no regrets when converted to use rational numbers, it's always better.
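
The timestampOffset case can be sketched the same way. The Python below is an illustration under assumptions, not a description of how any MSE implementation behaves: it appends segments back to back, computes each segment's start offset once with exact rationals and once with doubles, and reports the difference in units of one frame. The helper names and the 29.97 fps frame duration are chosen for the example.

```python
from fractions import Fraction
from itertools import accumulate

FRAME = Fraction(1001, 30000)  # duration of one frame at 29.97 fps


def exact_offsets(frames_per_segment):
    """Start offset of each appended segment, as exact rationals."""
    durations = [n * FRAME for n in frames_per_segment]
    return list(accumulate(durations, initial=Fraction(0)))[:-1]


def float_offsets(frames_per_segment):
    """The same offsets accumulated as doubles."""
    durations = [n * float(FRAME) for n in frames_per_segment]
    return list(accumulate(durations, initial=0.0))[:-1]


def boundary_error_in_frames(frames_per_segment):
    """Double offset minus exact offset, in frames.

    A positive value places the segment slightly late (a gap before it),
    a negative value slightly early (an overlap with the previous one).
    """
    exact = exact_offsets(frames_per_segment)
    approx = float_offsets(frames_per_segment)
    return [(Fraction(a) - e) / FRAME for a, e in zip(approx, exact)]


segments = [60] * 1000  # one thousand segments of 60 frames each
errors = boundary_error_in_frames(segments)
print(float(max(errors, key=abs)))
```

The errors are a very small fraction of a frame, so whether they matter in practice depends on how much tolerance the consumer of the offsets applies. With rational offsets the question does not arise, since each segment starts exactly where the previous one ends.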

Pierre: The same conclusion is reached every 10 years.

Igarashi: This presentation includes two different aspects, one is frame accurate seeking, the other is frame accurate synchronisation.
... We should separate these, because the difficulty and realization are quite different.
... In terms of frame accurate rendering, it might be dependent on the performance of the browser and the hardware, so not easy to define requirements, as it will still depend on the performance of the JavaScript runtime engine and web rendering.
... Suggest tackling the requirements separately.

Francois: Could maybe write two documents.
... There are different use cases for rendering, and some of them aren't really about synchronization in the sense of timing, but more about rendering content synchronized to video frames, connecting to the media pipeline.
... So there are different use cases, some of them might be out of scope, some you may not want to consider.
... If the IG can weigh in, it gives some use cases more power than others, and sets priorities.

Pierre: The extreme example is that sound and video synchronization ultimately depends on how close to the speakers you are.
... Not much you can do about that. The problem I see is that at the API level today, it's not possible to ever get there.
... There are no APIs that will give you sample accuracy today.

Igarashi: That requirement is related to time seeking, so an API is needed to get the time.
... It's different from synchronization itself, synchronizing with the current frame.

Pierre: When you say "seeking", you mean at the API level?

Igarashi: And how long it takes depends on the browser, currently there are no requirements for how quickly browsers should seek the media, it depends on browser performance.

Pierre: So there might be some delay between when a frame is shown on screen and when the application is told?

Igarashi: Yes. It's implementation dependent. This also affects the 20 millisecond recommendation for timed events.
... The currentTime issue and synchronization are different issues.
... I wonder if any W3C specs talk about the performance of the browser itself, I think currently not.
... It's up to the implementation how fast it will render the content.
... We'd need to talk about performance.

Pierre: It sounds like you'd like to contribute to the document :)

Igarashi: I support the frame accurate seeking.

Yongjun: For ad insertion, need some mechanism to handle frame jumping,
... not only at the beginning, also at the end of the ad, which is more complicated, GOP length considerations.
... If you need to drop something, the requirement would be to drop the ad rather than the main content.
... If we care about one case, we may miss the other case.

Igarashi: Are you thinking of seamless ad-insertion, without a gap?
... Accuracy of seeking is important for many use cases,
... but we should distinguish gapless seeking, which is similar to the media synchronization case.

Pierre: If there is a stream playing, and another set to start at some time, that time has to be frame accurate.

Igarashi: I agree, but there might be some delay before the frame would be rendered.

Yongjun: I suggest that we define the problem, then look for solutions. Whether we can find solutions for different devices, different ecosystems, that's a separate story.

Professional media workflows on the web

Pierre: This is a very timely topic.
... This presentation is supported by MovieLabs.
... Something that's increasing in popularity is moving entire media workflows to the web.
... What used to be done on workstations and tape decks is now being done through web applications.
... [Web applications are coming to professional media workflows]
... This has implications on the web platform. Why are web applications coming to professional media workflows?
... Because web applications are mainstream, users are accustomed to them for banking, document editing, etc, and they expect this for professional media editing and authoring applications.
... Also because media on the web is really good now, quality content, good experience with TV. Users expect to be able to edit their content on the web.
... More importantly, professional A/V assets, finished movies are being stored in the cloud.
... [Why move audio-visual assets to the cloud?]
... It's a sea change in technical and business practices. An asset such as a movie is uploaded to the cloud as early as possible in the authoring chain,
... and then applications can process that content in the cloud directly. Once in the cloud it doesn't need to be copied back to a local workstation.
... This means there are fewer copies of the content, there's less human handling, no more shipping hard drives or DVDs around,
... also increased scalability with cloud services.
... The reason it's happening is that cloud storage is more secure: fewer people touch the content, reducing the possibility of it being stolen.
... And of course it's more efficient.
... [Today]
... This is a summary of a complete authoring chain: previsualization of scenes, visual effects, grading, editing, localization, mastering, quality check, archival, distribution.
... In the past, you'd have copies at each step of this process. Lots of opportunity for mishandling, cost, different silos.
... [Tomorrow]
... Idea is to move to this, where the content is available in the cloud to all of these task groups (and can be accessed via web applications).
... [Demo]
... Demo of some recent advances in mastering applications. This is a product called Ownzones Connect. It's just an example.

Ownzones Connect

Ownzones Youtube videos

Pierre: It's a professional mastering application, processes content already on the cloud, and it runs in a web browser.
... It's a full non-linear editor. For this type of application you want to see the content as closely as possible to how it will be rendered, so frame accurate rendering is important. HDR rendering is important.
... [Some steps of the workflow remain out of reach of web applications]
... There are gaps in the web platform: sample accurate playback, HDR and wide color gamut, improved subtitle and caption support. Is it time to catalog and address those gaps?
... The good news is that there are implementers, like these people, who can provide concrete input. We can ask them what's missing and get concrete use cases and examples.

Chris: Is this of interest to others in the IG, should we do a more detailed analysis?

Francois: This is a good perspective for the frame accurate seeking I presented earlier.

Yongjun: You say that HDR and wide color gamut aren't supported, you mean not supported by the standards. As far as I know, people such as Comcast and XBox use MSE and EME, with proprietary support for HDR.

Mark_Watson: The thing that's missing for HDR support on the web platform is capability discovery. If you know through some other means that the capability is there, you can use MSE and EME to feed in HDR video and have it play.

Pierre: If you want to create HDR graphics and composite it into HDR video, that's a different problem.

Chris: The Color on the Web CG is there to discuss that problem.

Mark_Watson: For those interested in HDR graphics, the Color on the Web CG meets tomorrow afternoon.

Igarashi: Regarding the video editing using browser use cases,
... is there another requirement for gapless rendering, related to multiple video clipping, where you'd want to be able to render seamlessly.
... This is separate to time accurate seeking.

Pierre: Do we have a critical mass to start writing a document? Are people interested in spending time on it? We need volunteers to contribute.

Mark_Watson: Not volunteering myself as editor, but I support that this is a valid use case, and the way studios are handling media content is evolving and making more use of web technologies in the production chain. I think it's a thing for this group to look at, happy to contribute.

Scott: Have folks here considered a more consumer focused angle on local video editing? I wonder what primitives are missing to be able to do video editing, not necessarily cloud based, but locally in the client, e.g., how to splice with frame accuracy.

Pierre: Are there volunteers to help write the document?

Chris: We can help, but needs an editor.

Pierre: I'm hearing lots of people willing to help, but needs someone to list the gaps in the web platform today.

Gary: I know Garrett is interested in the frame accurate topic, don't want to volunteer someone though.

Chris: Thank you. I think we should start something on this.

360 video

Samira: I work for Microsoft, interested in people who are creating 360 video content.
... Today, if you want to play 360 video content, you need to use WebGL and WebVR. This can be a lot for web developers.
... I'm gathering data, we have a few ideas for standardizing 360 video content. One is to add an attribute to the video element with information on projection type.
... A few years ago, somebody from Google also proposed standardizing 360 video, but container issues including assets such as spherical metadata.
... My first question to people who create video or provide services is: do you have any thoughts on this? I'm hosting a breakout session on Wednesday, please come and give feedback there.

Chris: I'm representing a content organisation. Our current solution is quite inefficient, we deliver a large video, then use a rendering loop with a canvas for 2D playback, managing our own controls for panning the view.
... Has this been discussed in the Immersive Web groups?

Samira: Yes, we'll talk about it tomorrow there.
... Do we want to have standards around the projection type, then there's subtitles and spatial audio, a lot to think about.

Chris: We have an item to talk about captioning later.

Andreas: We've been trying to follow where to standardize 360 video this year, brought it up in the Immersive Web CG.
... The problem is that 360 video is not standardised yet, and the CG said it's not a priority for them yet. We have a presentation on that later in the afternoon.
... Would like to discuss with the Immersive Web CG.

Chris: Is anybody aware of MPEG format update?

David: There's the OMAF format, short term work for 360 video. And then there's more general work on MPEG-I, documents and white papers are available on Leonardo's website.
... There's a lot of work happening on projection formats and so on and 3DoF+.

<tidoust> OMAF

<scottlow> MPEG-I

Andreas: TTWG has liaison with MPEG regarding OMAF and the subtitle rendering use case,
... but that's just one part of the scenarios, inband information for subtitles, it doesn't solve out of band subtitles.

<Joshue108> There are also accessibility requirements around how 360 is standardised.

Andreas: Could we possibly discuss that tomorrow, and bring the subtitle topic?

Samira: Yes, tomorrow afternoon.
... How many content producers, providers are here? Have any of you ventured into 360 video, and what's blocking you?

Song: We produce 360 video at China Mobile and deliver over the internet.

Igarashi: VR content production?

Samira: 360 video from a 360 camera can be presented in VR, or you can have a magic window where you can look around.
... Just wanted to bring this discussion up.

Chris: To Andreas's point, what's the natural home for this discussion in W3C?
... MEIG discusses audio and video support, there's Immersive Web, and TTWG.

Samira: It could be the Immersive Web or the Media group, I just wanted to gather some information on your needs, any blockers.

Josh: There are also accessibility considerations, around an architecture that will support 360 environments, multi-modality, accessibility.

Andreas: This question of which group will come up again later, it's really difficult to find the right place.

Web & Networks IG

Sudeep: I'm chair of the Web & Networks IG, we're looking at use cases which have a network aspect.
... We have colleagues from Intel doing a demo that showcases how hints from the network can be used for media buffering.
... If you can predict what the network conditions will be, based on that information you can get a better content experience.
... Please drop by at 3:30 to see the demo. We're also meeting tomorrow to cover networking topics. The demo is media specific.
... Regarding Media Timed Events, I wonder how important is latency for use cases like online editing? How is that going to impact the user experience? We'd like to get some input on that to look at in the Web & Networks IG.

Chris: Interesting questions, we want to have a close relationship between the two IGs, we're interested to see how network hints can help improve media playback.

Sudeep: For editing, could use mobile edge computing to reduce latency. There are solutions which can benefit users.

Chris: In content production, we have a web interface that controls server side rendering. The server can deliver a WebRTC stream with a preview of the content being produced.
... This is an approach we use at the moment. One thing we've experimented with is having WebRTC streams for multiple different sources, where you want to vision mix between them. Again, latency is important there.
... There's then an issue of synchronization between those streams. This feeds into the media production use case work we discussed earlier, which we should follow up on.

Sudeep: How should we bring this to M&E IG? Should we use GitHub?

Chris: We should use GitHub to track the important issues coming from this.
... Also we have monthly IG calls, so following TPAC we'd want to have a call to review the outcomes from the IG meeting and other media related sessions at TPAC.
... We can use this to talk about how to move forward with each topic.

Josh: Regarding stream synchronization with WebRTC and multiple streams, that has particular accessibility implications. For example, for someone who's not sighted, or is using a different modality to access information, we need to make sure the information in a data track is in sync for someone using assistive technology.

Chris: Yes. Anything more to add?

Web Accessibility Initiative for media

Josh: I work with W3C on the Web Accessibility Initiative, looking at emerging technology and accessibility.

<Joshue108> Accessible RTC Use Cases

Josh: Please take a look, you can see use cases related to the Media WG, also related to this group.
... There are requirements identified for people with disabilities, e.g., audio routing in the browser. How would a user control different streams?
... Imagine a digital audio workstation, different modality channels based on user's preference, text to speech for blind users, or a braille output device, etc.
... There are a bunch of use cases in the document. Also issues around controlling the volume level in audio description, for real time communication, so that the volume can be set at a predetermined level for the user, and so that the audio description doesn't get swamped by inappropriate audio levels.

(kaz remembers the MMI Architecture and SCXML :)

Chris: There's an Audio Description CG, not sure if they're also looking at real-time communication.
... Thank you for sharing this.
... Any other topics or new requirements that this group should work on?

Igarashi: I'm interested in offline playback and packaging of media content with web applications.
... I know the Publishing WG is working on the packaging issue, thinking of video with images for download and local playback.
... Sometimes the network connection might not be available, the user would like to play back downloaded content locally.
... This could be high value content, cannot be streamed directly to the user.

Chris: It seems another place where a gap analysis is useful.

<tidoust> [Note the breakout session on Web Packaging planned on Wednesday: https://w3c.github.io/tpac-breakouts/sessions.html#wpack]

Bullet Chatting

Song: I'm from China Mobile, I'd like to give a presentation about Bullet Chatting, with Michael from Dwango.

Bullet Chatting

<scribe> scribenick: tidoust

Song: Interactive tool for video broadcasting over the Internet. Use cases: see reviews of group users. Real-time interaction, engagement for young generation, to show social presence.
... Implementation is difficult because you need to compute the positioning and animation of bullet chatting, rendered in DOM or Canvas and overlaid on top of the video.
... Strong demand for this type of application, particularly in Asia.
... Standardization would improve UX, reduce the difficulty in implementation.
... We suggest to define a standard format for bullet curtain.
... We started an analysis to identify gaps. No specific API introduced for the time being.
... Bullet chatting is basically floating text over the screen with four attributes:
... mode, basic properties, timeline, and container (typically the video)
... [going through Bullet Chatting Proposal document]
... During streaming, two main ways to present: chatting room or bullet chatting.
... Advantages of bullet chatting display are that there is a wider display area and it does not require the user to move her eyes.
... The movement from right to left allows users to read content quickly (and again without moving her eyes).
... Sometimes, it's not only about comments, it can be text to improve the feeling of horror videos for instance.
... Also used to share messages in stadiums on a big wall.

Michael: I'm from Dwango. Use cases and requirements for our current service Niconico.
... Niconico is a streaming Web site launched in 2006. Since its inception, its unique feature has been its comment system.
... [showing a demo]
... allows to create a user experience.

Pierre: Who specifies at what vertical position the bullet curtain appears?
... Do you foresee that to be done at the client side?

Song: No, it's done on the server side.

Pierre: So the format has all the positioning information.

Michael: In the current implementation, clients do the rendering, and they all have the same algorithm, so it's deterministic.
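
The determinism Michael mentions can be illustrated with a minimal Python sketch. This is not Niconico's algorithm, which the minutes do not describe. It is a hypothetical lane allocator in which every client sorts the same comments in the same order and applies the same rule, so that every client ends up with the same layout. The `Comment` fields, the 4 second display time and the "first free lane" rule are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    comment_id: int
    media_time: float  # seconds on the media timeline when the comment enters
    text: str


def assign_lanes(comments, lane_count, display_seconds=4.0):
    """Return {comment_id: lane} using a rule every client can reproduce.

    Comments are processed in (media_time, comment_id) order. Each one takes
    the lowest-numbered lane that is free at its start time; if none is free
    it takes the lane that frees up soonest (lowest number on ties).
    """
    lane_free_at = [0.0] * lane_count
    placement = {}
    for c in sorted(comments, key=lambda c: (c.media_time, c.comment_id)):
        free = [i for i, t in enumerate(lane_free_at) if t <= c.media_time]
        if free:
            lane = free[0]
        else:
            lane = min(range(lane_count), key=lambda i: (lane_free_at[i], i))
        lane_free_at[lane] = c.media_time + display_seconds
        placement[c.comment_id] = lane
    return placement


comments = [
    Comment(3, 5.0, "third"),
    Comment(1, 0.0, "first"),
    Comment(2, 1.0, "second"),
]
print(assign_lanes(comments, lane_count=2))
```

Because the result depends only on the comment data and not on arrival order or on the wall clock, seeking to the same time in the same video reproduces the same positions.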

Pierre: If things were standardized at W3C, would the positioning be imposed by the server?

Michael: Currently, we'd like the client to have the ability to position the comments.

Pierre: So the client receives the comments and decides where to lay them out.

Igarashi: You want to let the browser do the whole rendering?

Michael: No, the Web application.
... Goal of the standardization is to have a shared format for bullet curtains, because many providers have a similar comments system (Niconico, Bilibili, etc.)

Song: First step is to define an interoperability format. If there is a way to involve the browser vendors, then great, second step.

Mark_Watson: Browsers would want to know why something cannot be done in JS.

Dave_Singer: And you could possibly do it with WebVTT / TTML.

Song: For advanced features, there are things that TTML does not address. Happy to talk with TTML folks though.

Michael: Use cases and requirements level for now. Possible solutions are still very early stage.
... Bullet curtain allows to create feelings such as sharing content with friends.
... Comments can be used to improve the video with artwork, or even to flood the video with comments.
... Comments have become an important part of Niconico's culture.
... Part of on-demand and live-streaming services of Niconico.
... Comments move right to left across at set times, based on the media timeline.

Chris: If I pause the video, do the comments pause?

Michael: Yes.
... Comments are clipped to the edge of the player (or to an arbitrary region).
... When the video loads, comments are loaded from the server and rendered.
... If a user submits a comment, it appears immediately to the user, and gets shared to other viewers.
... Seeking to the same time in the same video will have the same comment appear at the same time and at the same position.
... As if the comments were part of the video, comments scale with the video in particular.
... Comments can be interactive (e.g. context menu)

Mark_Watson: Layout problem (HTML is good at it), animation problem (Web Animations), but the thing is Web Animations ties animations to the wall clock, whereas here animation is tied to the media clock.
... That may be a useful gap to identify.
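
The gap Mark_Watson points at can be made concrete with a small Python sketch in which a comment's horizontal position is a pure function of the media time rather than of elapsed wall clock time. The function name, the parameters and the 4 second crossing time are hypothetical. In a browser the media time would have to be read from the media element on each rendering step, which is the manual coupling a media-clock-driven animation would remove.

```python
def comment_x(media_time, start_time, container_width, text_width,
              display_seconds=4.0):
    """Left edge of a right-to-left comment at `media_time`, in pixels.

    Returns None while the comment is not on screen. The comment enters with
    its left edge at the right border (x == container_width) and leaves when
    it has fully crossed the left border (x == -text_width).
    """
    progress = (media_time - start_time) / display_seconds
    if progress < 0 or progress > 1:
        return None
    return container_width - progress * (container_width + text_width)


# A 160 px wide comment entering a 640 px wide player at media time 10 s.
for t in (10.0, 12.0, 14.0, 15.0):
    print(t, comment_x(t, start_time=10.0, container_width=640, text_width=160))
```

Pausing the video freezes the media time and therefore the comment, and seeking moves the comment to the position that belongs to the new time, matching the behaviour described for Niconico above.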

Chris: Came earlier during Francois' presentation. Tying non-media content rendering to media timeline.

Igarashi: Some requirements about positioning the subtitles.
... Client decides arbitrarily where to position the comments.

Michael: Yes.

Igarashi: Content provider does not care about positioning of subtitles.

Sangwhan: Aside from Web, do you also want to handle support for native players?
... That would change perspectives.

Michael: We do have native apps, so we'd be interested in a solution that covers that space too.

Sangwhan: According to Mark's idea, if it's tied to the animation timeline in browsers, you're restricting yourself to Web environment.

Kaz: When I talked to Koizuka-san from Niconico, he mentioned an extension mechanism named "Niko-script", which has the capability of specifying the style and position of captions. So that capability could also be considered at some point. Maybe not now, though.

<MarkVickers> I'm not staying connected for the joint meetings. Have a good TPAC all! -mav

Joint meeting with Second Screen WG/CG

Chris: The Second Screen WG/CG made a lot of progress on the Open Screen Protocol for discovering, authenticating and controlling remote displays on the local network.

Mark_Foltz: I work for Google. Been involved in Second Screen since 2015. Second screen for the Web is the way we want to enable Web applications to take advantage of connected displays/speakers and render different types of content.
... Content can be a full Web page or specific media.
... The Presentation API enables a web page, called the controller, to request display of a URL on a remote display on the LAN.
... Example of a photo app that displays the loaded picture on a large display. You can play media, do gaming, collaboration tools. Pretty agnostic, but our experience shows that it's mainly used for media playback.
... The Remote Playback API allows a web page on which there is a media element to remote the playback of the media element on a second screen, either through media flinging where the URL to play gets sent to the remote device, or media remoting where the media gets streamed to the second screen.
... Both APIs are in Chrome.
... The APIs were designed to take advantage of proprietary protocols. To get broad adoption, we decided to develop an open set of protocols so that implementers could all support the APIs in an interoperable way.
... We hope to converge at the end of the Second Screen F2F meeting this week to v1.0 of the Open Screen Protocol.
... One use case for the future: enabling Web applications to generate their own media and present it to a connected display, e.g. for gaming.
... The Open Screen Protocol supports all sorts of use cases that we hope to expose to Web applications in the future.

Yongsun: Support of QUIC in smart TVs. UDP is not supported in some TVs.

Sangwhan: UDP is supported at the kernel level.

Mark_Foltz: in our library implementation, we expose UDP but that's pretty much the same thing as what you get at the system level.

<kaz> remote+ Yajun_Chen

Chris: One of the questions that came up in our previous F2F meeting is around synchronization, e.g. the ability for users to get audio description on their own device while they are sharing a media element on a second screen.
... Within that, there is the question of how close the synchronization needs to be.
... We worked on close synchronization between main screen and companion device in HbbTV.

Mark_Foltz: Does the HbbTV specification rely on clocks?

Chris: Yes, clock synchronization and then the devices can make adjustments to playback to stay in sync.

Mark_Foltz: We need a mechanism for the two sides to agree on a wall clock for presentation.
... If the HbbTV specification covers all of that, we can have a look at it for OSP.

Chris: Yes, it does.
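
For illustration only, one common way for two devices to agree on a wall clock is an NTP-style exchange of timestamps. The sketch below is not taken from the HbbTV specification or the Open Screen Protocol; the type and function names are assumptions, and the estimate assumes the network delay is the same in both directions.

```typescript
// Timestamps for one request/response exchange, all in milliseconds.
interface ClockExchange {
  clientSend: number;    // client clock when the request left
  serverReceive: number; // server clock when the request arrived
  serverSend: number;    // server clock when the response left
  clientReceive: number; // client clock when the response arrived
}

interface ClockEstimate {
  offsetMs: number;    // estimated (server clock - client clock)
  roundTripMs: number; // network round trip, excluding server processing time
}

// NTP-style estimate; accurate only if both network legs take equally long.
function estimateClock(x: ClockExchange): ClockEstimate {
  const offsetMs =
    ((x.serverReceive - x.clientSend) + (x.serverSend - x.clientReceive)) / 2;
  const roundTripMs =
    (x.clientReceive - x.clientSend) - (x.serverSend - x.serverReceive);
  return { offsetMs, roundTripMs };
}

// Converts a time on the shared (server) clock to the local client clock.
function toLocalTime(sharedTimeMs: number, est: ClockEstimate): number {
  return sharedTimeMs - est.offsetMs;
}
```

With such an estimate, each device can translate an agreed presentation time on the shared clock into its own clock and then adjust playback to stay in sync.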

<anssik> Open Screen Protocol issue Requirements for multi-device timing while streaming https://github.com/webscreens/openscreenprotocol/issues/195

Chris: Some implementers have found it difficult to achieve that level of synchronization. It's not so widely implemented for now.
... I can provide information on how that has been done.

Mark_Foltz: Collaboration between the protocol and the application levels.

Chris: And also something that exposes the pipeline delays.

Mark_Foltz: One of the things that seems very important is the establishment of a secure communication between devices, which could have broader implications, such as connected home scenarios.
... it could be a good foundation for that. Part of the OSP focus has been on authenticating devices, currently based on SPAKE2.
... We're not currently focused on enabling one piece of software to find out attributes of another, for instance who manufactured it, what does it do.

<anssik> SPAKE2 https://datatracker.ietf.org/doc/draft-irtf-cfrg-spake2/

Mark_Foltz: You could take the chapter on authentication and use it elsewhere.
... We did anticipate that there may be other use cases than the ones we foresee, so have landed an extensibility mechanism.

Sangwhan: Is there a registry for these capabilities?

Mark_Foltz: Yes, it's on GitHub.
... You can be a presentation controller, receiver, send or receive media, that's all negotiable in the OSP.

Chris: I suspect remote playback of encrypted content is a use case shared by different members here.

Mark_Foltz: The API is pretty much agnostic. At the protocol level, we haven't tried to add messages to support encrypted media.
... That seems more to be a use case for the Presentation API where the application can create and exchange application-specific message commands.
... Remote playback of encrypted media is closely tied to credentials, and that's application level.

Mark_Watson: The thing that you don't have here is the streaming model where the controlling device has the decryption key and wants to stream the content to the receiver device.
... What happens to the media stream when it reaches the receiver? Goes to a media element or through JS processing?

Peter: receiver is handling the decoding.

Chris: Is there an IG recommendation that we'd want to make?

Mark_Watson: The most likely model for us for doing this would be to have a receiving web application that handles the user's credentials

Chris: That would make the sync issue interesting because it is then at the application level.
... One of the issues we have with Remote Playback is that we want to provide a custom UI, which means that we rather want to use the Presentation API for that.
... Didn't we discuss having a Media element through the Presentation API that gets automatically synchronized with local content?

Mark_Foltz: I believe that's correct. I don't recall the status of it. It came up in May 2018, I think.

<anssik> Second Screen May 2019 F2F https://www.w3.org/wiki/Second_Screen/Meetings/May_2019_F2F

Mark_Foltz: I think we probably agreed that it should be possible. It probably requires a few tweaks to the protocol so that it knows that the remoting is part of a shared presentation.
... We discussed whether everything could be done in script. Same recommendation for synchronization. What you might be missing is the latency of the media rendering pipeline.

Chris: I have seen implementations that manage to do synchronized playback across devices through a timing server.
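
A script-level approach of this kind can be sketched as follows, purely as an illustration and not as a description of any implementation mentioned in the discussion. The thresholds, the proportional gain of 0.5 and the way the pipeline latency is compensated are all assumptions; the rendering pipeline latency is exactly the value a script may not be able to obtain.

```typescript
type Correction =
  | { kind: "seek"; toSec: number }
  | { kind: "rate"; playbackRate: number };

interface CorrectionOptions {
  seekThresholdSec: number; // beyond this error, jump instead of drifting
  deadZoneSec: number;      // below this error, play at normal speed
  maxRateDelta: number;     // largest allowed deviation from rate 1.0
}

// Hypothetical defaults chosen for the example.
const defaultOptions: CorrectionOptions = {
  seekThresholdSec: 1,
  deadZoneSec: 0.02,
  maxRateDelta: 0.1,
};

// Media position the player should report right now, given a start time on
// the shared clock. The pipeline latency is added so that the rendered frame,
// which lags the reported position, lands on the shared timeline.
function expectedPositionSec(
  sharedNowMs: number,
  sharedStartMs: number,
  pipelineLatencyMs: number = 0
): number {
  return (sharedNowMs - sharedStartMs + pipelineLatencyMs) / 1000;
}

// Decides how to bring the actual position back to the expected one.
function correct(
  expectedSec: number,
  actualSec: number,
  opts: CorrectionOptions = defaultOptions
): Correction {
  const error = expectedSec - actualSec; // positive: playback is behind
  if (Math.abs(error) > opts.seekThresholdSec) {
    return { kind: "seek", toSec: expectedSec };
  }
  if (Math.abs(error) < opts.deadZoneSec) {
    return { kind: "rate", playbackRate: 1 };
  }
  const delta = Math.max(-opts.maxRateDelta, Math.min(opts.maxRateDelta, error * 0.5));
  return { kind: "rate", playbackRate: 1 + delta };
}
```

In a page, the result would be applied to a media element by setting `currentTime` for a seek or `playbackRate` for a rate change.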

Igarashi: I don't follow the discussion on encrypted media. You are not going to define how keys are exchanged in the protocol?

Mark_Foltz: Someone with more experience on EME might be able to shed some light as to what would be required.
... One reason we designed an extension system is that people interested in new features can propose them, prototype implementations, and then we can incorporate them in the spec if all goes fine. We don't have the expertise in the group.
... We're not defining the path for encrypted media from one device to another. Might work if both devices support HDCP.
... I think there is an open issue in our GitHub about remote playback and encrypted media.

Igarashi: Arbitrary application message passing is supported?

Mark_Foltz: Yes.
... In the spec, you'll see bindings between the API and the messages exchanged in the protocol.
... For instance, video.remote.prompt() requires exchanging messages between devices.

Mark_Watson: Could the protocol work on TCP?

Peter: You'd have to advertise it differently

Igarashi: [question on security during remote playback]

Mark_Foltz: the Remote Playback API does not require the receiver to be a user agent in the usual sense, it does require the receiver to support media playback as in the HTML spec.

Mark_Watson: The Presentation API requires the receiver to be able to render the URL, but the URL could be a non HTTP URL, custom schemes may be supported instead.

Mark_Foltz: The spec defines processing of HTTPS URLs, the rest is undefined.

<anssik> Open Screen Protocol https://github.com/webscreens/openscreenprotocol/

Mark_Foltz: We have a writeup of how the protocol interacts with custom schemes in the GitHub repo.

Chris: That has been one of the extension mechanisms that we've been interested in for opening a Web page that has broadcast capability in HbbTV (perhaps Hybridcast has similar needs)

<anssik> Custom Schemes and Open Screen Protocol https://github.com/webscreens/openscreenprotocol/blob/gh-pages/schemes.md

[discussion on second screen support in Hybridcast]

Mark_Foltz: regarding authentication, we looked at J-PAKE and request/response challenges but we had memory concerns there so switched to SPAKE2 following internal discussion with security experts at Google.

Peter: The protocol allows for more authentication mechanisms in the future.
... Devices can support their own mechanism.

Igarashi: Co-chair of HTTPS in local network CG, meeting on Thursday morning. We haven't reached discussion on authentication. Would be good to align with Open Screen Protocol.

Sangwhan: Is there a prototype?

Mark_Foltz: We recently decided to add streaming to the OSP, which complicated things. We have a first implementation of Presentation API commands. No crypto because we've kept changing that.
... The library is coming. It implements the protocol. It does not do media rendering, it does not have JS bindings, etc.

<anssik> Open Screen Library implementation https://chromium.googlesource.com/openscreen/

Igarashi: If you want to apply the OSP to the broadcast protocol, we need to consider the case where the remote device is not a browser. For instance, channel change is done by the system, not the application.

Mark_Foltz: Capabilities like supporting channel tuning are not in the OSP. If you think that the communication channel needs to be terminated on channel change, that can be added.

Igarashi: In the case that some arbitrary message protocol is still necessary, you'd use the Presentation API, but the receiver may not be a browser agent.

Mark_Foltz: seems like something for an extension.

Chris: OK, thank you for the discussion.

Mark_Foltz: Mostly, we want input on use cases that we haven't considered yet. We'd love to get feedback on the extension mechanism as well.

Pierre: Thank you.

Joint meeting with Timed Text WG

Andreas: We could start with 360 standardization

Nigel: In TTWG, we're in the final stages of rechartering.
... We're considering some things such as karaoke.

<Joshue108> https://www.w3.org/WAI/APA/wiki/Accessible_RTC_Use_Cases

Nigel: Quick agenda bashing, any topic you'd like to cover?

Josh: accessibility use cases? See accessible RTC use cases document

Chris: TTML and MSE?

Nigel: Yes, opinions about exposing TextTracks from MSE.

<Joshue108> apologises for throwing a curve ball to Nigel, I'm here for the XR bit but think this doc may still be useful as an FYI

Andreas: Focus the discussion of the day on standardization of 360 subtitles. Most of the stuff comes from an EU research project.
... To make it short, there have been extensive user tests. For captions, main requirement is to have subtitles that are always in the field of view. It's enough to have them on a 2D plane, no need to have them positioned in 3D.
... There should be some indication of where the audio source is positioned.
... Of course, you also need features present in TTML, TTML-IMSC profile being a good example.
... [demo of an application to test subtitles positioning]
... Lots of activity starting last year at TPAC. We started with a discussion in the Immersive Web CG. Then discussion within the TTWG, Media & Entertainment IG.
... In the end, we realized we needed more people from immersive and browser vendors.
... We wrote a proposal to be discussed in the WICG.
... There has been no comment on the WICG forum yet, so question is how do we proceed?
... Two additional activities worth noting. A colleague from Google proposed the creation of an Immersive Caption Community Group, and there is a W3C workshop on XR accessibility in November.
... There is awareness that something needs to be done.
... Hard to get enough resources to get started though.
... How to get time and resources from implementors?

<Joshue108> Inclusive Design for Immersive Web Standards W3C Workshop Seattle Nov 5-6

<Joshue108> https://www.w3.org/2019/08/inclusive-xr-workshop/

Andreas: Everything is evolving, nothing really fixed.
... Is it really a web platform topic?
... Important to know when to stop if there is not enough interest.
... Apart from which group should deal with it, the question is also where does this solution fit?
... Authoring environments (Unity, Unreal), Web applications, WebXR API (linked to OpenXR) and 360 / XR device
... How to follow up? I thought WICG would be the right place, but if there is not enough interest, there is still the question of whether that's the right place. Not sure about Immersive Caption CG since it does not exist yet.
... TTWG is the right group but we need more expertise from the XR world.
... Another solution is to continue the work in a "private" repository.

<Zakim> nigel, you wanted to ask what is the state of documentation of the requirements right now

Nigel: What is the state of documentation in terms of the requirements?
... Describing positioning in 3D space, can I do it with audio?

Andreas: There are documented user tests, as part of a European project deliverable.

Nigel: I was thinking about requirements documentation. What is the problem that you're trying to solve, user needs.

Samira: Who was the person who started the Immersive Caption Community Group?

Andreas: Christopher Patnoe at Google

Samira: OK. Another comment is that WebXR is becoming more stable.

Andreas: Yes, the question for me is where should this go.
... The WebXR API does not know anything about what's inside the WebGL right now.

Chris: Is all that's needed a delivery format and then some library can place that in the immersive environment?

Igarashi: Do we need to extend APIs in the browser to support this?

Andreas: OMAF defines a way to multiplex IMSC subtitles with MP4, but then it's all bound to that content format. Not sure it's sufficient for interoperability scenarios.

Kaz: I'm wondering about the possible relationship with WebVMT (because 360 video could be mapped with some kind of map image)

Francois: WebVMT is about tracks positioned on a map, not in 360 videos.

Andreas: It would be an option to have a subtitle format, but burning captions in a frame does not provide good user experience.

Josh: Looking at things from an accessibility perspective. APA would seem a good group to talk to.

Andreas: We talked a lot with Judy, Janina and so on.

<Joshue108> https://www.w3.org/WAI/APA/wiki/Xaur_draft

Josh: We created a list of requirements for XR in APA.

Samira: IW group is also discussing DOM overlays so this is another option for subtitles

Pierre: How many people in this group doing 360 videos and XR content?
... One possibility is that this group is not the best group to get feedback from.

Andreas: I don't know, that's what all groups say ;)
... We need a critical mass to do it.

Pierre: People that build apps for Oculus, are they around?

Andreas: I spoke to some of them. They always say that they don't provide subtitles.
... Some discussion in Khronos with Unity and Epic.
... I talked with Immersive Web folks. We'll talk about that on Wednesday 11:00 during Samira's breakout session.
... The issue that we have is that there is not endless time to deal with it. The project is running out. It stops next year. To push a standard, it will take 2-3 more years.

<Joshue108> There is very little testing with people with disabilities in this space so this is very interesting.

Igarashi: From a content production perspective, I'm interested in a format, but not sure about browser support for this.

https://github.com/immersive-web/dom-overlays

Francois: Not clear to me what you want to be standardized. DOM overlays could be one building block.

Andreas: Yes, DOM overlays may be a good way forward to render captioning, rather than burning things in WebGL.

<Zakim> nigel, you wanted to wonder what the smallest thing is that we need to standardise first - is it a syntax for expressing a 3D location?

<Joshue108> +1 to Nigel

Nigel: Same point. Do we have agreement that it's about a syntax for expressing a 3D location?

Andreas: Actually, that's not what we need, since we want it to appear on a 2D plane, that is what the users want.
... We need a way to indicate where in the 3D space the audio source is coming from.

Gary: So you need some positioning in 3D to make that possible.
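
To illustrate the requirement discussed here (subtitles stay on a 2D plane in the field of view, with an indication of where the audio source is), the sketch below decides which hint to show next to a subtitle. It is not taken from the proposal or the research project; the function names and the angle convention (degrees, increasing to the viewer's right) are assumptions.

```typescript
type SpeakerHint = "in-view" | "left" | "right";

// Shortest signed rotation from one heading to another, in the range [-180, 180).
function signedAngleDeltaDeg(fromDeg: number, toDeg: number): number {
  return ((((toDeg - fromDeg) % 360) + 540) % 360) - 180;
}

// Tells the renderer whether the speaker is visible or, if not, on which side
// of the viewport an arrow should point. A source exactly behind the viewer
// (delta of -180) is reported as "left".
function speakerHint(
  viewYawDeg: number,
  sourceAzimuthDeg: number,
  horizontalFovDeg: number
): SpeakerHint {
  const delta = signedAngleDeltaDeg(viewYawDeg, sourceAzimuthDeg);
  if (Math.abs(delta) <= horizontalFovDeg / 2) {
    return "in-view";
  }
  return delta > 0 ? "right" : "left";
}
```

The only 3D-related datum the subtitle needs in this sketch is the azimuth of the audio source; the subtitle text itself remains fixed in the viewport.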

Andreas: Defining a good container is another issue.

Josh: in the User requirements document I showed you, we took a modular approach.
... This architecture does not exist yet.

<Joshue108> https://www.w3.org/WAI/APA/wiki/Media_in_XR

Josh: We're also looking at Media requirements in XR. Not vetted by the APA WG yet.

Andreas: Lots of 360 content for the time being, and a lot of it without captioning.

Gary: WebVTT update. I joined TTWG half a year ago. Trying to get WebVTT to progress. One of the big things is that an implementation report exists right now.
... Something like 6-7 issues with it.

<atai> Link to 360 subtitle requirement https://github.com/immersive-web/proposals/issues/40

Gary: Basically, we're looking at features implemented in browsers and in VLC. Then identify features at risk, and possibly remove them to get a V1 out.
... Then hopefully convince browser vendors to implement the features that we may remove.

<gkatsev> WebVTT Implementation Report

Glenn: Is there any SMPTE spec that includes 3D positions of audio sources?

Nigel: That's a good question.
... One of the things we're doing around TTML2 is adding new functionality in extension modules. We're trying to constrain the core, and then provide the rest in extensions.
... There are a few that are ongoing.
... [details extensions]
... Right now, audio/video comes to MSE but not text.

Mark_Watson: My personal position is that things should be symmetrical across media types.
... At least in our application, we prefer to do the rendering of text tracks ourselves.
... It would be advantageous to have a model in which the browser is aware of text tracks.

Nigel: You said my sentiment much better than I could.

Gregg: I would argue that we don't want to render them ourselves, but we still want to control the rendering with our styles.

Mark_Watson: Yes, we want to have enough control of the rendering, but we could offload the rendering to the browser, that would be great.

Nigel: It's been hard to get statistics about user customization, or people that play back content with captions.

Mark_Watson: In terms of rendering, you would still want the site to control enabling/disabling.

<atai> +1

Gary: We shouldn't try to do the same thing twice. If there's more support to do the new generic TextTrack thing, then that's good.

Pierre: Two different questions: any objection to enabling symmetry in MSE? Are you going to use it?

Mark_Watson: First question is whether people think that could be harmful.

Nigel: OK, I just wanted to raise it to get feedback.

[No concerns expressed regarding question on whether people think that could be harmful]

Josh: About accessibility in WebRTC use cases, challenge of synchronizing some of these things together when switching to a different modality. That's one.

Nigel: It would make sense to talk about live contribution to see where that fits. How do live contributions actually work, what's the mental model?
... Alright, I think we covered all topics.

Closing and wrap-up

Chris: Thinking about Media Timed Events, some editorial work needs to be done on the TF report. Planned discussion on DataCue. Around bullet chatting, more conversation will happen this week.
... Some possibility to go to Timed Text WG.

Nigel: It feels to me that this IG could be the best place to give guidance for that if there's no clarity in TTWG on Friday about that.

Andreas: Can you explain again how you want to proceed?
... Draft published in the Chinese IG, what would the ideal next step be?

Song: Initially, contributors were from China. Now that Niconico is engaged in discussions, work could go to TTWG, or perhaps in another group.
... We want the use cases to be approved by the IG, afterwards we'd like to push standardization work on identified gaps.
... Within the next few weeks, we'll have a last version of the use cases.

Andreas: OK, so this week would be a good opportunity to decide where this should go.

Chris: We had a lot of discussion around synchronization today. Frame accurate rendering.
... Ability to seek accurately within videos.
... Some interest in following up, although no one volunteered.
... The media production use case that Pierre presented would be a good perspective to address this.

Pierre: With an action on Gary to follow up with Garrett Singer on that.

Chris: Secure communications between devices, we heard interesting stuff from Hybridcast and HTTPS in local network, and Second Screen. Interesting set of approaches that could be compared.
... Seems like a good fit for HTTPS in local network CG discussions.
... Clearly the immersive captioning is interesting, but don't have a clear next step in the IG. Maybe the Immersive Captioning CG could be the right forum.
... We talked about 360 videos. That's something that the IG could follow on. We have liaison with MPEG. Unless you feel that immersive group would be a better home.

Samira: Possibly. At this point, I'm gathering input.

Chris: Finally, there's the timed text in MSE proposal. Would that sit in TTWG?

Mark_Watson: It would be in scope of the Media WG.

Chris: Have I missed anything from the summary?

Pierre: One encouragement for you to clarify the scope in Media Timed Events.

Chris: And also possibly make more specific recommendations.

Pierre: I think it helps to have something concrete.

Chris: OK, I think that's everything, thank you for your presence today!
