W3C

– DRAFT –
Media and Entertainment IG at TPAC 2022

15 September 2022

Attendees

Present
Andreas_Tai, atsushi, Ben_Morss, Bernd_Czelhan, Charles_LaPierre, Chris_Lemmons, Chris_Lorenzo, Chris_Needham, Cyril_Concolato, Dominique_Hazaël-Massieux, Eric_Carlson, Eugene_Zemtsov, Francois_Daoust, Gary, Geun-Hyung_Kim, Hiroshi_Fujisawa, Hisayuki_Ohmata, Hong_Cai, Hongchan_Choi, Hyojin_Song, Ingo_Hofmann, Janina_Sajka, Jeff_Jaffe, jholland, John_Riviello, Kaz_Ashimura, Larry_Zhao, Li_Lin, Mark_Vickers, Matthew_Atkinson, Mike_English, Nigel_Megitt, Patrick_Griffis, Pierre-Anthony_Lemieux, Piers_O'Hanlon, Russel_Stringham, Ryo_Yasuoka, Shinjiro_Urata, Tatsuya_Igarashi, Tatsuya_Sato, Timo_Kunkel, Tomoaki_Mizushima, Tove_Petersson, Tuukka_Toivonen, Wendy_Reid, Xiaohan_Wang, Youenn_Fablet
Regrets
-
Chair
Chris_Lorenzo, Chris_Needham, Tatsuya_Igarashi
Scribe
cpn, kaz, nigel

Meeting minutes

Welcome and Introduction

cpn: (gives an overview of what the MEIG does)
… we had a workshop last year
… Chris Lorenzo will be introducing HTML technology for TV devices
… then we'll ask Nigel to talk about Timed Text
… and then next generation audio
… any other topics to cover today?
… the other thing is a new CG starting
… would recommend everybody join it too

Application Development for Consumer Products Task Force Charter

TV application development

Draft TF Charter

ChrisL: We have browsers on TVs and want developers to make applications, but developing for them is different from developing for the web
… Similar to the early days of mobile development, so people preferred native
… The goal of the TF is to identify the problems we face as developers, and what we can do to solve them
… Tizen as example. You have to download their IDE, and command line tools
… I spent a long time getting it to install. I want to use tools I'm familiar with, such as VSCode
… There's good documentation. Samsung developer portal, needs a user account and key
… Bundling the app, figuring out how to put the TV into developer mode
… Then you can get the web app on the TV
… The process is similar for other TV manufacturers
… It's a complex process

<ChrisLorenzo> https://mlangendijk.medium.com/deploying-and-running-lighting-apps-on-smarttv-5f27e2491943

ChrisL: There's an article written recently talking about how to get a web app running on different platforms
… It also mentions Android and Amazon Fire, using web view
… Challenge of building TV apps across devices, it's time consuming
… Needs a large development team
… Getting a web app to load on a TV device is difficult, and varies between devices. Hope to address that in our TF charter
… Embedded devices typically have 2 GB of memory in total. The CPU is not that performant
… So if you build an app with HTML and CSS, you may not achieve 30FPS
… Rendering HTML and CSS has overhead, too much CPU usage
… Our team created a JS framework that renders using WebGL
… Enables a smooth experience
… Across all devices, there's no standard for how much CPU and memory an app requires
… You can build an app and deploy on the latest TV, it'll look great. But not so much on older devices
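
[Illustrative sketch of the Canvas/WebGL approach described above, using Canvas 2D for brevity; Lightning itself uses WebGL and a scene graph, and the tile data here is invented:]

  const canvas = document.createElement('canvas');
  canvas.width = 1280;
  canvas.height = 720;
  document.body.appendChild(canvas);
  const ctx = canvas.getContext('2d');

  // App state lives in plain JS objects rather than DOM nodes.
  const tiles = [
    { x: 100, y: 200, w: 300, h: 170, label: 'Show A' },
    { x: 450, y: 200, w: 300, h: 170, label: 'Show B' },
  ];
  let focused = 0;

  // Remote-control navigation just mutates state; the next frame redraws.
  document.addEventListener('keydown', (e) => {
    if (e.key === 'ArrowRight') focused = Math.min(focused + 1, tiles.length - 1);
    if (e.key === 'ArrowLeft') focused = Math.max(focused - 1, 0);
  });

  function render() {
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    tiles.forEach((tile, i) => {
      ctx.fillStyle = i === focused ? '#555' : '#222';
      ctx.fillRect(tile.x, tile.y, tile.w, tile.h);
      ctx.fillStyle = '#fff';
      ctx.font = '24px sans-serif';
      ctx.fillText(tile.label, tile.x + 16, tile.y + 40);
    });
    requestAnimationFrame(render);
  }
  requestAnimationFrame(render);
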
… Last topic is API support for TV specific features
… Deploying to RDK devices uses a library called ?? that provides access to device features
… So you have a different API for each device. You need permission from manufacturers to even have access
… Good to have some kind of web API for device specific things such as volume
… Good progress on mobile APIs, e.g., for gyroscope etc. TV devices haven't caught up yet
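
[A hypothetical sketch of what such a device API could look like; navigator.tv and every member below are invented names, no such API exists today:]

  // Hypothetical only: no standard 'navigator.tv' API exists. This sketches
  // the kind of surface the TF might explore, by analogy with sensor APIs.
  async function nudgeVolume(delta) {
    if (!('tv' in navigator)) return;                   // platform without support
    const audio = await navigator.tv.getAudioControl(); // invented name
    const level = await audio.getVolume();              // invented name
    await audio.setVolume(Math.min(1, Math.max(0, level + delta)));
  }
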

<ChrisLorenzo> https://github.com/w3c/media-and-entertainment/blob/master/app-development/charter.md

ChrisL: Any questions?

<ChrisLorenzo> Also JS library for TV devices - https://lightningjs.io/

ChrisN: Can we address the performance issues in HTML and CSS, or do we need a new standard based on WebGL/Canvas rendering?

ChrisL: It's complicated in HTML, there's the DOM tree, how to deal with resizing. There's always a performance overhead
… It would be wonderful if we could use it

<jholland> I had some questions, but I think they're mostly addressed by the "success criteria" section in the charter: https://github.com/w3c/media-and-entertainment/blob/master/app-development/charter.md#success-criteria

ChrisL: We're suggesting WebGL as a solution, as it's already in the browser. TV apps require animation
… You can get more FPS from WebGL etc. It's not really about saying use WebGL; WebGPU and WebAssembly are also options
… Cobalt from Google removed the difficult layout parts, but not sure who's using this

Jake: Do you see developing new standards in a WG?

ChrisL: This is the interest group, so I can't propose solutions. DIAL allows remote application launch
… That would simplify things. Standardising TV APIs is something for the TF to identify then suggest to a WG
… Mobile-like things, progressive web app technologies
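
[Sketch of a DIAL-style launch, assuming SSDP discovery has already returned the device's Application-URL; the address, app name and payload are made up:]

  // DIAL launch: POST to <Application-URL>/<appName>; 201 Created means the
  // app was launched, with the instance URL in the LOCATION header.
  const APPLICATION_URL = 'http://192.168.1.50:8060/apps'; // from discovery

  async function launchApp(appName, payload) {
    const res = await fetch(`${APPLICATION_URL}/${appName}`, {
      method: 'POST',
      headers: { 'Content-Type': 'text/plain; charset=utf-8' },
      body: payload, // passed to the app as launch parameters
    });
    if (res.status === 201) {
      console.log('Launched:', res.headers.get('Location'));
    } else {
      console.warn('Launch failed with status', res.status);
    }
  }

  launchApp('MyTVApp', 'contentId=12345');
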

ChrisN: Looking for interested companies, would be good to have manufacturers and app developers on board

Kaz: Identify stakeholders as potential participants in the TF

Igarashi: I'd like to clarify about the development experience. Should the web development environment be standardised or not?
… Sony uses Android TV, but we have a WebView, it's up to the Android environment
… Other TV manufacturers provide HTML, but as an alternative to native apps
… Each manufacturer differentiates based on improved user experience
… Should we standardise the application environment, or leave it to vary by manufacturers?

ChrisL: It depends on what level of standards. Some TVs have Android, some WebKit, some webviews
… I'd say there needs to be a browser with minimum set of APIs to support TV apps
… A lot of TV manufacturers have emulators to simulate TVs. I want to avoid that as a development environment. I should be able to develop in Chrome or Safari and have it work the same on the TV
… First goal, make it super easy to load an app URL on a TV

Igarashi: I see the issue. Should Android TV have a standard development experience?

ChrisL: Really, I'd like to see a web browser in there. If apps have webviews, you're using a browser anyway, so have it as part of the OS

Nigel: It sounds like you want to promote Canvas based rendering. How does that fit with accessibility frameworks, if there's no semantic model of the content

ChrisL: We use WebGL and Canvas to get performance. Having the DOM is great if it were fast enough. Accessibility is more challenging, our solution uses text to speech
… The user always has focus on one element, remote control input. The app reads out where you are. It doesn't rely on ARIA tags
… I find it simpler than ARIA tags. It's extensive, but it's a lot to learn to make semantically correct markup
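
[A minimal sketch of this focus-driven announcement approach, using the Web Speech API; describeFocused is an assumed app-level helper that builds the announcement string:]

  // Announce the newly focused element; no ARIA markup involved because in
  // a Canvas/WebGL app the focused "element" is app state, not a DOM node.
  function announce(text) {
    speechSynthesis.cancel(); // drop any announcement still in progress
    speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  }

  // Called whenever remote-control navigation moves the app's focus.
  function onFocusChange(tile) {
    announce(describeFocused(tile)); // e.g. "Show B, 2 of 6, press OK to play"
  }
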

Nigel: That sounds terrible. Sounds like the only concern is reading text. It's a much bigger problem
… Is reading the text as far as it goes?

ChrisL: A deaf person needs audio cues. For colour blindness and visual impairments, WebGL can change colours across the whole UI with no effort, using a filter
… Then there's other devices for input available, you use a simple remote, not a mouse
… Bring your own device using Bluetooth

<Zakim> nigel, you wanted to ask about accessibility framework support if drawing directly to canvas or webGL without a DOM

Piers: On web APIs in TV, HbbTV are developing APIs, seems like an important aspect
… Lightning JS runs at a similar speed to DOM-type apps, don't know how much more processing is needed. How much time is spent optimising?
… Developer base for Lightning is smaller than React

ChrisL: I've been a web developer for 15 years, now doing TV apps. We've built apps in HTML, it can be done, it's a usable interface. You can spend time tuning it, but it wasn't achieving 60FPS, not great animations
… Open to solutions, not tied to WebGL

<nigel> ChrisN: Good point that other frameworks already exist. Some follow-up work to be done.

Kaz: I agree with ChrisN. ChrisL, you've mentioned pain points, we can continue discussion on requirements for what's needed
… Survey existing practices in the TF, as ChrisN mentioned

MarkVickers: One thing that's causing the move to WebGL: where TVs differ from PC and mobile is the ratio between CPU and GPU speed
… TV graphics acceleration is much closer to PC and mobile than the CPU is, so the GPU is relatively stronger

ChrisL: Manufacturer goal is to minimise cost to consumers, the SoC may be dual core, some are quad core, made to play back 4K video. Everything else is just enough
… To get it down to a price point, reduce memory, e.g., to 1 GB of system memory. Loading a web app environment takes a lot of memory

MarkVickers: The goal was to use the same APIs, so I'd underline what ChrisL said about needing browser APIs
… Want to avoid TV specific APIs, so if there are APIs needed they should apply across all devices

Francois: Next step, call for consensus to create a TF?

ChrisN: Yes, also positive indications for wanting to participate

Web Media API Snapshot Update

JohnR: I'm here as chair of WAVE HATF
… I want to give an update on the Web Media API Snapshot
… Reflect the state of the web and how TV devices support those APIs
… CTA WAVE is part of CTA, which hosts the CES conference. They do standards work
… The WAVE group focuses on internet video on devices
… Make it easier to consume video on devices. Focus on tools for interop
… The WAVE group doesn't create new standards, but references existing standards as much as possible. HTML5, MSE, EME
… There are a number of groups active in WAVE. The Web Media API group develops the snapshot
… The Device Playback Capability TF looks at video playback itself
… DASH/HLS interop group
… Client media data group focuses on CDN data. Common Access Token Group
… How can we improve interop across CE devices?
… With HATF, we identify the minimum set of web standards needed, with emphasis on streaming
… Use existing APIs, based on four widely adopted engines
… We consider the capabilities of CE devices. Just because an API is available in all four engines, doesn't mean we'll include it
… Web Media API Snapshot happens in a W3C CG. Everything is in the open, in GitHub
… We update it every December
… That was requested by manufacturers. We co-publish as a W3C CG Note, not a standard. There's a related CTA WAVE spec with the same content
… HbbTV and ATSC reference the Web Media API Snapshot, a specific snapshot year, not necessarily the most recent
… They choose the snapshot that works on devices
… There's also a test suite to ensure devices meet guidelines, based on Web Platform Tests. We fork WPT, add tests we need to run
… Control it from another device to run the tests
… We contribute changes back to WPT
… There's a WAVE directory in WPT
… Louay's team is instrumental in doing that, and hosting the tests online
… As far as what makes it into the annual updates, each change needs multiple references: caniuse, ECMAScript compatibility tables, etc
… Not always accurate, so we look at the WPT results as input
… We do our own tests
… ECMAScript is being updated to 2022, CSS is updated. For WHATWG it's tricky as those are living standards. We reference W3C snapshots or WHATWG review drafts
… We need stable references
… We're including the receive side of WebRTC. We're considering WASM for the future, not this year

<Zakim> tidoust, you wanted to wonder about next immediate step to create the TF (call for support and consensus?)

ChrisN: Any challenges for WASM adoption on devices?

JohnR: We're not at the stage where we can say it's ready. The 32-bit limitation is an issue, that's why it'll land in 2023

Jake: Any thoughts on recommended memory or CPU budget going into these standards?

<Zakim> nigel, you wanted to contrast the Web Media API with the idea of Consumer Products

JohnR: We have an exception, if you're transforming video, don't expect 60FPS. We could include some form of hardware requirement for WASM

Nigel: We've heard two presentations from a CE perspective, at two ends of a spectrum. What's going on?
… On one hand, the Web Media API requires the DOM and a browser. On the other hand, there are the CPU limitations, saying we can't afford to use the DOM. What lens should we see this through?

JohnR: Goal is to have something for standards bodies. Is this the direction we continue to go? We're actively talking about it. Will there be a pivot, because of the issues Chris mentioned, to something else
… We're actively discussing whether we need this or something related

Kaz: I'm interested in that point. The WoT group organised a breakout, including NHK's demo, smart TV and refrigerator. That connection to WoT is interesting

<kaz> NHK's slide on their demo during the WoT breakout on Sep-14

ChrisN: Want to organise meeting with NHK soon

Piers: If there could be an optimised subset of the full web API that runs on TVs, that could be a good solution. Like DVB DASH, which profiles MPEG DASH for TV devices
… Maybe HbbTV produces that, instead of having a different approach, something more optimised

JohnR: That's like what Cobalt does

Piers: There are all the others, such as Roku, that have their own APIs

JohnR: Also looking at miniapps, webviews. Unsolved problem

ChrisN: Let's figure out how to collaborate between WAVE and MEIG

JohnR: Sounds good

Timed Text Joint Meeting

<MarkVickers> Just because a TV application runs best on a subset of HTML5 doesn't require that the HTML5 platform on the TV be subsetted.

Nigel: Update on relevant work areas. We're in a charter extension at the moment, currently dealing with a formal objection
… Two topics for this meeting are audio description for dubbing, and timed text complexity

Audio Description for Dubbing

<kaz> Audio Description CG

Nigel: I set up a CG for Audio Description. That would be a list of times and text with which an audio describer would describe the scene. Well established in the MAUR
… Seemed to have support, but not enough. Nobody came forward to help edit
… Input from Netflix, they have a need to exchange dubbing scripts. Common requirements. In both cases you need to work out words that need to be spoken at given times
… AD is a translation of the video image. We agreed to work on a single spec, a TTML2 profile, called DAPT
… We're editing to get it to FPWD soon
… We published a requirements document. It's a draft note, published in May
… It describes the workflow for the creation process and what you end up with
… That can include recorded audio with mixing instructions. Alternatively, it could omit the audio and be used by an actor to create the dubbing output
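
[Rough illustration of the data model only; DAPT itself is a TTML2 profile, i.e. XML, and the field names here are invented:]

  // A script is essentially a list of timed text events, each optionally
  // carrying pre-recorded audio plus mixing instructions.
  const script = {
    kind: 'audioDescription', // or 'dubbing'
    events: [
      {
        begin: '00:00:12.500',
        end: '00:00:15.000',
        text: 'She walks into the dimly lit room.',
        audio: {                          // omitted when an actor records later
          src: 'ad-0001.wav',
          mixing: { programmeGain: 0.4 }, // duck the programme while it plays
        },
      },
    ],
  };
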

<kaz> DAPT Requirements

<kaz> Dubbing and Audio description Profiles of TTML2

Nigel: It's a subset of TTML2 vocabulary, plus metadata to help workflow
… The requirements went out for review; one additional request for functionality, not added yet
… Process steps are used to extract requirements for the spec
… Can be used for dubbing tools, AD, the recording process
… If you're translating the dialog to create a dubbing script, it makes sense to use that for subtitles

Cyril: It isn't good if you have independent translations from the people doing dubbing and those doing subtitling
… If you coordinate the processes by having a single translation to the target language, you get fewer discrepancies

Nigel: Easy to transform those

Jake: Is it about linking text to the audio? It seems like it would be reasonable to have multiple translations, but you don't want the dubbing and text to be misaligned?

Cyril: Our users are watching a show originally in English, but in French with French subtitles

Jake: It's odd when the audio doesn't match the text

Cyril: That's why it's good to have one translation. For a dubbing script you want to adapt to lip sync. We think these should be final adjustments
… The structure of the sentence is the same, better UX

Nigel: It's always a good starting point with authoring to have a close match to the original audio. Then there's an editorial decision to turn that into subtitles and apply styling

Nigel: The work on an audio description profile isn't wasted, we're meeting two use cases instead of one
… I can show a demo of how the mixing instructions can work, either in browser or server side
… Next generation audio, object based audio. This would be amenable to transformation into the instructions for an NGA platform, representing the text as different objects
… That's a goal for AD. Open question for the dubbing side
… So it provides a path to do that
… The audio mixing instructions allow you to smoothly adjust gain and position over time
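
[A minimal sketch of what such mixing instructions could drive client-side, using the Web Audio API; times and levels are illustrative:]

  const ctx = new AudioContext();
  const programmeGain = ctx.createGain();     // main programme audio level
  const adPanner = new StereoPannerNode(ctx); // spatial position of the AD

  programmeGain.connect(ctx.destination);
  adPanner.connect(ctx.destination);

  function playDescription(adBuffer, start, duration) {
    const src = ctx.createBufferSource();
    src.buffer = adBuffer;
    src.connect(adPanner);

    // Smoothly duck the programme over 0.3s, hold, then restore.
    const g = programmeGain.gain;
    g.setValueAtTime(1.0, start - 0.3);
    g.linearRampToValueAtTime(0.4, start);
    g.setValueAtTime(0.4, start + duration);
    g.linearRampToValueAtTime(1.0, start + duration + 0.3);

    // Offset the description to one side to help differentiate it.
    adPanner.pan.setValueAtTime(0.5, start);
    src.start(start);
  }
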

Janina: We're happy to hear about this work; we want to encourage keeping the text that becomes the video description, as it's useful. We don't do enough for people who are deafblind, want to do more

Nigel: Yes. We made sure it includes the ability for the client to vary the level relative to the program audio. Also putting the text into an ARIA ?? region, so the screen reader can use a different voice
… Also if screen reader is a braille display, it works too
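
[A small sketch, assuming the mechanism referred to is an ARIA live region:]

  // Expose the description text to assistive technology; screen readers
  // announce the update, and braille displays can render the same text.
  const region = document.createElement('div');
  region.setAttribute('aria-live', 'assertive');
  region.className = 'visually-hidden'; // off-screen styling, not display:none
  document.body.appendChild(region);

  function showDescriptionText(text) {
    region.textContent = text;
  }
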

Kaz: Wondering about relationship with VoiceXML, SSML and SMIL

Nigel: It's close enough to SMIL, no need to further align
… If you use text to speech features in TTML you can adjust pitch. Other adjustments for pronunciation
… I'd see that as a v2

Kaz: I need to organise another voice agent workshop. This is related

Nigel: You can't route output from Web Speech to Web Audio. That would be interesting

<nigel> DAPT Requirements

Janina: APA is actively working on a spec to make sure we can lock in pronunciation on the authoring side across different environments. Not acceptable in an education environment to have variations between different renderings

<nigel> DAPT ED

Janina: Meeting at 3:30pm today, you're welcome to join

<nigel> BBC Adhere ADPT/DAPT playback prototype

Janina: Good meeting with WHATWG. They're happy to make SSML a full citizen
… The pronunciation topic may be of interest, welcome your input

Matthew: Can be used anywhere, also smart speakers

Nigel: Yes. When I created a prototype video, had to add extra SSML directives to make it sound right
… AD providers today would use a human, so little benefit

Janina: Amazon Prime is sometimes delivering script-generated TTS, controversial among consumers

IMSC profiles of TTML

<kaz> TTML Profiles for Internet Media Subtitles and Captions 1.2

Nigel: IMSC profiles of TTML are widely used for providing caption data. Three active versions. Each is referenced from other specs, such as DVB, HbbTV, ATSC
… A feature common to all is the hypothetical render model (HRM)
… The intent is to manage document complexity such that client devices aren't swamped by the processing load and so can present the intended text at the required time
… It's a maintenance headache for the spec, so we want to factor it out into a new spec and reference normatively rather than duplicate
… There's an open source implementation and lots of content
… Issues to resolve, such as when you have a caption pattern with a few frames with nothing displayed, which increases document complexity
… From clearing to presenting the next caption there are only a few frames, under 100ms, to do all the DOM rendering
… It can mean caption display is later than desired
… Not good UX. We may decide to optimise in the HRM to allow for that. We want feedback on that

Pierre: From the feedback I received, there's a practice out there of leaving a small gap of 2 frames, 20-40ms
… In the current HRM there's a big penalty for that. It may be artificial, due to how it's specified. We can address it without substantially changing what an implementation needs to do
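
[For scale: at 50 fps, two frames is 2/50 s = 40 ms; at 100 fps, 20 ms, which is where the 20-40 ms figure comes from. Either way it is well under the ~100 ms DOM rendering window mentioned above.]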

Nigel: I'm interested in real world implementations, how they do pre-planning on what they do next

Pierre: https://github.com/sandflow/imscHRM/pull/9

<nigel> IMSC-HRM specification ED

<kaz> IMSC Hypothetical Render Model WD

Pierre: Goes back to how it was designed: it assumes the display is cleared at the beginning of a subtitle, but it could be done at the end
… So by refactoring the model, we can accommodate this practice without affecting how it works
… Anyone here with TTML content, try out the HRM. Your content should pass. If it doesn't let's figure out where the problem is, in the HRM or the content
… Please report your experience. Thank you

<nigel> Open source IMSC HRM software

Pierre: It's easy to run, Python based. Having user feedback as we get to CR will be key
… It's a common practice to have a short gap, but also common to have no gap

Nigel: That summarises what we're up to in TTWG

Pierre: We do need the feedback, want to avoid bad UX or content being rejected

Nigel: This week we had breakouts on privacy and caption settings, and getting useful data to content providers for product improvement. Another breakout on WebVTT to drive TTS in the browser
… TTWG meets today and tomorrow, and has a joint meeting with APA this afternoon, looking at synchronisation a11y user requirements

Next Generation Audio

Bernd: First time at TPAC. We're interested in NGA APIs

Bernd: A stream could include different dialogue components in different languages,
… audio description, or different numbers of channels, e.g. 5.1, stereo, etc
… Can define preselected mixes and what flexibility there is to adjust.
… For example a "no dialogue" preselection,
… one called "English" with English language
… "English Dialog plus" with a lowered bed plus English audio
… [others, adding in audio description]
… Then allow different interactions
… for example changing the spatial position of the audio description
… which could improve the ability to differentiate the audio description
… This is a paradigm shift where the audio decoder is not only a sink
… but also can provide information about the stream,
… and is reactive to the user's settings.

NGA API Requirements

ChrisN: We've been collaborating with Fraunhofer and Dolby (who did AC-4)
… to pull together a proposal for browsers for what NGA could look like.
… Just the high level capabilities
… 1. Dialogue enhancement, looking for a toggle to enable/disable and a gain control to allow adjustment
… 2. Preselections: expose list of what's available as signalled in the content, and controlling the active one.
… 3. Object based audio control: list of available objects with enable/disable/gain/position adjustments for each one.
… Current focus is on streaming media playback, file based and MSE based.
… Intent is codec agnostic, AC-4 or MPEG-H, same API for either.
… Potentially has interest around WebCodecs and WebAudio but not the focus now.
… More focused on MSE playback, leave those to future iterations
… assuming we make progress with regular playback.
… That's as far as we've got to with the project so far.
… Interested in talking to anyone who is interested in this.
… Proposing to come back to this group at some point in the future with more detail
… so we can start that conversation.
… Will announce when we're ready and invite participation.
… Initial reactions and thoughts?
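
[A hypothetical sketch of an API shape matching the three capabilities above; every name below is invented, this is not proposal text:]

  // Hypothetical only: 'audioRenderer' and every member below are invented.
  async function applyNgaSettings(video) {
    const nga = video.audioRenderer;
    if (!nga) return; // platform without NGA support

    // 1. Dialogue enhancement: toggle plus gain control.
    nga.dialogueEnhancement.enabled = true;
    nga.dialogueEnhancement.gain = 6; // dB, illustrative

    // 2. Preselections: list what the content signals, set the active one.
    const pre = await nga.getPreselections();
    const english = pre.find((p) => p.label === 'English Dialog plus');
    if (english) nga.activePreselection = english.id;

    // 3. Object control: enable/disable, gain, position per audio object.
    const objects = await nga.getAudioObjects();
    const ad = objects.find((o) => o.purpose === 'audio-description');
    if (ad) ad.position = { azimuth: 30, elevation: 0 }; // move AD aside
  }
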

Eric: Eric, Apple, WebKit.
… In our case this will require changes to the frameworks underneath us, system frameworks
… that WebKit hands off to. We probably don't have the kinds of controls you're talking about.

Bernd: We were thinking about a JS API

Eric: In order to implement those in WebKit we will need features from other parts of the system.
… We probably can't implement all the things you're talking about yet.
… It needs more than the JS API.

ChrisN: It would need host OS-level support.

Kaz: Depends on the use cases. Maybe this can be applied to the metaverse AR/VR space.
… Automatically generated voice sound should be integrated with the recorded audio objects.
… May be too complicated, but could be useful in some use cases.

Andreas: Andreas Tai, Invited Expert.
… Thanks for pushing this forward. Important to work on interop for this technology
… Could there be an accessibility preference that would activate audio description by default?

Bernd: Another thing about dialogue enhancement (DE) signalling; it could be related to signalling in the DASH MPD

[adjourned]

Minutes manually created (not a transcript), formatted by scribe.perl version 192 (Tue Jun 28 16:55:30 2022 UTC).
