W3C

- DRAFT -

Accessible Platform Architectures Working Group Teleconference

13 Oct 2020

Attendees

Present
janina, plh, CharlesHall, Chris_Needham, jeanne, Francis_Storr, Lauriat, Nigel_Megitt, becky, mikecrabb, jasonjgw, MelanieP, Joshue, Joshue108, SuzanneTaylor, KimD, jib
Regrets
Chair
Janina
Scribe
bruce_bailey, Joshue108, janina

Contents


<bruce_bailey> scribe: bruce_bailey


Janina: simple agenda


brainstorm about situations where close synchronization is needed

scribe: the research task force started from accessibility needs
... later-deafened people rely on lip reading more than anything
... it seems like everyone on the planet who can see relies on lip reading even if they do not know it
... recent events with mask wearing really interfere with that
... concern that Timed Text may have issues for accommodation
... What is the Timed Text perspective?

Nigel Megitt: The question is how sensitive the audience is to synchronization?

scribe: from a TT WG perspective, we have TTML and its various profiles and WebVTT, which allow time stamping to the millisecond,
... that might be the order of magnitude we need
... we are not talking about seconds or tenths of seconds -- we need to be talking about hundredths or thousandths of seconds.
... the standard does not give assurance of accuracy, however.
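[Editor's illustration, not part of the minutes: a minimal sketch of how millisecond-level cue timing can be expressed through the HTML text track API. The element selection and cue text are assumptions for illustration only; as noted above, the ability to express a time does not guarantee rendering accuracy.]

```javascript
// Hypothetical sketch: WebVTT-style cues take start/end times as
// floating-point seconds, so authors can express timing to the millisecond,
// even though the spec gives no guarantee about rendering accuracy.
const video = document.querySelector('video');
const track = video.addTextTrack('captions', 'English', 'en');
// 20 ms granularity is representable; whether the UA honours it is another matter.
track.addCue(new VTTCue(5.120, 7.480, 'Caption timed to the millisecond'));
track.mode = 'showing';
```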


Janina: APA, TT WG, and the Media WG as well

Chris Needham: I work for the BBC and co-chair the Media & Entertainment IG

scribe: we are working on JavaScript that synchronizes captions with shot changes in video,
... so we want almost frame-accurate timing, so that as the shot changes, the captions change
... this was a significant change to the normative requirement
... the goal was 20 millisecond accuracy to support shot changes in video, so the text could be coordinated with the display

<Zakim> nigel, you wanted to note that the requirement about frame accuracy was based on "normal" frame rates

Janina: We also have a broader set of use cases, need to include synchronization of voice and audio

Nigel: Important to keep frame rate in mind.
... a 25 fps video might only render at 6 fps, so 25 fps is the benchmark
... if there is low profile encoding at 6 fps, one can imagine the captions drifting quite a bit compared to the audio
... it is not just text synchronization, but also audio description.
... I have, experimentally, an implementation that mixes the AD client side and needs good accuracy.
... the audio description plays over the default audio track.
... this is an effect that AD folks work very hard to avoid for live use

Jeanne: Changing the question,

I am the co-chair of the Silver community group and WG

scribe: we are working on requirements for captions in XR
... our research shows that 80% of people using captioning are not Deaf

<mikecrabb> Reference for Jeannes 80% stat: https://www.ofcom.org.uk/consultations-and-statements/category-1/accessservs

scribe: and that culturally Deaf folks find lip reading more important
... so I have some concern that we not ignore the needs of the majority of caption users.

Janina: I appreciate being reminded of the importance of lip reading...
... and also that TT standards have resolution down to microseconds, and of course things break down at 6 fps
... so to brainstorm, do we need to do anything
... in the past, what was happening on the screen was not as much of a concern.
... my whole idea for this session is just that we brainstorm

Jeanne: WCAG3 will also include advice to user agents.

<Zakim> jeanne, you wanted to ask if the research that Janina quoted about lip reading included aging? Our research had different results. and to say that WCAG3 can include advice to user

<Zakim> cpn, you wanted to mention second screen scenarios

Chris Needham: In the Second Screen Working Group, we are talking about sending content from one device to another

scribe: the typical use case is something like Chromecast, where video is thrown from one screen to another...
... does this technology include captions?

Janina: Yes, very important, but want to keep in mind that "screen" might be another device

<Zakim> janina, you wanted to discuss 2nd screen

Janina: like Bluetooth headphones or a braille display
... if we can keep things synchronized, this works
... the lowest number is 50 ms.

<jeanne> +1 to include 2nd screen - also discussed in WCAG3

Chris: This is one of the questions we were struggling with: is synchronization on the second screen important?
... so the research you have is important to answer this question in the affirmative
... needing to follow by lip reading is an interesting use case that we have identified for the precision needed between the audio and video tracks

Janina: I don't recall this coming up in the MAUR, so thinking about it five years later, this may have been something we missed.
... some research about synchronization goes back even 50 years, with data indicating its importance

<Zakim> nigel, you wanted to mention authoring guidelines

Janina: I will look for this in the Silver research.

Nigel: if the change is for the UA to get to within 20 ms, then this has a huge impact for authors

<nigel> BBC Subtitle Guidelines re synchronisation

Nigel: pointing to the BBC subtitle guidelines, just one data point
... if no one is authoring content with this level of accuracy, then requirements on the UA might not be productive
... There are two ways to think about synchronization. The typical one is to focus on time stamps.

<Joshue108> +1 to Nigel

<Joshue108> There is a need to be aware of the ideal tolerances between various AT user requirements

<Zakim> mikecrabb, you wanted to say WCAG3 challenges in positioning captioning in XR

Nigel: the other technique, which has not had much discussion, is for the slowest bit to set the pace, and for the fastest parts to be slowed down
... I have not seen much research on that.

Michael Crabb: One of the details we have been worried about is the metadata that needs to be attached to VR environments

scribe: objects need sound location data. Where are the sounds coming from?
... Does anyone have suggestions for spatial location?

Janina: This issue did come up in the maps subgroup, and the group is proposing a new datum for location
... we wonder if this might suffice for a mechanism with audio and captioning source

<Zakim> janina, you wanted to support pausing in ua

Mike Crabb: Two dimensions are clearly not enough, and we wonder whether we need six dimensions in some situations.

Janina: We have a similar concern with mapping, so hopefully the mechanism would be compatible.

Andreas Tai: In the TT group we have been having conversations on similar issues, even during the last couple of TPACs, but we have not felt like we were getting much positive feedback

scribe: so we have some feedback and work on the topic. I will post a link in IRC to the GitHub tracker

<atai> https://github.com/immersive-web/proposals/issues/40#issuecomment-533966441

Janina: I want to ask about pausing and how fast someone can read braille
... we could NOT count on someone to read a braille transcription in real time
... it really depends on someone's skill, and when the person learned to read braille.
... Our conclusion was that users had to be provided the option to pause the rest of the stream
... if you think about a physics problem with good audio description, one absolutely needs to interact with the description stream, whether audio or text


Janina: it is NOT just user skill, but the nature of the content.

<Zakim> nigel, you wanted to mention that there's no API mechanism for this

Nigel Megitt: Thanks Janina, this really provides evidence that this is an important issue that has not been well addressed, particularly in the API or metadata.

Janina: The environment makes a huge difference. Watching a Shakespeare play for entertainment is very different from studying cinematography
... we really need a strong API, which we will have to talk about quite a bit.

Jason: One of the interesting scenarios is where one is reading a transcript, so manually pausing would be very disorienting
... it would be nice if the API could allow the captions to queue and then play back when there is a natural pause in the dialog
... this could use a gap analysis. It would make consumption much less interactive.

Janina and Jason agree we need background investigation for future API work.

Gary Katsevman, Video.js project

scribe: We have a plugin for reading audio description
... the way that works is, we have a text track that gives start and end times, and it works with the speech API
... if we pause the video, then playback continues automatically.
... this can happen because TTS is typically faster than real time, so the speech API can read back the AD before the end event
... but with something like a braille display, there is no trigger for "finished reading", so that is trickier
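[Editor's illustration, not part of the minutes: a rough sketch of the approach Gary describes, assuming a hidden "descriptions" text track and the Web Speech API. The element selection and pausing logic are illustrative assumptions, not the actual Video.js plugin code.]

```javascript
const video = document.querySelector('video');
const descTrack = Array.from(video.textTracks)
  .find(t => t.kind === 'descriptions');
descTrack.mode = 'hidden'; // fire cuechange events without rendering the cues

descTrack.addEventListener('cuechange', () => {
  const cue = descTrack.activeCues[0];
  if (!cue) return;
  const utterance = new SpeechSynthesisUtterance(cue.text);
  // If speech might outrun the cue, pause the video and resume when the
  // synthesiser reports it has finished. A braille display offers no
  // comparable "finished reading" event, which is the gap noted above.
  video.pause();
  utterance.onend = () => video.play();
  speechSynthesis.speak(utterance);
});
```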

Janina: Great to hear the speech API has this already.

<Zakim> Joshue, you wanted to mention getting clarity on ideal tolerances between media

<gkatsev> use speech synthesis to speak description text track

Joshue, ex AG WG chair

scribe: we need to work out various tolerances for various AT
... can we get an idea of the tolerances and sweet spots
... so we could have various end points
... ignoring privacy, a device might "listen" to try to figure out what the user was using for consumption, and adjust accordingly

<nigel> bruce_bailey: In a real time environment, do we have good grounding for syncing?

<nigel> .. how does that work in terms of setting the zero point?

Bruce asks questions about counting: ticks from what?

scribe: films have a start, but that doesn't work for V

Nigel: Typically, synchronization starts from "now", so clients sync that up okay

Janina: We have enough research-based evidence for tolerances
... we have that synchronization from video and voice, so we picked "primary media resource"
... additional use cases for captioning and audio description
... we will coordinate with those groups to stay in touch
... we can follow up on what is a "reasonable tolerance"
... we also need to follow up on user agents.

<Zakim> gkatsev, you wanted to mention privacy and importance of vendoring

<nigel> +1 to the privacy point that exposing to JS might not always be a good idea

<jeanne2> +1 to privacy

<Joshue108> scribe: Joshue108

BA: I will get underway - just to mention the IPR

<dom> Slides for APA/WebRTC joint meeting

mentions the channels and agenda.

Will talk about charter and deliverables

Will mention the Machine Learning workshop

Also IETF input and gateway implementation

Josh will talk about RTC Accessibility User Requirements

BA: So WebRTC was recently rechartered.

Defines what we do and timeframes.

Charter contains refs to camera, mic and speakers

There are APIs for media capture

Media streams processing etc

BA: Gives overview on deliverables

In the capture realm there are a bunch of specs

Media capture automation, and streams, image, output, recording and more

questions?

BA: This is what we do; there is a lot we don't do - like content protection, or WebCodecs - lower-level access.

Those things are handled by the Media group

HA: The most active things are Raw media access and insertable streams

Other things are maintenance

BA: Access to raw media is a prerequisite for ML

There are a11y implications; speech and image recognition for example.

Emotion analysis also - some work on ML signing, or at least being able to identify it.

HA: There is a text to speech API but no one has done much on it, front end to proprietary

There's also speech to text

JS: One of each

BA: Gives overview of the recent Machine Learning workshop

https://www.w3.org/2020/06/machine-learning-workshop/

BA: There is a move towards working on the client side, and to reach that goal

you need access to raw media to make this work

There is a video track reader, and insertable streams for raw media, Harald?

HA: They are not unrelated - if we make one access efficient it will help others

Breakout box will enable you to do these things efficiently
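[Editor's illustration, not part of the minutes: a hedged sketch of the "breakout box" / insertable-streams idea, i.e. raw frame access from a capture track as the prerequisite for client-side ML. The API name follows the Chromium experiment (MediaStreamTrackProcessor) and may change.]

```javascript
async function readRawFrames() {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();
  // Expose the track's frames as a ReadableStream of VideoFrame objects.
  const processor = new MediaStreamTrackProcessor({ track });
  const reader = processor.readable.getReader();
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    // Hand the frame's pixels to an ML model here
    // (e.g. sign language detection or image recognition).
    frame.close();
  }
}
```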

BA: Both of these APIs will be discussed

Stay tuned

BA: Dom, will there be proceedings from the ML workshop?

DHM: Yes, available soon

BA: We think ML will have a big impact on a11y via these APIs
... Three things to point out: T.140 over the WebRTC data channel, language negotiation (SLIM), and an interop profile of Video Relay Service (RUM)

There is an open source implementation of RUM

BA: Over to Lorenzo to talk about T.140 over WebRTC Data channel gateway

It uses the channel in a reliable way

LM: This describes how to translate from a SIP RTT session

WebRTC as it is does not support T.140, so this describes how to manage and handle the translation

A gateway is needed between the WebRTC data channel and RTT

LM: Implemented in Janus

Was curious about SIP RTT

Janus is an Open Source WebRTC server - acts as a media and signalling gateway

SIP originated on server side - not transcoded just relayed

I had to add support for m=text etc. and translate on the WebRTC side

WebRTC does not support text-based media normally.

I had to take care of T.140 delivery.

Used data channels to send data as binary payloads

So T.140 messages are sent via data channels also

browsers may not need it

The draft describes the spec and describes the @@

You negotiate in advance using expected formats

This spec allows for attributes to be negotiated

Patch includes a demo

More testing is needed
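[Editor's illustration, not part of the minutes: a minimal sketch of opening a reliable WebRTC data channel intended to carry T.140 real-time text, in the spirit of the IETF draft discussed here. The channel label, subprotocol string, and helper function are assumptions for illustration, not the Janus patch.]

```javascript
const pc = new RTCPeerConnection();
// Reliable, ordered delivery; the subprotocol string is negotiated in the SDP.
const rttChannel = pc.createDataChannel('t140', { ordered: true, protocol: 't140' });
const encoder = new TextEncoder();

// Real-time text is sent incrementally, e.g. per keystroke or after a short
// buffering window (see the buffering discussion below), as binary payloads.
function sendRtt(textChunk) {
  if (rttChannel.readyState === 'open') {
    rttChannel.send(encoder.encode(textChunk));
  }
}
```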

ML: I tested this with old Java applications

This is an open source implementation

ML: Good test bed

I could test what was going on, but more advanced apps may have done more

Want to track future testing.

BA: Does anyone have an opinion on what the definitive RTT implementations are?

JS: RTT support is mandated by the FCC in the US.

BA: If it works with Android and iOS they would be the canonical tests

JS: yes

ML: My implementation is generic, there may be vendor lock-in etc

JS: That defeats the purpose; it needs to be interoperable cross-platform etc - vital for 911 emergency services

There should be no vendor lock-in.

Verizon and AT&T etc. need to work

There are bugs for users of braille devices etc on Android

You need to interoperate and handle emergency communications

ML: That's good to know.

We need to test in Europe

JOC: There are also requirements under EN 301 549 the EU a11y procurement standard

ML: <overview of sub protocols>
... It is currently up to JavaScript applications to handle @@

Need to figure out where some functionality belongs

End points should be free to use various buffering windows

I plan to do some buffering in the gateway also.

I also need to look at packet loss

And how to handle these errors etc

If we get feedback we currently ignore it.

BA: Did your implementation not support NAC?

ML: All the bricks are there, we need to play with it and see if it is a good starting point

want to move it forward - if you are interested in testing I can provide guidance

JS: Sounds good

BA: Comments?

Are there others working in this area with RTT that you can work with?

Many are proprietary right?

ML: Yes, there are links that I got but they are framework specific

would like a generic client that is good enough

ML: Most of these implementations are specific and bespoke to their intended service

I can look again

BA: Thank you Lorenzo
... I mentioned language negotiation

To accommodate disability requirements

You can serve ASL in a video stream - or in a gateway, you can request to write and receive, or other preferences

This signalling is outside the WebRTC API

Regarding routing, you may use ML algos

BA: Media usage is up to participant

Calls may get routed to where they can be handled

The user can decide what they need

JS: Question - is the concept that we may use ML to provide these services because they are quick etc?

Or involving a human where higher quality is needed? We don't want to rely on a bot in complex scenarios.

BA: Yes. <gives example of interoperability between foreign languages like Mandarin and the need to access emergency services>

ML may fill in, but primary routing is to a human

JS: Sounds good

BA: Morphing now to ML

<janina> scribe: janina

jo: Have an updated draft of RTC Accessibility User Requirements

<Joshue108> http://raw.githack.com/w3c/apa/AccessibleRTC/raur/index.html

jo: We had feedback from public review now reflected in our document
... Relates to anchoring and pinning a video window
... eg the sign lang interpreter next to the person speaking
... or natural lang interpreter ...
... so there is a req to associate and pin those windows so that the user can correctly associate who's being interpreted
... make sourcing clear even with second screen use
... also the ability to capture captioning, but also to pause recording captions when an off-minutes conversation is brought up
... ditto for signing
... but still provide on screen even though not captured -- then resume
... to have a11y profiles that persist across environments
... some new reqs re RTT ... some blind users may not be aware a message was not sent because of buffering needs
... haven't specified yet how this should be met, more discussion needed
... req to support other langs; so good to hear this is contemplated
... noted relation to XR environments with some similar reqs
... also noted ITU Total Conversation services
... requesting review and feedback

<Joshue108> JS: I think we have this covered.

<Joshue108> What comes to mind is yes, RTT is important, and we need the use case for the IRC-type interface for whoever needs this as an option

<Joshue108> Braille needs to be buffered, as does text to speech, since instantaneous transfer results in unintelligible speech

<Joshue108> Apple say they may be able to buffer as needed

<Joshue108> Longer than the 300 ms we heard about earlier

<Joshue108> from Lorenzo

<Joshue108> We may have competing use cases here - for deaf users and blind users who need different things

<Joshue108> Otherwise we are good - we think this is the final version of the RAUR

<Joshue108> JS: We are happy to finish.

<Joshue108> BA: I've a comment, some of the requirements are relevant to second screen

<Joshue108> JS: Lets ask Dom.

<Joshue108> DHM: I've a high level question from my reading: a lot of these requirements apply at the service provider level.

<Joshue108> This may be the goal of the document, and if so, how can we bring this to the attention of providers?

<Joshue108> Some may be involved in our group, or not directly

<Joshue108> JS: Yes, some things are spec-related, and some are guidance and support needs

<Joshue108> We can tease these apart

<Joshue108> JS: We have learned a lot from the current situation a la COVID

<Joshue108> but many people now know about remote meeting issues around Zoom etc

<Joshue108> The overall experience gives us a good idea of what we need

<Joshue108> DHM: As a document, this may have more impact if it were clear about which requirements were meant for whom.

<Joshue108> So WebRTC providers can be made aware of what needs to be added to their service

<Joshue108> It's hard to be clear on where the target requirements are etc

<Joshue108> Clarify who the target requirements are for.

<Joshue108> JOC: Yes

<Joshue108> BA: It would be good to get dev feedback

<Joshue108> I've seen this in workshops

<Joshue108> where say 90% of those there were using WebRTC.

<Joshue108> they may have feedback

<Joshue108> JS: We hope there will be interoperability - vendor lock-in is in no one's interest really

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version (CVS log)
$Date: 2020/10/13 16:01:58 $
