<bruce_bailey> scribe: bruce_bailey
Janina: simple agenda
... brainstorm about situations where close synchronization is needed
... the research task force started from accessibility needs
... later-deafened people rely on reading lips more than anything
... it seems like everyone on the planet who can see relies on lip reading even if they do not know it
... recent events with mask wearing really interfere with that
... concern that Timed Text may have issues for accommodation
... What is the Timed Text perspective?
Nigel Megitt: The question is how sensitive is the audience to synchronization?
... from a TT WG perspective, we have TTML and its various profiles, and WebVTT, which allow time stamping to the millisecond,
... that might be the order of magnitude we need
... we are not talking about seconds or tenths of seconds -- we need to be talking about hundredths or thousandths of seconds.
... the standard does not give assurance of accuracy, however.
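For reference, both TTML and WebVTT cue timings carry millisecond precision; an illustrative WebVTT cue (hypothetical content):

```
WEBVTT

00:00:12.340 --> 00:00:14.860
[door slams]
```

The `HH:MM:SS.mmm` timestamp format is what lets authors express the hundredth- or thousandth-of-a-second offsets discussed above, even though, as noted, the standards do not guarantee that accuracy at playback.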
Janina: APA, TT WG, and the media group as well
Chris Needham: I work for the BBC and co-chair the Media & Entertainment IG
... we are working on JavaScript that synchronizes captions with shot changes in video,
... so we want almost frame-specific metrics, so that as the shot changes, the captions change
... this was a significant change to a normative requirement
... the goal was 20 millisecond accuracy to support shot changes in videos, so text could be coordinated with the display
<Zakim> nigel, you wanted to note that the requirement about frame accuracy was based on "normal" frame rates
Janina: We also have a broader set of use cases, need to include synchronization of voice and audio
Nigel: Important to keep frame rate in mind.
... a 25 fps video might only render at 6 fps, so 25 fps is the benchmark
... if there is low-profile encoding at 6 fps, one can imagine the captions drifting quite a bit compared to the audio
... it is not just text synchronization, but also audio description.
... I have an experimental implementation that mixes the AD client side and needs good accuracy.
... the audio description plays over the default audio track.
... this is an effect that AD folks work very hard to avoid for live use
Jeanne: Changing the question, I am co-chair of the Silver community group and WG
... we are working on requirements for captions in XR
... our research shows that 80% of people using captioning are not Deaf
<mikecrabb> Reference for Jeannes 80% stat: https://www.ofcom.org.uk/consultations-and-statements/category-1/accessservs
... and that culturally Deaf folks find lip reading more important
... so I have some concern that we not ignore the needs of the majority of users of captions.
Janina: I appreciate being reminded of the importance of lip reading...
... and also that TT standards have resolution down to milliseconds, and of course things break down at 6 fps
... so, to brainstorm: do we need to do anything?
... in the past, what was happening on the screen was not as much of a concern.
... my whole idea for this session is just that we brainstorm
Jeanne: WCAG3 will also include advice to user agents.
<Zakim> jeanne, you wanted to ask if the research that Janina quoted about lip reading included aging? Our research had different results. and to say that WCAG3 can include advice to user
<Zakim> cpn, you wanted to mention second screen scenarios
Chris Needham: In the Second Screen Working Group, we are talking about getting content from one device onto another
... a typical use case is something like "ChromeCast", where video is thrown from one screen to another...
... does this technology include captions?
Janina: Yes, very important, but want to keep in mind that "screen" might be another device
<Zakim> janina, you wanted to discuss 2nd screen
Janina: like bluetooth headphones
or braille display
... if we can keep things synchronized, this works
... the lowest number is 50 ms.
<jeanne> +1 to include 2nd screen - also discussed in WCAG3
Chris: This is one of the questions we were struggling with: is synchronization on the 2nd screen important?
... so the research you have is important to answer this question in the affirmative
... needing to follow by lip reading is an interesting use case that we have identified for the synchronization between the audio and video tracks
Janina: I don't recall this coming up in MAUR, so thinking about it five years later, this may have been something we missed.
... some research about synchronization goes back even 50 years, to some data indicating the importance
<Zakim> nigel, you wanted to mention authoring guidelines
Janina: I will take a look for this in the Silver research.
Nigel: if the requirement for UAs is to get to within 20 ms, then this has a huge impact on authors
<nigel> BBC Subtitle Guidelines re synchronisation
Nigel: pointing to the BBC captioning guidelines, just one data point
... if no one is authoring content with this level of accuracy, then requirements on UAs might not be productive
... Two ways to think of synchronization. The typical one focuses on timestamps.
<Joshue108> +1 to Nigel
<Joshue108> There is a need to be aware of the ideal tolerances between various AT user requirements
<Zakim> mikecrabb, you wanted to say WCAG3 challenges in positioning captioning in XR
Nigel: the other technique, which has not had much discussion, is for the slowest bit to set the pace, and for the fastest parts to be slowed down
... not seen much research on that.
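The "slowest bit sets the pace" idea can be sketched as a small helper. This is a hypothetical illustration, not from any spec or implementation: instead of every stream chasing wall-clock timestamps, the shared presentation clock only advances as far as the slowest stream has actually presented.

```javascript
// Hypothetical sketch: the shared presentation clock may only advance
// to the minimum position any stream has presented, so faster streams
// (e.g. video) wait for slower ones (e.g. a braille display).
function pacedClock(presentedSeconds) {
  // presentedSeconds: seconds each stream (video, captions, AD,
  // braille) has actually rendered so far.
  if (presentedSeconds.length === 0) return 0;
  return Math.min(...presentedSeconds);
}

// Video has rendered 10.0 s but braille has only reached 8.5 s:
// the clock holds at 8.5 s until braille catches up.
const clock = pacedClock([10.0, 9.2, 8.5]); // 8.5
```

The trade-off is the one Nigel implies: timestamps keep overall duration fixed but let slow components fall behind, while pacing keeps components aligned at the cost of stretching playback.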
Michael Crabb: One of the details we have been worried about is the metadata that needs to be attached to VR environments
... objects need sound location data. Where are the sounds coming from?
... Does anyone have suggestions for spatial location?
Janina: This issue did come up in the maps subgroup, and the group is proposing a new datum for location
... we wonder if this might suffice as a mechanism for the audio and captioning source
<Zakim> janina, you wanted to support pausing in ua
Mike Crabb: Two dimensions is clearly not enough, and we wonder if we need six dimensions in some situations.
Janina: We have a similar concern with mapping, so hopefully the mechanism would be compatible.
Andreas Tai: In the TT group we have been having conversations on similar issues, even during the last couple of TPACs, but we have not felt like we were getting much positive feedback
... so we have some feedback and work on the topic. I will post a link in IRC to the GitHub tracker
<atai> https://github.com/immersive-web/proposals/issues/40#issuecomment-533966441
Janina: I want to ask about pausing, and how fast someone can read braille
... we could NOT count on someone reading a Braille transcription in real time
... it really depends on someone's skill, and when the person learned to read braille.
... Our conclusion was that users had to be provided the option to pause the rest of the stream
... if you think about a physics problem with good audio description, one absolutely needs to interact with the description stream, whether audio or text
Janina: It is NOT just user skill, but the nature of the content.
<Zakim> nigel, you wanted to mention that there's no API mechanism for this
Nigel Megitt: Thank you, Janina, this really provides evidence that this is an important issue that has not been well addressed, particularly in the API or metadata.
Janina: Environment makes a huge difference. Watching a Shakespeare play for entertainment is very different from studying cinematography
... we really need a strong API; we will have to talk quite a bit.
Jason: One of the interesting scenarios is where one is reading a transcript, so manually pausing would be very disorienting
... it would be nice if the API could allow the captions to queue and then play back when there is a natural pause in the dialog
... could use a gap analysis. Would make consumption much less interactive.
Janina and Jason agree we need background investigation for future API work.
Gary Katsevman: video.js project
... we have a plugin for reading audio description
... how that works: we have a text track that gives start and end times, and it works with the Speech API
... if we pause the video, then playback continues automatically.
... this can happen because TTS typically is faster than real time, so the Speech API can read back AD before the end event
... but with something like a braille display, there is no trigger for "finished reading", so that is trickier
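A minimal sketch of the pattern Gary describes, using the browser Speech Synthesis API with a text track of description cues. The helper names and the words-per-second heuristic are illustrative assumptions, not the actual video.js plugin code; the browser wiring is guarded so the helper can be shown standalone.

```javascript
// Rough heuristic (assumed threshold): can TTS, speaking at roughly
// 3 words/second, finish the cue text within the cue's time window?
function fitsInGap(cue, wordsPerSecond = 3) {
  const words = cue.text.trim().split(/\s+/).length;
  return words / wordsPerSecond <= cue.endTime - cue.startTime;
}

// Browser-only wiring (guarded so the helper above stands alone).
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const video = document.querySelector("video");
  const track = video.textTracks[0]; // assumed: a descriptions track
  track.mode = "hidden";             // fire cue events without rendering
  track.addEventListener("cuechange", () => {
    const cue = track.activeCues[0];
    if (!cue) return;
    const u = new SpeechSynthesisUtterance(cue.text);
    if (!fitsInGap(cue)) {
      video.pause();                 // hold video while speech plays
      u.onend = () => video.play();  // resume on the "finished" event
    }
    speechSynthesis.speak(u);
  });
}
```

As Gary notes, a braille display offers no equivalent of the utterance `onend` event, which is why this pattern does not transfer directly to braille output.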
Janina: Great to hear the Speech API has this already.
<Zakim> Joshue, you wanted to mention getting clarity on ideal tolerances between media
<gkatsev> use speech synthesis to speak description text track
Joshue, ex AG WG chair
... we need to work out various tolerances for various AT
... can we get an idea of tolerances and sweet spots
... so we could have various endpoints
... ignoring privacy, a device might "listen" to try and figure out what the user was using for consumption, and adjust accordingly
<nigel> bruce_bailey: In a real time environment, do we have good grounding for syncing?
<nigel> .. how does that work in terms of setting the zero point?
Bruce asks a question about counting: ticks from what?
... films have a start, but that doesn't work for V
Nigel: Typically, synchronization starts from "now", so clients sync that up okay
Janina: We have enough research-based evidence for tolerances
... we have that synchronization between video and voice, so we picked "primary media resource"
... additional use cases for captioning and audio description
... we will coordinate with those groups to stay in touch
... we can follow up on what is a "reasonable tolerance"
... we also need to follow up on user agents.
<Zakim> gkatsev, you wanted to mention privacy and importance of vendoring
<nigel> +1 to the privacy point that exposing to JS might not always be a good idea
<jeanne2> +1 to privacy
<Joshue108> scribe: Joshue108
BA: I will get underway - just to mention the IPR
<dom> Slides for APA/WebRTC joint meeting
mentions the channels and agenda.
Will talk about charter and deliverables
Will mention the Machine Learning workshop
Also IETF input and gateway implementation
Josh will talk about RTC Accessibility User Requirements
BA: So WebRTC was recently rechartered.
Defines what we do and timeframes.
Charter contains refs to camera, mic and speakers
There are APIs for capture media
Media streams processing etc
BA: Gives overview on deliverables
In Capture realm there are a bunch of specs
Media capture automation, and streams, image, output, recording and more
questions?
BA: This is what we do, there are lots we don't do - like content protection, or Web CODECs - lower level access.
So things handled by Media group
HA: The most active things are Raw media access and insertable streams
Other things are maintenance
BA: Access to Raw media is a pre-requisite for ML
There are a11y implications; speech and image recognition for example.
Emotion analysis also - some work on ML signing, or at least being able to identify it.
HA: There is a text to speech API but no one has done much on it, front end to proprietary
There's also speech to text
JS: One of each
BA: Gives overview of the recent Machine Learning workshop
https://www.w3.org/2020/06/machine-learning-workshop/
BA: There is a move towards working on the client, and to make this work you need access to raw media
There is a video track reader, and insertable streams for raw media, Harald?
HA: They are not unrelated - if we make one access efficient it will help others
Breakout box will enable you to do these things efficiently
BA: Both of these APIs will be discussed
Stay tuned
BA: Dom, will there be proceedings from the ML workshop?
DHM: Yes, available soon
BA: We think ML will have a big
impact on a11y via these APIs
... Three things to point out: T.140 over the WebRTC data channel, language negotiation (SLIM), and an interop profile of Video Relay Service (RUM)
There is an open source implementation of RUM
BA: Over to Lorenzo to talk about T.140 over WebRTC Data channel gateway
It uses the channel in a reliable way
LM: This describes how to translate from a SIP RTT session
WebRTC as it is does not support T.140; this describes how to manage and handle translations
A gateway is needed between the WebRTC data channel and RTT
LM: Implemented in Janus
I was curious about SIP RTT
Janus is an open source WebRTC server - it acts as a media and signalling gateway
SIP originates on the server side - not transcoded, just relayed
I had to add support for Mtext etc. and translate on the WebRTC side
WebRTC does not support text-based media normally..
Had to take care of T.140 delivery..
Used data channels to send the data as binary payloads
Someone's T.140 messages are sent via data channels also
browsers may not need it
The draft describes the spec and describes the @@
You negotiate in advance using expected formats
This spec allows for attributes to be negotiated
The patch includes a demo
More testing needed
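A sketch of the binary-payload idea Lorenzo describes. The helper names and the channel label are illustrative assumptions, not taken from the Janus patch or the draft: T.140 payload is UTF-8 text, so each chunk can be sent as a binary message on a negotiated data channel.

```javascript
// Illustrative helpers (assumed, not from the Janus implementation):
// encode each T.140 text chunk to UTF-8 bytes for a binary
// data-channel message, and decode on receipt.
function encodeT140(text) {
  return new TextEncoder().encode(text);          // Uint8Array
}
function decodeT140(bytes) {
  return new TextDecoder("utf-8").decode(bytes);  // string
}

// Browser-only wiring (guarded); the "t140" label stands in for
// whatever channel the SDP negotiation actually establishes.
if (typeof RTCPeerConnection !== "undefined") {
  const pc = new RTCPeerConnection();
  const channel = pc.createDataChannel("t140");
  channel.binaryType = "arraybuffer";
  channel.onopen = () => channel.send(encodeT140("hello"));
  channel.onmessage = (e) =>
    console.log(decodeT140(new Uint8Array(e.data)));
}
```

The gateway's job, per the discussion above, is to relay these data-channel payloads to and from the RTP-carried T.140 stream on the SIP side.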
LM: I tested this with old Java applications
This is an open source implementation
LM: A good test bed
I could test what was going on, but more advanced apps may have done more
Want to track future testing.
BA: Does anyone have an opinion on what the definitive RTT implementations are?
JS: RTT support is mandated by the FCC in the US.
BA: If it works with Android and iOS they would be the canonical tests
JS: yes
LM: My implementation is generic; there may be vendor locking etc.
JS: That defeats the purpose; it needs to be interoperable cross-platform etc. - vital for 911 emergency services
There should be no vendor lock-in;
Verizon and AT&T etc. need to work
There are bugs for users of braille devices etc. on Android
You need to interoperate and handle emergency communications
LM: That's good to know.
We need to test in Europe
JOC: There are also requirements under EN 301 549, the EU a11y procurement standard
LM: <overview of sub-protocols>
... It is currently up to JavaScript applications to handle
@@
Need to figure out where some functionality belongs
End points should be free to use various buffering windows
I plan to do some buffering in the gateway also.
I also need to look at packet loss
And how to handle these errors etc
If we get feedback we currently ignore it.
BA: Did your implementation not support NAC?
LM: All the bricks are there; we need to play with it and see if it is a good starting point
I want to move it forward - if you are interested in testing I can provide guidance
JS: Sounds good
BA: Comments?
Are there others working in this area with RTT that you can work with?
Many are proprietary right?
LM: Yes, there are links that I got, but they are framework-specific
I would like a generic client that is good enough
LM: Most of these implementations are specific and bespoke to their intended service
I can look again
BA: Thank you Lorenzo
... I mentioned language negotiation
To accommodate disability requirements
You can serve ASL in a video stream - or in a gateway, you can request to write and receive, or other preferences
This signalling is outside the WebRTC API
Regarding routing, you may use ML algos
BA: Media usage is up to participant
Calls may get routed to where they can be handled
The user can decide what they need
JS: Question - is the concept that we may use ML to provide these services because these are quick etc.?
Or involving a human where higher quality is needed? Don't want to rely on a bot in complex scenarios.
BA: Yes. <gives example of interoperability between foreign languages like Mandarin and the need to access emergency services>
ML may fill in but primary routing to a human
JS: Sounds good
BA: Morphing now to ML
<janina> scribe: janina
jo: Have an updated draft of RTC Accessibility User Requirements
<Joshue108> http://raw.githack.com/w3c/apa/AccessibleRTC/raur/index.html
jo: We had feedback from public
review now reflected in our document
... Relates to anchoring and pinning a video window
... eg the sign lang interpreter next to the person
speaking
... or natural lang interpreter ...
... so a req to associate and pin those windows so that the user can correctly associate who's being interpreted
... make sourcing clear even with 2nd screen use
... also the ability to capture captioning, but also to pause recording captions when an off-minutes conversation is brought up
... ditto for signing
... but still provide on screen even though not captured --
then resume
... to have a11y profiles that persist across environments
... some new reqs re RTT ... some blind users may not be aware a msg was not sent because of buffering needs
... haven't specified yet how this should be met; more discussion needed
... a req to support other langs; so good to hear this is contemplated
... noted relation to XR environments with some similar reqs
... also noted ITU Total Conversation services
... requesting review and feedback
<Joshue108> JS: I think we have this covered.
<Joshue108> What comes to mind is: yes, RTT is important, and we need the use case for the IRC-type interface user who needs this as an option
<Joshue108> Braille needs to be buffered, as does text to speech, so instantaneous transfer results in unintelligible speech
<Joshue108> Apple say they may be able to buffer as needed
<Joshue108> Longer than the 300 ms we heard about earlier
<Joshue108> from Lorenzo
<Joshue108> We may have competing use cases here - for deaf users and blind users who need different things
<Joshue108> Otherwise we are good - we think this is the final version of the RAUR
<Joshue108> JS: We are happy to finish.
<Joshue108> BA: I've a comment, some of the requirements are relevant to second screen
<Joshue108> JS: Lets ask Dom.
<Joshue108> DHM: I've a high level question from my reading: a lot of these requirements apply at the service provider level.
<Joshue108> This may be the goal of the document, and if so, how can we bring this to the attention of providers?
<Joshue108> Some may be involved in our group, or not directly
<Joshue108> JS: Yes, some things are spec related, and some are guidance and support needs
<Joshue108> We can tease that out
<Joshue108> JS: We have learned a lot from the current situation a la COVID
<Joshue108> but many people now know about remote meeting issues around Zoom etc
<Joshue108> The overall experience gives us a good idea of what we need
<Joshue108> DHM: As a document, this may have more impact if it was clear about which requirements were meant for whom.
<Joshue108> So WebRTC providers can be made aware of what needs to be added to their service
<Joshue108> It's hard to be clear on where the target requirements are etc
<Joshue108> Clarify who the target requirements are for.
<Joshue108> JOC: Yes
<Joshue108> BA: It would be good to get dev feedback
<Joshue108> I've seen this in workshops
<Joshue108> where say 90% of those there were using WebRTC.
<Joshue108> they may have feedback
<Joshue108> JS: We hope there will be interoperability - vendor lock in is in no ones interest really
Present: janina, plh, CharlesHall, Chris_Needham, jeanne, Francis_Storr, Lauriat, Nigel_Megitt, becky, mikecrabb, jasonjgw, MelanieP, Joshue, Joshue108, SuzanneTaylor, KimD, jib
Scribes: bruce_bailey, Joshue108, janina
Date: 13 Oct 2020