APA WG TPAC 2019 -- 20 Sep 2019

<ada> scribenick: ada

NellWaliczek: One of the things I had noticed is that we don't have a shared understanding of each other's technologies, and it was enlightening to hear the issues we are each investigating,
... but it was hard to come across solutions. This is to give some background about 3D graphics; if we would like to follow up on a call after TPAC, ask ada and she can add it to the agenda.
... To give background I will tie it to something we have experience with today, the DOM. In HTML you don't say draw this pixel at this point. You ask to draw by component: draw an input box, a div, etc.
... It is declarative; it isn't imperatively asking the GPU to draw pixels. We describe the elements and style and it is up to the UA to issue the GPU commands.
... (aside from canvas)
... Imperative rendering is the opposite: we take some buffers and send them to the GPU and give it commands which draw to the screen pixel by pixel.
... So for those who are unfamiliar with canvas, you cannot place content with style and it is just an opaque block which cannot interact with the page.
... It is necessary for 3D graphics, but without understanding the necessity we will spin our wheels when it comes to adding a11y into this.
... I'm going to break down how we think about 3D rendering into its constituent parts: what data you need for 3D rendering and what you send to the GPU.

Matt_King_: Are the drawing APIs standardised?

NellWaliczek: Yes, but not through the W3C; through Khronos, as WebGL.

<Joshue108> Ada: WebGL is a low level drawing primitive.

<Joshue108> It will draw triangles fast, that's all.

<Joshue108> The layer between that and what devs write is wild west.

NellWaliczek: You were asking about the graphics standards. Whilst WebGL is widely adopted, it is based on a fairly old native API called OpenGL; there have been numerous developments in 3D graphics since, which cannot be matched by WebGL.
... There are new standards to try to access this new functionality, such as WebGPU, which exposes functionality unavailable to WebGL. There is also WebGL2, which is an improvement on WebGL.

XR

NellWaliczek: Vulkan, Apple's Metal; WebGPU is aimed at targeting those.

WebGPU is still a WIP, so for all intents and purposes WebGL is the only one available to us.

scribe: I could talk to you for hours about the interesting side tracks; that was just to give a sense of the acronyms in play. Before I dig into the graphics primitives I would like to talk about the relationship between WebGL and WebXR. I get asked this a lot; it is not unique to this group. The best way to think about this is that WebGL is a graphics language: it takes imperative data and turns it into pixels. WebXR does not do that. WebXR could not function without WebGL. WebGL is for drawing the pixels; WebXR provides the information on where and how to draw those pixels.

scribe: and describing the view frustum. Like a cropped-off pyramid with the top at your eyes.
... WebXR describes the shape of this view frustum.

Matt_King_: Like a cone on my face.

NellWaliczek: Yes

Joshue108: How does this relate to field of vision (fov)

NellWaliczek: It is roughly the same concept but also includes the near plane and far plane.
... The point here is that you need all that information to know where to draw, including how far the user has moved from the origin of the space. When drawing for a headset you may have to draw at least stereoscopically, sometimes more, as some headsets have multiple screens per eye.
... The API describes the view frustum for each panel and the location of each.
... I just talked about a simplification of what the API provides. Once you have those pixels they don't go on the monitor, they go on the displays in the headset (which run at a different frame rate to your monitor); the API then describes how to send the images to the screens.
... The hardware will slightly move the images to account for some minor head motion; this is known as reprojection and stops people being ill.
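[Editor's sketch] The frustum described above (a field of view plus near and far planes) is usually handed to the GPU as a 4x4 projection matrix. The following is a minimal sketch of the standard symmetric OpenGL-style perspective matrix; the function names are illustrative only. In WebXR itself the matrix is supplied per view by the API, and it may be asymmetric, so this is for intuition rather than something a WebXR page would normally compute.

```typescript
// Column-major 4x4 matrix, the layout WebGL expects for uniforms.
type Mat4 = number[];

// Build a symmetric perspective projection from a vertical field of view
// (radians), an aspect ratio (width / height), and near/far plane distances.
function perspective(fovY: number, aspect: number, near: number, far: number): Mat4 {
  const f = 1 / Math.tan(fovY / 2);
  const m: Mat4 = new Array<number>(16).fill(0);
  m[0] = f / aspect;
  m[5] = f;
  m[10] = (far + near) / (near - far);
  m[11] = -1;
  m[14] = (2 * far * near) / (near - far);
  return m;
}

// Map a distance in front of the eye to normalised device depth:
// -1 at the near plane, +1 at the far plane.
function ndcDepth(m: Mat4, distance: number): number {
  const z = -distance; // the camera looks down the negative Z axis
  const clipZ = m[10] * z + m[14];
  const clipW = m[11] * z; // equals `distance`
  return clipZ / clipW;
}
```

Anything nearer than the near plane or farther than the far plane falls outside the -1..1 depth range and is clipped, which is why both planes are part of the frustum description.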

Matt_King_: if you are a developer, does the developer make WebGL calls to WebXR or just take information from it?

NellWaliczek: They also submit the rendered pixels to the screen.
... It is part of the RAF loop, on a monitor at 60fps the screen is wiped and redrawn, you can hook into this loop with requestAnimationFrame, to move objects before the next draw.

<scribe> ... New monitors can run at 144fps, which can be problematic for developers who have assumed 60fps, because their animations run extra fast.

UNKNOWN_SPEAKER: A VR headset which is plugged into a computer has to draw faster than 60fps to reduce the effects of VR sickness; it is a different frame rate. So it needs to provide its own requestAnimationFrame at its own refresh rate.
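[Editor's sketch] The frame-rate problem mentioned above goes away if motion is scaled by elapsed time rather than by frame count. A minimal sketch, with illustrative function names; in a page the elapsed time would come from the timestamp passed to the animation frame callback.

```typescript
// Advance a position by a speed given in units per second over one frame.
function step(position: number, unitsPerSecond: number, deltaMs: number): number {
  return position + unitsPerSecond * (deltaMs / 1000);
}

// Simulate one second of animation at a given refresh rate.
function simulateOneSecond(fps: number, unitsPerSecond: number): number {
  let position = 0;
  const deltaMs = 1000 / fps;
  for (let frame = 0; frame < fps; frame++) {
    position = step(position, unitsPerSecond, deltaMs);
  }
  return position;
}
```

With this approach an object moves the same distance in one second on a 60fps monitor, a 144fps monitor, or a headset running at its own rate.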

Matt_King_: Is RAF a WebGL API?

NellWaliczek: There is one on window and one in the WebXR device API.
... Once you have initialised the XR session the RAF is on the XR session object.
... The Web Developer will first ask if there is AR hardware to use, with 'isSessionSupported', so they know whether to add a button to the screen.

In the button handler you will call navigator.xr.requestSession; that is where the session begins, and it will set up a new session for you, ending any other. It is async: it returns a promise which resolves to let you start setting up all the things you need to create to start rendering.

scribe: You will start an XRWebGLLayer.
... It creates 2D buffers you will draw your content into; they are bound to the displays in the headset. It is important to render directly to the buffers, because copying pixels between buffers slows things down, which makes people sick.
... sushrajaMSFT: at a higher level you can think of it as any WebGL commands against that context will render directly to the headset.
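[Editor's sketch] The startup sequence described above (check support, request a session from a button handler, then use the session's own animation loop) can be sketched as follows. To keep the sketch self-contained and runnable outside a browser, the XR system is passed in as a parameter; the `XrSystemLike` and `XrSessionLike` interfaces and both function names are assumptions for illustration and cover only the calls used here. In a page, `navigator.xr` would be passed in.

```typescript
// Minimal structural types covering only the calls used below.
interface XrSessionLike {
  requestAnimationFrame(callback: (time: number, frame: unknown) => void): number;
}
interface XrSystemLike {
  isSessionSupported(mode: string): Promise<boolean>;
  requestSession(mode: string): Promise<XrSessionLike>;
}

// Step 1: feature-detect, so the page knows whether to show an entry button.
async function canEnterXr(xr: XrSystemLike | undefined, mode: string): Promise<boolean> {
  if (!xr) return false;
  return xr.isSessionSupported(mode);
}

// Step 2: called from the button's click handler. Requests the session and
// schedules the first frame on the session's own animation loop.
async function enterXr(
  xr: XrSystemLike,
  mode: string,
  onFrame: (time: number, frame: unknown) => void,
): Promise<XrSessionLike> {
  const session = await xr.requestSession(mode);
  // In a real page this is where the XRWebGLLayer would be created and
  // attached to the session before rendering starts.
  session.requestAnimationFrame(onFrame);
  return session;
}
```

Passing the XR system in rather than reading a global also makes it straightforward to substitute a fake implementation when testing.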

NellWaliczek: The context is what you have to call the commands from to render into. You get it from the canvas; it may be a WebGL, WebGL2 or WebGPU context.
... If you think about it separately from the WebGL APIs, there is a 1-1 mapping between the canvas and the context.
... in this case a canvas may have multiple contexts.
... You pass in a canvas and it will pull out the contexts it needs to render the content to the headset.
... Data can't be shared between WebGL contexts for security reasons.

<Joshue108> Ada: If you send a canvas it generates one per panel.

<Joshue108> Nell: One per additional one.

<Joshue108> One per headset.

This may change but right now to support WebGL it is just one.

NellWaliczek: The WebGL context associated with The XR device, the final buffer you draw into goes directly to the display it doesn't get copied anywhere.

<Joshue108> Ada: To clarify it is onto the pixel but shifted for reprojection.

NellWaliczek: slightly shifted for many purposes, you draw a pixel and it goes to the identical spot on the display.

Matt_King__: In stereoscopic there is typically one panel per eye, and the information for those panels is associated with the context from the canvas. When I get these XR session RAF callbacks, for each one I populate information into those contexts, which are attached to the canvas and the panels on the display, where I cannot share information between the canvas context and the headset context?

NellWaliczek: [confirms] yes you can, information can be shared within one canvas.
... We didn't talk about what you do in the RAF loop, the last piece of this puzzle is what to do. The first thing you do is, hey session where is the headset and each panel in 3D space. You get a frustum for each panel.
... you can create combined frustums. Which we won't get into right now for perf reasons.
... You'll ask where are the motion controllers so I can draw them in the correct place.
... This is where we talk about the render loop, which is graphics specific but not XR specific. Once it completes you then have pixels which can be displayed by the UA.
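[Editor's sketch] The per-frame work described above (ask where the headset is, get one view per panel, draw each into its part of the buffer) can be sketched as a plain function. The `*Like` interfaces are assumptions that model only the members used here; in a browser the corresponding objects come from the WebXR Device API, and `drawView` stands in for whatever the 3D engine does to render a scene.

```typescript
interface ViewLike { eye: string; projectionMatrix: ArrayLike<number>; }
interface PoseLike { views: ViewLike[]; }
interface FrameLike { getViewerPose(referenceSpace: unknown): PoseLike | null; }
interface ViewportLike { x: number; y: number; width: number; height: number; }
interface LayerLike { getViewport(view: ViewLike): ViewportLike | null; }

// Render one frame: one draw pass per view (typically one per eye).
// Returns the number of views that were drawn.
function renderFrame(
  frame: FrameLike,
  referenceSpace: unknown,
  layer: LayerLike,
  drawView: (view: ViewLike, viewport: ViewportLike) => void,
): number {
  const pose = frame.getViewerPose(referenceSpace);
  if (!pose) return 0; // no pose this frame, for example if tracking was lost

  let drawn = 0;
  for (const view of pose.views) {
    const viewport = layer.getViewport(view);
    if (!viewport) continue;
    drawView(view, viewport); // the engine's render loop runs here
    drawn++;
  }
  return drawn;
}
```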

Matt_King__: How does this apply to audio?

NellWaliczek: it is part of the same black box.

kip: we also have the central position of the head which can be used for positioning 3D audio.

NellWaliczek: We essentially have a ray which points out from between the eyes, which is used for spatialising the audio.

<Zakim> Joshue, you wanted to ask if other content such as related semantics can be rendered to support generated XR session contexts that are not canvas

Joshue108: Can content related semantics be generated within those loops?

NellWaliczek: Yes i'll talk about it in context of rendering.
... I was talking before about data that gets sent to the GPU; yesterday we were talking about the scenegraph.

The scenegraph is kind of like a DOM tree but it is a totally made up idea, that is not standardised.

Different engines have their own different ways of describing it.

On native these engines/middleware are like Unity 3D.

Or Unreal

It is a combination of an editor and renderer.

On the Web the most well known is THREE.js, which is a JS library which has its own concept of a scenegraph; also Babylon, Sumerian (and others).

Babylon and THREE.js are programmatic; Sumerian is a visual editor where the scenegraph is visualised, where you get almost a WYSIWYG experience.

scribe: This is all middleware; it has nothing to do with the web.
... When we talk about the scenegraph it is a made-up concept that describes the commands that should be sent to WebGL.
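[Editor's sketch] As an illustration of the scenegraph idea described above, here is a deliberately tiny tree that is walked to produce a flat list of draw commands. All names here are hypothetical, and it handles translation only; real engines each define their own node types and use full transform matrices.

```typescript
type Vec3 = [number, number, number];

interface SceneNode {
  name: string;
  mesh?: string;          // hypothetical id of the geometry to draw, if any
  offset: Vec3;           // translation relative to the parent node
  children: SceneNode[];
}

interface DrawCommand { mesh: string; position: Vec3; }

// Walk the tree depth-first, accumulating parent offsets, and emit one
// draw command per node that carries geometry.
function flatten(node: SceneNode, parent: Vec3 = [0, 0, 0], out: DrawCommand[] = []): DrawCommand[] {
  const position: Vec3 = [
    parent[0] + node.offset[0],
    parent[1] + node.offset[1],
    parent[2] + node.offset[2],
  ];
  if (node.mesh !== undefined) {
    out.push({ mesh: node.mesh, position });
  }
  for (const child of node.children) {
    flatten(child, position, out);
  }
  return out;
}
```

The tree keeps the structure (a cup sits on a table, which sits in a room), whereas the flattened command list that reaches the GPU has lost it.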

Matt_King__: A developer won't make WebGL calls they will use a library?

NellWaliczek: Yes, WebGL is extremely verbose; it takes more than a page of code just to render a triangle. You use a 3D engine. Because you use an engine you probably won't be using WebXR directly either.

Matt_King__: So any a11y standard would have to be supported by these middlewares, these 3D engines.

NellWaliczek: Almost, because that is where we are today. When looking to the future, file formats for 3D models

Matt_King__: 3D hello worlds?

NellWaliczek: Correct, these 3D formats include geometry and texture but don't tend to include things like physics or scripting. Any animations they have will be on rails. They are static.
... The history of the 3D file formats is long and contentious. The most well known one, FBX, is only made available through Autodesk; it is proprietary and only they provide the encoders. There are others like OBJ, which is very simple and cannot have labels, and COLLADA, which never got traction.
... The current darlings are glTF ("GL Transmission Format") and usdz; they are very similar, but usdz is proprietary. Blender will convert models as needed.
... There are 3 kinds of formats, editor formats like photoshop files, you have interchange formats which can be shared between editors uncompressed, gltf is the first open source runtime format.

<Joshue108> scribe: Joshue108

N: With the advent of glTF where we want to codify scene graphs we open the door for, future vision, soon to happen..

It is likely there will be a new HTML element, similar to a canvas but will take a file like glTF..

deferring pixel drawing to the browser.

So you will have geometry, textures etc. that will be communicated, so we may pack accessibility info into glTF, for example.

1) We can foresee UAs exposing a model element to draw with..

A UA can add a button that allows you to view in AR or XR etc., but it is declarative.

An author is saying here is a scene in glTF and asking the browser to draw it..

This would not be interactive, so we would need to prototype and hit test etc.

To make more advanced things you need to be able to script against the scene graph.

This is new stuff, glTF was only finalised recently.

The format has an extension system, it has interoperability features.

Extension formats can be written in, a11y extensions could be added, also for scripting.

This would mean other 3D engines would have the ability to expose that info.

Rendering engines etc could then parse this info to create more accessible stuff.
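[Editor's sketch] To make the extension idea above concrete: glTF is JSON, it lists the extensions it uses in a top-level `extensionsUsed` array, and individual nodes can carry an `extensions` object. The extension name `EXAMPLE_accessibility` and its `label` field below are purely hypothetical, invented for this sketch; no such extension is described in these minutes.

```typescript
interface GltfNode {
  name?: string;
  children?: number[];
  extensions?: Record<string, unknown>;
}
interface GltfDocument {
  asset: { version: string };
  extensionsUsed?: string[];
  nodes?: GltfNode[];
}

// Hypothetical extension name, used for illustration only.
const A11Y_EXTENSION = "EXAMPLE_accessibility";

// Collect a text label per node index, for nodes that carry the extension.
function collectLabels(doc: GltfDocument): Map<number, string> {
  const labels = new Map<number, string>();
  if (!doc.extensionsUsed || doc.extensionsUsed.indexOf(A11Y_EXTENSION) === -1) {
    return labels; // the file does not declare the extension
  }
  (doc.nodes || []).forEach((node, index) => {
    const ext = node.extensions ? node.extensions[A11Y_EXTENSION] : undefined;
    if (typeof ext === "object" && ext !== null) {
      const label = (ext as { label?: unknown }).label;
      if (typeof label === "string") labels.set(index, label);
    }
  });
  return labels;
}
```

An engine or user agent that understood such an extension could hand these labels to assistive technology, while one that did not would simply ignore the extra data.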

Mk: Question about structures..

N: Yes, you can - the scene graph can contain tons of objects that wont get drawn as they are hidden but as the user moves items can be drawn and hidden as needed.

MK: Sounds like glTF is a like combining HTML/CSS..

N: Yes, but not the JS.

MC: Accessibility could be a use case for things that have been thought about and where other use cases exist.

So we want to document a11y use cases and push this along?

N: In the short term I would not focus on the rendering engines.

But in the format, extension etc.

W3C and Khronos do have arrangement and agreements.

MC: So this is being done by Khronos so we should talk with them.
... We need to talk with Dom HM as we may want to delegate W3C to that or stimulate discussion with Khronos.

N: W3C hosted a games workshop..

JS: We have someone there...

N: Neil Trevett was keen and happy to work with the W3C.

JS: Matt Atkinson is working there and was supportive of glTF.

N: For this to work it cannot be drawn imperatively.

It needs to be declarative; we will see investment into this space in platforms and UAs.

N: This is simplistic now, but over time will be exposed.

Investing now is smart.

MC: Accessibility like declarative things, sounds like we need to do this.

<Zakim> Joshue, you wanted to talk about standardising semantic scene graphs

N: We have to focus on the audio, thanks.

<ada> Joshue108: I have a q, related to something Ada brought up yesterday, about a semantic scenegraph and thinking about how a DOM tree can be used to annotate a semantic scenegraph.

<Irfan> scribe: ada

NellWaliczek: My goals here are to give you the information you need to think about this so we can talk about this later.

Matt_King_: What does it mean to put semantics on a scenegraph? They are similar to element names and class names.

kip: The GLTF file is like a snapshot of a dynamic system which at run time may get mangled to display the content.

<Joshue108> Ada: Can one thing call the glTF as a scene graph?

<Joshue108> N: We talked about RAF callbacks etc

NellWaliczek: The glTF part of glTF is a scenegraph, which references external assets such as geometry and textures.

<Joshue108> When you try to track the user's head etc., there exist audio APIs that you can use to generate spatialised sounds etc.

<Joshue108> The data generated is handled by the OS, to make sure audio is fed to the correct device etc.

<Joshue108> Handled by OS, so this means in a render loop audio gets spatialised using this data.

<Joshue108> That is what is output.

<Joshue108> JS: No reason that we can support rich audio environment like Dolby ATMOS

<Joshue108> N: Another Khronos standard comes into play..

OpenXR. They have the power to implement what is effectively drivers for Dolby Atmos.

<Joshue108> To get audio exposed thru WebXR, the audio implementation has to talk to ?

<Joshue108> Alex: There is support for listeners and emitters etc and how should that be done.

<Joshue108> Virtual listeners and emitters etc.

<Joshue108> JS: That is not enough.

<Joshue108> Some sounds sources are going to need a lot of channels.

<Joshue108> KIP: I've implemented audio engines..

<Joshue108> The hardware for playback, is going to be binaural - it will be tracked.

<Joshue108> HRTF is an impulse response, a computed model..

<Joshue108> How does that get to your inner ear to give you cues.

"head related transfer function"

<Joshue108> As we can track your head in space.. for stereo there can be multiple, finding the angle relative to your head.

<Joshue108> Then it works out, and simulates the binaural effect, doing it virtually.

<Joshue108> KEMAR head..
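[Editor's sketch] The first step described above, finding the angle of a sound source relative to the tracked head, can be sketched in two dimensions (viewed from above). The second function is a plain equal-power stereo pan, which is a much simpler stand-in for a real HRTF; it is included only to show how the angle feeds into per-ear output, and it cannot distinguish front from back. All names are illustrative.

```typescript
type Vec2 = { x: number; y: number };

// Angle of the source relative to where the head is facing, in radians.
// 0 = straight ahead, positive = to the listener's left, negative = right.
// `forward` is assumed to be a unit vector.
function azimuth(listener: Vec2, forward: Vec2, source: Vec2): number {
  const tx = source.x - listener.x;
  const ty = source.y - listener.y;
  const dot = forward.x * tx + forward.y * ty;
  const cross = forward.x * ty - forward.y * tx;
  return Math.atan2(cross, dot);
}

// Equal-power pan: a crude substitute for an HRTF, not an HRTF itself.
function stereoGains(angle: number): { left: number; right: number } {
  const pan = -Math.sin(angle);        // -1 = hard left, +1 = hard right
  const t = ((pan + 1) * Math.PI) / 4; // 0 .. PI/2
  return { left: Math.cos(t), right: Math.sin(t) };
}
```

As the head turns, `forward` changes, the angle is recomputed each frame, and the source appears to stay fixed in the room.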

<Joshue108> JS: For regular headsets this may suffice.

<Joshue108> N: Referring to to Atmos, for every piece of hardware, via web APIs the browser needs to be able to talk to it.

<Joshue108> JS: So what syncs these things?

<Joshue108> Kip: The APIs give you enough to generate the simulation but there is middleware, WebAudio etc. that helps to realise the sonic environment.

<Joshue108> N: As a dev these things are not called directly.

<Joshue108> You set up an environment that handles these things.

<Joshue108> Alex: Described how things are moved around and processed.

<Joshue108> N: You mean the devs view?

<Joshue108> Alex: The engine handles things here.

<Joshue108> JOC: The logic.

<Joshue108> N: Calls are set up on your behalf.

<Joshue108> Sushan: Operating system, primitives, sounds streams and tying to OS primitive etc.

<Joshue108> N: There is an audio-only spatialised headset, got that mixed up with Atmos.

<Joshue108> Kip: So the Atmos format describes multiple sounds.

<Joshue108> JS: Yes, up to 128 channels.

<Joshue108> Kip: We can use this technique to replace many sources virtually, replace binaural stuff to spatialise sound sources.

<Joshue108> MK: The magic leap had multiple transducers around my head, has a lot more.

<Joshue108> CabR: Dont think so.

<Joshue108> Alex: We can play tricks relative to head positioning.

<Joshue108> JOC: So the correct usage of sound is vastly important for accessibility and the quality of the user experience.

<Joshue108> Alex: There are effective things we can do.

<Joshue108> JS: You can turn your head to locate things.

<Joshue108> Alex: And we can do clever things.

<Joshue108> Sushan: Handing the amount of audio and channels is the OS and hardware drivers to support systems with multiple outputs and hardware.

<Joshue108> N: We've talked about semantics, declarative structure etc, how middleware plays a role, audio..

<Joshue108> So these middleware stuff..

<Joshue108> JOC: glTF plugs into WebGL, with the scene info.

<Joshue108> N: We will have a composited experience in the future.

<Joshue108> Lets talk about interaction..

<Joshue108> That is a challenge, think of the complexity of mouse and touchscreen syncing.

<Joshue108> Now we have a bunch of input mechanisms..

<Joshue108> We are getting more.

<Joshue108> Web is deficient for speech input etc.

<Joshue108> N: You can use input to move around your space..

<Joshue108> In VR the space is nearly always larger than the physical space; you can teleport in these spaces.

<Joshue108> N: How this is done is via different input sources.

<Joshue108> N: There are platforms that map hand-held motion controllers to grab objects.

<mhakkinen> +q

<Joshue108> There is also selection at a distance, that you can aim at, select it and pick it up, move it etc.

<Joshue108> There is also the painting option.

<Joshue108> MK: There can also be chat type things.

<Joshue108> N: These input devices are not inherently accessible.

<Joshue108> N: If you have limited motion, these controllers can be problematic.

<Joshue108> N: There are native platform layer where AT can be plugged in.

<Joshue108> N: Mentions the Microsoft One.

<Joshue108> MC: For WoT we need to use APIs that can provide these functions.

<Joshue108> N: We have been forced to generalise how this is done.

<Joshue108> We have a XRInputSource is the object type for this, that is called on a session.

<Joshue108> N: Target Rays Grip location methods..

<Joshue108> N: Input sources can be parts of your body.

<Joshue108> You can create input sources that you can call these methods on particular objects.

<Joshue108> N: These are opportunities that are worth discussing.

<Joshue108> MC: Do authors have to do anything special here?

<Joshue108> MH: Any discussion on haptics?

<Joshue108> N: Great question.

<Joshue108> N: Prior to WebXR we had WebVR.

<Joshue108> Haptics has not been stripped out of proposals..

<Joshue108> When haptics are available they can be used in WebXR, such as the rumble pack on an Oculus.

<Joshue108> Kind of generic use with current controllers.

<Joshue108> MH: Can you capture textual objects?

<Joshue108> N: You can simulate some things here, there are full body suits etc.

<Joshue108> Would be surprised if there was not work going on here.

<Joshue108> N: There are challenges, if on Gamepad API we have that.

<Joshue108> MH: Having more than just the rumble for the controller is important.

<Zakim> Lauriat, you wanted to ask about input source software vs. hardware.

<Joshue108> SL: Regarding input sources, mapping, how easy is it to have a software based input?

<Joshue108> N: Input source objects can be mapped to the grip button, however, now there is only one button - if a Gamepad.

<Joshue108> We call generic select events, thumb stick won't fire if, like user initiated actions etc.

<Joshue108> Fake events can be fired, but not the user activation thing, as that is done by the browser.

<Joshue108> SL: What about other interactions?

<Joshue108> N: You can polyfill those.

<Joshue108> MK: I've a high level question about Web vs Native..

<Joshue108> In the accessibility world we are trying to go across multiple ways of delivering experiences etc.

<Joshue108> I'm wondering how much content that is browser based or ??

<Joshue108> Is used today, how much are you living on the web with Oculus etc.

<Joshue108> N: Some of it, regarding the glTF format, that is consumed by Unity and Unreal etc.

<Joshue108> To get accessibility inside them - you have the benefit of it being a common file format.

<Joshue108> N: Extensions can be written, browsers can support secondary run times etc.

<Joshue108> N: Lots of current browser based APIs, there is a commonality between how native and web apps are built.

<Joshue108> You need to solve the same problems.

<Joshue108> So how much is web based vs native - we haven't hit CR yet!

<Joshue108> At turning point, not there yet.

<Joshue108> People will use generic tools and not install bespoke random apps to do stuff.

<Zakim> Judy, you wanted to comment at the end of the session and to comment on web and beyond web

<Joshue108> JB: I've seen 360 hotel views.

<Joshue108> And regarding Matt's question, W3C as a whole is coming across the question of whether we should be looking at Web only or beyond that.

<Joshue108> In WAI we are aware that some of what we need to look at is beyond the web proper.

<Joshue108> Regarding the inclusive immersive workshop..

<Joshue108> It is filling up, and in WAI as we look emerging web tech, we want to grow a community of experts.

<Joshue108> This session was really good.

<Joshue108> This content could be distilled and shared.

<Joshue108> Will be a good primer - but some may feel unprepared to make people feel welcome.

<Judy_alt> https://www.w3.org/2019/08/inclusive-xr-workshop/

<Roy> Scribe: Joshue108

<jcraig> ScribeNick: jcraig

Web RTC joint meeting with APA

<scribe> Scribe: jcraig

Dom: introduce yourself

Bernard Aboba

<hta1> Harald Alvestrand

James Craig, Apple

Armando Miraglia

Jared_Cheshier, Chesire

Josh O Connor, W3C

Youenn Fablet, Apple

<Judy> Judy Brewer, W3C WAI

Joanie Diggs, Igalia

Janina Sajka, APA/WAI

<Bernard> Introduction: Bernard Aboba, Co-Chair of the WEBRTC WG, and formerly a member of the FCC EAAC and TFOPA groups.

Henrik ???, Google

<Jared> Jared Cheshier, new to W3C, in the WebRTC working group and Immersive Web working group.

<jib> Jan-Ivar Bruaroey

<Bernard> Henrik Bostrom, Google.

Daiki, NTT on RTC

Hiroko Akishimoto NTT

and colleague?

Real Time Text

important on behalf of those with speech disability or deaf hard-of-hearing

"Topic 1" is Real Time Text

"Topic 2" is use cases for Web RTC 2.0

Joshue108 has created a document of example use cases

<Joshue108> Here they are https://www.w3.org/WAI/APA/wiki/Accessible_RTC_Use_Cases

Bernard: WG MMusic is standardizing transports over RTT channel

3GPP is citing that effort

almost certainly will result in a final spec

dom__: vocab: RTT also means round trip time in other contexts.. RTT in this discussion is Real Time Text

Bernard: goal is to enable WebRTC as a transport protocol for RTT, and Gunnar Hellström is currently reviewing that

RTT is a codec in the architecture, but somewhat like a data channel too

wouldn't make sense to send music over RTT for example

their plan to use the data channel to send music I think makes sense

RTT is timed, but not synchronized time

Is time sync necessary?

janina: I think not

jcraig: why not?

Joshue108: what about synced sign language track

<Zakim> Joshue, you wanted to ask about time

Judy: I share Josh's concern

hta: ??? and other one is that the system records send time

I think the first thing is the only one required

<dom__> "Any service or device that enables the initiation, transmission, reception, and display of RTT communications must be interoperable over IP-based wireless networks, which can be met by adherence to RFC 4103 or its successor protocol. 26 RFC 4103 can be replaced by an updated standard as long as it supports end-to-end RTT communications and performance requirements." https://www.fcc.gov/document/transition-tty-real-time-text-technology

Bernard: Would affect how time is sent over the channel...

<Joshue108> As long as we are sure that issues with time don't impact on synchronisation of various alternate media content

because 3GPP is involved... likely to be implemented

janina: idea (with telecom RTT) is to see characters immediately

<Judy> [jb partly wondering if timing is relevant for rtt communication records in emergency communications]

<Joshue108> Challenges with TTS timing for blind users https://www.w3.org/WAI/APA/wiki/Accessible_RTC_Use_Cases#Challenges_with_TTS_timing

<Zakim> jcraig, you wanted to mention the 911 context for immediate chars

<Joshue108> JC: VoiceOver handles this well

Bernard: draft has reliable mode and unreliable (lossy?) mode..

<Zakim> Judy, you wanted to speak to requirement for non-buffering including for deaf-blind use cases in emergency communications

<Bernard> The WebRTC 1.0 API supports both reliable and unreliable modes.

Judy: glad practical details are being discussed .. eg. emergency situation

Deaf community also wants immediacy

<Bernard> Draft is here: https://tools.ietf.org/html/draft-holmberg-mmusic-t140-usage-data-channel

deafblind community may share a need for non-buffered comm

Judy: I'm interested in hearing the background. We jumped straight into discussion

is there an opportunity to add an informative para that explains the relevance and allows polyfill implementations/? the Deaf community thinks so

HTA: If there is nothing required in RTT protocol, you can have a perfect polyfill? but if not, you may need extensions.

Judy: Sometimes JS polyfills can count as one of two required implementations

dom__: ???

dom: may have room in spec to add RTT support in WebRTC today

dom__: I see value in exposing <scribe lost context>

<dom__> dom: if a gateway from RTT to webrtc is already possible (Bernard to confirm), it would be useful to add a note to the WebRTC document to point to that usage of datachannel

Bernard: questions from the use case.. it does not recommend whether to use reliable or unreliable mode.. no rec on whether to send char by char or as a blob

<dom__> ... for a normative change to the API surface, it's hard to consider without understanding the underlying protocol and what it would need to expose

suggest APA review the document and provide feedback

<Zakim> dom__, you wanted to suggest reliable is needed for RTT (completeness is probably more important than latency for text)

<Bernard> Latest version is https://tools.ietf.org/html/draft-ietf-mmusic-t140-usage-data-channel

Bernard: reliable, in-order preferred

Judy: colleagues at Gallaudet would be interested in sharing polyfill implementations...

I'm concerned about missing the timeline window since you are nearing completion

I'd like final WebRTC to include ack that RTT is on the roadmap?

Bernard: WebRTC has evolved since the RTT proof, we should review that it works with the current draft

<Bernard> Field trial specification: https://tap.gallaudet.edu/IPTransition/TTYTrial/Real-Time%20Text%20Interoperability%20report%2017-December-2015.pdf

henrik: what is the requirement on WebRTC for RTT... sounds like you can do this today'?

dom: there is a dedicated RTT spec required by FCC .. question is how do you expose this in the RTC stack

and provide interop with existing services like TTML

<Zakim> Joshue, you wanted to ask confirm that this doc contains the technical requirements for RTT implementations in WebRTC and APA should review

<henbos> Henrik Boström here

Bernard: I've entered the spec I'd like APA to review

also added the Gallaudet prototype from 2015

janina: I would like the use cases to clearly distinguish the nuanced differences... e.g. emergency services, etc.

<Joshue108> JC: Create implementations that can be brailled immediately..

<Joshue108> We are presenting characters as fast as possible with minor adjustments in VoiceOver.

<Joshue108> So you can get the existing string asap

dom: req that the character buffer be sent as fast as possible

from a webrtc perspective, what you will get is a stream of characters, and it will be up to the app to determine how to transport those characters

<Bernard> The current holmberg draft specifies reliable transport. Are there use cases where partial reliability might be desired?

<Bernard> For example, where a maximum latency might be desired. WebRTC 1.0 API supports maxPacketLifeTime or maxRetransmissions for partial reliability.
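[Editor's sketch] The choice raised above, between fully reliable delivery and a bounded-latency (partially reliable) mode, comes down to the options given when the data channel is created. The sketch below builds such an options object. The function name and the local `RttChannelOptions` type are illustrative; the `"t140"` subprotocol label is an assumption taken from the name of the draft linked earlier, and whether to bound latency at all is exactly the open question in the discussion.

```typescript
// Local mirror of the subset of data channel options used here.
interface RttChannelOptions {
  ordered: boolean;
  protocol: string;
  maxPacketLifeTime?: number; // milliseconds; omitted means fully reliable
}

// Build options for a real-time text channel.
// - No argument: reliable, in-order delivery (completeness over latency).
// - maxLatencyMs: give up retransmitting a message after this long
//   (latency over completeness).
function rttChannelOptions(maxLatencyMs?: number): RttChannelOptions {
  const options: RttChannelOptions = { ordered: true, protocol: "t140" };
  if (maxLatencyMs !== undefined) {
    if (!(maxLatencyMs >= 0)) {
      throw new RangeError("maxLatencyMs must be a non-negative number");
    }
    options.maxPacketLifeTime = maxLatencyMs;
  }
  return options;
}
```

In a browser, the resulting object would be passed as the second argument when creating the data channel on a peer connection.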

janina: it's wonderful that we'll be able to use this side-by-side with existing rtc

Level 1, emergency services, Level 2 disabilities, Level 3 personal pref

(priorities ^)

Bernard: I'd like to discuss emergency services a bit more

max-transport time might be affected

immediacy and accuracy are sometimes in conflict. each use case could result in different implementations... max-package-lifetime for example

Judy: FCC did research. Some examples of death resulting from lack of immediacy in RTT communication.

<Joshue108> Here is an outline of some of the RTT emergency use cases

<Joshue108> https://www.w3.org/WAI/APA/wiki/Accessible_RTC_Use_Cases#Support_for_Real_Time_Text_.28RTT.29

max-partial-packet may have helped in these cases

<dontcallmeDOM> [it is 118 apparently]

<dontcallmeDOM> [110 sorry]

<hta> (after checking: maxPacketLifetime attribute seems to be wired all the way down the stack, so it probably works.)

document of use cases for Web RTC 2.0

<Joshue108> https://www.w3.org/WAI/APA/wiki/Accessible_RTC_Use_Cases

<Bernard> +q

Bernard: I think the doc is useful... especially the mention of other tech outside RTC

woful by completion, this can be tested to reveal any shortcomings

FCC has funded open source project that can be run

useful to have a standalone doc that can be used against multiple sources. RTC etc.

<Zakim> Judy, you wanted to talk about finding details of emergency use cases and to and to say 1) realize probably need more nuance in our use cases; 2) want to make sure we have a

Judy: I think we need more nuance in use cases for the fine detail questions you're asking

<Bernard> IETF RUM WG: https://datatracker.ietf.org/wg/rum/about/

Judy: if we do the right things, we may end up with a hacked ??? in RTC 1.0

my guess is you need to add something in the spec that indicates "here's how to support RTT for now"

<Joshue108> +1 to Judy

and for 2.0, we need a plan to make sure RTT is supported sans hacks

<Bernard> RUM document on the VRS profile: https://tools.ietf.org/html/draft-rosen-rue

<Bernard> +q

as the tech companies move towards more carbon neutral teleconferencing, can RTC become that open standard for fully accessible virtual carbon-neutral conferencing

dontcallmeDOM: milestone: end of march will require recharter for RTC

might be a good time to reflect RTT use cases in charter

<Judy> s/ask the tech companies more today/as the tech companies more towards/

dontcallmeDOM: response to Judy: I don't think 1.0 RTT implementation is a hack... happy to work with you on a clarifying note.

<Judy> s/that open standard/that open standard for fully accessible virtual carbon-neutral conferencing/

saying no direct support in spec today, but there is ongoing work that can be referenced in the doc

<Zakim> Joshue, you wanted to give overview

<Joshue108> https://www.w3.org/WAI/APA/wiki/Accessible_RTC_Use_Cases#Data_table_mapping_User_Needs_with_related_specifications

Joshue108: part of the doc has a data table which provides mapping to use cases
... could publish as a note from APA

Web RTC group could contribute back to that Note

Bernard: good reason to have use cases as separate doc

we've learned that getting access to raw media opens a host of accessibility opportunities... e.g. live captioning on a bitstream

janina: thank you all for coming

dontcallmeDOM: thanks

judy: thanks all

[adjourned]

<jcraig> rrsagent: make minutes

Pronunciation Explainer

<Irfan> Scribe: Irfan

https://github.com/w3c/pronunciation/blob/master/docs/explainer.md

mhakkinen: took the recommendation and put together a document with recommended.

with all goals and non-goals with open questions

need feedback from Michael or Roy on the format

Janina: good point..

roy: personalization task force has a document

mhakkinen: I am looking to hear from the group whether I have covered everything in this document.

How can we bring SSML into HTML? One approach:

inline SSML... just drop it right in

or bring it in as an attribute model

one of the concerns that we have: AT products may have a more challenging time extracting SSML from the document.

based on our survey, one of the big AT vendors came out in support of the attribute model

the issue is broader.. with spoken interfaces

burn: I can imagine what inline means.. but have no idea about attr model

mhakkinen: <ePub Example>

what they did was create an attribute.. ssml:alphabet

you can drop two attributes into a <span>
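
For illustration, the two approaches under discussion could look roughly like the markup produced below. This is a minimal sketch and was not shown in the meeting: the second attribute name `ssml:ph` is assumed from the EPUB 3 attribute pair, and the helper names and sample word are hypothetical.

```python
from html import escape


def inline_ssml(text: str, ph: str, alphabet: str = "ipa") -> str:
    """Inline approach: wrap the text in an SSML <phoneme> element."""
    return (
        f'<phoneme alphabet="{escape(alphabet)}" ph="{escape(ph)}">'
        f"{escape(text)}</phoneme>"
    )


def attribute_model(text: str, ph: str, alphabet: str = "ipa") -> str:
    """Attribute approach: keep a plain <span> and carry the
    pronunciation in two namespaced attributes (names assumed)."""
    return (
        f'<span ssml:alphabet="{escape(alphabet)}" ssml:ph="{escape(ph)}">'
        f"{escape(text)}</span>"
    )


if __name__ == "__main__":
    # Hypothetical sample: the same word marked up both ways.
    print(inline_ssml("tomato", "t@meItoU", "x-sampa"))
    print(attribute_model("tomato", "t@meItoU", "x-sampa"))
```

In both cases the visible text stays in the document; only the place where the pronunciation hint lives differs.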

we come from educational testing, which is the assessment world, and are trying to solve the pronunciation problem. aria-label doesn't work

have seen data-ssml and some funny interpretations of SSML..

we have a JSON structure which is relatively clean, and a prototype.. that's one model
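
As a rough illustration of the JSON model, the sketch below turns a JSON attribute value back into SSML wrapped around an element's text. The JSON shape (an SSML element name mapped to its attributes) and the function name are assumptions made for this example, not something confirmed in the minutes.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def json_to_ssml(attr_value: str, text: str) -> str:
    """Rebuild SSML from a JSON attribute value (assumed shape:
    {"element-name": {"attr": "value", ...}, ...})."""
    spec = json.loads(attr_value)
    out = escape(text)
    # Wrap from the last key inwards so the first key ends up outermost.
    for element, attrs in reversed(list(spec.items())):
        attr_str = "".join(
            f" {name}={quoteattr(str(value))}" for name, value in attrs.items()
        )
        out = f"<{element}{attr_str}>{out}</{element}>"
    return out


if __name__ == "__main__":
    # Hypothetical attribute value for a span containing "W3C".
    value = '{"say-as": {"interpret-as": "characters"}}'
    print(json_to_ssml(value, "W3C"))
```

A flat object like this only describes the element it is attached to; it carries no nesting or scoping across elements.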

with the explainer we are trying to explain the problems and propose solutions..

seeking input from stakeholders

burn: when we created SSML.. we expected that XHTML was next, which would have made it easy.

if you are going to do it with the JSON model, how do you maintain context?

you are going to lose scoping? how is it going to work?

mhakkinen: that's the question, and we are looking for more feedback

in general, in assessment we are looking for very specific features... such as say-as, pausing.. sub

burn: you are going to deal with a namespacing issue

I don't see any reason that you can't do that

mhakkinen: when we talked about bringing it inline with the HTML.. the comment from an SR vendor was that it is going to be hard to implement..

some browser vendors also shared the same concerns

burn: It's possible to rename the element .. like the p element

their internal models have to deal with video rendering.. you could leave the original text there and ignore the element if you are adding any SSML

mhakkinen: problem with braille display as well

pronunciation string goes to both, braille and SR

some discussion.. like ariabraille-label

this could be controlled purely by ARIA.. but that doesn't solve the broader problem

joanie: which voice assistants support this?

mhakkinen: Google and Alexa both allow it

burn: we did some work in the past.. good to know that it has some life now

mhakkinen: I tried to hack a demo with Alexa.. it looks like.. pulling in some HTML content, and if it contains SSML, it can be rendered directly

I can't believe that the Amazon team is not looking for a solution

It is a great way to extend

have contacted Amazon as well

we want to make sure it renders on the web and on voice assistants

burn: is inline dead?

mhakkinen: we have two ways, with advantages and disadvantages

this is just a draft, and we would like to explain it in more detail

janina: we have time to discuss it

mhakkinen: GG from JAWS has been pretty clear that he likes the attribute approach

haven't heard from other orgs so far

janina: how about Narrator?

mhakkinen: talked to them and they seem to be working on this

?? is trying to work to get pass through from the browser to AT

for the voice assistants .. we can live with either approach

AT are less of the challenge here

Joanie: implementation on browser side.

mhakkinen: talked to the chief architect at Pearson.. they use some hack to handle pronunciation. they like the JSON model.. it's easier for them because they don't have to change much

Joanie: thinking.. maybe what we want is a combination of version 1 and version 2

how do you expose it to a11y API?

speak tag is not exposed to A11Y API

mhakkinen: *showing example... inline is simple

<speak> is not exposed to A11Y API

joanie: we could use span instead

it is still going to be included in AT tree

for any AT.. we need an object attr.. which can be exposed to API

that will make me super happy

we would want HTML to bless this

burn: if you want to filter out the speak content, you have to pay attention to the text which is there

joanie: good point but you are wrong... because it's an object which can be exposed to the a11y tree

we have the DOM tree and the render tree. The a11y tree is a combination

if we have a <div> foo</div> element.. we are going to have an ATKObject which is an accessible element with some property like ATK_ROLE_SECTION

state-enabled

ATK Text interface will bring all stuff about text

all these ATK object attributes are going to include the text

if we do like <div aria-bar="baz">foo</div>

I am also going to get bar:baz

ATKText- "foo"

also going to have attribute like ssml : <say-as>.. that means text doesn't go away because it is an object attribute
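
The mapping described here can be sketched as a toy model. This is not the ATK API: the `AccessibleObject` class, the `build_accessible` helper and the `data-ssml` source attribute are all hypothetical, chosen only to show that the text and the object attributes sit side by side on the same accessible object.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AccessibleObject:
    """Toy stand-in for an accessible object (not a real ATK type)."""
    role: str
    text: str
    states: Set[str] = field(default_factory=set)
    attributes: Dict[str, str] = field(default_factory=dict)


def build_accessible(dom_attrs: Dict[str, str], text: str) -> AccessibleObject:
    """Expose a <div>-like element: text goes to the text interface,
    selected DOM attributes become object attributes."""
    obj = AccessibleObject(role="section", text=text.strip(), states={"enabled"})
    for name, value in dom_attrs.items():
        if name.startswith("aria-"):
            # aria-bar="baz" is exposed as the object attribute bar:baz
            obj.attributes[name[len("aria-"):]] = value
        elif name == "data-ssml":
            # Hypothetical SSML carrier attribute, exposed as "ssml"
            obj.attributes["ssml"] = value
    return obj


if __name__ == "__main__":
    obj = build_accessible({"aria-bar": "baz", "data-ssml": "<say-as>"}, " foo")
    print(obj.text, obj.attributes)
```

Because the SSML lives in an object attribute, the element's text is still delivered through the text interface unchanged.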

mhakkinen: is there any char limit?

joanie: probably there is

mhakkinen: we need to think about it

joanie: if we have a limit then we need to break it into multiple attributes

burn: you might have multiple properties.. one say-as attr is not going to work..

joanie: then we have to go to SSML

burn: we can break it up into an array of literal strings

if you need some of the hierarchical properties then there could be a problem, otherwise it is okay

joanie: agrees
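
One way the length-limit workaround could be sketched: split a long literal string across several numbered object attributes and join them again on the AT side. The limit value, the `ssml-N` naming scheme and both function names are assumptions for illustration; no actual limit or naming was settled in the discussion.

```python
from typing import Dict


def split_into_attributes(value: str, limit: int, prefix: str = "ssml") -> Dict[str, str]:
    """Split one long literal string into numbered attributes,
    each at most `limit` characters long."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunks = [value[i:i + limit] for i in range(0, len(value), limit)]
    return {f"{prefix}-{n}": chunk for n, chunk in enumerate(chunks)}


def join_attributes(attrs: Dict[str, str], prefix: str = "ssml") -> str:
    """Reassemble the original string from the numbered attributes."""
    keys = sorted(
        (k for k in attrs if k.startswith(prefix + "-")),
        key=lambda k: int(k.rsplit("-", 1)[1]),
    )
    return "".join(attrs[k] for k in keys)


if __name__ == "__main__":
    original = '<say-as interpret-as="characters">'
    parts = split_into_attributes(original, limit=10)
    print(parts)
    print(join_attributes(parts) == original)
```

Flat chunking like this round-trips a single string, but it does nothing for hierarchical properties, which is where the problem noted above would remain.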

mhakkinen: during the IMS discussion we talked about these challenges... people want to exercise all the capabilities

burn: one of the issues for us was... there would be a couple of different TTS voices loaded.. maybe an English voice.. a female German voice...

it's been a long time

the idea was that you have an HTML page or VoiceXML

you have text there and someone adds a lang tag.. it would switch the TTS voice to German and make it female.. which is more disruptive

we had to make changes which would affect a larger set of users, but here we don't have that challenge

joanie: AT are going to have to deal with inheritance which is un-fun.

not asking to change your explainer but inheritance or multiple level properties are going to be applied

I was happy about this solution and we started talking about child or other voice

mhakkinen: we are stepping back to full SSML inline.. to a subset either an element or new Attr... span based model.. that is simple and clean

burn: where is this work going to happen?

GitHub handle: @burnburn

its a good way to start

mhakkinen: SSML is going to have much broader impact on the web.

Amazon is already doing extensions to SSML

there is so much potential here

joanie: JSON has to be parsed, and in the case of at least some SR turned back into the original SSML
... I am going to parse it into option one...

all the SSML properties should be used as a literal string in a single object attribute

it is going to be very lengthy literal object attribute

mhakkinen: if the user wants to navigate by character.. what is affected?

if user wants it in female voice.. are you going to retain as active ssml?

joanie: that has nothing to do with option 1 or option 2

I don't want to learn SSML, but I want to use it without learning it

joanie: need to talk to vendors and say that no matter what.. it is going to be an attribute.. could you deal with actual SSML markup in an attribute?
