Accessible Platform Architectures Working Group Teleconference

16 Oct 2020


Appelquist, Dan, Irfan, Joshue108_, Matthew_Atkinson, MichaelC, NeilS, Rossen_, SteveNoble, becky, hober, janina, jasonjgw, mhakkinen, paul_grenier


<janina> trackbot, start meeting

<Irfan> Scribe: Matthew_Atkinson

Janina: We are at an impasse whereby we know how to solve the pronunciation problem for accessibility, but there is potential to solve the problem in the mainstream sense, for e.g. personal digital assistants, which would require some contribution from WHATWG. The purpose of the meeting is to present both paths and gauge interest from the wider community and decide if the accessibility-focused or more widely-applicable approach s[CUT]
... We could move forward on publishing the accessibility path soon; the mainstream path would take longer, but seems like a better outcome overall, as accessibility would be part of the mainstream solution.

<Irfan> https://w3c.github.io/pronunciation/explainer

Janina: There is also a video that explains the work done on pronunciation.

<Irfan> https://www.w3.org/2020/10/TPAC/apa-pronunciation.html

<becky> Pronunciation video: https://www.w3.org/2020/10/TPAC/apa-pronunciation.html

Janina: tl;dr: we would be asking WHATWG to allow a portion of SSML into HTML so that UAs that consume it would be able to access it and expose it to ATs/TTS.

Paul: Two approaches, one of which is possible to implement today: using data-* attributes we can pack a JSON representation of SSML into HTML, which can then be unpacked by UAs. There's a PoC for Macs (which don't support SSML) to demo the functionality there.
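A minimal sketch (not from the minutes) of how a polyfill might unpack such a data-* payload into an SSML fragment. The `data-ssml` attribute name and JSON shape follow the TF explainer's attribute model, but the exact property set and this helper function are assumptions for illustration:

```javascript
// Hypothetical helper: wrap a text node's content in SSML elements
// described by a data-ssml JSON payload, e.g.
//   <span data-ssml='{"phoneme":{"alphabet":"ipa","ph":"pɪˈkɑːn"}}'>pecan</span>
function toSSML(text, dataSsml) {
  const props = JSON.parse(dataSsml);
  let out = text;
  // Each top-level key names an SSML element; its value holds the attributes.
  for (const [tag, attrs] of Object.entries(props)) {
    const attrStr = Object.entries(attrs)
      .map(([k, v]) => ` ${k}="${v}"`)
      .join("");
    out = `<${tag}${attrStr}>${out}</${tag}>`;
  }
  return out;
}

console.log(toSSML("pecan", '{"phoneme":{"alphabet":"ipa","ph":"pɪˈkɑːn"}}'));
// → <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>
```

A real polyfill would walk the DOM for `[data-ssml]` elements and hand the generated fragment to a speech engine; this sketch only shows the unpacking step.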

<paul_grenier> https://www.w3.org/TR/pronunciation-gap-analysis-and-use-cases/#gap-analysis

Paul: The other option is to promote SSML as a first-class citizen in HTML.
... SSML differs from other options as detailed by the doc linked above.
... Once the implementation path is decided, we need to liaise with browser and AT vendors to establish how the info will be exposed in the AX tree.
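As a rough illustration of the first-class option (a constructed example, not markup shown in the meeting), SSML vocabulary inline in HTML might look like the following, in the same spirit as SVG/MathML integration:

```html
<!-- Hypothetical: SSML's phoneme element embedded directly in HTML content.
     A compliant UA would speak the IPA pronunciation; others would just
     render the text. -->
<p>
  The word <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>
  is often mispronounced.
</p>
```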

Janina: AT vendors have been clear about not wanting to have to parse the HTML themselves.

Rossen: is this related to the TAG design review issue (476)?

Janina: there's a specific request from Personalization to reserve a prefix. That's a different spec and use-case, regarding supporting users with cognitive disabilities. It is around providing support for presenting content using different symbol sets (there's a Personalization video too). Thus we may end up with one prefix for Personalization and one for Pronunciation.

<paul_grenier> (another "polyfill" example using custom elements: https://ssml-components.glitch.me/)

Rossen: is there a TAG issue we're currently discussing?

Becky: Issue 46 under meetings.

<becky> https://github.com/w3ctag/meetings/issues/46

Janina: We're trying to determine if it's reasonable to request specific markup in order to support this use-case, or if this is premature.

Dan: SSML v1.1 is the latest (2010; XML-based). What would need to be done in order to integrate it into HTML?

Janina: This has already been done for SVG and MathML and this would be similar. Parsers that can consume them do so; others skip them. The markup would apply to e.g. a span in the content. The video Pronunciation produced features some code samples.

<dka_> Ref mathml, there has been some work on modernizing mathml (mathml core) which you may want to reference. https://mathml-refresh.github.io/mathml-core/

Tess: Would the proposal require HTML processors to be aware of new elements, or just new attributes?

Paul: There are some overlaps with SSML tags, but for the most part, since none of the SSML information is visual, we don't expect problems integrating with current HTML processing—should be easier than existing SVG/MathML integration.

Tess: Parser changes for both SVG and MathML are substantial—elements require parser changes; attrs don't.
... parser changes are generally avoided (consider <template>, the main change in the past decade). Reticence is due to potential security issues.

Becky: Given that speech is becoming a more significant means of interaction, should we take the broader approach now, not for accessibility alone, but more widely applicable?

Janina: Acknowledged the need to avoid adding elements to the parser for anything trivial; we feel this modality is non-trivial and gaining in relevance.
... *gives some examples around classic issues* "cd" could be "change directory", "compact disc", "candelas", "certificate of deposit" [scribe: may have that last one wrong]

Dan: Still some significant issues that need to be untangled. Usage patterns are not yet very clear.
... e.g. multimodal input

Janina: So far we've looked just at output.

Mark: Educational tech sector. Long-term issue of students needing content read correctly to them. This is acutely important in the current climate. Need to make sure that the students' devices pronounce things as the teacher would, but also convey prosody correctly too. There are a lot of hacks that have been developed in the industry. We would like to create a non-hacky way to do this. Concern that as time goes on, appro[CUT]
... It's important to accelerate some sort of solution before it's too late for vendors.

Tess: How do we define "too late for vendors"—is there a particular timeframe?

Mark: it seems that different vendors will create and pursue their own solutions, which will cause fragmentation.
... IMS Global Learning Consortium develops standards (e.g. QTI, used for testing) that allow people to embed SSML. What happens when these get transformed into HTML for delivery?

Tess: Seems like major goal is to help personal assistants improve their pronunciation of web content. In addition to engaging browser vendors wrt implementability and fitness, personal assistant vendors should be consulted too, to encourage successful results.
... Amazon, Google and Apple are W3C members; are they involved? Would this TPAC be an opportunity to seek input?

Rossen: Some contacts in CSS.

Mark: we have some contacts also [scribe: didn't catch group name]

Tess: Need to be able to justify parser changes—if digital assistant teams are on-board with this, this makes the argument more compelling.

Paul: If we were to put SSML into an HTML doc currently, it wouldn't validate, but the browser would ignore the unknown tags. This could be done in stages...
... content that's already ready to have customisations could be parsed specifically by, e.g. a smart speaker.
... for screen readers, maybe specific tools (e.g. ChromeVox) could do their own parsing.
... ultimately we need to have the speech semantics exposed in the AX tree but could develop a separate one in the meantime?
... A lot of emphasis on this technology is coming from authors' desire to have content announced correctly, but if we could encourage vendors to be involved this would help.

Tess: There are some significant implementation issues, e.g. SSML has a <p> element. If this was copied into an HTML doc, and the HTML parser found the <p> element it would think it was an HTML <p> element. (*Considerations around namespacing and the structure, e.g. with <table>s were also described*)
... MathML was mostly able to avoid this due to being designed to be embedded in HTML; all of its elements start with 'm' (<mrow>, for example).
... however MathML did require a number of parser changes.
... As Paul suggested, a polyfill could work in some cases, but it may not be possible to guarantee it would behave as a future native implementation would; a subset profile of SSML would have to be used (probably not including <p> elements, for example).

Paul: We have control over how elements are presented visually, but not aurally. E.g. can override styles for <p>. Not expecting the parsing differences to affect the aural nature of the SSML. Expect that devs knowing that it's not (yet) native, they could use styling and extra accessibility attrs to provide more info for the AX tree.

Janina: Do we need _all_ of SSML 1.1 or is there a subset/profile that meets our needs, as we develop the wider case?

Tess: The profile wouldn't necessarily have to be a big change; just minimize parser clashes. Would need to check, but it could be _just_ the <p> element for example.

Mark: in the educational context we often use subsets—certainly achievable to try the same approach here.

Paul: elements include <p>, <s>, <sub> (different meaning in SSML/HTML). Some elements mean similar things across both languages.
... would have to study which elements are already close semantically.

Mark: substitution (i.e. <sub>) is a heavily-used feature in educational settings.
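To illustrate the name clash discussed above (a constructed example, not from the minutes), the same <sub> tag carries different semantics in the two languages:

```html
<!-- SSML: substitution — spoken as "World Wide Web Consortium" -->
<sub alias="World Wide Web Consortium">W3C</sub>

<!-- HTML: subscript — rendered visually below the baseline -->
H<sub>2</sub>O
```

An HTML parser encountering the first form today would treat it as the HTML subscript element and ignore the alias attribute, which is one reason a subset profile or attribute model keeps coming up.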

Paul: Some AT have tried to establish whether <em>/<strong> make a difference, for example. This would separate the visual/aural and allow more control for authors, rather than general rules that have been developed over time to try to provide some aural customisation for content authors.

Rossen: are any other working groups involved with the Pronunciation TF?

Paul: Léonie suggested early on that this shouldn't be part of ARIA as it has a wider remit than AT.

Mark: *+1*

Rossen: Some tools such as reading aloud can get the info they need from the AX tree.

Tess: as well as getting personal assistants involved, people who work on read aloud on platforms like e-readers could be interested too.
... Publishing has contacts.

Janina: reaching a conclusion today isn't practicable, but we have a lot of suggestions, contacts and approaches. We should pursue those and then re-assess later.

Mark: been discussing with EPUB (who've implemented a form of SSML, which is used in Japan when generating audio files from EPUB).

Tess: Sounds like useful prior art.

Mark: this was a namespaced attribute model.

Tess: That's a lot easier to implement than elements.

Neil: Noticed that Amazon supports SSML for Alexa skills—and they have a lot of documentation on this.

Mark: *is developing a skill that demos some issues; this is featured in the Pronunciation video*
... a lot of skills pick up content from Wikipedia. Would be great to be able to pass through the pronunciation info from that content.

Janina: what might our next steps be?

Rossen: One clear action is to engage with the community that works on digital assistants and similar devices and gauge their interest in the space. This should provide a lot of additional prior art and/or opinions, and could generate a lot of interest.

Janina: Sounds good
... summary: there are challenges but also opportunities; important to bring the wider community along.

Rossen: next steps for TAG?

Janina: (separate issue) Personalization TF has a similar request (for different reasons) around a prefix.
... *APA to respond to TAG concerns and seek further discussion*
... Thanks TAG

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version (CVS log)
$Date: 2020/10/16 16:06:43 $
