IRC log of apa on 2021-10-26
Timestamps are in UTC.
- 16:03:57 [RRSAgent]
- RRSAgent has joined #apa
- 16:03:58 [RRSAgent]
- logging to https://www.w3.org/2021/10/26-apa-irc
- 16:03:59 [Zakim]
- RRSAgent, make logs Public
- 16:04:01 [Zakim]
- please title this meeting ("meeting: ..."), jamesn
- 16:04:20 [IrfanA]
- present+
- 16:04:22 [PaulG]
- present+
- 16:04:24 [becky]
- present+
- 16:04:29 [Joshue108]
- Scribe: Joshue108
- 16:04:31 [NeilS]
- NeilS has joined #apa
- 16:04:33 [Joshue108]
- present+
- 16:04:35 [NeilS]
- present+
- 16:04:36 [aaronlev]
- aaronlev has joined #apa
- 16:04:38 [jasonjgw]
- jasonjgw has joined #apa
- 16:05:05 [jamesn]
- meeting: APA & ARIA: The Future of Accessibility APIs
- 16:05:23 [jasonjgw]
- present+
- 16:05:24 [SamKanta]
- SamKanta has joined #apa
- 16:05:34 [SamKanta]
- present+
- 16:05:42 [aaronlev]
- present+
- 16:06:53 [Joshue108]
- <Intros>
- 16:08:07 [Joshue108]
- hmm
- 16:08:22 [Joshue108]
- hEn!2xGg!u
- 16:08:28 [Joshue108]
- try that
- 16:08:35 [Joshue108]
- or
- 16:08:36 [Joshue108]
- 4613145760
- 16:08:54 [Joshue108]
- it gives two for some reason
- 16:09:13 [bkardell_]
- bkardell_ has joined #apa
- 16:09:17 [Matthew_Atkinson]
- Matthew_Atkinson has joined #apa
- 16:09:48 [Joshue108]
- s/hmm/
- 16:09:56 [Joshue108]
- s/hEn!2xGg!u/
- 16:10:01 [Joshue108]
- s/4613145760/
- 16:10:12 [cyns]
- cyns has joined #apa
- 16:15:56 [SamKanta]
- present+
- 16:16:14 [SteveNoble]
- SteveNoble has joined #apa
- 16:16:27 [SteveNoble]
- present+
- 16:16:57 [Joshue108]
- TOPIC: Pronunciation Spec Discussion
- 16:17:06 [mhakkinen]
- mhakkinen has joined #apa
- 16:17:19 [Joshue108]
- JS: Thanks for joining - let's look at this
- 16:17:32 [Joshue108]
- Bridge differences from engineering perspectives.
- 16:17:40 [Joshue108]
- Can Mark or Irfan kick this off?
- 16:17:53 [Joshue108]
- So we can share perspectives etc?
- 16:18:01 [PaulG]
- q+
- 16:18:04 [Joshue108]
- Single vs multiple attributes..
- 16:18:10 [Joshue108]
- ack Paul
- 16:18:30 [IrfanA]
- https://www.w3.org/TR/spoken-html/
- 16:18:36 [Joshue108]
- PG: The goal is to create authoring capabilities in HTML
- 16:18:51 [Joshue108]
- We have identified a gap in specs and APIs
- 16:19:00 [Joshue108]
- This is augmentation of AX Tree
- 16:19:17 [Joshue108]
- there are two candidates - one is a single attribute tbd
- 16:19:35 [Joshue108]
- Also there is a multi attribute approach data-ssml currently
- 16:19:49 [Joshue108]
- Currently tech based values
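
To make the two candidates concrete, here is a minimal sketch that serializes the same SSML instruction both ways. The attribute names (`data-ssml` carrying JSON, and `data-ssml-<element>-<property>` for the multi-attribute form) and the helper functions are assumptions for illustration, not quotations from the spoken-html draft.

```python
import json
from html import escape

# Hypothetical instruction: read the text as a telephone number.
EXAMPLE = {"say-as": {"interpret-as": "telephone"}}


def single_attribute(text: str, ssml: dict) -> str:
    """Carry the whole instruction as one JSON-valued attribute (assumed name: data-ssml)."""
    value = escape(json.dumps(ssml), quote=True)
    return f'<span data-ssml="{value}">{escape(text)}</span>'


def multi_attribute(text: str, ssml: dict) -> str:
    """Spread the same instruction over one attribute per SSML property (assumed naming)."""
    attrs = []
    for element, props in ssml.items():
        for prop, val in props.items():
            attrs.append(f'data-ssml-{element}-{prop}="{escape(str(val), quote=True)}"')
    return f'<span {" ".join(attrs)}>{escape(text)}</span>'


print(single_attribute("555-0100", EXAMPLE))
print(multi_attribute("555-0100", EXAMPLE))
```

The single-attribute form keeps everything in one place but needs the JSON quotes escaped inside the attribute value; the multi-attribute form reads more like ordinary HTML but grows by one attribute per property.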
- 16:20:20 [bkardell_]
- q+ to ask about tag review
- 16:20:25 [Joshue108]
- Irf: We need to find a way to expose this
- 16:20:41 [Joshue108]
- JS: The reason it is prefixed data- is that this prefix is defined in HTML
- 16:21:01 [Joshue108]
- Once we have an implementation, then we go to the HTML group, and ask for a reserved prefix
- 16:21:06 [Joshue108]
- We are a way off that.
- 16:21:26 [Joshue108]
- But we need to get POC built etc. Make sure it works.
- 16:21:39 [Joshue108]
- Then we can get reserved prefix etc
- 16:21:49 [Joshue108]
- ack br
- 16:21:54 [Joshue108]
- ack bk
- 16:21:54 [Zakim]
- bkardell_, you wanted to ask about tag review
- 16:22:07 [Joshue108]
- BK: Is the spoken HTML idea reviewed by TAG?
- 16:22:12 [Joshue108]
- Seems like a good idea.
- 16:22:32 [Joshue108]
- JS: We did that last year - we heard from them, don't ask the parser to change
- 16:22:58 [Joshue108]
- Our current approach is within the scope of current parsing capability
- 16:23:13 [Joshue108]
- BK: I've seen two diff interpretations around the use of these attributes.
- 16:23:17 [Joshue108]
- Where can we discuss?
- 16:23:26 [Joshue108]
- JS: Happy to discuss now.
- 16:23:48 [Joshue108]
- BK: has heard different interpretations of this
- 16:23:55 [Joshue108]
- I think data attributes are fine
- 16:24:22 [Joshue108]
- Some feel strongly that for things that are standard, that isn't appropriate
- 16:24:27 [Joshue108]
- Can we open an issue?
- 16:24:29 [Joshue108]
- JS: Yes
- 16:24:42 [Joshue108]
- It may be on the HTML spec - we are following their guidelines.
- 16:24:56 [Joshue108]
- To drive consistent TTS output in various environments.
- 16:25:26 [Joshue108]
- Matthew mentioned the approach in personalization, which is using data- to drive it over there.
- 16:25:35 [Joshue108]
- To drive personalization
- 16:25:45 [Joshue108]
- JS: You can't get to a W3C REC using data-
- 16:25:51 [Joshue108]
- CR is as far as it will go.
- 16:25:59 [Joshue108]
- That's the sandbox for data-
- 16:26:21 [Joshue108]
- keep implementations to allay cross site concerns
- 16:26:34 [Joshue108]
- JS: <gives overview of process and IP issues>
- 16:26:59 [Joshue108]
- <And how to progress specs>
- 16:27:03 [Joshue108]
- JS: Does that help?
- 16:27:23 [Joshue108]
- BK: That's not new but just sharing counter interpretation
- 16:27:38 [Joshue108]
- This seems like a good use case to begin discussion
- 16:27:58 [Joshue108]
- JS: Mentions other specs using this approach
- 16:28:22 [Joshue108]
- Do we have a preference, is the crux here?
- 16:28:25 [Joshue108]
- q?
- 16:28:41 [Joshue108]
- JS: Others ?
- 16:28:56 [Joshue108]
- JS: Dave Tseng, how does that sound?
- 16:29:06 [Joshue108]
- Is multiple attribute preferable?
- 16:29:50 [Joshue108]
- Paul: Did mention an affinity for the multi attribute approach. There is no corollary for JSON as a value.
- 16:30:18 [Joshue108]
- JS: That is one view from one AT , which is fine.
- 16:30:20 [Joshue108]
- q?
- 16:30:40 [Joshue108]
- The difference for AT is around approach
- 16:31:08 [Joshue108]
- The group is more interested in JSON, as it is a single target, selector - info is picked up, the AX can abstract that, augment and provide info
- 16:31:26 [mhakkinen]
- +q
- 16:31:47 [Joshue108]
- JS: The direct read group may be different - things that power our speech recognition devices.
- 16:32:02 [cyns]
- q+
- 16:32:14 [Joshue108]
- GlenG: We do not have a problem parsing the HTML
- 16:32:32 [Joshue108]
- Making fewer calls, from an AT angle, is good - esp if noisy.
- 16:32:38 [janina]
- q?
- 16:32:41 [Joshue108]
- JSON, is good so we can get it all at once
- 16:32:58 [becky]
- ack mhakkinen
- 16:33:03 [Joshue108]
- In a single attribute we can do that.
- 16:33:03 [Joshue108]
- ack m
- 16:33:06 [tink]
- q+
- 16:33:11 [Joshue108]
- MK: Mentions read aloud tools
- 16:33:25 [Joshue108]
- Text Help have a preference for single attribute
- 16:33:39 [Joshue108]
- MS has immersive reader capabilities
- 16:33:44 [becky]
- ack cyns
- 16:33:56 [Joshue108]
- How would they use pronunciation q..
- 16:34:11 [Joshue108]
- CS: Jumping out of the A11y APIs seems like a big step.
- 16:34:14 [Joshue108]
- How did that happen?
- 16:34:31 [Matthew_Atkinson]
- s/pronunciation q../pronunciation cues/
- 16:34:35 [Joshue108]
- JS: We see use cases that provide benefit for AT that doesn't use the AX tree.
- 16:34:36 [becky]
- ack tink
- 16:34:40 [bkardell_]
- s/Some feel strongly that for things that are standard,/Some have expressed to me subtler interpretations that data-* is for something more narrow, I'd like to see if I can get them to share discussion there
- 16:35:09 [SteveNoble]
- q+
- 16:35:12 [Joshue108]
- LW: From those who train and teach - while the JSON attribute is unfamiliar - adding more attributes could be confusing
- 16:35:38 [Joshue108]
- We have spent years explaining around applied semantics, and the ground work of understanding the A11y tree
- 16:35:49 [becky]
- ack SteveNoble
- 16:35:51 [Joshue108]
- And it could be confusing for adoption with a different approach
- 16:36:23 [Joshue108]
- SN: Pearson has an implementation of the single attribute approach - just to channel what Paul G says..
- 16:36:37 [Joshue108]
- <discusses runtime performance>
- 16:37:01 [Joshue108]
- Sniffing and selecting tons of attributes in the DOM will be worse than teasing out JSON via a processor
- 16:37:01 [Matthew_Atkinson]
- The GitHub comment being referenced is https://github.com/w3c/pronunciation/issues/86#issue-904400398
- 16:37:22 [Joshue108]
- When a TTS player has to rip through the data the round trip is brutal
- 16:37:43 [Jemma]
- Jemma has joined #apa
- 16:37:53 [Jemma]
- present+
- 16:38:05 [PaulG]
- q+
- 16:38:15 [Joshue108]
- SN: Performance will be impacted in the read aloud environment, it's a concern.
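
As a rough illustration of "teasing out JSON via a processor", the sketch below turns one JSON attribute value into an SSML fragment with a single parse. The JSON shape (element name mapped to its properties) and the function name `json_to_ssml` are assumptions, and nothing here measures the performance difference discussed above.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def json_to_ssml(text: str, attr_value: str) -> str:
    """Wrap text in the SSML elements described by a JSON attribute value."""
    instructions = json.loads(attr_value)  # one parse yields every instruction
    ssml = escape(text)
    # Wrap inside-out so the first key in the JSON becomes the outermost element.
    for element, props in reversed(list(instructions.items())):
        attrs = "".join(
            f" {name}={quoteattr(str(value))}" for name, value in props.items()
        )
        ssml = f"<{element}{attrs}>{ssml}</{element}>"
    return ssml


print(json_to_ssml("W3C", '{"say-as": {"interpret-as": "characters"}}'))
```

A read-aloud tool using this shape would select on one attribute and hand the value to a JSON parser, rather than probing each element for a family of separate attributes.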
- 16:38:20 [Jemma]
- rrsagent, make minutes
- 16:38:20 [RRSAgent]
- I have made the request to generate https://www.w3.org/2021/10/26-apa-minutes.html Jemma
- 16:38:26 [becky]
- ack paulg
- 16:38:40 [Matthew_Atkinson]
- s/comment being referenced/SteveNoble mentioned/
- 16:39:39 [Joshue108]
- PG: With ARIA live the A11y tree gets updated - there is a concern about dynamic content - when there is a special region, the browser has to catch up
- 16:39:47 [Matthew_Atkinson]
- scribe: Matthew_Atkinson
- 16:39:50 [jamesn]
- present+
- 16:39:55 [Matthew_Atkinson]
- Matthew_Atkinson: present+
- 16:40:05 [becky]
- q?
- 16:40:35 [Matthew_Atkinson]
- cyns: aaronlev/David Tseng: any thoughts?
- 16:41:17 [PaulG]
- q+
- 16:41:19 [Matthew_Atkinson]
- aaronlev: Concerns around how these things will be misused by authors (c.f. live regions). What is the ideal markup that we would want?
- 16:41:28 [PaulG]
- q-
- 16:41:42 [cyns]
- q+ to ask about css pronunciation
- 16:42:03 [Matthew_Atkinson]
- janina: The ideal would be to make SSML a native citizen of HTML. Concern around it not being possible to change validators (not necessarily parsers as mentioned above).
- 16:42:08 [bkardell_]
- q+
- 16:42:13 [Matthew_Atkinson]
- aaronlev: How about providing a separate SSML file?
- 16:42:37 [Matthew_Atkinson]
- janina/becky: Don't think that was considered.
- 16:42:59 [Matthew_Atkinson]
- aaronlev: Want to consider: what's the potential for misuse. Also: platform AX APIs can often be extended to provide more information.
- 16:43:36 [Matthew_Atkinson]
- PaulG: Made a note about homonym attacks (in the document).
- 16:43:56 [janina]
- q?
- 16:44:13 [janina]
- q+
- 16:44:15 [jamesn]
- q+ to echo that authors WILL use things if they are available - just because they can
- 16:44:43 [Matthew_Atkinson]
- aaronlev: Concern around empowering authors to give the users a bad experience (c.f. ARIA can be misused in this way). Interested to hear Glen Gordon [who's on the line]'s thoughts on this. Examples such as inconsistencies inter-site or intra-site.
- 16:45:09 [Matthew_Atkinson]
- aaronlev: Using a single element could increase the risk that the AT can't present things consistently to users?
- 16:45:12 [becky]
- ack cyn
- 16:45:12 [Zakim]
- cyns, you wanted to ask about css pronunciation
- 16:45:29 [tink]
- q+
- 16:45:57 [Matthew_Atkinson]
- cyns: Is there a relationship between this and the pronunciation functionality being proposed for CSS, or are they different use cases? Also concern around author misuse: remember when everyone made all the fonts small and light gray? Worried that a lot of things will get sped up.
- 16:46:17 [PaulG]
- We covered CSS Speech gap analysis here https://w3c.github.io/pronunciation/gap-analysis/
- 16:46:41 [Matthew_Atkinson]
- janina: I think the CSS work is orthogonal to what we're trying to do; we did a gap analysis [URL above] that may have more info.
- 16:46:42 [PaulG]
- section #3
- 16:46:54 [bkardell_]
- could we ++ tink in the queue before me?
- 16:47:11 [becky]
- sure will do that bkardell
- 16:47:37 [Matthew_Atkinson]
- janina: Possibility for misuse: anything that allows extra functionality could be mis-used. Need to be aware of it. But this allows us to do things such as mixing languages within a book (such as historical text).
- 16:48:34 [cyns]
- Can someone drop a link to the use cases in here?
- 16:48:36 [Matthew_Atkinson]
- janina: This also helps disambiguate wind/wind and tear/tear. Much opportunity for improvement. It's not a proposal that the entire document needs to be marked up for pronunciation. In most cases TTS engines will do reasonably well.
- 16:49:00 [PaulG]
- https://w3c.github.io/pronunciation/use-cases/
- 16:50:00 [becky]
- ack janina
- 16:50:27 [PaulG]
- q+ AT and voice assistants could "learn" from authors
- 16:50:30 [bkardell_]
- q--
- 16:50:31 [Matthew_Atkinson]
- jamesn: Worried about over-use and mis-use. Not sure how we counter this. Yes, it is necessary in certain cases. Not a screen reader user, so unsure if this is a problem: company names. Can see people wanting to put this into their company name. Is it a problem if a company's site pronounces it correctly, but everywhere else on the web it's incorrect?
- 16:50:35 [becky]
- ack Jamesn
- 16:50:35 [Zakim]
- jamesn, you wanted to echo that authors WILL use things if they are available - just because they can
- 16:50:42 [becky]
- ack tink
- 16:51:11 [PaulG]
- q+ to comment "learning" from authors
- 16:51:27 [Matthew_Atkinson]
- tink: To answer your question jamesn, I would find it useful to hear the canonical company name pronunciation. Can get too used to how the AT pronounces it.
- 16:51:54 [Matthew_Atkinson]
- tink: CSS Speech is catering for a specific set of use cases: it's trying to make the auditory experience less tedious.
- 16:52:22 [Matthew_Atkinson]
- tink: Yes it can be misused: HTML, ARIA, XML are all misused. I think we can mitigate this, but not stop it.
- 16:52:56 [Matthew_Atkinson]
- tink: For now the CSS Speech media type isn't supported by UAs, sadly. But, different use cases.
- 16:52:57 [becky]
- ack bkardell_
- 16:54:10 [Matthew_Atkinson]
- bkardell_: The use cases are different, but problems are similar in that we need to affiliate nodes with values. Presumably wouldn't have one SSML document for the entire page, nor hundreds/thousands (would be very slow to load from network).
- 16:55:07 [Matthew_Atkinson]
- aaronlev: Was spitballing; though we can load a hundred images for a document. Can we put element IDs in SSML documents? The idea is mainly to avoid adding noise to the markup of the document.
- 16:55:30 [Matthew_Atkinson]
- bkardell_: Whilst the single JSON attribute could be ugly, can also see the benefit of keeping the info together.
- 16:55:57 [becky]
- q?
- 16:56:01 [Matthew_Atkinson]
- bkardell_: Brought this up in the MathML meeting as well, but we could polyfill something like this with existing technologies? Then authors wouldn't need to create the cumbersome JSON attributes.
- 16:56:26 [becky]
- ack PaulG
- 16:56:26 [Zakim]
- PaulG, you wanted to comment "learning" from authors
- 16:56:26 [Matthew_Atkinson]
- aaronlev: This feels a bit like [CSS] background images; it's changing the presentation as opposed to the semantics?
- 16:57:16 [Matthew_Atkinson]
- PaulG: For linking, we did talk briefly about linking external resources (for the next stage of the spec). If SSML came to the document as a first-class citizen like SVG we would look into that.
- 16:58:06 [Matthew_Atkinson]
- PaulG: Performance: TTS uses a lot of heuristics to determine pronunciation. Reducing the need for heuristics may mitigate some performance hit.
- 16:58:28 [mhakkinen]
- q+
- 16:58:44 [Matthew_Atkinson]
- PaulG: Voice assistants might start to learn correct pronunciations e.g. for company names from their official sites.
- 16:59:07 [becky]
- ack mhakkinen
- 16:59:11 [Matthew_Atkinson]
- janina: e.g. Versailles is pronounced differently based on location.
- 16:59:14 [bkardell_]
- interesting point PaulG
- 17:00:00 [Matthew_Atkinson]
- mhakkinen: We have a lot of need for pronunciation in education (ref Pearson's work discussed before). We have looked for a standard solution, e.g. PLS, the Pronunciation Lexicon Specification [scribe note: PaulG mentioned this just above].
- 17:00:28 [Joshue108]
- q?
- 17:01:02 [Matthew_Atkinson]
- mhakkinen: We want screen readers, read-aloud tools, etc. to benefit. Another example: pharmaceutical products. And another: television/film/movie program guides (character names, actor names, etc.)
- 17:01:26 [Matthew_Atkinson]
- janina: We wanted to discuss this near-term problem but didn't intend to take the whole time for this; will summarize.
- 17:01:46 [cyns]
- q+
- 17:02:09 [Matthew_Atkinson]
- janina: Hearing from aaronlev that browsers aren't expected to be a blocker as to which approach is taken. Need more feedback from AT vendors. Is that a reasonable summary?
- 17:02:18 [cyns]
- q?
- 17:02:52 [becky]
- ack cyns
- 17:02:54 [Matthew_Atkinson]
- aaronlev: We _can_ implement anything; we still would need to look carefully at the proposal. We'd want good markup, good API support, good AT support; an end-to-end plan. Doesn't sound like all options have been looked at yet.
- 17:03:39 [Matthew_Atkinson]
- cyns: Have a similar view to aaronlev. The single-attribute approach feels counterintuitive for authors. It doesn't feel very HTML-like. Concerned about readability.
- 17:03:44 [mhakkinen]
- q+
- 17:03:56 [tink]
- q+
- 17:04:12 [becky]
- ack mhakkinen
- 17:04:17 [Matthew_Atkinson]
- aaronlev: JSON can be hard to read.
- 17:04:33 [Matthew_Atkinson]
- cyns: In general, it is OK but as an attribute value it is hard to read.
- 17:05:18 [cyns]
- q+ one of the goals of markup is to be human readable
- 17:05:32 [Matthew_Atkinson]
- mhakkinen: From an authoring tool perspective, authors don't necessarily need to see the output HTML. We have tools already that allow authors to provide pronunciation hints that are intuitive to use. We need a standard way for ATs and others to consume it.
- 17:05:53 [becky]
- ack tink
- 17:06:03 [Matthew_Atkinson]
- tink: Is the idea with the single attribute that the JSON will be in the HTML code, or some external file that will be linked?
- 17:06:24 [Matthew_Atkinson]
- PaulG: Our current implementations/experimentations have the attribute value embedded in the HTML.
- 17:06:28 [Matthew_Atkinson]
- tink: How about an external file?
- 17:07:08 [Matthew_Atkinson]
- PaulG: We've had discussions about this before; have not yet found/developed a method to do external linking.
- 17:07:32 [jcraig]
- jcraig has joined #apa
- 17:07:42 [Matthew_Atkinson]
- tink: Providing common rules is very much like CSS and could be of benefit here.
- 17:07:56 [Matthew_Atkinson]
- PaulG: Agree; would be great to have first-class SSML support.
- 17:08:18 [Matthew_Atkinson]
- cyns: Concerns around readability; if it's an external file this is less so. Could this just use CSS?
- 17:08:38 [jcraig]
- q+ to point out that external file would violate the AT privacy guidelines from web platform design principles that Leonie helped author
- 17:09:04 [becky]
- ack jcraig
- 17:09:04 [Zakim]
- jcraig, you wanted to point out that external file would violate the AT privacy guidelines from web platform design principles that Leonie helped author
- 17:09:11 [Matthew_Atkinson]
- bkardell_: There are efforts ongoing to allow authors to create CSS-like languages. (c.f. Houdini)
- 17:10:04 [cyns]
- q+ to say that pronunciation could be used by other things besides AT
- 17:10:07 [bkardell_]
- but it isn't really AT specific, it would apply to many speech agents
- 17:10:09 [Matthew_Atkinson]
- jcraig: The web platform design principles mention the importance of making AT _not_ detectable. Would be good to have SSML in the document, but requesting an external file would be detectable.
- 17:10:26 [becky]
- ack cyns
- 17:10:26 [Zakim]
- cyns, you wanted to say that pronunciation could be used by other things besides AT
- 17:10:46 [Matthew_Atkinson]
- cyns: I think use cases for this extend beyond AT, so not sure this would be useful for fingerprinting. Don't want to end up with what looks like inline CSS.
- 17:10:47 [bkardell_]
- "hey <assistant> read this" is a thing I use all the time - those would be indistinguishable
- 17:10:50 [aaronlev]
- q+
- 17:10:55 [jamesn]
- q+
- 17:11:18 [Matthew_Atkinson]
- janina: Referencing external files could be helpful to avoid repetition.
- 17:11:20 [becky]
- ack aaronlev
- 17:12:02 [Matthew_Atkinson]
- aaronlev: Not sure if proposed, but: for the use case where changing the name of a product/address/company, sounds like we could use a dictionary. Problem: every time that name/phrase/word is announced you'd have to wrap its markup.
- 17:12:37 [Matthew_Atkinson]
- PaulG: We discussed this. Some tags like prosody or voice can control an entire block. Others like pauses weren't there originally, so need an extra <span>, with single or multi-attribute.
- 17:12:46 [jcraig]
- indistinguishable depends on many factors of entropy... client + accessed this other file + other factors might equal reasonable certainty of AT... FWIW, I think pronunciation rules are necessary. Just trying to point out the complications wrt that particular design principle
- 17:13:09 [Matthew_Atkinson]
- PaulG: The single attribute would, at first, encourage authors to summarize an entire block of text all at once, thus making it hard to update the pronunciation if the text changes.
- 17:13:19 [Matthew_Atkinson]
- PaulG: Would thus need help for developers to keep those in sync
- 17:13:42 [Matthew_Atkinson]
- PaulG: If everything is chopped up (multi-attribute), as a developer I think this would be easier, especially for hand-coding devs. Interested as to others' views.
- 17:13:44 [becky]
- ack jamesn
- 17:14:25 [Matthew_Atkinson]
- jamesn: Replying to jcraig around detection. We _could_ require browsers to always fetch these files (is an additional complication, but could be managed).
- 17:15:01 [Matthew_Atkinson]
- jcraig: Absolutely agree that pronunciation rules need to be defined in some format; just wanted to raise the issue. Has AX API design implications.
- 17:15:25 [bkardell_]
- embeddable as a CSS-like would work too, with no extra fetch
- 17:15:31 [aaronlev]
- q+
- 17:15:39 [Matthew_Atkinson]
- janina: Seems everyone's agreed on the _need_ but we are still unsure as to single/multiple attributes, and there is the second-order question of external file.
- 17:16:04 [jcraig]
- q+ to ask if l10n/i18n was discussed in this context earlier
- 17:16:14 [janina]
- q+
- 17:16:28 [Matthew_Atkinson]
- Joanmarie: If ATs want a single attribute, but authors want multiple attributes, or vice-versa, the implementation could be to take all the single attributes and parse them all together.
- 17:16:44 [Matthew_Atkinson]
- Joanmarie: We should consider what's best for authors, as a result.
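
One way to read this suggestion is that a user agent or library could fold author-friendly multi-attributes into the single structure an AT might prefer. The sketch below does that with Python's standard HTML parser; the `data-ssml-<element>-<property>` naming and the list of SSML element names are assumptions made for the example.

```python
import json
from html.parser import HTMLParser

PREFIX = "data-ssml-"
# Assumed list of SSML element names, needed to split "say-as-interpret-as"
# into the element "say-as" and the property "interpret-as".
SSML_ELEMENTS = ("say-as", "phoneme", "sub", "prosody", "break", "voice", "emphasis")


def fold_attributes(attrs):
    """Collect data-ssml-* attributes into one nested dict."""
    folded = {}
    for name, value in attrs:
        if not name.startswith(PREFIX):
            continue
        rest = name[len(PREFIX):]
        for element in SSML_ELEMENTS:
            if rest.startswith(element + "-"):
                prop = rest[len(element) + 1:]
                folded.setdefault(element, {})[prop] = value
                break
    return folded


class SsmlCollector(HTMLParser):
    """Record (tag, folded dict) for every element carrying data-ssml-* attributes."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        folded = fold_attributes(attrs)
        if folded:
            self.found.append((tag, folded))


def collect(html_text):
    """Parse markup and return the folded SSML data for each annotated element."""
    parser = SsmlCollector()
    parser.feed(html_text)
    parser.close()
    return parser.found


sample = '<p>Call <span data-ssml-say-as-interpret-as="telephone">555-0100</span></p>'
for tag, folded in collect(sample):
    print(tag, json.dumps(folded))
```

Under these assumptions the authoring form and the delivery form are decoupled, so the choice between them could be made on authoring grounds.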
- 17:16:46 [becky]
- ack aaronlev
- 17:16:48 [jcraig]
- q+ to mention l10n both with languages as well as with TTS capabilities
- 17:17:04 [jcraig]
- qv?
- 17:17:34 [Matthew_Atkinson]
- aaronlev: I feel there are many proposals that haven't been made yet, so should continue offline. But for the dictionary resource proposal: this could be something the AT fetches itself (circumventing the privacy issues; allowing caching across sites/domains).
- 17:17:55 [Matthew_Atkinson]
- aaronlev: Seems odd to me that we're going to be saying how to pronounce things, but only in one place.
- 17:18:09 [PaulG]
- "pronunciation" is only for phonemes. There are many more aural expressions from SSML that this spec would allow for.
- 17:18:16 [Matthew_Atkinson]
- aaronlev: ...where it'd be more useful if that was everywhere.
- 17:18:26 [becky]
- ack jcraig
- 17:18:26 [Zakim]
- jcraig, you wanted to ask if l10n/i18n was discussed in this context earlier and to mention l10n both with languages as well as with TTS capabilities
- 17:18:35 [becky]
- Q?
- 17:18:43 [Matthew_Atkinson]
- jcraig: aaronlev: are you implying there's a need for a global registry?
- 17:18:54 [Matthew_Atkinson]
- aaronlev: Not sure, but worth looking into. Consistency is important.
- 17:19:52 [PaulG]
- q+
- 17:20:00 [Matthew_Atkinson]
- jcraig: Has l10n and i18n been discussed? E.g. Homonyms, in different languages/locales. Also different TTS voices may be able to pronounce Spanish and English, but not Chinese. Has any of the rules discussion covered this?
- 17:20:03 [becky]
- ack janina
- 17:20:33 [Matthew_Atkinson]
- janina: We have discussed those nuances and the need to disambiguate them. The problem is that defining what the correct pronunciation is will change (e.g. wind/wind).
- 17:20:41 [jcraig]
- s/Homonyms, in different languages/locales. /Homonyms may be pronounced differently in languages/locales. /
- 17:21:27 [Matthew_Atkinson]
- [ scribe note: jcraig wished to raise having been delayed in joining, so missed some prior discussion ]
- 17:21:38 [jcraig]
- close/close is a more common homonym in UI in English
- 17:21:46 [Matthew_Atkinson]
- janina: Another example is English, but at different times in history, as pronunciations evolve.
- 17:22:10 [becky]
- ack PaulG
- 17:23:01 [janina]
- q+
- 17:23:17 [Matthew_Atkinson]
- PaulG: A dictionary would be limited to phonemes. We have an example that's wider than this [Vincent Price reading The Raven]; covering audio "performance".
- 17:24:07 [Matthew_Atkinson]
- PaulG: Devs are guided towards specifying the language of the document, and the TTS does the rest. But there is contextual info (such as location) that might impact accent, vernacular, local place names, and that's part of what we're aiming to provide.
- 17:24:39 [bkardell_]
- q+
- 17:25:14 [Matthew_Atkinson]
- PaulG: Voice packs being able to support different pronunciations is another issue that we would need to resolve as an industry, but isn't something we can solve in the spec. Some pre-reading, or meta tags could be added to encourage assistants/AT to load specific voice packs/TTS capabilities to ensure a good experience for the user.
- 17:25:14 [becky]
- ack janina
- 17:25:39 [Matthew_Atkinson]
- janina: Maybe the voice packs issue is a metadata issue.
- 17:26:28 [Matthew_Atkinson]
- janina: Want to revisit Joanmarie's suggestion, as that could give us a path forward. If authoring is easier in multi-attributes, as long as the UAs can expose what the ATs need, that could address this. We should explore this.
- 17:27:15 [Matthew_Atkinson]
- janina: My concern is if we were to have conflicting views across UAs. Joanmarie's suggested approach could help us address the UA-AT aspect. Does that sound good?
- 17:27:37 [jcraig]
- +1 to glen
- 17:27:50 [jamesn]
- +1 to glen
- 17:27:55 [Matthew_Atkinson]
- Glen: I don't think authors will unanimously agree on whether single-, or multi-attribute approach is easier.
- 17:28:11 [SteveNoble]
- q+
- 17:28:17 [Matthew_Atkinson]
- jcraig: +1; depends on tooling
- 17:28:24 [Matthew_Atkinson]
- janina: I think we have to presume tooling.
- 17:28:39 [Matthew_Atkinson]
- cyns: Still thinking readability is important.
- 17:29:36 [becky]
- ack bkardell_
- 17:30:29 [Matthew_Atkinson]
- bkardell_: There should be some experimentation, particularly with a CSS-like solution. There was discussion of polyfills in last year's TPAC? How are they getting the SSML to the AT?
- 17:30:39 [mhakkinen]
- q+
- 17:30:45 [becky]
- ack SteveNoble
- 17:31:20 [Matthew_Atkinson]
- SteveNoble: Authoring: as mhakkinen said, the people authoring this stuff every day are using authoring tools.
- 17:32:28 [Matthew_Atkinson]
- +1 to general philosophical view that readability is important, though I am not an implementation expert in this field!
- 17:32:51 [Matthew_Atkinson]
- SteveNoble: [demonstrates some content that has been marked up for pronunciation]
- 17:33:59 [Matthew_Atkinson]
- ... The authoring tool identifies this as "alternate text for text-to-speech" that allows users to highlight a word, e.g. melancholy, and provide alternate text, e.g. melancollie that the system turns into SSML.
- 17:34:23 [jcraig]
- q+ to demo something similar from the iOS VoiceOver settings
- 17:34:30 [cyns]
- q+ to say that a wysiwyg editors don't address my concerns about human readability of markup. You shouldn't have to use a special tool to write or read markup
- 17:34:41 [Matthew_Atkinson]
- ... There are also tables of words and "how to spell these phonetically in the system"
- 17:34:42 [AndroUser]
- AndroUser has joined #apa
- 17:34:56 [jamesn]
- q+ to say that to me this looks like the kind of overuse I fear
- 17:35:06 [Matthew_Atkinson]
- ... e.g. dinosaur may be expressed as dinosore
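
The "alternate text" table described here could be turned into SSML roughly as follows. This is only a sketch: it uses the SSML `<sub alias="...">` element, the two respellings come from the demo, and the function name and lexicon format are hypothetical rather than a description of the authoring tool that was shown.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Respellings mentioned in the demo; keys must be lowercase.
LEXICON = {"melancholy": "melancollie", "dinosaur": "dinosore"}


def apply_lexicon(text: str, lexicon: dict) -> str:
    """Wrap each listed word in an SSML <sub alias="..."> element."""
    if not lexicon:
        return f"<speak>{escape(text)}</speak>"
    # Longest words first so a longer entry wins over a shorter prefix.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b", re.IGNORECASE
    )
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        alias = lexicon[match.group(0).lower()]
        parts.append(f"<sub alias={quoteattr(alias)}>{escape(match.group(0))}</sub>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "<speak>" + "".join(parts) + "</speak>"


print(apply_lexicon("The dinosaur felt melancholy.", LEXICON))
```

Because the table is applied wherever a word appears, this also illustrates the dictionary idea raised earlier: the author maintains one list instead of wrapping every occurrence by hand.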
- 17:35:43 [Matthew_Atkinson]
- ... Authors may be creating 1,000 SSML fragments per week (though they don't know it as SSML) to correct the way that the TTS pronounces things.
- 17:36:08 [Matthew_Atkinson]
- ... [Compares this to creating MathML with a WYSIWYG editor]
- 17:36:26 [becky]
- ack mhakkinen
- 17:36:55 [Matthew_Atkinson]
- mhakkinen: To echo what SteveNoble said, for classroom materials, many states have specific guidelines on pronunciation and we've had to spend time tweaking text so that it'll be pronounced with the right sort of pronunciation or pausing.
- 17:37:08 [Matthew_Atkinson]
- mhakkinen: We've tried to do this without authors having to learn SSML.
- 17:37:33 [Matthew_Atkinson]
- mhakkinen: We've had to create hacks to support whatever sorts of AT/read-aloud we are using at delivery time. We have not necessarily been able to get this into screen readers.
- 17:37:56 [Matthew_Atkinson]
- mhakkinen: E.g. if we altered the way the screen reader pronounced things, this could really confuse Braille users.
- 17:38:08 [Matthew_Atkinson]
- mhakkinen: We don't think this is a challenge for authors with the correct tooling.
- 17:38:26 [becky]
- ack jcraig
- 17:38:26 [Zakim]
- jcraig, you wanted to demo something similar from the iOS VoiceOver settings
- 17:39:04 [Matthew_Atkinson]
- jcraig: [Demonstrates VoiceOver Settings > Speech > Pronunciation for some of our names]
- 17:39:33 [jasonjgw]
- q+
- 17:40:03 [Matthew_Atkinson]
- jcraig: You can speak how you want the term to be pronounced, and the device will interpret this and offer options (that it reads back) from which you can choose.
- 17:41:18 [Matthew_Atkinson]
- jcraig: Users can do this across the system. Perhaps this could be exposed through WebKit.
- 17:41:35 [becky]
- ack cyns
- 17:41:35 [Zakim]
- cyns, you wanted to say that a wysiwyg editors don't address my concerns about human readability of markup. You shouldn't have to use a special tool to write or read markup
- 17:41:56 [Matthew_Atkinson]
- cyns: Special tools shouldn't be needed to make markup readable.
- 17:42:30 [Matthew_Atkinson]
- cyns: Have you looked at using a polyfill to pull all of the info into the AX API's description field?
- 17:42:36 [bkardell_]
- thanks cyns that was the q I was asking too - your articulation was better
- 17:42:54 [jcraig]
- s/AX API/Accessibility API/
- 17:42:55 [Matthew_Atkinson]
- mhakkinen: We've not done anything specifically for screen readers; our use cases are wider (e.g. read aloud tools).
- 17:43:03 [Matthew_Atkinson]
- mhakkinen: We prefer a standards-based approach.
- 17:43:32 [PaulG]
- The authoring standard also allows for scenarios like kiosks where an individual's AT/voice assistance solution may not be integrated with the content
- 17:43:44 [Matthew_Atkinson]
- SteveNoble: Our support internally is for our own TTS system and read and write extension. TextHelp is another vendor that supports SSML (single-attribute).
- 17:43:49 [becky]
- q?
- 17:44:24 [jamesn]
- ack me
- 17:44:24 [Zakim]
- jamesn, you wanted to say that to me this looks like the kind of overuse I fear
- 17:44:26 [becky]
- ack jamesn
- 17:44:29 [jcraig]
- s/Perhaps this could be exposed through WebKit./Perhaps this could be exposed through WebKit. I don't have a strong preference for whether that's via an attribute, or in a dictionary defined in a page script block or external resource. /
- 17:44:35 [Matthew_Atkinson]
- mhakkinen: Some years ago we prototyped a custom element that allows you to specify pronunciation and a Braille label, but this didn't solve the problem of getting the screen reader to direct content specifically to TTS vs Braille.
- 17:45:36 [becky]
- ack jasonjgw
- 17:45:41 [Matthew_Atkinson]
- jamesn: Can see the publishing approach SteveNoble demo'd working when you have control over the TTS, but this standards approach is much more general. This doesn't seem like an appropriate use case for the wider web.
- 17:46:18 [jcraig]
- +q to agree with ETS comments that any polyfill implementable today may help speech users, but would be harmful to braille users. the standards approach takes longer but is the right path.
- 17:46:54 [janina]
- q?
- 17:47:19 [Matthew_Atkinson]
- jasonjgw: Trying to maximize author convenience and acknowledging that this will differ across authors. The ability to define information globally and at the individual text element level seems to have got agreement. There's some flexibility on the UA side as to how it's represented in the markup and it seems possible to tailor the delivery via the AX API for ATs that will maximize efficiency there.
- 17:48:15 [Matthew_Atkinson]
- jasonjgw: This has some parallels to the work NeilS demo'd at TPAC last week about how to provide disambiguating information on MathML. They're considering the same problems wrt how to specify the markup side and the delivery side. We should aim to produce a similar approach in both cases.
- 17:48:17 [NeilS]
- q+
- 17:48:41 [jcraig]
- s/AX API/Accessibility API/g
- 17:48:47 [jcraig]
- ack me
- 17:48:47 [Zakim]
- jcraig, you wanted to agree with ETS comments that any polyfill implementable today may help speech users, but would be harmful to braille users. the standards approach takes
- 17:48:50 [Zakim]
- ... longer but is the right path.
- 17:48:51 [becky]
- ack jcraig
- 17:48:52 [Matthew_Atkinson]
- jasonjgw: That might help the discussion along. There were broader issues raised in the agenda, but these seem to have specific parallels across the work of different groups.
- 17:48:57 [bkardell_]
- I agree it is very hard for me to not see the interrelationship here -- they might not be the exact same thing, but they certainly seem to have some intersection of concerns
- 17:49:25 [Matthew_Atkinson]
- jcraig: Wanted to +1 the ETS comments: bending existing pronunciation rules in specific contexts would be harmful to Braille in the general context.
- 17:50:05 [Matthew_Atkinson]
- jcraig: The standards approach is the right approach for our use-cases; don't see a polyfill approach working.
- 17:50:08 [becky]
- ack NeilS
- 17:50:37 [jcraig]
- q+ to respond
- 17:50:47 [Matthew_Atkinson]
- NeilS: Our (MathML's) question was: if we're polyfilling, what's the target? Can't use aria-label as it would negatively affect Braille. There is no target in the AX API. Seems we have to add something.
- 17:50:54 [becky]
- ack jcraig
- 17:50:54 [Zakim]
- jcraig, you wanted to respond
- 17:51:56 [Matthew_Atkinson]
- jcraig: +1; MathJax polyfill is a good example as it degrades the user experience when the platform has wider features (such as conversion to Nemeth Braille, which is bypassed by the polyfill).
- 17:52:23 [Matthew_Atkinson]
- becky: Does anyone want to provide a summary, or next steps?
- 17:53:07 [jcraig]
- s/AX API/Accessibility API/g
- 17:53:30 [Matthew_Atkinson]
- bkardell_: The single-attribute version is not pretty, but if we could figure out how to plumb that down so that it could be used for polyfills/idea experimentation, we could always sugar on top of it (e.g. like with CSS, you can use inline style attributes, but normal humans authoring HTML wouldn't—we could have a similar abstraction).
- 17:53:44 [kirkwood]
- present+
- 17:54:23 [cyns]
- q+ to ask if we can have the broader api discussion in the second session
- 17:54:32 [jcraig]
- q+
- 17:54:52 [becky]
- ack cyns
- 17:54:52 [Zakim]
- cyns, you wanted to ask if we can have the broader api discussion in the second session
- 17:55:06 [Matthew_Atkinson]
- cyns: Could we have the broader AAPI discussion in the second session?
- 17:56:02 [becky]
- ack jcraig
- 17:56:09 [Matthew_Atkinson]
- mhakkinen: Helpful discussion; lots for the TF to consider.
- 17:57:23 [Matthew_Atkinson]
- jcraig: Is there a subset of the spoken presentation in HTML draft that you'd recommend (some things such as prosody have been mentioned as out of scope)? In order to get this on an implementation schedule, suggest cutting it down and agreeing on any non-controversial aspects, as could then get implementations behind runtime flags.
- 17:58:38 [Matthew_Atkinson]
- janina: One thing holding us back is to decide on the either/or with respect to attributes.
- 17:59:21 [Matthew_Atkinson]
- jcraig: Could include a dictionary inside a <script> tag in the page, and only resolve the attribute issue later when you are addressing the issue of providing info for specific elements.
- 17:59:58 [Matthew_Atkinson]
- RRSAgent, make minutes
- 17:59:58 [RRSAgent]
- I have made the request to generate https://www.w3.org/2021/10/26-apa-minutes.html Matthew_Atkinson
- 18:00:17 [Matthew_Atkinson]
- becky: janina: thanks everyone!
- 18:00:23 [jcraig]
- s/when you are addressing the issue of providing info for/when you need to provide pronunciation on/
- 18:00:50 [jcraig]
- rrsagent, make minutes
- 18:00:50 [RRSAgent]
- I have made the request to generate https://www.w3.org/2021/10/26-apa-minutes.html jcraig
- 18:00:50 [Matthew_Atkinson]
- (general agreement on productive discussion from everyone)
- 18:01:11 [becky]
- matthew are you good posting the minutes to apa?
- 18:01:18 [janina]
- rrsagent, make minutes
- 18:01:20 [RRSAgent]
- I have made the request to generate https://www.w3.org/2021/10/26-apa-minutes.html janina
- 18:01:20 [jcraig]
- present+
- 18:01:21 [janina]
- I can do it!
- 18:12:16 [jcraig]
- Throwing out some not-even-half-baked ideas after the meeting...
- 18:12:45 [jcraig]
- navigator.pronunciationDictionary.append(
- 18:12:45 [jcraig]
- ['ipa', term1, ipa_string1],
- 18:12:45 [jcraig]
- ['ipa', term2, ipa_string2]
- 18:12:45 [jcraig]
- );
- 18:13:01 [jcraig]
- navigator.pronunciationDictionary.append(['ipa', term3, ipa_string3]);
- 18:13:17 [jcraig]
- in-page script block or external
- 18:14:33 [jcraig]
- something like that could be associated with reflected content attributes later, but it would not preclude run-time-flag implementations based on the "which attributes?" question.
- 18:15:00 [jcraig]
- cc IrfanA mhakkinen
- 18:16:38 [jcraig]
- cc bkardell_ NeilS
- 18:18:44 [jcraig]
- window.pronunciationDictionary.append([
- 18:18:44 [jcraig]
- [term1, ipa_string1],
- 18:18:44 [jcraig]
- [term2, ipa_string2]
- 18:18:44 [jcraig]
- ], 'ipa'); // or this more terse version? also, window or doc obj seems better than navigator.
- 18:20:08 [jcraig]
- could also work with a site-wide dictionary: <script src="../en/speech_dictionary.js">
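The append() calls sketched above do not correspond to any existing browser API. As a minimal illustration of how the terse, array-of-pairs form might behave, the following assumes a hypothetical PronunciationDictionary class with exact, case-sensitive term matching. The class name, the lookup() method, and the sample IPA strings are all assumptions made for this sketch, not part of the proposal.

```javascript
// Hypothetical stand-in for the proposed pronunciationDictionary object.
// No browser implements this; every name below is an assumption.
class PronunciationDictionary {
  constructor() {
    // Maps a term to { alphabet, pronunciation }.
    this.entries = new Map();
  }

  // Terse form from the log: append([[term, string], ...], alphabet).
  // A later entry for the same term replaces the earlier one.
  append(pairs, alphabet = 'ipa') {
    for (const [term, pronunciation] of pairs) {
      this.entries.set(term, { alphabet, pronunciation });
    }
    return this;
  }

  // Exact, case-sensitive match on a whole term; null when absent.
  lookup(term) {
    return this.entries.get(term) ?? null;
  }
}

// Illustrative entries only; the IPA strings are sample values.
const pronunciationDictionary = new PronunciationDictionary();
pronunciationDictionary.append([
  ['tomato', 'təˈmeɪtoʊ'],
  ['GIF', 'dʒɪf'],
], 'ipa');

console.log(pronunciationDictionary.lookup('tomato'));
```

How terms would be matched against running text, and how a match would reach the speech engine, are left open here, as they were in the discussion.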
- 19:07:05 [stevelee]
- stevelee has joined #apa
- 20:09:08 [bkardell_]
- sorry jcraig I had to run to the courthouse to handle some things for my dad's estate right after this meeting... but I suppose that is one option, close to the level I was suggesting to start with, if all we care about is pronunciations in a dictionary... I am personally unconvinced of that but it would be something
- 20:11:18 [bkardell_]
- like, I'm not sure that helps math, nor lots of cases for say-as for example- and can imagine lots of documents contain 2 short words or symbols which could actually be different even
- 20:13:55 [bkardell_]
- I guess I would also have to understand more about how that would work in terms of matching - I assume since you are suggesting this that it would be pretty easy to pipe that down so that there was some matching done... somewhere... to provide to the speech engine rather easily (more than ssml would?)
- 20:14:17 [bkardell_]
- I guess I would also have to understand more about how that would work in terms of matching - I assume since you are suggesting this that it would be pretty easy to pipe that down so that there was some matching done... somewhere... to provide to the speech engine rather easily (more than ssml would? - I think a lot of engines do actually support ssml if passed correctly)
- 20:24:53 [Zakim]
- Zakim has left #apa
- 20:48:38 [mhakkinen]
- jcraig... like the half baked idea, but this doesn't seem to solve cases (or does it) where there may be more than one pronunciation of a word active in a given page, especially in an assessment where words out of context wouldn't cue an intelligent TTS... e.g., read (reed) vs read (red). I can see tagging a word with a specific idref to a dictionary entry that could be loaded. PLS wouldn't work, as there isn't an id associated with entries [CUT]
- 20:50:53 [mhakkinen]
- pronunciations allowed per word entry). PLS isn't a bad model, for the basic dictionary, if web content referenced the PLS in the document head via a link, and Read Aloud, screen reader or voice assistants picked it up. But as it stands, PLS is incomplete for everything we need.
- 20:51:04 [jcraig]
- Sure. The API would need several more optionals, such as:
- 20:51:10 [jcraig]
- lang/locale
- 20:51:56 [jcraig]
- homophones in the lang.... probably by part of speech like read (verb) versus read (past tense verb or adjective)
- 20:52:28 [jcraig]
- the IDREF idea is interesting...
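One way to picture the extra optionals (lang/locale, part of speech) together with the IDREF idea is a dictionary whose entries carry their own ids, so that content can point at one specific pronunciation of a homograph. This is only a sketch under assumptions: the field names (id, lang, partOfSpeech), the id values, and the resolvePronunciation() helper are hypothetical and were not proposed in this form.

```javascript
// Hypothetical entry shape; all field names and ids are assumptions.
const homographEntries = [
  { id: 'read-present', term: 'read', lang: 'en', partOfSpeech: 'verb-present', alphabet: 'ipa', pronunciation: 'riːd' },
  { id: 'read-past', term: 'read', lang: 'en', partOfSpeech: 'verb-past', alphabet: 'ipa', pronunciation: 'rɛd' },
];

// Index entries by id so content can reference one entry unambiguously.
const entriesById = new Map(homographEntries.map((entry) => [entry.id, entry]));

// Resolve an idref to its pronunciation string, or null if the id is unknown.
function resolvePronunciation(idref) {
  const entry = entriesById.get(idref);
  return entry ? entry.pronunciation : null;
}

console.log(resolvePronunciation('read-past'));
```

Keying on an id rather than on the term is what lets both pronunciations of "read" be active in the same page, which is the case mhakkinen raised.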
- 20:55:00 [jib]
- jib has joined #apa
- 21:13:23 [bkardell_]
- https://www.irccloud.com/pastebin/EWiiEjZ3/
- 21:15:58 [bkardell_]
- I think here we could decide what that actually looks like - maybe it isn't json or maybe it is, but the idea of a thing we can rather simply say "let's define that very limited thing and make it possible" would allow lots of experiments of expression and serialization/connection.
- 21:16:30 [bkardell_]
- I think here we could decide what that actually looks like - maybe it isn't json or maybe it is, but the idea of a thing we can rather simply say "let's define that very limited thing and make it possible without figuring out serialization/etc" would allow lots of experiments of expression and serialization/connection.
- 21:38:09 [mhakkinen]
- +1 to both jcraig and bkardell_
- 23:56:23 [cyns]
- cyns has joined #apa
- 23:56:42 [cyns]
- I will be 15-20 minutes late to the ARIA/APA meeting.
- 23:59:55 [becky]
- becky has joined #apa