IRC log of apa on 2021-10-26

Timestamps are in UTC.

16:03:57 [RRSAgent]
RRSAgent has joined #apa
16:03:58 [RRSAgent]
logging to https://www.w3.org/2021/10/26-apa-irc
16:03:59 [Zakim]
RRSAgent, make logs Public
16:04:01 [Zakim]
please title this meeting ("meeting: ..."), jamesn
16:04:20 [IrfanA]
present+
16:04:22 [PaulG]
present+
16:04:24 [becky]
present+
16:04:29 [Joshue108]
Scribe: Joshue108
16:04:31 [NeilS]
NeilS has joined #apa
16:04:33 [Joshue108]
present+
16:04:35 [NeilS]
present+
16:04:36 [aaronlev]
aaronlev has joined #apa
16:04:38 [jasonjgw]
jasonjgw has joined #apa
16:05:05 [jamesn]
meeting: APA & ARIA: The Future of Accessibility APIs
16:05:23 [jasonjgw]
present+
16:05:24 [SamKanta]
SamKanta has joined #apa
16:05:34 [SamKanta]
present+
16:05:42 [aaronlev]
present+
16:06:53 [Joshue108]
<Intros>
16:09:13 [bkardell_]
bkardell_ has joined #apa
16:09:17 [Matthew_Atkinson]
Matthew_Atkinson has joined #apa
16:10:12 [cyns]
cyns has joined #apa
16:15:56 [SamKanta]
present+
16:16:14 [SteveNoble]
SteveNoble has joined #apa
16:16:27 [SteveNoble]
present+
16:16:57 [Joshue108]
TOPIC: Pronunciation Spec Discussion
16:17:06 [mhakkinen]
mhakkinen has joined #apa
16:17:19 [Joshue108]
JS: Thanks for joining - let's look at this
16:17:32 [Joshue108]
Bridge differences from engineering perspectives.
16:17:40 [Joshue108]
Can Mark or Irfan kick this off?
16:17:53 [Joshue108]
So we can share perspectives etc?
16:18:01 [PaulG]
q+
16:18:04 [Joshue108]
Single vs multiple attributes..
16:18:10 [Joshue108]
ack Paul
16:18:30 [IrfanA]
https://www.w3.org/TR/spoken-html/
16:18:36 [Joshue108]
PG: The goal is to create authoring capabilities in HTML
16:18:51 [Joshue108]
We have identified a gap in specs and APIs
16:19:00 [Joshue108]
This is augmentation of AX Tree
16:19:17 [Joshue108]
there are two candidates - one is a single attribute tbd
16:19:35 [Joshue108]
Also there is a multi attribute approach data-ssml currently
16:19:49 [Joshue108]
Currently tech based values
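As a sketch of the two candidate approaches being compared (the attribute names and JSON shape here are illustrative assumptions, not the spec's final syntax), a consumer could recover the same information from either form:

```python
import json

# Hypothetical single-attribute form: one JSON value holding all SSML properties.
single = {"data-ssml": '{"phoneme": {"ph": "t\u0259\u02c8me\u026ato\u028a", "alphabet": "ipa"}}'}

# Hypothetical multi-attribute form: one data-* attribute per SSML property.
multi = {
    "data-ssml-phoneme-ph": "t\u0259\u02c8me\u026ato\u028a",
    "data-ssml-phoneme-alphabet": "ipa",
}

def parse_single(attrs):
    """Parse the single JSON-valued attribute (assumed name: data-ssml)."""
    return json.loads(attrs["data-ssml"])

def parse_multi(attrs):
    """Fold data-ssml-<element>-<attr> attributes into one nested dict."""
    out = {}
    for name, value in attrs.items():
        _, _, element, attr = name.split("-", 3)
        out.setdefault(element, {})[attr] = value
    return out

# Either syntax yields the same structure for the consumer.
assert parse_single(single) == parse_multi(multi)
```

This is the consumer-side equivalence later raised in the discussion: whichever surface syntax authors prefer, an implementation can normalize it before it reaches the accessibility layer.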
16:20:20 [bkardell_]
q+ to ask about tag review
16:20:25 [Joshue108]
Irf: We need to find a way to expose this
16:20:41 [Joshue108]
JS: The reason it is prefixed data- is that this is defined in HTML
16:21:01 [Joshue108]
Once we have an implementation, then we go to the HTML group, and ask for a reserved prefix
16:21:06 [Joshue108]
We are a way off that.
16:21:26 [Joshue108]
But we need to get POC built etc. Make sure it works.
16:21:39 [Joshue108]
Then we can get reserved prefix etc
16:21:49 [Joshue108]
ack br
16:21:54 [Joshue108]
ack bk
16:21:54 [Zakim]
bkardell_, you wanted to ask about tag review
16:22:07 [Joshue108]
BK: Is the spoken HTML idea reviewed by TAG?
16:22:12 [Joshue108]
Seems like a good idea.
16:22:32 [Joshue108]
JS: We did that last year - we heard from them, don't ask the parser to change
16:22:58 [Joshue108]
Our current approach is within the scope of current parsing capability
16:23:13 [Joshue108]
BK: I've seen two diff interpretations around the use of these attributes.
16:23:17 [Joshue108]
Where can we discuss?
16:23:26 [Joshue108]
JS: Happy to discuss now.
16:23:48 [Joshue108]
BK: has heard different interpretations of this
16:23:55 [Joshue108]
I think data attributes are fine
16:24:22 [Joshue108]
Some feel strongly that for things that are standard, that isn't appropriate
16:24:27 [Joshue108]
Can we open an issue?
16:24:29 [Joshue108]
JS: Yes
16:24:42 [Joshue108]
It may be on the HTML spec - we are following their guidelines.
16:24:56 [Joshue108]
To drive consistent TTS output in various envs.
16:25:26 [Joshue108]
Matthew mentioned the approach in personalisation, which is using data- to drive it over there.
16:25:35 [Joshue108]
To drive personalization
16:25:45 [Joshue108]
JS: You can't get to a W3C REC using data-
16:25:51 [Joshue108]
CR is as far as it will go.
16:25:59 [Joshue108]
That's the sandbox for data-
16:26:21 [Joshue108]
keep implementations to allay cross site concerns
16:26:34 [Joshue108]
JS: <gives overview of process and IP issues>
16:26:59 [Joshue108]
<And how to progress specs>
16:27:03 [Joshue108]
JS: Does that help?
16:27:23 [Joshue108]
BK: That's not new, but just sharing a counter-interpretation
16:27:38 [Joshue108]
This seems like a good use case to begin discussion
16:27:58 [Joshue108]
JS: Mentions other specs using this approach
16:28:22 [Joshue108]
Do we have a preference, is the crux here?
16:28:25 [Joshue108]
q?
16:28:41 [Joshue108]
JS: Others ?
16:28:56 [Joshue108]
JS: Dave Tseng, how does that sound?
16:29:06 [Joshue108]
Is multiple attribute preferable?
16:29:50 [Joshue108]
Paul: Did mention an affinity for the multi attribute approach. There is no corollary for JSON as a value.
16:30:18 [Joshue108]
JS: That is one view from one AT , which is fine.
16:30:20 [Joshue108]
q?
16:30:40 [Joshue108]
The difference for AT is around approach
16:31:08 [Joshue108]
The group is more interested in JSON, as it is a single target, selector - info is picked up, the AX can abstract that, augment and provide info
16:31:26 [mhakkinen]
+q
16:31:47 [Joshue108]
JS: The direct read group may be different - things that power our speech recognition devices.
16:32:02 [cyns]
q+
16:32:14 [Joshue108]
GlenG: We do not have a problem parsing the HTML
16:32:32 [Joshue108]
Making fewer calls, from an AT angle, is good - esp if noisy.
16:32:38 [janina]
q?
16:32:41 [Joshue108]
JSON, is good so we can get it all at once
16:32:58 [becky]
ack mhakkinen
16:33:03 [Joshue108]
In a single attribute we can do that.
16:33:03 [Joshue108]
ack m
16:33:06 [tink]
q+
16:33:11 [Joshue108]
MK: Mentions read aloud tools
16:33:25 [Joshue108]
Text Help have a preference for single attribute
16:33:39 [Joshue108]
MS has immersive reader capabilities
16:33:44 [becky]
ack cyns
16:33:56 [Joshue108]
How would they use pronunciation q..
16:34:11 [Joshue108]
CS: Jumping out of the A11y APIs seems like a big step.
16:34:14 [Joshue108]
How did that happen?
16:34:31 [Matthew_Atkinson]
s/pronunciation q../pronunciation cues/
16:34:35 [Joshue108]
JS: We see use cases that provide benefit for AT that doesn't use the AX tree.
16:34:36 [becky]
ack tink
16:34:40 [bkardell_]
s/Some feel strongly that for things that are standard,/Some have expressed to me subtler interpretations that data-* is for something more narrow, I'd like to see if I can get them to share discussion there
16:35:09 [SteveNoble]
q+
16:35:12 [Joshue108]
LW: From those who train and teach - while the JSON attribute is unfamiliar - adding more attributes could be confusing
16:35:38 [Joshue108]
We have spent years explaining around applied semantics, and the ground work of understanding the A11y tree
16:35:49 [becky]
ack SteveNoble
16:35:51 [Joshue108]
And it could be confusing for adoption with a different approach
16:36:23 [Joshue108]
SN: Pearson has an implementation of the single attribute approach - just to channel what Paul G says..
16:36:37 [Joshue108]
<discusses runtime performance>
16:37:01 [Joshue108]
Sniffing and selecting tons of attributes in the DOM will be worse than teasing out JSON via a processor
16:37:01 [Matthew_Atkinson]
The GitHub comment being referenced is https://github.com/w3c/pronunciation/issues/86#issue-904400398
16:37:22 [Joshue108]
When a TTS player has to rip through the data the round trip is brutal
16:37:43 [Jemma]
Jemma has joined #apa
16:37:53 [Jemma]
present+
16:38:05 [PaulG]
q+
16:38:15 [Joshue108]
SN: Performance will be impacted in the read aloud environment, it's a concern.
16:38:20 [Jemma]
rrsagent, make minutes
16:38:20 [RRSAgent]
I have made the request to generate https://www.w3.org/2021/10/26-apa-minutes.html Jemma
16:38:26 [becky]
ack paulg
16:38:40 [Matthew_Atkinson]
s/comment being referenced/SteveNoble mentioned/
16:39:39 [Joshue108]
PG: With ARIA live the A11y tree gets updated - there is a concern about dynamic content - when there is a special region, the browser has to catch up
16:39:47 [Matthew_Atkinson]
scribe: Matthew_Atkinson
16:39:50 [jamesn]
present+
16:39:55 [Matthew_Atkinson]
Matthew_Atkinson: present+
16:39:57 [Matthew_Atkinson]
scribe: Matthew_Atkinson
16:40:05 [becky]
q?
16:40:35 [Matthew_Atkinson]
cyns: aaronlev/David Tseng: any thoughts?
16:41:17 [PaulG]
q+
16:41:19 [Matthew_Atkinson]
aaronlev: Concerns around how these things will be misused by authors (c.f. live regions). What is the ideal markup that we would want?
16:41:28 [PaulG]
q-
16:41:42 [cyns]
q+ to ask about css pronunciation
16:42:03 [Matthew_Atkinson]
janina: The ideal would be to make SSML a native citizen of HTML. Concern around it not being possible to change validators (not necessarily parsers as mentioned above).
16:42:08 [bkardell_]
q+
16:42:13 [Matthew_Atkinson]
aaronlev: How about providing a separate SSML file?
16:42:37 [Matthew_Atkinson]
janina/becky: Don't think that was considered.
16:42:59 [Matthew_Atkinson]
aaronlev: Want to consider: what's the potential for misuse. Also: platform AX APIs can often be extended to provide more information.
16:43:36 [Matthew_Atkinson]
PaulG: Made a note about homonym attacks (in the document).
16:43:56 [janina]
q?
16:44:13 [janina]
q+
16:44:15 [jamesn]
q+ to echo that authors WILL use things if they are available - just because they can
16:44:43 [Matthew_Atkinson]
aaronlev: Concern around empowering authors to give the users a bad experience (c.f. ARIA can be misused in this way). Interested to hear Glen Gordon [who's on the line]'s thoughts on this. Examples such as inconsistencies inter-site or intra-site.
16:45:09 [Matthew_Atkinson]
aaronlev: Using a single element could increase the risk that the AT can't present things consistently to users?
16:45:12 [becky]
ack cyn
16:45:12 [Zakim]
cyns, you wanted to ask about css pronunciation
16:45:29 [tink]
q+
16:45:57 [Matthew_Atkinson]
cyns: Is there a relationship between this and the pronunciation functionality being proposed for CSS, or are they different use cases? Also concern around author misuse: remember when everyone made all the fonts small and light gray? Worried that a lot of things will get sped up.
16:46:17 [PaulG]
We covered CSS Speech gap analysis here https://w3c.github.io/pronunciation/gap-analysis/
16:46:41 [Matthew_Atkinson]
janina: I think the CSS work is orthogonal to what we're trying to do; we did a gap analysis [URL above] that may have more info.
16:46:42 [PaulG]
section #3
16:46:54 [bkardell_]
could we ++ tink in the queue before me?
16:47:11 [becky]
sure will do that bkardell
16:47:37 [Matthew_Atkinson]
janina: Possibility for misuse: anything that allows extra functionality could be mis-used. Need to be aware of it. But this allows us to do things such as mixing languages within a book (such as historical text).
16:48:34 [cyns]
Can someone drop a link to the use cases in here?
16:48:36 [Matthew_Atkinson]
janina: This also helps disambiguate heteronyms like wind/wind and tear/tear. Much opportunity for improvement. It's not a proposal that the entire document needs to be marked up for pronunciation. In most cases TTS engines will do reasonably well.
16:49:00 [PaulG]
https://w3c.github.io/pronunciation/use-cases/
16:50:00 [becky]
ack janina
16:50:27 [PaulG]
q+ AT and voice assistants could "learn" from authors
16:50:30 [bkardell_]
q--
16:50:31 [Matthew_Atkinson]
jamesn: Worried about over-use and mis-use. Not sure how we counter this. Yes, it is necessary in certain cases. Not a screen reader user, so unsure if this is a problem: company names. Can see people wanting to put this into their company name. Is it a problem if a company's site pronounces it correctly, but everywhere else on the web it's incorrect?
16:50:35 [becky]
ack Jamesn
16:50:35 [Zakim]
jamesn, you wanted to echo that authors WILL use things if they are available - just because they can
16:50:42 [becky]
ack tink
16:51:11 [PaulG]
q+ to comment "learning" from authors
16:51:27 [Matthew_Atkinson]
tink: To answer your question jamesn, I would find it useful to hear the canonical company name pronunciation. Can get too used to how the AT pronounces it.
16:51:54 [Matthew_Atkinson]
tink: CSS Speech is catering for a specific set of use cases: it's trying to make the auditory experience less tedious.
16:52:22 [Matthew_Atkinson]
tink: Yes it can be misused: HTML, ARIA, XML are all misused, but think we can mitigate against this, but not stop it.
16:52:56 [Matthew_Atkinson]
tink: For now the CSS Speech media type isn't supported by UAs, sadly. But, different use cases.
16:52:57 [becky]
ack bkardell_
16:54:10 [Matthew_Atkinson]
bkardell_: The use cases are different, but problems are similar in that we need to affiliate nodes with values. Presumably wouldn't have one SSML document for the entire page, nor hundreds/thousands (would be very slow to load from network).
16:55:07 [Matthew_Atkinson]
aaronlev: Was spitballing; though we can load a hundred images for a document. Can we put element IDs in SSML documents? The idea is mainly to avoid adding noise to the markup of the document.
16:55:30 [Matthew_Atkinson]
bkardell_: Whilst the single JSON attribute could be ugly, can also see the benefit of keeping the info together.
16:55:57 [becky]
q?
16:56:01 [Matthew_Atkinson]
bkardell_: Brought this up in the MathML meeting as well, but we could polyfill something like this with existing technologies? Then authors wouldn't need to create the cumbersome JSON attributes.
16:56:26 [becky]
ack PaulG
16:56:26 [Zakim]
PaulG, you wanted to comment "learning" from authors
16:56:26 [Matthew_Atkinson]
aaronlev: This feels a bit like [CSS] background images; it's changing the presentation as opposed to the semantics?
16:57:16 [Matthew_Atkinson]
PaulG: For linking, we did talk briefly about linking external resources (for the next stage of the spec). If SSML came to the document as a first-class citizen like SVG we would look into that.
16:58:06 [Matthew_Atkinson]
PaulG: Performance: TTS uses a lot of heuristics to determine pronunciation. Reducing the need for heuristics may mitigate some performance hit.
16:58:28 [mhakkinen]
q+
16:58:44 [Matthew_Atkinson]
PaulG: Voice assistants might start to learn correct pronunciations e.g. for company names from their official sites.
16:59:07 [becky]
ack mhakkinen
16:59:11 [Matthew_Atkinson]
janina: e.g. Versailles is pronounced differently based on location.
16:59:14 [bkardell_]
interesting point PaulG
17:00:00 [Matthew_Atkinson]
mhakkinen: We have a lot of need for pronunciation in education (ref Pearson's work discussed before). We have looked for a standard solution, e.g. PLS, the Pronunciation Lexicon Specification [scribe note: PaulG mentioned this just above].
17:00:28 [Joshue108]
q?
17:01:02 [Matthew_Atkinson]
mhakkinen: We want screen readers, read-aloud tools, etc. to benefit. Another example: pharmaceutical products. And another: television/film/movie program guides (character names, actor names, etc.)
17:01:26 [Matthew_Atkinson]
janina: We wanted to discuss this near-term problem but didn't intend to take the whole time for this; will summarize.
17:01:46 [cyns]
q+
17:02:09 [Matthew_Atkinson]
janina: Hearing from aaronlev that browsers aren't expected to be a blocker as to which approach is taken. Need more feedback from AT vendors. Is that a reasonable summary?
17:02:18 [cyns]
q?
17:02:52 [becky]
ack cyns
17:02:54 [Matthew_Atkinson]
aaronlev: We _can_ implement anything; we still would need to look carefully at the proposal. We'd want good markup, good API support, good AT support; an end-to-end plan. Doesn't sound like all options have been looked at yet.
17:03:39 [Matthew_Atkinson]
cyns: Have a similar view to aaronlev. The single-attribute approach feels counterintuitive for authors. It doesn't feel very HTML-like. Concerned about readability.
17:03:44 [mhakkinen]
q+
17:03:56 [tink]
q+
17:04:12 [becky]
ack mhakkinen
17:04:17 [Matthew_Atkinson]
aaronlev: JSON can be hard to read.
17:04:33 [Matthew_Atkinson]
cyns: In general, it is OK but as an attribute value it is hard to read.
17:05:18 [cyns]
q+ one of the goals of markup is to be human readable
17:05:32 [Matthew_Atkinson]
mhakkinen: From an authoring tool perspective, authors don't necessarily need to see the output HTML. We have tools already that allow authors to provide pronunciation hints that are intuitive to use. We need a standard way for ATs and others to consume it.
17:05:53 [becky]
ack tink
17:06:03 [Matthew_Atkinson]
tink: Is the idea with the single attribute that the JSON will be in the HTML code, or some external file that will be linked?
17:06:24 [Matthew_Atkinson]
PaulG: Our current implementations/experimentations have the attribute value embedded in the HTML.
17:06:28 [Matthew_Atkinson]
tink: How about an external file?
17:07:08 [Matthew_Atkinson]
PaulG: We've had discussions about this before; have not yet found/developed method to do external linking.
17:07:32 [jcraig]
jcraig has joined #apa
17:07:42 [Matthew_Atkinson]
tink: Providing common rules is very much like CSS and could be of benefit here.
17:07:56 [Matthew_Atkinson]
PaulG: Agree; would be great to have first-class SSML support.
17:08:18 [Matthew_Atkinson]
cyns: Concerns around readability; if it's an external file this is less so. Could this just use CSS?
17:08:38 [jcraig]
q+ to point out that external file would violate the AT privacy guidelines from web platform design principles that Leonie helped author
17:09:04 [becky]
ack jcraig
17:09:04 [Zakim]
jcraig, you wanted to point out that external file would violate the AT privacy guidelines from web platform design principles that Leonie helped author
17:09:11 [Matthew_Atkinson]
bkardell_: There are efforts ongoing to allow authors to create CSS-like languages. (c.f. Houdini)
17:10:04 [cyns]
q+ to say that pronunciation could be used by other things besides AT
17:10:07 [bkardell_]
but it isn't really AT specific, it would apply to many speech agents
17:10:09 [Matthew_Atkinson]
jcraig: The web platform design principles mention the importance of making AT _not_ detectable. Would be good to have SSML in the document, but requesting an external file would be detectable.
17:10:26 [becky]
ack cyns
17:10:26 [Zakim]
cyns, you wanted to say that pronunciation could be used by other things besides AT
17:10:46 [Matthew_Atkinson]
cyns: I think use cases for this extend beyond AT, so not sure this would be useful for fingerprinting. Don't want to end up with what looks like inline CSS.
17:10:47 [bkardell_]
"hey <assistant> read this" is a thing I use all the time - those would be indistinguishable
17:10:50 [aaronlev]
q+
17:10:55 [jamesn]
q+
17:11:18 [Matthew_Atkinson]
janina: Referencing external files could be helpful to avoid repetition.
17:11:20 [becky]
ack aaronlev
17:12:02 [Matthew_Atkinson]
aaronlev: Not sure if proposed, but: for the use case where changing the name of a product/address/company, sounds like we could use a dictionary. Problem: every time that name/phrase/word is announced you'd have to wrap its markup.
17:12:37 [Matthew_Atkinson]
PaulG: We discussed this. Some tags like prosody or voice can control an entire block. Others like pauses weren't there originally, so need an extra <span>, with single or multi-attribute.
17:12:46 [jcraig]
indistinguishable depends on many factors of entropy... client + accessed this other file + other factors might equal reasonable certainty of AT... FWIW, I think pronunciation rules are necessary. Just trying to point out the complications wrt that particular design principle
17:13:09 [Matthew_Atkinson]
PaulG: The single attribute would, at first, encourage authors to summarize an entire block of text all at once, thus making it hard to update the pronunciation if the text changes.
17:13:19 [Matthew_Atkinson]
PaulG: Would thus need help for developers to keep those in sync
17:13:42 [Matthew_Atkinson]
PaulG: If everything is chopped up (multi-attribute), as a developer I think this would be easier, especially for hand-coding devs. Interested as to others' views.
17:13:44 [becky]
ack jamesn
17:14:25 [Matthew_Atkinson]
jamesn: Replying to jcraig around detection. We _could_ require browsers to always fetch these files (is an additional complication, but could be managed).
17:15:01 [Matthew_Atkinson]
jcraig: Absolutely agree that pronunciation rules need to be defined in some format; just wanted to raise the issue. Has AX API design implications.
17:15:25 [bkardell_]
embeddable as a css-like would work too and no extra fetch
17:15:31 [aaronlev]
q+
17:15:39 [Matthew_Atkinson]
janina: Seems everyone's agreed on the _need_ but we are still unsure as to single/multiple attributes, and there is the second-order question of external file.
17:16:04 [jcraig]
q+ to ask if l10n/i18n was discussed in this context earlier
17:16:14 [janina]
q+
17:16:28 [Matthew_Atkinson]
Joanmarie: If ATs want a single attribute, but authors want multiple attributes, or vice-versa, the implementation could be to take all the single attributes and parse them all together.
17:16:44 [Matthew_Atkinson]
Joanmarie: We should consider what's best for authors, as a result.
17:16:46 [becky]
ack aaronlev
17:16:48 [jcraig]
q+ to mention l10n both with languages as well as with TTS capabilities
17:17:04 [jcraig]
qv?
17:17:34 [Matthew_Atkinson]
aaronlev: I feel there are many proposals that haven't been made yet, so should continue offline. But for the dictionary resource proposal: this could be something the AT fetches itself (circumventing the privacy issues; allowing caching across sites/domains).
17:17:55 [Matthew_Atkinson]
aaronlev: Seems odd to me that we're going to be saying how to pronounce things, but only in one place.
17:18:09 [PaulG]
"pronunciation" is only for phonemes. There are many more aural expressions from SSML that this spec would allow for.
17:18:16 [Matthew_Atkinson]
aaronlev: ...where it'd be more useful if that was everywhere.
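aaronlev's dictionary idea could be sketched as a site-wide lexicon the AT fetches once and applies wherever an entry appears; the lexicon format, field names, and helper below are invented for illustration (reusing janina's Versailles example), using SSML's standard phoneme element:

```python
import re

# Hypothetical site-wide lexicon: word -> IPA pronunciation.
lexicon = {"Versailles": "v\u025b\u0259r\u02c8sa\u026a"}

def annotate(text, lexicon):
    """Wrap each lexicon entry found in `text` in an SSML <phoneme> element."""
    for word, ipa in lexicon.items():
        text = re.sub(
            rf"\b{re.escape(word)}\b",
            f'<phoneme alphabet="ipa" ph="{ipa}">{word}</phoneme>',
            text,
        )
    return text

annotated = annotate("Welcome to Versailles.", lexicon)
```

One lexicon consulted everywhere would address aaronlev's consistency concern, whereas inline markup only helps on pages whose authors added it.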
17:18:26 [becky]
ack jcraig
17:18:26 [Zakim]
jcraig, you wanted to ask if l10n/i18n was discussed in this context earlier and to mention l10n both with languages as well as with TTS capabilities
17:18:35 [becky]
Q?
17:18:43 [Matthew_Atkinson]
jcraig: aaronlev: are you implying there's a need for a global registry?
17:18:54 [Matthew_Atkinson]
aaronlev: Not sure, but worth looking into. Consistency is important.
17:19:52 [PaulG]
q+
17:20:00 [Matthew_Atkinson]
jcraig: Has l10n and i18n been discussed? E.g. Homonyms, in different languages/locales. Also different TTS voices may be able to pronounce Spanish and English, but not Chinese. Has any of the rules discussion covered this?
17:20:03 [becky]
ack janina
17:20:33 [Matthew_Atkinson]
janina: We have discussed those nuances and the need to disambiguate them. The problem is that defining what the correct pronunciation is will change (e.g. wind/wind).
17:20:41 [jcraig]
s/Homonyms, in different languages/locales. /Homonyms may be pronounced differently in languages/locales. /
17:21:27 [Matthew_Atkinson]
[ scribe note: jcraig wished to raise having been delayed in joining, so missed some prior discussion ]
17:21:38 [jcraig]
close/close is a more common homonym in UI in English
17:21:46 [Matthew_Atkinson]
janina: Another example is English, but at different times in history, as pronunciations evolve.
17:22:10 [becky]
ack PaulG
17:23:01 [janina]
q+
17:23:17 [Matthew_Atkinson]
PaulG: A dictionary would be limited to phonemes. We have an example that's wider than this [Vincent Price reading The Raven]; covering audio "performance".
17:24:07 [Matthew_Atkinson]
PaulG: Devs are guided towards specifying the language of the document, and the TTS does the rest. But there is contextual info (such as location) that might impact accent, vernacular, local place names, and that's part of what we're aiming to provide.
17:24:39 [bkardell_]
q+
17:25:14 [Matthew_Atkinson]
PaulG: Voice packs being able to support different pronunciations is another issue that we would need to resolve as an industry, but isn't something we can solve in the spec. Some pre-reading, or meta tags could be added to encourage assistants/AT to load specific voice packs/TTS capabilities to ensure a good experience for the user.
17:25:14 [becky]
ack janina
17:25:39 [Matthew_Atkinson]
janina: Maybe the voice packs issue is a metadata issue.
17:26:28 [Matthew_Atkinson]
janina: Want to revisit Joanmarie's suggestion, as that could give us a path forward. If authoring is easier in multi-attributes, as long as the UAs can expose what the ATs need, that could address this. We should explore this.
17:27:15 [Matthew_Atkinson]
janina: My concern is if we were to have conflicting views across UAs. Joanmarie's suggested approach could help us address the UA-AT aspect. Does that sound good?
17:27:37 [jcraig]
+1 to glen
17:27:50 [jamesn]
+1 to glen
17:27:55 [Matthew_Atkinson]
Glen: I don't think authors will unanimously agree on whether single-, or multi-attribute approach is easier.
17:28:11 [SteveNoble]
q+
17:28:17 [Matthew_Atkinson]
jcraig: +1; depends on tooling
17:28:24 [Matthew_Atkinson]
janina: I think we have to presume tooling.
17:28:39 [Matthew_Atkinson]
cyns: Still thinking readability is important.
17:29:36 [becky]
ack bkardell_
17:30:29 [Matthew_Atkinson]
bkardell_: There should be some experimentation, particularly with a CSS-like solution. There was discussion of polyfills in last year's TPAC? How are they getting the SSML to the AT?
17:30:39 [mhakkinen]
q+
17:30:45 [becky]
ack SteveNoble
17:31:20 [Matthew_Atkinson]
SteveNoble: Authoring: as mhakkinen said, the people authoring this stuff every day are using authoring tools.
17:32:28 [Matthew_Atkinson]
+1 to general philosophical view that readability is important, though I am not an implementation expert in this field!
17:32:51 [Matthew_Atkinson]
SteveNoble: [demonstrates some content that has been marked up for pronunciation]
17:33:59 [Matthew_Atkinson]
... The authoring tool identifies this as "alternate text for text-to-speech" that allows users to highlight a word, e.g. melancholy, and provide alternate text, e.g. melancollie that the system turns into SSML.
17:34:23 [jcraig]
q+ to demo something similar from the iOS VoiceOver settings
17:34:30 [cyns]
q+ to say that WYSIWYG editors don't address my concerns about human readability of markup. You shouldn't have to use a special tool to write or read markup
17:34:41 [Matthew_Atkinson]
... There are also tables of words and "how to spell these phonetically in the system"
17:34:42 [AndroUser]
AndroUser has joined #apa
17:34:56 [jamesn]
q+ to say that to me this looks like the kind of overuse I fear
17:35:06 [Matthew_Atkinson]
... e.g. dinosaur may be expressed as dinosore
17:35:43 [Matthew_Atkinson]
... Authors may be creating 1,000 SSML fragments per week (though they don't know it as SSML) to correct the way that the TTS pronounces things.
17:36:08 [Matthew_Atkinson]
... [Compares this to creating MathML with a WYSIWYG editor]
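The fragments such a tool emits for the "dinosore" example could use SSML's standard sub element, roughly as follows; the helper is a hypothetical sketch, not Pearson's actual pipeline:

```python
from xml.sax.saxutils import escape, quoteattr

def sub_alias(text, alias):
    """Build an SSML <sub> fragment that substitutes `alias` for `text` in speech."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"

# e.g. dinosaur spoken as "dinosore"
fragment = sub_alias("dinosaur", "dinosore")
# <sub alias="dinosore">dinosaur</sub>
```

Authors see only "alternate text for text-to-speech" in the tool; the SSML is generated behind the scenes, matching SteveNoble's point that they produce such fragments without knowing it as SSML.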
17:36:26 [becky]
ack mhakkinen
17:36:55 [Matthew_Atkinson]
mhakkinen: To echo what SteveNoble said, for classroom materials, many states have specific guidelines on pronunciation and we've had to spend time tweaking text so that it'll be pronounced with the right sort of pronunciation or pausing.
17:37:08 [Matthew_Atkinson]
mhakkinen: We've tried to do this without authors having to learn SSML.
17:37:33 [Matthew_Atkinson]
mhakkinen: We've had to create hacks to support whatever sorts of AT/read-aloud we are using at delivery time. We have not necessarily been able to get this into screen readers.
17:37:56 [Matthew_Atkinson]
mhakkinen: E.g. if we altered the way the screen reader pronounced things, this could really confuse Braille users.
17:38:08 [Matthew_Atkinson]
mhakkinen: We don't think this is a challenge for authors with the correct tooling.
17:38:26 [becky]
ack jcraig
17:38:26 [Zakim]
jcraig, you wanted to demo something similar from the iOS VoiceOver settings
17:39:04 [Matthew_Atkinson]
jcraig: [Demonstrates VoiceOver Settings > Speech > Pronunciation for some of our names]
17:39:33 [jasonjgw]
q+
17:40:03 [Matthew_Atkinson]
jcraig: You can speak how you want the term to be pronounced, and the device will interpret this and offer options (that it reads back) from which you can choose.
17:41:18 [Matthew_Atkinson]
jcraig: Users can do this across the system. Perhaps this could be exposed through WebKit.
17:41:35 [becky]
ack cyns
17:41:35 [Zakim]
cyns, you wanted to say that WYSIWYG editors don't address my concerns about human readability of markup. You shouldn't have to use a special tool to write or read markup
17:41:56 [Matthew_Atkinson]
cyns: Special tools shouldn't be needed to make markup readable.
17:42:30 [Matthew_Atkinson]
cyns: Have you looked at using a polyfill to pull all of the info into the AX API's description field?
17:42:36 [bkardell_]
thanks cyns that was the q I was asking too - your articulation was better
17:42:54 [jcraig]
s/AX API/Accessibility API/
17:42:55 [Matthew_Atkinson]
mhakkinen: We've not done anything specifically for screen readers; our use cases are wider (e.g. read aloud tools).
17:43:03 [Matthew_Atkinson]
mhakkinen: We prefer a standards-based approach.
17:43:32 [PaulG]
The authoring standard also allows for scenarios like kiosks where an individual's AT/voice assistance solution may not be integrated with the content
17:43:44 [Matthew_Atkinson]
SteveNoble: Our support internally is for our own TTS system and read and write extension. TextHelp is another vendor that supports SSML (single-attribute).
17:43:49 [becky]
q?
17:44:24 [jamesn]
ack me
17:44:24 [Zakim]
jamesn, you wanted to say that to me this looks like the kind of overuse I fear
17:44:26 [becky]
ack jamesn
17:44:29 [jcraig]
s/Perhaps this could be exposed through WebKit./Perhaps this could be exposed through WebKit. I don't have a strong preference for whether that's via and attribute, or in a dictionary defined in a page script block or external resource. /
17:44:35 [Matthew_Atkinson]
mhakkinen: Some years ago we prototyped a custom element that allows you to specify pronunciation and a Braille label, but this didn't solve the problem of getting the screen reader to direct content specifically to TTS vs Braille.
17:45:36 [becky]
ack jasonjgw
17:45:41 [Matthew_Atkinson]
jamesn: Can see the publishing approach SteveNoble demo'd working when you have control over the TTS, but this standards approach is much more general. This doesn't seem like an appropriate use case for the wider web.
17:46:18 [jcraig]
+q to agree with ETS comments that any polyfill implementable today may help speech users, but would be harmful to braille users. the standards approach takes longer but is the right path.
17:46:54 [janina]
q?
17:47:19 [Matthew_Atkinson]
jasonjgw: Trying to maximize author convenience and ACKing that this will differ across authors. The ability to define information globally and at the individual text element level seems to have got agreement. There's some flexibility on the UA side as to how it's represented in the markup and it seems possible to tailor the delivery via the AX API for ATs that will maximize efficiency there.
17:48:15 [Matthew_Atkinson]
jasonjgw: This has some parallels to the work NeilS demo'd at TPAC last week about how to provide disambiguating information on MathML. They're considering the same problems wrt how to specify the markup side and the delivery side. We should aim to produce a similar approach in both cases.
17:48:17 [NeilS]
q+
17:48:41 [jcraig]
s/AX API/Accessibility API/g
17:48:47 [jcraig]
ack me
17:48:47 [Zakim]
jcraig, you wanted to agree with ETS comments that any polyfill implementable today may help speech users, but would be harmful to braille users. the standards approach takes
17:48:50 [Zakim]
... longer but is the right path.
17:48:51 [becky]
ack jcraig
17:48:52 [Matthew_Atkinson]
jasonjgw: That might help the discussion along. There were broader issues raised in the agenda, but these seem to have specific parallels across the work of different groups.
17:48:57 [bkardell_]
I agree it is very hard for me to not see the interrelationship here -- they might not be the exact same thing, but they certainly seem to have some intersection of concerns
17:49:25 [Matthew_Atkinson]
jcraig: Wanted to +1 the ETS comments: bending existing pronunciation rules in specific contexts would be harmful to Braille in the general context.
17:50:05 [Matthew_Atkinson]
jcraig: The standards approach is the right approach for our use-cases; don't see a polyfill approach working.
17:50:08 [becky]
ack NeilS
17:50:37 [jcraig]
q+ to respond
17:50:47 [Matthew_Atkinson]
NeilS: Our (MathML)'s question was: if we're polyfilling, what's the target? Can't use aria-label as it would negatively affect Braille. There is no target in the AX API. Seems we have to add something.
17:50:54 [becky]
ack jcraig
17:50:54 [Zakim]
jcraig, you wanted to respond
17:51:56 [Matthew_Atkinson]
jcraig: +1; MathJax polyfill is a good example as it degrades the user experience when the platform has wider features (such as conversion to Nemeth Braille, which is bypassed by the polyfill).
17:52:23 [Matthew_Atkinson]
becky: Does anyone want to provide a summary, or next steps?
17:53:07 [jcraig]
s/AX API/Accessibility API/g
17:53:30 [Matthew_Atkinson]
bkardell_: The single-attribute version is not pretty, but if we could figure out how to plumb that down so that it could be used for polyfills/idea experimentation, we could always sugar on top of it (e.g. like with CSS, you can use inline style attributes, but normal humans authoring HTML wouldn't—we could have a similar abstraction).
17:53:44 [kirkwood]
present+
17:54:23 [cyns]
q+ to ask if we can have the broader api discussion in the second session
17:54:32 [jcraig]
q+
17:54:52 [becky]
ack cyns
17:54:52 [Zakim]
cyns, you wanted to ask if we can have the broader api discussion in the second session
17:55:06 [Matthew_Atkinson]
cyns: Could we have the broader AAPI discussion in the second session?
17:56:02 [becky]
ack jcraig
17:56:09 [Matthew_Atkinson]
mhakkinen: Helpful discussion; lots for the TF to consider.
17:57:23 [Matthew_Atkinson]
jcraig: Is there a subset of the Spoken Presentation in HTML draft that you'd recommend? (Some things, such as prosody, have been mentioned as out of scope.) In order to get this on an implementation schedule, suggest cutting it down and agreeing on any non-controversial aspects, as we could then get implementations behind runtime flags.
17:58:38 [Matthew_Atkinson]
janina: One thing holding us back is to decide on the either/or with respect to attributes.
17:59:21 [Matthew_Atkinson]
jcraig: Could include a dictionary inside a <script> tag in the page, and only resolve the attribute issue later when you are addressing the issue of providing info for specific elements.
17:59:58 [Matthew_Atkinson]
RRSAgent, make minutes
17:59:58 [RRSAgent]
I have made the request to generate https://www.w3.org/2021/10/26-apa-minutes.html Matthew_Atkinson
18:00:17 [Matthew_Atkinson]
becky: janina: thanks everyone!
18:00:23 [jcraig]
s/when you are addressing the issue of providing info for/when you need to provide pronunciation on/
18:00:50 [jcraig]
rrsagent, make minutes
18:00:50 [RRSAgent]
I have made the request to generate https://www.w3.org/2021/10/26-apa-minutes.html jcraig
18:00:50 [Matthew_Atkinson]
(general agreement on productive discussion from everyone)
18:01:11 [becky]
matthew are you good posting the minutes to apa?
18:01:18 [janina]
rrsagent, make minutes
18:01:20 [RRSAgent]
I have made the request to generate https://www.w3.org/2021/10/26-apa-minutes.html janina
18:01:20 [jcraig]
present+
18:01:21 [janina]
I can do it!
18:12:16 [jcraig]
Throwing out some not-even-half-baked ideas after the meeting...
18:12:45 [jcraig]
navigator.pronunciationDictionary.append(
18:12:45 [jcraig]
['ipa', term1, ipa_string1],
18:12:45 [jcraig]
['ipa', term2, ipa_string2]
18:12:45 [jcraig]
);
18:13:01 [jcraig]
navigator.pronunciationDictionary.append(['ipa', term3, ipa_string3]);
18:13:17 [jcraig]
in-page script block or external
18:14:33 [jcraig]
something like that could be associated with reflected content attributes later, but it would not preclude run-time-flag implementations based on the "which attributes?" question.
18:15:00 [jcraig]
cc IrfanA mhakkinen
18:16:38 [jcraig]
cc bkardell_ NeilS
18:18:44 [jcraig]
window.pronunciationDictionary.append([
18:18:44 [jcraig]
[term1, ipa_string1],
18:18:44 [jcraig]
[term2, ipa_string2]
18:18:44 [jcraig]
], 'ipa'); // or this more terse version? also, window or doc obj seems better than navigator.
18:20:08 [jcraig]
could also work with a site-wide dictionary: <script src="../en/speech_dictionary.js">
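A runnable sketch of the half-baked idea above. This is hypothetical only: no browser exposes a `pronunciationDictionary` object, so it is modeled here as a standalone class to illustrate the terse `append()` form ([term, value] pairs plus a shared alphabet tag); the example terms and IPA strings are made up for illustration.

```javascript
// Hypothetical sketch: models the proposed pronunciationDictionary
// as a plain class, since no platform object exists yet.
class PronunciationDictionary {
  constructor() {
    this.entries = new Map(); // term -> { alphabet, value }
  }
  // append an array of [term, value] pairs, all sharing one alphabet
  append(pairs, alphabet = 'ipa') {
    for (const [term, value] of pairs) {
      this.entries.set(term, { alphabet, value });
    }
  }
  // return the pronunciation string for a term, or null if absent
  lookup(term) {
    const entry = this.entries.get(term);
    return entry ? entry.value : null;
  }
}

const dict = new PronunciationDictionary();
dict.append([
  ['SQL', 'ˈsiːkwəl'],
  ['cache', 'kæʃ'],
], 'ipa');
```

Either the `navigator`- or `window`-scoped spelling from the log could front such a store; the open question of reflecting it into content attributes is untouched by this sketch.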
19:07:05 [stevelee]
stevelee has joined #apa
20:09:08 [bkardell_]
sorry jcraig I had to run to the courthouse to handle some things for my dad's estate right after this meeting... but I suppose that is one option, close to the level I was suggesting to start with, if all we care about is pronunciations in a dictionary... I am personally unconvinced of that but it would be something
20:11:18 [bkardell_]
like, I'm not sure that helps math, nor lots of cases for say-as for example- and can imagine lots of documents contain 2 short words or symbols which could actually be different even
20:14:17 [bkardell_]
I guess I would also have to understand more about how that would work in terms of matching - I assume since you are suggesting this that it would be pretty easy to pipe that down so that there was some matching done... somewhere... to provide to the speech engine rather easily (more than ssml would? - I think a lot of engines do actually support ssml if passed correctly)
20:24:53 [Zakim]
Zakim has left #apa
20:48:38 [mhakkinen]
jcraig... like the half baked idea, but this doesn't seem to solve cases (or does it) where there may be more than one pronunciation of a word active in a given page, especially in an assessment where words out of context wouldn't cue an intelligent TTS... e.g., read (reed) vs read (red). I can see tagging a word with a specific idref to a dictionary entry that could be loaded. PLS wouldn't work, as there isn't an id associated with entries [CUT]
20:50:53 [mhakkinen]
pronunciations allowed per word entry). PLS isn't a bad model, for the basic dictionary, if web content referenced the PLS in the document head via a link, and Read Aloud, screen reader or voice assistants picked it up. But as it stands, PLS is incomplete for everything we need.
20:51:04 [jcraig]
Sure. The API would need several more optionals, such as:
20:51:10 [jcraig]
lang/locale
20:51:56 [jcraig]
homophones in the lang.... probably by part of speech like read (verb) versus read (past tense verb or adjective)
20:52:28 [jcraig]
the IDREF idea is interesting...
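A hypothetical sketch of the optionals discussed just above: entries carrying lang and part-of-speech so homographs like read (present) vs read (past) can be disambiguated. Field names (`lang`, `pos`) and the lookup helper are illustrative assumptions, not part of any spec.

```javascript
// Hypothetical dictionary entries with optional disambiguators.
const entries = [
  { term: 'read', lang: 'en', pos: 'verb-present', alphabet: 'ipa', value: 'ɹiːd' },
  { term: 'read', lang: 'en', pos: 'verb-past',    alphabet: 'ipa', value: 'ɹɛd' },
];

// Find the first entry matching a term plus any supplied disambiguators;
// omitted options act as wildcards.
function lookup(term, { lang, pos } = {}) {
  return entries.find(e =>
    e.term === term &&
    (lang === undefined || e.lang === lang) &&
    (pos === undefined || e.pos === pos)
  ) || null;
}
```

The IDREF idea from the log would sidestep matching entirely by pointing a marked-up word straight at one entry, which PLS as it stands cannot express.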
20:55:00 [jib]
jib has joined #apa
21:13:23 [bkardell_]
https://www.irccloud.com/pastebin/EWiiEjZ3/
21:16:30 [bkardell_]
I think here we could decide what that actually looks like - maybe it isn't json or maybe it is, but the idea of a thing we can rather simply say "let's define that very limited thing and make it possible without figuring out serialization/etc" would allow lots of experiments of expression and serialization/connection.
21:38:09 [mhakkinen]
+1 to both jcraig and bkardell_
23:56:23 [cyns]
cyns has joined #apa
23:56:42 [cyns]
I will be 15-20 minutes late to the ARIA/APA meeting.
23:59:55 [becky]
becky has joined #apa