W3C

– DRAFT –
Spoken Pronunciation Task Force Teleconference

01 December 2021

Attendees

Present
Alan_, Dee, IrfanA, janina_, JF, NeilS, Roy, Sam_K, Steve_Noble
Regrets
-
Chair
Irfan Ali
Scribe
bkardell_, Dee

Meeting minutes

<JF> GM all.

Follow up on the ARIA/APA meeting at TPAC

<Roy> Minutes

jcraig: there have been a number of adjacent meetings on this recently...

<janina_> TPAC Minutes: https://www.w3.org/2021/10/26-apa-minutes.html

<Dee> James: There were multiple meetings about pronunciation. There is general interest in solving the pronunciation problem. What has bogged us down is that previous specs tried to solve too much. The path forward is not to try to solve everything: incubate and get some prototypes out.

<Dee> JF: Can you help define this further?

<Dee> James: Suggest not including things like pauses and other secondary features.

<Dee> JF: The primary problem we are trying to solve is the pronunciation. Struggle to understand where to draw the line.

<jcraig> document.pronunciationDictionary.append([

<jcraig> [term1, ipa_string1],

<jcraig> [term2, ipa_string2]

<jcraig> ], 'ipa');

<jcraig> document.pronunciationDictionary.append([

<jcraig> [term1, soundslike_string1],

<jcraig> [term2, soundslike_string2]

<jcraig> ], 'soundslike');

<jcraig> tomato - toe-may-toe
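
As an illustration only, the dictionary shape pasted above could be modeled roughly as follows. This is a sketch in plain JavaScript, not a browser API. Only `append(pairs, alphabet)` and the 'ipa' / 'soundslike' labels come from the snippet in the minutes; the `PronunciationDictionary` class, its `lookup` method, and the case-insensitive matching are assumptions added for the example. The IPA string is an illustrative US English transcription.

```javascript
// Hypothetical model of the proposed document.pronunciationDictionary.
// Nothing here is a shipped browser API; the names mirror the pasted snippet.
class PronunciationDictionary {
  constructor() {
    // alphabet ('ipa' | 'soundslike') -> Map of term -> pronunciation
    this.entries = new Map();
  }

  // Append [term, pronunciation] pairs under the given alphabet.
  append(pairs, alphabet) {
    if (!this.entries.has(alphabet)) {
      this.entries.set(alphabet, new Map());
    }
    const table = this.entries.get(alphabet);
    for (const [term, pronunciation] of pairs) {
      // Assumption: terms are matched case-insensitively.
      table.set(term.toLowerCase(), pronunciation);
    }
  }

  // Look up a term; returns undefined when no entry exists.
  lookup(term, alphabet) {
    const table = this.entries.get(alphabet);
    return table ? table.get(term.toLowerCase()) : undefined;
  }
}

const dictionary = new PronunciationDictionary();
dictionary.append([['tomato', 'toe-may-toe']], 'soundslike');
dictionary.append([['tomato', 'təˈmeɪtoʊ']], 'ipa');
```

Keeping one table per alphabet lets the same term carry both an IPA and a "sounds like" entry, which matches the two separate `append` calls in the snippet.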

<JF> https://www.w3.org/TR/spoken-html/#break

<Dee> Mark: One of the things that drove the initial set of features was pausing in the assessment realm.

<JF> +1 to Markku

<IrfanA> +1

<Dee> Neil: Pausing is important for math.

<Dee> James: Incubation is important. Solve primary problem first and then move to the other features, prosody etc.

<Dee> Alan: Pausing is critical.

<Dee> Janina: We came to a very specific conclusion after TPAC. We heard that we shouldn't worry about the browser agents. Single attribute was preferred by the authoring side.

<IrfanA> +1 to Janina

<Dee> Janina: Agrees that pausing is important.

<JF> +1 - I'd go so far as to say that "break" is the second most required need after actual "pronunciation"

<Dee> Brian: We can focus on key capabilities that do not also involve solving serialization right now. Step one of all of this has to be something associating actual data with the tree and getting that down to the AT. I would suggest that if we have that, a lot of the other questions will flow out of this and there are opportunities to rapidly explore and improve what's above that. Then we can explore other options.

<Dee> Irfan: We have several use cases for break.

<Dee> Cynthia: Fundamental question. This feels like presentation so why is this markup and not CSS?

<Dee> JF: Yes, but it is non-visual. CSS sits in the visual.

<jcraig> I was somehow removed from the queue

<Dee> JF: We have 8 proposed attributes. If this is too many, what is an acceptable number?

<Dee> Irfan: Break is also important.

<jcraig> Maybe a way to import a dictionary from a url, for canonical state testing pronunciation (ETS & Pearson use case) or crowd-sourced dictionaries

<jcraig> <script src="http://wikipedia.org/en/speech_dictionary.js">

<jcraig> so this is just for the page not globally right?

<jcraig> for the record, I am not proposing the “web” global dictionary, just showing that would be easy for a script-based solution.

<jcraig> yes, just for the current page

<jcraig> same scope as JS for other things

<Dee> James: May be opportunities for specific overrides.

<mhakkinen> +1 to element specific overrides

<Dee> Irfan: Want to answer Cynthia's question. We performed a gap analysis. There are features that CSS does not support.

<Roy> https://www.w3.org/TR/pronunciation-gap-analysis-and-use-cases/

<Dee> Neil: Second Brian's idea.

<Dee> Neil: It seems like pronunciation itself is fundamentally different from the other features.

<Zakim> JF, you wanted to add there is also 'state' and 'context' use-cases: read versus read, resume versus resume

<jcraig> homonyms are valid though... read/read is a great example. but mostly solved by ML content in a sentence. out-of-context homonyms or abbreviations are the harder problems.

<JF> not to forget i18n concerns as well

<jcraig> close/close button

<Dee> Janina: common homonyms - if there is a quicker markup, that would make sense.

<cyns> doesn't <span lang="fr-ca">Foliot</span> handle the i18n issue?

<JF> @cyns - no

<Dee> Mark: Experimentation. If you look at the assessment companies (Text Help), they are doing this with read aloud tools. We have not been able to do this with screen readers.

<Zakim> jcraig, you wanted to mention CSS Speech implementations and to respond to the homonym comments

<cyns> @jf is that a lack of implementation or a fundamental problem with using lang as pronunciation guide?

<IrfanA> https://demos.learnosity.com/partners/texthelp.php

<mhakkinen> TextHelp Speechstream: https://www.texthelp.com/products/speechstream/

<mhakkinen> words out of context in multiple choice options

<Dee> James: Homonyms such as read vs. read (pronounced "red") don't cause issues in context.

<JF> @cyns I believe it is the latter - but I can't say for sure

<Dee> Mark: In the assessment context, words may read correctly in context, but then are also presented out of context in response options.

<NeilS> For math, "a" has both long and short sounds, but math always needs the long "a" sound. Part of speech won't help

<Dee> Irfan: we need to make a decision on how to proceed.

<Dee> Brian: The dictionary approach may be better as a second step. Seems to cater to the authoring experience.

<jcraig> agree with the "second step" suggestion

<Dee> Brian: We have a fundamental problem. There is an element that needs pronunciation data. How do we communicate that to the AT?

<Dee> Brian: Solve the fundamental problem as a first step.

<Dee> Aaron: In the long run, for authoring, we would want a global solution. As a first step support the local markup.

<Zakim> cyns, you wanted to say what's the minimum thing needed to be able start experimenting? Could that thing be a polyfill to start with?

<JF> @cyns can it be "things" (plural)

<Dee> Cynthia: What is the minimum thing needed to start experimenting?

<Dee> JF: Can it be things?

<Dee> Janina: We should think about what is needed to experiment with screen readers.

<Zakim> jcraig, you wanted to respond to Aaron re: element.... but not ballooning attributes. maybe element.pronunciation?

`element.speakAs = ?` where ? is maybe ipa or maybe ssml or ...? and where the api lives and name is also bikesheddable is basically what I think

<Dee> James: Responding to Aaron about element, I would prefer not to balloon a lot of attributes (I think pronunciation would require at least two) so perhaps piggyback on the element object in JS: element.pronunciation... or Brian's suggestion. Could make it easier to experiment for other use cases like Math, too.
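
To make the two ideas in this part of the discussion concrete, here is a rough sketch of how an element-specific value might take precedence over a page-level dictionary. Everything in it is hypothetical. The `element.pronunciation` property name is taken from James's suggestion, but `resolvePronunciation`, the plain-object stand-ins for elements, and the lookup rules are assumptions made for illustration, not an agreed design.

```javascript
// Hypothetical: decide which pronunciation hint would be passed to AT
// for a single element. Plain objects stand in for DOM elements.
function resolvePronunciation(element, pageDictionary) {
  // An element-specific override wins when present.
  if (element.pronunciation !== undefined) {
    return element.pronunciation;
  }
  // Otherwise fall back to the page-level dictionary, keyed by text.
  const key = element.textContent.trim().toLowerCase();
  return pageDictionary.has(key) ? pageDictionary.get(key) : null;
}

const pageDictionary = new Map([['tomato', 'toe-may-toe']]);

// No override: the page dictionary supplies the hint.
const plain = { textContent: 'Tomato' };

// Out-of-context homonym: the author overrides on the element itself.
const overridden = { textContent: 'read', pronunciation: 'red' };

// Neither an override nor a dictionary entry.
const unknown = { textContent: 'resume' };
```

Under these assumptions, `resolvePronunciation(overridden, pageDictionary)` yields the element's own value, `plain` resolves through the dictionary, and `unknown` resolves to `null`, leaving the AT to use its default pronunciation.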

<Dee> Janina: Sounds like we are not finished here. We have identified some of the core concerns.

+1 to jcraig

<cyns> i need to drop

<JF> Gotta dash - bye all, and thanks to aria wg for swinging by

<jcraig> rrsagent. make minutes

Minutes manually created (not a transcript), formatted by scribe.perl version 159 (Fri Nov 5 17:37:14 2021 UTC).

Maybe present: jcraig