(MEETING TITLE) – 11 December 2023

Meeting minutes

<PaulG> present+

https://github.com/w3c/pronunciation/wiki/Assessment-Examples-for-the-Break-Tag

Agenda Review, Membership & Announcements

<PaulG> matatk: we are trying to check attendance at CSUN

<PaulG> ...is anyone attending?

Review Use Case

<PaulG> Alan: is there a reason we're using both commas and breaks?

<PaulG> Dee: sometimes the speech engine doesn't pronounce "A." but pronounces a short vowel 'a'

<PaulG> S_Wood: for clarity we should remove the comma

<PaulG> matatk: these being from real world examples is very valuable and I think we need to be careful to separate the broader context from the specific asks that we have from implementors

<PaulG> matatk: we don't want them to pinpoint what could be attributable to a bug in the TTS engine

<PaulG> matatk: the break in example 4 is not trying to be the comma

<PaulG> matatk: so we don't need to drop it except that it was being used as a workaround for a bug

<PaulG> Dee: totally makes sense to me

PaulG: We use @time in all of these examples.

PaulG: but users' AT/TTS have different rates of speech.
… There are encoded strengths for breaks of different lengths, e.g. after a comma would be a 'weak' break; after a sentence would be a 'strong' break.
… Devs may be more amenable to coding along the lines of those types of breaks, rather than times.
… This is an opportunity to use SSML as a starting point, but make the names for the breaks more intuitive.
… We could come up with our own terms, or use the ones they have.
… What are your thoughts about using strength over time?
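
[Illustrative note, not part of the discussion] A minimal sketch of the two ways of marking a break compared above, assuming the `strength` and `time` attributes of the SSML `break` element. The helper name `ssml_break` is hypothetical and only builds a string; how a given engine renders each strength is not covered here.

```python
from typing import Optional
from xml.sax.saxutils import quoteattr

# Strength keywords defined for the SSML <break> element.
STRENGTHS = ("none", "x-weak", "weak", "medium", "strong", "x-strong")


def ssml_break(strength: Optional[str] = None, time: Optional[str] = None) -> str:
    """Build a <break/> element (hypothetical helper, for illustration only)."""
    attrs = []
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown break strength: {strength!r}")
        attrs.append(f"strength={quoteattr(strength)}")
    if time is not None:
        # Passed through unchecked; values look like "500ms" or "2s".
        attrs.append(f"time={quoteattr(time)}")
    return "<break" + "".join(" " + a for a in attrs) + "/>"


# A relative break, left to the engine to size:
relative = ssml_break(strength="weak")
# A fixed-duration break, independent of the listener's speech rate:
fixed = ssml_break(time="500ms")
```

The strength form leaves the actual duration to the engine, which is the property under discussion; the time form pins it to a fixed value.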

Alan: Is strength relative? You could adjust browser settings (a baseline for them).

PaulG: My understanding is it's flexible, so that if you adjust TTS base speed, then breaks are affected proportionally to speech speed factor (e.g. 1.5x)

PaulG: A half-second break could sound like the end of the content to someone who's listening to 450wpm vs 100wpm
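
[Illustrative note, not part of the discussion] A rough sketch of the arithmetic behind the two points above, assuming a pause is judged against the average time one word takes at the listener's rate. The function names are hypothetical, and real engines may scale breaks differently.

```python
def pause_in_words(pause_seconds: float, words_per_minute: float) -> float:
    """How many words could have been spoken during the pause."""
    return pause_seconds * words_per_minute / 60.0


def scaled_pause(pause_seconds: float, rate_factor: float) -> float:
    """Pause length if breaks shrink in proportion to a speed factor (an assumption)."""
    return pause_seconds / rate_factor


# A fixed half-second break at the two rates mentioned:
fast = pause_in_words(0.5, 450)  # 3.75 words' worth of silence
slow = pause_in_words(0.5, 100)  # under one word's worth of silence

# The same break if it scaled with a 1.5x speed factor:
shorter = scaled_pause(0.5, 1.5)  # about a third of a second
```

On these assumptions, the same half-second is nearly four words of silence for the fast listener and less than one word for the slow listener.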

Alan: Is 'weak' in Chrome going to be the same as 'weak' in Firefox?

PaulG: Undetermined.

PaulG: I think it's based on the platform
… Even the spec isn't clear about what those tokens are.

PaulG: If we only put @time in our examples, that's all we'll get.

PaulG: When someone's using their own AT, we want their AT to be able to honor that.
… Time may be OK with a built-in AT (maybe).
… Amazon implements it, and Google has strength.

Alan: I've not used strength, nor rate, in our SSML. Most of our SSML is hand-rolled. When you use Web Speech API, you send a rate separately to SSML.

Alan: Are we saying we should change some of the examples, because we want both?

PaulG: Yes

Alan: Great

PaulG: If someone's concerned because these times aren't well specified, or for any reason, let us know

Alan: I don't think Web Speech was designed to work with SSML

PaulG: It may be that strength maps to something inside the TTS engine, and could be well received

Dee: These examples were real-world examples relating to Amazon Polly use in ETS.

<PaulG> S_Wood: I should talk to Mark about all this

<PaulG> ... for this document, we need a varied example base

PaulG: Maybe strength is not precisely specified, to avoid the spec being too English-centric.
… This could be a way to pivot away from time being a make-or-break feature for us.
… If we could get by with pauses being based on existing breaks, and you don't have to add a timer to your software, can we work with that?

<PaulG> https://github.com/w3c/pronunciation/wiki/Reasons-for-Delays-in-Spoken-Content

PaulG: We'll have a meeting next week (unless nobody is available) and then not for the following 2 weeks.

PaulG: When we come back, we should take some time to collate these examples.

PaulG: Then we can try to restart conversations with the vendors.

Dee: Would be helpful to know how the browsers are implementing things like break strength in relation to prosody.

PaulG: Fantastic idea. Maybe we can do some outreach.
… We need devs who have knowledge of the inner workings.

Dee: Important to understand how that works, and how time plays into that.
… If you look up the docs, both time and strength are mentioned.
… Time being optional attribute, strength not.

Dee: That was from 2010's SSML REC

Dee: Could ask Mark?

PaulG: Is everyone here able to attend next week?

Alan: +1

Dee: +1

matatk: +1

PaulG: For next week, could you talk to Mark? Maybe he knows someone who knows?
… We need to be prepared to find that different AT do it in different ways.

<PaulG> matatk: we need to preserve the "real world" examples but filter them for the people we want to focus on specific implementation details

<PaulG> matatk: the two audiences, our internal folks and the external implementors we need to influence.

<PaulG> ...Alan mentioned how the Web Speech API doesn't fully implement SSML

<PaulG> ...is that a path to solve our problems?

<PaulG> Alan: I just wrote a higher-level tool to help us with some of the issues

<PaulG> ...in some ways Web Speech gives more control

<PaulG> ...over rate and timing

<PaulG> matatk: Can you share the code or examples for that higher-level tool?

<PaulG> matatk: I was just thinking, whenever we show people we should use the TAG explainer format

<PaulG> ...they focus on the user problem being solved

<PaulG> ...and compare how the various solutions (SSML, Web Speech, etc.) deliver on those problems

https://tag.w3.org/explainers/

Action Items

Github Issues and examples

Other Business

<PaulG> Alan: clarification: Web Speech is not a solution for these problems

<PaulG> https://www.w3.org/TR/pronunciation-gap-analysis-and-use-cases/#gap-analysis

PaulG: We mentioned Web Speech in the gap analysis, but we didn't include it in the table (probably because it's very separate from the content)

PaulG: Maybe we just call that out, and make it more clear.

PaulG: We can explain why AT users are left behind if things are not moved into the AX tree

S_Wood: I may have some slides that explain this.

– DRAFT –

Attendees