W3C

– DRAFT –
Web Speech API: New Features & Future Directions

12 November 2025

Attendees

Present
anssik, AramZS, cwilso, Hadrien, Kenji_Baheux, shiestyle
Regrets
-
Chair
evanbliu, Paul Adenot
Scribe
kbx

Meeting minutes


AramZS: are there limits on biasing?

Evan: there is a limit but it's very high.

Paul: it's a hash table lookup; the complexity is in the noise compared to the actual speech recognition.
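
The biasing being discussed is the proposed contextual-biasing extension to SpeechRecognition. A minimal sketch, assuming the draft's `SpeechRecognitionPhrase(phrase, boost)` interface and a `phrases` attribute on the recognizer (both proposal names, subject to change):

```javascript
// Sketch of contextual biasing per the draft extension; the names below are
// from the proposal and may change. The boost range [0, 10] is an assumption.
function makeBiasList(terms, boost = 2.0) {
  // Clamp boost into the assumed allowed range.
  const b = Math.min(10, Math.max(0, boost));
  return terms.map((t) => ({ phrase: t, boost: b }));
}

const bias = makeBiasList(["WebNN", "recharter", "scribe"], 3.5);

if (typeof window !== "undefined" && "SpeechRecognition" in window) {
  const recognition = new window.SpeechRecognition();
  // Feed biased phrases to the recognizer (proposed attribute).
  recognition.phrases = bias.map(
    (p) => new window.SpeechRecognitionPhrase(p.phrase, p.boost)
  );
  recognition.onresult = (e) => console.log(e.results[0][0].transcript);
  recognition.start();
}
```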

cwilso: awesome to see this moving. standardization plans?

Paul: the timeline isn't ideal, as we just rechartered; this work came after.

Paul: we need to wait for the next charter for it to be adopted by the WG.

cwilso: if you are ready, you can recharter whenever.

Evan: tackling the other part of Web Speech next year (speech synthesis).

Sushan: our model doesn't have a confidence score.

Paul: mozilla has one.

Evan: confidence was proposed way back when accuracy wasn't good.

Paul: open an issue to make it clear that confidence isn't always there.
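
Since `confidence` may not carry a meaningful value in every implementation, sites should treat it as optional. A defensive sketch; the selection strategy is an illustrative assumption, not spec behavior:

```javascript
// Treat confidence as optional: some engines report no useful value.
function bestAlternative(resultLike) {
  // resultLike: array of { transcript, confidence? } alternatives.
  const scored = resultLike.map((alt) => ({
    transcript: alt.transcript,
    confidence: typeof alt.confidence === "number" ? alt.confidence : null,
  }));
  // Prefer the highest confidence when available; else keep engine order.
  const withConf = scored.filter((a) => a.confidence !== null);
  return withConf.length
    ? withConf.reduce((a, b) => (b.confidence > a.confidence ? b : a))
    : scored[0];
}

if (typeof window !== "undefined" && "webkitSpeechRecognition" in window) {
  const rec = new window.webkitSpeechRecognition();
  rec.onresult = (e) => {
    const alts = Array.from(e.results[e.results.length - 1]);
    console.log(bestAlternative(alts).transcript);
  };
  rec.start();
}
```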

Sushan: on quality, is it word-error-rate based?

Evan: we want to let sites choose between different quality levels; still an exploration area.

Sushan: there are parallels with structured output for biasing.

Evan: the Cloud Speech API has a similar biasing mechanism. Exploring an LLM-based approach in Chrome; Mozilla is already using one.

Handellm (Markus; Google): when would you select low or high?

Evan: could depend on other features being run, balancing resource usage; could also be the UA doing the best it can given the workload.

Handellm: nitpicking, but the labels could be "power efficient" or something else.
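
No quality hint exists in the API today; the shape below is purely hypothetical and only illustrates the label-versus-metric design question being debated (a coarse label rather than a numeric error-rate target):

```javascript
// Purely hypothetical quality hint: nothing like these labels is in the spec
// today. Coarse labels leave the UA room to map them onto actual models.
const QUALITY_LABELS = ["power-efficient", "balanced", "high-accuracy"];

function normalizeQualityHint(hint) {
  // Unknown labels fall back to letting the UA decide (null).
  return QUALITY_LABELS.includes(hint) ? hint : null;
}
```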

Hadrien: TTS-related; more is needed. It's very broken in its current state: voice selection issues (quality, gender), voices not listed, bad voices, and event issues (pausing not working). In the ebook field, users need TTS.
… curious to hear your ideas

Evan: it's been neglected thus far; we will be looking at it from the Chrome perspective.

Paul: same for Mozilla

Hadrien: there is investment in read-aloud-type features, but none of that is exposed.

Evan: yeah, we had similar situations with speech-to-text (e.g. Live Caption); working on it.
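
The voice-selection problems Hadrien describes show up in how sites must pick a voice today with the existing `speechSynthesis` API. A defensive sketch; the preference order (local exact match, then exact match, then language-prefix match) is an illustrative assumption:

```javascript
// Pick a voice by language with fallbacks. Note getVoices() may return an
// empty list until `voiceschanged` fires, one of the rough edges noted above.
function pickVoice(voices, lang) {
  return (
    voices.find((v) => v.lang === lang && v.localService) ||
    voices.find((v) => v.lang === lang) ||
    voices.find((v) => v.lang.startsWith(lang.split("-")[0])) ||
    null
  );
}

if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const speak = () => {
    const voice = pickVoice(window.speechSynthesis.getVoices(), "en-US");
    const u = new SpeechSynthesisUtterance("Hello from the Web Speech API");
    if (voice) u.voice = voice;
    window.speechSynthesis.speak(u);
  };
  window.speechSynthesis.addEventListener("voiceschanged", speak, { once: true });
  speak();
}
```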

Ningxin: quality-related; if a developer selects high: download size, resource consumption, etc.

Evan: it's high-level at the moment; there's a tradeoff between offering more knobs and being stuck with them in the future; potential fingerprinting concerns; lots TBD. Some folks are against adding hints, but there have been requests from developers.

Ningxin: in the WebML WG, how can developers give hints about what they care about, for instance power efficiency (VC / Meet / Zoom scenarios)?

Evan: could be part of the hints approach. We now support multiple streams.
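
The multiple-streams support relies on the proposed ability to hand a MediaStreamTrack to the recognizer. A sketch assuming the draft's `start(audioTrack)` overload (a proposal name, subject to change):

```javascript
// Sketch of MediaStreamTrack input per the draft: the start(track) overload
// is from the proposal and may change.
function canRecognizeFromTrack(g) {
  // True only where both SpeechRecognition and getUserMedia exist.
  return Boolean(
    g && g.SpeechRecognition && g.navigator && g.navigator.mediaDevices
  );
}

async function recognizeFromStream() {
  if (!canRecognizeFromTrack(globalThis)) return; // Not a supporting browser.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const [track] = stream.getAudioTracks();
  const rec = new SpeechRecognition();
  rec.continuous = true;
  rec.onresult = (e) =>
    console.log(e.results[e.results.length - 1][0].transcript);
  rec.start(track); // Proposed overload: recognize this track, not the default mic.
}
```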

Ningxin: re customization; users with speech impairments need users' data for those special cases. Thoughts?

Evan: there are teams at Google, but we haven't worked closely with them yet; I can make a note.

Paul: likely the models will be open and the community could contribute; a dataset available to all for building models.

Tarek: any plans to open the API for pointing at a different model?

Paul: we are using different implementations and models

Anssi: speech has an installation method to download language packs; built-in AI is similar in that regard; are you talking with that team?

Evan: yes

Mike: yes, awesome demo; got to polyfill an example with the Prompt API's audio input; showed the prospect of polyfilling this with other models.


Anssi: contrast with built-in AI API?

Evan: the main difference is that built-in AI allows monitoring download progress, but Web Speech does not.

Evan: we are embracing similar patterns where possible (e.g. anti-fingerprinting).
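
The on-device flow under discussion can be sketched with the draft's availability and install methods; `available()`, `install()`, and `processLocally` are proposal names and may change, and (per the point above) there are no download-progress events:

```javascript
// Sketch of the draft's on-device availability flow. The static methods and
// option names are from the proposal and may change; install() resolves when
// the download finishes, with no progress events (unlike built-in AI APIs).
function needsInstall(status) {
  // Only "downloadable" means a one-time download would make it usable.
  return status === "downloadable";
}

async function ensureLocalEnglish() {
  if (!("SpeechRecognition" in globalThis)) return "unavailable";
  const opts = { langs: ["en-US"], processLocally: true };
  const status = await SpeechRecognition.available(opts);
  if (needsInstall(status)) {
    await SpeechRecognition.install(opts);
  }
  return status;
}
```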

Anssi: improves ergonomics.

Kenji: trying to align where possible; the Web Speech API has been around for a long time, so we can't break it.

Sushan: with MediaStream input, what about timestamps?

Paul: on the result. MediaStream is a timed source, so it works.

Sushan: ...

Paul: all events have timecodes.

Handellm: [...]

Guido: there's a discussion tomorrow related to this topic, in the WebRTC Media joint session.

Sushan: what's the rate of recognition when using a MediaStream? Faster than the events?

Paul: clocked to the real-time source; tied to the audio device.

Paul: with the proposal for bursts, you could issue the events as fast or as slow as you want
… control the pace

Evan: speech synthesis already has a local processing option.

Paul: it doesn't require heavy resources.

msw (Mike): on model quality: a quantifiable metric like error rate; would that be a reasonable requirement that developers could provide? Other dimensions they may want to specify, like faster-than-realtime?

Sushan: good idea but challenging; developers may want language support despite a higher error rate.

Sushan: maybe raise an issue.

Evan: will file an issue to continue the feedback.

Paul: https://webaudio.github.io/web-speech-api/

Minutes manually created (not a transcript), formatted by scribe.perl version 248 (Mon Oct 27 20:04:16 2025 UTC).

Diagnostics

No scribenick or scribe found. Guessed: kbx

Maybe present: Anssi, Evan, Guido, Handellm, Kenji, Mike, Ningxin, Paul, q, Sushan, Tarek

All speakers: Anssi, AramZS, cwilso, Evan, Guido, Hadrien, Handellm, Kenji, Mike, Ningxin, Paul, q, Sushan, Tarek

Active on IRC: anssik, AramZS, breakout-bot, cwilso, Hadrien, handellm, JRJurman7, kbx, msw, ningxin, shiestyle, tidoust