Meeting minutes
AramZS: are there limits on biasing?
Evan: there is a limit but it's very high.
Paul: biasing is a hash table lookup; its complexity is in the noise compared to the actual speech recognition.
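To illustrate why a hash table lookup is cheap next to recognition itself, here is a minimal sketch of phrase biasing as a rescoring step. It is not how any browser implements biasing: the `BiasPhrase` and `Hypothesis` shapes, the function names, and the additive scoring rule are all assumptions made for this example, and only single-word phrases are matched.

```typescript
// Hypothetical shapes for illustration only; these are not Web Speech API types.
interface BiasPhrase {
  phrase: string;
  boost: number;
}

interface Hypothesis {
  text: string;
  score: number;
}

// Build the lookup table once: lowercased phrase -> boost.
function buildBiasTable(phrases: BiasPhrase[]): Map<string, number> {
  const table = new Map<string, number>();
  for (const { phrase, boost } of phrases) {
    table.set(phrase.toLowerCase(), boost);
  }
  return table;
}

// Add the boost of every biased word found in a hypothesis, then sort
// best-first. Each word costs one average O(1) map lookup.
function rescore(hyps: Hypothesis[], table: Map<string, number>): Hypothesis[] {
  return hyps
    .map((h) => {
      let bonus = 0;
      for (const word of h.text.toLowerCase().split(/\s+/)) {
        bonus += table.get(word) ?? 0;
      }
      return { text: h.text, score: h.score + bonus };
    })
    .sort((a, b) => b.score - a.score);
}
```

The per-word work is a single map lookup, so even a large phrase list adds little on top of acoustic and language model scoring.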
cwilso: awesome to see this moving. standardization plans?
Paul: timeline not ideal as we had just rechartered; this work came after.
Paul: need to wait for the next charter; it would then be adopted by the WG.
cwilso: if you are ready you can recharter whenever.
Evan: tackling the other part of Web Speech next year (speech generation).
Sushan: our model doesn't have a confidence score.
Paul: mozilla has one.
Evan: was proposed way back then when accuracy wasn't good.
Paul: open an issue to make it clear that it's not always there.
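Since the confidence score may be absent, calling code probably should not assume it. A minimal sketch of defensive handling follows; the `AlternativeLike` shape and both function names are hypothetical, and treating a value of 0 as "no score" is an assumption of this example, not something stated in the discussion.

```typescript
// Hypothetical shape for illustration; confidence is optional on purpose.
interface AlternativeLike {
  transcript: string;
  confidence?: number;
}

// Return the confidence only when the engine appears to have provided one.
// Assumption: undefined, NaN, and 0 all mean "no score available".
function knownConfidence(alt: AlternativeLike): number | undefined {
  const c = alt.confidence;
  return c !== undefined && c > 0 ? c : undefined;
}

// Pick the highest-confidence alternative that meets the threshold.
// If no alternative carries a usable score, fall back to the first one,
// i.e. trust the order in which the engine returned them.
function pickAlternative(
  alts: AlternativeLike[],
  minConfidence: number
): AlternativeLike | undefined {
  let best: AlternativeLike | undefined;
  let bestScore = -1;
  for (const alt of alts) {
    const c = knownConfidence(alt);
    if (c === undefined) continue;
    if (c > bestScore) {
      best = alt;
      bestScore = c;
    }
  }
  if (best === undefined) return alts[0];
  return bestScore >= minConfidence ? best : undefined;
}
```

With this shape, a site that filters on confidence still works against an engine that never reports one, instead of rejecting every result.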
Sushan: on quality, word error rate based?
Evan: want to let site choose between different quality; still an exploration area
Sushan: parallels with structured output for biasing.
Evan: cloud speech API has a similar biasing mechanism. Exploring an LLM-based approach in Chrome; Mozilla already using one.
Handellm (Markus; Google): when would you select low or high?
Evan: could be depending on other features being run, balancing usage of resources; could also be the UA doing the best it can given workload.
Handellm: nitpicking but could be power efficient or other labels
Hadrien: TTS-related; more is needed. Very broken in its current state: voice selection issues (quality, gender), voices not listed, bad voices, event issues (pausing not working). In the ebooks field, users need TTS.
… curious to hear your ideas
Evan: been neglected thus far; we will be looking at it from Chrome perspective.
Paul: same for Mozilla
Hadrien: investment in read aloud type features but none of that is exposed.
Evan: yeah, we also had similar situations with speech to text (e.g. live caption), working on it.
Ningxin: quality-related; if devs select high, what about download size, resource consumption, etc.?
Evan: high level at the moment; tradeoff between offering more knobs and being stuck with them in the future; potential fingerprinting bits; lots of TBD; some folks are against adding hints, but there have been requests from devs.
Ningxin: in the WebML WG, how can devs give hints about what they care about, for instance power efficiency (VC / Meet / Zoom scenario)?
Evan: could be part of the hints approach. We now support multiple streams.
Ningxin: re customization; users with speech handicaps => users' data for those special cases. Thoughts?
Evan: there are teams at Google but we haven't worked closely with them yet; can make a note.
Paul: likely the models will be open, community could contribute; dataset available to all for building models.
Tarek: plans to open the API for pointing at a different model?
Paul: we are using different implementations and models
Anssi: speech installation method to download packs; built-in AI is similar in that regard; are you talking with this team?
Evan: yes
Mike: yes, awesome demo; got to polyfill an example with the Prompt API's audio input; showed the prospect of polyfilling this with other models.
Anssi: contrast with built-in AI API?
Evan: main difference: built-in AI allows monitoring progress, but Web Speech does not.
Evan: we are embracing similar patterns where possible (e.g. anti fingerprinting).
Anssi: improves ergonomics.
Kenji: trying to align where possible; web speech API has been a thing for a long time so we can't break it.
Sushan: mediastream, timestamps?
Paul: on the result. MediaStream is a timed source, so it works.
Sushan: ...
Paul: all events have timecode
Handellm: [...]
Guido: discussion tomorrow related to the topic. WebRTC Media joint session.
Sushan: rate of reco when using mediastream? faster than the event?
Paul: clocked to the real-time source; tied to the audio device.
Paul: with the proposal for burst you could issue the events as fast or as slow as you want
… control the pace
Evan: Speech synthesis already has local processing option
Paul: doesn't require heavy resources
msw (Mike): like model quality; a quantifiable metric on error rate; would that be a reasonable requirement that devs could provide? Other domains that they may want to specify, like faster than realtime?
Sushan: good idea but challenging; devs may want language support despite a higher error rate.
Sushan: maybe raise an issue.
Evan: will open an issue to continue the feedback.
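For reference, the word error rate raised in the quality discussion is conventionally computed as (substitutions + deletions + insertions) divided by the number of reference words, using a word-level edit distance. A minimal sketch follows; the function name is ours, and the whitespace tokenization with no case or punctuation normalization is a simplifying assumption.

```typescript
// Word error rate: word-level Levenshtein distance between the reference
// transcript and the hypothesis, divided by the reference word count.
// Assumption: words are split on whitespace and compared case-sensitively.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter((w) => w.length > 0);
  const hyp = hypothesis.trim().split(/\s+/).filter((w) => w.length > 0);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }

  // prev[j] holds the distance between the first i-1 reference words
  // and the first j hypothesis words; start with i = 0 (all insertions).
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i]; // j = 0: delete all i reference words
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(
        Math.min(
          prev[j] + 1, // deletion
          curr[j - 1] + 1, // insertion
          prev[j - 1] + cost // match or substitution
        )
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

Note that the value can exceed 1 when the hypothesis contains many inserted words, which is one reason a single threshold is hard to compare across languages and models.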