07:51:15 RRSAgent has joined #web-speech-api
07:51:19 logging to https://www.w3.org/2025/11/12-web-speech-api-irc
07:51:19 RRSAgent, do not leave
07:51:20 RRSAgent, this meeting spans midnight
07:51:20 RRSAgent, make logs public
07:51:22 Meeting: Web Speech API: New Features & Future Directions
07:51:22 Chair: evanbliu, Paul Adenot
07:51:22 Agenda: https://github.com/w3c/tpac2025-breakouts/issues/5
07:51:22 Zakim has joined #web-speech-api
07:51:23 Zakim, clear agenda
07:51:23 agenda cleared
07:51:23 Zakim, agenda+ Pick a scribe
07:51:23 agendum 1 added
07:51:23 Zakim, agenda+ Reminders: code of conduct, health policies, recorded session policy
07:51:24 agendum 2 added
07:51:24 Zakim, agenda+ Goal of this session
07:51:24 agendum 3 added
07:51:24 Zakim, agenda+ Discussion
07:51:25 agendum 4 added
07:51:26 Zakim, agenda+ Next steps / where discussion continues
07:51:26 agendum 5 added
07:51:26 Zakim, agenda+ Adjourn / Use IRC command: Zakim, end meeting
07:51:27 agendum 6 added
07:51:27 breakout-bot has left #web-speech-api
08:20:45 shiestyle has joined #web-speech-api
08:25:33 mjwilson has joined #web-speech-api
08:26:50 kbx has joined #web-speech-api
08:27:18 Present+ Kenji_Baheux
08:28:43 smaug has joined #web-speech-api
08:29:04 ningxin has joined #web-speech-api
08:29:55 Kitahara has joined #web-speech-api
08:30:24 hagio has joined #web-speech-api
08:31:30 cwilso has joined #web-speech-api
08:31:37 present+
08:32:12 JRJurman7 has joined #web-speech-api
08:32:27 Room is muted
08:32:33 present+
08:35:20 m-alkalbani has joined #web-speech-api
08:37:03 tako has joined #web-speech-api
08:38:12 anssik has joined #web-speech-api
08:38:26 q?
08:39:44 q+
08:42:00 q+
08:42:18 AramZS has joined #web-speech-api
08:42:19 q?
08:42:35 present+
08:42:39 ack kbx
08:43:21 msw has joined #web-speech-api
08:43:28 Hadrien has joined #web-speech-api
08:43:39 present+
08:43:44 handellm has joined #web-speech-api
08:43:51 q?
08:43:51 q+
08:44:00 ack ans
08:44:22 q+
08:45:43 AramZS: are there limits on biasing?
08:45:54 Evan: there is a limit but it's very high.
08:46:16 q+
08:46:17 Paul: hash table lookup. complexity in the noise compared to the actual speech reco.
08:46:18 q+
08:46:21 ack ar
08:46:43 cwilso: awesome to see this moving. standardization plans?
08:47:07 Paul: timeline not perfect as we had just rechartered. came after.
08:47:18 Paul: need to wait for next charter. adopted by the WG.
08:47:44 cwilso: if you are ready you can recharter whenever.
08:47:48 ack cw
08:47:54 q+
08:48:07 Evan: tackling the other part of Web Speech next year (speech gen)
08:48:39 Sushan: our model doesn't have a confidence score.
08:48:45 Paul: Mozilla has one.
08:49:03 Evan: was proposed way back when accuracy wasn't good.
08:49:16 Paul: open an issue to make it clear that it's not always there.
08:49:20 LaurentLM has joined #web-speech-api
08:49:49 Sushan: on quality, word error rate based?
08:50:32 Evan: want to let sites choose between different quality levels; still an exploration area
08:51:00 Sushan: parallels with structured output for biasing.
08:51:29 Evan: cloud speech API has similar biasing mechanism. Exploring LLM-based approach in Chrome; Mozilla already using one.
08:51:36 ack next
08:51:58 Handellm (Markus; Google): when would you select low or high
08:52:32 Evan: could depend on other features being run, balancing usage of resources; could also be the UA doing the best it can given workload.
08:53:03 Handellm: nitpicking but could be power efficient or other labels
08:53:12 ack next
08:54:15 Hadrien: TTS related; more is needed. very broken in current state, voice selection issues (quality, gender); voices not listed, bad voices; events issues (pausing not working). ebooks field, users need TTS.
08:54:32 ... curious to hear your ideas
08:55:06 Evan: been neglected thus far; we will be looking at it from Chrome perspective.
08:55:10 Paul: same for Mozilla
08:55:33 Hadrien: investment in read-aloud type features but none of that is exposed.
08:55:54 ack next
08:55:58 Evan: yeah, we also had similar situations with speech to text (e.g. live caption), working on it.
08:56:29 Ningxin: quality related; if devs select high, download size; resource consumption, etc
08:57:17 Evan: high level at the moment; tradeoff between offering more knobs and being stuck with them in the future; potential fingerprinting bits; lots of TBD; some folks are against adding hints; but there have been requests from devs.
08:58:13 Ningxin: in WebML WG, how can devs give hints about what they care about, for instance power efficiency (VC / Meet / Zoom scenario).
08:58:35 Evan: could be part of the hints approach. We now support multiple streams
08:59:22 Ningxin: re customization; users with speech handicaps => users' data for those special cases. Thoughts?
08:59:40 Evan: there are teams in Google but we haven't worked closely with them yet; can make a note.
09:00:20 Paul: likely the models will be open, community could contribute; dataset available to all for building models.
09:00:55 Tarek: plans to open the API for pointing at a different model?
09:01:07 Paul: we are using different implementations and models
09:01:40 Anssi: speech installation method to download packs; built-in AI is similar in that regard; are you talking with this team?
09:01:42 Evan: yes
09:02:13 Mike: yes, awesome demo; got to polyfill an example with the Prompt API's audio input; showed the prospect of polyfilling this with other models;
09:02:15 q:
09:02:15 q+
09:02:31 Anssi: contrast with built-in AI API?
09:02:55 Evan: main difference: built-in AI allows monitoring progress; Web Speech does not.
09:03:09 Evan: we are embracing similar patterns where possible (e.g. anti-fingerprinting).
09:03:23 ack next
09:03:38 Anssi: improves ergo
09:03:56 Kenji: trying to align where possible; web speech API has been a thing for a long time so we can't break it.
09:04:09 Sushan: mediastream, timestamps?
09:04:23 Paul: on the result. Mediastream as a timed source so it works
09:04:40 Sushan: ...
09:04:53 Paul: all events have timecode
09:05:11 present+
09:06:07 Handellm: [...]
09:06:08 Guido: discussion tomorrow related to the topic. WebRTC Media joint session.
09:06:31 Sushan: rate of reco when using mediastream? faster than the event?
09:06:51 LaurentLM has joined #web-speech-api
09:06:51 Paul: clocked to realtimesource; audio device tied.
09:07:19 Paul: with the proposal for burst you could issue the events as fast or as slow as you want
09:07:22 ... control the pace
09:07:30 q+
09:07:57 Evan: Speech synthesis already has local processing option
09:08:02 Paul: doesn't require heavy resources
09:08:11 ack next
09:08:47 msw (mike): like model quality; quantifiable metric on error rate; would that be a reasonable requirement that dev could provide? other domains that they may want to provide like faster than realtime?
09:09:21 Sushan: good idea but challenging; Dev may want Language support despite higher error rate.
09:09:41 Sushan: maybe raise an issue.
09:09:59 Evan: will get an issue to continue the feedback
09:10:29 Paul: https://webaudio.github.io/web-speech-api/
09:11:11 mjwilson has left #web-speech-api
09:11:20 shiestyle has left #web-speech-api
09:11:36 hagio has left #web-speech-api
09:31:54 LaurentLM has joined #web-speech-api
11:03:27 shiestyle has joined #web-speech-api
13:25:25 tidoust has joined #web-speech-api
13:25:30 RRSAgent, draft minutes
13:25:32 I have made the request to generate https://www.w3.org/2025/11/12-web-speech-api-minutes.html tidoust
13:26:01 Zakim, bye
13:26:01 leaving. As of this point the attendees have been Kenji_Baheux, cwilso, shiestyle, AramZS, Hadrien, anssik
13:26:01 Zakim has left #web-speech-api
13:26:06 RRSAgent, bye
13:26:06 I see no action items