W3C

– DRAFT –
Voice Interaction

23 April 2025

Attendees

Present
debbie, dirk, gerard, kaz
Regrets
-
Chair
debbie
Scribe
dadahl

Meeting minutes

GitHub issue 65, compare semantic representations

JROSI features

n-best results with start and stop times, plus "medium" and "mode"; "medium" is possibly not needed

version

multiple interpretations

every interpretation has tokens, id and confidence
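As an illustrative aid only, here is a minimal Python sketch of the kind of n-best result structure listed above: a version, start and stop times, mode, an optional medium, and multiple interpretations, each with tokens, an id and a confidence. The class names (`NBestResult`, `Interpretation`), the field names `start_ms` and `end_ms`, and the sample values are hypothetical. They are not taken from the JROSI or EMMA specifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interpretation:
    # Hypothetical shape: every interpretation carries an id, its tokens and a confidence
    id: str
    tokens: str
    confidence: float


@dataclass
class NBestResult:
    # Hypothetical container: version, start/stop times in ms, mode, optional medium
    version: str
    start_ms: int
    end_ms: int
    mode: str
    medium: Optional[str] = None  # optional, since "medium" was possibly not needed
    interpretations: List[Interpretation] = field(default_factory=list)

    def best(self) -> Optional[Interpretation]:
        # Return the highest-confidence interpretation, or None if the list is empty
        return max(self.interpretations, key=lambda i: i.confidence, default=None)


result = NBestResult(
    version="1.0",
    start_ms=0,
    end_ms=1500,
    mode="voice",
    interpretations=[
        Interpretation(id="int1", tokens="flights to boston", confidence=0.8),
        Interpretation(id="int2", tokens="flights to austin", confidence=0.6),
    ],
)
print(result.best().tokens)  # flights to boston
```

Leaving `medium` optional reflects the discussion's point that it may be unnecessary.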

debbie: do we still need all of this semantic structure now that we have LLMs?

dirk: is EMMA really suited to covering LLM-based systems?

debbie: we don't need all this semantic structure today

dirk: maybe only a subset would be needed

debbie: maybe it's not necessary for interactive systems

dirk: the full EMMA spec would be overkill

debbie: a lot of EMMA is optional

debbie: we're talking about requirements for voice interaction

dirk: we could use some EMMA metadata and multimodal information

debbie: EMMA 1.0 and JROSI don't cover streaming

debbie: should we include streaming?

dirk: we should support streaming to an endpoint

debbie: we could look at the streaming support in the old EMMA 2.0 document and see if we could use that

dirk: maybe use a subset

debbie: we could work with Open Voice on streaming

dirk: let's look at Open Voice

https://github.com/open-voice-interoperability/docs/blob/main/specifications/DialogEvents/1.0.2/InteropDialogEventSpecs.md

timestamp, id, speakerUri

debbie: looking at section 1.4 of Dialog Events

workshop

<kaz> Draft CfP

<kaz> * should rather concentrate on voice interaction

<kaz> * how to deal with use cases like connected cars?

<kaz> * should mention what the impact on the Web platform would be

kaz: brought the CfP to the W3C Strategy meeting

kaz: avoid dealing with AI-based agents; concentrate on voice

dirk: voice and smart agents overlap

kaz: impact on the Web platform, but standards are not just about the Web browser, for example Web data
… how to identify a person and their credentials

dirk: doesn't this overlap with security?

kaz: security is very important to smart agents

dirk: can we include smart agents?

kaz: will update the proposal
… core target should be voice and multimodal interaction

kaz: will update the proposal and take it back to the W3C Strategy meeting
… will check on when to form the PC (Program Committee)

kaz: can start working on converting the draft from Markdown (MD) to HTML

dirk: we should wait until the proposal is approved before creating a table of possible topics

kaz: next strategy meeting will be in two weeks

Minutes manually created (not a transcript), formatted by scribe.perl version 244 (Thu Feb 27 01:23:09 2025 UTC).

Diagnostics

All speakers: debbie, dirk, kaz

Active on IRC: dadahl, kaz