W3C

– DRAFT –
Voice Interaction

27 October 2021

Attendees

Present
bev, debbie, dirk, kazuyuki, mustaq ahmed, paul grenier
Regrets
-
Chair
Debbie
Scribe
ddahl

Meeting minutes

Breakout feedback and expected workshop

<PaulG_> https://www.w3.org/TR/spoken-html/

https://lists.w3.org/Archives/Public/public-voiceinteraction/2021Oct/0012.html

debbie: review discussion from last week's breakout groups

https://web-eur.cvent.com/event/2b77fe3d-2536-467d-b71b-969b2e6419b5/websitePage:efc4b117-4ea4-4be5-97b4-c521ce3a06db

<kaz> https://www.w3.org/2021/10/20-voice-minutes.html

<kaz> https://www.w3.org/2021/10/19-voice-minutes.html

debbie: possibility of a voice workshop

kaz: how to integrate the speech API and SSML in a workshop
… there was an organized session together with the voice interoperability session

kaz: decided to have a workshop, not a voice workshop but a smart agent workshop
… interoperability, voice interface, accessibility
… some overlap with semantic web? is that too broad?
… when we talk about smart agents
… one or two days, online

kaz: online workshop is much easier

<Bev> Perhaps hybrid online and in person?

kaz: organizing a workshop usually takes six months or so, so around May

<Bev> Include the Cognitive Inclusion COGA group

bev: could also do a hybrid event
… cognitive inclusion group has some overlap

<Bev> Information Architecture Community Group is also supportive and can participate

kaz: should have a dedicated session on accessibility

debbie: to attend, participants need to prepare a position paper, which the program committee will review

<Bev> anyone interested can prepare submission position proposal to program committee

<kaz> e.g., Smart Cities Workshop CfP

debbie: prerecorded videos with captions
… need to be provided

debbie: other topics like Open Voice Network
… could be included

paul: disambiguation in Spoken HTML spec, machine learning has its own heuristics, but in the meantime author-controlled pronunciation would be useful

paul: trying to get feedback from implementers, can't just bring SSML into HTML
… will have some representation of SSML into HTML, especially pronunciation
… could use this in machine learning

paul: word clusters could be modified by IPA
… a layer could map pronunciation to IPA
… and match to user's intent
… language, cultural information is missing
… when input happens, e.g. a speech difficulty is like a transform over standard language
… we can transform from word or from sound
… they could have had a stroke or something that altered their speech
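[For reference, a minimal SSML 1.1 fragment showing the standard `<phoneme>` element with an IPA transcription, the kind of author-controlled pronunciation discussed above; the word and transcription are illustrative:]

```xml
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  The word
  <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>
  can be given an author-controlled pronunciation.
</speak>
```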

bev: elderly patients were given iPads after dental surgery
… their speech was different
… could we use this to transform speech

paul: for Spoken HTML this is the first step
… if the system doesn't find a match it could look for transforms
… could be useful in a kiosk situation where user can't add their preferences

kaz: two points, one for speech synthesis and one for speech recognition
… for speech output it would be nice to have another layer to get correct pronunciation

<Bev> Kaz: acoustic model

kaz: for speech input, we might want to include another mechanism

<Bev> Kaz: command input expected actions, speech and gesture

kaz: such as hardware switch, gesture

debbie: also Natural Language Interfaces spec

<kaz> kaz: btw, it would be really nice if you all by chance could join the Program Committee for the expected workshop :)

debbie: can join the program committee

paul: maybe could join

bev: could join program committee
… depends on timing

Architecture document

architecture document https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paArchitecture-1-2.htm

In the architecture document, IPA means "intelligent personal assistant" (not the International Phonetic Alphabet discussed above)

dirk: (reviews input architecture)
… provider selection strategies can be used to select providers

dirk: (goes through output path)

bev: question about intent sets
… could you talk about that a little more

dirk: information that could be used to fill in slots

bev: is that a standard?

dirk: for now this is pretty abstract

bev: would that include security information

dirk: thinking in terms of SISR, more like that
… have to distinguish between local intent sets and provider intent sets

debbie: EmotionML
… could be used in input and output

kaz: don't have any specific comments, should discuss with browser and speech vendors
… should present at workshop
… EMMA would be a good format for all this data
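[For reference, a minimal EMMA 1.0 annotation of a single recognized utterance; the tokens, confidence value, and application payload are illustrative:]

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="interp1"
                       emma:medium="acoustic" emma:mode="voice"
                       emma:confidence="0.9"
                       emma:tokens="what is the weather">
    <weather-query/>
  </emma:interpretation>
</emma:emma>
```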

kaz: would like to integrate MMI architecture and SCXML for interaction management with WoT standards for device management
… DID (decentralized identifier) standard, there are many implementers, based on blockchain, should be a Recommendation soon
… that can be used to identify users and devices, also discovery can be handled this way
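[For reference, a minimal DID document in the shape defined by the Decentralized Identifiers specification; the `did:example` method and identifier are placeholders from the spec, not a real identifier:]

```json
{
  "@context": "https://www.w3.org/ns/did/v1",
  "id": "did:example:123456789abcdefghi"
}
```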

debbie: next call will be November 10

Minutes manually created (not a transcript), formatted by scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).
