<kaz> scribenick: cpn
https://github.com/w3c/strategy/issues/221 Voice Agents Workshop proposal
Kaz: Lots of speech technologies are
available, e.g., Google Voice Agent, Alexa
... Can use voice technology with TV sets, kiosk
services
... In a breakout at TPAC 2019 there was discussion on the need to
improve voice agent technology, especially for web
services
... So many viewpoints and expected use cases. Focus on
interaction with smart devices, from web browsers, smart
navigation, accessibility
... What is missing, from a global viewpoint?
... We've had lots of comments on the GitHub issue
... Can we identify the missing features, user needs, and
developer needs?
... Smarter interactions, short and clear commands, smarter
dialog model between human and the system
... Support for various languages
... Advanced voice technology: style, expression, feeling,
emotion
... Input and output entities from various vendors
... Timing, how and when, using which modality
... Typing, handwriting, voice
... A possible session could be on underlying technologies:
dialog management from a research viewpoint
... Protocols for data transfer
... State transition management, improved model for voice
interaction
... Also horizontal platform requirements: discovery, privacy,
security, accessibility and usability
... Examples of related use cases: voice agents, smartphones,
smart speakers, connected car, smart homes, IoT in
general
... Example of user asking a TV to play something. A more human
interaction is useful
... We need multiple stakeholders to participate
... Looking for participants, and people to join the workshop's
programme committee
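[Editorial illustration, not part of the discussion: the "state transition management" topic above can be pictured as a small state machine for the "user asks a TV to play something" example. The following is a minimal sketch under assumptions; the state names, intent labels, and prompts are all hypothetical and do not come from any existing specification or product.]

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical dialog states for a "play something on the TV" exchange.
    IDLE = auto()
    AWAIT_TITLE = auto()
    PLAYING = auto()


# (current state, recognized intent) -> (next state, system prompt).
# Intent labels are assumptions; a real system would get them from a recognizer.
TRANSITIONS = {
    (State.IDLE, "play"): (State.AWAIT_TITLE, "What would you like to watch?"),
    (State.AWAIT_TITLE, "give_title"): (State.PLAYING, "Playing it now."),
    (State.PLAYING, "stop"): (State.IDLE, "Stopped."),
}


def step(state, intent):
    """Return (next_state, prompt); stay in the same state on an unknown intent."""
    return TRANSITIONS.get((state, intent), (state, "Sorry, I did not understand."))


def run(intents):
    """Feed a sequence of intents through the dialog, collecting system prompts."""
    state = State.IDLE
    prompts = []
    for intent in intents:
        state, prompt = step(state, intent)
        prompts.append(prompt)
    return state, prompts
```

[Keeping the transitions in a table rather than in code branches is one possible design choice: the table could in principle be exchanged between vendors, which is the interoperability angle raised in the proposal.]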
Phil: Thank you for doing this.
I'm interested in taking part
... At the last TPAC meeting, there was a desire to update
SMIL
... GS1 would be interested. Asking the TV to order a pizza,
"where from?" Does the user or the TV choose?
... Any contact with the Open Voice Network? US retailer,
Target, is behind it
<xfq> https://openvoicenetwork.org/
Kaz: I agree, SMIL is important.
I also work for the WoT group, they're thinking about something
similar for serialization of device based services
... Your input is welcome, would you like to be on the
programme committee?
<ddahl> we might be mixing up smil and ssml
Phil: Yes
<phila_> Open Voice network
Mark: Helped start up the APA
pronunciation TF. My organization is interested in this, as
we're looking to solve pronunciation in text-to-speech
... Language learning on web and mobile
... I'm interested to hear about emergence of SMIL in this. I
was involved in SMIL 1 and 2.
... What we're trying to solve in education is more auditory
presentation of content, e.g., by voice assistants
... Make it easy to support, students get better experience
regardless of modality
... I'm interested in participating in the workshop
... Other publishers, such as Pearson could be interested
<Roy> pronunciation TF https://www.w3.org/WAI/pronunciation/
Kaz: The Publishing BG chair is also interested in this activity
Mark: I'm interested in the programme committee
Xiaoqian: I work for W3C in China. I'm working on MiniApp; the vendors see a strong need for a markup language to monitor the application by voice. If there's interest in a markup language, the MiniApp vendors would be pleased to try to implement it
Kaz: There was a voice browser WG
maybe 15 years ago. We visited Beijing for i18n of speech
synthesis
... Either a markup or an API approach, involving Chinese
stakeholders is important
Deborah: I'm worried about some
of the potential topics. Some are looking to make improvements
in the fundamental voice technology
... Our focus could be better put to interoperability, similar
to HTML as underlying markup for different browsers
<xiaoqian> +1 to ddahl to stay focus
Kaz: We can't work on AI itself, but
there may be an easier way to improve usability, maybe using some template like SISR. We need to look at use cases to see what kind of things need to improve
... A possibility could be standardizing the interface between
AI services and web services
... Need to look into the detail on that
Deborah: The Voice Interaction CG is looking into this. We have an architecture with standard communication channels
Kaz: Feedback from that CG activity would be great.
<ddahl> https://www.w3.org/community/voiceinteraction/
Joshue: I work in the area of
emerging web technologies.
... I wanted to understand the focus of this work, in relation to devices like Alexa and Siri.
... Where does this sit? Is it about doing something similar in the
browser?
... In APA WG there is work to fine tune improvements for text
to speech for users
Kaz: Good point. We need to see what's available for smart speakers, browsers. We should also think about pronunciation for different languages, not just English
Phil: Do you have a sense of where this work might go? A possible WG, for example?
Kaz: The conclusions would be determined by the workshop itself. But if possible, if we get many actual needs and use cases, we can create a WG and start standardization based on those needs
Phil: One of the reasons we
zeroed in on SMIL at the meeting last year, is that it's
something that could be done.
... Voice assistants are a competitive market, you'd need people
from the assistant vendors in the room
Kaz: Another possibility is an
Interest Group. There are use cases from various stakeholders,
relating to different technologies. We could bring those to a
new possible SMIL group, HTML, CSS, separately. Gap analysis
needs to be done first, then think about potential WGs after
we've done that.
... I'd like to see which level of improvement is needed, and which
process is most appropriate for those needs, as part of the
workshop
<kaz> scribenick: kaz
Chris: a couple of things
... W3C has the Web Speech API already
... revitalizing it might be part of this work
... speech synthesis and recognition
... another aspect is user privacy
... voice recording, etc.
<xfq> https://w3c.github.io/web-roadmaps/mobile/userinput.html#exploratory-work
<ddahl> the Web Speech API is only implemented in Chrome as far as I know
<ddahl> I'm not sure there are very many applications using it though
<jib> SpeechSynthesis appears to be in most browsers https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis
Chris: for us, the BBC, we would like to
see some standardization for multiple devices
... would see desire from vendors
<cpn> scribenick: cpn
Kaz: The Web Speech API is a CG
note, so updating it based on needs and use cases would be
useful
... Also interaction with interactive TV is important to
you?
Chris: Yes
Kaz: Privacy and interaction with cloud services. How to manage the whole sequence of devices and applications could be a possible topic
Michael: Regarding IoT, should coordinate with the WoT WG. There's a lot of overlap
Paul: I'm from the pronunciation
TF. I'd like to see the workshop address better handling of
accents. This may need configuration, constraint matching, to
handle non-native English accents. If language processing is
done in the cloud, phrase capture is brief. There should be
better device level control of compressing those clips: wait
for an entire instruction, or that the instruction is complete,
to help reduce user frustration
... Timing and accent for speech disabilities would be huge
improvements
Kaz: I'm also interested in those aspects, from my previous research
Paul: Not sure I could be on the programme committee, depends on level of commitment
Kaz: We can share the work among the committee members
Deborah: We have some big players
in the virtual agent space, but there are some open source
efforts, e.g., Almond from Stanford University
... An open source smart speaker, Mycroft. It has an
intelligent agent. There may be more that would be interested
in this activity
<paul_grenier> https://almond.stanford.edu/
<paul_grenier> https://mycroft.ai/
Kaz: Good point, we should invite them
Deborah: I could find contact information for them
<Zakim> phila_, you wanted to ask about timing
Phil: Do you have a schedule for when to run the workshop?
Kaz: I was planning to hold it
this year, but it would probably be sometime next year
... Based on this feedback, can update the workshop
proposal
... Early next year, possibly
Max: I would like to see something about content creation. UK Government would like to see its content made available to web and voice agents. People ask simple questions, e.g., how to renew my passport. We don't want to write the content twice.
<mhakkinen> +1 to Max's authoring once to delivering on web and voice
Max: We need to be accurate in both web and voice modalities.
Phil: One area for me is a
clinical setting. There, I want to be able to talk to a box of
medicine to ask the dosage for a patient. How would the
information be structured?
... An interaction with the physical object, then voice
interaction
Paul: From a developer
perspective, it seems that embedding metadata in the content
would be a way to get some of the smart speaker vendors on
board
... This gives author control, by adding metadata to the web
content, e.g., with RDFa
... That might be a way to bring them in, taking some existing
schema and applying it to this new purpose
Mark: From my organization's
perspective, we wanted to push the pronunciation work, as we
want correct spoken presentation across different voice
devices
... The APA website has an example of voice interaction with
pronunciation errors. If you mark up the content it can help,
but there's no standard way to do that yet
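[Editorial illustration, not part of the discussion: one existing way to carry a pronunciation hint is the `phoneme` element of SSML, a W3C markup for speech synthesis that is separate from HTML. The sketch below builds such a fragment with Python's standard library. It is only an assumption that hints of this kind are what was meant; the `build_ssml` helper and the sample sentence are hypothetical, and this is not the approach proposed by the pronunciation TF.]

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(before, word, ipa, after, lang="en-US"):
    """Wrap `word` in an SSML <phoneme> element carrying an IPA hint.

    `before` and `after` are the plain text surrounding the word.
    """
    speak = ET.Element(
        "speak",
        {"version": "1.1", "xmlns": SSML_NS, "xml:lang": lang},
    )
    speak.text = before
    phoneme = ET.SubElement(speak, "phoneme", {"alphabet": "ipa", "ph": ipa})
    phoneme.text = word      # the written form shown to readers
    phoneme.tail = after     # text that follows the hinted word
    return ET.tostring(speak, encoding="unicode")


# "bass" as in the musical instrument, as opposed to the fish.
ssml = build_ssml("She plays the ", "bass", "beɪs", " guitar.")
print(ssml)
```

[A hint like this works inside a speech synthesis pipeline that accepts SSML; how an author would attach the same information to ordinary web content is the gap described above.]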
Kaz: Does this work also include language translation, for education purposes, etc.?
Paul: Pronunciation hints are
embedded in HTML, but it's still for the developer to provide
translation. If a page is offered in French and English, the
French pronunciation hints would have to be provided for that
language pack
... For some pronunciations in some languages we have incomplete
voice packs
... There's more emphasis on the author in that model
... It may take a while to become automated, needs good ML
models
Kaz: It's a good use case. We need to look at the detailed scenario
Kaz: The next step should be to form the programme committee, then create the webpages with topics for discussion, and think
about scheduling
... Thanks to those of you who agreed to join the programme
committee
... Others, please let me know if you're interested
... I'll create a mailing list for programme committee
discussion
[adjourned]