Voice Agents

Facilitator: Kazuyuki Ashimura

During one of the breakout sessions at TPAC 2019 in Fukuoka, there was discussion about the need for improved voice agents for web services. We'd like to proceed with the preparation for the expected W3C Workshop on User-friendly Smart Agents on the Web.

Minutes (including discussions that were not audio-recorded)

Transcript

Starting with the general background.

Yes so there's so many technologies, (indistinct) advancement, and also speech technologies in general.

Apple Siri, Google Voice Agent, Amazon Alexa, and so on where they're available.

And the voice agent is becoming one of the essential applications these days.

And I can, or you can use the voice technology for TV sets or audio systems or some kind of a kiosk kind of services.

And then on the other hand, during the week, I can make the font a bit bigger, for accessibility.

So one of the breakout sessions at TPAC 2019 in Fukuoka last year.

Yeah, there was discussion about the potential need for improving voice agents and voice technology in general.

Specifically for web services and I believe there's so many viewpoints and expected use cases for this purpose.

And so we should focus on several points.

For example, interaction with smart devices, control from web browsers, interoperability and access to controls for accessibility and usability, and smart navigation.

And probably multiple vendor integration is one of the key points here.

And then discuss what is missing and what is really needed.

From the global viewpoints.

And the possible topics should include.

Actually there have been several comments so far and I've included all the comments already and the possible topics.

Just as an example, this includes a summary of the current status.

What we already have.

And common issues, or the identified missing features, and then the needs of users and developers.

The two viewpoints here are very important, I think.

And smarter interaction for easier use cases.

And for example, short and clear commands using mixed input including voice and data modalities like handwriting, or gesture and a smart dialog model between human and the system.

And also adjusting the system behavior based on the user's responses.

Real time.

And then improved pronunciation for speech synthesis in various languages, not only English, but also, yeah, various languages.

And there's input from the accessibility group as well.

And then applying the advanced voice technology for web services, like speech style, expression, feeling, emotion and so on.

And then dealing with both the input entities and output entities supposedly devices and the possible applications, from various vendors.

So multi-vendor mash-ups are one of the key points as well.

And then presentation issues such as how, what and when to transfer necessary information from input entities and present it to the output entities.

So the timing and also how, and when, and using which modality.

And then integration on multiple interchangeable modalities.

Handwriting, typing, and voice, and vice versa.

And then another possible topic, and also a possible session, could be underlying technologies.

Like a smart dialog management model from the research viewpoint, and applications, services, and also devices from multiple vendors for multi-vendor integration.

And then unified data format and the protocol for data transfer.

And then state transition management for service lifecycle management.

And also natural language processing and the resources for that purpose, like phonetic databases, parallel corpora, and so on.

And possible improved model and architecture for voice interaction and expected technology.

And we need to remember some more horizontal platform items.

So discovery of resources, privacy, security, accessibility and usability, and also internationalization and compatibility with region-specific technology.

And then examples of related use cases should include voice agents, connected cars, smart homes, smart factories, and smart cities; ordinary smart speakers and smartphones should also be included, and IoT services in general.

Just for information or example, I put hybrid TV integration in Japan, but this is not the one specific example, but just one example.

And another example could be the user saying something to the TV set, for example, "Play something on the TV."

And the voice assistant will say, "Here is what I found."

And this kind of human-like behavior, something quite similar to a human, is very important, and it should be useful for actual interaction between the user and the system.

And who should attend.

This section should be improved and added to.

So various users, developers, and even governments might be involved.

That is the current situation so far, it's just 30 minutes.

And I'd like to hear from you all about the next steps, including who is interested in joining the program committee for the expected workshop.
