HTML Speech XG f2f meeting in Lyon

Google API demo

Source code explanation

The input element with an attribute, and activation of translation

Satish: There is a TTS API
... which returns audio

Rahul: Is the microphone picture related to special attributes?

Bjorn: The API adds attributes to the input element
... speech (boolean), continue (boolean), grammar (URL), maxresults (for n-best), nospeechtimeout
... Two events are added: onspeechchange and onspeecherror

Bjorn: Two input events: stopSpeechInput, when the user stops speaking, and cancelSpeechInput, in case of an error
... The first is for press-to-talk
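
To make the attribute list above concrete, here is a minimal TypeScript sketch of how those attributes might be modelled and rendered as markup. The attribute and event names follow the minutes as scribed. The `SpeechInputOptions` type, the `renderSpeechInput` and `escapeAttr` helpers, and the value formats are assumptions made for illustration, not part of the demoed API.

```typescript
// Hypothetical model of the attributes listed in the minutes.
// Names mirror the scribed list; the types are assumptions.
interface SpeechInputOptions {
  speech: boolean;          // enable speech input on the element
  continue?: boolean;       // keep recognizing after the first result
  grammar?: string;         // URL of a grammar
  maxresults?: number;      // size of the n-best list
  nospeechtimeout?: number; // unit is not stated in the minutes
  onspeechchange?: string;  // inline handler source
  onspeecherror?: string;   // inline handler source
}

// Escape a value so it can sit inside a double-quoted attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Hypothetical helper: build the <input> tag for a given type and options.
function renderSpeechInput(type: string, opts: SpeechInputOptions): string {
  const parts: string[] = [`type="${escapeAttr(type)}"`];
  if (opts.speech) parts.push("speech");
  if (opts.continue) parts.push("continue");
  if (opts.grammar !== undefined) {
    parts.push(`grammar="${escapeAttr(opts.grammar)}"`);
  }
  if (opts.maxresults !== undefined) {
    parts.push(`maxresults="${opts.maxresults}"`);
  }
  if (opts.nospeechtimeout !== undefined) {
    parts.push(`nospeechtimeout="${opts.nospeechtimeout}"`);
  }
  if (opts.onspeechchange !== undefined) {
    parts.push(`onspeechchange="${escapeAttr(opts.onspeechchange)}"`);
  }
  if (opts.onspeecherror !== undefined) {
    parts.push(`onspeecherror="${escapeAttr(opts.onspeecherror)}"`);
  }
  return `<input ${parts.join(" ")}>`;
}

const markup = renderSpeechInput("search", {
  speech: true,
  maxresults: 5,
  onspeechchange: "handleSpeech(event)",
});
console.log(markup);
// <input type="search" speech maxresults="5" onspeechchange="handleSpeech(event)">
```

The handler name `handleSpeech` is a placeholder; the minutes do not say what the event object carries.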

Bjorn: You get events and you can do all implementations. You don't have a dialog

Robert: There is an example at the end.

Olli: In which document is the microphone ...

Bjorn: You have to click the button to start speech.

Olli: If you have two iFrames with speech ..

Bjorn: What if you have multiple forms in the same page?

MikeB: You may activate different speech.

Robert: You have a text box; why not define another button to start speech?

Satish: We had a discussion earlier; we are open to exploring different options, but this one is clear
... same paradigm

Robert: You are speaking to a page, so speech is a different paradigm. You might not be using a text input

Bjorn: Open to other options

Rahul: Why is the type search?

Bjorn: It is a new HTML5 thing
... for instance, type equal to speech, text, or email might each have a different SLM

Dan: It might be a way to specify an ontology of semantic types

Robert: You have patterns like dates, etc., that could be standardized.

Bjorn: It might be grammars or SLM as a builtin. It makes sense

Satish: With text input you can either constrain it or do free-form recognition.

MikeB: You might have a pattern for built-ins
... In HTML5 they specify types; some are very precise at the word level, others are generic

Olli: About continue, how does that work? There are privacy issues

Satish: The user can stop recording

Bjorn: These are not solved issues

MikeB: About spoofing ...

Bjorn: this proposal doesn't solve all use cases, but many.

Debbie: Can it do barge-in?

Bjorn: No TTS at all, it is a press to talk.

DaveB: Might add an event when pressing the button.

Milan: and barge-in false?

DaveB: You can disable the press button

Debbie: Is it only press to talk?

Bjorn: It is only the first time; after that, with continue on, it keeps going.
... There could be not only the button but also a start event. That would be an extension.
... One problem is that it requires a UI element.

Debbie: Stop TTS by typing, instead of speaking.

MikeB: You can stop by touch, by typing, and also by speech

DaveB: It is annoying to have audio

Requirements - R27 (cont'd)

Dan: Recognition result should be based upon a standard such as EMMA
... Are there concerns with EMMA?

DaveB: EMMA might bring a lot, but injecting XML into the DOM means unnecessary code
... Prefer JSON

Robert: EMMA is extensible
... in the XML sense

MikeB: We have a desire to use EMMA and to make life easier for Web developers.
... Some might be JSON or HTTPrequest (?), or there might be multiple ways to discuss it.

Bjorn: Seems more complicated to analyse

DaveB: It is an interchange format

Dan: EMMA is a format for representing input.

Bjorn: It should be easy to access from the web application.
... This should be easy; the other must be possible
... We won't agree on the full set, but on some.

DanB: Replace 27c and d
... access specific to recognizer

Olli: I would argue for avoiding the need to know the recognizer

DanB: reach consensus on 27 first

MikeB: I would keep 27c as EMMA and 27d as an easy-to-process format

Satish: We might not agree on a format today

Debbie: Keeping EMMA is fine

Bjorn: I'm fine to allow, but not to require it.

DanB: The recognition results must include information that will be available via EMMA

Bjorn: 27c It should be possible for the web application to get the recognition results in a standard format such as EMMA
... 27d It should be easy for web apps to get access to the most common pieces of recognition results such as utterance, confidence, n-best
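
One way to read 27c and 27d together is as a thin convenience layer over a richer result. The TypeScript sketch below assumes a hypothetical `RecognitionResult` shape. The field names `utterance`, `confidence`, `nbest`, and `emma`, the 0 to 1 confidence range, and the `topHypothesis` helper are illustrative only; the group did not agree on any concrete format.

```typescript
// One recognition alternative (the "common pieces" named in 27d).
interface Hypothesis {
  utterance: string;
  confidence: number; // assumed to lie in 0..1; the minutes give no range
}

// Hypothetical result object: easy fields for 27d, optional EMMA for 27c.
interface RecognitionResult {
  nbest: Hypothesis[]; // alternatives, not assumed to be sorted
  emma?: string;       // full result in a standard format, opaque here
}

// Return the alternative with the highest confidence, if any.
function topHypothesis(result: RecognitionResult): Hypothesis | undefined {
  let best: Hypothesis | undefined;
  for (const h of result.nbest) {
    if (best === undefined || h.confidence > best.confidence) {
      best = h;
    }
  }
  return best;
}

// Made-up sample data for illustration.
const sample: RecognitionResult = {
  nbest: [
    { utterance: "flights to Leon", confidence: 0.41 },
    { utterance: "flights to Lyon", confidence: 0.82 },
  ],
};

const best = topHypothesis(sample);
console.log(best ? best.utterance : "(no result)");
// flights to Lyon
```

An application that needs more than these fields would parse the `emma` payload itself, which matches the "easy things easy, other things possible" position in the discussion.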

DanB: Even in VXML, you can get EMMA results, but we still like to have shadow variables for normal uses.

Bjorn: There should be a standard way to get that ...
... Whether complex stuff should be in EMMA

DanB: We have agreement on R27.
... R31. End users, not web application authors, should be the ones to select speech recognition resources
... Discussion might be open to who ...

Bjorn: The authors should not be forced to choose
... In the email discussion of R31, it might make sense that the browser specifies the default one,
... and on top of that we might or might not allow another one to be specified.

DanB: There are two different ways: (a) it might be possible for the browser to specify, (b) the browser must specify a default

MikeB: R1, R15, R16, R22 are all correlated

Bjorn: The browser must have one; the author can give a hint, or the browser must implement it

MikeB: Calling it a hint implies ...

Bjorn: The browser gives the audio to the author, who does whatever he wants.

Robert: You won't be able to do good applications.

DanB: I prefer to start on the agreement part:
... We expect the browser to have a recognizer to use

Bjorn: Like R15

Paolo: This implies you can change it if you want

Bjorn: That is another requirement
... It might be reworded as: R15 The browser must supply a speech service that web applications can use

DanB: Agreement reached

MikeB: So far there is agreement, but I am not sure about the others

DanB: R16 agreement on wording?

Bjorn: what does it mean "to exclude"?

DanB: That the audio is accessible to the author and results are treated the same.

MikeB: One is to run another or to select among a list.

Olli: ??

Robert: First, you might not trust sending a piece of voice; second, the application can misuse it; third, large amounts of data are involved

Bjorn: Example of CSS: default, specified by the author, and specified by the user

MikeB: In my experience with large grammars, you might have different grammars for different accents, and tune the recognizer

DanB: Some authors don't care about performance, but others care very much about the experience they present.
... They need to customize the customer experience regardless of the browser in use.

Dave: Most web developers don't care.

Bjorn: Example of Google search and bing.

DanB: Is it the user agent or the app developer?

Satish: We don't want a different experience in different applications

JimB: The user can state a different one and also make it mandatory, then the author can also modify it, ...

Bjorn: It is a variation of CSS

DanD: In the current model you don't need a plug-in, if it becomes more obvious the power for more accurate ...
... why the local component talks to a remote one

Bjorn: SRGS grammars are small, but SLMs are large and proprietary

<burn> what is currently being said (but need to confirm that we understand and/or agree):

<burn> 1. Browser must provide default
<burn> 2. Web apps should be able to request speech service different from default
<burn> 3. User agent (browser) can refuse to use requested speech service
<burn> 4. If browser refuses, it must inform the web app
<burn> 5. If browser uses speech services other than the default one, it must inform the user which one(s) it is using.
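
The five points above can be sketched as a small decision function. This is a minimal TypeScript illustration, not an agreed design. The `selectSpeechService` function, the `SelectionOutcome` type, the allow-list policy, and the example service names are all hypothetical. Falling back to the default after a refusal is also an assumption, since the minutes only say the web app must be informed.

```typescript
// Result of resolving which speech service a page gets.
interface SelectionOutcome {
  service: string;     // the service actually used
  refused: boolean;    // true if the requested service was refused (point 3)
  appNotice?: string;  // message for the web app (point 4)
  userNotice?: string; // message for the user (point 5)
}

function selectSpeechService(
  defaultService: string,                 // point 1: browser always has one
  requested: string | undefined,          // point 2: web app may ask for another
  isAllowed: (service: string) => boolean // browser policy, hypothetical
): SelectionOutcome {
  if (requested === undefined || requested === defaultService) {
    return { service: defaultService, refused: false };
  }
  if (!isAllowed(requested)) {
    // Points 3 and 4: refuse and tell the web app.
    // Assumption: the browser then falls back to its default.
    return {
      service: defaultService,
      refused: true,
      appNotice: `Request for ${requested} was refused`,
    };
  }
  // Point 5: a non-default service is in use, so tell the user.
  return {
    service: requested,
    refused: false,
    userNotice: `Using speech service ${requested}`,
  };
}

// Hypothetical policy: only services on this list are accepted.
const allowList: string[] = ["https://speech.example.org/asr"];
const policy = (s: string): boolean => allowList.indexOf(s) !== -1;

const accepted = selectSpeechService(
  "browser-default", "https://speech.example.org/asr", policy);
const denied = selectSpeechService(
  "browser-default", "https://other.example.net/asr", policy);

console.log(accepted.service); // https://speech.example.org/asr
console.log(denied.service);   // browser-default
```

How the notices reach the page and the user (an event, an error code, browser chrome) is left open, as it was in the meeting.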

Bjorn: Amazon example

Debbie: Asking for clarification on the browser refusing the user's selection

DaveB: In Chrome you can specify the search engine; similarly, the recognition client might be tested for interoperability
... It doesn't require a protocol

DanB: Go back. Which requirements does this replace?
... Current wording replaces: R15 (with new wording already captured);

Bjorn: Objects that R1 is inconsistent with (3)
... the browser can refuse to connect on network, etc

MikeB: 1-5 are agreed, but R1 is not covered completely.

DanB: Try to focus on substitutions:
... R16 covered by 1-5
... R31 covered
... R22 covered, but take content for 1-5
... Some of the content of the removed requirements should be captured in the new 1-5
... R1 should be re-phrased
... R15 (new) is replacement of old R15
... Attempt to address R1, at least agree on changes to it.

Bjorn: If web apps specify speech services, it should be possible to specify parameters.

MikeB: But you can also specify parameters for the default speech service.

Bjorn: The other difference is "network recognizer"

Robert: We discussed local vs. remote

MikeB: There were use cases to be captured; there are things that might also happen in the network case

Dave: Concerns about the need for the requirement
... seems to be redundant

Bjorn: New reqs: Speech services that can be specified by web apps must include network speech services
... Remove R1 because the difference is captured by the last two new requirements

Thursday meeting at 8:30 in Level 2 - Saint Clair!

- DRAFT -

HTML Speech XG f2f meeting in Lyon - Day1
02 Nov 2010

