See also: IRC log
Source code explanation
The input element with the speech attribute, and activation of translation
Satish: There is a TTS API
... which returns audio
Rahul: Is the microphone icon related to special attributes?
Bjorn: The API adds attributes to input
... speech boolean, continue boolean, grammar URL, maxresults for nbest, nospeechtimeout
... two events are added: onspeechchange and onspeecherror
Bjorn: two input events: stopSpeechInput, when the user stops speaking, and cancelSpeechInput, in case of an error
... at first it is press to talk
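A minimal markup sketch of the proposal as recorded above. The attribute and event names (speech, continue, grammar, maxresults, nospeechtimeout, onspeechchange, onspeecherror) are taken verbatim from these minutes, not from a final specification, and the handler bodies are invented for illustration:

```html
<!-- Hypothetical sketch: names come from the minutes above,
     not from a finished specification. -->
<input type="search" name="q"
       speech
       continue
       grammar="http://example.com/search.grxml"
       maxresults="5"
       nospeechtimeout="5000"
       onspeechchange="handleResult(event)"
       onspeecherror="handleError(event)">
<script>
  // The shape of the event object was not specified in the discussion;
  // these handlers only show where results and errors would be consumed.
  function handleResult(event) {
    // e.g. read the recognized utterance back from the input's value
    console.log(document.querySelector('input[name=q]').value);
  }
  function handleError(event) {
    console.log('speech input error');
  }
</script>
```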
Bjorn: You get events and you can build any implementation on top. You don't get a dialog.
Robert: There is an example at the end.
Olli: In which document is the microphone shown ...
Bjorn: You have to click button to start speech.
Olli: If you have two iframes with speech ..
Bjorn: What if you have multiple forms on the same page?
MikeB: You may activate different speech inputs.
Robert: You have a text box; why not define a separate button to start speech?
Satish: We had a discussion earlier; we're open to exploring alternatives, but this is clear
... same paradigm
Robert: You are speaking to a page, so speech is a different paradigm. You might not be using a text input.
Bjorn: Open to other options
Rahul: Why is the type search?
Bjorn: It is a new HTML5 thing
... for instance, type equal to search or text or email might have a different SLM
Dan: It might be possible to specify an ontology of semantic types
Robert: You have patterns like dates, etc.; one could standardize them.
Bjorn: It might be grammars or an SLM as a builtin. It makes sense
Satish: For text input you can constrain recognition or do free-form recognition.
MikeB: You might have a pattern for builtins
... in HTML5 they specify types, some very precise at the word level, others generic
Olli: About continue, how does that work? There are privacy issues
Satish: The user can stop recording.
Bjorn: These issues are not solved
MikeB: About spoofing ...
Bjorn: this proposal doesn't solve all use cases, but many.
Debbie: Can it do barge-in?
Bjorn: There is no TTS at all; it is press to talk.
DaveB: Might add an event when pressing the button.
Milan: and barge-in false?
DaveB: You can disable the press button
Debbie: Is it only press to talk?
Bjorn: It is press to talk only the first time; with continue on, it will keep going.
... There could be not only the button but also a start event; that would be an extension.
... One problem is that it requires a UI element.
Debbie: Stop TTS by typing, instead of speaking.
MikeB: You can stop by touch, by typing, and also with speech
DaveB: It is annoying to have audio
Dan: Recognition result should be
based upon a standard such as EMMA
... Are there concerns with EMMA?
DaveB: EMMA might bring a lot, but injecting XML into the DOM is unnecessary code
... Prefers JSON
Robert: EMMA is extensible
... in the XML sense
MikeB: We have a desire to use EMMA and to make life easier for Web developers.
... Some might be JSON or HTTPrequest (?) or there might be multiple ways to discuss it.
Bjorn: Seems more complicated to analyse
DaveB: It is an interchange format
Dan: EMMA is a format for representing input.
Bjorn: It should be easy to access from the web application.
... this should be easy; other things must be possible
... We won't agree on the full set, but on some.
DanB: Replace 27c and d
... access specific to the recognizer
Olli: Argues for avoiding having to know the recognizer
DanB: reach consensus on 27 first
MikeB: I'd keep 27c as EMMA and 27d as an easy-to-process format
Satish: might not agree on a format today
Debbie: keep EMMA is fine
Bjorn: I'm fine with allowing it, but not with requiring it.
DanB: The recognition results must include information that will be available via EMMA
Bjorn: 27c It should be possible for the web application to get the recognition results in a standard format such as EMMA
... 27d It should be easy for the web application to get access to the most common pieces of recognition results such as utterance, confidence, nbest
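To make 27c and 27d concrete, here is a sketch of an EMMA result carrying the common pieces named above. The emma:* element and attribute names follow the EMMA 1.0 specification; the query content and the values are invented for illustration:

```xml
<emma:emma version="1.0"
           xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- n-best list: each interpretation carries its own confidence -->
  <emma:one-of id="nbest">
    <emma:interpretation id="int1"
                         emma:confidence="0.75"
                         emma:tokens="flights to boston">
      <query>flights to boston</query>
    </emma:interpretation>
    <emma:interpretation id="int2"
                         emma:confidence="0.20"
                         emma:tokens="flights to austin">
      <query>flights to austin</query>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
```

27d then asks that a web application be able to read the utterance, its confidence, and the n-best entries directly, without having to parse this XML itself.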
DanB: Even in VXML, you can get EMMA results, but we still like to have shadow variables for normal uses.
Bjorn: There should be a standard
way to get that ...
... Whether complex stuff should be in EMMA
DanB: We have agreement on R27.
... R31: End users, not web application authors, should be the ones to select speech recognition resources
... Discussion might be open on who ...
Bjorn: The authors should not be forced to choose
... in the email discussion for R31, it might make sense that the browser specifies the default one,
... on top of that we may or may not allow specifying another one.
DanB: There are two different readings: (a) it might be possible for the browser to specify a default, (b) the browser must specify a default
MikeB: R1, R15, R16, R22 are all correlated
Bjorn: The browser must have one; the author can give a hint, or the browser must implement it
MikeB: Calling it a hint implies ...
Bjorn: The browser gives the audio to the author, who does whatever he wants.
Robert: You won't be able to build good applications.
DanB: I prefer to start with the part we agree on:
... We expect the browser to have a recognizer to use
Bjorn: like 15
Paolo: This implies you can change it if you want
Bjorn: that is another requirement
... It might be reworded as: R15 The browser must supply a speech service that web applications can use
DanB: Agreement reached
MikeB: So far there is agreement, but not sure about the others
DanB: R16 agreement on wording?
Bjorn: what does it mean "to exclude"?
DanB: That the audio is accessible to the author and the results are treated the same.
MikeB: One option is to run another one, another is to select from a list.
Olli: ??
Robert: You might not trust sending a piece of voice; the application could misuse it; a third issue is that big data is implied
Bjorn: The example of CSS: a default, specified by the author, and by the user
MikeB: In my experience with large grammars, you might have different grammars for different accents and tune the recognizer
DanB: Some authors don't care about performance, but others care very much about the experience they present.
... They need to customize the customer experience regardless of the browser in use.
Dave: Most web developers don't care.
Bjorn: The example of Google search and Bing.
DanB: Is it the user agent or the application developer?
Satish: We don't want a different experience on different applications
JimB: The user can state a different one and also make it mandatory, then the author can also modify it, ...
Bjorn: It is a variation of CSS
DanD: In the current model you don't need a plug-in; it becomes more obvious the power of more accurate ...
... why a local component talks to a remote one
Bjorn: SRGS is small, but SLMs are large and proprietary
<burn> what is currently being said (but need to confirm that we understand and/or agree):
<burn> 1. Browser must provide default 2. Web apps should be able to request speech service different from default 3. User agent (browser) can refuse to use requested speech service 4. If browser refuses, it must inform the web app 5. If browser uses speech services other than the default one, it must inform the user which one(s) it is using.
<clarifications>
Bjorn: Amazon example
Debbie: Asking for clarification on the browser refusing the user's selection
DaveB: In Chrome you can specify the search engine; similarly, the recognition client might be tested for interoperability
... It doesn't require a protocol
DanB: Going back: which requirements does this replace?
... The current wording replaces R15 (with new wording already captured);
Bjorn: Objects that R1 is inconsistent with (3)
... the browser can refuse to connect over the network, etc.
MikeB: 1-5 are agreed, but R1 is not covered completely.
DanB: Try to focus on substitutions:
... R16 covered by 1-5
... R31 covered
... R22 covered, but carry its content into 1-5
... Some of the content of the removed requirements should be captured in the new 1-5
... R1 should be re-phrased
... R15 (new) is a replacement of old R15
... Attempt to address R1, or at least agree on changes to it.
Bjorn: If the web application specifies a speech service, it should be possible to specify parameters.
MikeB: But you can also specify parameters for the default speech service.
Bjorn: other difference is "network recognizer"
Robert: We discussed on local / remote
MikeB: There were use cases to be captured; there are things that might also happen in the network case
<discussion about network recognizer>
Dave: Concerns about the need for the requirement
... it seems redundant
Bjorn: New requirements: Speech services that can be specified by web applications must include network speech services
... Remove R1 because the difference is captured by the last two new requirements
<End of meeting>
Thursday meeting at 8:30 in Level 2 - Saint Clair!