<burn> trackbot, start telcon
<trackbot> Date: 01 September 2011
<burn> aaaa is Glen_Shires
<burn> Scribe: Glen_Shires
<burn> ScribeNick: glen
<burn> Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0038.html
http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110629.html#topics
burn: other APIs cover this
... in our charter: record audio that's recognized; but it's fine if we specify that we don't capture audio without recognition
milan: so could recognize & capture audio, and ignore the reco results -- but this may require a license
debbie: design decision 85: already decided we don't just capture audio
<Zakim> ddahl, you wanted to point out that we already have DD 85
milan: we may not need an explicit API, but some things may preload implicitly; I'm not sure exactly how this would be implemented
... I believe preload is sometimes necessary, and is in scope, and we need a notification when it completes
olli: agree
burn: in voicexml, the author can provide a hint that a grammar is unlikely to change before being used (the platform may use or ignore this hint)
olli: need to know when loading is complete, so recognition button does something [quickly]
burn: but that's in direct conflict with the "hint" concept. An author who has a changing grammar would prefer that the most up-to-date grammar always be used, even if it adds a time delay.
olli: use cases for both
burn: I agree. Do we need anything explicitly in the API for this? This is not an optimization, it's a user-affecting behavior that author may wish to specify
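A minimal sketch of the two behaviors contrasted here, assuming a synchronous fetcher for simplicity. The policy names are borrowed from VoiceXML's `fetchhint` values (`prefetch`, `safe`); `GrammarStore`, `preload` and `get` are hypothetical and not part of any draft API.

```typescript
// Hypothetical per-grammar fetch policy, loosely modeled on VoiceXML fetchhint:
// "prefetch" = may be loaded early and reused; "safe" = fetch when needed.
type FetchPolicy = "prefetch" | "safe";

interface Grammar {
  src: string;
  version: number; // stand-in for whatever the server currently serves
}

// Hypothetical fetcher: returns the grammar the server has right now.
type Fetcher = (src: string) => Grammar;

class GrammarStore {
  private cache = new Map<string, Grammar>();
  private fetcher: Fetcher;

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  // Called at page load: only "prefetch" grammars are loaded early.
  preload(src: string, policy: FetchPolicy): void {
    if (policy === "prefetch") {
      this.cache.set(src, this.fetcher(src));
    }
  }

  // Called when recognition starts.
  get(src: string, policy: FetchPolicy): Grammar {
    const cached = this.cache.get(src);
    if (policy === "prefetch" && cached !== undefined) {
      return cached; // fast, but possibly stale
    }
    const fresh = this.fetcher(src); // slower, but always up to date
    this.cache.set(src, fresh);
    return fresh;
  }
}
```

The trade-off is explicit: a "prefetch" grammar answers immediately but may be out of date, while a "safe" grammar pays the fetch delay every time.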
milan: I don't think we need it, but if others feel strongly, I don't object
olli: from an API perspective, this means an event back to indicate that preloading is complete
burn: summarizing, the author may want to know that all grammars are loaded before displaying a "recognize" button. So the author may need to request preloading and get a notification back.
michael: I understand, but think of it more as "prepare grammars" rather than "preload"
burn: so if we get an event back, the author can determine how to handle the event
... so the API must support the author requesting "prepare grammars" and getting an event back indicating completion
... we agree with this as a design decision
... does it apply to anything besides grammars?
... voicexml has TTS fetch audio (in some cases, it may not be available)
... pre-recorded or streamed
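The "prepare grammars" decision can be sketched as a request plus a completion event, assuming the platform reports each grammar as it finishes loading. `RecognizerSketch`, `prepareGrammars`, `grammarLoaded` and `onPrepared` are hypothetical names, not the group's API.

```typescript
// Hypothetical sketch: the author asks for grammars to be prepared and is
// notified once every requested grammar has finished loading.
type Listener = () => void;

class RecognizerSketch {
  private pending = new Set<string>();
  private listeners: Listener[] = [];
  ready = false;

  onPrepared(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Author requests preparation of a set of grammar URIs.
  prepareGrammars(srcs: string[]): void {
    this.ready = false;
    for (const src of srcs) this.pending.add(src);
    // If nothing is pending, report readiness immediately.
    if (this.pending.size === 0) this.firePrepared();
  }

  // Would be called by the platform when one grammar finishes loading.
  grammarLoaded(src: string): void {
    if (!this.pending.delete(src)) return; // not a grammar we were waiting for
    if (this.pending.size === 0) this.firePrepared();
  }

  private firePrepared(): void {
    this.ready = true;
    for (const listener of this.listeners) listener();
  }
}
```

An author would keep the "recognize" button disabled until the `onPrepared` callback fires.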
glen: could have different voices or languages to preload
burn: yes, but seems different to me, they don't change as dynamically as grammars
olli: author needs to know everything (system) is loaded before initially beginning
burn: comparable to streaming video or audio - buttons for playing are ghosted out until stream/resource is ready
charles: recognizer may be local and may need to wait for models/etc to load
burn: practically, I'm trying to understand what differs here from voicexml
... local vs. server is not clear-cut: sometimes files are in mixed locations
michael: we are all remote
milan: nuance all local
burn: I'm swayed less by infrastructure details than by user-affecting details
... the tradition in the graphical web world is that buttons are only visible when the corresponding resources are available
michael: the user agent could buffer if reco/grammars/etc. are not ready
burn: what about TTS
olli: what if the server is down?
michael: sometimes the web interface doesn't do that; instead you click "play" and wait for it to download, or find that it's not available
burn: agree, users are accustomed to audio not playing immediately
... it's a significant task to know that everything is completely ready: grammars, recognizer, audio files, etc.
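Knowing that "everything" is ready amounts to aggregating per-resource states. A minimal sketch, assuming each resource (grammar, recognizer, prompt audio) reports `loading`, `ready` or `failed`; `ReadinessTracker` and the resource names are illustrative only. A failed resource (for example, the server being down) makes the whole set fail instead of waiting indefinitely.

```typescript
type ResourceState = "loading" | "ready" | "failed";

// Hypothetical aggregate readiness check across grammars, recognizer and audio.
class ReadinessTracker {
  private states: { [name: string]: ResourceState } = {};

  set(name: string, state: ResourceState): void {
    this.states[name] = state;
  }

  // "failed" if any resource failed, "loading" if any is still loading,
  // otherwise "ready" (an empty tracker counts as ready).
  overall(): ResourceState {
    const all = Object.keys(this.states).map((k) => this.states[k]);
    if (all.some((s) => s === "failed")) return "failed";
    if (all.some((s) => s === "loading")) return "loading";
    return "ready";
  }
}
```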
olli: do we have an event for recognizer starting?
burn: olli, is there a way today in HTML to know if an audio file exists?
michael: it's only a hint, may be wrong
<burn> in answer to olli's question about recognizer starting, Charles said yes
michael: in HTML5 there is a buffered attribute you can query to see how much is in the buffer, but you can't tell it to buffer, and user agents can discard the buffer, so it's all heuristics
... not enforced
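The heuristic nature of `buffered` can be illustrated with a small helper. `HTMLMediaElement.buffered` returns a `TimeRanges` object exposing `length`, `start(i)` and `end(i)`; the `TimeRangesLike` interface below mirrors that shape so the sketch runs outside a browser, and `makeRanges` is a hypothetical test helper. The result is only a snapshot: the user agent may discard buffered data afterwards, and `preload` remains a hint.

```typescript
// Structural type matching the shape of the DOM TimeRanges object.
interface TimeRangesLike {
  readonly length: number;
  start(index: number): number;
  end(index: number): number;
}

// Seconds buffered contiguously from `currentTime` onward (0 if not buffered).
function bufferedAhead(ranges: TimeRangesLike, currentTime: number): number {
  for (let i = 0; i < ranges.length; i++) {
    if (ranges.start(i) <= currentTime && currentTime <= ranges.end(i)) {
      return ranges.end(i) - currentTime;
    }
  }
  return 0;
}

// Builds a TimeRangesLike from [start, end] pairs, for illustration.
function makeRanges(pairs: Array<[number, number]>): TimeRangesLike {
  return {
    length: pairs.length,
    start: (i) => pairs[i][0],
    end: (i) => pairs[i][1],
  };
}
```

In a browser, the same function could be called with `audio.buffered` and `audio.currentTime`.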
burn: trying to remember: do we have a way to specify playing an audio file? How close is our current spec to HTML?
michael: I think close, because we inherit from media.
burn: Is there any need (DD)
<mbodell> we inherit from HTMLMediaElement which has the attributes
michael: properties like preload and buffered are useful for synthesis to inherit
burn: any other resources to preload, or any general statement on preloading?
... in VoiceXML, there is first-call behavior: the first time a page is called, it may not be ready, but assuming things are not changing dramatically, everything is loaded the second time.
... We (Voxeo) and other vendors recommend that customers "run once" (automated or not) so that everything gets loaded on that first call.
... Web browsers are different, but, for example, at a conference you might preload videos so they play quickly (e.g., start playing and then pause).
... I don't know of any equivalent for having a recognizer be ready.
... I'm not proposing any particular solution here. Does anyone want to add anything else?
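The "run once" practice amounts to warming a cache before the first real request. A minimal sketch, assuming a synchronous loader; `WarmableLoader` and `warmUp` are hypothetical names, and no web API for this was proposed in the discussion.

```typescript
// Hypothetical memoizing loader: the first request for a key pays the loading
// cost; warmUp() pays it ahead of time, like an automated "run once".
class WarmableLoader<T> {
  private cache = new Map<string, T>();
  private load: (key: string) => T;
  loadCount = 0;

  constructor(load: (key: string) => T) {
    this.load = load;
  }

  get(key: string): { value: T; wasCached: boolean } {
    if (this.cache.has(key)) {
      return { value: this.cache.get(key) as T, wasCached: true };
    }
    const value = this.load(key);
    this.loadCount++;
    this.cache.set(key, value);
    return { value, wasCached: false };
  }

  // Touch every resource up front so the first real request is a cache hit.
  warmUp(keys: string[]): void {
    for (const key of keys) this.get(key);
  }
}
```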
michael: grammars are the most expensive thing related to recognizers. Input is more forgiving than output because input can be buffered and the recognizer can then catch up.
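The point that input can be buffered while the recognizer catches up can be sketched as a queue that is flushed in order once the recognizer is ready. `BufferedInput` and the use of `Float32Array` audio chunks are assumptions for illustration.

```typescript
// Hypothetical sketch: audio captured before the recognizer is ready is queued,
// then delivered in order once it becomes ready.
type AudioChunk = Float32Array;

class BufferedInput {
  private queue: AudioChunk[] = [];
  private ready = false;
  private consume: (chunk: AudioChunk) => void;

  constructor(consume: (chunk: AudioChunk) => void) {
    this.consume = consume;
  }

  push(chunk: AudioChunk): void {
    if (this.ready) this.consume(chunk);
    else this.queue.push(chunk); // hold audio until the recognizer can take it
  }

  // Flush everything captured so far, then pass new audio straight through.
  markReady(): void {
    this.ready = true;
    for (const chunk of this.queue) this.consume(chunk);
    this.queue = [];
  }
}
```

Output has no such slack: synthesized audio that is not ready cannot be played early, which is why the discussion treats input as more forgiving.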
burn: so it's a performance issue, not a UI issue.
burn: DD 74
... the replace mechanism is not for user feedback, but rather goes from server to client
milan: a final result is final - nobody was motivated to spec all this out in protocol discussions
burn: how motivated is group to define a feedback mechanism?
michael: reco correcting itself
burn: to me, reco correcting itself is feedforward. I'm asking if we need a way for client to inform server that something was wrong.
milan: could also be done as vendor params.
michael: if we can standardize it, that makes sense. Google proposed it and Microsoft is interested.
milan: needs to be a hint to recognizer, not a requirement for recognizer to do anything
michael: agree
... won't require changing recognizer results
burn: final means final
milan: final unless we have this feedback - but I'm reluctant to open this can of worms
burn: what if the recognizer has not reached a final state but the client provides feedback? As long as the recognizer has not made the result final, it can change.
milan: not common case, users can't change that fast.
michael: not necessarily
... it's a hint; the recognizer can do with it what it needs to.
burn: client to recognizer feedback mechanism is a hint -- recognizer can do with hint whatever it needs to. Final is still final, so can't change past finalized results.
glen: agree, a hint for recognizer
milan: agree, a hint
burn: DD: there must be a way for the client to send feedback about a recognition to the recognizer, even while recognition is ongoing
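The agreed semantics (feedback is only a hint; final is final) can be modeled as follows. This is a sketch under the assumption that results are indexed and flagged final; `FeedbackAwareRecognizer`, `update` and `sendFeedback` are hypothetical, and the feedback payload is just an opaque string here.

```typescript
interface RecoResult {
  text: string;
  final: boolean;
}

// Hypothetical model: the client may send feedback at any time, the recognizer
// may or may not use it, and finalized results are never changed.
class FeedbackAwareRecognizer {
  results: RecoResult[] = [];
  private hints: string[] = [];

  // Recognizer side: emit or revise a result; finalized results are immutable.
  update(index: number, text: string, final: boolean): void {
    const existing = this.results[index];
    if (existing !== undefined && existing.final) return; // final means final
    this.results[index] = { text, final };
  }

  // Client side: feedback is only recorded; it never rewrites a result directly.
  sendFeedback(hint: string): void {
    this.hints.push(hint);
  }

  pendingHints(): readonly string[] {
    return this.hints;
  }
}
```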
<mbodell> http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html#continuous2
burn: also, I believe we agree that there is a point at which a result is final and can't be changed. I'm trying to find the DD for that.
michael: I don't think there was a DD on that. As long as continuous reco is ongoing, results can change.
milan: but sending only interim results requires returning longer and longer results for a long continuous recognition.
glen: it could be implemented so that interim results are "semi-final", and thus the entire result doesn't have to be re-sent each time, but they are still not "final", so they can change if necessary.
... so the question here is whether we want to add this complexity to the spec.
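The "semi-final" idea could use an incremental message that keeps a prefix of what was already sent and replaces only the tail. The `InterimUpdate` format and `applyUpdate` function are hypothetical; the group made no decision on this.

```typescript
// Hypothetical incremental update: keep the first `keep` words already received
// and replace everything after them with `tail`. Nothing is final, so a later
// update may use a smaller `keep` to revise earlier words.
interface InterimUpdate {
  keep: number;
  tail: string[];
}

function applyUpdate(current: string[], update: InterimUpdate): string[] {
  if (update.keep < 0 || update.keep > current.length) {
    throw new RangeError("keep is out of range");
  }
  return current.slice(0, update.keep).concat(update.tail);
}
```

Each message stays short even during long continuous recognition, yet earlier words can still be revised.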
michael: agree, we did discuss it, but did not make a decision on this at the face-to-face.
burn: we need to discuss this further on mailing list or in future call.
burn: I spoke with Coralie Mercier and we're set to go. It's not clear what we are using the TPAC discussion for. The charter is officially extended to the end of November.
... However, the expectation is that the group will wrap up its work before TPAC and publish right after TPAC.
... technical discussions in September and October, plus editorial work and wrap-up. Publish right after TPAC; we can publish before the end of November.
... I submitted a paragraph on our accomplishments: DDs, web API, HTML extensions and protocol; we plan to complete and wrap up in a report.
... she is expecting to publish this paragraph this week. She reassured us that our charter is intact and this is a formality.
michael: we discussed and decided to make not errors
burn: let's capture as DD if we don't have one...which we apparently don't. So we'll record this as DD.
michael: we have the ability in the API to add weights, but haven't defined what they mean
burn: can anyone propose something?
milan: in voicexml, this is vendor specific
burn: I'm fine with not defining
<mbodell> A weight is nominally a multiplying factor in the likelihood domain of a speech recognition search. A weight of "1.0" is equivalent to providing no weight at all. A weight greater than "1.0" positively biases the grammar and a weight less than "1.0" negatively biases the grammar. If unspecified, the default weight for any grammar is "1.0". If no weight is specified for any grammar element then all grammars are equally likely.
<mbodell> Effective weights are usually obtained by study of real speech and textual data on a particular platform. Furthermore, a grammar weight is platform specific. Note that different ASR engines may treat the same weight value differently. Therefore, the weight value that works well on particular platform may generate different results on other platforms.
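Taking the quoted VoiceXML wording literally, a weight multiplies a hypothesis likelihood and defaults to 1.0. The sketch below applies that rule to pick a best hypothesis; `Hypothesis`, `pickBest` and the likelihood values are illustrative, and, as the quote notes, real engines treat weights in platform-specific ways.

```typescript
interface Hypothesis {
  grammar: string;
  text: string;
  likelihood: number; // illustrative, engine-specific score
}

// Weight as a multiplying factor on likelihood; missing weights default to 1.0.
function pickBest(
  hyps: Hypothesis[],
  weights: { [grammar: string]: number }
): Hypothesis | undefined {
  let best: Hypothesis | undefined;
  let bestScore = -Infinity;
  for (const h of hyps) {
    const w = weights[h.grammar] !== undefined ? weights[h.grammar] : 1.0;
    const score = h.likelihood * w;
    if (score > bestScore) {
      best = h;
      bestScore = score;
    }
  }
  return best;
}
```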
debbie: API section 7.1 says ... "relative to", but it's hard to interpret what that means
<mbodell> The posted text was VXML
<mbodell> the next text is from our current api spec, 7.1 that Debbie mentioned
<mbodell> This method adds a grammar to the set of active grammars. The URI for the grammar is specified by the src parameter, which represents the URI for the grammar. If the weight parameter is present it represents this grammar's weight relative to the other grammar. If the weight parameter is not present, the default value of 1.0 is used. If the modal parameter is set to true, then all other already active grammars are disabled. If the modal parameter is not pr
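The `addGrammar` behavior quoted from section 7.1 (default weight 1.0; `modal` set to true disables the other active grammars) can be sketched as below. `GrammarSet` and `ActiveGrammar` are hypothetical; because the quoted text about the absent-`modal` case is cut off, `modal` is a required argument here rather than a guessed default.

```typescript
interface ActiveGrammar {
  src: string;
  weight: number;
  enabled: boolean;
}

// Hypothetical model of the quoted addGrammar semantics.
class GrammarSet {
  grammars: ActiveGrammar[] = [];

  addGrammar(src: string, weight: number | undefined, modal: boolean): void {
    if (modal) {
      // A modal grammar disables all already active grammars.
      for (const g of this.grammars) g.enabled = false;
    }
    this.grammars.push({ src, weight: weight === undefined ? 1.0 : weight, enabled: true });
  }

  active(): ActiveGrammar[] {
    return this.grammars.filter((g) => g.enabled);
  }
}
```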
burn: let's distinguish between general statements about weights and weights relative to each other. We've always agreed that larger means greater weight, but we've never stated what the values mean.
... not probabilities.
michael: yes, 2 is not necessarily twice as much as 1
<Charles> SRGS weight discussion: http://www.w3.org/TR/speech-grammar/#S2.4.1
burn: two grammars of weight X both have the same weighting, whatever that means
... if grammar A has weight X and grammar B has weight Y, and X > Y, then grammar A has greater weight than grammar B
<mbodell> I'm not sure whether, for X > Y, we want "greater than" versus "greater than or equal to"
michael: should that be greater than, or greater than or equal to. Might be a step function. 1.8 and 1.9 might be treated as the same. Equal or Greater (but not less).
burn: "monotonically non-decreasing" is how we described it
michael: yes
burn: in the SSML sense
... (I don't know that SRGS says that)
michael: yes, SRGS only says positively and negatively biasing
burn: DD: "monotonically non-decreasing"
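The "monotonically non-decreasing" decision constrains how an engine maps weights to bias without fixing the values. A hypothetical check over sampled weights; `BiasMapping`, `isMonotonicNonDecreasing` and the rounding `stepped` mapping are illustrative. Sampling can reveal a violation but cannot prove monotonicity in general.

```typescript
// Hypothetical mapping from a grammar weight to an engine's effective bias.
type BiasMapping = (weight: number) => number;

// True if a larger weight never yields a smaller bias over the sampled weights.
// Equal outputs for different weights (a step function) are allowed.
function isMonotonicNonDecreasing(f: BiasMapping, samples: number[]): boolean {
  const sorted = samples.slice().sort((a, b) => a - b);
  for (let i = 1; i < sorted.length; i++) {
    if (f(sorted[i]) < f(sorted[i - 1])) return false;
  }
  return true;
}

// Example step function: rounds weights down to the nearest 0.5,
// so 1.8 and 1.9 are treated the same.
const stepped: BiasMapping = (w) => Math.floor(w * 2) / 2;
```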
... we're out of time. Thanks, bye
Present: Dan_Burnett, Olli_Pettay, Milan_Young, Debbie_Dahl, Glen_Shires, Dan_Druta, Charles_Hemphill, Michael_Bodell
Regrets: Robert_Brown