15:50:35 RRSAgent has joined #htmlspeech
15:50:36 logging to http://www.w3.org/2011/09/01-htmlspeech-irc
15:50:46 Zakim has joined #htmlspeech
15:51:00 trackbot, start telcon
15:51:02 RRSAgent, make logs public
15:51:04 Zakim, this will be
15:51:04 I don't understand 'this will be', trackbot
15:51:05 Meeting: HTML Speech Incubator Group Teleconference
15:51:05 Date: 01 September 2011
15:51:07 zakim, this will be htmlspeech
15:51:07 ok, burn; I see INC_(HTMLSPEECH)11:30AM scheduled to start 21 minutes ago
15:51:18 Chair: Dan_Burnett
15:51:35 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0038.html
15:51:58 burn has changed the topic to: Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0038.html (burn)
15:54:39 INC_(HTMLSPEECH)11:30AM has now started
15:54:46 +Dan_Burnett
15:57:06 +??P10
15:57:28 smaug has joined #htmlspeech
15:57:33 zakim, ??P10 is Olli_Pettay
15:57:33 +Olli_Pettay; got it
15:59:42 glen has joined #htmlspeech
16:00:09 +Milan_Young
16:00:13 +Debbie_Dahl
16:00:34 ddahl has joined #htmlspeech
16:00:48 Zakim, nick smaug is Olli_Pettay
16:00:49 Milan has joined #HTMLSpeech
16:00:58 ok, smaug, I now associate you with Olli_Pettay
16:01:00 + +1.408.359.aaaa
16:01:13 aaaa is Glen_Shires
16:01:23 zakim, aaaa is Glen_Shires
16:01:23 +Glen_Shires; got it
16:01:32 + +1.425.580.aabb
16:01:51 zakim, aabb is Dan_Druta
16:01:51 +Dan_Druta; got it
16:02:38 DanD has joined #htmlspeech
16:04:04 Scribe: Glen_Shires
16:04:12 ScribeNick: glen
16:04:43 Charles has joined #htmlspeech
16:05:55 +Charles_Hemphill
16:08:14 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0038.html
16:09:48 Topic: Topics remaining to be discussed
16:09:59 http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110629.html#topics
16:10:13 Topic: Is audio recording without recognition a scenario to support?
16:11:18 burn: other APIs cover this
16:11:56 q+ to point out that we already have DD 85
16:12:16 burn: in our charter: record audio that's recognized; but fine if we specify that we don't specifically capture audio without recording
16:12:47 milan: so could recognize & capture audio, and ignore the reco results -- but this may require a license
16:13:21 debbie: design decision 85: already decided we don't just capture audio
16:13:21 s/capture audio without recording/capture audio without recognition/
16:13:56 Topic: Preloading of resources
16:14:00 ack me
16:14:00 ddahl, you wanted to point out that we already have DD 85
16:16:02 milan: we may not need an explicit API, but some things may preload implicitly - not sure exactly how this would be implemented
16:17:31 milan: I believe preload is necessary sometimes, and is in scope, and we need notification when complete
16:17:37 oliv: agree
16:18:11 burn: in voicexml, author makes hint that grammar unlikely to change before being used (platform may use or ignore this hint)
16:19:36 oli: need to know when loading is complete, so recognition button does something [quickly]
16:20:30 burn: but that's in direct conflict with the "hint" concept. An author who has a changing grammar would prefer that the most up-to-date grammar always be used, even if adds time delay.
16:20:37 oli: use cases for both
16:21:11 burn: I agree. Do we need anything explicitly in the API for this? This is not an optimization, it's a user-affecting behavior that author may wish to specify
16:21:29 s/oliv/olli/
16:21:40 milan: I don't think we need it, but if others feel strongly, I don't object
16:21:56 s/oli/olli/
16:22:38 +Michael_Bodell
16:23:05 olli: API considerations mean an event back to indicate preload is complete
16:25:22 burn: summarizing, may want to know all grammars are loaded before displaying a "recognize" button. So author may need to request preloading and get a notification back.
16:26:07 michael: I understand, but think of it more as "prepare grammars" rather than "preload"
16:26:32 burn: so if get an event back, author can determine how to handle the event
16:27:56 burn: so API must support author requesting "prepare grammars" and getting an event back.
16:28:10 ... indicating completion
16:29:19 burn: we agree with this as a design decision
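The design decision just agreed could be pictured as follows: the author requests grammar preparation and receives a completion event before enabling the "recognize" button. All names here (MockRecognizer, prepareGrammars, onprepared) are hypothetical illustrations, not taken from the draft API.

```javascript
// Sketch of the "prepare grammars" design decision: author requests
// preparation, gets an event back indicating completion, and only
// then enables the recognition UI. Names are invented for this mock.
class MockRecognizer {
  constructor() {
    this.grammars = [];
    this.prepared = false;
    this.onprepared = null; // completion event handler
  }
  addGrammar(src) {
    this.grammars.push(src);
    this.prepared = false; // adding a grammar invalidates preparation
  }
  prepareGrammars() {
    // A real engine would fetch/compile asynchronously; this mock
    // completes immediately and fires the completion event.
    this.prepared = true;
    if (this.onprepared) this.onprepared();
  }
}

const reco = new MockRecognizer();
let buttonEnabled = false;
reco.addGrammar("http://example.com/date.grxml");
reco.onprepared = () => { buttonEnabled = true; }; // show "recognize" button
reco.prepareGrammars();
```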
16:30:07 burn: does it apply to anything besides grammars?
16:30:38 mbodell has joined #htmlspeech
16:31:36 burn: voicexml has a TTS - fetch audio (in some cases, it may not be available)
16:31:47 ... pre-recorded or streamed
16:32:36 glen: could have different voices or languages to preload
16:32:51 burn: yes, but seems different to me, they don't change as dynamically as grammars
16:33:58 olli: author needs to know everything (system) is loaded before initially beginning
16:34:40 burn: comparable to streaming video or audio - buttons for playing are ghosted out until stream/resource is ready
16:35:18 charles: recognizer may be local and may need to wait for models/etc to load
16:35:37 burn: practically I'm trying to understand what differs here from voicexml
16:36:12 burn: local vs server is not clear-cut: sometimes files in mixed locations
16:36:23 michael: we are all remote
16:37:18 milan: nuance all local
16:37:58 burn: I'm swayed less by infrastructure details than by user-affecting details
16:38:47 burn: tradition in graphical web world is that buttons only visible when corresponding resources are available
16:39:12 michael: user agent could buffer if reco/grammars/etc not ready
16:39:23 burn: what about TTS
16:39:35 olli: what if server down
16:40:19 michael: sometimes the web interface is not that; instead you click "play" and wait for the download, or find that it's not available
16:41:00 burn: agree, users are accustomed to audio not playing immediately
16:41:42 burn: it's a significant task to know that everything is completely ready: grammars, recognizer, audio files, etc
16:42:10 olli: do we have an event for recognizer starting?
16:42:51 burn: olli, is there a way today in HTML to know if an audio file exists?
16:43:02 michael: it's only a hint, may be wrong
16:43:28 in answer to olli's question about recognizer starting, Charles said yes
16:43:53 michael: in HTML5 there is a buffer attribute to query to see how much is in the buffer, but you can't tell it to buffer and user-agents can discard the buffer, so it's all heuristics
16:43:59 ... not enforced
16:45:02 burn: trying to remember, do we have a way to specify playing an audio file, how close is our current spec to HTML?
16:45:15 michael: I think close, because we inherit from media.
16:46:17 burn: Is there any need (DD)
16:46:46 we inherit from HTMLMediaElement which has the attributes
16:47:16 michael: properties like preload and buffer useful for synthesis to inherit from
16:48:55 burn: any other resources to preload, or any general statement on preloading?
16:49:53 burn: in VoiceXML, first-call behavior: the first time a page is called, it may not be ready, but assuming the grammars are not changing dramatically, everything is loaded by the second call.
16:50:44 ... We (Voxeo) and other vendors recommend to customers to "run-once" (automated or not) to get all loaded on first call.
16:51:23 ... Web browser different, but if for example, at a conference, you preload videos so they play quickly (e.g. start playing and pause).
16:51:37 ... I don't know of any equivalent for having a recognizer be ready.
16:52:12 ... I'm not proposing any particular solution here. Anyone want to add anything else?
16:52:57 michael: grammars are the most expensive thing related to recognizers. Input is more forgiving than output because it can buffer and then catch up.
16:53:08 burn: so it's a performance issue, not a UI issue.
16:54:29 Topic: Feedback mechanism for continuous recognition
16:55:40 burn: DD 74
16:57:23 burn: replace mechanism not for user feedback, but rather server to client
16:58:32 milan: a final result is final - nobody was motivated to spec all this out in protocol discussions
16:59:02 burn: how motivated is group to define a feedback mechanism?
16:59:26 michael: reco correcting itself
17:00:03 burn: to me, reco correcting itself is feedforward. I'm asking if we need a way for client to inform server that something was wrong.
17:00:21 milan: could also be done as vendor params.
17:00:46 michael: if we can standardize, makes sense. Google proposed and Microsoft interested.
17:01:27 milan: needs to be a hint to recognizer, not a requirement for recognizer to do anything
17:01:31 michael: agree
17:01:44 ... won't require changing recognizer results
17:01:50 burn: final means final
17:02:12 milan: final unless we have this feedback - but I'm reluctant to open this can of worms
17:02:46 burn: what if recognizer has not reached a final state, but client provides feedback, then as long as recognizer has not made it final, it can change.
17:03:03 milan: not common case, users can't change that fast.
17:03:10 michael: not necessarily
17:03:59 michael: it's a hint. recognizer can do with it what it needs to.
17:05:06 burn: client to recognizer feedback mechanism is a hint -- recognizer can do with hint whatever it needs to. Final is still final, so can't change past finalized events.
17:05:31 s/events/results/
17:07:08 glen: agree, a hint for recognizer
17:07:12 milan: agree, a hint
17:07:49 burn: DD must be a way for client to send feedback about a recognition to the recognizer, even while reco is ongoing
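The feedback design decision above might look like the following mock, where feedback is purely a hint that can never alter already-finalized results. Method and field names are illustrative assumptions, not spec text.

```javascript
// Sketch of the feedback-as-hint design decision: the client can send
// feedback to the recognizer while recognition is ongoing; the engine
// may use or ignore it, but finalized results never change.
class MockContinuousReco {
  constructor() {
    this.finalResults = []; // immutable once emitted: final means final
    this.hints = [];        // feedback the engine is free to ignore
  }
  emitFinal(text) { this.finalResults.push(text); }
  sendFeedback(hint) {
    // Hint only: record it. A real engine might adapt future results,
    // but it must not rewrite results already marked final.
    this.hints.push(hint);
  }
}

const reco = new MockContinuousReco();
reco.emitFinal("recognize speech");
// Client reports a misrecognition while recognition is still ongoing:
reco.sendFeedback({ heard: "recognize speech", meant: "wreck a nice beach" });
```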
17:08:52 http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html#continuous2
17:09:16 burn: also, I believe we agree that there is a point at which a result is final and can't be changed. I'm trying to find the DD for that.
17:10:22 michael: I don't think there was a DD on that. As long as continuous reco is ongoing, results can change.
17:10:53 milan: but sending only interim results requires longer and longer results to be returned for long continuous recognition.
17:14:14 glen: could implement so that interim results are "semi-final" and thus don't have to re-send entire result each time, but still not "final" so that can change if necessary.
17:14:35 ... so the question here is whether we want to add this complexity to the spec.
17:15:01 michael: agree, we did discuss, but did not make a decision on this at the face-to-face.
17:15:16 burn: we need to discuss this further on mailing list or in future call.
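One way to picture Glen's "semi-final" suggestion (still undecided, per the discussion above): interim text accumulates as small deltas and can still be revised, until it is rolled into an immutable final transcript. This sketch assumes semantics the group has not agreed on; all names are invented.

```javascript
// Sketch of "semi-final" interim results: deltas append to the interim
// tail (no re-sending of the whole transcript), the tail may still be
// corrected, and finalize() commits it so it can never change again.
class InterimResults {
  constructor() {
    this.finalText = ""; // immutable once committed
    this.interim = "";   // "semi-final": may still change
  }
  pushInterim(delta) { this.interim += delta; } // append a delta only
  reviseInterim(text) { this.interim = text; }  // correction still allowed
  finalize() {
    // Commit the interim tail; from here on it is final.
    this.finalText += this.interim;
    this.interim = "";
  }
}

const r = new InterimResults();
r.pushInterim("the quick ");
r.pushInterim("brown fax ");
r.reviseInterim("the quick brown fox "); // fix before finalization
r.finalize();
```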
17:16:08 Topic: Extending our group's charter
17:16:23 Topic: Charter Extension Status
17:17:03 burn: I spoke with ??? and we're set to go. Not clear what we are using T-pack discussion for. Charter officially extended to end of November.
17:17:20 s/???/Coralie Mercier
17:17:26 ... However, the expectation is that the group will wrap up work before T-pack and publish right after T-pack.
17:18:16 burn: tech discussions in Sept, Oct for editorial and wrap-up. Publish right after T-pack. Can publish before end of November.
17:19:35 burn: I submitted a paragraph on our accomplishments: DD, web api, html extensions and protocol, we plan to complete and wrap-up in a report.
17:20:07 burn: she is expecting to publish this paragraph this week. she reassured us that our charter is intact and this is a formality.
17:20:48 Topic: Whether nomatch, noinput are errors or other conditions
17:21:06 michael: we discussed and decided to make them not errors
17:21:42 burn: let's capture as DD if we don't have one...which we apparently don't. So we'll record this as DD.
17:22:10 Topic: How are top-level weights on grammars interpreted?
17:22:45 michael: have in API ability to add weights, but haven't defined what they mean
17:23:00 burn: can anyone propose something?
17:23:43 milan: in voicexml, this is vendor specific
17:24:24 burn: I'm fine with not defining
17:24:33 A weight is nominally a multiplying factor in the likelihood domain of a speech recognition search. A weight of "1.0" is equivalent to providing no weight at all. A weight greater than "1.0" positively biases the grammar and a weight less than "1.0" negatively biases the grammar. If unspecified, the default weight for any grammar is "1.0". If no weight is specified for any grammar element then all grammars are equally likely.
17:24:46 Effective weights are usually obtained by study of real speech and textual data on a particular platform. Furthermore, a grammar weight is platform specific. Note that different ASR engines may treat the same weight value differently. Therefore, the weight value that works well on particular platform may generate different results on other platforms.
17:25:13 debbie: api section 7.1 says ...
17:25:31 ... "relative to", but hard to interpret what that means
17:25:37 The posted text was VXML
17:25:53 the next text is from our current api spec, 7.1 that Debbie mentioned
17:25:54 This method adds a grammar to the set of active grammars. The URI for the grammar is specified by the src parameter, which represents the URI for the grammar. If the weight parameter is present it represents this grammar's weight relative to the other grammar. If the weight parameter is not present, the default value of 1.0 is used. If the modal parameter is set to true, then all other already active grammars are disabled. If the modal parameter is not pr
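Read literally, the section 7.1 text above implies behavior like this mock: the weight parameter defaults to 1.0, and a modal grammar disables all other already-active grammars. The class and field names are invented for illustration and are not part of the draft API.

```javascript
// Mock of the addGrammar() behavior described in API section 7.1:
// weight defaults to 1.0; modal === true disables all other active
// grammars while leaving the new grammar enabled.
class GrammarSet {
  constructor() { this.active = []; }
  addGrammar(src, weight = 1.0, modal = false) {
    if (modal) this.active.forEach(g => { g.enabled = false; });
    this.active.push({ src, weight, enabled: true });
  }
}

const grammars = new GrammarSet();
grammars.addGrammar("http://example.com/city.grxml", 1.5);
grammars.addGrammar("http://example.com/date.grxml");            // weight defaults to 1.0
grammars.addGrammar("http://example.com/help.grxml", 1.0, true); // modal: disables the others
```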
17:26:59 burn: let's distinguish between general statements about weights, and weights relative to each other. We've always agreed that larger means greater weight. But we've never stated what values mean.
17:27:06 ... not probabilities.
17:27:21 michael: yes, 2 is not twice as much as 1
17:27:24 SRGS weight discussion: http://www.w3.org/TR/speech-grammar/#S2.4.1
17:27:36 s/not/not necessarily/
17:28:27 burn: two grammars of weight X both have the same weighting, whatever that means
17:29:10 burn: if one grammar A has weight X and grammar B has weight Y, and X > Y, then grammar A has greater weight than grammar B
17:30:30 I'm not sure if we want X > Y then X is greater than versus greater than or equal to
17:31:17 michael: should that be greater than, or greater than or equal two. Might be a step function. 1.8 and 1.9 might be treated as the same. Equal or Greater (but not less).
17:31:35 burn: "monotonically non-decreasing" is how we described it
17:31:43 michael: yes
17:31:55 burn: in the SSML sense
17:32:04 ... (I don't know that SRGS says that)
17:32:10 -Dan_Druta
17:32:17 michael: yes, SRGS only says positively and negatively biasing
17:32:26 burn: DD "monotonically non-decreasing"
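The "monotonically non-decreasing" decision permits a platform to quantize weights, e.g. a step function under which 1.8 and 1.9 bias the search identically, so long as a larger weight never yields a smaller bias. The mapping below is purely hypothetical.

```javascript
// Hypothetical platform mapping from an author-supplied grammar weight
// to an internal bias. It quantizes weights to steps of 0.5, so nearby
// weights can collapse to the same bias, but a larger weight never
// produces a smaller one (monotonically non-decreasing). The use of a
// log-likelihood domain here is an assumption for illustration.
function internalBias(weight) {
  const step = Math.max(0.5, Math.round(weight * 2) / 2);
  return Math.log(step); // 1.0 maps to 0: no bias either way
}
```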
17:32:32 -Milan_Young
17:32:36 burn: we're out of time. Thanks, bye
17:32:36 -Olli_Pettay
17:32:40 -Michael_Bodell
17:32:41 -Debbie_Dahl
17:32:42 ddahl has left #htmlspeech
17:32:43 -Dan_Burnett
17:32:45 -Glen_Shires
17:32:53 what needs to be done to save these minutes?
17:33:14 -Charles_Hemphill
17:33:15 INC_(HTMLSPEECH)11:30AM has ended
17:33:17 Attendees were Dan_Burnett, Olli_Pettay, Milan_Young, Debbie_Dahl, +1.408.359.aaaa, Glen_Shires, +1.425.580.aabb, Dan_Druta, Charles_Hemphill, Michael_Bodell
17:33:47 zakim, bye
17:33:47 Zakim has left #htmlspeech
17:33:52 rrsagent, make log public
17:33:58 rrsagent, draft minutes
17:33:58 I have made the request to generate http://www.w3.org/2011/09/01-htmlspeech-minutes.html burn
17:34:19 thanks, goodbye
17:36:38 s/, +1.408.359.aaaa//
17:36:50 s/, +1.425.580.aabb//
17:36:57 rrsagent, draft minutes
17:36:57 I have made the request to generate http://www.w3.org/2011/09/01-htmlspeech-minutes.html burn
17:38:54 s/what needs to be done to save these minutes?//
17:39:00 s/thanks, goodbye//
17:39:08 s/oli:/olli:/g
17:39:14 rrsagent, draft minutes
17:39:14 I have made the request to generate http://www.w3.org/2011/09/01-htmlspeech-minutes.html burn
17:40:59 s/T-pack/TPAC/g
17:41:04 rrsagent, draft minutes
17:41:04 I have made the request to generate http://www.w3.org/2011/09/01-htmlspeech-minutes.html burn
17:42:36 s/greater than or equal two/greater than or equal to/
17:42:41 rrsagent, draft minutes
17:42:41 I have made the request to generate http://www.w3.org/2011/09/01-htmlspeech-minutes.html burn
17:43:44 regrets: Robert_Brown
17:43:50 rrsagent, draft minutes
17:43:50 I have made the request to generate http://www.w3.org/2011/09/01-htmlspeech-minutes.html burn
17:44:09 rrsagent, bye
17:44:09 I see no action items