15:50:35 RRSAgent has joined #htmlspeech
15:50:36 logging to http://www.w3.org/2011/09/01-htmlspeech-irc
15:50:46 Zakim has joined #htmlspeech
15:51:00 trackbot, start telcon
15:51:02 RRSAgent, make logs public
15:51:04 Zakim, this will be
15:51:04 I don't understand 'this will be', trackbot
15:51:05 Meeting: HTML Speech Incubator Group Teleconference
15:51:05 Date: 01 September 2011
15:51:07 zakim, this will be htmlspeech
15:51:07 ok, burn; I see INC_(HTMLSPEECH)11:30AM scheduled to start 21 minutes ago
15:51:18 Chair: Dan_Burnett
15:51:35 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0038.html
15:51:58 burn has changed the topic to: Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0038.html (burn)
15:54:39 INC_(HTMLSPEECH)11:30AM has now started
15:54:46 +Dan_Burnett
15:57:06 +??P10
15:57:28 smaug has joined #htmlspeech
15:57:33 zakim, ??P10 is Olli_Pettay
15:57:33 +Olli_Pettay; got it
15:59:42 glen has joined #htmlspeech
16:00:09 +Milan_Young
16:00:13 +Debbie_Dahl
16:00:34 ddahl has joined #htmlspeech
16:00:48 Zakim, nick smaug is Olli_Pettay
16:00:49 Milan has joined #HTMLSpeech
16:00:58 ok, smaug, I now associate you with Olli_Pettay
16:01:00 + +1.408.359.aaaa
16:01:13 aaaa is Glen_Shires
16:01:23 zakim, aaaa is Glen_Shires
16:01:23 +Glen_Shires; got it
16:01:32 + +1.425.580.aabb
16:01:51 zakim, aabb is Dan_Druta
16:01:51 +Dan_Druta; got it
16:02:38 DanD has joined #htmlspeech
16:04:04 Scribe: Glen_Shires
16:04:12 ScribeNick: glen
16:04:43 Charles has joined #htmlspeech
16:05:55 +Charles_Hemphill
16:08:14 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0038.html
16:09:48 Topic: Topics remaining to be discussed
16:09:59 http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110629.html#topics
16:10:13 Topic: Is audio recording without recognition a scenario to support?
16:11:18 burn: other APIs cover this
16:11:56 q+ to point out that we already have DD 85
16:12:16 burn: in our charter: record audio that's recognized; but fine if we specify that we don't specifically capture audio without recognition
16:12:47 milan: so one could recognize & capture audio, and ignore the reco results -- but this may require a license
16:13:21 debbie: design decision 85: already decided we don't just capture audio
16:13:56 Topic: Preloading of resources
16:14:00 ack me
16:14:00 ddahl, you wanted to point out that we already have DD 85
16:16:02 milan: we may not need an explicit API, but some things may preload implicitly - not sure exactly how this would be implemented
16:17:31 milan: I believe preload is necessary sometimes, and is in scope, and we need notification when complete
16:17:37 olli: agree
16:18:11 burn: in voicexml, the author can hint that a grammar is unlikely to change before being used (the platform may use or ignore this hint)
16:19:36 olli: need to know when loading is complete, so the recognition button does something [quickly]
16:20:30 burn: but that's in direct conflict with the "hint" concept. An author who has a changing grammar would prefer that the most up-to-date grammar always be used, even if it adds a time delay.
16:20:37 olli: use cases for both
16:21:11 burn: I agree. Do we need anything explicitly in the API for this?
... This is not an optimization, it's a user-affecting behavior that the author may wish to specify
16:21:40 milan: I don't think we need it, but if others feel strongly, I don't object
16:22:38 +Michael_Bodell
16:23:05 olli: API considerations mean an event back to indicate preload is complete
16:25:22 burn: summarizing, the author may want to know all grammars are loaded before displaying a "recognize" button. So the author may need to request preloading and get a notification back.
16:26:07 michael: I understand, but think of it more as "prepare grammars" rather than "preload"
16:26:32 burn: so if the author gets an event back, the author can determine how to handle the event
16:27:56 burn: so the API must support the author requesting "prepare grammars" and getting an event back.
16:28:10 ... indicating completion
16:29:19 burn: we agree with this as a design decision
16:30:07 burn: does it apply to anything besides grammars?
16:30:38 mbodell has joined #htmlspeech
16:31:36 burn: voicexml has a TTS - fetch audio (in some cases, it may not be available)
16:31:47 ... pre-recorded or streamed
16:32:36 glen: could have different voices or languages to preload
16:32:51 burn: yes, but that seems different to me, they don't change as dynamically as grammars
16:33:58 olli: the author needs to know everything (the system) is loaded before initially beginning
16:34:40 burn: comparable to streaming video or audio - buttons for playing are ghosted out until the stream/resource is ready
16:35:18 charles: the recognizer may be local and may need to wait for models/etc. to load
16:35:37 burn: practically, I'm trying to understand what differs here from voicexml
16:36:12 burn: local vs. server is not clear-cut: sometimes files are in mixed locations
16:36:23 michael: we are all remote
16:37:18 milan: Nuance is all local
16:37:58 burn: I'm swayed less by infrastructure details than by user-affecting details
16:38:47 burn: the tradition in the graphical web world is that buttons are only visible when the corresponding resources are available
16:39:12 michael: the user agent could buffer if reco/grammars/etc. are not ready
16:39:23 burn: what about TTS?
16:39:35 olli: what if the server is down?
16:40:19 michael: sometimes the web interface is not like that; instead you click "play" and wait for the download, or find that it's not available
16:41:00 burn: agree, users are accustomed to audio not playing immediately
16:41:42 burn: it's a significant task to know that everything is completely ready: grammars, recognizer, audio files, etc.
16:42:10 olli: do we have an event for the recognizer starting?
16:42:51 burn: olli, is there a way today in HTML to know if an audio file exists?
16:43:02 michael: it's only a hint, may be wrong
16:43:28 in answer to olli's question about the recognizer starting, Charles said yes
16:43:53 michael: in HTML5 there is a buffered attribute you can query to see how much is in the buffer, but you can't tell it to buffer, and user agents can discard the buffer, so it's all heuristics
16:43:59 ... not enforced
16:45:02 burn: trying to remember, do we have a way to specify playing an audio file? how close is our current spec to HTML?
16:45:15 michael: I think close, because we inherit from media.
16:46:17 burn: Is there any need (DD)?
16:46:46 we inherit from HTMLMediaElement, which has the attributes
16:47:16 michael: properties like preload and buffered are useful for synthesis to inherit
16:48:55 burn: any other resources to preload, or any general statement on preloading?
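
For reference, the HTMLMediaElement pattern michael describes looks roughly like this in today's HTML (preload, buffered, and canplaythrough are standard; only the playButton element and the URL are illustrative assumptions):

    // Standard HTML5 media preloading, as discussed above.
    var audio = new Audio('prompt.wav');      // illustrative URL
    audio.preload = 'auto';                   // a hint only; the user agent may ignore it
    audio.addEventListener('canplaythrough', function () {
      // enough data is buffered that playback should not stall,
      // so a page could un-ghost its play button here
      playButton.disabled = false;            // playButton: hypothetical element
    });
    // buffered is a TimeRanges object; the user agent may discard
    // buffered data at any time, so this is heuristic, not enforced
    if (audio.buffered.length > 0) {
      console.log('buffered up to ' + audio.buffered.end(0) + ' seconds');
    }
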
16:49:53 burn: in VoiceXML, first-call behavior: the first time a page is called, it may not be ready, but assuming the resources are not changing dramatically, everything is loaded the second time.
16:50:44 ... We (Voxeo) and other vendors recommend that customers "run once" (automated or not) to get everything loaded before the first real call.
16:51:23 ... A web browser is different, but if, for example, you're at a conference, you preload videos so they play quickly (e.g. start playing and pause).
16:51:37 ... I don't know of any equivalent for having a recognizer be ready.
16:52:12 ... I'm not proposing any particular solution here. Anyone want to add anything else?
16:52:57 michael: grammars are the most expensive thing related to recognizers. Input is more forgiving than output because you can buffer and then catch up.
16:53:08 burn: so it's a performance issue, not a UI issue.
16:54:29 Topic: Feedback mechanism for continuous recognition
16:55:40 burn: DD 74
16:57:23 burn: the replace mechanism is not for user feedback, but rather server to client
16:58:32 milan: a final result is final - nobody was motivated to spec all this out in the protocol discussions
16:59:02 burn: how motivated is the group to define a feedback mechanism?
16:59:26 michael: reco correcting itself
17:00:03 burn: to me, reco correcting itself is feedforward. I'm asking if we need a way for the client to inform the server that something was wrong.
17:00:21 milan: could also be done as vendor params.
17:00:46 michael: if we can standardize it, that makes sense. Google proposed it and Microsoft is interested.
17:01:27 milan: needs to be a hint to the recognizer, not a requirement for the recognizer to do anything
17:01:31 michael: agree
17:01:44 ... won't require changing recognizer results
17:01:50 burn: final means final
17:02:12 milan: final unless we have this feedback - but I'm reluctant to open this can of worms
17:02:46 burn: what if the recognizer has not reached a final state, but the client provides feedback? then, as long as the recognizer has not made it final, it can change.
17:03:03 milan: not a common case, users can't change that fast.
17:03:10 michael: not necessarily
17:03:59 michael: it's a hint. the recognizer can do with it what it needs to.
17:05:06 burn: the client-to-recognizer feedback mechanism is a hint -- the recognizer can do with the hint whatever it needs to. Final is still final, so it can't change past finalized results.
17:07:08 glen: agree, a hint for the recognizer
17:07:12 milan: agree, a hint
17:07:49 burn: DD: there must be a way for the client to send feedback about a recognition to the recognizer, even while reco is ongoing
17:08:52 http://www.w3.org/2005/Incubator/htmlspeech/2011/05/f2fminutes201105.html#continuous2
17:09:16 burn: also, I believe we agree that there is a point at which a result is final and can't be changed. I'm trying to find the DD for that.
17:10:22 michael: I don't think there was a DD on that. As long as continuous reco is ongoing, results can change.
17:10:53 milan: but sending only interim results requires longer and longer results to be returned for a long continuous recognition.
17:14:14 glen: could implement it so that interim results are "semi-final" and thus don't have to re-send the entire result each time, but are still not "final", so they can change if necessary.
17:14:35 ... so the question here is whether we want to add this complexity to the spec.
17:15:01 michael: agree, we did discuss this, but did not make a decision on it at the face-to-face.
17:15:16 burn: we need to discuss this further on the mailing list or in a future call.
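
A minimal sketch of the interim/final distinction and the hint-style feedback discussed above. Every name here (SpeechInputRequest, continuous, onresult, final, sendFeedback, and the helpers) is hypothetical; the group had not settled on any of them in this call:

    // Hypothetical continuous-recognition client, for illustration only.
    var reco = new SpeechInputRequest();          // hypothetical constructor
    reco.continuous = true;
    reco.onresult = function (event) {
      if (event.result.final) {
        // "final means final": this text can never change afterwards
        commitTranscript(event.result.text);      // hypothetical helper
      } else {
        // interim results may still be revised by the recognizer
        showInterim(event.result.text);           // hypothetical helper
      }
    };
    // Feedback is a hint: the recognizer may use it or ignore it, and it
    // can only influence results that have not yet been finalized.
    correctionField.onchange = function () {      // correctionField: hypothetical element
      reco.sendFeedback({ corrected: correctionField.value });
    };
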
17:16:08 Topic: Extending our group's charter
17:16:23 Topic: Charter Extension Status
17:17:03 burn: I spoke with Coralie Mercier and we're set to go. Not clear what we are using the TPAC discussion for. The charter is officially extended to the end of November.
17:17:26 ... However, the expectation is that the group will wrap up its work before TPAC and publish right after TPAC.
17:18:16 burn: tech discussions in Sept, Oct for editorial work and wrap-up. Publish right after TPAC. Can publish before the end of November.
17:19:35 burn: I submitted a paragraph on our accomplishments: DDs, web API, HTML extensions and protocol; we plan to complete these and wrap up in a report.
17:20:07 burn: she is expecting to publish this paragraph this week. She reassured us that our charter is intact and this is a formality.
17:20:48 Topic: Whether nomatch, noinput are errors or other conditions
17:21:06 michael: we discussed this and decided to make them not errors
17:21:42 burn: let's capture that as a DD if we don't have one... which we apparently don't. So we'll record this as a DD.
17:22:10 Topic: How are top-level weights on grammars interpreted?
17:22:45 michael: we have the ability to add weights in the API, but haven't defined what they mean
17:23:00 burn: can anyone propose something?
17:23:43 milan: in voicexml, this is vendor specific
17:24:24 burn: I'm fine with not defining it
17:24:33 A weight is nominally a multiplying factor in the likelihood domain of a speech recognition search. A weight of "1.0" is equivalent to providing no weight at all. A weight greater than "1.0" positively biases the grammar and a weight less than "1.0" negatively biases the grammar. If unspecified, the default weight for any grammar is "1.0". If no weight is specified for any grammar element then all grammars are equally likely.
17:24:46 Effective weights are usually obtained by study of real speech and textual data on a particular platform. Furthermore, a grammar weight is platform specific. Note that different ASR engines may treat the same weight value differently. Therefore, the weight value that works well on a particular platform may generate different results on other platforms.
17:25:13 debbie: api section 7.1 says ...
17:25:31 ... "relative to", but it's hard to interpret what that means
17:25:37 The posted text was VXML
17:25:53 the next text is from our current api spec, section 7.1, that Debbie mentioned
17:25:54 This method adds a grammar to the set of active grammars. The URI for the grammar is specified by the src parameter, which represents the URI for the grammar. If the weight parameter is present it represents this grammar's weight relative to the other grammar. If the weight parameter is not present, the default value of 1.0 is used. If the modal parameter is set to true, then all other already active grammars are disabled. If the modal parameter is not pr
17:26:59 burn: let's distinguish between general statements about weights, and weights relative to each other. We've always agreed that larger means greater weight. But we've never stated what the values mean.
17:27:06 ... not probabilities.
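
The draft 7.1 text quoted above implies usage along these lines (a sketch: addGrammar and its src/weight/modal parameters come from the quoted draft, while the reco object and the grammar URIs are illustrative assumptions):

    // Sketch based on the draft API text quoted above (section 7.1).
    reco.addGrammar('http://example.com/dates.grxml', 2.0);   // weight 2.0 biases this grammar up,
                                                              // but is not "twice as likely" as 1.0
    reco.addGrammar('http://example.com/digits.grxml');       // no weight given: defaults to 1.0
    reco.addGrammar('http://example.com/confirm.grxml', 1.0, true);  // modal = true disables all
                                                                     // other active grammars
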
17:27:21 michael: yes, 2 is not necessarily twice as much as 1
17:27:24 SRGS weight discussion: http://www.w3.org/TR/speech-grammar/#S2.4.1
17:28:27 burn: two grammars of weight X both have the same weighting, whatever that means
17:29:10 burn: if grammar A has weight X and grammar B has weight Y, and X > Y, then grammar A has greater weight than grammar B
17:30:30 I'm not sure if we want: X > Y means X is greater than, versus greater than or equal to
17:31:17 michael: should that be greater than, or greater than or equal to? Might be a step function. 1.8 and 1.9 might be treated as the same. Equal or greater (but not less).
17:31:35 burn: "monotonically non-decreasing" is how we described it
17:31:43 michael: yes
17:31:55 burn: in the SSML sense
17:32:04 ... (I don't know that SRGS says that)
17:32:10 -Dan_Druta
17:32:17 michael: yes, SRGS only says positively and negatively biasing
17:32:26 burn: DD: "monotonically non-decreasing"
17:32:32 -Milan_Young
17:32:36 burn: we're out of time. Thanks, bye
17:32:36 -Olli_Pettay
17:32:40 -Michael_Bodell
17:32:41 -Debbie_Dahl
17:32:42 ddahl has left #htmlspeech
17:32:43 -Dan_Burnett
17:32:45 -Glen_Shires
17:33:14 -Charles_Hemphill
17:33:15 INC_(HTMLSPEECH)11:30AM has ended
17:33:17 Attendees were Dan_Burnett, Olli_Pettay, Milan_Young, Debbie_Dahl, Glen_Shires, Dan_Druta, Charles_Hemphill, Michael_Bodell
17:33:47 zakim, bye
17:33:47 Zakim has left #htmlspeech
17:33:52 rrsagent, make log public
17:43:44 Regrets: Robert_Brown
17:43:50 rrsagent, draft minutes
17:43:50 I have made the request to generate http://www.w3.org/2011/09/01-htmlspeech-minutes.html burn
17:44:09 rrsagent, bye
17:44:09 I see no action items