15:54:48 RRSAgent has joined #htmlspeech
15:54:48 logging to http://www.w3.org/2011/09/22-htmlspeech-irc
15:54:54 Zakim, [IPcaller] is Olli_Pettay
15:54:54 +Olli_Pettay; got it
15:55:05 trackbot, start telcon
15:55:07 RRSAgent, make logs public
15:55:09 Zakim, this will be
15:55:09 I don't understand 'this will be', trackbot
15:55:11 Meeting: HTML Speech Incubator Group Teleconference
15:55:12 Zakim, nick smaug is Olli_Pettay
15:55:12 ok, smaug, I now associate you with Olli_Pettay
15:55:13 Date: 22 September 2011
15:56:15 Chair: Dan_Burnett, Michael_Bodell
15:56:24 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0036.html
15:58:11 ddahl has joined #htmlspeech
15:58:37 DanD has joined #htmlspeech
15:58:38 bringert has joined #htmlspeech
15:58:45 +Debbie_Dahl
15:59:15 + +1.760.705.aaaa
15:59:29 mbodell has joined #htmlspeech
15:59:35 +[Microsoft]
15:59:42 +Dan_Druta
16:00:03 zakim, aaaa is Bjorn_Bringert,Satish_Sampath
16:00:03 +Bjorn_Bringert,Satish_Sampath; got it
16:00:04 +Michael_Bodell
16:00:40 robert has joined #htmlspeech
16:01:00 brb - gotta hydrate
16:01:54 + +1.408.359.aabb
16:02:03 glen has joined #htmlspeech
16:02:06 zakim, aabb is Glen_Shires
16:02:06 +Glen_Shires; got it
16:02:24 bringert: can you hear anything?
16:02:44 + +1.818.237.aacc
16:02:55 zakim, aacc is Patrick_Ehlen
16:03:11 +Patrick_Ehlen; got it
16:03:32 ehlen has joined #htmlspeech
16:03:58 satish has joined #htmlspeech
16:04:04 Scribe: Satish_Sampath
16:04:27 ScribeNick: satish
16:04:33 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0036.html
16:05:38 Charles has joined #htmlspeech
16:05:53 burn: first topic is TPAC. We will likely have work to do on the web API and the protocol at the face-to-face, and some work on the document. It is highly likely we'll have significant discussions, and we'll have two full days.
16:06:30 burn: the number of people who register determines the room and the number of power outlets, so please register
16:06:42 +Milan_Young
16:06:48 Meetings at TPAC Nov 3-4, Santa Clara, CA: http://www.w3.org/2011/11/TPAC/Overview.html
16:06:54 Milan has joined #HTMLSpeech
16:07:08 Register by Oct 14 for the lower fee
16:07:38 Best hotel rates/rooms by Oct 10
16:08:10 burn: the two days that matter for us are Thursday/Friday
16:08:22 +Charles_Hemphill
16:08:48 topic: Continuation of the Web API discussion
16:09:32 http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0033.html
16:09:41 topic: IDL for SpeechInputRequest sent earlier
16:10:32 satish: reviews his IDL proposal (see link above)
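For orientation, a rough TypeScript sketch of the kind of request object under review follows. The actual proposal is Web IDL in the linked mail and is not reproduced here; every member name below is illustrative except inputWaveformURI and saveWaveformURI, which are quoted in the discussion that follows.

    // Illustrative only - not the proposed IDL itself.
    interface SpeechInputRequestSketch {
      grammars: string[];        // grammar URIs for the service to match against
      continuous: boolean;       // single utterance vs. continuous recognition
      inputWaveformURI?: string; // recognize stored audio instead of live mic input
      saveWaveformURI?: string;  // ask the service to retain the audio (debated below)
      start(): void;             // begin capture and recognition
      stop(): void;              // end capture and wait for final results
      abort(): void;             // cancel without waiting for results
      onresult?: (event: unknown) => void; // recognition results callback
    }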
16:13:44 bringert: could start with the saveWaveformURI and inputWaveformURI questions
16:14:04 ... could be part of the MediaStream 'input' attribute, may not need a separate URI
16:14:20 robert: doesn't address the case where the waveform is at a remote server
16:14:41 bringert: what is the use case?
16:15:27 robert: re-reco is one use case, where the audio was saved
16:15:47 ... may have a 3-second utterance and don't want to upload it again
16:15:59 This comes from "FPR57. Web applications must be able to request recognition based on previously sent audio."
16:16:15 bringert: seems like a corner case and adds complexity to the implementation
16:16:43 ... could be a random unique token instead of a URI
16:17:39 ... shouldn't say in the API that the URI is downloadable and fetches the full audio as a file
16:17:47 ... perhaps replace it with a rerecognize method
16:18:17 Debbie: what if the user wants to listen to what they said?
16:18:30 bringert: could be a UA feature instead of an API requirement
16:19:06 Glen: use cases: 1. listen to yourself, 2. re-reco with the same service, 3. re-reco with a different service
16:19:55 +Michael_Johnston
16:20:19 robert: 3 is important because one request could be set up with one grammar and another with a different grammar; the app could use the output of step 1 to figure out the correct set of grammars for the second step
16:20:51 ... could add a rereco method which takes a set of parameters for the second reco
16:22:54 ... would be doing all this in one thread with event handlers and won't have time to do async stuff
16:23:28 ... e.g. a local search app with a coarse grammar identifying states and cities, which based on the result decides which granular grammar to use for the neighbourhood
16:23:30 MJ has joined #htmlspeech
16:25:05 Milan: the issue is whether the second reco takes place in the same service. If it does, that service can perform a rereco; it is only a problem if using a different service
16:25:40 burn: another use case - compliance. There may be a need for the client to say "I want to save these recos to get to them later". The only module which can identify the audio is the service
16:26:11 bringert: could be proprietary extensions
16:26:55 mbodell: all use cases are solvable if we keep the URI as is and state that it is only for identifying the audio, not for downloading audio content
16:27:10 burn: not talking about rereco, only about the client identifying sessions
16:27:24 bringert: is it realistic to expect all implementors to keep this stored all the time?
16:27:38 robert: why do you need to get the recording back?
16:28:19 burn: the client doesn't do all of the endpointing; only the recognizer knows what it got. For compliance you may need an entire recording, and sometimes you need to know specifically what was heard.
16:28:44 bringert: e.g. I call a stock broker and say "sell", then I sue them for selling, and they prove I actually said it?
16:28:45 burn: yes
16:29:28 robert: the way to solve this is to treat it as a specialized app: have the service provider record all audio anyway and provide a session id to the client
16:29:45 ... really hard to solve all such use cases
16:30:17 ... we can just provide a way to tag the session
16:30:52 bringert: could solve session id and rereco by returning an opaque session id in the reco result, which can be passed up as a parameter
16:31:30 burn: happy with that if we also have an API to get the audio in the client for the session
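A minimal sketch of that suggestion as scribed - an opaque session id returned in the result and handed back for a later rereco. All shapes and names here are hypothetical; only the idea (opaque id out, same id back in) is from the discussion.

    // Hypothetical shapes for the opaque-session-id idea.
    interface ResultSketch {
      transcript: string;
      sessionId: string; // opaque token naming the audio the service retained
    }

    interface RerecoRequestSketch {
      grammars: string[];
      sessionId?: string; // a prior result's id: re-run recognition on that audio
      start(): void;
    }

    // e.g. a second pass with a narrower grammar chosen from the first result:
    function rereco(req: RerecoRequestSketch, prior: ResultSketch, grammar: string): void {
      req.grammars = [grammar];
      req.sessionId = prior.sessionId;
      req.start();
    }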
16:32:45 robert: don't understand why the end user needs to listen to what the recognizer heard
16:33:09 ... the speech service could provide an orthogonal API for fetching all data for a given session id
16:33:21 bringert: this is quite common and we do it for debugging, not for end users
16:33:53 burn: not sure that the end user will need it; UI/mic tuning can be done offline
16:35:15 mbodell: helpful if audio can be obtained easily without doing something complicated. Another use case - a smart answering machine which transcribes and falls back to the recorded audio if dictation wasn't successful
16:35:20 robert: what is the logic for such a webapp?
16:35:39 bringert: capture audio, send to server and cache locally; if the response is fine send it as email, otherwise send the captured audio
16:35:51 mbodell: may want to listen to your audio before sending
16:36:17 ... so it should be easy to play back sent audio
16:36:36 bringert: all of this can be done with the media capture API
16:36:45 robert: this is like a mic API and we decided earlier to avoid that
16:37:37 bringert: so I propose we remove saveWaveformURI and inputWaveformURI and instead add a sessionId in the response. Also add a way to pass this for rereco
16:37:55 mbodell: makes sense for saveWaveformURI; inputWaveformURI is a different use case
16:38:21 ... rereco is not the only use case, e.g. recognizing something recorded a long time ago
16:38:30 ... or audio stored elsewhere
16:38:45 Olli: mediastream will allow that
16:38:51 bringert: agree
16:39:24 burn: requires that the client fetch and process the file contents itself, turn it into a stream and pass it to the server
16:39:49 mbodell: has an issue with bandwidth usage
16:40:46 bringert: having specific APIs to tell one service to talk to another service/URI adds complexity and security issues
16:40:55 mbodell: I don't buy either of those reasons
16:41:31 robert: there are security problems, as we have 3 entities now and all have to share a security context. It is possible to do out of band
16:42:25 mbodell: if the audio is in a private intranet, could use the mediastream API
16:42:57 mbodell: but there is much audio that is publicly available and could be fetched directly
16:42:58 bringert: is the use case like transcribing a youtube audio/video? If so, why write that as a webapp instead of as a service which fetches and transcribes once?
16:43:31 ... doesn't seem like a web application, not efficient
16:44:28 mbodell: similar to specifying a grammar; this may not be different from that
16:44:49 bringert: yes, they are similar, just that the use case is a lot weaker and there are other ways to accomplish the same thing
16:45:14 ... since more than one person would be interested in transcribing publicly available audio.
16:45:43 mbodell: don't agree with that; easy to do if you own the service
16:46:42 ... other protocols like MRCP already require such functionality. Agree that there are other ways, but that is the wrong optimisation.
16:47:23 bringert: probably not a big concern; the use case feels pointless and it's another feature, but not hard to implement
16:47:34 ... but there is the codec issue
16:47:42 mbodell: could be figured out in the protocol handshake
16:48:15 robert: in the protocol group it came down to uLaw and PCM as required codecs
16:48:31 mbodell: the same discussion will happen in the synthesis API, so it's not unique to this context
16:49:00 bringert: could use the same URI mechanism for rereco
16:49:19 robert: what would be the header when fetching the URI, to specify the codec used?
16:49:51 bringert: assume standard HTTP response headers would have the mime type, or the audio contains magic bytes to tell what codec is used
16:51:51 bringert: the session id idea still stands: it will be returned in the recognition result, and the request will take this id as an optional field. inputWaveformURI refers to a normal URI on the web
16:52:18 ... though rereco can fail if the id goes stale or the service doesn't support storing audio
16:52:52 ... a related boolean field is 'saveForRereco', so the webapp specifies in advance if it wants storing and rereco
16:53:09 Summary: remove saveWaveformURI; keep inputWaveformURI with normal URI/HTTP semantics; add a session id (format unknown - a URI that isn't necessarily a URL?) to the result; add the ability to rereco from a session id
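Restating that summary as a sketch: only inputWaveformURI, saveForRereco and the session id are from the minutes; the shapes themselves are illustrative.

    // Request side after the summary.
    interface RequestAfterSummary {
      inputWaveformURI?: string; // plain URI/HTTP semantics; fetched by the service
      saveForRereco?: boolean;   // ask up front for the audio to be retained for rereco
    }

    // Result side after the summary.
    interface ResultAfterSummary {
      sessionId: string; // a URI that is not necessarily a fetchable URL
    }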
16:53:48 robert: a counter proposal is to let the service not send a sessionId if it doesn't support saving audio
16:54:01 ... and rereco could be done by saving audio locally with mediastream
16:54:12 ... leave the flag as an optional optimisation.
16:55:43 bringert: good point; the result could always return a sessionId, and a separate flag 'savedForRereco' would be set to true if the server supported that feature
16:55:53 ... so the sessionId is always present and can be used for logging etc.
16:56:10 mbodell: should the separate variable/flag be a boolean or some other token?
16:56:31 bringert: could just use the sessionId for referring to saved audio
16:56:58 mbodell: useful to differentiate audio chunks in continuous reco, whereas a sessionId could refer to the whole session
16:57:10 robert: rereco should allow specifying a time range
16:57:49 bringert: what if I get 2 results and I want to rereco the whole audio covering both results?
16:58:18 robert: could specify a time range in the rereco method
16:59:36 ... between starting and finishing a recognition there is continuous recording of audio and you have an audio token. That might be different each time you cycle that request.
16:59:52 bringert: what audio does it refer to? From start to stop?
16:59:54 robert: yes
17:00:07 bringert: for rereco could pass in audioId, start and stop
17:00:40 ... rereco should be a separate method
17:02:08 terrible echo
17:02:08 robert: doesn't think so; instead of using mic input it should use the saved audio
17:02:51 ... same as starting reco in the normal case otherwise
17:03:01 bringert: what do stop and abort mean if you start a rereco?
17:03:30 robert: could call abort if the result didn't come soon enough and you want to cancel
17:04:34 bringert: this will need 3 new attributes - rerecognizeFromId, rerecognizeFromStart, rerecognizeFromEnd - or could be an object with 3 attributes
17:05:12 michael: could also reuse inputWaveformURI
17:06:14 Milan: are we saying 3 attributes are better than 1 new method?
17:06:38 robert: better than having 2 ways to do reco; the better way is to say where to get the audio from (local or saved)
17:07:25 ... similar to what we have specified in the protocol API work
17:08:16 satish: should we talk about the 2 new attributes added to the IDL?
17:08:27 http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/att-0012/speech-protocol-draft-05.htm#reco-headers
17:08:39 mbodell: sounds fine to me; need a way to specify continuous reco
17:09:05 http://example.com/retainedaudio/fe429ac870a?interval=0.3,2.86
17:09:35 this is an example of a wave URI with time intervals: http://example.com/temp44235.wav?interval=0.65,end
17:09:52 A different example might be: sessionid:foobar?interval=0.3,2.5
17:09:58 and here's another: http://example.com/retainedaudio/fe429ac870a?interval=0.3,2.86
17:10:32 bringert: I'll go back on my earlier concern; seems fine to use inputWaveformURI for rereco from an earlier session and for recognizing from publicly accessible audio
17:10:48 ... even for a public URI we should allow passing media fragments/time ranges
17:11:39 burn: the URI should just be something that the service can access
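A small helper illustrating the interval convention in the example URIs above. The helper itself is hypothetical; only the ?interval=start,end query form comes from the examples.

    // Build a rereco URI over a time range, per the examples above:
    //   intervalUri("sessionid:foobar", 0.3, 2.5) -> "sessionid:foobar?interval=0.3,2.5"
    //   intervalUri("http://example.com/temp44235.wav", 0.65, "end")
    //     -> "http://example.com/temp44235.wav?interval=0.65,end"
    function intervalUri(base: string, start: number, end: number | "end"): string {
      return `${base}?interval=${start},${end}`;
    }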
17:12:26 bringert: for continuous reco, have we talked about how results would be received?
17:12:44 topic: continuous reco attribute
17:13:08 mbodell: we have a simple proposal and satish sent one for the complex scenario; should discuss both
17:13:21 robert: which is the simple proposal?
17:13:35 bringert: probably the last one I sent to the mailing list
17:14:26 ... sent on Aug 25, subject 'web api discussion in today's call'
17:14:28 http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0033.html
17:15:35 satish's proposal for a results API for continuous reco: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0034.html
17:15:53 topic: filtering offensive words attribute
17:16:20 bringert: 2 situations - filtering from the language model, so '***' gets recognized as 'duck', and the second one could send results back as 'f***'
17:16:33 ... for the first, could just choose a different grammar
17:17:08 robert: why can't we use the grammar for both?
17:17:45 ... could even be a 'builtin:dictation?noOffensiveWords'
17:18:08 Glen: this feels like a user selection
17:18:17 ... rather than a website-selectable setting
17:18:34 mbodell: this is the mechanism to communicate this setting to the service
17:19:30 bringert: the problem is about misrecognizing something as offensive words - even random noise gets recognized as an offensive word
17:20:18 glen: agree that the grammar could be the mechanism, but should the web app specify it or should the UA?
17:21:58 burn: agree with glen; this happens to me all the time with autocorrect, and if it annoys me I turn it off
17:22:10 ... this is something the browser should provide as a setting, not the web app
17:23:05 mbodell: if I'm on an adult site it is not useful to send a flag to the speech service saying don't send me back naughty words
17:23:58 bringert: as an example, we have a global flag on Android to not return offensive words. There seem to be users who don't mind offensive words and those who don't want them
17:24:36 burn: users may be willing to input offensive words on some sites and not on others
17:24:46 satish: e.g. you never want to send offensive words in an office email web app
17:25:00 glen: we may need both, as a user setting and a web app setting
17:25:55 -Michael_Johnston
17:26:33 robert: the grammar should be enough
17:26:51 glen: if using a custom grammar you are defining your own words
17:28:05 bringert: the UA could do it like it does spell check, and only pass sanitized results to the web app if it wants
17:28:38 mbodell: so the conclusion is to leave it out of the IDL
17:28:50 ... and allow a way to pass a hint via the grammar
17:29:24 ... something like 'builtin:dictation?noOffensiveWords'
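As a sketch of that conclusion: no dedicated IDL flag; the page opts out of offensive words by choosing a variant of the builtin dictation grammar. The request shape is illustrative; the grammar URI is quoted from the discussion.

    // Hint carried in the grammar URI rather than in a new IDL attribute.
    const filteredRequest: { grammars: string[] } = {
      grammars: ["builtin:dictation?noOffensiveWords"],
    };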
17:30:13 -[Microsoft]
17:30:19 -Milan_Young
17:30:20 -Olli_Pettay
17:30:20 -Glen_Shires
17:30:21 -Debbie_Dahl
17:30:21 -Patrick_Ehlen
17:30:22 -Dan_Burnett
17:30:22 -Dan_Druta
17:30:24 -Bjorn_Bringert,Satish_Sampath
17:30:29 -Charles_Hemphill
17:30:30 -Michael_Bodell
17:30:32 INC_(HTMLSPEECH)11:30AM has ended
17:30:33 Attendees were Dan_Burnett, Olli_Pettay, Debbie_Dahl, Robert_Brown, Dan_Druta, Bjorn_Bringert, Satish_Sampath, Michael_Bodell, Glen_Shires, Patrick_Ehlen, Milan_Young, Charles_Hemphill, Michael_Johnston
17:30:48 zakim, bye
17:30:48 Zakim has left #htmlspeech
17:30:52 rrsagent, make logs public
17:31:09 rrsagent, draft minutes
17:31:09 I have made the request to generate http://www.w3.org/2011/09/22-htmlspeech-minutes.html burn
17:32:36 rrsagent, draft minutes
17:32:36 I have made the request to generate http://www.w3.org/2011/09/22-htmlspeech-minutes.html burn
17:45:03 ddahl has left #htmlspeech
19:21:05 smaug has joined #htmlspeech