See also: IRC log
<burn> trackbot, start telcon
<trackbot> Date: 22 September 2011
<robert> brb - gotta hydrate
<smaug> bringert: can you hear anything?
<burn> Scribe: Satish_Sampath
<burn> ScribeNick: satish
<burn> Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0036.html
burn: first topic is TPAC. we will likely have work to do on
the webapi and protocol at the face to face, and some work on
the document. It is highly likely we'll have significant
discussions and we'll have 2 full days.
... number of people who register determines the place and
number of power outlets, so please register
<glen> Meetings at TPAC Nov 3-4 Santa Clara, CA http://www.w3.org/2011/11/TPAC/Overview.html
<glen> Register by Oct 14 for lower fee
<glen> Best hotel rates / rooms by Oct 10
burn: the two days that matter for us are Thursday/Friday
<mbodell> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0033.html
<burn> satish: reviews his IDL proposal (see link above)
bringert: could start with the
saveWaveformURI and inputWaveformURI questions
... could be part of the MediaStream 'input' attribute, may not
need a separate URI
robert: doesn't address the case where the waveform is stored at a remote server
bringert: what is the use case?
robert: re-reco is one use case
where audio was saved
... may have a 3 second utterance and don't want to upload it
again
<mbodell> This comes from "FPR57. Web applications must be able to request recognition based on previously sent audio."
bringert: seems like a corner
case and adds complexity to implementation
... could be a random unique token instead of a uri
... shouldn't say in the api that the uri should be
downloadable and fetch the full audio as a file
... perhaps replace it with a rerecognize method
Debbie: what if user wants to listen to what they said?
bringert: could be a UA feature instead of an API requirement
Glen: use cases: 1. listen to yourself, 2: re-reco with same service, 3: re-reco with a different service
robert: 3 is important because
one request could be setup with one grammar, another could be
with another grammar, app could use output of step 1 to figure
out the correct set of grammars for the second step
... could add a rereco method which takes in a set of
parameters for the second reco
... would be doing all this in one thread with event handlers
and won't have time to do async stuff
... e.g. a local search app with a coarse grammar identifying
states, cities and based on the result decide which granular
grammar to use for the neighbourhood
Milan: the issue is whether the second reco takes place in the same service. if it does then that service can perform a rereco. it is only a problem if using a different service
burn: another use case - compliance. may be a need for the client to say i want to save these recos to get to them later. the only component which can identify the audio is the service
bringert: could be proprietary extensions
mbodell: all use cases are solvable if we keep the uri as is and mention it is only for identifying the audio, not for downloading audio content
burn: not talking about rereco, only for client to identify sessions
bringert: is it realistic to expect all implementors to keep this stored all the time?
robert: why need to get the recording back?
burn: client doesn't do all of the endpointing, only recognizer knows what it got. For compliance you may need an entire recording and sometimes need to know specifically what was heard.
bringert: e.g. calling stock broker and say sell, then i sue them for selling and they prove i actually said it?
burn: yes
robert: way to solve is that this
is a specialized app and have the service provider record all
audio anyway and provide a session id to the client.
... really hard to solve all such use cases
... we can just provide a way to tag the session
bringert: could solve session id and rereco by returning an opaque session id in the reco result, which can be passed up as a parameter
burn: happy with that if we also have an api to get the audio in the client for the session
robert: don't understand why end
user needs to listen to what recognizer heard
... speech service could provide an orthogonal api for fetching
all data for a given session id
bringert: this is quite common and we do it for debugging, not for end users
burn: not sure that end user will need it, ui/mic tuning can be done offline
mbodell: helpful if audio can be obtained easily without doing something complicated. another use case - a smart answering machine which transcribes and falls back to the recorded audio if dictation wasn't successful
robert: what is the logic for such a webapp?
bringert: capture audio, send to server and cache locally, if response is fine send as email and otherwise send captured audio
mbodell: may want to listen to
your audio before sending
... so should be easy to play back sent audio
bringert: all of this can be done with media capture api
robert: this is like a mic api and we decided earlier to avoid that
bringert: so I propose we remove save/inputWaveformURI and instead add a sessionId in the response. Also add a way to pass this for rereco
mbodell: makes sense for
saveWaveformURI, inputWaveformURI is a different use case
... rereco is not the only use case, e.g. recognizing something
recorded a long time ago
... or audio stored elsewhere
Olli: mediastream will allow that
bringert: agree
burn: requires the client fetch and process the file contents itself, turn into a stream and pass to the server
mbodell: has an issue with bandwidth usage
bringert: having specific apis to tell one service to talk to another service/uri adds complexity and security
mbodell: i don't buy both those reasons
robert: there are security problems as we have 3 entities now and all have to share security context. it is possible to do out of band
mbodell: if audio is in a private intranet could use mediastream api
<burn> mbodell: but there is much audio that is publicly available and could be fetched directly
bringert: is the use case
something like transcribing a youtube audio/video? if so, why
write that as a webapp instead of a service which fetches and
transcribes it once?
... doesn't seem like a web application, not efficient
mbodell: similar to specifying a grammar, this may not be different than that
bringert: yes they are similar,
just that use case is a lot weaker and there are other ways to
accomplish the same thing
... since more than one person would be interested in
transcribing publicly available audio.
mbodell: don't agree with that,
easy to do if you own the service
... other protocols like MRCP already require such
functionality. agree that there are other ways but that is the
wrong optimisation.
bringert: probably not a big
concern, the use case feels pointless and it's another feature,
but not hard to implement
... but there is the codec issue
mbodell: could be figured out in protocol handshake
robert: in the protocol group it came down to uLaw and PCM as the required codecs
mbodell: same discussion will happen in synthesis api so not unique to this context
bringert: could use the same uri mechanism for rereco
robert: what would be in the header when fetching the uri - will that specify the codec used?
bringert: assume standard http
response headers would have the mime type or audio contains
magic bytes to tell what codec is used
... session id idea still stands and will be returned in the
recognition result and request will take this id as an optional
field. inputWaveformURI refers to a normal uri on the web
... though rereco can fail if the id goes stale or service
doesn't support storing audio
... related boolean field present is 'saveForRereco' so webapp
specifies in advance if it wants storing and rereco
<mbodell> Summary: remove saveWaveformURI; keep inputWaveformURI with normal URI/http semantics; add a session id (format unknown - URI that isn't necessarily a URL?) to the result; add ability to rereco from session id
robert: a counter proposal is to
let service not send sessionId if it doesn't support saving
audio
... and rereco could be done by saving audio locally with
mediastream
... leave the flag as an optional optimisation.
bringert: good point, the result
could always return a sessionId and a separate flag
'savedForRereco' will be set to true if server supported that
feature
... so sessionId is always present and can be used for logging
etc
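To make the shape agreed above concrete, here is a rough TypeScript sketch. Only sessionId, savedForRereco, saveForRereco and inputWaveformURI come from the discussion; the interface names and everything else are illustrative assumptions, not agreed IDL.

```typescript
// Sketch of the proposal: sessionId always present in the result,
// savedForRereco true only when the service retained the audio.
// Interface names are hypothetical placeholders.
interface RecognitionResult {
  sessionId: string;        // opaque token, usable for logging etc.
  savedForRereco: boolean;  // true only if the service supports/performed retention
}

interface RecognitionRequest {
  saveForRereco?: boolean;   // webapp asks in advance for audio to be retained
  inputWaveformURI?: string; // ordinary URI the service can fetch, instead of mic input
}

// A service that supports retention might return:
const result: RecognitionResult = { sessionId: "fe429ac870a", savedForRereco: true };

// A later request could then refer back to that session:
const followUp: RecognitionRequest = { inputWaveformURI: "sessionid:" + result.sessionId };
```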
mbodell: should the separate variable/flag be a boolean or some other token?
bringert: could just use sessionId for referring to saved audio
mbodell: useful to differentiate audio chunks in continuous reco, whereas sessionId could refer to the whole session
robert: rereco should allow specifying a time range
bringert: what if i get 2 results and i want to rereco the whole audio covering both results?
robert: could specify time range
in the rereco method
... between starting and finishing a recognition there is
continuous recording of audio and you have an audio token. that
might be different each time you cycle that request.
bringert: what audio does it refer to? from start to stop?
robert: yes
bringert: for rereco could pass
in audioId, start and stop
... rereco should be a separate method
<smaug> terrible echo
robert: doesn't think so, instead
of using mic input should use saved audio
... same as starting reco in the normal case otherwise
bringert: what do stop and abort mean if you start rereco
robert: could call abort if result didn't come soon enough and you want to cancel
bringert: this will need 3 new attributes, rerecognizeFromId, rerecognizeFromStart, rerecognizeFromEnd or could be an object with 3 attributes
michael: could also reuse inputWaveformURI
Milan: are we saying 3 attributes are better than 1 new method?
robert: better than having 2 ways
to do reco; the better way is to say where to get the audio from
(local or saved)
... similar to what we have specified in the protocol api
work
satish: should we talk about the 2 new attributes added to the IDL?
mbodell: sounds fine to me, need a way to specify continuous reco
<robert> http://example.com/retainedaudio/fe429ac870a?interval=0.3,2.86
<robert> this is an example of a wave uri with time intervals: http://example.com/temp44235.wav?interval=0.65,end
<mbodell> A different example might be: sessionid:foobar?interval=0.3,2.5
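The `interval=start,end` query syntax from the pasted examples could be built by a small helper like this sketch; the function name is hypothetical and the syntax comes only from the examples in the minutes, not from any final spec.

```typescript
// Hypothetical helper that appends the time-interval query parameter
// discussed above to an audio URI. "end" means "until end of audio",
// as in the temp44235.wav example.
function rerecoURI(base: string, start: number, end: number | "end"): string {
  return `${base}?interval=${start},${end}`;
}

rerecoURI("http://example.com/retainedaudio/fe429ac870a", 0.3, 2.86);
// → "http://example.com/retainedaudio/fe429ac870a?interval=0.3,2.86"
rerecoURI("http://example.com/temp44235.wav", 0.65, "end");
// → "http://example.com/temp44235.wav?interval=0.65,end"
```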
bringert: I'll go back on my
earlier concern, seems fine to use the inputWaveformURI for
rereco from an earlier session and recognizing from publicly
accessible audio
... even for public URI should allow passing media
fragments/time range
burn: the URI should just be something that the service can access
bringert: for continuous reco, have we talked about how results would be received?
mbodell: we have a simple proposal and satish sent one for complex scenario, should discuss both
robert: which is the simple proposal?
bringert: probably the last one I
sent to the mailing list
... sent on Aug 25, subject 'web api discussion in today's
call'
<mbodell> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Aug/0033.html
<bringert> satish's proposal for results API for continuous reco: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Sep/0034.html
bringert: 2 situations -
filtering in the language model so '***' gets recognized as
'duck', and filtering the results so the service sends back
'f***'
... for the first we could just choose a different grammar
robert: why can't we use grammar
for both?
... could even be a 'builtin:dictation?noOffensiveWords'
Glen: this feels more like a user
selection
... than a website-selectable setting
mbodell: this is the mechanism to communicate this setting to the service
bringert: the problem is about misrecognizing something as offensive words - even random noise gets recognized as an offensive word
glen: agree that grammar could be the mechanism but should the web app specify it or should the UA?
burn: agree with glen, happens to
me all the time with autocorrect and if it annoys me I turn it
off
... this is something the browser should provide as a setting
and not the web app
mbodell: if i'm in an adult site it is not useful to send a flag to speech service saying don't send me back naughty words
bringert: as an example, we have a global flag on android to not return offensive words. there seem to be users who don't mind offensive words and those who don't want them
burn: users may be willing to input offensive words on some sites and not on others
satish: e.g. you never want to send offensive words in an office email web app
glen: we may need both, as a user setting and a web app setting
robert: grammar should be enough
glen: if using a custom grammar you are defining your own words
bringert: UA could do it like how it does spell check and only pass sanitized results to the web app if it wants
mbodell: so conclusion is to
leave it out of the IDL
... and allow a way to pass a hint via the grammar
... something like 'builtin:dictation?noOffensiveWords'
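The conclusion above - passing the filter preference as a grammar hint rather than an IDL flag - might look like this sketch. Only the 'builtin:dictation?noOffensiveWords' string comes from the discussion; the surrounding shape is an illustrative assumption.

```typescript
// Hypothetical grammar objects carrying the hint agreed above.
interface RecoGrammar {
  src: string; // grammar URI; the query component carries service hints
}

const filtered: RecoGrammar = { src: "builtin:dictation?noOffensiveWords" };
const unfiltered: RecoGrammar = { src: "builtin:dictation" };

// The service would detect the hint in the URI's query component:
const wantsFilter = filtered.src.split("?")[1] === "noOffensiveWords";
// → true
```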
Present: Dan_Burnett, Olli_Pettay, Debbie_Dahl, Robert_Brown, Dan_Druta, Bjorn_Bringert, Satish_Sampath, Michael_Bodell, Glen_Shires, Patrick_Ehlen, Milan_Young, Charles_Hemphill, Michael_Johnston
Minutes: http://www.w3.org/2011/09/22-htmlspeech-minutes.html