15:49:45 RRSAgent has joined #htmlspeech
15:49:46 logging to http://www.w3.org/2011/05/05-htmlspeech-irc
15:49:52 Zakim has joined #htmlspeech
15:50:00 trackbot, start telcon
15:50:02 RRSAgent, make logs public
15:50:04 Zakim, this will be
15:50:04 I don't understand 'this will be', trackbot
15:50:05 Meeting: HTML Speech Incubator Group Teleconference
15:50:05 Date: 05 May 2011
15:50:09 zakim, this will be htmlspeech
15:50:09 ok, burn; I see INC_(HTMLSPEECH)12:00PM scheduled to start in 10 minutes
15:50:19 Chair: Dan_Burnett
15:50:39 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011May/0001.html
15:55:31 INC_(HTMLSPEECH)12:00PM has now started
15:55:38 +Dan_Burnett
15:55:41 zakim, I am Dan_Burnett
15:55:41 ok, burn, I now associate you with Dan_Burnett
15:56:37 +Michael_Bodell
15:57:08 mbodell has joined #htmlspeech
15:57:42 +Ronald
15:57:56 bringert has joined #htmlspeech
15:58:41 zakim, Ronald is Bjorn_Bringert
15:58:41 +Bjorn_Bringert; got it
15:58:44 +[Microsoft]
15:58:57 zakim, [Microsoft] is Robert_Brown
15:58:57 +Robert_Brown; got it
15:59:08 zakim, nick bringert is Bjorn_Bringert
15:59:08 ok, burn, I now associate bringert with Bjorn_Bringert
15:59:12 Robert has joined #htmlspeech
15:59:19 zakim, nick Robert is Robert_Brown
15:59:19 ok, burn, I now associate Robert with Robert_Brown
15:59:21 +??P34
15:59:32 smaug has joined #htmlspeech
15:59:45 zakim, ??P34 is Olli_Pettay
15:59:45 +Olli_Pettay; got it
15:59:58 zakim, nick mbodell is Michael_Bodell
15:59:58 ok, burn, I now associate mbodell with Michael_Bodell
16:00:05 -Olli_Pettay
16:01:09 Charles has joined #htmlspeech
16:01:55 + +1.425.830.aaaa
16:01:59 +??P50
16:02:14 zakim, aaaa is Charles_Hemphill
16:02:14 +Charles_Hemphill; got it
16:02:22 zakim, nick Charles is Charles_Hemphill
16:02:22 ok, burn, I now associate Charles with Charles_Hemphill
16:02:28 zakim, ??P50 is Olli_Pettay
16:02:28 +Olli_Pettay; got it
16:02:41 zakim, nick smaug is Olli_Pettay
16:02:45 ok, burn, I now associate smaug with Olli_Pettay
16:03:43 Scribe: Charles_Hemphill
16:03:44 smaug has joined #htmlspeech
16:03:53 smaug has joined #htmlspeech
16:04:10 Zakim, nick smaug is Olli_Pettay
16:04:20 ok, smaug, I now associate you with Olli_Pettay
16:05:20 ScribeNick: Charles
16:05:33 Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011May/0001.html
16:05:52 Topic: F2F Logistics: Any updates on attendance, hotel bookings, and questions or details from Bjorn.
16:06:04 Michael has joined #htmlspeech
16:06:07 Bjorn: no updates on F2F
16:06:24 Burn: will send out schedule in the next few days.
16:06:36 Topic: Review new text in updated "Final Report" document [1] to ensure it matches what people think we agreed upon in our last teleconference.
16:06:39 document is http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110503.html
16:06:55 + +1.415.248.aabb
16:07:14 Burn: Comments on the document: added a general design decision and 17 new discussion bullets.
16:07:34 zakim, aabb is Patrick_Ehlen
16:07:35 +Patrick_Ehlen; got it
16:08:02 + +1.425.580.aacc
16:08:07 Topic: Determine if we already have other agreed-upon design decisions.
16:08:18 zakim, aacc is Dan_Druta
16:08:20 +Dan_Druta; got it
16:08:48 +Michael_Johnston
16:09:04 DanD has joined #htmlspeech
16:09:19 zakim, nick Michael is Michael_Johnston
16:09:19 ok, burn, I now associate michael with Michael_Johnston
16:09:34 Bjorn: Discussion topic about mic capture access. Propose design agreement: it should be possible to start speech reco without selecting a mic; just pick the default.
16:09:48 Burn: default vs. what you can do - two things.
16:10:11 Bjorn: There should be a default mic. Perhaps the only option.
16:10:12 satish has joined #htmlspeech
16:10:27 Bjorn: Saying explicit determination of mic should not be required.
16:10:46 Bjorn: Should not need to enumerate mics before starting.
16:11:12 Robert: Think we let you choose other mics.
16:11:13 s/Brorn/Bjorn/
16:11:29 ehlen has joined #htmlspeech
16:11:41 zakim, nick ehlen is Patrick_Ehlen
16:11:41 ok, burn, I now associate ehlen with Patrick_Ehlen
16:11:48 Robert: that's a reasonable interpretation.
16:12:07 Robert: By default, mic provided by user agent default device.
16:12:24 Bjorn: Need to discuss second sentence later - picking a mic.
16:12:54 Bjorn: should be able to start reco without selecting mic - confirming agreement.
16:13:10 Robert: Assuming that the default will be used for mic.
16:13:18 Burn: notion of default mic.
16:14:01 Robert: Issue of user interface. Shows speaker activity. Is there a default user interface? Can the application override it?
16:14:14 Bjorn: Have that requirement for default user interface.
16:14:41 Robert: RE: default user interface - shows it's listening and lets user cancel.
16:15:18 Olli: What is the default user interface? Something in the browser?
16:15:35 Bjorn: Should only use browser user interface. No Web app user interface.
16:15:51 s/Brjorn/Bjorn/
16:15:52 Olli: More security or privacy concerns otherwise.
16:16:34 DanD: worried about limitations of only in the browser.
16:16:43 Robert: Don't think that's true.
16:17:06 Robert: Default user interface: can it be overridden? Where does it live? 3 discussions.
16:17:29 Robert: Google's is right in the Web page where the user clicks. Up to user agent to decide how to render.
16:17:43 Bjorn: Have a default interface now.
16:17:55 Raj has joined #htmlspeech
16:18:28 MichaelJ: Fine for default. Want APIs to allow someone to build their own, with a different user experience. Allow this. Useful to have default, but not always appropriate.
16:18:40 +??P0
16:18:58 Bjorn: Have agreement on default. Have disagreement on building your own, due to security reasons, etc.
16:19:11 MichaelJ: Very limiting otherwise.
16:19:31 s/DanD/MichaelJ/
16:19:47 Bjorn: Should be able to start speech in custom ways, including JavaScript. But with a custom UI, you can hide that you're capturing audio.
16:19:51 zakim, ??P0 is Raj_Tumuluri
16:19:51 +Raj_Tumuluri; got it
16:20:01 zakim, nick Raj is Raj_Tumuluri
16:20:01 ok, burn, I now associate Raj with Raj_Tumuluri
16:20:22 Robert: Compromise - default UI parameterized? Provide feedback to the user. Style sheet. Look at customizations.
16:20:40 MichaelB: Up to user agent to allow customization. Part of permissions API.
16:21:00 Burn: Should be a default user interface.
16:21:11 s/DanD/MichaelB/
16:21:16 Burn: Should there be customization and what level.
16:21:57 DanD: Not all use cases in browsers. Different security concerns if rendering engine used. Should not be forced by HTML spec to have a particular UI.
16:22:17 DanD: Don't want to prevent an animated character app that is listening to you.
16:22:44 Bjorn: Talk about browser case. Need to be clear that the browser is capturing the audio.
16:22:57 DanD: Could be a matter of security settings.
16:23:20 Bjorn: Don't say that we disallow customization, but don't require this.
16:23:38 DanD: End up with fragmentation. Won't work cross-browser.
16:23:48 Bjorn: Allow for non-browser apps.
16:24:41 Bjorn: Note for future discussion.
16:25:01 Bjorn: Allow customization of the user interface that shows audio capture is happening.
16:25:17 Burn: Have a discussion topic of the level of customization allowed.
16:26:20 Bjorn: Should have customization for the UI for starting recognition. Have discussion topic: customize UI for showing that audio is being captured.
16:26:45 MichaelJ: Waveform, traffic lights?
16:27:00 Bjorn: Can app customize what the app looks like?
16:27:30 MichaelJ: Can customize the one that shows up in the UI.
16:27:37 Regrets: Debbie_Dahl, Marc_Schroeder
16:28:24 MichaelJ: Multimodal tap-and-talk API. Want creativity. Activate recognition button. Don't want to rule out certain kinds of APIs. Don't want built-in browser feedback to interfere.
16:28:42 Burn: come back to this discussion later.
16:29:22 Topic: Begin discussing issues listed in the Appendix.
16:30:06 Burn: Have time to discuss a serious topic. Can work out serious issues at FTF.
16:30:34 Burn: Determine which topics have more meat. Start with audio.
16:31:01 Burn: 3 audio-related topics: how to get audio capture access, mandatory audio codecs, and audio streaming support and how.
16:31:49 Bjorn: 1st unrelated to 2nd two. 1st is API. 2nd two are how audio is sent from browser to implementation.
16:32:09 Burn: How to get audio mic capture access.
16:32:26 Bjorn: MS proposal has mic selection. What are use cases?
16:33:06 "audio mic capture" is "audio/mic/capture"
16:33:10 Robert: Browser going to have mic API anyway. Avoid 2 mic APIs: 1 in speech and another unrelated (explicit) one. Want speech API to integrate with browser API.
16:33:39 Robert: Many devices will have mult. mics. Important to select the one you want. Maybe app or user through preferences.
16:34:05 Robert: May want to configure mic settings. Use for things other than speech. E.g. video app that does speech reco.
16:34:43 Robert: MS API allows this. Can get audio stream to reco. Look at multimodal scenarios. Need for integrated API there. Speech API should integrate.
16:34:51 Bjorn: Can buy most of that.
16:35:08 Bjorn: If there is one there, should be able to use for speech. But no such standard API yet.
16:35:37 Robert: Pushing capture API heavily, with Michael. IE team thinks this is a sound approach.
16:35:47 Burn: Agree on ability to select diff. audio sources.
16:36:32 Robert: Not quite it. If browser has mic API - we should be able to use it.
16:36:49 Bjorn: Agree. But if there isn't one, don't want to come up with one ourselves.
16:36:54 Olli: agree.
16:37:33 Bjorn: If HTML standard has one, we should be able to use it.
16:37:43 Robert: Fine with HTML rather than browser.
16:38:23 Burn: Meta decision. Use HTML if exists, but not create one.
16:38:34 Robert: Have requirements for such an API?
16:39:06 Robert: Latest draft doesn't have notion of stream of endpointing. And we care deeply about these for mic API.
16:39:25 Bjorn: Why does mic API need endpointing?
16:39:33 s/???/Bjorn/
16:39:38 Robert: Can be a long way between mic and endpointer.
16:39:55 should "stream of endpointing" be "stream or endpointing"?
16:40:09 Bjorn: Requirement that endpointing be available for things other than speech.
16:40:40 Michael: Hopefully, have agreement - will work with people designing the API and express requirements.
16:40:44 Bjorn: Seems fair.
16:41:45 Olli: Capture API in HTML draft or DAP working group?
16:42:22 Robert: Mean the one in the DAP working group.
16:42:30 Bjorn: Think we should work with HTML.
16:43:10 Burn: 2nd one tricky. Wrote that we will capture and express requirements on a capture API to relevant groups.
16:43:22 Bjorn: Seems reasonable. Avoid "capture".
16:43:42 Burn: requirements on audio capture APIs.
16:43:55 Burn: requirements on all audio capture APIs.
16:44:00 Bjorn: seems fine.
16:44:22 Olli, is there a capture API in the w3c HTML draft? I don't see it at http://dev.w3.org/html5/spec/Overview.html
16:44:49 mbodell: I don't read that version of HTML spec ;)
16:45:04 Bjorn: If no HTML audio capture API. Propose that we proceed even without a mic API.
16:45:09 mbodell: http://www.whatwg.org/specs/web-apps/current-work/multipage/dnd.html#video-conferencing-and-peer-to-peer-communication is an early draft
16:45:23 (now Robert is speaking)
16:45:48 Robert: Concern - browsers will need to implement privacy and security policies. Weird to have for speech alone, but not audio capture in general. May be messy.
16:46:04 Bjorn: Forge ahead, and consider audio capture in general.
16:46:19 Burn: Agreement that's important.
16:46:32 Bjorn: Having control over audio capture does not have to be in the first proposal.
16:46:45 Burn: Is that the consensus?
16:47:03 Bjorn: OK to have speech API if there is not an audio capture API.
16:47:32 Robert: Not create one, and shouldn't be blocked from moving forward.
16:47:54 Burn: Not create one and not block while waiting for one.
16:48:15 Michael: Design may be suboptimal if there is no audio capture API, and may not fit well once it's there.
16:48:37 Michael: Premature to jump to say we can make total progress without that.
16:49:26 DanD: Goal for group to submit the requirements to the other working groups. Accelerating the capture API for audio may be one of the recommendations. AT&T is a member of DAP. Recognize needs.
16:49:45 Bjorn: Agree we should not block this progress while waiting.
16:49:56 DanD: May create fragmentation.
16:50:09 DanD: Unless abstracted completely to "get mic".
16:50:25 Bjorn: Agreed that we should start reco without specifying mic.
16:50:39 DanD: Concerned that we should avoid fragmentation.
16:50:50 Burn: Good to get agreement.
16:51:38 DanD: API for capture: if we are able to capture the audio without the web developer going through coding, then we are fine.
16:52:06 DanD: If anything specific is needed in the web application to retrieve the audio handle, then we're looking at if-then-else statements.
16:52:14 Bjorn: We would like to do the former.
16:53:29 Burn: What is meant by "start of speech", "end of speech", and endpointing in general? How do transmission delays affect the definitions and what we want in terms of APIs?
16:54:31 Robert: Divide into smaller topics. Distributed env., with speech services remote. 2 notions of endpointing: by reco, or cheap on client (responsiveness and reduced network IO). Look at these 2 as separate.
16:54:50 Bjorn: Throw out proposal. Require client-side simple endpointer?
16:55:00 Robert: Has my vote.
16:55:09 Burn: No endpointer on my computer.
16:55:22 Bjorn: Browser could do simple energy-based endpointing.
16:55:52 Robert: Lots of options. GSM encoder has endpointer. Can have local reco and use for endpointer.
16:56:16 Burn: API needs to assume client as well as server-side endpointer. Client could be a no-op?
16:56:37 Bjorn: Stronger: has to be something in the client that does tell start and end of speech. even if not good.
16:57:08 Michael: Can see recommending it. Don't know how web author can know. Requirement is low latency; doesn't matter after that.
16:57:37 Bjorn: Agree with that. But if app points to specific recognizer, can interact.
16:58:28 Burn: Why I'm concerned: reco can get finicky about input based on training. Endpointing is mostly done in advance. Be careful about requiring local endpointing. If bad, can affect reco.
16:58:40 Bjorn: Avoid bad endpointers.
16:59:05 Bjorn: Low-latency speech detection should always be available.
16:59:51 MichaelJ: But not forced to use it. FedEx example: some query - using endpointing from reco - want them to be able to use the standard. Client endpointing could cause errors.
17:00:11 Bjorn: Have some parameters. Make it easier for the app. Think you're speaking.
17:00:23 Burn: Ongoing recognition case - won't use local endpointer.
17:00:37 Burn: plenty of open mic apps - listen for keywords.
17:00:51 Bjorn: Should be one, but should be possible for app to turn off.
17:01:04 Robert: Probably want app to turn it on if it needs it.
17:01:18 Michael: Set a parameter and get it that way.
17:01:51 Bjorn: Hello world app.
17:02:08 Charles: Level for feedback - good to be local.
17:02:24 Burn: Low latency endpoint detector should be available.
17:02:34 Bjorn: Don't have agreement on whether on or off by default.
17:02:53 MichaelJ: Talking about detection of end of speech or start too?
17:03:07 Burn: may be big difference.
17:03:26 Burn: Want low latency to turn on speech to reco - but don't want it to stop.
17:03:34 Bjorn: we do the opposite.
17:04:00 Bjorn: Start streaming right away, server endpoints, but need to stop streaming at some point.
17:04:36 Robert: Very scenario dependent. Need start/stop speech events. If start is on a button click, end matters a lot. Need to have options available.
17:05:10 Burn: Forwarding audio to expensive recognizers. Want high accuracy on endpointing. Don't want to send audio unless we have to, due to expense.
17:05:33 Bjorn: Cutting off audio vs. endpointer. App can choose not to listen for the event. Control whether endpointing cuts off audio.
17:05:44 MichaelJ: Need to control when start sending audio to recognizer.
17:05:55 Burn: Start of speech and reco can be different.
17:06:28 MichaelJ: If reco on for a long time, may want to do something to delay until there is certainty of speech.
17:06:58 Bjorn: Agree that there is a low latency endpointer available. Should be possible for app to decide if audio is started or stopped on endpointer.
17:07:24 Burn: Audio start/stop separate from speech start/stop. Separately controllable.
17:07:58 Burn: Detector detects both start/end of speech and fires an event in each case.
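Illustration, not part of the meeting record: a minimal Python sketch of the kind of simple client-side, energy-based endpointer Bjorn suggests at 16:55:22, firing one event at start of speech and one at end of speech as Burn summarizes here. It assumes audio samples are floats in [-1, 1] already split into fixed-size frames; the class name, the "speechstart"/"speechend" event names, the energy threshold, and the frame counts are hypothetical choices, not anything the group agreed. In line with the discussion, it only reports events; whether an event also stops audio going to the recognizer is left to the application.

```python
from typing import Callable, List, Sequence, Tuple


class EnergyEndpointer:
    """Hypothetical low-latency client-side endpointer based on frame energy.

    Only reports "speechstart"/"speechend"; it never stops the audio stream
    itself, so the application decides what to do with each event.
    """

    def __init__(self, on_event: Callable[[str, int], None],
                 threshold: float = 0.01,
                 start_frames: int = 2,
                 end_frames: int = 5) -> None:
        self.on_event = on_event
        self.threshold = threshold        # mean-square energy treated as speech (assumed)
        self.start_frames = start_frames  # loud frames in a row before "speechstart"
        self.end_frames = end_frames      # quiet frames in a row before "speechend"
        self.in_speech = False
        self._run = 0                     # consecutive frames disagreeing with the state

    def process(self, index: int, frame: Sequence[float]) -> None:
        """Feed one frame of samples; may invoke on_event at most once."""
        energy = sum(s * s for s in frame) / len(frame) if frame else 0.0
        loud = energy > self.threshold
        if loud == self.in_speech:
            self._run = 0                 # frame agrees with current state
            return
        self._run += 1
        needed = self.start_frames if loud else self.end_frames
        if self._run >= needed:
            self.in_speech = loud
            self._run = 0
            self.on_event("speechstart" if loud else "speechend", index)


# Usage: 3 silent frames, 4 loud frames, 6 silent frames.
events: List[Tuple[str, int]] = []
endpointer = EnergyEndpointer(lambda name, i: events.append((name, i)))
silence = [0.0] * 160
speech = [0.5, -0.5] * 80
for i, frame in enumerate([silence] * 3 + [speech] * 4 + [silence] * 6):
    endpointer.process(i, frame)
print(events)  # [('speechstart', 4), ('speechend', 11)]
```

Because a decision needs a short run of agreeing frames, each event is reported a few frames after the energy change (frame 4 rather than 3 for the start, frame 11 rather than 7 for the end). That trades some latency for robustness against isolated clicks, which is the latency concern raised above.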
17:08:09 Bjorn: Separate issue of cutting off audio.
17:08:21 Burn: Audio to the reco process as opposed to TTS.
17:08:52 Burn: Audio start and stop to reco server (resource)...
17:09:05 Bjorn: Control over which audio is used for speech recognition.
17:09:19 Bjorn: which part of the captured audio.
17:10:00 DanD: Make sure we carefully agree that we are not forcing the application into using the predefined environment engine of the browser, and still allow the developer to choose which engine to use.
17:10:16 DanD: Have a flag: whether to use optimized endpointing in application or not.
17:10:33 Bjorn: Separate from how you choose the engine.
17:11:19 MichaelJ: Related: if turned on, give some sort of event for local prediction of begin/end of speech. Is that the resolution we want? If level detector, can also get level?
17:11:42 Bjorn: Should be a more precise way to get actual events from recognizer. Level part of mic API?
17:12:38 MichaelJ: Could be raw energy detector, limited reco listening for "silence", etc. for the local part. The browser, client side, can have the best that it can. Not saying anything about how it's done.
17:13:16 Burn: May be a difference when there are multiple endpointers. (1) low latency - prefilter to decide if goes to reco, (2) high quality in engine.
17:13:41 Burn: Would want recognizer's endpoint detector. But preprocess one is the low latency one.
17:13:56 Bjorn: 2 events: (1) probable vs. (2) actual start/end of speech.
17:15:07 MichaelJ: Talking now vs. not. More going on underneath. Get complicated to expose underneath if varies by implementation. Energy level might drive aspects of the API.
17:15:49