Hypertext Coordination Group Teleconference -- 14 Jan 2011

<trackbot> Date: 14 January 2011

<glazou> "this passcode is not valid"

<glazou> ah finally

<ChrisL> glazou, i used 4824 and it worked

<glazou> ah that was the french bridge

<glazou> the US one is in better shape

<glazou> good start for the new year darobin

<ddahl1> scribe: ddahl

<ddahl1> chair: ChrisL

Audio on the Web

<ChrisL> http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0005.html

<ddahl1> michaelB: in MMI, VoiceBrowser, and HTML SpeechXG. wants to capture audio in a way that user can interact with it. some proposals have capture and then upload, but that doesn't satisfy our use case.

<ddahl1> chris: other requirements for speech?

<ddahl1> michaelB: yes, endpointing, echo cancellation, playback for speech synthesis, and tying playback to barge-in

<glazou> Bert: the french bridge...

<ddahl1> the French bridge is hosed, call in on the US one if you can

<ChrisL> http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0003.html

<ddahl1> janina: our requirements came from making HTML 5 video and audio accessible

<ddahl1> ...video description uses secondary audio channel, used in broadcasting, different on the web, also looking for a way to play two binary resources, not necessarily the same length

<ddahl1> ...another one is the need to control volume and panning separately, or direct them to a secondary audio device

<ddahl1> chrisL: if something's being broadcast to a group, then different people might have different needs

<ddahl1> doug: how would that work?

<ddahl1> ...are you accessing different devices, how would you discover different devices?

<ddahl1> janina: I don't know if the browser knows, but the OS knows. I know it's discoverable on Linux, mapping the OS resources to the browser

<ddahl1> chrisL: a kind of labeling so that different things go to different devices

<ddahl1> ...also synchronization of multiple audio streams, SMIL does this, but not HTML5 audio

<ddahl1> janina: HTML5 seems to assume that files have the same timespan, but that might not be true for video description or for different languages

<ddahl1> chrisL: especially problematic for longer files

<ddahl1> janina: SMIL seems to work well, is used in Daisy Consortium

<ddahl1> ...we could take as much of SMIL for the use cases we need and leave the rest behind

<ddahl1> chrisL: similar to what we did with SVG

<ddahl1> doug: Audio XG -- audio api is an api for reading and writing to the live audio stream, one implementation that will be in Firefox, we just give access to the raw bits, a more sophisticated implementation in WebKit, also has a higher-level ability to manipulate audio in the browser

<ddahl1> ...we will make a WG, have been mostly talking about WebKit approach, should things be done in the browser or with script libraries

<ddahl1> chrisL: is script fast enough for helper methods?

<ddahl1> doug: I don't know, would be better to use helper methods in mobile devices because of processing constraints

<ChrisL> http://lists.w3.org/Archives/Member/w3c-html-cg/2011JanMar/0004.html

<inserted> scribenick: ChrisL

ddahl: Our primary use case involving audio is input and output of speech, mainly for interaction
... but also recording, like for voicemail, so need to capture speech and to stream it
... not just batch capture
... support arbitrary processing: speech recognition, speech understanding, speech-to-speech translation, emotion detection, speaker verification, language/gender/age identification, medical diagnosis

ddahl: will not support arbitrary translation
... need to control format and sampling rate
... capture speech on mobile or desktop or over telephone (last is a VB requirement)
... Able to combine semantics of speech with other inputs, like circling an area and saying "Italian restaurants near here"

ddahl: control volume of output, pause and resume
... local or distributed cloud-based processing
... audio file output, TTS, positioning of inputs and outputs
... multiple microphones? like a big meeting room and record the whole meeting

ChrisL: multichannel or mixing?

ddahl: both
... no use cases around capturing non-speech audio for mm, but important for others

ChrisL: ability to determine if an audio input is speech or non-speech

<kaz> scribenick: ddahl

<inserted> scribenick: ddahl1

michaelB: also have concerns around security and privacy
... a microphone is like a keyboard, what are user expectations and behavior
... need to mix with functional requirements

chrisL: you can imagine some way of notifying the user that speech is being recorded.

janina: in the news today was a story about spyware on smartphones

michael: also need to be able to notify user in non-visual environments

doug: maybe a vibratory signal could signal when microphone is on
... nothing about privacy in the charter, but the spec will mention privacy
... charter basically has microphone access. lots of discussion about access to microphone. DAP WG is chartered to do it but hasn't done it. Audio WG will work on it if necessary

<darobin> DAP is doing something about this

<darobin> RTC will help as well

<darobin> more than happy to work with Audio

chrisL: comments from robin on DAP?

<darobin> and in fact we've done it

<darobin> just not at the level required yet

<darobin> but certainly can push further

<darobin> very basic access: http://dev.w3.org/2009/dap/camera/

<darobin> more advanced: http://dev.w3.org/2009/dap/camera/Overview-API.html

<darobin> and we want to do more advanced still, but will need some security model for it

<darobin> RTC == real time Web

<darobin> it's not called camera, URIs are opaque dammit :)

chrisL: "camera" spec sounds like it should be visual

<darobin> http://www.w3.org/TR/media-capture-api/

<darobin> http://www.w3.org/TR/html-media-capture/

<darobin> (same links, for people who read URIs)

<darobin> that is correct

michaelB: doesn't cover the streaming case for audio

<darobin> we're working on that, but it's harder security wise

<darobin> we're also synching with HTML WG

chrisL: separate specs, capture vs. streaming?

<darobin> yes, they build atop one another whenever possible

michaelB: maybe could be separate, but could be the same spec. working on proposals in HTML-SpeechXG, reviewing proposals

<darobin> feeeeeeeeeeeedback

<darobin> we wantsssss feeeeeeeeeeeeeeeeeeeedback

<darobin> I may not be able to speak today, but I can read :)

chrisL: HTML-speech XG should send email to DAP

<darobin> DAP: public-device-apis@w3.org

<ArtB> Web Audio API from Chris Rogers: http://chromium.googlecode.com/svn/trunk/samples/audio/specification/specification.html
