HTML Speech Incubator Group Teleconference -- 12 May 2011

<burn> trackbot, start telcon

<trackbot> Date: 12 May 2011


<burn> Scribe: Dan_Druta

<burn> ScribeNick: DanD

<burn> Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011May/0005.html

Updated final report draft

burn: Made a few changes

Design Decisions with agreements

burn: Any new items we agree on?
... No design decisions?
... Any other topics to be discussed later?

Issues discussed in the appendix

Audio Codecs

robert: We don't think we should agree on a single codec. We should look at a few factors: bandwidth, IP issues,
... there are trade-offs
... fidelity is another issue

burn: We want the ideal codec but there is no such thing
... Opus is a combination of codecs and an attempt to provide an industry standard

Milan: RTCWeb is looking at Opus

burn: The issue is which audio codecs are mandatory to support

mbodell: The question is if you can recognize an audio file

Milan: is the synthesizer also part of this?

bringert: Three items: 1. Codecs used for the remote speech engine

<bringert> 1. codecs used between browser and web app specified recognizer

Milan: 2. Codecs used for file speech

<bringert> 2. codecs used between web app and browser for recognition of existing audio

<bringert_> 3. codecs used between browser and web app specified synthesizer

mbodell: we should allow other codecs to be used

Milan: Sounds like requirements

robert: Microsoft uses Siren, which is owned by Polycom.

burn: Voxeo supports all

<bringert> Google uses Speex, FLAC and AMR

Milan: Opus has the notion of cutting off audio to save bandwidth
... speech has a critical requirement to capture the first part

burn: There are several codecs in Opus
... There was an attempt to merge

Michael: There is the issue of support in mobile devices (hardware)
... for mobile browsing we can rely on hardware and fall back

bringert: The one codec that must be supported is Speex
... Caution - there's no container format

burn: another issue is transport (framing)

Milan: isn't an IETF standard

burn: It will require some sort of support for RTP
... How much SIP support will be needed?
... There's disagreement and not everybody wants a full SIP stack

bringert: how about OGG?

<bringert> Speex codec in OGG container

burn: It is appropriate not to commit yet and review next week

Milan: It would be useful to know streaming

mbodell: Add a fourth item to the list of elements: support for streaming

Milan: can we agree that the architecture should support streaming?

bringert: I'm fine if we support streaming before the engine starts processing

Milan: Recognizer should be able to return results before the end of speech

burn: Recognizer should be able to return final result before the end of speech

bringert: This rules out HTTP

mbodell: You can't get duplex but you can get intermediary responses

Milan: The client can chunk up responses
... Is it a violation if we use web sockets?
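As an illustration of the chunking idea raised above, here is a minimal sketch of how audio could be framed with HTTP/1.1 chunked transfer encoding so that a receiver can start processing before the whole utterance has arrived. Nothing like this was proposed on the call. The function names `split_audio` and `frame_chunks` are hypothetical, and the audio bytes are placeholder data rather than real encoded speech.

```python
from typing import Iterable, Iterator


def split_audio(audio: bytes, size: int) -> Iterator[bytes]:
    """Cut a byte buffer into pieces of at most `size` bytes (hypothetical helper)."""
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


def frame_chunks(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each piece as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    for piece in pieces:
        if not piece:
            # A zero-length chunk marks the end of the body, so skip empties.
            continue
        yield f"{len(piece):x}".encode("ascii") + b"\r\n" + piece + b"\r\n"
    # Terminating chunk with no trailers.
    yield b"0\r\n\r\n"


if __name__ == "__main__":
    placeholder_audio = b"abcdefghij"  # stands in for encoded audio
    for frame in frame_chunks(split_audio(placeholder_audio, 4)):
        print(frame)
```

This only shows one direction of the stream. It does not address the bidirectional case discussed on the call.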

<smaug> burn: we don't seem to have scribe anymore

burn: We need to be careful not to go in a different direction from RTCWeb

mbodell: different protocol for different use cases
... http works well for certain cases

robert: we don't want to overcomplicate
... RTC has a different set of requirements

burn: you are right

bringert: We have two choices: we go with http and add RTCweb

robert: or web sockets

bringert: is anyone opposing support for HTTP?
... for streaming
... We support it in Chrome 11
... We want to have http used for other interactions between the user agent and server

mbodell: It's not just audio if we understand correctly
... different apps would use different approaches

burn: we can't predict how it will be used

Milan: there's a continuous response

robert: I'd like to see a proposal before we agree

<bringert> http://tools.ietf.org/html/draft-zhu-http-fullduplex-02

Milan: I agree with a solution that uses HTTP as a basic but not full solution

robert: I would not call Web Sockets HTTP and I'd like to see a proposal

bringert: We should be able to use HTTP

burn: We are saying we are mandating HTTP, not eliminating potential support for other protocols

bringert: the server does not know what's supported on the browser

robert: we need some discovery capability

burn: We believe Web Sockets will not be mandated for support

Milan: I'm not asking for that but a solution for bidirectional support
... if HTTP can do bidirectional we're fine

bringert: there's no reason not to support HTTP.

<burn> bringert: would love bidirectional support if we had a good solid candidate for it

Milan: Instead of saying HTTP is required let's list the elements

bringert: We should require HTTP

burn: Agreement - we require http support for all communications and allow for others

mbodell: I'd like to have a solution for bidirectional support but we should not block the spec

burn: other topics around codecs?

mbodell: there are some codecs that support both audio and video
... recognize audio from a video+audio stream

bringert: I would suggest we don't send video to reduce bandwidth
... if we don't have strong use cases we should not add it to the spec
... Should we disallow sending video?

burn: no agreements and the best way is not to make any other statements
... add this to the list of topics
... nobody is talking about gesture recognition just audio
... we will get back to this
... Other items related to codecs?

Milan: are there any other candidates?

burn: Opus. Big but with support for different use cases

<mbodell> http://en.wikipedia.org/wiki/Comparison_of_audio_codecs

F2F Logistics

bringert: no updates
... I will come back with directions from the hotel to the offices
... We sent the directions from the airport
... everybody should have gotten the email

burn: it would still be good if we have some directions from hotel to the Google offices
... one more call before the f2f

bringert: There's a statement about the agreement on the user interface that is not well captured

burn: Yes, I somehow dropped the most important decision -- that it must NOT be possible to customize the part of the user interface that indicates the microphone is open. I will add that in.
