W3C

- DRAFT -

HTML Speech Incubator Group Teleconference

11 Nov 2010

See also: IRC log

Attendees

Present
Debbie_Dahl, Olli_Pettay, +39.335.766.aaaa, Marc_Schroeder, Dan_Burnett, Michael_Bodell, Paolo_Baggia, Milan_Young, +1.760.705.aabb, Bjorn_Bringert, +1.425.391.aacc, Robert_Brown, Dan_Druta, +44.122.546.aadd
Regrets
Chair
Dan_Burnett
Scribe
ddahl

Contents

    Topics
        1. minute takers and expectations
        2. requirements
        3. R6
        4. R17
        5. R18
    Summary of Action Items

trackbot start telcon

trackbot, start telcon

<trackbot> Date: 11 November 2010

<marc> yes

<smaug_> ah

minute takers and expectations

<scribe> scribe: ddahl

dan: important to start and end on time
... in the future we will track when people take minutes, and will prefer to select people who join late and haven't taken minutes
... will start asking newer people in the coming weeks
... suggestion that we check on f2f minutes and requirements draft
... send minor corrections to minutes by email, or bring up major corrections now.

<scribe> ...new requirements draft sent out based on f2f discussion by Michael Bodell

<smaug_> http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Nov/0102.html

michael: started new section and struck out requirements that have been deleted. the new draft just covers f2f

dan: concerns about requirements?

<burn> agenda is at http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Nov/0104.html

marc: should this requirements document be at a stable uri?

michael: would like to do this, but still need CVS access

dan: we're going through this process as quickly as we can
... this is supposed to be more of a live draft
... comments on requirements?

paolo: would like to know more about section 4, is that the place where the new changes will be placed?

michael: yes
... as we go through this, all of section 3 will be moved, but section 4 will replace it
... confusing to mix old and new requirements

dan: paolo, were you asking if there was a way to link old and new requirements?

paolo: yes, and in the old one there was explanation

michael: was mainly trying to capture what we agreed on rather than include a lot of text

<scribe> ...new ones link back to old ones

dan: this approach reduces confusion about which are the new requirements, when we finish the old requirements I will ask if there's anything that didn't get captured, then it will be safe to remove old requirements.

requirements

dan: start with R21

michael: status is that there was mostly agreement that it was out of scope; it wasn't clear what scenario Eric Johansson had in mind.

dan: recommend drawing a line through this requirement and letting Eric raise concerns

bjorn: i think Eric agreed that it was out of scope

dan: he may want to continue discussion

R6

michael: we already covered this in discussion of R27

<burn> for the minutes, also note that we explicitly had consensus on the call to remove R21

dan: does anyone object to removing R6?

(no objections)

R17

michael: little discussion, but some bled over from R18
... should have requirement that API for recognition should not introduce unneeded latency

dan: any more comments on R17?

<bringert> my connection dropped, dialing again

dan: any objections to Michael's wording to replace R17

michael: two requirements, one that Bjorn proposed and one that Michael proposed
... first one is that applications can start processing captured audio right away and the second says that the API should not introduce unneeded latency

dan: any objections to replacing original R17 with these two requirements?

(no objections)

<smaug_> "Implementations should be allowed to start processing captured audio before the capture completes." and "The API to do recognition should not introduce unneeded latency."

R18

<bringert> (w3c doesn't answer the phone), no objection from me

<burn> bjorn, just try again. zakim does this sometimes

dan: we acknowledge no objections to R17 (including from bjorn)
... we will do what we can on R18, will get started while waiting for bjorn
... any objections?

<mbodell> Implementations should be allowed to start playing back synthesized speech before the complete results of the speech synthesis request are available.
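A minimal sketch of this proposal, again with entirely hypothetical names (synthesize and its callbacks are illustrations): playback starts on the first audio chunk instead of waiting for the complete synthesis result.

    // Hypothetical streaming-synthesis sketch; the stub "produces" audio
    // word by word so the playback callback runs before synthesis is done.
    type ChunkHandler = (chunk: ArrayBuffer) => void;

    function synthesize(
      text: string,
      onChunk: ChunkHandler,
      onDone: () => void
    ): void {
      for (const word of text.split(" ")) {
        onChunk(new ArrayBuffer(word.length * 160)); // fake audio per word
      }
      onDone();
    }

    let played = 0;
    synthesize(
      "hello from a streaming synthesizer",
      (chunk) => {
        played += chunk.byteLength;
        console.log(`playing ${chunk.byteLength} bytes before synthesis completes`);
      },
      () => console.log(`synthesis complete, ${played} bytes played`)
    );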

dan: any objections to adding this requirement?

michael: other proposals that may have come out of this discussion.
... other issues about what user agent must allow, codecs, not sure how to tackle

dan: there are others where we're close to consensus, so let's start with those
... some of the ones proposed by Milan were addressing other wording on email list

milan: the first two, about writing to and reading from the audio buffer, are handled

bjorn: we might have consensus on the requirement about unneeded latency
... for TTS

marc: we agree with this

michael: that takes care of the first 3, and the others aren't needed
... then we had user agents must not filter results from the application

<burn> no, not "the others aren't needed" -- instead, the next 3 are not needed

<burn> ... through filtering results

bjorn: the next two are that the ua must not interfere with result or timing

<burn> oops, i meant through passing parameters

<burn> yes, we are now discussing results and timing

olli: you might have some results that are suspicious
... that the ua might want to filter

marc: what speaks in favor of these requirements?

milan: concerned that the ua may think it knows best about the speech interaction and delete something that was part of the protocol between the speech resource and the web application

michael: api needs to be extensible so that additional functionality is supported?

milan: actually wants to make sure that if an EMMA result is sent to the web app the ua must not change it, because it doesn't know what it is. also wants events fired by the speech resource to make it into user space.

olli: what kind of event are you talking about?

<burn> actually, that was olli

<burn> (not bjorn)

<mbodell> Yeah, I think the earlier comment about suspicious results was also olli

milan: if we have a "start of speech" event, as long as our API is flexible enough we can add new things
... API should be flexible so that new events don't break apps

bjorn: how about if we say that it should be possible to add new information to speech recognition results
... then for events we would have to require that speech server specific events would be able to be returned to the web app

milan: web applications need to be able to continue to run even if they aren't expecting those events

bjorn: can't expect that app will never crash
... so speech server should be able to return implementation-specific events
... two new requirements

<bringert> new requirement 1: speech recognition implementations should be allowed to add implementation specific information to speech recognition results

<bringert> new requirement 2: speech recognition implementations should be allowed to fire implementation specific events
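A minimal sketch of how both requirements could look to a web app, with hypothetical names throughout (RecognitionResult, vendorData, the event registration): vendor additions ride along in the result untouched, and implementation-specific events reach only the apps that registered for them.

    // Hypothetical sketch; a result carries standard fields plus open-ended
    // vendor data the UA passes through unmodified.
    interface RecognitionResult {
      transcript: string;
      emmaXml?: string;                     // e.g. an EMMA document, untouched
      vendorData?: Record<string, unknown>; // implementation-specific extras
    }

    class Recognizer {
      private listeners = new Map<string, (detail: unknown) => void>();

      addEventListener(name: string, handler: (detail: unknown) => void): void {
        this.listeners.set(name, handler);
      }

      // Events with no registered listener are dropped silently, so new
      // implementation-specific events cannot break older apps.
      fire(name: string, detail: unknown): void {
        this.listeners.get(name)?.(detail);
      }
    }

    const recognizer = new Recognizer();
    recognizer.addEventListener("x-vendor-confidence", (d) =>
      console.log("vendor event:", d)
    );
    recognizer.fire("x-vendor-confidence", { confidence: 0.87 }); // delivered
    recognizer.fire("x-vendor-other", {}); // ignored: nobody registered

    const result: RecognitionResult = {
      transcript: "hello",
      vendorData: { engineSpecificScore: 0.92 }, // passed through by the UA
    };
    console.log(result.transcript, result.vendorData);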

milan: previously we had discussed that there was a particular ordering of events, if we just have events being fired in between those events, we might destabilize applications

bjorn: the apps that don't care about events won't register for them. and if we say "a before b" that doesn't mean that there's not anything in between.

milan: if speech server doesn't generate "start of speech" does the ua insert it?

bjorn: we haven't defined events

michael: it could be the ua or the speech resource that generates some of these events

dan: any objections to adding these two requirements as stated on IRC

milan: are we going to allow TTS to fire events? these just talk about recognition.

marc: it would make sense to have that?

michael: rather than changing R2 to speech resources would it make sense to have new requirement for TTS events?

bjorn: other use cases for TTS events other than "mark"?

marc: yes, "mark" is too coarse-grained for lip synchronization

<mbodell> new requirement 3: speech synthesis implementations should be allowed to fire implementation specific events
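A minimal sketch of requirement 3, with hypothetical names (Synthesizer, the "x-vendor-viseme" event): a standard mark event and a finer-grained vendor event, such as the viseme timing marc mentions for lip synchronization, delivered side by side.

    // Hypothetical sketch; the stub fires one standard mark event and one
    // implementation-specific viseme event while "speaking".
    type Handler = (detail: unknown) => void;

    class Synthesizer {
      private listeners = new Map<string, Handler>();

      addEventListener(name: string, handler: Handler): void {
        this.listeners.set(name, handler);
      }

      speak(ssml: string): void {
        console.log("speaking:", ssml);
        this.listeners.get("mark")?.({ name: "m1" });
        this.listeners.get("x-vendor-viseme")?.({ viseme: "AA", offsetMs: 120 });
      }
    }

    const tts = new Synthesizer();
    tts.addEventListener("mark", (d) => console.log("mark:", d));
    tts.addEventListener("x-vendor-viseme", (d) => console.log("viseme:", d));
    tts.speak('<speak>Hello <mark name="m1"/> world</speak>');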

dan: sounds like we have agreement on these three

michael: back to R18

dan: should be split into two, first is exactly as worded

michael: i.e. requires support for remote services, e.g. by HTTP 1.1

dan: yes, second is that speech services and ua may negotiate on other protocols
... first one is mandatory support for HTTP 1.1

michael: does HTTP 1.1 include https?

marc: this seems like a low-level technical requirement, should not discuss now

dan: two issues -- we use a generic term "communication" but we might need to distinguish control and media
... at IETF there is a lot of discussion for using HTTP for streaming, when should you use that or not
... we might not want to define these protocols, just pick from what's available, but may not want to pick now

marc: would like to settle on some existing protocol, but now is too early

michael: likes rewording with "such as"

robert: likes "lowest common denominator"

<smaug_> that wasn't me

<mbodell> that was robert

marc: requirement should be: the communication between the ua and the speech server must allow for some lowest common denominator like HTTP 1.1

dan: we want to require a lowest common denominator protocol

bjorn: we want to avoid mismatch between browser and speech server communication

<burn> i said "require a mandatory-to-support communication protocol, such protocol TBD"

michael: the communication between the ua and the speech server must require a mandatory-to-support lowest common denominator such as HTTP 1.1, TBD

dan: the reason for this is that four months from now it might be construed as requiring HTTP 1.1

michael: do we have agreement on this?
... ok what about the second sentence?

dan: we could write a requirement but may not end up considering it as something that we need to do now
... we don't want to prevent negotiation in the future

robert: negotiation sounds like a runtime handshake but what we really want is the freedom to use something else if something better shows up
... in telephony negotiation is important, but in web apps you just ask for what you want and if it doesn't work, try something else

dan: this concept is important to capture

michael: requirement text?

<burn> what i had proposed initially was: "UAs and speech services may negotiate on use of other protocols for communication."

michael: another wording: "UAs and speech services may agree to use alternate protocols for communication."

dan: agreement?

(no objections)

<mbodell> agree with another wording
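A minimal sketch of the two protocol requirements as agreed, with illustrative names (connectSpeechService, and selecting a protocol by URI scheme, are assumptions; the mandatory baseline itself is still TBD):

    // Hypothetical sketch: http(s) stands in for whatever baseline becomes
    // mandatory-to-support; any other scheme models an agreed alternate.
    function connectSpeechService(serviceUri: string): string {
      const scheme = new URL(serviceUri).protocol.replace(":", "");
      if (scheme === "http" || scheme === "https") {
        return "using mandatory-to-support baseline (e.g. HTTP 1.1)";
      }
      // Alternate protocol: usable only when both UA and service support it.
      return `attempting alternate protocol: ${scheme}`;
    }

    console.log(connectSpeechService("https://asr.example.com/recognize"));
    console.log(connectSpeechService("wss://asr.example.com/stream"));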

marc: another one for R18, should we have one for implementation data for TTS, a mirror image to the new one we added for ASR

robert: we added that at the f2f

michael: two more
... require ua's to expose an API for local speech services

bjorn: i don't see this as necessary

michael: i agree

dan: do you think that whatever API we define should be sufficient?

bjorn: the ua is free to talk to any speech service whenever it wants

<burn> my question is whether even in the local case that the single api we are defining should be required

<burn> ... and that UAs can optimize the behavior

<burn> or are you saying that the UA can do anything it wants any way it wants when the resource is local?

bjorn: as long as the behavior is specified: if the web app requests a local speech service but it's not available, what happens?

milan: we have a stereotype that a local speech service is embedded, but it could be a plugin. so if the app requests a local service, that should be honored.

bjorn: what if it's not there?

milan: if it's a plugin you might ask the user if they want to install a plugin

bjorn: to make that work we would have to specify a plugin language

milan: agree, but if a local service is requested it should be used

michael: isn't this an example of the web app requesting an alternate speech service?

milan: don't expect to require downloading of code

bjorn: how could this work without downloading code?

milan: we don't need to specify, just say that there has to be some mechanism

bjorn: what if we say that web app could point to a local service, but we shouldn't say that the ua has to try to install a local service

olli: what if some vendor only makes plugins for, e.g., IE?

<bringert> that was olli

michael: we should have a requirement that web apps can specify a local speech service

<mbodell> new req: Speech services that can be specified by web apps must include local speech services.
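A minimal sketch of this new requirement, in which every name, including the "local:" scheme, is illustrative: the web app names a local service the same way it names a remote one, and the UA either honors the request or informs the app, per the existing requirement michael mentions next.

    // Hypothetical sketch: stub UA behavior in which remote URIs resolve
    // and a requested local engine happens not to be installed.
    interface ServiceHandle {
      available: boolean;
      reason?: string; // set when the UA refuses or lacks the resource
    }

    function requestSpeechService(uri: string): ServiceHandle {
      if (uri.startsWith("local:")) {
        return { available: false, reason: "no matching local engine installed" };
      }
      return { available: true };
    }

    const remote = requestSpeechService("https://asr.example.com/recognize");
    const local = requestSpeechService("local:default-asr");
    if (!local.available) {
      // The web app is informed rather than silently redirected elsewhere.
      console.log("UA declined local service:", local.reason);
    }
    console.log("remote available:", remote.available);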

dan: if we agree on this requirement, if someone wants to propose additional requirements we can discuss

michael: if the ua refuses a resource it must inform the web app (we already have this one)

<burn> so we have agreed to add this new requirement

michael: don't distinguish between network and local

milan: what does "default speech resource" mean?

<burn> we are now discussing the related FPR9 and 10 to understand whether we need to add anything else

michael: in the future the author should be able to just ask for speech service without specifying one

bjorn: a browser must provide a speech service, but it doesn't have to build its own

<smaug_> I may not agree with the requirement

dan: it sounds like we do have agreement with this requirement

<smaug_> I need to think about it a bit

<Milan> Who is smaug?

michael: can we remove R18 and discuss codecs later

<smaug_> smaug is Olli

dan: we could keep R18 just for that last point
... milan's concerns about R18 haven't been addressed
... next call next week

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2010/11/11 18:32:58 $
