See also: IRC log
<burn> trackbot, start telcon
<trackbot> Date: 28 April 2011
<burn> Scribe: Robert Brown
<burn> ScribeNick: Robert
<burn> Agenda: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Apr/0059.html
Bjorn: nothing new logistically
Burn: will send revised schedule
<burn> final report draft: http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech-20110426.html
burn: no new comments
Bjorn: previously only looked at intersection of proposals. is there anything that's in two proposals but not the third? e.g. continuous recognition
Milan: any requirement that we support this?
burn: will add continuous recognition to the list of topics to discuss
Bjorn: only removed it from Google proposal because it's difficult to do, and may want to do it in a later version
Michael: recapped two scenarios stated by Bjorn: 1) continuous speech; 2) open mic
Bjorn: proposed that we all agree this is a requirement
Milan: we were vague about what the interim events requirement meant, whether it included results
<bringert> burn: satish is trying to join, but zakim says the conference code isn't valid
Burn: [after discussion] proposes Michael adds this as a new requirement (or requirements) to the report
Michael: sure, but will also check to see whether we just need to clarify an existing requirement
Bjorn: this is also a design topic
<satish> burn: will do
Bjorn: Robert is there anything else in the Microsoft proposal that should be considered as a design decision?
Robert: nothing apparent, will review again in coming week
Bjorn: should we start work on a joint proposal then?
Burn: proposes that we now go to the list of issues to discuss and discuss them
Bjorn: more items for discussion from Microsoft proposal
... MS proposal supports multiple grammars, but Google & Mozilla only support one
Olli: Mozilla proposal allows multiple parallel recognitions, each with its own grammar
MichaelJohnston: can't reference an SLM from SRGS, so multiple grammars are required
Bjorn: proposes topic: Should we support multiple simultaneous grammars?
... proposes topic: which timeout parameters should we have?
<smaug_> yeah, Mozilla proposal should have some timeouts
Bjorn: emulating speech input is a requirement, but it's only present in the Microsoft proposal
Michael: proposes topic: some way for the application to provide feedback information to the recognizer
Bjorn: does anybody disagree that this is a requirement we agree on?
Burn: proposes requirement: "it must be possible for the application author to provide feedback on the recognition result"
Debbie: need to discuss the result format
Michael: seems like general agreement on EMMA, with notion of other formats available
Olli: EMMA as a DOM document? Or as a JSON object?
MichaelJohnston: multimodal working group has been discussing JSON representations of EMMA
... there are some issues, such as losing element/attribute distinction
... straight translation to JSON is a little ugly
Michael: existing proposals include simple representations as alternatives to EMMA
MichaelJohnston: For more nuanced things, let's not reinvent solutions to the problems EMMA already solves
Milan: would rather not have EMMA mean XML, since that implies the app needs a parser
Debbie: sounds like we agree on EMMA, but need to discuss how it's represented, simplified formats, etc
Milan: a good idea to agree that an EMMA result available through a DOM object is a baseline agreement
Bjorn: it's okay to provide the EMMA DOM, but we should also have the simple access mechanism that all three proposals have
Burn: would rather have XML or JSON, but not the DOM
Michael: if you have XML, you can feed it into the DOM
Burn: it's a minor objection, if everybody else agrees on the DOM, I'm okay with that
Bjorn: maybe just provide both
MichaelJohnston: EMMA will also help with more sophisticated multimodal apps, for example using ink. The DOM will be more convenient to work with.
Burn: proposed agreement: "both DOM and XML text representations of EMMA must be provided"
... haven't necessarily agreed that that is all
Bjorn: we already appear to agree, based on proposals: "recognition results must also be available in the javascript objects where the result is a list of recognition result items containing utterance, confidence and interpretation."
Michael: may need to be tweaked to accommodate continuous recognition
Burn: add "at least" to Bjorn's proposed requirement
... added a statement "note that this will need to be adjusted based on any decision regarding support for continuous recognition"
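The simple result access described above can be pictured with a small TypeScript sketch. Only the three field names (utterance, confidence, interpretation) come from the proposed requirement; the type name, the 0..1 confidence range, the helper function and the sample data are assumptions for illustration, not anything the group agreed.

```typescript
// Hypothetical shape of one recognition result item.
// Field names follow the proposed requirement; everything else is assumed.
interface RecognitionResultItem {
  utterance: string;       // recognized text
  confidence: number;      // assumed here to lie in the range 0..1
  interpretation: unknown; // semantic interpretation; its format is still open
}

// Returns the item with the highest confidence, or undefined for an empty list.
function bestResult(
  items: RecognitionResultItem[]
): RecognitionResultItem | undefined {
  if (items.length === 0) {
    return undefined;
  }
  return items.reduce((best, item) =>
    item.confidence > best.confidence ? item : best
  );
}

// Made-up sample data, as an application might receive it.
const sampleResults: RecognitionResultItem[] = [
  { utterance: "flights to Boston", confidence: 0.82, interpretation: { destination: "BOS" } },
  { utterance: "flights to Austin", confidence: 0.41, interpretation: { destination: "AUS" } },
];

const topResult = bestResult(sampleResults);
```

Because the requirement says "at least", an item could carry further fields, and continuous recognition may require a different outer structure than a single flat list.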
Milan: would like to add a discussion topic around generic parameters to the recognition engine
Burn: related to existing topic on the list, but will add
Milan: also need to agree on standard parameters, such as speed-vs-accuracy
Burn: will generalize the timeouts discussion to include other parameters
MichaelJohnston: which parameters should be expressed in the javascript API, and what can go in the URI? What sorts of conflicts could occur?
Bjorn: URI parameters are engine specific
MichaelJohnston: for example, if we agreed that the way standard parameters are communicated is via the URI, they could come from the URI, or from the Javascript
Michael: need to discuss the API/protocol to the speech engine, and how standard parameters are conveyed
Bjorn: we need to discuss the protocol, it's not in the list
Burn: will add it to the list
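To make the conflict question above concrete, here is a minimal TypeScript sketch of one possible policy: parameters may arrive both in the recognizer URI and from script, script values win, and every disagreement is reported. The precedence rule, the function names, the example URI and the parameter names are all assumptions; the group has not decided any of this.

```typescript
type RecognizerParams = Record<string, string>;

// Extracts query parameters from a URI using only standard string functions.
function paramsFromUri(uri: string): RecognizerParams {
  const params: RecognizerParams = {};
  const queryStart = uri.indexOf("?");
  if (queryStart < 0) {
    return params;
  }
  const hashStart = uri.indexOf("#", queryStart);
  const query = uri.slice(queryStart + 1, hashStart < 0 ? uri.length : hashStart);
  for (const pair of query.split("&")) {
    if (pair === "") {
      continue;
    }
    const eq = pair.indexOf("=");
    const key = decodeURIComponent(eq < 0 ? pair : pair.slice(0, eq));
    const value = eq < 0 ? "" : decodeURIComponent(pair.slice(eq + 1));
    params[key] = value;
  }
  return params;
}

// Assumed policy: script-supplied values override URI values,
// and each key whose two values differ is listed as a conflict.
function mergeParams(
  uri: string,
  fromScript: RecognizerParams
): { merged: RecognizerParams; conflicts: string[] } {
  const merged = paramsFromUri(uri);
  const conflicts: string[] = [];
  for (const key of Object.keys(fromScript)) {
    if (key in merged && merged[key] !== fromScript[key]) {
      conflicts.push(key);
    }
    merged[key] = fromScript[key];
  }
  return { merged, conflicts };
}

// Hypothetical engine URI and parameter names.
const outcome = mergeParams(
  "https://recognizer.example/reco?speedvsaccuracy=0.5&lang=en-US",
  { speedvsaccuracy: "0.9" }
);
```

The opposite precedence (URI wins) or rejecting conflicting requests outright would be equally easy to express; the sketch only shows that the conflict is detectable.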
Milan: are the grammars referred to by HTTP URI?
Burn: existing requirement says "uri" which was intended to represent URLs and URNs
Milan: would like to mandate that HTTP is definitely supported. there are lots of others that may work.
Robert: should we have a standard set of built-in grammars/topics?
Bjorn: in the Google proposal we had "builtin:" URIs
Burn: "a standard set of common tasks/grammars should be supported. details TBD"
... need a discussion topic about what these are
Robert: what about inline grammars?
Bjorn: data URIs would work for that, and perhaps we should agree about that
Charles: would like to see inline grammars remain on the table
Burn: will add a discussion about inline grammars
... we all agree on the functionality that inline grammars would give
MichaelJohnston: one target user is "mom & pop developers" who would provide simple grammars
Burn: discussion topic: "what is the mechanism for authors to directly include grammars within their HTML document? Is this inline XML, data URI or something else?"
Robert: use case: given that HTML5 supports local storage, the data from which a grammar is constructed may only be located on the local device
Bjorn: proposes that we mandate data URIs, just for consistency with the rest of HTML
Burn: no objections, so will record as an agreement
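As an illustration of the data URI agreement, a page could turn a grammar built on the device (for example from local storage, as in Robert's use case) into a URI without any server round trip. This is a sketch only: the function name and the yes/no grammar are made up, and how such a URI would be handed to a recognizer is not defined by any of the proposals here.

```typescript
// Builds a data: URI that carries an inline grammar.
// The function name is hypothetical.
function grammarToDataUri(srgsXml: string): string {
  // application/srgs+xml is the media type for the XML form of SRGS.
  return "data:application/srgs+xml;charset=utf-8," + encodeURIComponent(srgsXml);
}

// A tiny SRGS grammar accepting "yes" or "no" (illustrative content).
const yesNoGrammar =
  '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" ' +
  'xml:lang="en-US" root="answer">' +
  '<rule id="answer"><one-of><item>yes</item><item>no</item></one-of></rule>' +
  "</grammar>";

const yesNoGrammarUri = grammarToDataUri(yesNoGrammar);
```

Percent-encoding keeps the URI valid for any grammar text; base64 would be an alternative encoding for the same purpose.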
Michael: need to discuss the ability to do re-recognition
Burn: related to the topic of recognition from a file
Bjorn: both are fine discussion topics
Burn: [discussion about whether there's anything to discuss around endpointing], already implied in existing discussion topic
Bjorn: context block?
Burn: discussion topic: "do we need a recognition context block capability?" and if we end up deciding yes, we'll discuss the mechanism
Milan: how do we specify a default recognizer?
Bjorn: don't specify it at all
... since it's the default
Michael: need some canonical string to specify user agent default, so we could switch back to it (could be empty string)
... Whereas how we specify a local one may be similar to the way to specify the remote engine
Bjorn: for local engines do we need to specify the engine or the criteria?
Burn: SSML does it this way
Bjorn: is there a use case for specifying criteria?
Burn: in Tropo API, language specification can specify a specific engine
... this is a scoping issue. e.g. in SSML a voice is used in the scope of the enclosing element
... in HTML could say that the scope is the input field, or the entire form
Bjorn: in all the proposals, scoping is to a javascript object
... are there any other criteria for local recognizers than speed-vs-accuracy?
Charles: different microphones will have different profiles
Raj: how do we discover characteristics of installed engines?
Michael: selection = discovery?
Burn: in SSML, some people wanted discovery
Bjorn: use cases?
Michael: selection of existing acoustic and language models
Robert: there's a blurry line between what a recognizer is, and what a parameter is
Michael: topic: "how to specify default recognition"
... topic: "how to specify local recognizers"
... topic: "do we need to specify engines by capability?"
Raj: or "how do we specify the parameters to the local recognizer?"
Burn: want to back up to "what is a recognizer, and what parameters does it need?"
... call something a recognizer, and call other things related to that a recognizer
Bjorn: the API probably doesn't need to specify a recognizer. speech and parameters go somewhere and results come back
Burn: what is the boundary between selecting a recognizer and selecting the parameters of a recognizer?
Milan: we need to discuss audio streaming
Burn: topic: "do we support audio streaming and how?"
<Milan> Milan: Let's discuss audio streaming
Present: Dan_Burnett, Olli_Pettay, Robert_Brown, Charles_Hemphill, Milan_Young, Debbie_Dahl, +1.818.237.aaaa, Bjorn_Bringert, Michael_Johnston, Raj_Tumuluri, Patrick_Ehlen, Michael_Bodell
Regrets: Dan Druta