HTML Speech Incubator Group Teleconference -- 20 Oct 2011

<burn> trackbot, start telcon

<trackbot> Date: 20 October 2011

<burn> Scribe: Patrick_Ehlen

<burn> ScribeNick: ehlen

reco element

<burn> Glen's proposal that we're discussing: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0000.html

glen: reco element always visible; opacity not possible to avoid clickjacking
... should we allow dynamically hiding/showing reco element

michael: user agents can decide what permissions models they use, and grant permissions according to UA policy

charles: also important to consider handsfree cases; can't rely on touch for permissions

satish: reco should automatically activate for ppl who can't touch element ??
... there are other ways to "click" reco

michael: UA could use some of these techniques to enable permissions

satish: how exactly would this be implemented?

michael: implement a UI idiom from the browser the user can't control that would notify the user

binding tag for input field

scribe: "speech IME": User agent that can speech-enable any input field

charles: field-specific reco is better for accuracy

michael: allowing developer to bind grammar to a specific field; increases complexity
... if developer is sophisticated enough to do this from an API, adding a declarative element just makes it more complex

glen: disagree; gives a lot more flexibility and control to both developer and user

charles: a lot of web developers only work w/ HTML
... not everyone can do things in javascript, so a declarative ability is advantageous

glen: keep simple things simple. if we can do something simple w/ reco tag but not UA, then there's a good reason for a reco tag

<smaug> if someone says he "knows HTML but not JS", he probably doesn't know HTML either

satish: how to assoc. an element w/ an input type

glen: isn't it easier to have an automatic binding people can use?

satish: not clear how it would work

michael: need to work through list of things that are reco-able elements

charles: example on website of multiple input fields each bound to a separate grammar

michael will create specific examples of how binding works for different elements

Can extract grammar information from input fields; have a method that allows you to extract grammar from an input field?

<glen> SpeechInputRequest.addGrammarFrom(DomInputElement)

<glen> Retrieves grammar from <input> tag and adds to request.

michael: would UA be responsible for communicating constraints or would it be responsible for generating and sending the grammar itself?

glen: should be reco service that converts into grammar
... this would be a way to extract the input field specification and send it to the speech engine in a scriptable manner

burn: Would it be possible then to change these constraints dynamically?
... how would it work?
... what happens if you do it 2x in a row? would grammar sent before get replaced by newer one?

michael: should have a way to control the grammar; but how to dynamically remove and change them?

burn: rename method above to "includeGrammarFrom()" ?
... would allow you not to "add" but rather to take a snapshot

glen: there are other methods that cover these kinds of actions

<glen> SpeechInputRequest.addGrammarFrom(DomInputElement, weight, modal)

glen: makes sense to add weight and modal flags as well
... would expect api developer to be able to enable & disable grammar
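A rough sketch of one possible answer to burn's "2x in a row" question: each call snapshots the field, and a repeated call for the same field replaces the earlier snapshot. Only the method name and the (element, weight, modal) arguments come from the proposal above. The class body, the replace-on-repeat rule, setEnabled, and activeGrammars are assumptions for illustration, and the builtin:input URI form is borrowed from the grammar URI discussion later in these minutes.

```javascript
// Hypothetical stand-in for the SpeechInputRequest discussed above.
// Storage, replace-on-repeat, setEnabled and activeGrammars are assumptions.
class SpeechInputRequest {
  constructor() {
    this.grammars = new Map(); // keyed by the source element
  }

  // Take a snapshot of the element's constraints as a builtin: URI.
  addGrammarFrom(element, weight = 1.0, modal = false) {
    const params = [`type=${encodeURIComponent(element.type || "text")}`];
    if (element.pattern) {
      params.push(`pattern=${encodeURIComponent(element.pattern)}`);
    }
    // A repeated call for the same element replaces the earlier snapshot.
    this.grammars.set(element, {
      src: `builtin:input?${params.join("&")}`,
      weight,
      modal,
      enabled: true,
    });
  }

  // Enable or disable the grammar taken from a given element.
  setEnabled(element, enabled) {
    const grammar = this.grammars.get(element);
    if (grammar) grammar.enabled = enabled;
  }

  activeGrammars() {
    return [...this.grammars.values()].filter((g) => g.enabled);
  }
}

// Plain object standing in for an <input> element.
const codeField = { type: "text", pattern: "[0-9]" };
const request = new SpeechInputRequest();
request.addGrammarFrom(codeField);
codeField.pattern = "[A-Z]{3}";
request.addGrammarFrom(codeField, 2.0, true); // replaces the first snapshot
```

Keying by element is only one choice; an "add" semantics that accumulates grammars would use a list instead.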

<glen> SpeechInputRequest.outputToElement(DomElement)

<glen> Valid DomElements are <input> and <textarea>

<glen> UA will automatically fill DomElement with results. This allows the UA to display continuous streaming of results, and properly handle text insertion point.

<glen> Only one DomElement may be active at a time.

<smaug> request.onmatch = function(e) { domElement.value = e.result; }

One DOM element active at a time, since you can't stream to 2 different elements

scribe: sort of like binding to an element

Olli: handling of output depends on element type; how would that work?

glen: UA would implement the tricky things, like where to output text, etc.

<mbodell> For request.onmatch you don't want to just do domElement.value = e.result as it over writes the content in the continuous case

olli: all that needs to be defined in spec

glen: for insertion point, handle in a way similar to typing text
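A minimal sketch of what "similar to typing text" could mean for a script-side handler, as opposed to overwriting the whole value. The helper name insertAtCaret is hypothetical; it relies only on value, selectionStart and selectionEnd, which input and textarea elements expose, and a plain object stands in for the element here.

```javascript
// Insert recognized text at the insertion point, replacing any selection,
// the way typed text would. Falls back to appending if there is no caret.
function insertAtCaret(field, text) {
  const hasCaret = typeof field.selectionStart === "number";
  const start = hasCaret ? field.selectionStart : field.value.length;
  const end = typeof field.selectionEnd === "number" ? field.selectionEnd : start;
  field.value = field.value.slice(0, start) + text + field.value.slice(end);
  // Leave the caret just after the inserted text.
  field.selectionStart = field.selectionEnd = start + text.length;
  return field;
}

// Plain object standing in for a DOM element, caret after "Meet ".
const noteField = { value: "Meet at noon", selectionStart: 5, selectionEnd: 5 };
insertAtCaret(noteField, "Bob ");
// noteField.value is now "Meet Bob at noon"
```

This covers only plain text fields; the per-element-type cases olli raises would still need to be defined.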

olli: would need to define so many different cases.

charles: another thing: UA ought to be able to use focus to enable and disable grammars assoc. with input

glen: should at least work at trying to specify it, perhaps at f2f

burn: after tech discussions, there will still be a lot of work on doc, so perhaps doing this at f2f is not realistic
... even if we can't fully specify it, that isn't a fail; it shows some thought in that direction

satish: perhaps choose something simpler to start with

glen: perhaps restrict to just text, date, etc., rather than covering other types?

michael: can start there and see where we end up

Method to attribute conversion

<mbodell> see mail http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0037.html

michael: converting input result params into attributes, but grammar and custom params are more complicated
... want some array of simple structures?

robert: arrays tend to be default way of doing things in JS

michael: array of structure of something like the speech grammar?

Charles: JS can also utilize objects for structures

Robert: array isn't strongly typed

satish: don't need helpers for grammars and speech parameters?
... why have all that when you can do it with one attribute

michael: so leaning toward having these structures but not the methods discussed above?
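As an illustration of "structures but not methods", the grammars could be a plain array of objects and the custom parameters a plain object. All field names here (src, weight, modal), the parameter name, and the example.org URL are assumptions, not anything agreed on the call.

```javascript
// Grammars as an ordinary array of simple structures (field names assumed).
const grammars = [
  { src: "builtin:input?type=date", weight: 1.0, modal: false },
  { src: "http://example.org/flights.grxml", weight: 0.5, modal: false },
];

// Custom parameters as a plain object of name/value pairs (name is made up).
const parameters = { "example-param": "value" };

// No helper methods are needed: the usual array operations apply.
grammars.push({ src: "http://example.org/cities.grxml", weight: 0.2, modal: false });
const heaviest = grammars.reduce((a, b) => (b.weight > a.weight ? b : a));
```

As Robert notes, nothing enforces the shape of the entries, so a malformed entry would only be caught when the request is used.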

casing

michael: people expect all caps for objects and interfaces

michael will make those changes for next week

grammar URIs with filters on them

Robert: it's cool if it works
... providing parameters to finite grammars is fine
... skeptical of the free dictation filter: doubt it could be implemented efficiently, and if not, it won't be used

<mbodell> builtin:input?type=text&pattern=%5B0-9%5D%5BA-Z%5D%7B3%7D

Robert: specifying pattern on input field
... how to merge n-gram w/ pattern?

<mbodell> (which is really builtin:input?type=text&pattern=[0-9][A-Z]{3} )
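The two forms above are related by ordinary percent-encoding of the query value. A small sketch, assuming a hypothetical helper named builtinGrammarUri; only the builtin:input URI shape comes from the lines above.

```javascript
// Build a builtin: grammar URI from input-field constraints.
function builtinGrammarUri(type, pattern) {
  let uri = `builtin:input?type=${encodeURIComponent(type)}`;
  if (pattern !== undefined) {
    uri += `&pattern=${encodeURIComponent(pattern)}`;
  }
  return uri;
}

const encodedUri = builtinGrammarUri("text", "[0-9][A-Z]{3}");
// encodedUri is "builtin:input?type=text&pattern=%5B0-9%5D%5BA-Z%5D%7B3%7D"

// Recover the readable pattern from the encoded URI.
const query = encodedUri.slice(encodedUri.indexOf("?") + 1);
const decodedPattern = new URLSearchParams(query).get("pattern");
// decodedPattern is "[0-9][A-Z]{3}"
```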

Robert: easy to specify, but prob. hard to implement
... will foul up probabilities in ngram model

michael: such a pattern could be translated into CFG on server
... but this isn't necessarily merged w/ free text model

Robert: but pattern doesn't necessarily represent how people will speak it
... "three four a" vs. "thirty-four a", etc.
... or "three boo"

Milan: so it's up to speech service to be good at handling that
... this is a new way of specifying grammars, that many existing speech services don't do now

Robert: regex doesn't include any kind of normalization
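To make the normalization gap concrete, here is a toy sketch in which a transcript has to be mapped to written form before the pattern can be tested at all. The word table and both function names are assumptions; a real service would need far more coverage (numbers like "thirty-four", misrecognitions like "boo").

```javascript
// Tiny spoken-form table covering single digits only.
const DIGIT_WORDS = new Map([
  ["zero", "0"], ["one", "1"], ["two", "2"], ["three", "3"], ["four", "4"],
  ["five", "5"], ["six", "6"], ["seven", "7"], ["eight", "8"], ["nine", "9"],
]);

// Map a transcript to written form, or null if any word is not covered.
function normalize(transcript) {
  let written = "";
  for (const word of transcript.toLowerCase().split(/\s+/).filter(Boolean)) {
    if (DIGIT_WORDS.has(word)) written += DIGIT_WORDS.get(word);
    else if (/^[a-z]$/.test(word)) written += word.toUpperCase();
    else return null;
  }
  return written;
}

// Anchor the pattern so the whole value must match, as the HTML pattern
// attribute does.
function fitsPattern(transcript, pattern) {
  const written = normalize(transcript);
  return written !== null && new RegExp(`^(?:${pattern})$`).test(written);
}

const codePattern = "[0-9][A-Z]{3}";
const fits = fitsPattern("three a b c", codePattern);        // written form "3ABC"
const wrongShape = fitsPattern("three four a", codePattern); // written form "34A"
const uncovered = fitsPattern("thirty-four a", codePattern); // no written form
```

The regex itself says nothing about which of these spoken forms a user will produce, which is the point Robert raises.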

Michael: real question: is it legal for speech engine to ignore such hints (or patterns) and return something that has nothing to do w/ it? (would hope so)

Robert: Looks great on paper, but won't be implemented

Milan: Nothing stopping speech providers from offering this, but reluctant to standardize at this point

glen: HTML already has a lot of this stuff

Milan: builtins should not be hints; they should recognize what's specified or not

Robert: cool idea; but there is work missing here that should cause reluctance on including in spec

<burn> s/glen: this is a new/Milan: this is a new/

Robert: how would you automate building a CFG off this pattern?

Milan: what about adopting two types: hints and grammars

michael: is it legal for speech engine to return something that doesn't fit the parameter
... for a regex, if the engine returns a result that does not fit the pattern, what should happen?
... provide some user-facing interface for correction?
... nothing wrong w/ a hint that is ignored

Milan: having things that need to be followed exactly, and then just hints

michael: was thinking everything is a hint

Milan: What if you just want a date and don't want to specify a grammar for it?

glen: if speech engine isn't up to the task, that's an issue w/ service
... most engines should be smart enough to know what a date is. but do you say don't use speech if your engine can't do that?

Milan: no, give error back

glen: date is a special case

Milan: but there are lots of those (bool, etc)

<mbodell> http://www.w3.org/TR/html5/the-input-element.html

glen: should we bind to every type of input element there is? automatic binding is questionable

Robert: if you need to click a mic to do it, what's the point of speech?

Charles: or handsfree cases

Milan: Developer has very complex UI. rather than re-write from scratch, it references a library

glen: take checkboxes. grammar would not be a binary, but the term bound to the box (e.g., "non-stop")

<mbodell> For date, look at http://www.w3.org/TR/html5/states-of-the-type-attribute.html#date-state

<mbodell> it lists that: If the element is mutable, the user agent should allow the user to change the date represented by its value, as obtained by parsing a date from it. User agents must not allow the user to set the value to a non-empty string that is not a valid date string. If the user agent provides a user interface for selecting a date, then the value must be set to a valid date string representing the user's selection. User agents should allow the user to

johnston: need to keep assistive use cases in mind

Charles: can use UI to highlight these things

<mbodell> so for input=date we should have the same ruling where it can't set it to values that are not valid date strings IMO
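A sketch of what that ruling could look like on the script side: a recognition result is only committed if it is a valid date string in the HTML sense (YYYY-MM-DD, year of at least four digits and greater than zero, day valid for the month). The function names are hypothetical, and a plain object stands in for the input element.

```javascript
// Check the HTML "valid date string" form.
function isValidDateString(value) {
  const match = /^(\d{4,})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) return false;
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  if (year < 1 || month < 1 || month > 12 || day < 1) return false;
  const leap = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
  const daysInMonth = [31, leap ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
  return day <= daysInMonth[month - 1];
}

// Only commit a recognition result that is a valid date string.
function setDateFromSpeech(field, result) {
  if (!isValidDateString(result)) return false;
  field.value = result;
  return true;
}

// Plain object standing in for <input type="date">.
const dateField = { value: "" };
const accepted = setDateFromSpeech(dateField, "2011-10-20");
const rejected = setDateFromSpeech(dateField, "October 20");
```

Turning a spoken date into that string form would still be the speech service's job; this only shows the final validity check.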

Charles: should be careful ruling things out in general
... what is purpose of tag name here?

glen: if we decide to allow only a single type of input, then you don't need tag name. but here distinguishing what element you're associating with

Charles: wouldn't it be redundant to put tag name here?

glen: no, complementary to binding. can use builtin grammars w/ out any binding at all
... but if the reco tag is bound to an element, then you'd create those default grammars automatically & assoc using the tag

Charles: how to know which builtin goes w/ which element?

glen: for multiple input fields and only one reco element, then you need to specify some grammars yourself

Charles: thought builtin would specify language-specific things, and binding would occur separately

Robert: couple questions on protocol draft

Should start w/ those next week

