VoiceXML 2.0: Official Response #9 to Candidate Recommendation Issues from McGlashan, Scott on 2003-11-19 (www-voice@w3.org from October to December 2003)

From: McGlashan, Scott <scott.mcglashan@hp.com>
Date: Wed, 19 Nov 2003 20:33:27 +0100
To: <kayseri@phonoclick.com>
Cc: <www-voice@w3.org>
Message-ID: <77DB1374F763FB489900AA600DA37AE103563590@frqexc01.emea.cpqcorp.net>
The Voice Browser Working Group (VBWG) is now completing its resolution
of issues raised during the review of the Candidate Recommendation
version of VoiceXML 2.0 [1]. Our apologies that it has taken so long to
respond.

Following the process described in [2] for advancement to Proposed
Recommendation, this is the VBWG's formal response to the issues you
raised.

Please indicate before 26 November 2003 whether you are satisfied with
the VBWG's resolutions, whether you think there has been a
misunderstanding, or whether you wish to register an objection.

If you do not think you can respond before 26 November, please let me
know. The Director will appreciate a response whether you agree with the
resolutions or not. However, if we do not hear from you at all by 26
November 2003, we will assume that you accept our resolutions.

Below you will find a summary of the VBWG's responses to each of your
issues. Please use the issue identifiers when responding.

Thank you,

Scott McGlashan
Co-chair, Voice Browser Working Group

[1] http://www.w3.org/TR/2003/CR-voicexml20-20030220/ 
[2] http://www.w3.org/2003/06/Process-20030618/ 


-----------------------------------------------
Issues you raised and VBWG responses
-----------------------------------------------

Issues: CR15-1
http://lists.w3.org/Archives/Public/www-voice/2003AprJun/0030.html

Issue CR15-1
We are trying to implement the <record> tag in our Voice Browser in a
conformant way; however, we cannot clearly understand what the
specification requires of a browser for this tag. My points can be
summed up as follows:

I) The main confusion arises from the behaviour of bargein="true"
prompts in <record>. According to Fig 7 in section 2.3.6 (lower left
corner), bargein controls apply to audio queued within <record>. On the
other hand, a few lines below, it is stated:

"A /recording begins/ at the earliest after the playback of any prompts
(including the 'beep' tone if defined). As an optimization, a platform
may begin recording when the user starts speaking."

Now, if recording does not begin DURING the prompt playback, then how
can those prompts be barged-in? Or, should we understand that if the
user barges-in with voice during prompt playback THEN recording should
be started? A clarification of how <record> and audio queued within
<record> with barge-in interact is, in our opinion, badly needed.

II) The second comment that baffles us in the spec is:

"If no audio is collected during execution of <record>, then the record
variable remains unfilled (note
<http://www.w3.org/TR/voicexml20/#unfilled_record>). This can occur, for
example, when DTMF or speech input is received during prompt playback or
the timeout interval (if the developer wants input during prompt
playback to initiate recording, then prompts should be placed in an
immediately preceding <field> with a zero timeout)." (Section 2.3.6)

This comment is weird in two ways:

  1) How can record variable be unfilled "when DTMF or speech input is
received during ... the timeout interval"? This seems to be the primary
method of filling a record variable.

  2) We cannot grasp, in any way, how it would be possible to achieve
what the spec author has stated within the parentheses. If there is a
preceding <field> with zero timeout then:
    i) if the user starts speaking while the prompts in the <field> are
playing then the input goes to the processing of the field and will be
matched to whatever grammar is specified for it, or will throw a
"nomatch",
    ii) else if the user waits for the prompts to finish, then a
"noinput" event will be thrown.
  In neither case will the input go to the <record> tag that follows
the <field> tag. If the spec is trying to say something else, then it
should be clearly explained.
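
For reference, the pattern described in the parenthetical might look
roughly like the VoiceXML sketch below. This is only an illustration of
the scenario under discussion, not text from the specification: the form
id, the field and record names, the prompt wording, and the grammar file
"intro.grxml" are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="leave_message">

    <!-- Hypothetical field that carries the prompts, with a zero timeout. -->
    <field name="intro">
      <property name="timeout" value="0s"/>
      <!-- Hypothetical grammar; a field needs one to process input. -->
      <grammar src="intro.grxml" type="application/srgs+xml"/>
      <prompt bargein="true">
        Please leave your message after the beep.
      </prompt>
      <!-- Case i:  speech during the prompt is matched against the
                    field grammar, or throws "nomatch".
           Case ii: silence until the prompt ends throws "noinput". -->
    </field>

    <!-- The record item that follows the field; in neither case above
         does the caller's input reach it. -->
    <record name="msg" beep="true" maxtime="30s"/>

  </form>
</vxml>
```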


CR15-1 Resolution: rejected with modifications 

I) Prompts can be barged in on if active DTMF grammars are defined. (The
same holds for active speech grammars, but the ability to combine
recognition and recording may be removed from the specification due to a
lack of implementation support.)

II.1) The record variable can remain unfilled on DTMF input when
recording is triggered by voice activity detection (i.e. when, as a
platform optimization, recording begins only when voice activity is
detected instead of immediately after prompt playback).

II.2) We agree this is confusing (it was intended to cover another use
case), so we will remove the parenthesized text: "(if the developer
wants input during prompt playback to initiate recording, then prompts
should be placed in an immediately preceding <field> with a zero
timeout)".
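
As a rough illustration of point I, a <record> item whose prompt can be
barged in on by DTMF might be written as in the sketch below. It assumes
that dtmfterm="true" gives the platform an active DTMF grammar during the
prompt; whether a key press at that point interrupts the prompt,
terminates the item, or both is platform behaviour that this sketch does
not settle. The form id, the variable name "msg", the prompt wording and
the timing values are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="leave_message">

    <!-- dtmfterm="true": a DTMF key press not matched by another active
         grammar is treated as a match of a local DTMF grammar. -->
    <record name="msg" beep="true" maxtime="30s" finalsilence="3s"
            dtmfterm="true" type="audio/x-wav">

      <!-- Prompt queued within <record>; bargein applies to it. -->
      <prompt bargein="true">
        Please leave your message after the beep.
        Press any key when you have finished.
      </prompt>

      <!-- Raised if nothing is heard within the timeout interval. -->
      <noinput>
        I did not hear anything. Please try again.
      </noinput>

      <!-- Runs only if audio was collected; otherwise the record
           variable stays unfilled and the item is visited again. -->
      <filled>
        <prompt>Your message is <audio expr="msg"/>.</prompt>
      </filled>

    </record>

  </form>
</vxml>
```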
Received on Wednesday, 19 November 2003 14:33:34 UTC