Bug 21049 - When cancel() is called on SpeechSynthesis, does the end() or error() event get fired
Status: RESOLVED FIXED
Product: Speech API
Classification: Unclassified
Component: Speech API
Version: unspecified
Hardware: PC All
Importance: P2 normal
Assigned To: Glen Shires
Duplicates: 20985
Reported: 2013-02-19 17:05 UTC by chris fleizach
Modified: 2013-10-18 00:02 UTC (History)
Description chris fleizach 2013-02-19 17:05:46 UTC
I felt that the spec wasn't entirely clear about which event is fired on cancel(), because it's not technically an error that speech stopped.
Comment 1 Glen Shires 2013-02-25 19:31:30 UTC
Copying from mailing list. Let's continue this conversation here...

From: Eitan Isaacson <eisaacson@mozilla.com> 
Date: Thu, 07 Feb 2013 10:22:12 +0000
To: "public-speech-api@w3.org" <public-speech-api@w3.org> 
Hello again,

The event behavior of speak(), pause(), and resume() is pretty clear.
But what happens when cancel() is called? onend? onerror? Perhaps there
is room for another event?

Cheers,
  Eitan.
Comment 2 Glen Shires 2013-02-25 19:44:44 UTC
*** Bug 20985 has been marked as a duplicate of this bug. ***
Comment 3 Glen Shires 2013-02-25 19:46:52 UTC
In duplicate Bug 20985 Eitan Isaacson wrote:

Maybe the utterances in the queue that have never started should have an event fire on them too.
Comment 4 Dominic Mazzoni 2013-04-24 22:40:34 UTC
I definitely think we should always fire an event on an utterance that was queued, whether it was completed, had an error, or was canceled.

I'd vote for "error", with an error type of "canceled".

If the utterance was in process when it was canceled, I'd change the error type to "interrupted", so the event listener could distinguish between speech that was never spoken and speech that was partway spoken before being canceled.

Alternatively, we could explicitly have "canceled" and "interrupted" events, rather than reusing "error". No strong preference there.

I think firing an "end" event for speech that doesn't actually complete would be confusing; it'd just create more work for someone implementing a listener.
Comment 5 Glen Shires 2013-09-11 08:33:08 UTC
I propose the following errata, which uses the charIndex attribute to distinguish between "canceled" and "interrupted" utterances.

If there's no disagreement, I'll add this to the errata page on Sept 24.

Section 5.2.2 cancel method: definition should be changed to:
"This method removes all utterances from the queue and fires an error event on each. If an utterance is being spoken, speaking ceases immediately and the SpeechSynthesisEvent charIndex attribute must return the current speaking position if the speech synthesis engine supports it, otherwise it must return undefined. If an utterance has not begun being spoken, the SpeechSynthesisEvent charIndex attribute must return 0. This method does not change the paused state of the global SpeechSynthesis instance."
Comment 6 chris fleizach 2013-09-11 17:45:04 UTC
(In reply to Glen Shires from comment #5)
> I propose the following errata, which uses the charIndex attribute to
> distinguish between "canceled" and "interrupted" utterances.
> 
> If there's no disagreement, I'll add this to the errata page on Sept 24.
> 
> Section 5.2.2 cancel method: definition should be changed to:
> "This method removes all utterances from the queue and fires an error event
> on each. If an utterance is being spoken, speaking ceases immediately and
> the SpeechSynthesisEvent charIndex attribute must return the current
> speaking position if the speech synthesis engine supports it, otherwise it
> must return undefined. If an utterance has not begun being spoken, the
> SpeechSynthesisEvent charIndex attribute must return 0. This method does not
> change the paused state of the global SpeechSynthesis instance."

So is there a way to distinguish, from the event on the currently speaking utterance, whether it was cancelled or an error occurred?
Comment 7 Eitan Isaacson 2013-09-11 17:53:13 UTC
Why not robustify the error events and give them a type field? "Cancelled" and "interrupted" could then be explicit types, something like SpeechRecognitionError.
Comment 8 Glen Shires 2013-09-11 18:40:05 UTC
Chris: my intent is that charIndex==0 indicates "canceled" and charIndex>0 or charIndex==undefined indicates "interrupted".

Eitan: we could add a "type" attribute to SpeechSynthesisEvent, or we could re-use the "name" attribute in SpeechSynthesisEvent as an "enum".  (Name is already used as an "enum" for boundary events: https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#dfn-utteranceonboundary )

In either case, that implies we should define the list of types/enums for all other error events (not just "canceled" or "interrupted").

- Opinions?
- Anyone want to propose such a list? Perhaps with inspiration from SpeechRecognitionError ErrorCode https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#dfn-error as well as typical synthesis engine errors
Comment 9 Eitan Isaacson 2013-09-13 17:17:09 UTC
(In reply to Glen Shires from comment #8)
> Chris: my intent is that charIndex==0 indicates "canceled" and charIndex>0
> or charIndex==undefined indicates "interrupted".
> 
> Eitan: we could add a "type" attribute to SpeechSynthesisEvent, or we could
> re-use the "name" attribute in SpeechSynthesisEvent as an "enum".  (Name is
> already used as an "enum" for boundary events:
> https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#dfn-
> utteranceonboundary )
> 
> In either case, that implies we should define the list of types/enums for
> all other error events (not just "canceled" or "interrupted").
> 
> - Opinions?
> - Anyone want to propose such a list? Perhaps with inspiration from
> SpeechRecognitionError ErrorCode
> https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#dfn-error as
> well as typical synthesis engine errors

It isn't really an enum for boundary events, just loosely defined as 'word' or 'sentence'. Whether to formalize it as an enum or not, I'll leave to others. Also, many errors could be thrown as exceptions. The only ones that can't are the kind that would be encountered during the actual synth.

Here is a partial list of possible synth errors I could think of, some might be simple exceptions instead:

network-error: For cloud services, there is a network error that does not allow the synth to start or complete.

language-not-supported: No appropriate voice was found for the given language. Implementations should try hard to find a match, even outside of the given locale, but if there is no alternative (for example, no Latin language could synthesize Arabic), fail with this error. This could be done as an exception as well.

voice-not-available: The given voice is unavailable. Since Errata 7 (bug #20529), we provide an actual voice object so there is less risk for erroneous URIs. Nonetheless, a voice could become unavailable.

busy: The synthesis engine or audio device is busy. For example, a platform service. Kind of related to bug #21110. We may want to just wait as opposed to emitting an error. Not sure.

interrupted: The utterance was interrupted mid-speech.

canceled: The utterance was removed from the queue before it started.
Comment 10 Dominic Mazzoni 2013-09-16 08:55:15 UTC
I don't think we want to use exceptions anywhere, except possibly for a blatant usage error like passing the wrong type of object to an API. TTS is quite likely to be implemented asynchronously, so I think everything else should be reported via the error event.

Rather than add a type to SpeechSynthesisEvent that wouldn't make sense for all events, I propose SpeechSynthesisError extends SpeechSynthesisEvent, and it includes an additional field, error, which is chosen from an enum.

Eitan's list sounds good:

* network error
* language not supported
* voice not available
* busy
* interrupted
* canceled

Other possible errors:
* Audio device busy (separate from "speech synthesizer busy")
* Utterance too long (perhaps the engine disallows more than 32k, for example)

These could be errors, but they could also just be exceptions because they could be checked immediately:
* Invalid rate, pitch, or volume
* Utterance already belongs to another synthesizer
Comment 11 Glen Shires 2013-09-24 17:58:17 UTC
Here's a proposal for adding an enum for errors.

I'm interested in getting your opinions on:
a) Do we need a "message" attribute? (I think not, but for discussion, I included it here for consistency with SpeechRecognitionError.)
b) Do we need a "busy" error? This is related to bug #21110. If so, should we split out "audio-busy" and "synthesizer-busy"?
c) Should the word "operation" be replaced with "speak method", or is it more general?
d) I renamed "language-not-supported" to "language-not-available" because the current state may change if additional voices become available.
However, perhaps we should rename these "language-not-supported" and "voice-not-supported" to be more consistent with SpeechRecognitionError.
e) Is "invalid-parameter" the right word?  (I don't think this can be an exception since it may not be known for network synthesizers.)
f) Note that "Utterance already belongs to another synthesizer" can be checked immediately, and thus can and should be an exception, see errata E10.


    interface SpeechSynthesisErrorEvent extends SpeechSynthesisEvent {
        enum ErrorCode {
          "canceled"
          "interrupted",
          "busy",
          "network",
          "language-not-available",
          "voice-not-available",
          "text-too-long",
          "invalid-parameter",
        };

        readonly attribute ErrorCode error;
        readonly attribute DOMString message;
    };


5.2.5.1 SpeechSynthesisErrorEvent Attributes

error attribute
The error attribute is an enumeration indicating what has gone wrong. The values are:
  "canceled"
      A cancel method call caused the SpeechSynthesisUtterance to be removed from the queue before it had begun being spoken.
  "interrupted"
      A cancel method call caused the SpeechSynthesisUtterance to be interrupted after it has begun being spoken and before it completed.
  "busy"
      The operation cannot be completed at this time because the synthesis engine or audio device is busy.
  "network"
      Some network communication that was required to complete the operation failed.
  "language-not-available"
      No appropriate voice is available for the language designated in SpeechSynthesisUtterance lang.
  "voice-not-available"
      The voice designated in SpeechSynthesisUtterance voiceURI is not available.
  "text-too-long"
      The contents of the SpeechSynthesisUtterance text attribute is too long to synthesize.
  "invalid-parameter"
      The contents of the SpeechSynthesisUtterance rate, pitch or volume attribute is not supported by synthesizer.

message attribute
The message content is implementation specific. This attribute is primarily intended for debugging and developers should not use it directly in their application user interface.
Comment 12 Dominic Mazzoni 2013-09-30 19:51:19 UTC
My opinion: "end" always means it was actually spoken, and "error" means it wasn't spoken. My thinking is that most apps care more about what happened (was it spoken or not) than about why.
Comment 13 Dominic Mazzoni 2013-09-30 19:57:27 UTC
No strong opinion on any of these, but here are my thoughts.

a) Do we need a "message" attribute? (I think not, but for discussion, I included it here for consistency with SpeechRecognitionError.)

For simplicity my vote is no, but not a strong opinion.

b) Do we need a "busy" error? This is related to bug #21110 , If so, should we split out "audio-busy" and "synthesizer-busy"?

Sure, let's have both.

c) Should the word "operation" be replaced with "speak method", or is it more general?

I like "operation".

d) I renamed "language-not-supported" to "language-not-available" because the current state may change if additional voices become available.
However, perhaps we should rename these "language-not-supported" and "voice-not-supported" to be more consistent with SpeechRecognitionError.

language-not-available is fine.

There's a difference. It's possible to specify a SpeechSynthesisUtterance with a language but no voice, in which case the error might be language-not-available. If you specify a voice, then either the voice is available or it's not - so voice-not-available makes sense there.

e) Is "invalid-parameter" the right word?  (I don't think this can be an exception since it may not be known for network synthesizers.)

I think "invalid argument" is more correct. A parameter is part of the function definition, an argument is what you pass to a function. This isn't true in all languages, but that's how those terms are used in JavaScript.

f) Note that "Utterance already belongs to another synthesizer" can be checked immediately, and thus can and should be an exception, see errata E10.

Agreed.
Comment 14 Glen Shires 2013-10-01 00:28:40 UTC
Based on the above discussion, I propose the following errata.
If there's no disagreement, I'll add this to the errata page on Oct 15.


Section 5.2 IDL: Add the following:
    interface SpeechSynthesisErrorEvent extends SpeechSynthesisEvent {
        enum ErrorCode {
          "canceled"
          "interrupted",
          "audio",
          "synthesis",
          "network",
          "language-not-available",
          "voice-not-available",
          "text-too-long",
          "invalid-argument",
        };

        readonly attribute ErrorCode error;
    };


Section 5.2.4 SpeechSynthesisUtterance Events: change first sentence to:
"Each of these events MUST use the SpeechSynthesisEvent interface, except the error event which MUST use the SpeechSynthesisErrorEvent interface."


New "Section 5.2.5.1 SpeechSynthesisErrorEvent Attributes" is added and contains:
error attribute
The error attribute is an enumeration indicating what has gone wrong. The values are:
  "canceled"
      A cancel method call caused the SpeechSynthesisUtterance to be removed from the queue before it had begun being spoken.
  "interrupted"
      A cancel method call caused the SpeechSynthesisUtterance to be interrupted after it has begun being spoken and before it completed.
  "audio"
      The operation cannot be completed at this time because the audio device is busy or unavailable.
  "synthesis"
      The operation cannot be completed at this time because the synthesis engine is busy or unavailable.
  "network"
      The operation cannot be completed at this time because some required network communication failed.
  "language-not-available"
      No appropriate voice is available for the language designated in SpeechSynthesisUtterance lang.
  "voice-not-available"
      The voice designated in SpeechSynthesisUtterance voiceURI is not available.
  "text-too-long"
      The contents of the SpeechSynthesisUtterance text attribute is too long to synthesize.
  "invalid-argument"
      The contents of the SpeechSynthesisUtterance rate, pitch or volume attribute is not supported by synthesizer.
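As a sketch of how a page might consume the enum proposed in this errata, an error handler could dispatch on the code. The description strings below are illustrative only, not spec text:

```javascript
// Illustrative mapping from the proposed ErrorCode values to short
// descriptions a page might surface; the strings are not normative.
function describeSynthesisError(code) {
  switch (code) {
    case "canceled":    return "Removed from the queue before speaking began.";
    case "interrupted": return "Cut off after speaking began.";
    case "audio":       return "Audio device busy or unavailable.";
    case "synthesis":   return "Synthesis engine busy or unavailable.";
    case "network":     return "Required network communication failed.";
    case "language-not-available":
      return "No voice available for the requested language.";
    case "voice-not-available":
      return "The requested voice is unavailable.";
    case "text-too-long":
      return "The utterance text is too long to synthesize.";
    case "invalid-argument":
      return "Unsupported rate, pitch, or volume.";
    default:
      return "Unknown error: " + code;
  }
}
```

In a browser implementing this proposal, a page would attach it along the lines of `utterance.onerror = (e) => showMessage(describeSynthesisError(e.error));`.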
Comment 15 chris fleizach 2013-10-01 00:59:19 UTC
(In reply to Glen Shires from comment #14)
> Based on the above discussion, I propose the following errata.
> If there's no disagreement, I'll add this to the errata page on Oct 15.
> 
> 
> Section 5.2 IDL: Add the following:
>     interface SpeechSynthesisErrorEvent extends SpeechSynthesisEvent {
>         enum ErrorCode {
>           "canceled"
>           "interrupted",
>           "audio",
>           "synthesis",
 Synthesis and audio are not very descriptive out of the box. Can we be more specific?

>           "network",
>           "language-not-available",
>           "voice-not-available",
>           "text-too-long",
>           "invalid-argument",
>         };
> 
>         readonly attribute ErrorCode error;
>     };
> 

Recent specs I've seen use camelCase for string literals. Not sure if you want to take that approach or not.

Thanks

> 
> Section 5.2.4 SpeechSynthesisUtterance Events: change first sentence to:
> "Each of these events MUST use the SpeechSynthesisEvent interface, except
> the error event which MUST use the SpeechSynthesisErrorEvent interface."
> 
> 
> New "Section 5.2.5.1 SpeechSynthesisErrorEvent Attributes" is added and
> contains:
> error attribute
> The errorCode is an enumeration indicating what has gone wrong. The values
> are:
>   "canceled"
>       A cancel method call caused the SpeechSynthesisUtterance to be removed
> from the queue before it had begun being spoken.
>   "interrupted"
>       A cancel method call caused the SpeechSynthesisUtterance to be
> interrupted after it has begun being spoken and before it completed.
>   "audio"
>       The operation cannot be completed at this time because the audio
> device is busy or unavailable.
>   "synthesis"
>       The operation cannot be completed at this time because the synthesis
> engine is busy or unavailable.
>   "network"
>       The operation cannot be completed at this time because some required
> network communication failed.
>   "language-not-available"
>       No appropriate voice is available for the language designated in
> SpeechSynthesisUtterance lang.
>   "voice-not-available"
>       The voice designated in SpeechSynthesisUtterance voiceURI is not
> available.
>   "text-too-long"
>       The contents of the SpeechSynthesisUtterance text attribute is too
> long to synthesize.
>   "invalid-argument"
>       The contents of the SpeechSynthesisUtterance rate, pitch or volume
> attribute is not supported by synthesizer.
Comment 16 Glen Shires 2013-10-01 02:29:42 UTC
I prefer hyphens (rather than camelCase) for consistency with SpeechRecognitionError.
https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#speechreco-section


Since this attribute is "error", I interpret it as "audio" error and "synthesis" error, and their descriptions are below. Admittedly we could be more specific, particularly if the JavaScript code could take a more appropriate action based on the various ErrorCodes. What would you suggest?
Comment 17 chris fleizach 2013-10-01 18:26:59 UTC
(In reply to Glen Shires from comment #16)
> I prefer hyphens (rather than camelCase) for consistency with
> SpeechRecognitionError.
> https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#speechreco-
> section
> 
> 
> Since this attribute is "error", I interpret it as "audio" error and
> "synthesis" error, and their descriptions are below. Admittedly we could be
> more specific, particularly if the JavaScript code could take a more
> appropriate action based on the various ErrorCodes.  What would you suggest?

some options:
audio-hardware-busy
synthesizer-busy
synthesizer-unavailable
synthesis-generation

?
Comment 18 Glen Shires 2013-10-03 01:02:17 UTC
Chris,
If your concern is to have more descriptive errors to help the developer debug, then I suggest we add a "readonly attribute DOMString message".

If your concern is that we need finer-grained messages so that the page can better instruct the user how to resolve the issue, then I propose replacing "audio" and "synthesis" with the four following ErrorCodes, because the action to be taken for each is distinct.

  "audio-busy"
      The operation cannot be completed at this time because the user-agent cannot access the audio output device. (For example, the user may need to correct this by closing another application.)

  "audio-hardware"
      The operation cannot be completed at this time because the user-agent cannot identify an available audio output device. (For example, the user may need to connect a speaker or configure system settings.)

  "synthesis-unavailable"
      The operation cannot be completed at this time because no synthesis engine is available. (For example, the user may need to install or configure a synthesis engine.)

  "synthesis-failed"
      The operation failed because the synthesis engine had an error.

I also propose renaming:
  "language-not-available" to "language-unavailable"
  "voice-not-available" to "voice-unavailable"
Comment 19 chris fleizach 2013-10-03 01:07:58 UTC
(In reply to Glen Shires from comment #18)
The second proposal looks good to me (the more descriptive error code names).
Thanks
Comment 20 Glen Shires 2013-10-18 00:02:45 UTC
I've updated the errata (E12) with the above change:
https://dvcs.w3.org/hg/speech-api/rev/0b8ae424b59d

As always, the current errata is at:
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi-errata.html
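Putting comment 14's enum together with the comment 18 replacements that landed in errata E12, a page might group the final error codes by the kind of recovery action they suggest. The grouping below is one possible reading of the discussion above, not spec text:

```javascript
// Assumed final code list: comment 14's enum with "audio"/"synthesis"
// replaced by the four finer-grained codes and the "*-not-available"
// names shortened per comment 18 (errata E12).
// The categories are illustrative, not normative.
const ERROR_CATEGORY = {
  "canceled": "app-initiated",             // caused by cancel(), usually ignorable
  "interrupted": "app-initiated",
  "audio-busy": "user-fixable",            // e.g. close another application
  "audio-hardware": "user-fixable",        // e.g. connect a speaker
  "synthesis-unavailable": "user-fixable", // e.g. install a synthesis engine
  "synthesis-failed": "engine-fault",
  "network": "transient",                  // may succeed on retry
  "language-unavailable": "content",       // change lang or voice
  "voice-unavailable": "content",
  "text-too-long": "content",
  "invalid-argument": "content",
};

function errorCategory(code) {
  return ERROR_CATEGORY[code] || "unknown";
}
```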