This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 22003 - getVoices should be asynchronous
Summary: getVoices should be asynchronous
Status: RESOLVED FIXED
Alias: None
Product: Speech API
Classification: Unclassified
Component: Speech API
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Glen Shires
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-10 22:52 UTC by Dominic Mazzoni
Modified: 2013-10-18 00:01 UTC

Description Dominic Mazzoni 2013-05-10 22:52:10 UTC
The getVoices method should be asynchronous.

As it is right now, user agents have three choices, all bad:

1. They can pre-load speech synthesis, which slows down application startup even though most web pages don't speak.
2. They can block until the list of voices is available, which can make the browser appear sluggish and potentially freeze script execution on that page for tens of milliseconds.
3. They can return an empty list and update it as soon as the complete list of voices is available.

I propose that getVoices take a callback as an argument, so you'd use it like this:

window.speechSynthesis.getVoices(function(voices) {
  for (var i = 0; i < voices.length; i++)
    ...
});

If the callback is unspecified, we could allow it to be used synchronously - and we should recommend browsers adopt either #2 or #3.

Thoughts?
Comment 1 Eitan Isaacson 2013-05-21 16:04:16 UTC
(In reply to comment #0)
> The getVoices method should be asynchronous.
> 
> As it is right now, user agents have three choices, all bad:
> 
> 1. They can pre-load speech synthesis, which slows down application startup
> even though most web pages don't speak.
> 2. They can block until the list of voices is available, which can make the
> browser appear sluggish and potentially freeze script execution on that page
> for tens of milliseconds.
> 3. They can return an empty list and update it as soon as the complete list
> of voices is available.
> 
> I propose that getVoices take a callback as an argument, so you'd use it
> like this:
> 
> window.speechSynthesis.getVoices(function(voices) {
>   for (var i = 0; i < voices.length; i++)
>     ...
> });
> 
> If the callback is unspecified, we could allow it to be used synchronously -
> and we should recommend browsers adopt either #2 or #3.
> 
> Thoughts?

I agree :)
Comment 2 Glen Shires 2013-05-28 13:27:14 UTC
I agree and propose that the callback MUST be supplied.

All of the options 1,2,3 are bad and would require extra handling, so they are only complicating the requirements for both the user agent and the JavaScript developer. The callback provides a practical solution that's easy to implement in the user agent and JavaScript, and supports local and remote speech synthesizers.
Comment 3 Dominic Mazzoni 2013-08-30 16:31:59 UTC
It sounds like we have consensus. Could we resolve this and update the spec?
Comment 4 Eitan Isaacson 2013-09-02 12:12:09 UTC
Does this change look right?

+callback GetVoicesCallback = void (sequence<SpeechSynthesisVoice> voices);

interface SpeechSynthesis {
  ...
-  sequence<SpeechSynthesisVoice> getVoices();
+  void getVoices(GetVoicesCallback voicesCallback);
}
Comment 5 Olli Pettay 2013-09-02 14:01:04 UTC
Should we use Promises here?
Comment 6 Olli Pettay 2013-09-02 19:57:07 UTC
Though, I think callback is simpler, and Promises shouldn't be used just
because they are hip.

But callback and Promise are both ok to me.
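
For comparison, the two call shapes being discussed could look roughly like this from a page's point of view. This is only a sketch: `fakeSynth` and `getVoicesAsPromise` are hypothetical names made up for illustration, the stub mimics the callback form from comment 4, and nothing here reflects what any shipping `getVoices()` does.

```javascript
// Hypothetical stand-in for a synthesizer whose voice list arrives later.
// It mimics the callback form proposed in comment 4; it is not a browser API.
const fakeSynth = {
  getVoices(voicesCallback) {
    // Deliver the list asynchronously, as a lazy implementation might.
    setTimeout(() => voicesCallback([{ name: 'Example Voice', lang: 'en-US' }]), 0);
  },
};

// Adapter: wrap the callback form so callers can use a Promise instead.
function getVoicesAsPromise(synth) {
  return new Promise((resolve) => synth.getVoices(resolve));
}

// Callback style.
fakeSynth.getVoices((voices) => {
  console.log('callback:', voices.map((v) => v.name));
});

// Promise style.
getVoicesAsPromise(fakeSynth).then((voices) => {
  console.log('promise:', voices.map((v) => v.name));
});
```

Either shape carries the same information; the adapter shows that a Promise form can be layered on a callback form in a few lines, which is part of why the choice is mostly about API consistency.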
Comment 7 Dominic Mazzoni 2013-09-03 23:22:20 UTC
The feedback I'm hearing from others working on Blink is that we should definitely use a Promise; they're discouraging any new APIs that use callback functions. The only potential downside is that this could delay implementation a lot, since Promises aren't really launched anywhere yet.

As one alternative, what about having a "voices changed" event listener? Apps might want to use that to present UI indicating what voices are available. This would help both with the initial load, and in situations where voices are dynamically loaded and unloaded.

What do you think of:

interface SpeechSynthesis {
  attribute EventHandler onvoiceschanged;
}

...then leave getVoices() alone?

Do we want "onvoiceschanged", or Promises, or both?
Comment 8 Olli Pettay 2013-09-04 11:01:41 UTC
Definitely not both.

It would be really nice if SpeechSynthesis was an EventTarget and we could just
dispatch voiceschanged to it, but for some odd reason it is
SpeechSynthesisUtterance that is the EventTarget.
(All the events could just be dispatched on SpeechSynthesis).

But perhaps for consistency we should indeed make SpeechSynthesis
an EventTarget and dispatch voiceschanged on it. But how would it work initially?
When would the event be dispatched the first time?
Comment 9 Dominic Mazzoni 2013-09-04 13:56:57 UTC
Why not both?

What I was thinking was that any use of window.speechSynthesis would trigger initialization, and onvoiceschanged would fire when the initial voices are loaded.

However, that complicates app development. Some apps would do it wrong and fail to run on some browsers because they have an initialization race condition.

The most foolproof solution would be for getVoices to return a Promise (or use a callback) AND to have an onvoiceschanged notification for when they change again.
Comment 10 Olli Pettay 2013-09-04 14:19:34 UTC
(In reply to comment #9)
> Why not both?
Because it would be an odd API. Requiring one to use two different paradigms for callbacks.
Comment 11 Dominic Mazzoni 2013-09-04 16:00:31 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Why not both?
> Because it would be an odd API. Requiring one to use two different paradigms
> for callbacks.

Well, "onvoiceschanged" really is needed in case the voices change.

What if getVoices returns null or an error/exception if the system isn't initialized yet? That way authors can clearly distinguish between "voices haven't been loaded yet" and "there aren't any voices" - and we don't need a Promise.
Comment 12 Dominic Mazzoni 2013-09-16 08:58:16 UTC
I think I like that last idea best. I propose the following changes:

getVoices returns null if the voices have not yet been loaded. If it returns the empty list, it means that initialization has completed and no voices are available.

SpeechSynthesis has a new attribute:

attribute EventHandler onvoiceschanged;

This event is called when the list of voices returned by getVoices() has been updated. It doesn't include any additional fields in its Event.
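
Under this proposal a page could tell three cases apart from the return value alone. The following is a rough sketch of that idea, not shipped behavior: `classifyVoices`, `watchVoices`, and `render` are hypothetical names, and `synth` stands for any object that behaves the way comment 12 describes.

```javascript
// Classify what getVoices() returned under the comment 12 proposal:
//   null       -> voices have not been loaded yet
//   empty list -> initialization finished, no voices available
//   otherwise  -> voices are ready to use
function classifyVoices(synth) {
  const voices = synth.getVoices();
  if (voices === null) return 'loading';
  if (voices.length === 0) return 'none';
  return 'ready';
}

// Report the state now and again on every change notification.
// `render` is a hypothetical page-supplied function.
function watchVoices(synth, render) {
  synth.onvoiceschanged = () => render(classifyVoices(synth));
  render(classifyVoices(synth));
}
```

Registering the handler before the first read avoids missing a change that lands between the two steps.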
Comment 13 Eitan Isaacson 2013-09-16 17:42:19 UTC
(In reply to Dominic Mazzoni from comment #12)
> I think I like that last idea best. I propose the following changes:
> 
> getVoices returns null if the voices have not yet been loaded. If it returns
> the empty list, it means that initialization has completed and no voices are
> available.
> 
> SpeechSynthesis has a new attribute:
> 
> attribute EventHandler onvoiceschanged;
> 
> This event is called when the list of voices returned by getVoices() has
> been updated. It doesn't include any additional fields in its Event.

What I liked about your original callback/Promises proposal is that from an implementation point of view, nothing needs to happen on initialization. And voices could be retrieved lazily. My assumption is that getting a list of voices will almost always hit the disk or network.

Also, when would you see onvoiceschanged happening besides startup? I imagine that if you have a remote service, that has a change in its voices inventory, you wouldn't know about it outside of polling, which is similar to what you would do with the current API. Also, at least in Gecko's current implementation, a service could unregister voices. But I see these as edge cases, where the general expectation would be that the set of voices doesn't change. In the extreme circumstance that it does, we have the proposed "voice not available" error in bug #21049.

I understand the schedule constraints, and not wanting to block on Promises, but it seems less than ideal to me. On the other hand, I don't feel extremely strongly about this, and if I am the only frowny face, I'll happily go with the proposal above.
Comment 14 Dominic Mazzoni 2013-09-16 19:31:27 UTC
> Also, when would you see onvoiceschanged happening besides startup?

Chrome has the ability to install extensions that provide TTS voices. We have several available now. It's easy to install one dynamically or disable an existing one.

It's also possible to download and install a new system voice on Windows, Mac OS X, or Android. It's particularly easy on Android, for example - but I believe that none of them require a reboot. I agree this would be rare, but it'd be nice if it could work.

Finally, network speech could be automatically disabled when the device goes offline, allowing an app to switch to its preferred local voice.

So given all of these things, it really doesn't seem like a corner case to me, it seems like something we should account for in the design.

Chris, what do you think?
Comment 15 Dominic Mazzoni 2013-09-23 16:14:18 UTC
The web speech synthesis api just launched as part of iOS 7.

Given that it implements getVoices() as originally specified, I don't think it makes sense to consider any changes to the spec that would break compatibility.

One option would be to have getVoices return null if the voices haven't loaded yet, as I originally proposed. That wouldn't break compatibility at all, and I think that's still one option, but it's a bit uncommon for a Web IDL to have a return value that can be null.

Here's an even simpler idea: let's just add another boolean attribute to SpeechSynthesis indicating whether the system is initialized yet. If it's not initialized, getVoices may not be the final list.

I formally propose the following changes:

    interface SpeechSynthesis {
      readonly attribute boolean pending;
      readonly attribute boolean speaking;
      readonly attribute boolean paused;
+     readonly attribute boolean initialized;

      void speak(SpeechSynthesisUtterance utterance);
      void cancel();
      void pause();
      void resume();
      SpeechSynthesisVoiceList getVoices();

+     attribute EventHandler onvoiceschanged;
    };

...

+ initialized: This attribute is true if the speech synthesis system is
+ initialized. It returns false if it is still loading the initial set
+ of voices. For backwards compatibility, if speechSynthesis.initialized
+ is undefined, clients should assume that initialization is complete.
+ Clients use onvoiceschanged to be notified when initialization has
+ completed.

+ onvoiceschanged: This event is fired when the set of voices has changed.
+ If voices are loaded asynchronously, this event will be fired when the voices
+ first load. It may be fired again if additional voices are loaded or unloaded.

  getVoices method
- This method returns the available voices.
- It is user agent dependent which voices are available.
+ This method returns the available voices. It is user agent dependent which
+ voices are available. If speechSynthesis.initialized is false, the list of
+ voices may be empty or incomplete. Clients can use speechSynthesis.onvoiceschanged
+ to be notified when voices are loaded or when the list changes.
Comment 16 Olli Pettay 2013-09-23 16:39:19 UTC
(In reply to Dominic Mazzoni from comment #15)
> The web speech synthesis api just launched as part of iOS 7.
> 
> Given that it implements getVoices() as originally specified, I don't think
> it makes sense to consider any changes to the spec that would break
> compatibility.

We don't have any stable spec, so we should allow changes to the API.


> I formally propose the following changes:
> 
>     interface SpeechSynthesis {
>       readonly attribute boolean pending;
>       readonly attribute boolean speaking;
>       readonly attribute boolean paused;
> +     readonly attribute boolean initialized;
> 
>       void speak(SpeechSynthesisUtterance utterance);
>       void cancel();
>       void pause();
>       void resume();
>       SpeechSynthesisVoiceList getVoices();
> 
> +     attribute EventHandler onvoiceschanged;
>     };
This doesn't quite work since SpeechSynthesis isn't an EventTarget.
So SpeechSynthesis should inherit EventTarget.



> For backwards compatibility, if speechSynthesis.initialized
> + is undefined, clients should assume that initialization is complete.
That would be really odd.
if (speechSynthesis.initialized) might mean ss is initialized or it is not depending on
whether initialized is implemented.

Could we live without .initialized and rely on getVoices to return non-empty list in case SS has been initialized?
Returning null from getVoices() would break backwards compatibility, but empty list wouldn't.

> + onvoiceschanged: This event is fired when the set of voices has changed.
Nit, onvoiceschanged is the name of an EventHandler, not event.
Event name would be voiceschanged
Comment 17 Eitan Isaacson 2013-09-23 18:12:29 UTC
(In reply to Dominic Mazzoni from comment #15)
> + initialized: This attribute is true if the speech synthesis system is
> + initialized. It returns false if it is still loading the initial set
> + of voices. For backwards compatibility, if speechSynthesis.initialized
> + is undefined, clients should assume that initialization is complete.
> + Clients use onvoiceschanged to be notified when initialization has
> + completed.

From an implementation point of view, we may not want to initialize anything until getVoices, or other API points are touched. The original proposal allowed us to retrieve voices lazily, which this does not really do.

For example, if there is a network speech service that we need to query for voices, we should only do that if the user or app have interest in it. We shouldn't query the service for voices every time the browser starts.

So far, I think the best proposal is what you talked about in comment #12.
Comment 18 Dominic Mazzoni 2013-09-23 18:28:58 UTC
(In reply to Olli Pettay from comment #16)
> We don't have any stable spec, so we should allow changes to the API.

The bar should be higher for making changes that would break compatibility with an implementation that's shipping in a major browser. Let's not do this if we don't have to.

> This doesn't quite work since SpeechSynthesis isn't an EventTarget.
> So SpeechSynthesis should inherit EventTarget.

You're right, good point. I think that's reasonable in general because there may be other "global" speech events we may want to add to the spec in the future.

> > For backwards compatibility, if speechSynthesis.initialized
> > + is undefined, clients should assume that initialization is complete.
> That would be really odd.
> if (speechSynthesis.initialized) might mean ss is initialized or it is not
> depending on
> whether initialized is implemented.

That's right, but we could just recommend web authors write (speechSynthesis.initialized !== false).

> Could we live without .initialized and rely on getVoices to return non-empty
> list in case SS has been initialized?

No, because it's quite plausible that there are no voices. You could be running a build of the browser with no built-in voices. You could be on mobile and no voices could be installed.

> Returning null from getVoices() would break backwards compatibility, but
> empty list wouldn't.

Returning null would only break backwards compatibility with webpages that follow the old spec. However, it doesn't make the iOS implementation incompatible with the new proposed spec, which seems valuable. Since iOS is synchronous, always returning a list would work fine.

> > + onvoiceschanged: This event is fired when the set of voices has changed.
> Nit, onvoiceschanged is the name of an EventHandler, not event.
> Event name would be voiceschanged

Agreed, thanks.
Comment 19 Dominic Mazzoni 2013-09-23 18:33:19 UTC
(In reply to Eitan Isaacson from comment #17)
> From an implementation point of view, we may not want to initialize anything
> until getVoices, or other API points are touched. The original proposal
> allowed us to retrieve voices lazily, which this does not really do.
> 
> For example, if there is a network speech service that we need to query for
> voices, we should only do that if the user or app have interest in it. We
> shouldn't query the service for voices every time the browser starts.

Why couldn't we retrieve voices lazily? The first time speechSynthesis is accessed, the browser could start initializing. If it's already initialized when the client accesses it, great - otherwise it waits for the event.

> So far, I think the best proposal is what you talked about in comment #12.

That's still fine with me. Technically returning null is breaking backward compatibility, but it seems like it's only doing so in a small way, and it's easy for a web developer to write code that works on both iOS and a browser that implements the spec change.

The alternatives (like using a promise or callback) would require web developers writing different code for different browsers.
Comment 20 Dominic Mazzoni 2013-09-26 20:14:51 UTC
Any thoughts on this? I'm hoping for more support for either my proposal in Comment 12, or my alternative in Comment 15.

If anyone objects to *both* ideas, please speak up now. I'm assuming that most people in this community are okay with either one if they haven't said otherwise.
Comment 21 Glen Shires 2013-09-30 21:50:51 UTC
Note that there is already errata E08:
Section 5.2 IDL: "interface SpeechSynthesis" should be "interface SpeechSynthesis : EventTarget".


I propose the following errata.
If there's no disagreement, I'll add this to the errata page on Oct 15.


Section 5.2 IDL: A "?" is added to the following line to indicate that null may be returned: "SpeechSynthesisVoiceList? getVoices();"

Section 5.2 IDL: "attribute EventHandler onvoiceschanged;" is added to "interface SpeechSynthesis : EventTarget".

Section 5.2.2 getVoices method: append at the end of the definition: "Returns null if the list of available voices has not yet been initialized (for example: server-side synthesis where the list is determined asynchronously). If initialization of the list has completed and there are no voices available, it MUST return a SpeechSynthesisVoiceList of length zero."

New "Section 5.2.2.1 SpeechSynthesis Events" is created and contains:
voiceschanged: Fired when the set of available voices has changed, indicating that a subsequent call to the getVoices method MUST return a SpeechSynthesisVoiceList (not null) and the list MAY have changed. If the available voices are determined asynchronously (for example: server-side synthesis), this event will be fired when the voices list becomes available. It may be fired again if additional voices subsequently become available or unavailable.
Comment 22 Olli Pettay 2013-10-01 11:23:07 UTC
Returning empty list from getVoices() would break the API less than returning null. But otherwise sounds good.
Comment 23 Glen Shires 2013-10-01 17:37:56 UTC
Olli,
Instead of null, we could alternatively add an attribute. Below I'm proposing the Comment 15 idea, except I've reversed the polarity of the attribute such that a developer could write

   if (speechSynthesis.voicesLoading)

and get the expected results if it's undefined, such as the current iOS synchronous implementation.


Thus, I propose the following errata.
If there's no disagreement, I'll add this to the errata page on Oct 15.

Section 5.2 IDL: "readonly attribute boolean voicesLoading;" is added to "interface SpeechSynthesis : EventTarget".

Section 5.2 IDL: "attribute EventHandler onvoiceschanged;" is added to "interface SpeechSynthesis : EventTarget".

Section 5.2.1: SpeechSynthesis Attributes: "voicesLoading attribute" is added with the following definition:
"Returns true if the list of available voices is not yet obtainable (for example: server-side synthesis where the list is determined asynchronously). Returns false, and remains false, when the list of available voices is obtainable and will be returned upon a subsequent call to the getVoices method. Upon the transition from true to false, onvoiceschanged MUST fire, even if there are no available voices.  (If the voicesLoading attribute never returns true, for example: client-side synthesis where the list is immediately available, then there is no such transition and no requirement for onvoiceschanged to fire in this case.)"

Section 5.2.2 getVoices method: append at the end of the definition: "If there are no voices available, or if voicesLoading is true, this method MUST return a SpeechSynthesisVoiceList of length zero."

New "Section 5.2.2.1 SpeechSynthesis Events" is created and contains:
voiceschanged: Fired when the set of available voices has changed, indicating that the contents of the SpeechSynthesisVoiceList returned by the getVoices method MAY have changed. This event MUST be fired again if additional voices subsequently become available or unavailable.
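
The effect of reversing the polarity can be shown with a plain object standing in for `speechSynthesis`. This is an illustrative sketch only: `legacySynth`, `isInitialized`, and `isLoading` are hypothetical names, and `legacySynth` models an implementation that has neither attribute, such as the synchronous one described for iOS.

```javascript
// An implementation that predates both proposed attributes.
const legacySynth = { getVoices: () => [{ name: 'Example Voice' }] };

// Comment 15 polarity: a naive truthiness test reads `undefined` as
// "not initialized", so authors would need the `!== false` form.
function isInitialized(synth) {
  return synth.initialized !== false;
}

// Comment 23 polarity: `undefined` is falsy, so the naive test already
// gives the intended answer when the attribute is missing.
function isLoading(synth) {
  return Boolean(synth.voicesLoading);
}

console.log(Boolean(legacySynth.initialized)); // false: the naive check misleads
console.log(isInitialized(legacySynth));       // true
console.log(isLoading(legacySynth));           // false
```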
Comment 24 Olli Pettay 2013-10-01 17:42:34 UTC
What is wrong with empty voices list?
Comment 25 Glen Shires 2013-10-01 18:07:21 UTC
Olli,
As you point out, there is a third option if getVoices is called before the list of voices has been asynchronously retrieved. The API can indicate this state (as opposed to the state where the voices are known and none are available) by: 

Option 1: (Comment 21): Indicate this state by getVoices returning null.

Option 2: (Comment 23): Indicate this state by voicesLoading returning true. (In this case, getVoices returns the empty list).

Option 3: Not provide a way of indicating this state.

In all three cases, onvoiceschanged will fire when the list of voices has been retrieved.


So the question becomes, how does this affect use cases:

With option 1 and 2: the web page could indicate a "loading" state until the voices are known to be available...or not available.

With option 3: the web page would indicate that voices are not available, until they are known to be available. (Or it could emulate the "loading" state with its own timeout, but that's a hack).


Option 3 is certainly a simpler API.  Is it sufficient?
Comment 26 Dominic Mazzoni 2013-10-01 18:11:23 UTC
(In reply to Olli Pettay from comment #24)
> What is wrong with empty voices list?

What's wrong is that an app developer can't distinguish between "speech synthesis is uninitialized" and "it's initialized, no voices are available".

Pragmatically, a web developer who tests on their desktop might implement a UI that seems to work; they call getVoices once on startup and populate a list of voices. Then some of their users might get an experience where the voice list is sometimes empty, and it wouldn't be clear why.

If getVoices returns null, their code might break - but it'd be trivial to debug and fix their code to listen to onvoiceschanged.
Comment 27 Olli Pettay 2013-10-01 18:39:47 UTC
Well, given that the number of voices may change at any point,
the speech synthesis is never at stable state (see comment 14).
So I don't understand why we need to distinguish the initialized state from any other state.

Returning empty list would be backwards compatible, and it wouldn't be
hard to add a listener for voiceschanged.
getVoices would always return the currently available voices.

(Just trying to keep the API simple, by not adding special cases like
returning null in some cases and empty list in some other cases)
Comment 28 Dominic Mazzoni 2013-10-01 18:59:30 UTC
(In reply to Olli Pettay from comment #27)
> Returning empty list would be backwards compatible, and it wouldn't be
> hard to add a listener for voiceschanged.

I think returning null would be backwards compatible too. Only iOS has actually shipped the API, and they only return a valid list synchronously, never null. There's nothing not-backwards-compatible about changing the API to allow null to be returned.

We don't need to be backwards-compatible with existing apps when the spec isn't even final yet!

> getVoices would always return the currently available voices.
> 
> (Just trying to keep the API simple, by not adding special cases like
> returning null in some cases and empty list in some other cases)

I still prefer returning null. But I can live without it.

If not, I would prefer adding an initialized or uninitialized flag. That okay?
Comment 29 Dominic Mazzoni 2013-10-02 20:52:02 UTC
To reach consensus, I'll agree to just add "onvoiceschanged" for now and not make any changes to getVoices.

If we decide this is insufficient later, we can reconsider adding an "initialized" flag or something similar to speechSynthesis.
Comment 30 Glen Shires 2013-10-02 21:12:42 UTC
Based on Olli's and Dominic's most recent comments, I propose the following errata. If there's no disagreement, I'll add this to the errata page on Oct 15.

Section 5.2 IDL: "attribute EventHandler onvoiceschanged;" is added to "interface SpeechSynthesis : EventTarget".

Section 5.2.2 getVoices method: append at the end of the definition: "If there are no voices available, or if the list of available voices is not yet known (for example: server-side synthesis where the list is determined asynchronously), then this method MUST return a SpeechSynthesisVoiceList of length zero."

New "Section 5.2.2.1 SpeechSynthesis Events" is created and contains:
"voiceschanged: Fired when the contents of the SpeechSynthesisVoiceList, that the getVoices method will return, have changed.  Examples include: server-side synthesis where the list is determined asynchronously, or when client-side voices are installed/uninstalled."
Comment 31 Glen Shires 2013-10-18 00:01:24 UTC
I've updated the errata (E11) with the above change:
https://dvcs.w3.org/hg/speech-api/rev/0b8ae424b59d

As always, the current errata is at:
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi-errata.html
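
With the erratum text from comment 30, a page would presumably read the list right away and read it again whenever voiceschanged fires. A minimal sketch of that pattern follows. `trackVoices` and `onUpdate` are hypothetical names, and the code assumes `synth` offers `addEventListener`/`removeEventListener`, as it would once SpeechSynthesis inherits from EventTarget (erratum E08).

```javascript
// Keep a callback up to date with the current voice list.
// Returns a function that stops listening.
function trackVoices(synth, onUpdate) {
  const refresh = () => onUpdate(Array.from(synth.getVoices()));
  synth.addEventListener('voiceschanged', refresh);
  refresh(); // may report an empty list if the voices are not yet known
  return () => synth.removeEventListener('voiceschanged', refresh);
}

// In a browser this might be used as:
//   const stop = trackVoices(window.speechSynthesis, (voices) => {
//     console.log(voices.map((v) => v.name));
//   });
```

Because an empty list can mean either "not loaded yet" or "no voices", the page cannot treat the first result as final; it can only keep listening.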