Re: Interacting with WebRTC, the Web Audio API and other external sources

Since all major browser vendors are members of the WebRTC working group,
it may actually be worth slimming down the Speech API and re-using the
interfaces that group will provide.

As an addendum to the quoted proposal:

* Drop the "start", "stop" and "abort" methods from the SpeechRecognition
object in favor of an input MediaStream acquired through getUserMedia()[1].

Alternatively, the three methods could be re-purposed to allow partial or
timed recognition of a continuous media stream, rather than recognition of
the whole stream.
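
To make the re-purposed methods concrete, here is a rough TypeScript sketch.
Everything in it is hypothetical: the WindowedRecognizer class, the
TimedChunk stand-in for audio arriving on a MediaStream, and the explicit
timestamps passed to start() and stop() are assumptions for illustration and
are not part of the Speech API draft. A real recognizer would transcribe
audio; the "text" field merely fakes that result.

```typescript
// Hypothetical stand-in for audio arriving on a MediaStream. Each chunk
// carries its capture time; "text" fakes what a recognizer would produce.
interface TimedChunk {
  timeMs: number;
  text: string;
}

// Illustration only: not the SpeechRecognition interface from the draft.
class WindowedRecognizer {
  // Assumed shape of the proposed input; the real one would be a MediaStream.
  inputStream: TimedChunk[] = [];
  private windowStartMs: number | null = null;

  // Re-purposed start(): open a recognition window on the running stream.
  start(nowMs: number): void {
    this.windowStartMs = nowMs;
  }

  // Re-purposed stop(): close the window and return what fell inside it.
  stop(nowMs: number): string {
    if (this.windowStartMs === null) {
      return ""; // no open window, nothing to recognize
    }
    const from = this.windowStartMs;
    this.windowStartMs = null;
    return this.inputStream
      .filter((chunk) => chunk.timeMs >= from && chunk.timeMs < nowMs)
      .map((chunk) => chunk.text)
      .join(" ");
  }

  // Re-purposed abort(): discard the open window without a result.
  abort(): void {
    this.windowStartMs = null;
  }
}
```

With chunks at 0, 1000, 2000 and 3000 ms, calling start(1000) followed by
stop(3000) would cover only the chunks at 1000 and 2000 ms.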

Best,
Peter

[1]
http://dev.w3.org/2011/webrtc/editor/getusermedia.html#navigatorusermedia


On Wed, Jun 13, 2012 at 3:49 PM, Peter Beverloo <beverloo@google.com> wrote:

> Currently, the SpeechRecognition[1] interface defines three methods to
> start, stop or abort speech recognition, the source of which will be an
> audio input device as controlled by the user agent. Similarly, the
> TextToSpeech (TTS) interface defines play, pause and stop, which will
> output the generated speech to an output device, again, as controlled by
> the user agent.
>
> There are various other media and interaction APIs in development right
> now, and I believe it would be good for the Speech API to more tightly
> integrate with them. In this e-mail, I'd like to focus on some additional
> features for integration with WebRTC and the Web Audio API.
>
> ** WebRTC <http://dev.w3.org/2011/webrtc/editor/webrtc.html>
>
> WebRTC provides the ability to interact with the user's microphone and
> camera through the getUserMedia() method. As such, an important use-case is
> audio (and video) chat between two or more people. Audio is available
> through a MediaStream object, which can power an <audio> element, be
> transmitted to other people over a peer-to-peer connection, or be fed into
> the Web Audio API through an AudioContext's createMediaStreamSource()
> method.
>
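
The MediaStream-to-Web-Audio hand-off described above could look roughly
like the following TypeScript sketch. createMediaStreamSource(),
createAnalyser() and connect() are real Web Audio methods; the
AudioNodeLike and AudioContextLike types and the connectStreamToAnalyser
helper are my own minimal stand-ins so the sketch type-checks outside a
browser.

```typescript
// Minimal structural stand-ins (assumptions) for the Web Audio types used.
interface AudioNodeLike {
  connect(destination: AudioNodeLike): unknown;
}

interface AudioContextLike<S> {
  createMediaStreamSource(stream: S): AudioNodeLike;
  createAnalyser(): AudioNodeLike;
}

// Hypothetical helper: wrap a stream in a source node and route it into an
// analyser node, which is returned so the caller can inspect the audio.
function connectStreamToAnalyser<S>(
  ctx: AudioContextLike<S>,
  stream: S
): AudioNodeLike {
  const source = ctx.createMediaStreamSource(stream);
  const analyser = ctx.createAnalyser();
  source.connect(analyser);
  return analyser;
}
```

In a current browser, the real objects should be passable as-is: a new
AudioContext() and the stream that
navigator.mediaDevices.getUserMedia({ audio: true }) resolves with.
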
> ** Web Audio API <
> https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html>
>
> The Web Audio API provides the ability to process, analyze, synthesize and
> modify audio through JavaScript. It can get its input from media files
> through XMLHttpRequest, from media elements such as <audio> and <video>, and
> from any other system, including WebRTC, that is able to provide an
> audio-based MediaStream.
>
> Since speech recognition and synthesis do not have to be limited to live
> input from and output to the user, I'd like to present two new use-cases.
>
> 1) Transcripts for (live) communication.
>
> While the specification does not mandate a maximum duration of a speech
> input stream, this suggestion is most appropriate for implementations
> utilizing a local recognizer. Allowing MediaStreams to be used as an input
> for a SpeechRecognition object, for example through a new "inputStream"
> property as an alternative to the start, stop and abort methods, would
> enable authors to supply external input to be recognized. This may include,
> but is not limited to, prerecorded audio files and WebRTC live streams,
> both from local and remote parties.
>
> 2) Storing and processing text-to-speech fragments.
>
> Rather than mandating immediate output of the synthesized audio stream, we
> should consider introducing an "outputStream" property on a
> TextToSpeech object which provides a MediaStream object. This allows the
> synthesized stream to be played through the <audio> element, processed
> through the Web Audio API or even to be stored locally for caching, in case
> the user is using a device which is not always connected to the internet
> (and when no local synthesizer is available). Furthermore, this would allow
> websites to store the synthesized audio as a WAV file on the server, so
> that it can be re-used by user agents or other clients which do not
> provide an implementation.
>
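
As a sketch of the "store to a wave file" step: assuming the samples from
the proposed outputStream have already been collected as mono floating-point
values in [-1, 1] (how that would happen is not defined anywhere yet), they
could be wrapped in a 16-bit PCM WAV container roughly like this. The
encodeWav name and the mono, 16-bit choices are mine, for illustration only.

```typescript
// Hypothetical helper: wrap mono float samples in a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2; // 16-bit PCM
  const headerSize = 44;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(headerSize + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  // RIFF header; all multi-byte fields are little-endian.
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");

  // "fmt " chunk describing uncompressed mono PCM.
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk body
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 8 * bytesPerSample, true); // bits per sample

  // "data" chunk: clamp each sample and scale it to a signed 16-bit value.
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    const offset = headerSize + i * bytesPerSample;
    view.setInt16(offset, Math.round(clamped * 32767), true);
  });

  return new Uint8Array(buffer);
}
```

The resulting bytes could then be uploaded to the server or cached locally.
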
> The Web platform gains its power from the ability to combine technologies,
> and I think it would be great to see the Speech API playing a role in that.
>
> Best,
> Peter
>
> [1]
> http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#speechreco-section
>

Received on Thursday, 19 July 2012 14:38:49 UTC