Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
A number of existing or proposed features for the Web platform deal with continuous real-time media, such as HTML media elements, capture of audio and video from local devices, and peer-to-peer media streaming.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This specification documents a proposal for an audio processing and synthesis API for use in client-side user agents (e.g. a browser). This document was produced by members of the Audio Working Group. It was initially published alongside an alternative proposal, the Web Audio API. On May 9th, 2012, the Audio Working Group resolved to re-publish this proposal as a Working Group Note, with a view to use this document as input for the Web Audio API draft specification.
If you wish to make comments regarding this document, please send them to public-audio@w3.org including the prefix '[MSP NOTE comment]' in the subject line. The archives of the public-audio mailing list are publicly available.
Web content and browser developers especially are encouraged to review this draft, and to experiment with the API and provide feedback.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The ideas here build on Ian Hickson's proposal for HTML Streams, adding features partly inspired by the Mozilla audio API and the Chrome audio API. Unlike previous audio API proposals, the API presented here integrates with proposed API for media capture from local devices, integrates with proposed API for peer-to-peer media streaming, handles audio and video in a unified framework, incorporates Worker-based Javascript audio processing, and specifies synchronization across multiple media sources and effects. The API presented here does not include a library of "native" effects; those should be added as new "named effects" (described below), perhaps as a "level 2" spec.
The work here is nascent. Until a prototype implementation exists, this proposal is likely to be incomplete and possibly not even implementable.
These are concrete usage scenarios that have helped guide the design of the API. They are higher-level than use-cases.
The description of MediaStreams here extends and must remain compatible with HTML MediaStreams. Each MediaStream DOM object has an underlying media stream. The underlying media streams form a graph; some streams (represented by ProcessedMediaStream DOM objects) can take other streams as inputs and compute an output stream.

To avoid interruptions due to script execution, media stream processing can overlap with script execution; media streams continue to play and change state during script execution. However, to simplify the DOM programming model, we limit the interaction of MediaStream DOM objects with their underlying media streams. Specifically, changes made by script to these DOM objects take effect on the underlying media streams only after the next stable state (see section 2.1). For example, consider the following code:

stream.inputs[0].volume = 0;
if (needToPlay()) {
stream.inputs[0].volume = 1.0;
}

Because both changes take effect on the underlying stream only after the next stable state, the stream never actually plays at volume 0 when needToPlay() returns true.
Specify exactly which attributes (and methods) are subject to this regime, possibly extending to attributes and methods already defined in HTML for media elements etc.
In this spec, references to MediaStreams and MediaInputs refer to the DOM-visible state, and references to media streams and input ports refer to the underlying real-time media stream graph.
A stream is an abstraction of a time-varying video and/or audio signal. At a given point in time, a media stream can be blocked, that is, not playing for some reason. All non-blocked streams play at the same constant rate --- real time. Streams cannot be seeked or played at a rate other than real time. For convenience we define a stream's "current time" as the duration it has played since it was created, but (unless blocking is involved) streams do not have independent timelines; they are synchronized.
At the implementation level, and when processing media data with Workers, we assume that each stream has a buffered window of media data available, containing the sample it is currently playing (or that it will play the next time it unblocks). This buffer defines the stream's contents into the future.
A stream can be in the finished state. A finished stream is always blocked and can never leave the finished state --- it will never produce any more content.

A stream that has no consumers must block. Stream consumers defined in this specification are media elements and ProcessedMediaStreams (see below). This avoids situations where streams that aren't connected to any consumer keep playing and doing processing (e.g. a Worker to which a ProcessMediaEvent is being sent). A muted audio element can be used as a dummy sink if necessary.
We do not allow streams to have independent timelines (e.g. no adjustable playback
rate or seeking within an arbitrary stream), because that can lead to a single stream being consumed at multiple different "current times" simultaneously, which requires either unbounded buffering or multiple internal decoders and buffers for a single stream. It seems simpler and more predictable for performance to require authors to create multiple streams (if necessary) and change the playback rate in the original stream sources to handle such situations. For example, consider this hard case: a stream from a media element loading http://fast is used both on its own and mixed with a stream from an element loading a slower resource; if the slower stream stalls, the mixed output must block while the direct output keeps playing, so the single http://fast stream would have to be consumed at two different current times at once.
Authors can avoid this by explicitly splitting streams that may need to progress at
different rates --- in the above case, by using two separate media elements each loading http://fast. The HTML spec encourages implementations to share cached media data between media elements loading the same URI.

A media stream contains video and audio tracks. Tracks can start and end at any time. Each track contains a stream of audio or video data.

This spec mostly treats the formats used for stream audio and video data as an implementation detail. In particular, whether stream buffers are compressed or uncompressed, what compression formats might be used, or what uncompressed formats might be used (e.g. audio sample rates, channels, and sample representation) are not specified, and are not directly observable. An implementation might even support changing formats over time within a single stream. Media data is implicitly resampled as necessary, e.g. when mixing streams with different formats. Non-normative suggestions for resampling algorithms will be provided in section 7.

Built-in audio processing filters guarantee that if all the audio inputs constantly have the same uncompressed format (same audio sample rate and channel configuration), the audio output will have the same format and there will be no unnecessary resampling.

When samples are exposed to a Worker for processing, the user-agent chooses a fixed uncompressed audio format (sample rate and channel configuration) for its inputs and outputs; see section 4.4. However, suggested resampling algorithms will be provided in an appendix.
partial interface MediaStream {
readonly attribute double currentTime;
ProcessedMediaStream createProcessor(in optional DOMString namedEffect);
ProcessedMediaStream createWorkerProcessor(in Worker worker);
};
The currentTime
attribute returns the amount of time that
this MediaStream
has played since it was created.
The createProcessor(namedEffect)
method returns a new ProcessedMediaStream
with this MediaStream
as its sole input.
The new ProcessedMediaStream
is configured with a built-in
processing engine named by namedEffect
, or the default
processing engine if namedEffect
is omitted. If namedEffect
is not supported by this user-agent, createProcessor
returns
null. User-agents adding nonstandard named effects should use vendor
prefixing, e.g. "MozUnderwaterBubbles". The stream's autofinish
attribute is set to true.
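For example, a page could ask for a vendor-prefixed named effect and fall back to the default processing engine when it is unavailable (a minimal sketch; the element IDs are assumed):

var base = document.getElementById("v").captureStream();
var processed = base.createProcessor("MozUnderwaterBubbles");
if (!processed) {
  // The named effect is not supported; fall back to the default processing engine.
  processed = base.createProcessor();
}
document.getElementById("out").src = processed;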
The createWorkerProcessor(worker)
method returns a new ProcessedMediaStream
with this MediaStream
as its sole input, and with worker as its processing engine.
The stream's autofinish
flag is set to true.
Add event handlers or callback functions for all finished and blocking state changes?
We extend HTML media elements to produce and consume streams. When an HTML media element
produces a stream, it acts as a resource loader and control mechanism; the stream consists of whatever the media element is currently playing. When a media element consumes a stream, it acts as a playback mechanism for the stream.

partial interface HTMLMediaElement {
readonly attribute MediaStream stream;
MediaStream captureStream();
MediaStream captureStreamUntilEnded();
readonly attribute boolean audioCaptured;
attribute any src;
};
The stream
attribute returns a stream which always plays
whatever the element is playing. The stream is blocked while the media
element is not playing. It is never finished, even when playback ends. The stream attribute for a given element always
returns the same stream. When the stream changes to blocked, we fire the waiting
event for the media element, and when it changes to unblocked we fire the playing
event for the media element.
Currently the HTML media element spec says that playing
would fire on an element that is able to play except that a downstream MediaController
is blocked. This is incompatible with the above. I think that part of the
HTML media spec should be changed so that only elements that are actually
going to play fire playing
.
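For example, a page can observe these transitions on the element that produces the stream (a minimal sketch; the element ID is assumed):

var v = document.getElementById("v");
v.addEventListener("waiting", function () {
  // v.stream just became blocked.
  console.log("stream blocked at " + v.stream.currentTime);
});
v.addEventListener("playing", function () {
  // v.stream is unblocked and playing again.
  console.log("stream unblocked");
});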
The captureStream()
method returns a new MediaStream
that plays the same audio and video as stream
.
captureStream()
sets the audioCaptured
attribute
to true.
The captureStreamUntilEnded()
method returns a new MediaStream
that plays the same audio and video as stream
, until the element next reaches the ended state, at which point the captured stream enters the finished state. Like captureStream(), captureStreamUntilEnded() sets the audioCaptured attribute to true.
While the media element is playing a resource whose origin is not the same as the media element's document, MediaStreams for the element do not play the content; they block.
This prevents leaking of media contents to scripts in the page. In the future we could relax this and allow the streams to play as long as there's no scripted processing of the data downstream, but that's trickier.
While the audioCaptured attribute is true, the element does not produce direct audio output; audio continues to be delivered to the element's stream and any captured streams.
This attribute is NOT reflected into the DOM. It is initially false. It's readonly to script and nothing ever sets it to false.
The src
attribute is extended to allow it to be set to a MediaStream
.
The element plays the contents of the stream; seeking and changing the playbackRate are not supported.
The URL.createObjectURL(stream) method defined for HTML MediaStreams can create a URL that can be assigned to src to set the element's source to the given stream.
We add an Audio constructor taking a MediaStream as a parameter; it sets the element's src to the stream.
[NamedConstructor=Audio(MediaStream src)]
partial interface HTMLAudioElement {
};
To enable precise control over the timing of attribute changes, many attributes can be set using a
"timed setter" method taking astartTime
parameter. The
user-agent will attempt to make the change take effect when the subject
stream's "current time" is exactly the given startTime
---
certainly no earlier, but possibly later if the change request is processed
after the stream's current time has reached startTime
. startTime
is optional; if omitted, the change takes effect as soon as possible.
Using a timed setter method never changes the observed attribute value immediately. Setter method changes always take effect after the next stable state, as described in section 2.1. Setting the attribute value changes the observed attribute value immediately, but the change to the underlying media stream will still not take effect until after
the next stable state.Multiple pending changes to an attribute are allowed. Calling the setter method with
startTime
T sets the value of the attribute for all times T'
>= T to the desired value (wiping out the effects of previous calls to
the setter method with a time greater than or equal to startTime
).
Therefore by calling the setter method multiple times with increasing startTime
,
a series of change requests can be built up. Setting the attribute directly
sets the value of the attribute for all future times, wiping out any pending
setter method requests.
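For example, a series of volume changes can be queued against a MediaInput by calling its setter with increasing startTime values (a minimal sketch, assuming mixer is a ProcessedMediaStream with at least one input; for MediaInput setters the subject stream is the output stream, see section 4.2):

var input = mixer.inputs[0];
var t = mixer.currentTime;
// Build up a schedule of changes with increasing startTime values.
input.setVolume(0.5, t + 1.0);
input.setVolume(0.25, t + 2.0);
input.setVolume(1.0, t + 3.0);
// Setting the attribute directly would wipe out all pending requests:
// input.volume = 0.8;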
A ProcessedMediaStream combines zero or more input streams and applies some processing to produce a single output stream.
[Constructor(),
Constructor(in Worker worker, in optional long audioSampleRate, in optional short audioChannels)]
interface ProcessedMediaStream : MediaStream {
readonly attribute MediaInput[] inputs;
MediaInput addInput(in MediaStream inputStream, in optional double outputStartTime, in optional double inputStartTime);
attribute any params;
void setParams(in any params, in optional double startTime);
readonly attribute boolean autofinish;
};
The constructors create a new ProcessedMediaStream
with no
inputs.
The second constructor creates a ProcessedMediaStream with
a Worker processing engine, setting the audio sample rate to audioSampleRate
and setting the number of audio channels to audioChannels
(defaulting to 2). These parameters control the audio sample format used by
the Worker (see below). Both constructors initialize autofinish
to false.
Specify valid values for audioChannels
and audioSampleRate
.
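For example, a synthesizer stream with an explicit Worker audio format could be constructed as follows (a minimal sketch; "synth.js" is a hypothetical Worker script and 48000/1 are illustrative values):

// The Worker sees mono audio at 48000 samples per second.
var synth = new ProcessedMediaStream(new Worker("synth.js"), 48000, 1);
// autofinish is false, so the stream keeps producing output even with no inputs.
(new Audio(synth)).play();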
The inputs
attribute returns an array of MediaInput
s,
one for each input port of this ProcessedMediaStream.
(A stream can be used as multiple inputs to the same ProcessedMediaStream
.)
It is initially empty if constructed via the ProcessedMediaStream()
constructor, or contains a single element if constructed via MediaStream.createProcessor
.
A MediaInput
represents an input port. An input port is active
while it is enabled (see below) and the input stream is not finished.
The addInput(inputStream, outputStartTime, inputStartTime)
method adds a new MediaInput
to the end of the
inputs
array, whose input stream is inputStream
.
The outputStartTime
and inputStartTime
parameters control when the input port is enabled and help synchronize
inputs with outputs. The input port is enabled when the input stream's
current time is inputStartTime
and the output stream's current
time is outputStartTime
.
More precisely, when the addInput call takes effect (see section 2.1):
- If outputStartTime was omitted, set it to the output stream's current time.
- If inputStartTime was omitted, set it to the input stream's current time.
- If the input stream's current time is already past inputStartTime, or the output stream's current time is already past outputStartTime, increase inputStartTime and outputStartTime by equal amounts so that neither is in the past. (This would be a good place for user-agents to emit a developer-accessible warning.)
- While the input stream's current time is less than inputStartTime, or the output stream's current time is less than outputStartTime:
  - If the input stream's current time has reached inputStartTime, block the input stream.
  - If the output stream's current time has reached outputStartTime, block the output stream.

Consider this example:

var p = inputStream.createProcessor();
p.addInput(inputStream, 5);
In this example, inputStream
is used as an input to p
twice. inputStream
must block until p
has
played 5s of output, but also p
cannot play anything until inputStream
unblocks, so this deadlocks. It seems hard to design an API that makes deadlock hard to trigger; even creating a cycle will cause deadlock.

The params attribute and the setParams(params, startTime) timed setter method set the parameters for this stream. On setting, a structured clone of this object is made. The clone is sent to the worker (if there is one) and is exposed as the params attribute of the ProcessMediaEvent (see below).
When an input stream finishes, at the next stable state any MediaInputs for that input stream are removed from the inputs array.
When the autofinish
attribute is true, then when all stream
inputs are finished (including if there are no inputs), the stream will
automatically enter the finished state and never produce any more output
(even if new inputs are attached).
A MediaInput
object controls how an input stream
contributes to the combined stream.
interface MediaInput {
readonly attribute MediaStream stream;
MediaInput addFollowing(MediaStream inputStream);
attribute double volume;
void setVolume(in double volume, in optional double startTime, in optional double fadeTime);
attribute any params;
void setParams(in any params, in optional double startTime);
attribute boolean blockInput;
attribute boolean blockOutput;
void remove(in optional double time);
};
The stream
attribute returns the MediaStream
connected to this input.
The addFollowing(inputStream)
method adds a new MediaInput
to the
inputs
array for the ProcessedMediaStream
this MediaInput
belongs to. The new MediaInput
's
input stream is inputStream
. The new input port is enabled
when this input port's audio tracks all end, or this input port is
removed. Implementations should extrapolate and blend the ends of audio
tracks if necessary to ensure a seamless transition between streams.
Sequencing media resources using media element durations is
possible, e.g. processor.addInput(video1.captureStreamUntilEnded());
processor.addInput(video2.captureStreamUntilEnded(), video1.duration);
.
However, media element durations may not be perfectly reliable due to
limitations of resource formats or commonly-used encoders. For example,
WebM resources often have a slight mismatch between the number of audio
samples and the duration metadata. For other resources, such as live
streaming sources, the duration may not be knowable in advance. Thus,
using the duration is often unreliable and additional API is warranted.
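For comparison, the same two resources can be sequenced with addFollowing, which does not depend on duration metadata (a minimal sketch; element IDs are assumed):

var in1 = document.getElementById("in1");
var in2 = document.getElementById("in2");
var mixer = in1.captureStreamUntilEnded().createProcessor();
// Enable the second input when the first input's audio tracks end.
mixer.inputs[0].addFollowing(in2.captureStreamUntilEnded());
(new Audio(mixer)).play();
in1.play();
in2.play(); // blocks until its input port is enabled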
Add additional API to select particular tracks.
The volume attribute and the setVolume timed setter method set the volume at which this input's audio contributes to the output stream. The setVolume method takes an
additional fadeTime
parameter; when greater than zero, the
volume is changed gradually from the value just before startTime
to the new value over the given fade time. The transition function is chosen
so that if one stream changes from V1 to V2 and another stream changes from
V2 to V1 over the same interval, the sum of the volumes at each point in
time is V1 + V2. This attribute is initially 1.0.
Specify the exact transition function. Tim says "w=cos((pi/2)*t)^2 for t=0...1".
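For example, a one-second crossfade between two inputs of a mixer can be requested with overlapping setVolume calls (a minimal sketch, assuming inputs[1] currently has volume 0):

var t = mixer.currentTime;
// Fade input 0 out and input 1 in over the same one-second interval;
// the transition function keeps the summed volume constant throughout.
mixer.inputs[0].setVolume(0.0, t, 1.0);
mixer.inputs[1].setVolume(1.0, t, 1.0);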
The params
attribute and the setParams(params,
startTime)
timed setter method set the parameters for this input. On setting, a structured clone of this object is made. The clone is sent to the worker (if there is one) and is exposed as the params attribute of the corresponding MediaInputBuffer (see below).
For the timed setter methods of MediaInput
, the subject
stream is the output stream, so changes take effect when the output
stream's current time is equal to startTime
.
The blockInput
and blockOutput
attributes
control how blocking propagates between the input stream and the output stream. When blockOutput
is true and the
input port is active, if the input stream is blocked then the output stream
must be blocked. While an active input is blocked and the output is not
blocked, the input is treated as having no tracks. When blockInput
is true and the input port is active, if the output is blocked, then the
input stream must be blocked. When false, while the output is blocked and an
active input is not, the input will simply be discarded. These attributes
are initially true.
Need to look again at these. It's not clear we have use cases for both attributes, and I haven't implemented them yet and they could be hard to implement.
The remove(time)
method removes this MediaInput
from the inputs array of its owning
ProcessedMediaStream
at the given time relative to the output
stream (or later, if it cannot be removed in time). If time
is
omitted, the input is removed as soon as possible and the MediaInput
is removed from the destination stream's inputs array
immediately. After removal, the MediaInput
object is no longer
used; its attributes retain their current values and do not change unless
explicitly set. All method calls are ignored. Additional calls to remove
with an earlier time can advance the removal time, but once removal is
scheduled it cannot be stopped or delayed.
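For example, an input can be scheduled for removal at a future point on the output stream's timeline (a minimal sketch):

var port = mixer.inputs[0];
// Remove this input five seconds from now, measured on the output stream's timeline.
port.remove(mixer.currentTime + 5);
// A later call with an earlier time can only advance the removal; it cannot cancel it.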
A ProcessedMediaStream
with a worker computes its output by
dispatching a sequence of onprocessmedia
callbacks to the
worker, passing each a ProcessMediaEvent
parameter. A ProcessMediaEvent
provides audio sample buffers for each input stream. Each sample buffer
for a given ProcessMediaEvent
has the same duration, so the
inputs presented to the worker are always in sync. (Inputs may be added or
removed between ProcessMediaEvent
s, however.) The sequence
of buffers provided for an input stream is the audio data to be played by
that input stream. The user-agent will precompute data for the input
streams as necessary.
For example, if a Worker computes the output sample for time T as a function of the [T - 1s, T + 1s] interval of an input stream, then initially the Worker would simply refuse to output anything until it has received at least 1s of input stream data, forcing the user-agent to precompute the input stream at least 1s ahead of the current time. (Note that large Worker latencies will increase the latency of changes to the media graph.)
Note that Worker
s do not have access to most
DOM API objects. In particular, Worker
s have no direct
access to MediaStream
s.
Note that a ProcessedMediaStream
's worker
cannot be a SharedWorker
. This ensures that the worker can
run in the same process as the page in multiprocess browsers, so media
streams can be confined to a single process.
Currently ProcessMediaEvent
does not offer
access to video data. This should be added later.
partial interface DedicatedWorkerGlobalScope {
attribute Function onprocessmedia;
};
The onprocessmedia
attribute is the function to be called
whenever stream data needs to be processed.
A ProcessMediaEvent
is passed as the single parameter to each
call to the onprocessmedia
callback. For a given ProcessedMediaStream
,
the same ProcessMediaEvent
is passed in every call to the onprocessmedia
callback. This allows the callback function to maintain per-stream state.
interface ProcessMediaEvent : Event {
readonly attribute double inputTime;
readonly attribute any params;
readonly attribute double paramsStartTime;
readonly attribute MediaInputBuffer[] inputs;
readonly attribute long audioSampleRate;
readonly attribute short audioChannels;
readonly attribute long audioLength;
void writeAudio(in Float32Array data);
void finish();
};
The inputTime
attribute returns the duration of the input
that has been consumed by the
ProcessedMediaStream
for this worker.
The params
attribute provides a structured clone of the
parameters object set by
ProcessedMediaStream.setParams
. The same object is returned in
each event, except when the object has been changed by setParams
between events.
The paramsStartTime
attribute returns the first time
(measured in duration of input consumed for this stream) that this params
object was set.
Note that the parameters objects are constant over the duration of the inputs presented in the event. Frequent changes to parameters will reduce the length of the input buffers that can be presented to the worker.
The inputs attribute provides access to MediaInputBuffers for each active input stream (in the same order as the ProcessedMediaStream.inputs array).
audioSampleRate
and audioChannels
represent
the format of the input and output audio samples.
audioSampleRate
is the number of
samples per second. audioChannels
is the number of channels;
the channel mapping is as defined in the Vorbis specification. These values
are constant for a given ProcessedMediaStream
. When the ProcessedMediaStream
was constructed using the Worker constructor, these values are the values
passed as parameters there. When the ProcessedMediaStream
was
constructed via MediaStream.createProcessor
, the values are
chosen to match the first active input stream (or 44.1KHz, 2 channels if
there is no active input stream).
audioLength
is the duration of the input(s) multiplied by
the sample rate. If there are no inputs, audioLength is zero.
The writeAudio(data) method writes audio data to the stream output. The output audio track's kind is "main" and the other metadata attributes are the empty string. The data for the output audio track is the concatenation of the data passed to each writeAudio call before the event handler returns. The data buffer is laid out with the
channels non-interleaved, as for the input buffers (see below). The length
of data
must be a multiple of audioChannels
; if
not, then only the sample values up to the largest multiple of audioChannels
less than the data length are used.
It is permitted to write less audio than the duration of the inputs (including none). This indicates latency in the filter. Normally the user-agent will dispatch another event to provide
more input until the worker starts producing output. It is also permitted to write more audio than the duration of the inputs, for example if there are no inputs. Filters with latency should respond to an event with no inputs by writing out some of their buffered data; the user-agent is draining them. A synthesizer with no inputs can output as much data as it wants; the UA will buffer data and fire events as necessary. Filters that misbehave, e.g. by always writing zero-length buffers, will cause the stream to block due to an underrun.
If writeAudio
is not called during the event handler, then
the input audio buffers are added together and written to the output.
If writeAudio
is called outside the event handler, the call
is ignored.
Calling finish()
puts the stream into the finished state
(once any previously buffered output has been consumed). The event
callback will never be called again. finish()
can be called
at any time, inside or outside the event handler.
The output video track is computed as if there was no worker (see above).
This will change when we add video processing.
interface MediaInputBuffer {
readonly attribute any params;
readonly attribute double paramsStartTime;
readonly attribute Float32Array audioSamples;
};
The params
attribute provides a structured clone of the
parameters object set by
MediaInput.setParams
. The same object is returned in each
event, except when the object has been changed by setParams
between events.
The paramsStartTime
attribute returns the first time
(measured in duration of input consumed for this stream) that this params
object was set.
audioSamples
gives access to the audio samples for each
input stream. The array length will be event.audioLength
multiplied by event.audioChannels
. The samples are floats
ranging from -1 to 1, laid out non-interleaved, i.e. consecutive segments
of audioLength
samples each. The durations of the input
buffers for the input streams will be equal. The audioSamples
object will be a fresh object in each event. For inputs with no audio
track, audioSamples
will be all zeroes.
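For example, inside an onprocessmedia handler a Worker can index a particular channel and sample of an input like this (a minimal sketch):

var samples = event.inputs[0].audioSamples;
// Non-interleaved layout: channel c occupies the range
// [c * event.audioLength, (c + 1) * event.audioLength).
function sampleAt(c, i) {
  return samples[c * event.audioLength + i];
}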
A ProcessedMediaStream
with the default processing engine
produces output as follows:
- The output audio track's metadata (id, kind, label, and language) is equal to that of the audio track for the last active input that has an audio track. The output audio track is produced by adding the samples of the audio tracks of the active inputs together.
- The output video track's metadata (id, kind, label, and language) is equal to that of the video track for the last active input that has a video track. The output video track is produced by compositing together all the video frames from the video tracks of the active inputs, with the video frames from higher-numbered inputs on top of the video frames from lower-numbered inputs; each video frame is letterboxed to the size of the video frame for the last active input that has a video track.
This means if the last input's video track is opaque, the video output is simply the video track of the last input.
A ProcessedMediaStream
with the "LastInput" processing
engine simply produces the last active input stream as output. If there
are no active input streams, it produces the same output as the default
processing engine.
While a ProcessedMediaStream
has itself as a direct or
indirect input stream (considering only active inputs), it is blocked.
At any given moment, a stream must not be blocked except as explicitly required by this specification.
To enable video synthesis and some easy kinds of video effects we can record the contents of a canvas:
partial interface HTMLCanvasElement {
readonly attribute MediaStream stream;
};
The stream
attribute is a stream containing a video track
with the "live" contents of the canvas as video frames whose size is the
size of the canvas, and no audio track. It always returns the same stream
for a given element.
Here will be some non-normative implementation suggestions.
Add Worker scripts for these examples.
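Until those scripts are written, the following is a minimal sketch of what a Worker effect script such as "effect.js" might look like: it mixes all inputs and applies a gain taken from an assumed params.gain field.

// effect.js (sketch): mix all inputs and apply a gain from params.
onprocessmedia = function (event) {
  var gain = (event.params && event.params.gain !== undefined) ? event.params.gain : 1.0;
  var length = event.audioLength * event.audioChannels;
  var output = new Float32Array(length);
  for (var i = 0; i < event.inputs.length; ++i) {
    var input = event.inputs[i].audioSamples;
    for (var j = 0; j < length; ++j) {
      output[j] += input[j] * gain;
    }
  }
  event.writeAudio(output);
};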
<video src="foo.webm" id="v" controls></video>
<audio id="out" autoplay></audio>
<script>
document.getElementById("out").src =
document.getElementById("v").captureStream().createWorkerProcessor(new Worker("effect.js"));
</script>
<video src="foo.webm" id="v"></video>
<audio src="back.webm" id="back"></audio>
<audio id="out" autoplay></audio>
<script>
var mixer = document.getElementById("v").captureStream().createWorkerProcessor(new Worker("audio-ducking.js"));
mixer.addInput(document.getElementById("back").captureStream());
document.getElementById("out").src = mixer;
function startPlaying() {
document.getElementById("v").play();
document.getElementById("back").play();
}
// MediaController is a more convenient API because it ties together control of the elements,
// but using streams is more flexible (e.g. they can be seeked to different offsets).
</script>
<script>
navigator.getUserMedia('audio', gotAudio);
function gotAudio(stream) {
peerConnection.addStream(stream.createWorkerProcessor(new Worker("effect.js")));
}
</script>
<canvas id="c"></canvas>
<script>
navigator.getUserMedia('audio', gotAudio);
var streamRecorder;
function gotAudio(stream) {
var worker = new Worker("visualizer.js");
var processed = stream.createWorkerProcessor(worker);
worker.onmessage = function(event) {
drawSpectrumToCanvas(event.data, document.getElementById("c"));
}
streamRecorder = processed.record();
peerConnection.addStream(processed);
}
</script>
<canvas id="c"></canvas>
<audio src="back.webm" id="back"></audio>
<script>
navigator.getUserMedia('audio', gotAudio);
var streamRecorder;
function gotAudio(stream) {
var worker = new Worker("visualizer.js");
var processed = stream.createWorkerProcessor(worker);
worker.onmessage = function(event) {
drawSpectrumToCanvas(event.data, document.getElementById("c"));
}
var mixer = processed.createProcessor();
mixer.addInput(document.getElementById("back").captureStream());
streamRecorder = mixer.record();
peerConnection.addStream(mixer);
}
</script>
<script>
var worker = new Worker("spatializer.js");
var spatialized = new ProcessedMediaStream(worker);
peerConnection.onaddstream = function (event) {
spatialized.addInput(event.stream).params = {x:..., y:..., z:...};
};
(new Audio(spatialized)).play();
</script>
This method requires that you know each stream's duration, which is a bit suboptimal. To get around that we'd need new API, perhaps a new kind of ProcessedMediaStream that plays streams in serial.
<audio src="in1.webm" id="in1" preload></audio>
<audio src="in2.webm" id="in2"></audio>
<script>
var in1 = document.getElementById("in1");
in1.onloadedmetadata = function() {
var mixer = in1.captureStreamUntilEnded().createProcessor();
var in2 = document.getElementById("in2");
mixer.addInput(in2.captureStreamUntilEnded(), in1.duration);
(new Audio(mixer)).play();
in1.play();
}
</script>
There are two ways to implement seamless switching: seek the second resource to before the current time and then run the decoder faster than real-time to catch up to the first resource's play point, or seek the second resource to after the current time and enable it when the first resource catches up to the seek point. The first is more robust if the seek takes unexpectedly long, but the second is less demanding on the decoder. Only the second method is currently implementable with this API (since by design there is no way to drive MediaStreams faster than real-time). If we want to support the first method as well, the right way would be to add API to media elements to let them seek to synchronize with a given MediaStream.
<audio src="in1.webm" id="in1" preload></audio>
<audio src="in2.webm" id="in2"></audio>
<audio id="out" autoplay></audio>
<script>
var stream1 = document.getElementById("in1").captureStream();
var mixer = stream1.createProcessor("LastInput");
document.getElementById("out").src = mixer;
function switchStreams() {
var in2 = document.getElementById("in2");
in2.currentTime = in1.currentTime + 10; // arbitrary, but we should be able to complete the seek within this time
mixer.addInput(in2.captureStream(), mixer.currentTime + 10);
in2.play();
// in2 will be blocked until the input port is enabled
in2.onplaying = function() { mixer.inputs[0].remove(); };
}
</script>
<audio id="out" autoplay></audio>
<script>
document.getElementById("out").src =
new ProcessedMediaStream(new Worker("synthesizer.js"));
</script>
<script>
var effectsMixer = ...;
function playSound(src) {
var audio = new Audio(src);
audio.oncanplaythrough = function() {
var stream = audio.captureStreamUntilEnded();
var port = effectsMixer.addInput(stream);
port.blockOutput = false;
audio.play();
}
}
</script>
<script>
var effectsMixer = ...;
var audio = new Audio(...);
function triggerSound() {
var sound = audio.cloneNode();
var stream = sound.captureStreamUntilEnded();
sound.play();
effectsMixer.addInput(stream, effectsMixer.currentTime + 5);
}
</script>
<script>
navigator.getUserMedia('video', gotVideo);
function gotVideo(stream) {
stream.createWorkerProcessor(new Worker("face-recognizer.js"));
}
</script>
<script>
navigator.getUserMedia('video', gotVideo);
var streamRecorder;
function gotVideo(stream) {
streamRecorder = stream.record();
}
function stopRecording() {
streamRecorder.getRecordedData(gotData);
}
function gotData(blob) {
var x = new XMLHttpRequest();
x.open('POST', 'uploadMessage');
x.send(blob);
}
</script>
<canvas width="640" height="480" id="c"></canvas>
<script>
var canvas = document.getElementById("c");
var streamRecorder = canvas.stream.record();
function stopRecording() {
streamRecorder.getRecordedData(gotData);
}
function gotData(blob) {
var x = new XMLHttpRequest();
x.open('POST', 'uploadMessage');
x.send(blob);
}
var frame = 0;
function updateCanvas() {
var ctx = canvas.getContext("2d");
ctx.clearRect(0, 0, 640, 480);
ctx.fillText("Frame " + frame, 0, 200);
++frame;
}
setInterval(updateCanvas, 30);
</script>