This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
E-mail thread: http://lists.w3.org/Archives/Public/public-audio/2013JulSep/thread.html#msg105

Kumar wrote:

Q: Can script processor nodes be expected to run synchronously during offline rendering? By that I mean: will native components wait for a script node to complete its task before proceeding to the next block of sample frames? Though it would be ridiculous if they did not, the spec isn't clear on this.

When using the realtime context, the audio thread is not expected to wait for compute-intensive script nodes, and consequently the script nodes operate with a delay. This delay is unnecessary when rendering using the offline context, and it would be OK for the rendering thread to wait for the JS code before continuing.

Ensuring this has another benefit: it provides a mechanism by which we can use a script node as a periodic callback timer to instantiate native nodes just in time when rendering long compositions.

Chris Rogers replied:

Yes, this is how it should work. Currently, this is a limitation in the WebKit and Blink implementations.

TODO: edit the spec accordingly.
Kumar, I've been starting some initial work on this in Blink. I believe that as long as the ScriptProcessorNode buffer size is equal to 128 (the block processing size of the AudioContext) then we can get zero in/out latency on this node if used with an OfflineAudioContext. Otherwise, if the buffer size is greater, then we have to end up buffering and incur some latency, although the processing would still be synchronized.
We can also specify that when used with OfflineAudioContext, the implementation must eliminate the latency and provide a zero latency non-glitching buffer in the rendered buffer on the context. I think doing that makes sense since the latency added by ScriptProcessorNode is almost never desirable. And also any application that needs an initial latency can code that into the event handler on the node, or route the output through a DelayNode.
I agree with Ehsan here. The only reason it is convenient to have a buffer size specifiable for a script processor node is to avoid audio glitching, since (as Ehsan pointed out) we can always code up any delays we require for an implementation in the onaudioprocess call. This need does not exist during offline processing.
I accidentally hit "save changes" before I got a chance to read what I'd typed. Apologies for the duplication. Here it is, edited - -- I agree with Ehsan here. The only reason it is convenient to have a buffer size specifiable for a script processor node is to tweak it to avoid audio glitching, since (as Ehsan pointed out) we can always code up any delays we require for an application in the onaudioprocess callback. This is not useful for offline processing. --
(In reply to comment #5) > I accidentally hit "save changes" before I got a chance to read what I'd > typed. Apologies for the duplication. Here it is, edited - > > -- > I agree with Ehsan here. The only reason it is convenient to have a buffer > size specifiable for a script processor node is to tweak it to avoid audio > glitching, since (as Ehsan pointed out) we can always code up any delays we > require for an application in the onaudioprocess callback. This is not > useful for offline processing. > -- I'm just trying to get clarification and more detail on what you mean here, since I'm working on a prototype right now. Maybe this is already clear to you, but just wanted to make sure... In the general case, a ScriptProcessorNode has *both* input and output data and acts as a signal processor. In order to achieve zero latency, the buffer size has to be 128, the same as the rest of the native nodes. If this were not the case, then the .onaudioprocess handler could not be called until enough input samples (>128) were buffered (for the .inputBuffer attribute of the AudioProcessingEvent), thus introducing a latency.
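The buffering argument above can be sketched with a little arithmetic. This is a simplified model, not any browser's implementation: it assumes the 128-frame block size mentioned in the thread and a hypothetical helper name, `input_buffering_delay_frames`, and it counts only the frames that must accumulate beyond the first block before the handler can receive a full input buffer.

```python
RENDER_QUANTUM = 128  # block size of the native nodes, as stated in the thread


def input_buffering_delay_frames(buffer_size: int) -> int:
    """Frames that must accumulate beyond one block before a full
    inputBuffer exists (simplified model; hypothetical helper)."""
    if buffer_size < RENDER_QUANTUM or buffer_size % RENDER_QUANTUM != 0:
        raise ValueError("buffer size must be a positive multiple of 128")
    return buffer_size - RENDER_QUANTUM


# Only a buffer size equal to the block size gives zero delay in this model.
for size in (128, 256, 1024):
    print(size, input_buffering_delay_frames(size))
```

Under this model, only a buffer size of 128 avoids input buffering altogether, which matches the claim in the comment above.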
(In reply to comment #6) > (In reply to comment #5) > > I accidentally hit "save changes" before I got a chance to read what I'd > > typed. Apologies for the duplication. Here it is, edited - > > > > -- > > I agree with Ehsan here. The only reason it is convenient to have a buffer > > size specifiable for a script processor node is to tweak it to avoid audio > > glitching, since (as Ehsan pointed out) we can always code up any delays we > > require for an application in the onaudioprocess callback. This is not > > useful for offline processing. > > -- > > I'm just trying to get clarification and more detail on what you mean here, > since I'm working on a prototype right now. Maybe this is already clear to > you, but just wanted to make sure... > > In the general case, a ScriptProcessorNode has *both* input and output data > and acts as a signal processor. In order to achieve zero latency, the > buffer size has to be 128, the same as the rest of the native nodes. If > this were not the case, then the .onaudioprocess handler could not be called > until enough input samples (>128) were buffered (for the .inputBuffer > attribute of the AudioProcessingEvent), thus introducing a latency. Understood. To restate in more concrete terms what I said, it would be acceptable, I think, if script processor nodes created on offline audio contexts supported *only* buffer durations of 128 and threw an exception for other values. It would also be acceptable if implementations simply ignore the requested buffer size in this case and always used 128. The notion of "latency" in this context may need some clarification and it is perhaps better to just translate this term as "delay in a signal path". If the script node's output is piped through some processing that gets fed back into the script node, then it won't (obviously) see the effect of its output until it actually generates it. 
There would therefore be a (minimum) delay of BUFFER_LENGTH samples before it gets to see its output signal on its input pin. In the absence of feedback, there need be no "latency" at all. Consider this simple graph:

source ----> script-node --------> destination
   |                                   ^
   |                                   |
   \----> some filter node ------------/

Suppose the script node has a buffer size of 1024 set (= 128*8). The script node will produce output only once every 8 blocks. If for any block the destination node does not receive audio on one or more of its inputs, it *could* buffer the others until it has some audio from all of its inputs before proceeding to mix them (partial mix downs would be an optimization). If this were the destination's behaviour, then there would be no *audible* signal delay in the script node's path and this graph would effectively behave as though its block size was set to 1024 and not 128 (except for k-rate animated parameters, which will update more frequently for the native nodes).

If every node used this "wait for all inputs before running" logic, then script nodes with buffer sizes greater than 128 need not impose a delay in their signal paths. However, such a script node now imposes a memory cost proportional to its buffer length (minus 128) on all multi-input nodes immediately dependent on its output (e.g. a channel muxer will have to buffer all other inputs until it gets some from a script node connected to one of its inputs).

A simple way to avoid such additional "invisible" memory costs and the additional buffering logic is to only support the native block size of 128 for script nodes.
We should also note that ScriptProcessorNode should preferably not bombard the main thread with events. Smaller buffer sizes make that worse, especially with OfflineAudioContext, which promises to do the processing as fast as possible.
Another point I wondered about is whether the "back and forth" between the audio processing thread and the main thread to process a script node will force sub-realtime rendering for offline audio contexts in current browser architectures. Currently, it is not unreasonable to expect a delay of 4ms between event firing and callback invocation. That limits the number of calls that can be made to a script node to 250 calls per second. If a block size of 128 is used, that might limit the rate of generating audio samples to 250 * 128 = 32000 frames per second, i.e. 32 kHz. The longer the event->callback delay, the worse this gets.
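The estimate above is easy to reproduce. The following sketch assumes, as the comment does, that each callback costs one fixed event-to-callback delay and that nothing else limits rendering; the function name `max_render_rate_hz` is made up for illustration.

```python
def max_render_rate_hz(callback_delay_s: float, block_size: int) -> float:
    """Upper bound on frames rendered per second when every block of
    `block_size` frames costs one main-thread round trip (hypothetical model)."""
    return block_size / callback_delay_s


# 4 ms per round trip with 128-frame blocks: about 32000 frames per second.
print(max_render_rate_hz(0.004, 128))

# The same delay with a 1024-frame buffer would allow about 256000 frames per second.
print(max_render_rate_hz(0.004, 1024))
```

The bound scales linearly with the block size and inversely with the delay, so the conclusion depends entirely on how realistic the assumed 4ms figure is.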
(In reply to comment #9)
> Another point I wondered about is whether the "back and forth" between the audio processing thread and the main thread to process a script node will force sub-realtime rendering for offline audio contexts in current browser architectures.
>
> Currently, it is not unreasonable to expect a delay of 4ms between event firing and callback invocation. That limits the number of calls that can be made to a script node to 250 calls per second. If a block size of 128 is used, that might limit the rate of generating audio samples to 32 kHz. The longer the event->callback delay, the worse this gets.

Yes, it's true that running the ScriptProcessorNode at size 128 carries a performance penalty, so it's probably best not to force the ScriptProcessorNode to process using a buffer size of 128. I was just saying that *if* we would like the ScriptProcessorNode to have zero in/out latency, then we'll need a buffer size of 128.

I'm fairly sure the dire prediction of a 4ms delay is not something we would normally see, based on my experience of the WebKit/Blink code. I haven't yet had a chance to determine the exact performance hit, but we have nice tracing features in Chrome to get quite a detailed picture. I'll try some experiments here...
The 4ms delay is probably in relation to setTimeout(), right? That is enforced artificially by browsers in order to prevent web pages from spamming the main thread in exactly this way; it is not the actual delay of event delivery. In reality, event delivery delays depend on a lot of factors invisible to the web application, such as how busy the event loop is. But they should be *much* lower than 4ms under most circumstances.
(In reply to comment #11) > The 4ms delay is probably with relation to setTimeout(), right? That is > enforeced artificially by browsers in order to prevent web pages from > spamming the main thread in exactly this way, it is not the actual delay of > event delivery. In reality, the event delivery delays really depend on a > lot of factors invisible to the web application, such as how busy the event > loop is. But it should be *much* lower than 4ms under most circumstances. Not entirely. With the realtime script processor nodes, I've needed a buffer size of at least 1024 to avoid glitches due to UI events. Since that amounts to 23ms, I thought I was being rather optimistic in picking 4ms :)
(In reply to comment #12)
> (In reply to comment #11)
> > The 4ms delay is probably with relation to setTimeout(), right? That is enforced artificially by browsers in order to prevent web pages from spamming the main thread in exactly this way, it is not the actual delay of event delivery. In reality, the event delivery delays really depend on a lot of factors invisible to the web application, such as how busy the event loop is. But it should be *much* lower than 4ms under most circumstances.
>
> Not entirely. With the realtime script processor nodes, I've needed a buffer size of at least 1024 to avoid glitches due to UI events. Since that amounts to 23ms, I thought I was being rather optimistic in picking 4ms :)

There's a difference between the two cases. A good "safe" buffer size for real-time operation, such as 1024 or 2048, is chosen to avoid glitches, so it is based on the worst-case performance of any single .onaudioprocess invocation. With non-realtime rendering (perhaps, and hopefully often, faster than real-time), we don't have to worry about "safe" sizes, and can do the synchronization as fast or slow as it needs to be. Although individual calls to .onaudioprocess may take longer, the average time it takes to hop from the real-time to the main thread will be much less than 4ms, I believe.

Hopefully, soon we can get a real-world setup going along with trace points in Chrome to play with.
(In reply to comment #13) > (In reply to comment #12) > > (In reply to comment #11) > > ... > > Not entirely. With the realtime script processor nodes, I've needed a buffer > > size of at least 1024 to avoid glitches due to UI events. Since that amounts > > to 23ms, I thought I was being rather optimistic in picking 4ms :) > > There's a difference between a good "safe" buffer size for real-time > operation such as 1024 or 2048 to avoid glitches, since that's based on > worst case performance of any single .onaudioprocess invocation > > compared with a non-realtime (perhaps and hopefully often faster than > real-time), we don't have to worry about "safe" sizes, and can do the > synchronization as fast or slow as it needs to be. Although individual > calls to .onaudioprocess may take longer, the average time it takes to hop > from the real-time to the main thread will be much less than 4ms I believe. Sure. Understood. I just did a quick-n-dirty estimate of what this rate can be. It looks like the average time can be 0.5ms in Chrome. Gist here - https://gist.github.com/srikumarks/6043702 Btw the offline processing doesn't need to be in a real-time thread and probably shouldn't be, since it is running "as fast as it can". It should perhaps be in the same class as the threads used by web workers. > Hopefully, soon we can get a real-world setup going along with trace points > in Chrome to play with. Looking forward to it!
(In reply to comment #7)
> If every node used this "wait for all inputs before running" logic, then script nodes with buffer sizes greater than 128 need not impose a delay in their signal paths.

I just realized a subtlety in this. If a script processor node's onaudioprocess reads computed values from AudioParams, then the perceived k-rate for those AudioParams will be determined by the block size set for the script node and not the fixed 128-sample-block in the spec. Not only that, it will look like a filter-type script node (with input and output) is prescient and anticipates animated AudioParams, because the onaudioprocess will only get to run once enough input chunks have accumulated, meaning the values of some of these k-rate AudioParams could already have advanced to a time corresponding to the end of the script node's buffer duration.

Again, using a buffer size of 128 would solve such rate discrepancies in reading AudioParams. In general, it is looking to me like the best way for script nodes to access AudioParams would be via some structure passed through the event instead of directly reading the AudioParam objects.
(In reply to comment #15) > (In reply to comment #7) > > If every node used this "wait for all inputs before running" logic, then > > script nodes with buffer sizes greater than 128 need not impose a delay in > > their signal paths. > > I just realized a subtlety in this. If a script processor node's > onaudioprocess reads computed values from AudioParams, then the perceived > k-rate for those AudioParams will be determined by the block size set for > the script node and not the fixed 128-sample-block in the spec. Not only > that, it will look like a filter-type script node (with input and output) is > prescient and anticipates animated AudioParams, because the the > onaudioprocess will only get to run once enough input chunks have > accumulated, meaning the values of some of these k-rate AudioParams could > already have advanced to a time corresponding to the end of the script > node's buffer duration. No, according to the spec the implementation must do 128-frame block processing all the time, which means that for example if we have 1024 frames to fill up for a ScriptProcessorNode, we need to call the block processing code 8 times, and each k-rate AudioParam will be sampled at the beginning of each block.
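To make the block-processing claim concrete, here is a small sketch of where k-rate values would be sampled while one script node buffer is being filled. It assumes the 128-frame block size from the thread; `k_rate_sample_frames` is a hypothetical helper and does not correspond to any API.

```python
from typing import List

RENDER_QUANTUM = 128  # block size stated in the thread


def k_rate_sample_frames(start_frame: int, buffer_size: int) -> List[int]:
    """Frame positions at which a k-rate AudioParam is sampled (once per
    128-frame block) while filling one buffer (illustrative model only)."""
    return list(range(start_frame, start_frame + buffer_size, RENDER_QUANTUM))


frames = k_rate_sample_frames(0, 1024)
print(len(frames))  # number of block-processing calls for a 1024-frame buffer
print(frames)
```

For a 1024-frame buffer this yields eight sample points, one at the start of each block, which is the "call the block processing code 8 times" case described above.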
(In reply to comment #16) > (In reply to comment #15) > > (In reply to comment #7) > > > If every node used this "wait for all inputs before running" logic, then > > > script nodes with buffer sizes greater than 128 need not impose a delay in > > > their signal paths. > > > > I just realized a subtlety in this. If a script processor node's > > onaudioprocess reads computed values from AudioParams, then the perceived > > k-rate for those AudioParams will be determined by the block size set for > > the script node and not the fixed 128-sample-block in the spec. Not only > > that, it will look like a filter-type script node (with input and output) is > > prescient and anticipates animated AudioParams, because the the > > onaudioprocess will only get to run once enough input chunks have > > accumulated, meaning the values of some of these k-rate AudioParams could > > already have advanced to a time corresponding to the end of the script > > node's buffer duration. > > No, according to the spec the implementation must do 128-frame block > processing all the time, which means that for example if we have 1024 frames > to fill up for a ScriptProcessorNode, we need to call the block processing > code 8 times, and each k-rate AudioParam will be sampled at the beginning of > each block. That holds only for the native nodes, doesn't it? With the real-time context, script processor nodes with buffer sizes > 128 (which is all the time) already have a lower k-rate than the native nodes if they read computed values of AudioParams within their onaudioprocess callbacks. Anyway, to ensure that the k-rate is uniform at least during offline processing, it looks like the only way is to raise onaudioprocess events for each 128-sample-frame block. The event dispatcher better put up some performance :)
(In reply to comment #17) > (In reply to comment #16) > > (In reply to comment #15) > > > (In reply to comment #7) > > > > If every node used this "wait for all inputs before running" logic, then > > > > script nodes with buffer sizes greater than 128 need not impose a delay in > > > > their signal paths. > > > > > > I just realized a subtlety in this. If a script processor node's > > > onaudioprocess reads computed values from AudioParams, then the perceived > > > k-rate for those AudioParams will be determined by the block size set for > > > the script node and not the fixed 128-sample-block in the spec. Not only > > > that, it will look like a filter-type script node (with input and output) is > > > prescient and anticipates animated AudioParams, because the the > > > onaudioprocess will only get to run once enough input chunks have > > > accumulated, meaning the values of some of these k-rate AudioParams could > > > already have advanced to a time corresponding to the end of the script > > > node's buffer duration. > > > > No, according to the spec the implementation must do 128-frame block > > processing all the time, which means that for example if we have 1024 frames > > to fill up for a ScriptProcessorNode, we need to call the block processing > > code 8 times, and each k-rate AudioParam will be sampled at the beginning of > > each block. > > That holds only for the native nodes, doesn't it? No, that's true for all nodes. > With the real-time > context, script processor nodes with buffer sizes > 128 (which is all the > time) already have a lower k-rate than the native nodes if they read > computed values of AudioParams within their onaudioprocess callbacks. I'm not sure what you mean here. How do you "sample" the AudioParam value inside the audioprocess event handler? > Anyway, to ensure that the k-rate is uniform at least during offline > processing, it looks like the only way is to raise onaudioprocess events for > each 128-sample-frame block. 
> The event dispatcher better put up some performance :)

Doing that violates the current spec, and I think it would be a very bad idea.
(In reply to comment #18) > (In reply to comment #17) > > (In reply to comment #16) > > > (In reply to comment #15) > > > > (In reply to comment #7) > > > > > If every node used this "wait for all inputs before running" logic, then > > > > > script nodes with buffer sizes greater than 128 need not impose a delay in > > > > > their signal paths. > > > > > > > > I just realized a subtlety in this. If a script processor node's > > > > onaudioprocess reads computed values from AudioParams, then the perceived > > > > k-rate for those AudioParams will be determined by the block size set for > > > > the script node and not the fixed 128-sample-block in the spec. Not only > > > > that, it will look like a filter-type script node (with input and output) is > > > > prescient and anticipates animated AudioParams, because the the > > > > onaudioprocess will only get to run once enough input chunks have > > > > accumulated, meaning the values of some of these k-rate AudioParams could > > > > already have advanced to a time corresponding to the end of the script > > > > node's buffer duration. > > > > > > No, according to the spec the implementation must do 128-frame block > > > processing all the time, which means that for example if we have 1024 frames > > > to fill up for a ScriptProcessorNode, we need to call the block processing > > > code 8 times, and each k-rate AudioParam will be sampled at the beginning of > > > each block. > > > > That holds only for the native nodes, doesn't it? > > No, that's true for all nodes. > > > With the real-time > > context, script processor nodes with buffer sizes > 128 (which is all the > > time) already have a lower k-rate than the native nodes if they read > > computed values of AudioParams within their onaudioprocess callbacks. > > I'm not sure what you mean here. How do you "sample" the AudioParam value > inside the audioprocess event handler? 
> > > Anyway, to ensure that the k-rate is uniform at least during offline processing, it looks like the only way is to raise onaudioprocess events for each 128-sample-frame block. The event dispatcher better put up some performance :)
> >
> > Doing that violates the current spec, and I think would be a very bad idea.

I agree that we shouldn't force it to run at 128. But I think we should change the current spec to allow for size 128, especially for the OfflineAudioContext case. Right now, the minimum size is 256, and we would like to get zero latency for this offline case.

I've created an early prototype in Chrome which synchronizes the audio thread with the main thread, and runs at 128. I've found that the average time between onaudioprocess callbacks is around 50 microseconds. I tried a really simple test case:

AudioBufferSourceNode -> ScriptProcessorNode -> OfflineAudioContext.destination

and processed a time period several minutes long. On a mid-range Mac Pro I saw around 60x real-time performance.
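As a sanity check, the two reported numbers (about 50 microseconds between callbacks and about 60x real time) can be related with a short calculation. The sample rate is not given in the comment; 44100 Hz is an assumption here, and `realtime_factor` is a hypothetical helper.

```python
def realtime_factor(block_size: int, callback_interval_s: float,
                    sample_rate_hz: float) -> float:
    """Seconds of audio rendered per second of wall-clock time when one
    block is produced every `callback_interval_s` seconds."""
    audio_seconds_per_callback = block_size / sample_rate_hz
    return audio_seconds_per_callback / callback_interval_s


# Assumed sample rate of 44100 Hz (not stated in the thread).
factor = realtime_factor(128, 50e-6, 44100.0)
print(round(factor, 1))
```

With these assumptions the factor comes out a little under 60, so the two figures in the comment appear consistent with each other.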
(In reply to comment #19)
> I agree that we shouldn't force it to run at 128. But I think we should change the current spec to allow for size 128, especially for the OfflineAudioContext case. Right now, the minimum size is 256, and we would like to get zero latency for this offline case.

Hmm, I know that roc really wanted the buffer size to be a choice that the UA makes, not one that the author makes...

But that aside, I still don't fully understand why we have to fix the latency issue in the offline processing case by allowing a buffer size of 128. Let's say that on a given path from a source node in the graph to the destination node, we have N ScriptProcessorNodes, all with a buffer size of M (for simplicity's sake). In this situation, these nodes are creating a latency of N*M frames. Can't we address this issue by disregarding the first N*M frames that the destination node observes? (I'm not 100% sure if this solution works, but I can't completely convince myself either way.)

> I've created an early prototype in Chrome which synchronizes the audio thread with the main thread, and runs at 128. I've found that the average time between onaudioprocess callbacks is around 50microseconds. I tried a really simple test case:
>
> AudioBufferSourceNode -> ScriptProcessorNode -> OfflineAudioContext.destination
>
> and processed during a time period of several minutes long. On a mid-range Mac pro I saw around 60x real-time performance.

Have you performed measurements on whether (and how much) that affects the responsiveness of the main thread? I'm worried that if the audioprocess event takes, let's say, 5ms to process on average, this may degrade the performance of the page if it uses requestAnimationFrame to render an animation, for example.
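The "discard the first N*M frames" idea can be tried on a toy model. The sketch below treats each script node as a pure M-frame delay, which is an assumption taken from the N*M figure in the comment, and it covers only a single series path; it says nothing about graphs where a delayed path is mixed with an undelayed one. All names are hypothetical.

```python
from typing import List


def delay_node(signal: List[float], delay_frames: int) -> List[float]:
    """Model one script node as a pure delay: silence first, then the input."""
    return [0.0] * delay_frames + list(signal)


def render_chain(signal: List[float], num_nodes: int, buffer_size: int) -> List[float]:
    """Pass the signal through `num_nodes` delay nodes connected in series."""
    out = list(signal)
    for _ in range(num_nodes):
        out = delay_node(out, buffer_size)
    return out


source = [float(i) for i in range(1, 9)]
n, m = 3, 4  # tiny illustrative values, not realistic buffer sizes
rendered = render_chain(source, n, m)
trimmed = rendered[n * m:]  # drop the first N*M frames at the destination
print(trimmed == source)
```

In this restricted series-only case, trimming recovers the original signal. Whether the same trick holds for parallel paths is exactly the open question raised in the comment.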
If we allow the UA to set the ScriptProcessorNode buffer size, we can potentially use larger ScriptProcessorNode buffer sizes. For example, if there's one ScriptProcessorNode in the graph and it's not in a cycle, we can process N blocks for all the nodes before the ScriptProcessorNode, run the ScriptProcessorNode with a buffer of N*128 samples, and then process N blocks for all the nodes after the ScriptProcessorNode. BTW, no matter how we resolve the block size issue, there's a quite severe problem we need to address: what happens if the ScriptProcessorNode's event handler modifies the audio graph. This isn't a big problem when the event handler is running asynchronously (the current situation), but it becomes a huge problem when we're effectively running it synchronously.
(In reply to comment #20)
> (In reply to comment #19)
> > I agree that we shouldn't force it to run at 128. But I think we should change the current spec to allow for size 128, especially for the OfflineAudioContext case. Right now, the minimum size is 256, and we would like to get zero latency for this offline case.
>
> Hmm, I know that roc really wanted the buffer size to be a choice that the UA makes not one that the author makes...
>
> But that aside, I still don't correctly understand why we have to fix the latency issue in the offline processing case by allowing a buffer size of 128. Let's say that on a given path from a source node in the graph to the destination node, we have N ScriptProcessorNodes, all with a buffer size of M (for simplicity's sake.) In this situation, these nodes are creating a latency of N*M frames.

If we use a buffer size of 128, then the total latency would still be zero even with N ScriptProcessorNodes. I could actually create a test case which shows this.

> Can't we address this issue by disregarding the first N*M frames that the destination node observes? (I'm not 100% sure if this solution works, but I can't completely convince myself either way.)
>
> > I've created an early prototype in Chrome which synchronizes the audio thread with the main thread, and runs at 128. I've found that the average time between onaudioprocess callbacks is around 50microseconds. I tried a really simple test case:
> >
> > AudioBufferSourceNode -> ScriptProcessorNode -> OfflineAudioContext.destination
> >
> > and processed during a time period of several minutes long. On a mid-range Mac pro I saw around 60x real-time performance.
>
> Have you performed measurements on whether (and how much) that affects the responsiveness of the main thread?
> I'm worried that if the audioprocess event takes let's say 5ms to process on average, this may degrade the performance of the page if it uses requestAnimationFrame to render an animation, for example.

Yes, I too was concerned about that possibly interfering with the main thread. But the way I've implemented it, the synchronizations happen one at a time and do not "pile up". I just created a test using requestAnimationFrame() to draw smoothly at (hopefully) 60fps while the OfflineAudioContext is running and calling back frequently to the main thread. It seems to draw just fine, and I anticipate it would handle user events without a hitch, but I haven't yet tested that...
(In reply to comment #21) > If we allow the UA to set the ScriptProcessorNode buffer size, we can > potentially use larger ScriptProcessorNode buffer sizes. For example, if > there's one ScriptProcessorNode in the graph and it's not in a cycle, we can > process N blocks for all the nodes before the ScriptProcessorNode, run the > ScriptProcessorNode with a buffer of N*128 samples, and then process N > blocks for all the nodes after the ScriptProcessorNode. > > BTW, no matter how we resolve the block size issue, there's a quite severe > problem we need to address: what happens if the ScriptProcessorNode's event > handler modifies the audio graph. This isn't a big problem when the event > handler is running asynchronously (the current situation), but it becomes a > huge problem when we're effectively running it synchronously. Yes, you're right there. One solution is to forbid graph manipulation in that case. We could still allow graph manipulation with OfflineAudioContext if we're allowed to call startRendering() more than once. Then we could be allowed to modify the graph each time the .oncomplete handler is called...
(In reply to comment #23) > Yes, you're right there. One solution is to forbid graph manipulation in > that case. That would be OK, but it will be quite a lot of work to specify and implement checks for every single API that must be blocked.
(In reply to comment #24) > (In reply to comment #23) > > Yes, you're right there. One solution is to forbid graph manipulation in > > that case. > > That would be OK, but it will be quite a lot of work to specify and > implement checks for every single API that must be blocked. Synchronous graph manipulation is, I'd say, a *must have* in the offline case. If I want to mix down a 5 minute composition with 8 tracks with 1 voice in each track triggered every second (not too taxing a scenario for the realtime composer case), that would need 2400 nodes to be pre-created ... and this doesn't count effects nodes. Permitting the graph to be changed within a script node would provide a way to instantiate these voices just in time. If that isn't possible, it would help to add a separate synchronous event that can be used for this purpose, that would also be useful for tracking progress of an offline context.
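The node count in the scenario above follows from a simple product. The sketch below just restates that arithmetic; `voices_to_precreate` is a hypothetical name, and like the comment it counts one node per triggered voice and ignores effects nodes.

```python
def voices_to_precreate(duration_s: int, tracks: int,
                        voices_per_track_per_s: int) -> int:
    """Voices that must exist up front if the graph cannot be changed
    while an offline render is in progress."""
    return duration_s * tracks * voices_per_track_per_s


# 5 minutes, 8 tracks, 1 voice triggered per track per second.
print(voices_to_precreate(5 * 60, 8, 1))  # 2400
```

Just-in-time instantiation from a synchronous callback would replace this up-front total with only the voices active at any given moment.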
Web Audio API issues have been migrated to Github. See https://github.com/WebAudio/web-audio-api/issues
Closing. See https://github.com/WebAudio/web-audio-api/issues for up to date list of issues for the Web Audio API.