Bugzilla – Bug 20698
Need a way to determine AudioContext time of currently audible signal
Last modified: 2013-04-26 20:11:57 UTC
If one needs to display a visual cursor in relationship to some onscreen representation of an audio timeline (e.g. a cursor on top of music notation or DAW clips) then knowing the real time coordinates for what is coming out of the speakers is essential.
However on any given implementation an AudioContext's currentTime may report a time that is somewhat ahead of the time of the actual audio signal emerging from the device, by a fixed amount. If a sound is scheduled (even very far in advance) to be played at time T, the sound will actually be played when AudioContext.currentTime = T + L where L is a fixed number.
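To make the arithmetic concrete: if the implementation's fixed offset L were known, the context time of the signal currently audible would just be currentTime - L. A minimal sketch, using plain millisecond numbers in place of the real clocks (the `latencyMs` value is an illustrative assumption, since no API exposes it today):

```javascript
// Sketch: recover the context time of the currently audible signal,
// assuming a fixed, known output latency L (in milliseconds).
function audibleTime(currentTimeMs, latencyMs) {
  // A sound scheduled at T is heard when currentTime === T + L,
  // so the signal now emerging from the speakers was scheduled at:
  return currentTimeMs - latencyMs;
}

// Example: currentTime reads 10300 ms and the platform latency is 250 ms,
// so the audio being heard right now carries context time 10050 ms.
console.log(audibleTime(10300, 250)); // 10050
```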
On Jan 16, 2013, at 2:05 PM email@example.com wrote:
It's problematic to incorporate scheduling other real-time events (even knowing precisely "what time it is" from the drawing function) without a better understanding of the latency.
The idea we reached (I think Chris proposed it, but I can't honestly remember) was to have a performance.now()-reference clock time on AudioContext that would tell you when the AudioContext.currentTime was taken (or when that time will occur, if it's in the future); that would allow you to synchronize the two clocks. The more I've thought about it, the more I quite like this approach - having something like AudioContext.currentSystemTime in window.performance.now()-reference.
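The correlation described here amounts to one snapshot pairing a context time with the performance.now() time at which it was (or will be) taken; with that pair you can map either clock onto the other. A sketch under that assumption (the snapshot shape and names are hypothetical, not shipped API):

```javascript
// Sketch of the proposed clock correlation: a single snapshot pairing an
// AudioContext time (seconds) with the performance.now() reading (ms) at
// which that context time was taken.
function makeClockMap(snapshot) {
  return {
    toPerformanceTime(contextTime) {
      return snapshot.performanceTime + (contextTime - snapshot.contextTime) * 1000;
    },
    toContextTime(performanceTime) {
      return snapshot.contextTime + (performanceTime - snapshot.performanceTime) / 1000;
    },
  };
}

// Example: the context read 1.5 s when performance.now() read 2000 ms.
const map = makeClockMap({ contextTime: 1.5, performanceTime: 2000 });
console.log(map.toPerformanceTime(2.5)); // 3000 (ms)
console.log(map.toContextTime(2500));    // 2 (s)
```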
On Jan 16, 2013, at 3:18 PM, Chris Rogers <firstname.lastname@example.org> wrote:
the general idea is that the underlying different platforms/OSs can have very different latency characteristics, so I think you're looking for a way to query the system to know what it is. I think that something like AudioContext.presentationLatency is what we're looking for. Presentation latency is the time difference between when you tell an event to happen and the actual time when you hear it. So, for example, with source.start(0), you would hope to hear the sound right now, but in reality will hear it with some (hopefully) small delay. One example where this could be useful is if you're trying to synchronize a visual "playhead" to the actual audio being scheduled...
I believe the goal for any implementation should be to achieve as low a latency as possible, one which is on-par with desktop/native audio software on the same OS/hardware that the browser is run on. That said, as with other aspects of the web platform (page rendering speed, cache behavior, etc.) performance is something which is tuned (and hopefully improved) over time for each browser implementation and OS.
Note (Per discussion at Audio WG f2f 2013-03-26):
We need to differentiate the latency-discovery issue (already filed) from follow-on questions of audio clock drift and granularity, which may not affect the user experience to the same degree.
Can we clearly delineate? I'm not positive I understand what "latency discovery" is, because there's one bit of information (the average processing block size) that might be interesting, but I intended this issue to cover the explicit need to synchronize the audio time clock with the performance clock at reasonably high precision. That is, for example:
1) I want to be playing a looped sequence through Web Audio; when I get a timestamped MIDI message (or keypress, for that matter), I want to be able to record it and play that sequence back at the right time.
2) I want to be able to play back a sequence of combined MIDI messages and Web Audio, and have them synchronized to a sub-latency level (given the latency today on Linux and even Windows, this is a requirement). Even if my latency of Web Audio playback is 20ms, I should be able to pre-schedule MIDI and audio events to occur within a millisecond or so of each other.
Now, there's a level of planning for which knowing the "average latency" - related to processing block size, I imagine - would be interesting (I could use that to pick a latency in my scheduler, for example); but that's not the same thing. Perhaps these should be solved together, but I don't want the former to be dropped in favor of the latter.
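The scheduling compensation described in these two cases can be sketched simply: to make an audio event and a MIDI event audible at the same instant, each must be dispatched early by its own path's latency. A hedged illustration with hypothetical latency figures (not real API calls):

```javascript
// Sketch: compute dispatch times so an audio note and a MIDI note are
// heard together, compensating each path's (assumed known) latency.
function alignedSendTimes(targetAudibleMs, audioLatencyMs, midiLatencyMs) {
  // Each event must be issued early by its own output latency so that
  // both become audible at targetAudibleMs.
  return {
    audioStartMs: targetAudibleMs - audioLatencyMs,
    midiSendMs: targetAudibleMs - midiLatencyMs,
  };
}

// Example: 20 ms audio latency, 5 ms MIDI latency, target instant 1000 ms.
const t = alignedSendTimes(1000, 20, 5);
console.log(t.audioStartMs, t.midiSendMs); // 980 995
```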
This bug is intended to cover both the MIDI-synchronization cases that you proposed and also the original case I raised, which involved the placement of a visual cursor that is synchronized with audio that's being heard at the same time.
In the original visual case, the main thread needs to be able to determine the original "audio scheduling" time (in the context time frame used by start(), setValueAtTime(), etc.) for the audio signal presently emerging from the speaker. AudioContext.currentTime does not supply this time, as I explained in my original bug description.
I am not interested in the average latency or processing block size and agree that would be a different bug.
I believe we're talking about two sources of latency here. One is the clock drift between what we measure on the main thread through AudioContext.currentTime and the actual clock on the audio thread; the other is the latency between the "play" call on the audio thread and the point where the OS actually starts to hand off the buffer to the sound card (plus, potentially, a further delay until your speakers start to play out what the sound card received). On top of all that, if the implementation uses system-level APIs which do not provide enough resolution (as is the case on Windows XP, for example), an additional artificial latency is introduced into the calculations because of the inability to measure time precisely enough.
The use case of syncing the display of something on the screen with sound coming out of speakers is very hard to satisfy, since browsers generally do not provide any guarantee on when the updates resulting from a change in the DOM tree or a Web API call will be reflected on the screen. On an implementation which strives to provide a 60fps rendering, this delay can be as high as 16ms in the best case, and much more than that if the implementation is suffering from frame misses. So, no matter what API we provide here, there will _always_ be a delay involved in getting stuff on the screen.
For the MIDI use case, I imagine knowing the latest measured drift from the audio thread clock and what AudioContext.currentTime returns should be enough, right?
Ehsan, let me clarify the needs here with respect to the latency between the context's currentTime and the signal coming out of the sound card.
High accuracy for this use case is not needed. It's OK for screen updates to be slightly delayed for the purposes of seeing a cursor or pointer whose position over some sort of waveform or notated music reflects what one is hearing. These visual delays will not become really bothersome until they are consistently over 75ms or so. And typically the DOM-to-screen display delay is much, much lower (more like the 16ms number you gave).
On the other hand these delays can be dwarfed by the currentTime-to-sound-card latency on some platforms, which can be as high as 200 or 300 ms. Having the cursor be misplaced by that amount is an experience-killer. That's why it's so important for an application to be able to acquire this number from the API: it's potentially much larger.
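To show why the correction matters at these magnitudes, here is a sketch of positioning a playback cursor over a waveform using the audible time rather than raw currentTime. The 300 ms latency and the pixels-per-second scale are illustrative assumptions:

```javascript
// Sketch: place a cursor over a waveform using the audible context time.
// Times are in milliseconds; pxPerSecond maps audio time to screen space.
function cursorX(currentMs, latencyMs, pxPerSecond) {
  const audibleMs = currentMs - latencyMs; // what the user hears right now
  return (audibleMs * pxPerSecond) / 1000;
}

// Example: with 300 ms of platform latency and 100 px per second of audio,
// the cursor belongs at 220 px, not at the 250 px raw currentTime implies.
console.log(cursorX(2500, 300, 100)); // 220
```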
(In reply to comment #5)
> Ehsan, let me clarify the needs here with respect to the latency between the
> context's currentTime and the signal coming out of the sound card.
> High accuracy for this use case is not needed. It's OK for screen updates to
> be slightly delayed for the purposes of seeing a cursor or pointer whose
> position over some sort of waveform or notated music reflects what one is
> hearing. These visual delays will not become really bothersome until they
> are consistently over 75ms or so. And typically the DOM-to-screen display
> delay is much, much lower (more like the 16ms number you gave).
> On the other hand these delays can be dwarfed by the
> currentTime-to-sound-card latency on some platforms, which can be as high as
> 200 or 300 ms. Having the cursor be misplaced by that amount is an
> experience-killer. That's why it's so important for an application to be
> able to acquire this number from the API: it's potentially much larger.
Yeah, I totally agree. But I'm not sure how that relates to exposing the potentially huge latency to web content. Ideally an implementation should minimize the latency as much as it can and bring it well under the range that humans can perceive. Once such a latency is achieved, do you agree that exposing the latency information to web content would no longer be useful?
Based on my knowledge of various audio platforms I don't know if it is likely that implementations can always succeed in getting the latency down to the point where it doesn't matter.
I agree that in principle if it was always quite small, it wouldn't matter much, but I am concerned that this is not a realistic goal to sign up for.
(In reply to comment #7)
> Based on my knowledge of various audio platforms I don't know if it is
> likely that implementations can always succeed in getting the latency down
> to the point where it doesn't matter.
> I agree that in principle if it was always quite small, it wouldn't matter
> much, but I am concerned that this is not a realistic goal to sign up for.
Fair enough, but I think we should have examples of cases where this latency is unavoidably above the human perception range and the implementation can do nothing about it. I believe that if implementations avoid using imprecise OS clock facilities, there should not be a case where this can happen.
As one example, Android audio latency is high, and in the perceptible range that I have described. This is not due to imprecise clocks -- my understanding is that it is fixed delay inside the OS that is a consequence of internal handoffs of sample frame buffers. Far from being imprecise, this delay is rock-solid consistent (if it varied, there would be output glitches).
OK, that is a good example, but that is an example of the second class of latencies I gave in comment 4. Not sure how much can be done in order to report those latencies.
The only apparent alternative is to do what our app has to do now, namely check the user-agent string to see if the OS is Android, and impose a fixed hardcoded time correction to AudioContext.currentTime for the purposes of understanding what the user is currently hearing.
I think the idea of the application being able to know what is currently playing is pretty fundamental. But I don't want to belabor the point, knowing that so many other fundamentals also need to be implemented -- I just want to clarify why this matters.
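The workaround described above can be sketched as follows. The 0.25 s correction is an illustrative guess, not a measured constant for any particular device, and user-agent sniffing is exactly the fragile practice the bug asks the API to make unnecessary:

```javascript
// Sketch of the current workaround: sniff the user-agent string and
// apply a hardcoded time correction on Android. The latency constant
// is a hypothetical placeholder.
const ANDROID_LATENCY_S = 0.25; // assumed fixed platform delay, in seconds

function estimatedAudibleTime(currentTime, userAgent) {
  const latency = /Android/.test(userAgent) ? ANDROID_LATENCY_S : 0;
  return currentTime - latency;
}

// Example: on an Android UA, a currentTime of 5 s corresponds to
// audio scheduled at roughly 4.75 s now reaching the speakers.
console.log(estimatedAudibleTime(5, "Mozilla/5.0 (Linux; Android 4.0)")); // 4.75
```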
I think it is better to split this issue into three separate issues:
1) latency issue
2) time drift issue (currentTime and currentSystemTime)
3) granularity issue.
Although all three issues are time related, they are quite different. Combining them will cause confusion.
Yeah that would probably make sense.
I would like to preserve this bug as capturing the latency issue, since that is why I originally filed it.
The concerns about granularity and clock drift do not seem serious enough to me for me to capture them effectively in additional bugs.
(In reply to comment #14)
> I would like to preserve this bug as capturing the latency issue, since that
> is why I originally filed it.
> The concerns about granularity or clock drift do not seem serious enough to
> me for me to be effective at capturing those concerns in additional bugs.
I understand. As time drift is critical for some usage scenarios we care about, I would like to file another bug to track it, if there are no objections.
Yes, please file another bug, no objections at all!
I'm thoroughly confused, as this bug is (based on its title) currently targeting the currentTime/currentSystemTime area.
Joe: latency of Android, etc may be quite consistent, but are you just concerned about average latency, or are you trying to synchronize live audio with something in the system time space? I thought it was the latter.
I've retitled the bug to try to more effectively communicate the nature of the issue. Yes, it is about average latency: the difference between AudioContext.currentTime and the original as-scheduled playback time for the signal that is currently being emitted from the audio hardware. Please see the very first comment in the bug for a description of the use case that I am trying to address.
If someone else wants to file a bug about how to correlate AudioContext time with other timebases in the browser I'm fine with that, but that isn't the problem that I'm concerned about.
Would a simple readonly constant which gives the UA's best approximation of the average latency on the given platform/hardware suffice for your needs?
Yes, Ehsan, that is exactly what I am asking for.
Gah. That's not what the current title asks for. Average latency is a fine thing to want to know, but it doesn't address the precise synchronization need I mentioned in the email at the top of this bug, the one that would let authors synchronize MIDI with audio, or on-screen events with audio. For MIDI, the errors would quite possibly be audible (you can hear a < 16ms error even if you can't see it), depending on how frequently JS code can be called with the same currentTime (related to block size?). And given what Ehsan said about their processing mechanism, I'm not sure you wouldn't be able to see visual sync errors with only average latency, if block processing takes more than 16.7ms on a slow system.
I'd suggest a title of "Need to expose average latency of system", and then I'll go file the "Need to expose time stamp of currentTime" issue that is necessary for synchronization with MIDI. I'd actually rather have this bug represent that issue, given the long background thread, but I can link them.
Please feel free to change the title to something that will make you comfortable. :) I have nothing invested in the title, I'm just trying to do my best to interpret your feedback.
Capturing discussion from the WG conference call 4/25/2013:
- Need to document that currentTime represents the time at which the next sample block to be synthesized by the node graph will be played.
- Need to document that currentTime advances (for an non-offline context) roughly in real time, not just monotonically. In other words, the clock-time derivative of currentTime is approximately 1.
- Will introduce a new read-only attribute on AudioContext that exposes a "presentation latency" as described in the initial comment for this bug. This latency is not an absolute guarantee, it just includes all predictable latency contributions known to the implementation. On Android <= 4.0, for instance, this latency would expose the roughly 0.25 second delay imposed by the platform.
- AC will expose a function converting from currentTime units to high-resolution DOM performance time.
- AC will expose an inverse function converting from high-resolution DOM performance to currentTime units.
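The last three bullets can be modeled together as a plain object: a read-only presentation latency plus a pair of inverse clock conversions. This is a sketch of the shape of the resolution, not the eventual API; the internal reference pair (epochContextTime/epochPerfMs) is an assumed implementation detail:

```javascript
// Sketch modeling the resolution above: a read-only presentationLatency
// attribute and two mutually inverse clock-conversion functions.
function makeContextModel(presentationLatency, epochContextTime, epochPerfMs) {
  return {
    presentationLatency, // seconds of predictable output delay
    toPerformanceTime(contextTime) {
      // context seconds -> high-resolution DOM performance milliseconds
      return epochPerfMs + (contextTime - epochContextTime) * 1000;
    },
    fromPerformanceTime(perfMs) {
      // performance milliseconds -> context seconds
      return epochContextTime + (perfMs - epochPerfMs) / 1000;
    },
  };
}

// The two conversions round-trip:
const ctx = makeContextModel(0.25, 0, 5000);
console.log(ctx.fromPerformanceTime(ctx.toPerformanceTime(3))); // 3
```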
An additional note that was not discussed:
- Corollary: need to document (if this is true) that currentTime advances monotonically by a time quantum equal to the block size, as each block is synthesized. This is important because, if true, it implies that ScriptProcessorNode JS code may consult currentTime instead of event.playbackTime, and that the latter is no longer needed since it is always equal to currentTime.
I would like to suggest a different approach, which would solve both the latency and drift issues by adding 4 methods:
triggerTime() // TSC when audio transfers started, in ns
currentSystemTime() // current system time (TSC), in ns
currentRendererTime() // time reported by audio hardware (in ns), reset to zero when transfer starts
currentTime() // audio written or read to/from audio stack (in ns) -> same as today
With these 4 methods, an application can find the latency by looking at currentTime()-currentRendererTime(). If a specific implementation doesn't actually query the hardware time, then it can implement a fixed os/platform offset.
Now if you want to synchronize audio with another event, you have to monitor the audio/system time drift, which can be done by looking at (currentSystemTime()-triggerTime())/currentRendererTime()
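The two calculations proposed above can be sketched with the four clock readings supplied as plain nanosecond numbers (this is not a real API; the function and field names here are illustrative):

```javascript
// Sketch of the proposed latency and drift calculations, with the four
// clock readings passed in as nanosecond values.
function analyze(triggerTimeNs, systemNs, rendererNs, currentNs) {
  return {
    // audio written to the stack ahead of what the hardware has played:
    latencyNs: currentNs - rendererNs,
    // elapsed system time over elapsed hardware time; a value other
    // than 1 indicates audio/system clock drift:
    driftRatio: (systemNs - triggerTimeNs) / rendererNs,
  };
}

// Example: 100 ms since the transfer started on both clocks (no drift),
// with 30 ms of audio written but not yet played.
const r = analyze(0, 100_000_000, 100_000_000, 130_000_000);
console.log(r.latencyNs, r.driftRatio); // 30000000 1
```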