W3C

- DRAFT -

Audio Working Group Teleconference

25 Apr 2013

Agenda: http://lists.w3.org/Archives/Public/public-audio/2013AprJun/0202.html

See also: IRC log

Attendees

Present
olivier, chrislowis, padenot, joe, crogers, ehsan, gmandyam
Regrets
Chair
Olivier
Scribe
chrislowis

Contents

    Topics
        1. clarify the definition of various time-related values
        2. "finished" event on AudioBufferSourceNode and OscillatorNode
        3. AOB
    Summary of Action Items

<olivier> trackbot, start meeting

<trackbot> Date: 25 April 2013

<padenot> Zakim: Mozilla is me

<padenot> Zakim: [Mozilla] is me

<scribe> scribenick: chrislowis

<padenot> not sure how this is supposed to work.

olivier: I'll scribe.
... :D

clarify the definition of various time-related values

<olivier> https://www.w3.org/Bugs/Public/show_bug.cgi?id=20698

olivier: Joe can you give us a quick summary?

joe: I've had some exchanges with ehsan that are not included in the bug link.
... for both gaming and music there is a need to be able to synchronise scheduled sounds with visual things that are happening.
... and potentially other things under the browser's control.
... what's become apparent is that there is not a very clear relationship in the spec between the audio timeline and these events.
... for example:
... I thought audioContext.currentTime was within some minimal interval of what the user actually perceives. This is not the case on Android, where latency can be much higher.
... it also doesn't include synthesis time.
... all it says is that sample time increases monotonically, and it also doesn't specify that it advances in lock-step with "real" time.

olivier: I can't hear any noise.

joe: I don't know what the solution is or should be, but at the root of it is the need to synchronise.

crogers: that was a good summary.
... the current time starts at zero when the context is created.
... it's an arbitrary start point, in effect.

joe: what zero means is not so important.

crogers: that's right.
... audio processing happens in blocks of 128 sample frames at a time.
... so basically, currentTime advances every time you process a block of 128 sample frames.
... so ~3ms at 44.1kHz
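
To illustrate the point above, a minimal sketch (assuming a standard, possibly vendor-prefixed, AudioContext) of how this block-quantised advance of currentTime can be observed from script; the exact step seen on the main thread is implementation-dependent, as discussed below:

    var ctx = new (window.AudioContext || window.webkitAudioContext)();
    var blockSeconds = 128 / ctx.sampleRate; // ~0.0029 s at 44.1 kHz
    var last = ctx.currentTime;
    setInterval(function () {
      var now = ctx.currentTime;
      if (now !== last) {
        // How many 128-frame blocks elapse between observations varies
        // by platform (see the Mac/Windows/Linux figures below).
        console.log('advanced ' + (now - last).toFixed(4) + ' s (~' +
                    Math.round((now - last) / blockSeconds) + ' blocks)');
        last = now;
      }
    }, 1);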

joe: that is important and should be stated.
... but we don't know that a sample frame will be calculated at specific intervals. How many blocks are being calculated could vary.

crogers: it's complicated ... it is very regular in some implementations and in others it's not. On the Mac it's very regular, every 128 frames. On Windows, it'll try to process in blocks of 480 samples: in real time, if you graphed it out it would look funny, roughly every 10ms.
... on Linux, currentTime updates only roughly every 50ms.

joe: but there's nothing that says that implementors have to observe this at the moment. The number of blocks calculated could vary dynamically.

ehsan: I think that not specifying the interval is an advantage from an implementation point of view. These implementation concerns should not matter to application developers - they should just care about specifying an effect that happens at a precise point in time.
... I think hardware latency has to be considered separately.

crogers: I agree with ehsan. As long as you're able to specify when things happen, that's what matters. It would be nice if we could be more precise, but platform concerns make that difficult.

joe: I agree with both of you. The granularity is not important, and developers shouldn't have to care about it.
... but there's no way to tell what the offset is between creating an event and when it happens. And it may not be constant.

crogers: it would have to be constant to avoid changing the pitch?

joe: it could jitter back and forth in the short term ... ?
... but either way, I think the spec should say what is happening.

crogers: what you say is true; because of the way currentTime updates, it can wobble. If you have live audio input and you plug your guitar in, and you pluck a note and hear a certain latency, hopefully not very high, whatever that delay is, it will always be the same whenever you play a note. Does that make sense?

joe: perfectly. We need to say that currentTime is the time of the next block to be generated by the graph, because it doesn't say that anywhere.

ehsan: I'm not sure about that. Reading currentTime twice in a row can result in different values, which will be consistent. If you meant in the graph, I agree.

joe: I did.

<olivier> https://www.w3.org/Bugs/Public/show_bug.cgi?id=20698#c19 <- ehsan's suggestion

ehsan: I suggested adding a read-only property that will return a constant value, which will give the platform's idea of how high the latency is.

joe: at the moment, I have a hard-coded constant which feels wrong.

crogers: so we could add a "presentation latency"?

joe: it needs to be the difference between audioContext.currentTime and when you hear a sound coming out.

ehsan: I don't think we can implement what you want. There are latency factors that are out of the implementation's control.

joe: it's a serious problem for any game developer - if you have an explosion on the screen you need to know how to make the audio match that.

ehsan: but the latency depends on what will happen in the future? We don't know how soon a graph will send a sound.
... I don't have any objection to having a "presentation latency" which only takes into account things that are in the implementation's control.

joe: that would satisfy me.
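
As a sketch of the idea just agreed: the attribute name presentationLatency and the helpers below are hypothetical; only the concept of a read-only value covering latency under the implementation's control was discussed:

    var ctx = new (window.AudioContext || window.webkitAudioContext)();

    function scheduleExplosionSound(when, buffer) {
      // "presentationLatency" is a hypothetical name for the suggested
      // read-only latency value; it is not in the spec.
      var latency = ctx.presentationLatency || 0;
      var source = ctx.createBufferSource();
      source.buffer = buffer;
      source.connect(ctx.destination);
      source.start(when);
      // Fire the matching visual roughly when the sound is actually heard.
      scheduleVisualAt(when + latency); // hypothetical helper
    }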

ehsan: On the web platform, we normally specify what an output should look like, and allow people to compete on quality of implementation.

gmandyam: To joe, we discussed this in terms of WebRTC too, and we were talking about time references. One of the suggestions was to leverage an NTP time reference. If you're dealing with a shifting time reference it could be constantly, dynamically adjusting. How accurate do you need this measurement?

crogers: I disagree. (Using the example above with a guitar), the latency will always be the same.

gmandyam: I'm asking what the tolerance is for this use case?

joe: I think 50ms is a reasonable tolerance...

gmandyam: ±50ms on the latency?

joe: I think so. For me, others might disagree.

crogers: how do you differentiate between currentTime and performance.now() time?

joe: yes, that's a whole other discussion ...

crogers: I don't know whether we should create another attribute "performance_time", which would sit alongside currentTime, or have a translation method. We could do either one, I suppose?

ehsan: I'd suggest having a method which takes a double in the context's time coordinates and returns a timestamp.

<gmandyam> Clarification to crogers: Taking a latency measurement with respect to a shifting clock will always result in inaccuracy. But it looks like the tolerance levels Joe is looking for are large enough for this not to matter.

crogers: and the opposite?

ehsan: yes, why not.

crogers: ok.
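
A sketch of the two conversions just discussed; the method names are hypothetical placeholders for whatever the spec ends up defining:

    var ctx = new (window.AudioContext || window.webkitAudioContext)();

    // Convert a context-time value to a performance.now()-style timestamp,
    // and back again. Both method names are hypothetical.
    var perfTime = ctx.contextTimeToPerformanceTime(ctx.currentTime);
    var ctxTime = ctx.performanceTimeToContextTime(performance.now());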

gmandyam: can we also put an accuracy requirement on it? Or is that over-specifying it?

ehsan: I would prefer not to have that in the spec.
... 50ms might look acceptable today, but 5 years from now that might look really high.

olivier: Joe - could you document those 3 things?
... how latency works
... a way to query latency
... and the two methods to convert performance time to context time and vice versa.
... in BUG 20698 maybe?

<olivier> 20698

joe: sure, I'll put it in there and retitle if necessary.

"finished" event on AudioBufferSourceNode and OscillatorNode

ehsan: currently we have a finished method on AudioBufferSourceNode and OscillatorNode.
... the problem is that, the way it's currently specified, it's possible for its value to change if you call it twice.

<olivier> Relevant Thread -> http://lists.w3.org/Archives/Public/public-audio/2013JanMar/thread.html#msg431

ehsan: the proposal is just to add a finished event on the two nodes to signal to the context exactly when a node has finished its work.
... Are people happy for me to do that? And what should it be called?
... I'm happy to call it "ended" as has been suggested.

olivier: there were discussions on the list about which nodes this applies to. Was there any objection to this just applying to these two nodes?

<olivier> (AudioBufferSourceNode and OscillatorNode)

ehsan: I'd like to keep it applied to just those two nodes. Perhaps in the future we could extend it to other node types.
... the immediate need is to have something other than playback state on these two nodes.

gmandyam: we're trying to come up with a name in the mediacapture spec.
... I have no opinion one way or another, but suggest taking a look there.

ehsan: WebRTC uses "ended". I have no objection to that.
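
A sketch of how the proposed event might look in use, assuming the name "ended" is adopted:

    var source = ctx.createBufferSource();
    source.buffer = someDecodedBuffer; // assumed to exist
    source.connect(ctx.destination);
    source.onended = function () {
      // The node has finished producing audio; references can be released.
      console.log('source ended');
    };
    source.start(0);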

crogers: ehsan, is it your understanding that if you don't have any listeners on the node you wouldn't dispatch an event?

ehsan: I think that's best left to the implementation to decide.

crogers: that's what I'd hope you'd say!

ehsan: the amount of work we have to do in gecko to work out which listeners to notify is large.

crogers: the reason I bring this up: the main thread is different to the audio thread, so at least in webkit and blink we have to call on the main thread to make this happen.

ehsan: that's what we'd do on gecko too.

crogers: so if there are no event listeners to receive the event, then I don't want to call from one thread to the other, as it's a heavy operation.
... I agree with removing playback state and adding this event, but I haven't read through this thread fully yet. I just want to have a look and see if anyone is actually using this playback state in content out there before agreeing to remove it.

ehsan: do you have a good way of finding out?

crogers: I'd just check a few high profile sites.

AOB

ehsan: last night I found out that there is some documentation effort underway at Mozilla looking at Web Audio.
... I'd encourage people to take a look, give feedback and even better to contribute if at all possible.

https://developer.mozilla.org/en-US/docs/Web_Audio_API

olivier: how would you feel about contributing this documentation to the web platform docs?

ehsan: I don't know if there's a simple way, but even if it means cut and paste I don't see why not. I'll reach out to the team responsible.

olivier: I'll check with Doug to see what he thinks.
... could someone mention it on the developer community list?

ehsan: I'd just like to take a look over it first and review it before talking about it there.
... the other announcement I have is about testing.
... I finished ScriptProcessorNode the other week, and we've been playing around with using that for testing.
... I submitted a pull request to the repository.
... it's a simple test, but I'd like to see what other people think about it.

gmandyam: have you verified that in chrome too?

ehsan: yes.
... the biggest trick I used is to only test a single buffer.
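
For illustration, one way a single-buffer ScriptProcessorNode test might be structured (a sketch only, not the code from the pull request):

    var ctx = new (window.AudioContext || window.webkitAudioContext)();
    var osc = ctx.createOscillator();
    var proc = ctx.createScriptProcessor(256, 1, 1);

    proc.onaudioprocess = function (e) {
      var samples = e.inputBuffer.getChannelData(0);
      // Only the first buffer is checked, then the graph is torn down,
      // which sidesteps jitter in later real-time buffers.
      checkExpectedSamples(samples); // hypothetical assertion helper
      osc.stop(0);
      osc.disconnect();
      proc.disconnect();
    };

    osc.connect(proc);
    proc.connect(ctx.destination);
    osc.start(0);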

crogers: one of the things we talked about is that I'm concerned that the latency might cause us problems using the ScriptProcessorNode. And the other thing is that if we don't have an OfflineAudioContext, the tests will run in real time.

ehsan: I have nothing against testing with OfflineAudioContext. The only problem is that it's not specced fully, so it's really hard at this point for me to implement it.

crogers: I've tried to improve that section a bit in the last couple of weeks, so if you could take a look over it, I'd really appreciate it.
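
By contrast, a sketch of the same kind of check against an OfflineAudioContext, which renders faster than real time:

    // 1 channel, 44100 frames (1 second) at 44.1 kHz.
    var offline = new OfflineAudioContext(1, 44100, 44100);
    var osc = offline.createOscillator();
    osc.connect(offline.destination);
    osc.start(0);

    offline.oncomplete = function (e) {
      var samples = e.renderedBuffer.getChannelData(0);
      checkExpectedSamples(samples); // hypothetical assertion helper
    };
    offline.startRendering();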

joe: I posted a 6-point list of improvements - any thoughts?

crogers: I'll go back to that, see what I can get from it, and see what I can do now.

joe: not sure if I captured all of ehsan's concerns but hopefully it moves things on.

ehsan: I'm really sorry, I think this is one of the emails I've missed. I'll take a look at those things soon.

chrislowis: I'll look at rationalising the testing frameworks asap.

olivier: Ok, so I'll wrap up the call for now. A good candidate for the 9th May call is OfflineAudioContext.

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2013/04/25 17:01:14 $
