Audio WG -- 18 Oct 2011

<chris> anybody know the call-in number and passcode?

<kinetik> Zakim: IPcaller.a is kinetik

<quinnirill> case sensitive :)

<kinetik> zakim: IPcaller.a is kinetik

<chris> phone number?

<tmichel> Bridge US: +1-617-761-6200 (Zakim)

<kinetik> thanks

<quinnirill> np

<chris> tmichel: thanks - and the code?

<tmichel> the passcode is 26631#

Al is talking about the different APIs we have and asking to open up discussion.

<chris> thanks :)

Jo suggests a useful starting point might be to ask a question about implementation.

<tmichel> chris has joined ?

<chris> yes

Jo suggests we need to consider the use cases - streaming and synthesis.

Chris Rogers joins and asks for a brief summary.

Jo asks whether a specific concrete implementation could serve the two different use cases.

Jo thinks the APIs can be made to coincide. But can an implementation work in both worlds?

Chris Rogers says he recently posted some example code integrating the audio API with the streaming API / Web RTC implementation.

Chris R says there has to be some point of integration between these different APIs.

The public web rtc implementation looks good to Chris for setting up p2p connections and dealing with them at a high level, and he's given some examples in the last couple of days showing how the web audio API could work with it.

He thinks the implementation would be possible in the next couple of months in Chrome.

The distinction is the objects don't have to be the same.

Joe asks what the arguments are for unification.

Chris Rogers says that a unified API would become harder to develop the larger it gets (adding audio event scheduling to the p2p API, for example).

Namespace collision, fragile base class problem and so on.

Al asks roc for his ideas.

roc says he's addressed quite a lot of issues by email, asks whether to recap.

Joe would like roc to talk a little about Chris's suggestions.

quinnirill: I think that's JoeB

<Alistair> that was JoeB

<quinnirill> yeah, just trying to identify who is echoing :)

roc says we don't yet have a concrete namespace collision problem.

<quinnirill> ty :)

roc can't remember an occasion when we've split objects on the web to make them simpler.

roc: but it's hard to be sure of how things will evolve.

Al says it seems that we have different APIs for different use cases:

Audio API for games, synthesis and so on.

And RTC for streaming.

It seems to make sense for RTC to have some audio features, and it would make sense to see if we could share some core features.

Maybe a basic audio spec for mixing and panning that we could share, then pass the output of that to a separate mixing level.

But that might lead to redundancy.

Chris is talking about Al's questions about controlling the audio of each tab.

For mixing and panning Chris says that's the bread-and-butter of the Audio API.

It's able to take a number of sources, including the web rtc APIs, and mix them, changing the volume and pan of each.

Al is asking whether it would make sense to split the API into two different APIs, while recognising that that runs contrary to roc's opinion.

He says it would make sense to share functionality at some level.

JoeB asks: if the mixing part of the API is simple, why can't it just be spliced into the audio path at the point where it's needed?

Al understands the argument, and clarifies that the split need not be in code; it could be a split in the standardisation effort.

Al has talked to musicians and they would like the synthesis/effects pinned down to the sample accuracy.

But because we have the push from RTC for communications in the browser, which could use things like compression and noise reduction, it seems like we could take the set of most-used features and put that in one deliverable, and leave the more complicated features a bit further out.

Chris Rogers(?) thinks that phasing the support levels of APIs in general is a great idea.

Chris doesn't think we need to standardise to the sample level when we roll out, just as web graphics specs don't specify rendering down to the pixel level.

roc - authors don't generally care about graphics anti-aliasing, whereas musicians maybe do care, as Al points out.

If implementations produce things that sound the same we'll be OK, but if they sound different then not.

Chris thinks we can produce something that sounds the same across implementations. Most things are specified to the sample level in the Web Audio API (except perhaps the compression stuff, which nonetheless follows established principles; there's no agreed 'standard' for compressors).

Al thinks the point about starting out without full precision is a good one.

Chris says that the impulse response of the convolution filters exactly defines how those effects work.
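
A minimal sketch of the point about impulse responses, assuming a plain direct-form convolution over lists of samples. The function name and the example impulse response are hypothetical and are not taken from either proposal; the sketch only illustrates that the output is fully determined by the input and the impulse response.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution: y[n] = sum over k of h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# Hypothetical three-sample impulse response, for illustration only.
ir = [0.5, 0.25, 0.125]

# Feeding a unit impulse through the effect reproduces the impulse response,
# so two implementations using the same response produce the same samples.
response = convolve([1.0, 0.0, 0.0], ir)
```

A real implementation would probably use an FFT-based method for long responses; that is an efficiency choice and should not change the result beyond rounding.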

And mixing and panning, and filters can all be specified precisely in a mathematical sense
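
As a rough illustration of mixing and panning being specified mathematically, here is a sketch that assumes an equal-power (sine/cosine) pan law. That law is one common choice and is an assumption here, not something the minutes attribute to either API; `pan_gains` and `mix` are hypothetical names.

```python
import math


def pan_gains(position):
    """Equal-power gains for a position in [-1.0 (left), 1.0 (right)]."""
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


def mix(sources):
    """Sum (samples, gain, position) sources into stereo (left, right) lists."""
    length = max(len(samples) for samples, _, _ in sources)
    left = [0.0] * length
    right = [0.0] * length
    for samples, gain, position in sources:
        gain_left, gain_right = pan_gains(position)
        for i, sample in enumerate(samples):
            left[i] += sample * gain * gain_left
            right[i] += sample * gain * gain_right
    return left, right


# One source at half volume, panned hard left: only the left channel is non-zero.
left, right = mix([([1.0, 1.0], 0.5, -1.0)])
```

Because each output sample is a fixed weighted sum of input samples, two implementations following the same formula should agree to within floating-point rounding.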

Al asks roc how far he's thought about rendering audio and what the signal path would be.

roc - wants to integrate the framework with media elements, that's what he's working on now.

Then he wants to test it with hundreds of streams at the same time and see what kind of performance he can get.

Al - that would be interesting to see.

Al asks roc to talk about pulling the audio api into the streaming api.

roc doesn't have a lot of experience with effects, and would like to see the effects made available in a common proposal.

roc's looking at a simple mixing effect at the moment.

Al asks Chris Rogers about copyright issues around his API

Chris says it's all open source.

Chris would like to see some more agreement with roc to take the Web Audio code as is and move it into the Gecko code base to integrate it there.

As an alternative to roc working to reimplement from scratch.

roc says most of the effort so far has been on synchronisation issues, blocking issues and so on.

And they'll remain issues even if we take a bridging approach.

Roc - does Web Audio have the ability to notice that streams have stopped?

Chris - is the issue around synchronising audio elements in the case of buffer underruns?

Roc - it's also about syncing filters and mixers and so on, so you don't filter silence while you're waiting for rebuffering.

Chris - if it's only going to pause for a second or so you can keep running the filter even with silence going through it.

Roc - but that might not work if there's an element that you're waiting for.

Al asks if we can add handlers for these things and let developers worry about it.

Roc - you have to handle it in real time which is very tricky, and is better handled by the browser.

Chris thinks that kind of work needs to be done in the browser, but is not sure about the syncing of effects. Would like to discuss that further later.

Chris - we're talking about streaming HTML element streams, a kind of stream that can be blocked, and we could work on the HTML media element APIs to allow synchronisation of streams.

Roc mentions the media controller proposal which does some of that.

JoeB - by bridging the audio graph to the RTC API, the developer would have control over how granularly they wanted to handle the blocking and synchronisation issues.

Chris agrees. In the audio context there is no "blocking"; everything is a continuous stream, which may be silent for periods.

Chris thinks the syncing should be in the HTML element or the controller proposal (with which he's not so familiar).

<roc> http://www.whatwg.org/specs/web-apps/current-work/#mediacontroller

(the proposal roc mentioned at 20:51)

Joe is talking about treating the whole graph as something that blocks.

Chris proposes that the blocking and syncing be handled externally: if something is blocked it just inputs silence into the audio API graph, with no notion of transmitting blocking information to the audio API.

The audio api doesn't need to care about it.

Roc mentions that if you want audio to be in sync with video that's not what you want.

<roc> oops dropped off

Chris disagrees - it doesn't mean that we'd lose synchronisation, it would just process silence.

<roc> yeah I'm back

<tmichel> RRSagent help

<quinnirill> tmichel: publish minutes maybe?

tmichel: is that what you wanted to do?

<quinnirill> haha, that's an interesting command

Chris - imagine you have a video tag with transport controls. The audio is going through a reverb. If the video freezes the audio stops but the tail of the reverb carries on.

Chris - when you hit play again, the audio would go back through the reverb.

<tmichel> right I wanted to make the minutes public. thanks.

If you didn't want the reverb tail to sound, you could use javascript to turn off the reverb at the point that "pause" was pressed.
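
A small sketch of the behaviour Chris describes, assuming a block-based convolver that carries its tail from one block to the next. The class name, block size and impulse response are hypothetical; this is not code from the Web Audio implementation.

```python
class StreamingConvolver:
    """Convolves audio block by block, carrying the tail between blocks."""

    def __init__(self, impulse_response):
        self.ir = list(impulse_response)
        # Samples that spill past the end of a block and sound in the next one.
        self.tail = [0.0] * (len(self.ir) - 1)

    def process(self, block):
        out = [0.0] * (len(block) + len(self.ir) - 1)
        for n, x in enumerate(block):
            for k, h in enumerate(self.ir):
                out[n + k] += x * h
        # Add the tail left over from earlier blocks.
        for i, t in enumerate(self.tail):
            out[i] += t
        self.tail = out[len(block):]
        return out[:len(block)]


reverb = StreamingConvolver([1.0, 0.5, 0.25])  # hypothetical short "reverb"
playing = reverb.process([1.0, 0.0])  # video playing: a click goes in
paused = reverb.process([0.0, 0.0])   # video frozen: the source inputs silence
later = reverb.process([0.0, 0.0])    # still frozen: the tail has died away
```

Here `paused` still contains the end of the reverb even though only silence went in, and `later` is silent. Resetting or bypassing the effect when "pause" is pressed, as suggested above, would amount to clearing `tail`.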

tmichel: I'll publish them at the end of the call.

<tmichel> They have already been published ...

tmichel: yeah, I think we need to add some metadata at the end.

roc - so the latency of the filter isn't an issue?

Chris - for the most part no, the effects built into the audio API don't have latency.

Chris - sometimes with a delay you'd mix wet and dry; that's not latency as such, it's part of the effect.

Al closes the meeting for today. Let's keep the discussion going on the thread.

<scribe> Scribe: Chris Lowis

<scribe> ScribeNick: chrislo

Alistair: still there?

<Alistair> yes

<Alistair> chrislo: thanks so much for scribing, i really appreciated it
