F2F Jan 2012

From Audio WG Wiki

The Audio Working Group held its second face-to-face meeting in California (Google LA offices) at the end of January 2012.

Attendees were Olivier Thereaux (Chair, BBC), Thierry Michel (W3C, staff contact), Doug Schepers (W3C, staff contact), Joe Berkovitz (Noteflight), Chris Rogers (Google), Chris Lowis (BBC). Alistair McDonald (Co-Chair) joined by phone for part of the meeting.


For the record, here was the proposed agenda, which was refined during the meeting.

  • DAY 1 - 23 Jan: Use cases and requirements document (going through UC and figuring out list of requirements)
  • DAY 2 - 24 Jan:
    • Morning: demos and reviews of current specs and implementations, start discussion on prioritization
    • Afternoon: Discussion on minimum viable spec
    • Prioritise the Requirements / Scope (set some aside for a v.next)
    • roadmap to next pub


Goals for the meeting

  • Olivier: wants to have UC&R document in good shape, and have a sense of the scope for the “Minimal Viable spec”
  • Joe: ditto, plus hopes to be able to evaluate the APIs on their merits
  • Thierry: also would like to think how core features will be implemented
  • ChrisR: dealing with two specs. Audio API has had a lot of scrutiny, now being shipped on 3 platforms and being used. Worried about time it will take. Would like perhaps to go through ROC’s blog post?
  • ChrisL: over the next couple of days, maybe we can look at the spec proposals against the use cases and figure out how to get input from developers

Use Cases and Requirements

10:45am - we started working through the use cases capturing additions and notes as we go.

ACTION ON: each UC editor - add these to the use case document after/during the meeting.

Use Case 1: Video Chat


  • Additions
    • volume detection / visualisation (signal level meter that goes “into the red” when too loud)
    • there is an interface to surface each of the participants’ audio to the environment of this API
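The signal level meter mentioned above could be sketched roughly as follows. This is only an illustration of the idea, not something discussed at the meeting: the function names `rms_dbfs` and `in_the_red` and the -3 dBFS threshold are assumptions chosen for the example, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level of a block of float samples in dBFS.

    0 dBFS corresponds to a full-scale square wave; silence (or an
    empty block) is reported as negative infinity.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10 * log10 of the mean square equals 20 * log10 of the RMS value.
    return 10.0 * math.log10(mean_square)


def in_the_red(samples, threshold_db=-3.0):
    """True when the block is louder than the (hypothetical) red-zone threshold."""
    return rms_dbfs(samples) > threshold_db
```

A meter UI would call `rms_dbfs` on each incoming block of audio and switch its display to red whenever `in_the_red` returns true.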

Clarification to Note 1: This is a shared use case between us and WebRTC.

ACTION ON: Doug - add a separate use case for the record and playback of audio.

Extra requirements. We talked about “expander” effects, specifically the noise gates: http://en.wikipedia.org/wiki/Noise_gate which would be used in the Video Chat use case.
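For readers unfamiliar with noise gates, here is a minimal sketch of one, assuming a simple peak envelope follower with instant attack and exponential release. The function name, the threshold of 0.05 and the release coefficient are illustrative assumptions, not values from the discussion.

```python
def noise_gate(samples, threshold=0.05, release=0.99):
    """Mute samples while the signal envelope is below the threshold.

    The envelope jumps up immediately on a loud sample and decays by
    the `release` factor per sample, so the gate stays open briefly
    after speech stops instead of chopping off word endings.
    """
    out = []
    envelope = 0.0
    for s in samples:
        envelope = max(abs(s), envelope * release)
        out.append(s if envelope >= threshold else 0.0)
    return out
```

In the Video Chat scenario this would suppress low-level background noise between utterances while letting speech through unchanged.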

Panning / spatialization

  • two types of spatialization: equal-power (level) panning and binaural HRTF-based spatialization. We also discussed spatialization with 5.1 surround-sound systems.
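As an illustration of the simpler of the two approaches, equal-power panning can be sketched as below. The mapping of a pan position in [-1, 1] onto a quarter circle is one common formulation and is an assumption here, as is the function name; HRTF-based spatialization requires measured impulse responses and is not sketched.

```python
import math


def equal_power_gains(pan):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is centre, 1.0 is hard right. The squared
    gains always sum to 1, so perceived loudness stays roughly constant
    as the source moves across the stereo field.
    """
    pan = max(-1.0, min(1.0, pan))
    angle = (pan + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)
```

At the centre position both gains are about 0.707 (-3 dB), rather than 0.5 as with plain linear panning.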

Discussing the “voice effects” part of the use case, we settled on the requirement to apply arbitrary manipulation of channels of audio (using JavaScript), rather than capturing requirements here for the exact nature of the effects.

Doug and Chris introduced the concept of “shaders” in graphics - a language for performing arbitrary transforms. Chris believes that the audio “shaders” language is JavaScript, as opposed to custom audio-processing languages (SAOL, Csound, ChucK, SuperCollider, etc.). In audio there is no such thing as a commonly accepted extension language, and there are also security implications to executing arbitrary code in the browser.

Use Case 2: HTML5 game with audio effects, music

13.00 - started working through UC2.

ACTION ON: Olivier - add a section where the character hears the clock (muffled) from another room.

We discussed the need to start and stop sounds, as well as control parameters, at precise, specified points in time. One example is cross-fading between two different background audio tracks at a particular point in time.
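A scheduled cross-fade of this kind might look like the sketch below. It assumes times expressed in seconds on a shared clock and an equal-power fade curve; both choices, and the names `frame_at` and `crossfade_gains`, are assumptions made for illustration rather than anything the group decided.

```python
import math


def frame_at(seconds, sample_rate=44100):
    """Convert a time in seconds to the nearest sample frame index."""
    return int(round(seconds * sample_rate))


def crossfade_gains(t, start, duration):
    """Return (outgoing_gain, incoming_gain) at time t.

    Before `start` only the outgoing track is heard, after
    `start + duration` only the incoming one. In between, an
    equal-power curve keeps the combined loudness roughly steady.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    progress = min(1.0, max(0.0, (t - start) / duration))
    angle = progress * math.pi / 2.0
    return math.cos(angle), math.sin(angle)
```

For example, a fade scheduled to begin at 10 s and last 2 s gives both tracks a gain of about 0.707 at the 11 s midpoint.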

Chris Rogers mentioned Plink as an example of generative music.

Joe suggests adding “ability to simulate an acoustical environment”. Chris mentions other audio game libraries (e.g. FMOD Ex / OpenAL) have this ability.

ACTION ON: Olivier - add a section where the snake turns towards the user and the sound changes with the direction of the snake’s head (directivity of the source/“sound cones”)
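One common way to model a “sound cone” is sketched below: full gain inside an inner cone, a reduced gain outside an outer cone, and a linear blend in between. The parameter names and default values are hypothetical and only meant to make the directivity idea concrete.

```python
def cone_gain(angle_deg, inner_deg=60.0, outer_deg=180.0, outer_gain=0.2):
    """Gain applied to a directional source.

    `angle_deg` is the angle between the direction the source faces and
    the direction from the source to the listener. `inner_deg` and
    `outer_deg` are full cone widths, so half of each is compared
    against the angle.
    """
    angle = abs(angle_deg)
    inner_half = inner_deg / 2.0
    outer_half = outer_deg / 2.0
    if angle <= inner_half:
        return 1.0
    if angle >= outer_half:
        return outer_gain
    # Linear interpolation between full gain and the outer gain.
    x = (angle - inner_half) / (outer_half - inner_half)
    return 1.0 + x * (outer_gain - 1.0)
```

With these defaults, a snake facing the listener is heard at full volume, and one facing directly away is heard at 20% gain.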

ACTION ON: Olivier - add something that captures Doppler shift effects (e.g. police siren moving towards the user).
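For reference, the pitch change in the siren example follows the classical Doppler formula for a stationary listener. The sketch below assumes sound travelling in air at about 343 m/s; the function name and the sign convention (positive speed means approaching) are assumptions for the example.

```python
SPEED_OF_SOUND = 343.0  # metres per second, air at roughly 20 degrees C


def doppler_frequency(source_hz, speed_towards_listener,
                      speed_of_sound=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener.

    A positive `speed_towards_listener` (m/s) means the source is
    approaching, which raises the pitch; a negative value means it is
    receding, which lowers it.
    """
    if abs(speed_towards_listener) >= speed_of_sound:
        raise ValueError("source speed must be below the speed of sound")
    return source_hz * speed_of_sound / (speed_of_sound - speed_towards_listener)
```

A 1000 Hz siren approaching at 34.3 m/s (about 123 km/h) would be heard at roughly 1111 Hz, and at roughly 909 Hz once it is moving away at the same speed.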

Use Case 3: Online Music Production Tool

13:37 - reading through Chris’s use case

Chris points out that this kind of tool takes many, many person-years of development. Even if the APIs provide the basic building blocks, a full-featured DAW application could well be a long way off. He also mentions SoundCloud’s experiments (https://github.com/georgi/soundcutter)

We captured an overarching set of requirements that came out at this stage (given that this use case covers a lot of ground).

  • Low latency
  • Modularity
  • Graceful degradation (of an application built with v2 of the API running on a browser implementing v1)
  • Flexible signal routing
  • Being able to change/modulate parameters of processing functions
  • Arbitrary processing encapsulated as modules
  • a library of common processing tasks is provided “out of the box”
  • level detection on a signal
  • filters
  • amplification/attenuation
  • non-blocking playback of any given audio source in arbitrary overlapping instances
  • spatialization, reverb
  • sample-accurate scheduling
  • buffering/recording

Use Case 4: Online Radio Broadcast



levelers (compressors with a long time constant)

Metadata should be left untouched by the processing API, but is ultimately the responsibility of the codec.

ACTION ON: Thierry - Cross-reference the Meta-data API.

ACTION ON: ChrisL to copy relevant requirements from UC1 to UC4 (and add ducking and levelling requirements)

Use Case 6: Synthesis of a virtual instrument



Joe takes us through this use case. He mentions that the wavetable synthesis approach he describes has some similarities to the SoundFont and DLS specifications.

Rename: Wave table synthesis of a virtual music instrument

ChrisL wonders whether to split into 6a sequencing, 6b wavetable synthesis, 6c FM synthesis etc. Joe agrees and notes the requirements are already separated.

ChrisL asks whether wavetable synthesis could be provided as a function in the APIs. ChrisR and Joe have discussed this in the past and concluded that it would “bake in” one particular implementation of a wavetable synth. ChrisR prefers a modular approach so the developer can choose, for example, the order of filter to use in the wavetable synth implementation.
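To make the modular argument concrete, here is a rough sketch in which the oscillator and the filter are separate building blocks that a developer combines. It is not how either proposed API works; the function names, the linear interpolation and the one-pole low-pass filter are all assumptions picked for brevity.

```python
import math


def make_sine_table(size=1024):
    """Build one cycle of a sine wave as a lookup table."""
    return [math.sin(2.0 * math.pi * i / size) for i in range(size)]


def wavetable_oscillator(table, frequency, sample_rate, num_samples):
    """Read through `table` at the given frequency with linear interpolation."""
    size = len(table)
    step = frequency * size / sample_rate  # table positions per output sample
    phase = 0.0
    out = []
    for _ in range(num_samples):
        index = int(phase)
        frac = phase - index
        a = table[index]
        b = table[(index + 1) % size]  # wrap around at the end of the table
        out.append(a + frac * (b - a))
        phase = (phase + step) % size
    return out


def one_pole_lowpass(samples, alpha):
    """A first-order low-pass filter; alpha in (0, 1], 1.0 passes the input through."""
    out = []
    y = 0.0
    for x in samples:
        y = y + alpha * (x - y)
        out.append(y)
    return out
```

A developer wanting a steeper filter could swap `one_pole_lowpass` for a higher-order design without touching the oscillator, which is the flexibility a single built-in wavetable synth function would take away.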

Discussion on content protection around virtual instruments. We need to capture some requirements around this, probably from a party who has experience of this in the non-web world.

ACTION ON: Joe to think about gathering requirements (from people in the softsynth industry) around content protection on virtual instrument definition. Talk to Olivier to coordinate with the TV interest group.

Doug is also concerned about the bandwidth requirements for downloading large wavetables. ChrisR mentions that compression helps with the download size. Caching also helps (browser cache, File API). ChrisR points out that the Web Audio API has a node for “decompressing” data from, e.g., the File API before it is used in other nodes.

Use Case 5: Writing Music on the Web

17:43 http://www.w3.org/2011/audio/wiki/Use_Cases_and_Requirements#UC_5:_writing_music_on_the_web

Chris Rogers’ presentation & demos

Note: we need to talk about documenting how our work interacts with <audio> element.

Olivier notes the question of making it work on TV (weak processing capabilities, but a need for the API to work).

Discussion on performance approaches.

ACTION ON: CRogers to add web workers capability to Web Audio API

Discussion on next steps for publication.

Next in the pipeline:

  • bug fix in audioPanner node
  • extra attribute added to ConvolverNode
  • other small additions

ACTION ON: CRogers to prepare review of changes to Web Audio API spec by next teleconference

Use Cases and Requirements (cont’d)

UC 8: UI/DOM Sounds


UC 9: Audio Speed

12:00 http://www.w3.org/2011/audio/wiki/Use_Cases_and_Requirements#UC_9:_Audio_Speed

Discussing renaming the use cases to not be about the feature, but more about the scenario (language learning, podcast playing, DJ music, etc)

ACTION ON: Olivier to reorganise the video editing tool UCs together

CRogers notes that the HTML5 audio element has a way to speed up playback without changing pitch (webkit-preserves-pitch? Also in moz?)

Note: UC 9-c : JS library - https://github.com/janesconference/Voron/blob/master/voron.js

UC 7: Audio / Music Visualization



CRogers mentions this should be usable for audio production work.

Categorizing the Requirements

The group collects the requirements extracted from all the use cases into one document and categorizes them. The top categories include Sources of audio, Playing sources of audio, Transformations of sources of audio, and Source Combination and Interaction.

Joe suggests a table which would have, for each requirement, whether they are:

  • important
  • real-time
  • automatable

Use Case Heap

ACTION ON: CLowis to look through temp heap of use cases, extract anything that hasn’t been covered yet

ACTION ON: Olivier to send requirement doc examples to CLowis