Audio API Use Cases

From Audio Incubator
Revision as of 21:20, 9 August 2010 by Mgood

Core Use Cases - v1.1

Chris Grigg has put together "Core Use Cases - v1.0", which consists of several use-case classes and their basic descriptions. This document is expected to shift somewhat; please feel free to offer comments and suggestions on the mailing list.


Core Use Case Classes

  1. Usability/Accessibility Speech
  2. UI Sounds
  3. Basic Media Functionality
  4. Interactive Audio Functionality
  5. Audio Production Basics
  6. Audio Effects II: Mastering
  7. Audio Effects III: Spatial Simulation
  8. Digital Sheet Music


Definitions used below:

  • "sound" means recorded audio, synthetic audio, or synthetic music content (sequence + instruments)
  • "spoken" means speech content, either as recorded audio or as synthetic speech


Class 1. Usability/Accessibility Speech

These use cases address the basic usability of web pages in general. I mention them because the XG Charter includes speech synthesis; however, they may already be addressed in part by other W3C specs (for example CSS3 Speech, SSML, the Voice Browser WG, the Web Content Accessibility Guidelines, etc.).

  • Upon user click, mouseover, etc. (or as a preferences behavior):
    • Trigger spoken version of a web page's (or textual element's) text contents (for visually impaired users)
    • Trigger spoken help for a web page (or entry form)
  • On error:
    • Trigger spoken error message

Support for multiple alternate versions in different natural languages should be considered.
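Choosing among alternate natural-language versions of a spoken message could be as simple as a preference lookup. The TypeScript sketch below assumes each message is stored as a map from language tag to text (or a recording's URL) and uses a simplified form of BCP 47-style matching: an exact tag first, then the same primary language. `pickSpokenVersion` is a hypothetical name; its result would be handed to whatever speech synthesizer or audio player the page uses.

```typescript
// Hypothetical helper: choose which language version of a spoken message to use.
// Simplified matching: exact tag first, then same primary language, then fallback.
function pickSpokenVersion(
  available: Record<string, string>, // language tag -> text to synthesize (or recording URL)
  preferred: readonly string[],      // user's languages, most preferred first
  fallback: string,                  // tag to use when no preference matches
): string | undefined {
  const tags = Object.keys(available);
  for (const want of preferred) {
    const wanted = want.toLowerCase();
    // 1. Exact, case-insensitive match ("en-GB" matches "en-gb").
    const exact = tags.find((t) => t.toLowerCase() === wanted);
    if (exact !== undefined) return available[exact];
    // 2. Same primary language ("fr-CA" accepts "fr" or "fr-FR").
    const primary = wanted.split("-")[0];
    const related = tags.find((t) => t.toLowerCase().split("-")[0] === primary);
    if (related !== undefined) return available[related];
  }
  return available[fallback];
}

// Usage: a spoken error message available in two languages.
const emailError = {
  en: "Please enter a valid email address.",
  "fr-FR": "Veuillez saisir une adresse e-mail valide.",
};
const spoken = pickSpokenVersion(emailError, ["fr-CA", "en"], "en"); // French version
```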


Class 2. UI Sounds

These use cases bring to web apps the basic UI aural feedback (AKA 'sonification') typical of native apps and games. They may already be addressed in part by the HTML5 event handling model.

  • Trigger one or more sounds when:
    • User clicks (hovers, etc.) any given visual element within a web page
    • User presses Tab key to move to the next visual element within a web page
    • User presses a key while in a text entry field
    • A visual element within a web page changes its own state (open, resize, move, transition, close, etc.)
    • A window changes state (open, resize, move, transition, close, etc.)
    • (Etc.)
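One way a page could express the bindings listed above is a table from UI event kinds to the sounds they trigger. In this TypeScript sketch the event names, `UiSoundMap`, and the `play` callback are all hypothetical; playback itself is left to the caller, so no particular browser API is assumed.

```typescript
// Hypothetical UI event kinds, taken from the list above (not a DOM type).
type UiEvent = "click" | "hover" | "tab-focus" | "keypress" | "element-state" | "window-state";

// Maps each UI event kind to zero or more sound IDs, triggered in binding order.
class UiSoundMap {
  private readonly bindings = new Map<UiEvent, string[]>();

  bind(event: UiEvent, ...soundIds: string[]): this {
    const existing = this.bindings.get(event) ?? [];
    this.bindings.set(event, [...existing, ...soundIds]);
    return this;
  }

  soundsFor(event: UiEvent): readonly string[] {
    return this.bindings.get(event) ?? [];
  }
}

// Calls `play` once per bound sound; how a sound ID becomes audio is up to the page.
function dispatch(map: UiSoundMap, event: UiEvent, play: (soundId: string) => void): void {
  for (const id of map.soundsFor(event)) play(id);
}

// Usage: a soft click for buttons, two layered sounds when a window changes state.
const uiSounds = new UiSoundMap()
  .bind("click", "click-soft")
  .bind("window-state", "whoosh", "chime");
```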


Class 3. Basic Media Functionality

These use cases bring simple audio-for-visual-media behaviors to web applications. They may already be addressed in part by the HTML5 event handling model.

  • Automatically trigger one or more sounds:
    • In sync with an animated visual element (animated GIF, SVG, Timed Text, etc.)
    • As continuous background soundtrack when opening a web page, window, or site
  • Connect user events in visual elements (click etc.) to:
    • sound element transport controls
      • (play/pause/stop/rewind/locate-to/etc.)
    • music synthesis events
      • (note-on/note-off/control-change/program-change/bank-load/etc.)
  • Upon user click, mouseover, etc. (or as a preferences behavior):
    • Trigger speech containing additional informational content not present in page text (or visual element)
      • (consider multiple alternate versions in different natural languages)
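For the music-synthesis case, a click handler would translate user events into note-on/note-off messages. The sketch below assumes MIDI-style note numbers, 12-tone equal temperament with A4 (note 69) at 440 Hz, and a linear velocity-to-gain mapping; `NoteEvent` and `SimpleSynth` are hypothetical and only track which notes are sounding rather than producing audio.

```typescript
// MIDI note number -> frequency in Hz (12-tone equal temperament, A4 = note 69).
function noteToFrequency(note: number, a4 = 440): number {
  return a4 * Math.pow(2, (note - 69) / 12);
}

// Hypothetical event shape for the note-on/note-off events listed above.
type NoteEvent =
  | { kind: "note-on"; note: number; velocity: number }
  | { kind: "note-off"; note: number };

// Tracks sounding notes. A real synthesizer would start and release oscillators
// here; this sketch only records frequency and gain per active note.
class SimpleSynth {
  readonly active = new Map<number, { frequency: number; gain: number }>();

  handle(event: NoteEvent): void {
    if (event.kind === "note-on" && event.velocity > 0) {
      // MIDI velocity 1..127 mapped linearly to gain.
      this.active.set(event.note, {
        frequency: noteToFrequency(event.note),
        gain: event.velocity / 127,
      });
    } else {
      // note-off, or note-on with velocity 0 (treated as note-off in MIDI).
      this.active.delete(event.note);
    }
  }
}
```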


Class 4. Interactive Audio Functionality

These use cases support common user expectations of game audio, but can also improve the user experience for traditional web pages, sites, and apps. Interactive audio can be defined as (i) sound that changes based upon the current game/app state, and (ii) sound with enough built-in variation to reduce listener fatigue that would otherwise occur over the long timespans typical of many games.

  • Branching sounds (one-shot -- selection among multiple alternate versions)
  • Branching sounds (continuous -- selection among multiple alternate next segments)
  • Parametric controls (mapped to audio control parameters like gain, pan/position, pitch, processing, etc.)

Note: This functionality may require either defining new media type(s), or perhaps a change to the <audio> element semantics. In interactive audio, a sound is not the same as a single playable media file; typically a sound (or 'cue') is some kind of bag of pointers to multiple playable audio files, plus some selection logic and/or parameter mapping logic.
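The cue model in the note above (a bag of pointers to playable files plus selection and parameter-mapping logic) might look like the following TypeScript sketch. The no-immediate-repeat rule and the linear, clamped mapping from a state variable to gain are illustrative choices, and `Cue` and `ParamMapping` are hypothetical names. The random source is injectable so the selection can be tested deterministically.

```typescript
// Hypothetical mapping from one app/game state variable to an audio parameter.
interface ParamMapping {
  param: string;  // state variable name, e.g. "speed"
  inMin: number;  // expected range of the state variable
  inMax: number;
  outMin: number; // range of the audio control parameter (here: gain)
  outMax: number;
}

class Cue {
  private last = -1;

  constructor(
    readonly variants: readonly string[], // URLs of alternate versions of the sound
    readonly gainMapping: ParamMapping,
    private readonly random: () => number = Math.random,
  ) {
    if (variants.length === 0) throw new Error("a cue needs at least one variant");
  }

  // One-shot branching: pick a variant at random, avoiding an immediate repeat
  // (when more than one exists) to reduce listener fatigue.
  nextVariant(): string {
    let i = Math.floor(this.random() * this.variants.length);
    if (this.variants.length > 1 && i === this.last) {
      i = (i + 1) % this.variants.length;
    }
    this.last = i;
    return this.variants[i];
  }

  // Parametric control: map the current state value linearly to gain, clamped.
  gainFor(state: Record<string, number>): number {
    const m = this.gainMapping;
    const v = state[m.param] ?? m.inMin;
    const t = Math.min(1, Math.max(0, (v - m.inMin) / (m.inMax - m.inMin)));
    return m.outMin + t * (m.outMax - m.outMin);
  }
}
```

Continuous branching (choosing among alternate next segments) could reuse the same selection logic, invoked each time a segment finishes.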


Class 5. Audio Production Basics

For sounds that are generated (or in some cases combined) in real time, these use cases support common listener expectations of well-produced music and sound.

  • Mixing:
    • By default, multiple sources + effects combine to one output
    • By default, audio sources' effects sends combine to designated effects
    • <video> elements with audio output are treated as audio sources
      • (maybe submixes, but these are more advanced and therefore less important)
  • Audio Effects I:
    • Reverb (studio production types, not physical space simulations)
    • Equalization
      • (maybe delays, chorus, etc., but these are more advanced and therefore less important)

These effects may be usefully applied on a per-source basis, on an effects send basis, on a submix output, or on the master mix output.
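The default mixing behaviour described above (sources summed to one output, effects sends summed into a shared effects bus whose return is added to the master) can be sketched offline over mono sample blocks. This is a hedged illustration rather than a real-time engine: `Source`, `Effect`, and `mix` are hypothetical, the effect is any function standing in for a reverb or equalizer, and no clipping protection is applied to the sum.

```typescript
// Hypothetical mono source: one block of samples plus its mix and send levels.
interface Source {
  samples: Float32Array;
  gain: number;      // level into the master mix
  sendLevel: number; // level into the effects bus (0 = dry only)
}

// Stand-in for a reverb, EQ, etc.: takes the bus signal, returns the effect output.
type Effect = (input: Float32Array) => Float32Array;

// Assumes all sources have equal-length sample blocks.
function mix(sources: readonly Source[], effect: Effect, returnLevel = 1): Float32Array {
  const length = sources[0]?.samples.length ?? 0;
  const master = new Float32Array(length);
  const bus = new Float32Array(length);
  for (const src of sources) {
    for (let i = 0; i < length; i++) {
      master[i] += src.samples[i] * src.gain;  // dry path
      bus[i] += src.samples[i] * src.sendLevel; // effects send
    }
  }
  // Effects return is summed into the master along with the dry signals.
  const wet = effect(bus);
  for (let i = 0; i < length; i++) master[i] += wet[i] * returnLevel;
  return master;
}
```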

Note: In many cases recorded music, sound effects, and speech will (or can be made to) incorporate their own production effects, and therefore will not need API support.

Note: We could stop after Class 5 and still support most game genres.


Class 6. Audio Effects II: Mastering

For sounds that are generated (or in some cases combined) in real time, these use cases meet higher listener expectations of well-produced music and sound, and may also improve intelligibility in general.

  • Dynamics (compression, limiting)
  • Aural enhancement (exciters, etc.)
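Dynamics processing can be illustrated by the static gain curve of a hard-knee compressor, computed in decibels. This sketch assumes the input level is already known (for example from an envelope follower, which is not shown) and omits attack/release smoothing, knee width, and make-up gain; a limiter is modelled as the special case `ratio = Infinity`. The function names are hypothetical.

```typescript
// Static curve of a hard-knee downward compressor, in dB.
// Above the threshold, `ratio` dB of input increase gives 1 dB of output increase.
function compressDb(inputDb: number, thresholdDb: number, ratio: number): number {
  if (inputDb <= thresholdDb) return inputDb;
  return thresholdDb + (inputDb - thresholdDb) / ratio;
}

// Linear gain multiplier to apply to a signal whose current level is `inputDb`.
function compressorGain(inputDb: number, thresholdDb: number, ratio: number): number {
  const reductionDb = compressDb(inputDb, thresholdDb, ratio) - inputDb;
  return Math.pow(10, reductionDb / 20);
}

// Usage: a 4:1 compressor at -20 dB, and a limiter at -6 dB.
const compressedDb = compressDb(-8, -20, 4);     // -17 dB
const limiterGain = compressorGain(0, -6, Infinity); // about 0.5 (-6 dB)
```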

Mastering functionality is more advanced, and therefore less important, than the above classes.

Note: In many cases recorded music, sound effects, and speech will (or can be made to) incorporate their own mastering effects, and therefore will not need API support.

Class 7. Audio Effects III: Spatial Simulation

For those users listening in stereo, 3D spatialization causes a sound source (or submix) to appear to come from a particular position in 3D space (direction & distance). This functionality can be useful in some game types where 3D position changes in real time, and in teleconferencing where spatializing each speaker to a different static position can help in discriminating who is speaking. Environmental reverb provides clues as to the size and character of the enclosing space (room/forest/cavern, etc.), supporting a more immersive gaming experience.

  • 3D spatialization
  • Environment simulation reverb
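A much-reduced form of the spatialization described above, for stereo output only, combines distance attenuation with left/right panning. This TypeScript sketch assumes a listener at the origin facing -z with +x to the right, an inverse-distance gain clamped at a reference distance, and an equal-power pan law. It does not model elevation, front/back cues, or environmental reverb, which full 3D renderers (e.g. HRTF-based ones) address. `spatialize` is a hypothetical name.

```typescript
interface Vec3 { x: number; y: number; z: number }

// Assumed convention: listener at origin, facing -z, +x to the listener's right.
function spatialize(source: Vec3, refDistance = 1): { left: number; right: number } {
  const distance = Math.hypot(source.x, source.y, source.z);
  // Inverse-distance attenuation, clamped so sources closer than refDistance are not boosted.
  const distanceGain = refDistance / Math.max(distance, refDistance);
  // Horizontal direction: -1 (hard left) .. +1 (hard right); front and back map alike.
  const horizontal = Math.hypot(source.x, source.z);
  const pan = horizontal === 0 ? 0 : source.x / horizontal;
  // Equal-power pan law: left^2 + right^2 stays equal to distanceGain^2.
  const angle = ((pan + 1) / 2) * (Math.PI / 2);
  return { left: distanceGain * Math.cos(angle), right: distanceGain * Math.sin(angle) };
}
```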

Spatial simulation is more advanced, and therefore less important, than the above classes.

Class 8. Digital Sheet Music

Audio is one way to represent music in computer format. Another way is through a symbolic representation of music. Symbolic representations directly represent musical concepts relevant to performers, such as pitches, rhythms, and lyrics. Music notation, or sheet music, is used by performers of classical music, film music, and other musical repertoires.

Existing solutions for displaying and playing digital sheet music in the browser (e.g. Sibelius Scorch, MusicNotes, FreeHand Solero, Legato, Noteflight, Myriad Music Plug-in) use either browser-specific plug-ins or Flash. Digital sheet music developers are looking for more browser-standard approaches for displaying and playing sheet music on the widest possible variety of devices. In particular, mass-market tablet devices have the potential to serve as electronic music stands, displacing the use of paper sheet music over time.

The MusicXML format has established itself as the de facto standard format for music notation, supported by over 125 different notation programs. The format is available under a royalty-free license modeled on the W3C license. Display and playback of the MusicXML format would make this feature useful to a wide variety of applications and developers. Extensibility so that individual vendors could add support for proprietary formats may also be desirable.
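Playback of MusicXML requires turning symbolic pitches and durations into sounding notes. The sketch below assumes the `<step>`, `<alter>`, `<octave>`, `<duration>`, and `<divisions>` values have already been read from the XML, that the tempo is given in quarter notes per minute, and that MIDI note 60 is C4 (middle C). `toMidiNote` and `durationSeconds` are hypothetical helpers, not part of MusicXML itself.

```typescript
type Step = "C" | "D" | "E" | "F" | "G" | "A" | "B";

// Semitone offset of each natural step above C.
const STEP_SEMITONES: Record<Step, number> = { C: 0, D: 2, E: 4, F: 5, G: 7, A: 9, B: 11 };

// Fields of a MusicXML <pitch> element, assumed already extracted.
interface MusicXmlPitch {
  step: Step;
  alter?: number; // semitones: -1 = flat, +1 = sharp (absent = natural)
  octave: number; // octave 4 is the one that begins with middle C
}

// Symbolic pitch -> MIDI note number (C4 = 60).
function toMidiNote(p: MusicXmlPitch): number {
  return (p.octave + 1) * 12 + STEP_SEMITONES[p.step] + (p.alter ?? 0);
}

// <duration> is counted in <divisions> per quarter note.
function durationSeconds(duration: number, divisions: number, quarterNotesPerMinute: number): number {
  return (duration / divisions) * (60 / quarterNotesPerMinute);
}
```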