Audio-ISSUE-85 (OscillatorFolding): Oscillator folding considerations [Web Audio API]
http://www.w3.org/2011/audio/track/issues/85

Raised by: Philip Jägenstedt
On product: Web Audio API

It is not defined how the time-domain signal of an oscillator is generated. It would appear that the main reason for WaveTable being defined in the frequency domain is to allow for Nyquist-correct signal synthesis. For example, if the Oscillator frequency is 1000 Hz and the WaveTable has length 4096, the highest frequency component will be 4096 kHz, which could cause folding artifacts.

Depending on how the time-domain signal is generated, the anti-aliasing performed would sound very different. For example, if the signal is generated in the naive way, by looping over the output for each frequency component, one could simply stop before the Nyquist frequency. However, this approach could be very slow. If nothing is done to prevent folding, the purpose of having a frequency-domain WaveTable at all is questionable.

Finally, should the built-in types (SINE, SQUARE, etc.) also be generated using WaveTables internally, and be subject to the same folding processing as custom WaveTables?
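To make the folding concern concrete, here is a minimal sketch (all names hypothetical) of the naive partial-by-partial synthesis mentioned above, in the band-limited variant that simply skips any partial at or above the Nyquist frequency. It avoids folding but is slow: one sine and one cosine evaluation per partial per output sample.

// Hypothetical sketch: band-limited additive synthesis from a
// frequency-domain table. real/imag hold one coefficient per partial
// (index = harmonic number); partials at or above Nyquist are skipped.
function synthesize(
  real: Float32Array,
  imag: Float32Array,
  fundamental: number, // oscillator frequency in Hz
  sampleRate: number,
  length: number
): Float32Array {
  const out = new Float32Array(length);
  const nyquist = sampleRate / 2;
  // Highest partial strictly below Nyquist; for a 1000 Hz fundamental
  // at a 44.1 kHz sample rate, only partials 1..22 survive.
  const maxPartial = Math.min(real.length - 1, Math.ceil(nyquist / fundamental) - 1);
  for (let n = 0; n < length; n++) {
    const t = n / sampleRate;
    let sample = 0;
    for (let k = 1; k <= maxPartial; k++) {
      const phase = 2 * Math.PI * k * fundamental * t;
      sample += real[k] * Math.cos(phase) + imag[k] * Math.sin(phase);
    }
    out[n] = sample;
  }
  return out;
}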
[admin] Assigning items currently being worked on by editor.
More detailed background added: https://dvcs.w3.org/hg/audio/rev/afb5ef123c50

Similar to how we do not specify the exact anti-aliasing algorithms for lines and circles in the Canvas 2D specification, or the exact image-resizing smoothing for <img>, I don't think we should specify the exact rendering here. Instead, we need to define the precise "ideal" rendering which an actual implementation should strive to achieve.
Overall, the new text is non-normative, except for the phrasing "care must be taken to discard (filter out) the high-frequency information". Here, it is said that something must be done, without specifying what must be done.

At this point, I don't really have a preference for whether we should strive to have a common method for synthesizing sound, or allow for variations between implementations. However, I think it should be clear what the upper/lower quality bound is.

For instance, if we disregard the anti-aliasing requirement, it would be possible for an implementation to simply do an inverse FFT of the wave table as a pre-processing step, and then do nearest-neighbor interpolation into that time-domain signal without any anti-aliasing or interpolation efforts at all. Would that be acceptable?
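For concreteness, the "lower bound" implementation described above might look roughly like this sketch (names are hypothetical): a single time-domain period is computed once up front, and playback rounds a running phase to the nearest table index, with no interpolation and no band-limiting, so partials above Nyquist fold back into the audible range.

// Hypothetical sketch of the minimal-effort oscillator described above:
// one precomputed time-domain period (e.g. an inverse FFT of the
// WaveTable), nearest-neighbor lookup, no interpolation, no anti-aliasing.
class NaiveOscillator {
  private phase = 0; // position in table samples

  constructor(
    private table: Float32Array, // one period of the waveform
    private frequency: number,   // Hz
    private sampleRate: number
  ) {}

  process(out: Float32Array): void {
    // Table samples to advance per output sample.
    const increment = (this.frequency * this.table.length) / this.sampleRate;
    for (let n = 0; n < out.length; n++) {
      out[n] = this.table[Math.round(this.phase) % this.table.length];
      this.phase = (this.phase + increment) % this.table.length;
    }
  }
}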
(In reply to comment #3)
> Overall, the new text is non-normative, except for the phrasing "care must
> be taken to discard (filter out) the high-frequency information". Here, it
> is said that something must be done, without specifying what must be done.
>
> At this point, I don't really have a preference for whether we should
> strive to have a common method for synthesizing sound, or allow for
> variations between implementations. However, I think it should be clear
> what the upper/lower quality bound is.
>
> For instance, if we disregard the anti-aliasing requirement, it would be
> possible for an implementation to simply do an inverse FFT of the wave
> table as a pre-processing step, and then do nearest-neighbor interpolation
> into that time-domain signal without any anti-aliasing or interpolation
> efforts at all. Would that be acceptable?

From a purist perspective, I don't consider that an acceptable technique for synthesis of high-quality oscillators, because it will generate considerable aliasing. But I consider it OK for a basic implementation, especially if it's used as a performance optimization for low-end hardware.

Once again, I'd make the analogy with drawing lines. It's "allowed" for a browser to draw jagged lines, but they might not look so great compared with nicely anti-aliased smooth lines.

I'm happy to share implementation techniques for getting reasonably high-quality oscillators. In WebKit, the approach we're currently taking is a multi-table approach: we generate a dozen or so tables with successively filtered-out partials, then index dynamically into the table appropriate for the instantaneous playback frequency. We have code to share, or we could discuss the general approach in more technical detail (without code). In any case, that would be an informative section if we added something to the spec.
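The multi-table approach can be sketched roughly as follows. This is only an illustration of the general idea, not the actual WebKit code, and all names are hypothetical: each precomputed table keeps about half as many partials as the previous one, and playback selects the fullest table whose highest partial still falls below Nyquist at the current frequency.

// Hypothetical sketch of a multi-table band-limited oscillator.
interface BandLimitedTable {
  maxPartial: number;    // highest harmonic present in this table
  samples: Float32Array; // one period built from partials 1..maxPartial
}

// Precompute a stack of tables with progressively fewer partials,
// e.g. 4095, 2047, 1023, ..., 1 (about a dozen tables for a
// length-4096 WaveTable).
function buildTables(
  real: Float32Array,
  imag: Float32Array,
  tableLength: number
): BandLimitedTable[] {
  const tables: BandLimitedTable[] = [];
  for (let maxPartial = real.length - 1; maxPartial >= 1; maxPartial = Math.floor(maxPartial / 2)) {
    const samples = new Float32Array(tableLength);
    for (let n = 0; n < tableLength; n++) {
      const phase = (2 * Math.PI * n) / tableLength;
      let s = 0;
      for (let k = 1; k <= maxPartial; k++) {
        s += real[k] * Math.cos(k * phase) + imag[k] * Math.sin(k * phase);
      }
      samples[n] = s;
    }
    tables.push({ maxPartial, samples });
  }
  return tables;
}

// At render time, pick the fullest table that cannot alias at the
// instantaneous playback frequency.
function selectTable(
  tables: BandLimitedTable[],
  frequency: number,
  sampleRate: number
): Float32Array {
  const nyquist = sampleRate / 2;
  for (const t of tables) {
    if (t.maxPartial * frequency < nyquist) return t.samples;
  }
  return tables[tables.length - 1].samples; // fewest partials as a fallback
}

A production implementation would typically also interpolate within a table, and cross-fade between adjacent tables as the frequency sweeps, to avoid audible switching artifacts.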
> But I consider it OK for a basic implementation, especially if it's used
> as a performance optimization for low-end hardware.

My main concern here is the wording in the spec. As it is now, the only "must" in the text is used in conjunction with a section that *seems* to be non-normative. If interpreted as a normative statement (which would currently be a correct interpretation of the spec), an implementation MUST filter out frequencies above the Nyquist frequency.

Suggestion: make the first part of section 2.23 non-normative (the text before "Both .frequency and .detune are a-rate parameters..."), and drop the "must" from the sentence "care must be taken to discard (filter out) the high-frequency information".
Web Audio API issues have been migrated to GitHub. See https://github.com/WebAudio/web-audio-api/issues
Closing. See https://github.com/WebAudio/web-audio-api/issues for an up-to-date list of issues for the Web Audio API.