Bringing Music to the Web

Jacco van Ossenbruggen*
Anton Eliëns

Abstract: To reduce the amount of resources needed for high quality music on the WWW, we advocate the use of client-side sound synthesis techniques. This paper discusses techniques extending the functionality of web browsers and describes the design of the hymne class library, which is used by the hush web widget to synthesize the sound of musical scripts embedded in HTML pages.
Keywords: Music, software sound synthesis, client-side computation.

Introduction

Music can significantly enhance the perception of HTML pages, especially in a commercial or educational environment. At the moment, the web allows hyperlinks to audio files, and most browsers simply delegate the playing of the samples to external viewers.

However, due to the relatively high cost of good-quality audio, music is still a rare phenomenon on the web. High-bandwidth networks combined with new compression techniques may decrease the amount of resources needed. Still, in the near future, audio on the web will be associated with high costs and long network delays. As a result, pages containing music are far less popular than they could be, among both information providers and users.

The DejaVu project [2,4] at the Vrije Universiteit takes a completely different approach in bringing music to the web. We propose to transmit musical scores (instead of the raw samples) across the Internet and to add sound synthesis functionality to web browsers.

Musical score files are usually a few orders of magnitude smaller than the corresponding sample files, and the audio signal can be synthesized at the client side at any appropriate sample rate. Additionally, a high-level description of music provides the browser with far more information than the raw samples do. Future browsers, supporting dynamic documents, might need such information to provide high-level synchronization (e.g. "sync the start of the second scene of the video with the third measure of the intro tune"). Servers may use this information to answer specific queries (e.g. select all tunes in 3/4 meter and in the key of C minor).
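To give a rough sense of the size difference, the following sketch compares one minute of uncompressed audio with a textual score of the same length. The sample rate, sample width, channel count, note density, and bytes-per-note figures are all illustrative assumptions, not measurements from the DejaVu project.

```python
# Rough size comparison between raw samples and a textual score.
# Every parameter below is an illustrative assumption.

def raw_audio_bytes(seconds, sample_rate=44_100, bytes_per_sample=2, channels=2):
    """Size of uncompressed PCM audio in bytes."""
    return seconds * sample_rate * bytes_per_sample * channels

def score_bytes(seconds, notes_per_second=4, bytes_per_note=8):
    """Size of a textual score in bytes, assuming a fixed note density."""
    return seconds * notes_per_second * bytes_per_note

audio = raw_audio_bytes(60)   # one minute of 16-bit stereo at 44.1 kHz
score = score_bytes(60)       # one minute of score text
ratio = audio / score
print(f"audio: {audio} bytes, score: {score} bytes, ratio: {ratio:.0f}x")
```

Under these assumptions the score is more than three orders of magnitude smaller; other sample formats or denser scores shift the ratio but not the overall picture.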

Developing a new MIME [1] type describing musical scores and requiring browsers to support the new type may not be feasible until standards for musical description languages become generally accepted.

Client-side Computation

The web is currently moving its focus from server-side to client-side computation. Browsers like Sun's HotJava and the hush web widget [8] are able to execute program scripts and display the results within an HTML page. Such browsers can easily be extended with sound synthesis functionality. The only requirement is that one can express commands to synthesize audio within the interpreted language. Thus, for the browsers mentioned above, the problem boils down to extending the language involved (Java and Tcl [5], respectively) with some kind of sound synthesis functionality.

Sound Synthesis and hush

Traditionally, sound synthesis is performed by dedicated hardware such as digital signal processors. Many modern personal computers can use such hardware to play MIDI-encoded music, either through an external synthesizer or a sound card with a MIDI interface. As a result, extending browsers with MIDI support can be a relatively small effort. On Unix platforms, however, MIDI support is less common. Fortunately, today's workstations are fast enough to make real-time software sound synthesis (SWSS) possible. SWSS does not need special hardware except for a digital-to-analog converter (DAC), found on every modern workstation.

Hush (hyper utility shell) [3] is a C++ class library providing convenient yet flexible access to the Tcl/Tk [6] toolkit. Every hush application is in fact a full-fledged Tcl/Tk interpreter. Because hush provides a type-secure solution for connecting Tcl and C++ code, the hush library can be used to add new commands and widgets to Tcl/Tk.

Hymne [9] is a C++ API to Csound [10], an SWSS package developed at MIT's Media Lab in the tradition of the Music V system. We have used hymne and hush to extend Tcl/Tk with commands that make the functionality of Csound available from Tcl scripts. The notation used to describe the music is called Scot [10]; it is translated to the note-list notation used by Csound. The Scot translator comes with the standard Csound distribution.

The hush web widget [7,8] is another extension to Tcl/Tk, offering a graphical WWW browser as an off-the-shelf component to hush programmers. The web widget allows the execution of inline scripts, by extending HTML with a new tag.

Since the hymne library provides sound synthesis commands from within these scripts, combining hymne with the web widget enables you to use inline, real-time synthesized music in your HTML page. All other Tk widgets may be used as well. In figure 1, we have used the scale widget to embed a tempo and volume scale inside an HTML page.

Figure 1: Some musical fragments in an HTML page

Score fragments may contain Tcl string variables, so one might change the tempo or even the key signature of a score by modifying the appropriate variables and replay the modified tune without the need to retransmit the score. Note that window events (like mouse clicks) operating on the widgets generally do not result in a request to the remote server either: they are handled by the widgets themselves.
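The idea of replaying a cached score with new variable values can be sketched as follows. Python's string.Template stands in for Tcl variable substitution here, and the fragment format (tempo=..., key=..., notes=...) is hypothetical; it is not Scot syntax.

```python
from string import Template

# Score received once from the server. The "$tempo" and "$key" placeholders
# play the role of Tcl variables; the fragment format itself is hypothetical.
SCORE = Template("tempo=$tempo key=$key notes=8acea")

def render(score, **variables):
    """Substitute the current variable values into the cached score."""
    return score.substitute(**variables)

first = render(SCORE, tempo=90, key="C")
second = render(SCORE, tempo=120, key="C")   # replay faster, no new request
```

Only the local variables change between the two renderings; the score text itself is fetched once.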

In example 1, the hush tag is used to execute a Tcl command playing some arbitrary notes. The optional text between the hush begin and end tags is ignored by the web widget. However, it will be displayed by generally available browsers, such as Netscape or Mosaic, that do not support hush's features. It can be used to provide alternative information or warning messages for users who are not using the hush web widget.

Example 1: An HTML Fragment

<h1>Inline Music on the WWW</h1>
<p>
Each time this page is displayed, some music will be
played as well.
</p>
<hush tcl="play 8acea(b,ec'b)<(4.a-8a-)">
This text will not be displayed by the hush web widget.
Instead, it will play the notes above.
</hush>
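As an illustration of this behavior, the sketch below separates the inline script from the fallback text. It uses Python's standard html.parser rather than the hush web widget's actual parser, whose implementation is not described here; the class name HushExtractor and the shortened note string are assumptions made for the example.

```python
from html.parser import HTMLParser

class HushExtractor(HTMLParser):
    """Collect inline scripts from <hush> tags and skip their fallback text."""

    def __init__(self):
        super().__init__()
        self.scripts = []      # values of the tcl="..." attributes
        self.visible = []      # text a hush-aware browser would display
        self._in_hush = False

    def handle_starttag(self, tag, attrs):
        if tag == "hush":
            self._in_hush = True
            script = dict(attrs).get("tcl")
            if script is not None:
                self.scripts.append(script)

    def handle_endtag(self, tag):
        if tag == "hush":
            self._in_hush = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self._in_hush:
            self.visible.append(text)

# Shortened variant of the HTML fragment in example 1.
PAGE = """<h1>Inline Music on the WWW</h1>
<hush tcl="play 8acea">
This text will not be displayed by the hush web widget.
</hush>"""

extractor = HushExtractor()
extractor.feed(PAGE)
extractor.close()
```

A browser that does not know the hush tag would do the opposite: ignore the tag and its attribute, and display the enclosed text.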

Software Wrapping

The hush web widget is just one of the many applications that may use the hymne library to synthesize music. The hymne library itself was designed to provide an application programmer's interface (API) to Csound that is flexible enough to be used in a hypermedia environment.

The standard API of Csound is not flexible enough to satisfy the needs of a real hypermedia system. To provide the desired flexibility, we have developed a software wrapper (see figure 2) with an object-oriented interface around Csound.

Figure 2: Wrapper classes interfacing Csound and Scot

This wrapper allows the processing of musical events in the flexible manner required by a hypermedia system. The wrapper provides the necessary functionality to play arbitrary fragments of musical scripts that are generated in real time. Additionally, this interface gives access to information about the way playing proceeds: how many notes have been played, which notes are being played at this very moment, how long it will take to play the rest of the notes, etc.
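The kind of progress queries listed above can be sketched as follows. This is only an illustration: hymne derives the information from Csound's output messages, whereas the sketch computes it from a known note list, and the names Note and Progress are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Note:
    start: float      # onset in seconds
    duration: float   # length in seconds
    pitch: str        # label only; not interpreted here

class Progress:
    """Answer playback queries for a fixed list of notes."""

    def __init__(self, notes):
        self.notes = sorted(notes, key=lambda n: n.start)

    def played(self, now):
        """Number of notes that have finished sounding by time `now`."""
        return sum(1 for n in self.notes if n.start + n.duration <= now)

    def sounding(self, now):
        """Pitches of the notes sounding at time `now`."""
        return [n.pitch for n in self.notes
                if n.start <= now < n.start + n.duration]

    def remaining(self, now):
        """Seconds until the last note ends."""
        end = max((n.start + n.duration for n in self.notes), default=0.0)
        return max(0.0, end - now)

tune = Progress([Note(0.0, 0.5, "a"), Note(0.5, 0.5, "c"), Note(0.5, 1.0, "e")])
```

For example, tune.sounding(0.75) reports the two notes that overlap at that moment, and tune.remaining(0.75) reports the time left until the fragment ends.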

For the implementation of this wrapper it is, in principle, not necessary to modify the Csound program. Instead, the interface runs Csound in a special mode which continuously reads the input for incoming events and continuously fills the audio buffer.

A C++ object executes Csound in another process and provides streams to write events to the Csound process, and to read its output. An arbitrary fragment of a Csound script can be played by writing it on the input stream. The wrapper object analyzes the produced output messages, in order to provide the real-time information described above. Programmers can install their own handler object to use this information in application programs.
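The process-wrapping technique described above can be sketched in a few lines. The sketch is in Python rather than C++, and a small echo program stands in for Csound so that the example runs without a Csound installation; the class name SynthWrapper, the event string, and the "played ..." message format are assumptions, not hymne's actual interface.

```python
import subprocess
import sys

# Stand-in for the synthesizer: echoes each event line it receives.
# A real setup would launch Csound here instead (command line not shown).
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('played ' + line.strip(), flush=True)\n"
)

class SynthWrapper:
    """Run a child process and exchange events and messages over pipes."""

    def __init__(self, handler):
        self.handler = handler   # called with every message from the child
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def play(self, event):
        """Write one event to the child and pass its reply to the handler."""
        self.proc.stdin.write(event + "\n")
        self.proc.stdin.flush()
        self.handler(self.proc.stdout.readline().strip())

    def close(self):
        """Signal end of input and wait for the child to exit."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()

messages = []
wrapper = SynthWrapper(messages.append)   # install a trivial handler
wrapper.play("i7 0 1 0 8.09")             # illustrative note event
wrapper.close()
```

The handler callback plays the role of the handler object mentioned above: the application decides what to do with each message, while the wrapper only manages the two streams.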

To be able to use the Scot language instead of the awkward note lists used by Csound, the Scot score translator is wrapped in the same way as the Csound program. Score fragments are translated by the Scot translator and then played by Csound. The Scot language is sufficiently powerful to denote most common note combinations (including chords, slurs, ties, triplets, etc.), but plain Csound note lists may be used as well.

The set of instruments used to play the notes is described in the orchestra file. Csound provides many operators which can be combined to define new instruments. Most instruments make use of stored wavetables to increase efficiency. In example 2, a simple instrument is defined (with an arbitrary instrument and wavetable number). Hymne applications may use a set of default instruments or switch dynamically to other orchestra files.

Higher level classes, derived from those described above, provide primitives to (re)play fragments starting at an arbitrary moment in time and to perform other useful operations upon these fragments.

The details of this sound synthesis process are hidden behind the class interface of the top-level objects of the hymne library. Application programmers use these objects to access the hymne library from their C++ code. The hymne library can be used without the hush environment, to extend other browsers or arbitrary C++ applications with sound synthesis functionality.

Obviously, hymne has also been fully integrated with the hush library. As a result, application programmers may use the new Tcl commands and access the functionality of the library from Tcl scripts.

Example 2: A Simple Instrument Definition

        instr   7                        ; Instrument # 7
        ivolume = 5000                   ; Const volume
        iftable = 5                      ; use function table 5
asignal oscil   ivolume, cpspch(p5), iftable ; basic unit generator (pitch field p5 assumed)
        out     asignal                  ; output signal
        endin
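To make the role of the stored wavetable concrete, the following sketch mimics the instrument of example 2 with a plain table-lookup oscillator. It is not Csound's implementation: the sample rate, the table size, and the use of a sine wave for function table 5 are assumptions, and the Python functions merely borrow the names of the Csound operators they imitate.

```python
import math

SAMPLE_RATE = 8000   # assumed output rate in Hz
TABLE_SIZE = 512     # assumed wavetable length

# One cycle of a sine wave, standing in for "function table 5".
TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def cpspch(pch):
    """Convert octave.pitch-class notation to Hz (8.00 is middle C)."""
    octave = int(pch)
    semitone = round((pch - octave) * 100)
    return 261.6256 * 2 ** ((octave - 8) + semitone / 12)

def oscil(amplitude, frequency, seconds):
    """Table-lookup oscillator: step through the table at a fixed rate."""
    step = frequency * TABLE_SIZE / SAMPLE_RATE   # table positions per sample
    phase = 0.0
    samples = []
    for _ in range(int(seconds * SAMPLE_RATE)):
        samples.append(amplitude * TABLE[int(phase)])
        phase = (phase + step) % TABLE_SIZE       # wrap around the table
    return samples

signal = oscil(5000, cpspch(8.09), 0.5)   # half a second of the A above middle C
```

Reading precomputed values from the table avoids evaluating a sine function for every output sample, which is the efficiency gain that stored wavetables provide.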

Future Work

Currently, the hymne package employs pipes to communicate with the audio synthesis program. This part of the hymne library will be re-implemented using a client/server architecture. While this will hardly alter the programmer's interface, it will improve performance because the Csound (server) process and the application (web client) process can then run on different hosts. Additionally, this implementation will make it possible to run several applications that use the hymne library simultaneously. This is not possible at the moment, because the digital-to-analog converter is regarded as an exclusive device. In the client/server implementation, there will simply be many clients communicating with one server process, which can have exclusive access to the audio device while active.
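A minimal sketch of such an arrangement is given below, using Python's standard socketserver module. It only illustrates the idea of several clients sharing one process that owns the audio device: the line-based protocol, the "ok" reply, and all names are assumptions, and a lock-protected list stands in for the device.

```python
import socket
import socketserver
import threading

device_lock = threading.Lock()   # stands in for exclusive access to the DAC
played = []                      # events the "device" has handled

class EventHandler(socketserver.StreamRequestHandler):
    """Receive newline-terminated events from one client."""

    def handle(self):
        for raw in self.rfile:
            event = raw.decode().strip()
            with device_lock:    # only one client uses the device at a time
                played.append(event)
            self.wfile.write(b"ok\n")

def start_server():
    """Start the synthesis server on a free local port."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EventHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def send_event(address, event):
    """Connect as a client, send one event, and return the server's reply."""
    with socket.create_connection(address) as conn:
        conn.sendall(event.encode() + b"\n")
        with conn.makefile() as reply:
            return reply.readline().strip()
```

Each application would call send_event with the address returned by the server's server_address attribute; because only the server touches the device, several applications can be active at the same time.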

We have planned to develop an experimental MIME document type to support musical documents in a less ad-hoc fashion. Because of the textual format of music description languages, it should be possible to employ anchoring and link facilities within musical documents as well.

At the moment, the technique of software wrapping described in this paper is being used to wrap (already available) video decoding software, in order to extend hush with a video widget. The video widget is used by the web widget to allow for HTML pages with inline, interactive video as well.

Conclusions

Music can significantly enhance the perception of HTML pages. The main drawback of music on the web is the large amount of (server) resources needed to store and transfer raw audio samples of good quality. By employing client-side sound synthesis, only the musical scores need to be stored and transmitted.

We have implemented a WWW browser capable of executing Tcl scripts and extended the Tcl language with a flexible interface to an existing sound synthesis package. This interface has been used in other, non-Tcl environments as well. The browser itself is implemented as a Tk widget and can be used as a GUI component like the other Tk widgets.

Using the new browser, we can enrich our HTML pages with music, which will be generated at the client side at any desired sample rate.

Acknowledgments

Matthijs van Doorn designed and implemented the hush web widget, and gave us some helpful comments on earlier versions of this paper.

References

1. N. Borenstein and N. Freed, MIME (Multipurpose Internet Mail Extensions) Part One, September 1993, RFC-1521, obsoletes RFC-1341.

2. A. Eliëns, DejaVu--A Distributed Hypermedia Application Framework, Available via ftp or URL http://www.cs.vu.nl/~dejavu/papers/DejaVu.ps.gz, December 1992.

3. A. Eliëns, Hush: A C++ API for Tcl/Tk, The X Resource, (14):111--155, April 1995.

4. A. Eliëns, Principles of Object-Oriented Software Development, Addison-Wesley, 1995, ISBN 0-201-62444-3.

5. J.K. Ousterhout, Tcl: An Embeddable Command Language, USENIX, 1990.

6. J.K. Ousterhout, An X11 Toolkit Based on the Tcl Language, USENIX, 1991.

7. M.A.B. van Doorn and A. Eliëns, Integrating WWW and Applications, ERCIM/W4G--International Workshop on WWW Design Issues, Amsterdam, November 1994.

8. M.A.B. van Doorn and A. Eliëns, Integrating Applications and the World Wide Web, In Computer Networks and ISDN Systems, pages 1105--1110, April 1995, Proceedings of the Third International World-Wide Web Conference, April 10-14, Darmstadt, Germany.

9. J.R. van Ossenbruggen and A. Eliëns, Music in Time-based Hypermedia, ECHT'94, The European Conference on Hypermedia Technology, pages 224--270, September 1994.

10. Barry Vercoe, Csound, A Manual for the Audio Processing System and Supporting Programs with Tutorials, 1993, Available via ftp://cecelia.media.mit.edu/pub/Csound/Csound.man.ps.Z.

About the Authors

Jacco van Ossenbruggen [http://www.cs.vu.nl/~jrvosse/]
Faculty of Mathematics and Computer Science, Vrije Universiteit, de Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands.
Jacco van Ossenbruggen is a Ph.D. student at the Vrije Universiteit. His research interests include open hypermedia systems, SGML/HyTime, object orientation and Pattern Languages.

Anton Eliëns [http://www.cs.vu.nl/~eliens/]
Faculty of Mathematics and Computer Science, Vrije Universiteit, de Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands.
Anton Eliëns is Associate Professor at the Computer Science Department of the Vrije Universiteit Amsterdam. He has recently written a book on the principles of object-oriented software development. His research interests include hypermedia, object orientation and distributed logic programming.