The digital video album

On the merging of media types in multimedia

By Gunnar Liestøl

The computer as a medium for other media
Isolated resources and distributed environments
Hypertext and hypermedia
Media types and linearity
A digital video album
The digital revolution in multimedia technology has only just begun. At present we know the basics of what the technology can do. Applications including these new features have been tested in isolated, controlled environments; but the real implementation in the world of mass users still lies ahead. The purpose of this paper is to look briefly at interrelationships between the fundamental qualities of digital media technology and the problems and possibilities it creates for semantics and conventions in the future genres of multimedia. With the concept of integration and its relationship to both distributed and isolated multimedia messages as a key starting point, the paper will touch upon the history of hypertext and hypermedia, the linearities of various media types, and the idea of a digital video album. A last section will focus on some relevant features in a specific multimedia production.

The computer as a medium for other media

The development of information and media technology accelerates so fast that only the most dedicated and privileged are able to keep up with the process. The phenomenon of multimedia is a field only vaguely defined. In this coreless territory perhaps the most unifying criterion is 'newness': the fact that changes involved in this technology are mostly innovations; multimedia in its basics is new media technology. Soon we will certainly experience a more defined subject matter as institutionalisation in the field continues. For the time being the idea of integration may give us a valuable perspective. The Latin word integer means 'unharmed', 'whole', 'undiminished'. The verb integrare means 'to complete'; integratio signifies 'innovation', 'restoring an ideal order'. In sociology integration signifies the social relations uniting people in communities as parts of a whole as well as the condition of this wholeness. In the history of the electronic computer we experience integration at the social, technological and informational levels alike.

Ever since its emergence in electronic-digital form, the computer has been a meeting place of some kind. Like an agora, computer technology and environments have, in their different stages of development, constituted a place where persons, disciplines, media and information have converged and interacted. Starting out as enormous mainframe computing machines performing arithmetical tasks far beyond individual, manual achievements, the computer has developed into a small, relatively cheap but powerful tool marketed to millions of individual users.

Today, not only numbers, but words, drawings, pictures, sound, and video may be generated, processed and distributed by means of personal computers. The computer as we know it today has developed into a powerful and complex medium activating and forcing the use of multiple senses and capable of receiving, manipulating and transmitting all the traditional media types. Different media developed in distinct but overlapping and related institutions and technologies - such as painting, writing, print, photography, telephone, radio, film, television and video - now merge as digitally coded information in the hardware and software of microcomputers. Connected in global networks of end-users, the computer medium has constituted itself as a true super- or multimedium - a medium for other media.

Adding the digital machine's own capabilities of interaction and random access to vast amounts of information, this multimedia vehicle now represents the most technologically advanced and complex medium in the joint history of human communication and information technology. Grounded on this infrastructure, we experience the emergence of a postprint, electronic-digital public sphere of only vaguely known structure, function and consequence.

Isolated resources and distributed environments

In hypermedia a distinction has been made between two kinds of system: resources and environments (1). A resource is a large collection of read-only texts, for instance plain reference works like encyclopaedias or databases. With resources users may read, view, and copy the stored information. However, the information on the disc cannot be expanded, as new material can only be added in the next edition. The hypermedia environment, on the other hand, is not constrained by read-only technology. It is an open and individual system where readers and writers share the same electronic environment and users may contribute their own texts and link them in various ways to the documents already in the system. The hypermedia environment then becomes an ever growing and changing body of interconnected electronic texts. In these relations the idea of context gains new significance - texts no longer appear isolated and neither are they displaced from relevant and related contexts.

This relationship can be seen as parallel to the distinction between systems 'published' on physically unchangeable, isolated media, such as CD-ROMs, and, on the other hand, distributed systems based on electronic networks. This difference is important and will generate various conventions for composing multimedia messages. But although a fully developed network may have all advantages over publications of physical, unchangeable entities, the latter may still have a long life. Two thousand years of manuscript and book technology (followed by the success of video and audio cassettes) have instituted a market of producers and consumers where messages as goods are intimately related to physical entities. Despite the technical possibility of fully developed on-line networks, I believe it will take a considerable amount of time before the current mode of production will allow a radical redefinition of the individual message as a product and object. If the information commodity, in the form of electronic multimedia texts, should only be available on-line, the text will lose its appearance as a concrete sensible object unambiguously pointing at its origin, the author-producer. Since our western economic system seems to outlast every challenge from non-market economies, it is likely that the reification of information and knowledge existing in the age of the printed book will survive the transference to electronic, interactive media and thus for a long time favour the publication of units and genres compatible with this tradition.

The opposition between distributed and isolated multimedia messages is evident and will probably become more significant in the future. For the time being, though, many of the problems we are dealing with in multimedia production, for instance the dynamic integration of media types and the establishing of meaning-effective conventions, need to be solved whether the messages are distributed over a network or published on a compact disc.

Hypertext and hypermedia

The terms hypertext and hypermedia were first presented in an essay by Theodor Holm Nelson published in 1965:

These paragraphs introduce for the first time the terms and concepts 'hypertext' and 'hypermedia' and define them relatively close to the way they have later become known and implemented: as practices of interactive production, exchange and consumption of multiple media information by means of digital computers. Nelson's exposition of hypertext displays a double dimension: first, a verbal definition by means of concepts, but at the same time a demonstration. By attaching the five asterisks '*****' to the (first use of the) word 'hypertext' and to the corresponding footnote, the reader is asked to transcend the linear order of the main text and jump to another textual level, to be found at the bottom of the page, for further explanations of the meaning of the prefix 'hyper-'. Nelson's definition then is not only based on the linguistic information but at the same time activates and implies the actual action it seeks to describe. By de facto being forced to move over the main text when reading from 'hypertext' to 'The sense of hyper-' - by following the rules and conventions of the footnote paradigm of the textual organisation - the reader accomplishes a double exposition, the verbal description and in addition the action of following a link. Thus, this piece of text and the rules that govern its use become a self-reflexive structure that does what it says and shows what it tells. This feature points to the potential of indirect communication in hypertext and hypermedia, where the shapes of electronic texts gain semiotic significance.

One might object, then, that hypertext is basically a paper-based textual form implemented in book technology through features such as footnotes, references, allusions, etc. Although hypertext has frequently been described and explained as 'the generalised footnote', the second footnote in our quote shows the limitations of textual interconnections in paper-based technology compared to the qualities of hypertext. The second footnote is a reference to something that is not there, unavailable to the reader, an elaboration which exists 'elsewhere', outside the present paper.

Nelson uses hypertext and hypermedia at distinct grammatical levels. Hypertext, hyperfile, and hyperfilm are mentioned as examples of hypermedia. Hypermedia then becomes a generic term and not the name of a kind of hypermedium itself. The 'hyper-' prefix, denoting non-linear organisation of either text, pictures, sound, or video, is here thought of as a structure implemented in the separate media, not a device for linking across platforms. Later, Nelson adjusted his point of view and, in accordance with what is now common usage, let 'hypermedia' be compatible with 'hypermedia document', 'hypermedia production', etc. Today, hypertext and hypermedia are being used as synonyms since 'text' is applied as a metaphor for all kinds of media types, not just verbal text.

Before Nelson, Bush, in his now classic paper (3), and Engelbart (4) were conceiving and implementing similar ideas - Bush in an environment of analogue mechanics before and after the Second World War and Engelbart as the first to radically explore the potential of the digital computer. Within the frames of Engelbart's strategic project on 'augmenting the human intellect', most of the features on any modern microcomputer or workstation were developed: text editing, windows, the mouse, hypertext, outlining, groupware and e-mail. For both Bush and Engelbart as well as the later Nelson, the integration of media types was a major concern. Engelbart, for instance, demonstrated combinations of text, graphics and video on networked workstations as early as twenty-five years ago.

During the 1980s there was much talk about the future of multimedia and hypermedia, if only the supporting technological environments would finally be in place. During this period interesting projects based on combinations of analogue and digital equipment were developed, projects that also led to the establishment of conventions (5). In the early 1990s the technology gave sufficient support for full integration of audio and video on any mid-range personal computer, especially the Apple Macintosh. The status of hypermedia today is not exactly what one would have imagined five to ten years ago. The all-digital environment is here, but real attempts at innovative integration of the various media types have still not shown up. Instead, one might detect a split, with academic, lingocentric hypertext as one extreme and image-dominated computer games as the other.

Media types and linearity

Ever since its introduction one of the basic characteristics of hypertext and hypermedia has been 'non-linearity'. The motivation behind non-sequential organisation of information has been to set authors and readers free from the linear slavery contained in written presentation based in paper and book technology. What does non-linear signify? Interacting with a hypermedia or multimedia document, users may choose their own path through the information to be consumed, thus constituting their own messages. The author-producer has not chosen one particular sequence of reading guaranteed by pages, chapters, and so on. We may say that the message presents itself with a non-linear structure, or without a primary sequential order. Does this imply that the whole hypermedia institution, the totality of production, text and consumption, and the rules that govern their use, on all levels display a non-linear structure? That would be an uncritical statement. Every conception of succession and order is in principle linear since it takes place in time. Reading and writing presuppose temporal and linear organisation and the stringent following of rules: the succession of letters constituting words, the succession of words constituting sentences, the succession of sentences constituting paragraphs, and so on. At one point in this structure it makes sense to talk about leaving a strict linearity; however, linearity is never totally absent, due to the temporal succession always present when generating and consuming information. The negating terms non-linear and non-sequential should then be replaced by the suggested multi-linear and multi-sequential (6).

This focusing on the problems of linearity gives us a perspective on the relationships between the various types of media. In the context of this essay, we may focus on four main media types: image, verbal text, video (moving images), and audio. In the perception and consumption of visual images there are no fundamental rules of linearity we must follow when exploring the image; which succession we go through to combine the elements into a whole, an image. Our gaze may travel relatively freely over the surface. Of course, there are limitations and structures which generate preferred readings of images, like foreground and background, constraints of light and shade, etc., but in general we might say that this is a succession or a linearity based on the subject's actions, a subjective linearity.

With written text it is quite different: if we are to generate the information and meaning invested in a written text at all, we have to follow the rules for reading written language. We must add letter to letter, word to word, sentence to sentence, etc. The rules of succession constituting writing are founded in language as an inter-subjective praxis among social human beings. We may then say that with the media type verbal text we are facing an inter-subjectively founded linearity. At the micro level this linearity is necessarily very constrained, but the bigger the chunks of text, the less constrained and the less linear they are, and the more they may be consumed in multi-successive sequences. We do not read all texts from beginning to end, but feel free to jump back and forth, depending on our needs and interests. Writing at the macro level, then, allows for a certain degree of interactivity, as in encyclopaedias and newspapers.

With video (including television and film) we face a machinery which is presupposed before the media type starts to exist. The apparatus has to display a certain rate of frames, and the spectator must follow this flow of moving images, conditioned by time. A sequence cannot normally be stopped; neither is it normal to choose the speed of consumption. The playback speed should be identical to the speed of recording. We may shut the apparatus down, but then this media type disappears as well. We may say that video as a media type is conditioned by an object-founded kind of linearity.

Sound as a continuum is also dependent on an external source for organisation and existence. While image, text and video are visually based information, sound is aural. Audio information does not have the same potential of being selected, with sound space being reduced to the instant now, with a before and an after. Sound is thus an extreme kind of media type, and its linearity is a temporal linearity. Space may in audio occur as parallels of chords and harmony. These four media types find their places on an axis with the still image as the extreme space-based exponent and with audio on the temporal side. Text and video are mixtures in between (see figure 1). Interactivity and the potential of selection are best when there is a space-dominant mode supporting the subjective kind of linearity.

Integration of media types is not new in the history of media and communication. Writing and print technology have combined images and verbal text; and the silent movie changed from one configuration of multimedia (moving images + verbal text) to another (moving images + sound) when the talking movie was introduced. But these examples of integration are fundamentally different from what we might expect of computer based, digital multimedia. In literature, text usually dominates the image. The verbal text gives the ambiguous and meaning-rich environment of images direction and precision, rules on how to understand and use them. In the silent movie era text was subsumed under the speed and rhythm of the actions depicted in the film. Also, subtitles and text on television are always presented on the premises of moving images and their temporal linearity. In digital multimedia these relationships of repression and dominance do not have to exist, since the computer has the capacity to include and maintain the founding premises of each individual media type in integrated documents. The problem, however, is then to create a dynamic integration of the different sign types and still conserve their unique individual qualities without tending to dominance and subsumption. How can video and text be combined in a smooth way without the one dominating the other; or to put it another way: how can video be given qualities originally belonging to verbal text like a larger degree of selection and interactivity; and can text benefit from the video mode of linearity? The following discussion - related to the idea of a video album concretised in the production of a specific multimedia system - is a tentative attempt to provide some information about these problems.

A digital video album

In ancient Rome the word album (from Latin albus = white) designated the public boards with a white coat of chalk upon which were inscribed various messages and declarations to the public. The album served as a writing place on which individual documents were posted in manners similar to modern bulletin boards (both hard copy and electronic). This form of message exchange displays a contingent relationship between posted or inscribed documents. Each message was individual and self-contained; their common ground was the group that produced them and the community that they addressed. The album format characteristically lacks a strict successive organisation of messages and meanings. Instead, its modern variant - the family album - tells stories by means of snapshots, and its effects have little in common with the constrained structure of linguistic articulation. On the other hand, the album generates the richness and variety of a pictured whole constituted by individual media clips as punctuations of time and place. Given the collage and montage elements of the album format and its multi-linear, fragmented structure, it may serve as a metaphor for the composition of multimedia messages and their unconventional merging of various media and sign types. The family album displays a combination of figurative and verbal information as well as a fragmented, self-contained multi-linear order of organisation (although chronology often is a dominant principle). Each piece of information is autonomous and gives perfect meaning on its own, and is not dependent upon which information is consumed in the temporal context of before and after.

Facing the challenge of creating structure and form in a multimedia system - 'The Interactive Kon Tiki Museum' - for presenting the excavations and expeditions of the Norwegian explorer Thor Heyerdahl, the idea of a video album was central. All Heyerdahl's major expeditions have been well documented on film, in photographs and in literature. For a system planned as a supplement to the exhibition at the Kon Tiki Museum in Oslo, it was important to make much of the video available to the visitors. But the filmed material was all in the format of feature films or television documentaries, which are not suitable for the museum environment and mode of information acquisition. The video material had to be made available to the users in short but self-contained chunks, and it had to be possible for viewers to choose between various themes to get more detailed information on topics of interest to them - still in the video format. Cross-references and linking in the video media, film and television are completely absent. One way of making video interactive was by adapting the footnote mode of reference, developed in literature, and using it as a model for how to organise the many chunks of specially edited digital video clips; but at the same time the dynamics, the rhythm and tempo of the whole presentation, had to be maintained. I will discuss these solutions in relation to the three following still images (screen shots) from the system.

The image in figure 2 depicts the scene in the system which corresponds to the table of contents in a book or a magazine. In a time span of a couple of minutes, this scene introduces all the main topics or sections of the system. To introduce one of the items, four actions or sequences are joined to create a continuous flow or movement: the spinning of the globe, the marking of a path or a place, the zooming of an image from the geographic point to the button, and then the short movie displayed on the button, before the whole sequence is repeated with the next button. These four actions all contain different kinds of movement but are linked in a sequence to give the impression of a moving wave transported from the spinning globe to the movie icon (micon) on the button. This wave movement helps to create the impression of continuity, despite the discontinuous elements and different types of actions involved.

The image in figure 3 shows the introductory scene to the Easter Island section. The two buttons named 'The Origins' and 'Statues' are the video footnotes. In literature, the convention of footnotes - the relationship between the main text and the footnote text - is marked with identical signs. Here, a different kind of pointing or suggesting relation was chosen. In the previous example the relations between the geographical point on the globe and the corresponding button were created by the movement of a growing image, starting as a pixel on the surface of the globe and ending as the first frame in a mini movie on the button. In the Easter Island scene this convention is more formalised. On the Macintosh, an animated frame indicates that a document is being opened. Following this convention, the image itself does not move, only an animated frame representing the reference to the video footnote.

On following the link to 'Origins' the whole screen changes and a new background appears with a blank field which does not display a video but text: three texts that, like ingresses, introduce the buttons to additional video clips (see figure 4). Instead of a time-based media type like video introducing the scene, text alone was supposed to introduce the videos at this sub-level. However, this became problematic. When the user had got used to the dynamics of the first two scenes and their video-based tempo and rhythm, something felt wrong when entering this scene. Suddenly, the user was facing text (a media type with no intrinsic tempo for information acquisition) instead of video (with its object-based linearity). The dynamics of the presentation as a whole were gone; integration had not taken place in the intended way. Video had been supplied with characteristics from text. Now this had to be done the other way round; text had to be given the time-dependent qualities of video. This was done in two ways. The text was divided into three paragraphs showing up on the screen in a sequence. Also, a voice-over was added, an audio track reading the text. Thus the original non-temporal text was turned into a talkie, with changes over time both in the audio and in the video track. To achieve dynamic integration and flow one had to compromise a bit. The result was that video became interactive and selective like a text, and text became time-based like video and sound, but with the interactivity intact.


The video album as a synthesis of book and video conventions is only in its infancy. As a model and metaphor for the integration of media types in multimedia, it may still serve some functions. Another important aspect of the album is its mode of description. An album, non-linear and fragmented in its sequentiality, may give a representation of a topic that is able to conserve the multilevelled complexity of its subject matter. With the album format the message is not spoken or pointed at directly, but approached through a complexity of approaches and representations. The flexibility and multilevelled dimensions of such a format may have the capacity to match the communicative and representational potential of future multimedia messages.


