
Sound Media: Theoretical introduction to sound media

Theoretical introduction to sound media
Sound and listening
Medium theory
Backward history
Auditory rhetoric

Part I – The present time
2. The acoustic computer: Nervous experiments with sound media
3. Synthetic music: Digital recording in great detail
4. The mobile public: Journalism for urban navigators
5. Phone radio: Personality journalism in voice alone
6. Loudspeaker living: Pop music is everywhere

Part II – Backwards history
7. Tape control: A revolution in recorded music, 1970s – 1950s
8. The acoustic nation: Live journalism, 1960s – 1930s
9. Microphone moods: Music recording, 1940s – 1930s
10. Atmospheric contact: Experiments in broadcasting, 1920s – 1900s
11. The repeating machine: Music recording, 1920s – 1870s

References, soundtrack supplement, acknowledgements
What you hold in your hands is a book, and consequently you are now immersed in its sensory environment. Your eyes follow the argument line by line and page by page; you can skip between chapters at your leisure; and once in a while you may find yourself thinking new thoughts. Reading and writing are efficient techniques of communication, and they have been fostered around the book (and scroll and clay tablet) for thousands of years. The experience of sound media is entirely different from the experience of the book. You tap your foot half-consciously to the funky beat of the music, and you imaginatively share the adventures of the foreign correspondent on radio while doing the housekeeping.

This book is all about sound media, and this chapter clears the ground for an analysis of altogether ten different set-ups of the sound media that are widespread at the present time, or were influential earlier in history. The book is organized according to my version of the research tradition called medium theory. Joshua Meyrowitz has given a lucid definition of medium theory that I will start from:

Medium theory focuses on the particular characteristics of each individual medium or of each particular type of media. Broadly speaking, medium theorists ask: What are the relatively fixed features of each means of communicating and how do these features make the medium physically, psychologically, and socially different from other media and from face to face interaction?

(Meyrowitz 1994: 50)

By my lights Meyrowitz sets up a reasonable ambition for the media researcher, and indeed dozens of prominent researchers have studied more or less exactly what he prescribes without actively thinking about themselves as medium theorists (for example, Ellis 2000 and Scannell 1996). Briefly stated, my version of medium theory has four dimensions: 1) a description of sound and listening; 2) a theory of what a medium is; 3) a method for a backwards history of media; and 4) a method for rhetorical analysis of journalism and music.


Sound and listening

The Concise Oxford Dictionary (1975) defines sound as an experience of the ear caused by vibrations in the surrounding air; an event that is being or may be heard; the act of giving forth sound or causing to sound. But you don’t need a dictionary to know what sound is. Your body hears long before you can read a dictionary, since from the first day of your life you have navigated through the world with the aid of the sense of hearing. You don’t have earlids like you have eyelids, and even the deaf can feel sonic vibrations in their bodies. Hunters in the jungles of New Guinea relate to bird song, insect noises, and trees and plants moving in the wind. City dwellers relate to sounds of transportation, large masses of people, ventilation systems and fire engine sirens. R. Murray Schafer ([1977] 1994: 274) coined the term ‘soundscape’ to capture this never-ending presence of sound in people’s everyday lives.

Natural sound
Natural sound is my term for all the sounds that are non-mediated – that is, they occurred before sound media were invented, or they occur without any form of transmission or recording at the present time. Natural sound is crucial to public life in all civilizations of the world, especially in the form of oratory and song. Imagine Enrico Caruso (1873-1921) performing in San Francisco in April 1906, on the night before the great earthquake. The concert hall is packed with well-to-do citizens in starched shirts and gowns, their senses trained on the operatic singer and the orchestra. With eager expectation they hear sound waves emanating from Caruso’s mouth at 340 metres per second. The sounds inform us about certain features of the actions of Caruso and the orchestra that the other senses do not, but at the same time the other senses give access to visible, touchable and odorous aspects of the same events.

My approach takes the sensory richness of communication into account. It is concerned not just with the sounds in isolation, but also with the things that vibrate – the singers and speakers and their equipment and the wider surroundings. Think of Caruso’s body, which is the entire basis of his expressive voice, and imagine his beautiful clothes, his jewellery and the other accessories. These other things also have a communicative influence. Indeed, all five senses must be thought of as one exploratory entity and in pursuing this thought I am inspired by Merleau-Ponty ([1945] 1992), Gibson (1966) and Ihde ([1976] 2007). On the basis of these influential works, I identify four existential characteristics of sound that guide the individual’s communication effort, and these characteristics are also integral to the visible and touchable materials of communication:

Time (duration, chronology, causes and effects)
Space (directions, shapes, volumes, distances)
Personal expressiveness (emotions, moods)
Coded message (for example, news and love song)

Firstly, sounds always tell us something about time. A performance always happens right now, in front of those present, and for Caruso this means he is under social pressure to perform well. It is the same thing in the theatre, the opera and vaudeville. The real-time progression of sound events causes such phenomena as the nervousness of live performance, whether in the concert hall or at the political rally, where the performers have only one chance of making an impression, and nobody will forget it if they make a fool of themselves because they were ill-prepared. Sound events are ephemeral; they last for only a second, and at most around fifteen seconds in the extreme reverberation of a mountain pass. When the energy is expended a particular sound is gone forever. Some sound events appear to last for a long time, for example the constant roar of a waterfall or a tedious political speech that goes on for hours, but these consist of a continuous generation of sounds that all wear off immediately and are never heard again.

Caruso in the concert hall. Illustration: Atle Skorstad

Before the invention of recording all the sound events were by definition continuous with the progression of the world at large. Caruso represented a new era with his famous recordings, which he released from the early 1900s (Day 2000). Many San Franciscans had listened to his records the night before the opera; over and over again they had listened to his tenor voice rising and falling, and the experience must have heightened their expectations. Tonight I will hear and see him in the flesh! And this is partly why Caruso is nervous. Unlike the recording session, a concert has no second take. Reviewers from San Francisco newspapers would be listening carefully and publish their reviews the next morning.

Secondly, sounds always tell us something about space. Caruso is singing in a modern concert hall, which is sound-proofed, with a rich and precise resonance created by expert acousticians. A concert hall is a sound technology, but it is not a mediation technology. It can be compared with the ancient amphitheatre and arenas in Greek and Roman times, except that the biggest arenas did not have a roof and had less well-controlled acoustics. Over 50,000 people could be in attendance at a Roman arena, and the sounds from the stage could reach even the cheapest seats with a measure of clarity, at least when the audience was silent. In a telling phrase Theo van Leeuwen (1999: 14) calls sound a ‘wrap-around medium’. Referring to the same experience, Rick Altman calls sound a ‘three-dimensional materiality’. He beautifully describes a woman speaking in an auditorium: ‘Radiating out like a cone from the actress’s mouth, the sound pressure soon fills up the entire auditorium, bouncing off the walls, the floor, and the ceiling, and bending around audience members, chairs, and posts until it is finally completely absorbed’ (Altman 1992a: 21). Sound is never located at a singular geometrical point; it is always in the process of spreading further into the surroundings, and therefore the environment resounds with events from above or below, far away and too near, all the time. The bang of a closing door goes through the walls and resonates up the stairs, for an instant filling the corridor or even the street with its impatient movement. Great waterfalls can be heard miles away. In more technical terms the resonance in a given surrounding is related to the volume of the sounds (the louder, the greater the area of coverage), their frequency characteristics (low frequencies spread out in all directions, high frequencies go in a precise direction), and the texture of the things involved in the movement (waves are absorbed by soft materials and bounce off hard materials).

The environmental function of sound is important because humans live with it all their lives, perhaps coping well but perhaps also being stressed by it. Schafer ([1977] 1994) vividly describes the low fidelity sound of the modern West, where mechanical and electrical noises of all kinds make sure that there is never a moment of real silence. He perceptively points out that in such an environment sound does not come towards the listener but is present everywhere. Tony Schwartz (1974: 48) argues that ‘acoustic space is more like something we wear or sit in than a physical area in which we move. A listener is wrapped in auditory space and reverberates with the sound.’ To clarify the concept of environment I will set up a contrast between the general environment and the ambient environment. The general environment really consists of an average, and takes into account all the auditory experiences that a person could have while moving around in a given city or country, while the ambient environment refers to the actual sounds and other sense impressions that individuals have in their everyday locations, where they go about their lives as usual. This book focuses on the individual experiences of the sound environment, but it must be said that it is quite impossible to make empirical descriptions of them (I do not have access to their perception), and therefore it is nevertheless a general description of individual experiences.

Directional hearing developed as an early-warning system for physical danger, for animals just as much as for humans. Hearing surveys the soundscape and helps us to direct our eyes to a particular source of sound. This is simply human awareness, the ability to react quickly to new information (see Plomp 2002; Handel 1989). Wandering around in the soundscape of their city or village, people can easily discern the difference between locations based on sound. Sounds are the raw material for the orientations and explorations in which human beings constantly engage. In San Francisco in 1906 it started with a low rumbling that was different from all the familiar sounds of the city; it was soon accompanied by all kinds of things falling down, and the creaking and whining of wood, concrete and metal being dislocated, things crashing down on them. Finding yourself in an earthquake in the middle of a big modern city awakens your survival instincts. This is perception at its most acute.

Thirdly, sounds always tell us something about the personality of the performers. Simon Frith (1998: 191) claims that the singing voice ‘stands for the person more directly than any other musical device’. Song and speech sounds spread out from the mouth, with the hands and body often helping the words to achieve their intended meaning. When Caruso sang ‘The Siciliana’, a complex ensemble of tongue, jaw, teeth, lips, nasal cavity, larynx and breath were involved, all trained to perfection by the great tenor. Beyond the talented timbres of ‘The Siciliana’ is the person Enrico Caruso. How did he interpret the intended passions of the song? Did he sound vulnerable or aggressive; and were any of his emotions particularly authentic because of a desperate love affair in his own life? The personal and private resonance of communication became very important with the emergence of sound media, and its historical development is at the heart of this book.

Finally, sounds often tell us something about the world by carrying a coded message. After all, the main reason why humans carry on vocalizing and melodizing is that these sounds can communicate messages to other humans very efficiently. There is no end to the uses to which language and melody can be put, and the resulting communication varies with, for example, the mother tongue used (Italian versus Norwegian), the social setting (formal or informal) and the speaker’s skills (eloquent or clumsy). Let me stick to my case, and inform you that during the fateful night in San Francisco Caruso sang an aria from the opera Cavalleria rusticana (1890) by Pietro Mascagni. As the opera begins a young villager sings ‘The Siciliana (O Lola, lovely as the spring’s bright blooms)’, a tormented love song to a young maiden. The villager has returned from military service and found that while he was gone Lola abandoned him and married the prosperous village teamster. This act of treason is sweetened by the fact that she is still in love with the young man. From this starting point the love story evolves. Please imagine the rich cultural analysis that could be made of Caruso’s performance by combining operatic history with Italian cultural history and the great immigration surge to the USA during the early 1900s. Although the larger cultural context of these messages is not pursued actively in my book, it is all the time a background feature.

Mediated sound
Since the 1870s the messages in sound have been not only a natural but also a mediated phenomenon. Strange things are accomplished through recording, telephony and broadcasting. These media separate sounds from their occurrence in one place only and allow them to be projected in many unassociated places at the same time, or be repeated indefinitely later on (this has been pointed out by a host of authors, for example Jones 1992; Chanan 1995; Millard 2005; Katz 2004; Lax 2008). In millions of homes people have listened to the music of Caruso on the gramophone, have struggled with the weak transatlantic telephone connection, or have worried at the stern sound of Margaret Thatcher’s voice on the radio. The fact that sounds were repeated outside the time and place of the original performance caused confusion in private and public life. In a typically modern way both producers and listeners have explored all conceivable opportunities to communicate with each other, slowly creating new provinces of meaning in sound communication (Bull and Black 2003).

In this book I will closely analyse a series of mediated sounds, all of which are contained on the accompanying soundtrack CD. The first track is symbolic of the theoretical tradition from which I write. The LP is called The Medium is the Massage and was released by Columbia Records in 1967 as an accompaniment to the book of the same title (McLuhan and Fiore 1967). These sounds could only be made with modern, professional stereo tape equipment (8 or 12 track). The production is typical of the media environment in New York City in 1967, in the midst of psychedelia, the Vietnam War and the 1960s cultural revolutions. The book version of The Medium is the Massage is, by the way, a beautiful example of creative typography, and the pages are filled with unusually large and small type faces, drawings, photographs and facsimiles that support the argument of the volume.

My intention in analysing the McLuhan LP is to clarify the difference between the properties of mediated sound such as this and the properties of natural sound such as Caruso singing in the concert hall in 1906. In order to be systematic, I will present the McLuhan track according to the same four characteristics as before: time, space, personality and message. McLuhan’s aphorisms are transcribed for legibility, but most of the sounds are completely untranscribable.

McLuhan in the control room. Illustration: Atle Skorstad

Track 1: Marshall McLuhan: The Medium is the Massage, 1967 (1:42).


– Until writing was invented, man lived in acoustic space, boundless, directionless, horizonless, in the dark of the mind, in the world of emotion, by primordial intuition, by terror. Speech is a social chart of this bond.
– The medium of our time, electric circuitry, profoundly involves men with each other. Information [verbal loops and effects throughout].
– There are no grammatical errors in a non-literate society.
– All media work us over completely. They are so pervasive in their personal, political, economic, aesthetic, psychological, moral, ethical and social consequences that they leave no part of us untouched, unaffected, unaltered. The medium is the massage.
– Any understanding of social and cultural change is impossible without the knowledge about how media work as environments.
– Everything we do is music.

Firstly, the temporal existence of recorded sound is quite different from that of natural sound. Recorded sound is a material object fixed in time that can be bought and sold on the market. A recording has no continuity with the world, and that is why we can hear McLuhan and his companions today, even though they spoke in 1967. People can record important events such as the birth of their first child for the family history archive, and in doing so they bring the event into the future as something that can be experienced again and again.

Secondly, the acoustic space of a recording is in a sense double (Altman 1992a: 27). The sounds from the loudspeaker have their own acoustic space that is safely contained on the recording. The weird electronic noises that McLuhan and company made in 1967 can be played back in a number of different acoustic settings, and when they fill the listeners’ room they are affected by the characteristics of that room. Since the technically produced acoustic space fills up a domestic space, the result is a double space. Notice that the acoustic space of the recording is unchangeable, except that the listeners can adjust the volume and place the loudspeakers in different ways to influence it slightly. If you move closer to the loudspeaker the sound gets louder, but you don’t move closer to McLuhan. And there is obviously no way of entering that recorded space and moving around in it. Furthermore, the acoustic space of mediated sound is transportable. It can be played back in all kinds of public and private places. People can play the sound on their private stereo system, and this allows them to share the experience with friends. Since the Walkman was introduced in the early 1980s people have been able to take the mediated acoustic space with them wherever they go. If they like, they can be completely immersed in their own private experience.

Thirdly, personality in sound media is quite an elusive matter. Clearly, there is no direct contact between speakers and listeners as there could have been between Caruso and his fans. The performances are already complete when people hear them. Listeners cannot interact with McLuhan in a reciprocal way. There is, for example, no way to ask him what the heck he is trying to tell us. This means that in recording and broadcasting the relationship between producers and listeners is asymmetrical. The producers are absent from the listeners’ locale, and the listeners are absent from the producer’s locale. Never the twain shall meet. But despite the division there is obviously a process of contact between them, since mass communication works fine across the years and over large distances. There is an industrial distribution of messages to a dispersed public instead of a dialogue between interlocutors (Scannell 2005: 130). Anthony Giddens argues that the mass-produced address requires a specific form of trust. Since media events are substantially absent from the listener’s perspective, people are forced to trust the persons who make the claims in quite an open and risky way: ‘Trust presumes a leap to commitment, a quality of “faith” that is irreducible.’ It is specifically related to the account of events from which people were absent in time and space, Giddens stresses (1991: 19). An implication of Giddens’s argument is that there is little need for trust in events that are constantly in view, and which can be directly monitored and intervened in if necessary. Consequently, there is a great need for trust in the mass media.


Medium theory

As already stated, I subscribe to a long tradition of scholarship that is often called medium theory, and it comes as no surprise that Marshall McLuhan is a crucial influence on my work. There is a large literature of interpretations of McLuhan’s work; see, for example, Miller (1971), Grosswiler (1998), Genosko (1999), Levinson (1999) and Moss and Morra (2004).

There is one sentence on the McLuhan LP that is very helpful in pointing out what medium theory is about: ‘Any understanding of social and cultural change is impossible without the knowledge about how media work as environments.’ I take McLuhan’s proposition to be profoundly true. The media are environments on a level with railways, road systems, airports and other gigantic technological infrastructures in society, although they are indeed many other things also. It is worth sticking with the material dimension, as McLuhan does when he argues that ‘technological media are staples or natural resources, exactly as are coal and cotton and oil’ ([1964] 1994: 21). Humans have set about refining their natural environment with electronic technologies, and are planning to live with these arrangements for a really long time. McLuhan describes what happens during such a long exposure to a technology: ‘Physiologically, man in the normal use of technology (or his variously extended body) is perpetually modified by it and in turn finds ever new ways of modifying his technology’ (ibid. 46). His theory acknowledges that this is a flexible relationship, but he nevertheless stresses that man is not completely in control of his technologies. He argues that ‘technological environments are not merely passive containers of people but are active processes that reshape people and other technologies alike’ (McLuhan [1962] 1992: i). For example, there are environmental aspects to flying across the Atlantic, and they will affect all passengers more or less equally in the long run, but the passengers are probably more concerned with the short-term effect of getting home quickly. I find it fruitful to apply this environmental theory of change to the mass media.

A medium cannot work as an environment without lots of people using the same equipment and practising the same techniques for a long time. A technology that has just left the laboratory cannot be said to work as an environment. The concept of media environment presumes industrial production of equipment in many countries and millions of people who have become accustomed to using it over a long time, perhaps during their entire life. And, most importantly, the concept of a media environment presumes that the medium quite regularly appears as a social background in people’s everyday engagements.

Materiality up front
Notice how strongly my theoretical approach stresses the material dimension of the media (this perspective is inspired by Innis [1951] 1991; Winner 1986; Ihde 1990; Gumbrecht and Pfeiffer 1994 and Mitcham 1994, among others). The media are results of scientific research under Western capitalism, and its combination of high-tech precision and desperate competition has produced great things. Most types of media equipment were first painstakingly engineered as prototypes in the secret laboratories of large corporations. The historical development of the equipment has had a direct relevance for the social history of the mass media. Many factors propel the industrial production of equipment and make sure that society becomes ever more saturated by the media. There is a regular replacement of equipment in private homes and company offices whenever a new and more efficient version has been launched on the market. Electronic stores such as PC World and Dixons are full of new equipment that promises to give the buyer improved efficiency and greater pleasure within a given context of use. In attics and museums discarded equipment piles up, for example cassette decks from the 1970s and 1980s. The wind-up gramophone has been discarded so completely that in 2008 you can really only listen to one if you go to a museum. In addition to the regular replacement of equipment there is an increase in the number of technological platforms that are used at the same time.

Not only do we regularly throw away old versions of the equipment and buy improved versions, we also possess more and more different types of equipment. This process propels the mass production of equipment and innovation in technology. When a new medium is introduced, it never really replaces an old medium but begins to exist alongside the old ones, partly replacing some functions and partly introducing completely new ones (Briggs and Burke 2002: 5). Consider that, during the period from the early 1970s until now, at least two major new technical configurations for communication have been erected: the personal computer, with broadband internet as an important feature, and mobile phone networks with text messaging of many kinds. Lab engineers have developed an endless number of appliances and plug-ins that go along with them. Consider that before 1970 there were many mature media configurations, for example multitrack stereo music, colour television, 3D movies and voicemail for the telephone.

McLuhan postulated that the materiality of a medium has long-term effects on perception, while the content in the traditional sense is of minor importance. A medium’s core characteristic is that it changes the ratio of the senses in public communication, compared to the ratio typical of previous media. After becoming prominent the medium promotes and cultivates some perceptual activities more than others, and in this indirect way it causes social change. McLuhan boldly formulates a law about the relationship between technology and communication:

For the ‘message’ of any medium or technology is the change of scale or pace or pattern that it introduces into human affairs. The railway did not introduce movement or transportation or wheel or road into human society, but it accelerated and enlarged the scale of previous human functions, creating totally new kinds of cities and new kinds of work and leisure.

(McLuhan [1964] 1994: 8)

This statement should not be rejected too hastily. Despite its deterministic ring it is a fruitful starting point for investigations of the changing relationship between humans and media. With refinements McLuhan’s proposal to investigate the change of scale or pace or pattern can become a useful tool for analysing media history, as I hope to show in the empirical chapters of this book (and as I have also tried to show in Nyre 2003).

The notion of an influence from the medium itself has caused strong theoretical resistance towards medium theory. It seems to clash head-on with a more widespread way of theorizing the role of technologies in the media, namely the position that is often called social constructivism (see, for example, Tuchman 1978; Douglas 1987; Metz 1985; Marvin 1988; Winston 1998; and Lastra 2000). These approaches postulate social needs and aspirations as the driving force of historical development in the media. If such positions are incommensurate with mine it is not because of disputes about the historical facts, but because they do not give the material features of these historical facts sufficient attention. Carolyn Marvin has made a claim about the history of the media that I will label ‘social constructivist’:

Media are not fixed natural objects; they have no natural edges. They are constructed complexes of habits, beliefs, and procedures embedded in elaborate cultural codes of communication. The history of media is never more or less than the history of their uses, which always leads us away from them to the social practices and conflicts they illuminate.

(Marvin 1988: 8)

From my perspective it is hard to agree with this way of thinking. Consider the sound of Neil Armstrong’s statement ‘A small step for man, a giant leap for mankind’, in July 1969. It was uttered in a helmet in outer space and transported back to earth at the speed of light, and then it was heard live by almost a billion people all over planet earth. The Apollo 11 broadcast goes to show that the mass media certainly have natural edges. The many technologies that made it possible for Armstrong to be heard conform to the laws of gravity, they run on electricity, they take advantage of electromagnetic radiation, and they put sensual constraints on users. It seems that the history of the mass media may just as well lead us towards these natural edges as away from them.

This book argues that a historically new form of social communication came about with microphones and loudspeakers from the late nineteenth century. There were no credible precursors to the experiences created in and around these media; there were only weak approximations such as the mechanical piano and the click of the telegraph inker. Edmund Carpenter says that ‘each medium, if its bias is properly exploited, reveals and communicates a unique aspect of reality. Each offers a way of seeing an otherwise hidden dimension of reality. It’s not a question of one reality being true, the others distortions. One allows us to see from here, another from there, a third from still another perspective; taken together they give us a more complete whole, a greater truth’ (Carpenter [1960] 1979: 371). It is safe to say that electronic media had been ‘properly exploited’ when they allowed humans to study the earth from the perspective of the moon.

While a new medium certainly creates a new reality, it does so by its specific way of limiting human experience. For example, the telescope introduced the human eye to very large objects very far away, but these objects could not be heard or touched or tasted. They could only be experienced through the lens. The philosopher Don Ihde refers to this as a technology’s non-neutrality. Technologies reveal and conceal, magnify and reduce, amplify and mute. Technologies transform experience, and this is an important aspect of their non-neutrality, Ihde argues (1990: 49). Again it follows that a medium is not ‘constructed complexes of habits, beliefs, and procedures’, as Carolyn Marvin would have us think, but rather a system of constraints on the senses that makes all messages similar in a systematic way, and leaves out other things just as systematically. An opportunity for action always carries with it constraints on action. Until replacements have been made the medium works only like this, and all experiences and interpretations in the culture will be framed by it for the duration. This goes to show that a medium is not a machine for transporting persuasive messages; it is a form of persuasion in its own right.

Documentary realism
It is well known that sounds, like moving images, seem to communicate more directly to our senses than written texts. There is a profound difference between experiencing the sound of a real gun at 1 metre’s distance and experiencing the word ‘bang’ displayed on a piece of paper at 1 metre’s distance. Media theorists have tried to capture the perceptual character of sound and moving images in many ways. Joshua Meyrowitz (1985: 75) argues that television involves ‘an access code that is barely a code at all’, and John Ellis (2000: 9) writes that radio and television present a ‘quasi-physical documentation of specific moments in specific places’. I will refer to this as documentary realism, and I will demonstrate documentary realism in sound media with a detailed sound example. For classical music the recording medium has the same communicative purpose throughout its history, namely to convey the musical performance as vibrantly and realistically as possible, and nothing else. The characteristics of this type of documentary realism come across if we compare three recordings of the same music score over a period of sixty years.

La Valse by the French composer Maurice Ravel has been interpreted and re-recorded endlessly since it was composed in 1920 (Larner 1996). It is often interpreted as a metaphor for the demise of the Austrian and German cultures that led up to World War I, embodied in the waltz. At the end the orchestra unleashes a terrifying energy that shatters the waltz and ends in an unsettling crescendo. My comparative case study comprises three different recordings representing the digital, magnetic and electric versions of the recording medium. First a 1991 recording made on DAT tape and released on CD.

Track 2: Cleveland Orchestra: La Valse, 1991 (1:13).

We hear the great musicianship with clarity because microphones are well placed to pick up the sounds from the instruments: some are placed near the instruments to pick up direct sound, others are placed in the ceiling or at the back to pick up the reverberations. The recording has a great sense of spaciousness and distinction of detail, and it is not an exaggeration to say that we can hear each musician’s contribution to the whole. A number of complex skills are needed among the production staff to create this good sound, plus of course the musicians’ talented efforts.

Moving twenty-one years backwards, we stop at the next version of La Valse, which was produced on magnetic tape and released on stereo LP in 1970. There may be a little less spaciousness and distinction of detail in this version than in the CD from 1991, but the difference is in no way substantial. They both sound very good. What should be noticed, however, is that in 1970 stereo had just become a standard feature of home equipment and the aesthetics of recording. Stereo sound greatly enhanced the sense of documentary realism, at least in classical music.

Track 3: London Symphony Orchestra: La Valse, 1970 (1:18).

The experience of stereo music was powerful and impressive in the room. Roland Gelatt (1977: 314-15) says that no one hearing stereo tape recordings for the first time could fail to be impressed by ‘their sense of spaciousness, by the buoyant airiness and “lift” of the sound as it swirled freely around the listening room’. The listener could both locate sound sources horizontally from the left speaker to the right speaker and use the balance knob on the stereo to create a spot where the sounds from the two loudspeakers reproduced the intended acoustic architecture in a ‘sweet spot’ with maximum accuracy.

From 1970 we move another thirty-nine years backwards. The oldest version of La Valse is a mono recording, and really cannot be said to have a sweet spot at all. It was recorded and released on 78 rpm disc in 1931. At this time the audio quality was distinctly less clear and spacious than what could be created by later platforms, but this did not limit the sense of documentary authority in the recording.

Track 4: Orchestre Lamoureux: La Valse, 1931 (1:02).

The recording sounds thin and shrill compared to the two others. There is less clarity and therefore it becomes much more difficult to make out individual instruments in the mix, and there is also less spaciousness and ‘lift’ in the acoustics. But although the sound quality is very poor according to modern standards, this does not reduce the sense that we are hearing a live musical performance. On the contrary, any knowledge of recording practices in the 1930s will convince the listener that this recording is indeed more realistic than the others. The 1991 and 1970 recordings both sound almost clinically perfect, as if they had been modified and mixed without us being able to hear it, and without the producers informing us about it. In contrast, the 1931 version sounds truly indexical, as if there were no creative manipulations at all, only a great sluggishness in the medium that the musicians and producers managed to overcome.

These three versions of La Valse demonstrate a remarkable stability in the recording medium’s function – namely to record music and other performances with as great a documentary authority as possible. Seemingly there is no real influence from the medium itself.

Figure 1.1

Figure 1.1 displays a model of how the recording medium works, and it is traditionally known as the linear communication model. The sounds of voices and instruments enter the medium at the microphone; the signals are recorded on tape or another storage medium; the recording is thereafter copied industrially and distributed on LP or another type of disk or file; and finally the recording is played back on a domestic record player and the sounds of voices and instruments are re-created through a loudspeaker. The notion of media neutrality has been widespread in classical music and much of journalism for the better part of a century. The truth claims of the news, the presumption that singers are authentic, and other expectations of realism rely on the idea of medium neutrality. It doesn’t matter what is between the microphone and the loudspeaker, since it is in any case without substantial influence.

I consider the idea that a medium is a neutral transmission channel to be misleading. In popular music and rock there have been no ambitions of documentary realism since the 1960s, and radio reports had been edited together on tape long before that. The McLuhan LP, which was recorded at the same time as the 1970 version of La Valse, demonstrates this clearly. There is almost nothing in the McLuhan recording that comes from the world outside the studio and which could be represented with documentary realism (or lack of it). In the early twenty-first century it is increasingly obvious that there isn’t really an indexical link back from the recording to an original event. Instead, there is a huge pile of interfaces and storage platforms and transmission platforms with a variety of different functionalities, and they are combined in different ways that vary greatly through history. How can this confusing mix best be approached?

The medium itself
The medium model in figure 1.1 is a good starting point for descriptions of different sound media, since it shows all the components that are necessary for mediation to occur. Notice, however, that each single component influences the character of communication, and when one of them is replaced a change in the perceptual conditions for the users occurs. Notice also that many more components than those displayed in figure 1.1 can be a part of the medium. Imagine drawing up the components that make up the computer, mobile telephony or satellite transmission, not to speak of hand-held devices that incorporate all three of them. Clearly, a medium is not a compact, self-contained entity, but a series of interconnected technologies where most components are regularly replaced without the basic lines of communication breaking down.

In this book I will discuss the functionalities of the history of sound media in a consistent vocabulary. A sound medium consists of interfaces where sound is expressed and listened to by humans, platforms that control, store and transmit signals to the public domain, and signal carriers that effect the physical transportation of the signal. The vocabulary is strongly inspired by Albert Borgmann’s Technology and the Character of Contemporary Life (1984).

The interface is the point of contact between humans and technology (Johnson 1997: 14). It is designed specifically to be handled and related to by humans, typically with the hands, the mouth and the ears, and through visual perception. The microphone interface is crucial because it translates sound expression into signals that can thereafter be technically manipulated, and the loudspeaker is crucial because it translates signals into sounds that humans can hear. An interface is a point of simultaneous contact and division, meaning that it also makes us aware of how far away from other people we are when we communicate with them, for example, on the telephone.

A platform is a device that controls the storage and/or transmission of the signal. When, for example, an AM radio signal has been broadcast it can be received by all devices that contain a reception platform for AM radio signals. The platform is the publishing and distribution component of a medium, and it depends on the interfaces for something to publish and the signal carriers for efficient transport of the product. At the producing and receiving end there may be different but compatible platforms, so that a conversion process is necessary. This was the case in the 1960s, when pop music was produced on magnetic tape but distributed and enjoyed on stereo LPs. Notice also that a medium may consist of a whole series of interconnected platforms. The internet could be said to be a platform for websites, but in any case it relies on the computer platform for domestic access and the telephone platform for online connection. It could furthermore be argued that every piece of software on the internet that can distribute messages systematically is a platform – such as podcasting, web radio and file sharing.

The signal carrier facilitates the actual contact between separate but compatible platforms, and this involves transportation of the signal across large geographical distances as well as over a long historical period. The carrier contains analogues of sound events in a material form that is suitable for mass distribution. The signal can be carried 1) through the air by electromagnetic waves; 2) through landline wires strung between houses and offices; or 3) on a revolving disc or other tangible container. The signal carrier is by definition transportable and in the case of radio transmission it moves at the speed of light.

Finally, I will comment on what could be called the machinery. The interfaces, platforms and signal carriers obviously rely on electrical power, and this comes from batteries or mains electricity. Electricity powers the tube or transistor amplifiers, the computer disc drives, the microphone and loudspeaker diaphragms and all the other electronic equipment. The machinery drives the equipment in a stable and inconspicuous way because the functions that demand manual labour and attention have been automated. The machinery’s delicate movements are protected behind metal or plastic covers, and this process is often called ‘blackboxing’. Although I do not analyse the machinery in any systematic fashion, it is absolutely crucial to modern media. Just imagine the severe disruption of the media environment that occurs during a power cut.


Backwards history

It is time for a proper introduction to the historical perspective of this book. Medium theory presumes that the emergence and improvement of the media occur in history – that is, in a complex interconnection with all kinds of human endeavour, ranging from the trivialities of life without tap water to the political revolutions of the two world wars.

Media behaviour must be thought of as historically contingent; it is taught, conserved and translated inside a given technological system, and will die out if the equipment is removed or if better and more efficient techniques are introduced. Jonathan Sterne (2003: 2) describes the slow process: ‘It is not that people woke up one day and found everything suddenly different. Changes in sound, listening, and hearing happened bit by bit, place by place, practice by practice, over a long period of time.’ This book traces the emergence and disappearance of these cultural techniques in different sound media (Gentikow 2007).

Sound media form a global, modern phenomenon, one that is obviously very complex. The main purpose of the backwards history approach is to separate and identify all the cultural techniques in a systematic fashion. The complexity of the issue is unnerving. Each mass medium is made with different user interfaces and different cultural purposes in different countries, historical periods and social groups; and the process has been going on for at least 4,000 years. Faced with this great panorama, medium theorists study the media according to their material differences from one another (Schudson 1991; Zielinski 2006).

McLuhan, in Understanding Media ([1964] 1994), devotes one chapter each to several dozen media (including roads, weapons and other technologies that would not normally be labelled media), and he tries to explain how they are different from each other in sensory and functional ways. He is very sensitive to cultural meanings rooted in the specific ways in which a technology is designed and used. Friedrich Kittler writes about Gramophone, Film, Typewriter (1999), and Walter Ong writes about Orality and Literacy (1982) with much the same presumptions. Brian Winston’s Messages (2005) also separates the mass media from each other in a systematic fashion, and describes them from the introduction of the printing presses in the 1450s into our own time. My book essentially describes two media, namely recording media and live media. The telephone is an important backdrop, as is the internet, the television, sound film and books, but none of these other media will be analysed with the same level of detail.

My backwards narrative of individual sound media has two dimensions: the composition of the medium at a given historical time, which will be displayed in medium models and timelines, and the chronological changes from one stable state of things to another, which comes across when the models and timelines from different chapters are compared.

The term ‘break boundary’ clarifies both these dimensions, and makes them applicable to systematic narration (Blondheim 2003: 179). Firstly, there are break boundaries between the characteristics of media existing at the same time, for example between newspapers, film, music recording, radio, television and the internet in our time. This is the synchronous dimension. Secondly, a medium exists in a definite historical period that comes after its invention and lasts until it has become obsolete. This is the diachronic dimension, and from this perspective one can describe break boundaries between the different historical phases of a medium’s development, as well as the boundaries towards other media developments that may influence its course. My narrative has a well-delineated historical span which simply goes back to the invention of the first sound media in the 1870s.

Figure 1.2: Timeline of live sound media.

Figure 1.2 is a timeline of all the live sound media that will be analysed in this book (in the black rows). Notice that the newest media come at the top and the oldest at the bottom. Below the timeline I have located other important electronic media that are part of the contemporary setting – in this case telegraphy and television, which are also live media. The live storyline goes back to Alexander Graham Bell’s invention from 1876. It is important to note that private telephony and internet media are also live, although that is not how they are commonly presented. The figure helps us to notice that a new live medium does not make the others obsolete – except for Marconi’s ur-technology, which I will describe in chapter 10. There is a noticeable accumulation of different live media as history progresses.

Figure 1.3: Timeline of recorded sound media.

Figure 1.3 is a timeline of all the recording media that will be analysed in this book. The storyline goes back to the start of recorded sound, with Thomas Edison’s invention from 1877. Looking back, four basic platforms can be found building on each other: computer sound, magnetic sound, electric sound and acoustic sound. It is noticeable how quickly the platforms replace each other; and it is clear that the platforms have a tendency to make each other obsolete, which is a quite different structure of development from live media. Below the timeline I have identified sound film, television programmes and music videos, which are all highly influential audiovisual recording media.

When going backwards, it is soon revealed that people who live now possess many technologies that previous generations did not have, but that our technologies are nevertheless to a large extent built on theirs. Backwards storytelling tries to untangle these dependencies in a systematic fashion, and in this sense it resembles archaeology. In fact all historical research can be thought of as a kind of archaeology. The researcher begins every investigation in the present and digs their way layer by layer into the past. But when the digging is over the researcher will most often turn this process on its head, and let it start in the distant past and narrate it towards their own time.

However, by starting the narrative in the present and progressing towards the past I write a history of disappearance. The further back we go in history, the fewer of the techniques are widespread. The number of people who regularly handled a computer to log into the internet in the 2000s could be measured in the billions, but if we go back to the 1960s the number of people using a computer could be counted only in the dozens. The infrastructure of television in the USA in the 2000s is enormous, but in the 1930s television was found only in a few laboratories in Berlin, London and New York, and there were no cultural techniques associated with it.

The further back we go, the fewer are the countries and cities in which the medium is located. And the further back we go, the more manual are the processes of mass communication, because the functionalities have not yet been protected in automatized systems. The timelines and medium models are drawn up to aid in this backwards history telling. In different ways they demonstrate how the technological configuration gets smaller and less complex until there is really nothing left to analyse.


Auditory rhetoric

This book is a long series of studies in auditory rhetoric, and here I will clarify the method involved. Rhetoric is often called the art of persuasion, that is, the art of attempting to convince people to think or act in ways that suit the speaker’s interests. A famous example is when Cicero was consul of Rome, in 63 BC, and he denounced Catiline as a dangerous enemy of the Roman republic. Greatly aided by his oratorical powers, Cicero managed to get Catiline sentenced to death and himself praised as a true republican (Fafner 1982: 79).

This high-stakes political rhetoric is dangerous for those involved, but there are less dramatic rhetorical situations in everyday life. In the 1940s the literary theorist Kenneth Burke changed the emphasis in a direction that suits the communication in sound media perfectly. Instead of being concerned with deliberation in an explicit sense, he deals with ‘an intermediate area of expression that is not wholly deliberate, yet not wholly unconscious. It lies midway between aimless utterance and speech directly purposive’ (Burke [1950] 1969: xiii). He refers to it as the rhetoric of identification. It is an attraction towards other persons and groups, and it is of vital importance for public life. Burke mentions the rhetoric of courtship as a case of such semi-conscious rhetoric in everyday life, but street concerts and political speeches rely on it too. In his perspective, rhetoric is the act of appealing for identification with a person or a group on the grounds of a claim or an idea (ibid.: 21).

The nature of rhetoric is well described by Lloyd Bitzer, who writes that ‘rhetorical discourse comes into existence as a response to a situation, in the same sense that an answer comes into existence in response to a question, or a solution in response to a problem’ ([1968] 1991: 9-10). ‘Kairos’ is an ancient term referring to this response, and it describes the happy situations where a speaker says exactly the right words at the right time, or the year when exactly the right pop sounds dominated during the summer. The competence needed by the producer is to be able to give a ‘fitting response to a situation which needs and invites it’ (ibid.: 10). My approach to rhetoric relies on Burke’s and Bitzer’s definitions, which I believe are well suited to the manifold techniques of the sound media, but it does not incorporate the vocabulary of the rhetorical tradition in a strict sense (see, for example, Foss 1996; Brummett 1991).

Auditory rhetoric consists of appealing to the public ear, making the best out of every microphone event, encouraging the studio staff to be creative, securing a big budget for the production, and gaining access to the best possible equipment. Based on these resources the producers manipulate acoustic spaces, time chronologies and voices to create experiences for people. A list of the stakeholders would include radio stations, record companies, electronics companies, celebrities, politicians, journalists, musicians and technicians. I will point out three strategies that resonate with the existential characteristics of communication defined earlier in this chapter:

– acoustic architecture
– time effects
– persuasion in person

I presume that basically sound media communicate through all manner of moods, but mostly through inviting, pleasing and attractive ones. Paddy Scannell (1996: 88) argues that the moods are public in character, and disclose a climate of feelings, opinions and attitudes. The moods are made to fit into the individual’s domestic setting. ‘When I turn on the TV set I am “in the mood” for watching or listening’, Scannell says. ‘I am in the mood for a bit of entertainment, or relaxation or for finding out about what’s going on in the world, or even just for having the telly on as a bit of company’ (1998: 22). The dimension of mood makes up the primary social link between listeners and producers – in journalism by telling credible stories about reality, in music recording by playing enjoyable music. Consequently, the creation and maintenance of audience moods must be considered the primary task in radio and recording. These deliberate moods must be studied as the outcome of a historical process where journalism and music have continually adapted to the national and global conditions.

Acoustic architecture
Grandeur matters just as much now as it did in the time of Louis XIV at his Versailles palace. Locales are built with the skills of architects and decorators, to impress people, to make them feel at home. Architecture is about designing buildings and structures, and often also the design of the total built environment, such as town planning, urban design and landscape architecture (Carter 1995). It can be attuned to many different types of sociability among people, from the massive authority of the Pentagon in Washington to the serenity of an ancient Greek temple. And of course well-organized locales are just as important in radio, recorded music, film and television, although the techniques of construction are quite different (Connell and Gibson 2003; Blesser and Salter 2007).

Sound media rely on public acoustics, and this acoustics has been made in order that thousands and millions of people can listen to it in their domestic settings. Producers design the acoustic properties of their products in a very careful fashion, and this can be called the acoustic architecture of sound media. Ross Snyder ([1966] 1979: 350) thinks of the producers in broadcasting as ‘architects of a spatial habitation which contemporary man will live and move and have his being’. We no longer read about the larger world of history and politics only in textbooks, he says; rather, ‘we are present in it’ (ibid.: 353). Tony Schwartz (1974) claims that radio and television communicate by ‘resonating’ inside people’s homes and in their social surroundings. When you walk through the city you will encounter many different media sounds; they resonate from a shop, a passing car, or from a window on the fourth floor of a building in a side street.

The production acoustics is created with a combination of two techniques that are quite particular to audiovisual media. The first technique is called microphone placement. It is of great importance how near or far away from the sound source the microphone is located. Edward Hall (1969) identified four spatial zones around the individual’s body that carry different communicative implications. Intimate distance is so short that the interlocutors almost touch each other’s lips; personal distance is an arm’s length or two; social distance is across the room or round a table at a cafe; while public distance is across an auditorium, a concert hall or a town square. These distances can easily be replicated in the sound media, and they give much the same social impression that Hall stipulated for natural surroundings (Meyrowitz 1979: 58). The second technique is called volume control. Producers can adjust the volume and pitch of everything they record and transmit, and they can mix the sound so that all sources are blended to be just right for the purpose. Arnt Maasø (2002) describes how the Norwegian TV2 made a promo where soft female whispering is mixed very loud. It creates a strangely attractive address with a volume typical of an important public message and the distance of a very intimate relationship. The two techniques of microphone placement and volume control are crucial to at least eight types of production acoustics:

– voice acoustics
– inaudible studio acoustics
– resounding studio acoustics
– multitrack acoustics
– synthetic acoustics
– telephone acoustics
– outdoors acoustics
– equipment acoustics

Sound events can be produced in voice acoustics. A high-quality microphone picks up a voice in a sound-proof studio. There is only one source of sound in a highly controlled space and, depending on the performance, the address can feel very intimate (a whisper), personal (a soft pleading tone of voice) or social (lively shouting). This acoustics is only used in broadcast journalism, and not in music production.

Sound events can be produced in inaudible studio acoustics. Several microphones are rigged to pick up several sound sources in a controlled studio environment. It can be a series of musical instruments for a music recording or several speakers for a journalistic programme. There is little sense of an identifiable locale, although the room’s resonance may sometimes be mixed to create the feeling of a warm room, a large room, etc. All the sound sources are well shielded from each other, and are also fed separately into the mix.

Sound events can be produced in resounding studio acoustics. Several microphones are rigged in a large studio, and the performance takes place in front of a live audience. The event is arranged and mixed to balance the performers and the audience reactions, and also to convey the size of the hall and its atmospherics. Notice that this is nevertheless a very controlled environment, where producers also supervise the behaviour of the live audience. These events are typically marked by social or public distance.

Sound events can be produced in multitrack acoustics, which did not exist before the 1960s. Many recordings are edited together in an audible way. The producer can make use of sounds recorded especially for a session or select suitable archive sounds. Montage techniques are commonly exploited in this acoustics, and the producer can make the acoustic signature change all the time, jumping for example from the voice alone, to parliament, to the studio sound of the Eagles. A two-minute recording may have dozens, even hundreds, of different elements, each implying their own acoustics. Multitrack acoustics can be brutal for dramatic effect or gentle for emotional effect.

Sound events can be produced in synthetic acoustics. The most radical version is where no microphones are used, and music is created with MIDI programming. There may be an ice-cold metallic sound that does not resonate with any known space outside the medium.

Sound events can be produced in telephone acoustics. Telephone acoustics imitates or simply channels the sounds from telephone mouthpieces. There are often technical noises that disturb communication, and the frequency range is limited to just about the range of the human voice. However, there is very little complaining about the poor sound of telephones on radio, probably because we are all so used to this soundscape from our private lives that we recognize it as familiar.

Sound events can be produced in outdoors acoustics. One or more microphones are used to capture events in their natural surroundings. There will typically be several controlled sound sources, such as a reporter and some interviewees, but there will always be uncontrollable events in addition, for example heavy traffic or a crowded swimming pool, and these events are integral to outdoors acoustics. The events that are mediated can be planned or spurious, and they can take place in private or public settings. This acoustics is frequently used in broadcast journalism, but not in music production except for concert albums. In Hall’s terms, outdoors acoustics typically has social or public distance.

Finally, sound events can take place in equipment acoustics. The equipment itself makes sounds that can be very telling of how well or badly the equipment works. There are, for example, interruptions, static, hiss, pop and crackle. Equipment acoustics is essentially the sounds of resistance to mediation, and they can be stressful for the producers and listeners alike.

Reception environments
Earlier I stated that the acoustic space of sound media is double – that is, it is presented from loudspeakers in a domestic setting. The listeners experience this in a range of lifeworld settings that they can, to a large extent, organize as they wish. In parallel to the craft of the producers, listeners learn two basic techniques that affect the sound: loudspeaker placement and volume control. The listeners decide in which directions the sounds will go, and can for example organize a sweet spot for their stereo system or 5.1 surround sound. Regarding volume control, the listeners can adjust the volume of the sound to suit the situation – for example by turning up the volume in a noisy environment or turning it down if they are tired and edgy. The two techniques of loudspeaker placement and volume control are used to project sound into all the regions of everyday life. Most people in Western countries are likely to be familiar with these five reception environments:

– the home
– the car
– earphones
– public arenas
– outdoors

The home environment is stationary, and the radios and stereo sets typically have set positions in the kitchen, the bathroom, the bedroom, the living room, and so on. The rest of the family is never far away in the home environment, and most of us can relate to the constant negotiation among family members about what should be on: the TV or the radio, rock music or hip-hop, loud music or quiet music, and so on (Morley 2000).

The car environment is portable, and the stereo, radio and loudspeakers are built into the interior of the vehicle. It is a small enclosure with very little resonance, but speakers are custom made and there can be fabulous fidelity in a well-equipped car. The driver typically decides what will be played, and also controls the volume, and passengers with a preference have to negotiate with the driver (Bull 2003).

The earphone environment is wearable, and this means that it can follow the individual wherever he goes (Bull 2000). Notice that there is a distinction between earphones and headphones. While earphones are inserted into the ear cavities, headphones only cover the outside of the ears. The latter have been in use throughout history, while earphones were introduced in the 1980s. The earphone environment can be fully controlled by the individual, who can start and stop and adjust the volume entirely to his own liking.

The public arena environment is an umbrella term that covers all kinds of organized settings for sound reproduction, such as cafes, restaurants, pubs, clubs, sports arenas and shops. The listener can influence the volume only by leaving the place.

The outdoors environment is also an umbrella term, and it points to a range of ways in which sound can be radiated into the surroundings without prior agreement. Teenagers play basketball in a back alley and a boom box blasts out hip-hop music, or a crew of carpenters listens to Kiss FM at the work grounds. There are two positions towards outdoors sound: that of the people who control the sound and that of the people who are exposed to it. The people who get exposed have little control over the volume, except by confrontation. They may not even be in a position to leave (if the noise is on their street), and may in the worst case feel threatened.

Time effects
The temporal characteristics of radio and recording are simple, as I have explained. Radio is fundamentally an ephemeral medium, and programmes are mostly heard only once, with a steady flow of new instalments. Another way of saying this is that live media present events in real time instead of recording them for later publication. Recording media repeat already completed events, a long or a short time after they have happened, and typically with heavy editing in between (Wurtzler 1992; Auslander 1999).

Regardless of the difference between live media and recording media, their presentations are very often felt to be live (except for those made with multi-track acoustics), and this effect comes about because listeners are prone to conceive of human sounds as taking place in some kind of simultaneous presence (Ellis 2000; Scannell 1996). This perceptual tendency is inherited from face-to-face situations, and it can be called the liveness effect of sound media. It is actually not very puzzling that we can hear Winston Churchill as a living person long after he is dead, because this is how we always hear human sounds. This liveness effect is an important part of media soundscapes, and it has been exploited strategically from day one.

I have a systematic focus on the material features of time experience in the media, and this can be distinguished from the ideological approach. Nick Couldry (2004: 356) argues that liveness is not a natural category but a constructed term. It is ‘a category whose use naturalizes the general idea that, through the media, we achieve a shared attention to the “realities” that matter for us as a society’ (see also Feuer 1983). I agree with both of them that the public sense of time is laboriously constructed, but there are limits to which aspects of temporal experience can be constructed. An LP can never be live in the way a news bulletin is, and a news bulletin can never be recorded like an LP. This distinction has far-reaching consequences and leads me to treat live media separately from recording media throughout the book, while the ideological concept of ‘liveness’ is less important to my argument.

First, I will describe the temporal character of the recording media. According to my analytical method, recorded sound can be arranged in two basic ways:

– live-on-tape (pre-production)
– edit-on-tape (post-production)

A performance can be recorded from start to finish without any interruptions, and this is often called live-on-tape recording. I have already presented this technique, which can also be called pre-production, in detail in relation to the three versions of Ravel’s La Valse. Microphones send the signal directly to a disc (or a broadcasting transmission station), and there is no editing of the signal on the way. The producer can only start and then stop the recording. Everything about the performance must therefore have been planned and rehearsed in advance, and the musicians would do new takes until everyone was satisfied. In a very long period from the 1870s to the 1930s, because of the limitations of the gramophone disc, this was the only way producers could publicize sound. The technique of pre-production is now more or less outdated because of tape and computer editing.

In strong contrast to live-on-tape, there is edit-on-tape recording. Here the technique can also be called post-production (Moylan 1992). In a studio environment the producer selects partial performances from many different times and localities and creates a carefully dramatized entity. The finished product is often called a master, and it can be a music recording for LP or CD, or a programme for radio or some other publication platform. There are great variations in complexity, for example between the hundreds of edits and overdubs in the McLuhan track and the relatively few edits in a reportage for a news bulletin. Indeed, it is possible to use the resources of post-production to make a record that sounds completely untouched, so that most people would believe it was pre-produced. This strategy is called continuity recording, and will be discussed in chapter 7.

Moving on to a discussion of the temporal character of live media, I will stick rather closely to journalism in radio (and television), but notice that the telephone and the internet are also live media. Anyhow, radio and television are live at the point of transmission (Ellis 2000: 31; Hendy 2000: 120), and this means that the programme can always be interrupted with a message if, for example, there is a terrorist attack in a city. The main purpose of journalistic techniques is to present the country or city’s organized life as it progresses through the day, every day. If recorded sound relates to an inner, imaginative time, then live sound relates to the outer, directly shared time.

Think about the dramatic hours on the morning of September 11, 2001, American time, to which I will return in chapter 4. Imagine that you live in New York City and, as the reporter describes the first of the twin towers falling down, you too can actually see it falling, from your penthouse window. This would be a strong case of real-time mediation. John Ellis makes the fundamental point: ‘Transmission is live, even when the programmes are not’ (2000: 31). There are at least four basic ways of experiencing real time through electronic media, and they are all raw materials for liveness effects:

– station flow
– live programmes
– being on the internet
– speaking on the phone

All stations have a continuous organized flow of sound elements. There are typically news updates at the top of the hour, and jingles, promos, advertisements and all kinds of pre-recorded programmes inserted at various times. Recorded programmes can be inserted in the station’s live flow, and ‘are able to claim the status of liveness for themselves simply because the act of transmission attaches them to a particular moment’ (ibid.). The flow can be more or less automated, and it will typically be organized according to the time of day. In the morning the pace and intensity of music and speech is different from shows that are aired in the afternoon or during night-time.

In live programmes the main events obviously progress in real time, with the responses of speakers and other attendants being audible and taking place in human time. There are few or no recorded elements, except for some that are quite audibly recorded. Examples can again be jingles, promos and advertisements, which very few people would mistake for live programming. Over time several genres have developed, mainly live outside reports, for example from a dramatic accident, and live studio shows with guests and telephone conversations and quizzes. There are also specially staged media events, such as sports events, royal weddings or big political rallies (Dayan and Katz 1992). The most spectacular type of live programme is the unexpected event which gets relayed in the form of breaking news.

When people are logged on to the internet they engage in a live communication activity, although often it does not feel especially live. Being on the internet is basically a private activity. People are connected at their own leisure, for a short or long period governed by themselves, and they can download and upload all kinds of information during the session. With broadband it has become more and more common to be constantly online, so that the hook-up feels less live than during the more precarious modem age. Notice that people can contact radio and television stations when they are online, especially through email and posts at chat rooms associated with stations. This is an ever-growing resource for public life.

When people speak on the telephone they obviously engage in a live communication activity. The speakers will never hear their conversations again (unless they are under surveillance, and end up hearing them again as evidence in court). Phone conversations enter into the ongoing flow of life, and skilled phone callers know how to talk to the right people at the right time of day. We all know that we should call our business partners in the daytime and friends and family in the evening, and nobody at all in the middle of the night unless there is an emergency. Regarding the character of telephony, the only exception to live exchanges are the pre-recorded answering machine messages, which nobody would mistake for a live conversation (except if they are intentionally made to confuse people). Notice that live telephone calls are an important resource for talk radio, and people can also use the mobile phone to send a short text message (SMS) to a quiz show or other programme (see chapter 5).

These real-time experiences have something that a record album lacks entirely, namely the special allure of feeling that what you are hearing is actually happening right now. Gary Gumpert (1979: 294) argues that, when the listener knows that something is live, there is an implicit belief that they can influence the future outcome by participation. He describes how the sports fan yells, perhaps pounds the table or strokes his lucky charm, in an effort to make his team score a goal. In a similar vein Shingler and Wieringa (1998: 106) write, ‘A listener tuning into a live broadcast can feel that they too are part of the process of “life”, that they are part of history as it is being made, rather than being consumers of the past.’ Although we cannot be absolutely certain that we hear a live event, we all nevertheless think of it as outer time, real time. The listener wants to feel ‘the aura of uncertainty in which he can cast his evil spell, dispense a blessing, or merely hope’ (Gumpert 1979: 294).

Persuasion in person
Electronic media have a bias towards the personal and private, the welcoming voices, cosiness and fun (see, for example, Langer 1981; Johansen 1999). Listeners can forgive and forget almost anything if they are emotionally attached to the performers. For musicians and journalists alike it is crucial to know how to present oneself in an appealing context. Indeed, personal credibility, or ‘cred’, is one of the greatest values of the mass media. Think of the PR strategies that brand a new folk artist as ‘authentic’ or the glamorous life that many celebrities stage for themselves in order to get press coverage. The techniques in question here are largely intuitive strategies for making listeners feel a certain connection, of giving credibility to oneself in the media setting. It could be called persuasion in person, and it is a craft that has been studied under many names both inside and outside the media (for example, Goffman [1959] 1990; Schutz 1970; Sennett [1974] 1988).

Programmes and recordings alike require careful planning and execution. This is to say that sound media rely heavily on scripts and rehearsals of the different elements in those scripts (Ytreberg 2002, 2004). A script can be of many types, for example a poem to be read out, the piano notation of a melody, or the wording of a news bulletin. The performers’ behaviour can be analysed on a continuum ranging from completely script-based to completely improvised performance. The script is a way to control what gets recorded or what gets on air. Erving Goffman points out that the notion of ‘speaker’ is often discussed in a confused manner, and he presents a threefold definition of its intentionality structures. Speaker means animator – the sounding body from which utterances come. It means author – the agent who puts together, composes or scripts the lines that are uttered. And it means principal – the party to whose position, stand and belief the words attest (Goffman 1981: 226). Based on these distinctions I have identified four inflections of personality that will be used prolifically in the analyses of later chapters:

– reciting speech and song
– eyewitnessing
– role play
– projecting your personality

Firstly, there is the technique called reciting from a script or score. In this type of address the speaker functions as a skilful animator of a script. In radio this strategy was inspired by the public authority of newspapers and telegraphy, and radio journalists tried to create an auditory version of this by speaking in a neutral and solemn tone of voice. Individual characteristics of the speaker, such as sex, age, dialect and voice timbre, are therefore suppressed, and supposed to be without importance. This script-driven address may easily be submitted to prior censorship. Although the message should be vivid and lively, there should be as few traces of the speaker’s personality as possible. It should be possible to replace one actor with another without communication being affected by the replacement. This technique is used by singers and journalists alike. The good reader is relaxed, has forgotten about the microphone, and knows how to imagine the audience as a single person. In journalism the author of the script is often the person who reads it, and they have written the script in the way they like to read it. But in music and theatre the author is typically somebody else, for example a composer, a writer, a poet. In journalism the reader often has a strong ethos, for example, being recognized as a journalist with a good reputation. There are many ways of infusing the reading style with authority. For example, educational programmes address listeners in an authoritative way, and the sound of the address as such implies that the speaker is an expert in a field, and that they have all the personal credibility and scholarly integrity needed for listeners to trust them.

Secondly, there is the technique called eyewitnessing. The speaker is or has been present at the scene of some important event, and the listeners are presumed to acknowledge the speaker’s presence there. The eyewitness describes the event as best they can, and the public expects a realistic description. Very often eyewitnessing is live at the scene, and in these cases the words have to be improvised. Only other persons who were present at the event could replace the current speaker. When someone witnesses and recounts an event it is in a sense the event itself that speaks. It demands a realistic description of its properties, and the speaker is in what Erving Goffman (1981: 233) calls a ‘slave relation’ to it. However, to witness something has two faces: the passive one of seeing and the active one of saying. Witnessing in the rhetorical sense is therefore ‘the discursive act of stating one’s experience for the benefit of an audience that was not present at the event and yet must make some kind of judgment about it. Witnesses serve as the surrogate sense-organs of the absent’ (Peters 2001: 709). Remember that the listener is in no position to challenge the truth claim since they are not present at the scene, and they are likely to trust it. This technique is used only in journalism, and not in music.

Thirdly, there is the technique called role play. In this type of address the speaker plays a role that the listeners are supposed to recognize as unique. The performer pretends to be a character that is described in a score, a fictional play, a radio script or an ad-libbed situation. The interpretation of the role is crucial, and the character of Hamlet has been presented in as many shades as there are actors who have played him. Role play often implies quite strong emotional display, and the behaviour is more lively and exaggerated than in other genres. This is demonstrated, for example, in a stand-up show or talk show. In Goffman’s scheme the speaker is an animator only, and most often the author is another person – a long dead composer, etc. Role play relieves the performer from any expectation of trustworthiness, since everybody knows that an author or composer put the words in the speaker’s mouth. But there is indeed something about the individual performer’s unique existence that plays a part in the experience of credibility. There are many borderline cases between role play and being oneself.

Lastly, there is the technique called projecting your personality. Politicians, celebrities and journalists have engaged in this technique for a hundred years, and it is intimately related to the presence of the microphone. First I will point out that all humans of course project their personalities in everyday life; we do it in tactical and spontaneous ways, on the phone or face to face. But this is not what I am referring to here. When people project their personality in public there is a necessary awareness of the way in which they comport themselves – a kind of meta-consciousness that is not as noticeable in completely private settings. As I have suggested, there is a sliding scale here, from a weather forecaster reciting in a mechanical voice to the celebrity who suffers a psychological breakdown on prime-time television (Tolson 2006; Salamensky 2001).

The intended effect of the technique I call ‘projecting your personality’ is to come across as honest and unaffected, although this effect can be difficult to achieve. Emotional qualities such as charisma, charm and character are the main communicative tools. The speaker tries to come across as a unique individual, so that nobody else could replace them. A side effect of this impression is that listeners can also hold the speaker personally responsible for their words and actions, since, unlike the weather forecaster, they presume to be speaking only in the capacity of being themselves. They can rightfully be blamed or credited for all aspects of their presentation. This technique is used by singers and journalists alike, but it is most common in journalistic settings. In Goffman’s scheme the projection of personality implies that the speaker incorporates the three functions of animator, author and principal.

At the end of the chapter I will briefly reconnect the lines of investigation that this book is built on. As I have already mentioned, my theory of sound media has four dimensions: 1) a description of sound and listening; 2) a theory of what a medium is; 3) a method for a backwards history of media; and 4) a method for rhetorical analysis of journalism and music. These methods will now be applied to cover the sound media in Europe and the USA for 130 years. From this detached perspective I hope to show that the media are a joint venture of electro-mechanical resources and human creativity. As a hint about the balance of forces in this venture I will quote McLuhan again: ‘All media work us over completely. They are so pervasive in their personal, political, economic, aesthetic, psychological, moral, ethical and social consequences, that they leave no part of us untouched, unaffected, unaltered. The medium is the massage.’
