Scarlet & Grey
Ohio State University
School of Music

Musical Expectation

David Huron



Research pertaining to musical expectation is reviewed and an over-arching theoretical framework is described. Expectations originate in evolutionarily adaptive mechanisms for anticipating future events. Accurate expectations facilitate gathering information from the world and aid in preparing appropriate motor responses. Expectations are learned in response to statistical regularities evident in one or more cognitively encapsulated environments. Cognitive encapsulation permits the co-existence of different "genres", so diverging expectations can arise depending on how the listener conceives of the genre. The statistical heuristics that drive expectations may vary in their accuracy and some heuristics may evoke expectations that exhibit systematic errors. Independent expectations relate to the "what" and "when" of possible outcomes. Tonality is one manifestation of "what"-related expectations, whereas meter is one manifestation of "when"-related expectations. Expectations may pertain to both immediate successions of events and to more distant contingent events. Expectations can arise from general-purpose schemata or from episodic memories. Sometimes these two memory systems predict different outcomes, accounting for such phenomena as the "surprise" of a deceptive cadence that is otherwise entirely expected. In addition, expectations can adapt dynamically as events unfold. Four sources of expectation-related emotion are distinguished: the pre-outcome imaginative and tension responses, and the post-outcome appraisals of the outcome and of the expectation. Using these resources, musicians have become adept at crafting specific emotional effects. Several musical examples are analyzed.

Table of Contents

1 Introduction
2 Evolutionary Psychology of Expectation
3 Characterizing Expectation
4 Auditory Learning
5 Statistical Properties of Music
6 Reality versus Appearance
7 Musical Genres and Environmental Contexts
8 Realized, Thwarted, Reverse, and Paradoxical Expectations
9 Types of Syncopation
10 Psychological Consequences of Expectation
11 Conclusion
The Baldwin Effect
The Problem of Induction
Event Frequencies
First Impressions
Contingent Frequencies
Mental Representations
Pitch Proximity
Step Inertia
Post-skip Reversal
Actual Melodic Structure - Expected Melodic Structure
Narmour's Theory of Melodic Organization
Scale Degree Expectations
Key Profiles
Scale Degree Distributions and Tonality
Exposure Effect and the Pleasures of the Tonic
Expectation and Enculturation
Expectation in Time
Long-Range Contingent Expectations
Anchoring and Tendency Tones
Anchoring and Embellishment
Schema Selection
Schema Failures
1. Expectations Fulfilled
2. Expectations Thwarted
3. Mixed Expectations
4. Reverse Psychology: Expecting the Unexpected
5. Paradoxical Expectations
The Tenacity of Schematic Expectations
Schematic versus Veridical Expectations

1. Imaginative Response
2. Tension Response
3. Outcome Response
4. Prediction Response
Response Interactions
Expecting What and When
The Poetry of Expectation
Predictability and Boredom
Emotional Effect of Delay
Musical Applications
The Anticipation
The Suspension
The Odd-ball Note
Misattribution and the Exposure Effect
Predictable Music
Emotional Effect of Delay
Expectation Shapes Mental Representations
Presentation Slides


Expectation is a constant part of mental life. A cook expects a broth to taste a certain way. A pedestrian expects traffic to move following a green light. A poker player expects an opponent to bluff. A pregnant woman expects to give birth. Even as you read this book, you have many unconscious expectations of how a written text should unfold. If my text were abruptly to change topics, or if the prose suddenly switched to a foreign language, you would naturally be dismayed. Even less dramatic changes would still have an effect. Some element of surprise would occur if a sentence proved ungrammatical, or if a sentence ended. Prematurely.

Half a century ago, Leonard Meyer drew attention to the importance of expectation in the listener's experience of music. Meyer's seminal book, Emotion and Meaning in Music, argued that the principal emotional content of music arises through the composer's choreographing of expectation. Meyer noted that composers sometimes thwart the listener's expectation, sometimes delay the expected outcome, and sometimes simply give the listener what is expected. Emotion and Meaning in Music was written at a time when there was little general experimental or theoretical psychological groundwork to draw upon. In the intervening decades, a considerable volume of research has accumulated. This research provides an opportunity to revisit Meyer's topic, and to recast the discussion in light of contemporary findings. The principal purpose of this book is to fill in the details and to describe a more comprehensive theory of musical expectation -- a theory I call the "ITPO" theory.

My motivation for developing the theory originated in purely musical ambitions. However, in piecing together a theory of musical expectation, it became clear that the ITPO theory amounts to a general psychological theory of expectation. Accordingly, psychologists are apt to find the theory of interest, even if they have no interest in music. While the theory itself will be described in general terms, the illustrations will be drawn almost entirely from the field of music. Psychologists may wish to skip much of the applied discussion that dominates the latter half of the book. Parallel examples in visual perception, linguistics, social behavior, and ethology will readily come to mind for those readers who are knowledgeable in such areas.

In recent decades, much of the experimental research pertaining to expectation has focussed on auditory and musical expectation. This is unusual in the world of psychology, where research on perception is dominated by vision. Perhaps the inherently dynamic nature of sound encouraged a greater curiosity about the problem of expectation among auditory researchers. It may well be the case that the experimental research related to musical expectation represents the most advanced of any of the literatures pertaining to expectation. As a musicologist interested in the psychological experience of music, I am all too aware of how much my work has benefitted from discoveries in general psychology and cognitive science. So it is gratifying to imagine that music scholarship may be able to repay some of our debt to psychology.

I should at the outset confess that this book began in a uniquely inauspicious way. It began as a file-folder containing ideas that I didn't want to write about. In 1999, I was invited to give the Ernst Bloch lectures at the University of California, Berkeley. My lectures were entitled "Foundations of Cognitive Musicology", and one of the six lectures concerned a theory of music and emotions I had assembled. At the time, I thought that expectation was a comparatively minor component of the emotional experience of listeners. To be sure, I did not think expectation was unimportant. I just thought that other aspects of auditory-evoked emotion were more central. In writing up my theory of music and emotion for later publication, I bracketed "expectation" as a topic that would be explicitly excluded. Nevertheless, as expectation-related issues surfaced, I wrote brief summaries and tossed them into the folder labelled "Ignore This." As my work on music and emotion progressed, more and more slips of paper were relegated to this file.

In the spring of 2001 I taught a graduate seminar entitled "Music and Emotion". Once again I was eager to segregate the phenomenon of expectation from what I considered the main curriculum. Of course the topic could not simply be hidden from the inquiring minds of my students. I wrote a document entitled "Musical Expectation" whose sole purpose was to provide a stand-alone resource that students could read outside of class. I wanted to prevent the phenomenon of expectation from spilling into what I really wanted to talk about in class.

As I began to write the document, it became clear that my file on expectation had become the proverbial 500-pound gorilla in the filing cabinet. I finally recognized that I could no longer ignore the role of expectation in musically-evoked emotion. The document expanded into an article, and finally into this book. In short, this book began as a "negative" endeavor -- a file of things I wanted to exclude from my other writing projects. Having admitted to its trash-pile origins, I sincerely hope that the finished product has transcended its inauspicious beginnings, and that readers will find the theory contained here coherent and worthwhile.

A number of colleagues, collaborators, students, and friends have contributed directly or indirectly to this book. I wish to make explicit my debts of gratitude, and to express publicly my most sincere thanks. Much of this work was inspired by research carried out in my lab by post-doctoral fellow Paul von Hippel and graduate student Bret Aarden. Although their work was mostly carried out in my lab at the Ohio State University, I was not always a collaborator. Paul von Hippel took my suggestion about the possible influence of regression-to-the-mean in melodic organization and was able to make major discoveries that produced a silk purse from a sow's ear. In particular, von Hippel's experiments made clear the discrepancy between the reality and appearance of expectations. Bret Aarden took my interest in reaction-time measures in judging melodic intervals, and turned the paradigm into a truly useful tool for investigating musical expectation. His work has transformed the way we understand the Krumhansl and Kessler key profiles.

Although I have always preferred so-called "structural" theories of tonality to "functional" theories, I have benefitted enormously by having David Butler (the principal advocate of functional tonality) as a departmental colleague. Professor Butler's persistent and knowledgeable criticisms of structural theories led me to better understand the importance of concurrent mental representations.

My discussion of rhythmic expectation builds directly on the research of my colleague, Prof. Mari Riess Jones, of the Ohio State University Department of Psychology. Together with her collaborator, Ed Large, she assembled a theory of rhythmic attending that has proved valuable for understanding the "when" of expectation.

Other colleagues provided stimulating conversation, correspondence, critiques and encouragement, including Caroline Palmer, Kristin Precoda, Simon Durrant, Don Gibson, Jonathan Berger, and Joy Ollen. To all of these individuals, my heartfelt thanks.

Finally, I am indebted to the Ohio State University. Of the various institutions I have worked at, this institution has an unusually high concentration of enlightened administrators. The university's support for music cognition has been visionary and unprecedented. Since the 1960s, the OSU School of Music has both tolerated and promoted the systematic and empirical study of music. I am grateful for the supportive and productive research environment.


The theory of expectation proposed here is explicitly founded on principles of evolutionary psychology. From the evolutionary perspective, two questions help to frame the problem: (1) Why did the mental capacity to form expectations arise? That is, what are the adaptive purposes of expectations? And (2) Why might expectations evoke various feeling states? That is, what are the adaptive purposes of the emotional responses that are conjured up due to expectations?

As a starting point, the ability to form accurate expectations about the future is clearly a potentially valuable biological function. It would be an advantage for an animal to be able to anticipate (say) that the trajectory of an approaching object is likely to intercept its path. Similarly, it would be an advantage for an animal to anticipate that eating food in a less conspicuous spot might reduce the likelihood of attracting a crowd of hungry competitors.

Of course it is possible that such behaviors might arise without any phenomenal sense of expectancy. For example, an animal might simply have an innate disposition to seek isolation when eating food. Similarly, the response to avoid an approaching object might merely arise as a conditioned reflex. In each case, it is possible that no "expectation" is involved. The animal need not "expect" that other animals might want to take its food, or "expect" that it will be struck by a moving object unless it takes evasive action. How do we know when a mental state might be properly regarded as one of "expectation?"

One defining characteristic is that the object of expectation is an event in time. Accordingly, an expectation entails some conscious or unconscious mental representation of such a future event. Consider again an animal moving to avoid collision with an approaching object. We might say that this is a conditioned response if the animal holds no mental representation of the hypothetical event of a collision. Similarly, consider again an animal moving food to a less conspicuous location. If the animal takes this action based on some vague anxiety, or some antipathy about being observed, then the phenomenon is not properly speaking one related to expectation. However, if the animal forms a mental image or representation, say of other animals stealing its food, then we might regard the animal's actions as a proper consequence of a mental expectation.

In short, expectation, as conceived here, has something to do with the generation of mental representations of alternative possible future states. In this sense, expectation may be regarded as a cognitive phenomenon. These mental representations might be very complex and conscious, as when a politician imagines what will happen should she win an election. However, the research suggests that most of the mental representations related to expectation are unconscious and engage considerably simpler representations. From a behavioral perspective, it may be quite difficult to determine whether a particular animal behavior arises due to expectation or from some other process. I expect that the differences will become more apparent with advances in neurophysiology. Moreover, from an evolutionary perspective, there may be little difference whether an adaptive behavior is evoked due to a conditioned reflex, or whether it arises from a physiological process we might call "expectation."

The object of expectation is an event in time. Two principal types of uncertainty attend expectation: what will happen and when it will happen. This brings us to the second question: why might expectations evoke various feeling states? I propose that what we call "expectation" involves four functionally distinct physiological systems. Each of these systems can evoke responses independently. The responses involve both physiological and psychological changes. Some of these changes are autonomic, and might entail changes of attention, arousal, and motor movement. Others involve noticeable psychological changes such as rumination and conscious evaluation. Outcomes matter, so positive and negative (i.e. "valenced") feeling states can also arise. It is the possibility of influencing these feeling states that attracts musicians to the phenomenon of expectation.

As it happens, these four systems tend to be invoked at different times. Consider Figure 1.

Fig. 1. Schematic diagram of the "ITPO" theory of expectation. In expecting some future event, four response systems are activated successively. Feeling states are first activated by imagining different outcomes (I). As the anticipated event approaches, physiological arousal typically increases, often leading to a feeling of increasing tension (T). Once the event has happened, some feelings are immediately evoked related to whether one's predictions were borne out (P). Finally, feeling states are evoked that are directly related to the value of the outcome (O). See text.

There are a number of issues related to expectation that will be addressed in this book. It is appropriate to take a moment to identify in advance some of these issues. One issue is so-called Wittgenstein's paradox. Wittgenstein raised the problem of how it is possible to be surprised by something we know will happen. In music, an excellent example of Wittgenstein's paradox can be found in the deceptive cadence. The deceptive cadence (usually a V-vi harmony) will continue to sound "deceptive" even in musical works that are totally familiar to a listener. In Chapter X, we will show how this paradox is resolved by the distinction between veridical and schematic memory. Several other interesting listening phenomena will fall out of this distinction.

Another issue is how we can coherently hold different expectations for different genres. For example, we expect a classical string quartet not to exhibit syncopation, but we expect a jazz number to be syncopated. Are there cross-genre influences? Does a modern listener's experience with jazz lessen the effect of a hemiola when listening to a renaissance motet? How rapidly do listeners adapt when listening to a new genre, or new work? Are there piece-specific expectations? What is the relationship between expectation and pleasure?


The world provides an endless stream of unfolding events that can surprise, delight, frighten, or bore. The capacity to form accurate expectations about future events confers significant biological advantages. Those who can predict the future are better prepared to take advantage of opportunities and sidestep dangers. Over the past 500 million years or so, natural selection has favored the development of perceptual and cognitive systems that allowed organisms to anticipate future events. Like other animals, humans come equipped with a variety of mental capacities that help us form expectations about what is likely to happen. Anticipating future sounds is one of the evolved functions of the auditory system. This capacity to anticipate the course of acoustical events inevitably influences how listeners experience music. Moreover, musicians have learned how to manipulate such expectations in order to achieve specific types of responses.

Music scholars have long observed that listening to music engages a mental disposition to anticipate. Some theorists have made expectation a centerpiece in their music theorizing (e.g., Berger, 1990; Gjerdingen, 1988; Kramer, 1982; Larson, 1999; Lerdahl & Jackendoff, 1983; Meyer, 1956; Narmour, 1990, 1992). Many other theorists discuss expectation in the context of other musical phenomena (e.g., Aldwell & Schachter, 1989; Hindemith, 1944; Piston, 1978; Rameau, 1722; Riemann, 1903; Schenker, 1906). In the 1950s and 1960s, writings on musical expectation drew inspiration from the new field of information theory (e.g., Cohen, 1962; Coons & Kraehenbuehl, 1958; Kraehenbuehl & Coons, 1959; Moles, 1958/1966; Pinkerton, 1956; Youngblood, 1958). More recently, musical expectation has attracted the attention of experimentalists (e.g., Aarden, 2002; Abe & Oshino, 1990; Bharucha, 1994; Bigand & Pineau, 1997; Carlsen, 1981; Cuddy & Lunney, 1995; Dowling & Harwood, 1978; Federman, 1996; Francès, 1958; von Hippel, 1998; Jones, 1990; Jones & Boltz, 1989; Krumhansl, 1999; Rosner & Meyer, 1982; Schellenberg, 1997; Schmuckler, 1989; Sloboda, 1992; Thompson, Balkwill & Vernescu, 2000; Unyk, 1990; Werbik, 1969). At the same time, the general phenomenon of expectation has received sustained attention among psychologists working in a number of diverse fields (e.g., Mandler, 1975; Olson, Roese & Zanna, 1996).

In the first instance, organisms would not be able to anticipate future events if the real world did not exhibit some structure. It would be impossible to predict an amorphous world that was devoid of any discernable patterns. Fortunately, the world exhibits many regularities. These regularities provide a useful starting point for understanding the nature of expectation. Expectations can be viewed as hypotheses about the structures underlying real world events (Shepard, 1981).

An important form of regularity is simple event frequency -- the tendency for some events to occur more frequently than others. You are more likely to hear the sound of a bird singing outside your window than the sound of a falling tree. You are more likely to hear a human voice than a bassoon. You are more likely to hear someone say "hello" than "hell no". And you are more likely to hear the pitch C4 than G#8. As we will see, music perception research has established that listening experiences are strongly shaped by such simple event frequencies.
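Simple event frequency can be made concrete with a small tally. The following Python sketch is only an illustration: the function name `event_frequencies` and the toy melody are hypothetical, not data or methods from any study cited here. It counts how often each event occurs and converts the counts into relative frequencies, which is the kind of statistic the text describes.

```python
from collections import Counter


def event_frequencies(events):
    """Return the relative frequency of each distinct event in a sequence."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


# Hypothetical toy melody (pitch names); purely illustrative, not real corpus data.
toy_melody = ["C4", "D4", "E4", "C4", "G4", "E4", "C4", "D4", "C4", "G4"]

freqs = event_frequencies(toy_melody)

# Under a pure event-frequency heuristic, the most frequent event
# is the one a listener would most expect to hear next.
most_expected = max(freqs, key=freqs.get)
```

In this toy sequence C4 accounts for four of the ten notes, so a listener relying on event frequency alone would treat it as the most likely event.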

Another form of regularity arises from the fact that some auditory events are contingent upon other events. The sound of my neighbor's car pulling into her driveway has a strong likelihood of being followed by the barking of her dog. The sound of a hammer striking a nail is likely to be followed by a repetition of the same sound. A dominant chord is more likely to be followed by a tonic chord than by a mediant chord. As with event frequencies, music perception research has established that contingent frequencies also influence the way music is experienced.
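Contingent frequencies can be sketched in the same spirit by counting which event follows which. The Python example below is a minimal illustration under stated assumptions: the function name `contingent_frequencies` and the toy chord progression are hypothetical, and only immediately adjacent events are considered. It estimates the probability of each continuation given the current event.

```python
from collections import Counter, defaultdict


def contingent_frequencies(events):
    """Estimate P(next event | current event) from adjacent pairs in a sequence."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(events, events[1:]):
        pair_counts[current][following] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in pair_counts.items()
    }


# Hypothetical toy chord progression (Roman numerals); purely illustrative.
toy_chords = ["I", "IV", "V", "I", "vi", "IV", "V", "I", "V", "vi", "IV", "V", "I"]

transitions = contingent_frequencies(toy_chords)

# In this toy data the dominant (V) is followed by the tonic (I)
# three times out of four, and by vi once.
p_tonic_after_dominant = transitions["V"]["I"]
```

The same event can therefore be probable in one context and improbable in another, which is what distinguishes contingent frequencies from simple event frequencies.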

An additional aspect of expectation is the environmental context. The words a person is likely to utter can change dramatically depending on the situation. The words spoken by a robber holding a gun are predictably different from the words spoken by a man on his knees holding an engagement ring. We may anticipate that the sounds about to be emitted from a singer standing in front of an orchestra will differ from the sounds arising from a singer standing in front of a jazz trio. Expectations shift depending on such environmental contexts. Moreover, in order to form accurate contextual expectations, minds must learn to distinguish and recognize different contexts.

The fact that the world exhibits patterns or regularities does not necessarily mean that we are capable of taking advantage of these regularities. We may fail to decipher, recognize or learn the patterns that exist. For several years I failed to realize that shocks from static electricity are much more likely when I wear a certain jacket. Many listeners fail to learn that the second movement in a multi-movement work is likely to have a slow tempo. Most of the patterns that exist in the world go unrecognized. It is this profusion of unrecognized patterns that provides grist for the enterprise of science.

Even when we do learn to recognize a pattern, we may not recognize the right pattern. Consider, for example, the manner in which the Pacific bull-frog anticipates a meal. During the Second World War, American soldiers stationed on Pacific islands discovered an unusually maladaptive frog behavior. Soldiers discovered that if they rolled lead pellets from a shot-gun shell toward a bull-frog, the frog would immediately thrust its tongue forward and eat the pellet. Curiously, the frog would do this repeatedly, never learning to avoid consuming the lead shot.

This is not at all a nice thing to do to a frog. But the phenomenon highlights an important fact about frog behavior. The frog has an instinct to eat anything that is small and black and moving. That is, the pattern "small-black-moving" causes the frog to anticipate a meal. In most circumstances, this instinctive behavior is beneficial for the frog. But in exceptional circumstances, the frog's disposition is utterly inept. For the Pacific bull-frog, this behavior is instinctive, so the frog is incapable of learning a more nuanced behavior. While there are important advantages to instinctive behaviors, the case of the Pacific bull-frog vividly demonstrates why learned behaviors can be superior to pre-wired instincts.

This raises the general question of whether expectations are learned or innate. As we will see, there are excellent reasons why auditory expectations would be predominantly learned. Since we know that learning can be incomplete or inaccurate, we might also expect to see evidence of "poorly learned" expectations in sound and music. In PART I, we will consider in detail evidence concerning the nature of auditory learning.

Expectations can differ with respect to their time-frame. Some expectations pertain to the flow of immediately successive events, as when your eyes move predictably along a line of text. Other expectations relate to longer time-frames, as when a person anticipates a surprise birthday party several days in advance. In music, contingent events occur in both the short-term succession of notes, and in longer-term expectations, such as an impending cadence, an anticipated modulation, or the expectation of the ensuing song on a recorded album. In PART II, we will examine the different time frames that exist in anticipating future musical events.

In PART III we will note that minds are sensitive to the contexts of different worldly regularities. Important cognitive functions have evolved in order to ensure that these contexts are segregated from one another. We will note that these encapsulated contexts make it possible for different musical styles and genres to exist. Several different sets of auditory expectations can co-exist within the mind of a single listener.

In PART IV a comprehensive theory will be proposed whose purpose is to account for the observed psychological consequences linked to expectations. As we will see, accurate expectations are rewarded -- even when the predicted outcome is unpleasant. Four different types of responses will be distinguished; two types of responses precede the stimulus, and two further types of responses follow with the advent of the stimulus. The theory will be illustrated by analyzing several musical passages. The theory is not restricted to musical or auditory phenomena, however, and can be applied to any expectation-related behavior.

It is the capacity for expectations to evoke largely predictable emotional responses that makes the manipulation of psychological expectation such a compelling phenomenon for musicians. We will see that a number of common compositional techniques can be plausibly attributed to the manipulation of listener expectations. At the same time, we will see the importance of enculturation in establishing a background of auditory expectations that make it possible to use specific musical devices.

Following a summary conclusion, we will identify as yet poorly understood aspects of the psychology of expectation, and point to future research possibilities.


The Baldwin Effect

Whether it is best for a behavior to be instinctive or learned depends in part on the stability of the environment. When an environment changes relatively rapidly it becomes difficult for an adaptive instinct to evolve. Biological examples of this phenomenon abound. For example, the most flavorful insect eaten by a species of salamander keeps changing color markings every decade or so. Rather than providing the salamander with an instinct to eat insects with a fixed coloration, a better adaptation would provide the salamander with the capacity to learn which color markings are indicative of a tasty food source.

The idea that evolution can account for the capacity to learn without invoking a Lamarckian notion of inherited learning was postulated in 1896 by James Baldwin. An evolved capacity to learn is consequently referred to as the Baldwin Effect (Baldwin, 1896, 1909).

Conceptually, auditory expectations might include both innate and learned components. A small number of aspects of human audition appear to be innate. For example, loud unexpected sounds will reliably evoke a startle response in all animals that have a sense of hearing. This response engenders a number of physiological changes that prepare the individual for possible defensive action -- such as increased heart rate and perspiration. Similarly, the orienting response is an innate reflex that causes listeners to direct their auditory gaze at unexpected sounds. This response produces physiological and neurophysiological changes that facilitate gathering further information from the environment (Lang, Simons & Balaban, 1997). However, apart from a handful of such reflexes, the extant research strongly implicates learning. This reliance on learning, in turn, implies that the auditory environment in which humans evolved was characterized by a high degree of acoustic variability. Like the salamander eyeing the color markings of an insect, humans could not necessarily count on a given sound to have a reliable or invariant "meaning."

The Baldwin effect holds important repercussions for our understanding of music's creative future. If learning plays the preeminent role in forming auditory expectations, then this suggests that musicians may have considerable latitude in creating a wide range of musics for which listeners may form appropriate expectations.

The Problem of Induction

Before we begin talking about auditory expectations, we should consider how auditory learning takes place. Learning from experience is regarded by philosophers as the premier example of inductive reasoning. Induction is the process by which some general principle is inferred from a finite set of observations or experiences.

The 18th-century Scottish philosopher, David Hume, recognized that there are serious difficulties with the method of induction. Hume noted that no amount of observation could ever resolve the truth of some general statement. For example, no matter how many white swans one observes, an observer would never be justified in concluding that all swans are white. Epistemologists agree that, in contrast to deductive reasoning, inductive reasoning is inherently fallible. From a purely logical point-of-view, it is not possible to infer the true principles underlying the world, solely from experience.

At first, the problem of induction would seem to make "knowledge" about the world impossible. Yet organisms clearly do learn from experience. The problem of induction merely places restrictions on this knowledge. Inductive knowledge must be contingent and fallible. Inductive knowledge is vague and adaptive, rather than precise and logical.

How, we might ask, has nature addressed the problem of induction? On what basis do organisms form generalized principles about the patterns of the world? It appears that nature approaches the problem in a manner quite similar to the methods of empirical science. Experiential learning appears to be statistical in nature. "Most swans are white" is good enough.

One of the most important discoveries in auditory learning has been that listeners are sensitive to the probabilities of different sound events. Learning occurs for both event frequencies and contingent frequencies.

Event Frequencies

Both humans and animals are attuned to the frequency of occurrence for various stimuli in their environments. This sensitivity to probabilistic patterns is evident in auditory, visual and tactile stimuli, and has been observed in a number of species (see Hasher & Zacks, 1984; Gallistel, 1990; Kelly & Martin, 1994; Reber, 1993 -- as cited in Saffran et al., 1999).

Perhaps the best example of event frequency learning in music is the phenomenon of absolute pitch. A person who possesses absolute pitch can name or identify the pitch of a tone without any external reference. Obviously, absolute pitch must involve learning since the pitch categories and labels are culture-specific. But the evidence for learning runs much deeper. People who have absolute pitch are slower at identifying some pitches than others. For example, the pitches C and G are more quickly identified than E and B; similarly, the pitches C# and F# are more quickly identified than D# and G# (Miyazaki, 1990; Takeuchi & Hulse, 1991). In general, black notes are identified more slowly than white notes. Simpson and Huron (1994) carried out a study that simply tallied how often each pitch occurs in a large sample of music. As one might expect, white notes are more common than black notes, and pitches like C# and F# occur more frequently than pitches like D# and G#. Simpson and Huron went on to show that the relationship between speed of identification and frequency of occurrence follows a well-known law of learning known as the Hick-Hyman law (Hick, 1952; Hyman, 1953). The learning occurs by simple exposure, and listeners learn best those sounds that have the highest event frequencies. Another way of interpreting the Hick-Hyman law is that perception is more efficient for expected stimuli than for unexpected stimuli.
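The link between frequency of occurrence and speed of identification can be sketched numerically. One common way of writing the Hick-Hyman law treats reaction time as a linear function of the information carried by a stimulus, measured in bits as the base-2 logarithm of 1/p, where p is the probability of the stimulus. The Python sketch below assumes that formulation; the function names, the intercept and slope values, and the two probabilities are hypothetical and are not taken from Simpson and Huron (1994).

```python
import math


def surprisal_bits(p):
    """Information conveyed by an event of probability p, in bits."""
    return -math.log2(p)


def predicted_rt(p, intercept_s=0.5, slope_s_per_bit=0.15):
    """Hick-Hyman style prediction: reaction time grows linearly with information.

    The intercept and slope are hypothetical placeholders, not fitted values.
    """
    return intercept_s + slope_s_per_bit * surprisal_bits(p)


# Hypothetical probabilities for a common pitch and a rare pitch.
rt_common = predicted_rt(0.20)
rt_rare = predicted_rt(0.05)
```

Whatever the particular constants, the rarer event carries more information and so yields the longer predicted identification time, consistent with the pattern described above.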

First Impressions

Only a minority of listeners have the skill of perfect pitch. More commonly, listeners hear tones with respect to a scale context. In Western tonal music, pitches may tend to be heard as scale degrees.

If listeners have internalized a simple probability distribution of events based on past experience, then we might expect that listeners would tend to assume that the first thing they hear would correspond to the most common event. For example, since the tonic and dominant pitches are among the most common pitches in music, [1] we might expect listeners to assume that an isolated pitch will be the tonic or dominant. Conversely, we would expect that listeners might have difficulty hearing an isolated tone as an improbable scale degree. Recall that the purpose of expectation is to form accurate predictions about the world -- so it should come as no surprise that good listeners would tend to expect an isolated pitch to be the tonic.

In Huron (1999), musician listeners heard isolated tones and were asked to imagine the tone as a particular scale degree. For example, the pitch G#4 might be played and the listener instructed to imagine the tone as the dominant pitch. Once they were able to hear the pitch as the specified scale degree, they responded by pressing a key. In order to ensure that listeners were responding honestly, a harmonic cadence was played immediately following the key-press, and listeners were asked to indicate whether the cadence corresponded to the correct key or not. Fig. 1 shows the average response times for only those responses where the listener correctly recognized that the cadence passage was in/out of the correct key.

Figure 1
Fig. 1. Average response times for listeners to hear an isolated tone as a specified scale degree. Data are shown only for responses where the listener correctly recognized that an ensuing cadence passage was in/out of the correct key.

As can be seen, the fastest average response time is for the tonic pitch, followed by the dominant. That is, listeners were most easily able to imagine an isolated tone as the tonic or dominant. Some scale tones, like the supertonic and subdominant, are somewhat slower. The especially slow processing for "fah" will strike musicians as odd, since it is not a notably rare pitch. However, if we look at the initial notes in a large sample of major-key melodies, it turns out that "fah" occurs least frequently of all the scale tones. Melodies tend not to begin with "fah", and this fact is reflected in the difficulty listeners have in conceiving an isolated tone as "fah".

In effect, listeners tend to hear an isolated pitch as though it is the starting pitch of a major-key melody: listeners tend to form expectations that approximate the distribution of melody-initiating tones. The most frequently occurring starting scale degrees prove to be the easiest to process mentally. [N.B. von Hippel has collected data that similarly echo listeners' assumptions about absolute pitch height.] Even before the first note of music is sounded, listeners have expectations. Moreover, once the first note sounds, listeners are already "jumping to conclusions."

For musicians, these experimental observations simply affirm our informal subjective intuition that listeners tend to assume that an isolated pitch corresponds to the tonic.

Contingent Frequencies

Event frequencies pertain to the simple likelihood of individual events without regard to preceding events. But humans and other animals also learn to anticipate sounds on the basis of what has just been heard. For example, the probability of hearing the tonic pitch is increased if we are currently hearing the leading-tone. These context-related regularities are referred to as contingent frequencies or conditional probabilities.

Jenny Saffran, Richard Aslin and their colleagues carried out a set of seminal experiments that demonstrate the statistical manner by which tone sequences are learned by listeners. Saffran, Johnson, Aslin and Newport (1999) constructed various musical "vocabularies" consisting of 3-note "figures." An example of a vocabulary consisting of six basic melodic figures is notated in Fig. 2.

Figure 2
Fig. 2. Sample of six melodic figures used in Saffran et al (1999). Exposure tone sequences were constructed by randomly stringing together such figures.

Using these figures, Saffran et al constructed a long (seven minute) tone sequence that consisted of a random selection of the six figures. Fig. 3 shows a sample excerpt from the sequence; it begins with figure #2, followed by figure #4, followed by figure #6, followed by figure #5, and so on. The random sequences were constrained so that no individual 3-note figure was repeated twice in succession.

Figure 3
Fig. 3. Sample tone sequence used in the exposure phase of Saffran et al (1999). Sequences were constructed from the three-note figures shown in Figure 2. Tone sequences were constrained so no single figure was repeated twice in succession.

Twenty-four listeners heard the seven-minute sequence three times for a total of 21 minutes of exposure. Note that the listeners had no prior knowledge that the tone sequence was conceptually constructed using a vocabulary of 3-note figures: listeners were simply exposed to a continuous succession of tones for 21 minutes.

In order to determine whether listeners had passively learned to preferentially recognize any of the 3-note figures, the 21-minute exposure phase was followed by a test phase. For each of 36 trials, listeners heard two 3-note stimuli. One stimulus was selected from the six vocabulary items whereas the other 3-note stimulus had never occurred in the entire tone sequence. A sample test item is illustrated in Fig. 4 -- the first sequence is a vocabulary item whereas the second sequence is not:

Figure 4
Fig. 4. Sample test stimuli used in Saffran et al (1999). Listeners heard two three-note sequences and were asked to identify which sequence was more familiar.

Listeners were asked to identify which of the two 3-note items was more familiar. The results were clear: listeners correctly identified the three-note sequences they had been exposed to.

A possible objection to Saffran's experiment is that 4 out of 6 of the vocabulary items end on the pitches of a D major triad (D, F#, A). The pitches used in this experiment are consistent with the key of D major, so perhaps Saffran's listeners were merely preferring test items that implied some tonal closure.

Actually, the experiment was a little more sophisticated. The twenty-four listeners were divided into two groups. Only half of the listeners were exposed to the tone sequences described above. The other listeners were exposed to a different sequence constructed from six entirely different vocabulary "figures." Both groups of listeners were tested, however, using precisely the same test materials. The pairs of three-note figures were organized so that what was a vocabulary item for Group #1 was a non-vocabulary item for Group #2 and vice versa. What one group of listeners deemed "familiar" was the precise opposite of what the other group deemed "familiar."

This experimental control allows us to conclude that what listeners heard as a "figure" had nothing to do with the structure of the figures themselves, and relates only to their simple probability of occurrence. A simple linguistic analogy might help to clarify the results. Suppose you heard a long sequence of repeated syllables ... abababababa ... How would you know whether you were supposed to hear ab, ab, ab, ab, ab ... or ba, ba, ba, ba, ba ...? In effect, Saffran trained two different groups of listeners, one to hear the sequence as ab, ab, ab ... and the other to hear the sequence as ba, ba, ba. (In fact, in an earlier experiment, Saffran, Newport and Aslin (1996) had done exactly this for spoken syllables.) For each item in the test phase, one group of listeners heard as a figure what the other group heard as a non-figure and vice versa.

Saffran and her colleagues went on to repeat both experiments with 8-month-old infants. Infants tend to stare longer in the direction of novel stimuli. By tracking head movements in the test phase, they were able to show that the unfamiliar figures were perceived as exhibiting greater novelty for the infants. Once again, the infants were divided into two groups and exposed to different random sequences. That is, in the test phase, what was a "vocabulary" item for one group of infants was a "non-vocabulary" item for the other group, and vice versa. In short, both infants and adults learned to recognize the most frequently occurring patterns -- whether tone sequences or phoneme sequences. Moreover, those patterns that occurred most frequently were the patterns that both adults and infants best recognized.

It is important to note that there were no silent periods, dynamic stresses or other cues to help listeners parse the figures. From the listener's perspective, the figures might have consisted of 2-note groups, 3-note groups, or some other group size or mixture of group sizes. Also recall that none of the figures were repeated twice in succession. Since two groups of listeners learned diametrically opposite "motivic vocabularies", the internal structure of the figures had no effect on the perception of grouping. This means that the only possible conclusion is that listeners were cuing on the simple statistical properties of various tone sequences. More precisely, listeners were learning the contingent frequencies: given pitch X, the probability of pitch Y is high, but the probability of pitch Z is low, etc.

The 21-minute period of exposure allowed listeners to form a sense of the likelihood of different pitch successions. Table 1 shows the long-term conditional probabilities for sequences using the six figures shown in Fig. 2. The vertical axis indicates the antecedent state (initial note) and the horizontal axis indicates the consequent state (following note). For example, the probability of the pitch "C" being followed by a "C#" is 0.056. That is, 5.6 percent of C's are followed by C#'s. By contrast, the pitch "C#" is never followed by the pitch "C".

Table 1

                                    consequent state
antecedent  c      c#     d      d#     e      f      f#     g      g#     a      b
c           0      0.056  0      0      0      0.056  0.056  0      0      0      0
c#          0      0      0.056  0      0      0      0      0      0      0      0
d           0.011  0      0.022  0.011  0      0.078  0      0.022  0      0.022  0.056
d#          0      0      0      0      0.056  0      0      0      0      0      0
e           0.011  0      0.011  0.011  0      0.011  0      0.011  0      0.011  0
f           0.056  0      0      0      0.056  0      0      0      0      0      0
f#          0.011  0      0.011  0.011  0      0      0      0.011  0      0.011  0
g           0      0      0      0      0      0      0      0      0.056  0      0
g#          0      0      0      0      0      0      0      0      0      0.056  0
a           0.011  0      0.067  0.011  0      0.011  0      0      0      0.011  0
b           0.011  0      0.011  0.011  0      0.011  0      0.011  0      0      0

Applying these probabilities to the original exposure sequence, we can identify the likelihood of each pitch-to-pitch transition. Fig. 5 provides a schematic illustration of the transitional probabilities for the sequence shown in Fig. 3. Thick lines indicate pitch successions that have a strong probability of occurrence. Thin lines are less strong. No line indicates a weak likelihood. Notice how the 3-note structure of the figures can arise simply by recognizing strong conditional probabilities. Indeed Saffran's experiments establish precisely this fact: in order for a listener to learn to hear this sequence as constructed from 3-note vocabulary "motives" the listener would have to recognize, in some sense, that the boundaries between vocabulary motives have relatively low probabilities.

Figure 5

Fig. 5. Sample exposure stimuli showing the long-term statistical probabilities of note-to-note transitions. Thick lines indicate high probability. Thin lines indicate medium probability. Absence of line indicates low probability.
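
The grouping principle described above can be sketched in a few lines of Python. The sketch below is a minimal illustration, not a reconstruction of the experiment: the four three-note figures are hypothetical (they are not the stimuli of Saffran et al, 1999, and unlike those stimuli they share no tones), and the 0.5 threshold is an arbitrary assumption. The code strings figures together at random, estimates the probability of each tone-to-tone transition from the resulting stream, and places a boundary wherever that probability is low.

```python
import random
from collections import Counter

# Hypothetical three-note figures (NOT the stimuli of Saffran et al., 1999).
FIGURES = [("C", "E", "G"), ("D", "F", "A"), ("B", "G#", "F#"), ("C#", "D#", "A#")]

def build_stream(n_figures, seed=0):
    """String figures together at random, never repeating a figure twice in a row."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_figures):
        figure = rng.choice([f for f in FIGURES if f != previous])
        stream.extend(figure)
        previous = figure
    return stream

def transition_probabilities(stream):
    """Estimate P(next tone | current tone) from adjacent pairs in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    antecedent_counts = Counter(stream[:-1])
    return {pair: n / antecedent_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(stream, probabilities, threshold=0.5):
    """Start a new group wherever the transition probability falls below threshold."""
    groups, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if probabilities[(a, b)] < threshold:  # assumed cutoff, chosen for illustration
            groups.append(tuple(current))
            current = []
        current.append(b)
    groups.append(tuple(current))
    return groups

stream = build_stream(2000)
probs = transition_probabilities(stream)
groups = segment(stream, probs)
```

Because each hypothetical tone belongs to only one figure, transitions inside a figure have probability 1, while transitions across a figure boundary hover near one third, so the recovered groups coincide with the figures. With stimuli whose figures share tones, as in the actual experiment, the contrast between within-figure and between-figure probabilities would be less clean.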

The work pioneered by Richard Aslin and Jenny Saffran provides just one of many examples showing how people (and animals) learn from exposure. Much of the research in this area pertains to vision, but Saffran and Aslin have shown that the same statistical learning processes occur for adult and infant listeners -- both when listening to speech as well as when listening to tone sequences. In effect, both adult and infant listeners build a representation of the transitional probabilities between adjacent tones in a tone stream, grouping together tones with high transitional probabilities, and forming figure-boundaries at locations in the tone stream where transitional probabilities are low. The statistical properties of the sequence are learned as a by-product of simple exposure, without any conscious awareness by the listener.


Statistical Properties of Music

The work of Jenny Saffran and others has established that listeners are sensitive to the probabilities of different sorts of events. But in Saffran's work, the tone sequences exhibited properties that were based on purely artificial probabilities constructed for her experiments. If we want to understand music-related expectations then we should focus on whatever statistical regularities real music exhibits.

There are indeed a number of stable probabilistic relationships that can be observed in music. Some of these probabilities reflect properties of individual musical works. Huron (2001a), for example, has shown how comparative probabilistic analyses can be used to identify thematic and motivic features in a musical work and distinguish one piece from another. Other probabilities appear to reflect properties of particular styles or genres (Moles, 1958/1966). Yet other probabilities appear to reflect properties of music as a whole. We might begin our musical story by looking for statistical regularities that seem to characterize Western music in general.

Mental Representations

Before continuing, we might ask what it is that listeners represent when they form mental analogs of probability structures. For example, are tone sequences represented as pitches or as intervals? Saffran's experiments do not address this issue. A variant of Saffran's experiments might present the test materials transposed upward or downward and compare the associated recognition scores with those for the untransposed materials. If there is no difference, then the result would suggest that listeners employ a relative-pitch or interval based mental representation rather than an absolute pitch based representation. Conversely, if transposed figures evoke only chance recognition, then the results would suggest that listeners rely on an absolute pitch-related representation.

So what are the mental representations used by listeners? Theoretically, possible representations might include absolute pitch, pitch chromas (or pitch classes), intervals, scale degrees, contours, duration, relative duration, metric position, harmonic functions, chord qualities, spectral centroids, or other concepts.

Experimental evidence suggests that all of these representations are used by at least some listeners in some listening situations. Clearly, absolute pitch representations are available only to a minority of listeners -- those with perfect pitch. Musical coding may involve several concurrent representations; Dowling (1978), for example, has proposed that for melodies, the most important pitch-related representations are scale degree and contour. Despite the research, little is known at the moment about the mental coding of music.

In some circumstances, knowledge of the precise nature of the mental representation may not be important. A useful way to illustrate this is provided by information theory. The field of information theory (Shannon, 1948; Shannon & Weaver, 1949) has provided useful mathematical techniques for characterizing the probabilistic relationships between events. Information theory inspired a number of music theorists throughout the 1950s and early 1960s. However, it was abandoned (for reasons that are not entirely clear) by about the mid 1960s. [2] Information theory provides a way to measure contingent probabilities. When rolling dice, for example, we know that the number rolled is independent of numbers previously rolled (this is true even for loaded dice). By contrast, other events exhibit contingent effects, as when the probability of the letter "u" in English text is considerably increased when it is preceded by the letter "q".
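
To make the measure concrete, the following Python sketch computes information content in bits, both for an event taken on its own and for the same event given its predecessor. It is a minimal illustration using the "q"/"u" case: the toy corpus is invented for the purpose, and the function names are assumptions rather than part of any established toolkit.

```python
import math
from collections import Counter

def information_bits(probability):
    """Information content of an event with the given probability, in bits."""
    return math.log2(1.0 / probability)

def conditional_probabilities(sequence):
    """Estimate P(next symbol | current symbol) from adjacent pairs."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    first_counts = Counter(sequence[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Toy corpus, invented only for illustration.
corpus = "the queen quietly quit the quiz and the duke quoted a quick quip"

# Probability of "u" without regard to context (event frequency).
p_u = Counter(corpus)["u"] / len(corpus)

# Probability of "u" given that the preceding letter was "q" (contingent frequency).
p_u_after_q = conditional_probabilities(corpus)[("q", "u")]

bits_u = information_bits(p_u)
bits_u_after_q = information_bits(p_u_after_q)
```

In this toy corpus every "q" is followed by "u", so the letter "u" carries no information once the "q" has been seen, even though "u" is fairly informative when considered in isolation. The same two calculations apply to scale degrees, intervals, or any other representation for which event and contingent probabilities are available.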

Figure 6 plots the flow of information for the tune Pop Goes the Weasel. Information is plotted (in bits) for five different representations. For example, the upper-most plot shows information according to the probabilities of different scale degrees. The probabilities used in Fig. 6 were derived from an analysis of roughly 6,000 Western European folk songs.

Figure 6
Fig. 6. Information theoretic analysis of Pop Goes the Weasel showing changes of information (in bits) as the piece unfolds. Plotted information includes scale degree, scale degree succession (degree dyad), metric position, melodic interval, and melodic interval succession (interval dyad).

Notice that the information for both scale degree and melodic interval representations peaks at the word "pop". For scale degree dyad and interval dyad the word "pop" coincides with the second highest information value -- with the maximum value following immediately after the word "pop". There appears to be an element of musical "surprise" at this point that is echoed in the lyrics. As a children's action song, this point is usually accompanied by some abrupt action, also suggestive of surprise.

Note, however, that there is no comparable information peak for metric position. That is, the interval/pitch/scale-degree may be relatively surprising, but the moment of its occurrence is not surprising. This highlights a distinction that can be made between the what and when of surprise. In some musical situations, the "what" is expected, whereas the "when" may be relatively unexpected. A well-known example is evident in the popular "Ode to Joy" from Beethoven's Ninth Symphony, where one of the phrases begins a beat early.

With the exception of the metric position information, all of the pitch-related information values are positively correlated. Table 2 shows a correlation matrix for the information content (measured in bits) for the various representations used in the above analysis of Pop Goes the Weasel. An analysis of a sample of 200 melodies from American, Chinese, Dutch, Pawnee, and Xhosa sources confirms that these positive correlations are endemic.

Table 2

                 degree   degree dyad   metric position   interval   interval dyad
degree           +1.00
degree dyad      +0.45    +1.00
metric position  -0.31    -0.05         +1.00
interval         +0.17    +0.74         -0.00             +1.00
interval dyad    +0.30    +0.90         +0.02             +0.77      +1.00

The fact that different musical representations are positively correlated is both an advantage and a disadvantage. The advantage is that it implies that we can proceed with a probabilistic analysis of music with relatively little concern over the choice of representation. On the other hand, this high correlation invites onerous mistakes of interpretation (as we will see). Results of perceptual experiments may very well be consistent with a particular representation, but the same results are likely to be consistent with several other alternative representations as well. For example, a result that is consistent with small interval sizes will also be consistent with successions of neighboring pitches, or with close pitch chromas, or with small log-frequency differences between fundamentals, or with small differences in spectral centroid, or with small critical band distances, or with tonotopic proximity along the cochlear partition.

Pitch Proximity

One of the best generalizations we can make about melodies is that they typically employ sequences of tones that are close to one another in pitch. This tendency to use small intervals has been observed over the decades by innumerable researchers, including Ortmann (1926), Merriam, Whinery and Fred (1956), and Dowling (1967). Fig. 7 reproduces results in Huron (2001b) showing the distribution of interval sizes using samples of music from a number of cultures: American, Chinese, English, German, Hasidic, Japanese, and sub-Saharan African (Pondo, Venda, Xhosa, and Zulu). For a broad range of cultures, the preponderance of intervals tends to be small. Only pseudo-polyphonic melodies (such as yodelling) fail to consist predominantly of small pitch movements.

Figure 7
Fig. 7. Frequency of occurrence of melodic intervals in notated sources for folk and popular melodies from ten cultures (n=181). African sample includes Pondo, Venda, Xhosa, and Zulu works. N.B. Interval sizes only roughly correspond to equally-tempered semitones.

In 1981, James Carlsen carried out an experiment to determine whether listeners tend to expect small interval continuations. Carlsen tested listeners from three different Western cultures: American, German, and Hungarian listeners. Although there were some differences between these three groups, all listeners showed a marked expectation for continuations involving small pitch movements.

Unlike Saffran's, Carlsen's work did not explicitly establish that the expectation for small intervals is learned by exposure to the music. (It is theoretically possible that these expectations might have some other origin.) But given the facts that melodies tend to use mostly small intervals, and that the auditory system is sensitive to frequently occurring phenomena, it is not unreasonable to suppose that listeners might have learned to expect small intervals. At a minimum, we can conclude that small pitch intervals are a common feature of real music, and that listeners appear to expect small intervals.

Step Inertia

Another property of melodic expectation pertains to what Paul von Hippel has called step inertia. This is the idea that small pitch intervals (1 or 2 semitones) tend to be followed by pitches that continue in the same direction. Music theorist Eugene Narmour has suggested that listeners form these sorts of "step inertia" expectations for melodies and has even suggested that these expectations might be based on innate dispositions (Narmour, 1990).

The first question to ask is whether melodies themselves are indeed organized according to step inertia. Is it the case that most small pitch intervals tend to be followed by pitch contours that continue in the same direction? The answer to this question is a qualified yes. Von Hippel examined a large sample of melodies from a broad sample of different cultures. He found that only descending steps tend to be followed by a continuation in the descending pitch direction. Roughly 70% of descending steps are followed by another descending interval. In the case of ascending steps, no trend is evident. Following an ascending step, melodies are as likely to go down as to continue ascending (see Table 3).

Table 3
                         Followed by Ascending Step   Followed by Descending Step
Initial Descending Step
Initial Ascending Step

Probabilities for Step-Step movements in a large sample of Western and Non-Western musics

But what about listeners' expectations? Do listeners expect a step movement to be followed by a pitch movement in the same direction? Von Hippel (2001) carried out the pertinent experiment and measured listeners' expectations in a variety of melodic circumstances. Von Hippel's listeners heard a twelve-note sequence and were then asked to indicate whether they expected the next note to be higher or lower. The results showed that listeners do indeed expect descending steps to be followed by another descending interval. Surprisingly, listeners also expect ascending steps to be followed by another ascending interval. That is, the results are consistent with Narmour's suggestion of step inertia.

But real melodies exhibit a tendency for step inertia only for descending intervals. So why do listeners expect step inertia for both ascending and descending contexts? Von Hippel suggested a plausible logic as to why listeners "over-generalize" in forming their melodic expectations: Notice that since ascending steps have a 50-50 chance of going in either direction, there is no penalty for (wrongly) assuming that ascending steps should typically continue to go up. That is, the expectation for step inertia is no worse than chance for ascending contours. Since the strategy of expecting step inertia pays off for descending intervals, listeners who form a step-inertia expectation will still, on average, have more accurate expectations than a listener who has no step-inertia expectation.

The "step-inertia" strategy is favored for another reason as well. Working at the University of Nijmegen in the Netherlands, Piet Vos and Jim Troost (1989) discovered that large melodic intervals are more likely to ascend and that small melodic intervals are more likely to descend. Fig. 7 shows the frequency of occurrence of ascending intervals for different interval sizes. The dark bars show the results for Western classical music whereas the light bars show the results for mainly Western folk music. Fewer than 50% of small intervals ascend. The reverse holds for large intervals:

Figure 7

Fig. 7. Frequency of occurrence of non-unison ascending intervals. Dark bars: sample of 13 Western composers. Light bars: sample of Albanian, Bulgarian, Iberian, Irish, Macedonian, Norwegian, and American Negro folk songs. (After Vos & Troost, 1989.)

Since ascending steps occur less frequently than descending steps, there is even less of a penalty for wrongly expecting that an ascending step is likely to continue in the same direction. The bias favoring descending steps therefore further increases the likelihood that a step-inertia expectation will pay off.
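
The payoff argument can be checked with a little arithmetic. The Python sketch below compares three fixed listening strategies using the continuation rates given in the text (roughly 70% after a descending step, about 50% after an ascending step). The share of steps that descend is a hypothetical value of 0.55; the text indicates only that descending steps are the more frequent, so this figure is an assumption made for illustration.

```python
# Continuation rates taken from the text: roughly 70% of descending steps continue
# downward, and ascending steps continue upward about half the time.
P_CONTINUE_AFTER_DESCENDING = 0.70
P_CONTINUE_AFTER_ASCENDING = 0.50

def accuracy(share_descending, expect_continue_desc, expect_continue_asc):
    """Expected proportion of correct direction predictions for a fixed strategy."""
    hit_desc = (P_CONTINUE_AFTER_DESCENDING if expect_continue_desc
                else 1 - P_CONTINUE_AFTER_DESCENDING)
    hit_asc = (P_CONTINUE_AFTER_ASCENDING if expect_continue_asc
               else 1 - P_CONTINUE_AFTER_ASCENDING)
    return share_descending * hit_desc + (1 - share_descending) * hit_asc

# Hypothetical share of steps that descend (assumed, not reported in the text).
SHARE_DESCENDING = 0.55

step_inertia = accuracy(SHARE_DESCENDING, True, True)      # always expect continuation
descending_only = accuracy(SHARE_DESCENDING, True, False)  # inertia after descending steps only
always_reverse = accuracy(SHARE_DESCENDING, False, False)  # always expect a reversal
```

Under these assumptions the over-generalized step-inertia strategy and the "descending only" strategy both come to 0.61, which shows that nothing is lost by extending the expectation to ascending steps, while always expecting a reversal scores only 0.39. Raising the assumed share of descending steps raises the payoff of step inertia further.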

There is one noteworthy complication that arises from Von Hippel's experiment. Von Hippel tested both musician and non-musician listeners. He found step-inertia expectations only for the musician participants. The non-musicians had no discernible pattern related to step-interval antecedents. It is plausible that musicians have more experience listening to music than non-musicians. If so, it may be that the origin of the step-inertia expectation is attributable to passive learning through extensive exposure.

Post-skip Reversal

We have seen that listeners expect melodies to consist mostly of small pitch intervals. Experienced listeners also expect that small intervals tend to be followed by pitches that preserve the melodic direction -- although musical melodies only exhibit step-inertia for descending intervals. What about expectations following large intervals?

For hundreds of years, music theorists have observed that large intervals tend to be followed by a change of direction. More specifically, most of the theorists who have commented on this purported phenomenon have suggested that large intervals tend to be followed by step motion in the opposite direction. Since most pitch intervals are small, any interval should tend to be followed by step motion. The important part of the claim is the idea that large leaps should be followed by a change of direction. Following Paul von Hippel, we can call this purported tendency post-skip reversal (von Hippel, 1998).

Once again, the first question to ask is whether actual melodies conform to this principle. Do most large leaps tend to be followed by pitches that change direction? In 1924, Henry Watt tested this idea by looking at melodic intervals in musical samples from two different cultures: Lieder by Franz Schubert and Ojibway Indian songs. Watt's results for Schubert are shown in Fig. 8.

Figure 8
Fig. 8. Watt's (1924) analysis of intervals in Schubert Lieder. Larger intervals are more likely to be followed by a change of melodic direction than small intervals. Watt obtained similar results for Ojibway Indian songs. No data point corresponds to 11 semitone intervals because of the absence of such intervals in Watt's sample. From von Hippel and Huron (2000).

For intervals consisting of 1 or 2 semitones, roughly 25 to 30 percent of contours change direction. That is, the majority of small intervals continue in the same direction. However, as the interval size increases, the graph tends to rise upward to the right. For octave (12 semitone) intervals, roughly 70 percent of intervals are followed by a change of direction. (There is no data point corresponding to 11 semitones because there were no 11-semitone intervals in Watt's sample.) Watt found similar results for the Ojibway songs.

Von Hippel and Huron (2000) carried out further tests of this idea using a broader and more diverse sample of melodies from cultures spanning four continents: traditional European folksongs, Chinese folksongs, South African folksongs and Native American songs. Once again, for each of these repertories, the majority of large intervals are indeed followed by a change of direction.

Von Hippel and Huron proposed a rather unexciting reason for the existence of post-skip reversal. Most large intervals tend to take the melody toward the extremes of the melody's range. For example, a large ascending leap has a good probability of placing the melody in the upper region of the tessitura or range. Having landed near the upper boundary, a melody has little choice but to go down. That is, most of the usable pitches lie below the current pitch. Similarly, most large descending leaps will tend to move the melody near the lower part of the range, so the melody is more likely to ascend than to continue descending.

Melodies do not simply wander around the range of human hearing by taking mostly small steps. Instead, melodies exhibit pitch distributions that show a central tendency. That is, melodies display a stable tessitura or range. The most frequently occurring pitches in a melody lie near the center of the melody's range. Pitches near the extremes of the range occur less commonly.

Statisticians have shown that whenever a distribution exhibits a central tendency, successive values tend to "regress toward the mean." That is, when an extreme value is encountered, the ensuing value is likely to be closer to the mean or average value. Regression-to-the-mean should not be regarded as a "phenomenon." There is no "force" or "magnet" drawing values toward the mean. Regression-to-the-mean is simply an artifact of the fact that most values lie near the center of the distribution.

When you encounter a tall person, the next person you encounter is likely to be shorter. But the shorter person is not "caused" by the previous encounter with a tall person. It is simply a consequence of the fact that most people are near average height. Similarly, when we encounter a high pitch, we must be careful about assuming that movement toward the high pitch will somehow "cause" the next pitch to be lower.

If post-skip reversal is a consequence of regression-to-the-mean, then we ought to see a difference for leaps, depending on where they occur in the range. Consider the ascending intervals shown in Fig. 9. In this schematic illustration, the mean or median pitch for the melody is represented by the bold center line in the staff. The first ascending leap takes the contour above the median. Both regression-to-the-mean and post-skip reversal would predict a change of direction to follow. In the second case, the ascending leap straddles the median pitch. Once again, both regression-to-the-mean and post-skip reversal predict a change of direction. In the third and fourth cases, the two theories make different predictions. In the third case, the leap lands directly on the median pitch. Post-skip reversal continues to predict a change of direction, whereas regression-to-the-mean predicts that either direction is equally likely. Finally, in the fourth case, the leap lands below the median pitch. Here regression-to-the-mean predicts that the contour should continue in the same direction (toward the mean), whereas post-skip reversal continues to predict a change of direction. So how are real melodies organized? Are they organized according to post-skip reversal? Or according to regression-to-the-mean?

Figure 9
Fig. 9. Four hypothetical interval relationships relative to the median (or average) pitch (represented by the bold central line): (1) median-departing leap, (2) median-crossing leap, (3) median-landing leap, and (4) median-approaching leap. See also Figure 10.
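The four cases and the competing predictions can be written down compactly. The following is a minimal sketch, not the procedure used in the study: pitches are assumed to be MIDI note numbers, and the function name and the prediction table are hypothetical conveniences for illustration.

```python
def classify_leap(start, end, median):
    """Classify a leap (start -> end) relative to the median pitch.
    Pitches are numbers (for example MIDI note numbers)."""
    if start == end:
        raise ValueError("a leap needs two different pitches")
    before = start - median   # signed distance from the median before the leap
    after = end - median      # signed distance from the median after the leap
    if after == 0:
        return "median-landing"
    if before * after < 0:
        return "median-crossing"
    if abs(after) > abs(before):
        return "median-departing"
    return "median-approaching"

# What each hypothesis predicts for the note that follows the leap.
PREDICTIONS = {
    "median-departing":   {"post-skip reversal": "reverse", "regression": "reverse"},
    "median-crossing":    {"post-skip reversal": "reverse", "regression": "reverse"},
    "median-landing":     {"post-skip reversal": "reverse", "regression": "either"},
    "median-approaching": {"post-skip reversal": "reverse", "regression": "continue"},
}

# Four ascending leaps of a fifth around a median of 60 (middle C).
for start, end in [(62, 69), (57, 64), (53, 60), (48, 55)]:
    kind = classify_leap(start, end, median=60)
    print(kind, PREDICTIONS[kind])
```

The classification works equally for descending leaps, since it depends only on distances from the median before and after the leap.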

In order to answer this question, von Hippel and Huron (2000) studied several hundred melodies from different cultures and different periods. For each melody we calculated the median pitch and we then examined what happens following large leaps. Our results are plotted in Fig. 10, for the case where a `skip' is defined as intervals larger than 2 semitones. The black bars indicate instances where an interval is followed by a change of direction. The grey bars indicate instances where an interval is followed by a continuation in the same direction.

Figure 10
Fig. 10. Number of instances of various melodic leaps found in a cross-cultural sample of music. Most large intervals that approach the median pitch continue in the same melodic direction. Large intervals that land on the median pitch are as likely to continue in the same direction as to reverse direction. Results support the phenomenon of melodic regression, and fail to support post-leap reversal.

If post-skip reversal is the important organizing principle, then we would expect to see taller black bars than grey bars in each of the four conditions. By contrast, consider regression-to-the-mean. This would predict that black bars should be taller than grey bars for the median-departing and median-crossing conditions (which is the case). For skips that land on the median pitch, regression-to-the-mean would predict roughly equivalent numbers of continuations and reversals (that is, we would expect the black and grey bars to be roughly the same height -- which is the case). Finally, in the case of median-approaching skips, regression-to-the-mean would predict that melodies ought to be more likely to continue in the same direction toward the mean (that is, we would expect the grey bar to be taller than the black bar -- which is again the case).
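A tally of this kind can be sketched for a single melody as follows. This is an illustration under stated assumptions rather than the analysis code used by von Hippel and Huron: the melody is a list of MIDI note numbers, a skip is any interval of 3 semitones or more, repeated notes after a skip are ignored, and the example melody is invented.

```python
from collections import Counter
from statistics import median

def tally_post_skip(melody, min_skip=3):
    """Count reversals and continuations after each skip, grouped by where
    the skip lands relative to the melody's median pitch."""
    centre = median(melody)
    counts = Counter()
    for a, b, c in zip(melody, melody[1:], melody[2:]):
        leap, follow = b - a, c - b
        if abs(leap) < min_skip or follow == 0:
            continue                      # not a skip, or followed by a repeated note
        before, after = a - centre, b - centre
        if after == 0:
            kind = "landing"
        elif before * after < 0:
            kind = "crossing"
        elif abs(after) > abs(before):
            kind = "departing"
        else:
            kind = "approaching"
        outcome = "reversal" if leap * follow < 0 else "continuation"
        counts[kind, outcome] += 1
    return counts

# Invented 12-note melody whose median pitch is 60.
example = [60, 67, 65, 64, 55, 57, 60, 62, 60, 52, 57, 59]
for (kind, outcome), n in sorted(tally_post_skip(example).items()):
    print(kind, outcome, n)
```

Summing such counts over a large corpus would give the black (reversal) and grey (continuation) bars for each of the four conditions.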

Von Hippel and Huron carried out further statistical analyses which reinforce the above result. With regard to large intervals, melodies behave according to regression-to-the-mean and are not consistent at all with the idea of post-skip reversal. The further the leap takes the melody away from the mean pitch, the greater the likelihood that the next pitch will be closer to the mean. If a leap takes the melody toward the mean, then the likelihood is that the melody will continue in the same direction. Incidentally, we tried a number of different definitions of "large" leap. The results are the same no matter how a leap is defined in terms of size. We also looked for possible "delayed" resolutions. That is, we looked to see whether the second or third note following a large leap tended to change direction. Once again, the aggregate results always conformed to regression-to-the-mean, but not post-skip reversal. This was true in Schubert, in European folksongs, in Chinese folksongs, in sub-Saharan African songs, and in traditional Native American songs.

It bears reminding that most large intervals are indeed followed by a change of direction. (For skips of 3 semitones or greater, roughly two-thirds are followed by a reversal of contour.) But this is only because most large intervals tend to take the melody away from, rather than toward, the mean pitch for the melody.

Having investigated the organization of actual melodies, we might now turn to the question of what listeners expect. Even if melodies are not organized according to post-skip reversals, might it not be the case that listeners expect large intervals to be followed by a change of direction? Or do listeners expect the next pitch to move in the direction of the mean?

Once again consider our earlier analogy to people's heights. When we encounter a tall person, do we (1) expect the next person to be of average height (the "real" phenomenon) or (2) expect the next person to be shorter -- an artifact of (1)? This question was answered experimentally by Paul von Hippel (in preparation). Von Hippel played large intervals in a variety of melodic circumstances, and asked listeners to predict whether the melody would subsequently ascend or descend.

The melodic contexts were arranged so that some large intervals approached the mean and other large intervals departed from the mean. If listeners' expectations are shaped by post-skip reversal, then they ought to expect all large intervals to be followed by a change of direction. However, if listeners' expectations are shaped by regression to the mean, then they ought to respond according to the register of the interval: intervals in the low register (whether ascending or descending) should be followed by higher pitches while high register intervals (whether ascending or descending) should be followed by a lower pitch.

The results were clear: the register or tessitura of the interval doesn't matter -- listeners typically expect large intervals to be followed by a change of direction without regard to the location of the median pitch. That is, listeners' expectations follow the post-skip reversal principle, rather than regression-to-the-mean.

As before, these results apply only in the case of musician listeners. Von Hippel's non-musician listeners showed no systematic pattern of responses. This difference between musicians and non-musicians once again implicates learning.


We have seen two examples where experienced listeners have established an expectation strategy that works in most circumstances, but is only an imperfect approximation of the actual structure of the melodies. By way of summary, we can now compare and contrast how melodies are actually structured with how experienced listeners think they are structured.

Actual Melodic Structure - Expected Melodic Structure

Melodies show the following organizational elements:

  1. Pitch Proximity. Successive pitches tend to be near to one another. Pitch proximity is not merely an artifact of central tendency. That is, pitch proximity doesn't arise simply because most of the pitches in a melody lie near the center of the distribution. If pitch proximity were the only organizing principle for melodies, then melodies might look something like the pitch sequence shown in Fig. 11. Here we see a randomly generated "melody" in which the only constraint is a bias toward smaller rather than larger intervals. The result is a so-called "random walk" -- what engineers call Brownian noise.

    Recall that correct expectations ought to better prepare an organism -- either for appropriate action or for more efficient perception. In the case of pitch proximity, Deutsch (1978) showed that listeners are more efficient when processing tones preceded by small intervals than by large intervals. Similarly, Boomsliter and Creel (1979) found that when exposed to short tones, listeners are faster to form pitch perceptions when the stimuli are embedded in music-like sequences. By contrast, unprepared listeners take longer to form appropriate pitch sensations.

    Figure 11
    Fig. 11. "Brownian" or "random walk" melody. Successive pitches are constrained only by the principle of small distances to the preceding pitch.
  2. Central Pitch Tendency. If real melodies were constrained only by pitch proximity, then long melodies would inevitably wander out of range at some point. However, like the vast majority of other phenomena in the world, the most frequently occurring pitches in melodies tend to lie near the center of some distribution. If a central tendency were the only organizing principle then melodies might look something like the pitch sequence shown in Fig. 12. Here we see a randomly generated "melody" whose distribution corresponds to a normal distribution, centered in the middle of the staff. Engineers call this kind of distribution Johnson noise or white noise.

    Figure 12
    Fig. 12. "Johnson" or "white noise" melody. Pitches are randomly selected from a normal distribution centered on middle C (the most likely pitch).

    Since melodies are organized according to both pitch proximity and central tendency, melodies exhibit a sort of intermediate character between Brownian and Johnson fluctuations. Incidentally, Johnson noise has a so-called power distribution of 1/f^0, whereas Brownian noise has a power distribution of 1/f^2. When these two principles are combined, the resulting power distribution approaches 1/f -- the so-called fractal distribution (Voss & Clarke, 1978; Gardner, 1978). Voss and Clarke (1975) have shown that melodies exhibit a power distribution similar to 1/f noise. While there are a number of natural phenomena that exhibit this distribution, there is nothing particularly magical about this observation.

  3. Ascending Leap Tendency/Descending Step Tendency. In general, melodies tend to exhibit relatively rapid upward movements (ascending leaps) and relatively leisurely downward movements (descending steps). The reason for this asymmetry is not known. However, it is interesting to note that a similar phenomenon can be observed in the pitch of speaking voices. Researchers who study the "melody" of speech have observed that the initial part of an utterance tends to ascend rapidly, and then the pitch of the voice slowly drops as the utterance progresses. Linguists call this phenomenon declination and attribute it to the fall in sub-glottal air pressure as the lungs deflate (Pike, 1945; Lieberman, 1967; 't Hart, Collier & Cohen, 1990). Fig. 13 shows a randomly generated "melody" that is constrained only by an asymmetrical distribution favoring ascending leaps and descending steps. The melody behaves as a modified random walk, and so like Fig. 11 would inevitably drift out of range.

    Figure 13
    Fig. 13. Random melody based on asymmetrical distribution favoring descending steps and ascending leaps.
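The three kinds of random "melody" described above can be generated with a few lines of code. The sketch below is only an approximation of the idea behind Figs. 11 to 13: the centre pitch, the step sizes, the spread of the normal distribution, and the mix of descending steps and ascending leaps are all hypothetical choices, not values taken from the figures.

```python
import random

CENTRE = 60   # hypothetical central pitch (middle C as a MIDI note number)

def proximity_melody(n, rng, max_step=2):
    """Random walk: each pitch is a small step from the previous one."""
    pitches = [CENTRE]
    for _ in range(n - 1):
        pitches.append(pitches[-1] + rng.randint(-max_step, max_step))
    return pitches

def central_melody(n, rng, sd=4.0):
    """Independent draws from a normal distribution around the centre."""
    return [round(rng.gauss(CENTRE, sd)) for _ in range(n)]

def asymmetric_melody(n, rng):
    """Random walk favouring descending steps and ascending leaps."""
    pitches = [CENTRE]
    for _ in range(n - 1):
        # Mostly small descents, with an occasional upward leap of a fourth.
        step = rng.choice([-1, -1, -2, -1, -2, 5])
        pitches.append(pitches[-1] + step)
    return pitches

rng = random.Random(0)
for make in (proximity_melody, central_melody, asymmetric_melody):
    melody = make(16, rng)
    largest = max(abs(b - a) for a, b in zip(melody, melody[1:]))
    print(make.__name__, melody, "largest interval:", largest)
```

The first and third generators have no memory of the overall range, so long runs will drift; the second stays centred but makes no attempt to keep successive pitches close together.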

In an ideal world, these actual musical patterns would lead to the following subjective expectations:

  1. Pitch Proximity. Listeners would expect an ensuing pitch to be near the current pitch.
  2. Regression-to-the-mean. As the melody moves further away from the mean or median pitch, listeners would expect the next pitch to move closer to the mean.
  3. Downward Steps. Listeners would expect most intervals to be descending steps.

Instead, experienced listeners show the following expectational tendencies:

  1. Pitch Proximity. Listeners expect an ensuing pitch to be near the current pitch.
  2. Post-skip Reversal. Experienced listeners expect a large interval to be followed by a change of direction.
  3. Step-Inertia. Experienced listeners expect a small interval to be followed by a subsequent small interval in the same direction.

Like the Pacific bull-frog, experienced listeners to Western music rely on patterns that are serviceable, but not exactly right.

Narmour's Theory of Melodic Organization

Note that these expectations conform very well to a theory of melodic organization proposed by Eugene Narmour (1990, 1992). Narmour proposed five predispositions that affect implicative melodic continuations (see Schellenberg, 1996 for a summary description). Two predispositions are central to Narmour's implication-realization theory. The first is registral direction and the second is intervallic difference.

Studies by Cuddy and Lunney (1995) and Schellenberg (1996, 1997) have shown that Narmour's original theory can be simplified without loss of predictive power. Schellenberg (1997) in particular was able to show that Narmour's theory could be reduced to just two principles. One is the pitch proximity principle. The second principle is a combination of Narmour's registral direction and registral return dispositions. However, an analysis by von Hippel has shown that these phenomena can be accounted for by regression to the mean.

Similarly, earlier work by Rosner and Meyer (1982) and by Schmuckler (1989) had shown that listeners' responses are consistent with the notion of gap-fill. However, subsequent statistical analyses by von Hippel have established that the appearance of gap-fill is wholly attributable to regression-to-the-mean.

Narmour proposed that these expectations are somehow innate. At face value, the experimental research suggests that the expectations are learned, and that the expectation heuristics used by listeners are just approximations of structural properties present in the music itself.

Theoretically, it is possible that cause and effect might be reversed in the above account. It is possible that the organization of music has been shaped by a priori expectational tendencies rather than vice versa. That is, it is possible composers intend to create music conforming to post-skip reversal, but then somehow erroneously construct melodies shaped by regression-to-the-mean instead.

This view is not very plausible, however. Regression-to-the-mean is a property of all distributions that exhibit a central tendency. The vast majority of distributions in nature show such central tendencies, so regression-to-the-mean is found wherever one cares to look. Moreover, there is a plausible explanation for why distributions of musical pitches would display a central tendency. When singing, vocalists find that it is physically easier to perform near the center of their range; both high and low notes are more difficult to sing. Similarly, most instruments are easier to play in some central register.

Scale Degree Expectations

Having examined pitches, interval sizes, and up/down contours, let us return again to consider the perception of scale degree. As we have seen, the key to understanding expectation begins by identifying patterns in the music itself.

In the first instance, we should consider the simple event frequencies for scale degrees. Like pitches, not all scale degrees occur with the same frequency. Bret Aarden (in preparation) has produced scale degree distributions based on a large sample of musical melodies. Figures 14 and 15 show the frequency of occurrence for works in major keys (first graph) and for minor keys (second graph). Both graphs are normalized by transposing all works so the tonic pitch is C.

Figure 14

Fig. 14. Distribution of scale tones for a large sample of melodies in major keys. All works were transposed so the tonic pitch is C; all pitches are enharmonic.

Figure 15

Fig. 15. Distribution of scale tones for a large sample of melodies in minor keys. All works were transposed so the tonic pitch is C; all pitches are enharmonic.

For both major and minor keys, the most common pitch is the fifth scale degree (dominant). In the major key, the second most common pitch is scale degree one (tonic) followed by scale degree three (mediant). In the minor key, the order of the tonic and mediant is reversed. Scale degrees four and two are next most common, followed by scale degrees six and seven. The non-scale or chromatic tones occur least frequently.

The distributions shown in Figs. 14 and 15 are not merely an artifact of the aggregate of a large number of musical works. As it turns out, the scale degree distributions for most individual musical works are very similar to those shown in the figures. For example, the pitch-class distribution for J.S. Bach's Fugue No. 1 from the first book of the Well Tempered Clavier correlates with the aggregate major key distribution at +0.90. Such high correlations turn out to be typical (Huron, 1992). Any musical passage written in a major key, that does not modulate to a different key for a prolonged period, will also show a strong positive correlation between its scale degree distribution and the aggregate distribution shown in Fig. 14. Similarly high correlations occur between works written in minor keys and the minor key distribution shown in Fig. 15 -- although the correlations tend to be lower for the minor keys compared with the major keys.
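A comparison of this kind amounts to transposing a work so that its tonic is C, counting pitch classes, and correlating the result with an aggregate profile. The sketch below illustrates the calculation only. The aggregate values are hypothetical placeholders that merely respect the rank order described in the text; they are not Aarden's data, and the short melody is invented.

```python
from collections import Counter
from math import sqrt

# HYPOTHETICAL major-key profile (C, C#, D, ... B); placeholder values only.
HYPOTHETICAL_MAJOR = [0.18, 0.01, 0.12, 0.01, 0.16, 0.12,
                      0.01, 0.20, 0.01, 0.09, 0.01, 0.08]

def pitch_class_distribution(midi_pitches, tonic_pc):
    """Proportion of notes on each of the 12 pitch classes, transposed so
    that the tonic becomes pitch class 0 (C)."""
    counts = Counter((p - tonic_pc) % 12 for p in midi_pitches)
    total = len(midi_pitches)
    return [counts[pc] / total for pc in range(12)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Invented fragment in G major (tonic pitch class 7).
fragment = [67, 71, 74, 71, 69, 67, 66, 67, 74, 72, 71, 69, 67]
profile = pitch_class_distribution(fragment, tonic_pc=7)
print(round(pearson(profile, HYPOTHETICAL_MAJOR), 2))
```

With a real aggregate distribution in place of the placeholder, the same two functions would reproduce the kind of correlation reported for the Bach fugue.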

In an ingenious set of experiments, Aarden (2002; in preparation) has shown that listeners' expectations conform to these distributions. Aarden established this by collecting reaction-time measures in a continuous listening task. Listeners were asked to press one of three keys (up, down, same) indicating the pitch-movement of successive pitches in various melodies. When listeners correctly anticipate an ensuing note, this is reflected in a faster reaction time. Conversely, when listeners are less certain of an ensuing note, this is reflected in a slower reaction time. When the data were collapsed according to scale degree, Aarden found that average reaction times were inversely proportional to frequency of occurrence. That is, listeners were faster when responding to scale degrees that occur more frequently in real music.

In a follow-up experiment (Aarden, in preparation), Aarden collected data only for the last note in a melody. Listeners heard 80 unfamiliar tonal folk melodies and watched a numerical counter count-down the number of notes remaining in the melody. When the final note appeared (count zero), listeners responded to the pitch contour (up/down/same) as quickly as possible. In this case, Aarden found a somewhat weaker correlation between the average reaction times and the frequency of occurrence of various scale degrees. However, Aarden found a very high correlation between the average reaction times and the frequency of occurrence of final tones in a large sample of folk songs. That is, listeners were faster when responding to scale degrees that occur most frequently as the terminal pitches in a melody. Aarden's results imply that listeners maintain a different expectational "set" or "schema" for melody-final tones compared with ordinary melody tones.

Key Profiles

Aarden's work has provided an important clarification of a well-known experiment by Carol Krumhansl and Ed Kessler (Krumhansl & Kessler, 1982). Krumhansl and Kessler exposed listeners to a key-defining context, such as an ascending scale followed by a cadential harmonic progression. They then played an isolated "probe" tone, and asked listeners to rate how well the tone fits with the preceding context. They repeated this task using all twelve pitch classes and applied this procedure for both the major and minor key contexts. The results are shown in Figures 16a and 16b.

Figure 16

Fig. 16. Krumhansl and Kessler "key profile" for major context.

Krumhansl and Kessler "key profile" for minor context.

For a number of years, it was recognized that the Krumhansl and Kessler key profiles are similar (but not identical) to the frequency of occurrence for scale degrees in the respective major and minor key contexts. The principal difference is that the tonic is rated more highly in the Krumhansl and Kessler (K&K) profiles. In addition, the second and fourth scale degrees (super-tonic and sub-dominant) are rated significantly less highly. In this, the K&K distributions more closely resemble the distributions of pitch classes occurring at the ends of melodies, rather than the distribution of all pitch classes.

Aarden noted that since the probe-tone method stops the sequence of tones, listeners may tend to perceive this moment in terms of closure. In effect, rather than answering the question "how well does this tone fit with the preceding sequence of pitches?", listeners are answering the question "how well does this tone complete the preceding sequence of pitches?" (see also Butler, 19XX). Listeners' responses resemble the distribution of melody-final pitches much more than the general distribution of pitch classes. Using a multiple regression analysis, Aarden showed that the Krumhansl and Kessler key profiles can be fully accounted for by a combination of the general pitch-class distribution and the melody-terminating pitch-class distribution. More precisely, the distribution of melody-terminating pitch classes accounts for roughly 85 percent of the variance in the Krumhansl and Kessler key profiles, whereas the remaining variance (roughly 15 percent) is accounted for by the general pitch-class distribution.

Krumhansl has long argued that listeners are sensitive to the frequency of occurrence of various scale degrees, and that learned mental schemata arise for major and minor contexts (Krumhansl, 1990). However, discrepancies between the probe-tone profiles and the frequency distributions for actual works made Krumhansl's empirical evidence appear equivocal. Aarden's work brought clarity to the experimental data by showing that different schemata are employed for terminating pitches versus in-stream pitches, and that Krumhansl's experimental data are confounded by the perceptual closure that tends to accompany the probe-tone method. Once the distinction is made between in-stream and terminating pitch-class schemata, Aarden's work reinforces the view that listeners are indeed sensitive to the frequency of occurrence of pitch-classes.

Exposure Effect and the Pleasures of the Tonic

A favorite game musicians play involves performing a passage that provides a strong sense of key, and then walking away from the music after playing the seventh scale degree or leading-tone. Most listeners find this experience grossly unsatisfying -- bordering on the intolerable. The music is left "hanging." By contrast, one can end on the tonic pitch and evoke a considerable sense of pleasure. What accounts for the psychological pleasure evoked by the tonic pitch?

In the first instance, not all tonic pitches evoke a sense of pleasure. When played as a passing tone in the context of a dominant harmony, the tonic will sound unstable and transient. The tonic pitch evokes the greatest pleasure when it terminates a phrase or passage. That is, the pleasure of the tonic is linked to closure.

In the second instance, the tonic pitch is not alone in its capacity to evoke pleasure at moments of closure. The third (mediant) and fifth (dominant) scale degrees can also evoke a pleasant sense of closure -- although the pleasure evoked is often less than for the tonic pitch. In some jazz styles, ending on the sixth (sub-mediant) or even the second (super-tonic) scale degrees is often satisfactory.

As we have seen, the tonic is the most common way to end a musical passage. We might suppose that musicians choose to place the tonic at terminal moments because it sounds the most pleasant. But like all correlations, it is possible to confuse cause and effect. What if the pleasantness arises because the tonic is the most common terminal pitch?

Psychologists have documented, in innumerable ways, a tendency for people (and animals) to prefer the familiar (see review by Bornstein, 1989). Researchers have established that people have a preference for the "average" face. Similarly, Moreland and Zajonc (1977, 1979) carried out a set of experiments where subjects were exposed to various stimuli, such as complex polygons and Japanese ideographs. The stimuli were presented in such a way that the participants were unaware that some of the stimuli were being presented repeatedly. After an initial training period, the participants were exposed to another set of stimuli that contained both previous and novel stimuli. The subjects were asked to indicate whether they had seen the stimuli before, and were also asked which stimuli they preferred. A distracter task was included as part of the experiment. Whether because of the distracter task, the complexity of the stimuli, or both, the subjects were rather poor at discriminating between novel and familiar stimuli. However, in all experiments, subjects showed a marked preference for the more familiar stimuli. This preference for the familiar is referred to as the exposure effect.

In one of the Moreland and Zajonc experiments, tones of different frequencies were used. As in the case of the visual stimuli, listeners were unable to distinguish which frequencies they had been previously exposed to. Nevertheless, they showed a distinct preference for the most frequently occurring pitches. For musicians, this may not look like a very impressive result. Surely, the listeners were tending to assume that the most frequently heard pitch is the tonic. They preferred these tones because they heard them as tonics. That is, "tonality" would seem to explain the preference.

This interpretation is possible, although not perhaps very plausible. The phenomenon of preferring the most frequent stimulus is a general psychological phenomenon that has been observed with a wide variety of stimuli -- including both visual and auditory. Should we conclude that "tonality" is a fundamental phenomenon that operates in sequences of faces and polygons as well as tones? On the contrary, the experimental results suggest that the exposure effect is the more fundamental phenomenon. Listeners' preference for the tonic is more parsimoniously explained by appealing to the exposure effect rather than tonality.

Another reason for supposing that tonality is caused by the exposure effect, rather than vice versa, is that the effect is not limited to isolated tones. Wilson (1975, 1979) carried out dichotic listening experiments in which various melodies were presented in one ear while a story was recited in the other ear. Subjects were required to follow the story line against a written text. The written distractor task was highly successful in getting listeners to ignore the melodies: in a subsequent recognition test, listeners performed at chance levels when asked to identify which melodies they had been exposed to. Nevertheless, listeners exhibited a preference for the melodies they had heard in the original exposure task. That is, entire melodies were preferred in a manner analogous to individual tones.

In a later discussion of expectation-evoked emotions (Part IV), an explanation will be offered for the origins of the exposure effect.

Expectation and Enculturation

The theory advocated in this study is that musical expectations arise from statistical learning through simple exposure to music. The results of Saffran et al (1996, 1999) provide strong evidence for statistical learning in tone sequences. But Saffran's experiments do not relate statistical learning to listeners' expectations. On the other hand, the work of von Hippel (2001) shows that the statistical properties of actual melodies are strongly correlated with the melodic expectations of listeners. But von Hippel's work does not demonstrate that the melodic expectations arise from statistical learning per se.

At the moment, there is unfortunately no direct experimental evidence testing the notion that listeners learn to infer statistical patterns from their past listening experiences and use these statistical properties to form musical expectations. Nevertheless, the existing evidence is suggestive. In the absence of direct evidence, we can describe further experimental results that converge with this interpretation. Two pieces of converging evidence would be especially helpful. First, it would help to show that people from different cultural backgrounds exhibit different expectations when listening to the same music. Secondly, it would help to show that the expectations listeners exhibit reflect the statistical properties of the music found in their background cultures.

Consider first evidence that people from different cultural backgrounds exhibit different expectations when listening to the same music. In 1999, von Hippel, Huron and Harnish carried out an experiment that reveals how dissimilar expectations can be for different groups of listeners. The experiment contrasted the expectations of American musicians with Balinese musicians.

Both groups of musicians listened to a traditional Balinese melody played on a peng ugal. The Balinese musicians were highly familiar with the genre, whereas the American musicians indicated that they had little or no previous experience with traditional gamelan music. None of the participants had heard the test melody prior to the experiment. Each musician was tested individually using a betting paradigm.

The experimental apparatus consisted of a loudspeaker through which a sound recording of the melody could be heard, a digital keyboard sampler which reproduced the sound of the peng ugal and which was available to the musicians for consulting, a computer monitor that displayed a limited set of notes from the melody using a numerical notation, and a physical mock-up of the instrument on which listeners could place bets using poker chips.

The goal of the task was for participants to place bets on each successive pitch of the melody and to attempt to accumulate the greatest aggregate winnings. Bets placed on the correct pitch were rewarded ten-fold. Bets placed on the incorrect pitch were lost. Each participant was tested individually.

The participant heard the first note of the melody and the pitch was indicated on the computer monitor. The participant was then invited to bet on what they thought would be the likely second note. Once the bets were placed, the actual second note was revealed, the winnings were tabulated, and a sound recording of the melody was played, stopping before the third note. The participant was then invited to bet on what they thought would be the likely third note. This process was repeated until the entire 34-note melody was revealed.
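The bookkeeping for one round of the betting task might look like the sketch below. Several details are assumptions made purely for illustration, since the text does not specify them: money is tracked in cents, chips that are not staked are kept, a winning stake is replaced by ten times its value rather than returned on top of the winnings, and the pitch labels are hypothetical.

```python
def settle_round(bankroll, bets, actual_pitch, payoff=10):
    """Return the bankroll (in cents) after one note is revealed.

    bets maps a pitch label to the amount staked on it. Assumptions, not
    stated in the text: unstaked money is kept, and a winning stake is
    replaced by payoff * stake."""
    staked = sum(bets.values())
    if staked > bankroll:
        raise ValueError("cannot stake more than the bankroll")
    return bankroll - staked + payoff * bets.get(actual_pitch, 0)

# Starting stake of $1.50; 50 cents on pitch "1" and 25 cents on pitch "3".
print(settle_round(150, {"1": 50, "3": 25}, actual_pitch="1"))  # 575
print(settle_round(150, {"1": 50, "3": 25}, actual_pitch="2"))  # 75
```

Because winnings multiply from round to round, a listener who concentrates bets on the correct pitch can grow a small stake very quickly, whereas one who spreads bets thinly or guesses wrongly loses ground.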

Throughout the experiment, participants could see the notation up to the current point in the melody, and could try out different continuations using the digital keyboard sampler.

In general, the differences between the American and Balinese musicians were quite striking. Starting with a nominal grub-stake of $1.50, by the end of the melody, the most successful Balinese musician had amassed a fortune of several millions of dollars. The best American musicians failed to do as well as the worst Balinese musician. Moreover, several American musicians went bankrupt during the game and had to be "advanced" a new grub-stake.

In the post-experiment interview, it was determined that all four Balinese musicians had been raised in religious homes where gambling was actively discouraged. So the differences between the American and Balinese participants cannot be ascribed to greater gambling experience for the Balinese participants.

Fig. 17 shows summary information for the American and Balinese listeners. When a person is uncertain of the outcome, they will tend to spread their bets over many more notes than if they are more confident of the likely outcome. A simple way to measure uncertainty is via entropy. Fig. 17 shows the average entropy for the American and Balinese listeners at each point as the melody unfolds. A rough approximation of the melodic pitches is provided using Western notation.

Figure 17

Fig. 17. Average moment-to-moment uncertainty for Balinese and American musicians listening to an unfamiliar traditional Balinese melody. Uncertainty is plotted as entropy, measured in bits. In general, Balinese listeners show less average uncertainty. Note positions correspond with underlying notational rendering. N.B. Notation shows only approximate pitch levels.
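The entropy measure can be illustrated with a short calculation. This is a minimal sketch that assumes each listener's bets at a given note are normalized into proportions of the total stake and that Shannon entropy is computed over those proportions; the function name and pitch labels are hypothetical.

```python
from math import log2

def bet_entropy(bets):
    """Shannon entropy (in bits) of how a stake is spread across pitches."""
    total = sum(bets.values())
    if total <= 0:
        raise ValueError("no bets placed")
    probs = [amount / total for amount in bets.values() if amount > 0]
    return sum(p * log2(1 / p) for p in probs)

confident = {"1": 10}                             # everything on one pitch
uncertain = {"1": 5, "2": 5, "3": 5, "5": 5}      # spread evenly over four pitches
print(bet_entropy(confident))   # 0.0
print(bet_entropy(uncertain))   # 2.0
```

A listener who places everything on one pitch has zero entropy, while spreading the stake evenly over more pitches raises the value, which is why lower curves in the figure indicate greater certainty.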

The graph shows that, on average, the Balinese listeners were nearly always less uncertain of possible future continuations than the American listeners. Since the American and Balinese musicians were matched for age, sex, and general musical experience, these differences are likely to have arisen due to the Balinese musicians' greater familiarity with traditional Balinese music.

Consider next the issue of demonstrating that listeners' expectations reflect the statistical properties of the music found in their background cultures. Perhaps the best evidence in support of this can be found in research pertaining to structural tonality. Among Western listeners, a wealth of experimental data shows that the simple frequency of occurrence of various pitch classes plays a significant role in tonality perception (Cuddy, 1993; Cuddy & Badertscher, 1987; Lamont, 1998; Oram & Cuddy, 1995). More importantly, evidence consistent with structural tonality has been observed in non-Western musical practices -- such as in classical Indian music (Castellano, Bharucha & Krumhansl, 1984), in Balinese music (Kessler, Hansen & Shepard, 1984), and in Korean p'iri music (Nam, 1998). In the case of traditional Korean music, for example, the most frequently occurring pitch also tends to terminate breath-delimited phrases, and also coincides with the pitch identified by Korean musicians as the central pitch of the scale. Moreover, these pitches change systematically with respect to different Korean modes, and even with transposed pitch sets.

In the Castellano, Bharucha and Krumhansl study (1984), both American college students and Indian listeners were exposed to samples of North Indian music, and tested using the probe tone technique. Both groups responded in ways that echoed the frequency of occurrence of pitch classes in the musical samples. However, in carrying out a multiple regression analysis, Castellano et al were able to remove the variance associated with the exposure frequencies and examine the residual variance. It was found that the residuals for the Indian listeners correlated with a hierarchy of pitches in established Indian music theory (Jairazbhoy, 1971), whereas the residuals of the American listeners showed no such correlation. In effect, Castellano et al demonstrated that (1) Indian listeners were responding in a way that combined long-term statistical features engendered through years of listening to Indian music, plus short-term statistical properties related to the actual exposure sample, whereas (2) American listeners responded in a way that was consistent with the statistical properties of the exposure sample. Note, however, that the American listeners were showing clear evidence of statistical learning for the Indian musical excerpts.

In both the Castellano, Bharucha and Krumhansl and the Kessler, Hansen and Shepard studies, listeners were instructed to respond according to the "stability" or "goodness of fit" of the probe-tone. Although there is ample evidence consistent with statistical learning, one cannot claim that listeners' responses represent expectations only, and so the evidence for statistically learned expectations remains indirect.

Taken together, all of these studies lend support to the view that musical expectations arise from statistical learning through (both short-term and long-term) exposure to music.


Not all musical moments are equally predictable. Possibly the most cliché aspect of music can be seen in how things end. Theorists have long recognized that cadence points tend to be organized in a stereotypic fashion. The stereotypes of musical closure can be readily observed in Landini cadences, in dominant-tonic harmonies, and in innumerable pre-cadential formulas, such as suspensions, the use of augmented sixth chords, and pre-cadential second inversion chords (see, e.g., Kramer, 1982). Figure 17 shows that Balinese listeners tend to become less uncertain of the next note as the end of the melody is approached.

Consider, for example, a sample of 300 German folksongs from the Essen Folksong collection (Schaffrath, 1995). A simple calculation might examine the information content (in bits) of pairs of successive scale degrees. For example, among the highest probability events is the dominant pitch followed by a repetition of the dominant (4.1 bits). By contrast, a low probability (high information) sequence consists of the lowered seventh followed by the raised seventh (13.4 bits). Over the complete sample of 300 folksongs, the average information content for scale-degree successions is 5.52 bits (S.D. of 1.42). However, the information content for the final two notes of each phrase is 5.08 bits (S.D. of 1.29). Such patterns are ubiquitous throughout music and can be observed in such disparate repertoires as Gregorian chant, Pawnee music, and Bach chorale melodies (Manzara, Witten & James, 1992).
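The comparison described above can be sketched as follows. The sketch assumes that the information content of a scale-degree pair is the negative log (base 2) of its relative frequency among all pairs in the sample; the source does not specify the exact calculation. The four toy phrases and the function names are hypothetical and are not drawn from the Essen collection.

```python
import math
from collections import Counter

def pair_information(phrases):
    """Map each (degree, next_degree) pair to its information content in bits."""
    counts = Counter(
        pair for phrase in phrases for pair in zip(phrase, phrase[1:])
    )
    total = sum(counts.values())
    return {pair: -math.log2(n / total) for pair, n in counts.items()}

def mean_information(phrases, info, final_only=False):
    """Average information over all pairs, or over phrase-final pairs only."""
    pairs = []
    for phrase in phrases:
        successions = list(zip(phrase, phrase[1:]))
        pairs.extend(successions[-1:] if final_only else successions)
    return sum(info[p] for p in pairs) / len(pairs)

# Hypothetical phrases written as scale degrees; each ends with 2 -> 1.
phrases = [
    [5, 5, 6, 5, 3, 2, 1],
    [3, 4, 5, 5, 4, 2, 1],
    [1, 3, 5, 6, 5, 2, 1],
    [5, 4, 3, 2, 3, 2, 1],
]
info = pair_information(phrases)
overall = mean_information(phrases, info)
final = mean_information(phrases, info, final_only=True)
print(overall, final)
```

In this toy sample the phrase-final pairs carry less information than the average pair, which mirrors the direction of the difference reported for the folksong sample.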

Apart from closure and cadences, a number of organizational patterns are evident in many musical genres. Gjerdingen (1988), for example, has identified a number of widespread clichés associated with the classical style.

Expectation in Time

So far we have been considering only pitch-related expectations. Listeners not only form expectations about what future events may occur, but also when they occur. Caroline Palmer and Carol Krumhansl carried out a set of probe-tone studies to determine when listeners most expect events to happen. Palmer and Krumhansl (1990) presented stimuli that created particular metric frameworks, like 4/4 and 3/4. Following a meter-defining sequence, there was a pause, followed by a tone. Listeners were asked to judge the "goodness of fit" for each tone. Listeners assigned the highest values to those tones whose onsets coincided with the most important beats in the metric hierarchy, followed by the lesser beats, followed by the half-beat divisions, followed by tones that did not coincide with any beat.

Mari Riess Jones has proposed that the metric hierarchy can be understood as a structure for rhythmic attending. Auditory attention is directed at moments in time. That is, when listening, auditors do not pay attention equally at all moments. In rhythmic attending, Jones notes that the listener's attention is most acute at strong metric positions. That is, the metric hierarchy corresponds to a sort of temporal expectation framework.

Consider the following experiment carried out by Jones, Moynihan, MacKenzie and Puente (in press). Listeners heard an initial tone, followed by 12 "distractor" tones, followed by a comparison tone. The task of the experiment was for listeners to judge whether the comparison tone was higher or lower in pitch than the initial tone. In the following example, the first pitch (half-note B) is the initial tone, and the final pitch (half-note A#) is the comparison tone. The intervening tones are random distractor tones that increase the difficulty of the task.

Figure 18
Fig. 18. Typical stimulus used in Jones, Moynihan, MacKenzie & Puente (in press). Listeners heard a standard tone, followed by twelve interference tones, followed by a comparison tone. Listeners were asked to judge whether the comparison tone is higher or lower than the standard tone. The temporal position of the comparison tone was varied so that it would occur earlier or later than expected. See also Fig. 19.

Jones et al manipulated the precise temporal position of the final comparison tone. In some trials, the onset of the tone coincided with the precise downbeat (position 3). Other trials were slightly ahead (position 2) or slightly delayed (position 4) compared to the downbeat. Yet other trials were considerably ahead (position 1) or delayed (position 5) compared to the downbeat. Jones et al found that the accuracy of pitch-comparison judgments depended on the precise temporal placement of the comparison tone. Listeners were most accurate in their judgments when the comparison tone coincided with the presumed downbeat. As the tone deviated from this position, perceptual judgments were degraded:

Figure 19
Fig. 19. Effect of temporal position on accuracy of pitch judgment. (See also Fig. 18.) Jones, Moynihan, MacKenzie & Puente (in press) showed that pitch judgments are most accurate when the tone judged occurs in an expected temporal position (position 3).

This research reinforces and extends the general principles we have already seen operating with regard to auditory expectation. Specifically,

  1. Expectations facilitate perception.

    It is not simply the case that expectations prepare an organism to take appropriate action. In the case of temporal expectations, we see that listeners expect to receive information at certain times. The listener may not know what is going to happen, but might nevertheless anticipate the moment when the information arrives.

    One can imagine a number of ways in which accurate expectations facilitate perception. The prospect of perceiving something with greater accuracy could well be responsible for encouraging an organism to attempt to form accurate expectations about the future. In this sense, temporal expectations are akin to the orienting response -- a behavior that improves perception.

    In addition, expectations can be viewed as preparations for appropriate motor behaviors.

  2. Expectations are shaped by context.

    As in the case of pitch perception, rhythmic expectations are related to the context. Some contexts are quite general, as when we experience music in simple-duple meter, or compound-triple meter. At the other extreme, we may expect a particular temporal organization because of extensive familiarity with a particular rhythm or musical work. That is, rhythmic expectations may arise through veridical contexts.

    It is also possible that listeners form schematic expectations that are culture- or genre-related. Consider, for example, the siciliano -- a leisurely baroque dance form. The siciliano is generally in 6/8 meter, although occasionally it is found in 12/8. In addition to this compound-duple metric framework, there are stereotypic rhythms that occur in this form and that contribute to the stylistic cliché for the siciliano. The most distinctive feature is the dotted-eighth/sixteenth figure that begins the measure, and the quarter-note in the mid-measure position, followed by either an eighth-note or two sixteenths:

    Figure 20

    Fig. 20. Two rhythmic patterns commonly found in siciliano dance forms.

    Franz Gruber's famous Christmas carol, Stille Nacht ("Silent Night"), exhibits the distinctive siciliano rhythm. Below is a cumulative onset histogram for a sample of bars from various siciliana, showing the relative frequency of occurrence for various points in the 6/8 metric hierarchy.

    Figure 21

    Fig. 21. Cumulative onset histogram for a sample of bars from various siciliana movements, showing the relative frequency of occurrence for various points in the 6/8 metric hierarchy.

    Once established, listeners readily expect the rhythm. In this case we can see that it is not simply the strict hierarchical metrical frameworks that influence a listener's temporal expectations. In addition to these metric expectations, listeners can also form distinctly rhythmic expectations which can employ non-regular duration patterns. Expectations can be tailored for different rhythms: sambas, tangos, rock back-beats, and so on. Similarly, complex African rhythms can evoke specific temporal expectations for those listeners who are familiar with them. [3]

  3. Temporal expectations are learned.

    Although no one has provided a formal demonstration, it is quite likely that rhythmic expectations are shaped by the same statistical learning of the auditory environment that we've seen for pitch. The reason why periodic pulse and meter are common in music is that these patterns are the easiest patterns for which brains are able to form expectations. In this regard, the metric hierarchy is truly analogous to a scale or scale hierarchy. Metric positions provide convenient "bins" for expected stimuli.

    While periodicity is helpful for listeners, periodicity is not necessary in order to form temporal expectations. It is important only that the listener be experienced with the temporal structure, and that some element of the temporal pattern be predictable. An illustration of this point can be found in the expectation for "bouncing" rhythms (see Fig. 22). Although the sound of something bouncing is not periodic, the inter-bounce interval shortens predictably as the bouncing continues and so listeners are able to predict, to some degree, the temporal sequence of events. In music, this accelerating rhythm can be found in Tibetan monastic music (where it is frequently played on cymbals). In Western music, there is no known instance of this accelerating rhythm prior to the twentieth century.

    Figure 22

    Fig. 22. Schematic representation of accelerating onsets characteristic of the sound produced by a bouncing object. Although the pattern is not metrically regular, it is nevertheless predictable.
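The predictability of the bouncing rhythm in Fig. 22 can be illustrated with a minimal sketch. It assumes an idealized bounce in which each inter-onset interval is a fixed fraction of the previous one; the ratio of 0.8 and the function names are hypothetical choices made only for illustration.

```python
def bounce_onsets(first_interval=1.0, ratio=0.8, n_onsets=8):
    """Onset times for an idealized bounce with geometrically shrinking intervals."""
    onsets = [0.0]
    interval = first_interval
    for _ in range(n_onsets - 1):
        onsets.append(onsets[-1] + interval)
        interval *= ratio
    return onsets

def predict_next_onset(onsets):
    """Predict the next onset by assuming the last interval ratio continues."""
    previous = onsets[-2] - onsets[-3]
    latest = onsets[-1] - onsets[-2]
    return onsets[-1] + latest * (latest / previous)

onsets = bounce_onsets()
predicted = predict_next_onset(onsets[:5])  # "hear" five onsets, predict the sixth
print(predicted, onsets[5])
```

No two intervals are equal, so the sequence is not periodic, yet a listener who has picked up the shrinking ratio can anticipate each successive onset.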

Long-Range Contingent Expectations

To this point, our discussion of contingent expectations has focussed on comparatively short-range phenomena. Typically, we have been considering the repercussions of some event only on the immediately ensuing event. However, it is often the case that an event will have a greater impact on somewhat distant events than on neighboring events. Mari Riess Jones has assembled a wealth of data illustrating the hierarchical nature of auditory attending in time. Expectations in time appear to exhibit a range of local to global effects (see Jones, 1992).

Earlier we saw how information theory can be used to characterize short-term conditional probabilities. A branch of information theory known as "m-dependency" theory provides useful ways to characterize long-term statistical relationships between events (see Wong & Ghahraman, 1975). In English text, we know that the letter "q" tends to constrain subsequent letters -- increasing the likelihood of an ensuing letter "u". But can a letter not influence the occurrence of letters that follow at a further distance?

Figure 23 shows the interdependence of successive characters in English text. The X-axis indicates the number of characters following a given target character. The Y-axis measures the dependency (in bits). As can be seen, the strongest effect is evident for a single character. This captures, for example, the strong influence the letter "q" exerts on the ensuing character. As the distance increases, the influence decreases exponentially. The lower line in Figure 23 shows the dependencies for randomly scrambled English text. The only influence that a randomly rearranged character can have on the ensuing character relates to the overall frequency of occurrence for various letters. This line establishes a random base-line that is useful for comparison purposes. The figure shows that the future influence of an individual letter in English text declines to zero at a distance of about 6 letters.

Figure 23

Fig. 23. Graph showing the influence in English text of one letter on the presence of another letter displaced by n characters. Consecutive letters (n=1) have considerable dependency. At a distance of about 6 letters the presence of a given letter has little measurable influence on a later letter. Dependency is measured as entropy (in bits). From Simpson (1996).

Working at the University of Waterloo, Jasba Simpson applied m-dependency theory to the analysis of note-dependency in music. Simpson examined four musical works: Debussy's Syrinx for solo flute, Bartók's Unison for piano, Bach's Prelude I in C major from the first volume of the Well-Tempered Clavier, and Bach's Allemande from one of the six flute sonatas. The results of the analyses are shown in Fig. 24. Once again, the graphs plot the distance over which one note influences another note.

Figure 24

Fig. 24. Interdependence graphs for four musical works: Claude Debussy's Syrinx for flute, Béla Bartók's Unison for piano (from Mikrokosmos), Johann Sebastian Bach's Prelude I in C major from Volume 1 of the Well-Tempered Clavier, and Bach's Allemande from one of the six flute sonatas. The graphs show long term note dependencies. From Simpson (1996).

Both the Debussy and Bartók works exhibit the exponential decay typically found when the dependencies are relatively short-range. The strongest contingencies are evident when the events are close. As the notes grow further apart they exhibit less of a statistical influence on one another. In the case of the two Bach works, however, there are significant peaks evident at the higher probability orders. Note especially the graph for the Bach C major Prelude. The dependencies between successive neighbors are relatively small. Instead, the greatest influence is apparent at 8 and 16 note separations. The reason for this relationship is obvious when looking at the score (see Fig. 25).

Figure 25

Fig. 25. Opening measures from Johann Sebastian Bach's Prelude I in C major from Volume 1 of the Well-Tempered Clavier. Repetitive patterns are evident at 8 and 16 notes distance. These dependencies can be seen in the corresponding graph in Figure 24.

Throughout this piece, Bach establishes series of parallel compound melodic lines. The two voices in the bass staff are notated clearly enough, but even the seemingly singular series of sixteenth notes in the treble staff is perhaps better regarded as three independent voices. Clearly, each pitch has a strong relationship to pitches 8 and 16 notes distant. For example, the highest pitch E5 in measure 1 is connected perceptually to the pitch F5 in the second measure (Bregman, 1990; Schenker, 1906).

This sort of organization is relatively less common in the case of language -- although not entirely absent. For example, such long range dependencies can be observed in poetry with regular rhyme schemes. The statistical methods provided by m-dependency theory allow us to measure and characterize such relationships.
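A dependency profile of the kind discussed above can be approximated with lagged mutual information, that is, the mutual information (in bits) between each event and the event n positions later. This is offered as one plausible stand-in and is not necessarily the measure used in m-dependency theory or in Simpson's analysis. The toy sequence is hypothetical: random eight-note figures, each played twice in succession, loosely imitating the repeated figuration of the C major Prelude.

```python
import math
import random
from collections import Counter

def lagged_mutual_information(seq, lag):
    """Mutual information (bits) between events separated by `lag` positions."""
    pairs = list(zip(seq, seq[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )

def repeated_figure_sequence(n_measures, rng):
    """Toy 'piece': each measure is a random 8-note figure stated twice."""
    notes = []
    for _ in range(n_measures):
        figure = [rng.randrange(12) for _ in range(8)]
        notes.extend(figure * 2)
    return notes

notes = repeated_figure_sequence(300, random.Random(0))
profile = {lag: lagged_mutual_information(notes, lag) for lag in (1, 8, 16)}
print(profile)
```

In this toy sequence neighboring notes are nearly independent while notes eight positions apart are strongly related. Unlike the Bach prelude, the toy has no dependency at sixteen notes, because successive measures are generated independently.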

The fact that musical works exhibit long-term dependencies raises two questions. First, do listeners form corresponding expectancies where the implicative events are some distance removed from the expected consequence? Second, since the long-range patterns identified above are associated with individual works, do listeners quickly form new expectancies that are tailored to the unfolding events of a musical work?

Relatively little experimental research has addressed either of these questions. Richard Aslin at the University of Rochester has carried out a series of studies where sounds are contingent on subsequent sounds, but the two sounds are separated by a statistically unrelated sound. Aslin et al have studied successions of synthesized vowels, consonants, and pitched tones. The results of these experiments are complicated. For some kinds of stimuli, listeners form appropriate expectations, whereas listeners fail to form useful expectations for other kinds of stimuli. Moreover, Aslin and his colleagues have also performed the same experiments with cotton-top tamarins and shown that these primates exhibit a different pattern in forming suitable expectations.

It is not simply the case that tamarins are unable to form some expectations that humans readily do. For some stimulus patterns, tamarins succeed in forming appropriate expectations where human listeners fail. These inter-species differences are tantalizing, and might ultimately prove to be linked to special speech-related mechanisms for processing sound sequences.

In any event, the research pertaining to long-range expectations appears to be consistent with past experimental results -- suggesting that listeners form expectations that only approximate the true underlying patterns of contingent probabilities.

Quick Study

The second question posed above asks whether listeners rapidly form expectations that are uniquely tailored to the unfolding events of an individual musical work. The above results suggest that listeners adapt their expectations to individual musical works. As the events of the piece unfold, the work itself engenders expectations that influence how the remainder of the work is experienced. This view was proposed by Meyer (1956). As we saw earlier, Castellano, Bharucha and Krumhansl (1984) have provided experimental evidence that listeners do indeed adapt relatively rapidly to music not previously encountered.

This phenomenon of rapid adaptation was anticipated in early research in information theory. Most notably, Coons and Kraehenbuehl proposed an adaptive probability model for experiencing music as it unfolds as early as 1958 (Coons & Kraehenbuehl, 1958; Kraehenbuehl & Coons, 1959). Kraehenbuehl and Coons imagined that a listener's statistically-shaped expectations would become better adapted to a musical work as the amount of exposure increased. A listener would begin the listening experience with expectations reflecting broad or generalized probabilities arising from a life-time of musical exposure. But as the musical piece progresses, the listener would build expectations that are engendered by events in the work itself. The ability to model such adaptive probabilities was beyond the technology available in the 1960s. By the time the technology made such modelling feasible, music theorists had lost interest in information theory. No one has yet pursued such an adaptive modelling approach.
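The general idea of adaptive probabilities can be sketched in a few lines. The sketch below is not a reconstruction of the Coons and Kraehenbuehl model; it simply assumes that a listener blends prior (lifetime) probabilities with running counts from the current piece. The class name, the prior values, and the prior weight are all hypothetical.

```python
import math
from collections import Counter

class AdaptiveListener:
    """Blend a fixed prior with counts accumulated while a piece unfolds."""

    def __init__(self, prior, prior_weight=10.0):
        self.prior = dict(prior)          # generalized probabilities, summing to 1
        self.prior_weight = prior_weight  # how many "virtual" events the prior is worth
        self.counts = Counter()
        self.n = 0

    def probability(self, event):
        numerator = self.prior_weight * self.prior[event] + self.counts[event]
        return numerator / (self.prior_weight + self.n)

    def hear(self, event):
        """Return the surprisal (bits) of the event, then update the counts."""
        bits = -math.log2(self.probability(event))
        self.counts[event] += 1
        self.n += 1
        return bits

# Hypothetical prior in which the lowered seventh is rare.
prior = {"tonic": 0.5, "dominant": 0.4, "flat7": 0.1}
listener = AdaptiveListener(prior)
surprisals = [listener.hear("flat7") for _ in range(6)]
print(surprisals)
```

An event that is improbable under the prior becomes steadily less surprising as the work itself keeps using it, which is the behavior the adaptive account predicts.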

Schematic and Veridical Expectations

With repeated exposure, a listener can become highly familiar with a given musical work. In many instances, an entire musical work is committed to memory. Clearly, a listener has nearly "perfect" expectations for highly familiar pieces, such as Happy Birthday. At any given point in the work, the listener knows precisely what will happen next. Such seemingly "perfect" knowledge implies that no variability in expectation would be possible. At all points, the listener has complete knowledge of the ensuing events. When a work is perfectly known to some listener, what does it mean to have expectations? How does extreme familiarity with a single piece change the experience of listening to that piece?

Of course this knowledge is not entirely perfect. It typically requires several notes at the beginning of a work for the listener to gain confidence that the work is what they think it is. With just the first note, some element of doubt will exist. In addition, music typically contains repeated sections, and at particular structural points, the listener may be in doubt about the precise continuation. One piece of evidence in support of this claim can be found in the sorts of memory errors often seen when amateur musicians play recitals or auditions. A nervous performer sometimes lapses into a memory "loop" where they play the same passage verbatim without taking a "second ending" or otherwise continuing as they should with the rest of the piece. In short, there can still exist points of uncertainty, even in highly familiar works.

A more compelling problem is how an experienced listener might continue to hear elements of uncertainty that are similar to those for listeners hearing the music for the first time. This paradox is sometimes referred to as Wittgenstein's Puzzle (see Dowling & Harwood, 1986; p.220). A classic example of this problem arises in the perception of the deceptive cadence. How, we might ask, can a deceptive cadence continue to sound "deceptive" when familiarity with a work makes the progression entirely inevitable?

One possible answer lies in an apparent bifurcation of the neurophysiological paths related to expectation. One path represents a low-level path where highly practiced patterns of exposure are coded. A second path represents a higher-level, less practiced pattern of exposure. In cognitive terms, these two different paths might correspond to the distinction between schematic memory and veridical memory. Veridical memory is memory for specific events, whereas schematic memory is memory for general patterns. The difference can be illustrated using two well-known English phrases:

    Once upon a time ...
    Four score and seven years ago ...

In the first example, the phrase "Once upon a time" can be found at the beginning of a large number of legends and fairy tales. Several continuations are possible:

    Once upon a time there was a little girl named Little Red Riding Hood ...
    Once upon a time there were three bears ...

The second example, "Four score and seven years ago", is unique to Lincoln's Gettysburg Address. There is only one expected continuation:

    Four score and seven years ago, our fathers brought forth upon this continent a new nation ...

Jamshed Bharucha has drawn attention to the applicability of these concepts to understanding musical expectation. Bharucha and his colleagues (1999) have shown that schematic-engendered responses are still evident in veridical listening tasks. For example, a deceptive cadence can still evoke a physiological response characteristic of surprise, even when the listener is certain of its occurrence. In effect, the fast (schematic) brain is surprised by the "deception" while the slow (veridical) brain is not.

The reason why schemas exist is to allow the brain to respond more quickly to particular situations. These schemas therefore reflect the most commonly encountered contingent expectations. That is, the schemas represent broadly enculturated aspects of auditory organization.

Of course, if a culture existed where nearly all dominant chords are followed by a submediant chord, then the V-vi chord progression would no longer be perceived as deceptive. As long as the majority of dominant chords in a culture are not followed by the submediant, this progression will still retain an element of surprise.

By way of summary, in the above discussion we have distinguished three different levels or frameworks for expectations. Schematic expectations represent broadly enculturated patterns of events. Veridical expectations represent long-term patterns arising from repeated exposure to a single complex episode. Adaptive expectations represent dynamically up-dated patterns that quickly arise in the context of a novel exposure, such as the first-hearing of a musical work.

Origin of Schematic and Veridical Memory

A helpful question is to ask why the brain distinguishes between schematic and veridical information. Why are some things remembered or coded as general principles, while other things are remembered or coded as specific events?

This question can be rephrased in terms of so-called episodic and semantic memory. In general, it is more efficient to recall general principles rather than specific events. For example, it is simpler to remember that "Eric is untrustworthy" than to remember a series of past events that all seem to testify to Eric's untrustworthiness. When we are tempted to ask Eric to attend to an important task, it is faster and more efficient to access the general principle rather than ponder all of our past interactions.

When a person concludes that "Eric is untrustworthy", based on past experience, they are making an inductive inference -- forming a general proposition based on a finite series of observations. However, as we noted earlier, induction is itself fallible. In fact, we have seen instances where observations lead to the wrong inference. It is quite possible that Eric is indeed trustworthy. When he failed to show up as promised, he might have had to take his mother to the hospital and then did not have an opportunity to explain. When Pat relayed negative gossip about Eric, perhaps Pat was attempting to unjustly tarnish Eric's reputation so that Pat would be promoted rather than Eric.

Cosmides and Tooby (2000) have argued that retaining episodic memory is functionally essential. In effect, episodic memory allows us to revisit "the original data" in order to evaluate alternative hypotheses. If we simply retained the generalized semantic or schematic information ("Eric is untrustworthy") and discarded the original episodic or veridical information, then we would be unable to reconsider a possibly questionable inductive inference.

Clearly, the brain's ability to form generalizations is important. But it is also clear that the brain needs to retain some of the original observational data so that the credence of particular generalizations can be questioned, revised, or reinforced. Evolution has addressed the problem of induction by creating two parallel memory systems.

In the case of the auditory system, these systems are evident in listening schemas that represent current generalizations about the world of sound, as well as a learned veridical system. Either system can be surprised. As we saw in the deceptive cadence, the schematic system is surprised while the veridical system is not. But it is also possible to arrange the reverse. In Fig. 26 a chimeric melody is shown that begins with the notes of "Three Blind Mice". However, at the end of the second measure, the continuation is inconsistent with "Three Blind Mice". The melody elides into "Mary Had a Little Lamb." The switch is surprising from a veridical perspective. But the pitch sequences themselves are commonplace, and so there is no schematic surprise.

Figure 26

Fig. 26. Example of a chimeric melody where one melody elides into another. At the end of the second measure, an experienced listener will experience a "veridical surprise". However, the pitch sequences themselves are commonplace, and so there is no schematic surprise.

Recovering from Wrong Notes in Improvisation

Another example of the relationship between veridical and schematic expectations arises in improvisation.

In music improvisation, the performer must be able to contend with unintended "accidents" -- slips that would normally be considered errors. Whether one is improvising a jazz chart or realizing a figured bass accompaniment, experienced musicians have been uniform in offering novice improvisors the advice of returning to the "wrong" note and playing the passage again including the wrong note. The goal is to convince the listener that the note was not an error, but was intentional.

First, what do we mean by an improvised note being "wrong"? From an expectational standpoint the answer is straightforward: the note has a low probability of occurrence. Given its low likelihood, the initial appearance of the wrong note will inevitably sound jarring to the listener. However, repeating the passage will allow the listener to accommodate the errant note within a newly formed expectation.

In effect, the experienced improvisor establishes the "wrong note" as a normal part of a veridical passage. The performer can do nothing about the violation of the schematic expectation. In particular, the performer can do nothing to erase the original surprise evoked by the first appearance of the wrong note. However, by incorporating the passage as part of the work, listeners can be dissuaded from the conviction that the performer has made a mistake.

Violations of schematic expectation are commonplace in music. However, violations of veridical expectations tell listeners that something is wrong -- that the performer has messed up. The performer has mis-played "the piece."

Anchoring and Tendency Tones

As we saw earlier, different scale tones are perceived to have different degrees of stability, with the most frequently occurring tones generally having the greatest stability. Intuitively, we tend to think of the less stable tones as exhibiting some sort of tendency. For example, the leading-tone has a tendency to be followed by the tonic pitch.

Figure 27 was produced by Bret Aarden from the Ohio State University. Aarden simply measured the probability that certain scale tones would be followed by other scale tones. Some scale tones are highly constrained in what can follow them. For example, the raised dominant is nearly always followed by the submediant pitch. Other tones, like the dominant, can be followed by a much greater variety of continuations. Figure 27 plots the information content (in bits) for each scale degree. If listeners acquire some knowledge of the probabilities associated with scale-degree successions, then this graph should correspond to our expectations of tendency. That is, those scale tones toward the right side of the figure will evoke a greater sense of "leading" or "tending".

Figure 27

Fig. 27. Scale tones for C major ordered according to the range of possible ensuing tones. "Flexibility" is measured as entropy (in bits). The dominant pitch (G) can be followed by many different pitches. By contrast, the raised dominant (G#) tends to severely constrain possible pitch continuations. (Calculated by Bret Aarden, 2001).
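The "flexibility" measure in Fig. 27 can be illustrated by computing, for each tone, the entropy of the tones that follow it. The sketch below assumes that this is the calculation involved; the three toy melodies in C major and the function name are hypothetical and bear no relation to the sample Aarden analyzed.

```python
import math
from collections import Counter, defaultdict

def continuation_entropy(melodies):
    """Entropy (bits) of the distribution of tones following each tone."""
    followers = defaultdict(Counter)
    for melody in melodies:
        for current, following in zip(melody, melody[1:]):
            followers[current][following] += 1
    result = {}
    for tone, counts in followers.items():
        total = sum(counts.values())
        result[tone] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return result

# Hypothetical melodies: G is followed by many tones, G# only by A.
melodies = [
    ["G", "A", "G", "E", "G", "C", "G", "F"],
    ["C", "G#", "A", "B", "C"],
    ["E", "G#", "A", "G", "D", "B", "C"],
]
flexibility = continuation_entropy(melodies)
for tone, bits in sorted(flexibility.items(), key=lambda item: item[1]):
    print(tone, round(bits, 2))
```

Tones with low continuation entropy, such as the raised dominant in this toy data, are the ones that would be expected to sound most strongly "tending".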

What Figure 27 doesn't show is that the strongest tendency tones lead to nearby tones. That is, typically, a tendency tone will cleave to a more stable tone that is just above or just below within the scale. It almost seems that the closer a less stable pitch is to a more stable pitch, the greater the tendency for the less stable pitch to be followed by the more stable pitch. Recall that by "stable" here, we simply mean tones that have been learned to appear more frequently, that are preferred (due to the exposure effect) and that evoke less stress.

The importance of tendency tones, and the tendency to hear them as linked to more stable neighbors, was vividly described by the University of Pennsylvania theorist, Leonard Meyer (1956; p.56). Meyer noted, for example, that "In the music of China non-structural tones take the name of the structural tone to which they move together with the word pièn, meaning 'on the way to' or 'becoming.'" That is, the tendency tones are named in reference to the "resolving" tone.

Anchoring and Embellishment

As we have seen, there is a strong tendency to perceive stimuli in terms of pre-existing formulas or schemas. However, this does not mean that we perceive only what we expect. We can often tell when a performer plays a wrong note; we can be surprised in music -- and we can be disappointed as well.

There is a weaker sense in which perceptions are assimilated into schemas. We may perceive that an event is not quite right, but still interpret this discrepancy in terms of a useful pattern -- such as a schema or prototype. Consider an example studied by Eleanor Rosch (1975). Rosch found that a line tilted 10° to the horizontal is perceived to be similar to a horizontal line. She also found that people judge a slightly tilted line to be more similar to a horizontal line than the horizontal line is judged similar to the tilted line. In short, the horizontal line acts as a prototype that provides a cognitive reference point for the tilted line. The tilted line is perceived as a slight variant of the prototypic horizontal line. The tendency to interpret a stimulus as a variant of a prototype is called anchoring.

Krumhansl (1990) showed that, in a given key context (such as C major or G minor), the most stable tone is the tonic, followed by the other tones of the tonic triad, followed by the remaining diatonic scale tones, followed by the chromatic tones. In the perception of melodies, Bharucha (1984) demonstrated how less stable tones tend to become anchored to ensuing, more stable tones that are close in pitch. For example, in the key of C major, the pitch D has a tendency to be anchored to either the neighboring C or E. Similarly, the pitch D# has a tendency to be anchored to the nearest more stable pitch, E. Bharucha asked listeners to judge whether two five-note melodic fragments were identical. A single wrong note was introduced in many of the trials. Two sample trials are illustrated in Fig. 28. In both trials, the target five-note melodic fragment consists of the pitches E4, G4, C5, D5, E5. Comparison passages "a" and "b" both introduce a single wrong note (B4 and F4, respectively). Listeners were much more likely to judge passage "a" as identical to the target than passage "b". Bharucha argued that B4 tends to be better anchored to the ensuing pitch C5, and so it becomes less noticeable as a wrong note. By contrast, the F4 is not anchored to the ensuing pitch and so is more noticeable.

Figure 28.

Fig. 28. Experimental stimuli used in Bharucha (1984). Listeners were asked to identify whether the first and second five-note patterns were the same or different. The target passage (E-G-C-D-E) is the same in both trials. Comparison passage "a" was more likely to be mistaken for the target passage than comparison passage "b". Bharucha argued that the reason for the greater similarity is that the wrong pitch (B4) in "a" is anchored to the more stable subsequent pitch (C5), whereas the wrong pitch (F4) in "b" fails to be anchored to the ensuing pitch and so is more noticeable.

Conscious Expectations

To this point we have considered only those aspects of expectation that are pre-verbal or unconscious in origin. It is also possible for listeners to develop conscious strategies arising from verbalizable knowledge. An example of such conscious expectations can be seen in the knowledge of sonata-allegro form. Sonata-allegro structure provides an organizational framework that knowledgeable listeners can employ in forming future expectations. An aware listener can use form-related sign-posts to orient herself or himself. For example, one might turn on the radio and hear a classical work already in progress. One might hear a plausible "first theme" followed by a plausible "second theme." By noting that no modulation occurred between the two themes, the knowledgeable listener could infer that the performance is in the midst of the recapitulation section, and so the ending can be expected shortly.

Some music theorists have presumed that these kinds of large-scale form-related expectations are also present at an unconscious level. However, research by Vladimir Konecni has raised doubts about this assumption. Working in the Psychology Department at the University of California, San Diego, Konecni and his colleagues have shown that listeners are surprisingly insensitive to reorderings of musical segments (e.g., Gotlieb & Konecni, 1985; Karno & Konecni, 1992). The original versions of musical works consistently fail to elicit a greater preference than altered versions for both musician and non-musician listeners. Similar results have been found by Nicholas Cook in the Music Department at the University of Southampton (Cook, 1987).

Once listeners become familiar with style-related clichés, it becomes possible to thwart or otherwise manipulate the normal expectations. A good example of this with respect to closure in Western art music can be found in Haydn's so-called Joke Quartet.


In forming expectations about the world, it is easy for past experiences to become over-generalized. We may not realize that our expectations have value only in a specific narrow realm. When the context is wrong, otherwise useful information may prove false, misleading, or even harmful. As Cosmides and Tooby (2000) have noted, there are good reasons why, in the evolution of cognitive processes, special mechanisms would be needed in order to limit the scope of learned information.

When listening to music, our expectations can change dramatically depending on the style or genre of the music. In reggae, for example, there is a strong likelihood that a dominant chord will be followed by a subdominant chord. But in Western classical music, this dominant-subdominant progression is much less common. If the experienced listener is to correctly anticipate the unfolding of acoustic events, then the listener must somehow bracket two different sets of expectations. By forming two different schemas, the listener is presumed to be able to hear the dominant-subdominant progression in one context (reggae) as a commonplace event, and in another context (classical) hear the same chord progression as somewhat surprising.
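The idea that one chord progression can be commonplace under one schema and surprising under another can be sketched by computing its information content under two genre-specific probability tables. The probabilities below are invented for illustration only; they are not measured from reggae or classical repertoire, and the names SCHEMAS and surprisal are assumptions.

```python
import math

# Hypothetical probabilities of the chord that follows a dominant (V) chord.
# These numbers are illustrative assumptions, not corpus statistics.
SCHEMAS = {
    "reggae":    {"I": 0.45, "IV": 0.40, "vi": 0.15},
    "classical": {"I": 0.80, "vi": 0.15, "IV": 0.05},
}

def surprisal(schema, next_chord):
    """Information content (bits) of `next_chord` following V under `schema`."""
    return -math.log2(SCHEMAS[schema][next_chord])

for genre in SCHEMAS:
    print(f"V -> IV under the {genre} schema: {surprisal(genre, 'IV'):.2f} bits")
```

Under these assumed values the same V-IV succession carries several more bits of surprise in the classical schema than in the reggae schema, which is the sense in which a listener must keep the two sets of expectations apart.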

Music is not unique here. Social psychology provides innumerable illustrations of the effect of context on expectation. Norms of behavior are linked to particular social roles. For example, we comply with the family doctor who asks to take a look in our ears, but we would be dumbfounded if the same request were made by a sales clerk. As sociologists have noted, the wearing of distinctive uniforms is an important way of providing role-related cues. These overt cues help us switch between different expectational sets or schemas.

We already have good evidence for the existence of different musical schemas. Perhaps the best documented difference is the distinction between major and minor modes (Krumhansl, 1990). Western listeners exhibit dramatically different expectations depending on whether the music is perceived to be in a major or minor key. A single musical work may contain passages that switch between the major and minor modes. The existence of such works suggests that listeners are competent in switching schemas as the music unfolds. A further lesson arising from the major/minor distinction is that musically pertinent schemas are not simply restricted to different styles, genres, or cultures.

If our musical expectations change according to context, then a number of important questions arise: How many different musical schemas can a listener maintain? How fast are listeners able to identify the context and invoke the appropriate schema? When the context changes, how fast are listeners able to switch from one schema to another? What cues signal the listener to switch schemas? How do listeners learn to distinguish different contexts? How are the expectations for one schema protected from novel information that pertains to a different schema? How does a listener assemble a totally new schema? What happens when the events of the world straddle two different schemas?

Schema Selection

We might begin by asking how listeners know what schema to start with. We already know that an isolated tone tends to be heard by listeners as the tonic. But is this the tonic of a major or minor key? Following exposure to an isolated 2-second tone, listeners are more than three times as likely to expect a tone whose pitch is a major third above as a minor third above. This implies that Western listeners have a tendency to start by assuming a major mode. [4] It is conceivable that a musically-pertinent schema may be invoked prior to the onset of any sound.

Once the music has begun, how fast are listeners able to recognize the musical context? In the case of music, dramatic changes in listeners' expectations arise depending on the style or genre of the music. Perrott and Gjerdingen (1999) have observed that listeners are very quick to identify different styles. When scanning the radio dial, listeners make split-second decisions regarding the style of music being played on each station. Perrott and Gjerdingen tested this observation by selecting random musical segments from samples of 10 different styles of music, including jazz, rock, blues, country & western, classical, etc. They showed that listeners are adept at classifying the type of music in just 250 milliseconds. With just one second of exposure, ordinary listeners' ability to recognize broad stylistic categories is nearly at ceiling; that is, further exposure to the musical work does not lead to a significant improvement in style identification. If we assume that identifying a schema is tantamount to activating the schema, then these observations suggest that experienced listeners can activate a schema appropriate to the genre of music they are hearing in a very short period of time.

What about the phenomenon of schema switching? How rapidly can a listener switch from one schema to another? Although little research has been carried out pertaining to this question, suggestive evidence has come from the work of Krumhansl and Kessler (1982). Krumhansl and Kessler traced the speed with which a new key was established in modulating chord sequences. Modulations to related keys were "firmly established" within three chords lasting a few seconds (Krumhansl, 1990; p.221). However, some sense of the initial key was maintained throughout the modulating passage. Since modulation is common in Western music, this ability to switch rapidly between schemas might pertain only to key-related schemas. One might imagine that switching, say, from a Western string quartet to Beijing opera would take longer -- although perhaps not very long in absolute duration. Bi-lingual speakers differ in their abilities to switch rapidly between different languages. But this skill appears to be related to how often speakers must change language in their daily life.

What cues signal the listener to switch schemas? Two plausible sources of cues for schema switching can be identified: auditory and non-auditory. One source might be obvious and persistent failures of expectation. Once again, switching between two languages is instructive. If a person has been conversing in French, then the failure of an utterance to conform to the schematic expectations for French ought to lead to a re-evaluation of the language context, and so precipitate switching to a different language schema. Similarly, the failure of pitch-, rhythm-, timbre- or other related expectations might be expected to instigate a search for a more appropriate schema.

A second source of pertinent cues can be found externally to the sounds themselves. For example, seeing five brass players on a concert stage will already evoke certain associations and expectations. If the players were dressed in dark evening suits, even more specific expectations might arise. Conversely, if the players were dressed in military uniforms, or if the players were dressed informally and standing on a New Orleans street, the expectations would differ. There are innumerable visual and other environmental cues that presumably pre-dispose the listener to invoke a particular musical schema.

The auditory and non-auditory cues that provoke schema switching might also provide plausible cues through which new schemas are created. The persistent failure of expectations might well raise the alarm that a novel cognitive environment has been encountered and that the listener's existing palette of schemas is inadequate. An interesting consequence of this view is that it should be difficult to form a new schema when the new context differs only slightly from an already established schema. Once again, language provides a useful analogy. Native English speakers who learn a latinate language often encounter difficulty learning a second latinate language. For example, a non-fluent knowledge of Spanish may interfere with the ability to learn Italian. Italian vocabulary and grammar may begin to interfere retroactively with one's Spanish abilities. The difficulty appears to be the failure, from an English speaker's perspective, to sufficiently distinguish Italian from Spanish. This confusion appears to be reflected in neurological studies. It is often the case that cortical areas associated with a native language are segregated from cortical areas associated with an acquired second language. However, a third acquired language will often share cortical regions associated with the second acquired language. In this case, the weak cognitive barrier between schemas is reflected in an apparently weak neurophysiological barrier.

Whatever form these barriers take, they are clearly important in order to maintain the modular structure of auditory schemas. As we noted earlier, these cognitive barriers allow a listener to be surprised by events that in one schema are common, but in another schema are uncommon. While a modern listener might be quite familiar with jazz, this same listener might well find a moment of syncopation in a Renaissance motet to be somewhat "shocking." Such experiences imply that relatively strong barriers exist between schemas. Indeed, in Castellano, Bharucha and Krumhansl (1984) it was found that American listeners did not carry over Western pitch expectations to the experience of listening to North Indian music [check this]. More research is clearly needed to determine the extent to which one musical schema can influence another.


What happens when the events of the world straddle two different schemas? The apparent modularity of auditory schemas suggests that the boundaries between schemas provide musically fruitful opportunities for playing with listeners' expectations. Many musically interesting "cross-overs" have arisen over the years. For the author, one such distinctive experience can be found in Bach Meets Cape Breton recorded by David Greenberg and the group Puirt a Baroque. Greenberg received classical training as a baroque violin specialist, but he is also an accomplished Cape Breton-style fiddler. In recording traditional baroque dance suites, Greenberg shifts easily between conventional art-music interpretations and traditional fiddling. A "gigue" by Bach will morph into a "jig." One has a palpable sense of connections being made between two formerly discrete musical schemas. A listener begins to imagine a continuum between courtly baroque dances and 18th century folk dances.

From a musical point of view, stylistic and genre distinctions contribute to the wealth and variety of musical experience. As we have seen, experienced listeners probably form different stylistic schemas for renaissance and rock music, or for bluegrass and bebop. As psychological constructs, however, genres exist as encapsulated expectation-related knowledge. The knowledge is modularized in separate schemas as the brain's way of preventing past experiences from being over-generalized to inappropriate contexts. When creating new styles or genres, musicians take advantage of the existing evolutionary cognitive machinery for protecting an organism from misapplying local information to other environments. The fact that the brain so readily brackets novel environments suggests that musicians have considerable latitude for creating new and unprecedented musics.

Schema Failures

Schemas can fail listeners in two ways. We may fail to apply the correct schema to a given listening situation, or our schema may be flawed in some way. We have already seen that listeners do not always learn the "right" principles of organization. Even though two genres of music may differ in their underlying principles of organization, it is possible that listeners are incapable of distinguishing the two genres. Said another way, it is possible that attempts to create a new genre will fail, because the new genre does not engender a significantly different set of expectations.

Alternatively, listeners may simply fail to gain sufficient exposure to bring about the creation of the new schema. Such failures are commonplace when listening to the music of an unfamiliar culture. However, such failures can also occur within one's culture. In Western music, an example can be found in the perception of atonal music. Krumhansl, Sandell, and Sergeant (1987) found that listeners to atonal pitch sequences divided into two groups. One group of listeners had internalized atonal conventions and judged as ill-fitting those pitches that had appeared recently. However, a second group of listeners continued to hear the sequences according to tonal expectations. The two groups were found to differ in musical background -- the former group being more highly trained. This implies that greater exposure would have benefitted the second group of listeners.

The experience of atonal listening is described in more detail below. However, Krumhansl and her colleagues found no evidence for a truly unique "atonal" way of listening. Rather, their results suggest that diatonic tonal hierarchies continued to be used by all listeners, but that some listeners systematically responded in a manner contrary to the tonal schema -- a sort of musical "reverse psychology" (see below).


What happens when a listener's expectations prove correct? Conversely, what happens when a listener's expectations prove incorrect? In his book, Emotion and Meaning in Music, Leonard Meyer proposed the important hypothesis that expectations are intimately tied to emotional responses. In particular, Meyer suggested that thwarted expectations cause uneasiness or anxiety for listeners. For Meyer, "the frustration of expectation [is] the basis of the affective and the intellectual aesthetic response to music." (p.43).

Meyer argued for a sort of generalized emotion related to expectation. Contemporary empirical research supports this view in what has become known as the primary affect arising from expectation. However, the research further implies that different emotional responses are evoked depending on the nature of the expectation and its relationship to the actual outcome. With regard to primary affect, at least five conditions need to be distinguished: (1) when outcomes match the listener's expectation, (2) when outcomes conflict with the listener's expectation, (3) when some expectations are confirmed while others are simultaneously thwarted, (4) when a listener learns to expect the unexpected, and (5) when a listener experiences a single outcome as paradoxically both expected and unexpected.

1. Expectations Fulfilled

Expectations that are fulfilled represent stunning mental achievements. When a listener correctly anticipates that a dominant seventh chord will resolve to the tonic, this seemingly simple skill bears testament to millions of years of evolution that have shaped sensory and perceptual systems. Brains have evolved explicitly to make such accurate predictions possible.

Since the purpose of expectation is to anticipate events in the environment, accurate expectations may be deemed "successes" while inaccurate expectations constitute "failures." One might well imagine that expectational failures would engender stress, whereas expectational successes would engender some feeling of satisfaction or enjoyment. This simple principle carries significant repercussions for understanding the exposure effect -- discussed earlier in connection with tonality. Recall that listeners exhibit a preference for the most commonly occurring stimulus.

In the absence of any other evidence, it is reasonable for an experimental subject to predict that the next stimulus will be the most commonly experienced stimulus in the experiment. The pleasure or preference reported by subjects in these experiments may not be directly attributable to exposure. An alternative interpretation is that subjects experienced a moment of phenomenal pleasure because the most commonly encountered stimulus had unconsciously been predicted. In short, the exposure effect might itself be an artifact of positive affect evoked by accurate anticipation.

What, we may ask, is the consequence of getting things right? Part of a listener's response will depend on the associated consequence of the anticipated state. For example, a listener might predict that the cracking of a branch overhead will be followed by the thud of something hitting the ground. In this case, our expectation might provoke a motor behavior in which we step out of the way, or look up. In other cases, forming accurate expectations might suppress a response. For example, in a darkened room we may hear the sound of something moving across the floor. Our penchant to become fearful may be suppressed by the accurate prediction of the reassuring sound of one's cat meowing.

In the case of music, the consequences of our predictions are less onerous than the sound experiences our ancestors might have had in an unforgiving Pleistocene environment. Nevertheless, remember that anticipating events is one of the things brains are built for. We cannot "turn off" our tendency to anticipate. Since expectations have strong survival value, it is not farfetched to suppose that the brain itself provides reward mechanisms for accurate predictions. That is, it is possible that listeners experience a small positively valenced emotional charge when expectations are fulfilled. In other words, it may not be familiarity per se that evokes preference; instead preferences may arise from successful expectation.

On the other hand, repetitive sounds can lead to boredom. There is no challenge in predicting that the swishing sound of an electric fan will be followed by more swishing sounds. Habituation is nature's way of getting an organism to ignore stimuli that carry no information.

How do we reconcile the preference for familiar stimuli with the experience of boredom? Note that all of the experiments that show people prefer familiar stimuli have been carried out using sparse stimuli. The amount of repetition used in these experiments was small, so no habituation would be expected.

When our surroundings become highly predictable, we become bored. The behavioral consequences of such situations are a lowering of arousal levels, reduced attentiveness, and often a tendency to become drowsy and perhaps fall asleep. Since periodic sleep is biologically necessary, what better place to sleep than in an environment that is utterly banal and predictable? There are good reasons to be reassured by familiar surroundings. There are also good reasons why we might show little interest in such surroundings.

It bears reminding that habituation is not possible with all stimuli. For example, people do not habituate to painful stimuli. When an especially loud sound is continuously repeated, for example, the effect will be one of annoyance rather than boredom. In short, not all highly expected stimuli will evoke reassurance.

2. Expectations Thwarted

Incorrect expectations cause stress. In ordinary life, people who experience constant and unpredictable change are known to suffer from high levels of stress. It is likely the case that thwarted expectations engender a release of cortisol -- a stress hormone. From an evolutionary perspective, failing to predict the environment increases risk. It reduces an organism's ability to take advantage of opportunities, or prepare for possible dangers. Thwarted expectations might be expected to raise arousal levels, heighten attention, and encourage reappraisal and learning. Indeed, viewing unexpected stimuli causes galvanic skin responses consistent with increased arousal.

Expectations do not go away simply because reality doesn't conform to them. Three sorts of responses might be imagined in response to thwarted expectations. In the first case, the expectation for a specific outcome may be retained, and the listener continues to expect a given outcome, even though it hasn't yet happened. If the expectation is finally fulfilled, then the principal aesthetic or emotional effect will relate to delay. The stress of uncertainty will be short-lived and the listener is likely to experience some measure of "relief" of the "I-knew-it-all-along" sort.

Another possibility is that the listener has applied the wrong expectation to the passage. That is, the listener may have misapprehended the context. For example, a listener might have the expectation that a tonic (I) chord is not typically followed by a bVII chord. However, if a third chord (IV) ensues, then the listener might reconceive of the passage: if the first chord is regarded as a dominant (V) chord, then the passage becomes a (more probable) V-IV-I progression. In other words, a thwarted expectation might engender a reappraisal of the context to ensure that the correct schema is being applied.

A final possibility is that the predictive failure is total. That is, the events cannot be attributed to a delayed fulfillment or a misapprehended context. The listener is unable to reconcile the actual events with any existing perceptual schema they may have. In this case, the listener will experience a relatively high degree of stress and discomfort. Of course, the usual ongoing learning will continue, so unconscious processes will code the event and update or create a possible new schema to account for such experiences in the future.

Consider, by way of example, a Western listener who has had little or no experience with atonal music. For this listener, sequences of notes will systematically fail to conform with any existing schema. The music is likely to be experienced as stressful and uncomfortable. But with repeated exposure, the listener will slowly develop the kinds of expectations shown by experienced atonal listeners. With this new schema in place, subsequent listening experiences will be significantly less stressful.

3. Mixed Expectations

Expectations rely on some underlying mental representation. Listeners expect something concrete -- like a particular pitch, or harmony, or tone color. In the case of music the extant experimental literature implies that listeners typically maintain several concurrent musical representations. This suggests that a given musical event might be surprising from the perspective of one representation, but entirely expected from the perspective of another representation. A possible musical example of mixed representations leading to mixed outcomes is evident in Figure 29. The passage is taken from a flute sonata by Benedetto Marcello. A sequence in the upper (flute) part is repeated three times. In the first and second sequences 4-3 suspensions correspond to the high point in the phrase. However, in the third instance of the sequence, the suspension drops down an octave (arrow) from where it might have been expected.

Figure 29

Fig. 29. Excerpt from Marcello's Sonata in A minor for flute, measures 46-54. Three instances of a sequence are shown. In the third instance, the pitches C5 and B4 are an octave lower than would be expected. However, the harmonic sequence is preserved.

The octave displacement here would be surprising if the passage is mentally represented using pitch contours or intervals. However, the final three notes would not be surprising if the passage is mentally represented using pitch-classes, or "pitch-class contour". Moreover, these changed notes still preserve the underlying harmonic sequence. The continuo part harmonizes each sequence as a V-of harmony ending in a 4-3 suspension. In other words, the final three notes evoke "surprise" for pitch, contour, and interval representations, whereas the notes are entirely expected for pitch-class, pitch-class contour, and harmonic representations.
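The difference between these representations can be made concrete with a small sketch. The note numbers below are hypothetical MIDI values chosen only to mimic an octave displacement of C and B; they are not transcribed from the Marcello score, and the preceding A5 is an invented context note. The helper names are likewise assumptions.

```python
def pitch_classes(midi_notes):
    """Pitch-class representation: octave information is discarded."""
    return [n % 12 for n in midi_notes]

def intervals(midi_notes):
    """Interval representation: signed semitone steps between successive notes."""
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]

def contour(midi_notes):
    """Contour representation: +1 for up, -1 for down, 0 for a repeated pitch."""
    return [(d > 0) - (d < 0) for d in intervals(midi_notes)]

# Hypothetical values (MIDI 60 = C4): an invented A5 followed by C and B.
expected = [81, 84, 83]  # A5, C6, B5: the continuation implied by the sequence
heard = [81, 72, 71]     # A5, C5, B4: the same tones displaced down an octave

print("pitch classes match:", pitch_classes(expected) == pitch_classes(heard))
print("intervals match:    ", intervals(expected) == intervals(heard))
print("contours match:     ", contour(expected) == contour(heard))
```

Only the pitch-class comparison comes out equal. A listener tracking pitch, interval, or contour would therefore register a discrepancy, while a listener tracking pitch-class would not.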

4. Reverse Psychology: Expecting the Unexpected

Another form of expectation arises when listeners learn to expect the unexpected. In a famous passage outlining his method of composing with twelve tones, Schoenberg claimed that repeating a pitch has a tendency to raise the tone to the status of the tonic. Given his avowed aesthetic goal to avoid tonality, Schoenberg proposed a remarkably simple system of constructing a tone-row where all twelve pitch-classes are sounded one after another. In effect, Schoenberg advocated creating music where the aggregate distribution of pitch-classes shows a "flat" or uniform distribution. Notice that this compositional approach is very much consistent with the view that the perception of pitch stability tends to be related to an unequal pitch-class distribution where one or another pitch becomes more predictable.

Of course, tonal implications are hard to eliminate. As we have seen, playing just a single tone is apt to evoke a sense of tonic for most listeners. In the construction of a tone row, a composer might well choose ensuing pitches so that they tend to erase any latent tonal implications. For example, beginning with the pitch 'C', an ensuing 'G' would tend to reinforce a C-major key implication; an ensuing 'C#' or 'F#' would tend to contradict the tendency to assume a C-major key context.

Huron and von Hippel (2000) carried out a detailed study of the construction of 12-tone rows from the classic "Second" Viennese school composers: Arnold Schoenberg, Anton Webern, and Alban Berg. Using some 80 twelve-tone rows, Huron and von Hippel examined the moment-to-moment key implications using the Krumhansl and Schmuckler key-estimation algorithm. The moment-to-moment unfolding of the tone rows was shown to exhibit strong contra-tonal organization. By way of illustration, consider the first four pitches in Schoenberg's tone-row for Opus 27, No. 3: G, F#, D, and E. Given these four notes, there are eight possible choices for the ensuing (fifth) pitch-class. Table 4 shows the maximum Krumhansl and Schmuckler key correlations that arise for each of the eight possible continuations for the fifth pitch-class. For example, continuing the row with pitch-class 'A' causes a high maximum key correlation (r=+0.81 for D major), whereas continuing the row with 'F' produces a low maximum key correlation (r=+0.43 also for D major).

Table 4

Initial Row    Possible Continuation    Maximum Key Correlation
G, F#, D, E    C                        +0.64
G, F#, D, E    C#                       +0.50
G, F#, D, E    D#                       +0.47
G, F#, D, E    F                        +0.43
G, F#, D, E    G#                       +0.46
G, F#, D, E    A                        +0.81
G, F#, D, E    A#                       +0.55
G, F#, D, E    B                        +0.79

If Schoenberg wished to circumvent this key implication, the best (lowest) key correlation would arise for the pitch F -- according to the Krumhansl and Schmuckler algorithm. The actual fifth pitch selected by Schoenberg is indeed F. In Huron and von Hippel, this contra-tonal tendency is evident throughout the twelve-tone rows used by these Viennese composers.
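The kind of calculation summarized in Table 4 can be sketched as follows. This is a minimal illustration, not the procedure used by Huron and von Hippel: it assumes that each tone is counted once with equal weight, it uses the Krumhansl and Kessler (1982) major and minor probe-tone profiles as the key templates, and it takes the highest Pearson correlation across the 24 major and minor keys. The function names are assumptions, and the printed values need not agree with Table 4 if the original study weighted tones differently.

```python
import math

# Krumhansl and Kessler (1982) probe-tone profiles, listed from the tonic upward.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(x, y):
    """Pearson correlation; undefined if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_key(tones):
    """Return (correlation, key name) for the best fitting of the 24 keys."""
    counts = [0] * 12  # assumption: each tone counts once, no duration weighting
    for t in tones:
        counts[NAMES.index(t)] += 1
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so that its first entry falls on the tonic.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            r = pearson(counts, rotated)
            if best is None or r > best[0]:
                best = (r, f"{NAMES[tonic]} {mode}")
    return best

row_start = ["G", "F#", "D", "E"]
remaining = [n for n in NAMES if n not in row_start]
for tone in remaining:
    r, key = best_key(row_start + [tone])
    print(f"{tone:>2}  {r:+.2f}  {key}")
```

A composer seeking a contra-tonal effect would, on this account, favor the continuation with the lowest maximum correlation, since that choice gives the weakest support to any single key.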

In another study of twelve-tone rows, Krumhansl, Sandell and Sergeant (1987) asked listeners to judge the "goodness" of various probe tones at successive points in a twelve-tone row. Interestingly, Krumhansl et al.'s listeners divided into two distinct groups. Some listeners tended to rate "highly" tones which tended to reinforce some latent possible key. That is, the most highly rated tones tended to be those which maximized the aggregate correlation for the passage with the Krumhansl and Kessler key profiles. The second group of listeners responded in a completely opposite fashion. That is, they rated most highly those continuation pitches that minimized the aggregate correlation for the passage with the Krumhansl and Kessler key profiles. In other words, this second group of listeners thought the most appropriate pitch continuations are those that create the most contra-tonal effect.

Fascinatingly, Krumhansl and her colleagues found that the two groups differed in their musical experience. The group that rated highly the most atonal continuations were the more musically experienced or trained listeners. This suggests that these listeners had internalized the contra-tonal organization underlying this music and were able to form expectations that correspond both with the aesthetic goal, and with the pitch-related statistics exhibited by the music. In other words, the bifurcation in listening strategies reflected the combination of the bifurcation of composing strategies, and the experience of the listeners.

The phenomenon of "expecting the unexpected" has repercussions for understanding musical enjoyment. Earlier it was claimed that the exposure effect may simply be an artifact of a positive affect evoked by accurate anticipation of stimuli. If this is the case, then the frequency of occurrence of a stimulus does not, by itself, engender a positive affect. The more pertinent issue is the degree of predictability. To the extent that knowledgeable listeners are better able to predict the behavior of 12-tone music, it should not be unexpected that knowledgeable listeners might enjoy 12-tone music more than other listeners.

On the other hand, it might be noted that the expectations of knowledgeable listeners when encountering 12-tone music are rather vague. Knowledgeable listeners have a higher than chance ability to predict which pitch-classes are unlikely to occur next. But there may very well be a difference between knowing which two or three stimuli are most likely to occur next, and which two or three stimuli are least likely to occur next. It may be that expectation-evoked pleasure arises foremost when an expected stimulus is realized, not when an unexpected stimulus is not realized. It is possible that this hypothetical asymmetry limits the expectation-related pleasure that can arise from listening to 12-tone music.

5. Paradoxical Expectations

The famed philosopher, Ludwig Wittgenstein, described a paradox that has bothered generations of music scholars. How is it possible, asked Wittgenstein, for a listener to be surprised by a work whose familiarity means that it can hold no surprises? (Wittgenstein, 1966). Jay Dowling and Dane Harwood proposed that the paradox might be resolved by distinguishing conscious from subconscious listening experiences (Dowling & Harwood, 1986; p.200). Dowling and Harwood proposed that we hear familiar pieces against the background of schematic norms for various styles and genres.

Jamshed Bharucha (1987) proposed a more precise distinction between two kinds of expectations: schematic and veridical. Schematic expectations arise from a lifetime of music listening. Schematic expectations arise without conscious thought and cannot be easily suppressed. "Even when a given piece has been heard often enough to be familiar, it cannot completely override the generic, automatic expectations. Surprises in a new piece thus continue to have a surprising quality because they are heard as surprises relative to these irrepressible expectations." (Bharucha, 1994; pp.215-216) But Bharucha goes on to say that schematic expectations alone cannot account for common listening experiences: "If the surprises in a new piece continue to be surprises even after repeated hearing, the piece would never sound familiar." (p.216). Accordingly, two systems related to expectation must exist.

The Tenacity of Schematic Expectations

If a listener knows exactly what is about to happen, then surely, if the coming event contradicts the normal schematic expectations, these schematic expectations can be ignored or suppressed. Not so. In an experiment, Bharucha and Stoeckig (1989) pitted schematic and veridical expectations against each other with revealing results.

Once again, the task was for listeners to identify whether the target chord was in-tune or out-of-tune. But the stimuli were presented twice in succession before the listener responded. For example, in the "unexpected" condition, a listener might hear a C-major chord followed by an F#-major chord, followed by a pause, followed by a repetition of the C and F# chords. When the listener responded, the listener already knew what chord to expect. That is, the listener's veridical expectation was for the F#-major chord -- even though this progression violates the common schematic expectation for a more closely related chord. In half of the trials, the last chord was mistuned. Despite the fore-knowledge of what chord to expect, the schematically expected chords were still processed more quickly than the schematically unexpected chords. That is, the schematic expectations remained influential, even when the listener knew exactly what was coming.

The tenacity of schematic expectations provides a plausible explanation for why, for example, a deceptive cadence will still sound somehow "deceptive" even though the listener fully expects it.

Meyer proposed that it is possible for listeners to apply the wrong schema: "the same physical stimulus may call forth different tendencies in different stylistic contexts ... For example, a modal cadential progression will arouse one set of expectations in the musical style of the sixteenth century and quite another in the style of the nineteenth century." (Meyer, 1956; p.30)


As we have noted, the ability to anticipate future events is important for survival. Minds are "wired" for expectation. However, from the subjective or phenomenological point of view the most important aspects of expectation are the feelings they are capable of evoking. What happens in the future matters, so it should not be surprising that how the future unfolds has a direct effect on how we feel. In particular, music scholars have long noted that music-related expectations are capable of evoking emotional experiences.

In considering expectations, four different types of emotional responses can be distinguished. Two types of emotional responses occur prior to the event and so might be dubbed pre-outcome responses; two further types of responses are associated with the final outcome and might be dubbed post-outcome responses.

1. Imaginative Response

The first type of emotional response arises from imagining some future outcome. Imagining an outcome allows us to take some vicarious pleasure (or displeasure) -- as though the outcome has already happened. We may choose to work overtime because we can imagine the embarrassment of having to tell the boss that a project remains incomplete. We may be motivated to undertake a difficult journey by imagining the pleasure of being reunited with a loved one. This imaginative response is important in behavioral motivation. Through day-dreaming, it is possible to make future outcomes emotionally palpable. In turn, these feelings motivate changes in behavior that can increase the likelihood of a favorable outcome.

Neurological evidence for such an imaginative response is reported by Damasio (1994), who has described a neurological condition in which patients fail to anticipate the feelings associated with possible future outcomes. In one celebrated case, Damasio described a patient who was capable of feeling negative or positive emotions after an outcome had occurred, but was unable to "preview" the feelings that would arise if a negative outcome was imminent. Although Damasio's patient was intellectually aware that a negative outcome was likely, he failed to take steps to avoid the negative outcome because, prior to the outcome, the future negative feelings were not palpable and did not seem to matter. Damasio's work establishes that it is not simply the case that people think about future outcomes; when imagining these outcomes, we are also capable of feeling a muted version of the pertinent emotion. We don't simply think about the future possibilities; we feel future possibilities.

The imaginative response provides the psychological foundation for deferred gratification. Feelings that arise through the imagination help individuals to forsake immediate pleasures in order to achieve a greater pleasure later.

2. Tension Response

The second type of pre-outcome emotional response arises due to uncertainty in high-stakes situations. Sometimes outcomes are utterly certain and have little consequence. In other cases, we may have little idea about what is about to happen. If one or more of the possible outcomes involves a high stake (something very good or very bad), then we will tend to be more alert as the moment approaches when the outcome will be made known. Specifically, our physiological arousal level will be high. Heart rate and blood pressure will typically increase, breathing will become deeper and more rapid, perspiration will increase, and muscles will respond faster. These and other physiological changes help us to react more quickly, and to attend and perceive more accurately. However, these changes are also associated with stress.

This type of pre-outcome response might be called the tension response. The stress or tension is proportional to the amount of uncertainty, and to the difference in magnitudes between the best and worst outcomes. The difference in magnitude is important. For example, a lottery winner may be relatively unconcerned as to whether the final prize is $68 million or $74 million. A large degree of uncertainty may surround the ultimate outcome, but the actual resolution of the outcome may be perceived as inconsequential. The tension response is independent of whether the anticipated outcome is positive or negative. Thus, when sentencing a convicted shop-lifter, the choice of prison term may be between 95 and 110 days. While there may exist a high degree of uncertainty about the precise sentence, the tension response may be muted because the difference between the outcomes is relatively small.

The tension response is also influenced by the elapsed time before the outcome is known. As the anticipated moment of outcome approaches, the tension increases. That is, tension is inversely proportional to the estimated remaining time to the onset of the outcome. There are good reasons why tension should increase as the outcome approaches. High arousal and attention are most needed at the point where one must respond to the outcome.

Simon Durrant has noted that, in general, organisms should try to avoid situations of high uncertainty. High uncertainty requires arousal and vigilance, both of which incur an energy cost. Consequently, it would be adaptive for an organism to experience high tension responses as unpleasant. That is, even if only positive outcomes are possible, high uncertainty will lead to an unpleasant stress.

By way of summary, it is proposed that the tension response is shaped by three factors: (1) the degree of uncertainty, (2) the estimated amount of time before the outcome is realized, and (3) the range separating the most positive and most negative outcome (that is, the "stakes" of the outcome).
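The three factors can be combined into a toy index to make their interaction concrete. The sketch below is an assumption-laden illustration, not a model proposed in the text: it takes Shannon entropy as the measure of uncertainty, takes the stakes to be the spread between the best and worst outcome values, and divides by the estimated time remaining to reflect the inverse relationship described above. The function names and units are hypothetical.

```python
from math import log2


def entropy_bits(probs):
    """Shannon entropy in bits; used here as a stand-in for 'degree of uncertainty'."""
    return sum(-p * log2(p) for p in probs if p > 0)


def tension_index(probs, values, seconds_remaining):
    """Hypothetical tension index: uncertainty x stakes / time remaining.

    probs             -- the listener's probabilities for each possible outcome
    values            -- the subjective value of each outcome (any consistent unit)
    seconds_remaining -- estimated time until the outcome is known (must be > 0)
    """
    if seconds_remaining <= 0:
        raise ValueError("seconds_remaining must be positive")
    stakes = max(values) - min(values)
    return entropy_bits(probs) * stakes / seconds_remaining


# Lottery example from the text: maximal uncertainty, but a small spread.
small_spread = tension_index([0.5, 0.5], [68, 74], seconds_remaining=10)
# Same uncertainty, but the alternatives are "nothing" versus the full prize.
large_spread = tension_index([0.5, 0.5], [0, 74], seconds_remaining=10)
```

In this sketch a fully certain outcome yields an index of zero regardless of the stakes, and the index grows as the moment of outcome approaches, which matches the qualitative claims in the summary above.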

3. Outcome Response

Two further types of emotional responses occur only once the outcome is known. The most obvious of these emotions relates to the pleasantness or unpleasantness of the outcome, such as the "fear" of encountering a snake, the "sadness" of receiving a poor grade, or the "joy" of giving birth. We might refer to these state-related emotions as the outcome response. These types of emotions have been the subject of extensive research and will be addressed at length in a later chapter.

Here we need only note that positive and negative emotions act as behavioral reinforcements. The pain caused by biting your tongue teaches you to chew carefully and avoid tissue damage. Bad tastes and bad smells reinforce the aversion to ingesting unhealthy foods. The pleasure caused by engaging in sex encourages procreation. The enjoyment of playing with our children encourages parental investment and nurturing. Positive emotions encourage us to seek out states that increase our adaptive fitness. Negative emotions encourage us to avoid maladaptive states.

4. Prediction Response

Recall that an expected stimulus is more accurately perceived when it is predictable. Accurate predictions help an organism to prepare to sidestep dangers and take advantage of opportunities. Since accurate predictions are of real benefit to an organism, it would be reasonable for psychological rewards and punishments to arise in response solely to the accuracy of the expectation. Following a snow storm, for example, I might predict that I will slip and fall on the sidewalk. In the event that I actually fall, the outcome will feel unpleasant, but the experience will be mixed with a certain satisfaction at having correctly anticipated the outcome. This fourth type of expectation-related emotion might be dubbed the prediction response.

Psychological evidence in support of a prediction response is found in the work of Mandler (1975). The response is considered so important in the extant literature on expectation, that it is commonly referred to as the primary affect related to emotion (Olson, Roese & Zanna, 1996). [5] Confirmation of expected outcomes generally induces a positive emotional response even if the expected outcome is bad. It is as though brains know not to shoot the messenger: accurate expectations are to be valued (and rewarded) even when the news is not good. That is, a person might experience a positive prediction response and a negative outcome response at the same time.
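The independence of the two post-outcome responses can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the prediction response is scored as the log of how much more probable than chance the listener judged the realized outcome to be, and the outcome response is simply a signed pleasantness value supplied by the caller. Neither formula comes from the literature cited above.

```python
from math import log2


def prediction_response(p_assigned, n_alternatives):
    """Hypothetical score: positive if the realized outcome was judged likelier than chance."""
    if not 0 < p_assigned <= 1:
        raise ValueError("p_assigned must be in (0, 1]")
    return log2(p_assigned * n_alternatives)


def appraise(outcome_value, p_assigned, n_alternatives):
    """Return the two post-outcome responses as separate signed values."""
    return {
        "outcome": outcome_value,
        "prediction": prediction_response(p_assigned, n_alternatives),
    }


# The icy-sidewalk example: falling is unpleasant (-1), but it was predicted
# with (an assumed) probability of 0.8 out of two alternatives (fall / no fall).
slip = appraise(outcome_value=-1.0, p_assigned=0.8, n_alternatives=2)
```

Here `slip["outcome"]` is negative while `slip["prediction"]` is positive, mirroring the mixed feeling described in the sidewalk example.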

In summary, we have distinguished four different types of expectation-related emotions. Each type serves a different biological function. The purpose of imaginative responses is to motivate an organism to behave in ways that may maximize future benefits. The purpose of the tension response is to tailor arousal and attention to match the level of uncertainty and importance of the outcome. The purpose of the outcome response is the often-noted goal of all emotions: to provide positive and negative reinforcements related to the biological value of different states. The purpose of the prediction response is to provide positive or negative reinforcements related to forming accurate expectations. All of these goals are biologically valuable.

Response type | Epoch | Biological Function
imaginative response | pre-outcome | future-oriented behavioral motivation
tension response | pre-outcome | optimum arousal & attention in preparation for possible events
outcome response | post-outcome | negative/positive reinforcements related to specific states
prediction response | post-outcome | negative/positive reinforcement to form accurate expectations

Informally, we might characterize the "feeling" components to these responses by posing four questions:

  1. What do you think might happen, and how do you feel about that?
  2. Are you ready for what's about to happen?
  3. How do you feel about how things have turned out?
  4. Did you place a good bet?

Expecting What and When

As noted earlier, predicting a future event actually entails two predictions: the what and the when. The predictability of the what and when can be entirely independent. In musical rhythms, for example, listeners can form a strong expectation that some sound will happen at a particular moment, even though they have little inkling of what sound will occur. In other circumstances, the listener will have a good idea of what to expect, but will be left wondering when the sound will happen.
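A minimal sketch can make the separation of the two predictions explicit. It assumes, purely for illustration, that a listener's expectation can be written as two independent probability distributions, one over what will sound and one over when it will sound, and it measures surprise as surprisal (negative log probability). The particular probabilities are invented.

```python
from math import log2


def surprisal(p):
    """Surprisal in bits of an event that was assigned probability p."""
    return -log2(p)


# Hypothetical listener in a strongly metric context with no tonal cues:
# the "when" is nearly certain, the "what" is spread evenly over 12 pitch-classes.
when_dist = {"beat 3": 0.9, "next downbeat": 0.1}
what_dist = {pc: 1 / 12 for pc in range(12)}


def joint_surprisal(what_event, when_event):
    """Under the independence assumption, the two surprisals simply add."""
    return surprisal(what_dist[what_event]) + surprisal(when_dist[when_event])


when_bits = surprisal(when_dist["beat 3"])   # small: the timing was expected
what_bits = surprisal(what_dist[7])          # large: the pitch was a guess
total_bits = joint_surprisal(7, "beat 3")
```

Swapping the two distributions would describe the opposite circumstance mentioned above, where the listener knows what to expect but is left wondering when it will happen.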

As in the case of accurately predicting what will happen, accurately predicting when an event occurs will facilitate perception. In the work of Jones et al discussed earlier, we saw how listeners are able to more accurately process a sound when it occurs at a predictable rhythmic moment.

Listeners often describe unpleasant sounds as "abrupt". Webster's dictionary provides two pertinent definitions for abrupt: "1. occurring without warning, UNEXPECTED" and "2. rising or dropping sharply as if broken off". Both of these definitions are pertinent to the experience of sound. An abrupt sound is often simply a sound that is unexpected. In addition, an abrupt sound may have an especially rapid onset. The sound of a cat starting to purr has a much slower onset than the sound of a bursting balloon. The slower acoustical onset provides the listener with slightly more time to prepare for the sound before it reaches maximum amplitude. That is, a slower sound onset provides a split second in which the auditory system can prepare (predict) for what is likely to happen next. A sound can be "abrupt" both because it occurs at an unpredictable time, and because the sound itself has a low predictability.

The Poetry of Expectation

The what and when components of expectation can be clearly seen in the case of poetry. Two features of poetry are known to appeal to listeners: a rhyme scheme and a regular meter or rhythm. Consider, by way of example, the following stanza:

Life's not so short
I care to keep
The unhappy days;
I choose to sleep.

The poem exhibits a duple meter with two iambic beats in each line. Consider the listener's expectation at the moment prior to the last word (sleep). By establishing this regular meter, listeners expect the final syllable to coincide with the second beat. That is, the meter establishes a high expectation of when the final syllable will occur.

In addition, listeners will expect the final vowel to rhyme with the "ay" of "days" (or the "ee" of "keep"). That is, the poem provides listeners with helpful clues of the what of the final syllable. The rhyme scheme directly facilitates the perception of the final vowel.

There are good reasons why people might prefer poems that have a rhyme scheme and regular meter. These structures make the sounds more predictable, and so easier to perceive and process. But more importantly, the fact that listeners are able to accurately anticipate future events means that the auditory system evokes a positively valenced prediction response. Unconsciously, the brain is rewarding itself for doing such a good job of anticipating stimuli.

Predictability and Boredom

As noted earlier, high predictability can also lead to boredom. In highly predictable environments, the tension response falls to zero. No preparation is needed in anticipation of ensuing stimuli. There is no need to be attentive or aroused, and consequently minimal stress is evoked. The behavioral consequences are boredom and sleep.

When an environment is highly predictable (utterly lacking in novelty), the tendency is for an organism to become sleepy. Highly predictable environments are typically safe, and so nature takes advantage of the opportunity to reduce arousal levels and conserve energy.

Musical Applications

The preceding model of expectation can be applied to music in a number of ways. A useful exercise is to consider common conventions found in Western art music. For example, embellishments such as anticipations and suspensions have often been regarded by music theorists to involve expectation-related nuances. Below, we analyze four common types of embellishments: the anticipation, the suspension, the passing tone, and the appoggiatura.

In analyzing these embellishments, we will consider the predictive-, tension-, and outcome-related responses arising at each moment as the embellishment is approached and resolved. Due to the complexity involved, we will not consider imaginative responses. [7] In addition, we will need to analyze separately the what and the when dimensions of expectation.

The Anticipation

By way of example, consider the anticipation illustrated in Figure 30. Here the anticipation occurs as part of an authentic V-I cadence with the final tonic pitch anticipated. The numbers identify three moments that we will analyze separately. The moments can be designated the (1) pre-anticipation, (2) anticipation, and (3) post-anticipation moments.

(1) Consider first the pre-anticipation moment.

Figure 30a

Fig. 30a. An example of an anticipation in a cadential V-I context.

Outcome response: With an already established key context, the listener hears a dominant chord. The chord itself is the "outcome" of preceding expectations. As an outcome, we need to consider its response valence. Since the chord is a simple major sonority, it exhibits a low degree of sensory dissonance and so will tend to evoke a relatively positive valence.

Tension response: At the same time, musicians would note that the dominant function would normally be considered "dissonant" insofar as it needs resolution. This way of speaking can be re-interpreted in terms of the tension response. We would note that the V chord has a low probability of being followed by silence (i.e., it is unsuitable for closure). Experienced listeners will have a strong expectation that some further sounds will occur. Moreover, the V chord has a high probability of being followed by a I chord and the supertonic has a similarly high probability of leading to the tonic. In short, the listener has a relatively good idea of what to expect next; there is little of the stress that comes with uncertainty. Consequently, the tension response has only a very small negative valence.

There is one aspect to the tension response, however, in which there is relatively higher uncertainty. This has to do with when a tonic chord might appear. Since the dominant chord occurs on the downbeat, one possible moment of occurrence would be the downbeat of the next measure. Another possibility might be the third beat of the current measure.

(2) Consider now the moment when the anticipation note appears (C eighth-note).

Figure 30b

Outcome response: The first thing to note is that the sonority is now more dissonant. That is, the outcome response has a comparatively negative valence.

Prediction response: Since the previous moment led the listener to make a prediction, we can now consider the success of this prediction. The pitch of the anticipation was indeed the optimum prediction arising from the previous moment, so there is a predictive "reward" associated with the "what". That is, the prediction response is positively valenced. However, the predictability of the onset timing for this note is very low. Recall that the third beat or the downbeat of the next measure were the more likely moments for "when" this event might occur.

By way of summary, at the moment when the anticipation appears, the outcome response is rather negative, while the prediction response is a mix of positive ("what") and negative ("when").

Next, consider the tension response associated with this moment.

Tension response: Compared with the pre-anticipation sonority, the anticipation occurs on a surprisingly weak beat (the second half of the second beat). This very significantly raises the likelihood of an ensuing stimulus event occurring on the third beat. That is, the presence of the eighth-note significantly reduces the uncertainty as to whether the tonic chord will appear at beat three, or wait until the next measure. From the "when" point-of-view, the appearance of the anticipation greatly reduces uncertainty and so evokes a positively valenced tension response.

In addition, the pitch of the anticipation reduces the uncertainty concerning the ensuing "what". We know that listeners expect ensuing pitches to be close to current pitches, and that the closest possible pitch movement is unison repetition. Since the listener is already predicting that the V chord will be followed by a I chord, the appearance of the tonic pitch gives greater credibility to this prediction. That is, the presence of the anticipation lowers the uncertainty of "what" and so again contributes to a positively valenced tension response.

In the case of the anticipation, both the "what" and "when" components of listener uncertainty are reduced dramatically. Although the sonority is dissonant, and although the listener tended to predict a later occurrence of this pitch, the presence of the anticipation itself produces many psychologically positively valenced repercussions.
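The claimed reduction in uncertainty can be made concrete with a small sketch. The probabilities below are invented for illustration and are not measurements: they simply encode the verbal analysis, in which the listener is initially undecided between beat three and the next downbeat, and becomes nearly certain of both the timing and the identity of the next event once the anticipation sounds. Shannon entropy is assumed as the measure of uncertainty.

```python
from math import log2


def entropy_bits(dist):
    """Shannon entropy in bits of a dict mapping events to probabilities."""
    return sum(-p * log2(p) for p in dist.values() if p > 0)


# Hypothetical listener beliefs about the NEXT event, at two moments.
pre_anticipation = {
    "when": {"beat 3": 0.5, "next downbeat": 0.5},
    "what": {"I chord": 0.8, "other": 0.2},
}
at_anticipation = {
    "when": {"beat 3": 0.95, "next downbeat": 0.05},
    "what": {"I chord": 0.95, "other": 0.05},
}


def uncertainty_drop(before, after):
    """Bits of uncertainty removed, reported separately for 'what' and 'when'."""
    return {
        dim: entropy_bits(before[dim]) - entropy_bits(after[dim])
        for dim in before
    }


drop = uncertainty_drop(pre_anticipation, at_anticipation)
```

With these assumed values both entries of `drop` are positive, which corresponds to the positively valenced tension response described for the anticipation.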

(3) Finally, the post-anticipation moment occurs.

Figure 30c

Outcome response: The outcome response is highly positive: the chord has low sensory dissonance.

Prediction response: The listener's confident prediction of this moment is realized, and so there is a high positively valenced prediction response.

Tension response: The closure associated with this moment creates a highly certain expectation that the current moment will be sustained for two or more beats, and perhaps followed by silence. That is, the tension response is also positively valenced since both the "what" and "when" following this moment are highly predictable.

Before leaving the anticipation, consider the variant passage shown in the figure below. Here, the duration of the anticipation has been increased to a quarter-note. Two important differences distinguish this case from the previous one. First, by falling on a more predictable beat, it reduces the likelihood of something happening on beat three. That is, the dotted-quarter/eighth of the original example makes it more certain that something will happen on beat three. In effect, decreasing the duration of the anticipation renders it more effective in helping the listener predict the "when" of the ensuing event.

Figure 31

Fig. 31. A variant anticipation in which the duration of the anticipated note is extended. (See discussion in text.)

The second difference is that having the anticipation occur on beat two rather than the second half of beat two makes the anticipation note itself more predictable. In effect, there is a trade-off between the predictability of the anticipation moment and the post-anticipation moment.

The observations made above concerning the anticipation are summarized in the following table. Responses colored in red indicate a negatively valenced response, whereas responses colored in blue indicate a positively valenced response.

Summary Expectation Analysis of Anticipation
Moment | Outcome | Predictive | Tension
pre-anticipation | consonant | - | low tension; strong expectation of the ensuing resolving pitch
anticipation | dissonant | high predictive success for pitch; low predictive success for timing | extremely low tension; nearly certain of ensuing resolving pitch; in a sense, the current pitch is the resolution of the previous expectation and so early outcome further reduces the tension
post-anticipation | consonant | extremely high predictive success | -

Compared with other embellishments, anticipations are more likely to occur near a cadence, and therefore arise in situations that are more predictable.

The Suspension

Figure 32 shows a typical 4-3 suspension. The suspension occurs as part of a tonic-dominant progression in which the movement of the tonic pitch (F) to the leading-tone (E) is delayed. The numbers identify the (1) pre-suspension, (2) suspension, and (3) post-suspension moments.

(1) Consider first the pre-suspension moment.

Figure 32a

Outcome response: With an already established key context, the listener hears a tonic chord (in F major). The chord itself is the outcome of preceding expectations that we needn't consider. The chord is a simple major sonority with low sensory dissonance, and therefore will tend to evoke a positive valence.

Tension response: As a I chord, it is quite stable and so may evoke no strong sense of continuation. Nevertheless, a number of possible continuations might be expected, including a good likelihood of being followed by a V chord. In addition, pitch proximity will tend to engender expectations that the pitch F is likely to be followed by a nearby pitch (F, G, E). An experienced listener will therefore have a reasonable intuition of what might occur next: there is relatively little of the stress that comes with uncertainty. Consequently, the tension response exhibits a relatively small negative valence. As in the case of the anticipation example, one source of tension is when the ensuing chord/event might appear.

(2) Consider now the moment when the suspended sonority appears.

Figure 32b

Outcome response: The sonority is now more dissonant, so the outcome response has a comparatively negative valence.

Prediction response: The suspended note (F) has a high likelihood of following from the previous sonority. Similarly, the dominant chord is likely to follow from the previous tonic. This implies that a listener should typically experience a predictive reward associated with the "what". The combination of the expected pitch and the expected chord is probably less predicted. Nevertheless, the outcome is reasonably common and not unusual, so one would expect that the prediction response would be positively valenced for most experienced listeners. With regard to the "when", the suspended sonority falls on a highly predictable beat. It might have occurred a quarter-duration earlier, or perhaps a half-duration later, but the occurrence on beat three has a relatively high predictability. (The timing of the suspension here is more predictable than the timing of the anticipation seen in our earlier example.)

By way of summary, at the moment when the suspension appears, the outcome response is rather negative, while the prediction response is positive for both the "what" and the "when".

Tension response: The suspended pitch creates a very high expectation to move to the E. In other words, the "what" of the next moment is almost perfectly predicted. The "when" of the post-suspension moment is a little more uncertain. The resolution might occur on the next beat, or be delayed until the next major downbeat at the beginning of the next measure. However, relatively little uncertainty accompanies the "when". Only a couple of choices are likely. As in the case of the anticipation example, rather little uncertainty surrounds what will happen following the dissonant moment. Consequently, the suspension evokes a positively valenced tension response.

(3) Finally, the post-suspension moment occurs.

Figure 32c

Outcome response: the chord has low sensory dissonance and relatively high stability so the outcome response is highly positive.

Prediction response: The listener's confident prediction of this moment is realized, and so there is a high positively valenced prediction response.

Tension response: The closure associated with this moment creates a highly certain expectation that the current moment will be sustained for two or more beats, and perhaps followed by silence. That is, the tension response is also positively valenced since both the "what" and "when" following this moment are highly predictable.

Summary Expectation Analysis of Suspension
Moment | Outcome | Predictive | Tension
pre-suspension | consonant | - | moderate to low tension; relatively strong expectation of the ensuing resolving pitch
suspension | dissonant | moderate predictive success due to proximity | very low tension; strong expectation of ensuing resolving pitch (via anchoring)
post-suspension | consonant | extremely high predictive success | very low tension; strong expectation of ensuing resolving pitch (via anchoring)
resolving | consonant | high predictive success | -

The Odd-ball Note

Given the preceding analyses, a skeptical reader might conjecture that the introduction of any note would have a similar effect of reducing uncertainty -- and so produce positively valenced prediction and tension responses. As a control case, consider the concocted passage shown in Figure 33. This example shows a dominant-tonic progression with an "odd-ball" note interposed. A brief analysis follows.

(1) Consider first the pre-odd-ball moment.

Outcome response: With an already established key context, the listener hears a dominant chord with low sensory dissonance which tends to evoke a positively valenced outcome response.

Tension response: The dominant chord has a high probability of being followed by a tonic chord, and the supertonic pitch is likely to be followed by the tonic. Hence, the "what" component of the tension response has only a very weak negative valence. The "when" is slightly less certain. Plausible event onsets might occur on beat two, three, or the downbeat of the next measure.

Figure 32a

(2) Consider now the moment when the odd-ball note appears.

Outcome response: As with the anticipation and suspension, the sonority is now more dissonant, so the outcome response has a comparatively negative valence.

Prediction response: Both the pitch (A-flat) and the onset timing are poorly predicted, so the prediction response is highly negatively valenced. The A-flat does not belong to the key and so has a low probability of occurrence. In addition, the A-flat is remote in pitch from the preceding note, and is approached by the unlikely interval of a diminished fifth. The A-flat might be considered part of a dominant ninth chord -- a chord borrowed from the minor key. However, in general, the listener will receive little "reward" for predicting this event.

Tension response: The lowered sixth scale degree is typically anchored to the dominant pitch, so a reasonable prediction would be for the A-flat to be followed by G. Like the anticipation, the timing of the A-flat strongly implies that the next event should occur on beat three. Most experienced listeners would therefore confidently predict the occurrence of G4 on beat three. Both the "what" and "when" are highly predictable. Although the odd-ball note evokes negatively valenced outcome and prediction responses, it evokes a comparatively positive tension response.

Figure 32b

(3) Finally, consider the post-odd-ball moment.

Outcome response: the chord has low sensory dissonance and relatively high stability so the outcome response is highly positive.

Prediction response: The listener's confident prediction is clearly wrong. Both the "when" and the "what" fail to conform to expectations. Only the fact that the chord is a tonic function was predicted. As a result, there is a highly negatively valenced prediction response.

Tension response: The tonic chord tends to evoke a sense of closure. However, the timing of the chord tends to reduce the closure effect.

Figure 32c

With only slight modifications our "odd-ball" example might be transformed into an appoggiatura (see Figure 33). An appoggiatura would have the A-flat resolving downward to the G on beat three. A more likely appoggiatura might employ an A-natural instead of the A-flat. But consider how this appoggiatura would evoke different expectation-related responses compared with the odd-ball passage. Both the odd-ball passage and the appoggiatura produce a dissonant moment, accompanied by a high expectation of the ensuing event. In the case of the appoggiatura, the subsequent resolution would conform to the expectation -- creating a positive prediction response in addition to the positive outcome response. However, in the odd-ball passage, the subsequent "resolution" fails to conform to expectations, hence evoking a negatively valenced prediction response.

Figure 33

Summary Expectation Analysis of Appoggiatura

Moment | Outcome | Predictive | Tension
pre-appoggiatura | consonant | - | moderate to low tension; relatively strong expectation of the ensuing resolving pitch
appoggiatura | dissonant | poor predictive success; surprising | low tension; strong expectation of ensuing resolving pitch (via anchoring)
post-appoggiatura | consonant | extremely high predictive success | -


What all four examples share is that the presence of the embellishment significantly increases the predictability of the ensuing sonority. In the case of the anticipation, appoggiatura and odd-ball, both the "what" and "when" of the subsequent sonority are made more certain. In the case of the suspension, the "when" is slightly less certain than the "what", but the certainty of both remains high. The appoggiatura and the odd-ball produce a negatively valenced prediction response when the embellishment appears. However, the odd-ball passage also produces a negatively valenced prediction response at the "resolution".

Looking at just the conventional embellishments -- the anticipation, suspension, and appoggiatura -- the presence of the embellishment creates a circumstance where uncertainty about the future is reduced. This is purchased at the cost of momentary dissonance. In other words, the negative valence evoked by sensory dissonance is balanced against the more positive valence of predictability. More precisely, the outcome valence at the time of the embellishment is made more negative, while the concurrent tension valence and the ensuing prediction valence (associated with the resolution) are both made more positive.

Misattribution and the Exposure Effect

Positive and negative emotions are important motivators that help organisms learn. Suppose I am mugged in a dark alley. I experience highly negative emotions whose purpose is to encourage me to avoid such situations in the future. But what, precisely, is the lesson I should learn? Should I learn to avoid dark alleys? Should I avoid encounters with other people? Should I avoid walking on concrete sidewalks? Should I avoid eating a sandwich for lunch? Once again, we are faced with the problem of induction: what general principle can one infer from finite observations? Moreover, since such highly emotionally-charged events tend to be rare, what can one reasonably learn from just one or two observations?

Nature addresses this problem by casting a very wide net. When we experience strong emotions, we tend to remember many details about the experience. A person trapped in a crashed automobile will tend to retain vivid memories of the crash site, the face of the ambulance attendant, and the music playing on the car radio. Research on misattribution has established that we tend to associate strong emotional experiences with all salient perceptual cues (time-of-day, facial features, manner of speaking, location, colors, etc.). Since the experience is highly charged, it is better to draw excessively broad conclusions (which have a better chance of catching a true cue) than to draw narrow lessons (that have a high chance of failing to capture a pertinent cue). In other words, misattribution is a predictable consequence of the problem of induction.

Recall now our earlier discussion of the exposure effect -- the tendency for people to prefer stimuli that are expected. What could explain the origin of the exposure effect? The combination of the prediction response and misattribution allows us to offer a plausible explanation as to why commonly occurring stimuli would evoke a positively valenced emotional response.

Many outcomes are neither positively nor negatively valenced. Yet if we predict such an outcome, a positively valenced prediction response ensues. In such circumstances, there is always the possibility that the positive prediction response will be misattributed to the stimulus that evoked the response. If state "A" is highly likely, and if we correctly predict the occurrence of state "A" on many occasions, then state "A" will tend to become associated with the positively valenced prediction response. With constant repetition, this misattribution tendency will be reinforced, and so we begin to misattribute the prediction response to the stimulus. To the extent that any frequently occurring stimulus will become more predictable, such frequently occurring stimuli will tend to accrue a positive emotional response. In effect, we now experience a positive outcome response for a previously neutral stimulus.
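The misattribution account can be illustrated with a toy simulation. What follows is only a sketch under stated assumptions, not a model drawn from the research literature: the function name simulate_exposure, the three states and their probabilities, the +1/-1 reward values, and the learning rate are all hypothetical choices made for illustration. The simulated listener always predicts whichever state has been most frequent so far, and the reward for a correct or incorrect prediction is credited to the state that actually occurred.

```python
import random

def simulate_exposure(probabilities, trials=5000, learning_rate=0.05, seed=0):
    """Toy model: valence accrues to stimuli that are correctly predicted."""
    rng = random.Random(seed)
    states = list(probabilities)
    weights = [probabilities[s] for s in states]
    counts = {s: 0 for s in states}      # running tally of past events
    valence = {s: 0.0 for s in states}   # every state starts out neutral
    for _ in range(trials):
        # Predict the state heard most often so far (ties go to the first listed).
        prediction = max(states, key=lambda s: counts[s])
        outcome = rng.choices(states, weights=weights)[0]
        # Assumed prediction response: +1 for a hit, -1 for a miss.
        reward = 1.0 if outcome == prediction else -1.0
        # Misattribution: the reward is credited to the stimulus that occurred.
        valence[outcome] += learning_rate * (reward - valence[outcome])
        counts[outcome] += 1
    return valence

# Hypothetical environment in which state "A" is the most common event.
valence = simulate_exposure({"A": 0.7, "B": 0.2, "C": 0.1})
print(valence)
```

Under these assumptions the frequent state "A" ends up with a positive learned valence, while the rarer states, which are seldom predicted, end up with a negative one. Nothing about "A" itself is pleasant in the simulation; its valence comes entirely from being predictable.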

This phenomenon provides a plausible explanation for why the tonic pitch sounds "nicer" than other pitches. Similarly, this phenomenon provides a plausible explanation for why the "downbeat" is experienced as pleasurable. Viewed from the perspective of the outcome response, there is nothing to favor one pitch over another. There is nothing inherently more pleasurable about D4 than E4. However, when these tones appear in a context that leads to certain expectations, the expected pitch will be experienced as evoking a more positive valence.

Of course listeners don't simply prefer the tonic pitch to all other pitches. The tonic pitch as a passing tone in a dominant harmony doesn't evoke nearly the pleasure of that same tonic pitch terminating a final cadence. But recall that cadences are more predictable, and that the occurrence of the tonic at a final cadence is very predictable. What we mean by "tonality" is a system of relationships that increase the predictability of certain sounds in certain contexts, and that evoke both a highly positive prediction response and a positively valenced outcome response arising from the misattribution of predictability to certain outcomes.

Predictable Music

All of the foregoing discussion leads to an obvious problem. If positively valenced responses arise from predictability, then wouldn't the most enjoyable music be utterly banal? Wouldn't the best sounding music be entirely predictable?

Some music does seem to conform to this implication. For example, "trance" music, "minimalism" and "drone" music do seem to exhibit highly predictable structures. However, there is plenty of music that isn't so obviously predictable, yet it is enjoyable.

One consideration is the phenomenon of habituation. Simply repeating the tonic pitch ad infinitum will lead to a desensitization of the auditory response. Unless the stimulus is painful, organisms habituate to repeated stimuli.

However, the avoidance of habituation alone cannot explain the relative variety found in musical passages.

Response Interactions

Barbara Mellers and her colleagues have described an interesting phenomenon that might be regarded as an interaction between the prediction response and the outcome response. Consider the following experiment. Basketball players were asked to take shots from different positions around the court. Before each shot, the player was asked to estimate the likelihood of scoring a basket. Following each shot, the player was asked how good they felt. As you might expect, players are happiest when they make a shot and unhappy when they miss one (i.e., positive and negative outcome responses). However, the degree of satisfaction or dissatisfaction is directly related to the player's expectation. Players are unhappiest when they miss a shot that they judged to be "easy", and happiest when they score a basket that they judged to have a low probability of success. In general, unexpected fortune or misfortune causes the greatest emotional response. That is, low expectation amplifies the emotional response to the outcome.

This interaction has repercussions for how listeners experience sound. If a nominally unpleasant sound is not expected by the listener, then the sound will be perceived as even more unpleasant or annoying. Conversely, if a nominally pleasant sound is not expected by the listener, it will tend to be perceived as more pleasant. A lengthy atonal passage is likely to lead the listener to expect further atonal sonorities. Terminating an atonal passage with a major chord will tend to heighten the pleasing effect. However, from the perspective of expectation, the most negative auditory experiences will occur when uncertainty is high, when what you expect doesn't occur, and when the outcome is unpleasant.
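The interaction between expectation and outcome can be made concrete with a small numerical sketch. The formula below is an assumption chosen for illustration; it is not the equation used by Mellers and her colleagues. The function name felt_response, the gain parameter, and the probabilities in the examples are all hypothetical. The only idea it encodes is that the less probable an outcome was judged to be, the more strongly its valence is felt.

```python
def felt_response(outcome_value, probability, gain=1.0):
    """Scale the valence of an outcome by how unexpected that outcome was.

    outcome_value: nominal valence of the outcome (positive or negative).
    probability:   subjective probability, judged beforehand, of this outcome.
    gain:          assumed strength of the amplification (hypothetical).
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    surprise = 1.0 - probability
    return outcome_value * (1.0 + gain * surprise)

# Basketball cases, with made-up probabilities for the outcome that occurred.
hard_shot_made = felt_response(+1.0, probability=0.2)    # success was unlikely
easy_shot_made = felt_response(+1.0, probability=0.9)    # success was likely
easy_shot_missed = felt_response(-1.0, probability=0.1)  # a miss was unlikely
hard_shot_missed = felt_response(-1.0, probability=0.8)  # a miss was likely

print(hard_shot_made, easy_shot_made, hard_shot_missed, easy_shot_missed)
```

With these assumed numbers the ordering matches the pattern described: the improbable success feels best, the improbable failure feels worst, and the expected outcomes fall in between. The same function applies to the musical case if the outcome value is read as the nominal pleasantness of a sonority and the probability as how strongly the listener expected it.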

Emotional Effect of Delay

A potent component of the tension response is delay. To this point, we have talked about tension principally in relation to the what of expectation. However, an important component of the tension response arises from the when of expectation.

We noted earlier that the tension response increases as the estimated outcome moment approaches. If the outcome occurs prior to the anticipated time, then the tension response will fail to have reached its peak. On the other hand, if the outcome is late, the tension response will reach a peak and may be sustained as we wait for the outcome to materialize. In short, delay tends to magnify the tension response.

Another way of thinking about delay is that it increases uncertainty. As we have seen, an unexpected good outcome generally evokes a more positive response than if the outcome is fully expected. Similarly, an unexpected bad outcome is generally more disappointing than if the bad outcome is expected. However, these basic relationships are influenced by the effect of delay. Suppose that there is a strong likelihood of a good outcome. If a delay ensues, the anticipation causes some doubt that the outcome will happen as expected. That is, delay provides opportunities to entertain doubts, and so delay has the effect of reducing the subjective probability. Consequently, a highly expected good outcome will evoke a greater positive response if preceded by a delay, since the delay, in effect, lowers the sense of certainty. Similarly, a highly expected bad outcome will evoke a less negative response if it ensues without delay. If a delay ensues, then the sense of inevitability will be tempered by thoughts that something might intervene to thwart the negative outcome.

As can be seen, the effect of delay is most marked when expectations are most certain. (We have the most to lose when we are virtually certain of a good outcome, and the most to gain when we are virtually certain of a bad outcome.) This means that the effect of delay in music will be greatest when applied to the most stereotypic, cliché, or predictable of events or passages.
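The claim that delay matters most for the most certain expectations can be checked in a toy calculation. The sketch below rests on two assumptions that are not part of the original argument: that doubt erodes subjective probability exponentially with the length of the delay, and that the felt response to an outcome grows linearly as its subjective probability falls. The function names and the doubt_rate and gain parameters are hypothetical.

```python
import math

def response_with_delay(outcome_value, probability, delay, gain=1.0, doubt_rate=0.3):
    """Felt response to an outcome that arrives after `delay` beats of waiting."""
    # Assumption: waiting invites doubt, so subjective probability decays.
    doubted = probability * math.exp(-doubt_rate * delay)
    # Assumption: lower subjective probability amplifies the felt response.
    return outcome_value * (1.0 + gain * (1.0 - doubted))

def delay_effect(outcome_value, probability, delay):
    """Change in felt response caused by the delay alone."""
    return (response_with_delay(outcome_value, probability, delay)
            - response_with_delay(outcome_value, probability, 0.0))

# A good outcome (e.g. a final tonic) delayed by two beats.
near_certain = delay_effect(+1.0, probability=0.95, delay=2.0)
toss_up = delay_effect(+1.0, probability=0.50, delay=2.0)
print(near_certain, toss_up)
```

Under these assumptions the change produced by the delay is proportional to the original probability, so a near-certain good outcome gains more from being delayed than an uncertain one does. For a bad outcome the sign reverses: the delay makes the eventual outcome feel worse than it would have felt had it arrived at once.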

Consider some of the most predictable aspects in Western music. The most predictable pitch is the tonic; the most predictable metric moment is the downbeat; the most predictable chord is the tonic chord; the most predictable diatonic pitch successions follow after the sixth and seventh scale degrees; phrase endings are among the most stereotypic (low information) musical moments in Western music.

The simplest and most direct form of delay is the rallentando or ritard. In most music, the greatest slowing occurs in the closing cadence of a work. Typically, this final cadence involves approaching the most predictable pitch, the most predictable chord, and the most predictable metric moment. Cadences are especially ripe points for delaying tactics.

Nor is it the case that cadences are delayed only by slowing the tempo. The history of Western music is replete with cadential delaying tactics. Indeed, many of the most seminal harmonic techniques originated as cadential interlopers. This includes the addition of the subdominant pitch in the creation of the dominant seventh chord, the suspension, the cadential 6-4, augmented sixth chords, the Neapolitan sixth, the pedal tone, the augmented triad, the dominant ninth and thirteenth chords, the pre-terminal false modulation, the interminable terminating I chord, and the deceptive cadence. The number of ways of delaying the musical end is legion. This same phenomenon is evident in film, where the denouement is often rendered in slow motion.

In the late Romantic period, composers such as Richard Wagner established the elided phrase in which cadence moments were avoided: the anticipated cadence would instead begin the ensuing phrase. In some ways, this delaying tactic reached its apex in the twentieth century with the advent of the fade-out. In Gustav Holst's The Planets, the fade-out is achieved mechanically, but with electronic sound recording, fade-outs became routine. With the fade-out, music manages to delay closure indefinitely.

Figure 29

Fig. 29. An early example of a "fade-out" ending. "Neptune, the Mystic" from Gustav Holst's The Planets (1914). The passage is for female chorus. The performance instruction reads: "The chorus is to be placed in an adjoining room, the door of which is to be left open until the last bar of the piece, when it is to be slowly and silently closed." "This bar to be repeated until the sound is lost in the distance."

The effect of delay and the interactions between the prediction response and the outcome response are summarized in Table 5. The outcome-related affect is appraised as having either a positive, negative, or neutral valence. Positive outcomes are associated with opportunity and pleasure; negative outcomes are associated with threat and displeasure. The primary affect (or "expectancy-accuracy affect") is either expected or unexpected.

Table 5

Expectation | Negative | Neutral | Positive
Expected | Annoyance, Resignation, Sadness, Crankiness | Boredom | Stability, Repose, Contentment, Serenity, Reassurance
Unexpected | Disappointment, Startle, Defense, Disgust, Anger | Interest, Surprise | Delight, Joy, Surprise, Wonder, Astonishment
Delayed | Worry, Foreboding, Anxiety, Tension, Fear | Orienting, Attention | Hope, Craving, Anticipation, Savouring, Relishing

Table 5 provides different descriptive labels for negative-delayed and positive-delayed states. However, the differences between these two states are probably less than suggested by these terms. The characteristic feeling evoked by both is a strong sense of uncertainty. We use the words "worry" and "hope" only to emphasize the valence of the secondary affect.

One might think that the increased predictability of the embellishment tones runs contrary to Mellers's work with the basketball players. Recall that the outcome response is especially high when the player makes a basket that is considered unlikely. This seems to suggest that uncertainty amplifies ... The important distinction is between the tension response and the prediction response. When a basketball player sinks a basket that was considered improbable, the tension response is muted: the player is reasonably certain that he will not score the basket. The prediction response is bad, but offset

In a study by John Sloboda from Keele University in England, music-lovers were asked to identify those musical passages they found most emotional. Sloboda (1991) found that "shivers down the spine" occurred most often in passages containing unexpected harmonies. Tears were most likely to be evoked by appoggiaturas or sequences of appoggiaturas. Both of these experiences are consistent with the wonder, joy, and awe associated with unexpected positive outcomes. Examples might include an unexpected transposition, chromatic mediant chord, or sustained chord.

Note that the table includes the word "surprise" in both the unexpected/neutral cell and the unexpected/positive cell. In English, we don't have separate words to describe the distinction intended here. In the case of unexpected/positive surprise, we associate the experience with wonder, awe, fascination, or amazement. By contrast, the unexpected/neutral surprise might be associated with experiences such as stupefy, stun, confound, bewilder or flabbergast. Even these terms are a bit too negative to be associated with a neutrally valenced appraisal.

In general, unexpected and delayed events raise arousal levels, whereas expected events frequently lower arousal levels. Habituation is the epitome of an expected event. When a habituated stimulus has a neutral valence -- that is, when it is appraised as having no consequence -- then there is a tendency toward boredom. Environments that have little consequence are safe environments, so it is not surprising that boredom also tends to be associated with sleepiness: sleep is an appropriate behavior in safe environments.

When expected events have a positive valence, there is also a tendency toward a low arousal state. However, the positive valence is apt to engage a person ("entertain" in the sense of maintaining attention), and so contentment and serenity are less likely to produce sleepiness as quickly as is the case for boredom. Nevertheless, the safety of the environment is ultimately likely to progress to sleep.

Emotional responses happen in response to global expectations as well as local (within the music) expectations. Linda Dusman (1994), for example, has noted that when members of a concert audience are introduced to a new work that defies straightforward comprehension, listeners are disappointed.

Notice that the above taxonomy accounts for all seven of the basic emotions identified by Lewis (1995): sadness, anger, disgust, fear, interest, surprise, and joy.

Expectation Shapes Mental Representations

As we noted earlier, expectations imply some sort of mental representation. The what of expectation must be expressed in some language. Listeners will expect a pitch, or a pitch-class, or a scale degree, or an interval, a chord function, a combination of duration and scale degree, etc. We also saw evidence of a variety of different representations, and that listeners may use a combination of representations.

Ideally, the best mental representation would be the one (or ones) that most accurately reflect the organization of the real world. If the real world is organized according to scale degrees, then scale degree would be an appropriate mental representation. If the real world is organized according to a combination of (say) pitch contour, metric position, and diatonic interval, then the most appropriate mental representation would echo this organization.

But how is a brain to know which representation is the best? How can an auditory system learn to discard one representation in favor of another? Here expectation may play a defining and perhaps essential role. Expectation is an omnipresent mental process; brains are constantly anticipating the future. Moreover, we have seen that there is good evidence for a system of rewards and punishments that evaluates the accuracy of our unconscious predictions about the world. A defective mental representation will necessarily lead to failures of prediction. Conversely, a mental representation that facilitates accurate predictions is likely to be retained. In effect, our mental representations are being perpetually tested by their ability to accurately predict ensuing events.

This claim carries an important implication. It suggests that the auditory system is spontaneously capable of generating several representations, from which the less successful can be eliminated. This in turn suggests that competing concurrent representations are the norm in mental functioning. It may well be that the brain begins by assuming a simple representation (such as absolute pitch). If the world is not organized in a manner consistent with absolute pitch (as in the persistent singing of "Happy Birthday" in different keys), then some other representation (such as interval or scale degree) will become more appropriate. However, any latent absolute pitch representation will be retained to the extent that it retains some value in predicting the future.
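The idea that representations compete on predictive accuracy can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a model of the auditory system: pitches are encoded as MIDI note numbers, each "representation" is a hypothetical first-order table that guesses the most common successor of the current symbol, and the helper names are invented for the example. The melody is the opening two phrases of "Happy Birthday", learned in one key and then heard a fourth higher.

```python
from collections import Counter, defaultdict

def learn_successors(symbols):
    """Count which symbol follows which in a training sequence."""
    table = defaultdict(Counter)
    for current, following in zip(symbols, symbols[1:]):
        table[current][following] += 1
    return table

def prediction_accuracy(table, symbols):
    """Fraction of transitions for which the most common learned successor is right."""
    pairs = list(zip(symbols, symbols[1:]))
    hits = 0
    for current, following in pairs:
        if current in table:  # unseen symbols yield no prediction (a miss)
            guess = table[current].most_common(1)[0][0]
            hits += guess == following
    return hits / len(pairs)

def intervals(pitches):
    """Re-describe a pitch sequence as successive intervals in semitones."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

# Opening of "Happy Birthday" as MIDI numbers (G4 G4 A4 G4 C5 B4, G4 G4 A4 G4 D5 C5).
heard = [67, 67, 69, 67, 72, 71, 67, 67, 69, 67, 74, 72]
transposed = [p + 5 for p in heard]  # the same tune sung a fourth higher

absolute_score = prediction_accuracy(learn_successors(heard), transposed)
interval_score = prediction_accuracy(learn_successors(intervals(heard)),
                                     intervals(transposed))
print(absolute_score, interval_score)
```

Because transposition leaves the interval sequence unchanged, the interval-based table predicts the transposed tune far better than the absolute-pitch table does. On the account given here, it is this difference in predictive success that would favor retaining the interval representation.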

Expectation serves at least three functions: motivation, preparation, and representation. First, by anticipating future events, we may be able to take steps now to avoid negative outcomes or increase the likelihood of positive outcomes. That is, expectations have the capacity to motivate an organism. Second, even if we are unable to influence the course of future events, expectations allow us to prepare in appropriate ways. For example, we can adopt a state of arousal that is more suited to what is likely to happen next. We can also orient toward an anticipated stimulus, and so increase the speed and accuracy of future perceptions. That is, expectation allows us to prepare in advance suitable motor responses and craft suitable perceptual strategies. Finally, expectation provides the test-bed against which different representations can be evaluated.


The main theoretical points of this study can be summarized as follows:

  1. The ability to anticipate future events is important for survival. It is reasonable to assume that evolution by natural selection has shaped perceptual and cognitive systems so that they endeavor to anticipate future events. "All brains are, in essence, anticipation machines." (Dennett, 1991; p.177).

  2. It is possible to form relatively accurate expectations only because real-world environments exhibit structure and are not totally chaotic.

  3. Some expectations are formed through conscious thought or reflection, as when a knowledgeable jazz listener anticipates a drum solo following a bass solo. However, most expectations are unconscious, automatic, and ubiquitous. We cannot "turn off" the mind's tendency to anticipate events, and we are usually unaware of the mind's disposition to make predictions. Except when we are surprised, or when the outcomes are important, we may not be cognizant of the specific predictions our minds make.

  4. Minds are disposed to anticipate all types of stimuli -- even those stimuli (like music) which appear to be unimportant for survival.

  5. Theoretically, expectations might have exclusively innate or learned origins. When an environment remains stable over millions of years, it is possible for efficient innate expectations to evolve. In hearing, innate functions are evident in such auditory reflexes as the orienting response. However, when an environment is highly variable, the capacity to form expectations through learning provides a better evolutionary strategy (Baldwin, 1896).

  6. The auditory environments in which humans evolved appear to have been highly variable. Sounds that in one context might indicate danger, might, in another context, indicate opportunity. Given the great variety of auditory contexts in human experience, it should not be surprising that the existing research implicates learning as the preeminent source of auditory expectations.

  7. Ideally, the principles underlying expectations would precisely reflect the actual principles that cause the environment to be a particular way (i.e., Shepard's complementarity).

  8. Whether innate or learned, expectations can be formed through exposure to an environment. Expectations arise through a process of induction, in which generalizations are formed from a finite number of specific experiences.

  9. Since inductive inference is known to be fallible, the generalizations formed through listener experience are also fallible. That is, the principles underlying expectations are likely to be imperfect approximations of the actual principles shaping the world (von Hippel, 2002).

  10. For a broad sample of melodies, several simple principles have been identified that appear to underlie the objective organization. One principle is the tendency for successive pitches to be relatively close. Experienced listeners appear to form an appropriate expectation for pitch proximity. A second principle is for pitches to exhibit a central tendency. A mathematical consequence of central tendency is the phenomenon of regression-to-the-mean. However, experienced listeners do not form an appropriate expectation for melodic regression. Instead, experienced listeners expect post-skip reversal -- which is an approximation of melodic regression. A third principle is that large intervals tend to ascend. The more common repercussion is that small intervals tend to descend. However, experienced listeners do not form the appropriate expectations. Instead, experienced listeners expect step-inertia -- which appears to arise from a combination of the tendency for pitch proximity, and the tendency for intervals to descend.

  11. In a stable environment, the most frequently occurring events of the past are the most likely events to occur in the future. A simple yet optimum inductive strategy is to expect the most frequent event. The simple frequency of isolated events ("zeroth-order distribution") forms the foundation for learned expectations.

  12. An example of frequency-dependent learning in music is listener sensitivity to the distribution of scale degrees as documented by Krumhansl and elaborated by Aarden.

  13. In addition to zeroth-order frequencies, listeners are also able to learn contingent frequencies of neighboring or co-occurring events. The distance separating contingent events can range from immediate neighbors to long-range relationships. In addition, contingent probabilities can be influenced by the number of prior events that combine to influence a particular ensuing event. These probability "frames" can range from a single preceding event (first-order probability), to many preceding events (higher-order probabilities).

  14. An example of contingent-frequency learning in music can be found in scale-degree successions, such as the tendency for chromatic tones to be anchored to neighboring diatonic tones.

  15. Expectations provoke emotional responses. Three response categories can be distinguished: (1) responses that precede the outcome (anticipatory affective responses), (2) responses evoked by the outcome itself (secondary affective responses), and (3) responses related to the accuracy of the expectation (primary affective responses). A positively valenced primary affect ensues when an expectation proves accurate, whereas a negatively valenced primary affect ensues when an expectation proves inaccurate.

  16. Expectations that prove to be correct represent successful mental functioning. Successful anticipations help us prepare appropriate motor responses, inhibit or suppress inappropriate responses, and better perceive ensuing stimuli. Successful expectations evoke a primary affective reward.

  17. Successful expectations can be measured. When a person's expectations are correct, they will be faster and more accurate in processing information related to the expectation. Accurate expectations can be regarded as functionally equivalent to perceptual priming.

  18. Expectations that prove to be incorrect represent failures of mental functioning. Unsuccessful expectations evoke a primary affective punishment in the form of stress.

  19. Stress is also evoked under situations of high uncertainty. That is, stress can ensue when we already anticipate that we will fail to anticipate events (negative anticipatory affect).

  20. Since successful predictions evoke a positive primary affective response, we may mistakenly attribute the positive feelings to the outcome itself. That is, we may prefer a predicted outcome.

  21. In addition, if we repeatedly make successful predictions for a given outcome, then the predicted outcome can itself become associated with the positive feelings.

  22. Since we are more likely to successfully predict high frequency events, it is high frequency events that tend to become associated with the primary affective reward that accompanies successful prediction. Over time, we come to prefer the high frequency events (expectancy effect).

  23. An example of the expectancy effect in music is the phenomenon of tonality. Once a tonal center is established, the listener will experience the tonic stimulus as more pleasant or preferable to other states.

  24. Another example of the expectancy effect is found in the phenomenon of meter. Once a metrical context is established, the listener will experience events that occur at the most expected moments to be more pleasant or preferable to other states.

  25. Emotions can also be evoked by the outcome itself. Outcomes might be a priori judged as positive, negative, or neutral. It is assumed that evoked emotions tend to slowly decay in intensity following the outcome.

  26. A sequence of events might evoke a mixed succession of positive and negative states. Since positively valenced states are preferred, it is advantageous for positive states to be sustained longer than negative states. Said another way, it would be advantageous for negative states to be quickly followed by a new state, whereas positive states would induce a delay before the next state.

  27. Successive events often occur in groups or segments, as is evident in phrases or entire works. In light of the above observations, listeners should prefer segments to be closed with a positive state since this increases the total positive valence.

  28. If most segments are terminated with a positive state, then listeners should learn to associate positive states with closure. Closure implies repose and stability. Therefore, frequently occurring states ought to u

  29. By way of summary, we can identify the following causal sequence:
    • frequently occurring events provide the best predictions for future states
    • since successful predictions are rewarded, frequently occurring events tend to become associated with positive emotions; (nominally "neutral" stimuli may thus acquire a positive valence)
    • it is preferable for long-duration states to have a positive valence
    • by definition, the terminating event in a sequence is a long state; in creating a sequence of states, pleasure is increased if frequently occurring events tend to be placed at the ends of segments
    • through repeated exposure, terminating events become associated with closure and repose or stability; hence frequently occurring events tend to become associated with closure and repose/stability.
    In other words, frequently occurring events have a tendency to be (1) the most predicted stimulus, (2) the most preferred stimulus, (3) the stimulus that most implies closure, and (4) the stimulus most associated with repose or stability.

  30. While expected events are generally preferred, highly predictable environments can lead to reduced attention and lowered arousal -- often leading to sleepiness.

  31. Apart from the simple frequency of occurrence, we are also sensitive to the co-occurrences of various events. That is, we form expectations based on conditional probabilities.

  32. Most conditional probabilities reflect short-range moment-to-moment contingencies, as when one note tends to immediately follow another. However, long-range conditional probabilities may also be formed -- provided such long-range structures exist in the environment.

  33. Expectations can be learned dynamically. That is, listening to a passage can help listeners form expectations that arise uniquely from the immediately preceding experience.

  34. Regularities in the world are often evident only in particular contexts or environments. It is important for an organism to learn to distinguish these different environments, and to protect learned expectations within each context from the undue influence of learned associations that pertain to a different context (Cosmides & Tooby, 2000).

  35. Such cognitive firewalls permit listeners to distinguish different kinds of musical experiences. Learned expectations can be segregated into different expectational sets or "schemas."

  36. Due to lack of experience or possible cognitive deficits, it is possible that a listener fails to distinguish two forms of musical experience that other listeners experience as distinct kinds. A given listener might consequently experience a musical genre in a unique or idiosyncratic manner.

  37. Complex stimuli may unfold in an invariant way, as when we hear the succession of pitches of Happy Birthday. In this case we form veridical expectations -- given these eight notes, the ninth note will undoubtedly be ...

  38. Veridical expectations do not suppress the effects of schematic expectation (Bharucha). Schematic expectations are tenacious. This explains the apparent paradox of how some events can be simultaneously surprising and unsurprising. For example, a wholly expected deceptive cadence doesn't entirely lose its "deceptive" character.

  39. Schemas may include prediction rules, such as the rule that successive tones tend to be close in pitch. These rules arise because they are broadly successful in their predictions (though not infallible). Some prediction rules are suboptimal. An example is the rule for post-skip reversals. This rule is generally successful in its predictions; however, it merely approximates a more fundamental property of musical structure, namely that melodies tend to be constrained in their ranges. A regression-to-the-mean rule would allow listeners to better predict successive melodic pitches; however, listeners appear to learn the less accurate post-skip reversal prediction rule.

  40. Expectations rely on underlying mental representations. Representations might include absolute pitch, pitch-class, scale degree, interval, contour, etc. Several representations may operate concurrently in the forming of expectations. It appears that not every listener has access to all of these representations. For example, people with absolute pitch are able to code events and expectations according to absolute pitch. A major difference between people who have AP and those who don't is that AP possessors heard musical works in early life that were always in the same key, whereas non-AP possessors typically experienced musical works in a multitude of keys. It is possible, as argued by Abramson at the beginning of the twentieth century, that the practice of singing songs in different keys reduces the value of coding absolute pitch, and so pitch height lost its predictive value for some listeners -- leading them to ignore pitch height information.

  41. Since more than one representation may be involved in forming expectations, an expectation may be mixed. For example, one element (such as pitch) may be highly unexpected, whereas another element (such as onset time) may be highly expected.

  42. When the circumstances are appropriate, listeners may come to expect the unexpected. That is, a sort of "reverse psychology" may arise. Twelve-tone music has been shown to be organized in a manner consistent with such reverse psychology.

  43. Paradoxical expectations can arise when schematic and veridical expectations differ.

  44. Different listeners may have different expectations. Individual differences may be attributable to five possible sources. (1) Listeners may differ in their underlying representation codes. For example, one listener may favor an absolute pitch representation, whereas another listener favors a scale degree representation. (2) Listeners differ in their exposure to music, and so some listeners may have had less opportunity to develop appropriate schemas. (3) A listener may fail to distinguish expectational sets that may be appropriate for different genres of music. For example, as Krumhansl has shown, a listener may continue to apply a tonal schema to an atonal listening experience. (4) Listeners may differ in the accuracy of their prediction rules. For example, it is theoretically possible that a listener experiences melodic contours in accordance with the regression-to-the-mean rule rather than the post-skip-reversal rule. (5) It is theoretically possible that existing schemas may prevent a listener from distinguishing a separate schema. For example, a hypothetical scale schema `B' might interfere with the acquiring of a similar (yet distinct) schema `A'. A listener who acquires schema `A' first may retain the ability to acquire schema `B', whereas a listener who acquires schema `B' first may be incapable of acquiring schema `A'. For example, Meyer (1956; p.46) cites Fox Strangways, who claims that some Indian music uses a scale that is very similar to the Western major scale, yet the "tonic" pitches do not coincide. The Western listener may therefore hold expectations that are wholly inappropriate to the Hindustani music (Fox Strangways, 1914; p.18).

  45. The psychological responses to expectation can be classified into four categories. In the pre-outcome phase, an individual might imagine different possible outcomes and vicariously experience some of the feelings that would be expected for each outcome. This imaginative response provides an important mechanism for motivating an individual to take courses of action that increase the likelihood of a positive outcome.

  46. Also in the pre-outcome phase, appropriate arousal and attention states need to be evoked in preparation for the outcome. This tension response tailors the arousal and attention to match the degree of uncertainty and the importance of the possible outcomes. Obvious and inconsequential outcomes will evoke little response. Highly important yet uncertain outcomes will evoke a significant response. The response becomes more marked as the anticipated moment of the outcome approaches. The tension response is commonly manifested as stress.

  47. In the post-outcome phase, the accuracy of an individual's predictions is appraised in the prediction response. A positive response will occur when the outcome matches the individual's expectation. A negative response arises when the outcome is unexpected.

  48. Finally, an emotional response will be evoked according to an appraisal of the final outcome state. A positive outcome response will arise if the outcome is positively appraised.

  49. Primary and secondary affective responses interact. Highly predictable outcomes evoke less response than highly unpredictable outcomes. For example, an unexpected positive outcome will feel better than a highly expected positive outcome. Similarly, an unexpected negative outcome will feel worse than a highly expected negative outcome. In effect, increased uncertainty tends to amplify the aggregate affective response.

  50. The delaying of an outcome has the effect of decreasing its certainty. Consequently, delay amplifies the aggregate affective response. The effect of delay is most marked when events seem to be most certain.

  51. Many performance and compositional techniques can be regarded as efforts to delay expected outcomes. Such delaying techniques tend to be used in the most stereotypic musical passages.

  52. The fact that learning plays a preeminent role in forming expectations, in addition to the fact that expectations can adapt dynamically to ongoing stimuli, suggests that there exist considerable opportunities to craft a range of musics for which listeners may form appropriate expectations.
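
The conditional probabilities described in points 31 and 32 can be illustrated with a short sketch. It assumes, purely for illustration, that a melody is coded as a sequence of scale degrees; the sequence itself and the function name `transition_probabilities` are hypothetical and are not drawn from any of the studies cited here. Only short-range (note-to-note) contingencies are estimated.

```python
from collections import Counter, defaultdict


def transition_probabilities(events):
    """Estimate P(next | current) from adjacent pairs in a sequence."""
    pair_counts = defaultdict(Counter)
    for current, nxt in zip(events, events[1:]):
        pair_counts[current][nxt] += 1

    probs = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


# Hypothetical toy melody in scale degrees (1 = tonic, 7 = leading tone).
melody = [1, 2, 3, 4, 5, 4, 3, 2, 1, 7, 1, 2, 3, 2, 1]
table = transition_probabilities(melody)

for current in sorted(table):
    print(current, table[current])
```

In this toy sequence the leading tone (7) is always followed by the tonic (1), so the estimated probability of 1 given 7 is 1.0, whereas the tonic is followed by 2 twice as often as by 7.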

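The argument in point 39 can also be sketched numerically. The following is a toy model under strong assumptions, not a reproduction of any corpus analysis: pitches are treated as independent, equally likely draws from a fixed range, and the range limits, the skip threshold, and all names in the code are hypothetical. Every possible three-note succession is enumerated, and the two prediction rules are scored on the direction of the note that follows a skip.

```python
from itertools import product

LOW, HIGH = 0, 12         # hypothetical melodic range, in semitones
MEAN = (LOW + HIGH) / 2   # centre of the range
SKIP = 3                  # hypothetical threshold: intervals this large count as skips


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rule_accuracies():
    """Score both rules over every (a, b, c) triple in which a -> b is a skip."""
    pitches = range(LOW, HIGH + 1)
    skips = reversal_hits = regression_hits = 0
    for a, b, c in product(pitches, repeat=3):
        if abs(b - a) < SKIP:
            continue
        skips += 1
        actual = sign(c - b)
        # Post-skip reversal: the next note moves against the skip direction.
        reversal_guess = -sign(b - a)
        # Regression to the mean: the next note moves toward the centre of
        # the range; if b lies exactly on the mean, fall back on reversal.
        regression_guess = sign(MEAN - b) or reversal_guess
        reversal_hits += actual == reversal_guess
        regression_hits += actual == regression_guess
    return reversal_hits / skips, regression_hits / skips


reversal_acc, regression_acc = rule_accuracies()
print(f"post-skip reversal: {reversal_acc:.3f}")
print(f"regression to mean: {regression_acc:.3f}")
```

Under these assumptions both rules predict the following direction correctly more than half the time, but the regression rule is the more accurate of the two. The reversal rule loses ground mainly when a skip lands on the near side of the mean, where continued motion in the same direction remains the likelier outcome.
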
A number of questions remain to be addressed in future research concerning musical expectations. Perhaps the premier unresolved question concerns the nature of the mental representations that underlie musical expectations. What do listeners expect? Do they expect intervals, pitches, pitch-classes, scale degrees, scale degree successions, contours, rhythms, pitch-rhythms, etc.? The existing research provides evidence that mental representations for music consist of a complex combination of musical elements. There is also evidence that different listeners may make use of different representations.

Under what circumstances are new expectational sets formed? That is, when will the auditory system erect a cognitive firewall to allow the formation of a new music-related schema? Is it possible for past listening experiences to prevent a listener from forming a new musical schema? Is it possible, for example, with the right regime of musical exposure, for a modern listener to form a truly "medieval" way of hearing early music?

Finally, what types of musical structures or principles of organization will fail to evoke appropriate learning? [8]


Aarden, B. (2001). An empirical study of chord-tone doubling in common era music. Master's thesis. School of Music, Ohio State University.

Aarden, B. (2002). Expectancy vs. retrospective perception: Reconsidering the effects of schema and continuation judgments on measures of melodic expectancy. Proceedings of the 7th International Conference on Music Perception and Cognition, C. Stevens, D. Burnham, G. McPherson, E. Schubert, J. Renwick (Eds.). Adelaide: Causal Productions, pp.??-??.

Abe, J., & Oshino, E. (1990). Schema driven properties in melody cognition: Experiments on final tone extrapolation by music experts. Psychomusicology, Vol. 9, pp. 161-172.

Baldwin, J.M. (1896). A new factor in evolution. American Naturalist, Vol. 30, pp. 441-451, 536-553.

Baldwin, J.M. (1909). Darwin and the Humanities. Baltimore: Review Publishing.

Barnes, R. & Jones, M.R. (in press). Expectancy, attention, and time. Cognitive Psychology.

Berger, J. (1990). A theory of musical ambiguity. Computers in Music Research, Vol. 2, pp. 91-119.

Bharucha, J. (1984). Anchoring effects in music: The resolution of dissonance. Cognitive Psychology, Vol. 16, pp. 485-518.

Bharucha, J. (1987). MUSACT: A connectionist model of musical harmony. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society, Hillsdale, NJ: Erlbaum, pp. 508-517.

Bharucha, J. (1994). Tonality and expectation. In R. Aiello (Ed.), Musical Perceptions, Oxford: Oxford University Press, pp. 213-239.

Bharucha, J. (1996). Melodic anchoring. Music Perception, Vol. 13, No. 3, pp. 383-400.

Bharucha, J.J., & Stoeckig, Keiko. (1987). Priming of chords: Spreading activation or overlapping frequency spectra? Perception & Psychophysics, Vol. 41, No. 6, pp. 519-524.

Bigand, E., & Pineau, M. (1997). Global context effects on musical expectancy. Perception & Psychophysics, Vol. 59, No. 7, pp. 1098-1107.

Bornstein, R.F. (1989). Exposure and affect: Overview and meta-analysis of research, 1968-1987. Psychological Bulletin, Vol. 106, No. 2, pp. 265-289.

Carlsen, J.C. (1981). Some factors which influence melodic expectancy. Psychomusicology, Vol. 1, pp. 12-29.

Chernoff, J.M. (1979). African Rhythm and African Sensibility: Aesthetics and Social Action in African Musical Idioms. Chicago: University of Chicago Press.

Cohen, A.J. (1982). Exploring the sensitivity to structure in music. Canadian University Music Review, Vol. 3, pp. 15-30.

Cohen, J.E. (1962). Information theory and music. Behavioral Science, Vol. 7, No. 2, pp. 137-163.

Cook, N. (1987). The perception of large-scale tonal closure. Music Perception, Vol. 5, No. 2, pp. 197-206.

Coons, E., & Kraehenbuehl, D. (1958). Information as a measure of structure in music. Journal of Music Theory, Vol. 2, pp. 127-161.

Cosmides, L., & Tooby, J. (2000). Consider the source: The evolution of adaptations for decoupling and metarepresentations. In: D. Sperber (Ed.) Metarepresentations: A Multidisciplinary Perspective. Oxford: Oxford University Press, pp. 53-115.

Cuddy, L.L., & Lunney, C.A. (1995). Expectancies generated by melodic intervals: Perceptual judgments of melodic continuity. Perception & Psychophysics, Vol. 57, No. 4, pp. 451-462.

Damasio, A. (1994). Descartes' Error: Emotion, Reason, and the Human Brain. G.P. Putnam's Sons.

Dennett, D.C. (1991). Consciousness Explained. Boston: Little, Brown.

Deutsch, D. (1978). Delayed pitch comparisons and the principle of proximity. Perception & Psychophysics, Vol. 23, pp. 227-230.

Dowling, W.J. (1967). Rhythmic fission and the perceptual organization of tone sequences. Unpublished doctoral dissertation, Harvard University, Cambridge, MA.

Dowling, W.J. (1978). Scale and contour: Two components of a theory of memory for melodies. Psychological Review. Vol. 85, pp. 341-354.

Dowling, W.J., & Harwood, D.L. (1986). Music Cognition. San Diego: Academic Press.

Dusman, L. (1994). Unheard-of: Music as performance and the reception of the new. Perspectives of New Music, Vol. 32, No. 2, pp. 130-146.

Federman, F. (1996). A study of various representations using NEXTPITCH: A learning classifier system. PhD dissertation, City University of New York.

Feldmann, C.M. (1997). Erwartungsdiskrepanz und emotionales Erleben von Musik. PhD dissertation, Ruprecht-Karls-U., Heidelberg. Hildesheim: Olms. ISBN: 3-487-10710-4.

Fox Strangways, A.H. (1914). The Music of Hindostan. London: Oxford University Press.

Francès, R. (1988). La perception de la musique. (1958) Translated by W.J. Dowling as The Perception of Music. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Gallistel, C.R. (1990). The Organization of Learning. Cambridge, MA: MIT Press.

Gardner, M. (1978). Mathematical Games -- white and brown music, fractal curves and one-over-f fluctuations. Scientific American, Vol. 238, No. 4, pp. 16-32.

Gjerdingen, R.O. (1988). A Classic Turn of Phrase: Music and the Psychology of Convention. Philadelphia: University of Pennsylvania Press.

Goldstone, J.A. (1979). A general mathematical theory of expectation models of music. PhD dissertation, University of Southern California.

Gotlieb, H., & Konecni, V.J. (1985). The effects of instrumentation, playing style, and structure in the Goldberg variations by Johann Sebastian Bach. Music Perception, Vol. 3, No. 1, pp. 87-102.

Granot, R., & Donchin, E. (2002). Do re mi fa sol la ti - Constraints, congruity, and musical training: An event-related brain potentials study of musical expectancies. Music Perception, Vol. 19, No. 4, pp. 487-528.

Greenberg, D., & MacMillan, S. (1996). Bach Meets Cape Breton. [Audio CD]. Canada: Marquis Records #181. ASIN: B000003WHK.

Hansen, F. (1969). Musical meaning as pointing. Music Review, Vol. 30, No. 4, pp. 300-307.

Hasher, L., & Zacks, R.T. (1984). Automatic processing of fundamental information. American Psychologist, Vol. 39, pp. 1372-1388.

Hick, W.E. (1952). On the rate of gain of information. Quarterly Journal of Experimental Psychology, Vol. 4, pp. 11-26.

von Hippel, P. (1998). Post-skip reversals reconsidered: Melodic practice and melodic psychology. PhD dissertation, Stanford University.

von Hippel, P. (2000). Redefining pitch proximity: Tessitura and mobility as constraints on melodic intervals. Music Perception, Vol. 17, No. 3, pp. 315-327.

von Hippel, P. (2000). Questioning a melodic archetype: Do listeners use gap-fill to classify melodies? Music Perception, Vol. 18, No. 2, pp. 139-153.

von Hippel, P., & Huron, D. (2000). Why do skips precede reversals? The effect of tessitura on melodic structure. Music Perception, Vol. 18, No. 1, pp. 59-85.

Huron, D. (2001a). What is a musical feature? Forte's analysis of Brahms's Opus 51, No. 1, Revisited. Music Theory Online, Vol. 7, No. 4. Online text.

Huron, D. (2001b). Tone and Voice: A derivation of the rules of voice-leading from perceptual principles. Music Perception, Vol. 19, No. 1, pp. 1-64. Online text.

Huron, D., & von Hippel, P. (2000). Tonal and Contra-tonal structure of Viennese twelve-tone rows. Paper presented at the Society for Music Theory Conference, Toronto, Canada.

Hyman, R. (1953). Stimulus information as a determinant of reaction time. Journal of Experimental Psychology, Vol. 45, pp. 423-432.

Jairazbhoy, N. (1971). The Rags of North Indian Music: Their Structure and Evolution. London: Faber & Faber.

Jones, M.R. (1981). Music as a stimulus for psychological motion: Part 1. Some determinants of expectancies. Psychomusicology, Vol. 1, pp. 34-51.

Jones, M.R. (1982). Music as a stimulus for psychological motion: Part 2. An expectancy model. Psychomusicology, Vol. 2, pp. 1-13.

Jones, M.R. (1990). Learning and the development of expectancies: An interactionist approach. Psychomusicology, Vol. 9, No. 2, pp. 193-228.

Jones, M.R. (1992). Attending to musical events. In: M.R. Jones & S. Holleran (eds.) Cognitive Bases of Musical Communication. Washington, DC: American Psychological Association; pp. 91-110.

Jones, M.R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, Vol. 96, pp. 459-491.

Jones, M.R., Boltz, M., & Kidd, G. (1982). Controlled attending as a function of melodic and temporal context. Perception & Psychophysics, Vol. 32, pp. 211-218.

Jones, M.R., Moynihan, H., MacKenzie, N., & Puente, J. (in press). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science.

Kelly, M.H., & Martin, S. (1994). Domain-general abilities applied to domain-specific tasks: Sensitivity to probabilities in perception, cognition, and language. Lingua, Vol. 92, pp. 105-140.

Karno, M., & Konecni, V.J. (1992). The effects of structural interventions in the first movement of Mozart's symphony in G minor K.550 on aesthetic preference. Music Perception, Vol. 10, No. 1, pp. 63-72.

Kraehenbuehl, D., & Coons, E. (1959). Information as a measure of the experience of music. Journal of Aesthetics & Art Criticism, Vol. 17, pp. 510-522.

Kramer, J.D. (1982). Beginnings and endings in Western art music. Canadian University Music Review/Revue de musique des universites canadiennes, Vol. 3, ????

Krumhansl, C. (1979). The psychological representation of musical pitch in a tonal context. Cognitive Psychology, Vol. 11, pp. 346-374.

Krumhansl, C. (1990). Cognitive Foundations of Musical Pitch. Oxford: Oxford University Press.

Krumhansl, C. (1995). Effects of musical context on similarity and expectancy. Systematische Musikwissenschaft/Systematic Musicology/Musicologie systematique, Vol. 3, No. 2, pp. 211-250.

Krumhansl, C. (1999). Effects of perceptual organization and musical form on melodic expectancies. In M. Leman (Ed.), Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology. Berlin: Springer Verlag, pp. 294-320.

Krumhansl, C., & Kessler, E.J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, Vol. 89, pp. 334-368.

Krumhansl, C., Sandell, G.J., & Sergeant, D.C. (1987). The perception of tone hierarchies and mirror forms in twelve-tone serial music. Music Perception, Vol. 5, pp. 153-184.

Kunst-Wilson, W.R., & Zajonc, R.B. (1980). Affective discrimination of stimuli that cannot be recognized. Science, Vol. 207, pp. 557-558.

Lang, P.J., Simons, R.F., & Balaban, M.T. (Eds.). (1997). Attention and Orienting: Sensory and Motivational Processes. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Larson, S. (1999). Continuations as completions: Studying melodic expectation in the creative microdomain Seek Well. In M. Leman (Ed.), Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology. Berlin: Springer Verlag, pp. 321-334.

Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music, Cambridge, MA: MIT Press.

Lewis, M. (1995). Self-conscious emotions. American Scientist, Vol. 83, No. 1, pp. 68-78.

Lieberman, P. (1967). Intonation, Perception and Language. Cambridge, MA: MIT Press.

Lord, A. (1960). The Singer of Tales. Cambridge, MA: Harvard University Press.

Mandler, G. (1975). Mind and Emotions. New York: J. Wiley.

Manzara, L.C., Witten, I.H., & James, M. (1992). On the entropy of music: an experiment with Bach chorale melodies. Leonardo Music Journal, Vol. 2, No. 1, pp. 81-88.

McCredie, A.D. (1983). Some concepts, constructs, and techniques in comparative literature and their interface with musicology. International Review of the Aesthetics and Sociology of Music, Vol. 14, No. 2, pp. 147-165.

Merriam, A.P., Whinery, S., & Fred, B.G. (1956). Songs of a Rada community in Trinidad. Anthropos 51, 157-174.

Meyer, L.B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.

Miyazaki, K. (1990). The speed of musical pitch identification by absolute-pitch possessors. Music Perception, Vol. 8, No. 2, pp. 177-188.

Moles, A. (1958/1966). Théorie de l'information et perception esthétique. Paris, 1958. Trans. as Information Theory and Aesthetic Perception. Urbana, IL: University of Illinois Press, 1966.

Moreland, R.L., & Zajonc, R.B. (1977). Is stimulus recognition a necessary condition for the occurrence of exposure effects? Journal of Personality and Social Psychology, 35, 191-199.

Moreland, R.L., & Zajonc, R.B. (1979). Exposure effects may not depend on stimulus recognition. Journal of Personality and Social Psychology, 37, 1085-1089.

Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. Chicago: University of Chicago Press.

Narmour, E. (1991). The influence of embodied registral motion on the perception of higher-level melodic implication. In M.R. Jones & S. Holleran (Eds.), Cognitive Bases of Musical Communication. Washington, DC: American Psychological Association, pp. 69-90.

Narmour, E. (1992). The Analysis and Cognition of Melodic Complexity: The Implication-Realization Model. Chicago: University of Chicago Press.

Narmour, E. (1999). Hierarchical expectation and musical style. In D. Deutsch (Ed.), The Psychology of Music. 2nd edition. San Diego: Academic Press, pp. 441-472.

Narmour, E. (2000). Music expectation by cognitive rule-mapping. Music Perception, Vol. 17, No. 3, pp. 329-398.

Olson, J.M., Roese, N.J., & Zanna, M.P. (1996). Expectancies. In: Higgins, E.T. & Kruglanski, A.W. (eds.) Social Psychology: Handbook of Basic Principles. New York: Guilford Press, pp. 211-238.

Ortmann, O.R. (1926). On the melodic relativity of tones. Princeton, NJ: Psychological Review Company. (Vol. 35, No. 1 of Psychological Monographs.)

Palmer, C., & Krumhansl, C. (1990). Mental representations for musical meter. Journal of Experimental Psychology: Human Perception and Performance, Vol. 16, No. 4, pp. 728-741.

Parry, M. (1971). The Making of Homeric Verse: The Collected Papers of Milman Parry. Oxford: Clarendon Press.

Perrott, D., & Gjerdingen, R.O. (1999). Scanning the dial: An exploration of factors in the identification of musical style. Paper presented at the Society for Music Perception and Cognition Conference, Evanston, IL.

Pike, K.L. (1945). The Intonation of American English. Ann Arbor, Michigan: University of Michigan Press.

Pinkerton, R.C. (1956). Information theory and melody. Scientific American, Vol. 194, No. 2, pp. 77-86.

Reber, A.S. (1993). Implicit Learning and Tacit Knowledge: An Essay on the Cognitive Unconscious. Oxford: Oxford University Press.

Regnault, P., Bigand, E., & Besson, M. (1999). Sensory and cognitive influences on musical expectancy revealed by event-related potentials. Psychophysiology, Vol. 36, S17-S17, Supplement 1.

Reimer, B. (1964). Information theory and the analysis of musical meaning. Council for Research in Music Education, Vol. 2, pp. 14-22.

Rosch, E. (1973). On the internal structure of perceptual and semantic categories. In T. Moore (ed.), Cognitive development and the acquisition of language. New York: Academic Press.

Rosch, E. (1975). Cognitive reference points. Cognitive Psychology, Vol. 7, pp. 532-547.

Rosner, B.S., & Meyer, L.B. (1982). Melodic processes and the perception of music. In D. Deutsch (Ed.), The Psychology of Music. New York: Academic Press, pp. 317-341.

Saffran, J.R., Newport, E.L., & Aslin, R.N. (1996). Word segmentation: the role of distributional cues. Journal of Memory and Language, Vol. 35, pp. 606-621.

Saffran, J.R., Johnson, E.K., Aslin, R.N., & Newport, E.L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, Vol. 70, pp. 27-52.

Schellenberg, E.G. (1996). Expectancy in melody: Tests of the implication-realization model. Cognition, Vol. 58, pp. 75-125.

Schellenberg, E.G. (1997). Simplifying the implication-realization model. Music Perception, Vol. 14, No. 3, pp. 295-318.

Schmuckler, M.A. (1988). Expectation in music: Additivity of melodic and harmonic processes. PhD dissertation, Cornell University.

Schmuckler, M.A. (1989). Expectation in music: Investigation of melodic and harmonic processes. Music Perception, Vol. 7, pp. 109-150.

Schmuckler, M.A. (1990). The performance of global expectation. Psychomusicology, Vol. 9, pp. 122-147.

Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656.

Shannon, C.E., & Weaver, W. (1949). The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press.

Shepard, R.N. (1981). Psychophysical complementarity. In: M. Kubovy & J.R. Pomerantz (eds.), Perceptual Organization. Hillsdale, NJ: Erlbaum, pp. 279-341.

Simpson, J. (1996). A formal analysis of note-interdependence in selected works. Unpublished manuscript. Available online text.

Simpson, J., & Huron, D. (1994). Absolute pitch as a learned phenomenon: Evidence consistent with the Hick-Hyman law. Music Perception, Vol. 12, No. 2, pp. 267-270.

Sloboda, J.A. (1991). Music structure and emotional response: Some empirical findings. Psychology of Music, Vol. 19, No. 2, pp. 110-120.

Sloboda, J.A. (1992). Empirical studies of emotional response to music. In: M.R. Jones & S. Holleran (eds.), Cognitive Bases of Musical Communication. Washington, DC: American Psychological Association, pp.33-50.

Takeuchi, A.H., & Hulse, S.H. (1991). Absolute pitch judgments of black- and white-key pitches. Music Perception, Vol. 9, pp. 27-46.

Tekman, H.G. (1998). Effects of melodic accents on perception of intensity. Music Perception, Vol. 15, No. 4, pp. 391-401.

Tekman, H.G. (2001). Accenting and detection of timing variations in tone sequences: Different kinds of accents have different effects. Perception & Psychophysics, Vol. 63, No. 3, pp. 514-523.

't Hart, J., Collier, R., & Cohen, A. (1990). A Perceptual Study of Intonation; An Experimental-phonetic Approach to Speech Melody. Cambridge: Cambridge University Press.

Thompson, W.F., Balkwill, L.L., & Vernescu, R. (2000). Expectancies generated by recent exposure to music. Memory and Cognition, Vol. 28, No. 4, pp. 547-555.

Thompson, W.F., Cuddy, L.L., & Plaus, C. (1997). Expectancies generated by melodic intervals: Evaluation of principles of melodic implication in a melody production task. Perception & Psychophysics, Vol. 59, No. 7, pp. 1069-1076.

Thompson, W.F., & Stainton, M. (1998). Expectancy in Bohemian folk song melodies: Evaluation of implicative principles for implicative and closural intervals. Music Perception, Vol. 15, No. 3, pp. 231-252.

Tillmann, B., Bigand, E., & Pineau, M. (1998). Effects of global and local contexts on harmonic expectancy. Music Perception, Vol. 16, No. 1, pp. 99-117.

Toiviainen, P., Krumhansl, C., Louhivuori, J., Jarvinen, T., & Eerola, T. (1999). Melodic expectation in Finnish spiritual folk hymns: Convergence of statistical, behavioral, and computational approaches. Music Perception, Vol. 17, No. 2, pp. 151-195.

Unyk, A.M. (1990). An information-processing analysis of expectancy in music cognition. Psychomusicology, Vol. 9, No. 2, pp. 229-240.

Unyk, A.M., & Carlsen, J.C. (1987). The influence of expectancy on melodic perception. Psychomusicology, Vol. 7, pp. 3-23.

Vos, P.G., & Troost, J.M. (1989). Ascending and descending melodic intervals: Statistical findings and their perceptual relevance. Music Perception, Vol. 6, No. 4, pp. 383-396.

Voss, R.F., & Clarke, J. (1975). 1/f noise in music and speech. Nature, 258, pp. 317-318.

Voss, R.F., & Clarke, J. (1978). 1/f noise in Music: Music from 1/f Noise. Journal of the Acoustical Society of America, Vol. 63, No. 1, pp. 258-263.

Watt, H.J. (1924). Functions of the size of interval in the songs of Schubert and of the Chippewa and Teton Sioux Indians. British Journal of Psychology, Vol. 14, pp. 370-386.

Werbik, H. (1969). L'indétermination et les qualités impressives des modèles stimulants mélodiques. [Uncertainty and the affective qualities of melodic stimuli.] Sciences de l'Art, 1-2, pp. 25-37.

Wilson, W.R. (1975). Unobtrusive induction of positive attitudes. Ph.D. dissertation, University of Michigan.

Wilson, W.R. (1979). Feeling more than we can know: Exposure effects without learning. Journal of Personality and Social Psychology, Vol. 37, pp. 811-821.

Wittgenstein, L. (1966). Lectures and Conversations on Aesthetics, Psychology and Religious Belief. Compiled from notes taken by Yorick Smythies, Rush Rhees, and James Taylor; Cyril Barrett (Ed.). Oxford: Blackwell.

Wong, A.K.C., & Ghahraman, D. (1975). A statistical analysis of interdependence in character sequences. Information Sciences, Vol. 8, pp. 173-188.

Youngblood, J.E. (1958). Style as information. Journal of Music Theory, Vol. 2, pp. 24-35.


Listening is not a passive activity where we simply classify successive stimuli as we encounter them. Listening is an active process. When we listen to a spoken sentence, for example, we formulate hypotheses about what is being said. We anticipate what will happen next. The context of an utterance prepares us for possible outcomes. We may already have an idea of what someone will say before they begin to speak.

In the case of music, we need only a few seconds of exposure to situate a musical work according to genre, tempo, meter, and so forth. Within three or four seconds, we will know whether the music is fast or slow, whether the key is major or minor, and whether it is baroque, bebop, big-band or bluegrass. Within a few more seconds, we will have a good intuition of the scenario of the music -- what is likely to happen, how the work may end, etc. Such scenarios act like "templates" that help us to orient ourselves during the listening experience.

Through years of listening experiences, each listener develops a repertoire of such possible scenarios. Such mental preconceptions of the normal course of events are referred to as schemas.* More precisely, a schema may be defined as a knowledge structure that arises from past experience, and which influences how we perceive and interpret current events. In a sense, schemas are like archetypal "stories" -- such as love stories, tragedies, horror, comedies, etc. Whether or not we are consciously aware of it, we will have intuitions about what is likely to happen in these stories. For example, love stories always have some impediment that must be overcome in order for the lovers to get together. Both action films and comedy films tend to have a chase scene near the end.

Schemas don't simply apply to the overall patterns in musical works. Individual phrases, and even note-to-note successions, tend to follow certain norms. James Carlsen and his colleagues have carried out a number of experiments mapping out what listeners expect to happen next, given various antecedent musical events.


When experiencing music, listeners are not merely passive observers. At an unconscious level, listeners form expectations about what will happen next. Some of these expectations are obvious to a listener: the fact that we are expecting something rises into consciousness, and we are aware of the expectation. These expectations are based on our past experiences.

- innate, learned, veridical, schematic, enculturated

Not all learned schematic expectations are "cultural." Some will arise from idiosyncratic personal listening habits. For example, a lover of Bebop jazz may form some Bebop-specific expectancies. Yet our Bebop lover may have little or no interaction with other Bebop fans, and so the social or group component so commonly regarded as the touchstone of "culture" may be absent. Other learned expectations arise from stimuli that are commonplace throughout the world, and so are learned but transcultural. For example, with the exception of Swiss yodelling, "melodies" throughout the world have a strong tendency for small pitch motions ("pitch proximity"). Therefore, an expectation for small pitch intervals cannot be considered "cultural," even though it is learned.

Expectations are evident
- Meyer (1956)
- tendency tones (play a scale up to `ti')
- expectation
- expectation dissonance
- schematic and veridical expectations

Veridical Expectations

An expectation that arises due to knowledge about a specific stimulus, such as familiarity with a given musical work. When a listener expects a certain note in a well-known song, the expectation may be regarded as veridical. By contrast, when a listener exhibits a general expectation for the leading-tone to be followed by the tonic, the expectation is regarded as a schematic expectation.

Schematic Expectations

An expectation that arises due to the existence of a mental schema. When a listener has a general expectation for the leading-tone to be followed by the tonic, the expectation may be regarded as schematic. Contrast with veridical expectation.

Paul von Hippel carried out a detailed experiment on melodic expectation. For large melodic intervals, musician listeners expect a change of direction. A large ascending leap, for example, causes an expectation for an ensuing lower pitch. For small melodic intervals (1 or 2 semitones), there is an expectation for the melody to continue in the same direction. The following graph illustrates these expectations for musician listeners.

Six general questions are addressed: (1) What is the biological purpose of forming expectations? (2) What aspects of musical organization do listeners anticipate? (3) How are expectations formed? (4) Do all listeners form the same expectations, and if not, what accounts for the differences? (5) Are listeners' expectations accurate? and (6) How do expectations evoke emotional responses for music listeners?

It is argued that the common perceptual and emotional phenomena associated with expectation originate in the evolution of the auditory system. Many musical works are likely organized so as to evoke emotional responses that, in part, arise due to expectation-related manipulations.

Forming accurate expectations about the world is important for an organism. Like other animals, humans learn from our environments, and we form expectations of future events based on our exposure to past events. Insofar as possible, these expectations should accurately reflect reality.

When listening to music, listeners form expectations about possible future events. Our expectations are learned, but the propensity to form such expectations is innate and unconscious. As listeners, we will form musical expectations whether we want to or not. Even people who, due to injury, have lost their long-term memory, continue to learn to form new expectations on the basis of exposure.

In discussing musical expectations, we need to address a number of questions. Five questions are especially central. We have already addressed the question of why expectations exist in the first place. Second, what features of the music do listeners learn to anticipate? Third, do all listeners form the same expectations, and if not, what accounts for the differences? Fourth, are our expectations accurate? What happens when, like the Pacific bull-frog, our expectations prove faulty? And finally, what are the psychological consequences of forming accurate or inaccurate anticipations? Specifically, how might the dance of expectations lead to different emotions?

In recent years, psychological research has illuminated a number of aspects of musical expectations. Most of the research has focussed on melody and melodic expectations, but we'll also address harmony and rhythm later in our discussion.

Schematic versus Veridical Expectations

Of course familiarity with a single piece changes the experience of listening to the work itself. Clearly, a listener has nearly "perfect" expectations for highly familiar pieces, such as Happy Birthday. Cognitive psychologists distinguish two types of memory (and expectations): veridical and schematic. A veridical memory is a memory for a passage associated with a specific work. For example, the G-G-G-Eb motive is unique to Beethoven's Symphony No. 5. A schematic memory is a memory for a commonplace passage. For example, the pitch sequence do-ti-do occurs in a large number of works.

To illustrate the difference between veridical expectations and schematic expectations consider the following English phrases:

1. Four score and seven years ago ...
2. Once upon a time ...
The first passage is quoted from Lincoln's "Gettysburg Address" and is unique to that speech. The second passage is just as well known, but is not unique to a particular story or fable. Most people are aware that a number of stories begin "Once upon a time" whereas there is only one continuation for "Four score and seven years ago".

When a work is perfectly known to some listener, what does it mean to have expectations? A classic problem is how a deceptive cadence can continue to sound "deceptive" when familiarity with a work makes the progression inevitable.

Having distinguished schematic versus veridical expectations, let me now withdraw and refine this distinction. There is nothing to suggest that veridical and schematic expectations are fundamentally different. A better way to think about veridical expectations is that they simply describe Markov chains containing long sequences where the note-to-note transitional probabilities equal 1.0 (or nearly so). In other words, given a specific sequence of N notes the listener's past exposure suggests a probability of nearly 1.0 for some given continuation. In short, what we have called veridical expectations are simply comparatively long stable sequences, whereas schematic expectations are shorter sequences that might have two or three plausible continuations.
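The Markov-chain reading of this distinction can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the four-melody "corpus" of solfege syllables is invented for the example (the first sequence merely stands in for a work-specific motive), and the function name `continuation_probabilities` is hypothetical. It estimates the probability of each continuation given the preceding notes, so that a "veridical" context shows a single continuation with probability 1.0, whereas a "schematic" context shows two or more plausible continuations.

```python
from collections import Counter, defaultdict

def continuation_probabilities(melodies, context_length):
    """Estimate P(next note | preceding context) from a list of note sequences."""
    counts = defaultdict(Counter)
    for melody in melodies:
        for i in range(context_length, len(melody)):
            context = tuple(melody[i - context_length:i])
            counts[context][melody[i]] += 1
    # Convert raw counts into relative frequencies for each context.
    return {
        context: {note: n / sum(c.values()) for note, n in c.items()}
        for context, c in counts.items()
    }

# Hypothetical toy corpus; not drawn from any real data set.
corpus = [
    ["sol", "sol", "sol", "me", "fa", "fa", "fa", "re"],  # stand-in for a unique motive
    ["do", "ti", "do", "re", "mi"],
    ["mi", "do", "ti", "do", "sol"],
    ["do", "ti", "la", "sol"],
]

schematic = continuation_probabilities(corpus, 2)
veridical = continuation_probabilities(corpus, 3)

# A commonplace context has several plausible continuations ...
print(schematic[("do", "ti")])
# ... whereas a work-specific context has one continuation with probability 1.0.
print(veridical[("sol", "sol", "sol")])
```

In this toy corpus the context do-ti is followed by do in two cases out of three and by la in the third, while the context sol-sol-sol has only one continuation. Nothing distinguishes the two cases other than how determinate the transition is.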

One piece of evidence in support of this claim can be found in the sorts of memory errors often seen when amateur musicians play recitals or auditions. Many musical works have long sections that are repeated. A nervous performer sometimes lapses into a memory loop where they play the same passage verbatim without taking a "second ending" or otherwise continuing as they should with the rest of the piece. In effect, the music contains a long Markov chain with transitional probabilities of 1.0. However, there are boundary points where the music provides two or three choices of what should happen next. The nervous performer appears unable to break out of the chain -- seemingly perpetually doomed to take the highest probability path. Said another way, the performer's representation for the work is not truly veridical: the music is not represented as a single linear sequence of events from beginning to end. Rather, there are periodic points where the conditional probabilities are significantly less than one, and some cognitive choice must be made.
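The nervous performer's memory loop can be mimicked with a deliberately simplified sketch. The assumptions are mine and not part of the argument above: the piece is reduced to a sequence of section labels, memory is modelled as first-order transition counts between sections, and the function names are hypothetical. A performer who always takes the highest-probability continuation never leaves the repeated material.

```python
from collections import Counter, defaultdict

def transition_counts(sequence):
    """Count how often each section is followed by each other section."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return counts

def greedy_performance(counts, start, max_steps):
    """Always continue with the most frequent successor of the current section."""
    path = [start]
    while len(path) < max_steps and path[-1] in counts:
        path.append(counts[path[-1]].most_common(1)[0][0])
    return path

# Hypothetical form: section A returns several times and leads to the coda only once.
piece = ["A", "B", "A", "C", "A", "B", "A", "coda"]
counts = transition_counts(piece)

print(greedy_performance(counts, "A", 8))
```

In this invented form, section A is followed by B twice but by C and by the coda only once each, so the greedy path alternates between A and B and never reaches the coda. Escaping the loop requires a choice at the boundary point that goes against the most probable transition.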

The only difference between a veridical coding and a schematic coding is the size of the coded segments, and the fact that schematic transitions are less determinate than veridical transitions. It should not at all be surprising that many long sequences of states are unique to given musical works. Given the explosion of possible combinations for a modest number of successive events, it does not take many notes to uniquely identify one particular piece.
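The combinatorial point can be illustrated with a rough calculation. The sketch below is an idealization of my own: it assumes a fixed number of equally available choices for each successive note, which real music certainly violates, and both the function name and the figure of one million works are hypothetical. It asks how many successive notes are needed before the number of possible sequences is at least as large as the number of works to be told apart.

```python
def notes_needed(num_choices, num_works):
    """Smallest n such that num_choices ** n is at least num_works."""
    n = 0
    sequences = 1
    while sequences < num_works:
        sequences *= num_choices
        n += 1
    return n

# Hypothetical repertoire size of one million works.
print(notes_needed(12, 1_000_000))  # 12 chromatic pitch classes per note
print(notes_needed(7, 1_000_000))   # 7 diatonic scale degrees per note
```

Under these assumptions, six notes suffice with twelve choices per note (12^6 = 2,985,984) and eight notes suffice with seven choices per note (7^8 = 5,764,801). Because actual note successions are far from equiprobable, real works will generally need somewhat longer sequences, but the order of magnitude makes the point.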

The point of this discussion is to note that while listeners have memories for the sequences of events that constitute an entire musical work, these memories are not qualitatively different from the memories we have for typical baroque figures, common jazz riff elements, or stereotypic country & western harmonies. The work of Parry (1971) and Lord (1960) concerning the centonization of ballads and legends similarly suggests that the way we construe "a work" may still leave considerable statistical latitude for the choice of particular segments as the work is "filled in" during performance. Finally, introspection tells us that our memory for many musical works really amounts to a handful of memorable passages. When we attempt to hum all the way through, say, Dvorak's New World Symphony, we find ourselves skipping large segments, or repeating ourselves in the same manner as the nervous recitalist.

The Passing Tone

Event            | Outcome   | Predictive                                   | Tension
pre-passing-tone | consonant | -                                            | moderate to low tension
passing-tone     | dissonant | moderate predictive success due to proximity | somewhat low tension; might return (neighbor tone) or continue in same direction
resolving        | consonant | high predictive success                      | -


Event          | Outcome   | Predictive                  | Tension
pre-resolution | consonant | -                           | moderate to low tension; relatively strong expectation of the ensuing resolving pitch
resolution     | consonant | moderate predictive success | -

More Material

Research has established that the primary and secondary affective responses interact with each other. These interactions are illustrated in Table 5 which provides a taxonomy of limbic/emotional responses commonly evoked by different circumstances. The secondary affect (or "outcome-related affect") is appraised as having either a positive, negative, or neutral valence. Positive outcomes are associated with opportunity and pleasure; negative outcomes are associated with threat and displeasure. The primary affect (or "expectancy-accuracy affect") is either expected or unexpected. (Later we will discuss the consequences of delay.)

The interactions between primary and secondary affect have been measured by Barbara Mellers and her colleagues. A simple experimental design, for example, asks amateur basketball players to take shots from different positions around the court. Before each shot, the player is asked what they think is the likelihood of scoring the basket. Following each shot, the player is asked how good they feel. As you might expect, players are happy when they make a shot and unhappy when they miss. However, the degree of satisfaction or dissatisfaction is directly related to the player's expectation. Players are unhappiest when they miss a shot that they judged to be "easy," and happiest when they score a basket that they judged to have a low probability of success. In general, unexpected fortune or misfortune causes the greatest emotional response. That is, low expectation amplifies the emotional response to the outcome.
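The qualitative pattern in the basketball example can be shown with a toy calculation. To be clear about what is assumed: the functional form below (the outcome's valence multiplied by one plus its "surprise") is a hypothetical stand-in chosen only because it reproduces the ordering described above. It is not the model used by Mellers and her colleagues, and the valences and probabilities are invented.

```python
def felt_response(outcome_value, expected_probability):
    """Toy model: the outcome's valence is amplified by how unexpected it was."""
    surprise = 1.0 - expected_probability
    return outcome_value * (1.0 + surprise)

MAKE, MISS = 1.0, -1.0  # hypothetical valences for scoring and missing

easy_shot = 0.9  # player's own estimate of the chance of scoring
hard_shot = 0.1

responses = {
    "easy shot made": felt_response(MAKE, easy_shot),
    "easy shot missed": felt_response(MISS, 1.0 - easy_shot),
    "hard shot made": felt_response(MAKE, hard_shot),
    "hard shot missed": felt_response(MISS, 1.0 - hard_shot),
}

# List the four cases from most pleasant to least pleasant.
for case, value in sorted(responses.items(), key=lambda item: -item[1]):
    print(f"{case}: {value:+.2f}")
```

With these invented numbers the most positive response follows the hard shot that goes in, and the most negative response follows the easy shot that is missed, matching the ordering reported for the players.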

This relationship can be expressed through the equation given below. The value psi represents the realized subjective value when experiencing some specified outcome. This subjective value is determined by two summed terms -- the first representing the primary (expectation-related) affect, and the second representing the secondary (outcome-related) affect. The value v(O) designates the prior subjective preference for outcome O and ranges from negative values (negative valence) through positive values (positive valence); p_e(O) designates the subjective likelihood of outcome O.

In the primary affect term, the subjective likelihood for outcome O is scaled so that maximum weight is given when occurrence is certain (p_e(O) = 1.0) or when non-occurrence is certain (p_e(O) = 0.0). A constant k provides a weighting for the relative importance of primary and secondary affect terms.

Emotions are evoked by a combination of what we expect will happen, what actually happens, and the accuracy of our expectations. More precisely, emotions are evoked by a combination of how we appraise the value of the expected and actual states, and how we appraise our predictive accuracy. In their model of expectation, Olson, Roese and Zanna (1996) make a useful distinction between primary and secondary affect related to expectation. Since the preeminent goal of forming expectations is to provide accurate predictions, a positive primary affect is evoked when the expectation proves accurate and a negative primary affect is evoked when the expectation proves inaccurate. Confirmation of expected outcomes generally induces a positive emotional response (Mandler, 1975). Of course it is possible to expect bad outcomes. Following a snow storm, for example, I might predict that I will slip and fall on the sidewalk. In the event that I actually fall, the outcome will feel unpleasant, but the experience will be mixed with a certain satisfaction at having correctly anticipated the outcome. It is as though brains know not to shoot the messenger: accurate expectations are to be valued (and rewarded) even when the news is bad.

Of course, outcomes are also important, and so a second affective response will result from an appraisal of the ultimate state of things. Outcomes can be appraised from a number of different perspectives. Huron (2002) has distinguished six systems ranging from valenced reflexes to social appraisals. In the case of music, an outcome might evoke positive or negative responses due to differences in sensory dissonance on the one hand, or according to judgments of the social group associated with a particular style. An extensive literature exists regarding emotional responses to particular states (REFS). It is not the purpose of this article to review this literature. We will simply assume that outcome-related emotions exist.

In addition, negative and positive outcomes can be amplified or attenuated depending on the subjective certainty of the outcome. We will note that delay plays an important role in amplifying the emotional valence of highly expected outcomes.


[1] The tonic is the most common pitch only for tonal music that does not contain modulations.

[2] Information Theory showed early promise in the analysis of music, but was abandoned by the mid 1960s. Three factors probably contributed to its demise in musical circles. In the first instance, scholars tended to rely on measures of "self-information" -- that is, probabilities that were based on the work itself. However, the theory strongly suggested that the correct way to analyse works was by using probabilities that reflect the entire musical experience of a general listener. A proper analysis would involve comparing a musical work to a large sample of other works. At the time, no large-scale musical databases existed that could be used for such analyses. In the second instance, the computers that were available to music scholars in the late 1950s and early 1960s were very slow and had limited memory capacity. Even if large musical databases had existed, it would have proved difficult to carry out the types of analyses suggested by information theorists. In the third instance, information theory had been applied in the arts and humanities notably to the analysis of language text. However, in 1957, Noam Chomsky's landmark book, Syntactic Structures, appeared. Chomsky argued that information theory was incapable of capturing important elements of language organization, and offered an alternative analytic approach. The close similarity between Chomsky's transformational generative grammars and Schenkerian analysis led to a wholesale shift toward Schenkerian studies. It was not until the 1970s that it became recognized that Chomsky's criticisms of information theory were unfounded. By that point, music theorists regarded information theory as old-fashioned and irrelevant. Over the ensuing decades, information theory has continued to be an active area of research in mathematics, computer science, and communications engineering. With extensions such as m-dependency theory, information theory has grown into a remarkably powerful paradigm for analysing abstract structures, such as those found in music.

[3] John Chernoff (1979, p. 94) provides a lovely description of how rhythmic organization pervades the West African culture of Ghana. In a customs office, Chernoff had to wait while a clerk typed copies of invoices. "Using the capitalization shift key with his little fingers to pop in accents between words, [the clerk] beat out fantastic rhythms. Even when he looked at the rough copies to find his next sentence, he continued his rhythms on the shift key. He finished up each form with a splendid flourish on the date and port of entry. ... I realized that I was in a good country to study drumming."

[4] A survey of European folksongs indicates that melodies in major keys are roughly twice as common as melodies in minor keys. This suggests that even the choice of initial schema may be sensitive to the frequency of occurrence of various contexts.

[5] The term "secondary affect" is used to designate what we have called here the "outcome response".

[6] From Robert Frost's The Birds Do Thus.

[7] Recall that the imaginative response relates to the activity of contemplating different future states. There is a case to be made that the vast majority of listening is done teleologically. That is, music listening is dominated by a sense of inevitability, where the listener is unable to entertain alternative ways in which the music might unfold. When I became active as a composer, I was amazed that it was possible to listen to well-known compositions with a composer's sense of "choices". That is, one could listen to a composer like Beethoven with a sense that certain choices could have been different: Beethoven might have added another variation of the current theme, or brought back a theme used earlier, or shortened a section of the development. In other words, the experience of composing made it possible to listen without the sense that the music is inevitably the way it is, and not some other way.

In ordinary music listening, it is very likely that the imaginative response is largely absent or muted. This is obviously a convenient point-of-view, since attempting to analyze the non-teleological or imaginative component to listening would be extremely daunting.

[8] I am indebted to Paul von Hippel, Bret Aarden, Simon Durrant, Jonathan Berger, and Joy Ollen for comments made on earlier drafts of this article.

© Copyright David Huron, 2001, 2002.
This document is available at

[Diagram: frequency; frequency associated with closure; stimuli become indicative of closure; closure; stability; stimuli associated with closure perceived as more stable; pleasure; tonality]

When occurring in a position of closure, the tonic is stable and evokes a pleasant experience. (So too, but to a lesser degree, do the mediant and dominant pitches.) Whatever else one may say, the tonic is a familiar pitch at the ends of musical passages.

One way to measure the similarity of fit is to calculate the coefficient of correlation. For a large sample of music, Huron (1992) found that the average correlation between the Krumhansl and Kessler key-profiles and the frequency of occurrence of the scale degrees was +0.88.
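A minimal sketch of such a correlation is given below. The major-key profile values are the published Krumhansl and Kessler ratings for the twelve chromatic positions above the tonic; the tone counts, however, are invented purely for illustration and are not the data analysed in Huron (1992). The function name is also hypothetical.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Krumhansl and Kessler major-key profile, ordered by semitones above the tonic.
major_profile = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

# Hypothetical counts of each chromatic position in a sample of major-key melodies.
tone_counts = [160, 5, 90, 8, 140, 90, 15, 180, 6, 70, 10, 60]

r = pearson_correlation(major_profile, tone_counts)
print(f"correlation between profile and tone counts: {r:+.2f}")
```

A value near +1 indicates that the tones rated as most stable are also the tones that occur most often. With real data the counts would be tallied separately for each work, relative to that work's key, and the resulting correlations averaged.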

Perhaps the most important observation to be made about scale degree is that listeners readily distinguish between major and minor key contexts. Krumhansl and Kessler's work implies that listeners are readily able to switch their expectations depending upon the modal context. That is, listeners know to apply different expectations for music depending on whether the mode is major or minor.

Scale Degree Distributions and Tonality

When listeners rate the stability of various scale tones, they effectively replicate the frequency of occurrence of these tones in real music. This relationship strongly suggests that listeners experience the most commonly occurring tones as the most stable.

The word "tonality" is used by musicians in at least ten definable senses. One of the most common definitions of tonality is as a system of relating pitches or chords to some focal point or center -- the tonic. In Western music, these relationships are typically identified using scale-degree terms, such as tonic, supertonic, mediant, etc. Each of these scale-degrees evokes a different psychological quality or character according to how it is heard in relation to the prevailing tonal center. As we saw earlier, by an act of will, musicians can imagine a single tone as either the leading-tone, mediant, or tonic, etc. The ability of listeners to imagine tones or chords as serving different tonal functions testifies to the cognitive (rather than perceptual) basis of tonality.

How does the tonic pitch become an internalized reference for listeners? The work of Carol Krumhansl suggests that tonal schemas are learned through exposure to music from a given culture or genre. Moreover, Krumhansl's work suggests that one of the primary factors influencing tonality perception is the simple frequency of occurrence of different tones. The most frequent pitch has a tendency to be heard as the tonic:

"Listeners appear to be very sensitive to the frequency with which the various elements [pitch chromas] and their successive combinations are employed in music. It seems probable, then, that abstract tonal and harmonic relations are learned through internalizing distribution properties characteristic of the style." (Krumhansl, 1990; p.286).