Tonal Bootstrapping: Re-Thinking the Intervallic Rivalry Model

David Butler
In: Suk Won Yi (Ed.), Music, Mind, and Science. Seoul: Seoul National University Press, 1998, pp. 7-12.


The `bootstrapping' to which the title of this paper refers was used metaphorically in 1984 by Diana Deutsch to describe simultaneous generation of more than one mental map, in which each perceptual schema somehow helps to co-generate the other. While Deutsch's attention was focused on `bootstrapping' of tonal and event hierarchies, the term has been used more broadly to refer to other possible instances of co-generating mental representations; one example is seen in the proposition that metrical cues help to guide judgments of tonality, while tonal and harmonic cues simultaneously serve as metrical cues.

This paper will make some connections among some of the theoretical and experimental literature contributed in the past decade; these connections seem to indicate that aural key recognition may be an elaborate example of this sort of co-generation. The most reasonable reaction to this literature may be to admit that although it is possible to arrange tonal pitch information within a fairly coherent hierarchy in the abstract, the perceptual activity of getting one's tonal bearings within that hierarchy may involve sampling idiosyncratically from several different levels within it. But it would be helpful first to review the literature leading up to these hunches.

Research History

Ten years ago things were relatively simple, if a bit more confrontational. On the one hand, we had the tonal hierarchy; on the other, the rare intervals hypothesis. The tonal hierarchy was based variously on distributional weightings of tones found in large numbers of tonal scores, and on listeners' ratings of relative stabilities of tones appended to various rudimentary contextual patterns. Both the distributional weightings and the ratings produced sets of graphs that resembled Figure 1. The tonic is most commonly encountered and considered most stable; the dominant next-most-common and next-most-stable, and so on.

The rare intervals hypothesis (Browne, 1981) focussed on the property of exclusivity, rather than stability. As can be seen in this interval-class index (`interval vector') of the major diatonic set, for example, interval classes are populated by different numbers of entries.

< 2, 5, 4, 3, 6, 1 >

The major diatonic set produces a particularly elegant example of these differing levels of rarity/ubiquity because of the unique multiplicity of entries in each category -- but the important perceptual point that Richmond Browne raised is the reciprocal nature of intervallic rarity: the rarer the interval within the diatonic set, the fewer diatonic sets it can fit into and thus be taken to represent -- and therefore the less ambiguous its indication of a reference set.
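The reciprocity can be checked by brute force. The following is a minimal sketch, not Browne's own procedure: it recomputes the interval-class index of the major set and then counts, for one representative dyad of each interval class, how many of the twelve major sets contain it. The function names are hypothetical, and pitch classes are numbered 0-11.

```python
from itertools import combinations

MAJOR = (0, 2, 4, 5, 7, 9, 11)  # major diatonic set with tonic on pc 0

def interval_class(a, b):
    """Smallest distance (1-6) between two pitch classes."""
    d = (a - b) % 12
    return min(d, 12 - d)

def interval_vector(pcs):
    """Count how many pairs of pcs fall into each interval class 1..6."""
    vec = [0] * 6
    for a, b in combinations(pcs, 2):
        vec[interval_class(a, b) - 1] += 1
    return vec

def major_sets_containing(dyad):
    """Tonics (0-11) of the major sets that contain both pcs of the dyad."""
    target = set(dyad)
    return [t for t in range(12)
            if target <= {(t + p) % 12 for p in MAJOR}]

vector = interval_vector(MAJOR)          # [2, 5, 4, 3, 6, 1]
# One representative dyad per interval class: pcs 0 and ic.
fits = {ic: major_sets_containing((0, ic)) for ic in range(1, 7)}
```

For interval classes 1 through 5 the number of major sets containing the dyad equals the corresponding entry of the index (for example, the minor second 0-1 fits only the sets with tonics 1 and 8, while the perfect fourth 0-5 fits six). The tritone is the one exception: it occurs once in the set but, because it maps onto itself when transposed by six semitones, it fits two major sets.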

Neither the tonal hierarchy nor the rare intervals hypothesis makes any claims about the relevance of temporality to the skill of key discovery, but it was just ten years ago that Helen Brown (1988) demonstrated that perception of key is not the same as perception of diatonicity when she showed persuasively that by juggling just one of many possible time variables -- ordinality -- one could produce dramatic (and often dramatically uniform) changes in listeners' choices of tonic for an unchanged pc-set. West and Fryer (1990) reported experimental results that seemed congruent with Brown's data, because randomizing the time orders of members of a major diatonic set did tend to flatten out the probe-tone profiles. But it is important to recall that random ordering of the scale members did not turn the probe-tone profiles into a horizontal line: the tonic was still identified as tonic more often than were other set members. In other words, it seems that a vestige of "tonicity" inheres in some members of the major diatonic set (typically the tonic and dominant), apparently resistant to the perceptual effects of time ordering variations.

Carol Krumhansl (e.g., 1987) might remind us at this point that the tonal hierarchy is not a perceptual model. It is data-generated either from stability ratings of various pre-musical patterns or from summed durations of occurrences of tones in scores. The point to be made here, then, is that by ten years ago it had become clear that there were at least two sorts of pitch relations that might inform skilled listeners who were `bootstrapping' into the tonality of a tonal piece: one a more-abstract, relatively time-independent sense of diatonicity (here the reference is to both the tonal hierarchy and the atemporal component of the rare intervals hypothesis), the other a more-concrete system of temporally unfolding intervals that took on tonal meaning within a context they generated for one another in the enculturated listener's mind. When their time orderings were manipulated to imply unambiguous harmonic information, these interval strings could quickly impart a sense of tonic. The distinction between diatonicity and tonality was made even sharper in a report, published by David Huron and Richard Parncutt in 1993, that performance of the Krumhansl-Schmuckler key-finding algorithm (see Krumhansl, 1990) could be improved by incorporating pitch salience information and a short-term memory window into a perceptual model based on the key profiles. Huron and Parncutt found, however, that their model still did not account for Helen Brown's finding that re-ordering pitches in a series could produce differing identifications of key. They concluded that their data supported Brown's distinction between "structural" and "functional" aspects of pitch relations -- "structural" cues meaning set-based and time-independent pitch relations, and "functional" cues arising from temporal unfoldings of pitch relations.
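The time-independence of purely "structural" cues can be illustrated with a bare-bones sketch of profile-correlation key finding in the spirit of the Krumhansl-Schmuckler algorithm. It is not Huron and Parncutt's model (it has no pitch-salience or memory component), the function names are hypothetical, and the profile values are the commonly cited Krumhansl-Kessler probe-tone ratings, which should be checked against Krumhansl (1990) before any serious use.

```python
from math import sqrt

# Assumed values: commonly cited Krumhansl-Kessler ratings, tonic first.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def find_key(notes):
    """notes: iterable of (pitch_class, duration) pairs.

    Sums durations per pitch class, correlates the totals with all
    24 rotated profiles, and returns the best (tonic_pc, mode).
    The input must not weight all twelve pcs equally (zero variance).
    """
    totals = [0.0] * 12
    for pc, duration in notes:
        totals[pc % 12] += duration
    best = None
    for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
        for tonic in range(12):
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            r = pearson(totals, rotated)
            if best is None or r > best[0]:
                best = (r, tonic, mode)
    return best[1], best[2]

c_major_key = find_key([(pc, 1.0) for pc in (0, 2, 4, 5, 7, 9, 11)])  # (0, 'major')
# The same three pcs in two time orders: the summed durations are identical,
# so this kind of model cannot return different keys for them.
order_blind = (find_key([(6, 1), (0, 1), (1, 1)])
               == find_key([(0, 1), (1, 1), (6, 1)]))
```

Because the input is collapsed into twelve duration totals before any comparison is made, `order_blind` is true by construction, which is one way of seeing why a profile-based model by itself cannot reproduce ordering effects of the kind Brown reported.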

During the past few years, René van Egmond and I (e.g., Van Egmond & Butler, 1997) have extended the rare-intervals hypothesis by expanding the diatonic reference set so that it now includes both major and minor -- specifically, using a 3-set reference base of major (pure minor), harmonic minor, and ascending melodic minor. The interval-class indices of these three sets are shown in Figure 2.

Major                       < 2, 5, 4, 3, 6, 1 >
Harmonic minor              < 3, 3, 5, 4, 4, 2 >
Melodic minor (ascending)   < 2, 5, 4, 4, 4, 2 >
Figure 2: Interval-class indices for major, harmonic, and ascending melodic minor.

Van Egmond produced a computer-generated compilation of all 2-, 3-, 4-, 5-, and 6-note subsets of these three reference collections so that we could determine which were most exclusive in terms of key and mode -- following Richmond Browne's initial line of reasoning that the most exclusive subsets ought to carry the least ambiguous key and mode references to the listener.

The short-hand representation of these subsets is the Transpositional type, abbreviated Tn-type. The Tn-type is the subset of prime forms, or "Tn/TnI-types," that is sensitive to inversion. Van Egmond then invented a means of representing the tonic and mode references for these Tn-types -- the "Key Class Index", or KC-index. An example of the KC-index, for all Tn-types of cardinality 3, is given in Table 1, below.
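A compilation of this kind can be approximated in a few lines. The sketch below is an illustration under stated assumptions and not Van Egmond's program: it picks one representative per Tn-type by taking the most compact rotation (compared from the right, transposed to begin on 0), then lists every tonic-and-mode pair whose reference set contains the subset. All function names are hypothetical.

```python
from itertools import combinations

REFERENCE_SETS = {
    "major": (0, 2, 4, 5, 7, 9, 11),
    "harmonic minor": (0, 2, 3, 5, 7, 8, 11),
    "melodic minor (ascending)": (0, 2, 3, 5, 7, 9, 11),
}

def tn_type(pcs):
    """Representative of a transposition class (inversions kept distinct)."""
    pcs = sorted({p % 12 for p in pcs})
    rotations = [tuple(sorted((q - p) % 12 for q in pcs)) for p in pcs]
    # Most compact rotation: compare outer interval first, then inward.
    return min(rotations, key=lambda r: r[::-1])

def key_references(pcs):
    """All (tonic, mode) pairs whose reference set contains pcs."""
    target = {p % 12 for p in pcs}
    return [(tonic, mode)
            for mode, scale in REFERENCE_SETS.items()
            for tonic in range(12)
            if target <= {(tonic + p) % 12 for p in scale}]

def exclusivity_table(cardinality):
    """Map each Tn-type of the given size to its key references,
    keeping only Tn-types that fit at least one reference set."""
    table = {}
    for rest in combinations(range(1, 12), cardinality - 1):
        pcs = (0,) + rest
        if tn_type(pcs) != pcs:
            continue  # skip rotations of a Tn-type already represented
        refs = key_references(pcs)
        if refs:
            table[pcs] = refs
    return table

trichords = exclusivity_table(3)
# Fewest references first: the most exclusive subsets head the list.
ranked = sorted(trichords.items(), key=lambda item: len(item[1]))
```

Calling `exclusivity_table` with cardinalities 2 through 6 covers the subset sizes mentioned above; sorting by the number of references follows Browne's reasoning that the most exclusive subsets should be the least ambiguous.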

Table 1: Tn-types of cardinality 3 with their corresponding key-class indices.

To get a sense of what the KC-index shows, consider Tn-type ___. The KC-index shows 3 tonic references within key-class 1, which means that ___ fits all three diatonic sets with a tonic of pc-1; another way of saying it is that this trichord can be interpreted as "Ti-Do-Fa" in the major, harmonic minor, and ascending melodic minor sets with a tonic of C# or Db. The Tn-type ___ also maps into the harmonic-minor. The inverse Tn-type ___ carries different pc-set connotations. It can map into 3 different diatonic contexts: the major set within kc-1 ("Ti-Mi-Fa"), the ascending melodic minor set within kc-3 ("La-Re-Me"), and -- as with the "Re-Me-Fa" in the harmonic minor set within kc-10. If inclusiveness/exclusiveness relations of subsets to their reference collections were the only factor to consider in the listening activity of key discovery, we would have a fairly complete map of the tonal terrain at this point. But there are, of course, other factors.
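The tallies described above can be reproduced with a small sketch. The twelve-slot layout used here (one count per tonic pitch class) is only an assumed stand-in for the KC-index, whose exact format is defined in Van Egmond & Butler (1997); the function name is hypothetical. The pitch classes 0, 1, 6 correspond to "Ti-Do-Fa" with a tonic of pc-1, and 0, 5, 6 to "Ti-Mi-Fa" with the same tonic.

```python
REFERENCE_SETS = {
    "major": (0, 2, 4, 5, 7, 9, 11),
    "harmonic minor": (0, 2, 3, 5, 7, 8, 11),
    "melodic minor (ascending)": (0, 2, 3, 5, 7, 9, 11),
}

def key_class_counts(pcs):
    """Return 12 counts; entry k is the number of reference sets
    with tonic pc k that contain every pitch class in pcs."""
    target = {p % 12 for p in pcs}
    counts = [0] * 12
    for scale in REFERENCE_SETS.values():
        for tonic in range(12):
            if target <= {(tonic + p) % 12 for p in scale}:
                counts[tonic] += 1
    return counts

ti_do_fa = key_class_counts((0, 1, 6))
# [0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]: three sets on tonic 1, one on tonic 10
ti_mi_fa = key_class_counts((0, 5, 6))
# [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]: one set each on tonics 1, 3, and 10
```

The first result gives four superset references spread over two tonics, and the second gives three references over three tonics, in line with the counts stated in the text.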

It turns out that the ___ trichord also serves as a good example of one of the limitations of the intervallic rivalry hypothesis. We can know rationally that the ___ has only 4 "legal" superset references and only 2 tonics (refer back to Table 1 if necessary), but that does not prevent sizeable segments of the musical listener pool from giving "illegal" tonic identification responses, and being confident about their choices. We have found that when tones are heard in the order 6 -> 0 -> 1, this trichord elicits a strong agreement that 1 is tonic (see Fig. 3).

6 -> 0 -> 1 = 1
0 -> 1 -> 6 = 6
Figure 3: 2 orderings of the ___ Tn-type.

But when the tones are ordered 0 -> 1 -> 6, we get a strong minority vote for 6 as tonic -- that is, a tonal interpretation of Fi-Sol-Do. We have heard more than one test participant say "I know you said not to give chromatic interpretations, but this just sounds like Fi-Sol-Do!" There are two possible explanations of this reaction that seem sensible. The first one is somewhat abstract: we may be dealing with a "stability" or "finality" effect, or even a "root" effect that has been discussed with various names by psychologists and musicians -- for example, the Lipps & Meyer (e.g., Meyer, 1900) "finality effect," and Hindemith's (e.g., 1941) "roots" of intervals. In other words, there may be a second sort of "intervallic rivalry" in which each interval "heard out of context" is made up of a more-stable and a less-stable tone. In such an interpretation, stronger finality effects overpower weaker ones; in the 0-1-6 example given a moment ago, the "Sol-Do" interpretation of the ascending fourth is presumed to overshadow the preceding "Ti-Do" interpretation of the minor second. This notion, whether it is thought of as stability or finality, probably relates back to the "bootstrapping" example of co-generating harmonic and metrical cues given early in this paper, because of the likelihood that final or stable tones arrive on an inferred strong beat.

The second interpretation is more concrete: an interval (or a small collection of intervals) may remind us, consciously or not, of a "lick" -- that is, they may remind us of a pattern overlearned from many melodies we have heard and performed. This sort of response is likely not only culture-specific, but may even be tune-specific. For example, an encounter with "Marche Militaire" in the previous hour, day, or week may color the listener's tonal interpretation of the 0-1-6 trichord. Varying registral relationships may show this, also. This is a simple observation not far removed from serving as a hypothesis: The pitch-class sequence 6 -> 0 -> 1, when played with the 6 -> 0 tritone descending, will typically elicit a stronger tonic response for pc1 than when pcs 6 -> 0 -> 1, with an ascending tritone, are played. In the latter series, tonic responses will begin to swing toward pc6 -- and this shift in tonic will likely be particularly vivid if one has recently heard the tune "Maria" from Bernstein's musical "West Side Story."

Neither the tonal hierarchy nor the rare intervals hypothesis gives us a satisfactory explanation for these sorts of tonic judgments. But it is worth recalling that Browne pointed out, in his initial "rare intervals" paper, that the musical listener is forced to work with partial tonal evidence most of the time. Thus it seems reasonable for us to expect that the listener will have become adept at picking up evidence whenever she or he can find it -- and I suggest that this may mean evidence on several levels of tonal abstraction.

The Present

This brings us to the present. Where do we now stand? How do we construct a compact yet musically valid perceptual hypothesis for key-finding (much less begin construction work on a key-finding algorithm) when the process of key discovery may result from a mix of evidence picked up from the listener's sensitivity to time-independent diatonicity cues more or less depicted by the tonal hierarchy; or to key exclusivity cues conveyed by "rare" intervals that, in certain characteristic time orders, may carry important harmonic implications; or by interval-specific "stability" cues that listeners may associate with commonly encountered melodic successions, or even with specific tunes? And even if someone managed to derive such a complex algorithm, how much would various interval weightings have to be shifted to accommodate idiosyncratic mixtures of these or other factors that play into the listening habits of individuals? Do we opt for a clean, tight model that ignores most of this rich complexity, or do we opt instead for a messier, looser description that may encompass more musical reality? That is a decision that could help set the agenda for the next ten years.


Brown, H. 1988.
The interplay of set content and temporal context in a functional theory of tonality perception.
Music Perception, Vol. 5, pp. 219-250.

Browne, R. 1981.
Tonal implications of the diatonic set.
In Theory Only, Vol. 5, Nos. 6-7, pp. 3-21.

Deutsch, D. 1984.
Two issues concerning tonal hierarchies: Comment on Castellano, Bharucha, and Krumhansl.
Journal of Experimental Psychology: General, Vol. 113, No. 3, pp. 413-416.

Hindemith, P. 1941.
The Craft of Musical Composition. New York: Associated Music Publishers.

Huron, D., & Parncutt, R. 1993.
An improved model of tonality perception incorporating pitch salience and echoic memory.
Psychomusicology, Vol. 12, pp. 154-171.

Krumhansl, C. 1987.
Tonal and harmonic hierarchies. In J. Sundberg (Ed.), Harmony and Tonality. Stockholm: Royal Swedish Academy of Music.

Krumhansl, C. 1990.
Cognitive Foundations of Musical Pitch. New York: Oxford University Press.

Meyer, M. 1900.
Elements of a psychological theory of melody.
Psychological Review, Vol. 7, pp. 241-273.

Van Egmond, R., & Butler, D. 1997.
Diatonic connotations of pitch-class sets.
Music Perception, Vol. 15, pp. 1-29.

West, R., & Fryer, R. 1990.
Ratings of suitability of probe tones as tones after random orderings of notes in the diatonic scale.
Music Perception, Vol. 7, pp. 253-258.
