Scarlet & Grey
Ohio State University
School of Music

The Chi-Square Test

Brahms's Hemiolas

One of the most common statistical tests is the chi-square test. The test takes its name from the Greek letter chi (χ, written here as X and pronounced "kye") and is used to compare proportions or ratios. It lets us determine whether the proportion of occurrences of some feature in one data sample is significantly greater or less than the proportion of the same feature in a different data sample.

By way of example, let us test the hypothesis that Brahms uses hemiolas more than his contemporaries do. To set up the test we first need to count the number of hemiolas in (a) a sample of Brahms's music and (b) a sample of music by other composers who were active at the same time. Let us suppose that the raw data look like this:

                     Hemiolas   Measures
Brahms                      9        500
Other composers            23       5000


Proportionally, hemiolas occurred in 1.8% of the data for Brahms (9 out of 500) but in just 0.46% of the music by other composers (23 out of 5000). Is this a big difference? A chi-square test will tell us.
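The two percentages can be checked with a few lines of Python. This is only a sketch: the function name `proportion` and the variable names are illustrative assumptions, not part of the original discussion.

```python
def proportion(occurrences: int, total: int) -> float:
    """Return the fraction of `total` units that contain the feature."""
    return occurrences / total


# Counts taken from the example: hemiolas per measure in each sample.
brahms_rate = proportion(9, 500)
others_rate = proportion(23, 5000)

print(f"Brahms: {brahms_rate:.2%}")
print(f"Others: {others_rate:.2%}")
print(f"Ratio:  {brahms_rate / others_rate:.1f}")
```

The ratio line shows how many times more often hemiolas occur in the Brahms sample; whether that difference is significant is what the chi-square test addresses.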

Calculating Chi-Square

The test has two parts. First we calculate the value of chi-square. This is done by subtracting the number of expected occurrences from the number of observed occurrences, squaring the result, and dividing it by the number of expected occurrences. Where O is the observed number of occurrences and E is the expected number, the formula is:

X² = (O - E)² / E

If we would normally expect 23 hemiolas in 5000 measures of music of the period, then we would expect about 2.3 hemiolas in the 500 measures of Brahms's music; that is, E = 2.3. Since the actual number encountered was 9, the numeric substitutions would be as follows:

X² = (O - E)² / E = (9 - 2.3)²/2.3 = (6.7)²/2.3 = 44.89/2.3 = 19.5174

Our value of chi-square is 19.5174.
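The substitution can be reproduced in Python as a check on the arithmetic. This is a minimal sketch; the helper names `expected_count` and `chi_square_term` are hypothetical and chosen only for illustration.

```python
def expected_count(reference_hits: int, reference_total: int,
                   sample_total: int) -> float:
    """Scale the reference rate to the size of the sample being tested."""
    return reference_hits / reference_total * sample_total


def chi_square_term(observed: float, expected: float) -> float:
    """Compute (O - E)^2 / E for a single observed/expected pair."""
    return (observed - expected) ** 2 / expected


# 23 hemiolas in 5000 measures by other composers, scaled to 500 measures.
expected = expected_count(23, 5000, 500)

# 9 hemiolas were actually observed in the Brahms sample.
chi_square = chi_square_term(9, expected)

print(f"E   = {expected:.1f}")
print(f"X^2 = {chi_square:.4f}")
```

Run as written, this gives E = 2.3 and a chi-square value of about 19.52.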

Calculating p

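The value of p tells us the probability of seeing a discrepancy at least this large by chance alone, if there were no real difference between the samples. It can be determined by referring to a standard table of critical values of X². For one degree of freedom (df), the critical values are 3.841 for p = 0.05, 6.635 for p = 0.01, and 10.827 for p = 0.001.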


The X² value for a p of 0.001 is 10.827. Our calculated X² is greater, so the p value must be less than 0.001. In other words, if Brahms used hemiolas at the same rate as his contemporaries, there would be less than one chance in 1000 of seeing a discrepancy this large.
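Instead of consulting a table, the p value for one degree of freedom can be computed directly. The sketch below relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability equals erfc(sqrt(X²/2)). The function name `p_value_df1` is an assumption made for illustration, and the shortcut does not apply to other degrees of freedom.

```python
import math


def p_value_df1(chi_square: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Standard critical values for one degree of freedom.
for critical in (3.841, 6.635, 10.827):
    print(f"X^2 = {critical:6.3f}  ->  p = {p_value_df1(critical):.4f}")

# A statistic of roughly the size found in the hemiola example.
print(f"X^2 = 19.5    ->  p = {p_value_df1(19.5):.6f}")
```

The first three lines should recover p values close to 0.05, 0.01 and 0.001, and the last line shows that a statistic near 19.5 lies well below the 0.001 threshold.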

Other Considerations

What value of p is considered good? The short answer is that it depends on how certain you want to be.

In typical research, a relationship is considered significant when the value of p is less than 0.05. For more stringent research, a value of less than 0.01 is considered necessary. The threshold chosen, whether 0.05 or 0.01, is called the significance level and is conventionally denoted alpha (α).

If the value of p falls below the 0.01 level, this does not mean that the hypothesis is true. It simply means that a result this extreme would be unlikely if chance alone were at work. Similarly, if the value of p fails to fall below even the 0.05 level, this does not mean that the hypothesis is false. It simply means that the evidence in its favor is not strong.

The value of X² is sensitive to the number of observations. For a given proportional difference, the greater the number of observations, the more likely the result is to be considered significant. If a coin were flipped 6 times and came up heads 4 times, this outcome would lie within the realm of chance. But if the coin came up heads on 400 of 600 tosses (preserving the same proportion), the value of p would be less than 0.001.
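The coin example can be checked numerically. The text does not say how the coin statistic is computed, so this sketch assumes a goodness-of-fit test against a fair coin that sums (O - E)² / E over both outcomes (heads and tails), with one degree of freedom. All function names are hypothetical.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi_square):
    """Upper-tail probability for 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


def coin_test(heads, tosses):
    """Test observed heads/tails against a fair coin (assumed model)."""
    tails = tosses - heads
    fair = tosses / 2  # expected count for each face
    chi2 = chi_square_statistic([heads, tails], [fair, fair])
    return chi2, p_value_df1(chi2)


for heads, tosses in [(4, 6), (400, 600)]:
    chi2, p = coin_test(heads, tosses)
    print(f"{heads}/{tosses} heads: X^2 = {chi2:.3f}, p = {p:.3g}")
```

The proportion of heads is identical in both cases, yet only the larger sample yields a p value under 0.001, which is the sensitivity to sample size described above.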

There are many other aspects to statistical inference that we have not discussed here. The purpose of this very brief introduction is to provide an example of how statistics may be used to provide additional evidence pertaining to musicological questions. For any given hypothesis, the scholar must also seek converging evidence from a wide variety of sources and points of view.

Funeral Marches in the Key of F minor

Recall that we discovered seven funeral marches listed in Volume 17 of the New Grove Dictionary of Music and Musicians. As it turns out, 5 of those 7 marches are in the key of F minor. Test the hypothesis that funeral marches tend to be composed in the key of F minor.

In order to calculate chi-square, we need to determine the probability of any given musical work being written in F minor.

The following table shows a distribution of keys from a convenience sample of 3,121 works by Bach, Bartok, Beethoven, Brahms, Carulli, Corelli, Dowland, Haydn, Mozart and Vivaldi.


Notice that works in F minor account for just 0.32 percent of all works in the sample (10/3121). As a proportion of minor-key works, works in F minor account for 2.9 percent (10/344).

In a sample of 7 minor-key works, we would therefore expect about 0.2035 works (7 × 10/344) to be in the key of F minor.

X² = (O - E)² / E = (5-0.2035)²/0.2035 = (4.7965)²/0.2035 = 23.01/0.2035 = 113.05

Our chi-square value is 113.05. For one degree of freedom, this corresponds to a p value that is far less than 0.001.
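The funeral march calculation can be retraced in the same way. This sketch uses the counts given above and the one-degree-of-freedom shortcut p = erfc(sqrt(X²/2)); the variable names are illustrative assumptions. Because the code does not round E to four decimal places, its chi-square value may differ from the hand calculation in the second decimal place.

```python
import math

# Counts from the key-distribution sample and the New Grove listing.
f_minor_works = 10        # works in F minor in the reference sample
minor_key_works = 344     # all minor-key works in the reference sample
marches = 7               # funeral marches found
marches_in_f_minor = 5    # of which in F minor

# Expected number of F minor works among 7 minor-key works.
expected = marches * f_minor_works / minor_key_works

# Single-term chi-square, as in the formula X^2 = (O - E)^2 / E.
chi_square = (marches_in_f_minor - expected) ** 2 / expected

# Upper-tail probability for one degree of freedom.
p = math.erfc(math.sqrt(chi_square / 2.0))

print(f"E   = {expected:.4f}")
print(f"X^2 = {chi_square:.2f}")
print(f"p < 0.001: {p < 0.001}")
```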

We can conclude that the null hypothesis is rejected at better than the 0.001 significance level. The data are therefore consistent with the hypothesis that funeral marches tend to be written in the key of F minor.