One of the most common statistical tests is the chi-square test,
which is used to compare proportions or ratios.
The test takes its name from the Greek letter chi (χ, written here as *X*
and pronounced "kye").
It helps determine whether the proportion of
occurrences of some feature in one data sample is significantly
greater than or less than the proportion of the same feature
in a different data sample.

By way of example, let us test the hypothesis that Brahms uses hemiolas more than his contemporaries do. To set up the test, we first need to count the number of hemiolas in (a) a sample of Brahms's music and (b) a sample of music by other composers who were active at the same time. Let us suppose that the raw data look like this:

| Composer(s) | Measures searched | Hemiolas found |
|---|---|---|
| Various | 5000 | 23 |
| Brahms | 500 | 9 |

Proportionally, hemiolas occurred in 1.8% of the measures for Brahms (9 out of 500) but in just under 0.5% of the measures by other composers (23 out of 5000). Is this a big difference? A chi-square test will tell us.

This test entails two parts.
First we calculate the value of chi-square.
This is done by subtracting the number of expected occurrences
from the number of observed occurrences, squaring the result,
and dividing it by the number of expected occurrences.
Where *O* is the observed number of occurrences and *E*
is the expected number, the formula is:

$$X^2 = \frac{(O - E)^2}{E}$$

If we would normally expect 23 hemiolas in 5000 measures of music
of the period, then we would expect about 2.3 hemiolas in the
500 measures of Brahms's music, i.e., that *E* = 2.3.
Since the actual number encountered was 9, the numeric substitutions
would be as follows:

$$X^2 = \frac{(9 - 2.3)^2}{2.3} = \frac{44.89}{2.3} \approx 19.52$$

Our value of chi-square is approximately 19.52.
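
The calculation described above can be checked with a few lines of Python. This is only a minimal sketch: the function name `chi_square_term` and the variable names are illustrative assumptions, and the code follows the single-term formula used in this section rather than a general multi-category test.

```python
def chi_square_term(observed: float, expected: float) -> float:
    """Return (O - E)^2 / E for a single observed/expected pair."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (observed - expected) ** 2 / expected


# Counts from the hemiola example in the text.
reference_hemiolas, reference_measures = 23, 5000
brahms_hemiolas, brahms_measures = 9, 500

# Expected count for Brahms if he used hemiolas at the reference rate.
expected = reference_hemiolas / reference_measures * brahms_measures

chi_sq = chi_square_term(brahms_hemiolas, expected)
print(f"E = {expected:.1f}, chi-square = {chi_sq:.2f}")
```

Run as written, this should report an expected count of about 2.3 and a chi-square value of about 19.5.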

The value of *p* tells us the probability of finding a discrepancy
at least this large by chance alone. It can be determined by
referring to a standard table of critical values of *X*².
For one *degree of freedom (df)*, we find the following values:

| df | .500 | .200 | .100 | .050 | .025 | .020 | .010 | .001 |
|---|---|---|---|---|---|---|---|---|
| 1 | .455 | 1.642 | 2.706 | 3.841 | 5.024 | 5.412 | 6.635 | 10.827 |

The *X*² value for a *p* of 0.001 is 10.827.
Our *X*² calculation gives a greater value,
so the *p* value must be less than 0.001.
This means that, if Brahms used hemiolas at the same rate as
his contemporaries, there would be less than one chance in 1000
of seeing a discrepancy this large.
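
The table lookup can also be sketched in Python. The function names below are illustrative assumptions. The first function simply walks the tabulated critical values for one degree of freedom. The second computes the tail probability directly, using the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so that its upper-tail probability equals `erfc(sqrt(x / 2))`. This shortcut holds only for df = 1.

```python
import math

# Critical values of chi-square for one degree of freedom,
# copied from the table in the text as (p, critical value) pairs.
CRITICAL_VALUES_DF1 = [
    (0.500, 0.455), (0.200, 1.642), (0.100, 2.706), (0.050, 3.841),
    (0.025, 5.024), (0.020, 5.412), (0.010, 6.635), (0.001, 10.827),
]


def p_upper_bound(chi_sq: float) -> float:
    """Smallest tabulated p whose critical value chi_sq reaches (1.0 if none)."""
    bound = 1.0
    for p, critical in CRITICAL_VALUES_DF1:
        if chi_sq >= critical:
            bound = p
    return bound


def exact_p_df1(chi_sq: float) -> float:
    """Upper-tail probability of chi-square with df = 1 (valid for df = 1 only)."""
    return math.erfc(math.sqrt(chi_sq / 2.0))


chi_sq = 19.5  # approximate value from the Brahms example
print(f"table lookup: p <= {p_upper_bound(chi_sq)}")
print(f"direct calculation: p = {exact_p_df1(chi_sq):.2g}")
```

For the Brahms example, the lookup returns the 0.001 column, and the direct calculation gives a value far below that threshold.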

What value of *p* is considered good?
The short answer is that it depends on how certain you want to be.

In typical research, a relationship is considered
*significant* when the value of *p* is less than 0.05.
For more stringent research, a value of less than 0.01 is
considered necessary.
The value 0.05 is referred to here as the
*beta confidence level*,
whereas the value 0.01 is referred to as the
*alpha confidence level*.
(Most statistics texts simply call such thresholds *significance levels*.)

If the value of
*p* achieves the alpha confidence level,
this does not mean that the hypothesis is true.
It means only that a discrepancy this large would rarely arise by chance.
Similarly, if the value of *p* fails to
achieve even the beta confidence level,
this does not mean that the hypothesis is false.
It simply means that the evidence in its favor is not strong.

The value of *X*² is sensitive to the number of
observations.
For a given proportional difference, the greater the number of
observations, the greater the
likelihood that the result will be considered significant.
If a coin were flipped 6 times and came up heads on 4 of them,
this outcome would lie well within the realm
of chance.
But if the coin came up heads on 400 of 600 tosses
(preserving the same proportion), the value of *p*
would be less than 0.001.
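
The coin example can be reproduced with the same single-term formula. The sketch below uses illustrative, assumed function names and, like the text, counts only the heads term. A full goodness-of-fit test would add the matching tails term, which doubles both chi-square values without changing the conclusion. The *p* values are computed with the one-degree-of-freedom shortcut `erfc(sqrt(x / 2))`.

```python
import math


def chi_square_term(observed: float, expected: float) -> float:
    """Return (O - E)^2 / E for a single observed/expected pair."""
    return (observed - expected) ** 2 / expected


def p_value_df1(chi_sq: float) -> float:
    """Upper-tail probability of chi-square with one degree of freedom."""
    return math.erfc(math.sqrt(chi_sq / 2.0))


results = {}
for heads, tosses in [(4, 6), (400, 600)]:
    expected_heads = tosses / 2  # a fair coin lands heads half the time
    chi_sq = chi_square_term(heads, expected_heads)
    results[tosses] = (chi_sq, p_value_df1(chi_sq))
    print(f"{heads}/{tosses} heads: chi-square = {chi_sq:.2f}, "
          f"p = {results[tosses][1]:.2g}")
```

The proportion of heads is identical in both cases, yet the chi-square value grows a hundredfold with the larger sample, which is why only the 600-toss result crosses the 0.001 threshold.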

There are many other aspects to statistical inference that we have not discussed here. The purpose of this very brief introduction is to provide an example of how statistics may be used to provide additional evidence pertaining to musicological questions. For any given hypothesis, the scholar must also seek converging evidence from a wide variety of sources and points of view.

Recall that we discovered seven funeral marches listed in Volume 17 of the New Grove Dictionary of Music and Musicians. As it turns out, 5 of those 7 marches are in the key of F minor. Test the hypothesis that funeral marches tend to be composed in the key of F minor.

In order to calculate chi-square, we need to determine the probability of any given musical work being written in F minor.

The following table shows a distribution of keys from a convenience sample of 3,121 works by Bach, Bartok, Beethoven, Brahms, Carulli, Corelli, Dowland, Haydn, Mozart and Vivaldi.

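| Tonic | Major | Minor |
|---|---|---|
| C | 358 | 6 |
| C# | 2 | 2 |
| D | 194 | 96 |
| D# | 0 | 2 |
| Eb | 125 | 0 |
| E | 24 | 35 |
| F | 757 | 10 |
| F# | 2 | 4 |
| G | 978 | 119 |
| G# | 0 | 2 |
| Ab | 11 | 0 |
| A | 126 | 61 |
| Bb | 198 | 2 |
| B | 2 | 5 |
| Totals: | 2,777 | 344 |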

Notice that works in F minor account for just 0.32 percent of all works in the sample (10/3121). As a proportion of minor-key works only, works in F minor account for 2.9 percent (10/344).
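
These proportions can be recomputed from the key distribution table. The following is a small sketch; the dictionary name `KEY_COUNTS` and the variable names are illustrative assumptions, and the counts are copied from the table above.

```python
# (major, minor) work counts per tonic, copied from the table in the text.
KEY_COUNTS = {
    "C": (358, 6), "C#": (2, 2), "D": (194, 96), "D#": (0, 2),
    "Eb": (125, 0), "E": (24, 35), "F": (757, 10), "F#": (2, 4),
    "G": (978, 119), "G#": (0, 2), "Ab": (11, 0), "A": (126, 61),
    "Bb": (198, 2), "B": (2, 5),
}

total_major = sum(major for major, _ in KEY_COUNTS.values())
total_minor = sum(minor for _, minor in KEY_COUNTS.values())
total_works = total_major + total_minor

f_minor = KEY_COUNTS["F"][1]
share_of_all = f_minor / total_works    # F minor among all works
share_of_minor = f_minor / total_minor  # F minor among minor-key works

print(f"major: {total_major}, minor: {total_minor}, all: {total_works}")
print(f"F minor: {share_of_all:.2%} of all works, "
      f"{share_of_minor:.1%} of minor-key works")
```

The column sums agree with the totals row of the table (2,777 major and 344 minor), which is a useful check that the counts were transcribed correctly.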

In a sample of 7 minor-key works, we would expect 0.2035
(2.9% of 7) works to be in the key of F minor.
With 5 works observed, the numeric substitutions are:

$$X^2 = \frac{(5 - 0.2035)^2}{0.2035} \approx 113.05$$

Our chi-square value is 113.05.
For one degree of freedom, this corresponds to a *p* value that is
far less than 0.001.
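
The funeral march calculation can be sketched the same way. The variable names are illustrative assumptions. The expected count is rounded to four decimal places to match the figure of 0.2035 used in the text; without that rounding, the result differs slightly in the second decimal place.

```python
import math

observed_f_minor = 5           # funeral marches in F minor
sample_size = 7                # funeral marches examined
f_minor_rate = 10 / 344        # F minor share of minor-key works in the sample

# Expected number of F minor works among 7 minor-key works,
# rounded to four places as in the text.
expected_f_minor = round(f_minor_rate * sample_size, 4)

chi_sq = (observed_f_minor - expected_f_minor) ** 2 / expected_f_minor

# Upper-tail probability for one degree of freedom (df = 1 shortcut).
p_value = math.erfc(math.sqrt(chi_sq / 2.0))

print(f"E = {expected_f_minor}, chi-square = {chi_sq:.2f}")
print(f"p < 0.001: {p_value < 0.001}")
```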

We can reject the null hypothesis (that funeral marches are no more likely than other minor-key works to be in F minor) at better than the 0.001 level. The data are therefore consistent with the hypothesis that funeral marches tend to be written in the key of F minor.