The iRb Corpus in **jazz format
Yuri Broze and Daniel Shanahan
Updated 26 Dec 2012
This is the online home of the iRb corpus, made freely available for music research. The corpus contains 1,186 individual files, each representing one page from the "Jazz 1200 Standards" collection from the iReal b forum.
A single .zip file of the 1,186 songs is available for download.
To accommodate the typical tools of the Humdrum Toolkit, the corpus was translated into **jazz, a newly specified Humdrum representation. The specification adheres broadly to the **kern representation, with certain modifications to better suit jazz harmonic records. Each individual .jazz file is a UTF-8 text file with reference records, indicators of form, key interpretations, and data tokens.
To give a taste, here is the shortest **jazz file in the corpus:
    !!!OTL: Sweeping Up
    !!!COM: Swallow, Steve
    !!!ODT: 1975
    **jazz
    *>[A]
    *>A
    *M3/4
    *G:
    2.D7
    =
    2.G:maj7
    =
    2.F#:min7
    =
    2.B:min
    =
    2.E:min7
    =
    2.B:min
    =
    2.A:min7
    ==
    *-
Reference Records
The **jazz files in this corpus begin with metadata, presented in the form of **kern reference records as described in the User's Guide. The iRb files include at minimum the following three records:
- !!!OTL: Title
- !!!COM: Composer
- !!!ODT: Year (date) of composition
If a lyricist is credited, the reference record used is:
- !!!LYR: Lyricist
Sometimes, there are multiple composer or lyricist credits. In this case, they are given using !!!COM1: and !!!COM2:, or !!!LYR1: and !!!LYR2: as appropriate. For example:
    !!!OTL: Party's Over, The
    !!!COM: Styne, Jule
    !!!LYR1: Comden, Betty
    !!!LYR2: Green, Adolph
    !!!ODT: 1956
This means that when one wants to get a complete list of the composers represented in the iRb corpus, one should search for "!!!COM" instead of "!!!COM:", since the former will also match co-composer credits.
    > grep '^!!!COM' *.jazz | sed 's/^.*: //' | sort | uniq | wc -l
    557
    > grep '^!!!COM:' *.jazz | sed 's/^.*: //' | sort | uniq | wc -l
    366
The first count shows that 557 unique individuals have composition credits for the songs in the corpus; the second shows that only 366 of them have sole composer credits. See the Humdrum User Guide for more information about using UNIX tools in music research.
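The same lookup can be sketched in a scripting language. The Python below is only an illustration: the function name `reference_records` and the regular expression are hypothetical and are not part of the corpus or the Humdrum Toolkit. It groups numbered records such as !!!LYR1: and !!!LYR2: under the bare code LYR, so that co-credits are not missed.

```python
import re

# Hypothetical pattern: three exclamation marks, an upper-case code,
# an optional number (COM1, LYR2, ...), a colon, then the value.
REFERENCE_RECORD = re.compile(r"^!!!([A-Z]+)(\d*):\s*(.*)$")


def reference_records(text):
    """Map each record code (e.g. 'COM') to the list of values found."""
    records = {}
    for line in text.splitlines():
        match = REFERENCE_RECORD.match(line)
        if match:
            code, _number, value = match.groups()
            records.setdefault(code, []).append(value.strip())
    return records


# Header copied from the example above.
sample = """!!!OTL: Party's Over, The
!!!COM: Styne, Jule
!!!LYR1: Comden, Betty
!!!LYR2: Green, Adolph
!!!ODT: 1956
**jazz
"""

records = reference_records(sample)
print(records["COM"])
print(records["LYR"])
```

Collecting `records["COM"]` from every file into one set would give a list of unique composers comparable to the first grep pipeline.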
Following the initial reference records, several Humdrum interpretations appear. These specify the representation used, the formal structure, the meter, and the apparent key (as judged by the authors). For example, here are the reference records and first interpretations for a **jazz file:
    !!!OTL: Since I Fell For You
    !!!COM: Johnson, Buddy
    !!!ODT: 1945
    **jazz
    *>[A,N1,A,N2,B,A2]
    *>A
    *M4/4
    *E-:
In order, these interpretations are:
- **jazz -- Specifies the **jazz representation.
- *>[A,N1,A,N2,B,A2] -- Specifies the formal structure.
- *>A -- Declares that the following belongs to the A section.
- *M4/4 -- Specifies the piece is in 4/4 time.
- *E-: -- We interpreted the piece to be in E-flat major.
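As a rough illustration of how these five interpretations can be told apart programmatically, here is a minimal Python sketch. The function `describe_interpretation` is hypothetical. It also assumes the usual **kern convention that an upper-case tonic means major and a lower-case tonic means minor; the text above only confirms the major case.

```python
def describe_interpretation(token):
    """Classify one interpretation token from a **jazz header."""
    if token.startswith("**"):
        return ("representation", token[2:])
    if token.startswith("*>[") and token.endswith("]"):
        # Formal structure: a comma-separated list of section labels.
        return ("form", token[3:-1].split(","))
    if token.startswith("*>"):
        return ("section", token[2:])
    if token.startswith("*M"):
        return ("meter", token[2:])
    if token.startswith("*") and token.endswith(":") and len(token) > 2:
        tonic = token[1:-1]
        # Assumption: case of the tonic letter encodes the mode, as in **kern.
        mode = "major" if tonic[0].isupper() else "minor"
        return ("key", (tonic, mode))
    return ("other", token)


header = ["**jazz", "*>[A,N1,A,N2,B,A2]", "*>A", "*M4/4", "*E-:"]
parsed = [describe_interpretation(token) for token in header]
for item in parsed:
    print(item)
```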
The formal structure is compatible with the Humdrum thru command (or Craig Sapp's thrux). However, these structure guides represent a compromise between section labels and the machine-performance specifications in the iReal b originals, so they should be treated with caution.
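To make the idea of expanding a file according to its formal structure concrete, here is a simplified Python sketch. It is not the thru command and makes no claim to match its behavior in detail: `expand_form` is a hypothetical helper that assumes a single spine, silently skips labels that have no matching section, and is run on a made-up three-section file rather than one from the corpus.

```python
def expand_form(lines):
    """Repeat labelled sections in the order given by the *>[...] list."""
    order, sections, preamble = [], {}, []
    current = None
    for line in lines:
        if line.startswith("*>[") and line.endswith("]"):
            order = line[3:-1].split(",")
        elif line.startswith("*>"):
            current = line[2:]
            sections.setdefault(current, [])
        elif line == "*-":
            break  # spine terminator: stop reading
        elif current is None:
            preamble.append(line)  # records before the first section
        else:
            sections[current].append(line)
    expanded = list(preamble)
    for label in order:
        if label in sections:  # assumption: ignore labels with no section
            expanded.append("*>" + label)
            expanded.extend(sections[label])
    expanded.append("*-")
    return expanded


# Made-up file with form A B A, for illustration only.
lines = ["**jazz", "*>[A,B,A]", "*>A", "*M4/4", "*C:", "1C6", "=",
         "*>B", "1G7", "=", "*-"]
expanded = expand_form(lines)
chords = [line for line in expanded if line[0].isdigit()]
print(chords)
```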
**jazz data records are similar to those of **kern.
Barline tokens. Single barlines are represented as "=" and double barlines as "==". Bar numbers and barlines signifying repeats are not implemented.
Chord tokens. Chords are of the form [duration][root][extensions]. Durations are in Humdrum reciprocal form, and chord roots are represented like **kern pitches, using a single capital letter. Sharps are designated using "#", and flats by "-", in accordance with **kern usage. Chord qualities and extensions are given as they appear in written form. An optional ":" can be used to set qualities and extensions apart from the root in **jazz, to enhance comprehension.
Here is a brief example:
    **jazz
    *>[A]
    *>A
    1C6
    =
    1C6
    =
    1D7#11
    =
    1D7#11
    =
    1D:min7
    =
    1G7
    =
    2C6
    2G7b9
    =
    1C6
    ==
    *-
In addition, slash chords are designated with a slash, while suggested substitutions are provided in parentheses. Note that the **jazz representation diverges from the **kern standard in that its ordering of elements is strict.
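The [duration][root][extensions] layout lends itself to a regular expression. The Python sketch below is one possible reading of the description above, not the specification itself: the pattern and the function `parse_chord` are hypothetical. Durations are converted to quarter-note beats, so "2" gives 2.0 and "2." gives 3.0. Parenthesized substitutions are not handled, since their exact syntax is not shown here, and such tokens simply return None. Without the optional ":", a token such as "1C#11" is ambiguous, and this sketch reads the "#" as part of the root.

```python
import re

# Hypothetical pattern for one chord token.
CHORD = re.compile(
    r"^(?P<dur>[1-9]\d*)(?P<dots>\.*)"   # reciprocal duration plus dots
    r"(?P<root>[A-G][#-]*)"              # root letter with sharps or flats
    r":?"                                # optional separator
    r"(?P<ext>[^()]*?)"                  # quality and extensions, as written
    r"(?:/(?P<bass>[A-G][#-]*))?$"       # optional slash-chord bass note
)


def parse_chord(token):
    """Split a chord token into beats, root, extensions and bass."""
    match = CHORD.match(token)
    if match is None:
        return None  # barlines, interpretations, unsupported tokens
    dots = len(match.group("dots"))
    # A whole note is 4 beats; each dot adds half of the previous value.
    beats = 4 / int(match.group("dur")) * (2 - 0.5 ** dots)
    return {
        "beats": beats,
        "root": match.group("root"),
        "ext": match.group("ext"),
        "bass": match.group("bass"),
    }


for token in ["2.D7", "1D7#11", "2E-:min7/D-", "="]:
    print(token, parse_chord(token))
```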
Included is a bash script that performs preliminary parsing of the **jazz files, extracting useful information into several spines. Sample output is as follows:
    **jazz        **kern  **exten  **solfa  **mint  **quals  **dur
    *thru         *thru   *thru    *thru    *thru   *thru    *thru
    *M4/4         *M4/4   *M4/4    *M4/4    *M4/4   *M4      *M4/4
    *D-:          *D-:    *D-:     *D-:     *D-:    *D-:     *D-:
    2E-:min7      E-      min7     re       [E-]    min7     2.0000
    2B-7b13       B-      7b13     la       P5      dom      2.0000
    =             =       =        =        =       =        =
    2E-:min7      E-      min7     re       P4      min7     2.0000
    2A-7          A-      7        so       P4      dom      2.0000
    =             =       =        =        =       =        =
    2D-:maj7      D-      maj7     do       P4      maj      2.0000
    2G-7          G-      7        fa       P4      dom      2.0000
    =             =       =        =        =       =        =
    2F:min7       F       min7     mi       M7      min7     2.0000
    2Eo7          E       o7       ri       M7      dim      2.0000
    =             =       =        =        =       =        =
    2E-:min7      E-      min7     re       d1      min7     2.0000
    2E-:min7/D-   E-      min7     re       P1      min7     2.0000
    =             =       =        =        =       =        =
    2Ch7          C       h7       ti       M6      half     2.0000
    2F7b9         F       7b9      mi       P4      dom      2.0000
    =             =       =        =        =       =        =
    2B-:min7      B-      min7     la       P4      min7     2.0000
    4E-:min7      E-      min7     re       P4      min7     1.0000
    4A-7          A-      7        so       P4      dom      1.0000
    =             =       =        =        =       =        =
    4D-6          D-      6        do       P4      maj      1.0000
    4G-7          G-      7        fa       P4      dom      1.0000
    4Fh           F       h        mi       M7      half     1.0000
    4B-7          B-      7        la       P4      dom      1.0000
    ==            ==      ==       ==       ==      ==       ==
    *-            *-      *-       *-       *-      *-       *-
Output from jazzparser can be used in various ways using the extract command.
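For readers without the Humdrum Toolkit at hand, selecting spines by name can be approximated in a few lines of Python. This sketch is similar in spirit to extract but is not a replacement for it: `extract_spines` is a hypothetical helper that assumes tab-separated spines, exclusive interpretations on the first line, and no comments or spine-path changes. The sample rows are taken from the jazzparser output shown above.

```python
def extract_spines(text, wanted):
    """Keep only the columns whose exclusive interpretation is in `wanted`."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    header = rows[0]  # assumption: first line holds the **names
    columns = [i for i, name in enumerate(header) if name in wanted]
    return ["\t".join(row[i] for i in columns) for row in rows]


sample = "\n".join([
    "**jazz\t**kern\t**exten\t**solfa\t**mint\t**quals\t**dur",
    "2E-:min7\tE-\tmin7\tre\t[E-]\tmin7\t2.0000",
    "2B-7b13\tB-\t7b13\tla\tP5\tdom\t2.0000",
    "=\t=\t=\t=\t=\t=\t=",
    "*-\t*-\t*-\t*-\t*-\t*-\t*-",
])

selected = extract_spines(sample, {"**kern", "**quals"})
print("\n".join(selected))
```

Pairing **kern with **quals in this way yields a root-plus-quality view of each chord, stripped of durations and extensions.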