Saturday, 9 April 2011

Automatic key detection in R

When you want to analyse a large corpus of melodies it's often vital that the pieces are all in the same key. For example, building a Markov model from melodies where some are in C and some in G# doesn't really make sense. So we need some way to automatically determine the key of a piece in order to transpose it to a pre-defined, "normal" scale.

Usually this is achieved by building a chroma from the melody in question. A chroma in this context refers to frequencies of pitch occurrences, usually in the same octave, i.e. modulo 12. Once the chroma has been determined, you compare that chroma to a major scale, such as C major, and for all 12 transpositions (including the zero transposition) of the melody, you find the one that is closest. That is then defined to be your "delta" by which you need to transpose the melody.

One problem with this approach is that not all melodies have the same number of notes. In fact, a melody in C might just have one or two B notes, and one melody in F might just have one B flat. Whatever distance metric you use, these two melodies will be regarded as very similar. As a remedy to this, we can take the fourth root (which has a similar shape to the log function) of the relative frequencies in the chroma. Doing this renders the frequencies of occurring notes closer to 1 (and further away from non-occurring notes).

This is the R function that extracts the chroma from a melody, where the notes are defined as standard MIDI pitch values:

extract.chroma.melody <- function(melody)
{
pitches <- melody %% 12
chroma <- numeric(12)
for(i in 0:11) {
chroma[i + 1] <- sum(pitches == i)
}
chroma <- (chroma / max(chroma)) ^ (1/4)
return(chroma)
}

This function returns a vector of the 12 semitones, starting from E flat, and their relative frequencies. Now we just need a way to compare that vector to the C major scale vector (again, starting from E flat) [0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]. Since the vector components are not dependant on each other we use the Manhattan distance function. The following function returns the best transposition from one chroma to another:

find.best.transpose.dist <- function(chroma.fixed, chroma.moving)
{
best.dist <- +Inf
best.transpose <- NA
chroma.fixed <- as.numeric(chroma.fixed)
chroma.moving <- as.numeric(chroma.moving)
for(i in 0:11) {
# optimised manhattan dist()
new.dist <- sum(abs(chroma.fixed - shift(chroma.moving, -i)))
if(new.dist < best.dist) {
best.dist <- new.dist
best.transpose <- i
}
}

return(list(dist = best.dist, transpose = best.transpose))
}

(The shift function simply performs a circular shift of the input vector. That function is included in the Github repo.)

To get the closest major scale, i.e. the key of the melody, we just call the function with the major scale as the first argument and the melody under scrutiny as the second argument:

major <- c(0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1)
key <- find.best.transpose.dist(major, chroma)$transpose

This code is on Github, in the chroma.R file. Grab it and have a play if you want. There are some other functions in there for doing useful things with the extracted Manhattan distances, such as computing a distance matrix that can be used in clustering, for example. A bunch of functions for reading MIDI files into R are included in the RMidi package, also on Github.

No comments:

Post a Comment