Friday 15 April 2011

Automatic metre detection by autocorrelating IOI

Metre is so ingrained into our musical subconscious that it's harder to get it wrong than to get it right. Try to dance a foxtrot to a Viennese waltz, or lead a marching battalion to war with Bulgarian wedding music. But like all tasks that are completely intuitive to human beings, teaching a computer to detect a time signature is quite tricky.

So how do humans do it? A lot of it comes down to repetition. We tend to place our imaginary bar separators at the beginnings of repeating segments. Also, the time between consecutive note onsets, the inter-onset interval (IOI), tends to be more significant to our perception of segmentation than actual pitch.

In 1992, Judith C. Brown came up with the ingenious idea of using autocorrelation of IOI to determine metre. While the approach is relatively straightforward, and has been expanded and improved on since, it is still very effective.

Implementing Brown's technique serves as quite a good case study in using R for musical analysis, so let's work through an example.

Let's say we have a time series vector, where the discrete observations correspond to the inter-onset intervals of notes being triggered at that particular time frame. If no note is triggered, the value will be zero.

For example, consider the German folk song "Herr Bruder Zur Rechten" (number E1145 in the ESAC database):



The corresponding IOI time series would be:

ioi.seq <- c(1, 1, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 1, 1, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 1, 1, 2, 0, 2, 0, 1, 1, 2, 0, 2, 0, 1, 1, 2, 0, 2, 0, 2, 0, 2, 0, 2)

where the minimum subdivision is quavers.
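
If you're starting from raw note onset times rather than a pre-built vector, something along the following lines could produce it. The function name and the assumption that onsets are given in crotchets are mine, not part of Brown's method.

# Sketch: build an IOI time series from note onset times given in
# crotchets, assuming a fixed minimum subdivision (0.5 = quavers).
# Both the function name and its interface are illustrative.
ioi.from.onsets <- function(onsets, subdiv = 0.5)
{
    frames <- round(onsets / subdiv) + 1
    ioi <- numeric(max(frames))
    # each note (except the last) gets the distance to the next note,
    # measured in subdivisions; empty frames stay zero
    ioi[frames[-length(frames)]] <- diff(onsets) / subdiv
    return(ioi)
}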

Next we calculate the full autocorrelation using the acf() function from the stats package.

ioi.acf <- as.numeric(acf(ioi.seq, length(ioi.seq))$acf)

Looking at the ACF plot, we notice that peaks in correlation occur every 6 quavers, which matches the actual 3/4 time signature.



To automatically find peaks we use Brian Ripley's implementation of the S+ peaks function (I renamed it to s.peaks).

peaks <- s.peaks(ioi.acf, 6)

The second parameter to s.peaks specifies the window length. This has to be inversely proportional to the minimum subdivision. For example, working with crotchets, the window length would be 3, with quavers 6, and with semiquavers 12.
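
Ripley's code isn't reproduced here, but a rough stand-in with the same behaviour could look like the sketch below. The name s.peaks.sketch and the plateau handling are my own choices; it simply flags points that are maximal within a centred window.

# A rough stand-in for s.peaks (not Ripley's actual code): TRUE where
# x[i] is the maximum of a centred window of roughly the given span.
# Plateaus get flagged at every point, which is good enough here.
s.peaks.sketch <- function(x, span = 3)
{
    half <- span %/% 2
    n <- length(x)
    sapply(seq_len(n), function(i) {
        window <- x[max(1, i - half):min(n, i + half)]
        x[i] == max(window)
    })
}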

Next, we find the distances between peaks.

peaks.indices <- which(peaks)
distances <- diff(peaks.indices)

Finally, we multiply the mode of the peak distances by the minimum subdivision (0.5, since we're working in quavers) to get the extracted metre in crotchets per bar.

# there must be a better way of finding mode...
distances.mode <- as.numeric(names(which.max(table(distances))))
metre <- distances.mode * 0.5

As expected:

> metre
[1] 3

Now, this method doesn't always get it spot on. Sometimes it thinks that a melody in 2/4 is in 8/4, or that 3/4 is 1.5/4. As it turns out, finding the exact metre is very difficult. How would you tell a computer whether a piece is in 6/8 or 3/4? Or, even worse, 2/4 or 4/8? For the purposes of music informatics, it is often sufficient to determine whether the time signature is binary (2/4, 4/4, etc.) or ternary (3/4, 6/8, 9/8, etc.). Of course, this means that we have to discard all pieces in irregular time signatures, such as 5/4 or 7/8. Fortunately for us (but perhaps unfortunately for music in general), Western composers traditionally stuck to simple metres.
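
As a small illustration of that coarser goal, a helper along these lines (my own sketch, not part of the original method, and assuming irregular metres have already been discarded) could map an extracted metre onto the binary/ternary distinction:

# Hypothetical helper: classify an extracted metre (in crotchets per
# bar) as binary or ternary, returning NA for anything else.
classify.metre <- function(metre)
{
    quavers <- metre * 2
    if(quavers %% 3 == 0)
        return("ternary")
    if(quavers %% 2 == 0)
        return("binary")
    return(NA)
}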

In my previous post I talked about the Essen data set. One of the great things about it is that it has metre and key information encoded for all included melodies. Using this, we can test how good our metre detection algorithm actually is.

The results are promising. Out of 5120 melodies, we manage to get the correct result in 4148 of the cases, yielding an accuracy of roughly 81%.
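
For reference, the evaluation amounts to something like the sketch below, where extract.metre() is the function described above (assumed to take an IOI vector like ioi.seq), and where ioi.list and annotated.metres are hypothetical names for the sequences and metre annotations pulled out of the database.

# Sketch of the evaluation: compare the extracted metre directly
# against the annotated one for each melody.
metre.accuracy <- function(ioi.list, annotated.metres)
{
    detected <- sapply(ioi.list, extract.metre)
    return(mean(detected == annotated.metres, na.rm = TRUE))
}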

The extract.metre() function resides on Github in the rmidi_tools repo.

Wednesday 13 April 2011

The Essen folk music dataset as a MySQL database

The ESAC project is amazing. They have an archive of almost 6000, mostly German, folk melodies which they have encoded in a nice, consistent, computer-friendly format.

The melodies can be downloaded in MIDI format here, but I couldn't find the data in SQL format anywhere. So I wrote a little script that kneaded the dataset into a MySQL database.

I put the database dump, as well as the script (so you can see where bugs in the data come from), on Github.

Tuesday 12 April 2011

Quantising monophonic MIDI melodies

Quantisation is a useful technique in its own right, but I personally
tend to use it more often as a pre-processing tool. By applying quantisation
we can enforce a minimum granularity, which helps keep a large dataset
of melodies (such as the Essen dataset) consistent.

Quantisation should really be as easy as pie - just round the note
start times to a minimum subdivision. However, monophony complicates
things slightly. If two notes get rounded to the same start time,
which one do you choose? Where does the other one go? In the
implementation below I keep the note whose original start time is
closest to the quantised start time. Any colliding notes get
discarded.

Another question is whether the notes' durations should be
quantised or if they should keep their original durations. If the
original duration is maintained, one is quite likely to end up with
overlapping notes, but if durations are quantised, one could lose
rests. Since requirements might vary, I leave the choice to the user.

Finally, since many melodies begin with an upbeat, we need a way to
offset quantisation. This is particularly useful when dealing with
quite aggressive quantisation (e.g. quantisation to nearest crotchet).

Enough talk, here's the code:

# notes is the input matrix,
# subdiv the minimum subdivision (in crotchets),
# offset the quantisation offset (also in crotchets).
# preserve.duration determines whether to tie notes or
# preserve original durations.
quantise.notes <- function(notes, subdiv, offset = 0,
                           preserve.duration = TRUE)
{
    if(subdiv == 0)
        return(notes)

    # first, do "naive" quantising, by just rounding start times
    subdiv.ticks <- subdiv * midi.get.ppq()
    offset.ticks <- offset * midi.get.ppq()
    notes.quantised <- notes
    notes.quantised[, "start"] <-
        round((notes.quantised[, "start"] - offset.ticks) /
              subdiv.ticks) * subdiv.ticks + offset.ticks

    # find any collisions
    tbl <- sort(table(notes.quantised[, "start"]))
    collisions <- as.numeric(names(tbl[tbl > 1]))

    # if there are collisions, remove all notes that are not
    # the note closest to the quantised value
    if(length(collisions) > 0) {
        remove.indices <- integer(0)
        for(collision in collisions) {
            colliding.indices <- which(notes.quantised[, "start"] ==
                                       collision)
            distances <- abs(notes[colliding.indices, "start"] - collision)

            # if two colliding notes are equally close,
            # keep the first (lower-indexed) one
            closest.index <- colliding.indices[which.min(distances)]

            remove.indices <- c(remove.indices,
                                colliding.indices[colliding.indices !=
                                                  closest.index])
        }

        # remove colliding notes
        notes.quantised <- notes.quantised[-remove.indices, , drop = FALSE]
    }

    # if we don't preserve the original durations, we tie the notes
    if(!preserve.duration) {

        # order notes by start time
        notes.quantised <-
            notes.quantised[order(notes.quantised[, "start"]), ]

        # tie all notes except the last note
        for(i in 1:(nrow(notes.quantised) - 1)) {
            notes.quantised[i, "duration"] <-
                notes.quantised[i + 1, "start"] -
                notes.quantised[i, "start"]
        }

        # set the duration of the last note such that the duration of
        # the quantised melody is the same as the duration of the
        # original melody
        notes.quantised[nrow(notes.quantised), "duration"] <-
            notes[nrow(notes), "duration"] +
            notes[nrow(notes), "start"] -
            notes.quantised[nrow(notes.quantised), "start"]
    }

    return(notes.quantised)
}

The input to this function is a matrix with the column headers
"start", "duration", "pitch" and "velocity". In the file hosted on
Github I provide functions for converting to this format from RMidi
matrices, and back again.
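
As a quick illustration (assuming RMidi is loaded so that midi.get.ppq() is available, and that it reports 480 ticks per crotchet), quantising a slightly sloppy three-note phrase to the nearest quaver might look like this:

# Hypothetical input: three notes played slightly off the quaver grid
# (column names as expected by quantise.notes; tick values assume a
# PPQ of 480).
notes <- matrix(c( 10, 230, 60, 100,
                  250, 230, 62, 100,
                  490, 470, 64, 100),
                ncol = 4, byrow = TRUE,
                dimnames = list(NULL,
                                c("start", "duration", "pitch", "velocity")))

# snap note starts to the nearest quaver and tie the notes
quantise.notes(notes, subdiv = 0.5, preserve.duration = FALSE)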

Saturday 9 April 2011

Automatic key detection in R

When you want to analyse a large corpus of melodies it's often vital that the pieces are all in the same key. For example, building a Markov model from melodies where some are in C and some in G# doesn't really make sense. So we need some way to automatically determine the key of a piece in order to transpose it to a pre-defined, "normal" scale.

Usually this is achieved by building a chroma from the melody in question. A chroma in this context refers to the frequencies of pitch occurrences, folded into a single octave, i.e. taken modulo 12. Once the chroma has been determined, you compare it to a reference major scale, such as C major, and out of all 12 transpositions of the melody (including the zero transposition) you find the one that is closest. That transposition is then the "delta" by which you need to shift the melody.

One problem with this approach is that not all melodies have the same number of notes. In fact, a melody in C might have just one or two B notes, and a melody in F might have just one B flat. Whatever distance metric you use, these two melodies will be regarded as very similar. As a remedy, we can take the fourth root (which has a similar shape to the log function) of the relative frequencies in the chroma. Doing this pushes the frequencies of occurring notes closer to 1 (and further away from those of non-occurring notes).
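
To see the effect, consider a note that occurs once in a melody whose most frequent pitch class occurs ten times (chroma values are normalised by the most frequent pitch class):

1 / 10              # 0.1  -- barely distinguishable from an absent note
(1 / 10) ^ (1 / 4)  # ~0.56 -- clearly registers as "present"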

This is the R function that extracts the chroma from a melody, where the notes are defined as standard MIDI pitch values:

extract.chroma.melody <- function(melody)
{
    pitches <- melody %% 12
    chroma <- numeric(12)
    for(i in 0:11) {
        chroma[i + 1] <- sum(pitches == i)
    }
    chroma <- (chroma / max(chroma)) ^ (1/4)
    return(chroma)
}

This function returns a vector of the 12 semitones, starting from E flat, and their relative frequencies. Now we just need a way to compare that vector to the C major scale vector (again, starting from E flat) [0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]. Since the vector components are not dependent on each other, we use the Manhattan distance function. The following function returns the best transposition from one chroma to another:

find.best.transpose.dist <- function(chroma.fixed, chroma.moving)
{
    best.dist <- +Inf
    best.transpose <- NA
    chroma.fixed <- as.numeric(chroma.fixed)
    chroma.moving <- as.numeric(chroma.moving)
    for(i in 0:11) {
        # optimised manhattan dist()
        new.dist <- sum(abs(chroma.fixed - shift(chroma.moving, -i)))
        if(new.dist < best.dist) {
            best.dist <- new.dist
            best.transpose <- i
        }
    }

    return(list(dist = best.dist, transpose = best.transpose))
}

(The shift function simply performs a circular shift of the input vector. That function is included in the Github repo.)
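
For completeness, a circular shift could be sketched as below. The actual shift() in the repo may use a different sign convention, so treat this version as illustrative only.

# Illustrative circular shift: shift(x, 1) moves the first element to
# the end; negative n shifts the other way. The sign convention of the
# repo's shift() may differ.
shift <- function(x, n)
{
    n <- n %% length(x)
    if(n == 0)
        return(x)
    return(c(x[(n + 1):length(x)], x[1:n]))
}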

To get the closest major scale, i.e. the key of the melody, we just call the function with the major scale as the first argument and the melody under scrutiny as the second argument:

major <- c(0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1)
key <- find.best.transpose.dist(major, chroma)$transpose

This code is on Github, in the chroma.R file. Grab it and have a play if you want. There are some other functions in there for doing useful things with the extracted Manhattan distances, such as computing a distance matrix that can be used in clustering, for example. A bunch of functions for reading MIDI files into R are included in the RMidi package, also on Github.

Thursday 7 April 2011

Updates on RMidi

I've been doing a bit of work on RMidi recently. I've added a function for reading MIDI files into R, midi.read.file(), and the ability to work with MIDI control events, implemented in midi.controller(). As an example use case, I've included a little piece of open source music in the /examples directory.

While working on the project, I decided to move from Google Code to Github. Github has been getting a lot of attention recently, so I figured I'd better follow the other sheep. I really like it though, and Google Code does look a bit pale in comparison.

So, what's the expression, fork me on Github!