Friday 15 April 2011

Automatic metre detection by autocorrelating IOI

Metre is so ingrained into our musical subconsciousness that it's harder to get it wrong than to get it right. Try to dance a foxtrot to a Viennese waltz, or lead a marching battalion to war with Bulgarian wedding music. But like all tasks that are completely intuitive to human beings, teaching a computer how to detect a time signature is quite tricky.

So how do humans do it? A lot of it comes down to repetition. We tend to place our imaginary bar separators at the beginning of repeating segments. Also, the time between consecutive note onsets, the inter-onset interval (IOI), tends to be more significant to our perception of segmentation than actual pitch.

In 1992, Judith C. Brown came up with the ingenious idea of using autocorrelation of IOI to determine metre. While the approach is relatively straightforward, and has been expanded and improved upon since, it is still very effective.

Implementing Brown's technique serves as quite a good case study of using R for musical analysis. Now I'll work through an example.

Let's say we have a time series vector in which each element corresponds to one time frame, the length of a frame being the minimum subdivision. If a note is triggered in a frame, the element holds that note's inter-onset interval. If no note is triggered, the value is zero.

For example, consider the German folk song "Herr Bruder Zur Rechten" (number E1145 in the ESAC database):



The corresponding IOI time series would be:

ioi.seq <- c(1, 1, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 1, 1, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 1, 1, 2, 0, 2, 0, 1, 1, 2, 0, 2, 0, 1, 1, 2, 0, 2, 0, 2, 0, 2, 0, 2)

where the minimum subdivision is quavers.
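
If you start from a list of note durations rather than a ready-made grid, the conversion might look something like the sketch below. The helper name durations.to.ioi is my own invention for illustration, and it assumes that every duration is a whole number of minimum subdivisions and that there are no rests.

```r
# Hypothetical helper (not from any package): expand note durations,
# given in units of the minimum subdivision, into a grid where each
# onset holds its IOI and the frames it sustains over hold zero.
durations.to.ioi <- function(durations) {
  unlist(lapply(durations, function(d) c(d, rep(0, d - 1))))
}

# two quavers followed by two crotchets, on a quaver grid
durations.to.ioi(c(1, 1, 2, 2))
```

Note that a long final note gets padded with zeros as well, so the result can end in zeros that a hand-written vector leaves out.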

Next we calculate the full autocorrelation using the acf() function from the stats package.

ioi.acf <- as.numeric(acf(ioi.seq, lag.max = length(ioi.seq) - 1)$acf)
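
In case acf() feels like a black box: for a series x with mean m, the sample autocorrelation at lag k is the sum of (x[t] - m) * (x[t + k] - m) over the overlapping part of the series, divided by the sum of squared deviations. A hand-rolled version is sketched below. The name manual.acf is made up for this post and is not part of the stats package; it should agree with what acf() returns.

```r
# Hand-rolled sample autocorrelation for lags 0 .. n - 1.
# manual.acf is a hypothetical name used for illustration only.
manual.acf <- function(x) {
  n <- length(x)
  centred <- x - mean(x)
  denom <- sum(centred^2)
  sapply(0:(n - 1), function(k) {
    # products of the series with a copy of itself shifted by k frames
    sum(centred[1:(n - k)] * centred[(1 + k):n]) / denom
  })
}

x <- c(1, 1, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0)
reference <- as.numeric(acf(x, lag.max = length(x) - 1, plot = FALSE)$acf)
all.equal(manual.acf(x), reference)
```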

Looking at the ACF plot, we notice that peaks in correlation occur every 6 quavers, which matches the actual 3/4 time signature.



To automatically find peaks we use Brian Ripley's implementation of the S+ peaks function (I renamed it to s.peaks).

peaks <- s.peaks(ioi.acf, 6)

The second parameter to s.peaks specifies the window length. This has to be inversely proportional to the duration of the minimum subdivision: the finer the grid, the longer the window. For example, working with crotchets, the window length would be 3, with quavers 6, and with semiquavers 12.
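
If you don't have the peaks function to hand, a simple stand-in can be sketched as below. To be clear, this is not Ripley's code: find.peaks is a hypothetical name, and the rule it uses (a value counts as a peak if it is the unique maximum of a window centred on it) is my own assumption, so its output may differ from s.peaks at the edges or on ties.

```r
# Hypothetical stand-in for a peak finder (not Brian Ripley's function).
# Returns a logical vector of the same length as `series`, TRUE where
# the value is the unique maximum of the window centred on it.
find.peaks <- function(series, span = 3) {
  n <- length(series)
  half <- span %/% 2
  is.peak <- logical(n)
  if (n < 2 * half + 1) return(is.peak)
  for (i in (half + 1):(n - half)) {
    window <- series[(i - half):(i + half)]
    top <- max(window)
    is.peak[i] <- series[i] == top && sum(window == top) == 1
  }
  is.peak
}

which(find.peaks(c(0, 1, 0, 0, 3, 0, 0, 1, 0), 3))
```

Positions closer than half a window to either end are never reported as peaks, which conveniently excludes lag zero of an autocorrelation.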

Next, we find the distances between peaks.

peaks.indices <- which(peaks)
distances <- diff(peaks.indices)

Finally, we multiply the mode of the peak distances by the length of the subdivision in crotchets (0.5 for quavers) to get the extracted metre.

# there must be a better way of finding mode...
distances.mode <- as.numeric(names(which.max(table(distances))))
metre <- distances.mode * 0.5

As expected:

> metre
[1] 3
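
The mode lookup above takes a detour through the names of a table, which means converting numbers to strings and back. One possible alternative is sketched here; stat.mode is a name I made up, since base R has no built-in statistical mode function.

```r
# Hypothetical helper: most frequent value of a vector.
# On ties, the value that appears first in x wins.
stat.mode <- function(x) {
  values <- unique(x)
  values[which.max(tabulate(match(x, values)))]
}

stat.mode(c(6, 6, 12, 6, 3))
```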

Now, this method doesn't always get it spot on. Sometimes it thinks that a melody in 2/4 is in 8/4, or that 3/4 is 1.5/4. As it turns out, finding the exact metre is very difficult. How would you tell a computer whether a piece is in 6/8 or 3/4? Or even worse, 2/4 or 4/8? For the purposes of music informatics, it is often sufficient to determine whether the time signature is binary (2/4, 4/4, etc.) or ternary (3/4, 6/8, 9/8, etc.). Of course, this means that we have to discard all pieces in irregular time signatures, such as 5/4 or 7/8. Fortunately for us (but perhaps unfortunately for music in general), Western composers traditionally stuck to regular metres.
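
One way to collapse a detected period into binary or ternary is sketched below. This is an assumption on my part about how such a classification could be done, and classify.metre is a hypothetical name: it takes the period in minimum subdivisions (the mode of the peak distances), strips out all factors of two, and then checks whether what remains is a power of three.

```r
# Hypothetical classifier: `period` is the bar length counted in
# minimum subdivisions, a positive whole number.
classify.metre <- function(period) {
  stopifnot(period >= 1, period == round(period))
  # doubling or halving the bar does not change binary vs ternary
  while (period %% 2 == 0) period <- period / 2
  if (period == 1) return("binary")
  while (period %% 3 == 0) period <- period / 3
  if (period == 1) "ternary" else "other"
}

classify.metre(16)  # 8/4 on a quaver grid
classify.metre(3)   # 1.5/4 on a quaver grid
classify.metre(10)  # 5/4 on a quaver grid
```

With this rule the 2/4 versus 8/4 and 3/4 versus 1.5/4 confusions mentioned above no longer matter, since both members of each pair land in the same class.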

In my previous post I talked about the Essen data set. One of the great things about it is that it has metre and key information encoded for all included melodies. Using this, we can test how good our metre detection algorithm actually is.

The results are promising. Out of 5120 melodies, we manage to get the correct result in 4148 of the cases, yielding an accuracy of roughly 81%.
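
For the record, the accuracy is simply the share of melodies where the detected class matches the encoded one. The vectors below are toy data made up to show the calculation, not actual Essen results; only the two counts are taken from the figures above.

```r
# Toy example (invented labels, for illustration only)
predicted <- c("binary", "ternary", "binary", "ternary", "binary")
actual <- c("binary", "ternary", "ternary", "ternary", "binary")
accuracy <- mean(predicted == actual)

# The same calculation with the counts reported above
reported.accuracy <- 4148 / 5120
round(100 * reported.accuracy)
```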

The extract.metre() function resides on GitHub in the rmidi_tools repo.
