Sunday 14 August 2011

Music mining with SoX

SoX is a very powerful command-line audio processing tool for Linux (it also runs on macOS and Windows). In its wealth of capabilities it's a bit like ImageMagick for sound. Apart from generating and altering audio, it's also quite good at telling you what is inside a sound file.

To print metadata about an MP3 file you can do
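
The exact command is not preserved here. A minimal sketch, assuming the soxi tool that ships with SoX (it behaves like sox --info) and an MP3-enabled build; the function names show_metadata and show_comments are made up for illustration:

```shell
# Print the header information SoX can read from a file:
# channels, sample rate, precision, duration, encoding and comments (tags).
show_metadata() {
    soxi "$1"        # same as: sox --info "$1"
}

# Print only the comments, which is where tags such as Artist show up.
show_comments() {
    soxi -a "$1"
}
```

Usage would look like show_metadata song.mp3, where song.mp3 is a placeholder file name.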



SoX is also able to extract statistics from the actual audio information using the stats effect:
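
The exact invocation is not preserved here either. A likely form, sketched under the assumption that no output audio is wanted (so the null file -n is used as the output); the function names are hypothetical:

```shell
# Run the stats effect on a file. -n is the null output file, so no audio
# is written; the statistics table is printed to stderr.
show_stats() {
    sox "$1" -n stats
}

# Same, but with the table redirected to stdout so it can be piped or saved.
capture_stats() {
    sox "$1" -n stats 2>&1
}
```

For example, capture_stats song.mp3 > song.stats.txt would save the table to a file (both file names are placeholders).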



The different parameters are thoroughly explained in "man sox". (The SoX manpage is actually a great little read; it covers lots of digital audio concepts.)

One annoying thing about SoX is its output format. It looks good to a human being, but getting it into a computer-readable format takes quite a bit of massaging. Look at the output table for the stats effect, for example. The second "r" in Crest factor sits in the same column as the "-" in the DC offset and Min level rows, so labels and values overlap and you can't simply cut the table at a fixed column. Also notice that Num samples is 28.1M, not 28100000. Any statistics software would treat that as a categorical attribute, not a numeric one.
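
One way to deal with abbreviated values like 28.1M is to expand the suffix before the data reaches the statistics software. A minimal sketch, assuming the only suffixes that occur are k, M and G and that the field is a count; expand_suffix is a made-up helper name:

```shell
# Expand a magnitude suffix (k, M, G) into a plain integer.
# Assumption: the input is a count such as "28.1M" or "441k";
# values without a suffix are rounded to the nearest integer.
expand_suffix() {
    printf '%s\n' "$1" | awk '
    {
        mult = 1
        if      ($1 ~ /k$/) mult = 1e3
        else if ($1 ~ /M$/) mult = 1e6
        else if ($1 ~ /G$/) mult = 1e9
        sub(/[kMG]$/, "", $1)            # drop the suffix, keep the digits
        printf "%.0f\n", $1 * mult
    }'
}
```

Calling expand_suffix 28.1M prints 28100000. The abbreviated figure only carries three significant digits, so the result is approximate; soxi -s prints the exact sample count if that matters.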

In order to overcome some of these limitations I wrote a small shell script that takes a list of filenames from stdin and writes to stdout one line per audio file, with the attributes that SoX extracts separated by the pipe character.
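
The script itself is not reproduced here. The idea can be sketched roughly as follows; this is not the original script. It assumes that only the first value of each row (the Overall column for stereo files) is kept, and that every token starting with a digit or a minus sign is a value. The function names stats_to_fields and mine_files are invented for this example.

```shell
# Read a SoX stats table on stdin and print the first value of each row,
# joined by "|". Row labels are words starting with a letter, values start
# with a digit or "-", so the first such token on a line is the value.
stats_to_fields() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[-0-9]/) {
                out = out sep $i
                sep = "|"
                break
            }
        }
    }
    END { print out }'
}

# Read filenames from stdin, one per line, and print one record per file.
mine_files() {
    while IFS= read -r file; do
        # stats writes its table to stderr, so merge it into the pipe;
        # </dev/null keeps sox away from the filename list on stdin
        fields=$(sox "$file" -n stats 2>&1 </dev/null | stats_to_fields)
        printf '%s|%s\n' "$file" "$fields"
    done
}
```

A run could look like find . -name '*.mp3' | mine_files > features.txt, with features.txt as a placeholder name. Warnings that SoX prints to stderr end up in the same stream, which is one reason some lines can come out with the wrong number of values.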

If you use this script you'll still need to do a bit of manual processing afterwards, getting rid of lines that don't have enough values, etc.
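
Dropping incomplete lines does not have to be done by hand. A small sketch, assuming pipe-separated records, no "|" inside filenames, and a known expected field count; keep_complete is a hypothetical helper:

```shell
# Keep only the records that have exactly the expected number of
# pipe-separated fields; the count is passed as the first argument.
keep_complete() {
    awk -F'|' -v n="$1" 'NF == n'
}
```

For instance, keep_complete 16 < features.txt > clean.txt would keep lines with a filename plus 15 values; the number 16 and both file names are placeholders to adapt to the actual output.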

To try it out I processed around 2,000 MP3 files I had on my hard drive. The SoX processing took around 2 hours on an old dual-core MacBook. Then I got rid of lines with missing values, brought it into R, and used the following function to transform attributes in scientific notation to actual numbers:
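
The R function is not reproduced here. As a different route to the same result, the conversion can also be done in the shell before the data is imported. The sketch below is a swapped-in alternative, not the R code: it assumes pipe-separated records with the filename in the first field, and the name sci_to_decimal is made up.

```shell
# Rewrite fields written in scientific notation (e.g. 1.5e-05) as fixed
# decimals with 10 places. Field 1 is assumed to be the filename and is
# left untouched, as are all fields that are not in scientific notation.
sci_to_decimal() {
    awk -F'|' -v OFS='|' '
    {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[-+]?[0-9]*\.?[0-9]+[eE][-+]?[0-9]+$/)
                $i = sprintf("%.10f", $i)
        print
    }'
}
```

Ten decimal places is an arbitrary choice; values smaller than that would be flattened to zero, so the precision may need adjusting.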



After that I pulled the Artist ID3 tag out of the comments attribute and used J48 (Weka's C4.5 implementation) from the RWeka package to build a decision tree model that predicts the artist. Without tweaking the J48 parameters very much, I got around 23% accuracy. It's obviously not a great number, but compared with a completely random prediction model, which reached 0.7% accuracy, it's not all that bad.
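
Pulling the artist out of the comments can also be sketched in the shell, assuming the comments arrive as one Key=value pair per line (the form soxi -a prints) and that the tag is spelled Artist; extract_artist is a hypothetical helper, not part of SoX:

```shell
# Read comment lines (one Key=value pair per line) on stdin and print
# the value of the first Artist entry, if there is one.
extract_artist() {
    sed -n 's/^Artist=//p' | head -n 1
}
```

With a real file this could be used as soxi -a song.mp3 | extract_artist, where song.mp3 is a placeholder.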

The audio features that SoX extracts may not be the most useful, but considering how fast and easy to use it is, I think it's definitely worth a go.
