Sunday 13 March 2011

Using the subsumption architecture for musical composition

I recently learned about the subsumption architecture, and I wanted to see if I could use it in music.

(If you want, you can jump to the end of the post for a sample of what the result can sound like, before reading the actual post.)

In AI, the subsumption architecture is a simple, reactive, multi-agent architecture. The idea is that you have a number of basic actions that can trigger, based on a number of rules. The rules are ordered by precedence, so that if multiple actions could be triggered, only one is executed. For example, say you're driving a car. One rule could be "If you see an obstacle in front of you, stop!". Another could be "If you're at a traffic light and it turns green, drive!". But if you're at the traffic light and it turns green while an elderly lady has not yet finished crossing the road, you probably don't want to run her over. In that case the first rule takes precedence over the second.
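To make the precedence idea concrete, here is a minimal sketch (the `World` class and rule names are my own, purely illustrative): rules are tried in order, and the first one whose condition fires decides the action.

```python
class World:
    """Illustrative snapshot of what the driver can see."""
    def __init__(self, obstacle_ahead, light_green):
        self.obstacle_ahead = obstacle_ahead
        self.light_green = light_green

def stop_rule(world):
    # Highest precedence: an obstacle (e.g. the elderly lady) means stop.
    return "stop" if world.obstacle_ahead else None

def drive_rule(world):
    # Lower precedence: drive when the light is green.
    return "drive" if world.light_green else None

RULES = [stop_rule, drive_rule]  # earlier in the list = higher precedence

def act(world):
    # Try the rules in precedence order; the first that fires wins.
    for rule in RULES:
        action = rule(world)
        if action is not None:
            return action
    return "wait"
```

With both an obstacle and a green light, `act` returns `"stop"`, because the stop rule subsumes the drive rule.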

The real power of the subsumption architecture comes when you have many agents operating together (e.g. a city with lots of cars). I remember reading somewhere that the system is modelled on ant colonies, where each member is quite stupid on its own, but when you put all these stupid behaviours together in an orderly fashion, you get something that works quite well.

Now, applying this to music. Let's think of an agent as a part, hammering semiquavers, in a polyphonic piece. Say we want the parts to stay in their own "pitch band", and we want to avoid a higher part dropping below a lower part (although we could allow this from time to time). Also, we don't want any part in our piece to drop below or above given pitch boundaries (max pitch, min pitch). Based on these two goals we can define the following rules:

1a. If agent's pitch is above max pitch, give the agent a bias towards downward motion.
1b. If agent's pitch is below min pitch, give the agent a bias towards upward motion.
2a. If agent's pitch is equal to another agent's pitch (i.e. a collision has occurred), and the other agent is supposed to occupy a higher pitch band, give the agent a bias towards downward motion.
2b. If agent's pitch is equal to another agent's pitch, and the other agent is supposed to occupy a lower pitch band, give the agent a bias towards upward motion.
2c. If agent's pitch is equal to more than one other agent's pitch, give the agent a bias based on a majority vote.
3. Otherwise, move randomly, based on bias.

We can define our order of precedence as 1a < 1b < 2c < 2a < 2b < 3 (where < is read "takes precedence over").
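The rules above can be sketched as a single step function per agent (in Python rather than the R I actually used; `MIN_PITCH`, `MAX_PITCH`, `band_rank` and the data layout are my own illustrative assumptions):

```python
import random

MIN_PITCH, MAX_PITCH = 40, 90  # illustrative pitch boundaries

def step_bias(pitch, band_rank, others):
    """Return a bias in {-1, 0, +1} for one agent, checking the rules
    in the precedence order 1a < 1b < 2c < 2a/2b < 3 (earlier wins).
    `others` is a list of (pitch, band_rank) pairs for the other agents;
    a higher band_rank means the agent belongs in a higher pitch band."""
    colliders = [rank for p, rank in others if p == pitch]
    if pitch > MAX_PITCH:                 # rule 1a: too high, push down
        return -1
    if pitch < MIN_PITCH:                 # rule 1b: too low, push up
        return +1
    if len(colliders) > 1:                # rule 2c: majority vote
        ups = sum(1 for rank in colliders if rank < band_rank)
        downs = len(colliders) - ups
        return +1 if ups > downs else -1
    if len(colliders) == 1:               # rules 2a / 2b
        return -1 if colliders[0] > band_rank else +1
    return 0                              # rule 3: no bias

def move(pitch, bias, max_jump=2):
    """Rule 3: a random step, skewed by the bias."""
    return pitch + random.randint(-max_jump, max_jump) + bias
```

Because the boundary rules are checked first, an agent that has strayed out of range gets pushed back regardless of any collisions it is involved in.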

It is easy to see how these rules will meet the goals we set out to achieve. Of course, an agent could jump below a lower agent, but if we make the longest allowed jump fairly short, the agent won't get very far down before it bumps into another agent.

So what notes should we allow the agent to play? If we allow the full chromatic scale, it will all just be a mess. If we only allow a major pentatonic scale, it will sound pleasant but boring in the long run. Really, we would want to be able to change the scale. Rodney Brooks, the inventor of the subsumption architecture, gave an example of Mars rovers that look for samples of some Mars material, and communicate by dropping radioactive crumbs. As an homage to Brooks, I'm going to drop little gems containing musical scales on my "world". Agents can pick gems up (if close enough), and then hand them over to other agents if they collide. If two agents collide, both carrying gems, the one that got its gem more recently gets to give one away. I've simplified things a bit, so that gems never actually disappear, even if they are "picked up".
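The gem mechanics might be sketched like this (again a hedged illustration with names of my own invention; the post's simplification that gems never disappear is kept by handing over references rather than removing them):

```python
class Gem:
    """A dropped gem carrying a musical scale."""
    def __init__(self, scale, pitch, radius, time):
        self.scale = scale    # pitch classes, e.g. [0, 2, 4, 7] for C D E G
        self.pitch = pitch    # where in the world the gem was dropped
        self.radius = radius  # how far away an agent can be and still pick it up
        self.time = time      # when the gem was dropped

class Agent:
    def __init__(self, pitch):
        self.pitch = pitch
        self.gem = None
        self.picked_up_at = None  # time the current gem was acquired

    def try_pickup(self, gem, now):
        # Pick the gem up only if we are within its radius.
        if abs(self.pitch - gem.pitch) <= gem.radius:
            self.gem, self.picked_up_at = gem, now

def hand_over(a, b, now):
    """On a collision: if both agents carry gems, the one that acquired
    its gem more recently gives it away; if only one carries a gem,
    it hands it to the other. Gems are shared, never destroyed."""
    if a.gem and b.gem:
        giver, taker = (a, b) if a.picked_up_at >= b.picked_up_at else (b, a)
        taker.gem, taker.picked_up_at = giver.gem, now
    elif a.gem or b.gem:
        giver, taker = (a, b) if a.gem else (b, a)
        taker.gem, taker.picked_up_at = giver.gem, now
```

Calling `hand_over` whenever the collision rules (2a/2b/2c) fire is what lets a newly dropped scale spread through the ensemble agent by agent.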

Using this technique, a scale can be gradually imposed on the piece over time, creating interesting overlaps of scales. During these overlaps, tension builds as the agents of the "orchestra" bounce into each other, a tension which is released once all agents have adopted the same scale.

In the example piece I have created, I drop the following gems (times are in crotchets; the radius is how far away an agent can be and still pick the gem up, so the greater the radius, the more likely the gem is picked up):

1. Scale: C, D, E, G, time = 0, pitch = 70, radius = 4
2. Scale: C, D, E, F, A, time = 8, pitch = 100, radius = 3
3. Scale: D, E, G, B, time = 16, pitch = 60, radius = 2
4. Scale: C, E, G, A, B, time = 32, pitch = 100, radius = 5
5. Scale: D, E, F#, A, B, time = 48, pitch = 60, radius = 3
6. Scale: D, G, B, time = 56, pitch = 80, radius = 4
7. Scale: C, E, G, B, time = 64, pitch = 100, radius = 4
8. Scale: C, G, time = 72, pitch = 80, radius = 6
9. Scale: C, Eb, G, Ab, Bb, time = 88, pitch = 60, radius = 3
10. Scale: C, G, time = 104, pitch = 80, radius = 3
11. Scale: C, D, G, time = 112, pitch = 60, radius = 4

At the 128th crotchet I rewind my counter and drop the first gem again, and so on. Below are two graphs showing the movement of four and ten agents, respectively, during the first 128 crotchets. Notice how agents bounce back when they collide.
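The drop schedule above can be encoded directly, with the rewind handled by a modulo (encoding the scales as pitch classes relative to C is my own choice):

```python
# (time, pitch, radius, scale) for each gem in the post's schedule.
SCHEDULE = [
    (0,   70, 4, [0, 2, 4, 7]),        # C D E G
    (8,  100, 3, [0, 2, 4, 5, 9]),     # C D E F A
    (16,  60, 2, [2, 4, 7, 11]),       # D E G B
    (32, 100, 5, [0, 4, 7, 9, 11]),    # C E G A B
    (48,  60, 3, [2, 4, 6, 9, 11]),    # D E F# A B
    (56,  80, 4, [2, 7, 11]),          # D G B
    (64, 100, 4, [0, 4, 7, 11]),       # C E G B
    (72,  80, 6, [0, 7]),              # C G
    (88,  60, 3, [0, 3, 7, 8, 10]),    # C Eb G Ab Bb
    (104, 80, 3, [0, 7]),              # C G
    (112, 60, 4, [0, 2, 7]),           # C D G
]
PERIOD = 128  # the counter rewinds every 128 crotchets

def gems_due(now):
    """Gems dropped at crotchet `now`, with the schedule looping."""
    t = now % PERIOD
    return [(pitch, radius, scale)
            for time, pitch, radius, scale in SCHEDULE if time == t]
```

At crotchet 128 the modulo wraps back to 0, so the first gem (C D E G) is dropped again, exactly as in the piece.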





The next two graphs show how the scales get distributed, using "gems", to the different agents. As you can imagine, it is less likely that all scales will find their way to all agents in the four-agent scenario than in the ten-agent one.





This is what these two pieces sound like: 10 agents, 4 agents

And finally, here are a couple of longer generations: 10 agents, 4 agents

If you're interested in the source code (in R), send me a message and I'll tidy it up.