Thursday, September 26, 2013

Molecular Dynamics Sonification and Music Hack Day Chicago

I went to Music Hack Day Chicago this weekend (http://chicago.musichackday.org/index.php?page=Main+page). It was my first hackathon, and I was looking forward to spending 30-40 hours coding something cool. Of course, the hacks were supposed to have something to do with music, so I decided to program a mechanism to sonify data from molecular dynamics simulations.

I recently received a grant from the national supercomputing center for computer time on Stampede at UTexas, currently ranked the sixth fastest supercomputer in the world (http://www.top500.org/lists/2013/06/). I have been running some equilibrium molecular dynamics simulations of proteins to understand conformational change pathways and basically figure out how proteins work the way they do. The amount of data from these simulations is huge and complex, and it is highly multi-dimensional: not only are you looking at how each atom in the protein moves in three dimensions, you are also looking at the physical forces that control that motion.

Normally one uses a variety of analysis software and also watches the simulations to try to spot obvious changes. As you can imagine, trying to take in any sizable portion of this data visually, all at once, is a crazy task, so I set out to use more senses. I wanted to sonify the data and play it back while watching the visual part of the simulation, harnessing our auditory ability to pick out unique sounds.

So I created a layered musical arrangement that allows someone to listen for more subtle conformational changes while watching video of the simulation. This is different from most other data sonification I have found, because it uses easily recognizable musical instruments, whereas most data sonification just uses frequency-shifted sine waves.

Surprisingly, it works really well and is really cool to experience. Code for demoing it is available here (https://drive.google.com/?tab=mo&authuser=0#folders/0B_R75gIJvkFUT0xveDlEaXZzQm8)


The sounds that one can hear in the video:

The piano note at the beginning represents the radius of gyration of the protein: as it becomes higher in pitch the radius of gyration is becoming larger, and as it becomes lower the radius of gyration is becoming smaller. The radius of gyration is a measure of how far the atoms are spread out from the center of mass, so basically it tells you whether the protein is expanding or shrinking (https://en.wikipedia.org/wiki/Radius_of_gyration)
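To make the mapping concrete, here is a minimal sketch of how a radius of gyration could be computed and turned into a piano pitch. It is not the code from the hack: the function names, the linear mapping, and the MIDI note range (48 to 84) are all assumptions chosen for illustration.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for a list of (x, y, z) coordinates."""
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    center = [
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
        for i in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass.
    weighted_sq = sum(
        m * sum((c[i] - center[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)

def value_to_midi_note(value, lo, hi, note_lo=48, note_hi=84):
    """Linearly map value in [lo, hi] onto a MIDI note number (assumed range)."""
    if hi <= lo:
        return note_lo
    frac = (value - lo) / (hi - lo)
    frac = min(1.0, max(0.0, frac))  # clamp values outside the expected range
    return note_lo + round(frac * (note_hi - note_lo))

# Toy example: two equal-mass atoms, 2 units apart.
rg = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0])
note = value_to_midi_note(rg, lo=0.0, hi=2.0)
```

In practice lo and hi would be the smallest and largest radius of gyration seen over the whole trajectory, so that the piano uses its full range.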

The beeps represent the RMSD of each residue as compared to frame 1 of the simulation. As the RMSD from the starting structure becomes higher, the pitch becomes higher. There is one beep per residue, and they are played in residue order. RMSD is basically how far each residue has moved from its initial position relative to the rest of the protein (i.e. minus translational motion) (https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions)
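As a rough sketch of the idea (again, not the original hack code), per-residue RMSD against the first frame can be computed after subtracting the protein's centroid from each frame, then mapped to one note per residue. The helper names, the 5.0 cap on RMSD, and the note range are assumptions. Only translation is removed here, as described above; a full analysis would normally also superimpose the frames to remove rotation.

```python
import math

def remove_translation(coords):
    """Shift coordinates so the centroid of the whole protein is at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(c[i] for c in coords) / n for i in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]

def per_residue_rmsd(ref, frame, residue_atoms):
    """RMSD of each residue between a reference frame and a later frame.

    residue_atoms is a list of lists of atom indices, one list per residue.
    """
    ref_c = remove_translation(ref)
    frame_c = remove_translation(frame)
    rmsds = []
    for atoms in residue_atoms:
        sq = sum(
            sum((frame_c[a][i] - ref_c[a][i]) ** 2 for i in range(3))
            for a in atoms
        )
        rmsds.append(math.sqrt(sq / len(atoms)))
    return rmsds

def rmsd_to_notes(rmsds, max_rmsd=5.0, note_lo=60, note_hi=96):
    """One MIDI note per residue, in order; higher RMSD gives a higher pitch."""
    notes = []
    for r in rmsds:
        frac = min(r / max_rmsd, 1.0)  # cap at the assumed maximum
        notes.append(note_lo + round(frac * (note_hi - note_lo)))
    return notes

# Toy example: two single-atom "residues" that move apart.
ref = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
rmsds = per_residue_rmsd(ref, frame, [[0], [1]])
notes = rmsd_to_notes(rmsds)
```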

To determine where a residue is located, listen for the secondary structure cues.

The background sounds are the percentages of secondary structure as calculated by DSSP.
This is an explanation of secondary structure (https://en.wikipedia.org/wiki/Protein_secondary_structure)

A violin is played for the first quarter of the protein and represents the amount of structure in the whole protein, a combination of alpha helix, beta sheet and turn. It becomes higher in pitch as the number goes up and lower as it goes down.
For the second quarter of the protein, the background sound is monks making an "ohhh" noise (according to the MIDI tables); it represents the percentage of alpha helix. It becomes higher in pitch as the number goes up and lower as it goes down.
For the third quarter of the protein, the background sound is a guitar; it represents the percentage of beta sheet. It becomes higher in pitch as the number goes up and lower as it goes down.
For the final quarter, the background sound is a Sci-Fi noise (so called by the MIDI tables); it represents the percentage of coil. It becomes higher in pitch as the number goes up and lower as it goes down.
So, for instance, if you hear a group of high-pitched beeps during the violin, you know those residues are in the first quarter of the protein.
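The background layer described above can be sketched as a lookup from residue position to instrument, plus a mapping from percentage to pitch. This is an illustration under assumptions, not the original code: the General MIDI program numbers (0-based) are my best guess at the patches described, in particular "Voice Oohs" for the monks, and the note range is arbitrary.

```python
# (label, assumed 0-based General MIDI program, quantity it represents)
BACKGROUND = [
    ("violin", 40, "total secondary structure"),
    ("voice oohs", 53, "alpha helix"),
    ("nylon guitar", 24, "beta sheet"),
    ("fx sci-fi", 103, "coil"),
]

def background_for_residue(index, n_residues):
    """Return the background entry playing while residue `index` (0-based) beeps."""
    quarter = min(index * 4 // n_residues, 3)
    return BACKGROUND[quarter]

def percent_to_note(percent, note_lo=36, note_hi=72):
    """Map a 0-100 percentage onto a MIDI note; higher percentage, higher pitch."""
    frac = min(max(percent / 100.0, 0.0), 1.0)
    return note_lo + round(frac * (note_hi - note_lo))

# For a 198-residue protein, residue 60 falls in the second quarter.
label, program, quantity = background_for_residue(60, 198)
note = percent_to_note(50.0)
```

Splitting by quarters like this means the background instrument doubles as a position marker for the beeps, which is the cue the listener relies on.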

The protein is HIV protease, which is a dimer (i.e. composed of two copies of the same protein). This means the first half of the sonification corresponds to one copy and the second half to the other. Being a dimer doesn't mean that the conformational changes are symmetric, though, so the two halves can sound different.

Listen to the sounds and see if you can identify where in the protein the conformational changes are occurring.

What one can tell is that the high-pitched beeps are at the beginning and end, and also in the middle, of each half. The beginning and end of a protein, the termini, are often very flexible and so change a lot, but are often not related to protein function. However, the high beeps in the middle (residues 40-60) are the flaps of the HIV protease that open up to allow substrate binding and cleavage and allow the virus to be active. It is pretty cool that this works!