Create Animated Principal Component Analysis Biplots

PCA is a handy procedure (see SVD) that converts a set of observations of possibly correlated variables into a set of values of uncorrelated variables, called principal components. Given a load of input, we get a bunch of components ordered by the amount of variance they describe, high to low, with each component uncorrelated with all of the components before it. That makes it handy for data reduction, and it has been used in many areas, including biology.
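To make the SVD link a little more concrete, here is a rough sketch in standard notation. The symbols are my own choice for illustration: X is the n-by-p data matrix with each column centred, U and V have orthonormal columns, and D is diagonal with entries d_1 ≥ d_2 ≥ … ≥ 0.

```latex
X = U D V^{\top}, \qquad
Z = X V = U D, \qquad
\operatorname{Cov}(Z) = \frac{Z^{\top} Z}{n-1}
                      = \frac{D\, U^{\top} U\, D}{n-1}
                      = \frac{D^{2}}{n-1}
```

The columns of Z are the principal component scores. Their covariance matrix is diagonal, so the components are uncorrelated with one another, and the variances d_k^2 / (n-1) come out sorted from high to low.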

I took a bunch of proteins with circular dichroism (CD) spectra, parsed their PDB files and ran PCA on vectors of various attributes together with the spectral values across a wavelength range associated with synchrotron radiation CD.

I used R to perform PCA:

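# Read the input and output file names from the command line.
args <- commandArgs(trailingOnly = TRUE)
input <- args[1]
output <- args[2]
data <- read.csv(input, header = TRUE)
# PCA on centred columns scaled to unit variance.
pcx <- prcomp(data, scale. = TRUE)
png(output, height = 800, width = 800)
biplot(pcx, choices = 1:2, main = input, scale = 1)
# Close the graphics device so the .png is written to disk.
dev.off()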

The script, executed using Rscript, takes two arguments: an input file of comma-separated attributes (CSV) and an output file, the name of the .png created by the biplot command. The input files were created using a Python script.
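Running the script by hand once per input file gets old quickly, so here is a minimal sketch of a batch run. Everything named here is an assumption for illustration: biplot.R stands for the R script saved to disk, csv and png are example directories, and the RSCRIPT variable is a hypothetical override that lets you swap in echo for a dry run.

```shell
# Hypothetical names: biplot.R is the R script saved to disk;
# RSCRIPT optionally overrides the interpreter (e.g. RSCRIPT=echo for a dry run).

# Derive the output .png name from an input .csv path.
png_name() {
    base=$(basename "$1" .csv)
    printf '%s.png\n' "$base"
}

# make_biplots CSV_DIR PNG_DIR
# Run the biplot script once for every .csv file in CSV_DIR.
make_biplots() {
    csv_dir=$1
    png_dir=$2
    mkdir -p "$png_dir"
    for csv in "$csv_dir"/*.csv; do
        # If nothing matched, the glob is left as literal text: skip it.
        [ -e "$csv" ] || continue
        ${RSCRIPT:-Rscript} biplot.R "$csv" "$png_dir/$(png_name "$csv")"
    done
}
```

Something like make_biplots csv png would then write one .png per input file, and RSCRIPT=echo make_biplots csv png just prints the arguments each run would receive.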

As a result I have 66 .png files. If you want to create an animation that loops over a collection of images, you need to grab ImageMagick so that you can convert the .png files to .gif. Because I am using a Mac, I installed ImageMagick using Homebrew:

-=[biomunky@blacksheep png]=- brew install imagemagick

Then, to convert all the PNGs to GIFs, I run convert (ImageMagick) in a bash for loop:

for f in *.png; do convert "$f" "${f%.png}.gif"; done

The result? A load of gifs, excellent.

ImageMagick can then create an animation from the collection of GIFs:

convert -delay 50 -loop 0 *.gif animated.gif

-delay 50: shows each frame for 50 hundredths of a second (half a second).
-loop 0: loops the animation forever.
*.gif: the input frames, in the order the shell expands the glob.
animated.gif: the name of the output image.
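As far as I know, convert reads .png files directly, so the intermediate GIFs are optional. Below is a small sketch of a wrapper around the same command. The function name animate and the CONVERT variable are my own inventions for illustration; CONVERT just lets you swap the binary, for example for a dry run with echo, or for magick on ImageMagick 7 where convert is the legacy name.

```shell
# animate DELAY OUTPUT FRAME...
# DELAY is in hundredths of a second; frames are shown in the order given.
# CONVERT is a hypothetical override for the ImageMagick binary.
animate() {
    delay=$1
    out=$2
    shift 2
    ${CONVERT:-convert} -delay "$delay" -loop 0 "$@" "$out"
}
```

Used as animate 50 animated.gif *.png, this should give the same kind of looping animation. Building from the PNGs also sidesteps a small trap: if animated.gif already exists in the directory, a second run with *.gif as input would pick up the old animation as a frame.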

The final output can be seen below – click the image and it should open in a new tab.

To understand what you are looking at, read about PCA and biplots.

2010 in review

The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high-level summary of its overall blog health:

Healthy blog!

The Blog-Health-o-Meter™ reads "This blog is doing awesome!"

Crunchy numbers

A helper monkey made this abstract painting, inspired by your stats.

In 2010, there were 22 new posts, not bad for the first year!

The busiest day of the year was November 17th. The most popular post that day was "Install Matplotlib on Snow Leopard and Leopard."

Where did they come from?

The top referring sites in 2010 were en.wordpress.com, digg.com, delicious.com, and stumbleupon.com.

Some visitors came searching, mostly for matplotlib snow leopard, django template dictionary, clojure read file, biomunky wordpress, and pybrain.

Attractions in 2010

These are the posts and pages that got the most views in 2010.

1. Install Matplotlib on Snow Leopard and Leopard. (March 2010, 7 comments)
2. Run Java or Scala code with CERN colt library (March 2010)
3. Read a file using Clojure (March 2010)
4. Django Template Dictionary Items (March 2010)
5. Setup Clojure Snow Leopard (June 2010)