Create Animated Principal Component Analysis Biplots

PCA (see SVD) is a handy procedure that converts a set of observations of possibly correlated variables into values of a set of uncorrelated variables, called principal components. The outcome is that, given a load of input, we get a bunch of components ordered by the amount of variance they explain, high to low, each uncorrelated with the components before it. It is a method that’s handy for data reduction and has been used in many areas, including biology.

I took a bunch of proteins with circular dichroism (CD) spectra, parsed the corresponding PDB files and ran PCA on vectors of various attributes plus the spectral values across the wavelength range covered by synchrotron radiation CD.

I used R to perform PCA:

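args <- commandArgs(trailingOnly=TRUE)  # arguments given after the script name
input <- args[1]
output <- args[2]
data <- read.csv(input, header=TRUE)
pcx <- prcomp(data, scale.=TRUE)  # scale each variable to unit variance
png(output, height=800, width=800)
biplot(pcx, choices=1:2, main=input, scale=1)
dev.off()  # close the device so the .png is written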

The script, executed using Rscript, takes two arguments: an input file of comma-separated attributes (csv) and an output file, which is the name of the .png created by the biplot command. The input files were created using a Python script.
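Running the script over a whole directory of inputs could look something like the sketch below. The script name pca_biplot.R and the helper names png_name and run_all are assumptions made for illustration; they are not taken from the original setup.

```shell
# Derive the output .png name from an input .csv name (foo.csv -> foo.png).
png_name() {
    printf '%s\n' "${1%.csv}.png"
}

# Run the PCA script once per .csv file in a directory (default: current one).
# pca_biplot.R is a hypothetical file name for the R script above.
run_all() {
    dir=${1:-.}
    for csv in "$dir"/*.csv; do
        [ -e "$csv" ] || continue   # skip if the glob matched nothing
        Rscript pca_biplot.R "$csv" "$(png_name "$csv")"
    done
}
```

Calling run_all with no arguments would process every .csv in the current directory and write a .png with the same base name next to each one.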

As a result I have 66 .png files. If you want to create an animation that loops over a collection like this, you need to grab ImageMagick so that you can convert the .png files to .gif. Because I am using a Mac I installed ImageMagick using Homebrew:

-=[biomunky@blacksheep png]=- brew install imagemagick

Then, to convert all the pngs to gif I run convert (imagemagick) using a bash for loop:

for f in *.png; do convert "$f" "${f%.png}.gif"; done

The result? A load of gifs, excellent.

Imagemagick can then create an animation from the collection of gifs:

convert -delay 50 -loop 0 *.gif animated.gif

-delay 50: causes a delay of 50 hundredths of a second (0.5 s) between images.
-loop 0: causes an infinite loop.
*.gif: the input
animated.gif: the name of the output image.
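To get a feel for what a given -delay value means in practice, here is a rough sketch that works out how long one pass through the animation takes. The function name animation_seconds is made up for this example, and it assumes every frame uses the same delay in the default units of 1/100 s.

```shell
# Play time of one loop, given a frame count and an ImageMagick -delay value.
# Assumes the default -delay units of 1/100 s: seconds = frames * delay / 100.
animation_seconds() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", n * d / 100 }'
}

# 66 frames at -delay 50
animation_seconds 66 50
```

For the 66 images here with -delay 50, that comes to 33 seconds per loop.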

The final output can be seen below – click the image and it should open in a new tab.

To understand what you are looking at, read about PCA and biplots.

OS X: bash colours

Add this to your .bash_profile to get some colours going. The current directory is appended to the prompt and coloured bold red. In ls output, directories are cyan (a light blue), executables are red and regular files keep the terminal's default colour.

export CLICOLOR=1
export TERM=xterm-color
export LSCOLORS=gxgxcxdxbxegedabagacad # cyan directories
export LC_CTYPE=en_US.UTF-8
export PS1='\u@\h:\[\e[1;31m\]\W\[\e[0m\]$ '
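The LSCOLORS string is a bit cryptic: it is read as eleven two-letter pairs (foreground, then background), one pair per file type, in the order given in the BSD ls manual. The sketch below decodes a string into something readable. The function names lscolor_name and decode_lscolors and the short file-type labels are my own, and uppercase (bold) letters are not handled.

```shell
# Map one LSCOLORS letter to a colour name (letters as listed in the BSD ls manual).
# Uppercase (bold) variants are not handled in this sketch.
lscolor_name() {
    case "$1" in
        a) echo black ;;
        b) echo red ;;
        c) echo green ;;
        d) echo brown ;;
        e) echo blue ;;
        f) echo magenta ;;
        g) echo cyan ;;
        h) echo "light grey" ;;
        x) echo default ;;
        *) echo "unknown($1)" ;;
    esac
}

# Print "file type: foreground on background" for each pair in an LSCOLORS string.
decode_lscolors() {
    rest=$1
    for kind in directory symlink socket pipe executable \
                block-special char-special setuid-exe setgid-exe \
                sticky-other-writable-dir other-writable-dir; do
        [ -n "$rest" ] || break
        fg=$(printf '%s' "$rest" | cut -c1)
        bg=$(printf '%s' "$rest" | cut -c2)
        echo "$kind: $(lscolor_name "$fg") on $(lscolor_name "$bg")"
        rest=${rest#??}   # drop the pair just decoded
    done
}

decode_lscolors gxgxcxdxbxegedabagacad
```

For the value used above, the first line of output is "directory: cyan on default" and the fifth is "executable: red on default", which matches the colours described.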