PCA (principal component analysis) is a handy procedure, closely related to SVD, that converts a set of observations of possibly correlated variables into values of a set of uncorrelated variables. The uncorrelated variables are called principal components. The outcome is that, given a load of input, we get a bunch of components ordered by the amount of variance they describe, high to low, with each component uncorrelated with all of the components before it. It's a handy method for data reduction and has been used in many areas, including biology.
I took a bunch of proteins with circular dichroism (CD) spectra, parsed the associated PDB files and ran PCA on vectors of various attributes together with the spectral values across a wavelength range associated with synchrotron radiation CD.
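To make the "ordered by variance, uncorrelated" idea concrete, here is a minimal sketch in plain Python for the two-variable case. It is not what R's prcomp does internally (prcomp works via SVD and can also scale the variables). The function name `pca_2d` and the toy numbers are made up for illustration.

```python
import math

def pca_2d(points):
    """PCA for 2-D points: returns (variances, axes, scores)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centred) / (n - 1)
    b = sum(x * y for x, y in centred) / (n - 1)
    c = sum(y * y for _, y in centred) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix, largest first
    half_trace = (a + c) / 2
    spread = math.hypot((a - c) / 2, b)
    variances = [half_trace + spread, half_trace - spread]

    # Direction of the first principal axis; the second is perpendicular
    theta = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))

    # Project the centred data onto the two axes
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1])
              for x, y in centred]
    return variances, (pc1, pc2), scores

# Toy data, invented for this example (two correlated variables)
toy = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
       (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
variances, axes, scores = pca_2d(toy)
```

The first entry of `variances` is never smaller than the second, and the two columns of `scores` have zero covariance (up to rounding), which is exactly the property described above.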
I used R to perform PCA:
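# read the two command-line arguments: input csv, output png
args <- commandArgs(trailingOnly = TRUE)
input <- args[1]
output <- args[2]
data <- read.csv(input, header = TRUE)
# PCA on the scaled data
pcx <- prcomp(data, scale. = TRUE)
png(output, height = 800, width = 800)
biplot(pcx, choices = 1:2, main = input, scale = 1)
# close the png device so the file is written
dev.off()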
The script, executed using Rscript, takes two arguments: an input file of comma-separated attributes (csv), and an output file, which is the name of the .png created by the biplot command. The input files were created using a Python script.
As a result I have 66 .png files. If you want to create an animation that loops over a collection of images, you need to grab ImageMagick so that you can convert each .png to a .gif. Because I am using a Mac, I installed ImageMagick using Homebrew:
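The Python script that built the inputs isn't shown here, so what follows is only a rough sketch of the shape such a file needs: a header line, then one all-numeric row per protein, since prcomp only works on numeric columns. The function name `write_attribute_csv` and the column names in the usage note are hypothetical.

```python
import csv

def write_attribute_csv(path, column_names, rows):
    """Write a header line plus one all-numeric row per protein."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(column_names)
        for row in rows:
            if len(row) != len(column_names):
                raise ValueError("row length does not match the header")
            # float() raises ValueError on anything non-numeric
            writer.writerow([float(value) for value in row])
```

For example, `write_attribute_csv("example.csv", ["helix", "sheet", "cd_190nm"], [[0.4, 0.1, 12.5]])` would produce a file that read.csv can load with header = TRUE (the file name, column names and values here are invented).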
-=[biomunky@blacksheep png]=- brew install imagemagick
Then, to convert all the pngs to gif I run convert (imagemagick) using a bash for loop:
for f in *.png; do convert "$f" "${f%.png}.gif"; done
The result? A load of gifs, excellent.
Imagemagick can then create an animation from the collection of gifs:
convert -delay 50 -loop 0 *.gif animated.gif
-delay 50: causes a delay of 50 hundredths of a second (0.5 s) between images.
-loop 0: causes an infinite loop.
*.gif: the input
animated.gif: the name of the output image.
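Two things are worth checking before running that command: how long one pass of the loop lasts, and what order the frames go in, because a shell glob sorts names as text (so a name ending in 10 comes before one ending in 2). Here is a small sketch in Python. The helper names and the frame names in the usage note are hypothetical, and it only builds the argument list without running anything.

```python
import re

def natural_key(name):
    """Sort key that compares runs of digits as numbers."""
    # re.split with a capture group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def build_convert_command(frames, delay_cs=50, output="animated.gif"):
    """Argument list for: convert -delay N -loop 0 <frames> <output>."""
    ordered = sorted(frames, key=natural_key)
    return ["convert", "-delay", str(delay_cs), "-loop", "0", *ordered, output]

def loop_seconds(frame_count, delay_cs=50):
    """Length of one pass; -delay is given in hundredths of a second."""
    return frame_count * delay_cs / 100
```

With the 66 frames from this post and -delay 50, `loop_seconds(66)` gives 33.0, so one pass of the animation takes 33 seconds. Calling `build_convert_command(["pc10.gif", "pc2.gif", "pc1.gif"])` puts the frames in the order pc1, pc2, pc10.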
The final output can be seen below – click the image and it should open in a new tab.
To understand what you are looking at, read about PCA and biplots.