Connect to Lucid using R

*will work for any JDBC db*

First you will need to install the RJDBC library. I did this using the R package manager (remember to include all dependencies).

The following lines connect to my local Lucid instance, select the number of users from my pretend visitors table and store the result in the ‘result’ variable.

library(RJDBC)
driver <- JDBC("org.luciddb.jdbc.LucidDbClientDriver", "/Users/dan/Desktop/LucidDbClient.jar", identifier.quote="'")
conn <- dbConnect(driver, "jdbc:luciddb:http://localhost:8034", "sa", "")
result <- dbGetQuery(conn, "select count(*) from visitors")
dbDisconnect(conn)

Remember, be a good person and close your connection :)
You can find out more about RJDBC here: http://cran.r-project.org/web/packages/RJDBC/index.html

Luciddb clojure (note to self)

connect to lucid with clojure and do something, anything.


(use 'clojure.java.jdbc)

(def db {:classname "org.luciddb.jdbc.LucidDbClientDriver"
  :subname "http://localhost:8034"
  :subprotocol "luciddb"
  :user "sa"})

(with-connection db
  (with-query-results rs ["select count(*) as ctx from reports.pages"]
    (doseq [row rs] (println (:ctx row)))))

the result is: 1400000 – very exciting. :ctx is the column identifier – this would, in scala/java look something like => rs getInt “CTX” || rs.getInt(“CTX”) if you want to be verbose!

There is a small amount of work that has to be done before this code will run – we must first get the required lucid jars. I followed the steps described here: http://www.pgrs.net/2011/10/30/using-local-jars-with-leiningen/. This resulted in a maven_repository directory in the directory where I ran mvn install. The problem was that the .jar wasn’t copied into that maven repo – only the pom was. I did an ls of ~/.m2 and found a new lucid directory – again it had the pom & no jar. I copied the .jar to the same place as the pom and things worked. The thing I *really* hate about java** (and the langs that use the jvm) is the class path mess – perhaps python/perl have spoilt me.

☁  0.9.4  pwd
/Users/biomunky/.m2/repository/luciddb/luciddb/0.9.4

☁  0.9.4  ls
COPYING                   META-INF                  luciddb-0.9.4.jar         net
FarragoRelease.properties de                        luciddb-0.9.4.pom         org

** the gem system … f&%$ ruby gems (i use zsh/ohmyzsh which, i am told, is part of the problem)

Jython & Lucid

Install jython

 brew install jython

or whatever package installer your machine has.
Install pip

 sudo easy_install pip

Install Virtualenv

 sudo pip install virtualenv

Create new environment

virtualenv -p /usr/local/bin/jython <your-project-name>

Activate virtualenv


cd <your-project-name>

source bin/activate (I use zsh & this works for me)

Install jip

 ./bin/pip install jip

Run a query


from com.ziclix.python.sql import zxJDBC

db = zxJDBC.connect("jdbc:luciddb:http://localhost:8034", "sa", None, "org.luciddb.jdbc.LucidDbClientDriver")

cursor = db.cursor()

cursor.execute("select * from reports.watching where event_id = '757670'")

for row in cursor:
    print row

cursor.close()
db.close()

Clojure Futures & Promises – done the wrong way…

I have an interest in Clojure and figured I would mess about with futures and promises.  The general idea is that two sets of long running jobs are created and kicked off at the same time – each set is a lazy collection of futures.  There’s a third process that requires the results of these before it can start. For this to happen, a function is created that takes a set of futures, kicks them off and then polls for completion – (every? realized? <jobs>).  Once a set of jobs is completed a promise is delivered (deliver <my-promise> <the-value>), and once both are complete the third task can be kicked off.

As per usual, there is probably a better &| idiomatic way to do this, however it does function as intended. You need to look past the Thread/sleeps …

(defn simple-log [msg]
  (spit "/tmp/ftr.log" (str msg "\n") :append true))

(defn create-jobs [jobs]
 (map (fn [job-id]
   (future (do
     (simple-log (str "Starting job: " job-id))
     (Thread/sleep (rand-int 5000))
     (simple-log (str "Finishing job: " job-id))))) jobs))

(defn jobs-complete? [running-tasks id some-promise]
 (if (every? realized? running-tasks)
   (do (simple-log (str "job " id " set complete"))
     (deliver some-promise true))
   (do (simple-log "waiting ...")
       (Thread/sleep 100)
       (recur running-tasks id some-promise))))

(defn runner [jobs id some-promise]
 (let [running-tasks (doall jobs)]
   (jobs-complete? running-tasks id some-promise)))

(def job-set-1 (create-jobs (range 1 100)))
(def job-set-2 (create-jobs (range 100 200)))
(def set-1-complete (promise))
(def set-2-complete (promise))

(runner job-set-1 :a set-1-complete)
(runner job-set-2 :b set-2-complete)

(if (and @set-1-complete @set-2-complete) (simple-log "ALL JOBS DONE!"))

(shutdown-agents)
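
For comparison, here is a rough sketch of the same idea in Python 3 using concurrent.futures. This is not a translation of the Clojure above: the names job and run_job_sets are made up for the example, the sleeps are shortened, and the log is an in-memory list rather than /tmp/ftr.log. As far as I can tell the two runner calls above run one after the other (the second lazy set is only realised once the first runner returns), whereas this sketch submits both sets before waiting on either.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, wait


def job(job_id, log):
    """Pretend to do some work, recording start and finish in `log`."""
    log.append(("start", job_id))
    time.sleep(random.uniform(0, 0.01))  # stand-in for a long-running task
    log.append(("finish", job_id))
    return job_id


def run_job_sets(set_1, set_2, max_workers=8):
    """Submit both sets up front, then block until every job is done."""
    log = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures_1 = [pool.submit(job, i, log) for i in set_1]
        futures_2 = [pool.submit(job, i, log) for i in set_2]
        # wait() blocks until all futures finish, so no polling loop is needed
        done, not_done = wait(futures_1 + futures_2)
    assert not not_done
    return sorted(f.result() for f in done), log


results, log = run_job_sets(range(1, 100), range(100, 200))
print("ALL JOBS DONE!", len(results), "jobs")
```

The call to wait plays the part of the promise: nothing after it runs until both sets have finished.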

Fizzbuzzing hangover

class Fizzbuzz {
  def fizzbuzz(n:Int) = (n % 3, n % 5) match {
      case (0,0) => 'fizzbuzz
      case (0,_) => 'fizz
      case (_,0) => 'buzz
      case _     => n
    }
}

a test

import org.scalatest.FlatSpec
import org.scalatest.matchers.ShouldMatchers

class FizzbuzzTest extends FlatSpec with ShouldMatchers {
  val fb = new Fizzbuzz

  "Fizzbuzz.fizzbuzz(number:Int)" should "return 'fizz if number % 3 == 0" in {
    fb.fizzbuzz(3) should equal ('fizz)
  }
  it should "return 'buzz if number % 5 == 0" in {
    fb.fizzbuzz(5) should equal ('buzz)
  }
  it should "return 'fizzbuzz if number % 5 and 3 == 0" in {
    fb.fizzbuzz(15) should equal ('fizzbuzz)
  }
  it should "return number if number % 5 && 3 != 0" in {
    fb.fizzbuzz(1) should equal (1)
  }
}

compile Fizzbuzz: scalac Fizzbuzz.scala
compile FizzbuzzTest: scalac -cp .:scalatest-1.7.2/scalatest-1.7.2.jar FizzbuzzTest.scala
run the test: scala -cp .:scalatest-1.7.2/scalatest-1.7.2.jar org.scalatest.tools.Runner -p . -o -s FizzbuzzTest

Run starting. Expected test count is: 4
FizzbuzzTest:
Fizzbuzz.fizzbuzz(number:Int) 
- should return 'fizz if a number % 3 == 0
- should return 'buzz if a number % 5 == 0
- should return 'fizzbuzz if a number % 5 and 3 == 0
- should return number if a number % 5 && 3 != 0
Run completed in 191 milliseconds.
Total number of tests run: 4
Suites: completed 1, aborted 0
Tests: succeeded 4, failed 0, ignored 0, pending 0
All tests passed.

If you don’t want to use the Runner then you can start a REPL with the -cp to include . and scalatest. Then in the REPL type: (new FizzbuzzTest).execute() … it should look like this:

scala> (new FizzbuzzTest).execute()
FizzbuzzTest:
Fizzbuzz.fizzbuzz(number:Int) 
- should return 'fizz if a number % 3 == 0
- should return 'buzz if a number % 5 == 0
- should return 'fizzbuzz if a number % 5 and 3 == 0
- should return number if a number % 5 && 3 != 0

Also, if you like beer, I suggest getting some Wheat beer from Camden town brewery.

Coloured horizontal bar graphs

R is great for creating some acceptable looking graphs, much more so than GnuPlot (which is also awesome). R plots are decent, however there is a library called ggplot that is even better. I wanted to create some simple graphs of data and add a little colour – red for +ve values, blue for -ve. This is fairly simple to achieve with R and ggplot.

I can’t share the data that I am using or explain what it is. I load the file content into a data.frame and then use the names to extract an entry of interest.


setwd("path/to/data/dir")
library(ggplot2)

data <- read.table("data", h=T, sep=",")

This reads the data in.


plot <- ggplot(data, aes(fill=componentToFill, x=X, y=component)) + 
geom_bar() +
coord_flip() + 
scale_y_continuous("") + 
scale_x_discrete("") + 
scale_fill_gradient2(low="blue", high="red")

component and componentToFill are things in your data frame, again I’ve had to change my names despite the fact you don’t have the data (I don’t want to get in trouble).


ggsave(filename="example.jpg", plot=plot, dpi=1000, width=7, height=5)

It’s probably a good idea to look at the plot, in this case we will save to a file called example.jpg.

The image I had looks like this.

[Image: horizontal bar graph, red bars for +ve values and blue for -ve]

Anagrams with python

I was asked to write a bit of python code to find anagrams. The code consists of a class with one method and a unittest.TestCase companion. The anagram class takes a list of words and performs some crude checks to make sure the class word list is not empty. Passing a word to the get_anagrams will return a set([]) of all anagrams for that word. For example, passing in ‘cat’ will return set([‘act’, ‘cat’]).

The word list used here is found in /usr/share/dict/words, this should be ok for mac and linux users, anyone using windows will have to provide a word list and change the tests.

#!/usr/bin/env python

import itertools
import unittest
import types
import os

class Anagram( object ):
    def __init__(self, words):
        """
        Create an array containing words supplied in <words>.
        Raises errors if 1) words isn't a list type
                         2) words turns out to be empty or contains only non-StringType(s)
        """
        if not isinstance(words, types.ListType):
            raise TypeError("Anagram takes a list of strings")
        else:
            self.words = [ i for i in words if isinstance( i, types.StringType )]

            if len(self.words) == 0:
                raise ValueError("Empty word list")
            
    def get_anagrams(self, word):
        """
        Given a word returns a set of anagrams for <word>
        Example: x.get_anagrams('cat') would return set(['act', 'cat'])
        """
        if not isinstance(word, types.StringType):
            raise TypeError("get_anagrams takes a StringType")

        #Get a list of words from self.words that are the same length as word
        filtered_words = set([i for i in self.words if len(i) == len(word)])

        #create a set of all possible permutations of word - create tuple of chars
        #then join the ('d', 'a', 'n') to form  string - 'dan'
        word_permutations = set([ ''.join(w) for w in itertools.permutations( word )])

        #use set.intersection to find all 'real' words, specifically those found in <words>
        return  word_permutations.intersection(filtered_words)

class TestAnagram( unittest.TestCase ):
    def test_constructor(self):
        self.assertRaises(TypeError,  Anagram, 1)
        self.assertRaises(TypeError,  Anagram, "cat")        
        self.assertRaises(TypeError,  Anagram, ())
        self.assertRaises(ValueError, Anagram, [1,2])        
    
    def setUp(self):
        filename   = '/usr/share/dict/words'
        if not os.path.isfile(filename):
            raise IOError( "Sample word file does not exist" )
    
        self.words = [i.strip() for i in open(filename).readlines()]
        self.ana   = Anagram(self.words)

    def test_get_anagrams(self):
        cat_like = self.ana.get_anagrams('cat')
        python_no_mates   = self.ana.get_anagrams('python')
        self.assertEqual( cat_like, set(['act','cat']) )
        self.assertEqual( python_no_mates,  set([]) )
        
    def test_argument_get_anagrams(self):
        self.assertRaises(TypeError, self.ana.get_anagrams, 1)
        self.assertRaises(TypeError, self.ana.get_anagrams, [1])
        self.assertRaises(TypeError, self.ana.get_anagrams, {})
        
if __name__ == '__main__':
    unittest.main()
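
Generating every permutation gets slow quickly (a 10 letter word has 3,628,800 orderings). A different technique, which is not what the class above does, is to index the word list by each word's sorted letters, so that all anagrams share a key. Below is a minimal Python 3 sketch of that idea; build_index and the tiny word list are made up for the example.

```python
from collections import defaultdict


def build_index(words):
    """Group words by their sorted letters, e.g. 'act' and 'cat' -> 'act'."""
    index = defaultdict(set)
    for word in words:
        index["".join(sorted(word))].add(word)
    return index


def get_anagrams(index, word):
    """Return the set of indexed words using exactly the letters of `word`."""
    return set(index.get("".join(sorted(word)), set()))


index = build_index(["act", "cat", "dog", "god", "python"])
print(get_anagrams(index, "cat"))   # the set containing 'act' and 'cat'
print(get_anagrams(index, "java"))  # set()
```

The index is built once, after which each lookup is a single sort and a dictionary access.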

Playing with Scalatra, Jetty and EC2

Last week I met a friend before going to a Clojure function at Skills Matter in London. He’s a Ruby fanatic – Sinatra and Rails, they’re grrreeeaaaat. I agree with him, the little bits of rails and Sinatra I have done have been enjoyable (much like Flask and Django, without the fsking white space delimiting). Somehow we got started on Scala, that led to Scalatra – the Scala version of Sinatra. It looks interesting and isn’t Lift, figured it would be worth a quick look.

I downloaded the latest sbt and created an alias in my bash_profile as sbt2 (I have an older version of sbt too).

alias sbt2="java -Xmx1500M -jar /Users/biomunky/svn/sbt-launch.jar $@"

I then followed the instructions here to download it via sbt (not g8).

This gives you a blank-ish template. I modified src/main/scala/.scala – it should be the only file in the directory. To get something up and running quickly I made it look like this:

import org.scalatra._
import java.net.URL
import scalate.ScalateSupport

class PlayTimeServlet extends ScalatraServlet with ScalateSupport {
  get("/") {
    <html>
      <body>
        <h1>Hello Internet Person.</h1>
				<p>Would you like to try some <a href="form">simple maths?</a></p>
      </body>
    </html>
  }
	
	get("/form") {
		<form action="/form/math" method="post">
				<input name="first_value"  type="text" value="" />
				<select name="operation">
				  <option value="plus">+</option>
				  <option value="minus">-</option>
				  <option value="divide">/</option>
				  <option value="multiply">*</option>
				</select>
				<input name="second_value" type="text" value="" />
	      <input type="submit" value="Submit" />					
	   </form>
	}

	post("/form/math") {
		val param1 = params.getOrElse("first_value",  "0")
		val param2 = params.getOrElse("second_value", "0")
		val operation = params.getOrElse("operation", "plus")
		val result = try { 
			operation match {
				case "plus"     => (param1.toDouble + param2.toDouble).toString 
				case "minus"    => (param1.toDouble - param2.toDouble).toString 
				case "multiply" => (param1.toDouble * param2.toDouble).toString 
				case "divide"   => if (param2.toDouble != 0.0) { 
						(param1.toDouble / param2.toDouble).toString
					} else {
						"Divide by Zero!"
					}
			}	
		} catch { 
			case _ => "I can't do that math." 
		}
        <html>
          <body>
            <p>The result is: {result}</p>
          </body>
        </html>
	}
			
  notFound {
    // Try to render a ScalateTemplate if no route matched
    findTemplate(requestPath) map { path =>
      contentType = "text/html"
      layoutTemplate(path)
    } orElse serveStaticResource() getOrElse resourceNotFound() 
  }
}
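
The arithmetic in the POST handler is easier to poke at when it is pulled out into a plain function. Here is a small Python 3 sketch of that logic, not Scalatra code: calculate and OPERATIONS are names made up for the example, and the sketch assumes only a divisor of exactly zero should produce the "Divide by Zero!" message.

```python
import operator

OPERATIONS = {
    "plus": operator.add,
    "minus": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def calculate(first_value, second_value, operation="plus"):
    """Mirror the POST handler: parse two numbers and apply the operation."""
    try:
        a, b = float(first_value), float(second_value)
        if operation == "divide" and b == 0.0:
            return "Divide by Zero!"
        return str(OPERATIONS[operation](a, b))
    except (ValueError, KeyError):
        # unparseable number or unknown operation
        return "I can't do that math."


print(calculate("2", "2", "plus"))      # 4.0
print(calculate("1", "0", "divide"))    # Divide by Zero!
print(calculate("1", "-2", "divide"))   # -0.5
print(calculate("two", "2", "plus"))    # I can't do that math.
```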

If you start Jetty and navigate to localhost:8080 you should see a page that reads “Would you like to try some simple maths?” – how very exciting!

Clicking the link will get a web form. Fill the details 2 + 2. Click submit. A POST! You should now see the result of the expression.

So that’s all fine on localhost, I wanted to see it outside sbt in another jetty server – specifically something I had on ec2. This means creating a .war file – this is achieved by running sbt(2) package. The war can be found under target/scala-2.9.1/.

Get a copy of jetty (tar zxvf ) and place the war under webapps/ in the jetty home directory. Start jetty by entering: java -jar start.jar. Browse to localhost:8080 and you will see the same site as above. Enter the same data (or whatever expression you want) and you should see the same result — i lied, it should’ve crashed, if it didn’t – goody for you.

I had to change a couple of things. I removed webapps/test.war and its context file found in contexts/ (don’t delete it just yet). I then created a file called playtime.xml in contexts/ -> playtime is the name of my project (the war is also called playtime.war). Rename the test.xml context to .war. Change line 23
from

/webapps/test.war to /webapps/playtime.war.

If you retry submitting the form, having restarted the server, you should now get the results page rather than the error.

Since I’ve been goofing about with ec2 (because you can get some stuff free for a year) I dumped it there. If you do this, you will also have to open your ec2 instance to port 8080. You can do this via: aws management console -> Network & Security -> Security Groups -> your-active-security-profile.

Biologists are using Haskell?

I find Haskell interesting, I’ve tried to work my way through Real World Haskell and failed miserably. I then tried Learn You a Haskell, it’s a really great book, the author does a great job of introducing new concepts at the right speed for me … until chapter 11. However, some biologists are using it to do some interesting work.

So I figured why not try doing something with it, something I won’t use, that isn’t complicated.

import System.IO
import Data.List

parseFasta :: [(String,String)] -> [String] -> [(String, String)]
parseFasta acc [] = reverse acc  -- records were prepended, so restore file order
parseFasta acc (x:xs)
    | ">" `isPrefixOf` x = parseFasta ((x, "") : acc) xs
    | otherwise = parseFasta (appendString acc x) xs

appendString :: [(String,String)] -> String -> [(String, String)]
appendString acc newstring = case acc of (name,seq):xs -> (name,seq++newstring):xs
                                         _ -> [(newstring, "")]

main = do
        content <- readFile "1HNN.fasta"
        let fasta = parseFasta [] (lines content)
        mapM_ print fasta
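
For anyone more at home in python, the same accumulate-as-you-go idea looks roughly like this in Python 3. It is a sketch rather than a port: parse_fasta and the example records are made up, and unlike the Haskell it silently drops any sequence lines that appear before the first header.

```python
def parse_fasta(lines):
    """Return a list of (header, sequence) tuples in file order."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records.append([line, ""])      # new record
        elif records:
            records[-1][1] += line          # extend the current sequence
    return [(name, seq) for name, seq in records]


# hypothetical input, standing in for the lines of a FASTA file
example = [">seq1 first", "ACGT", "TTGA", ">seq2 second", "GGCC"]
for record in parse_fasta(example):
    print(record)
```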

For the stuff I am doing at the moment I can’t justify learning Haskell – shame.

python, scala and clojure – old notes.

Below are some notes I made when initially looking at Scala and Clojure.
(List(1,2,3) :\ 0.0) (_+_)

Where’s my for loop?

Counting residues … start with a list of characters, something like this in python.

characters = ['a','b','b','c','c','c','d','d','d','d']

What we want is a count of the number of each residue, therefore what we want is a mutable data structure – specifically an empty dictionary (also called associative arrays, maps, hashmap, kevin) and a for loop.

counts = {}
for achar in characters:
    if achar in counts:
        counts[achar] += 1
    else:
        counts[achar] = 1

great, counts is a dictionary, which can also be declared using counts = dict(). We then loop over each character (achar) in characters, if the counts dictionary contains achar the count is incremented (+= 1) otherwise a new key is created in the dictionary and associated with the value 1 (counts[achar] = 1).

the output of this will be a populated dictionary

{'a': 1, 'c': 3, 'b': 2, 'd': 4}

the same outcome can be reached using fewer lines of code:

counts = {}
for achar in characters:
	counts[achar] = counts.get(achar, 0) + 1

This time we used the dictionary method .get. The method returns the value for the key if it’s in the dictionary and 0 if it isn’t. Without the 0 .get would return None – you can’t add None and 1!

So we’ve already got what we wanted but there’s yet another way – we could use a list comprehension and the list method .count to do the work

>>> characters.count('d')
4

Wrap the .count call in a list comprehension

>>> counts = [ (x, characters.count(x)) for x in set(characters)]

The result is however different. We are presented with a list of tuples

[('a', 1), ('c', 3), ('b', 2), ('d', 4)]

but we wanted a dictionary, well you can just wrap the comprehension in a dict()

>>> counts = dict([ (x, characters.count(x)) for x in set(characters)])
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
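
Worth knowing: the standard library already has a tool for this, collections.Counter (available from Python 2.7 onwards). A short Python 3 sketch, using the same characters list:

```python
from collections import Counter

characters = ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd']

counts = Counter(characters)
print(counts['d'])               # 4
print(counts['z'])               # 0, missing keys count as zero
print(counts.most_common(1))     # [('d', 4)]
```

Counter is a dict subclass, so everything that works on the dictionaries above works on it too.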

We can also do this using a reduce and a function (since an assignment can’t be made directly in the lambda, we call with another function !!). This isn’t something to do in ‘real’ code – but may help with understanding clojure/scala

def addToDict( amap, achar ): 
	amap[achar] = amap.get(achar, 0) + 1
	return amap

counts = reduce( lambda x,y: addToDict(x,y), characters, {} )
print counts

So, first we define a function that takes a dictionary (called amap) and a character (achar), sets/increments the count and returns the dictionary – it’s identical to the example above – except that it now carries extra overhead. We then use the global reduce function and a lambda to call the function with the dictionary (the {} at the end of the reduce line) and each character in characters until the entire sequence has been covered.

If the above isn’t obvious then perhaps the following will help:

>>> thing_to_reduce = [1,2,3]
>>> the_function_to_apply = lambda running_total, i: running_total + i
>>> xsum = reduce( the_function_to_apply, thing_to_reduce, 0)
6
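
The snippets above are Python 2. In Python 3 reduce has moved to functools and print is a function, so the dictionary example would look something like the sketch below. The second reduce is an extra that builds a new dict at every step instead of mutating one, which is closer in spirit to a fold over immutable maps; add_to_dict and immutable_counts are just names chosen for the example.

```python
from functools import reduce  # reduce is no longer a builtin in Python 3


def add_to_dict(amap, achar):
    """Increment the count for achar and hand the same dict back to reduce."""
    amap[achar] = amap.get(achar, 0) + 1
    return amap


characters = ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd']
counts = reduce(add_to_dict, characters, {})
print(counts)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# Non-mutating variant: each step returns a fresh dict
immutable_counts = reduce(
    lambda amap, achar: {**amap, achar: amap.get(achar, 0) + 1},
    characters,
    {},
)
print(immutable_counts == counts)  # True
```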

Should you do this in your code? No, probably not! So why bother? It may be helpful in working out how to achieve the same thing in the functional languages that all the cool kids are using.

# SCALA #

It’s perhaps less of a departure from languages like perl and python than clojure and haskell, although much like clojure, the docs often make reference/comparison to Java code – not much use if you don’t know Java. It also has two types of ‘variable’ – val and var. var is an indicator that you can mutate while val indicates that the item pointed to is immutable (this isn’t totally true).

scala> var x = 1
x: Int = 1

scala> x = 2
x: Int = 2

scala> val x = 1
x: Int = 1

scala> x = 2
<console>:8: error: reassignment to val
       x = 2
         ^

Again, we define characters as a list of chars

val characters:List[Char] = List('a','b','b','c','c','c','d','d','d','d')
// the :List[Char] can be dropped
val characters = List('a','b','b','c','c','c','d','d','d','d')

With scala we can import a mutable map and then use a for loop to populate it

import scala.collection.mutable.Map
val mutableCounter = Map[Char,Int]()
for ( achar <- characters ) mutableCounter(achar) = (mutableCounter.getOrElse(achar,0) + 1)
println( mutableCounter )

and from this we get: Map(c -> 3, a -> 1, d -> 4, b -> 2). A value can be extracted from the map using the .get method – mutableCounter.get('a') – which returns an Option (here Some(1)). Ideally we want to avoid mutable state (avoiding it helps when delving into the world of concurrent programming); to achieve this we can use immutable collections and foldLeft on the characters List.

import scala.collection.immutable.Map
val foldCounts = characters.foldLeft(Map[Char,Int]()) {
    (amap, achar) =>  amap ++ Map(achar -> (amap.getOrElse(achar, 0) + 1  )) }
//res4: scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4)

but, you can make this look pythonic again, this time making use of the count function which belongs to List.

println( characters.count(_ == 'd') )

To do it for all chars

characters.distinct map { achar => (achar, characters.count( _ == achar ) ) } 

to get a map, rather than List[(Char, Int)], append .toMap on the end of the line

scala> characters.distinct map { achar => (achar, characters.count( _ == achar ) ) } toMap
res10: scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4)

I think that the fold is the best method to use although I haven’t tested it.

# CLOJURE #
Clojure is a bit different, we don’t have the same options of vars and vals and, in addition, there are lots of parens to deal with!

Starting with the same thing – a vector of characters

(def characters [ \a \b \b \c \c \c \d \d \d \d])

you can check the class of characters by typing the following at the REPL

(def characters [ \a \b \b \c \c \c \d \d \d \d])
(class \a)
(class characters)

you should see: java.lang.Character and clojure.lang.PersistentVector

It may be tempting to write a function that loops over our vector of characters and builds a dictionary of characters and their counts

(defn tally-characters-v1 [chars mydict]
  (if (empty? chars) mydict
      (let [achar (first chars)]
	(tally-characters-v1
	 (rest chars)
	 (assoc mydict achar (inc (mydict achar 0)))))))
	
(println (tally-characters-v1 characters {}))

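Copy and paste the above into a REPL and you should see a map printed; the function returns a clojure.lang.PersistentArrayMap. If you bind the result to a name, for example (def counts (tally-characters-v1 characters {})), values can be looked up by key, much as you can do in python: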

user=> (counts \d)
4

That function called itself, plus an empty map had to be passed in. You can avoid this step by altering the declaration

(defn tally-characters-v1
  ([inchars] (tally-characters-v1 inchars {}))
  ([inchars counts] ...))

Now the function can be called by passing just the characters vector.

As i understand it functions should make use of the recur special form rather than self-calls.

(defn tally-characters-v2 [chars]
  (loop [coll chars mydict {}]
    (if (empty? coll) mydict
	(recur (rest coll)
	       (assoc mydict (first coll)
		 (inc (mydict (first coll) 0)))))))

that does as expected and produces a clojure.lang.PersistentArrayMap, but why bother creating a function using defn when reduce and an anonymous function will do just fine?

(def counts (reduce
	     (fn [amap achar]
	       (assoc amap achar (inc (amap achar 0))))
	     {} characters))

but, you know we don’t even need to do that when writing code for use in real applications. Clojure has a function called frequencies that does this

(println (frequencies characters))

nice.

The code block below shows the latest version of frequencies as defined in clojure.core

(defn frequencies
  "Returns a map from distinct items in coll to the number of times they appear."
  {:added "1.2" :static true}
  [coll]
  (persistent!
   (reduce (fn [counts x]
	     (assoc! counts x (inc (get counts x 0))))
	   (transient {}) coll)))

This code has not been checked for timing nor has any attempt been made to make it optimal or align with any programming best practice. It’s a collection of notes i made when learning something new.