Roman Numerals – the default dojo option

All code dojos start with roman numerals. It’s the default: not exciting, but it serves a purpose – it gets your own idea onto the board. As such, I’ve only ever done the roman numerals dojo once, at a work scala dojo while I was attempting to debug some R. It wasn’t a great experience as I didn’t really get to participate.

Today I was woken up at about 6:30 by an annoying cat and, unable to get back to sleep, figured I would do something constructive – have a quick stab at doing the roman numeral thing.

So, my hack looks like this:

This will take us from “MCMLXXXIV” to 1984 – great. I recall in the scala dojo that some people took the option of reversing the sequence then mapping a function over the reverse, this saves the use of an accumulator – for some reason I don’t like doing the reverse then a map … it’s not like this is a chunk of code where performance is critical but …

Here’s an example of the other solutions I’ve seen that use reverse. They all rely on a helper like roman-helper; in the following case it is called ‘add-numeral’:

(defn roman [s]
    (reduce add-numeral (map numerals (reverse s))))

Talking to a friend at work, he came up with a slightly different version that removed the recur call – something I like to do. This allowed me to create another solution that I think is nicer – the complete gist can be found here on github.

The updated solution follows:

(defn roman-helper [[a b]]
  (let [curnt (roman-lookup a 0) nxt (roman-lookup b 0)]
  (if (> nxt curnt) (- curnt) curnt)))

(defn roman->decimal [roman]
  (let [as-seq (conj (into [] roman) nil)
        pairs (partition 2 1 as-seq)]
    (apply + (map roman-helper pairs))))

There are a couple of changes here. The first is a small change to the roman-helper function: the accumulator has been removed, so the function simply returns the numeral’s value, negated if the current letter is worth less than the next. The roman->decimal function has changed a little more, with most of the action happening in the let binding.

The first step

(conj (into [] roman) nil)

turns the string into a vector of characters and then slaps a nil on the end. I am not as bothered by this as the reverse because conj takes essentially constant time – not that this matters at all in this bit of code golf.

into vs (vec “XXX”)

When dealing with big collections, (into [] “xxx”) is about 30% faster than vec, thanks to its use of transients. Since we aren’t using a big collection here I could’ve written the block as

(conj (vec roman) nil)

but I don’t think it adds much to the clarity of the code.

The second step

(partition 2 1 as-seq)

We use the vector created in step 1 to build a sequence of overlapping pairs. The 2 is the number of elements per partition and the 1 is the step, i.e. how far we move along before starting the next partition. partition only emits complete partitions, so without the trailing nil the last numeral would never appear as the first element of a pair.

The final step

(apply + (map roman-helper pairs))

Convert to numbers and add ’em up.
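
For comparison, here is a rough Python sketch of the same pairwise idea. It is not the code from the gist: ROMAN_VALUES and roman_to_decimal are names made up for this example, and a padding 0 plays the role of the trailing nil.

```python
# Hypothetical lookup table; the post's own roman-lookup is in the gist and not shown here.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_decimal(roman):
    """Sum each numeral's value, negating it when the next numeral is larger."""
    values = [ROMAN_VALUES[ch] for ch in roman]
    # Pad with 0 so the last numeral still gets a partner (the job the nil does above).
    pairs = zip(values, values[1:] + [0])
    return sum(-cur if nxt > cur else cur for cur, nxt in pairs)


print(roman_to_decimal("MCMLXXXIV"))  # 1984
```

Like the Clojure version, this needs no reverse and no explicit accumulator.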

Going in the reverse direction.
I cheated. Clojure has pprint, which has a format function that allows us to create a partial function. The partial function can be called with a number e.g.
(decimal->roman 1984) and produce “MCMLXXXIV”.
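
The definition of decimal->roman isn't shown above, so the following is not it. As a hand-rolled alternative to leaning on a formatting library, a greedy conversion might look roughly like this in Python; NUMERALS and decimal_to_roman are names invented for the sketch.

```python
# Value/symbol pairs from largest to smallest, including the subtractive forms.
NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def decimal_to_roman(n):
    """Greedily take as many of each symbol as fit, largest first."""
    parts = []
    for value, symbol in NUMERALS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


print(decimal_to_roman(1984))  # MCMLXXXIV
```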

Reverse Array/List: Interview Question

I was having a chat with a couple of guys at work about interviews and their experiences at various companies and there were a couple of common problems that popped up: fizzbuzz, reverse an array and which number has been removed from the array – you are allowed only one pass and you know what the supposed total of the complete array is.

I have an entry on here somewhere about fizzbuzz so won’t go over it.  The number missing from the array is simple: sum the values left in the array and subtract from the total.
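
The missing-number trick is short enough to sketch in Python. The function name and the sample data are made up for illustration.

```python
def missing_number(remaining, expected_total):
    """One pass over what is left; the difference is the value that was removed."""
    return expected_total - sum(remaining)


full = list(range(1, 11))                 # 1..10, which sums to 55
remaining = [x for x in full if x != 7]   # pretend 7 was removed
print(missing_number(remaining, sum(full)))  # 7
```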

Reverse an array is a bit more fun than the missing number and is more about recursion than fizzbuzz.  Since it’s sunday night, I am somewhat bored and my wife is looking at cat pictures, I figured I would chuck out a couple of implementations in a couple of languages.

Elixir- based on the Erlang vm has ruby-ish syntax & looks like a really nice new language

defmodule Hack do
  # `\\` is the current default-argument syntax (older Elixir used `//`)
  def reverse_list(list, acc \\ [])
  def reverse_list([h | t], acc), do: reverse_list(t, [h | acc])
  def reverse_list([], acc), do: acc
end

This code requires that you compile it first using elixirc and then fire up iex. You can also run it as a script (.exs) as shown here, but I’ve got a couple of issues with that (see code annotations)

The same can be achieved using foldl

List.foldl([1,2,3], [], fn i, acc -> [i | acc] end)

Scala- based on the Java vm

def reverser[T](x:List[T]): List[T] = x match {
    case Nil => List.empty
    case x::xs => reverser(xs) :+ x
}

you can also do this with a foldLeft

List(1,2,3).foldLeft(List[Int]())( (a, b) => b +: a)

Clojure – based on the java vm

(defn reverser [coll] (reduce conj '() coll))
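
Since an interviewer may well ask for it in Python too, here is a sketch of the same two shapes: recursion with an accumulator, and a fold. The function names are invented for the example.

```python
from functools import reduce


def reverse_recursive(items, acc=None):
    """Peel off the head and push it onto the front of the accumulator."""
    acc = [] if acc is None else acc
    if not items:
        return acc
    head, *tail = items
    return reverse_recursive(tail, [head] + acc)


def reverse_fold(items):
    """The same idea as a left fold: each item is prepended to the result so far."""
    return reduce(lambda acc, item: [item] + acc, items, [])


print(reverse_recursive([1, 2, 3]))  # [3, 2, 1]
print(reverse_fold([1, 2, 3]))       # [3, 2, 1]
```

Both copy the accumulator on every step, so they are for showing the recursion rather than for real use; reversed() or slicing is what Python code would normally reach for.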

Luciddb clojure (note to self)

connect to lucid with clojure and do something, anything.

(use '

(def db {:classname "org.luciddb.jdbc.LucidDbClientDriver"
  :subname "http://localhost:8034"
  :subprotocol "luciddb"
  :user "sa"})

(with-connection db
  (with-query-results rs ["select count(*) as ctx from reports.pages"]
  (doseq [row rs] (println (:ctx row)))))

the result is: 1400000 – very exciting. :ctx is the column identifier – this would, in scala/java look something like => rs getInt “CTX” || rs.getInt(“CTX”) if you want to be verbose!

There is a small amount of work that has to be done before this code will run – we must first get the required lucid jars. I followed the steps described here. This resulted in a maven_repository in the directory where I ran mvn install. The problem was that the .jar wasn’t copied into that maven repo – only the pom was. I did an ls of ~/.m2 and found a new lucid directory – again it had the pom & no jar. I copied the .jar to the same place as the pom and things worked. The thing I *really* hate about java** (and the langs that use the jvm) is the class path mess – perhaps python/perl have spoilt me.

☁  0.9.4  pwd

☁  0.9.4  ls
COPYING                   META-INF                  luciddb-0.9.4.jar         net de                        luciddb-0.9.4.pom         org

** the gem system … f&%$ ruby gems (i use zsh/ohmyzsh which, i am told, is part of the problem)

Clojure Futures & Promises – done the wrong way…

I have an interest in Clojure and figured I would mess about with futures and promises. The general idea is that two sets of long running jobs are created and kicked off at the same time – both are lazy collections of futures. There’s a third process that requires the results of these before it can start. For this to happen, a function is created that takes the two collections of futures, kicks them off and then polls for completion – (every? realized? <jobs>). Once a set of jobs is completed a promise is updated (deliver <my-promise> <the-value>), and once both are complete the third task can be kicked off.

As per usual, there is probably a better &| idiomatic way to do this, however it does function as intended. You need to look past the Thread/sleeps …

(defn simple-log [msg]
  (spit "/tmp/ftr.log" (str msg "\n") :append true))

(defn create-jobs [jobs]
 (map (fn [job-id]
   (future (do
     (simple-log (str "Starting job: " job-id))
     (Thread/sleep (rand-int 5000))
     (simple-log (str "Finishing job: " job-id))))) jobs))

(defn jobs-complete? [running-tasks id some-promise]
 (if (every? realized? running-tasks)
   (do (simple-log (str "job " id " set complete"))
     (deliver some-promise true))
   (do (simple-log "waiting ...")
       (Thread/sleep 100)
       (recur running-tasks id some-promise))))

(defn runner [jobs id some-promise]
 (let [running-tasks (doall jobs)]
   (jobs-complete? running-tasks id some-promise)))

(def job-set-1 (create-jobs (range 1 100)))
(def job-set-2 (create-jobs (range 100 200)))
(def set-1-complete (promise))
(def set-2-complete (promise))

(runner job-set-1 :a set-1-complete)
(runner job-set-2 :b set-2-complete)

(if (and @set-1-complete @set-2-complete) (simple-log "ALL JOBS DONE!"))
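
For contrast, here is a small Python sketch of the same workflow that swaps the sleep-and-poll loop for a blocking wait. This is a different technique from the Clojure above, built on Python's concurrent.futures; run_job, run_job_sets and the short sleep are stand-ins invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def run_job(job_id):
    """Stand-in for a long running job."""
    time.sleep(0.01)
    return job_id


def run_job_sets(first_ids, second_ids):
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Submit both sets before waiting on either, so they run concurrently.
        first = [pool.submit(run_job, i) for i in first_ids]
        second = [pool.submit(run_job, i) for i in second_ids]
        wait(first + second)  # blocks until every future is done, no polling loop
        return [f.result() for f in first], [f.result() for f in second]


set_1, set_2 = run_job_sets(range(1, 4), range(100, 103))
print("ALL JOBS DONE!", set_1, set_2)
```

The third task would simply go after the call to run_job_sets, since that call only returns once both sets have finished.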


python, scala and clojure – old notes.

Below are some notes I made when initially looking at Scala and Clojure.
(List(1,2,3) :\ 0.0) (_+_)

Where’s my for loop?

Counting residues … start with a list of characters, which in python looks something like this:

characters = ['a','b','b','c','c','c','d','d','d','d']

What we want is a count of the number of each residue, therefore what we want is a mutable data structure – specifically an empty dictionary (also called associative arrays, maps, hashmap, kevin) and a for loop.

counts = {}
for achar in characters:
    if achar in counts:
        counts[achar] += 1
    else:
        counts[achar] = 1

great, counts is a dictionary, which can also be declared using counts = dict(). We then loop over each character (achar) in characters, if the counts dictionary contains achar the count is incremented (+= 1) otherwise a new key is created in the dictionary and associated with the value 1 (counts[achar] = 1).

the output of this will be a populated dictionary

{'a': 1, 'c': 3, 'b': 2, 'd': 4}

the same outcome can be reached using fewer lines of code:

counts = {}
for achar in characters:
	counts[achar] = counts.get(achar, 0) + 1

This time we used the dictionary method .get. The method returns the value for the key if it’s in the dictionary and 0 if it isn’t. Without the 0 .get would return None – you can’t add None and 1!

So we’ve already got what we wanted but there’s yet another way – we could use a list comprehension and the list method .count to do the work

>>> characters.count('d')

Wrap the .count call in a list comprehension

>>> counts = [ (x, characters.count(x)) for x in set(characters)]

The result is, however, different. We are presented with a list of tuples

[('a', 1), ('c', 3), ('b', 2), ('d', 4)]

but we wanted a dictionary, well you can just wrap the comprehension in a dict()

>>> counts = dict([ (x, characters.count(x)) for x in set(characters)])
{'a': 1, 'c': 3, 'b': 2, 'd': 4}

We can also do this using a reduce and a function (since an assignment can’t be made directly in the lambda, we call with another function !!). This isn’t something to do in ‘real’ code – but may help with understanding clojure/scala

def addToDict( amap, achar ): 
	amap[achar] = amap.get(achar, 0) + 1
	return amap

from functools import reduce  # reduce is a builtin in Python 2; Python 3 needs this import

counts = reduce( lambda x,y: addToDict(x,y), characters, {} )
print(counts)

So, first we define a function that takes a dictionary (called amap) and a character (achar), sets/increments the count and returns the dictionary – it’s identical to the example above – except that it now carries extra overhead. We then use the global reduce function and a lambda to call the function with the dictionary (the {} at the end of the reduce line) and each character in characters until the entire sequence has been covered.

If the above isn’t obvious then perhaps the following will help:

>>> thing_to_reduce = [1,2,3]
>>> the_function_to_apply = lambda running_total, i: running_total + i
>>> xsum = reduce( the_function_to_apply, thing_to_reduce, 0)

Should you do this in your code? No, probably not! So why bother? It may be helpful in working out how to achieve the same thing in the functional languages that all the cool kids are using.
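
For everyday Python, the standard library already covers this case. A minimal sketch using collections.Counter (a real standard-library class; the variable names simply mirror the ones used above):

```python
from collections import Counter

characters = ['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd']

# Counter is a dict subclass that tallies hashable items in a single pass.
counts = Counter(characters)

print(counts['d'])   # 4
print(dict(counts))
```

Because Counter is a dict subclass, counts.get('z', 0) and the other dictionary methods still work on the result.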


Scala is perhaps less of a departure from languages like perl and python than clojure and haskell are, although, much like clojure, its docs often make reference/comparison to Java code – not much use if you don’t know Java. It also has two types of ‘variable’ – val and var. var indicates something you can reassign, while val indicates that the name can’t be rebound (the item it points to may still be mutable, so calling it immutable isn’t totally true).

scala> var x = 1
x: Int = 1

scala> x = 2
x: Int = 2

scala> val x = 1
x: Int = 1

scala> x = 2
<console>:8: error: reassignment to val
       x = 2

Again, we define characters as a list of chars

val characters:List[Char] = List('a','b','b','c','c','c','d','d','d','d')
// the :List[Char] can be dropped
val characters = List('a','b','b','c','c','c','d','d','d','d')

With scala we can import mutable maps and then use a for loop to populate it

import scala.collection.mutable.Map
val mutableCounter = Map[Char,Int]()
for ( achar <- characters ) mutableCounter(achar) = (mutableCounter.getOrElse(achar,0) + 1)
println( mutableCounter )

and from this we get: Map(c -> 3, a -> 1, d -> 4, b -> 2). A value can be extracted from the map using the .get method – mutableCounter.get('a') – which returns an Option (Some(1) here). Ideally we want to avoid mutable state (that helps when delving into the world of concurrent programming); to achieve this we can use immutable collections and foldLeft on the characters List.

import scala.collection.immutable.Map
val foldCounts = characters.foldLeft(Map[Char,Int]()) {
    (amap, achar) =>  amap ++ Map(achar -> (amap.getOrElse(achar, 0) + 1  )) }
//res4: scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4)

but, you can make this look pythonic again, this time making use of the count function which belongs to List.

println( characters.count(_ == 'd') )

To do it for all chars

characters.distinct map { achar => (achar, characters.count( _ == achar ) ) } 

to get a Map rather than a List[(Char, Int)], append .toMap to the end of the line

scala> characters.distinct map { achar => (achar, characters.count( _ == achar ) ) } toMap
res10: scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4)

I think that the fold is the best method to use although I haven’t tested it.

Clojure is a bit different, we don’t have the same options of vars and vals and, in addition, there are lots of parens to deal with!

Starting with the same thing – a vector of characters

(def characters [ \a \b \b \c \c \c \d \d \d \d])

you can check the class of characters by typing the following at the REPL

(def characters [ \a \b \b \c \c \c \d \d \d \d])
(class \a)
(class characters)

you should see: java.lang.Character and clojure.lang.PersistentVector

It may be tempting to write a function that loops over our vector of characters and builds a dictionary of characters and their counts

(defn tally-characters-v1 [chars mydict]
  (if (empty? chars) mydict
      (let [achar (first chars)]
	(tally-characters-v1
	 (rest chars)
	 (assoc mydict achar (inc (mydict achar 0)))))))
(println (tally-characters-v1 characters {}))

Copy and paste the above into a REPL and you should get a clojure.lang.PersistentArrayMap. The map can be called like a function to look up a key, much like indexing a dictionary in python (here counts is assumed to be bound to the map returned above):

user=> (counts \d)

That function called itself, plus an empty map had to be passed in. You can avoid this step by altering the declaration

(defn tally-characters-v1
  ([inchars] (tally-characters-v1 inchars {}))
  ([inchars counts] ...))

Now the method can be called by passing just the characters vector.

As I understand it, functions should make use of the recur special form rather than self-calls, because recur reuses the current stack frame instead of growing the stack.

(defn tally-characters-v2 [chars]
  (loop [coll chars mydict {}]
    (if (empty? coll) mydict
	(recur (rest coll)
	       (assoc mydict (first coll)
		 (inc (mydict (first coll) 0)))))))

that does as expected and produces a clojure.lang.PersistentArrayMap, but why bother creating a function using defn when reduce and an anonymous function will do just fine?

(def counts (reduce
	     (fn [amap achar]
	       (assoc amap achar (inc (amap achar 0))))
	     {} characters))

but, you know we don’t even need to do that when writing code for use in real applications. Clojure has a function called frequencies that does this

(println (frequencies characters))


The code block below shows the latest version of frequencies as defined in clojure.core

(defn frequencies
  "Returns a map from distinct items in coll to the number of times they appear."
  {:added "1.2" :static true}
  [coll]
  (persistent!
   (reduce (fn [counts x]
	     (assoc! counts x (inc (get counts x 0))))
	   (transient {}) coll)))

This code has not been checked for timing nor has any attempt been made to make it optimal or align with any programming best practice. It’s a collection of notes i made when learning something new.

Read a fasta file with Clojure

I am still not comfortable with clojure. I start to think it’s amazing and then something happens that sends me back to scala or, more frequently, python.

One thing that crops up fairly often is extracting a sequence from a fasta file. Sometimes I read the whole thing into memory and other times all I want is a single sequence. For S&Gs I wrote a little bit of clojure to do both (I am aware this is a poor reinvention of the sequence extraction wheel).

The code looks like this:

(use '[] )
(use '[clojure.contrib.string :only [blank?]] )

(def filename "fasta.txt")

(defn read-fasta-file
  "Reads a .fasta file into memory"
  ( [c] (read-fasta-file c {} nil))
  ( [c sequences id]
      (let [cl (first c)]
	(if (empty? cl) sequences
	    (if (.startsWith cl ">")
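	      ; assumed: a header line starts a new, empty entry keyed by that header
	      (recur (rest c) (assoc sequences cl "") cl)
	      (recur (rest c)
		     (assoc sequences id (str (sequences id) (.trim cl)))
		     id))))))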

(defn extract-sequence
  ( [coll id] (extract-sequence coll id nil nil ))
  ( [coll id current-id sequence]
      (let [cl (first coll) fid (str ">" id)]
	(if (empty? coll)
	  (if (nil? sequence) nil
	      (apply str sequence))
	  (cond
	   (.startsWith cl ">") (recur (rest coll) id cl sequence)
;	   (blank? cl) (do (println cl) (recur (rest coll) id cl sequence))
	   :else (if (.startsWith current-id fid)
		   (recur (rest coll) id current-id (str sequence cl ) )
		   (recur (rest coll) id current-id sequence)))))))

;read the file in a lazy way
(println ">" (with-open [rdr (reader "pdbaa")] (extract-sequence (line-seq rdr) "3JU9A")))

(defn fasta [] (with-open [rdr (reader filename)]
	       (read-fasta-file (line-seq rdr))))

It’s not too well thought out or understood! I think that the file is read lazily – clojuredocs annotations state that line-seq’ing a (reader file) is lazy. The functions aren’t lazy! read-fasta-file will store the entire fasta file as a dictionary/hashmap. extract-sequence will continue looping over the file content until EOF because I didn’t include a ‘sequence found’ flag to stop once what I was looking for had been found.
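
To make the ‘sequence found’ idea concrete, here is a rough Python sketch (not a port of the Clojure above) that stops reading as soon as the wanted record ends. The function name is made up, and the sample data is the cat/dog example used elsewhere in these notes.

```python
import io


def extract_sequence(lines, wanted_id):
    """Return the sequence for wanted_id, or None; stop once that record ends."""
    header = ">" + wanted_id
    found = False
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if found:
                break  # reached the next record, so nothing more needs reading
            found = line.startswith(header)  # prefix match on the header line
        elif found and line:
            parts.append(line)
    return "".join(parts) if found else None


sample = io.StringIO(">cat\n123\n456\n>dog\n567\n890\n")
print(extract_sequence(sample, "dog"))  # 567890
```

Because a file object yields its lines lazily, the break means the rest of a large file is never read.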

If you’ve read this far and you are shaking your head & thinking why the f*ck has he done it like that, I would be really grateful if you could provide feedback – i would appreciate it.

maybe this code is better for reading, once a resource containing the sequences has been acquired:

(def fasta [">cat" "123" "456", ">dog" "567", "890"])

(defn combine [m k v]
  (assoc m k (str (m k) v)))

(defn parseFasta [content]
  (loop [c content
	 m {}
	 cid nil]
    (if (empty? c) m
	(if (.startsWith (first c) ">")
	  (let [id (first c)]
	    (recur (rest c) (assoc m id "") id))
	  (recur (rest c) (combine m cid (first c)) cid)))))

I don’t think this will benefit from wrapping with lazy-seq, but I am not sure. I really have to focus more on clojure.

Setup Clojure Snow Leopard


In 2013 you can now do:

brew install lein.  You can then start a clojure repl by typing ‘lein repl’, to start a project type ‘lein new my-project’.

Don’t waste time reading the rest of this.





Setting up clojure on the mac is fairly simple:
1. use port/homebrew
2. download (svn/git/zip)

If using (2) it is necessary to create a script to startup the REPL or run scripts with *command-line-args*. It’s also a good idea to grab jline to make the REPL a tad more user friendly.

The script used to run clojure, as clj, looks like this:

JLINE=$CLOJURE_HOME/jline-0_9_5.jar #if you got it
if [ -f .clojure ]; then
	CP=$CP:`cat .clojure`
fi

if [ -z "$1" ]; then
	java -server -cp $CP jline.ConsoleRunner clojure.main
else
	java -server -cp $CP clojure.main "$@"
fi

Add an alias to this script in the .bashrc/.bash_profile

alias clj=/Users/biomunky/svn/clojure/

Source your file and *boom* working clojure.

It’s also nice to have an editor, macvim/vim serves the purpose. First go and grab vim-clojure and copy the files to .vim/. It should look something like this in your .vim/

biomunky@joshua:.vim$ ls *

candycode.vim wombat.vim


clojure.vim scala.vim

clojure     clojure.vim

clojure.vim scala.vim

clojure.vim scala.vim

Then, to get some nice rainbow parentheses, add the following to the .vimrc file in ~/

syntax on
set title
set ttyfast
set smartindent
set number
" Settings for VimClojure
let g:clj_highlight_builtins=1      " Highlight Clojure's builtins
let g:clj_paren_rainbow=1           " Rainbow parentheses'!
" because light colors on a dark background are easy on the eye
colorscheme wombat

copy and paste the following code into a text file, save it as pdb-fetch.clj and then execute it clj pdb-fetch.clj 1hnn, about 20 seconds later a fasta sequence file will appear in the same directory … woo hoo

(use '[ :only (write-lines read-lines)])
(def pdbid ( .substring (nth *command-line-args* 0) 0 4 ) )
(def url (str  "" pdbid ))
(def output (str pdbid ".fasta"))
(write-lines output  (read-lines ( url)))

The output:


the above code (probably) isn’t idiomatic, this is a learning experience – feel free to complain/suggest changes/fixes.

Fetch protein sequences from the PDB using Scala (and clojure)

I am trying to use scala in day to day work, I find that using one language (poorly) gets boring. Typically I use python to grab data from the PDB using URLLIB, this can be achieved as follows:

url = '" % s
content = urllib.urlopen(url).readlines()

It’s very simple, requiring a single import (urllib). The scala code isn’t as succinct but clearly shows that scala is a viable scripting alternative.

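#!/bin/sh
exec scala "$0" "$@"
!#
import java.io.FileWriter
import java.net.URL
import scala.io.Source.fromInputStream

object GetSeqs{
	def main(args: Array[String]) = {
		if (args.length != 1) {
			println("Gimme a PDBID")
			System.exit(1) // assumed: bail out when no id is given
		}
		val id = args(0)
		val url = new URL(String.format("",id))
		val output = new FileWriter(id + ".fasta")
		for (line <- fromInputStream(url.openStream).getLines ) {
			output.write(line + "\n") // getLines strips the line endings
		}
		output.close()
	}
}
GetSeqs.main( args)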

The first three lines allow you to call this code as you would any other script (you don’t need to compile this code). The rest is fairly self explanatory: create a url; open a file to write to; read a stream and write the data to the output file; close the file.

The result I get, when run as: sh seq-fetch.scala 1hnn


The same can be accomplished using clojure with a couple of java functions, for lack of clojure knowledge:

(use '[ :only (write-lines read-lines)])
(def pdbid ( .substring (nth *command-line-args* 0) 0 4 ) )
(def url (str  "" pdbid ))
(def output (str pdbid ".fasta"))
(write-lines output  (read-lines ( url)))

To run this script, using a couple of environment variables, I typed java -cp $CLJ_JAR:$CONTRIB_JAR clojure.main fetch-and-write-a-pdb-seq.clj 1hnnA. The output is exactly the same as above.

Read a file using Clojure

I’ve modified the post, the best way to read files is to use the rather than the contrib duck streams

(use '[ :only (reader)])
(line-seq (reader "/Users/biomunky/.emacs"))

If you are going to do it like this wrap the code in a with-open, like so

(with-open [rdr (reader "file")]
   (println (line-seq rdr)))

if you enter the above code into the REPL, change “file” to point to something on your machine and hit enter you should see the content of the file. The with-open (clojuredocs) makes sure that the files you access are closed properly.

If you aren’t aware of clojuredocs yet visit the site here – the examples are very handy.


To read a file with clojure use the duck-streams library from clojure.contrib. The first method uses the read-lines function. It can be imported as follows:

(use '[ ] )
; or import only the read macros
(use '[ :only (read-lines)])

then to read a file:

(read-lines ".emacs")

In the REPL this will dump your data. If you want to dump the content of the file from a script you will need to wrap (println) around the statement.

If you want to take a Java-esque approach, using a BufferedReader, you would have had to do it like this:

(import '( BufferedReader FileReader))
(def br ( new BufferedReader (new FileReader "/Users/biomunky/.emacs")))
(println (line-seq br))

A Distance Calculation


from math import sqrt

a = [1, 2, 3]
b = [3, 2, 1]

sqrt( sum( [ pow( (a[i]-b[i]), 2) for i in range( len(a) )  ] ) )

# better
def diff_square(a,b):return pow((a-b),2)
sqrt(sum(map (diff_square, a, b ) ) )


val a = List(1,2,3)
val b = List(3,2,1)
Math.sqrt(( (a zip b) map { case(a: Int, b: Int) => ( Math.pow( (a-b),2) )} ).reduceLeft(_+_) )


(def a [1 2 3])
(def b [3 2 1])
(Math/sqrt( reduce + ( for [ i (range 0 (count a) ) ] (Math/pow(-(nth a i) (nth b i)) 2) ) ) )

;alternative? - it appears to work.
(Math/sqrt (reduce +  (map #( Math/pow (- %1 %2) 2  ) a b ) ))

The result of all of these operations should be: ~2.83

I suspect neither the clojure nor the scala method is optimal. The order of the scala calls doesn’t seem very scala-ish, while the use of the for comprehension in clojure also feels wrong.