scala

Reverse Array/List: Interview Question

I was having a chat with a couple of guys at work about interviews and their experiences at various companies and there were a couple of common problems that popped up: fizzbuzz, reverse an array and which number has been removed from the array – you are allowed only one pass and you know what the supposed total of the complete array is.

I have an entry on here somewhere about fizzbuzz so won’t go over it.  The number missing from the array is simple: sum the values left in the array and subtract from the total.

Reverse an array is a bit more fun than the missing number and is more about recursion than fizzbuzz.  Since it’s sunday night, I am somewhat bored and my wife is looking at cat pictures, I figured I would chuck out a couple of implementations in a couple of languages.

Elixir- based on the Erlang vm has ruby-ish syntax & looks like a really nice new language

defmodule Hack do
  def reverse_list([h|t], acc // []) do
    Hack.reverse_list(t, List.concat([h], acc))
  end

  def reverse_list([], acc) do
    acc
  end
end

This code requires that you compile the code first using elixrc and fire up iex. You can run it as a script (.exs) as shown here, but I’ve got a couple of issues with it (see code annotations)

The same can be achieved using foldl

List.foldl([1,2,3], [], fn (i, acc) -> List.concat([i], acc) end)

Scala- based on the Java vm

def reverser[T](x:List[T]): List[T] = x match {
    case Nil => List.empty
    case x::xs => reverser(xs) :+ x
}

you can also do this with a foldLeft

List(1,2,3).foldLeft(List[Int]())( (a, b) => b +: a)

Clojure – based on the java vm

(defn reverser [coll] (reduce conj '() coll))

Fizzbuzzing hangover

class Fizzbuzz {
  def fizzbuzz(n:Int) = (n % 3, n % 5) match {
      case (0,0) => 'fizzbuzz
      case (0,_) => 'fizz
      case (_,0) => 'buzz
      case _     => n
    }
}

a test

import org.scalatest.FlatSpec
import org.scalatest.matchers.ShouldMatchers

class FizzbuzzTest extends FlatSpec with ShouldMatchers {
  val fb = new Fizzbuzz

  "Fizzbuzz.fizzbuzz(number:Int)" should "return 'fizz if number % 3 == 0" in {
    fb.fizzbuzz(3) should equal ('fizz)
  }
  it should "return 'buzz if number % 5 == 0" in {
    fb.fizzbuzz(5) should equal ('buzz)
  }
  it should "return 'fizzbuzz if number % 5 and 3 == 0" in {
    fb.fizzbuzz(15) should equal ('fizzbuzz)
  }
  it should "return number if number % 5 && 3 != 0" in {
    fb.fizzbuzz(1) should equal (1)
  }
}

compile Fizzbuzz: scalac Fizzbuzz.scala
compile FizzbuzzTest: scalac -cp .:scalatest-1.7.2/scalatest-1.7.2.jar FizzbuzzTest.scala
run the test: scala -cp .:scalatest-1.7.2/scalatest-1.7.2.jar org.scalatest.tools.Runner -p . -o -s FizzbuzzTest

Run starting. Expected test count is: 4
FizzbuzzTest:
Fizzbuzz.fizzbuzz(number:Int) 
- should return 'fizz if a number % 3 == 0
- should return 'buzz if a number % 5 == 0
- should return 'fizzbuzz if a number % 5 and 3 == 0
- should return number if a number % 5 && 3 != 0
Run completed in 191 milliseconds.
Total number of tests run: 4
Suites: completed 1, aborted 0
Tests: succeeded 4, failed 0, ignored 0, pending 0
All tests passed.

If you don’t want to use the Runner then you can start a REPL with the -cp to include . and scalatest. Then in the REPL type: (new FizzbuzzTest).execute() … it should look like this:

scala> (new FizzbuzzTest).execute()
FizzbuzzTest:
Fizzbuzz.fizzbuzz(number:Int) 
- should return 'fizz if a number % 3 == 0
- should return 'buzz if a number % 5 == 0
- should return 'fizzbuzz if a number % 5 and 3 == 0
- should return number if a number % 5 && 3 != 0

Also, if you like beer, I suggest getting some Wheat beer from Camden town brewery.

Playing with Scalatra, Jetty and EC2

Last week I met a friend before going to a Clojure function at Skills Matter in London. He’s a Ruby fanatic – Sinatra and Rails, they’re grrreeeaaaat. I agree with him, the little bits of rails and Sinatra I have done have been enjoyable (much like Flask and Django, without the fsking white space delimiting). Somehow we got started on Scala, that led to Scalatra – the Scala version of Sinatra. It looks interesting and isn’t Lift, figured it would be worth a quick look.

I downloaded the latest sbt and created an alias in my bash_profile as sbt2 (I have an older version of sbt too).

alias sbt2=”java -Xmx1500M -jar /Users/biomunky/svn/sbt-launch.jar $@”

I then followed the instructions here to download it via sbt (not g8).

This gives you a blank-ish template. I modified src/main/scala/.scala – it should be the only file in the directory. To get something up and running quickly I made it look like this:

import org.scalatra._
import java.net.URL
import scalate.ScalateSupport

class PlayTimeServlet extends ScalatraServlet with ScalateSupport {
  get("/") {
    <html>
      <body>
        <h1>Hello Internet Person.</h1>
				<p>Would you like to try some <a href="form">simple maths?</a></p>
      </body>
    </html>
  }
	
	get("/form") {
		<form action="/form/math" method="post">
				<input name="first_value"  type="text" value="" />
				<select name="operation">
				  <option value="plus">+</option>
				  <option value="minus">-</option>
				  <option value="divide">/</option>
				  <option value="multiply">*</option>
				</select>
				<input name="second_value" type="text" value="" />
	      <input type="submit" value="Submit" />					
	   </form>
	}

	post("/form/math") {
		val param1 = params.getOrElse("first_value",  "0")
		val param2 = params.getOrElse("second_value", "0")
		val operation = params.getOrElse("operation", "plus")
		val result = try { 
			operation match {
				case "plus"     => (param1.toDouble + param2.toDouble).toString 
				case "minus"    => (param1.toDouble - param2.toDouble).toString 
				case "multiply" => (param1.toDouble * param2.toDouble).toString 
				case "divide"   => if (param2.toDouble > 0.0) { 
						(param1.toDouble / param2.toDouble).toString
					} else {
						"Divide by Zero!"
					}
			}	
		} catch { 
			case _ => "I can't do that math." 
		}
        <html>
          <body></body>
          <head>
		    <p>The result is: {result}</p>
          </head>
        </html>
	}
			
  notFound {
    // Try to render a ScalateTemplate if no route matched
    findTemplate(requestPath) map { path =>
      contentType = "text/html"
      layoutTemplate(path)
    } orElse serveStaticResource() getOrElse resourceNotFound() 
  }
}

If you start the Jetty and navigate to localhost:8080 you should see a page that reads “Would you like to try some simple maths?” – how very exciting!

Clicking the link will get a web form. Fill the details 2 + 2. Click submit. A POST! You should now see the result of the expression.

So that’s all fine on localhost, I wanted to see it outside sbt in another jetty server – specifically something I had on ec2. This means creating a .war file – this is achieved by running sbt(2) package. The war can be found under target/scala-2.9.1/.

Get a copy of jetty (tar zxvf ) and place the war under webapps/ in the jetty home directory. Start jetty by entering: java -jar start.jar. Browse to localhost:8080 and you will see the same site as above. Enter the same data (or whatever expression you want) and you should see the same result — i lied, it should’ve crashed, if it didn’t – goody for you.

I had to change a couple of things. I removed webapps/test.war and it’s context file found in contexts/ (don’t delete it just yet). I then created a file called playtime.xml in contexts/ -> playtime is the name of my project (the war is also called playtime.war). Rename the test.xml context to .war. Change line 23
from

/webapps/test.war to /webapps/playtime.war.

If you retry submitting the form, having restarted the server, you should now get the results page rather than the error.

Since I’ve been goofing about with ec2 (because you can get some stuff free for a year) I dumped it there. If you do this, you will also have to open your ec2 instance to port 8080. You can do this via: aws management console -> Network & Security -> Security Groups -> your-active-security-profile.

python, scala and clojure – old notes.

Below are some notes I made when initially looking at Scala and Clojure.
(List(1,2,3) :\ 0.0) (_+_)

Where’s my for loop?

Counting residues … start with a list of characters something like in the case of python.

characters = ['a','b','b','c','c','c','d','d','d','d']

What we want is a count of the number of each residue, therefore what we want is a mutable data structure – specifically an empty dictionary (also called associative arrays, maps, hashmap, kevin) and a for loop.

counts = {}
for achar in characters:
    if character in counts:
        counts[achar] += 1
    else:
        counts[achar] = 1

great, counts is a dictionary, which can also be declared using counts = dict(). We then loop over each character (achar) in characters, if the counts dictionary contains achar the count is incremented (+= 1) otherwise a new key is created in the dictionary and associated with the value 1 (counts[achar] = 1).

the output of this will be a populated dictionary

{'a': 1, 'c': 3, 'b': 2, 'd': 4}

the same outcome can be reached using fewer lines of code:

counts = {}
for achar in characters:
	counts[achar] = counts.get(achar, 0) + 1

This time we used the dictionary method .get. The method returns the value for the key if it’s in the dictionary and 0 if it isn’t. Without the 0 .get would return None – you can’t add None and 1!

So we’ve already got what we wanted but there’s yet another way – we could use a for comprehension and a list method .count to do the work

>>> characters.count('d')
4

Wrap the .count call in a for comprehension

>>> counts = [ (x, characters.count(x)) for x in set(characters)]

The result is however different. We are presented with an array of tuples

[('a', 1), ('c', 3), ('b', 2), ('d', 4)]

but we wanted a dictionary, well you can just wrap the comprehension in a dict()

>>> counts = dict([ (x, characters.count(x)) for x in set(characters)])
{'a': 1, 'c': 3, 'b': 2, 'd': 4}

We can also do this using a reduce and a function (since an assignment can’t be made directly in the lambda, we call with another function !!). This isn’t something to do in ‘real’ code – but may help with understanding clojure/scala

def addToDict( amap, achar ): 
	amap[achar] = amap.get(achar, 0) + 1
	return amap

counts = reduce( lambda x,y: addToDict(x,y), characters, {} )
print counts

So, first we define a function that takes a dictionary (called amap) and a character (achar), sets/increments the count and returns the dictionary – it’s identical to the example above – except that it now carries extra overhead. We then use the global reduce function and a lambda to call the function with the dictionary (the {} at the end of the reduce line) and each character in characters until the entire sequence has been covered.

If the above isn’t obvious then perhaps the following will help:

>>> thing_to_reduce = [1,2,3]
>>> the_function_to_apply = lambda i, running_total: running_total + i
>>> xsum = reduce( the_function_to_apply, thing_to_reduce, 0)
6

Should you do this in your code? No, probably not! So why bother? It may be helpful in working out how to achieve the same thing in the functional languages that all the cools kids are using.

# SCALA #

It’s perhaps less of a departure from languages like perl and python than clojure and haskell, although much like clojure, the docs often make reference/comparison to Java code – not much use if you don’t know Java. It also has two types of ‘variable’ – val and var. var is an indicator that you can mutate while val indicates that the item pointed to is immutable (this isn’t totally true).

scala> var x = 1
x: Int = 1

scala> x = 2
x: Int = 2

scala> val x = 1
x: Int = 1

scala> x = 2
<console>:8: error: reassignment to val
       x = 2
         ^

Again, we define characters as a list of chars

val characters:List[Char] = List('a','b','b','c','c','c','d','d','d','d')
// the :List[Char] can be dropped
val characters = List('a','b','b','c','c','c','d','d','d','d')

With scala we can import mutable maps and then use a for loop to populate it

import scala.collection.mutable.Map
val mutableCounter = Map[Char,Int]()
for ( achar <- characters ) mutableCounter(achar) = (mutableCounter.getOrElse(achar,0) + 1)
println( mutableCounter )

and from this we get: Map(c -> 3, a -> 1, d -> 4, b -> 2). A value can be extracted from the map using the .get method – mutableCounter.get(‘a’). Ideally we want to avoid mutable state (it’s helpful when delving into the world of concurrent programming), to achieve this we can use immutable collections and foldLeft on the characters List.

import scala.collection.immutable.Map
val foldCounts = characters.foldLeft(Map[Char,Int]()) {
    (amap, achar) =>  amap ++ Map(achar -> (amap.getOrElse(achar, 0) + 1  )) }
//res4: scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4)

but, you can make this look pythonic again, this time making use of the count function which belongs to List.

println( characters.count(_ == 'd') )

To do it for all chars

characters.distinct map { achar => (achar, characters.count( _ == achar ) ) } 

to get a map, rather than List[(Char, Int)] wrap append .toMap on the end of the line

scala> characters.distinct map { achar => (achar, characters.count( _ == achar ) ) } toMap
res10: scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4)

I think that the fold is the best method to use although I haven’t tested it.

# CLOJURE #
Clojure is a bit different, we don’t have the same options of vars and vals and, in addition, there are lots of parens to deal with!

Starting with the same thing – a vector of characters

(def characters [ \a \b \b \c \c \c \d \d \d \d])

you can check the class of characters by doing by typing the following at the REPL

(def characters [ \a \b \b \c \c \c \d \d \d \d])
(class \a)
(class characters)

you should see: java.lang.Character and clojure.lang.PersistentVector

It may be tempting to write a function that loops over our vector of characters and builds a dictionary of characters and their counts

(defn tally-characters-v1 [chars mydict]
  (if (empty? chars) mydict
      (let [achar (first chars)]
	(tally-characters-v1
	 (rest chars)
	 (assoc mydict achar (inc (mydict achar 0)))))))
	
(println (tally-characters-v1 characters {}))

Copy and paste the above into a REPL and you should get a clojure.lang.PersistentArrayMap. This can be accessed by key value pairs, much as you can do in python:

user=> (counts \d)
4

That function called itself, plus an empty map had to be passed in. You can avoid this step by altering the declaration

(defn tally-characters-v1
  ([inchars] tally-characters [inchars {}])
  (tally-characters-v1 [inchars counts]) ...

Now the method can be called by passing just the characters vector.

As i understand it functions should make use of the recur special form rather than self-calls.

(defn tally-characters-v2 [chars]
  (loop [coll chars mydict {}]
    (if (empty? coll) mydict
	(recur (rest coll)
	       (assoc mydict (first coll)
		 (inc (mydict (first coll) 0)))))))

that does as expected and produces a clojure.lang.PersistentArrayMap, but why bother creating a function using defn when reduce and an anonymous function will do just fine?

(def counts (reduce
	     (fn [amap achar]
	       (assoc amap achar (inc (amap achar 0))))
	     {} characters))

but, you know we don’t even need to do that when writing code for use in real applications. Clojure has a function called frequencies that does this

(println (frequencies characters))

nice.

The code block below shows the latest version of frequencies as defined in clojure.core

(defn frequencies
  "Returns a map from distinct items in coll to the number of times they appear."
  {:added "1.2" :static true}
  [coll]
  (persistent!
   (reduce (fn [counts x]
	     (assoc! counts x (inc (get counts x 0))))
	   (transient {}) coll)))

This code has not been checked for timing nor has any attempt been made to make it optimal or align with any programming best practice. It’s a collection of notes i made when learning something new.

Scala mergesort

My attempt at mergesort in scala. My two previous versions weren’t great, one resulted in a stack overflow when a list > 10000 was passed (using vals) the other used vars! By switching the List[Int] to a Stream[Int] the stack overflow can be avoided, however I am not sure if this is the right thing to do. Given that I am a molecular biologist, rather than a computer scientist, I am just happy that it works as expected.

def merge(left: List[Int], right: List[Int]): Stream[Int] = {
  (left, right) match {
    case (x :: xs, y :: ys) if x &lt; y =&gt; x #:: merge(xs, right)
    case (x :: xs, y :: ys) =&gt; y #:: merge(left, ys)
    case _ =&gt; if (left.isEmpty) right.toStream else left.toStream
  }
}
def mergesort(input: List[Int]): List[Int] = {
  input match {
    case Nil | List(_) =&gt; input
    case _ =&gt;
      val middle:Int = input.length / 2
      val (left, right) = input splitAt middle
      merge( mergesort(left), mergesort(right) ).toList
  }
}

There is a post on Stack Overflow about mergesort (in scala), it’s short but probably worth looking at if you want better, generic implementation.

Taken from the stack overflow post linked above:

def msort[T](less: (T, T) => Boolean) (xs: List[T]): List[T] = { 
  def merge(left: List[T], right: List[T]): Stream[T] = (left, right) match {
    case (x :: xs, y :: ys) if less(x, y) => Stream.cons(x, merge(xs, right))
    case (x :: xs, y :: ys) => Stream.cons(y, merge(left, ys))
    case _ => if (left.isEmpty) right.toStream else left.toStream
  }
  val n = xs.length / 2 
  if (n == 0) xs 
  else { 
    val (ys, zs) = xs splitAt n 
    merge(msort(less)(ys), msort(less)(zs)).toList
  } 
}

// use it ...
val d = List(5.,4.,3.,2.,1.)
println( msort( (x:Double, y:Double) => x < y )(d) )

val i = List(5,4,3,2,1)
println( msort( (x:Int, y:Int) => x < y )(i) )

Scala fetch a web page within time limit

Someone asked me to write a quick script to pull xml records from pubmed and extract various chunks of data, I figured scala was the way to go. Five minutes later they asked me to include a timeout – something I haven’t required before! Typically I don’t bother with timeouts and would use something like:

import scala.io.Source.{fromInputStream}
import java.net._

val url = new URL("http://btareawasteofspace.com")
val content = fromInputStream( url.openStream ).getLines.mkString("\n")

that chunk of code will import the required libraries, connect to a resource, get the content and chuck it into a string.

This time, instead of doing the above, I made use of the scala.io.Source library and URLConnection – to set the timeout and retrieve content:

import scala.io.Source.{fromInputStream}
import java.net._

val url = new URL( seqUrl )
val urlCon = url.openConnection()
urlCon.setConnectTimeout(2000)
urlCon.setReadTimeout( 1000 )
val content = fromInputStream( urlCon.getInputStream ).getLines.mkString("\n")

As far as I understand it, the URLConnection allows me to define how long the connection will wait for a request. The timeout argument (an int) is in milliseconds and a failure to get data before the allocated period results in a java.net.SocketTimeoutException.

scala swing table model

I’ve been playing with scala.swing as an alternative to java.swing to develop a bit of code for spectral analysis. I’ve not had much experience with java GUI work before but the stuff I have been exposed to hasn’t been fun. Scala however is fun to write, so i figured I would take a look at it as an alternative.

To GUI I need to develop needs to look identical to the one already in use, this means dumping some of the data into a table. When I was looking into this, the doc for scala.swing wasn’t very good but I did find the default table model…

import scala.swing._
import javax.swing.table._

object MyTable extends SimpleSwingApplication {
  def top = new MainFrame {
    title = "MyTable"
    preferredSize  = new Dimension( 100,100 )
    val tableModel = new DefaultTableModel( new Array[Array[AnyRef]](0,2), Array[AnyRef]("A", "B") )
    val table      = new Table( 1, 2 ) { model = tableModel }
   
    for ( i <- 0 to 10 ) { 
		tableModel.addRow( Array[AnyRef]( i.asInstanceOf[AnyRef], (i*2).asInstanceOf[AnyRef] ) ) 
	}

    contents = new BorderPanel {
      import BorderPanel.Position._
      add(new ScrollPane(table), Center)
    }
  }
}

Apparently this isn’t the ideal way to implement a table. Instead a new class should be implemented that extends the AbstractTableModel:

class MyTableModel( var rowData: Array[Array[Any]], val columnNames: Seq[String] ) extends AbstractTableModel {
  override def getColumnName( column: Int) = columnNames(column).toString
  def getRowCount() = rowData.length
  def getColumnCount() = columnNames.length
  def getValueAt( row: Int, col: Int): AnyRef = rowData(row)(col).asInstanceOf[AnyRef]
  override def isCellEditable( row: Int, column: Int) = false
  override def setValueAt( value: Any, row: Int, col: Int) {
    rowData(row)(col) = value
  }    
  def addRow( data: Array[AnyRef]) {
    rowData ++= Array(data.asInstanceOf[Array[Any]])
  }
}

Then, when creating the table model:

val tableModel = new MyTableModel( Array[Array[Any]](), List("A","B") )

When used instead of the defaultTableModel, the code looks something like this:

import scala.swing._
import javax.swing.table._

object MyTable extends SimpleSwingApplication {
  def top = new MainFrame {
    title = "MyTable"
    preferredSize  = new Dimension( 100,270 )
    val tableModel = new MyTableModel( Array[Array[Any]](), List("A","B") )
    val table      = new Table( 1, 2 ) { model = tableModel }
   
    for ( i <- 0 to 10 ) { tableModel.addRow( Array[AnyRef]("i", "j") ) }

    contents = new BorderPanel {
      import BorderPanel.Position._
      add(new ScrollPane(table), Center)
    }
  }
}

The result should be a table in the top left hand corner of the monitor – how very exciting. A disadvantage of using scala.swing instead of java.swing is the lack of a netbeans-like environment that helps out (I am not familiar with GUI development in either language – as is probably obvious to someone who is). Life can be made a little easier using the scala eclipse plugin rather than emacs. I am also a fairly basic level user of scala & java.

Pubmed fetch xml entry by ID

It wasn’t clear how to grab a pubmed entry in xml format given an ID. This is how to do it:

import scala.io.Source.{fromFile, fromInputStream}
import scala.xml._
import java.net._
import java.io._

val idFile = "medline_ids.txt"

val ids = fromFile( idFile ).getLines()

for ( i <- 0 until 1 ) {
  val url = new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=" + ids.next + "&retmode=xml")
  println( url )
  val inStream = fromInputStream(url.openStream).getLines.mkString("\n")
  println( inStream ) 

}

The important part is this:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=20980263&retmode=xml

rather than http://www.ncbi.nlm.nih.gov/pubmed/20980263?report=xml which simply forces the output to look like xml. If you fetch the report=xml code using scala/curl you will see that all the are represented by their html codes – not nice.

MongoDB scala – CASBAH

Playing with scala/mongodb for fun. The docs for mongodb suggest that adding

val casbah = "com.novus" % "casbah_2.8.0" % "1.0.8.5"

to the build project should be enough to get casbah included in the current project. I get no end of errors when I attempt to use this. I used a popular web search tool to find that

val bumRels = "bum-releases" at "http://repo.bumnetworks.com/releases" 
val bumSnaps = "bum-snapshots" at "http://repo.bumnetworks.com/snapshots" 
val casbah = "com.novus" % "casbah_2.8.0" % "1.0.8.1"	

allows sbt to pull casbah in. I haven’t actually tried doing anything with it yet – mainly because I keep running back to the safe world of python or playing in the shell.