Author: biomunky

About biomunky

I have a PhD in Computational & Mathematical Biochemistry. I like Python, Scala, Clojure and am developing a crush on Elixir. Oh, beer is good too. I apply machine learning algorithms to all sorts of biological problems and make the odd graph.

Game of Beer

Conway’s Game of Life

The Game of Life is a cellular automaton devised by a British Mathematician, it requires no players and depends only on an initial seed state. Like many other games there are rules:

1. Any live cell with fewer than two live neighbour cells dies.
2. Any live cell with two or three live neighbour cells live to the next round.
3. Any live cell with more than three live neighbours dies.
4. Any dead cell with three neighbours becomes live, this is like reproduction according to wikipedia.

There are a number of patters associated with GOL, some are static, others are dynamic. A couple of examples are the beehive (static)

------
--xx--
-x--x-
--xx--
------

the boat (static)

-----
-xx--
-x-x-
--x--
-----

and the blinker (dynamic)

-----  -----
--x--  -----
--x--  -xxx-
--x--  -----
-----  -----

Like the Roman Numeral problem [https://biomunky.wordpress.com/2014/03/22/roman-numerals-the-default-dojo-option/], the Game of Life can frequently be found scrawled on the coding dojo whiteboard, indeed I’ve tried the problem before at one of the London Clojure Dojos [https://github.com/biomunky/ldnclj-dojo-team-5-game-of-life].

An attempt with Scala

Since the problem is generally about cells, my starting point is just that, a cell:

case class Cell(x: Int, y: Int)

The cell is the center of our world! It contains an x:Int – the row – and y: Int – the column. For every cell in our gamewe need to be able to get its neighbours, for this we define a function … let’s call it neighbours. Before we do this, we should sketch out what we want.

Given a simple 5×5 board where the central cell [2,2] is alive:

- - - - - 
- - - - -
- - X - -
- - - - -
- - - - -

what’s going on around the center cell? We can redraw the grid and make it look like a simple list of lists …

[0,0] [0,1] [0,2] [0,3] [0,4]
[1,0] [1,1] [1,2] [1,3] [1,4]
[2,0] [2,1] [2,2] [2,3] [2,4]
[3,0] [3,1] [3,2] [3,3] [3,4]
[4,0] [4,1] [4,2] [4,3] [4,4]

If we focus in on 2,2 we have:

[1,1] [1,2] [1,3]
[2,1] [2,2] [2,3]
[3,1] [3,2] [3,3]

If we look hard at this it becomes clear that there’s a certain pattern in the surrounding cells

[-1,-1][-1,0][-1, 1]
[ 0,-1][ 2,2][ 0, 1]
[ 1,-1][ 1,0][ 1, 1]

We can use this to create our neighbour function – my initial version looked as follows:

def neighbours(cell: Cell): List[Cell] = {
	val positions = List(-1, 0, 1)
	positions flatMap (x =>
		positions.map (y => Cell(cell.x+x, cell.y+y))
	) filterNot (_ == cell )
}

The function takes the cell of interest and calculates the neighbourhood, once all the cell positions (including the ‘cell’ position) we filter out our input cell. So that’s flatMap map and filterNot! Too many things in one place, it’s probably possible to clean that up.

Yup, it is. Rather than using the 3 functions we can use a for comprehension with a filter to only yield the cells we want. This simplifies the previous function to:

  def neighbours(cell: Cell): List[Cell] = {
    val positions = List(-1, 0, 1)
    for (i <- positions; j <- positions; if Cell(cell.x + i, cell.y + j) != cell)
      yield Cell(cell.x+i, cell.y+j)
  }

If you copy that function into the repl and run it with as follows:

neighbours(Cell(2,2))

you will get

List[Cell] = List(Cell(1,1), Cell(1,2), Cell(1,3), Cell(2,1), Cell(2,3), Cell(3,1), Cell(3,2), Cell(3,3))

and that matches the 3×3 grid above – woo hoo.

What we need to do now is work out if the cell is going to live on or die off. However, we aren’t going to be dealing with one cell, but rather a collection of cells – otherwise it’s going to be a very short Game of Life.

To help with this problem I created another case class, which for the current purpose looks as follows:

case class Grid(x: Int, y: Int, livingCells: List[Cell])

The x and y are the size of the grid and livingCells are our seed cells – remember that we only care about the LIVING cells. So, first things first, given a cell, we need to establish it’s local environment – a count of the living cells – but, we need to do it for multiple cells in one sitting. How?

  def candidates(livingCells: List[Cell]): Map[Cell, Int] = {
    livingCells
      .flatMap(neighbours)
      .foldLeft(Map.empty[Cell, Int]) { (amap, acell) => amap ++ Map(acell -> (amap.getOrElse(acell, 0) + 1)) }
  }

The first step is to calculate the neighbourhood – we use the `neighbours` function defined above. To apply it to all cells in the living list and return a simple flattened list, we use flatMap. This function returns a list of all potential cells and the number of count of living neighbours, something like:

Map(...
        Cell(0,3)   -> 1,
        Cell(3,0)   -> 1,
        Cell(1,1)   -> 4,
        Cell(1,-1)  -> 2,
    ...)

The map is going to contain cells that don’t exist! That kind of sucks, but it isn’t the end of the world, as I stated above, we (I) need two functions here.

Enter the second function: step. Step is as follows:

  def step(grid: Grid) = {
    val newCells = candidates(grid.livingCells).filter { case (loc, living) =>
      living == 3 || living == 2 && grid.isAlive(loc)
    }
    .keys
    .toList
    grid.copy(livingCells = newCells)
  }

The step function takes our grid – the list of living cells – and returns a new grid with our updated cells. The first step is to calculate the possible candidates, it then filters the map so that we are left with only cells that have a count of 3 (living cells) or 2 living cells and where the cell is currently alive. Since this will return a Map of cells and counts matching our criteria, we want to keep only the keys (the grid coordinates). We can get only the living cells by calling .keys on the filtered map. This is then converted to a list and a copy of our existing grid is created – with new living cells.

But wait, there’s no .isAlive!?!?

We need to update the grid class. Copy-pastey the following function into the Grid case class

def isAlive(cell: Cell) = livingCells contains cell

This function returns true if the grid.livingCells contains the passed Cell.

I’ve quickly tested this code (only tried a 3×3 with a blinker) and pushed to github – take a look for tests and bad code :)

This post has been fuelled by Meantime Pale Ale and Brooklyn Lager.

Roman Numerals – the default dojo option

All code dojos start with roman numerals, it’s the default – it’s not exciting, but it serves a purpose – get your own idea onto the board. As such, I’ve only ever done the roman numerals dojo once, at a work scala dojo while I was attempting to debug some R, it wasn’t a great experience as I didn’t really get to participate.

Today I was woken up at about 6:30 by an annoying cat and, unable to get back to sleep, figured I would do something constructive – have a quick stab at doing the roman numeral thing.

So, my hack looks like this:

This will take us from “MCMLXXXIV” to 1984 – great. I recall in the scala dojo that some people took the option of reversing the sequence then mapping a function over the reverse, this saves the use of an accumulator – for some reason I don’t like doing the reverse then a map … it’s not like this is a chunk of code where performance is critical but …

Here’s an example of the kind of other solutions I’ve seen using reverse, they all use a helper like roman-helper, in the following case ‘add-numeral’

(defn roman [s]
    (reduce add-numeral (map numerals (reverse s))))

Talking to a friend at work, he came up with a slightly different version that removed the recur call – something I like to do. This allowed me to create another solution that I think is nicer – the complete gist can be found here on github.

The updated solution follows:

(defn roman-helper [[a b]]
  (let [curnt (roman-lookup a 0) nxt (roman-lookup b 0)]
  (if (> nxt curnt) (- curnt) curnt)))

(defn roman->decimal [roman]
  (let [as-seq (conj (into [] roman) nil)
        pairs (partition 2 1 as-seq)]
    (apply + (map roman-helper pairs))))

There are a couple of changes here. The first is a small change to the roman-helper function, the aggregate has been removed so the function simply returns it’s value or negative if the current letter is worth less than the next. The roman->decimal function has changed a little bit more, with most of the action happening in the let binding.

The first step

(conj (into [] roman) nil)

turns the string into a vector of characters and then slaps a nil on the end. I am not as bothered by this as the reverse because conj takes essentially constant time – not that this matters at all in this bit of code golf.

into vs (vec “XXX”)

when dealing with big collections, the use of (into [] “xxx”) is about 30% more performant than vec, this is due to the use of transients. Since we aren’t using a big collection here I could’ve written the block as

(conj (vec roman) nil)

but I don’t think it adds much to the clarity of the code.

The second step

(partition 2 1 as-seq)

we use the vector we created in step 1 to create a sequence of overlapping pairs. The 2 indicates the number of elements per partition and the 1 the number of steps we take. Clojure will take all partitions until we exhaust the sequence – without the nil we wouldn’t get the last pair.

The final step

(apply + (map roman-helper pairs))

Convert to numbers and add ’em up.

Going in the reverse direction.
I cheated. Clojure has pprint, which has a format function that allows us to create a partial function. The partial function can be called with a number e.g.
(decimal->roman 1984) and produce “MCMLXXXIV”.

Emacs all the things \o/

Emacs has great support for clojure. It’s also decent for python, scala and various other things I use most days of the week (protobuf, pig, sql). Since I want to do clojure fulltime – data engineering with scala/python isn’t bad, it’s just i really like clojure – i am starting to devote a small amount of time to learning emacs.

cli-all-the-things.jpeg

How?

I cloned metaweblog from punchagan/metawblog and added the following line to my ~/.emacs.d/user.el

 (setq load-path (cons "~/.emacs.d/plugins/metaweblog" load-path)) 

Next, I cloned the org2blog repo punchagan/org2blog and added more lines to user.el

(setq load-path (cons "~/.emacs.d/plugins/org2blog" load-path))
(require 'org2blog-autoloads) (setq org2blog/wp-blog-alist
'(("biomunky"
 :url "<a href="https://biomunky.wordpress.com/xmlrpc.php">https://biomunky.wordpress.com/xmlrpc.php</a>"
 :username "biomunky"
 :default-categories ("org2blog" "emacs")
          :tags-as-categories nil)))

Read the README.md at the above org2mode links to get an idea of what to do next

Reverse Array/List: Interview Question

I was having a chat with a couple of guys at work about interviews and their experiences at various companies and there were a couple of common problems that popped up: fizzbuzz, reverse an array and which number has been removed from the array – you are allowed only one pass and you know what the supposed total of the complete array is.

I have an entry on here somewhere about fizzbuzz so won’t go over it.  The number missing from the array is simple: sum the values left in the array and subtract from the total.

Reverse an array is a bit more fun than the missing number and is more about recursion than fizzbuzz.  Since it’s sunday night, I am somewhat bored and my wife is looking at cat pictures, I figured I would chuck out a couple of implementations in a couple of languages.

Elixir- based on the Erlang vm has ruby-ish syntax & looks like a really nice new language

defmodule Hack do
  def reverse_list([h|t], acc // []) do
    Hack.reverse_list(t, List.concat([h], acc))
  end

  def reverse_list([], acc) do
    acc
  end
end

This code requires that you compile the code first using elixrc and fire up iex. You can run it as a script (.exs) as shown here, but I’ve got a couple of issues with it (see code annotations)

The same can be achieved using foldl

List.foldl([1,2,3], [], fn (i, acc) -> List.concat([i], acc) end)

Scala- based on the Java vm

def reverser[T](x:List[T]): List[T] = x match {
    case Nil => List.empty
    case x::xs => reverser(xs) :+ x
}

you can also do this with a foldLeft

List(1,2,3).foldLeft(List[Int]())( (a, b) => b +: a)

Clojure – based on the java vm

(defn reverser [coll] (reduce conj '() coll))

Connecting Hive and R via MySQL using Sqoop

Rrrrrr – because pirates > ninjas

This is all kind of backwards, why would I want to take data out of cascalog/hdfs to MySQL and then back into hdfs/hive? Anyway, the general aim here it to get some data, from Hive, into R and do something with it.

Step 1: Get Data
In the first step we will grab some data from an existing database, created when doing a demo of cascalog, and load it into Hive using a tool called sqoop. Sqoop is a tool designed to move large quantities of data between hdfs and structured datastores.

Download sqoop and cd into bin/

./sqoop import --connect jdbc:mysql://localhost/cascalog  --username dan  --table unique_user_counts -m 1 --hive-import

This will import data from the cascalog MySQL database, specifically the unique_user_counts database. The –hive-import switch will place data into Hive (I installed Hive using Homebrew). The -m toggle defines the number of mappers the job will use, in this instance just 1.

You can take a peek at the data using the Hive repl.

λ bin  hive
hive> show databases;
OK
default
Time taken: 2.48 seconds
hive> use default;
OK
Time taken: 0.014 seconds
hive> show tables;
OK
unique_user_counts
Time taken: 0.286 seconds
hive> select * from unique_user_counts;
...

This dumps out the data. If you want to know more about your table then you can use the describe ‘table_name’ command.

To get data into R Hive must be started in server mode, this can be achieved by

hive --service hiveserver

The final step is R …

library(RJDBC)

hive_jars <-list.files("/usr/local/Cellar/hadoop/1.1.1/libexec", pattern="jar$", full.names=T)
hadoop_jars <- list.files("/usr/local/Cellar/hive/0.9.0/libexec/lib", pattern="jar$", full.names=T)

hive_driver <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver", c(hive_jars, hadoop_jars))

hive_conn <- dbConnect(hive_driver, "jdbc:hive://localhost:10000/default")

rs <- dbGetQuery(hive_conn,"select client, count from unique_user_counts")

x <- 1:length(rs$count)
plot(x, rs$count)

This gives me a very boring plot – it’s really of no interest at all, it’s marginally more interesting than doing a select count(*) from …

The *interesting* part here is the definition of the jars required to connect to Hive, this is largely identical to a previous post.