old notes

python, scala and clojure – old notes.

Below are some notes I made when initially looking at Scala and Clojure.
(List(1,2,3) :\ 0.0) (_+_)

Where’s my for loop?

Counting residues … start with a list of characters something like in the case of python.

characters = ['a','b','b','c','c','c','d','d','d','d']

What we want is a count of the number of each residue, therefore what we want is a mutable data structure – specifically an empty dictionary (also called associative arrays, maps, hashmap, kevin) and a for loop.

counts = {}
for achar in characters:
    if character in counts:
        counts[achar] += 1
        counts[achar] = 1

great, counts is a dictionary, which can also be declared using counts = dict(). We then loop over each character (achar) in characters, if the counts dictionary contains achar the count is incremented (+= 1) otherwise a new key is created in the dictionary and associated with the value 1 (counts[achar] = 1).

the output of this will be a populated dictionary

{'a': 1, 'c': 3, 'b': 2, 'd': 4}

the same outcome can be reached using fewer lines of code:

counts = {}
for achar in characters:
	counts[achar] = counts.get(achar, 0) + 1

This time we used the dictionary method .get. The method returns the value for the key if it’s in the dictionary and 0 if it isn’t. Without the 0 .get would return None – you can’t add None and 1!

So we’ve already got what we wanted but there’s yet another way – we could use a for comprehension and a list method .count to do the work

>>> characters.count('d')

Wrap the .count call in a for comprehension

>>> counts = [ (x, characters.count(x)) for x in set(characters)]

The result is however different. We are presented with an array of tuples

[('a', 1), ('c', 3), ('b', 2), ('d', 4)]

but we wanted a dictionary, well you can just wrap the comprehension in a dict()

>>> counts = dict([ (x, characters.count(x)) for x in set(characters)])
{'a': 1, 'c': 3, 'b': 2, 'd': 4}

We can also do this using a reduce and a function (since an assignment can’t be made directly in the lambda, we call with another function !!). This isn’t something to do in ‘real’ code – but may help with understanding clojure/scala

def addToDict( amap, achar ): 
	amap[achar] = amap.get(achar, 0) + 1
	return amap

counts = reduce( lambda x,y: addToDict(x,y), characters, {} )
print counts

So, first we define a function that takes a dictionary (called amap) and a character (achar), sets/increments the count and returns the dictionary – it’s identical to the example above – except that it now carries extra overhead. We then use the global reduce function and a lambda to call the function with the dictionary (the {} at the end of the reduce line) and each character in characters until the entire sequence has been covered.

If the above isn’t obvious then perhaps the following will help:

>>> thing_to_reduce = [1,2,3]
>>> the_function_to_apply = lambda i, running_total: running_total + i
>>> xsum = reduce( the_function_to_apply, thing_to_reduce, 0)

Should you do this in your code? No, probably not! So why bother? It may be helpful in working out how to achieve the same thing in the functional languages that all the cools kids are using.


It’s perhaps less of a departure from languages like perl and python than clojure and haskell, although much like clojure, the docs often make reference/comparison to Java code – not much use if you don’t know Java. It also has two types of ‘variable’ – val and var. var is an indicator that you can mutate while val indicates that the item pointed to is immutable (this isn’t totally true).

scala> var x = 1
x: Int = 1

scala> x = 2
x: Int = 2

scala> val x = 1
x: Int = 1

scala> x = 2
<console>:8: error: reassignment to val
       x = 2

Again, we define characters as a list of chars

val characters:List[Char] = List('a','b','b','c','c','c','d','d','d','d')
// the :List[Char] can be dropped
val characters = List('a','b','b','c','c','c','d','d','d','d')

With scala we can import mutable maps and then use a for loop to populate it

import scala.collection.mutable.Map
val mutableCounter = Map[Char,Int]()
for ( achar <- characters ) mutableCounter(achar) = (mutableCounter.getOrElse(achar,0) + 1)
println( mutableCounter )

and from this we get: Map(c -> 3, a -> 1, d -> 4, b -> 2). A value can be extracted from the map using the .get method – mutableCounter.get(‘a’). Ideally we want to avoid mutable state (it’s helpful when delving into the world of concurrent programming), to achieve this we can use immutable collections and foldLeft on the characters List.

import scala.collection.immutable.Map
val foldCounts = characters.foldLeft(Map[Char,Int]()) {
    (amap, achar) =>  amap ++ Map(achar -> (amap.getOrElse(achar, 0) + 1  )) }
//res4: scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4)

but, you can make this look pythonic again, this time making use of the count function which belongs to List.

println( characters.count(_ == 'd') )

To do it for all chars

characters.distinct map { achar => (achar, characters.count( _ == achar ) ) } 

to get a map, rather than List[(Char, Int)] wrap append .toMap on the end of the line

scala> characters.distinct map { achar => (achar, characters.count( _ == achar ) ) } toMap
res10: scala.collection.immutable.Map[Char,Int] = Map(a -> 1, b -> 2, c -> 3, d -> 4)

I think that the fold is the best method to use although I haven’t tested it.

Clojure is a bit different, we don’t have the same options of vars and vals and, in addition, there are lots of parens to deal with!

Starting with the same thing – a vector of characters

(def characters [ \a \b \b \c \c \c \d \d \d \d])

you can check the class of characters by doing by typing the following at the REPL

(def characters [ \a \b \b \c \c \c \d \d \d \d])
(class \a)
(class characters)

you should see: java.lang.Character and clojure.lang.PersistentVector

It may be tempting to write a function that loops over our vector of characters and builds a dictionary of characters and their counts

(defn tally-characters-v1 [chars mydict]
  (if (empty? chars) mydict
      (let [achar (first chars)]
	 (rest chars)
	 (assoc mydict achar (inc (mydict achar 0)))))))
(println (tally-characters-v1 characters {}))

Copy and paste the above into a REPL and you should get a clojure.lang.PersistentArrayMap. This can be accessed by key value pairs, much as you can do in python:

user=> (counts \d)

That function called itself, plus an empty map had to be passed in. You can avoid this step by altering the declaration

(defn tally-characters-v1
  ([inchars] tally-characters [inchars {}])
  (tally-characters-v1 [inchars counts]) ...

Now the method can be called by passing just the characters vector.

As i understand it functions should make use of the recur special form rather than self-calls.

(defn tally-characters-v2 [chars]
  (loop [coll chars mydict {}]
    (if (empty? coll) mydict
	(recur (rest coll)
	       (assoc mydict (first coll)
		 (inc (mydict (first coll) 0)))))))

that does as expected and produces a clojure.lang.PersistentArrayMap, but why bother creating a function using defn when reduce and an anonymous function will do just fine?

(def counts (reduce
	     (fn [amap achar]
	       (assoc amap achar (inc (amap achar 0))))
	     {} characters))

but, you know we don’t even need to do that when writing code for use in real applications. Clojure has a function called frequencies that does this

(println (frequencies characters))


The code block below shows the latest version of frequencies as defined in clojure.core

(defn frequencies
  "Returns a map from distinct items in coll to the number of times they appear."
  {:added "1.2" :static true}
   (reduce (fn [counts x]
	     (assoc! counts x (inc (get counts x 0))))
	   (transient {}) coll)))

This code has not been checked for timing nor has any attempt been made to make it optimal or align with any programming best practice. It’s a collection of notes i made when learning something new.