PyBrain – brains, yum

PyBrain is a machine learning library written in Python, with hooks out to C++ when speed is required. The documentation isn’t fantastic, having clearly been written by people who have been ‘in it’ for a while. Here is a simple example that uses the shortcut functions to construct a classification network.

First, pull in all the classes/methods that are required (you may need to pull in a couple of others – see the comments below).

#!/usr/bin/python
import sys

from pybrain.datasets            import ClassificationDataSet
from pybrain.utilities           import percentError
from pybrain.tools.shortcuts     import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules   import SoftmaxLayer

PyBrain uses several dataset classes to make data handling easier (or not). For a classification problem we will use a ClassificationDataSet. This toy dataset has two classes, 0 and 1, and the two groups are clearly distinct.

In the declaration: 2 = dimensionality of each input vector; 1 = dimensionality of the target (one class label per sample); nb_classes = number of classes.

alldata = ClassificationDataSet(2, 1, nb_classes=2)
alldata.addSample([-1,-1],[0])
alldata.addSample([-1,-1],[0])
alldata.addSample([-1,-1],[0])
alldata.addSample([-1,-1],[0])
alldata.addSample([-1,-1],[0])

alldata.addSample([1,1],[1])
alldata.addSample([1,1],[1])
alldata.addSample([1,1],[1])
alldata.addSample([1,1],[1])
alldata.addSample([1,1],[1])
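
The repeated samples above are enough to show the mechanics; if you want slightly noisier toy data you could generate clusters around the same two centres instead. This is just a sketch using NumPy (which PyBrain already depends on), with the centres and covariance chosen arbitrarily:

from numpy.random import multivariate_normal

means = [(-1, -1), (1, 1)]          # one centre per class
cov   = [[0.1, 0.0], [0.0, 0.1]]    # small isotropic spread
for klass, centre in enumerate(means):
    for point in multivariate_normal(centre, cov, 50):
        alldata.addSample(point, [klass])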

Once the dataset has been created, we split it into a training and test set:

tstdata, trndata = alldata.splitWithProportion( 0.25 )
trndata._convertToOneOfMany( )
tstdata._convertToOneOfMany( )
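
_convertToOneOfMany( ) recodes the scalar class labels into one output unit per class (‘one-of-many’, or one-hot, targets), which is what a network with more than one output neuron expects; the original scalar labels are kept in the dataset’s ‘class’ field:

print trndata['target'][0]   # one-of-many vector, e.g. [0 1]
print trndata['class'][0]    # the original scalar label, e.g. [ 1.]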

# We can also examine the dataset
print "Number of training patterns: ", len(trndata)
print "Input and output dimensions: ", trndata.indim, trndata.outdim
print "First sample (input, target, class):"
print trndata['input'][0], trndata['target'][0], trndata['class'][0]

Now let’s build a neural network and train it using back propagation on the training data (trndata).

fnn     = buildNetwork( trndata.indim, 5, trndata.outdim, outclass=SoftmaxLayer, recurrent=False )  # softmax output layer for classification
trainer = BackpropTrainer( fnn, dataset=trndata, momentum=0.1, verbose=True, weightdecay=0.01 )

Originally I figured the best way to train was to do it myself; having looked at my code, and at the code in the docs, I now think it’s best to use the built-in functions:

from pybrain.tools.neuralnets import NNregression, Trainer
# Create your dataset - as above
nn = NNregression(alldata)
nn.setupNN()
nn.runTraining()
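
I haven’t used it in this post, but the same module also provides an NNclassifier class for classification datasets like ours; the sketch below assumes it follows the same dataset-in, setupNN()/runTraining() pattern as NNregression – check pybrain.tools.neuralnets if it doesn’t:

from pybrain.tools.neuralnets import NNclassifier

# assumption: NNclassifier mirrors the NNregression workflow shown above
nnc = NNclassifier(alldata)
nnc.setupNN()
nnc.runTraining()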

You should be aware that there are all sorts of training approaches for an NN and that PyBrain allows you to use them – if you can penetrate the docs!

— original method —

Train the network for n epochs

# I am not sure about this, I don't think my production code is implemented like this
from pybrain.tools.validation import ModuleValidator, CrossValidator

modval = ModuleValidator()
for i in range(1000):
    trainer.trainEpochs(1)
    # trainOnDataset trains a second time on the same data each pass - probably
    # redundant given the trainEpochs(1) call above
    trainer.trainOnDataset(dataset=trndata)
    # 5-fold cross-validation of the current network, scored by mean squared error
    cv = CrossValidator( trainer, trndata, n_folds=5, valfunc=modval.MSE )
    print "MSE %f @ %i" % ( cv.validate(), i )

Now predict on the test data – ideally this would be a completely held-out set.

print tstdata
print ">", trainer.testOnClassData(dataset=tstdata)
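", trainer">

The percentError helper imported at the top turns the predicted class list into a more readable score by comparing it with the true labels kept in the dataset’s ‘class’ field:

tstresult = percentError( trainer.testOnClassData(dataset=tstdata), tstdata['class'] )
print "test error: %5.2f%%" % tstresult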

20 comments

  1. Thank you! Nice little article

    I needed to add:
    from pybrain.tools.validation import ModuleValidator
    from pybrain.tools.validation import CrossValidator

    to the top of the code to ensure python found these two commands, other than that it ran. I have been battling to understand the rather complex pybrain tutorials for days now

  2. Thank you for the example.

    A couple of corrections:
    “n _folds=5” should be “n_folds=5”
    “NNregression(dataset)” should be “NNregression(alldata)”

  3. Hello, and thanks for this great tutorial!
    I’m having a problem with my particular case. My input vectors are of dimension 4, and I want to have one of 5 possible output results (0, 1, 2, 3 or 4), so I used:

    alldata.addSample([0,0,0,0],[0])
    alldata.addSample([0.1,0.1,0.1,0.1],[0])
    .
    .
    .
    alldata.addSample([1,1,1,1],[1])
    alldata.addSample([1.1,1.1,1.1,1.1],[1])
    .
    .
    .

    etc

    My problem is that when I try:

    trndata._convertToOneOfMany( )

    I get an exception:

    ds._convertToOneOfMany()
    File “/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pybrain/datasets/classification.py”, line 142, in _convertToOneOfMany
    newtarg[i, int(oldtarg[i])] = bounds[1]
    IndexError: index 2 is out of bounds for axis 1 with size 2

    Do you have any suggestions?

    Thanks in advance!

  4. I am a beginner in Python and PyBrain. My question is: where can I load my data into the ANN (inside your code), and how would I write that in the code? For example, I have a text file called Weather.txt; inside the file are 4 variables and all are either 0 or 1.

    It looks like this inside the text file:

    =======================
    humidity | Morning | Evening | Result(Rain)

    0,0,1,1
    1,1,0,0
    0,0,0,0

    Finally, I want the ANN to predict whether it’s going to rain or not.

    Can you please help me?

    1. Hi Ben,

      Currently only have a tablet so can’t give you some demo code. Will have something for you later today or tomorrow.

      You should take a look at using ‘with open(filename) as fh’, the csv reader python module and array indexing.

      Dan

    2. Here’s an outline

      import csv
      from itertools import islice

      input_file = 'sample.txt'

      with open(input_file) as fh:
          # skip the separator, header and blank line at the top of the file
          data = islice(csv.reader(fh), 3, None)
          for row in data:
              label = [int(row[-1])]                # last column is the class
              features = [int(i) for i in row[:3]]  # first three columns are the inputs

              # here you would add the data to the alldata classification dataset
              alldata.addSample(features, label)
      
      
      
  5. thanks for this great article, but I don’t quite follow the “_convertToOneOfMany”, I have read the doc, but I don’t understand the “work better if classes are encoded into one output unit per class” part. Could you explain a little please? many thanks

  6. Good article, but I still cannot understand very well how categorical variables work in pybrain.
    As a simple example I am trying to make the following code work:

    a = np.random.rand(1000)
    c = np.array([0]*len(a))
    for i in range(0, len(a)):
        if a[i] >= 0.6:
            c[i] = 0
        elif a[i] < 0.5:
            c[i] = 1
        else:
            c[i] = 2

    net = bld(1, 3, 1)
    classes = ['menor', 'meio', 'maior']
    dados = cds(1, 1, nb_classes=3, class_labels=classes)
    load_dataset(dados, a, b)
    trainer = bpt(net, dados)
    trainer.trainUntilConvergence(dataset=dados, maxEpochs=100, continueEpochs=10, validationProportion=0.3)
    print net.activate([0.35])
    print net.activate([0.65])

    But when I activate it and try to get a 0/1/2 response, what I receive is a continuous value. This is the same problem I have been having on some real data, so I tried this test and could not make even this sample task work!!

    Thanks

    1. Have you tried changing the labels to 0 & 1? I suspect this is probably the case. I’m afraid I haven’t looked at the pybrain code for a very long time – essentially since this was posted … 5 years ago :)

  7. AttributeError: ‘NNregression’ object has no attribute ‘activate’

    How do we test the trained network using NNregression?
