PyBrain is a machine learning library written in Python, with hooks out to C++ when speed is required. The documentation isn't fantastic, having clearly been written by people who have been 'in it' for a while. Here is a simple example that uses short-cut functions to construct a classification network.
First, pull in all of the classes and methods that are required (you may need to pull in a couple of others – see the comments below).
#!/usr/bin/python
import sys
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
PyBrain uses several dataset classes to make data handling easier (or not). For a classification problem we will use ClassificationDataSet. Our toy dataset has two classes, 0 and 1, and the two groups are clearly distinct.
In the declaration: 2 = the dimensionality of each input vector; 1 = the dimensionality of the target (a single class label); nb_classes = the number of classes.
alldata = ClassificationDataSet(2, 1, nb_classes=2)
# each sample needs a target class: [0] for the first cluster, [1] for the second
alldata.addSample([-1,-1], [0])
alldata.addSample([-1,-1], [0])
alldata.addSample([-1,-1], [0])
alldata.addSample([-1,-1], [0])
alldata.addSample([-1,-1], [0])
alldata.addSample([1,1], [1])
alldata.addSample([1,1], [1])
alldata.addSample([1,1], [1])
alldata.addSample([1,1], [1])
alldata.addSample([1,1], [1])
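Conceptually, the dataset is just a growing list of (input, target) pairs. A minimal plain-Python sketch of the same ten samples (the names here are illustrative, not PyBrain API):

```python
# Plain-Python sketch of the dataset above: a list of (input, target)
# pairs, five per cluster.
samples = []

def add_sample(inp, target):
    samples.append((list(inp), list(target)))

for _ in range(5):
    add_sample([-1, -1], [0])   # class 0 cluster
for _ in range(5):
    add_sample([1, 1], [1])     # class 1 cluster

print(len(samples))  # 10
```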
Once the dataset has been created, we split it into a training and test set:
tstdata, trndata = alldata.splitWithProportion( 0.25 )
trndata._convertToOneOfMany( )
tstdata._convertToOneOfMany( )

# We can also examine the dataset
print "Number of training patterns: ", len(trndata)
print "Input and output dimensions: ", trndata.indim, trndata.outdim
print "First sample (input, target, class):"
print trndata['input'], trndata['target'], trndata['class']
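Two things happen here: splitWithProportion shuffles the data and puts 25% of the samples into the first returned set, and _convertToOneOfMany replaces the single class index with a one-hot target vector (needed for a softmax output layer). A stdlib sketch of both ideas, with illustrative names that are not PyBrain's implementation:

```python
import random

def split_with_proportion(samples, proportion, seed=0):
    # Shuffle the indices, then put `proportion` of the samples in the
    # first split and the rest in the second.
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(len(samples) * proportion)
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

def one_of_many(class_index, nb_classes):
    # Class k -> one-hot target vector of length nb_classes.
    target = [0] * nb_classes
    target[class_index] = 1
    return target

print(one_of_many(1, 2))  # [0, 1]
```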
Now let's build a neural network and train it with backpropagation on the training data (trndata).
# 5 hidden units; a SoftmaxLayer output suits classification (this is
# why SoftmaxLayer was imported above)
fnn = buildNetwork( trndata.indim, 5, trndata.outdim, outclass=SoftmaxLayer, recurrent=False )
trainer = BackpropTrainer( fnn, dataset=trndata, momentum=0.1, verbose=True, weightdecay=0.01 )
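The softmax output layer turns the raw outputs into values that sum to 1, so they can be read as class probabilities. A minimal stdlib sketch of what such a layer computes:

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability, then normalise the
    # exponentials so the outputs sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 0.5])
print(probs)       # two class "probabilities", largest first here
print(sum(probs))  # 1.0
```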
Originally I figured the best way to train was to do it myself. Having looked at my code, and at the code in the documentation, I now think it's best to use the built-in functions:
from pybrain.tools.neuralnets import NNregression, Trainer

# Create your dataset - as above
nn = NNregression(alldata)
nn.setupNN()
nn.runTraining()
You should be aware that there are all sorts of training approaches for a NN, and that PyBrain allows you to use them – if you can penetrate the doc!
— original method —
Train the network for n epochs
# I am not sure about this, I don't think my production code is implemented like this
# note: trainEpochs and trainOnDataset both train the network, so one
# of the two calls below is redundant
from pybrain.tools.validation import ModuleValidator, CrossValidator

modval = ModuleValidator()
for i in range(1000):
    trainer.trainEpochs(1)
    trainer.trainOnDataset(dataset=trndata)
    cv = CrossValidator( trainer, trndata, n_folds=5, valfunc=modval.MSE )
    print "MSE %f @ %i" %( cv.validate(), i )
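Cross-validation repeatedly holds one fold out for scoring while training on the rest, and MSE is the score being reported each epoch. A stdlib sketch of the two ingredients (illustrative names, not PyBrain's implementation):

```python
def mse(predictions, targets):
    # Mean squared error, the quantity ModuleValidator.MSE reports.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def k_folds(n_samples, n_folds):
    # Partition sample indices into n_folds disjoint folds, as a
    # cross-validator would before its train/validate cycles.
    return [[i for i in range(n_samples) if i % n_folds == k]
            for k in range(n_folds)]

print(mse([0.9, 0.1], [1.0, 0.0]))
print(k_folds(10, 5))
```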
Now predict the test data again – this should really be a held-out set.
print tstdata
print ">", trainer.testOnClassData(dataset=tstdata)
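To score the predicted class labels against the true ones, the percentError function imported at the top is the usual tool. A stdlib sketch of what it computes (the name percent_error here is illustrative):

```python
def percent_error(predicted, targets):
    # Share of mismatching class labels, as a percentage - what
    # pybrain.utilities.percentError reports.
    wrong = sum(1 for p, t in zip(predicted, targets) if p != t)
    return 100.0 * wrong / len(targets)

print(percent_error([0, 0, 1, 1], [0, 1, 1, 1]))  # 25.0
```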