- Introduction
- Training a Multilayer Perceptron for the Logical And
- Training a Multilayer Perceptron for the Exclusive Or
- Training a Multilayer Perceptron for the Iris Data
- Computation of the Activation Function
- Copying
- Download
- Contact

I am sorry that there is no detailed documentation yet. Below you
can find a brief explanation of how to train a multilayer perceptron
with the program `mlpt`, how to execute a trained network on
new data with the program `mlpx`, and how to do a sensitivity
analysis of a trained network with the program `mlps`.
For a list of options, call the programs without any arguments.

In the directory `mlp/ex` in the source package you can find
training pattern sets for two simple logical functions (and / exclusive
or) and for the well-known iris data (measurements of the sepal length
/ width and the petal length / width of three types of iris flowers).
How to train neural networks for these examples is discussed below.

Enjoy,

Christian Borgelt

Training a Multilayer Perceptron for the Logical And

As a first example let us take a look at the very simple problem
of training a perceptron so that it computes the logical and. The
training patterns for the mlpt program are stored in the file
`and.pat`, which looks like this:

0 0 0
1 0 0
0 1 0
1 1 1

The first two columns state the input values, the third column states the corresponding output value. To train a multilayer perceptron for the logical and, type

mlpt -M and.pat and.net

This will train a perceptron with two input neurons, one output
neuron and no hidden neurons for 1000 epochs. The option `-M`
tells the program that the input is a pure numerical matrix and not
a real data table with column names (see below). You need not specify
the number of inputs/outputs, because by default the program assumes
that there is only one output, which is in the last column, while all
other columns are inputs. The program also assumes by default that
there is no hidden layer.

The trained network will be written to the file `and.net`,
which looks like this:

units   = 2, 1;
scales  = [0.5, 2], [0.5, 2];
weights = {{ 2.83695, 2.83693, -2.83692 }};
ranges  = [0, 1];

The line starting with `units` lists the number of neurons
in the different layers, starting with the input layer and ending with
the output layer. As you can see, there is no hidden layer in this
network.

The next line, starting with `scales`, specifies linear transformations that are applied to the input values to normalize them to expected value 0 and standard deviation 1 (computed as the square root of the maximum likelihood estimate of the variance). There is one pair of values for each input neuron. The first value is an offset that is subtracted from the input value; the second is a factor by which the result (input minus offset) is multiplied.
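
This normalization is easy to retrace. The following Python sketch (an illustration, not part of the package) reproduces the `scales` entries of `and.net` from the first input column of `and.pat`:

```python
import math

def scale_params(column):
    """Compute (offset, factor) so that (x - offset) * factor has
    mean 0 and standard deviation 1; the variance is the maximum
    likelihood estimate (division by n, not n-1)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n   # ML estimate
    return mean, 1.0 / math.sqrt(var)

# first input column of and.pat: 0, 1, 0, 1
offset, factor = scale_params([0, 1, 0, 1])
print(offset, factor)   # 0.5 2.0 -- matches scales = [0.5, 2] in and.net
```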

After the keyword `weights` the weights of the neurons are
listed. Since in this network we only have one neuron having weights
(namely the output neuron; the input neurons do not do any real work),
we have only three weights: the weights of the two connections from the
input neurons and the bias value (in this order).

The last line specifies the range of values of the output neuron. This range was computed from the training patterns; it is [0, 1] because we are dealing with a logical function. Note that if the range of values of an output column in the training data differs from [0, 1] (the range of values of the logistic function), a linear transformation is applied to the output of the corresponding output neuron to map the interval [0, 1] to the range of values found in the training pattern set.

The perceptron trained above actually computes the logical and, as you can verify by typing

mlpx and.net and.pat and.out

This will compute the sum of squared errors (sse), the mean squared error (mse, mean over training patterns), and the root of the mean squared error (rmse) for the training patterns, which are (for the network above)

sse : 0.00919446
mse : 0.00229861
rmse: 0.0479439

In addition, since an output file was specified, an extended
pattern file will be written to the file `and.out`.
It looks like this:

0 0 0 0.000201242
1 0 0 0.0553624
0 1 0 0.0553603
1 1 1 0.944641

That is, the set of training patterns has been extended by a fourth column, which contains the output of the perceptron for the input patterns specified by the values in the first two columns. Of course, due to the sigmoid function, the result is not perfect (the values produced are not exactly 0 and 1), but the approximation is good enough.
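
The network's computation can be retraced by hand. The following Python sketch (an illustration, not the program's actual code) applies the input scaling and the logistic function with the weights from `and.net`:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# values taken from and.net above
offset, factor = 0.5, 2.0
w1, w2, bias = 2.83695, 2.83693, -2.83692

def forward(x1, x2):
    # normalize the inputs, then apply the logistic function
    # to the weighted sum plus the bias value
    s1, s2 = (x1 - offset) * factor, (x2 - offset) * factor
    return logistic(w1 * s1 + w2 * s2 + bias)

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, forward(x1, x2))
# reproduces the fourth column of and.out up to small rounding differences
```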

Training a Multilayer Perceptron for the Exclusive Or

The equally simple approach used above for the logical and does
*not* work for the exclusive or, i.e., for the input file
`xor.pat`, which looks like this

0 0 0
1 0 1
0 1 1
1 1 0

Training a multilayer perceptron for this problem with

mlpt -M xor.pat xor.net

yields a network looking like this

units   = 2, 1;
scales  = [0.5, 2], [0.5, 2];
weights = {{ 0.000318987, -0.00412534, 0.00176216 }};
ranges  = [0, 1];

This network does *not* solve the problem, as can be seen
from the fact that the error measures are

sse : 1.00001
mse : 0.250001
rmse: 0.500001

as well as from the output produced, for instance, with

mlpx xor.net xor.pat xor.out

which looks like this

0 0 0 0.501392
1 0 1 0.501552
0 1 1 0.499329
1 1 0 0.499489

That is, no distinction is made between the training patterns. Of course, this is due to the fact that a simple perceptron can solve only linearly separable problems and the exclusive or is, obviously, not linearly separable. To solve this problem, we need a network with a hidden layer.

One or more hidden layers can be added to the network with the
option `-c` followed by a colon-separated list of integer
numbers. Each of these numbers specifies the number of neurons in a
hidden layer. That is, `-c2` adds a single hidden layer with
2 neurons, `-c5:3` adds two hidden layers, one with 5 neurons
and one with 3 neurons. The layers are assumed to be ordered from the
input layer towards the output layer.

For the exclusive or problem, a hidden layer with 2 neurons is needed, hence you should type

mlpt -M -c2 -e5000 xor.pat xor.net

The option `-e5000` increases the number of training epochs
from 1000 (the default) to 5000, because it is often the case that the
exclusive or problem is not solved in 1000 epochs. Note that you may
combine the two options into `-c2e5000`.

The result is a network like this:

units   = 2, 2, 1;
scales  = [0.5, 2], [0.5, 2];
weights = {{ 3.44362, -3.44365, 3.21714 },
           { 3.52177, -3.52182, -3.55384 }},
          {{ -6.89205, 7.1604, 3.19169 }};
ranges  = [0, 1];

Here we have three numbers in the list of units, since we added a hidden layer with two neurons. The list of weights is expanded accordingly. We have two layers of neurons (outer curly braces), the first of which contains two neurons having three weights each (the inner curly braces group the weights per neuron), the second having only one neuron (the output neuron), also with three weights. As above, the first numbers in each group are the weights of the connection to the predecessor neurons, whereas the last number is the bias value.
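
The hand computation shown for the logical and extends naturally to the hidden layer. This Python sketch (an illustration, not the program's code; it assumes the hidden neurons also apply the logistic function to the weighted sum of the scaled inputs) uses the weights from `xor.net` above:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# values taken from xor.net above
offset, factor = 0.5, 2.0
hidden = [(3.44362, -3.44365, 3.21714),     # weights w1, w2 and bias
          (3.52177, -3.52182, -3.55384)]    # of the two hidden neurons
output = (-6.89205, 7.1604, 3.19169)        # weights and bias of the output

def forward(x1, x2):
    s1, s2 = (x1 - offset) * factor, (x2 - offset) * factor  # scale inputs
    h = [logistic(w1 * s1 + w2 * s2 + b) for w1, w2, b in hidden]
    w1, w2, b = output
    return logistic(w1 * h[0] + w2 * h[1] + b)

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, round(forward(x1, x2), 6))
# reproduces the fourth column of xor.out up to small rounding differences
```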

The performance of this network, measured with

mlpx xor.net xor.pat xor.out

is

sse : 0.00642475
mse : 0.00160619
rmse: 0.0400773

and the output file looks like this:

0 0 0 0.0378462
1 0 1 0.962613
0 1 1 0.953499
1 1 0 0.037846

That is, the problem was actually solved.

5000 epochs may appear to be a lot for such a simple problem. However, this is due to the fact that the mlpt program uses standard backpropagation by default. A faster solution can be achieved by adding a momentum term with

mlpt -M -c2 -m0.9 xor.pat xor.net

This sets the momentum factor to 0.9 and thus the program reaches
a satisfactory solution in a few hundred epochs. (Note again that
the two options may be combined into `-c2m0.9`.)

Training a Multilayer Perceptron for the Iris Data

Let us now take a look at the more complex problem of training a multilayer perceptron for the iris data. To train such a network, type

mlpt -M -c3 -U3 iris.pat iris.net

The `-c3` adds a hidden layer with 3 neurons, just as
described above. The `-U3` states that there should be three
output neurons, one neuron for each of the three classes of iris
flowers. (Note that the file with the training patterns has seven
columns, the first four of which state the input values, while the
last three code the class with a 1-in-n code.)
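
A 1-in-n code simply assigns each class its own column, which is 1 for patterns of that class and 0 otherwise. A small Python sketch (for illustration only, not the program's encoding routine):

```python
# 1-in-n coding of the three iris classes: each class gets its own
# column, set to 1 for patterns of that class and to 0 otherwise
classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def one_in_n(label):
    return [1 if c == label else 0 for c in classes]

print(one_in_n("Iris-versicolor"))   # [0, 1, 0]
```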

The resulting network looks like this

units   = 4, 3, 3;
scales  = [5.84333, 1.21168], [3.05733, 2.30197],
          [3.758, 0.568374], [1.19933, 1.31632];
weights = {{ 0.878782, -1.11655, 1.753, 1.93157, 2.34027 },
           { -1.16471, 1.76549, -2.04974, -2.99973, -2.84079 },
           { -0.727816, -1.62847, 6.83127, 7.75607, -10.1281 }},
          {{ -3.61975, 6.45599, -3.85032, -1.43268 },
           { 3.21068, -6.26839, -9.754, 1.13484 },
           { -0.810291, -3.11402, 9.66869, -3.46732 }};
ranges  = [0, 1], [0, 1], [0, 1];

and it solves the problem fairly well, as can be seen from the measurements computed by

mlpx iris.net iris.pat iris.out

which are

sse : 3.70898
mse : 0.0247265
rmse: 0.157247

Inspecting the output file reveals that only three training patterns are misclassified (if each input pattern is assigned to the class with the largest activation).

Although standard backpropagation training works very well for
the iris data, there is a better approach, namely the more flexible
resilient backpropagation method. The training method can be chosen
with the `-a` option; the available methods are:

bkprop      standard backpropagation
supersab    super self-adaptive backpropagation
rprop       resilient backpropagation
quick       quick backpropagation
manhattan   Manhattan training

Hence, to train a multilayer perceptron with resilient backpropagation type

mlpt -M -arprop -k0c3o3 iris.pat iris.net

Note that there is also the additional option `-k0`. This
option specifies that the weights of the network should be updated
only once for each epoch, namely after a full traversal of all
training patterns. This additional option is necessary, because
resilient backpropagation does not work well for online training.

In general the `-k` option specifies the number of patterns
that should be processed before the weights are updated. As already
explained above a value of 0 means that the weights are updated only
once per epoch (batch training). The default is to update the weights
after each training pattern (online training). `-k10` means
that the weights are updated every 10 training patterns. Hence, with
the option `-k`, a gradual transition between pure online
training (update after each training pattern) and pure batch training
(update only once per epoch) can be achieved.
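
The update schedule implied by `-k` can be sketched as follows (a hypothetical helper for illustration, not the program's code; it assumes that leftover patterns at the end of an epoch also trigger an update):

```python
def update_points(n_patterns, k):
    """Return after which pattern presentations within one epoch the
    weights are updated: k = 0 -> batch training (once per epoch),
    k = 1 -> online training (after every pattern), k > 1 -> every
    k patterns (assumption: any leftover patterns at the end of the
    epoch also trigger an update)."""
    if k <= 0:
        return [n_patterns]
    pts = list(range(k, n_patterns + 1, k))
    if not pts or pts[-1] != n_patterns:
        pts.append(n_patterns)      # leftover patterns at the epoch end
    return pts

print(update_points(150, 0))        # [150] -- batch training
print(len(update_points(150, 1)))   # 150 -- online training
print(update_points(150, 10)[:3])   # [10, 20, 30] -- every 10 patterns
```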

Sometimes the problem arises that the initial number of training epochs was chosen too low, so that the trained network is not good enough w.r.t. the given problem. An already trained network may be improved by specifying the file it is contained in as a third argument to the program mlpt. That is, for instance,

mlpt -M -e2000 iris.pat iris.new iris.old

takes the already trained network stored in the file
`iris.old`, trains it for 2000 epochs with the patterns
in the file `iris.pat`, and stores the result in the file
`iris.new`. Note that the options `-c`, `-U`,
and `-w` are ignored if an already trained network is given.
Note also that the new network file may or may not have the same
name as the old one.


Up to now we always used the option `-M` to tell the
program `mlpt` that the input is a numerical matrix. Without
this option real data tables (with column names etc.) are possible.
The main differences are:

- Input patterns are not restricted to numeric values, but may also
  contain nominal values. Note that the first line of such a more
  general pattern table should contain attribute names. If it does
  not, use either `-d` to generate default names or `-h` to read the
  attribute names from another file. How the values are interpreted
  is determined by a domain definition file, which must be given as
  a first argument to the program `mlpt`. (See the table package,
  especially the program `dom`, for more explanations about this.)
  Attributes that are not listed in the domain definition file are
  not used. Nominal attributes are automatically recoded into
  numeric ones using a 1-in-n code.
- Instead of a number of target columns, the name of a target
  attribute is specified with the option `-o`. If no target
  attribute is specified, the attribute listed last in the domain
  definition file is used as the target. The target attribute may
  be numeric or nominal. If it is nominal, it is automatically
  recoded using a 1-in-n code (like all symbolic attributes).
- The program `mlpx` evaluates the output of the network and
  automatically decodes the result, so that a nominal value is
  computed if the target attribute is nominal. This makes the two
  programs `mlpt` and `mlpx` very convenient to use if you want to
  solve a classification or prediction problem.

Example: The command

mlpt -c2k0 -aquick iris.dom iris.tab iris.net

trains a multilayer perceptron with two hidden neurons for the iris data using quick backpropagation (option `-aquick`) with batch updates (`-k0`). The result looks like this:

/*--------------------------------------------------------------------
  domains
--------------------------------------------------------------------*/
dom(sepal_length) = IR;
dom(sepal_width)  = IR;
dom(petal_length) = IR;
dom(petal_width)  = IR;
dom(iris_type)    = { Iris-setosa, Iris-versicolor, Iris-virginica };

/*--------------------------------------------------------------------
  multilayer perceptron
--------------------------------------------------------------------*/
mlp(iris_type) = {
  units   = 4, 2, 3;
  scales  = [5.84333, 1.21168], [3.05733, 2.30197],
            [3.758, 0.568374], [1.19933, 1.31632];
  weights = {{ 2.14492, -14.671, 33.8793, 59.3337, 42.8929 },
             { 1.02566, 0.814512, -7.29516, -3.16734, 6.34753 }},
            {{ -101.809, 36.3122, 1.11674 },
             { 15.3094, 19.9228, -24.0699 },
             { 13.4884, -60.6794, 12.9485 }};
  ranges  = [0, 1], [0, 1], [0, 1];
};

This network leads to only one misclassification (0.67%), as can be verified with the command

mlpx iris.net iris.tab


The program `mlps` does a sensitivity analysis of a trained network
on a given data set (which may or may not be the data set the network
was trained on). It computes the partial derivatives of the outputs
w.r.t. the inputs for each input pattern. By default the maximum of
these values over the different output neurons is used to assess how
sensitively the outputs react to changes in the inputs. With the
option `-s` the sum over the output neurons can be used instead of
the maximum. The resulting values are summed over all training
patterns.
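
The idea can be illustrated with finite differences (a sketch only; `mlps` computes the derivatives itself and its details may differ). Here the perceptron from `and.net` is analyzed; both inputs should turn out to be about equally important:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# the perceptron from and.net above
def net(x1, x2):
    s1, s2 = (x1 - 0.5) * 2.0, (x2 - 0.5) * 2.0
    return logistic(2.83695 * s1 + 2.83693 * s2 - 2.83692)

def sensitivity(f, n_inputs, patterns, h=1e-6):
    """Sum over all patterns of the absolute partial derivative of the
    output w.r.t. each input, approximated by central differences."""
    total = [0.0] * n_inputs
    for p in patterns:
        for i in range(n_inputs):
            lo, hi = list(p), list(p)
            lo[i] -= h
            hi[i] += h
            total[i] += abs(f(*hi) - f(*lo)) / (2 * h)
    return total

print(sensitivity(net, 2, [(0, 0), (1, 0), (0, 1), (1, 1)]))
# the two sums are (almost) equal: both inputs matter equally for the and
```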

Computation of the Activation Function

By changing the makefile, you can activate a table-based computation of the logistic activation function of the neurons, which can lead to much lower training times. To compile the programs in this way, activate the line

CFLAGS = $(CFBASE) -DNDEBUG -O3 -DMLP_TABFN

in the makefile. (The definition of `MLP_TABFN` does the
trick.) The table contains the values of the logistic function for
1024 equidistant points in the range 0 to 16. You may change the
argument range or the number of points by adapting the definitions of
`TABMAX` and `TABSIZE` in the file `mlp/src/mlp.c`.
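
The following Python sketch illustrates such a table-based approximation (an illustration only; the actual implementation in `mlp/src/mlp.c` may differ, e.g. in how it interpolates). It tabulates the range [0, TABMAX] and exploits the symmetry logistic(-x) = 1 - logistic(x):

```python
import math

TABMAX, TABSIZE = 16.0, 1024          # argument range and table size

# precompute the logistic function at TABSIZE equidistant points
TABLE = [1.0 / (1.0 + math.exp(-i * TABMAX / (TABSIZE - 1)))
         for i in range(TABSIZE)]

def logistic_tab(x):
    """Table lookup with linear interpolation; only [0, TABMAX] is
    tabulated, negative arguments use logistic(-x) = 1 - logistic(x)."""
    if x < 0:
        return 1.0 - logistic_tab(-x)
    if x >= TABMAX:
        return 1.0                    # saturated beyond the table range
    t = x * (TABSIZE - 1) / TABMAX
    i = int(t)
    frac = t - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

print(logistic_tab(2.0), 1.0 / (1.0 + math.exp(-2.0)))  # nearly identical
```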

It is also possible to compile the programs so that they use the
*hyperbolic tangent* (tanh) as the activation function of the neurons.
To get this version, activate the line

CFLAGS = $(CFBASE) -DNDEBUG -O3 -DMLP_TANH

in the makefile. (The definition of `MLP_TANH` changes the
activation function to tanh.) There is also a table-based version of
this, which can be activated with

CFLAGS = $(CFBASE) -DNDEBUG -O3 -DMLP_TANH -DMLP_TABFN

There is a theoretical argument in favor of the hyperbolic tangent: the output of a neuron is much less likely to be (close to) zero, which is desirable, since an output of zero means that the connection weights to its successor neurons are not adapted. In practice, however, the logistic function usually leads to better results. I have not completely figured out the reasons for this yet.

Note that the initial weight range and the learning rate are changed to 0.5 and 0.05, respectively, if the hyperbolic tangent is used. These changes compensate for the different properties of the hyperbolic tangent compared to the logistic function.
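
One reason why only these two parameters need adjusting is that tanh is merely a shifted and rescaled logistic function, so the two activations differ only in their output range. A small Python check:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# tanh is a shifted and rescaled logistic function:
#   tanh(x) = 2 * logistic(2 * x) - 1
# so switching activations changes the output range from [0, 1] to [-1, 1]
for x in (-2.0, -0.5, 0.0, 0.7, 3.0):
    print(x, math.tanh(x), 2.0 * logistic(2.0 * x) - 1.0)
```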

Copying

mlpt/mlpx/mlps - train and execute multilayer perceptrons

copyright © 1996-2016 Christian Borgelt

(MIT license, or more precisely Expat License;
to be found in the file `mit-license.txt` in the directory
`mlp/doc` in the source package of the program, see also
opensource.org and
wikipedia.org)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Download

Download page with most recent version.

Contact

E-mail: christian@borgelt.net

Website: www.borgelt.net
