The PhD thesis of Paul J. Werbos at Harvard in 1974 described backpropagation as a method of teaching feed-forward artificial neural networks (ANNs). In the words of Wikipedia, it led to a "renaissance" in ANN research in the 1980s.
As we will see later, it is an extremely straightforward technique, yet most of the tutorials online seem to skip a fair amount of detail. Here's a simple (yet still thorough and mathematical) tutorial on how backpropagation works from the ground up, together with a couple of example applets. Feel free to play with them (and watch the videos) to get a better understanding of the methods described below!
1. Background
To start with, imagine that you have gathered some empirical data relevant to the situation that you are trying to predict - be it fluctuations in the stock market, the chance that a tumour is benign, the likelihood that the picture you are seeing is a face or (as in the applets above) the coordinates of red and blue points.
We will call this data training examples and we will describe the $i$-th training example as a tuple $(\mathbf{x}_i, y_i)$, where $\mathbf{x}_i$ is a vector of inputs and $y_i$ is the observed output.
Ideally, our neural network should output $y_i$ when given $\mathbf{x}_i$ as an input. In case that does not always happen, let's define the error measure as a simple squared distance between the actual observed output and the prediction of the neural network: $E_i = (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the output of the network.
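To make the error measure concrete, here is a minimal Python sketch (the names `squared_error`, `total_error`, `network` and `training_examples` are mine, chosen purely for illustration):

```python
def squared_error(y_observed, y_predicted):
    # E_i = (y_i - yhat_i)^2 for a single training example
    return (y_observed - y_predicted) ** 2

def total_error(network, training_examples):
    # Sum the per-example errors; `network(x)` stands in for whatever
    # prediction the network produces for the input vector x.
    return sum(squared_error(y, network(x)) for x, y in training_examples)
```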
2. Perceptrons (building blocks)
The simplest classifiers out of which we will build our neural network are perceptrons (fancy name thanks to Frank Rosenblatt). In reality, a perceptron is a plain-vanilla linear classifier which takes a number of inputs $x_1, x_2, \ldots, x_n$, scales them using some weights $w_1, w_2, \ldots, w_n$, adds them all up (together with some bias $b$) and feeds everything through an activation function $\phi$.
A picture is worth a thousand equations:
To slightly simplify the equations, define $w_0 = b$ and $x_0 = 1$. Then the behaviour of the perceptron can be described as $\phi(\mathbf{w} \cdot \mathbf{x})$, where $\mathbf{w} = (w_0, w_1, \ldots, w_n)$ and $\mathbf{x} = (x_0, x_1, \ldots, x_n)$.
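As a rough sketch of this definition (function names are my own, and numpy is assumed only for the vector arithmetic), the whole perceptron fits in a few lines:

```python
import numpy as np

def perceptron(weights, inputs, activation):
    # Bias trick from above: weights[0] plays the role of the bias b,
    # and a constant x_0 = 1 is prepended to the inputs.
    x = np.concatenate(([1.0], np.asarray(inputs, dtype=float)))
    return activation(np.dot(weights, x))

# Example: a two-input perceptron with bias 0.5 and weights (1.0, -1.0),
# fed through a hyperbolic tangent activation:
# perceptron([0.5, 1.0, -1.0], [2.0, 3.0], np.tanh)
```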
To complete our definition, here are a few examples of typical activation functions:
- sigmoid: $\phi(z) = \frac{1}{1 + e^{-z}}$,
- hyperbolic tangent: $\phi(z) = \tanh(z)$,
- plain linear, $\phi(z) = z$, and so on (sketched in code below).
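Those activation functions translate directly into code; again, this is just a sketch with my own naming, assuming numpy:

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^(-z)); squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hyperbolic_tangent(z):
    # squashes any real number into (-1, 1)
    return np.tanh(z)

def linear(z):
    # identity activation
    return z
```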
Now we can finally start building neural networks.