I assume you are familiar with the following topics:
Now we are finally going to construct a complete Feedforward Neural Network. All the layers are connected through a process called Forward Propagation. Later in this tutorial, we will train this network with the Backpropagation Algorithm, which computes the gradients needed for Gradient-based Optimization.
Forward Propagation, usually abbreviated fprop, is rather simple in a feedforward network: each layer takes the previous layer's output as its input and produces the output for the next layer.
An example implementation looks like this:

```python
def fprop(self, X):
    """Forward propagation

    Parameters
    ----------
    X : matrix or 4D tensor
        input samples, the size is (number of cases, in_dim)

    Returns
    -------
    out : list
        output list from each layer
    """

    out = []
    level_out = X
    for k, layer in enumerate(self.layers):
        # Each layer consumes the previous layer's output.
        level_out = layer.apply(level_out)
        out.append(level_out)

    return out
```
Here the loop iterates over all the layers and applies each layer to the previous layer's output. Note that a uniform API is important when you develop a Deep Learning algorithm: because every layer exposes the same apply method, the loop needs no special cases.
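To make the point about a uniform API concrete, here is a minimal sketch in plain Python. The classes Scale, Shift and TinyNetwork are hypothetical stand-ins invented for this illustration, not the layers used in this tutorial. The only thing they share with real layers is an apply method.

```python
class Scale:
    """Hypothetical layer: multiplies every input value by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def apply(self, X):
        return [x * self.factor for x in X]


class Shift:
    """Hypothetical layer: adds a constant to every input value."""

    def __init__(self, offset):
        self.offset = offset

    def apply(self, X):
        return [x + self.offset for x in X]


class TinyNetwork:
    """Holds a list of layers that all expose the same apply(X) method."""

    def __init__(self, layers):
        self.layers = layers

    def fprop(self, X):
        out = []
        level_out = X
        for layer in self.layers:
            # The loop does not care which kind of layer it is calling.
            level_out = layer.apply(level_out)
            out.append(level_out)
        return out


net = TinyNetwork([Scale(2.0), Shift(1.0)])
outputs = net.fprop([1.0, 2.0, 3.0])
print(outputs)
```

Adding a new kind of layer only requires a class with an apply method; fprop itself stays unchanged.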
Example of creating a network model:
The above code describes a 3-layer network: the first 2 layers are ReLULayer instances and the output layer is a Softmax layer.
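For readers who want something runnable without Theano, the following is a rough sketch of such a 3-layer network in plain Python. The class names ToyDense, ToyReLULayer and ToySoftmaxLayer, the layer sizes, and the random initialization are all assumptions made for this example. They are not the tutorial's classes, and the sketch handles a single sample rather than a batch.

```python
import math
import random


class ToyDense:
    """Hypothetical fully connected layer; W[j][i] connects input i to unit j."""

    def __init__(self, in_dim, out_dim, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def linear(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + bj
                for row, bj in zip(self.W, self.b)]


class ToyReLULayer(ToyDense):
    def apply(self, x):
        return [max(0.0, z) for z in self.linear(x)]


class ToySoftmaxLayer(ToyDense):
    def apply(self, x):
        z = self.linear(x)
        m = max(z)  # subtracting the max keeps exp() numerically stable
        exps = [math.exp(v - m) for v in z]
        total = sum(exps)
        return [e / total for e in exps]


rng = random.Random(0)  # fixed seed so the run is repeatable
layers = [
    ToyReLULayer(4, 8, rng),     # first hidden layer
    ToyReLULayer(8, 8, rng),     # second hidden layer
    ToySoftmaxLayer(8, 3, rng),  # output layer over 3 classes
]

level_out = [0.2, -0.1, 0.7, 0.4]  # one input sample with 4 features
for layer in layers:
    level_out = layer.apply(level_out)
probs = level_out
print(probs)
```

Whatever the random weights are, the final output is a probability distribution over the 3 classes, which is what the cost function later relies on.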
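In order to train a neural network using a gradient-based algorithm, there are two necessary parts: a cost function and a list of parameters that are subject to change. In its simplest form, Stochastic Gradient Descent (SGD) updates the parameters by:
\[\theta \leftarrow \theta - \eta \nabla_{\theta} J(\theta)\]
where \(J\) is the cost function, \(\theta\) denotes the parameters, and \(\eta\) is the learning rate.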
A cost measures the difference between the actual output and the target output. Since we use a Softmax output layer, the code below implements the categorical cross entropy cost.
Categorical cross entropy measures the cross entropy between 2 probability distributions (remember that the output of a Softmax layer is a probability distribution).
```python
import theano.tensor as T


def categorical_cross_entropy_cost(Y_hat, Y_star):
    """Categorical Cross Entropy Cost

    Parameters
    ----------
    Y_hat : tensor
        predicted output of neural network
    Y_star : tensor
        optimal output of neural network

    Returns
    -------
    costs : scalar
        cost of Categorical Cross Entropy Cost
    """

    # Average the per-sample cross entropy over the batch.
    return T.nnet.categorical_crossentropy(Y_hat, Y_star).mean()
```
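To see what this cost computes numerically, here is a small plain-Python sketch. It assumes one-hot targets and writes the formula out by hand; the function name and the sample values are made up for illustration and it is not the Theano implementation.

```python
import math


def categorical_cross_entropy(y_hat, y_star):
    """Mean over samples of -sum_k y_star[k] * log(y_hat[k])."""
    per_sample = [
        -sum(t * math.log(p) for p, t in zip(p_row, t_row) if t > 0)
        for p_row, t_row in zip(y_hat, y_star)
    ]
    return sum(per_sample) / len(per_sample)


# Two samples, three classes. Each row of y_hat sums to 1.
y_hat = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
y_star = [[1, 0, 0],
          [0, 1, 0]]

cost = categorical_cross_entropy(y_hat, y_star)
print(cost)
```

With one-hot targets only the predicted probability of the correct class contributes, so the cost shrinks toward zero as that probability approaches 1.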
Besides the cost between the actual output and the target, we usually also add L1 and L2 regularization terms to the cost.
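As a rough illustration of how these terms enter the total cost, consider the sketch below. The function names and the coefficients l1_coef and l2_coef are hypothetical choices for this example, and the L2 term is written as a plain sum of squares (some texts add a factor of 1/2).

```python
def l1_penalty(weights):
    """Sum of absolute values of the weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared weights."""
    return sum(w * w for w in weights)


def regularized_cost(data_cost, weights, l1_coef=0.0, l2_coef=0.0):
    """Data cost plus weighted L1 and L2 penalties."""
    return (data_cost
            + l1_coef * l1_penalty(weights)
            + l2_coef * l2_penalty(weights))


weights = [0.5, -1.0, 2.0]
total = regularized_cost(0.3, weights, l1_coef=0.01, l2_coef=0.001)
print(total)
```

Both penalties grow with the magnitude of the weights, so minimizing the total cost discourages large weights in addition to fitting the data.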
All relevant parameters in the model should be collected together in order to get the correct gradient for the entire model. You can use a one-liner to do this job:
The above code is what I used to zip all parameters in the neural network.
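As a sketch of one way such a one-liner could look, assume each layer keeps its own parameters in a list attribute called params. That attribute name and the ToyLayer class are hypothetical, and strings stand in for the actual weight variables.

```python
class ToyLayer:
    """Hypothetical layer that stores its parameters in a list."""

    def __init__(self, name):
        # Strings stand in for the weight matrix and bias vector.
        self.params = [name + "_W", name + "_b"]


layers = [ToyLayer("h1"), ToyLayer("h2"), ToyLayer("out")]

# Flatten the per-layer parameter lists into one list, in layer order.
params = [p for layer in layers for p in layer.params]
print(params)
```

Keeping the parameters in one flat list means the gradient step and the update step can treat the whole model uniformly.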
Writing a backpropagation (BP) algorithm by hand is usually tedious and error-prone. Since Theano provides automatic differentiation based on its computation graph, this process is now easy and flexible for all kinds of uses.
You just need to call the function grad(cost, params), which computes a corresponding list of gradients, one for each parameter.
The above is a typical example of SGD.
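The arithmetic of a single SGD step can also be sketched in plain Python. Here the gradients are given as plain numbers, standing in for what grad(cost, params) would return. The helper sgd_updates, the parameter names and the learning rate are assumptions made for this example.

```python
def sgd_updates(params, grads, learning_rate=0.1):
    """Return (name, new_value) pairs for one plain SGD step."""
    updates = []
    for (name, value), grad in zip(params, grads):
        # Move each parameter against its gradient.
        updates.append((name, value - learning_rate * grad))
    return updates


params = [("W", 1.0), ("b", -0.5)]
grads = [0.5, -1.0]  # stand-ins for the computed gradients

updates = sgd_updates(params, grads, learning_rate=0.1)
print(updates)
```

A positive gradient lowers the parameter and a negative gradient raises it, which is the same rule applied to every parameter in the list.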
You can then use the updates of the parameters to build a training model:
The above Theano function can be used to train all the parameters. Given a batch of data, it computes the cost and uses it to update the parameters. The rest of the training is just calling this function on every training batch for a number of epochs.
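The outer training loop can be sketched without Theano as follows. The make_train_fn helper, the one-weight model y = w * x, the toy data and the hyperparameters are all hypothetical. The train function simply plays the role of the compiled training function: it takes a batch, updates the parameter, and returns the cost.

```python
def make_train_fn(state, learning_rate):
    """Return a function that does one SGD step on a batch and returns its cost."""

    def train(batch):
        w = state["w"]
        n = len(batch)
        # Mean squared error and its gradient with respect to w.
        cost = sum((w * x - y) ** 2 for x, y in batch) / n
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / n
        state["w"] = w - learning_rate * grad
        return cost

    return train


# Toy data generated from y = 2 * x, split into mini-batches of 2.
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
batch_size = 2
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

state = {"w": 0.0}
train = make_train_fn(state, learning_rate=0.1)

epoch_costs = []
for epoch in range(50):
    # One epoch = one pass over every training batch.
    costs = [train(batch) for batch in batches]
    epoch_costs.append(sum(costs) / len(costs))

print(state["w"], epoch_costs[0], epoch_costs[-1])
```

The loop itself knows nothing about the model. It only calls the training function on each batch and records the cost, so it can be reused as the model changes.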