I assume that you already have the following background knowledge.

Activation Function

In this section we introduce 6 popular activation functions for neurons. They appear frequently in deep neural network research.

As you can see, all of them are simple functions that can be called easily from a library. So why write them at all? I decided to create thin wrappers so that I don't have to remember the import paths in different places. The following is an example (here):

import theano.tensor as T

def tanh(x):
    """Hyperbolic tangent nonlinearity"""
    return T.tanh(x)

def sigmoid(x):
    """Standard sigmoid nonlinearity"""
    return T.nnet.sigmoid(x)

def softplus(x):
    """Softplus nonlinearity"""
    return T.nnet.softplus(x)

def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise"""
    return x * (x > 0)

def softmax(x):
    """Softmax function"""
    return T.nnet.softmax(x)

def identity(x):
    """Identity function"""
    return x
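For intuition, here is what each nonlinearity computes, sketched in plain NumPy. These are illustrative re-implementations, not the Theano ops above (the `np_` names are mine):

```python
import numpy as np

def np_tanh(x):
    """Hyperbolic tangent: squashes inputs into (-1, 1)."""
    return np.tanh(x)

def np_sigmoid(x):
    """Logistic sigmoid: squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def np_softplus(x):
    """Smooth approximation of ReLU: log(1 + e^x)."""
    return np.log1p(np.exp(x))

def np_relu(x):
    """Rectified linear unit: max(0, x), elementwise."""
    return np.maximum(x, 0.0)

def np_softmax(x):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

The softmax version subtracts the row maximum before exponentiating, which leaves the result unchanged mathematically but avoids overflow for large inputs.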

Activate a Layer

Given a set of samples \(X\), weights \(W\), and bias \(b\), the activation is computed as

\[ Z = f(Y) \]

where \(f\) is the activation function and \(Y = W \cdot X + b\) is the linear transformation. It turns out this is extremely easy to write in Python.

In Python, you can write it as (example):

return activation(self.apply_lin(X))

Note that apply_lin is the linear transformation function from your abstract Layer class in Feedforward Layer.
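The whole step is just as short in plain NumPy. A minimal sketch, assuming the document's convention \(Y = W \cdot X + b\) with one sample per column (the `activate` helper is hypothetical):

```python
import numpy as np

def activate(X, W, b, activation=np.tanh):
    """Linear transform followed by a nonlinearity: f(W.X + b).

    X : (fan_in, n_samples) input batch, one sample per column
    W : (fan_out, fan_in) weight matrix
    b : (fan_out, 1) bias column, broadcast over samples
    """
    Y = W.dot(X) + b      # the linear part, Y = W.X + b
    return activation(Y)  # elementwise nonlinearity
```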

MLP Layers

We are mostly interested in 4 types of layers:

  • Identity layer: applies the identity function as activation.

  • Hyperbolic tangent layer: applies the \(\tanh\) function as activation.

  • Sigmoid layer: applies the sigmoid function as activation.

  • ReLU layer: applies the rectified linear unit as activation.

The above 4 types usually serve as hidden layers in an MLP network. The softmax layer is usually used as the classification layer; we introduced it in Softmax Regression.

If you write your Layer class correctly, then it's very easy to extend it into the 4 types of layers mentioned above. Here is an example (here):

class IdentityLayer(Layer):
    """Identity Layer"""

    def __init__(self, **kwargs):
        super(IdentityLayer, self).__init__(**kwargs)

    def apply(self, X):
        return self.apply_lin(X)

class TanhLayer(Layer):
    """Tanh Layer"""

    def __init__(self, **kwargs):
        super(TanhLayer, self).__init__(**kwargs)
        self.initialize("tanh")

    def apply(self, X):
        return nnfuns.tanh(self.apply_lin(X))

class SigmoidLayer(Layer):
    """Sigmoid Layer"""

    def __init__(self, **kwargs):
        super(SigmoidLayer, self).__init__(**kwargs)
        self.initialize("sigmoid")

    def apply(self, X):
        return nnfuns.sigmoid(self.apply_lin(X))

class ReLULayer(Layer):
    """ReLU Layer"""

    def __init__(self, **kwargs):
        super(ReLULayer, self).__init__(**kwargs)

    def apply(self, X):
        return nnfuns.relu(self.apply_lin(X))

In the above code, TanhLayer and SigmoidLayer differ slightly from the others in __init__: they use a better weight initialization strategy.
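The pattern only relies on the base class providing apply_lin. A minimal self-contained sketch in NumPy showing the inheritance structure (this toy Layer base is hypothetical and far simpler than the real one from the Feedforward Layer post; the initialize call is omitted here):

```python
import numpy as np

class Layer(object):
    """Toy stand-in for the abstract Layer: holds W, b and the linear map."""
    def __init__(self, in_dim, out_dim):
        # zero initialization, just to keep the sketch deterministic
        self.W = np.zeros((in_dim, out_dim))
        self.b = np.zeros(out_dim)

    def apply_lin(self, X):
        # the shared linear transformation X.W + b
        return X.dot(self.W) + self.b

class TanhLayer(Layer):
    """Tanh layer: a nonlinearity stacked on the inherited linear map."""
    def apply(self, X):
        return np.tanh(self.apply_lin(X))
```

Each subclass only overrides apply; everything about parameters and the linear transformation stays in the base class.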

Initialize Weights

There are a few heuristics that we can apply when initializing the weights of an MLP layer. The weights should be uniformly sampled from a symmetric interval that depends on the activation function. If the activation function is \(\tanh\), then the interval is

\[ \left[ -\sqrt{\frac{6}{fan_{in}+fan_{out}}},\; \sqrt{\frac{6}{fan_{in}+fan_{out}}} \right] \]

where \(fan_{in}\) is the number of neurons in the \((i-1)\)-th layer and \(fan_{out}\) is the number of neurons in the \(i\)-th layer. For the sigmoid function, the interval is

\[ \left[ -4\sqrt{\frac{6}{fan_{in}+fan_{out}}},\; 4\sqrt{\frac{6}{fan_{in}+fan_{out}}} \right] \]

Generally, this boundary should be close to 0, and the weights are randomly generated within it.
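As a concrete check, here is the bound worked out numerically for an illustrative layer size (fan_in = 784, fan_out = 500 are just example values, not from the text):

```python
import numpy as np

fan_in, fan_out = 784, 500          # illustrative layer sizes
bound_tanh = np.sqrt(6.0 / (fan_in + fan_out))
bound_sigmoid = 4.0 * bound_tanh    # the sigmoid interval is 4x wider

# sample a weight matrix uniformly from the symmetric tanh interval
W = np.random.uniform(-bound_tanh, bound_tanh, size=(fan_in, fan_out))
```

Both bounds are small (well under 0.3 here), consistent with the rule that the interval stays close to 0.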

The relevant contribution can be found in this paper: Understanding the difficulty of training deep feedforward neural networks.

The following is an example of a weight initialization function (here):

import numpy as np
import theano

def init_weights(name,
                 out_dim,
                 in_dim=None,
                 weight_type="none"):
    """Create shared weights or bias

    Parameters
    ----------
    name : string
        name of the shared variable
    out_dim : int
        output dimension
    in_dim : int
        input dimension (None for a bias vector)
    weight_type : string
        type of weights: "none", "tanh", "sigmoid"

    Returns
    -------
    weights : matrix or vector
        shared matrix of the respective size
    """
    if in_dim is not None:
        if weight_type == "tanh":
            lower_bound = -np.sqrt(6. / (in_dim + out_dim))
            upper_bound = np.sqrt(6. / (in_dim + out_dim))
        elif weight_type == "sigmoid":
            lower_bound = -4 * np.sqrt(6. / (in_dim + out_dim))
            upper_bound = 4 * np.sqrt(6. / (in_dim + out_dim))
        elif weight_type == "none":
            lower_bound = 0
            upper_bound = 1. / (in_dim + out_dim)
        else:
            raise ValueError("Invalid weight type: %s" % weight_type)

    if in_dim is None:
        # bias: a vector sampled uniformly from [0, 1/out_dim)
        return theano.shared(value=np.asarray(np.random.uniform(low=0,
                                                                high=1. / out_dim,
                                                                size=(out_dim, )),
                                              dtype="float32"),
                             name=name,
                             borrow=True)
    else:
        # weight matrix: sampled uniformly from the symmetric interval above
        return theano.shared(value=np.asarray(np.random.uniform(low=lower_bound,
                                                                high=upper_bound,
                                                                size=(in_dim, out_dim)),
                                              dtype="float32"),
                             name=name,
                             borrow=True)