Activation Functions in Deep Learning [Python code included]

by keshav


[Figure: Common activation functions]

Deep learning needs lots of data to perform well. The internet today provides tons of data, but there is no solid partition between the useful and the not-so-useful parts: in technical terms, the required information comes mixed with a certain amount of noise. If we train our learning model without suppressing that noise, it will not produce the required output. This is where the activation function comes into the picture.

 

Activation functions are mathematical functions that determine the output of the nodes in a neural network. They help the network use the important information and suppress the noise.

 

In a neural network, activation functions are used to introduce non-linearity into the decision boundary. The goal of introducing non-linearity is to model real-world situations: almost all of the data we deal with in real life is non-linear. This is what makes neural networks so effective.

 

Before we dive deeper into the different types of activation functions, let's first look at where the activation function is used in a neural network.

 

Following is a picture of a simple model of a neuron, commonly known as the perceptron. It can be divided into 3 parts: the input layer, the neuron, and the output layer.

[Figure: Perceptron model]

We have inputs x1, x2, x3, ..., xm and the weights associated with those inputs as w1, w2, w3, ..., wm.

 

We first calculate the weighted sum of the inputs, i.e.

z = w1·x1 + w2·x2 + ... + wm·xm

Then the activation function f is applied to it:

y = f(z)

The calculated value is then passed to the next layer if present.
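
To make these two steps concrete, here is a minimal sketch of a single neuron's forward pass. The input values, weights, and the choice of sigmoid as the activation are made up purely for illustration:

import numpy as np

# Minimal perceptron forward pass: weighted sum followed by an activation.
# Inputs, weights, and the sigmoid choice here are illustrative only.
def sigmoid(z):
  return 1/(1+np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # example inputs x1, x2, x3
w = np.array([0.4, 0.7, -0.2])   # example weights w1, w2, w3

z = np.dot(w, x)                 # weighted sum
y = sigmoid(z)                   # activation applied to the weighted sum
print(z, y)                      # -1.24, then roughly 0.22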

 

Now let's look at the most common types of activation functions used in deep learning. We will also learn how to implement each of them in Python.

 

Sigmoid Function

 

The sigmoid activation function is one of the most widely used activation functions in deep learning. As its name suggests, the curve of the sigmoid function is S-shaped.

 

Sigmoid transforms its input into a value in the range 0 to 1.

 

The mathematical form of the sigmoid function is:

sigmoid(x) = 1 / (1 + e^(-x))

The derivative of the sigmoid is:

sigmoid'(x) = sigmoid(x) · (1 - sigmoid(x))

Python Code

 

import numpy as np
import matplotlib.pyplot as plt

# Sigmoid Activation Function
def sigmoid(x):
  return 1/(1+np.exp(-x))

# Derivative of Sigmoid
def der_sigmoid(x):
  return sigmoid(x) * (1- sigmoid(x))

# Generating data to plot
x_data = np.linspace(-10,10,100)
y_data = sigmoid(x_data)
dy_data = der_sigmoid(x_data)

# Plotting
plt.plot(x_data, y_data, x_data, dy_data)
plt.title('Sigmoid Activation Function & Derivative')
plt.legend(['sigmoid','der_sigmoid'])
plt.grid()
plt.show()

 

[Plot: Sigmoid activation function and its derivative]
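
As a quick sanity check of the range: sigmoid(0) is exactly 0.5, large-magnitude inputs saturate toward 0 or 1, and the derivative peaks at 0.25. This snippet reuses the sigmoid and der_sigmoid functions defined above:

# Quick check of the sigmoid's range and the derivative's peak
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # roughly [0.000045, 0.5, 0.999955]
print(der_sigmoid(0.0))                       # 0.25, the maximum of the derivative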

 

Hyperbolic Tangent (htan) Activation Function

 

The tanh function is similar to the sigmoid function. The output ranges from -1 to 1.

 

The mathematical form of the tanh function is:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

The derivative of the tanh function is:

tanh'(x) = 1 - tanh^2(x)

Python Code

 

import numpy as np
import matplotlib.pyplot as plt

# Hyperbolic Tangent (htan) Activation Function
def htan(x):
  return (np.exp(x) - np.exp(-x))/(np.exp(x) + np.exp(-x))

# htan derivative
def der_htan(x):
  return 1 - htan(x) * htan(x)

# Generating data for Graph
x_data = np.linspace(-6,6,100)
y_data = htan(x_data)
dy_data = der_htan(x_data)

# Graph
plt.plot(x_data, y_data, x_data, dy_data)
plt.title('htan Activation Function & Derivative')
plt.legend(['htan','der_htan'])
plt.grid()
plt.show()

 

[Plot: tanh activation function and its derivative]
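
A useful relationship to keep in mind is that tanh is just a scaled and shifted sigmoid: tanh(x) = 2·sigmoid(2x) - 1. A quick numerical check, reusing the sigmoid and htan functions defined earlier in this post:

# tanh(x) equals 2*sigmoid(2x) - 1 (reuses sigmoid() and htan() from above)
x_check = np.linspace(-6, 6, 5)
print(np.allclose(htan(x_check), 2*sigmoid(2*x_check) - 1))  # True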

 

Rectified Linear Unit (ReLU) Activation Function

 

The rectified linear unit (ReLU) activation function is a piecewise linear function: if the input x is positive, the output is x; otherwise, the output is zero.

 

The mathematical representation of the ReLU function is:

ReLU(x) = max(0, x)

The derivative of ReLU is:

ReLU'(x) = 1 if x > 0, else 0 (strictly, the derivative is undefined at x = 0; the code below uses 0 there)

ReLU is widely used nowadays, but it has a problem: for inputs less than 0 it outputs zero, and the gradient is zero as well, so the affected neurons stop updating during backpropagation. This problem is commonly known as the dying ReLU problem. To get rid of it, we use an improved version of ReLU called Leaky ReLU.

 

Python Code

 

import numpy as np
import matplotlib.pyplot as plt

# Rectified Linear Unit (ReLU)
def ReLU(x):
  data = [max(0,value) for value in x]
  return np.array(data, dtype=float)

# Derivative for ReLU
def der_ReLU(x):
  data = [1 if value>0 else 0 for value in x]
  return np.array(data, dtype=float)

# Generating data for Graph
x_data = np.linspace(-10,10,100)
y_data = ReLU(x_data)
dy_data = der_ReLU(x_data)

# Graph
plt.plot(x_data, y_data, x_data, dy_data)
plt.title('ReLU Activation Function & Derivative')
plt.legend(['ReLU','der_ReLU'])
plt.grid()
plt.show()

 

[Plot: ReLU activation function and its derivative]
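
The list comprehensions above are easy to read; an equivalent, fully vectorized sketch using NumPy (an alternative formulation, not required by the rest of the post) also makes the dying ReLU behaviour easy to see:

# Vectorized ReLU and derivative; same behaviour as the versions above.
def ReLU_vec(x):
  return np.maximum(0, x)

def der_ReLU_vec(x):
  return (x > 0).astype(float)

# For negative inputs both the output and the gradient are zero,
# which is what causes the dying ReLU problem described above.
print(ReLU_vec(np.array([-3.0, 2.0])))      # [0. 2.]
print(der_ReLU_vec(np.array([-3.0, 2.0])))  # [0. 1.]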

 

Leaky Rectified Linear Unit (leaky ReLU) Activation Function

 

Leaky ReLU is the most common and effective way to solve the dying ReLU problem. It is simply an improved version of the ReLU function: it adds a slight slope in the negative range so that the gradient never becomes exactly zero.

 

The mathematical representation of Leaky ReLU is:

LeakyReLU(x) = x if x > 0, else a·x, where a is a small constant (0.05 in the code below)

The derivative of Leaky ReLU is:

LeakyReLU'(x) = 1 if x > 0, else a

 

Python Code

 

import numpy as np
import matplotlib.pyplot as plt

# Leaky Rectified Linear Unit (leaky ReLU) Activation Function
def leaky_ReLU(x):
  data = [max(0.05*value,value) for value in x]
  return np.array(data, dtype=float)

# Derivative for leaky ReLU 
def der_leaky_ReLU(x):
  data = [1 if value>0 else 0.05 for value in x]
  return np.array(data, dtype=float)

# Generating data For Graph
x_data = np.linspace(-10,10,100)
y_data = leaky_ReLU(x_data)
dy_data = der_leaky_ReLU(x_data)

# Graph
plt.plot(x_data, y_data, x_data, dy_data)
plt.title('leaky ReLU Activation Function & Derivative')
plt.legend(['leaky_ReLU','der_leaky_ReLU'])
plt.grid()
plt.show()

 

[Plot: Leaky ReLU activation function and its derivative]
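
To see the difference from plain ReLU, compare the two derivatives at a negative input (reusing der_ReLU and der_leaky_ReLU from the code above):

# Unlike ReLU, leaky ReLU keeps a small nonzero gradient for negative inputs,
# so neurons can keep updating during backpropagation.
neg = np.array([-4.0])
print(der_ReLU(neg))        # [0.]
print(der_leaky_ReLU(neg))  # [0.05]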

