Softmax Activation Function in Neural Networks [formula included]

by keshav



The softmax activation function is the generalized form of the sigmoid function for multiple dimensions. It is the mathematical function that converts a vector of numbers into a vector of probabilities. Softmax is commonly used as the activation function for multi-class classification problems in machine learning, and its output is interpreted as the probability of each class.
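Because softmax generalizes the sigmoid, the two-class case reduces exactly to it. Here is a quick NumPy sketch of that relationship (my own illustration; the score value is arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 1.5
# Softmax over the two raw scores [z, 0]: the first entry equals sigmoid(z),
# which is why softmax is called the multi-dimensional generalization of sigmoid.
pair = np.exp([z, 0.0]) / np.exp([z, 0.0]).sum()
print(pair[0], sigmoid(z))  # both ≈ 0.8176
```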

 

The mathematical expression of the softmax activation function is:

$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \quad i = 1, \dots, n$$
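To make the formula concrete, here is a minimal NumPy implementation sketch (the function name and sample scores are mine, not from the article):

```python
import numpy as np

def softmax(z):
    """Map a vector of raw scores to a vector of probabilities."""
    exp_z = np.exp(z)            # e^{z_i} for each entry
    return exp_z / exp_z.sum()   # divide by the sum so the outputs add up to 1

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))           # [0.659 0.242 0.099], sums to 1
```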

Let’s learn how the softmax function works with an example. Consider a neural network that classifies a given image as a cat, a dog, a tiger, or none of these. Let X be the feature vector (i.e., X = [x1, x2, x3, x4]).


[Figure: multi-class classification with a neural network]

We normally use a softmax activation function in the last layer of a neural network, as shown in the figure above.

 

In the neural network shown above, we have

$$Z^{[L]} = W^{[L]} A^{[L-1]} + b^{[L]}$$

where $A^{[L-1]}$ is the vector of activations (the calculated values) at layer $L-1$, and $W^{[L]}$ is the $n \times m$ weight matrix, with $m$ = total nodes in layer $L-1$ and $n$ = nodes in the output layer $L$. For this example, $m = 3$, $n = 4$.

And $b^{[L]}$ is the bias vector with $n$ entries ($n = 4$ in this example).
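To make the shapes concrete, here is a small NumPy sketch of this affine step (all the numeric values are placeholders of my own, not values from the article):

```python
import numpy as np

m, n = 3, 4                          # m nodes in layer L-1, n nodes in output layer L

A_prev = np.array([0.5, 1.2, 0.3])   # activations from layer L-1 (placeholder values)
W = np.full((n, m), 0.1)             # weight matrix of shape (n, m) = (4, 3), placeholder
b = np.zeros(n)                      # bias vector, one entry per output node

Z = W @ A_prev + b                   # Z^[L]: one raw score per output class
print(Z.shape)                       # (4,)
```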

 

Now we calculate the exponential of each element of $Z^{[L]}$:

$$e^{Z^{[L]}} = \left[\, e^{z_1},\ e^{z_2},\ e^{z_3},\ e^{z_4} \,\right]^{T}$$

And the sum of these exponentials:

$$\sum_{j=1}^{4} e^{z_j} = e^{z_1} + e^{z_2} + e^{z_3} + e^{z_4}$$

The probabilities of being in different classes given input $X$ are calculated as follows:

$$P(\text{class}_i \mid X) = \frac{e^{z_i}}{\sum_{j=1}^{4} e^{z_j}}, \quad i = 1, \dots, 4$$

 

On the basis of this probability distribution, our neural network classifies the given image as a cat, a dog, a tiger, or none of these.

 

Now let’s make this clearer by taking some numeric values.

 

Suppose we input an image and obtain these values:

[Figure: assumed values of $Z^{[L]}$]

Then,

[Figure: exponential of each value of $Z^{[L]}$]

And,

[Figure: sum of the exponentials]

Therefore, the probability distribution is calculated as:

[Figure: resulting probability of each class]

By observing the probability distribution, we can say that the supplied image is of a dog.
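The article's actual numbers lived in the figures above, so here is a stand-in walkthrough of the same three steps. The Z values below are my own, chosen so that "dog" comes out with the highest probability, matching the conclusion:

```python
import numpy as np

classes = ["cat", "dog", "tiger", "none"]

# Stand-in raw scores (the original values were in an image and are not recoverable)
Z = np.array([1.2, 3.1, 0.4, -0.5])

exp_Z = np.exp(Z)        # step 1: exponential of each element
total = exp_Z.sum()      # step 2: sum of the exponentials
probs = exp_Z / total    # step 3: probability of each class given X

for c, p in zip(classes, probs):
    print(f"P({c} | X) = {p:.3f}")   # ≈ 0.120, 0.804, 0.054, 0.022

print("Prediction:", classes[int(np.argmax(probs))])  # -> dog
```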

The advantages of using the softmax activation function are:

  • It can be used for multi-class classification.
  • It normalizes the output for each class to a value between 0 and 1 by dividing by the sum of the exponentials, giving the probability of the input belonging to a specific class.
  • For neural networks that need to categorize inputs into numerous categories, softmax is often employed exclusively in the output layer, as shown in the sketch below.
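As one common way to apply that last point, here is a minimal Keras sketch (the layer sizes are illustrative, not from the article) where softmax appears only on the output layer:

```python
import tensorflow as tf

# Hidden layer uses ReLU; only the 4-way output layer uses softmax,
# so the network's outputs can be read directly as class probabilities.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                     # 4 input features (x1..x4)
    tf.keras.layers.Dense(3, activation="relu"),    # hidden layer (m = 3 nodes)
    tf.keras.layers.Dense(4, activation="softmax"), # output layer (n = 4 classes)
])
model.summary()
```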

