# Loss Functions | Cost Functions in Machine Learning

by keshav

Every machine learning algorithm (model) learns by optimizing a loss function (also called an error or cost function). A loss function evaluates how accurate a given prediction is. If the prediction is far from the actual (true) value, the loss function returns a high numeric value. For the model to produce a good prediction, its output must deviate little from the actual value, i.e. the loss must be low. We use optimization techniques such as gradient descent to reduce the loss in our predictions.

There are several loss functions in machine learning. Now the question arises: can we use any of them in our machine learning algorithm? The answer is no. If we pick a loss function arbitrarily, the loss may be hard to calculate, and the result may be distorted if the chosen function is sensitive to outliers. So it is important to understand a loss function before using it to calculate the loss in our predictions. Several factors govern the selection of a loss function for a problem, such as the algorithm being used, the ease of evaluating probabilities and derivatives, and the presence of outliers.

Depending on the type of model, i.e. classification or regression, loss functions can be divided into two types: classification loss functions and regression loss functions. (This is not a formal division; we use it only for ease of understanding.)

In classification, we predict the class or label of a supplied tuple (set of features) using a model built from the given dataset. That is, a categorical value (e.g. male or female, dead or alive) is predicted.

In regression, we predict a continuous value for a given set of features using a model built from the given dataset.

### Classification Loss Functions

Some of the classification loss functions are:

1. Hinge loss/SVM loss Function

Hinge loss is a loss function used for training classifiers in machine learning. More precisely, it is used for maximum-margin classification algorithms such as the SVM.

Let T be the target output, with T = -1 or +1, and let Y be the classifier score. The hinge loss for the prediction is then given as

$$L(Y) = \max(0,\ 1 - T \cdot Y)$$

Note that Y is not a class label but the raw numeric output of the classifier's decision function.

For example, for a linear SVM, Y = W·X + b, where W and b are the weights and bias that parameterize the hyperplane and X is the feature vector to classify.

Interpretation for Hinge Loss:

We can see that if T and Y have the same sign (i.e. the point is classified in the right class) and |Y| ≥ 1, then the loss L(Y) = 0. The point lies on the correct side of the decision boundary and outside the margin. On the other hand, L(Y) grows linearly with |Y| if T and Y have opposite signs (wrong classification). If they have the same sign but |Y| < 1, the loss is positive but smaller than 1 (called a low-margin error).
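The three cases in this interpretation can be checked with a minimal sketch. The weights, bias, and feature vector below are hypothetical values chosen only for illustration, and `linear_score` and `hinge_loss` are names introduced here, not part of any library.

```python
def linear_score(w, x, b):
    """Raw classifier score Y = W.X + b for a single feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(t, y):
    """Hinge loss max(0, 1 - T*Y) for target t in {-1, +1} and raw score y."""
    if t not in (-1, 1):
        raise ValueError("target must be -1 or +1")
    return max(0.0, 1.0 - t * y)


# Hypothetical hyperplane parameters and feature vector (illustration only).
w, b = [2.0, -1.0], 0.5
x = [1.0, 0.5]
y = linear_score(w, x, b)   # 2.0*1.0 + (-1.0)*0.5 + 0.5 = 2.0

print(hinge_loss(+1, y))    # same sign, |Y| >= 1: no loss
print(hinge_loss(+1, 0.5))  # same sign, |Y| < 1: small loss inside the margin
print(hinge_loss(-1, y))    # opposite sign: loss grows with |Y|
```

Note that a correctly classified point can still incur a loss when its score falls inside the margin, which is what pushes a maximum-margin classifier to separate the classes with room to spare.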

2. Cross-entropy Loss/Negative Log-Likelihood Loss Function

Cross-entropy loss (negative log-likelihood) measures the performance of a classification model whose output is a probability value between 0 and 1. It increases as the predicted probability diverges from the actual label. A perfect model would have a log loss of 0; a high value indicates a large error in the model's predictions.

The general mathematical expression for cross-entropy loss is

$$L = -\sum_{c=1}^{M} Y_{o,c} \log(P_{o,c})$$

where

M = total number of classes to be classified, e.g. if the class labels are cat, dog, and rat, then M = 3.

Y = binary indicator (1 or 0) of whether class label ‘c’ is the correct classification for observation ‘o’.

P = predicted probability that observation ‘o’ is of class ‘c’.
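The cross-entropy described above can be sketched in a few lines of plain Python. This is a minimal illustration for a single observation with one-hot indicators. The function name `cross_entropy`, the clipping constant `eps`, and the probability values are assumptions made for this example.

```python
import math


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy for one observation.

    y_true: one-hot list of 0/1 indicators, one entry per class.
    p_pred: predicted probabilities, one entry per class.
    eps:    assumed small constant that clips probabilities to avoid log(0).
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, p_pred))


# Classes are (cat, dog, rat), so M = 3; the true class is dog.
y = [0, 1, 0]
confident = cross_entropy(y, [0.05, 0.90, 0.05])  # most mass on the true class
unsure = cross_entropy(y, [0.30, 0.40, 0.30])     # mass spread over all classes
wrong = cross_entropy(y, [0.90, 0.05, 0.05])      # most mass on a wrong class

print(confident, unsure, wrong)  # the loss rises from left to right
```

Only the probability assigned to the true class contributes to the sum, because every other indicator is 0.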

For binary classification, where M = 2, cross-entropy can be calculated as

$$L = -\big(Y \log(P) + (1 - Y) \log(1 - P)\big)$$

### Regression Loss Functions

Some of the regression loss functions are:

1. Mean Square Error/Quadratic Loss/L2 Loss (MSE):

It is the average squared difference between the actual values and the values predicted by the regression model.

Mathematically, it is given as

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (T_i - Y_i)^2$$

where T_i is the true (actual) value, Y_i is the predicted value, and n is the number of observations.

MSE can be optimized with a gradient descent algorithm. Because it squares the differences, it is more sensitive to outliers than MAE.
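As a rough illustration of optimizing MSE with gradient descent, the sketch below fits a one-feature line y = w·x + b. The data, the learning rate `lr`, the step count, and the function names are all made up for this example and are not tuned values.

```python
def mse(t, y):
    """Mean squared error between true values t and predictions y."""
    return sum((ti - yi) ** 2 for ti, yi in zip(t, y)) / len(t)


def fit_line(x, t, lr=0.05, steps=2000):
    """Fit y = w*x + b by batch gradient descent on the MSE.

    lr and steps are illustrative defaults, not recommended settings.
    """
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        y = [w * xi + b for xi in x]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = (-2.0 / n) * sum((ti - yi) * xi for ti, yi, xi in zip(t, y, x))
        grad_b = (-2.0 / n) * sum(ti - yi for ti, yi in zip(t, y))
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up data lying exactly on t = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
t = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(x, t)
final_loss = mse(t, [w * xi + b for xi in x])
print(w, b, final_loss)  # w and b end up close to 2 and 1, loss close to 0
```

If the learning rate is too large for the data, the updates overshoot and the loss diverges, so `lr` usually has to be chosen with the scale of the features in mind.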

2. Mean Absolute Error/L1 Loss (MAE):

It is the average of the absolute differences between the actual values and the values predicted by the regression model.

Mathematically, it is given as

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |T_i - Y_i|$$

where T_i is the true (actual) value, Y_i is the predicted value, and n is the number of observations.

MAE can also be optimized with a gradient descent algorithm, although its derivative is undefined where the error is exactly zero.
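To make the difference in outlier sensitivity between MAE and MSE concrete, here is a small comparison on made-up numbers. The data and function names are assumptions for this example only.

```python
def mae(t, y):
    """Mean absolute error between true values t and predictions y."""
    return sum(abs(ti - yi) for ti, yi in zip(t, y)) / len(t)


def mse(t, y):
    """Mean squared error between true values t and predictions y."""
    return sum((ti - yi) ** 2 for ti, yi in zip(t, y)) / len(t)


t = [10.0, 12.0, 11.0, 13.0]
y_clean = [11.0, 11.0, 12.0, 12.0]    # every prediction is off by 1
y_outlier = [11.0, 11.0, 12.0, 23.0]  # the last prediction is off by 10

print(mae(t, y_clean), mse(t, y_clean))      # 1.0 1.0
print(mae(t, y_outlier), mse(t, y_outlier))  # 3.25 25.75
```

A single bad prediction roughly triples the MAE but multiplies the MSE by more than 25, because the error of 10 enters the MSE as 100.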

3. Huber Loss Function:

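This loss function is commonly used for regression problems. It is quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than MSE.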

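Mathematically, Huber loss is given as

$$L_\delta(T, Y) = \begin{cases} \frac{1}{2}(T - Y)^2 & \text{if } |T - Y| \le \delta \\ \delta\,|T - Y| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}$$

where δ is a threshold that can be set as a specific percentile of the absolute residuals, i.e. |T - Y|.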
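A minimal sketch of Huber loss follows, using the standard piecewise form (quadratic up to δ, linear beyond it) and choosing δ as a percentile of the absolute residuals. The nearest-rank percentile rule, the value `q=0.75`, the data, and all function names are assumptions made for this illustration.

```python
import math


def huber(t, y, delta):
    """Huber loss for one true value t and one prediction y."""
    r = abs(t - y)
    if r <= delta:
        return 0.5 * r * r                  # quadratic zone, behaves like MSE
    return delta * r - 0.5 * delta * delta  # linear zone, behaves like MAE


def delta_from_percentile(t, y, q):
    """Nearest-rank q-quantile (0 < q <= 1) of the absolute residuals |T - Y|."""
    residuals = sorted(abs(ti - yi) for ti, yi in zip(t, y))
    rank = max(1, math.ceil(q * len(residuals)))
    return residuals[rank - 1]


def mean_huber(t, y, delta):
    """Average Huber loss over all observations."""
    return sum(huber(ti, yi, delta) for ti, yi in zip(t, y)) / len(t)


t = [10.0, 12.0, 11.0, 13.0]
y = [11.0, 11.0, 12.0, 23.0]  # absolute residuals: 1, 1, 1, 10

delta = delta_from_percentile(t, y, q=0.75)
print(delta, mean_huber(t, y, delta))  # 1.0 2.75
```

On the same residuals the MSE would be 25.75, so the linear zone keeps the single large residual from dominating the loss.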