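K-Nearest Neighbors (K-NN) is probably one of the simplest yet most effective supervised learning algorithms, and it can be used for both classification and regression. It is most commonly used for classification: given training data points that are separated into several classes, it predicts the class of a new data point. It is a non-parametric, lazy learning algorithm. It makes no assumptions about the underlying data distribution, and it does no work at training time. It simply stores the training data and defers all computation until a prediction is requested. It classifies a new point by a similarity measure, typically a distance measure, most often Euclidean distance.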
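The procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the post's own implementation. The function names `knn_classify` and `knn_regress`, the toy data, and the choice of Euclidean distance with a simple majority vote are all assumptions made for the example. Ties in the vote go to whichever tied label `Counter.most_common` sees first, which here is the label of the closest tied neighbor.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train_X, query, k):
    """Indices of the k training points closest to the query.
    K-NN is 'lazy': nothing is learned up front, the stored data is scanned at query time."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    return order[:k]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the labels of the k nearest neighbors."""
    labels = [train_y[i] for i in k_nearest(train_X, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean of the target values of the k nearest neighbors."""
    values = [train_y[i] for i in k_nearest(train_X, query, k)]
    return sum(values) / len(values)

# Hypothetical toy data: two well-separated clusters.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
y_class = ["A", "A", "A", "B", "B", "B"]
y_value = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]

print(knn_classify(X, y_class, (1.2, 1.4), k=3))  # A
print(knn_regress(X, y_value, (8.6, 8.6), k=3))   # 51.0
```

Sorting every training point for each query costs O(n log n) per prediction; that is fine for a sketch, while libraries typically use spatial indexes such as k-d trees to speed up the neighbor search.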
Principle: The K-NN algorithm is based on the principle that “similar things exist close to each other”, or, put another way, “like things are near to each other.”
In this algorithm, ‘K’ refers to the number of neighbors considered when classifying a point. It is usually chosen to be odd so that, in a two-class problem, the vote among the neighbors cannot end in a tie. The value of ‘K’ must be selected carefully, otherwise the model will perform poorly. A small ‘K’ gives low bias and high variance, i.e. the model overfits: with ‘K’ = 1, for example, every prediction depends on a single, possibly noisy, neighbor. Conversely, a very large ‘K’ gives high bias and low variance, i.e. the model underfits. There has been a lot of research on selecting the right value of ‘K’; a common rule of thumb is ‘K’ = √n, where n is the total number of training data points, which often gives reasonable results. If this value is odd, use it as is; otherwise make it odd by adding or subtracting 1.
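The square-root rule of thumb for ‘K’ can be written as a small helper. `heuristic_k` is a hypothetical name used only for this sketch; it rounds √n to the nearest integer and, when the result is even, adds 1 (subtracting 1 would be an equally valid choice). Treat the result as a starting point rather than a final answer; comparing a few candidate values on held-out data is the more reliable way to pick ‘K’.

```python
import math

def heuristic_k(n):
    """Rule-of-thumb K for n training points: round(sqrt(n)), made odd.
    This sketch adds 1 when the rounded value is even."""
    k = max(1, round(math.sqrt(n)))
    if k % 2 == 0:
        k += 1
    return k

for n in (10, 50, 100, 400):
    print(n, heuristic_k(n))
# 10 3
# 50 7
# 100 11
# 400 21
```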
K-NN works well with a small number of input variables (features), but its predictions become increasingly error-prone as the number of inputs grows very large.
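A common explanation for this is the “curse of dimensionality”: as the number of features grows, the distances from a query to random points become nearly equal, so the “nearest” neighbors are barely nearer than any others. The sketch below illustrates the effect under assumed conditions (uniformly distributed random points in a unit hypercube); `relative_contrast` is a hypothetical helper, and the exact numbers depend on the random seed, so only the trend matters.

```python
import math
import random

def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from a random query point
    to n_points uniformly random points in a dim-dimensional unit cube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as dimension grows: near and far points look alike.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 2))
```

`math.dist` requires Python 3.8 or later.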
K-NN relies on the simple high-school mathematics used to measure the distance between two points on a graph. Some of the distance measures that can be used for K-NN classification are Euclidean distance, Manhattan distance, and Minkowski distance.
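For two points x = (x_1, ..., x_m) and y = (y_1, ..., y_m) with m features, the standard definitions of these distances are given below. The symbols x, y and m are introduced here for illustration; p is the order parameter of the Minkowski distance.

```latex
\begin{aligned}
D_{\text{Euclidean}}(x, y) &= \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2} \\
D_{\text{Manhattan}}(x, y) &= \sum_{i=1}^{m} \lvert x_i - y_i \rvert \\
D_{\text{Minkowski}}(x, y) &= \Bigl(\sum_{i=1}^{m} \lvert x_i - y_i \rvert^{p}\Bigr)^{1/p}, \qquad p \ge 1
\end{aligned}
```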
Minkowski distance has an order parameter p: setting p = 1 gives Manhattan distance, and setting p = 2 gives Euclidean distance. Minkowski distance is therefore a generalization of both.
Among these methods, the Euclidean Distance method is widely used.
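The relationship between the three measures can be checked with a short sketch in which Manhattan and Euclidean distance are simply Minkowski distance with a fixed p. The function names are illustrative choices, not taken from any particular library.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p >= 1) between two equal-length points."""
    if p < 1:
        raise ValueError("p must be >= 1 for Minkowski distance to be a metric")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def manhattan(a, b):
    """Minkowski with p = 1: sum of absolute coordinate differences."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Minkowski with p = 2: straight-line distance."""
    return minkowski(a, b, 2)

a, b = (0, 0), (3, 4)
print(manhattan(a, b))               # 7.0
print(euclidean(a, b))               # 5.0
print(round(minkowski(a, b, 3), 3))  # 4.498
```

For the same pair of points, the Minkowski distance never increases as p grows, which is why the Euclidean value (5.0) is smaller than the Manhattan value (7.0) here.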
Advantages:
Disadvantages:
Further reading: Implementation of KNN (from scratch in Python)