Getting Started with Machine Learning
In this article, we cover the basics of machine learning. We also discuss the most frequently used terms in machine learning, which will help us get started with the subject.
What is Machine Learning?
Machine learning is a field of computer science that enables computers to learn from data without being explicitly programmed for each task. Its main focus is on algorithms that can be trained to accomplish a task. It is a subset of artificial intelligence. Machine learning algorithms build a mathematical model from sample data, known as “training data,” so that predictions or decisions can be made without task-specific programming.
What are the Types of Machine Learning Problems?
Machine learning problems are most commonly grouped into the following types:
- Supervised Learning: The majority of practical machine learning problems use supervised learning algorithms. Supervised learning is a type of learning where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output:
Y = f(X)
Our aim is to learn this mapping function so well that when we supply new input data (X), we can predict the output variable (Y) for that data. It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as an instructor supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the instructor.
Supervised learning problems can be further grouped into regression and classification problems.
- Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “male” or “female”.
- Regression: A regression problem is when the output variable is a real numeric value, such as the price of a house or a weight.
Some popular examples of supervised machine learning algorithms are:
- Linear and polynomial regression for regression problems.
- Decision trees and random forests for classification and regression problems.
- Support vector machines for classification problems.
- Artificial neural networks for classification and regression problems.
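To make the mapping Y = f(X) concrete, here is a minimal sketch of supervised learning for a regression problem, using simple linear regression fitted by least squares in plain Python. The function names `fit_line` and `predict` and the training values are assumptions made up for this illustration; a real project would more likely use an established library.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of Y = slope * X + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How X and Y vary together, and how much X varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping function to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: inputs X with known outputs Y (here Y = 2X + 1 exactly).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.0, 5.0, 7.0, 9.0, 11.0]

model = fit_line(train_x, train_y)   # training: learn f from labeled examples
prediction = predict(model, 6.0)     # prediction for an input not seen in training
print(model, prediction)
```

Because the made-up data follows Y = 2X + 1 exactly, the fitted slope and intercept come out as 2 and 1, and the prediction for X = 6 is 13. With noisy real-world data the fit would only approximate the underlying relationship.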
- Unsupervised Learning: Unsupervised learning is where you only have input data (X) and no corresponding output variables, commonly called labels. The main aim of unsupervised learning is to model the structure of the data in order to learn more about it. Unlike supervised learning, there is no instructor, which means we have no output label against which to correct the algorithm.
Unsupervised learning problems can be further grouped into clustering and association problems.
- Clustering: A clustering problem is where you group data points into clusters based on the similarity of their features.
- Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
Some popular examples of unsupervised learning algorithms are:
- k-means and hierarchical clustering for clustering problems.
- Apriori algorithm for association rule learning problems.
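As an illustration of clustering, the following is a minimal sketch of k-means in plain Python. The function name `k_means`, the sample points, and the choice of starting centroids are all assumptions made for this example; library implementations usually pick starting centroids more carefully and run several restarts.

```python
import math


def k_means(points, centroids, max_iterations=100):
    """Group points around the given starting centroids; return (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical unlabeled data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Assumption: start from one point taken from each end of the list.
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Note that no output labels are supplied anywhere: the cluster numbers in `labels` are discovered from the input data alone, which is what distinguishes this from the supervised case.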
- Semi-supervised learning: Semi-supervised learning problems are those where you have a large amount of input data, also called “Training Data”, and only some of that data is labeled. These problems lie between supervised and unsupervised learning. An example is a photo archive where only some of the images are labeled (e.g. dog, cat, person) and the majority are unlabeled.
- Reinforcement learning: In reinforcement learning, a computer program interacts with an external environment in which it must achieve a certain goal. The algorithm receives feedback in the form of rewards and punishments as it navigates its problem space. In formal terms, reinforcement learning is a method of machine learning in which an agent (sometimes called the model) learns to perform actions in an environment that lead it toward the maximum reward. It does so by balancing exploration (trying new actions) with exploitation (using the knowledge it has gained through repeated trials).
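The reward-and-punishment loop described for reinforcement learning can be sketched with tabular Q-learning, one common reinforcement learning algorithm. Everything in this example is an assumption chosen for illustration: a hypothetical corridor of five cells, a reward of 1.0 for reaching the last cell, a small penalty for every other move, and the specific values of `alpha`, `gamma`, and `epsilon`.

```python
import random

N_STATES = 5         # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)   # index 0 = step left, index 1 = step right


def step(state, action_index):
    """Environment: move the agent and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True    # reward for reaching the goal
    return next_state, -0.01, False     # small punishment for every other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by repeated trials in the corridor."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on the number of steps per episode
            if rng.random() < epsilon:
                action = rng.randrange(2)                        # explore
            else:
                action = 0 if q[state][0] >= q[state][1] else 1  # exploit
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward the reward plus the
            # discounted value of the best action available in the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-goal cells: the action with the higher learned value.
policy = ["left" if left >= right else "right" for left, right in q_table[:-1]]
print(policy)
```

Here `epsilon` controls how often the agent explores a random action instead of exploiting what it has learned so far. After training, the learned policy should prefer moving right in every cell, since that is the shortest route to the reward.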
Anyone learning something new faces difficulties, and one of the most frustrating is not knowing the technical terminology of the subject. Machine learning is no exception: it has many technical terms that can confuse learners who have not met them before. So let’s look at the most common terms that a learner should know before going through any machine learning material.
Some of the most frequently used terms in machine learning are:
- Algorithm: A stepwise procedure for solving a problem.
- Attribute: Also called a feature, field, or variable; a measurable property of the data that is given to the model as input.
- Label: The known output value attached to a data point, such as its class.
- Dimension: The number of features in the data.
- Model: The mathematical representation of real-world data that is produced by training.
- Training: The process of generating a model by passing ‘Training data’ into an algorithm. Sometimes it is also called ‘Learning’.
- Testing: The process of predicting results from a machine learning model by passing ‘Testing data’ to it.
- Training Data: The data set from which a model is learned.
- Testing Data: The data whose output is predicted by passing it to the model, typically to check how well the model performs on data it has not seen.
- Target: The output value that the model predicts from the input variables.
- Regression: Techniques used to predict real numeric values.
- Classification: Categorizing data into predefined classes.
- Overfitting: A model is said to be overfitted if it fits the training data very closely but gives poor predictions for new input data.
- Underfitting: The opposite of overfitting. A model is said to be underfitted if it is too simple to capture the pattern in the data, so it predicts poorly even on the training data.
- Regularization: A method that constrains the complexity of a machine learning model so that the model generalizes well and the over-fit/under-fit problem is avoided.
- Hyper-Parameter: A parameter whose value is set before the learning process begins, rather than being learned from the data.
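Several of these terms can be seen together in one small sketch. The example below uses a k-nearest-neighbours classifier in plain Python, chosen here only because it is short; the data points, the deliberately mislabeled training point, and the function names `predict` and `accuracy` are assumptions made for this illustration. Nearest-neighbour methods have no separate fitting step, so "training" amounts to storing the training data.

```python
import math
from collections import Counter

# Hypothetical training data: each row is (attributes, label).
# Two attributes per row means the dimension is 2.
# The point (5.2, 5.2) is deliberately mislabeled "A" to act as noise.
training_data = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"), ((1.0, 2.0), "A"),
    ((5.0, 5.0), "B"), ((5.5, 6.0), "B"), ((6.0, 5.5), "B"), ((5.0, 6.0), "B"),
    ((5.2, 5.2), "A"),
]

# Hypothetical testing data, kept separate from the training data.
testing_data = [
    ((1.2, 1.4), "A"), ((1.8, 1.8), "A"), ((5.1, 5.3), "B"), ((5.8, 5.8), "B"),
]


def predict(point, k):
    """Classify a point by majority vote among its k nearest training points."""
    nearest = sorted(training_data, key=lambda row: math.dist(point, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(data, k):
    """Fraction of rows in data whose predicted label matches the true label."""
    correct = sum(predict(point, k) == label for point, label in data)
    return correct / len(data)


# k is a hyperparameter: it is chosen before prediction, not learned from data.
for k in (1, 3):
    print(k, accuracy(training_data, k), accuracy(testing_data, k))
```

With these made-up points, k = 1 scores 1.0 on the training data but only 0.75 on the testing data, because it memorizes the mislabeled point: a small-scale picture of overfitting. With k = 3 the training score drops to about 0.89 while the testing score rises to 1.0, which shows how a hyperparameter choice can trade a perfect fit on training data for better generalization.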
These are some of the basic and most common terms needed to understand material on machine learning. There are many other terms, which we will cover under their respective topics in future articles.