
# Popular Machine Learning Algorithms

By Joydeep Bhattacharjee

Any machine learning algorithm is a hypothesis set which is chosen before considering the training data and which is used for finding the optimal model. Machine learning algorithms fall into three broad categories:

• Supervised learning — the input features and the output labels are defined.
• Unsupervised learning — the dataset is unlabeled and the goal is to discover hidden relationships.
• Reinforcement learning — some form of feedback loop is present and there is a need to optimize some parameter.

In this post, we will take a high-level look at some of the common and popular machine learning algorithms. I will take up a more in-depth analysis of these algorithms in future posts. Please note that this post builds on my earlier post on common machine learning terms, so please take a look at that post before reading this.

## Linear Regression

• With linear regression, the objective is to fit a line through the distribution which is nearest to most of the points in the training set.
• In simple linear regression, the regression line minimizes the sum of the squares of the residuals, that is, the “Sum of Squared Residuals”. Hence, this method is also called “Ordinary Least Squares”.
• Linear regression can also be applied to multidimensional data, i.e. datasets that have multiple features. In this case, the “line” is a hyperplane of dimension N-1, N being the number of features in the dataset.
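As a minimal sketch, ordinary least squares can be solved directly with NumPy (the five data points below are made-up toy values):

```python
import numpy as np

# Toy training set: one feature, five points (hypothetical data).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 8.1, 9.9])

# Ordinary Least Squares: stack a column of ones so the
# intercept is learned along with the slope.
A = np.column_stack([X, np.ones_like(X)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

print(slope, intercept)  # roughly 1.97 and 0.15
```

`lstsq` finds the coefficients that minimize the sum of squared residuals, which is exactly the “Ordinary Least Squares” criterion above.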
## Logistic Regression

• Logistic regression, although termed regression, is a classification technique.
• Contrary to linear regression, logistic regression does not assume a linear relationship between the dependent and independent variables. Instead, it assumes that the logit (log-odds) of the dependent variable is a linear function of the independent variables.
• In other words, the decision surface is linear.
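To make the logit relationship concrete, here is a minimal NumPy sketch (the weights, bias, and input are made-up values, not a fitted model):

```python
import numpy as np

def sigmoid(z):
    # Maps the linear score to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters for two features.
w = np.array([2.0, -1.0])
b = 0.5

x = np.array([1.0, 3.0])
p = sigmoid(w @ x + b)  # predicted probability of the positive class

# Taking the logit (log-odds) of p recovers the linear score:
logit = np.log(p / (1 - p))
print(logit, w @ x + b)  # both are -0.5
```

Because the log-odds is linear in x, the boundary where p = 0.5 (logit = 0) is a straight line, which is why the decision surface is linear.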
## Support Vector Machine

• Support vector machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges.
• In SVM, we plot the data points in an N-dimensional space, where N is the number of features, and find a hyperplane that best separates the classes.
• This is a good algorithm when the number of dimensions is high with respect to the number of data points.
• Because it works in high-dimensional spaces, this algorithm can be computationally expensive.
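As a sketch of the classification case, assuming scikit-learn is installed (the four toy points below are made up and trivially separable):

```python
from sklearn import svm

# Two classes of 2-D points: class 0 near the origin,
# class 1 around (2, 2).
X = [[0, 0], [0, 1], [2, 2], [2, 3]]
y = [0, 0, 1, 1]

# A linear kernel finds the maximum-margin separating hyperplane.
clf = svm.SVC(kernel="linear")
clf.fit(X, y)

preds = clf.predict([[0.2, 0.3], [1.9, 2.5]])
print(preds)  # [0 1] for these well-separated toy points
```

With only four points this is purely illustrative; the point is that `fit` learns the hyperplane and `predict` reports which side of it a new point falls on.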
## K-Means Clustering

• K-means clustering attempts to split the data into K groups, where each point belongs to the group whose centroid it is closest to.
• This can be thought of as creating stereotypes among groups of people.

The algorithm to implement K-means clustering is quite simple.

• Randomly pick K centroids.
• Assign each data point to the centroid closest to it.
• Recompute each centroid as the average position of the points assigned to it.
• Iterate until points stop changing their centroid assignments.

To predict the cluster for a new data point, you just find the centroid it is closest to.
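The steps above, including prediction for a new point, can be sketched in NumPy (a minimal version that ignores edge cases such as empty clusters; the two-blob toy data is made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(points, k, iters=100):
    # 1. Randomly pick K data points as the initial centroids.
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # 2. Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array(
            [points[labels == i].mean(axis=0) for i in range(k)]
        )
        # 4. Stop once the centroids (and hence assignments) settle.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

def predict(point, centroids):
    # Prediction: a new point belongs to its nearest centroid.
    return np.linalg.norm(centroids - point, axis=1).argmin()

# Two obvious blobs around (0, 0) and (10, 10).
pts = np.vstack(
    [rng.normal(0, 0.5, (20, 2)), rng.normal(10, 0.5, (20, 2))]
)
centroids, labels = kmeans(pts, k=2)
```

On well-separated data like this, the loop converges quickly and `predict` assigns points near (0, 0) and (10, 10) to different clusters.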