In the previous post, Machine Learning Algorithms: Beginners Guide Part 1, we looked at machine learning algorithms and their three broad categories. In this post, let’s look at the sub-types of machine learning algorithms.

With such a wide variety of machine learning algorithms available, a question may arise in your mind: “Which algorithm should I use?” The answer depends on many factors, including:

  • The size, quality, and nature of data.
  • The available computational time.
  • The urgency of the task.
  • What you want to do with the data.

Each of the three broad categories contains many subtypes of algorithms. Let’s look at the top 10 machine learning algorithms.

A: Supervised Learning

1. Decision Trees:

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance-event outcomes, resource costs, and utility.


In practice, a decision tree works through a sequence of yes/no questions, ideally as few as possible, to arrive at a decision that is likely to be correct. It lets you approach a problem in a structured, systematic way and reach a logical conclusion.
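
To make the yes/no structure concrete, here is a minimal sketch of a hand-built tree and the code that walks it. The features ("outlook", "humidity", "windy"), the threshold of 70, and the outcomes are made up for illustration; a real decision tree algorithm would learn the questions from data rather than have them written by hand.

```python
# A hand-built tree: each internal node asks one yes/no question.
# Features, thresholds, and outcomes are hypothetical.
tree = {
    "question": lambda x: x["outlook"] == "sunny",
    "yes": {
        "question": lambda x: x["humidity"] > 70,
        "yes": "stay in",
        "no": "play",
    },
    "no": {
        "question": lambda x: x["windy"],
        "yes": "stay in",
        "no": "play",
    },
}


def decide(node, sample):
    """Walk from the root to a leaf, answering one question per level."""
    while isinstance(node, dict):
        node = node["yes"] if node["question"](sample) else node["no"]
    return node  # a leaf is just the decision string


print(decide(tree, {"outlook": "sunny", "humidity": 85, "windy": False}))  # stay in
print(decide(tree, {"outlook": "rainy", "humidity": 85, "windy": False}))  # play
```

Every sample here reaches a decision after exactly two questions, which is the sense in which a tree keeps the number of questions small.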

2. Naive Bayes Classification:

This classifier applies Bayes’ theorem with naive (strong) independence assumptions between the features. The equation for Bayes’ theorem is:

P(A|B) = P(B|A) P(A) / P(B)

where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the class prior probability, and P(B) is the predictor prior probability.
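
As a rough illustration, the sketch below applies Bayes’ theorem to a toy spam filter. The class names, word probabilities, and the function name `naive_bayes_posterior` are all invented for this example; the "naive" part is the line that multiplies the per-word likelihoods as if the words were independent given the class.

```python
from math import prod


def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(class | observed features), assuming the features are
    independent of each other given the class."""
    # Unnormalised score per class: P(A) * product of P(feature | A)
    scores = {
        cls: priors[cls] * prod(likelihoods[cls][f] for f in observed)
        for cls in priors
    }
    evidence = sum(scores.values())  # P(B), the predictor prior
    return {cls: score / evidence for cls, score in scores.items()}


# Hypothetical numbers: 20% of mail is spam, and each word is more
# common in spam than in legitimate mail ("ham").
priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {
    "spam": {"offer": 0.5, "free": 0.4},
    "ham": {"offer": 0.05, "free": 0.1},
}

result = naive_bayes_posterior(priors, likelihoods, ["offer", "free"])
print(round(result["spam"], 3))  # 0.909
```

With these made-up numbers the spam score is 0.2 × 0.5 × 0.4 = 0.04 and the ham score is 0.8 × 0.05 × 0.1 = 0.004, so the posterior for spam is 0.04 / 0.044.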

Some real-world examples are:

  • Marking an email as spam or not spam
  • Classifying a news article as technology, politics, or sports
  • Checking whether a piece of text expresses positive or negative emotions
  • Face recognition software

3. Linear Regression:

Linear regression is represented by a linear equation that combines a specific set of input values (x) to produce the predicted output (y) for that set of inputs. Both the input values (x) and the output value (y) are numeric.

The objective of linear regression is to fit a line through the data that is as close as possible to most of the points, that is, a line that minimizes the distance (error term) between the data points and the fitted line.
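
One common way to fit such a line is ordinary least squares, which minimizes the sum of squared vertical distances between the points and the line. The sketch below assumes a single input variable and uses made-up data; the function name `fit_line` is ours.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that roughly follows y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 1.99 0.05

# Predict the output for a new input value
print(round(slope * 6 + intercept, 2))  # 11.99
```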


4. Logistic Regression:

Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities using the logistic function, which is the cumulative distribution function of the logistic distribution.
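
The sketch below shows how the logistic function turns a linear combination of inputs into a probability. The weights and bias are hypothetical values chosen by hand; in practice they would be estimated from training data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


# Hypothetical, hand-picked coefficients (not fitted to any data)
weights = [0.8, -0.5]
bias = -1.0

p = predict_proba([3.0, 1.0], weights, bias)  # z = -1.0 + 2.4 - 0.5 = 0.9
label = 1 if p >= 0.5 else 0  # 0.5 is a conventional cut-off, not a requirement
print(round(p, 3), label)  # 0.711 1
```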


In general, regressions can be used in real-world applications such as:

  • Credit Scoring
  • Predicting the revenues of a certain product

5. Support Vector Machines:

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems. However, it is mostly used for classification. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyperplane that best separates the two classes.
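
The sketch below covers only the last step: classifying points once a separating hyperplane w·x + b = 0 is known. The values of `w` and `b` are picked by hand for illustration; an actual SVM would learn them by maximizing the margin between the two classes, which is not shown here.

```python
import math


def decision_value(x, w, b):
    """Signed score w.x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(x, w, b) >= 0 else -1


def distance_to_hyperplane(x, w, b):
    """Geometric distance from x to the hyperplane."""
    return abs(decision_value(x, w, b)) / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane in 2-D: the line x1 + x2 = 3
w = [1.0, 1.0]
b = -3.0

print(classify([3.0, 2.0], w, b))   # 1
print(classify([0.5, 1.0], w, b))   # -1
print(round(distance_to_hyperplane([3.0, 2.0], w, b), 3))  # 1.414
```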


6. Ensemble Methods:

Ensemble methods are machine learning techniques that combine several base models in order to produce a single, stronger predictive model.
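
One simple way to combine base models is majority voting, sketched below. The three rule-based "models" and the email fields are invented for this example; real ensembles such as bagging or boosting combine trained models, but the voting step looks much the same.

```python
from collections import Counter


def majority_vote(models, sample):
    """Ask every base model for a label and return the most common answer."""
    votes = [model(sample) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Three weak, hypothetical base models for spam detection
def many_links(email):
    return "spam" if email["links"] > 3 else "ham"


def says_free(email):
    return "spam" if "free" in email["text"].lower() else "ham"


def unknown_sender(email):
    return "ham" if email["sender_known"] else "spam"


models = [many_links, says_free, unknown_sender]
email = {"links": 5, "text": "Claim your FREE prize", "sender_known": True}

# Two of the three models vote "spam", so the ensemble says "spam"
print(majority_vote(models, email))  # spam
```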


B: Unsupervised Learning

7. Clustering Algorithms:

Clustering is the task of grouping a set of objects such that objects in the same group (cluster) are more similar to each other than to those in other groups.

K-means is a popular unsupervised machine learning algorithm for cluster analysis. It is an iterative, non-deterministic method: the result can depend on how the initial cluster centers are chosen. The algorithm operates on a given data set with a pre-defined number of clusters, k, and its output is k clusters with the input data partitioned among them.
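
The iteration can be sketched in a few lines. To keep the output reproducible, this version takes the starting centroids as an argument instead of choosing them at random, and it runs a fixed number of iterations rather than checking for convergence. The data points are made up.

```python
def kmeans(points, centroids, iterations=10):
    """Basic k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - c) ** 2 for a, c in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up 2-D data with two obvious groups, k = 2
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])

print(clusters[0])  # [(1, 1), (1.5, 2), (1, 0.5)]
print(clusters[1])  # [(8, 8), (9, 9.5), (8.5, 9)]
```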

8. Principal Component Analysis:

Principal Component Analysis (PCA) is used to make data easy to explore and visualize by reducing the number of variables. This is done by capturing the maximum variance in the data in a new coordinate system whose axes are called ‘principal components’. Each component is a linear combination of the original variables, and the components are orthogonal to one another.
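
For two variables, the principal components can be worked out directly from the 2 × 2 covariance matrix, which makes for a compact sketch. The function name `pca_2d` and the data are ours, and the closed-form formulas below only apply to the two-variable case; general PCA uses an eigendecomposition or SVD routine from a numerical library.

```python
import math


def pca_2d(points):
    """PCA for two variables via the closed-form eigen-decomposition
    of the symmetric 2x2 covariance matrix [[a, b], [b, c]]."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Eigenvalues = variance captured along each principal component
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    variances = (mid + half_gap, mid - half_gap)

    # Direction of maximum variance, and the direction orthogonal to it
    theta = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))

    # Reduce two variables to one by projecting onto the first component
    scores = [x * pc1[0] + y * pc1[1] for x, y in centered]
    return pc1, pc2, variances, scores


# Made-up, strongly correlated data
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
pc1, pc2, variances, scores = pca_2d(data)

explained = variances[0] / sum(variances)
print(f"first component explains {explained:.0%} of the variance")
```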

9. Singular Value Decomposition:

Singular value decomposition is a method of decomposing a matrix into three other matrices:

A = U S Vᵀ

In the reduced form (for m ≥ n):

  • A is an m × n matrix
  • U is an m × n matrix with orthonormal columns
  • S is an n × n diagonal matrix of singular values
  • V is an n × n orthogonal matrix

More generally, for any m × n matrix M there exists a decomposition M = UΣV*, where U and V are unitary matrices, Σ is a (rectangular) diagonal matrix with non-negative entries, and V* is the conjugate transpose of V.
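
To see the three factors at work, the sketch below takes a small 2 × 2 matrix whose decomposition can be worked out by hand and checks that multiplying the factors back together recovers the original matrix. The matrix and its factors are our own example; for real data you would call a library SVD routine instead of writing the factors out.

```python
import math


def matmul(X, Y):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[3.0, 0.0], [4.0, 5.0]]

# Hand-derived factors: the singular values are the square roots of the
# eigenvalues (45 and 5) of A-transpose times A.
r10, r2 = math.sqrt(10), math.sqrt(2)
U = [[1 / r10, -3 / r10], [3 / r10, 1 / r10]]
S = [[math.sqrt(45), 0.0], [0.0, math.sqrt(5)]]
V = [[1 / r2, -1 / r2], [1 / r2, 1 / r2]]

# U * S * V-transpose should give back A (up to rounding error)
reconstructed = matmul(matmul(U, S), transpose(V))
print([[round(v, 6) for v in row] for row in reconstructed])

# The columns of U are orthonormal, so U-transpose times U is the identity
print([[round(v, 6) for v in row] for row in matmul(transpose(U), U)])
```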

10. Independent Component Analysis:

Independent Component Analysis (ICA) is a statistical technique for finding the hidden factors, or sources, that underlie a set of observed variables. ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. It is a much more powerful technique than classic methods such as PCA, capable of finding the underlying factors or sources when those methods fail completely. Its applications include digital images, document databases, economic indicators, and psychometric measurements.

Final note: Although there are many other machine learning algorithms, these are the most popular ones. If you’re new to machine learning, they are a good starting point.
