Hello,
I am (name) from LearnVern.
In our previous session of Machine Learning, we saw Evaluating Regression Models' Performance,
in which we saw the different evaluation metrics that are available, using which we can evaluate the performance, or the accuracy, of the algorithm that we are using.
Now, we are going ahead to see our next topic that is,
Clustering
So, we are going to first start with a primary understanding of clustering, which is distance-based learning, also known as distance-based metrics.
Because the basic meaning of clustering is grouping.
For example, suppose you see a lot of objects lying around, but you don't know their names.
Then your first instinct will be to start identifying them by their shape and pattern.
As in, what is their colour, what is their shape, what is their size?
So with that you will start identifying them, and as you proceed, at some point you will start finding some things similar to each other, while others you will find different.
So, because of the similarity and difference that you find, your brain will automatically start grouping the similar things into one cluster, while it groups the other things separately.
So this is known as distance based learning.
Now you might be wondering, where is the distance in this?
It is there: the similarity and the difference that you found is itself known as distance, because if you compute this mathematically, you will get a distance from it. So let's understand this.
So clustering is an unsupervised approach, because we have the input, that is X1, X2, X3, which means we have the features, but there are no labels in it.
That is what you are seeing in the second point: when we have to work upon an unlabeled data set, we have to identify the patterns, and that is the work which our algorithm performs.
The definition of the clustering is "a way of grouping the data points into different clusters, consisting of similar data points".
This definition is just precisely stating the explanation that I gave you.
Which means that if there are different types of data points, then it needs to group them based on similarity: club the similar data points together, and club those that are not similar separately into other groups.
If you look at the types of clustering, then we have connectivity-based clustering, which we also call hierarchical clustering.
The second type is centroid-based clustering, which we perform through the partitioning method.
Then we also have distribution-based clustering.
Next is density-based clustering, which is identified with model-based methods.
Then we also have fuzzy clustering and constraint-based clustering, which we call supervised clustering.
So we have multiple types of clustering.
So, to achieve this, that is, to create these clusters, let's now see the primary distance metrics that we need.
The Euclidean metric is the first metric; it finds the shortest distance between two points.
Assuming you create such a graph, how will you find the shortest distance between A and B in this particular graph?
By drawing a straight line between them, you get the shortest distance.
So this is known as the Euclidean distance.
And for whatever points you have, if you want to get the distance, then we can use this formula: the square root of the summation, from i equal to 1 to n, of (qi minus pi) squared.
That means you will find the difference at each and every point, then square it, sum these squares up, and finally take the square root of the value.
So in this way, the Euclidean distance basically finds the shortest distance.
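The formula above can be sketched in a few lines of Python. This is just an illustration of the calculation, not the course's own code, and the two example points are hypothetical values chosen for demonstration:

```python
import math

# Euclidean distance: take the difference at each coordinate,
# square it, sum the squares, then take the square root.
def euclidean_distance(p, q):
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical example points, forming a 3-4-5 right triangle.
a = (1, 2)
b = (4, 6)
print(euclidean_distance(a, b))  # 5.0
```

Because the differences here are 3 and 4, the shortest (straight-line) distance comes out to 5.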
Now, next is Manhattan distance; which is basically "the sum of absolute difference between the points across all dimensions".
For instance, suppose there are two points; between these points some negative values may come up, so we will not keep the negative sign but only take the value itself and leave the sign aside.
That is the "sum of absolute differences between the points across all dimensions".
So here, P1 minus Q1, P2 minus Q2: we take out the differences by subtracting, then take their absolute values, and then take their sum.
If you see it in a generalised way, it is the summation of the absolute value of pi minus qi, from i equal to 1 to n.
So in this way we take out the Manhattan distance.
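As a small sketch of the same steps in Python (again with hypothetical example points, not from the lesson itself):

```python
# Manhattan distance: sum of absolute differences across all dimensions;
# abs() drops any negative sign, as described above.
def manhattan_distance(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical example points.
a = (1, 2)
b = (4, 6)
print(manhattan_distance(a, b))  # |1-4| + |2-6| = 3 + 4 = 7
```

Notice that for the same two points the Manhattan distance (7) is larger than the Euclidean one (5), since it moves along the axes instead of the straight line.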
Now the third is Minkowski distance, which is a generalised form of the Euclidean distance and the Manhattan distance.
So if you see the formula, it is very similar:
the absolute value of xi minus yi, raised to p, and outside it is raised to 1 by p. This means you basically need to mention the order in which you want to find the Minkowski distance: if you want to find it with order 3, then you put raised to 3 inside, and instead of raised to 1 by p you put raised to 1 by 3.
So you only have to decide which order you want to find it in.
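A quick sketch of this generalisation in Python; the example points are hypothetical, and the point is to see that order 1 gives back the Manhattan distance while order 2 gives back the Euclidean distance:

```python
# Minkowski distance: (sum of |xi - yi|^p) ** (1/p),
# where "order" is the p you choose.
def minkowski_distance(x, y, order):
    return sum(abs(xi - yi) ** order for xi, yi in zip(x, y)) ** (1 / order)

# Hypothetical example points.
a = (1, 2)
b = (4, 6)
print(minkowski_distance(a, b, 1))  # order 1 -> Manhattan distance: 7.0
print(minkowski_distance(a, b, 2))  # order 2 -> Euclidean distance: 5.0
```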
So, these were the distance metrics, and here I have mentioned the algorithms that are inspired by them:
K-means clustering,
Hierarchical clustering,
and apart from these there are others as well.
So, these algorithms basically do the grouping and clustering.
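To see how distance drives the grouping, here is a minimal sketch of the core assignment step: each point joins the group of its nearest centre, measured with Euclidean distance. The centres and points are hypothetical values for illustration; this is only the assignment idea, not the full K-means algorithm, which we will see in the next session:

```python
import math

# Assign a point to the index of its nearest centre (Euclidean distance).
def nearest_centre(point, centres):
    return min(range(len(centres)),
               key=lambda i: math.dist(point, centres[i]))

# Two hypothetical group centres and some hypothetical points.
centres = [(0.0, 0.0), (10.0, 10.0)]
points = [(1, 1), (9, 9), (0, 2), (11, 8)]

groups = [nearest_centre(p, centres) for p in points]
print(groups)  # [0, 1, 0, 1] -> points near (0,0) form one group, the rest the other
```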
Now, the next topic that we will see is K-means clustering, and we will see it both conceptually and practically.
Till then, remain motivated and keep learning.
Thank you.
If you have any queries or comments, click the discussion button below the video and post them there. That way, you will be able to connect with fellow learners and discuss the course, and our team will also try to solve your query.