Hello,
I am (name) from LearnVern. (7 seconds pause, music)
We will continue from our previous session of Machine Learning,
So, come, let us look today at the DBSCAN algorithm,
DBSCAN, meaning Density-Based Spatial Clustering of Applications with Noise.
So, this algorithm works on the basis of distance, though as you know, most clustering algorithms work on the basis of distance.
So, we will see its parameters,
So, this algorithm over here will find core samples in areas of high density and expand the clusters from them.
How is it going to do that?
Here, we have a parameter called eps: this is the maximum distance between two samples for one to be considered as in the neighbourhood of the other.
So, if I have 3 data points, then I will take one data point as the reference and find its distance to the other two samples.
Then we check each of those distances against this maximum distance, eps. Right?
So, a point whose distance is more than eps will not be a neighbour,
and a point whose distance is within eps will be a neighbour. Right?
So, here we can modify this parameter by ourselves, so this is the flexibility that we have,
So, now we have 0.5 as the default value of this parameter,
So, we will see this later at the time of implementation; when we perform hyperparameter tuning, we will change this value if needed.
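To make this concrete, here is a minimal sketch of the neighbourhood idea; the three point coordinates below are made up purely for illustration:

    import numpy as np
    from sklearn.metrics import pairwise_distances

    # three made-up 2-D points, purely to illustrate the eps idea
    points = np.array([[0.0, 0.0], [0.3, 0.4], [2.0, 2.0]])
    dist = pairwise_distances(points)  # Euclidean distances by default
    print(dist <= 0.5)                 # True where one point lies within eps of the other

Here the first two points lie within eps of each other, so they count as neighbours, while the third point is too far from both.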
So, the next parameter that we can change is min_samples, that is, the minimum number of samples we have to take in a neighbourhood for a point to be considered as a core point.
Meaning, for instance, if I have a total of 5 points, how many of them should I consider? For example, if I take 3 points, then these points will find the distance among themselves, and after finding the distances, the core point is identified.
So, here this is also changeable, and by default its given value here is 5, which we can increase or decrease as per our need.
So, with the help of these two parameters we will work with the distances, and tweak the clustering.
Here, to calculate the distance we also have a metric parameter, with which the pairwise distances between points are calculated; by default this is the Euclidean distance.
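Putting these three parameters together, a minimal sketch of the estimator, shown here with scikit-learn's default values, would be:

    from sklearn.cluster import DBSCAN

    # eps, min_samples and metric written out explicitly at their defaults
    db = DBSCAN(eps=0.5, min_samples=5, metric='euclidean')

We will tweak eps and min_samples later, at the time of hyperparameter tuning.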
So, let's implement this and see how this DBSCAN algorithm works.
So, first we will import all the libraries that are needed, such as:
import numpy as np,
import pandas as pd.
So, numpy, pandas, matplotlib; then from sklearn dot cluster we have taken DBSCAN. Then we will have to perform preprocessing, which is why we have imported StandardScaler, and we have also taken normalize; both of these, StandardScaler and normalize, basically work to limit the range of values that have large differences between them.
Along with that, from sklearn dot decomposition we have also imported PCA, principal component analysis.
So, these are things that you will learn ahead in the course; for now, you just have to know that StandardScaler and normalize basically help in reducing the large differences between values, and PCA holds the capability of reducing the number of features, say if we have 30 features, it has the capacity to reduce them to just 2 or 3, so this is also a preprocessing technique in Machine Learning.
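So, the import cell, as just described, would look something like this:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler, normalize
    from sklearn.decomposition import PCA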
Now, we have the data, which is credit card related data, and I will try to upload it; so this is my data and I will upload this.
So, till then, I will execute step one.
After that, X is equal to pd dot read_csv; so here I am reading the file, and I will also display it, so that we can see the data.
So, this way I displayed the data also.
So, you can see this is the data that we have.
Customer ID, Balance, ok!
So, here I will put data dot info directly, so that we can see.
So, here you can see.
So, here I had written info without the parentheses, not as a method call, so I will write it in method form.
So, we have Customer ID, Balance, Balance Frequency, Purchases, and like this we have 17 features in total.
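For reference, the loading step would be along these lines; the file name here is only a placeholder, since it depends on the file you upload:

    X = pd.read_csv('credit_card_data.csv')  # placeholder file name
    X.head()   # display the first few rows
    X.info()   # column names, non-null counts and data types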
So, as this is a clustering algorithm, it will work on the basis of these 17 features; now 17 is a lot of features, so moving ahead we can reduce them to maybe 2, 3 or 4 features with the help of PCA, that is Principal Component Analysis.
So, with the help of this we will work on this.
So, here we have already performed two or three things.
So, if you have already seen the EDA part, you must have seen that here we have dropped Customer ID, and with ffill, that is forward fill, I have also filled in the missing values, so that there are no errors or issues in our later processes.
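In code, those two EDA steps would look roughly like this; the column name CUST_ID is an assumption based on the customer ID column shown above:

    X = X.drop('CUST_ID', axis=1)  # assumed name of the customer ID column
    X = X.ffill()                  # forward fill the missing values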
So, let's do preprocessing in step 3.
So, here we will use scaler is equal to StandardScaler,
in X_scaled I have put scaler dot fit_transform and passed in all the data,
then I have normalized the scaled data that we get,
and finally I have converted this into a DataFrame.
So, in this way we have scaled our data, then normalized it, and again converted it into a pandas DataFrame.
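A minimal sketch of this step 3, following exactly the sequence just described:

    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)         # scale the data
    X_normalized = normalize(X_scaled)         # then normalize it
    X_normalized = pd.DataFrame(X_normalized)  # back into a pandas DataFrame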
Now, moving ahead, I want to reduce the dimensions; we had 17 features, from which we removed one, so out of the remaining 16, only 2 components or features should remain: pca is equal to PCA with n_components equal to 2, because we have to bring it down to only 2 dimensions.
So, next, with the help of pca dot fit_transform, we passed in the normalized X.
Then again, we converted the result, X_principal, into a DataFrame.
Then, here you can see now we just have 2 components, P1 and P2.
You can see from here.
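The PCA step would be on these lines; the column names P1 and P2 match what is shown on screen:

    pca = PCA(n_components=2)  # reduce to 2 components
    X_principal = pca.fit_transform(X_normalized)
    X_principal = pd.DataFrame(X_principal, columns=['P1', 'P2'])
    X_principal.head()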
Now, if you look at these values just like this, you will not understand anything, but this is how the algorithm works, which you will also learn: it basically took all the relevant, important features, computed them using a certain formula, and brought something like this in front of us.
And these are the most relevant values.
Now, we will start building the model,
So, to create a model, db_default is equal to DBSCAN, and as I had told you, we can tweak eps and min_samples.
Here, you can see, I have tweaked them.
So, after tweaking, our model is now ready, and it has been trained as well, because I have run the fit method right here.
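As a sketch, the model cell would be along these lines; the exact values used on screen are not fully stated, so the eps and min_samples below are illustrative, with eps taken from the 0.03 mentioned during tuning later:

    db_default = DBSCAN(eps=0.03, min_samples=3).fit(X_principal)  # illustrative values
    labels = db_default.labels_  # one cluster label per point; -1 means noise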
Now, we will visualise and see how clusters zero, one and two, along with the noise points, each form a separate group.
So, let us see the visualisation; here you can see this is our entire data, and on it we have label 0, label 1, label 2 and label -1, where -1 is the label DBSCAN gives to noise points.
So, in this way this Clustering is done, that we can see here.
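A minimal sketch of such a plot, colouring each point by its cluster label:

    plt.scatter(X_principal['P1'], X_principal['P2'], c=labels, s=10)
    plt.xlabel('P1')
    plt.ylabel('P2')
    plt.title('DBSCAN clusters (label -1 is noise)')
    plt.show()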
Now, basically if we want to tweak something, make some changes or edit something in the clustering, we can do that. How can we do that?
By performing, as I told you earlier, hyperparameter tuning; so here you can see we have eps, and here I have increased the samples, so we increased min_samples to 50; and we can increase eps also, for instance from 0.03 to 0.04, so I have increased this as well.
So, I have changed both of these.
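With the tuned values just mentioned, the cell would become roughly:

    db_tuned = DBSCAN(eps=0.04, min_samples=50).fit(X_principal)  # eps raised from 0.03, min_samples to 50
    labels_tuned = db_tuned.labels_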
Now, we will execute this and again visualise it,
So, we will see that there will be some changes visible over here.
In both the clusters, the one above and the one below, you will be able to see some changes; you need to look carefully.
So, in this way you can tweak.
So, in this way, wherever in your clustering you get a better, cleaner result, you can stop the tuning there; this is how hyperparameter tuning is done.
Now, we just saw how through DBSCAN we can implement spatial clustering.
So, friends, we will stop this session here, and its further parts we will cover in the next session.
Thank you very much.
If you have any questions or comments related to this course,
then you can click on the discussion button below this video and post them there.
So, in this way you can discuss this course with many other learners like you.
Share a personalized message with your friends.