
Normalizing the Data - Part 2


Hello, I am (name) from LearnVern.

In this session, we continue from our previous Machine Learning sessions.

Now, we will see in practice how to normalise data.

We have already seen that for Machine Learning algorithms, data is considered not normalised when its values differ hugely in scale: for instance, one value is 100, another is 1000 and another is 10000.

Such data has to be normalised.

So, in the first step, I import the pandas library, because pandas lets us create a data frame.

In the second step, I have created a data frame and filled in all the records, for example 25000, 200 and 30 in the first row, and named the columns A, B and C.
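A minimal pandas sketch of these first two steps follows. It is an illustrative reconstruction, not the exact notebook from the video: column A uses the four values read out in this session, while the B and C values other than the first row (200, 30) and the column maxima mentioned later (300 and 30.5) are hypothetical placeholders.

```python
import pandas as pd

# Step 1: pandas provides the DataFrame type.
# Step 2: build a small DataFrame with columns A, B and C.
# Column A uses the values from the session; most B and C values are
# hypothetical placeholders chosen only for illustration.
df = pd.DataFrame(
    [
        [25000, 200, 30.0],
        [18000, 120, 25.5],
        [9000, 300, 30.5],
        [40000, 180, 20.0],
    ],
    columns=["A", "B", "C"],
)

print(df)
```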

Now, when we display df, you can see that column A contains 25000, 18000, 9000 and 40000, so there is a huge difference between these values.

We have to reduce this difference by rescaling the values.

This is known as normalisation, or normalising the dataset.
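One way to quantify that difference is to compare each column's minimum, maximum and range. This sketch reuses the illustrative frame (column A from the session; B and C partly hypothetical), so only the column A figures reflect the video's data.

```python
import pandas as pd

# Illustrative frame: column A from the session, B and C partly hypothetical.
df = pd.DataFrame(
    [[25000, 200, 30.0], [18000, 120, 25.5], [9000, 300, 30.5], [40000, 180, 20.0]],
    columns=["A", "B", "C"],
)

# Per-column minimum, maximum and range (max - min).
summary = pd.DataFrame({"min": df.min(), "max": df.max()})
summary["range"] = summary["max"] - summary["min"]
print(summary)
```

Column A spans tens of thousands, while B and C span only hundreds or tens, which is why A dominates any plot of the raw values.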

Now, let's visualise the data to understand this even better.

Here you can see that B and C are barely represented because their values are very small, whereas A dominates the plot.

So, we need to normalise the dataset.

For this, we will start with our first technique, Maximum Absolute Scaling, which we have already seen in our presentation.

With this technique, the values are scaled to the range -1 to 1.

"Max" and "abs" mean that we take the maximum of the absolute values: -10 becomes 10, and 10 stays 10.

So, each column is divided by its maximum absolute value, using the formula we saw earlier.
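Written out, maximum absolute scaling divides each value x in a column by the largest absolute value in that column (the notation x_i for the column's values is ours). Because every |x| is at most that maximum, every scaled value lies between -1 and 1. The worked numbers use column A from this session, whose maximum is 40000.

```latex
x' = \frac{x}{\max_i |x_i|}, \qquad |{-10}| = 10, \quad |10| = 10

\text{Column A: } \max_i |x_i| = 40000, \qquad
\frac{25000}{40000} = 0.625, \qquad
\frac{40000}{40000} = 1
```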

So, here you can see that I have created a copy of the data frame, df_copy, which has the columns A, B and C.

2:17

Now, for col in df_copy.columns, that is, for each of our three columns A, B and C, I run df_copy[col] = df_copy[col] / df_copy[col].abs().max().

This divides each value by the column's maximum absolute value. In column A the maximum is 40000, so every value in A is divided by 40000.
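The loop described above can be sketched in pandas as follows. It is an illustrative reconstruction rather than the exact notebook from the video: column A uses the session's values, and the B and C values are partly hypothetical.

```python
import pandas as pd

# Illustrative frame: column A from the session, B and C partly hypothetical.
df = pd.DataFrame(
    [[25000, 200, 30.0], [18000, 120, 25.5], [9000, 300, 30.5], [40000, 180, 20.0]],
    columns=["A", "B", "C"],
)

# Work on a copy so the original df stays untouched.
df_copy = df.copy()

# Maximum absolute scaling: divide every column by its largest absolute value.
for col in df_copy.columns:
    df_copy[col] = df_copy[col] / df_copy[col].abs().max()

print(df_copy)
```

Working on df.copy() keeps the raw data available for the other two techniques.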

So, let's execute this.

After executing it, I will show you the rescaled data frame.

Here you can see that the values in column A are 0.625, 0.450, 0.225 and 1.

Since 40000 was the maximum value, 40000/40000 gives us 1.

So, wherever we see 1, that was the maximum value of that column.

In column C, for example, 30.5 was the maximum.

So, the data has been rescaled completely.

That was our first normalisation technique: we took absolute values and divided each column by its maximum absolute value.

Here, you can also see that I have computed df.abs().max(): A had 40000, B had 300 and C had 30.5. These were the maximum absolute values.
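The same divisors can be computed directly with df.abs().max(). This sketch reuses the illustrative frame (column A from the session; B and C hypothetical apart from the maxima 300 and 30.5 stated above) and adds a hypothetical two-value Series to show why abs() matters when a column contains negative numbers.

```python
import pandas as pd

# Illustrative frame: column A from the session, B and C partly hypothetical.
df = pd.DataFrame(
    [[25000, 200, 30.0], [18000, 120, 25.5], [9000, 300, 30.5], [40000, 180, 20.0]],
    columns=["A", "B", "C"],
)

# Largest absolute value in each column: the divisors used by max-abs scaling.
abs_max = df.abs().max()
print(abs_max)

# Why abs() matters: with a negative entry, max() and abs().max() differ.
# This two-value Series is a hypothetical example.
s = pd.Series([-10, 5])
print(s.max(), s.abs().max())  # prints: 5 10
```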

Now, let's see this in a visualisation.

Compare the old graph with the new one: in the old graph, B and C were barely visible at all.

In the new graph, the difference in scale between the columns is negligible.

Now, we will look at our second technique, known as min-max scaling. We take the minimum and maximum of each column, which form a range; we subtract the minimum from each value and divide by the range, that is, maximum minus minimum.

This scales the values between 0 and 1.
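In symbols, with a worked value from column A (minimum 9000 and maximum 40000, both from this session):

```latex
x' = \frac{x - \min_i x_i}{\max_i x_i - \min_i x_i}

\text{Column A: } \frac{25000 - 9000}{40000 - 9000} = \frac{16000}{31000} \approx 0.516, \qquad
\frac{9000 - 9000}{31000} = 0, \qquad
\frac{40000 - 9000}{31000} = 1
```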

4:14

Here as well, I have created a copy: df_copy_minmax = df.copy().

To confirm, we can see that the copy contains the same data.

Only the formula has changed.

For col in df_copy_minmax.columns, that is, for every column, we run

df_copy_minmax[col] = (df_copy_minmax[col] - df_copy_minmax[col].min()) / (df_copy_minmax[col].max() - df_copy_minmax[col].min()).
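A runnable sketch of this min-max loop, again on the illustrative frame (column A from the session; B and C partly hypothetical). It stores each column's minimum and maximum before assigning, which keeps the expression readable.

```python
import pandas as pd

# Illustrative frame: column A from the session, B and C partly hypothetical.
df = pd.DataFrame(
    [[25000, 200, 30.0], [18000, 120, 25.5], [9000, 300, 30.5], [40000, 180, 20.0]],
    columns=["A", "B", "C"],
)

# Work on a separate copy for min-max scaling.
df_copy_minmax = df.copy()

# Min-max scaling: (x - min) / (max - min) maps each column onto [0, 1].
for col in df_copy_minmax.columns:
    col_min = df_copy_minmax[col].min()
    col_max = df_copy_minmax[col].max()
    df_copy_minmax[col] = (df_copy_minmax[col] - col_min) / (col_max - col_min)

print(df_copy_minmax)
```

The parentheses matter: without them, Python's operator precedence would divide only the minimum and give wrong values. If a column is constant (maximum equals minimum), the formula divides by zero, so such a column needs separate handling.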

So, here we have applied the second formula, which also normalises our data.

Let's execute this.

Now, we will display the data and see how it has been transformed.

Here you can see that the values in column A are about 0.516, 0.290, 0.000 and 1.000.

So, all the values now lie between 0 and 1.

Let's also plot this; here too you can see the data after normalisation.

Now, we will look at our last method, the z-score.

Don't be put off by the term z-score; it is a very simple formula.

Here, again, I have made a copy and then applied the formula.

From each value, subtract the column's mean, then divide by the standard deviation. For instance, with the values 8, 9 and 10, the mean is 9, so for 10 you compute 10 - 9 and divide the result by the standard deviation.

That is all there is to the formula.
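In symbols, with μ the column mean and σ its standard deviation, worked through for the 8, 9, 10 example. The worked σ uses the n - 1 denominator, which is what pandas' std() uses by default:

```latex
z = \frac{x - \mu}{\sigma}

\mu = \frac{8 + 9 + 10}{3} = 9, \qquad
\sigma = \sqrt{\frac{(8-9)^2 + (9-9)^2 + (10-9)^2}{3 - 1}} = 1, \qquad
z(10) = \frac{10 - 9}{1} = 1
```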

So, here you can see df_copy_zscore[col] = (df_copy_zscore[col] - df_copy_zscore[col].mean()) / df_copy_zscore[col].std().
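A sketch of the z-score loop on the illustrative frame (column A from the session; B and C partly hypothetical). After the transform, every column has mean 0 and standard deviation 1, up to floating-point error.

```python
import pandas as pd

# Illustrative frame: column A from the session, B and C partly hypothetical.
df = pd.DataFrame(
    [[25000, 200, 30.0], [18000, 120, 25.5], [9000, 300, 30.5], [40000, 180, 20.0]],
    columns=["A", "B", "C"],
)

df_copy_zscore = df.copy()

# Z-score standardisation: subtract the column mean, then divide by the
# column's standard deviation (pandas' std() uses ddof=1 by default).
for col in df_copy_zscore.columns:
    col_mean = df_copy_zscore[col].mean()
    col_std = df_copy_zscore[col].std()
    df_copy_zscore[col] = (df_copy_zscore[col] - col_mean) / col_std

print(df_copy_zscore)
```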

So, here we have basically applied the method.

After this, you will see that we also get some negative values. The z-scores are centred on 0 with a standard deviation of 1, so for this data they stay within a limited range.

Unlike the first two methods, however, that range is not fixed in advance.
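Two details of z-scores are easy to check in code: the choice of standard deviation (sample, dividing by n - 1, versus population, dividing by n) changes the result, and the scores have no fixed bounds. The 8, 9, 10 values come from the explanation above; the outlier Series is hypothetical data.

```python
import pandas as pd

# The 8, 9, 10 example from the explanation above.
s = pd.Series([8, 9, 10])
z_sample = (s - s.mean()) / s.std()            # ddof=1 (pandas default)
z_population = (s - s.mean()) / s.std(ddof=0)  # divides by n instead of n - 1
print(z_sample.tolist())      # [-1.0, 0.0, 1.0]
print(z_population.tolist())  # roughly [-1.2247, 0.0, 1.2247]

# Unlike min-max or max-abs scaling, z-scores are not confined to a fixed
# interval: a single large value (hypothetical data) lands above 2.
outlier = pd.Series([1, 1, 1, 1, 1, 1, 1, 1, 1, 100])
z_outlier = (outlier - outlier.mean()) / outlier.std()
print(round(z_outlier.max(), 3))
```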

So, let's also visualise this.

Here too you can see that the values lie within a limited range.

6:21

So, these were three ways to normalise data.

This is very important: it helps our computations work effectively, and it makes sure that no single column's values dominate the others.
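The three loops can also be written without an explicit loop, because pandas aligns a per-column Series (such as df.max()) with the matching columns when you do arithmetic on a DataFrame. This is an alternative sketch, not the code used in the video, on the same partly hypothetical frame.

```python
import pandas as pd

# Illustrative frame: column A from the session, B and C partly hypothetical.
df = pd.DataFrame(
    [[25000, 200, 30.0], [18000, 120, 25.5], [9000, 300, 30.5], [40000, 180, 20.0]],
    columns=["A", "B", "C"],
)

# Each per-column statistic is a Series indexed by column name, so pandas
# applies it to the matching column automatically.
max_abs = df / df.abs().max()
min_max = (df - df.min()) / (df.max() - df.min())
z_score = (df - df.mean()) / df.std()

print(max_abs, min_max, z_score, sep="\n\n")
```

For larger pipelines, scikit-learn provides MaxAbsScaler, MinMaxScaler and StandardScaler for the same three transforms. StandardScaler divides by the population standard deviation (ddof=0), so its output differs slightly from pandas' default std().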

So, keep watching our other topics and keep learning.

Thank you.

If you have any questions or comments related to this course, click the discussion button below this video and post them there.

This way, you can discuss this course with many other learners like you.



