Hello,
I am Mohit from LearnVern.
In our previous session, we studied splitting the data into training and test sets: how to separate training data and testing data from a single dataset before feeding them to a machine learning algorithm.
So today, we are going to look at normalising datasets, where the values in the data sit on different scales and often need to be normalised.
We already had a short introduction to this while learning about data wrangling.
So, let's see and understand what happens in data normalisation.
Suppose we have values such as 1, 2 and 100 in a column.
Here we can see that between 1 and 2 the difference in magnitude is small,
but when it comes to 100, the magnitude is much larger.
So, what happens then?
When this kind of data goes into an algorithm, the higher values dominate the smaller values.
Right?
That means the effect of the smaller values decreases and the effect of the higher values increases.
This shouldn't happen at the time of processing.
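To make this domination concrete, here is a minimal sketch. The two records and their columns (age and income) are hypothetical and are not taken from the session; they only illustrate how a column with large magnitudes swamps a column with small ones when a distance is computed.

```python
import math

# Hypothetical records: (age in years, income in rupees). Illustrative data only.
a = (25, 50_000)
b = (60, 50_500)

def euclidean(p, q):
    """Straight-line distance between two points with the same number of columns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

age_gap = abs(a[0] - b[0])      # difference in the small-magnitude column
income_gap = abs(a[1] - b[1])   # difference in the large-magnitude column
distance = euclidean(a, b)

# The distance is almost the same as the income gap alone;
# the age gap contributes very little to it.
print(age_gap, income_gap, round(distance, 2))
```

Even though a 35-year age gap is large in real terms, the distance is driven almost entirely by the income column.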
For these reasons, we transform the values and bring them onto one common scale.
To understand this, let's take an example. Here you can see that I have brought the values onto a scale of 0 to 1.
1 becomes 0.0097,
2 becomes 0.0194,
and 100 becomes 0.9708.
You can see that the value for 1 has decreased, but it is still greater than zero: it lies above zero and below one.
And although 100 is a much larger value, it also comes out below 1.
So, in this way, all the numbers fall between 0 and 1.
So, this is known as rescaling.
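The session does not say how the figures for 1, 2 and 100 were produced. They match each value divided by the column total (1 + 2 + 100 = 103), so the sketch below assumes that method; treat it as an inference from the numbers, not as the stated technique.

```python
values = [1, 2, 100]

# Assumption: each value is divided by the column total (103 here).
total = sum(values)
rescaled = [v / total for v in values]

print([round(r, 4) for r in rescaled])  # [0.0097, 0.0194, 0.9709]
```

Rounded to four places, 100 comes out as 0.9709; the 0.9708 quoted for it looks like the same number truncated rather than rounded.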
Let's see which techniques you can use for rescaling.
On my next slide, I show you three techniques.
The first technique is Maximum Absolute Scaling,
the second is Normalisation,
and the third is Standardisation.
So, we will begin with the first technique,
Maximum Absolute Scaling.
Someone hearing the term Maximum Absolute Scaling for the first time might find such a big name intimidating.
But it is really simple.
To understand this, suppose you have some values.
For instance, we have the value 10, along with other values like 9 and 8 in the same column.
In Maximum Absolute Scaling, every value is divided by the maximum value.
Here, the maximum value is 10,
so every value is going to be divided by 10.
And 'absolute' means that when we pick the maximum, we do not look at whether a value is positive or negative; we ignore the sign, take the largest magnitude, and divide by that.
So, for any value x, you divide x by the maximum absolute value of the column, that is, x / max(|x|).
So, among 8, 9 and 10, 10 is the maximum.
Here we get 8/10 as the first value,
9/10 as the second and 10/10 as the third.
In this way, whether the calculated value is positive or negative, it is going to lie between minus 1 and 1.
So by rescaling, we get all the data in the same range.
So, this is the first technique, Maximum Absolute Scaling.
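As a minimal sketch of Maximum Absolute Scaling in plain Python: the function name max_abs_scale and the handling of an all-zero column are choices made for this illustration, not something given in the session.

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the column."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        # All-zero column: nothing to scale (a choice made for this sketch).
        return [0.0 for _ in values]
    return [v / max_abs for v in values]

print(max_abs_scale([8, 9, 10]))    # [0.8, 0.9, 1.0]
print(max_abs_scale([-20, 5, 10]))  # [-1.0, 0.25, 0.5]
```

The second call shows why the sign is ignored only when picking the maximum: -20 has the largest magnitude, so every value is divided by 20, and the results keep their signs and stay between -1 and 1.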
Now, we will move on to the second technique.
The second technique is Normalisation, which is also known as Min-Max Scaling.
From the name itself we understand that a minimum value and a maximum value are involved.
In the formula, you can see how we get the scaled value: subtract the minimum from the value, then divide by maximum minus minimum.
For instance, take the same example of 8, 9 and 10. Here the minimum value is 8.
We write 8 minus 8, divided by maximum minus minimum; the maximum is 10 and the minimum is 8.
10 minus 8 is 2, so the first value becomes 0 by 2, which is 0.
Second, 9 minus 8 is 1, and maximum minus minimum, the range, stays the same, that is 2.
So the value becomes 1 by 2, which is 0.5.
Accordingly, the next is 10 minus 8, that is 2 by 2, which is 1.
So, in this way, with Min-Max Scaling we get values between 0 and 1: the minimum maps to 0, the maximum maps to 1, and every other value falls in between.
And however large or small the original differences were, after rescaling, if we calculate the Euclidean distance or any ordinary distance, the difference within a column can be at most 1, so no single column can dominate. This was the second technique, Min-Max Scaling.
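The Min-Max calculation for 8, 9 and 10 can be sketched in plain Python as follows. The function name min_max_scale and the handling of a column where every value is the same are assumptions made for this illustration.

```python
def min_max_scale(values):
    """Rescale values to the 0-to-1 range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: max - min is zero, so return zeros (a choice made for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

print(min_max_scale([8, 9, 10]))  # [0.0, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1, which is why the result for 10 is 2/2 = 1.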
Let's see the third technique, Standardisation, which is also known as the z-score.
What does this do?
First, you find the sum of all the values and divide by the total number of values to get the average.
For instance,
if the values are 8, 9 and 10, we take their sum, 27, and divide by 3, which gives an average of 9.
After this, we take 8 minus the average and divide by the standard deviation; for 9 it is 9 minus the average divided by the standard deviation, and so on.
So, this is how the third technique works.
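Here is a minimal sketch of the z-score calculation using Python's standard statistics module. The session does not say whether the population or the sample standard deviation is meant; this sketch assumes the population version (statistics.pstdev), and the function name standardise is made up for the illustration.

```python
import statistics

def standardise(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = statistics.pstdev(values)
    if std == 0:
        # Constant column: no spread to scale by (a choice made for this sketch).
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

z_scores = standardise([8, 9, 10])
print([round(z, 4) for z in z_scores])  # [-1.2247, 0.0, 1.2247]
```

With the sample standard deviation (statistics.stdev) the deviation for 8, 9 and 10 is exactly 1, so the z-scores would be -1, 0 and 1 instead. Either way, the results are centred on zero and are not confined to the 0-to-1 range.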
The main purpose of all these techniques is to bring the values onto a comparable scale, so that large differences in magnitude between them are reduced and limited.
We will see all these techniques in practice in our next session.
We will stop today's session here.
We will cover the remaining parts in our next session.
So keep learning and remain motivated.
Thank you.