Hello,
I am Mohit from LearnVern.
In our previous session, we looked at encoding categorical data,
that is, how we can convert categories into numbers.
For example, we can encode cat as 1 and dog as 0.
Likewise, we learned about different encoding techniques.
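As a quick reminder, the cat and dog example can be sketched in a few lines of Python. The list of animals and the name label_map are made up here just for illustration.

```python
# Hypothetical list of category values (not from any real dataset).
animals = ["cat", "dog", "dog", "cat"]

# Mapping from the example: cat becomes 1, dog becomes 0.
label_map = {"cat": 1, "dog": 0}

# Replace every category with its number.
encoded = [label_map[animal] for animal in animals]
print(encoded)
```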
In this session, we are going to see how we can split the data into training and testing sets.
So, let us see what the actual requirements are.
Now, when we train a machine learning algorithm, we need a dataset to train it on.
Once we have created the machine learning model by training the algorithm, we also have to test it, right?
So, to test it, we need some data kept in reserve that the model has not seen, and we use that data to test the model.
Often we have just one dataset, so we divide that same dataset into a training set and a testing set.
So, in this way, we create two datasets from one.
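One possible way to create two datasets from one can be sketched like this. The function name manual_split, the toy dataset of 20 rows, and the 25 percent test share are all assumptions chosen only for illustration.

```python
import random

def manual_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    shuffled = list(rows)                  # copy, so the original order is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))                     # hypothetical dataset of 20 rows
train_set, test_set = manual_split(data)
print(len(train_set), len(test_set))
```

Shuffling before cutting is a design choice here: if the rows happen to be stored in some order, a plain cut could put one kind of row only in the test set.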
So, training data works as experience for a machine learning algorithm.
For instance, we human beings absorb a lot of experience every day by learning and seeing new things.
When a new situation comes up in front of us, we use those old experiences to deal with it.
In the same way, an algorithm gains experience from old data by learning from it.
After the learning is done, the next important step is testing the model.
At this point, the inputs are new to the algorithm, and it has to give predictions for them as its output.
Accordingly, the algorithm uses what it learned from the old data to make predictions on this new test data.
Along with this, we also check the accuracy of that particular algorithm.
We find out how much the algorithm learned from the training dataset and how well it performs on the given test data.
We express this as a percentage.
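To make the percentage concrete, here is a minimal sketch of one common way to compute accuracy: the share of test predictions that match the actual answers. The function name accuracy_percent and the two small lists are hypothetical, used only for illustration.

```python
def accuracy_percent(actual, predicted):
    """Return the percentage of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)

# Hypothetical test labels and model predictions: 4 of the 5 match.
actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 0]
print(accuracy_percent(actual, predicted))
```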
You will see that the training dataset is usually the larger part and the testing dataset is the smaller part.
The training dataset is typically kept at around 70 to 80 percent of the data.
And the remaining 20 to 30 percent of the data is kept for testing.
In this way, the data is split.
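To see what these percentages mean in rows, here is a small sketch. The dataset size of 1000 rows is an assumption picked only to keep the arithmetic easy to follow.

```python
n_rows = 1000  # hypothetical dataset size

sizes = {}
for train_pct in (70, 75, 80):
    n_train = n_rows * train_pct // 100  # rows used for training
    n_test = n_rows - n_train            # everything else is used for testing
    sizes[train_pct] = (n_train, n_test)
    print(f"{train_pct}/{100 - train_pct} split: {n_train} training rows, {n_test} testing rows")
```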
So, let's see it practically.
To split the data, we can use a manual method,
or we can use scikit-learn (sklearn) for it,
or sometimes the data itself comes already split, in which case we don't need to do anything.
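For the sklearn option, a minimal sketch could look like this, assuming scikit-learn is installed. The function train_test_split comes from sklearn.model_selection. The toy data X and y, the 25 percent test size, and the random_state value are assumptions chosen only for illustration.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 rows with one feature each, and one label per row.
X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]

# Keep 25 percent of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(len(X_train), len(X_test))
```

The function shuffles the rows by default before splitting, and it keeps each row in X paired with its label in y.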
Friends, let's conclude here for today.
We will see the remaining parts in the next session.
Till then, keep learning and stay motivated.
If you have any queries or comments, click the discussion button below the video and post them there. This way, you will be able to connect with fellow learners and discuss the course. Also, our team will try to resolve your query.
Thank you.