Splitting Dataset - Practical

Hello,

I am Kushal from LearnVern.

Welcome, all of you, to our Machine Learning course.

This session is a continuation of the previous tutorial.

So, let's continue.

Now we will look at splitting the data into training and test sets, practically.

Here I have written the steps.

First, I will upload the dataset.

Then we will see how to do the split manually.

And then we will see how to use the train_test_split function from sklearn.model_selection for splitting.

So, first I will click on this folder, then click on the upload icon, and upload the iris dataset from here.

Once the data is uploaded, I will run the rest of my program on it.

So, I will copy its path.

And I will close this, and go to a new cell.

First, I need a library, so I will write import pandas as pd.

pandas is a favourite library for handling data, especially when the data is structured.

So, now we have imported the library.

After this, we will write data = pd.read_csv.

The read_csv function will read this iris data, and in the brackets we give the path to the file.

We know that this data has no header row,

So I have to mention header=None here.

With header=None, pandas does not treat the first row as column names.

Along with this, I should provide the column names, so I will pass them as a list through the names parameter: sepal length for the first column, then sepal width, third is petal length, fourth is petal width, and the last remaining column is labels.

So, in this way I am loading the data.
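
The loading step described above could be sketched roughly as follows. This is only an illustration: the in-memory sample rows stand in for the uploaded file, and the exact column-name strings are assumptions based on the names spoken in the session. In the notebook, the copied file path would be passed to read_csv instead of sample_csv.

```python
import io

import pandas as pd

# Stand-in for the uploaded file: a few headerless rows in the iris layout.
# (Hypothetical sample; the session passes the copied file path instead.)
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# Assumed spellings of the column names mentioned in the session.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "labels"]

# header=None: the first row is data, not column names.
# names=...: the column names to use instead.
data = pd.read_csv(sample_csv, header=None, names=columns)

print(data.shape)
print(data.head())
```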

Here I will write data, execute it, and check that we have 152 records.

Here I can see some null values.

Without doing much treatment, I will simply drop them for now using dropna.

So I will write data.dropna with inplace=True, and execute this.

Let's see how many of the 152 records remain.

138 records remain without null values, and we can work further on these.
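
As a small sketch of what dropping null rows does, the example below uses a tiny hand-made frame (hypothetical values, not the course file) and counts the rows before and after. Note that the surviving rows keep their original index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature dataset with two incomplete rows.
data = pd.DataFrame(
    {
        "sepal_length": [5.1, np.nan, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, np.nan, 2.7],
        "labels": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
    }
)

print(data.isnull().sum())  # number of nulls in each column
rows_before = len(data)

# Drop every row that has at least one null value, modifying data itself.
data.dropna(inplace=True)

rows_after = len(data)
print(rows_before, rows_after)
print(data.index.tolist())  # index labels of the rows that remain
```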


First we will see the manual method.

For this, you should know that this data has an input and an output.

The input is called x,

And the output is called y.

So, I will first separate this dataset into x and y.

I write x = data.iloc. iloc selects by position: here I say I want all the rows, and for columns I need positions 0 to 3. Because the end of a slice is excluded, to go up to 3 I have to write 4, so the slice is 0:4.

So, x is extracted here, and you can see it has sepal length, sepal width, petal length and petal width.

Now I want to extract y as well.

For this I write y = data.iloc, and here I say I want all the rows, but only the column at position 4, which is the fifth and last column, and no other columns.

You will see that y also has 138 records. Now let's move forward.

So, in this way data is divided between x and y.
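
A minimal sketch of this x and y extraction is shown below. It assumes a frame laid out like the one in the session (four feature columns followed by a labels column); the values and column-name strings are made up for illustration.

```python
import pandas as pd

# Hypothetical rows in the same five-column layout as the session's data.
data = pd.DataFrame(
    {
        "sepal_length": [5.1, 7.0, 6.3],
        "sepal_width": [3.5, 3.2, 3.3],
        "petal_length": [1.4, 4.7, 6.0],
        "petal_width": [0.2, 1.4, 2.5],
        "labels": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    }
)

# All rows, column positions 0..3 (the end of the slice, 4, is excluded).
x = data.iloc[:, 0:4]

# All rows, only the column at position 4 (the labels).
y = data.iloc[:, 4]

print(x.columns.tolist())
print(y.name, len(y))
```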

Now, moving ahead, we have to divide the data into training and testing sets.

We will begin with x.

So, x_train is equal to x.iloc, which we just used, and here we want the first hundred records, so we write :100.

And in x_test, we want the rest of the records from position 100 onward.

So here we write x.iloc with 100:. In the first slice, position 100 is not included, and in the second slice, position 100 is included.

So, we have executed this also.

Now when we check the data, we can see that x_train has 100 records, and similarly when we check x_test, we get all the remaining records.


As we have done this for x, we are also required to do it for y.

So what are we going to do for y?

For y_train, we are giving the first 100 of the 138 records for training, which is roughly 72% of the data.

So here we directly write a colon and 100.

This gives the first 100 records to y_train.

And to y_test we give y of 100 colon, that is, everything remaining after the first 100.

Here you can see the output in this way.

Looking at y_test, its index labels start from 114 and go up to 151, because dropping rows with dropna keeps the original index labels.

So, this is how we can manually split the data into training and test sets.
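
The manual split above can be sketched as follows. The data here is synthetic (138 placeholder rows with made-up values), chosen only so the record counts match the ones mentioned in the session; positional slicing with iloc is used for both x and y.

```python
import pandas as pd

# Synthetic stand-in with 138 rows; the values are placeholders.
n_rows = 138
x = pd.DataFrame({"sepal_length": range(n_rows), "sepal_width": range(n_rows)})
y = pd.Series(["some-label"] * n_rows, name="labels")

# First 100 rows for training: positions 0..99 (position 100 is excluded).
x_train = x.iloc[:100]
y_train = y.iloc[:100]

# Everything from position 100 onward for testing (position 100 is included).
x_test = x.iloc[100:]
y_test = y.iloc[100:]

print(len(x_train), len(x_test))
print(len(y_train), len(y_test))
```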

Now, if you want to automate this, we need to use a library function.

But before that, there is a problem here which I would like to discuss with you all.

When we divided the data, it was done sequentially, so all the records in the test set are virginica.

This type of sequential division is not good for algorithms,

Because the data received for training becomes completely different from the data received for testing.

So this is not good.

An algorithm should always get the data divided in a balanced way.
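
To make the problem concrete, here is a small sketch using synthetic labels sorted by class. The per-class counts are assumptions for illustration, not figures from the course file. With a sequential split, the test set ends up holding a single class that the training set never sees.

```python
import pandas as pd

# Synthetic labels ordered by class, mimicking a file sorted by species.
# (Class counts are hypothetical.)
y = pd.Series(
    ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 38,
    name="labels",
)

# Sequential split: first 100 for training, the rest for testing.
y_train = y.iloc[:100]
y_test = y.iloc[100:]

train_counts = y_train.value_counts().to_dict()
test_counts = y_test.value_counts().to_dict()

print(train_counts)  # classes present in the training set
print(test_counts)   # classes present in the test set
```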

So, what do we do in such cases?

The second method is to use sklearn.

This library will help us.

So, let's see this as well.

In sklearn we have model_selection, and inside that we have train_test_split.

With the help of this function we are going to do all the work.

Here I will write all 4 variables together,

Meaning x_train, x_test, y_train, y_test is equal to train_test_split.

Inside the brackets we pass x and y, along with the test size that we want, so let's take test_size as 0.3, that is, 30 percent.

You can also set the random seed that you want,

So let me put random_state equal to 1.

Now, enter.

Let it execute.

Now this has been executed.

Now let's look at x_train. How many records does it have?

Here you can see 108, meaning about 70 percent of the records have come here.

Then you can check x_test, which should have around 30 percent of the records.

Similarly, we will check y_train and see its output.

And in the same way, with y_test, we will get the output for the test set.
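
The train_test_split call described above might look like the sketch below. The data is synthetic (placeholder values and assumed class counts), so the sizes it prints apply to this example only and need not match what the notebook in the session showed. By default the function shuffles the rows before splitting, and random_state makes that shuffle repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 138 rows, labels sorted by class (hypothetical counts).
n_rows = 138
x = pd.DataFrame({"sepal_length": range(n_rows), "petal_length": range(n_rows)})
y = pd.Series(
    ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 38,
    name="labels",
)

# 30 percent of the rows go to the test set; random_state fixes the shuffle.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=1
)

print(len(x_train), len(x_test))
print(y_test.value_counts())  # class mix in the shuffled test set
```

Because x and y are split together, each row of x_train still lines up with the matching label in y_train.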

We will use this training and testing data ahead with the machine learning algorithms.

So, friends, let's conclude here for today.

Today's session ends here,

And we will cover its further parts in the next session.

So keep learning and remain motivated.

Thank you.

If you have any questions or comments related to this course, you can click on the discussion button below this video and post them there.

In this way you can discuss this course with other learners like you.



