Splitting Dataset

Hello,

I am Mohit from LearnVern.

In our previous session, we looked at encoding categorical data,

that is, how categories can be represented as numbers.

For example, we can encode cat as 1 and dog as 0.

Likewise, we learned about several different encoding techniques.
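
As a quick reminder of that idea, here is a minimal sketch of the cat/dog example in plain Python. The `labels` list and the `mapping` dictionary are made up for illustration; they are not taken from the course material.

```python
# Hypothetical sample of categorical labels (illustration only).
labels = ["cat", "dog", "dog", "cat"]

# The mapping from the example: cat becomes 1, dog becomes 0.
mapping = {"cat": 1, "dog": 0}

# Replace every label with its numeric code.
encoded = [mapping[label] for label in labels]

print(encoded)  # [1, 0, 0, 1]
```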

In this session, we are going to see how to split the data into training and testing sets.

So, let us see why this is actually required.

When we train a machine learning algorithm, we need a dataset to train it on.

Once we have built the machine learning model by training the algorithm, we also have to test it, right?

To test it, we need some data kept aside, data the model did not see during training, so that we can use it to test the model.

Very often we have just one dataset, so we divide that same dataset into a training set and a testing set.

In this way, we create two datasets from one.

The training data works as experience for the machine learning algorithm.

For instance, we human beings absorb a lot of experiences every day, by learning new things and seeing something new.

When a new situation comes up, we use these old experiences to deal with it.

In the same way, an algorithm acquires experience from the old data by learning from it.

After the learning is done, the next important thing is testing the model.

At this point the inputs are new to the algorithm, and it has to give predictions for them as its output.

The algorithm then uses what it learned from the old data to make predictions on this new test data.

Along with this, we also check the accuracy of the algorithm.

That is, we find out how much the algorithm learned from the training set and how well it performs on the given test data.

We express this as a percentage.
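
To make the percentage idea concrete, here is a small sketch that compares predictions against the true test labels. The function name `accuracy_percent` and the two sample lists are assumptions made for illustration, not something shown in the session.

```python
def accuracy_percent(y_true, y_pred):
    """Return the share of correct predictions as a percentage."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    # Count the positions where the prediction matches the true label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

# Hypothetical test labels and model predictions (illustration only).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

print(accuracy_percent(y_true, y_pred))  # 80.0
```

Here four of the five predictions match the true labels, so the accuracy on this tiny test set is 80 percent.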

You will see that the training set is usually the larger part and the testing set the smaller part.

Typically, around 70 to 80 percent of the data is kept for training.

The remaining 20 to 30 percent is kept for testing.

In this way, the data is split.
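
One way to do such a split by hand could look like the sketch below. It assumes an 80/20 ratio and shuffles the rows first so that the test set is not just the first rows of the file. The helper `manual_split`, its parameters, and the sample `rows` are hypothetical names chosen for this example.

```python
import random

def manual_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows, then cut it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # copy, so the input is left untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed gives a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 100 rows (illustration only).
rows = list(range(100))
train, test = manual_split(rows, test_fraction=0.2)

print(len(train), len(test))  # 80 20
```

Every row ends up in exactly one of the two sets, so the model is never tested on a row it was trained on.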

So, let's see how this is done in practice.

To split the data, we can use a manual method,

or we can use scikit-learn (sklearn) for it,

or sometimes the data already comes split, and we don't need to do anything.
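
For the scikit-learn option, a minimal sketch might use `train_test_split` from `sklearn.model_selection`. The toy features `X`, the labels `y`, the 20 percent test size, and `random_state=42` are assumptions for illustration; the practical session may use different data and settings.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with one feature each, plus a label per sample.
X = [[i] for i in range(10)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# Keep 20 percent of the samples for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))  # 8 2
```

By default the function shuffles the samples before splitting and keeps each feature row paired with its label.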

Friends, let's conclude here for today.

We will see the remaining parts in the next session.

Till then, keep learning and stay motivated.

If you have any queries or comments, click the discussion button below the video and post them there. This way, you will be able to connect with fellow learners and discuss the course. Our team will also try to resolve your query.

Thank you
