Hello,
I am (name) from LearnVern.
In our previous Machine Learning tutorial, we saw the decision tree regressor.
Today, we will see how we can select or choose a regressor model.
In Machine Learning, when we apply a model to a dataset, this selection becomes really important.
So, let's see the methods or measures with which we can decide which model we should select.
First, we should know why this matters.
It is important because we want goodness of fit. Now, what is this goodness of fit? I will explain it to you in this paint window.
Assume that I have some data points. For these data points, we need to make predictions, so a regression line will be fitted through them; the regression line will fit something like this.
It is this regression line that will predict the new, future values.
This is our x axis and this is the y axis, so for whatever new inputs we get later, the corresponding point on this line will be the prediction, that is, the output or y value.
So, our objective is to fit this line in the most optimal, best way.
So, in model selection we have to select the model with the best goodness of fit, the one that gives us the best fitting line.
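As a small illustration, here is a minimal sketch of fitting a regression line in Python with scikit-learn; the data points and numbers are made up only to show the fit and the prediction for a new input.

import numpy as np
from sklearn.linear_model import LinearRegression

# made-up data points (x values and their y values)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

# fit the regression line through the points
model = LinearRegression()
model.fit(X, y)

# the fitted line predicts the y value for a new x input
print(model.predict(np.array([[6]])))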
So, let's see what techniques are available.
First is the probabilistic measure.
Next is the resampling method.
Apart from these, we will discuss two more techniques, one is AIC, and the other is BIC.
Now, first is the probabilistic method, where we deal with in-sample error and complexity. What does in-sample error mean? Suppose you have a complete dataset, something like this, and from it we take some parts as samples and display them in a diagram format like this; from it I have created 5 samples.
So, in-sample means that the sample is created from within the data itself.
So, the probabilistic measure works on the basis of this in-sample data: we run our algorithm and then pass the in-sample data itself as the testing data.
On this we see how much error or complexity we get.
So, this is the first approach that is the probabilistic approach.
Second is the resampling method.
Here we have out-of-sample error, which means the algorithm is tested on a new input that comes from outside the data presently available with us. So this approach is called "choose a model via estimated out-of-sample error", that is, error outside the sample.
So, these are the two techniques to begin with.
The next technique is AIC, the Akaike Information Criterion. This works on frequentist probability; we will see ahead what that means.
Next is BIC, which is the Bayesian Information Criterion.
So, this works upon Bayesian Probability.
So, these two give us a certain score.
Now, we will see the AIC method. Here, when scoring is done, the formula used is: AIC = (-2 / N) * LL + 2 * (k / N), where LL is the log likelihood, N is the number of examples, and k is the number of model parameters.
So, this is the formula to find AIC.
Now, if we look after this at what happens in the Bayesian Information Criterion, here the formula is:
BIC = -2 * LL + log(N) * k.
So, both these formulas basically give us a score.
This score should be low; the lower the score, the better the performance of that particular model.
After scoring we have to perform selection, and what do we want here? We want a lower score. We will now see practically how this is done.
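To make this scoring concrete, here is a minimal Python sketch of the two formulas above, assuming we already have the log likelihood LL, the number of examples N, and the number of parameters k; the numbers at the end are made up purely for illustration.

from math import log

def aic(n, log_likelihood, k):
    # AIC = -2/N * LL + 2 * k / N
    return -2.0 / n * log_likelihood + 2.0 * k / n

def bic(n, log_likelihood, k):
    # BIC = -2 * LL + log(N) * k
    return -2.0 * log_likelihood + log(n) * k

# hypothetical values just to show the calculation
n, ll, k = 100, -150.0, 3
print(aic(n, ll, k), bic(n, ll, k))  # the lower the score, the better the model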
Now, after this we will discuss the resampling method. In resampling, the first option is a random train test split (with a random state), meaning we take the entire data and randomly select from it to divide it into train and test data; a small sketch of this follows below.
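Here is a minimal sketch of such a random train test split using scikit-learn; the tiny dataset, the 80/20 split, and the random_state value are assumptions chosen just to show the call.

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 made-up examples with 2 features
y = np.arange(10)                  # made-up target values

# 80% train, 20% test, picked randomly; random_state fixes the random selection
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)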
Next we have cross validation, namely K fold, and the bootstrap.
Now, we will understand these as well.
Now, what happens in K fold? Here we split the data that we have into K folds, for instance 5 folds, and then we take each of them as a different sample.
As I was showing you in the paint window before: 1, 2, 3, 4, 5. So this K fold is 5 folds in this way.
And, "where each example appears in a test set only once".
So, here one set at a time will be sent for testing.
First we will send the first one, then the second, the third, the fourth, up to the fifth.
So, this is K Fold cross validation technique.
Now, after this what is the procedure?
First, we will shuffle the data, as we don't want it to be selected in the sequence in which it is arranged in the original dataset.
For example, if we have thousands of records and we just pick the first 50, we don't want that.
So, we will completely shuffle this.
You must have played cards at some time, so you know that we have to shuffle the cards; in the same way we shuffle the data.
Then we divide the data into K groups, as I divided them into 5 groups. Each group in turn is treated as the test data and sent for testing, while the rest of the data is used for training.
So, this is what happens in K Fold cross validation.
After that we apply the algorithm on it, take the evaluation score, and on the basis of this score we decide whether to keep or discard the model.
So, in this way K Fold cross validation works.
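As a rough sketch of these steps, here is how K-fold cross validation could be run in Python with scikit-learn, assuming a decision tree regressor as the candidate model and made-up data; the scores here mean nothing by themselves and only show the mechanics.

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

X = np.random.rand(100, 3)   # made-up features
y = np.random.rand(100)      # made-up targets

# shuffle the data, then split it into K = 5 groups
kf = KFold(n_splits=5, shuffle=True, random_state=1)
model = DecisionTreeRegressor(random_state=1)

# each group is used once as the test set; the rest is used for training
scores = cross_val_score(model, X, y, cv=kf)
print(scores, scores.mean())  # evaluation scores used to keep or discard the model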
Next is the bootstrapping technique, which is a resampling technique. As I showed with the samples here, assume that I have chosen the samples all separately: one from here, the second from here, the third from here, and so on.
But in resampling with replacement, what happens is this: suppose this is your entire dataset; from it you select one sample randomly, and then you take another sample, also selected randomly. Now there is a possibility that some records in these samples match, that is, they are repeated, and it is also possible that nothing is repeated.
There is no strict rule here that a record selected earlier will not be taken again; this is what resampling with replacement means.
So, with replacement, data from the earlier samples can also appear in the new ones; but if we sample without replacement, then each sample is selected separately and no records are repeated across them.
So, in the bootstrapping technique we resample, taking samples again and again with replacement, and then we perform testing.
So, this is bootstrapping technique.
Now, we will understand its process in steps.
First, choose the number of bootstrap samples; for instance, if you want to take 5, then select 5 accordingly.
Next, "for each bootstrap, draw a sample with replacement": take one sample, and then the second time the earlier records are also available for selection again, so we can choose from them too; in this way you select each sample.
Now, the next step is to calculate the statistics, so apply the algorithm and calculate the statistics.
Then calculate the mean of the calculated statistics.
And on the basis of this you will decide whether you want to choose the model or not.
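Here is a rough sketch of these bootstrap steps in Python, assuming scikit-learn's resample utility and taking the mean of the data as the statistic; the data and the choice of statistic are made up only for illustration.

import numpy as np
from sklearn.utils import resample

y = np.random.rand(100)   # made-up data

n_bootstraps = 5          # step 1: choose the number of bootstrap samples
stats = []
for i in range(n_bootstraps):
    # step 2: draw a sample with replacement (earlier records can be picked again)
    sample = resample(y, replace=True, n_samples=len(y), random_state=i)
    # step 3: calculate the statistic on this sample
    stats.append(sample.mean())

# step 4: take the mean of the calculated statistics and decide on the model
print(np.mean(stats))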
So, these were the techniques that we saw, with which model selection can be done.
Now, we will move to our next topic, which is evaluating regression model performance.
So, keep watching and remain motivated.
Thank you.
If you have any queries or comments, click the discussion button below the video and post them there. This way, you will be able to connect with fellow learners and discuss the course. Also, our team will try to solve your query.