Hello, I am (name) from LearnVern.
Welcome to the Machine Learning course.
This tutorial is a continuation of the previous session.
So, let's move ahead.
In today's practical, we are going through simple linear regression.
We have already understood that regression works with continuous variables,
that is, problems where the output is a continuous value.
For example, if I am running, the speed at which I am running is the output.
Speed is a continuous variable: I can run at 2 km per hour, 2.5 km per hour, or 2.6 km per hour.
So, this variable is not restricted to a fixed set of values, and because the output is continuous, this type of task is called regression.
So, let's go through an example to understand it; after that, we will take a dataset from the UCI Machine Learning Repository and perform regression on it.
So, I will show a very small example.
For that, I will first import numpy as np.
Then, from sklearn.linear_model, I will import the LinearRegression model.
So, I have imported this.
Now, for X I will create an np.array and pass it a list of lists: [1, 1], then [1, 2], then [2, 2], and finally [2, 3].
So, this is our X created.
Now, I will create y. Since X has 4 rows, y will also have 4 values.
For y, we will use np.dot,
which takes a dot product: inside the brackets we put X and an np.array built from the list [1, 2], and at the end we add 3.
So, we created y with just a dot product plus a constant.
You can see this is my y:
6, 8, 9, 11, which is computed from the input values we have given.
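A minimal sketch of the two arrays described above, assuming NumPy is installed. It rebuilds X and y as dictated and confirms that the dot product with [1, 2] plus 3 gives 6, 8, 9, 11.

```python
import numpy as np

# Four training samples, each with two features (x1, x2)
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])

# y = 1*x1 + 2*x2 + 3, built with a dot product plus a constant
y = np.dot(X, np.array([1, 2])) + 3

print(X.shape)      # (4, 2)
print(y.tolist())   # [6, 8, 9, 11]
```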
Now, we have already imported LinearRegression from sklearn.linear_model, so we will create an object of it.
So, here LR is equal to LinearRegression(); this is our object.
At first the brackets didn't come properly; once they are added, the object is formed.
After creating an object, we know the next steps.
We fit the data with LR.fit; this time our data is in X and y form, so we pass X and y and fit the model.
After that, we can get predictions with LR.predict. We have to pass it an np.array, with one more pair of brackets around a single sample,
so I am passing [3, 5].
Here we got the output 16.
So, whatever values you pass, the model will predict an output, because it has learned the relationship from the training data.
Remember, we computed the values of y from a formula here: y = 1 × x1 + 2 × x2 + 3.
The model now knows that relationship between input and output, and it uses it to give us the output: 1 × 3 + 2 × 5 + 3 = 16.
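A sketch of the fit-and-predict steps on the same toy data, assuming scikit-learn and NumPy are installed. Besides the prediction for [3, 5], it prints the fitted coef_ and intercept_ attributes, which should come out close to [1, 2] and 3, the numbers used to build y; this is one way to see what "the model has learned the relationship" means here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

LR = LinearRegression()  # note the brackets: this creates the object
LR.fit(X, y)

# predict expects a 2D input (one row per sample), hence the double brackets
prediction = LR.predict(np.array([[3, 5]]))
print(prediction)  # approximately [16.]

# The learned relationship: coefficients close to [1, 2], intercept close to 3
print(LR.coef_, LR.intercept_)
```

Because y was generated exactly from a linear formula with no noise, the fit recovers the formula up to floating-point rounding.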
So, this is how we implement linear regression.
Now, let's implement it on a real dataset.
So, we will take this dataset from here; this is the Wine Quality dataset.
If we read its description, the dataset characteristics are multivariate, and the attribute characteristics are real.
Under associated tasks, we can perform classification as well as regression.
It has 4898 instances and 12 attributes, it was donated in 2009, it comes under the business area, the page also shows its number of web hits, and there are no missing values.
Now, whenever you take a dataset, first understand it by reading its description.
For example, here the two datasets are related to the red and white variants of the Portuguese "Vinho Verde" wine, so this dataset is about wine.
And for more details, you can go here.
The datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced; for example, there are many more normal wines than excellent or poor ones.
So, it is not balanced.
Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant, so it could be interesting to test feature selection methods.
So, we can perform a lot of tasks on this dataset.
But for now, we will work only toward our objective.
Here, the attribute information is given; you can see the attributes numbered 1 to 12.
Fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, and so on; there are many parameters.
I don't have much domain knowledge about wine, so I will not go into depth.
But I know that these are the input features on the basis of which the quality, a score between 0 and 10, is given.
So, let's download this dataset.
Here I can see two files, but I will download only one: the red wine file.
Now, we will come back here and upload this dataset.
So, this is the Wine Quality dataset that we have just uploaded.
Now we will work on this dataset.
I didn't import the pandas library at the beginning, so I will import pandas as pd now.
Here, data is equal to pd.read_csv,
that is, read underscore csv, and inside the brackets I will give the file's path.
So, copy the file path and let's see.
Now, let's see what is in our data.
So, this is our data.
Looking at it, the delimiter is different from usual: it's a semicolon. So, inside the brackets we add one more argument after a comma, and the editor shows automatic suggestions. The parameter for the separator is sep, so we set sep equal to the semicolon.
So, now let's see this.
Now, it has identified the columns properly.
And the last column is quality, so this is the correct format.
Here we can see it has 1599 records.
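The effect of the sep argument can be seen without the real file. The sketch below is an illustration only: the string sample is hypothetical, with placeholder values (not real measurements) in the same semicolon-separated layout, and io.StringIO stands in for the copied file path.

```python
import io

import pandas as pd

# Hypothetical two-row sample in a semicolon-separated layout;
# values are placeholders, not real wine measurements.
sample = (
    "fixed acidity;alcohol;quality\n"
    "7.0;9.5;5\n"
    "8.0;10.0;6\n"
)

# The default separator is a comma, so each line lands in a single column
wrong = pd.read_csv(io.StringIO(sample))
print(wrong.shape)  # (2, 1)

# Telling pandas the separator is ';' splits the columns correctly
data = pd.read_csv(io.StringIO(sample), sep=";")
print(data.shape)           # (2, 3)
print(list(data.columns))   # ['fixed acidity', 'alcohol', 'quality']
```

With the real file, the same call would take the copied path in place of io.StringIO(sample), together with sep=";".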
So, now let's apply linear regression to it.
We will create one more model,
by the name LR1.
So, LR1 is equal to LinearRegression(), and that creates one more model.
Now, to select the inputs from this dataset, we use data.iloc: all rows, and all columns except the last one.
We have 12 columns, indexed 0 to 11, so we keep columns 0 to 10 and leave out column 11.
Let's check this once: you can see the columns go up to alcohol, and quality is removed.
So, we will pass this much of the data as the input X.
y is selected the same way, except that it takes only column 11.
So, this is our X.
And y is again data.iloc, with all the rows but only column 11.
By mistake I wrote 9 instead of 11 at first; now it is corrected.
So, this is our X and y.
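A sketch of the X/y split with iloc. Since the real CSV is not bundled here, it assumes a stand-in DataFrame of random placeholder numbers with 12 columns, the last one named quality; the column names feature_0 to feature_10 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the wine data: 12 columns, the last named "quality".
# The numbers are random placeholders, not real measurements.
rng = np.random.default_rng(0)
columns = [f"feature_{i}" for i in range(11)] + ["quality"]
data = pd.DataFrame(rng.random((5, 12)), columns=columns)

# All rows, columns 0 to 10 (the slice end 11 is excluded): the input features
X = data.iloc[:, :11]

# All rows, only column 11 (the last one): the target
y = data.iloc[:, 11]

print(X.shape, y.shape)  # (5, 11) (5,)
print(y.name)            # quality
```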
Now, we have to train LR1.
So, we call LR1.fit and pass X and y.
Now, for LR1.predict you can pass any input and check the result; for example, data.iloc with all the rows and then colon 2.
But this gives an error, because the model was trained on 11 input columns and this slice has only 2.
So, let me pass the entire X here instead, since we already separated the data into X and y.
I will undo that with Ctrl + Z,
put the complete X here,
and predict for it.
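A sketch of why the first predict call failed and why passing the full X works. It assumes synthetic data (random placeholders, not the wine measurements) with 11 feature columns named f0 to f10, which are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: 20 rows, 11 feature columns (placeholders, not wine data)
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.random((20, 11)), columns=[f"f{i}" for i in range(11)])
y = X.to_numpy() @ np.arange(1, 12) + rng.normal(scale=0.1, size=20)

LR1 = LinearRegression().fit(X, y)

# Predicting with only the first two columns fails: the model was fitted on 11 features
raised = False
try:
    LR1.predict(X.iloc[:, :2])
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Passing all 11 feature columns works: one prediction per row
predictions = LR1.predict(X)
print(predictions.shape)  # (20,)
```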
Now, we can look at the results by plotting them, or we can measure how well the model performs.
For that, I will use LR1.score and pass X and y to it. For a regression model, score returns R² (the coefficient of determination), not a classification accuracy.
So, this gives us a score.
A score of 0.36 is very poor.
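For a regression model, score reports R², computed as 1 minus the residual sum of squares divided by the total sum of squares. The sketch below, on synthetic placeholder data rather than the wine dataset, checks that LR1.score, sklearn.metrics.r2_score, and the formula computed by hand agree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic placeholder data: 50 rows, 3 features, a linear target plus noise
rng = np.random.default_rng(2)
X = rng.random((50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=50)

LR1 = LinearRegression().fit(X, y)

score = LR1.score(X, y)
pred = LR1.predict(X)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
manual = 1 - ss_res / ss_tot

print(score, r2_score(y, pred), manual)  # all three values agree
```

An R² of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean of y, which is why 0.36 counts as weak.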
So, this is how we perform regression.
Here, we got the predictions but did not focus on improving the score; we will focus on that and check it again.
Until then, keep practicing on this same dataset, and also download other datasets and try them.
So, friends, let's conclude here for today; we will cover the further parts in our next session.
So keep learning and remain motivated.
Thank you.
If you have any queries or comments, click the discussion button below the video and post there. This way, you will be able to connect with fellow learners and discuss the course. Also, our team will try to solve your query.