Hello, I am (name) from LearnVern.
This tutorial continues from the previous machine learning tutorial, so let us get started.
In today's machine learning tutorial we will look at the Apriori algorithm.
It is widely used for market basket analysis, which is why it is also known as the Market Basket Algorithm.
The Apriori algorithm helps sellers, retailers, storekeepers, franchises, and brand outlets with upselling.
It helps a lot with decisions such as how to arrange products and where to position them.
So, let's see how we can apply this algorithm.
Let's start by loading our data set.
Here, we will load the data set named market basket optimisation.
This data contains transactions: when we buy a set of items, that set of items is entered on one bill.
In the same way, when 100 people, 200 people, or thousands of people each buy a set of items, every bill becomes one transaction.
We have imported those transactions, with the items each person bought, and brought that data here.
Let us quickly import the libraries: first import numpy as np, and after this import pandas as pd.
Then let us import the third library, import matplotlib.pyplot as plt.
These are the initial libraries that we require.
Clear, guys?
(02/00)
Now we will read our data set, and I will copy its path from here.
We write dataset = pd.read_csv and put the path of the data inside the parentheses.
I know that this data does not have a header row,
so I will put a comma after the path and add header=None,
so that pandas does not take the first row as the header.
With that, we have loaded the data set.
Now, I will show you the data set once with dataset.head().
In the output, 0, 1, 2, 3 and so on are the column names, and each row holds one person's item list.
The first person, 1, 2, 3, 4, 5, 6..., has bought many items; his row is filled all the way to column 19.
The second person has bought just 3 items, the fourth just one, and the fifth two.
Wherever NaN is shown, that person has not bought an item.
So wherever there is a value, an item has been bought, and wherever there is NaN, it is a missing value: that person did not buy an item in that position,
which means that this person has bought only these three items. So these are item lists, OK.
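To see on a small scale what header=None and the NaN padding look like, here is a minimal sketch. The three-line CSV text is made up for illustration and only stands in for the real market basket file, and io.StringIO is used just so the example runs without a file on disk.

```python
import io

import pandas as pd

# Made-up sample standing in for the real file: one bill per line,
# no header row, and a different number of items on each bill.
csv_text = "burgers,meatballs,eggs\nchutney\nturkey,avocado\n"

# header=None keeps the first bill as data instead of using it as column names,
# so the columns are simply numbered 0, 1, 2, ...
dataset = pd.read_csv(io.StringIO(csv_text), header=None)

print(dataset.head())   # shorter bills are padded with NaN
print(dataset.shape)    # (number of bills, widest bill)
```

The first bill has to be the widest one here, which matches what we see in the tutorial's data, where the first row fills every column.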
I hope you are clear up to now.
On the basis of this data we will apply our algorithm.
Now, what is the logic here? It is a very simple logic.
You can revisit this logic in the conceptual section,
but I will now explain the mathematics in more detail.
See, here there are burgers, meatballs, and eggs.
I will check whether burgers and meatballs appear anywhere else in the data.
If I find them, I will know that burgers have been bought by this person, that person, and that person as well.
If burgers keep repeating like this, I will understand that burgers are being bought by many people.
Now suppose that meatballs are also selling along with burgers,
so that whoever buys a burger is buying meatballs too.
This is how we try to identify associations: if one person buys a particular item, how many more people buy that item?
And if one person buys an item together with another item, how many more people follow the same pattern?
That is what we identify here, OK.
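The counting described above can be sketched in a few lines. This is a minimal illustration using the standard definitions of support, confidence, and lift; the five transactions are made up for the example and are not taken from the data set.

```python
# Made-up bills, each stored as a set of items.
transactions = [
    {"burgers", "meatballs", "eggs"},
    {"burgers", "meatballs"},
    {"burgers", "eggs"},
    {"chutney"},
    {"meatballs", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


sup_burgers = support({"burgers"}, transactions)            # how often burgers sell
sup_meatballs = support({"meatballs"}, transactions)        # how often meatballs sell
sup_both = support({"burgers", "meatballs"}, transactions)  # how often they sell together

# Rule "burgers -> meatballs":
confidence = sup_both / sup_burgers   # of the burger buyers, the share who also buy meatballs
lift = confidence / sup_meatballs     # above 1 means they co-occur more than chance alone suggests

print(sup_both, confidence, lift)
```

Here burgers and meatballs appear together in 2 of 5 bills, so the support is 0.4, and 2 of the 3 burger buyers also took meatballs, so the confidence is about 0.67.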
(04/41)
Now, let us move ahead to the implementation.
We will take a transaction list here, which starts as an empty list.
We have named this list transactions.
Now, for i in range 0 to 7501.
Why 7501?
Because that is the number of records.
We must check how many records there are in our dataset,
and here the number of records is 7501, seven thousand five hundred one.
So for i in range(0, 7501), what will we do?
We take the transactions list and call transactions.append.
This keeps appending row after row, and what we append is a list of that row's values in string format:
str(dataset.values[i, j]) for j in range(0, 20).
So, just see what we are doing here:
we are basically putting all the data inside transactions.
We know the columns run from 0 to 19,
which is why j goes over range 0 to 20.
j means column, one column after another,
and i means row.
For the first row, it scans all 20 columns,
and whatever it finds in each column, an item or a missing value, is converted to a string and stored in the list, OK.
So, let us now look at transactions so you can see what is in it.
In transactions, you will see that the data is presented in this form, OK.
The missing values are displayed as well, as the string nan.
In this way, a list is formed.
Now we have the transaction data in list format.
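Here is a minimal sketch of the same loop on a tiny made-up frame, so the resulting list is easy to read. As an assumption on my side, it takes the row and column counts from dataset.shape instead of typing 7501 and 20 by hand, and the last lines show an optional variation, not part of the tutorial, that leaves the missing cells out.

```python
import io

import pandas as pd

# Made-up sample standing in for the real file (no header row).
csv_text = "burgers,meatballs,eggs\nchutney\nturkey,avocado\n"
dataset = pd.read_csv(io.StringIO(csv_text), header=None)

# For the tutorial's file these would be 7501 rows and 20 columns.
n_rows, n_cols = dataset.shape

transactions = []
for i in range(n_rows):          # i walks over the rows (bills)
    # j walks over the columns; every cell, including NaN, becomes a string.
    transactions.append([str(dataset.values[i, j]) for j in range(n_cols)])

print(transactions)

# Optional variation (not in the tutorial): drop the missing cells so that
# the string 'nan' never shows up as if it were an item.
cleaned = [[str(v) for v in row if pd.notna(v)] for row in dataset.values]
print(cleaned)
```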
(07/36)
Let us now move ahead.
Next, we need an implementation of the Apriori algorithm.
The library I will use this time is apyori.
It is not one of the libraries we already have, so I will first install it
with pip install apyori.
See, it has installed successfully.
After installing it, I will import it:
from apyori import apriori.
See, apriori comes up here as a suggestion, and that is what we import.
This, we have imported.
Now, we could make an object of this apriori,
but it is better to make the rules directly, because that is what this algorithm is for: making association rules.
Understood?
We will directly make rules.
So we write rules = apriori, and inside the parentheses we first give transactions,
the list of transactions that we made earlier.
Yes, transactions has been passed.
After the transactions, we have to give it a minimum support.
So, let's move forward!
How much minimum support should we give here?
For minimum support, we will keep the value fairly low.
Let us set min_support equal to 0.003, which is 0.3%.
We keep minimum support low so that we can get more association rules.
And how much minimum confidence do we need?
Let's set min_confidence to 0.2, which is 20%.
After this, what minimum lift should we specify?
We will set min_lift to 3, OK.
And what should the minimum length of one rule be?
Without it, single items would also be reported on their own, which we do not want, so the minimum length should be at least two, but I will keep three.
We will keep the minimum length at three, which means three items should be displayed together.
So, this is how I have set up the model, and what this model basically does is find rules.
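To make the three thresholds concrete, here is a simplified brute-force sketch that checks every ordered pair of items against a minimum support, confidence, and lift. It is only an illustration of what the thresholds filter, not how apyori is implemented; the function name pair_rules and the six transactions are made up for this example.

```python
from itertools import permutations

# Made-up bills for illustration.
transactions = [
    ["bread", "butter"],
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["milk"],
    ["milk", "eggs"],
    ["eggs"],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def pair_rules(transactions, min_support, min_confidence, min_lift):
    """Hypothetical helper: keep rules 'base -> add' that clear all three thresholds."""
    items = sorted({item for t in transactions for item in t})
    rules = []
    for base, add in permutations(items, 2):
        sup = support({base, add}, transactions)
        if sup < min_support:
            continue  # the pair is too rare to be worth a rule
        confidence = sup / support({base}, transactions)
        lift = confidence / support({add}, transactions)
        if confidence >= min_confidence and lift >= min_lift:
            rules.append((base, add, sup, confidence, lift))
    return rules


rules = pair_rules(transactions, min_support=0.3, min_confidence=0.2, min_lift=1.5)
for base, add, sup, confidence, lift in rules:
    print(base, "->", add, "support", sup, "confidence", confidence, "lift", lift)
```

With these numbers only bread and butter survive, in both directions: they appear together in half of the bills, and everyone who buys one also buys the other.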
Here, let us check the name of our transactions list; I will copy it from above.
I copy it and paste it here to avoid typos, and then execute it.
We have just executed it; you can see that it was processing and that it has executed successfully.
Now look at rules, where the data should have been stored: see, rules is a generator object.
It has become an object, hasn't it?
(11/09)
So we have to extract the data from rules and bring it out.
What we will do is
convert rules into a list, alright.
Let us name it results: results is equal to list,
and inside list we typecast rules, so results = list(rules).
We have done the typecasting.
After the typecasting,
we will make one more list,
named results_list,
and we start it as an empty list.
In this list we will append all the data one by one,
in the same manner as we did for transactions a while ago.
For i in range(0, len(results)),
we call results_list.append.
So what are we appending?
First of all, the rule:
a Rule label, then +, then results[i][0] converted into string format with str.
That forms the rule part.
After the rule comes the support,
so we will also display the support here.
We put + and then \n;
let's put \n, it should not be a problem.
Then the Support label,
and after giving a space we put one more +
to get the support value displayed.
That also has to be wrapped in str,
because when we use the + symbol as an operator to join strings,
both sides have to be in string format.
(13/48)
Here we use results[i] again, but where the index alongside it was 0 earlier, it is now 1, so the support column comes here: results[i][1].
Now, let's execute it.
We have made the results list and executed the cell.
Now let us print it: print(results_list), enter.
So here, you can see what rules we have.
OK, the \n is not producing a line break here, so we will check later why it is not working.
Since it is not working,
we just remove it.
With \n removed, we have the data in this form.
So, here is a rule: the frozenset holds the items of the rule, light ice cream and chicken, OK.
For this frozenset the support is 0.004, which clears the minimum of 0.003 that I had specified.
Now see the second rule: a frozenset with escalope and mushroom sauce.
Here, we can see all the rules. Just see.
All the rules can be seen in the results list.
So in this way, we can get association rules.
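Here is a minimal sketch of the formatting loop, and of why \n seemed not to work. Printing a whole list shows each string in its quoted form, where a line break is written as a backslash followed by n; printing the strings one at a time shows the real line break. The Record namedtuple is only a stand-in with the same first two positions as the result records, and the label text, items, and support values are assumptions for illustration.

```python
from collections import namedtuple

# Stand-in for the result records: position 0 is the item set, position 1 the support.
Record = namedtuple("Record", ["items", "support"])

# Made-up records for illustration.
results = [
    Record(frozenset({"chicken"}), 0.004),
    Record(frozenset({"escalope"}), 0.005),
]

results_list = []
for i in range(0, len(results)):
    # str(...) is needed on both values because + only joins strings with strings.
    results_list.append(
        "Rule: " + str(results[i][0]) + "\nSupport: " + str(results[i][1])
    )

print(results_list)        # the list view shows \n as two characters

for entry in results_list:
    print(entry)           # each entry printed on its own shows a real line break
```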
Now, you may wonder what the use of this is.
Its use is that you now know that if somebody buys spaghetti,
there is a good chance he will also buy frozen vegetables, milk, and chocolate.
Well, let me increase the support here. I could make it 0.005, which is 0.5%; 0.5 would be 50%, so we will keep it at 0.1, which is 10%, OK.
Now, let us generate the rules.
See, I have now generated the rules.
At 10% the threshold is too high for rules to be generated easily, so I will keep it at 0.01, 1%. Let's see how many matches we get at 1%.
See, at 1% people are buying herb and pepper together with ground beef.
Herb and pepper, nan, and ground beef also shows up, OK; nan appears as an item because the missing values were stored as strings in transactions.
So you should keep these items together.
You have to tweak
how much minimum support and minimum confidence you want.
You will get answers accordingly, and these answers will help you make a strategy.
These rules actually tell you which items you should keep together. Alright.
Got it, friends? Good!
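The effect of tweaking the minimum support can be seen with a small count of item pairs. This is a minimal sketch on made-up bills, and the helper name frequent_pairs is hypothetical: as the threshold goes up, fewer pairs remain.

```python
from collections import Counter
from itertools import combinations

# Made-up bills for illustration.
transactions = [
    ["bread", "butter"],
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["milk"],
    ["milk", "eggs"],
    ["eggs"],
]

# Count in how many bills each pair of items appears together.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(set(t)), 2):
        pair_counts[pair] += 1

n = len(transactions)


def frequent_pairs(min_support):
    """Hypothetical helper: pairs whose share of all bills is at least min_support."""
    return sorted(pair for pair, count in pair_counts.items() if count / n >= min_support)


for min_support in (0.1, 0.3, 0.6):
    print(min_support, frequent_pairs(min_support))
```

At 0.1 all four pairs that ever occur together are kept, at 0.3 only bread and butter remain, and at 0.6 nothing is left, which is the same trade-off we saw when moving between 10%, 1%, and 0.3%.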
Let's make it 0.003 again and set the minimum length to 2, and here your association rules are being generated.
They will be ready in a few seconds.
So, let's wait for it. Yes, they have been generated.
So these are your association rules.
According to these, you can place your items.
It could be any kind of item: these are eatables, but you could have different items such as toys, books, or clothing, and the rules will be formed in the same manner.
So I hope you have understood this and will apply it to other data sets.
If you have any queries or comments, click the discussion button below the video and post there. This way, you will be able to connect with fellow learners and discuss the course. Our team will also try to solve your query.