Association Rule Learning and the Apriori Algorithm
2015-12-04 23:18
(This article was first published on Statistical
Research » R, and kindly contributed to R-bloggers)
Association Rule Learning (also called Association Rule Mining) is a common technique used to find associations between many variables. It is often used by grocery stores, retailers, and anyone with a large transactional database. It is the same idea behind Target knowing that you're pregnant, or Amazon.com knowing what else you want to buy when you're purchasing an item. The same idea extends to Pandora.com knowing what song you want to listen to next. All of these incorporate, at some level, data mining concepts and association rule algorithms.
Michael Hahsler et al. have authored and maintain two very useful R packages relating to association rule mining: the arules package and the arulesViz package. Furthermore, Hahsler has provided two very good example articles detailing how to use these packages: Introduction to arules and Visualizing Association Rules.
Often Association Rule Learning is used to analyze the “market-basket” for retailers. Traditionally, this simply looks at whether a person has purchased an item or not and can be seen as a binary matrix.
Association rules can be mined in R with the arules library. The arulesViz library adds features for graphing and plotting the rules.
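As a minimal sketch of the binary market-basket matrix described above, the block below builds a small logical matrix and coerces it to the arules transactions class. The customers and items (bread, milk, eggs) are made up purely for illustration and are not from the original post.

```r
library(arules)

# Hypothetical toy basket data: one row per customer, one column per item.
# TRUE means the customer bought the item.
basket <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, FALSE,
    FALSE, TRUE,  TRUE,
    TRUE,  TRUE,  TRUE),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("customer", 1:4), c("bread", "milk", "eggs"))
)

# Coerce the logical matrix to a transactions object
trans <- as(basket, "transactions")
inspect(trans)

# Relative frequency (support) of each single item
item_support <- itemFrequency(trans)
print(item_support)
```

Each row becomes one transaction, and itemFrequency reports the share of transactions that contain each item.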
For testing purposes there is a convenient way to generate random data in which patterns can be mined. The random data is generated in such a way that the items are correlated.
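One way to generate such data is with the random.patterns and random.transactions functions in arules. The sketch below is an assumption about how this step might look, not necessarily the code used in the original post; the sizes (200 items, 100 patterns, 1000 transactions) and the seed are arbitrary choices.

```r
library(arules)

set.seed(42)  # arbitrary seed so the sketch is repeatable

# Generate a pool of correlated item patterns
patterns <- random.patterns(nItems = 200, nPats = 100)
summary(patterns)

# Build random transactions from those patterns using the
# "agrawal" method, which produces correlated items
trans <- random.transactions(nItems = 200, nTrans = 1000,
                             method = "agrawal", patterns = patterns)
summary(trans)
```

The default "independent" method would draw items independently of one another, so the "agrawal" method is the one that matches the correlated data described here.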
However, a transaction dataset will usually be available using the approach described in “Data Frames and Transactions”. The rules can then be created using the apriori function on the transaction dataset.
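A rough sketch of both steps follows. The small survey data frame is hypothetical, and the support and confidence thresholds are illustrative assumptions rather than the values behind the rule count quoted in this post. The Adult dataset ships with arules and is already stored as transactions.

```r
library(arules)

# Hypothetical data frame of factors; each column/level pair becomes
# an item such as "owns_car=yes"
survey <- data.frame(
  age_group = factor(c("young", "old", "young", "old")),
  owns_car  = factor(c("yes", "yes", "no", "yes"))
)
survey_trans <- as(survey, "transactions")
inspect(survey_trans)

# Mine rules from the Adult transactions bundled with arules.
# The thresholds below are illustrative only.
data("Adult")
rules <- apriori(Adult,
                 parameter = list(support = 0.1, confidence = 0.6),
                 control = list(verbose = FALSE))
print(rules)
summary(rules)
```

Lowering the support or confidence threshold makes the number of rules grow quickly, which is why the filtering steps that follow matter.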
Once the rules have been created, a researcher can review and filter them down to a manageable subset. This can be done in a variety of ways, both with graphs and by simply inspecting the rules.
Having 317848 association rules is far too many for a human to deal with. So we’re going to trim down the rules to the ones that are more important.
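One possible way to trim the rules is to keep only those above a confidence cutoff. The sketch below assumes the bundled Adult data and an arbitrary cutoff of 0.8; the thresholds are illustrative and will not reproduce the rule count above.

```r
library(arules)

data("Adult")
rules <- apriori(Adult,
                 parameter = list(support = 0.1, confidence = 0.6),
                 control = list(verbose = FALSE))

# Keep only rules whose confidence exceeds the (assumed) 0.8 cutoff
subrules <- subset(rules, subset = confidence > 0.8)

cat("all rules:     ", length(rules), "\n")
cat("filtered rules:", length(subrules), "\n")
```

The same subset call accepts conditions on other quality measures, such as lift, or on the items that appear in a rule.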
Once again we can plot the subset of rules to get a visual on them. In these graphs we can see the two parts of an association rule: the antecedent (IF) and the consequent (THEN). These patterns are found by determining frequent patterns in the data, which are identified by their support and confidence. The support indicates how frequently the items appear in the dataset, as a proportion of all transactions. The confidence indicates how often the IF/THEN statement holds: the proportion of transactions containing the antecedent that also contain the consequent. These IF/THEN statements can be visualized by the following graph:
Association Rules with Consequent and Antecedent.
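A sketch of plotting calls that produce this kind of view is given below, assuming the arulesViz plot method and the bundled Adult data. The thresholds and the choice of 20 rules are illustrative, and the appearance of the output depends on the installed arulesViz version.

```r
library(arules)
library(arulesViz)

data("Adult")
rules <- apriori(Adult,
                 parameter = list(support = 0.1, confidence = 0.6),
                 control = list(verbose = FALSE))

# Scatter plot of all rules: support vs. lift, shaded by confidence
plot(rules, measure = c("support", "lift"), shading = "confidence")

# Graph view of a small subset, linking antecedent items to consequents
top_rules <- head(sort(rules, by = "lift"), 20)
plot(top_rules, method = "graph")
```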
The plotting code can produce many different views of the rules, including 3-D graphs.
We can then subset the rules to the top 30 most important rules and then inspect the smaller set of rules individually to determine where there are meaningful associations.
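A minimal sketch of this step, assuming that "most important" means highest lift (the post does not say which measure to rank by) and using illustrative thresholds on the bundled Adult data:

```r
library(arules)

data("Adult")
rules <- apriori(Adult,
                 parameter = list(support = 0.1, confidence = 0.6),
                 control = list(verbose = FALSE))

# Rank rules by lift (an assumed importance measure) and keep the top 30
top30 <- head(sort(rules, by = "lift"), 30)

# Print each rule with its quality measures
inspect(top30)
```

Sorting by "confidence" or "support" instead only requires changing the by argument.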
Shows the Frequent Itemsets
Here we can look at the frequent itemsets, and we can use the eclat algorithm rather than the apriori algorithm.
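As a sketch, frequent itemsets can be mined with the eclat function in arules. The support threshold and the minimum itemset length below are assumptions chosen for illustration.

```r
library(arules)

data("Adult")

# Mine frequent itemsets containing at least two items
itemsets <- eclat(Adult,
                  parameter = list(support = 0.1, minlen = 2),
                  control = list(verbose = FALSE))

# Show the five most frequent itemsets
inspect(head(sort(itemsets, by = "support"), 5))
```

Unlike apriori in its default mode, eclat returns itemsets rather than IF/THEN rules, so there is no antecedent or consequent in the output.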
Using these approaches a researcher can narrow down and determine association rules and determine what leads to frequent items. This is highly useful when working with extremely large datasets.