
Association Rule Learning and the Apriori Algorithm

(This article was first published on Statistical Research » R, and kindly contributed to R-bloggers)

Association Rule Learning (also called Association Rule Mining) is a common technique used to find associations between many variables. It is often used by grocery stores, retailers, and anyone with a large transactional database. It's how Target knows you're pregnant, how Amazon.com knows what else you might want to buy when you're purchasing an item, and how Pandora.com knows what song you want to listen to next. All of these incorporate, at some level, data mining concepts and association rule algorithms.

Michael Hahsler et al. have authored and maintain two very useful R packages relating to association rule mining: the arules package and the arulesViz package. Furthermore, Hahsler has provided two very good example articles with details on how to use these packages: Introduction to arules and Visualizing Association Rules.

Often Association Rule Learning is used to analyze the “market-basket” for retailers. Traditionally, this simply looks at whether a person has purchased an item or not and can be seen as a binary matrix.

Association rule mining in R uses the arules library. The arulesViz package adds additional features for graphing and plotting the rules.
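If the packages are not already installed they can be obtained from CRAN; a minimal sketch of the setup:

install.packages(c("arules", "arulesViz"))  # one-time install from CRAN
library(arules)     # apriori(), eclat(), and the transactions class
library(arulesViz)  # plot() methods for rule objects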

For testing purposes there is a convenient way to generate random data in which patterns can be mined. The random data is generated in such a way that it contains correlated items.
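One way to do this (a sketch only; the seed, item counts, and transaction counts below are illustrative and not taken from the original post) is with the random.patterns() and random.transactions() functions in arules:

library(arules)
set.seed(42)                                # illustrative seed for reproducibility
patterns <- random.patterns(nItems = 1000)  # generate correlated item patterns
trans <- random.transactions(nItems = 1000, nTrans = 10000,
                             method = "agrawal", patterns = patterns)
image(trans[1:100])                         # view a slice of the binary transaction matrix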

However, a transaction dataset will usually be built from real data using the approach described in "Data Frames and Transactions". The rules can then be created by running the apriori function on the transaction dataset.
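A sketch of both steps, assuming a data frame of factor columns called purchases (a hypothetical name) and illustrative support and confidence thresholds:

library(arules)
trans <- as(purchases, "transactions")  # coerce a data frame of factors into transactions
rules <- apriori(trans,
                 parameter = list(support = 0.001, confidence = 0.5))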

Once the rules have been created, a researcher can review and filter them down to a manageable subset. This can be done in a variety of ways, both with graphs and by simply inspecting the rules.
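For example (continuing with the rules object from above; the measures shown are one reasonable choice, not the only one):

library(arulesViz)
summary(rules)                          # number of rules, length distribution, quality measures
inspect(sort(rules, by = "lift")[1:5])  # print the five rules with the highest lift
plot(rules, measure = c("support", "confidence"),
     shading = "lift")                  # scatter plot of the full rule set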

Having 317,848 association rules is far too many for a human to deal with, so we trim the rule set down to the ones that are more important.
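One common way to do this is to keep only the rules that pass stricter quality thresholds; the cut-off values here are illustrative:

subRules <- subset(rules, subset = confidence > 0.8 & lift > 3)  # keep high-confidence, high-lift rules
summary(subRules)                                                # far fewer rules remain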

Once again we can now plot the subset of rules to get a visual overview. In these graphs we can see the two parts of an association rule: the antecedent (IF) and the consequent (THEN). These patterns are found by identifying frequent itemsets in the data, and the resulting rules are evaluated by their support and confidence. The support indicates how frequently the items appear together in the dataset, and the confidence indicates the proportion of transactions containing the antecedent that also contain the consequent, i.e., how often the IF/THEN statement holds true. These IF/THEN statements can be visualized by the following graph:
Association Rules with Consequent and Antecedent.

The plotting functions in arulesViz provide many different ways to look at the rules and can even produce 3-D graphs.
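A sketch of a few of those plot methods (the exact set of available methods, including the 3-D matrix view, depends on the arulesViz version):

library(arulesViz)
plot(subRules, method = "graph")      # network of items linking antecedents to consequents
plot(subRules, method = "grouped")    # grouped matrix of antecedents vs. consequents
plot(subRules, method = "paracoord")  # parallel coordinates plot
# Older arulesViz versions also offered a 3-D matrix view, e.g.
# plot(subRules, method = "matrix3D", measure = "lift")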

We can then subset the rules to the top 30 most important ones and inspect that smaller set individually to determine where there are meaningful associations.
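Sorting by a quality measure such as lift and taking the first 30 rules is one way to do this (lift is an assumption here; confidence or support would work as well, and at least 30 rules are assumed to remain after filtering):

topRules <- sort(subRules, by = "lift")[1:30]  # the 30 rules with the highest lift
inspect(topRules)                              # review them individually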

Frequent itemsets.

Here we can look at the frequent itemsets, and we can use the eclat algorithm rather than the apriori algorithm.
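A sketch of eclat on the same transactions object, with an illustrative minimum support, plus an item frequency plot:

library(arules)
itemsets <- eclat(trans, parameter = list(supp = 0.02, maxlen = 4))
inspect(sort(itemsets, by = "support")[1:10])  # the ten most frequent itemsets
itemFrequencyPlot(trans, topN = 20)            # the most frequent individual items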

Using these approaches a researcher can narrow down the association rules and determine what leads to frequently purchased items. This is highly useful when working with extremely large datasets.