Support and Confidence (Basic Example) -- Learning Data Mining with Python (《Python数据挖掘入门与实践》)
2018-03-05 19:24
This post works through an example from the book Learning Data Mining with Python.
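Third-party Python library: NumPy
Affinity analysis example
1. Load the dataset (a txt data file) with NumPy

    import numpy as np
    dataset_filename = "affinity_dataset.txt"
    X = np.loadtxt(dataset_filename)
    n_samples, n_features = X.shape
    print("This dataset has {0} samples and {1} features".format(n_samples, n_features))

Result: This dataset has 100 samples and 5 features
Two lines are missing from the listing in the book.
X.shape returns the number of rows and columns of the data. The first five rows look like this:

    print(X[:5])

    [[ 0. 0. 1. 1. 1.]
     [ 1. 1. 0. 1. 0.]
     [ 1. 0. 1. 1. 0.]
     [ 0. 0. 1. 1. 1.]
     [ 0. 1. 0. 0. 1.]]

2. Treat the five columns as five products

    features = ["bread", "milk", "cheese", "apples", "bananas"]

Each row records one customer's purchases: 0 means the product was not bought, 1 means it was.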
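To try steps 1 and 2 without the data file, the sketch below loads the five rows printed above from an in-memory string and turns each row into a list of product names. The names sample_text, X_demo and baskets are made up for this illustration, and the file affinity_dataset.txt is only assumed to use the same whitespace-separated layout.

```python
import io

import numpy as np

# The five rows printed above, as whitespace-separated text. This stands in
# for affinity_dataset.txt, which is assumed to use the same layout.
sample_text = """0 0 1 1 1
1 1 0 1 0
1 0 1 1 0
0 0 1 1 1
0 1 0 0 1
"""

# np.loadtxt accepts a file-like object as well as a filename.
X_demo = np.loadtxt(io.StringIO(sample_text))
n_samples, n_features = X_demo.shape
print("This sample has {0} rows and {1} columns".format(n_samples, n_features))

features = ["bread", "milk", "cheese", "apples", "bananas"]

# Translate each 0/1 row into the names of the products that were bought.
baskets = [
    [name for name, flag in zip(features, row) if flag == 1]
    for row in X_demo
]
for i, basket in enumerate(baskets):
    print("Customer {0}: {1}".format(i, ", ".join(basket)))
```

Reading the rows as baskets of named products makes it easier to check by eye that column 3 is apples and column 4 is bananas before counting anything.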
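3. Use support and confidence to evaluate the rule "if a customer buys apples, they also buy bananas"
First count the customers who bought apples, that is, the rows whose fourth column (sample[3]) is 1:

    num_apple_purchases = 0
    for sample in X:
        if sample[3] == 1:
            num_apple_purchases += 1
    print("{0} people bought Apples".format(num_apple_purchases))

The loop reads the data one row at a time and checks whether sample[3] is 1 to decide whether that customer bought apples.
In the same way, sample[4] can be used to count the customers who bought bananas.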
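The same count can be taken for every product at once by summing the columns. The sketch below is one possible way to do it, not the book's code. It uses only the five rows printed earlier, so the numbers it prints describe that small sample and not the full 100-row dataset. X_demo and purchase_counts are names chosen for this illustration.

```python
import numpy as np

# First five rows of the dataset as printed earlier in the post
# (columns: bread, milk, cheese, apples, bananas). Not the full data.
X_demo = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

features = ["bread", "milk", "cheese", "apples", "bananas"]

# Comparing with 1 gives a boolean matrix; summing down each column
# counts the customers who bought that product.
purchase_counts = (X_demo == 1).sum(axis=0)

for name, count in zip(features, purchase_counts):
    print("{0} people bought {1}".format(count, name))
```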
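4. How many customers who bought apples also bought bananas?
In code:

    rule_valid = 0
    rule_invalid = 0
    for sample in X:
        if sample[3] == 1:  # bought apples
            if sample[4] == 1:
                # bought both apples and bananas
                rule_valid += 1
            else:
                # bought apples but not bananas
                rule_invalid += 1
    print("{0} cases of the rule being valid were discovered".format(rule_valid))
    print("{0} cases of the rule being invalid were discovered".format(rule_invalid))

This gives the number of customers who bought both apples and bananas, and the number who bought apples but not bananas:

    21 cases of the rule being valid were discovered
    15 cases of the rule being invalid were discovered

From these counts, the support of the rule "customers who buy apples also buy bananas" is 21, that is, rule_valid.
Confidence is the number of customers who bought both apples and bananas divided by the number who bought apples, that is, rule_valid / num_apple_purchases.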
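The counting loop in step 4 can also be written with boolean masks. The sketch below is an alternative formulation, not the book's code, and again runs on the five rows printed earlier rather than the full file, so its counts differ from the 21 and 15 reported above. The mask names are made up for this illustration.

```python
import numpy as np

# First five rows of the dataset as printed earlier in the post
# (columns: bread, milk, cheese, apples, bananas). Not the full data.
X_demo = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

# One boolean per customer: did they buy the product?
bought_apples = X_demo[:, 3] == 1
bought_bananas = X_demo[:, 4] == 1

# Rule holds: apples and bananas. Rule fails: apples but no bananas.
rule_valid = int(np.sum(bought_apples & bought_bananas))
rule_invalid = int(np.sum(bought_apples & ~bought_bananas))

print("{0} cases of the rule being valid were discovered".format(rule_valid))
print("{0} cases of the rule being invalid were discovered".format(rule_invalid))
```

Because every apple buyer either bought bananas or did not, rule_valid + rule_invalid always equals the number of apple buyers, which is the denominator of the confidence.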
5. Compute the confidence of the rule

    support = rule_valid  # The Support is the number of times the rule is discovered.
    confidence = rule_valid / num_apple_purchases
    print("The support is {0} and the confidence is {1:.3f}.".format(support, confidence))
    # Confidence can be thought of as a percentage using the following:
    print("As a percentage, that is {0:.1f}%.".format(100 * confidence))

The confidence is printed to three decimal places and then as a percentage:

    The support is 21 and the confidence is 0.583.
    As a percentage, that is 58.3%.

Confidence therefore shows how often customers who buy one product also buy another, which can guide promotions aimed at increasing revenue.
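Steps 3 to 5 can be wrapped in one helper that takes the premise and conclusion column indices. The function rule_stats below is a hypothetical helper written for this post, not something taken from the book or from NumPy. It follows the post's definition of support as a raw count and returns NaN for the confidence when nobody bought the premise product. It is run on the five rows printed earlier, not on the full dataset.

```python
import numpy as np


def rule_stats(X, premise, conclusion):
    """Return (support, confidence) for the rule premise -> conclusion.

    Support is the number of rows where both columns are 1.
    Confidence is support divided by the number of rows where the
    premise column is 1 (NaN if there are no such rows).
    """
    premise_mask = X[:, premise] == 1
    n_premise = int(premise_mask.sum())
    support = int((premise_mask & (X[:, conclusion] == 1)).sum())
    confidence = support / n_premise if n_premise > 0 else float("nan")
    return support, confidence


# First five rows of the dataset as printed earlier in the post
# (columns: bread, milk, cheese, apples, bananas). Not the full data.
X_demo = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

features = ["bread", "milk", "cheese", "apples", "bananas"]

# Evaluate the rule in both directions: apples -> bananas and back.
for premise, conclusion in [(3, 4), (4, 3)]:
    support, confidence = rule_stats(X_demo, premise, conclusion)
    print("{0} -> {1}: support {2}, confidence {3:.3f}".format(
        features[premise], features[conclusion], support, confidence))
```

Swapping premise and conclusion keeps the support the same but generally changes the confidence, because the denominator switches from apple buyers to banana buyers.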