
Support and Confidence (Basic Example), from "Learning Data Mining with Python" (《python数据挖掘入门与实践》)

2018-03-05 19:24

These are study notes that follow the book "Learning Data Mining with Python".
Third-party Python library used: NumPy

Affinity analysis example

1. Load the dataset (a txt data file) with NumPy

import numpy as np

dataset_filename = "affinity_dataset.txt"
X = np.loadtxt(dataset_filename)
n_samples, n_features = X.shape
print("This dataset has {0} samples and {1} features".format(n_samples, n_features))

Output:

This dataset has 100 samples and 5 features

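The book's listing is missing two lines.
X.shape returns the number of rows and columns of the data. The first five rows can be inspected with:

print(X[:5])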

[[ 0.  0.  1.  1.  1.]
[ 1.  1.  0.  1.  0.]
[ 1.  0.  1.  1.  0.]
[ 0.  0.  1.  1.  1.]
[ 0.  1.  0.  0.  1.]]
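
For readers who do not have affinity_dataset.txt at hand, the loading step can be tried on the five rows printed above. This is only a sketch: it assumes the file is whitespace-separated text, and the names sample_text and X_small are made up for illustration.

```python
import io

import numpy as np

# The five rows printed above, written as whitespace-separated text.
# Assumption: the real data file uses the same plain layout.
sample_text = """\
0 0 1 1 1
1 1 0 1 0
1 0 1 1 0
0 0 1 1 1
0 1 0 0 1
"""

# np.loadtxt accepts a file-like object as well as a filename.
X_small = np.loadtxt(io.StringIO(sample_text))

n_samples, n_features = X_small.shape
print("This sample has {0} rows and {1} columns".format(n_samples, n_features))
print(X_small.dtype)  # loadtxt parses the values as floats by default
```

The float dtype is why the printed rows show values such as 0. and 1. rather than plain integers.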

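2. Treat the five columns as five products

features = ["bread", "milk", "cheese", "apples", "bananas"]

Each row records one customer's purchases: 0 means the customer did not buy that product, 1 means they did.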
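
To make the column-to-product mapping concrete, the sketch below pairs each name in features with the number of buyers in its column. It uses only the five sample rows shown earlier, not the full 100-row dataset, and the names X_small and purchase_counts are illustrative.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apples", "bananas"]

# The first five rows of the dataset, as printed by print(X[:5]).
X_small = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

# Summing down each column counts the 1s, i.e. the buyers of each product.
column_totals = X_small.sum(axis=0).astype(int).tolist()
purchase_counts = dict(zip(features, column_totals))

for name, count in purchase_counts.items():
    print("{0}: {1}".format(name, count))
```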

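3. Use the support and confidence calculations to evaluate the rule "if a customer buys apples, they also buy bananas"

First count the customers who bought apples, that is, the rows whose fourth column (index 3) is 1:

num_apple_purchases = 0
for sample in X:
    if sample[3] == 1:  # this customer bought apples
        num_apple_purchases += 1
print("{0} people bought Apples".format(num_apple_purchases))

The loop reads the data one row at a time and checks whether sample[3] is 1 to decide whether that customer bought apples.
In the same way, sample[4] can be used to count the customers who bought bananas.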
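
The same counts can also be obtained without an explicit loop by comparing a whole column at once. The sketch below runs on the five sample rows shown earlier (so the numbers are not those of the full dataset) and checks the result against the loop version; X_small and loop_count are illustrative names.

```python
import numpy as np

# The first five rows of the dataset, as printed by print(X[:5]).
X_small = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

# Loop version, as in the text: inspect column index 3 row by row.
loop_count = 0
for sample in X_small:
    if sample[3] == 1:
        loop_count += 1

# Column version: X_small[:, 3] selects the apples column for every row,
# the comparison yields booleans, and sum() counts the True values.
num_apple_purchases = int((X_small[:, 3] == 1).sum())
num_banana_purchases = int((X_small[:, 4] == 1).sum())

print("{0} people bought Apples".format(num_apple_purchases))
print("{0} people bought Bananas".format(num_banana_purchases))
```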

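4. How many of the customers who bought apples also bought bananas?

In code:

rule_valid = 0
rule_invalid = 0
for sample in X:
    if sample[3] == 1:  # bought apples
        if sample[4] == 1:
            # bought both apples and bananas
            rule_valid += 1
        else:
            # bought apples but not bananas
            rule_invalid += 1
print("{0} cases of the rule being valid were discovered".format(rule_valid))
print("{0} cases of the rule being invalid were discovered".format(rule_invalid))

This gives the number of customers who bought both apples and bananas, and the number who bought apples but not bananas:

21 cases of the rule being valid were discovered
15 cases of the rule being invalid were discovered

From these counts, the support of the rule "customers who buy apples also buy bananas" is 21, i.e. rule_valid.
The confidence is the number of customers who bought both apples and bananas divided by the number who bought apples, i.e. rule_valid / num_apple_purchases.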
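
The nested if statements amount to combining two yes/no conditions per row. A possible way to express that with boolean masks is sketched below, again on the five sample rows only; bought_apples and bought_bananas are illustrative names. It also shows that the valid and invalid cases together make up all apple buyers, which is the denominator of the confidence.

```python
import numpy as np

# The first five rows of the dataset, as printed by print(X[:5]).
X_small = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)

bought_apples = X_small[:, 3] == 1   # premise of the rule
bought_bananas = X_small[:, 4] == 1  # conclusion of the rule

# Rule holds: apples and bananas. Rule fails: apples but no bananas.
rule_valid = int((bought_apples & bought_bananas).sum())
rule_invalid = int((bought_apples & ~bought_bananas).sum())

print("{0} cases of the rule being valid were discovered".format(rule_valid))
print("{0} cases of the rule being invalid were discovered".format(rule_invalid))

# Every apple buyer falls into exactly one of the two groups.
assert rule_valid + rule_invalid == int(bought_apples.sum())
```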

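5. Compute the confidence of the rule

support = rule_valid  # The support is the number of times the rule is discovered.
confidence = rule_valid / num_apple_purchases
print("The support is {0} and the confidence is {1:.3f}.".format(support, confidence))
# Confidence can be thought of as a percentage using the following:
print("As a percentage, that is {0:.1f}%.".format(100 * confidence))

The confidence is printed to three decimal places and then as a percentage:

The support is 21 and the confidence is 0.583.
As a percentage, that is 58.3%.

Here 21 of the 36 apple buyers (21 valid plus 15 invalid cases) also bought bananas. Confidence therefore indicates how likely a customer who buys apples is to buy bananas as well, which can inform how promotions are designed to maximize profit.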
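
The calculation is not specific to apples and bananas. As a rough sketch, the hypothetical helper rule_stats below (not a function from the book or from NumPy) applies the same definitions to any pair of products and returns None for the confidence when nobody bought the premise product, to avoid dividing by zero. It is demonstrated on the five sample rows only.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apples", "bananas"]

# The first five rows of the dataset, as printed by print(X[:5]).
X_small = np.array([
    [0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
], dtype=float)


def rule_stats(X, features, premise, conclusion):
    """Return (support, confidence) for 'buys premise -> buys conclusion'.

    Hypothetical helper for illustration. Support is the raw count of rows
    where the rule holds, matching the definition used in the text.
    """
    p = features.index(premise)
    c = features.index(conclusion)
    premise_rows = X[X[:, p] == 1]              # customers who bought the premise
    support = int((premise_rows[:, c] == 1).sum())
    if len(premise_rows) == 0:
        return support, None                    # confidence is undefined
    return support, support / len(premise_rows)


support, confidence = rule_stats(X_small, features, "apples", "bananas")
print("The support is {0} and the confidence is {1:.3f}.".format(support, confidence))
```

Calling rule_stats(X, features, "apples", "bananas") on the full dataset should reproduce the support of 21 and confidence of 0.583 reported above, provided X is loaded as in step 1.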
