Python and Matlab Implementations of the K-Means Algorithm
2018-02-01 16:43
Reference: http://blog.topspeedsnail.com/archives/10349
Python version: python3.6.2
Matlab version:
Dataset download: http://blog.topspeedsnail.com/wp-content/uploads/2016/11/titanic.xls
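1. Python Implementation of the K-Means Algorithm
This example uses the Titanic passenger list. The passengers are clustered on every field except survived (k=2, survived/died), and the resulting clusters are then compared against the survived field. Dataset download: http://blog.topspeedsnail.com/wp-content/uploads/2016/11/titanic.xls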
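To make the clustering step concrete, here is a minimal NumPy sketch of the standard K-Means (Lloyd's) iteration on toy data. The function name kmeans_sketch, its parameters, and the toy points are illustrative assumptions, not part of the referenced article, and scikit-learn's KMeans uses a more careful initialization and implementation.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Plain Lloyd's iteration (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so the assignment is stable
        centers = new_centers
    return labels, centers

# Toy data: two well-separated groups of 2-D points.
toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centers = kmeans_sketch(toy, k=2)
print(labels)   # which group gets id 0 and which gets id 1 is arbitrary
print(centers)
```

The cluster ids carry no meaning by themselves: the same grouping may come back as 0/1 or 1/0, which matters when the clusters are compared with a labeled column such as survived.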
# Adapted from: http://blog.topspeedsnail.com/archives/10349
import numpy as np
from sklearn.cluster import KMeans
from sklearn import preprocessing
import pandas as pd

"""
Dataset: titanic.xls (Titanic passenger list, victims and survivors)
<http://blog.topspeedsnail.com/wp-content/uploads/2016/11/titanic.xls>
*** Fields ***
pclass: passenger class, a proxy for social class (1, upper; 2, middle; 3, lower)
survived: whether the passenger survived
name: name
sex: sex
age: age
sibsp: number of siblings/spouses aboard
parch: number of parents/children aboard
ticket: ticket number
fare: ticket fare
cabin: cabin
embarked: port of embarkation
boat: lifeboat
body: body identification number
home.dest: home/destination
******
Goal: run k-means on every field except survived (two groups: survived/died),
then compare the groups with the survived field to see how well the clustering works.
"""

# Load the data (reading .xls files requires the xlrd package)
df = pd.read_excel("titanic.xls")
# print(df.shape)
# print(df.head())
# print(df.tail())

# Output of df.head():
#    pclass  survived                                             name     sex
# 0       1         1                    Allen, Miss. Elisabeth Walton  female
# 1       1         1                   Allison, Master. Hudson Trevor    male
# 2       1         0                     Allison, Miss. Helen Loraine  female
# 3       1         0             Allison, Mr. Hudson Joshua Creighton    male
# 4       1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  female
#
#        age  sibsp  parch  ticket      fare    cabin embarked boat   body
# 0  29.0000      0      0   24160  211.3375       B5        S    2    NaN
# 1   0.9167      1      2  113781  151.5500  C22 C26        S   11    NaN
# 2   2.0000      1      2  113781  151.5500  C22 C26        S  NaN    NaN
# 3  30.0000      1      2  113781  151.5500  C22 C26        S  NaN  135.0
# 4  25.0000      1      2  113781  151.5500  C22 C26        S  NaN    NaN
#
#                          home.dest
# 0                     St Louis, MO
# 1  Montreal, PQ / Chesterville, ON
# 2  Montreal, PQ / Chesterville, ON
# 3  Montreal, PQ / Chesterville, ON
# 4  Montreal, PQ / Chesterville, ON

# Drop fields that are not useful for clustering
df.drop(['body', 'name', 'ticket'], axis=1, inplace=True)
df = df.infer_objects()       # infer_objects returns a new DataFrame, so keep the result
df.fillna(0, inplace=True)    # replace NaN with 0

# Map strings to integers, e.g. female: 1, male: 0
# (the exact codes depend on set iteration order and may differ between runs)
df_map = {}  # stores the mapping for each column
cols = df.columns.values
for col in cols:
    if df[col].dtype != np.int64 and df[col].dtype != np.float64:
        temp = {}
        x = 0
        for ele in set(df[col].values.tolist()):
            if ele not in temp:
                temp[ele] = x
                x += 1
        df_map[df[col].name] = temp
        df[col] = list(map(lambda val: temp[val], df[col]))

# Features: every column except survived, standardized to zero mean and unit variance
x = np.array(df.drop(['survived'], axis=1).astype(float))
x = preprocessing.scale(x)

clf = KMeans(n_clusters=2)
clf.fit(x)

# Compare the cluster id of each passenger with the survived field
y = np.array(df['survived'])
predictions = clf.predict(x)
correct = int(np.sum(predictions == y))
print(correct * 1.0 / len(x))
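Here is my run result:
"D:\Program files\python3.6.2\python.exe" D:/sunfl/sunflower/study/机器学习/聚类算法/K-Means/k_means.py
0.2987012987012987  # Cluster ids are assigned arbitrarily (survived: 0, died: 1, or survived: 1, died: 0), so this value can differ a lot between runs; when it comes out low, take 1 minus the value.
Process finished with exit code 0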