Machine Learning with R: KNN
2016-06-25 17:36
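k-nearest neighbors and kd-trees (largely unrelated to the rest of this post)
To speed up k-nearest-neighbor search, the training data can be stored in a specialized structure that reduces the number of distance computations. There are many ways to do this; the one referred to here is the kd-tree.
Reference: http://blog.csdn.net/qll125596718/article/details/8426458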
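To make the idea of "fewer distance computations" concrete, here is a minimal kd-tree sketch in R. It is not taken from the referenced article; the function names build_kdtree and nearest are hypothetical, and the code assumes a numeric matrix with one point per row and squared Euclidean distance.

```r
# Build a kd-tree from a numeric matrix (one point per row).
# Each node is a list: the splitting point, the split axis, and two subtrees.
build_kdtree <- function(points, depth = 0) {
  if (nrow(points) == 0) return(NULL)
  axis <- depth %% ncol(points) + 1               # cycle through the dimensions
  points <- points[order(points[, axis]), , drop = FALSE]
  mid <- (nrow(points) + 1) %/% 2                 # median row along this axis
  list(
    point = points[mid, ],
    axis  = axis,
    left  = build_kdtree(points[seq_len(mid - 1), , drop = FALSE], depth + 1),
    right = build_kdtree(points[-seq_len(mid), , drop = FALSE], depth + 1)
  )
}

# Find the nearest stored point to `query` (squared Euclidean distance).
nearest <- function(node, query, best = list(point = NULL, dist2 = Inf)) {
  if (is.null(node)) return(best)
  d2 <- sum((node$point - query)^2)
  if (d2 < best$dist2) best <- list(point = node$point, dist2 = d2)
  diff <- query[node$axis] - node$point[node$axis]
  near <- if (diff < 0) node$left else node$right
  far  <- if (diff < 0) node$right else node$left
  best <- nearest(near, query, best)
  # Visit the far side only if the splitting plane is closer than the best match so far.
  if (diff^2 < best$dist2) best <- nearest(far, query, best)
  best
}

# Usage on made-up data
set.seed(1)
pts   <- matrix(runif(200), ncol = 2)
query <- c(0.5, 0.5)
tree  <- build_kdtree(pts)
res   <- nearest(tree, query)
print(res$point)
```

The pruning test in nearest is where the savings come from: whole subtrees are skipped when they cannot contain a closer point. Note that knn() from the class package, used below, does not require any of this.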
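KNN in R
library(class)
knn(train, test, cl, k)  # cl: class labels of the training rows
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 21)
A common rule of thumb for k: the square root of the number of training examples.
Encode character (categorical) variables as dummy variables, e.g. 0/1.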
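The two rules of thumb above can be sketched as follows. The training-set size of 469 matches the split used in the listing further down; the data frame, the column names and the choice to round down to an odd k (to reduce tied votes between two classes) are illustrative assumptions, not part of the original post.

```r
# Rule of thumb: k is roughly the square root of the number of training examples.
n_train <- 469
k <- round(sqrt(n_train))        # sqrt(469) is about 21.66, which rounds to 22
if (k %% 2 == 0) k <- k - 1      # assumption: prefer an odd k, giving 21
print(k)

# Dummy coding a two-level character variable as 0/1 (hypothetical data).
df <- data.frame(gender = c("male", "female", "female"), stringsAsFactors = FALSE)
df$male <- ifelse(df$gender == "male", 1, 0)

# For more than two levels, model.matrix() builds one 0/1 column per level.
color   <- factor(c("red", "green", "blue", "green"))
dummies <- model.matrix(~ color - 1)
print(dummies)
```

After encoding, the 0/1 columns are already on the same scale as min-max normalized numeric features, so they can be passed to knn() together.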
Python implementation
http://blog.csdn.net/q383700092/article/details/51757762
R version (calling library functions)
http://blog.csdn.net/q383700092/article/details/51759313
Simplified MapReduce implementation
http://blog.csdn.net/q383700092/article/details/51780865
Spark version
To be added later
rm(list = ls())

# import the CSV file
wbcd <- read.csv("wisc_bc_data.csv", stringsAsFactors = FALSE)

# examine the structure of the wbcd data frame
str(wbcd)

# drop the id feature
wbcd <- wbcd[-1]

# table of diagnosis
table(wbcd$diagnosis)

# recode diagnosis as a factor
wbcd$diagnosis <- factor(wbcd$diagnosis, levels = c("B", "M"),
                         labels = c("Benign", "Malignant"))

# table of proportions with more informative labels
round(prop.table(table(wbcd$diagnosis)) * 100, digits = 1)

# summarize three numeric features
summary(wbcd[c("radius_mean", "area_mean", "smoothness_mean")])

# create normalization function (min-max normalization)
normalize <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

# test normalization function - result should be identical
normalize(c(1, 2, 3, 4, 5))
normalize(c(10, 20, 30, 40, 50))

# normalize the wbcd data
# lapply() applies the function to every element of a list (here, every column)
wbcd_n <- as.data.frame(lapply(wbcd[2:31], normalize))

# confirm that normalization worked
summary(wbcd_n$area_mean)

# create training and test data
wbcd_train <- wbcd_n[1:469, ]
wbcd_test <- wbcd_n[470:569, ]

# create labels for training and test data
wbcd_train_labels <- wbcd[1:469, 1]
wbcd_test_labels <- wbcd[470:569, 1]

## Step 3: Training a model on the data ----

# load the "class" library
library(class)

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 21)

## Step 4: Evaluating model performance ----

# load the "gmodels" library
library(gmodels)

# Create the cross tabulation of predicted vs. actual
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)

## Step 5: Improving model performance ----

# use the scale() function to z-score standardize a data frame
wbcd_z <- as.data.frame(scale(wbcd[-1]))

# confirm that the transformation was applied correctly
summary(wbcd_z$area_mean)

# create training and test datasets
wbcd_train <- wbcd_z[1:469, ]
wbcd_test <- wbcd_z[470:569, ]

# re-classify test cases
wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test,
                      cl = wbcd_train_labels, k = 21)

# Create the cross tabulation of predicted vs. actual
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)

# try several different values of k (back on the min-max normalized data)
wbcd_train <- wbcd_n[1:469, ]
wbcd_test <- wbcd_n[470:569, ]

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 1)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 5)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 11)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 15)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 21)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)

wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 27)
CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)
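The repeated "try several values of k" step can also be written as a loop that reports one accuracy figure per k. The sketch below is an illustration rather than the original workflow: it assumes the built-in iris dataset as a stand-in for wisc_bc_data.csv (so it runs without the CSV file), and the 100/50 split and variable names are made up for the example.

```r
library(class)

# Min-max normalization, same formula as in the listing above.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

set.seed(42)                               # reproducible shuffle
idx <- sample(nrow(iris))                  # iris is sorted by species, so shuffle first
features <- as.data.frame(lapply(iris[idx, 1:4], normalize))
labels   <- iris$Species[idx]

train        <- features[1:100, ]
test         <- features[101:150, ]
train_labels <- labels[1:100]
test_labels  <- labels[101:150]

# Classify the test set once per k and record the share of correct predictions.
k_values <- c(1, 5, 11, 15, 21, 27)
accuracy <- sapply(k_values, function(k) {
  pred <- knn(train = train, test = test, cl = train_labels, k = k)
  mean(pred == test_labels)
})
names(accuracy) <- paste0("k=", k_values)
print(round(accuracy, 3))
```

A single accuracy number hides which kind of error was made, so for the breast cancer data the CrossTable() output remains the better view: false negatives and false positives carry very different costs there.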