
Machine Learning Notes c8: Principal Component Analysis (date format conversion, cast)

2015-04-14 19:15
Principles of principal component analysis

Reading the data

library('ggplot2')

# First code snippet
prices <- read.csv(file.path('data', 'stock_prices.csv'),
                   stringsAsFactors = FALSE)

prices[1, ]
# Date Stock Close
#1 2011-05-25 DTE 51.12


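Date format conversion

The ymd function from the lubridate package is used here to convert the Date column from character strings into date objects.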

# Second code snippet
library('lubridate')

prices <- transform(prices, Date = ymd(Date))

#Date Stock Close
#1 2011-05-25   DTE 51.12
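To make the effect of the conversion visible, here is a minimal sketch on a made-up two-row data frame. The name toy and its values are hypothetical and not part of the original data; the sketch assumes the lubridate package is installed.

```r
library('lubridate')

# Hypothetical toy data; the dates start out as character strings.
toy <- data.frame(Date = c('2011-05-25', '2011-05-26'),
                  Close = c(51.12, 51.30),
                  stringsAsFactors = FALSE)

class(toy$Date)   # character before the conversion

# Same call pattern as in the snippet above.
toy <- transform(toy, Date = ymd(Date))

class(toy$Date)   # Date after the conversion
diff(toy$Date)    # date arithmetic now works on the column
```

Once the column holds real dates, comparisons such as Date != ymd('2002-02-01') compare dates rather than strings.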


Reshaping the data with cast

# Third code snippet
library('reshape')

# First attempt at reshaping the long table into a Date x Stock matrix
date.stock.matrix <- cast(prices, Date ~ Stock, value = 'Close')

# Fourth code snippet
# Drop one date and one stock from the long table, then reshape again
prices <- subset(prices, Date != ymd('2002-02-01'))
prices <- subset(prices, Stock != 'DDR')

date.stock.matrix <- cast(prices, Date ~ Stock, value = 'Close')


After reshaping, the data looks like this:

> date.stock.matrix[1,]
        Date  ADC   AFL ARKR AZPN CLFD   DTE  ENDP  FLWS    FR GMXR   GPC
1 2002-01-02 17.7 23.78 8.15 17.1 3.19 42.37 11.54 15.77 31.16  4.5 36.09
     HE ISSC  ISSI   KSS  MTSC  NWN ODFL PARL RELV SIGM   STT TRIB   UTR
1 40.41 7.82 12.78 70.23 10.03 26.2 13.4 1.92  1.3 1.75 52.11  1.5 39.34


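When calling cast, the left side of the tilde specifies which columns of the source data become the rows of the output matrix, and the right side specifies which columns become its columns. The value argument names the column whose entries fill the cells.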
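The formula is easier to follow on a tiny example. The sketch below uses made-up data (long, wide, and the stock names AAA and BBB are hypothetical) and assumes the reshape package is installed.

```r
library('reshape')

# Hypothetical long-format data: one row per (Date, Stock) pair.
long <- data.frame(Date  = c('2011-05-25', '2011-05-25',
                             '2011-05-26', '2011-05-26'),
                   Stock = c('AAA', 'BBB', 'AAA', 'BBB'),
                   Close = c(10, 20, 11, 21),
                   stringsAsFactors = FALSE)

# Date values become rows, Stock values become columns,
# and the Close column fills the cells.
wide <- cast(long, Date ~ Stock, value = 'Close')

wide
```

The result has one row per date and one column per stock, which is the layout princomp expects once the Date column is dropped.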

PCA

> pca <- princomp(date.stock.matrix[, 2:ncol(date.stock.matrix)])
> pca
Call:
princomp(x = date.stock.matrix[, 2:ncol(date.stock.matrix)])

Standard deviations:
    Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6
29.1001249 20.4403404 12.6726924 11.4636450  8.4963820  8.1969345
    Comp.7     Comp.8     Comp.9    Comp.10    Comp.11    Comp.12
 5.5438308  5.1300931  4.7786752  4.2575099  3.3050931  2.6197715
   Comp.13    Comp.14    Comp.15    Comp.16    Comp.17    Comp.18
 2.4986181  2.1746125  1.9469475  1.8706240  1.6984043  1.6344116
   Comp.19    Comp.20    Comp.21    Comp.22    Comp.23    Comp.24
 1.2327471  1.1280913  0.9877634  0.8583681  0.7390626  0.4347983

24  variables and  2366 observations.
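The standard deviations are easier to interpret as shares of the total variance. As a rough sketch, the values printed above can be copied into a vector and squared (sdev, variance and proportion are names chosen here for illustration). With the actual object, summary(pca) reports the same proportions.

```r
# Standard deviations copied from the princomp output above.
sdev <- c(29.1001249, 20.4403404, 12.6726924, 11.4636450, 8.4963820, 8.1969345,
          5.5438308, 5.1300931, 4.7786752, 4.2575099, 3.3050931, 2.6197715,
          2.4986181, 2.1746125, 1.9469475, 1.8706240, 1.6984043, 1.6344116,
          1.2327471, 1.1280913, 0.9877634, 0.8583681, 0.7390626, 0.4347983)

# The variance of each component is its squared standard deviation.
variance <- sdev^2

# Share of the total variance captured by each component.
proportion <- variance / sum(variance)

round(proportion[1:3], 3)           # shares of the first three components
round(cumsum(proportion)[1:3], 3)   # cumulative shares
```

With these numbers the first component accounts for roughly 46% of the total variance, which is why summarizing the 24 stocks by a single column is a plausible simplification.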


Inspect the loadings of the first component, then use them to summarize the data in a single column

# Eighth code snippet
principal.component <- pca$loadings[, 1]

# Ninth code snippet
loadings <- as.numeric(principal.component)

ggplot(data.frame(Loading = loadings),
       aes(x = Loading, fill = 1)) +
  geom_density() +
  theme(legend.position = 'none')

# Tenth code snippet
market.index <- predict(pca)[, 1]
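What predict(pca)[, 1] returns can be checked by hand: each row is centered and then combined into a weighted sum using the first column of loadings. The sketch below verifies this on randomly generated data, not on the stock prices; x, toy.pca, scores and by.hand are hypothetical names, and it assumes princomp's default setting (covariance matrix, no rescaling).

```r
set.seed(1)

# Hypothetical data: 50 observations of 3 correlated variables.
base <- rnorm(50)
x <- cbind(a = base + rnorm(50, sd = 0.3),
           b = 2 * base + rnorm(50, sd = 0.3),
           c = -base + rnorm(50, sd = 0.3))

toy.pca <- princomp(x)

# Scores for the first component as returned by predict().
scores <- predict(toy.pca)[, 1]

# The same scores computed by hand: subtract each column's mean,
# then take the weighted sum given by the first column of loadings.
centered <- sweep(x, 2, toy.pca$center)
by.hand <- as.numeric(centered %*% toy.pca$loadings[, 1])

all.equal(as.numeric(scores), by.hand)
```

In the stock example this means market.index is one number per trading day, a loading-weighted combination of that day's 24 centered closing prices.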




Comparison with the Dow Jones index (code omitted)
