
[R Study Notes] An Unexpected Discovery About Extracting Values from Model Objects

2017-04-15 11:49
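When I fitted regression models and ran tests, getting a specific value out of the result object always felt like trial and error. For example, after running a t-test I wanted its degrees of freedom. Here is an arbitrary example using the mtcars dataset: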

a<-t.test(mtcars$vs,mtcars$cyl)


The result is:

Welch Two Sample t-test

data:  mtcars$vs and mtcars$cyl
t = -17.528, df = 35.907, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.415358 -5.084642
sample estimates:
mean of x mean of y
0.4375    6.1875


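The output does include df = 35.907, but I can't copy it out by hand every time. My old habit was to guess from the printed label and type a$df. That returns NULL, because the value is actually stored in the parameter element: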

a$parameter
df
35.90693
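The sketch below uses the same t-test to show the difference. `$` silently returns NULL for a name that does not exist. `[[` extracts the element without its name attribute, so you get a plain number. The variable name `df_value` is only illustrative.

```r
# Welch t-test from the example above (mtcars ships with base R)
a <- t.test(mtcars$vs, mtcars$cyl)

# There is no component called "df", so `$` quietly returns NULL
a$df

# The degrees of freedom live in the named numeric vector `parameter`
a$parameter

# `[[` drops the "df" name and returns a bare number
df_value <- a$parameter[["df"]]
df_value
```

Using `[[` is convenient when the value will be written to a table or pasted into text, where the leftover name would only get in the way.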


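So here is the question: how do I know which value is stored where?

This is where the str function turned out to be unexpectedly useful.

For example, applying str to the t-test result a from above: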

str(a)
List of 9
$ statistic  : Named num -17.5
..- attr(*, "names")= chr "t"
$ parameter  : Named num 35.9
..- attr(*, "names")= chr "df"
$ p.value    : num 3.5e-19
$ conf.int   : atomic [1:2] -6.42 -5.08
..- attr(*, "conf.level")= num 0.95
$ estimate   : Named num [1:2] 0.438 6.188
..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
$ null.value : Named num 0
..- attr(*, "names")= chr "difference in means"
$ alternative: chr "two.sided"
$ method     : chr "Welch Two Sample t-test"
$ data.name  : chr "mtcars$vs and mtcars$cyl"
- attr(*, "class")= chr "htest"
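When you only need the component names, `names()` gives a shorter answer than `str()`. The exact list can vary between R versions, so check it on your own installation. The sketch below also reads the attribute that `str()` revealed on `conf.int`.

```r
a <- t.test(mtcars$vs, mtcars$cyl)

# names() lists the top-level components without the full str() dump
names(a)

# The confidence interval is a length-2 numeric vector...
a$conf.int[1:2]

# ...and its confidence level is stored as an attribute, as str() showed
attr(a$conf.int, "conf.level")
```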


Each result is stored in a named element that can be accessed with $. For example, to extract the p-value, just type:

a$p.value
[1] 3.500725e-19


This gives the p-value, 3.500725e-19.
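Knowing that the p-value sits in `$p.value` is what makes automation possible. The sketch below runs one t-test per variable and collects only the p-values. The choice of variables is hypothetical. In mtcars, `am` is 0 for automatic and 1 for manual transmissions.

```r
# Hypothetical variables to compare between automatic and manual cars
vars <- c("mpg", "hp", "wt")

# One Welch t-test per variable; keep only the p-value of each
p_values <- sapply(vars, function(v) {
  x <- mtcars[[v]]
  t.test(x[mtcars$am == 0], x[mtcars$am == 1])$p.value
})

p_values   # named numeric vector, one p-value per variable
```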

The same approach works for an analysis of variance (ANOVA), again using the mtcars dataset:

fit.a<-aov(mpg~am,data=mtcars)
summary(fit.a)
Df Sum Sq Mean Sq F value   Pr(>F)
am           1  405.2   405.2   16.86 0.000285 ***
Residuals   30  720.9    24.0
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
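The printed table can also be read programmatically. The sketch below assumes, as in current R versions, that `summary()` of an `aov` fit without an `Error()` term returns a list holding one ANOVA data frame. Its row names may carry padding spaces, so the sketch indexes rows by position rather than by name.

```r
fit.a <- aov(mpg ~ am, data = mtcars)

# summary() of an aov object: a list whose first element is the ANOVA table
tab <- summary(fit.a)[[1]]
colnames(tab)

# Row 1 is the "am" term; pull its F statistic and p-value by column name
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
c(F = f_value, p = p_value)
```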


Now let's see which components the ANOVA object contains:

str(fit.a)
List of 12
$ coefficients : Named num [1:2] 17.15 7.24
..- attr(*, "names")= chr [1:2] "(Intercept)" "am"
$ residuals    : Named num [1:32] -3.39 -3.39 -1.59 4.25 1.55 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ effects      : Named num [1:32] -113.65 -20.13 -0.64 4.33 1.63 ...
..- attr(*, "names")= chr [1:32] "(Intercept)" "am" "" "" ...
$ rank         : int 2
$ fitted.values: Named num [1:32] 24.4 24.4 24.4 17.1 17.1 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ assign       : int [1:2] 0 1
$ qr           :List of 5
..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
.. .. ..$ : chr [1:2] "(Intercept)" "am"
.. ..- attr(*, "assign")= int [1:2] 0 1
..$ qraux: num [1:2] 1.18 1.18
..$ pivot: int [1:2] 1 2
..$ tol  : num 1e-07
..$ rank : int 2
..- attr(*, "class")= chr "qr"
$ df.residual  : int 30
$ xlevels      : Named list()
$ call         : language aov(formula = mpg ~ am, data = mtcars)
$ terms        :Classes 'terms', 'formula'  language mpg ~ am
.. ..- attr(*, "variables")= language list(mpg, am)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "mpg" "am"
.. .. .. ..$ : chr "am"
.. ..- attr(*, "term.labels")= chr "am"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(mpg, am)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "mpg" "am"
$ model        :'data.frame': 32 obs. of  2 variables:
..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ am : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..- attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ am
.. .. ..- attr(*, "variables")= language list(mpg, am)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "mpg" "am"
.. .. .. .. ..$ : chr "am"
.. .. ..- attr(*, "term.labels")= chr "am"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(mpg, am)
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "am"
- attr(*, "class")= chr [1:2] "aov" "lm"


The object has 12 components. For example, to look at the regression coefficients:

fit.a$coefficients
(Intercept)          am
17.147368    7.244939
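Base R also provides generic extractor functions such as `coef()`, `residuals()` and `df.residual()`. These work across many model classes. The sketch below checks that, for this `aov` fit, they return the same values as the `$` components found with `str()`.

```r
fit.a <- aov(mpg ~ am, data = mtcars)

# Generic extractors versus the raw list components
coef(fit.a)              # same values as fit.a$coefficients
df.residual(fit.a)       # same value as fit.a$df.residual
head(residuals(fit.a))   # same values as fit.a$residuals
```

The extractors are the more portable choice. A model class may store a quantity under a different component name, but its methods for `coef()` and friends still return it.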


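The extracted coefficients are ordinary numbers, so you can compute with them directly. For example, multiplying the intercept by 5: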

fit.a$coefficients[1]*5
(Intercept)
85.73684
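The sketch below goes a step further. It uses the coefficients to compute the model's predicted mean mpg for manual cars (`am = 1`) and compares the result with `predict()`. It also shows that `[[` returns an unnamed number, while `[` keeps the "(Intercept)" label. The names `b` and `manual_mpg` are illustrative.

```r
fit.a <- aov(mpg ~ am, data = mtcars)
b <- coef(fit.a)

# `[[` returns a bare number; `[` would keep the "(Intercept)" name
b[["(Intercept)"]] * 5

# Predicted mean mpg for manual cars (am = 1): intercept + slope * 1
manual_mpg <- b[["(Intercept)"]] + b[["am"]] * 1
manual_mpg

# The same value from predict(), for comparison
predict(fit.a, newdata = data.frame(am = 1))
```

With a single 0/1 predictor, the intercept is the mean mpg of the `am = 0` group and the slope is the difference between the two group means.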


Similarly, let's fit a logistic regression as a generalized linear model (here with the quasibinomial family):

fit.b<-glm(am~mpg+gear,data=mtcars,family=quasibinomial())
summary(fit.b)


The output is:

Call:
glm(formula = am ~ mpg + gear, family = quasibinomial(), data = mtcars)

Deviance Residuals:
Min        1Q    Median        3Q       Max
-1.68311  -0.00003  -0.00002   0.04042   1.17990

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  -88.2992  7928.1434  -0.011   0.9912
mpg            0.3366     0.1403   2.399   0.0231 *
gear          20.3062  1982.0355   0.010   0.9919
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 0.3263161)

Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 11.659  on 29  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 19
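The `summary()` result is itself a list, so `str()` and `$` work on it too. The sketch below pulls the dispersion and the `mpg` row of the coefficient table. It assumes the column names printed above ("Std. Error", "Pr(>|t|)"), which is what a quasi family produces.

```r
fit.b <- glm(am ~ mpg + gear, data = mtcars, family = quasibinomial())
s <- summary(fit.b)

# The dispersion shown in the printed summary is one list element
s$dispersion

# The coefficient table is a matrix with named rows and columns
s$coefficients["mpg", ]
s$coefficients["mpg", "Std. Error"]
```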


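mpg is marginally significant (p = 0.0231), so I want to extract its coefficient. The structure of a GLM object turns out to be considerably more complex: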

str(fit.b)
List of 30
$ coefficients     : Named num [1:3] -88.299 0.337 20.306
..- attr(*, "names")= chr [1:3] "(Intercept)" "mpg" "gear"
$ residuals        : Named num [1:32] 2.01 2.01 1.55 -1 -1 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ fitted.values    : Named num [1:32] 4.99e-01 4.99e-01 6.46e-01 1.73e-09 6.96e-10 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ effects          : Named num [1:32] -0.44841 1.37014 -0.00585 0.13687 0.08687 ...
..- attr(*, "names")= chr [1:32] "(Intercept)" "mpg" "gear" "" ...
$ R                : num [1:3, 1:3] -1.41 0 0 -30.88 4.07 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "(Intercept)" "mpg" "gear"
.. ..$ : chr [1:3] "(Intercept)" "mpg" "gear"
$ rank             : int 3
$ qr               :List of 5
..$ qr   : num [1:32, 1:3] -1.41 3.56e-01 3.40e-01 4.87e-05 3.09e-05 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
.. .. ..$ : chr [1:3] "(Intercept)" "mpg" "gear"
..$ rank : int 3
..$ qraux: num [1:3] 1.36 1.09 1
..$ pivot: int [1:3] 1 2 3
..$ tol  : num 1e-11
..- attr(*, "class")= chr "qr"
$ family           :List of 11
..$ family    : chr "quasibinomial"
..$ link      : chr "logit"
..$ linkfun   :function (mu)
..$ linkinv   :function (eta)
..$ variance  :function (mu)
..$ dev.resids:function (y, mu, wt)
..$ aic       :function (y, n, mu, wt, dev)
..$ mu.eta    :function (eta)
..$ initialize:  expression({     if (NCOL(y) == 1) {         if (is.factor(y))              y <- y != levels(y)[1L]         n <- rep.int(1, nobs)         if (any(y < 0 | y > 1))              stop("y values must be 0 <= y <= 1")         mustart <- (weights * y + 0.5)/(weights + 1)     }     else if (NCOL(y) == 2) {         n <- y[, 1] + y[, 2]         y <- ifelse(n == 0, 0, y[, 1]/n)         weights <- weights * n         mustart <- (n * y + 0.5)/(n + 1)     }     else stop("for the 'quasibinomial' family, y must be a vector of 0 and 1's\nor a 2 column matrix where col 1 is no. successes and col 2 is no. failures") })
..$ validmu   :function (mu)
..$ valideta  :function (eta)
..- attr(*, "class")= chr "family"
$ linear.predictors: Named num [1:32] -0.00586 -0.00586 0.60003 -20.1774 -21.08622 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ deviance         : num 11.7
$ aic              : num NA
$ null.deviance    : num 43.2
$ iter             : int 19
$ weights          : Named num [1:32] 2.50e-01 2.50e-01 2.29e-01 4.69e-09 1.89e-09 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ prior.weights    : Named num [1:32] 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ df.residual      : int 29
$ df.null          : int 31
$ y                : Named num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
$ converged        : logi TRUE
$ boundary         : logi FALSE
$ model            :'data.frame': 32 obs. of  3 variables:
..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..- attr(*, "terms")=Classes 'terms', 'formula'  language am ~ mpg + gear
.. .. ..- attr(*, "variables")= language list(am, mpg, gear)
.. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:3] "am" "mpg" "gear"
.. .. .. .. ..$ : chr [1:2] "mpg" "gear"
.. .. ..- attr(*, "term.labels")= chr [1:2] "mpg" "gear"
.. .. ..- attr(*, "order")= int [1:2] 1 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(am, mpg, gear)
.. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:3] "am" "mpg" "gear"
$ call             : language glm(formula = am ~ mpg + gear, family = quasibinomial(), data = mtcars)
$ formula          :Class 'formula'  language am ~ mpg + gear
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
$ terms            :Classes 'terms', 'formula'  language am ~ mpg + gear
.. ..- attr(*, "variables")= language list(am, mpg, gear)
.. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:3] "am" "mpg" "gear"
.. .. .. ..$ : chr [1:2] "mpg" "gear"
.. ..- attr(*, "term.labels")= chr [1:2] "mpg" "gear"
.. ..- attr(*, "order")= int [1:2] 1 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(am, mpg, gear)
.. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:3] "am" "mpg" "gear"
$ data             :'data.frame': 32 obs. of  11 variables:
..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
..$ disp: num [1:32] 160 160 108 258 360 ...
..$ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
..$ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
..$ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
$ offset           : NULL
$ control          :List of 3
..$ epsilon: num 1e-08
..$ maxit  : num 25
..$ trace  : logi FALSE
$ method           : chr "glm.fit"
$ contrasts        : NULL
$ xlevels          : Named list()
- attr(*, "class")= chr [1:2] "glm" "lm"
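The components listed by `str()` can be combined into derived quantities. This sketch rebuilds the dispersion estimate from `weights`, `residuals` and `df.residual` as a Pearson-type estimate. It is our reading of how `summary.glm` handles quasi families, not a documented guarantee, which is why the result is checked against `summary()`.

```r
fit.b <- glm(am ~ mpg + gear, data = mtcars, family = quasibinomial())

# Components found with str(): working weights, working residuals, residual df
w <- fit.b$weights
r <- fit.b$residuals

# Pearson-type dispersion estimate, using only observations with positive weight
disp_manual <- sum((w * r^2)[w > 0]) / fit.b$df.residual
disp_manual

# Compare with the value reported by summary()
summary(fit.b)$dispersion
```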


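The coefficients are stored in the coefficients element. This returns all three, and the significant one (mpg) can then be picked out: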

fit.b$coefficients
(Intercept)         mpg        gear
-88.2992383   0.3366025  20.3061829
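To keep only the significant coefficients, combine the estimates with the p-value column of the summary table. The 0.05 cutoff and the names `ct`, `est` and `sig` are illustrative. Subsetting the named `est` vector keeps the coefficient names, even when only one survives.

```r
fit.b <- glm(am ~ mpg + gear, data = mtcars, family = quasibinomial())

# A single coefficient by name
fit.b$coefficients["mpg"]

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
ct <- coef(summary(fit.b))
est <- ct[, "Estimate"]

# Keep only estimates whose p-value is below 0.05
sig <- est[ct[, "Pr(>|t|)"] < 0.05]
sig
```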


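To sum up: str and summary work well together whenever we run a test, an analysis or a regression, and likewise for principal component analysis, clustering and so on. summary shows the results, and str reveals where each value is stored, so the next step can be automated instead of copied by hand.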
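For large objects like the GLM above, the full `str()` dump is long. The `max.level` and `give.attr` arguments of `str()` give a much shorter top-level overview. The sketch below compares the two lengths with `capture.output()`.

```r
fit.b <- glm(am ~ mpg + gear, data = mtcars, family = quasibinomial())

# Full str() output: every nested element and attribute
full <- capture.output(str(fit.b))

# Top level only, without attributes
short <- capture.output(str(fit.b, max.level = 1, give.attr = FALSE))
cat(short, sep = "\n")

c(full = length(full), short = length(short))
```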