dplyr do: Some Tips for Using and Programming
2016-07-08 10:21
This post was originally published on the Quantide blog. Read the full article here.
If you want to compute arbitrary operations on a data frame that return more than one number, use dplyr do()!
This post explores some basic concepts of do() and gives some advice on using it and programming with it.
do() is a verb (function) of dplyr. dplyr is a powerful R package for data manipulation, written and maintained by Hadley Wickham. It lets you perform the common data manipulation tasks on data frames, such as filtering rows, selecting specific columns, re-ordering rows, adding new columns, summarizing data and computing arbitrary operations.
First of all, you have to install the dplyr package:

    install.packages("dplyr")

and load it:

    library(dplyr)
We will analyze the use of do() with the following dataset, created with random data:

    set.seed(100)
    ds <- data.frame(group = c(rep("a", 100), rep("b", 100), rep("c", 100)),
                     x = rnorm(n = 300, mean = 3, sd = 2),
                     y = rnorm(n = 300, mean = 2, sd = 2))
We first convert it into a tbl_df object to get a better print method. The data themselves are not changed.

    ds <- tbl_df(ds)
    ds

    Source: local data frame [300 x 3]

        group        x           y
       (fctr)    (dbl)       (dbl)
    1       a 1.995615 -1.71089045
    2       a 3.263062 -0.03712943
    3       a 2.842166 -0.09022217
    4       a 4.773570  0.69742469
    5       a 3.233943  2.76536531
    6       a 3.637260  4.06379942
    7       a 1.836419  2.26214995
    8       a 4.429065  2.75438347
    9       a 1.349481 -1.77539016
    10      a 2.280276  3.04043881
    ..    ...      ...         ...
Base Concepts of do() (Non Standard Evaluation Version)
As we already said, do() computes arbitrary operations on a data frame that return more than one number.
To use do(), you must know that:
- it always returns a data frame
- unlike the other data manipulation verbs of dplyr, do() needs the . placeholder inside the function to apply, referring to the data it has to work with:

    # Head of ds
    ds %>% do(head(.))

    Source: local data frame [6 x 3]

       group        x           y
      (fctr)    (dbl)       (dbl)
    1      a 1.995615 -1.71089045
    2      a 3.263062 -0.03712943
    3      a 2.842166 -0.09022217
    4      a 4.773570  0.69742469
    5      a 3.233943  2.76536531
    6      a 3.637260  4.06379942
- it is designed to be used with dplyr group_by() to compute operations within groups:

    # Head of ds by group
    ds %>%
      group_by(group) %>%
      do(head(.))

    Source: local data frame [18 x 3]
    Groups: group [3]

        group          x           y
       (fctr)      (dbl)       (dbl)
    1       a 1.99561530 -1.71089045
    2       a 3.26306233 -0.03712943
    3       a 2.84216582 -0.09022217
    4       a 4.77356962  0.69742469
    5       a 3.23394254  2.76536531
    6       a 3.63726018  4.06379942
    7       b 2.33415330 -0.56965729
    8       b 5.72622741  1.71643653
    9       b 2.06170532  4.87756954
    10      b 4.68575126 -0.08011508
    11      b 0.08401255 -0.04767590
    12      b 2.19938816  4.18954758
    13      c 3.05634353 -0.89257491
    14      c 2.28659319  2.63171152
    15      c 4.70525275  1.31450497
    16      c 4.02673050 -1.86270620
    17      c 5.03640599  2.48564201
    18      c 0.95704183  1.27446410
- the argument of do() can be named or unnamed:
  - named arguments (more than one can be supplied) become list-columns, with one element for each group:

    # Tail (last 3 obs) of x by group
    ds %>%
      group_by(group) %>%
      do(out = tail(.$x, 3))

    Source: local data frame [3 x 2]
    Groups: <by row>

       group      out
      (fctr)    (chr)
    1      a <dbl[3]>
    2      b <dbl[3]>
    3      c <dbl[3]>
  - an unnamed argument (only one can be supplied) must be a data frame, and the group labels will be duplicated accordingly:

    # Tail (last 3 obs) of x by group
    ds %>%
      group_by(group) %>%
      do(data.frame(out = tail(.$x, 3)))

    Source: local data frame [9 x 2]
    Groups: group [3]

       group       out
      (fctr)     (dbl)
    1      a 3.8270397
    2      a 0.6426337
    3      a 0.6519305
    4      b 3.3238824
    5      b 0.8290942
    6      b 4.1538746
    7      c 6.5861213
    8      c 4.6280643
    9      c 0.3599512
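To make the named case more concrete, here is a minimal sketch that supplies two named arguments at once and then reads the resulting list-columns back. The column names fit and n_obs are hypothetical and chosen only for illustration. The dataset is rebuilt so the block runs on its own.

```r
library(dplyr)

# Same random dataset as in the post
set.seed(100)
ds <- data.frame(group = c(rep("a", 100), rep("b", 100), rep("c", 100)),
                 x = rnorm(n = 300, mean = 3, sd = 2),
                 y = rnorm(n = 300, mean = 2, sd = 2))

# Two named arguments: each becomes a list-column with one element per group
models <- ds %>%
  group_by(group) %>%
  do(fit = lm(y ~ x, data = .), n_obs = nrow(.))

# A list-column is an ordinary list, so base R tools can unpack it
slopes <- sapply(models$fit, function(m) coef(m)[["x"]])
n_obs  <- unlist(models$n_obs)
```

Each element of models$fit is a full lm object, which is the kind of result that cannot be stored in an ordinary atomic column.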
do() works the same way with user-defined functions.
Let us define the following function, which performs two simple operations and returns a data frame:

    my_fun <- function(x, y) {
      res_x <- mean(x) + 2
      res_y <- mean(y) * 5
      return(data.frame(res_x, res_y))
    }
If the argument is named, the result is:

    # Apply my_fun() function to ds by group
    ds %>%
      group_by(group) %>%
      do(out = my_fun(x = .$x, y = .$y))

    Source: local data frame [3 x 2]
    Groups: <by row>

       group                out
      (fctr)              (chr)
    1      a <data.frame [1,2]>
    2      b <data.frame [1,2]>
    3      c <data.frame [1,2]>
Otherwise, if the argument is unnamed, the result is:

    # Apply my_fun() function to ds by group
    ds %>%
      group_by(group) %>%
      do(my_fun(x = .$x, y = .$y))

    Source: local data frame [3 x 3]
    Groups: group [3]

       group    res_x     res_y
      (fctr)    (dbl)     (dbl)
    1      a 5.005825  9.167546
    2      b 5.022282  8.683619
    3      c 5.025586 11.240558
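my_fun() returns one row per group, but a function passed unnamed to do() may return several rows. The sketch below assumes a hypothetical helper, quartiles(), which is not part of the original post. It returns three rows per group, and the group label is repeated for each of them.

```r
library(dplyr)

# Same random dataset as in the post
set.seed(100)
ds <- data.frame(group = c(rep("a", 100), rep("b", 100), rep("c", 100)),
                 x = rnorm(n = 300, mean = 3, sd = 2),
                 y = rnorm(n = 300, mean = 2, sd = 2))

# Hypothetical helper: one row per requested probability
quartiles <- function(v, probs = c(0.25, 0.5, 0.75)) {
  data.frame(prob = probs, value = unname(quantile(v, probs)))
}

# Unnamed argument: the returned data frames are stacked, 3 rows per group
res <- ds %>%
  group_by(group) %>%
  do(quartiles(.$x))
```

The result has nine rows (three groups times three probabilities) and the columns group, prob and value.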
Programming with do_() (Standard Evaluation Version)
How can we enclose the previous operations inside a function? Simple! Use do_() (the SE version of do()) together with the interp() function of the lazyeval package.
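The do_() and interp() approach itself is covered in the full article and is not reproduced here. As a rough alternative sketch, one way to wrap the earlier operations in a function is to pass column names as strings and index the . placeholder with [[ ]]. The wrapper name apply_my_fun() and its parameters are hypothetical, and the grouping step assumes dplyr 1.0 or later for across() and all_of().

```r
library(dplyr)

# Same random dataset and function as in the post
set.seed(100)
ds <- data.frame(group = c(rep("a", 100), rep("b", 100), rep("c", 100)),
                 x = rnorm(n = 300, mean = 3, sd = 2),
                 y = rnorm(n = 300, mean = 2, sd = 2))

my_fun <- function(x, y) {
  res_x <- mean(x) + 2
  res_y <- mean(y) * 5
  data.frame(res_x, res_y)
}

# Hypothetical wrapper: column names arrive as character strings.
# Assumes dplyr >= 1.0 for across() and all_of().
apply_my_fun <- function(data, group_col, x_col, y_col) {
  data %>%
    group_by(across(all_of(group_col))) %>%
    do(my_fun(x = .[[x_col]], y = .[[y_col]]))
}

out <- apply_my_fun(ds, "group", "x", "y")
```

Inside do(), . is the data for the current group, while x_col and y_col are ordinary variables looked up in the wrapper's environment, so no expression has to be built by hand.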
Continue reading on Quantide blog…
The post dplyr do: Some Tips for Using and Programming appeared first on MilanoR.