Tips on non-standard evaluation in R
2016-05-24 08:36
One of my favorite features of R is its meta-programming facilities. They can be demonstrated with a few simple examples.
An ordinary use of R is to do statistical computing. We can evaluate something like
sin(0)
[1] 0
Meta-programming in R allows users to manipulate the expression to be evaluated. We can use quote() to create an object that represents a function call.
quote(sin(0))
sin(0)
In this way, sin(0) is not evaluated but parsed as a call object, which can basically be represented as a list of the function name and its arguments.
as.list(quote(sin(0)))
[[1]]
sin
[[2]]
[1] 0
Now we can manipulate the call object to alter the expression before it is evaluated.
expr <- quote(sin(0))
expr[[1L]] <- quote(cos)
expr
cos(0)
Now we can see that the expression is modified. This feature, as stated in the official documentation, is computing on the language: R is able to compute not only on literal values but also on the language itself. What can we do with the modified expression? We can evaluate it using eval(), just as if we had typed it in the console.
eval(expr)
[1] 1
Meta-programming requires language objects and a meta-function that evaluates them. In R, a call object represents a function call like sin(x); a name (or symbol) represents a variable like x; numeric, character and other atomic values represent literals like 1 and "a"; and finally eval() evaluates such a language object in a specific context.
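The kinds of language objects listed above can be checked directly. The following is a minimal sketch in base R; the variable names are only illustrative.

```r
# Inspect the kinds of objects that quote() returns
call_obj <- quote(sin(x))   # a function call
sym_obj  <- quote(x)        # a symbol (name)
num_obj  <- quote(1)        # a literal numeric value
chr_obj  <- quote("a")      # a literal character value

is.call(call_obj)       # TRUE
is.name(sym_obj)        # TRUE
is.numeric(num_obj)     # TRUE
is.character(chr_obj)   # TRUE

# A symbol can also be built from a string
identical(as.name("x"), sym_obj)   # TRUE
```

Note that quoting a literal simply returns the literal itself, so only calls and symbols need eval() to obtain a value.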
The evaluation context matters when we evaluate an expression containing symbols that are not self-contained. Consider the following example.
expr <- quote(sin(x))
eval(expr)
Error in eval(expr, envir, enclos): object 'x' not found
x is not found because, in the evaluation environment, no value is assigned to the symbol x. If we assign some value to x now,
x <- 0
eval(expr)
[1] 0
and evaluate the expression again, x can be found and the value can be successfully calculated. More specifically, expr is evaluated in the global environment. Once x is given a value in this environment, the expression can be evaluated.
We can create our own environment in which to evaluate an expression using new.env().
env <- new.env()
env$y <- 0
expr <- quote(cos(y))
expr
cos(y)
If we evaluate cos(y) directly in the global environment, it should produce an error.
eval(expr)
Error in eval(expr, envir, enclos): object 'y' not found
If we evaluate it in env, where y is properly defined, then it should produce the right result.
eval(expr, env)
[1] 1
We can also create two environments, one being the parent of the other, and evaluate the expression.
env1 <- new.env()
env1$x <- 1
env2 <- new.env(parent = env1)
env2$y <- 2
Now we have two environments, env1 and env2. The parent of env2 is env1, and the parent of env1 is the global environment. This means that when eval() evaluates an expression in the context of env2 and encounters a symbol that is not defined there, it looks in env1, then the global environment, the package environments, and finally the base environment. Now we evaluate a simple arithmetic expression containing symbols in these two environments.
expr <- quote(x + y)
eval(expr, env1)
Error in eval(expr, envir, enclos): object 'y' not found
eval(expr, env2)
[1] 3
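The parent relationship behind this result can be inspected directly. This is a small sketch using only base R functions (parent.env() and exists()), with the environments rebuilt so that it runs on its own.

```r
env1 <- new.env()
env1$x <- 1
env2 <- new.env(parent = env1)
env2$y <- 2

# env2 points to env1 as its parent
identical(parent.env(env2), env1)

# x is not defined in env2 itself ...
exists("x", envir = env2, inherits = FALSE)

# ... but it is found once the search continues into the parents
exists("x", envir = env2)
```

The first and third checks return TRUE and the second returns FALSE, which is why x + y can be evaluated in env2 but not in env1.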
In fact, eval(expr, envir, enclos) basically follows this logic to evaluate a quoted expression:
- If envir is an environment, expr is evaluated in envir, and each symbol is looked up in envir and then in its parent environments until it is found.
- If envir is a list, expr is evaluated with the symbols defined in the list; whenever a symbol is not found in the list, the lookup continues in the enclos environment and along its chain of parents until it is found.
If a symbol has still not been found when the empty environment (the only environment with no parent) is reached, an error occurs.
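The list case can be tried out with a short sketch. The environment encl and the symbol zzz_undefined are made up for illustration, and the latter is assumed not to exist anywhere on the search path.

```r
encl <- new.env()
encl$b <- 10

# a comes from the list, b is looked up in the enclosing environment
r1 <- eval(quote(a + b), list(a = 1), encl)

# when both define a, the value in the list takes precedence
encl$a <- 100
r2 <- eval(quote(a + b), list(a = 1), encl)

# a symbol found nowhere along the chain raises an error,
# captured here as a condition object instead of stopping the script
r3 <- tryCatch(
  eval(quote(a + zzz_undefined), list(a = 1), encl),
  error = function(e) e
)

r1                      # 11
r2                      # 11
inherits(r3, "error")   # TRUE
```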
This logic has some notable "strange" behaviors. For example,
env3 <- new.env()
eval(quote(x <- 1), env3)
ls.str(env3)
x : num 1
The assignment works as expected. However, if we supply a named list of values to serve the evaluation, and specify env3 as the enclosing environment, then the assignment does not work as some might expect.
eval(quote(y <- p), list(p = 1), env3)
ls.str(env3)
x : num 1
This is understandable because list(p = 1) provides a set of symbols that are given values. If a symbol in expr is not defined in the list, the lookup goes to the enclosing environment and its parents to see whether the symbol exists. The assignment itself takes place in a temporary environment that eval() builds from the list and that is discarded afterwards. Therefore the assignment does not happen in env3 at all. Only symbol lookup happens there.
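To see where the assigned value ends up, the temporary environment can be built by hand. The sketch below uses list2env() to approximate what eval() does with a list; it is an illustration under that assumption, not a description of the internal implementation.

```r
env3 <- new.env()

# Build the evaluation environment explicitly: p lives here, env3 is its parent
tmp <- list2env(list(p = 1), parent = env3)
eval(quote(y <- p), tmp)

in_tmp  <- exists("y", envir = tmp,  inherits = FALSE)   # TRUE
in_env3 <- exists("y", envir = env3, inherits = FALSE)   # FALSE

# To keep the result in env3, evaluate the right-hand side
# with the list and assign the value explicitly
assign("y", eval(quote(p), list(p = 1), env3), envir = env3)
get("y", envir = env3)   # 1
```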
Meta-programming allows a function to interpret its arguments in its own way. For example, we can write a slice() function that performs easy subsetting of a vector using non-standard evaluation.
slice <- function(x, s) {
  s <- substitute(s)
  x[eval(s, list(. = length(x)))]
}
substitute(s) prevents s from being evaluated and instead returns the expression that was supplied for it. This gives us a call or a name that we can manipulate. slice() does nothing special except evaluate the argument s in a non-standard way: s is evaluated with a specially defined symbol, the dot, whose value is the length of x. Therefore we can use it to easily slice a vector like
slice(1:10, 1:(.-3))
[1] 1 2 3 4 5 6 7
slice(1:10, c(1, .))
[1] 1 10
However, slice() does not work correctly in the following example:
local({
  p <- 3
  slice(1:10, c(1,.,p))
})
Error in eval(expr, envir, enclos): object 'p' not found
In this case, p is not found because c(1,.,p) is not evaluated in the calling environment but in the function's own environment, whose parent is the environment where the function is defined (i.e. the global environment). We need to modify slice() to always evaluate the expression in the environment where it is created.
slice <- function(x, s) {
  s <- substitute(s)
  x[eval(s, list(. = length(x)), parent.frame())]
}
The enclosing environment is set to parent.frame() to refer to the calling environment which, in this case, is exactly the context where the expression is fully meaningful.
local({
  p <- 3
  slice(1:10, c(1,.,p))
})
[1] 1 10 3
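The same fix matters whenever slice() is called from inside another function. Here is a minimal sketch with a hypothetical wrapper, last_n(), that is not part of the original post. Its argument n exists only in the wrapper's own frame, so the lookup succeeds only because the enclosure is parent.frame().

```r
slice <- function(x, s) {
  s <- substitute(s)
  # '.' comes from the list; every other symbol is looked up in the caller
  x[eval(s, list(. = length(x)), parent.frame())]
}

# Hypothetical wrapper: return the last n elements of x
last_n <- function(x, n) {
  slice(x, (. - n + 1):.)
}

last_n(1:10, 3)   # 8 9 10
```

With the first version of slice(), the same call would fail because n cannot be found from inside slice().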
You have to be careful when using non-standard evaluation. The above shows the first point: evaluate the expression in a context where it is fully meaningful. In R, you need to take care of the environments to ensure that the symbol search path is correct. To do that, you need to be aware of the scope of the evaluation context.
Another point I want to stress is that a danger of non-standard evaluation is a potential clash of symbol interpretation. If two functions both use non-standard evaluation to facilitate some task, they might clash when interpreting certain symbols.
For example, functions in the rlist package and the magrittr package use some non-standard evaluation to make things easy. In the following case, the two might clash.
library("rlist")
library("magrittr")
data <- list("a","b","b","c","b","a")
list.table(data, .)
.
a b c
2 3 1
data %>% list.table(.)
Error in table(useNA = "ifany"): nothing to tabulate
In this case, the dot is interpreted by list.table() as the current element while iterating over data, that is, it tries to create a table from data directly by the value of each element. However, %>% interprets the dot differently: it takes it as a request to pipe data into the position of the dot as an argument of list.table. Therefore, %>% basically evaluates list.table(data) rather than list.table(data, .). I call this behavior an interpretation clash, which might result in unexpected errors.
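The effect can be reproduced in base R without either package. The sketch below reuses the slice() function from this post and adds a hypothetical helper, each(), that treats the dot as the current element of a list. Both names of the helper and its behavior are assumptions made only for illustration.

```r
# slice(): '.' means the length of x
slice <- function(x, s) {
  s <- substitute(s)
  x[eval(s, list(. = length(x)), parent.frame())]
}

# each(): hypothetical helper in which '.' means the current element
each <- function(xs, expr) {
  expr <- substitute(expr)
  caller <- parent.frame()
  lapply(xs, function(el) eval(expr, list(. = el), caller))
}

# The same symbol is read in two ways within one call:
# the first '.' is the current element, the second '.' is its length
res <- each(list(1:3, 1:5), slice(., 1:(. - 1)))
str(res)
```

Each element loses its last value, so res holds 1:2 and 1:4. Nothing in the code signals that the two dots mean different things, which is exactly what makes such clashes hard to spot.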
Another example can be reproduced using dplyr and the latest release of magrittr. In the latest release, %>% makes it easy to create a chaining function by starting the pipeline with the dot symbol. For example,
sapply(1:3, . %>% seq_len %>% sum)
[1] 1 3 6
The expression . %>% seq_len %>% sum actually creates a function. It works by giving the dot a special behavior: if it appears at the start of the pipeline, then a functional sequence is created. This makes it much easier to create such functions in many cases. However, the risk is an interpretation clash when such magic is used with other functions that give the same symbol a different behavior. For example,
library("dplyr")
mtcars %>% group_by(vs) %>% do(. %>% arrange(desc(mpg)) %>% head(3))
Error: Results are not data frames at positions: 1, 2
In this case do() works with the dot representing the data frame of each group. The user might want to arrange each group by mpg in descending order, take the top 3 records, and finally get a combined data frame. However, the dot runs into an interpretation clash: do() gives it a special meaning, but %>% understands it differently and creates a functional sequence, which is not what do() expects. Calling the functions directly on the dot inside do(), without an inner pipeline, avoids the clash:
mtcars %>% group_by(vs) %>% do(head(arrange(., desc(mpg)), 3))
Source: local data frame [6 x 11]
Groups: vs

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
2 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
3 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
4 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
5 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
6 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
My conclusion is simple: non-standard evaluation can be magic, but be careful when you use it. It may produce unexpected errors because the evaluation context is wrong, or the interpretation of a symbol is inconsistent.
To leave a comment for the author, please follow the link and comment on their blog: The blog of Kun Ren.