
Tips on non-standard evaluation in R

2016-05-24
One of my favorite features of R is its meta-programming facilities. They can be demonstrated simply with the following examples.

An ordinary use of R is to do statistical computing. We can evaluate something like

sin(0)


[1] 0


Meta-programming in R allows users to manipulate the expression to evaluate. We can use quote() to create an object that represents a function call.

quote(sin(0))


sin(0)


In this way, sin(0) is not evaluated but parsed as a call object, which can basically be represented as a list of the function name and the arguments.

as.list(quote(sin(0)))


[[1]]
sin

[[2]]
[1] 0
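
A call can also be built the other way around, from its parts; a minimal sketch using the base functions call() and as.call():

call("sin", 0)                 # constructs the call sin(0) from the function name and argument
as.call(list(quote(sin), 0))   # constructs the same call from a list of its parts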


Now we can use some functions to manipulate the expression and thereby alter what will be evaluated.

expr <- quote(sin(0))
expr[[1L]] <- quote(cos)
expr


cos(0)


Now we can see the expression is modified. This feature, as stated in the official documentation, is computing on the language: R is able to compute not only on literal values but also on the language itself. Then what can we do with the modified expression? We can evaluate it using eval() just as we would in the console.

eval(expr)


[1] 1
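
The argument of the call can be modified in the same way as the function; a small continuation of the example above (expr is still cos(0)):

expr[[2L]] <- quote(pi)  # replace the argument 0 with the symbol pi
eval(expr)               # evaluates cos(pi), giving -1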


The meta-programming feature requires the definition of language objects and a meta-function to evaluate such objects. In R, a call object represents a function call like sin(x); a name (or symbol) represents a variable like x; a numeric, character, etc. represents a literal value like 1 or "a"; and finally eval() evaluates such a language object in a specific context.
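
A quick way to see these object types is to inspect the classes of a few quoted objects; a short illustration:

class(quote(sin(x)))  # "call"
class(quote(x))       # "name"
class(1)              # "numeric"
class("a")            # "character"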

The evaluation context matters when we evaluate an expression containing symbols that are not self-contained. Consider the following example.

expr <- quote(sin(x))
eval(expr)


Error in eval(expr, envir, enclos): object 'x' not found


x is not found because in the evaluation environment there is no value assigned to the symbol x. If we now assign some value to x,

x <- 0
eval(expr)


[1] 0


and evaluate the expression again, x can be found and the value is successfully calculated. More specifically, expr is evaluated in the global environment. Once x is given a value in this environment, the expression can be evaluated.
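
Alternatively, the values can be supplied at evaluation time rather than assigned in the global environment; a minimal sketch:

eval(expr, list(x = pi / 2))  # evaluates sin(pi / 2), giving 1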

We can also create our own environment in which to evaluate an expression, using new.env().

env <- new.env()
env$y <- 0
expr <- quote(cos(y))
expr


cos(y)


If we evaluate cos(y) directly in the global environment, it produces an error.

eval(expr)


Error in eval(expr, envir, enclos): object 'y' not found


If we evaluate it in env, where y is properly defined, it produces the right result.

eval(expr, env)


[1] 1
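
Note that cos itself is also found by the same lookup: new.env() by default uses the calling environment as the parent, so the search continues from env up to the global environment and, further along the chain, to the base environment where cos is defined. A quick check (assuming env was created at the top level, as above):

environmentName(parent.env(env))  # "R_GlobalEnv"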


We can also create two environments, one being the parent of the other, and evaluate the expression.

env1 <- new.env()
env1$x <- 1
env2 <- new.env(parent = env1)
env2$y <- 2


Now we have two environments, env1 and env2. env2's parent is env1, and env1's parent is the global environment. This means that when eval() evaluates an expression in the context of env2 and encounters a symbol that is not defined there, it looks in env1, then in the global environment, the attached package environments, and finally the base environment. Now we evaluate a simple arithmetic expression containing symbols in these two environments.

expr <- quote(x + y)
eval(expr, env1)


Error in eval(expr, envir, enclos): object 'y' not found


eval(expr, env2)


[1] 3


In fact, eval(expr, envir, enclos) basically follows the following logic to evaluate a quoted expression:

- If envir is an environment, evaluate expr in envir, looking up each symbol along envir and its parent environments until it is found.
- If envir is a list, evaluate expr with the symbols defined in the list; whenever a symbol is not found in the list, look in the enclos environment and along its chain of parents until it is found.
- If a symbol is still not found when the empty environment (the only environment having no parent) is reached, an error occurs.
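
A small example of the second case, assuming that a is defined in the enclosing (here, global) environment while b exists only in the list:

a <- 10
eval(quote(a + b), list(b = 1), globalenv())  # a comes from globalenv(), b from the list; result is 11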

This logic has some notable "strange" behaviors. For example,

env3 <- new.env()
eval(quote(x <- 1), env3)
ls.str(env3)


x :  num 1


The assignment works as expected. However, if we supply a named list of values to serve the evaluation and specify env3 as the enclosing environment, the assignment does not work as some might expect.

eval(quote(y <- p), list(p = 1), env3)
ls.str(env3)


x :  num 1


It is understandable because list(p = 1) provides a set of symbols that are given values. If a symbol in expr is not defined in the list, evaluation goes to the enclosing environment and its parents to see whether the symbol exists there. Therefore the assignment does not happen in env3 at all; only symbol lookup happens there. The assignment instead lands in a temporary environment constructed from the list, which is discarded after the evaluation.
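
One way to see the difference is to pass env3 as the evaluation environment itself, so that both lookup and assignment happen there; a small sketch (this additionally defines p inside env3):

env3$p <- 1
eval(quote(y <- p), env3)
ls.str(env3)  # now lists p, x and y, each equal to 1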

Meta-programming allows a function to interpret its arguments in its own way. For example, we can write a slice function that performs easy subsetting of a vector using non-standard evaluation.

slice <- function(x, s) {
  # capture the unevaluated expression passed to s
  s <- substitute(s)
  # evaluate it with `.` bound to the length of x, and use the result as the index
  x[eval(s, list(. = length(x)))]
}


substitute(s) prevents s from being evaluated and instead captures the expression supplied as the argument. As a result we get a call or a name that we can manipulate.
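
To see what substitute() captures, here is a tiny standalone sketch (f is a hypothetical helper, not part of slice()):

f <- function(s) substitute(s)
f(1:(. - 3))   # the unevaluated call 1:(. - 3)
class(f(.))    # "name"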

slice() does nothing special but evaluates the argument s in a non-standard way: s is evaluated with a specially defined symbol . whose value is the length of x. Therefore we can use it to easily slice a vector like
slice(1:10, 1:(.-3))


[1] 1 2 3 4 5 6 7


slice(1:10, c(1, .))


[1]  1 10


However, slice() does not work correctly in the following example:

local({
  p <- 3
  slice(1:10, c(1, ., p))
})


Error in eval(expr, envir, enclos): object 'p' not found


In this case, p is not found because c(1, ., p) is not evaluated in the calling environment but in the function's execution environment, whose parent is the environment where the function is defined (i.e. the global environment). We need to modify slice() so that it always evaluates the expression in the environment where the expression is created (the calling environment).

slice <- function(x, s) {
  s <- substitute(s)
  # symbols other than `.` are now looked up in the calling environment
  x[eval(s, list(. = length(x)), parent.frame())]
}


The enclosing environment is set to parent.frame(), which refers to the calling environment; in this case, that is exactly the context where the expression is fully meaningful.

local({
  p <- 3
  slice(1:10, c(1, ., p))
})


[1]  1 10  3


When using non-standard evaluation, you have to be careful. The above shows the first point: evaluate the expression in a context where the expression is fully meaningful. In R, you need to take care of the environments to ensure that the symbol search path is correct. To do that, you need to be aware of the scope of the evaluation context.

Another point I want to stress is a second danger of non-standard evaluation: the potential clash of symbol interpretation. If two functions both use non-standard evaluation to facilitate some task, they might clash in how they interpret certain symbols.

For example, functions in the rlist package and the magrittr package both use some non-standard evaluation to make things easy. In the following cases, the two might clash.

library("rlist")
library("magrittr")
data <- list("a","b","b","c","b","a")
list.table(data, .)


.
a b c
2 3 1


data %>% list.table(.)


Error in table(useNA = "ifany"): nothing to tabulate


In this case, . is interpreted by list.table() as the current element in the iteration over data, that is, it tries to create a table directly from the element values of data. However, %>% interprets . differently: it understands . as a request to pipe data into that position as an argument of list.table(). Therefore, %>% basically evaluates list.table(data) rather than list.table(data, .). I call this behavior an interpretation clash, which may result in unexpected errors.

Another example can be reproduced using dplyr and the latest release of magrittr. In the latest release of %>%, a chaining function can be created simply by starting the pipeline with the . symbol. For example,

sapply(1:3, . %>% seq_len %>% sum)


[1] 1 3 6


. %>% seq_len %>% sum actually creates a function. It works by giving . a special behavior: if . appears at the start of a pipeline, a functional sequence is created. This largely facilitates creating such functions in many cases. However, it risks an interpretation clash when such magic is used together with other functions that give the same symbol a different behavior. For example,

library("dplyr")
mtcars %>%
group_by(vs) %>%
do(. %>%
arrange(desc(mpg)) %>%
head(3))


Error: Results are not data frames at positions: 1, 2


In this case do() works with . representing each group's data frame. The user might want to arrange each group by mpg in descending order, take the top 3 records, and finally get a combined data frame. However, . runs into an interpretation clash: do() gives . a special meaning, but %>% understands . differently and creates a functional sequence, which is not what do() expects.

mtcars %>%
  group_by(vs) %>%
  do(head(arrange(., desc(mpg)), 3))


Source: local data frame [6 x 11]
Groups: vs

mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
2 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
3 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
4 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
5 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
6 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
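
Another way to avoid the clash is to define the per-group operation as a named function first, so that no .-pipeline appears inside do(); a sketch under the same assumptions (top3 is a hypothetical helper name):

top3 <- function(d) head(arrange(d, desc(mpg)), 3)
mtcars %>%
  group_by(vs) %>%
  do(top3(.))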


My conclusion is simple: non-standard evaluation can be magic, but be careful when you use it. It may produce unexpected errors because the evaluation context is wrong, or the interpretation of a symbol is inconsistent.

Tags: nse