Asked by: Ricardo Saporta  Asked: 4/29/2013  Last edited by: smci, Ricardo Saporta  Updated: 7/26/2015  Views: 4132
Does ifelse really calculate both of its vectors every time? Is it slow?
Answers:
78 votes
Ricardo Saporta
4/29/2013
#1
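Yes. (With one exception.)
`ifelse` calculates both its `yes` value and its `no` value, except when the `test` condition is either all `TRUE` or all `FALSE`.
We can see this by generating random numbers and observing how many numbers were actually generated (by resetting the `seed` and checking how far the random stream has advanced).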
# TEST CONDITION, ALL TRUE
set.seed(1)
dump <- ifelse(rep(TRUE, 200), rnorm(200), rnorm(200))
next.random.number.after.all.true <- rnorm(1)
# TEST CONDITION, ALL FALSE
set.seed(1)
dump <- ifelse(rep(FALSE, 200), rnorm(200), rnorm(200))
next.random.number.after.all.false <- rnorm(1)
# TEST CONDITION, MIXED
set.seed(1)
dump <- ifelse(c(FALSE, rep(TRUE, 199)), rnorm(200), rnorm(200))
next.random.number.after.some.TRUE.some.FALSE <- rnorm(1)
# RESET THE SEED, GENERATE SEVERAL RANDOM NUMBERS TO SEARCH FOR A MATCH
set.seed(1)
r.1000 <- rnorm(1000)
cat("Quantity of random numbers generated during the `ifelse` statement when:",
"\n\tAll True ", which(r.1000 == next.random.number.after.all.true) - 1,
"\n\tAll False ", which(r.1000 == next.random.number.after.all.false) - 1,
"\n\tMixed T/F ", which(r.1000 == next.random.number.after.some.TRUE.some.FALSE) - 1
)
This gives the following output:
Quantity of random numbers generated during the `ifelse` statement when:
	All True   200
	All False  200
	Mixed T/F  400   <~~ Notice TWICE AS MANY numbers were
	                     generated when `test` had both
	                     T & F values present
We can also see it in the source code itself:
.
.
if (any(test[!nas]))
    ans[test & !nas] <- rep(yes, length.out = length(ans))[test & !nas]    # <~~~~ This line and the one below
if (any(!test[!nas]))
    ans[!test & !nas] <- rep(no, length.out = length(ans))[!test & !nas]   # <~~~~ ... are the culprits
.
.
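Notice that `yes` and `no` are computed only if there is some non-`NA` value of `test` that is `TRUE` or `FALSE` (respectively).
At that point (and this is the important part when it comes to efficiency) the entirety of each vector is computed.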
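As a complementary check that does not rely on the random seed, here is a minimal sketch that counts how many elements each branch is asked to produce. The helpers `make_yes` and `make_no` and the counter `produced` are hypothetical names introduced only for this illustration.

```r
# Hypothetical helpers that record how many elements they were asked to produce
produced <- c(yes = 0, no = 0)
make_yes <- function(n) { produced["yes"] <<- produced["yes"] + n; rep(1, n) }
make_no  <- function(n) { produced["no"]  <<- produced["no"]  + n; rep(0, n) }

# Mixed test: only one element is FALSE, yet `no` is still built at full length
test <- c(FALSE, rep(TRUE, 199))
res <- ifelse(test, make_yes(200), make_no(200))
print(produced)

# All-TRUE test: the `no` argument is never evaluated
produced[] <- 0
res_all_true <- ifelse(rep(TRUE, 200), make_yes(200), make_no(200))
print(produced)
```

The first `print` should report 200 elements for each branch, and the second should report 200 for `yes` and none for `no`, which matches the random-number counts above.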
OK, but is it slower?
Let's see if we can test it:
library(microbenchmark)
# Create some sample data
N <- 1e4
set.seed(1)
X <- sample(c(seq(100), rep(NA, 100)), N, TRUE)
Y <- ifelse(is.na(X), rnorm(X), NA) # Y has reverse NA/not-NA setup than X
These two statements generate the same results:
yesifelse <- quote(sort(ifelse(is.na(X), Y+17, X-17 ) ))
noiflese <- quote(sort(c(Y[is.na(X)]+17, X[is.na(Y)]-17)))
identical(eval(yesifelse), eval(noiflese))
# [1] TRUE
But the version without `ifelse` is at least twice as fast as the other:
microbenchmark(eval(yesifelse), eval(noiflese), times=50L)
N = 1,000
Unit: milliseconds
            expr      min       lq   median       uq      max neval
 eval(yesifelse) 2.286621 2.348590 2.411776 2.537604 10.05973    50
  eval(noiflese) 1.088669 1.093864 1.122075 1.149558 61.23110    50

N = 10,000
Unit: milliseconds
            expr      min       lq   median       uq      max neval
 eval(yesifelse) 30.32039 36.19569 38.50461 40.84996 98.77294    50
  eval(noiflese) 12.70274 13.58295 14.38579 20.03587 21.68665    50
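The comparison above relies on `sort()` to make the two results line up. As a minimal sketch (not part of the original benchmark), the same idea can be written so that element order is preserved: preallocate the result, then compute each branch only on the positions that need it. The names `na_x`, `by_index`, and `with_ifelse` are assumptions made for this illustration, and no timings are claimed for it.

```r
# Rebuild sample data in the same spirit as above
set.seed(1)
N <- 1e4
X <- sample(c(seq(100), rep(NA, 100)), N, TRUE)
Y <- ifelse(is.na(X), rnorm(N), NA)   # Y is non-NA exactly where X is NA

# Preallocate, then fill each branch using only the elements it needs
na_x <- is.na(X)
by_index <- numeric(N)
by_index[na_x]  <- Y[na_x] + 17
by_index[!na_x] <- X[!na_x] - 17

# Reference result: ifelse evaluates Y + 17 and X - 17 at full length
with_ifelse <- ifelse(na_x, Y + 17, X - 17)

print(isTRUE(all.equal(by_index, with_ifelse)))
```

Because the positions are kept, this form can be dropped in where the order of the output matters, at the cost of a few more lines than a single `ifelse` call.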
Comments
1 vote
Simon O'Hanlon
4/29/2013
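I +1'd this because I think you have done some very thorough investigative work, even though I think you are comparing two different things!
0 votes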
Ricardo Saporta
4/29/2013
By the way, I'm not bashing `ifelse`. In fact, I use it all the time, except when I need efficiency.
3 votes
Simon O'Hanlon
4/29/2013
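I understand this better now. I'd give +2 if I could. I see what you are saying. It would be better to use something like `rep(yes, length.out = length(ans) - sum(! test & ok ) )` instead of the default `rep(yes, length.out = length(ans))[test & !nas]` to stop unnecessary computation of `yes`.
1 vote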
Ricardo Saporta
4/29/2013
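The actual repeating of `yes` and `no` is negligible. But merely in assigning `yes`, `yes` is evaluated, and likewise in assigning `no`, `no` is evaluated. Hence the cost.
8 votes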
hadley
4/29/2013
There is no way to "partially" compute a vector in R, so there is really only one way `ifelse` can work.
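To illustrate that point with a small, assumed example (not taken from the thread): because each branch is computed as a whole vector, a branch may be evaluated on elements it was never meant for. Subsetting the input first is one way to restrict the work to the relevant elements. The data and the names `x`, `pos`, `neg`, and `by_subset` are hypothetical.

```r
x <- c(4, -1, 9, NA, -16)

# ifelse evaluates sqrt(x) and -sqrt(-x) on every element, so each call sees
# negative inputs and warns about NaNs; the warnings are suppressed here
via_ifelse <- suppressWarnings(ifelse(x >= 0, sqrt(x), -sqrt(-x)))

# Subsetting first: each expression only ever sees the elements it applies to
pos <- !is.na(x) & x >= 0
neg <- !is.na(x) & x < 0
by_subset <- rep(NA_real_, length(x))
by_subset[pos] <- sqrt(x[pos])
by_subset[neg] <- -sqrt(-x[neg])

print(via_ifelse)
print(by_subset)
```

Both approaches should give the same values; the difference is only in how many elements each branch expression is evaluated on.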