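Use a value from the previous row in an R data.table calculation

Q:

I want to create a new column in a data.table, calculated from the current value of one column and the previous-row value of another. Is it possible to access previous rows?

For example: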

> DT <- data.table(A=1:5, B=1:5*10, C=1:5*100)
> DT
   A  B   C
1: 1 10 100
2: 2 20 200
3: 3 30 300
4: 4 40 400
5: 5 50 500
> DT[, D := C + BPreviousRow] # What is the correct code here?

The correct result should be

> DT
   A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540

r data.table

0 votes PatrickT 6/21/2016

I usually set a key on my data.tables: DT <- data.table(A=..., key = "A")

Answers:

116 votes Arun 2/4/2013 #1

With shift(), implemented in v1.9.6, this is quite straightforward.

DT[ , D := C + shift(B, 1L, type="lag")]
# or equivalently, in this case,
DT[ , D := C + shift(B)]

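From NEWS:

New function shift() implements fast lead/lag of vector, list, data.frames or data.tables. It takes a type argument which can be either "lag" (default) or "lead". It enables very convenient usage along with := or set(). For example: DT[, (cols) := shift(.SD, 1L), by=id]. Please have a look at ?shift for more info.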
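As a minimal sketch of the grouped usage mentioned in the NEWS entry: the table, the `id` column, and the `_prev` column names below are made up for illustration, and `.SDcols` is used here to pick the columns to lag rather than overwriting them in place.

```r
library(data.table)

# Hypothetical example data: two groups identified by `id`
DT <- data.table(id = c(1, 1, 1, 2, 2), x = 1:5, y = (1:5) * 10)

cols    <- c("x", "y")             # columns to lag
newcols <- paste0(cols, "_prev")   # illustrative names for the lagged copies

# shift() on .SD lags every column listed in .SDcols;
# by = id restarts the lag within each group, so each group's first row is NA
DT[, (newcols) := shift(.SD, 1L, type = "lag"), by = id, .SDcols = cols]
```

With this data, the first row of each `id` group should get NA in both new columns, and the lag does not carry over from one group into the next.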

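See the edit history for previous answers.

0 votes SlowLearner 2/4/2013

Does .N hold the current row number or something? Sorry to ask here, but I can't seem to find it in the help files...

7 votes Steve Lianoglou 2/5/2013

@SlowLearner: You may also find .I useful, which holds the row indices of the rows in the current group.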

7 votes mnel 2/5/2013

Use seq_len(.N - 1) instead of 1:(.N-1). This avoids the problems associated with 1:0.
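A short base R illustration of the 1:0 problem, using a plain variable `n` as a stand-in for .N in a one-row table or group (the variable names are only for this sketch):

```r
n <- 1L                      # stand-in for .N when there is a single row

colon_idx <- 1:(n - 1)       # 1:0 counts DOWN, giving the indices 1 and 0
safe_idx  <- seq_len(n - 1)  # seq_len(0) is an empty index vector

x <- 10
from_colon <- x[colon_idx]   # index 0 is dropped, index 1 is kept: one value
from_safe  <- x[safe_idx]    # no indices at all: zero values
```

So with 1:(n - 1) a single-row input still yields one element where none was intended, while seq_len(n - 1) yields an empty result.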

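1 vote MichaelChirico 4/27/2015

+1 for the .SD example. I was trying to use lapply and getting funky results. This is much simpler.

0 votes skan 5/1/2015

Where can I find an updated PDF with all this new information? The official 1.9.4 vignettes and webinars don't include it, and the 1.9.5 Rmd vignettes are not comfortable to read and don't include it either.

9 votes Ryogi 2/4/2013 #2

Following Arun's solution, a similar result can be obtained without referring to .N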

> DT[, D := C + c(NA, head(B, -1))][]
   A  B   C   D
1: 1 10 100  NA
2: 2 20 200 210
3: 3 30 300 320
4: 4 40 400 430
5: 5 50 500 540

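0 votes Corvus 2/5/2013

Is there a reason to prefer one method over the other? Or is it simply an aesthetic difference?

0 votes Ryogi 2/5/2013

I think that in this case (i.e. where .N is readily available) it is mostly an aesthetic choice. I am not aware of any important difference.

13 votes Gary Weissman 5/4/2014 #3

Following on from @Steve Lianoglou's comment above, why not just: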

DT[, D:= C + c(NA, B[.I - 1]) ]
#    A  B   C   D
# 1: 1 10 100  NA
# 2: 2 20 200 210
# 3: 3 30 300 320
# 4: 4 40 400 430
# 5: 5 50 500 540

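and avoid using seq_len or head or any other function.

2 votes Matthew 9/3/2014

Nice. However, this would not work if you wanted to find the previous row within a group.

1 vote Gary Weissman 2/16/2015

@Matthew you are right. If subsetting by group, I would replace .I with seq_len(.N)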
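A sketch of that grouped variant, assuming a hypothetical grouping column `g` that is not part of the original example; the shift() version is included only as a cross-check:

```r
library(data.table)

# Hypothetical grouped data; `g` is an illustrative grouping column
DT <- data.table(g = c("a", "a", "b", "b", "b"), B = 1:5 * 10, C = 1:5 * 100)

# Within each group, seq_len(.N) - 1 is 0, 1, ..., .N - 1. The 0 index is
# dropped, so B[seq_len(.N) - 1] is B without its last element, and the
# leading NA fills the first row of the group.
DT[, D := C + c(NA, B[seq_len(.N) - 1]), by = g]

# Cross-check with shift(), which lags within each group when used with by
DT[, D2 := C + shift(B), by = g]
```

Both columns should agree, with NA in the first row of each group.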

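24 votes dnlbrky 8/2/2014 #4

Several people have already answered this specific question. See the code below for a general-purpose function that I use in situations like this, which may be helpful. Besides getting the previous row, you can go as many rows into the "past" or "future" as you like.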

rowShift <- function(x, shiftLen = 1L) {
  r <- (1L + shiftLen):(length(x) + shiftLen)
  r[r<1] <- NA
  return(x[r])
}

# Create column D by adding column C and the value from the previous row of column B:
DT[, D := C + rowShift(B,-1)]

# Get the Old Faithful eruption length from two events ago, and three events in the future:
as.data.table(faithful)[1:5,list(eruptLengthCurrent=eruptions,
                                 eruptLengthTwoPrior=rowShift(eruptions,-2), 
                                 eruptLengthThreeFuture=rowShift(eruptions,3))]
##   eruptLengthCurrent eruptLengthTwoPrior eruptLengthThreeFuture
##1:              3.600                  NA                  2.283
##2:              1.800                  NA                  4.533
##3:              3.333               3.600                     NA
##4:              2.283               1.800                     NA
##5:              4.533               3.333                     NA

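0 votes geneorama 11/4/2014

This is a brilliant answer. I'm annoyed that I already upvoted the other answers, because this one is much more general. In fact, I'm going to use it in my geneorama package (if you don't mind).

0 votes dnlbrky 11/4/2014

Sure, go for it. I was hoping to get some free time and submit it as a pull request to the data.table package, but alas...

0 votes dnlbrky 2/20/2015

As of version 1.9.5, a similar function called shift has been added to data.table. See the updated answer from @Arun.

57 votes Steven Beaupré 4/27/2015 #5

Using dplyr you could do: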

mutate(DT, D = lag(B) + C)

Which gives:

#   A  B   C   D
#1: 1 10 100  NA
#2: 2 20 200 210
#3: 3 30 300 320
#4: 4 40 400 430
#5: 5 50 500 540

2 votes Abdullah Al Mahmud 7/5/2018 #6

Here is my intuitive solution:

# create data frame
df <- data.frame(A=1:5, B=seq(10,50,10), C=seq(100,500, 100))
# subtract the shift from the number of rows
shift  <- 1 # in this case the shift is 1
invshift <- nrow(df) - shift
# now create the new column
df$D <- c(NA, head(df$B, invshift)+tail(df$C, invshift))

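Here invshift, the number of rows minus 1, is 4. nrow(df) gives the number of rows in a data frame (for a plain vector, use length() instead). Similarly, if you want to take values from further back, subtract 2, 3, ... from nrow and put the corresponding number of NAs at the beginning.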
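To make the "further back" case concrete, here is a sketch of the same approach with a shift of two rows. The names `k`, `keep`, and `D2` are illustrative and not part of the original answer.

```r
df <- data.frame(A = 1:5, B = seq(10, 50, 10), C = seq(100, 500, 100))

k    <- 2                # illustrative shift: use B from two rows earlier
keep <- nrow(df) - k     # number of rows that have a value k rows before them

# Pad with k NAs, then add the first `keep` values of B
# to the last `keep` values of C
df$D2 <- c(rep(NA, k), head(df$B, keep) + tail(df$C, keep))
```

Here row 3 combines C = 300 with B = 10 from row 1, and so on down the table.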

-2 votes Rafael Braga 1/30/2020 #7

It can be done in a loop.

# Create the column D
DT$D <- 0
# for every row in DT
for (i in 1:length(DT$A)) {
  if(i==1) {
    #using NA at first line
    DT[i,4] <- NA
  } else {
    #D = C + BPreviousRow
    DT[i,4] <- DT[i,3] + DT[(i-1), 2]   
  }
}

With a for loop, you can even use the previous row's value of this new column itself: DT[(i-1), 4]
