How to create a column within a data.table in R that filters the original data table and returns some value based on a condition

Asked by: user3466328 Asked: 11/17/2023 Last edited by: user3466328 Updated: 11/18/2023 Views: 59

Q:

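I have a fairly large data.table, and I would like to create a new column that looks up a different row of the same data.table, based on a filter defined by values in the current row.

See the following example: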

library(data.table)
dt<-data.table(a=seq.Date(as.Date("2023-11-01"), as.Date("2023-11-30"), by="day"), 
               b=c(1:27, 46, 34, 101))
dt$a_m1<-c(seq.Date(as.Date("2023-11-16"), as.Date("2023-11-30"), by="day"), seq.Date(as.Date("2023-11-01"), as.Date("2023-11-15"), by="day"))

            a  b       a_m1
 1: 2023-11-01  1 2023-11-16
 2: 2023-11-02  2 2023-11-17
 3: 2023-11-03  3 2023-11-18
 4: 2023-11-04  4 2023-11-19
 5: 2023-11-05  5 2023-11-20
 6: 2023-11-06  6 2023-11-21
 7: 2023-11-07  7 2023-11-22
 8: 2023-11-08  8 2023-11-23
 9: 2023-11-09  9 2023-11-24
10: 2023-11-10 10 2023-11-25
11: 2023-11-11 11 2023-11-26
12: 2023-11-12 12 2023-11-27
13: 2023-11-13 13 2023-11-28
14: 2023-11-14 14 2023-11-29
15: 2023-11-15 15 2023-11-30
16: 2023-11-16 16 2023-11-01
17: 2023-11-17 17 2023-11-02
18: 2023-11-18 18 2023-11-03
19: 2023-11-19 19 2023-11-04
20: 2023-11-20 20 2023-11-05
21: 2023-11-21 21 2023-11-06
22: 2023-11-22 22 2023-11-07
23: 2023-11-23 23 2023-11-08
24: 2023-11-24 24 2023-11-09
25: 2023-11-25 25 2023-11-10
26: 2023-11-26 26 2023-11-11
27: 2023-11-27 27 2023-11-12
28: 2023-11-28 46 2023-11-13
29: 2023-11-29 34 2023-11-14
30: 2023-11-30 101 2023-11-15 

The desired output is:

             a  b       a_m1 b_m1
 1: 2023-11-01  1 2023-11-16   16
 2: 2023-11-02  2 2023-11-17   17
 3: 2023-11-03  3 2023-11-18   18
 4: 2023-11-04  4 2023-11-19   19
 5: 2023-11-05  5 2023-11-20   20
 6: 2023-11-06  6 2023-11-21   21
 7: 2023-11-07  7 2023-11-22   22
 8: 2023-11-08  8 2023-11-23   23
 9: 2023-11-09  9 2023-11-24   24
10: 2023-11-10 10 2023-11-25   25
11: 2023-11-11 11 2023-11-26   26
12: 2023-11-12 12 2023-11-27   27
13: 2023-11-13 13 2023-11-28   46
14: 2023-11-14 14 2023-11-29   34
15: 2023-11-15 15 2023-11-30   101
16: 2023-11-16 16 2023-11-01    1
17: 2023-11-17 17 2023-11-02    2
18: 2023-11-18 18 2023-11-03    3
19: 2023-11-19 19 2023-11-04    4
20: 2023-11-20 20 2023-11-05    5
21: 2023-11-21 21 2023-11-06    6
22: 2023-11-22 22 2023-11-07    7
23: 2023-11-23 23 2023-11-08    8
24: 2023-11-24 24 2023-11-09    9
25: 2023-11-25 25 2023-11-10   10
26: 2023-11-26 26 2023-11-11   11
27: 2023-11-27 27 2023-11-12   12
28: 2023-11-28 46 2023-11-13   13
29: 2023-11-29 34 2023-11-14   14
30: 2023-11-30 101 2023-11-15   15

Any help with this would be greatly appreciated. I tried the following code:

dt[,b_m1:=dt[a_m1==dt$a]$b,]

but to no avail.

r data.table

Comments

0 upvotes Friede 11/17/2023
dt[, b_m1 := format(a_m1, "%d"), ]
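0 upvotes user3466328 11/17/2023
Hi @Friede, while this does produce the desired output, it is calculated from a_m1. You noticed that the day in a_m1 matches the value in b_m1, but that will not necessarily be the case in my actual use case, so this won't generalize.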
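0 upvotes Friede 11/17/2023
It works on your toy data and produces the desired output. If you want something else, please provide the corresponding details. In other words, either the sample data or the demonstration of the desired output is inaccurate.
0 upvotes user3466328 11/17/2023
It is accurate; you just chose a shortcut, and one that won't generalize. But as you wish, I have edited the toy dataset.
1 upvote r2evans 11/18/2023
I somewhat agree with @Friede: when suggested code does not do what you intended, it is on you to adjust your sample data, and sometimes that means explaining your need differently. For example, "a filter defined in the current row" does not give enough detail here, and it is not safe to assume that the code you provided explains what you really need. That said, I'm guessing from your code that you need a self-join; if I'm wrong, then I think it would help if you expanded on the condition.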

A:

1 upvote r2evans 11/18/2023 #1

I think you want a self-join with assignment.

dt[dt, b_m1 := i.b, on = .(a == a_m1)]
#              a     b       a_m1  b_m1
#         <Date> <num>     <Date> <num>
#  1: 2023-11-01     1 2023-11-16    16
#  2: 2023-11-02     2 2023-11-17    17
#  3: 2023-11-03     3 2023-11-18    18
#  4: 2023-11-04     4 2023-11-19    19
#  5: 2023-11-05     5 2023-11-20    20
#  6: 2023-11-06     6 2023-11-21    21
#  7: 2023-11-07     7 2023-11-22    22
#  8: 2023-11-08     8 2023-11-23    23
#  9: 2023-11-09     9 2023-11-24    24
# 10: 2023-11-10    10 2023-11-25    25
# ---                                  
# 21: 2023-11-21    21 2023-11-06     6
# 22: 2023-11-22    22 2023-11-07     7
# 23: 2023-11-23    23 2023-11-08     8
# 24: 2023-11-24    24 2023-11-09     9
# 25: 2023-11-25    25 2023-11-10    10
# 26: 2023-11-26    26 2023-11-11    11
# 27: 2023-11-27    27 2023-11-12    12
# 28: 2023-11-28    46 2023-11-13    13
# 29: 2023-11-29    34 2023-11-14    14
# 30: 2023-11-30   101 2023-11-15    15

all.equal(dt, desired)
# [1] TRUE

In general this is a merge/join; a good data.table-specific Q&A on StackOverflow is "Left join using data.table". For more general concepts, see data.table.
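To make the mechanics of the update join a little more concrete, here is a minimal sketch on a tiny table. The table `toy` and its columns `key`, `value`, `ref`, and `from_ref` are hypothetical names made up for illustration; only the `x[i, col := i.col, on = .(x_col == i_col)]` pattern comes from the answer above.

```r
library(data.table)

# Hypothetical toy table: `key` identifies a row, `ref` points at another row's key.
toy <- data.table(
  key   = c("p", "q", "r"),
  value = c(10, 20, 30),
  ref   = c("q", "r", "z")   # "z" matches no key
)

# Update join: for each row of the inner `toy` (the i table), find rows of the
# outer `toy` (the x table) where x.key == i.ref, and write i.value into them.
# Rows of x that are never matched keep NA in the new column.
toy[toy, from_ref := i.value, on = .(key == ref)]

toy$from_ref
# [1] NA 10 20
```

Row "p" stays `NA` because no row has `ref == "p"`, and the `ref == "z"` row is skipped because no row has `key == "z"`. Since `:=` modifies the table by reference, nothing needs to be reassigned.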


Data:

# calculated from the question
dt <- data.table::as.data.table(structure(list(a = structure(c(19662, 19663, 19664, 19665, 19666, 19667, 19668, 19669, 19670, 19671, 19672, 19673, 19674, 19675, 19676, 19677, 19678, 19679, 19680, 19681, 19682, 19683, 19684, 19685, 19686, 19687, 19688, 19689, 19690, 19691), class = "Date"), b = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 46, 34, 101), a_m1 = structure(c(19677, 19678, 19679, 19680, 19681, 19682, 19683, 19684, 19685, 19686, 19687, 19688, 19689, 19690, 19691,  19662, 19663, 19664, 19665, 19666, 19667, 19668, 19669, 19670, 19671, 19672, 19673, 19674, 19675, 19676), class = "Date"), b_m1 = c(16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 46, 34, 101, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)), row.names = c(NA, -30L), class = c("data.table", "data.frame")))
# copied and Date-ified from the question text
desired <- data.table::as.data.table(structure(list(a = structure(c(19662, 19663, 19664, 19665, 19666, 19667, 19668, 19669, 19670, 19671, 19672, 19673, 19674, 19675, 19676, 19677, 19678, 19679, 19680, 19681, 19682, 19683, 19684, 19685, 19686, 19687, 19688, 19689, 19690, 19691), class = "Date"), b = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 46L, 34L, 101L), a_m1 = structure(c(19677, 19678, 19679, 19680, 19681, 19682, 19683, 19684, 19685, 19686, 19687,  19688, 19689, 19690, 19691, 19662, 19663, 19664, 19665, 19666, 19667, 19668, 19669, 19670, 19671, 19672, 19673, 19674, 19675, 19676), class = "Date"), b_m1 = c(16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 46L, 34L, 101L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L)), row.names = c(NA, -30L), class = c("data.table", "data.frame")))

Comments

1 upvote user3466328 11/18/2023
This is exactly what I needed, thanks for your help @r2evans!
1 upvote B. Christian Kamgang 11/18/2023 #2

You can also solve your problem as follows:

dt[, b_m1 := b[match(a, a_m1)]]

             a     b       a_m1  b_m1
 1: 2023-11-01     1 2023-11-16    16
 2: 2023-11-02     2 2023-11-17    17
 3: 2023-11-03     3 2023-11-18    18
 4: 2023-11-04     4 2023-11-19    19
 5: 2023-11-05     5 2023-11-20    20
 6: 2023-11-06     6 2023-11-21    21
 7: 2023-11-07     7 2023-11-22    22
----
23: 2023-11-23    23 2023-11-08     8
24: 2023-11-24    24 2023-11-09     9
25: 2023-11-25    25 2023-11-10    10
26: 2023-11-26    26 2023-11-11    11
27: 2023-11-27    27 2023-11-12    12
28: 2023-11-28    46 2023-11-13    13
29: 2023-11-29    34 2023-11-14    14
30: 2023-11-30   101 2023-11-15    15
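One thing that may be worth checking before applying this to real data is the direction of the lookup. The sketch below rebuilds the question's table and compares `match(a, a_m1)` with `match(a_m1, a)`; the column names `b_by_a_in_a_m1` and `b_by_a_m1_in_a` and the table `small` are hypothetical, introduced only for this comparison. Which direction is the intended one depends on the actual requirement, which the question does not spell out.

```r
library(data.table)

# Rebuild the question's table.
dt <- data.table(
  a = seq.Date(as.Date("2023-11-01"), as.Date("2023-11-30"), by = "day"),
  b = c(1:27, 46, 34, 101)
)
dt[, a_m1 := c(a[16:30], a[1:15])]

# As in the answer: take b from the row whose a_m1 equals this row's a.
dt[, b_by_a_in_a_m1 := b[match(a, a_m1)]]

# The other direction: take b from the row whose a equals this row's a_m1.
dt[, b_by_a_m1_in_a := b[match(a_m1, a)]]

# On this toy data both agree, because a_m1 simply swaps the two halves of the month.
identical(dt$b_by_a_in_a_m1, dt$b_by_a_m1_in_a)
# [1] TRUE

# With a mapping that is not its own inverse, the two directions differ.
small <- data.table(a = 1:3, b = c(10, 20, 30), a_m1 = c(2L, 3L, 1L))
small[, b[match(a, a_m1)]]
# [1] 30 10 20
small[, b[match(a_m1, a)]]
# [1] 20 30 10
```

Note also that `match()` returns only the first matching position, so if the lookup column contains duplicates, a join may be the safer choice.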

Comments

0 upvotes user3466328 11/18/2023
This answer is great and much cleaner, I really like it! Thanks