Asked by: Nova Asked: 9/27/2023 Updated: 9/27/2023 Views: 23
Overlapping join with dplyr when reference tibble contains NA / missing values
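Q:
I want to join two data frames: refdata, reference data describing when a person was wearing a collar; and movedata, which describes where a collar was at a given time. Collars can move between people, and if a collar is still being worn, timeend is not defined for it.
Here is some sample data and one of the joins I tried: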
library(dplyr)
refdata <- tibble(person = c("Jess", "Andy", "Cody"),
                  collar = c("a", "b", "a"),
                  timestart = c(1, 2, 5),
                  timeend = c(3, 4, NA))
movedata <- tibble(collar = c("a", "a", "a", "a", "a", "a", "a",
                              "b", "b", "b", "b", "b", "b", "b", "c", "c", "c"),
                   time = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3),
                   location = c("skatepark", "home", "library", "street", "crib", "crib", "bassinet", "street", "work", "work", "home", "street", "street", "street", "street", "street", "street"))
by <- join_by(collar,
              between(time, timestart, timeend))
check <- full_join(movedata, refdata, by)
# View input data
refdata
# A tibble: 3 × 4
person collar timestart timeend
<chr> <chr> <dbl> <dbl>
1 Jess a 1 3
2 Andy b 2 4
3 Cody a 5 NA
movedata
# A tibble: 17 × 3
collar time location
<chr> <dbl> <chr>
1 a 1 skatepark
2 a 2 home
3 a 3 library
4 a 4 street
5 a 5 crib
6 a 6 crib
7 a 7 bassinet
8 b 1 street
9 b 2 work
10 b 3 work
11 b 4 home
12 b 5 street
13 b 6 street
14 b 7 street
15 c 1 street
16 c 2 street
17 c 3 street
# result of join (not desired output):
check
# A tibble: 18 × 6
collar time location person timestart timeend
<chr> <dbl> <chr> <chr> <dbl> <dbl>
1 a 1 skatepark Jess 1 3
2 a 2 home Jess 1 3
3 a 3 library Jess 1 3
4 a 4 street NA NA NA
5 a 5 crib NA NA NA
6 a 6 crib NA NA NA
7 a 7 bassinet NA NA NA
8 b 1 street NA NA NA
9 b 2 work Andy 2 4
10 b 3 work Andy 2 4
11 b 4 home Andy 2 4
12 b 5 street NA NA NA
13 b 6 street NA NA NA
14 b 7 street NA NA NA
15 c 1 street NA NA NA
16 c 2 street NA NA NA
17 c 3 street NA NA NA
18 a NA NA Cody 5 NA
But the result I want shows that Cody has put the collar on and is still wearing it:
# Expected result
# A tibble: 17 × 6
collar time location person timestart timeend
<chr> <dbl> <chr> <chr> <dbl> <dbl>
1 a 1 skatepark Jess 1 3
2 a 2 home Jess 1 3
3 a 3 library Jess 1 3
4 a 4 street NA NA NA
5 a 5 crib Cody 5 NA
6 a 6 crib Cody 5 NA
7 a 7 bassinet Cody 5 NA
8 b 1 street NA NA NA
9 b 2 work Andy 2 4
10 b 3 work Andy 2 4
11 b 4 home Andy 2 4
12 b 5 street NA NA NA
13 b 6 street NA NA NA
14 b 7 street NA NA NA
15 c 1 street NA NA NA
16 c 2 street NA NA NA
17 c 3 street NA NA NA
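I'm interested in two solutions: one that creates the expected table above, and another that drops any rows not associated with a person (so rows 8 and 12-17 above).
A:
2 upvotes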
Maël
9/27/2023
#1
One way is to replace the NA values in timeend with Inf:
refdata$timeend <- replace(refdata$timeend, is.na(refdata$timeend), Inf)
by <- join_by(collar,
              between(time, timestart, timeend))
full_join(movedata, refdata, by)
# # A tibble: 17 × 6
# collar time location person timestart timeend
# <chr> <dbl> <chr> <chr> <dbl> <dbl>
# 1 a 1 skatepark Jess 1 3
# 2 a 2 home Jess 1 3
# 3 a 3 library Jess 1 3
# 4 a 4 street NA NA NA
# 5 a 5 crib Cody 5 Inf
# 6 a 6 crib Cody 5 Inf
# 7 a 7 bassinet Cody 5 Inf
# 8 b 1 street NA NA NA
# 9 b 2 work Andy 2 4
# 10 b 3 work Andy 2 4
# 11 b 4 home Andy 2 4
# 12 b 5 street NA NA NA
# 13 b 6 street NA NA NA
# 14 b 7 street NA NA NA
# 15 c 1 street NA NA NA
# 16 c 2 street NA NA NA
# 17 c 3 street NA NA NA
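Why the replacement helps: according to the join_by() documentation, between(time, timestart, timeend) is shorthand for time >= timestart, time <= timeend. A comparison against NA yields NA rather than TRUE, which is consistent with Cody's row finding no match in the question's output. The sketch below uses plain base R comparisons purely as an illustration of that behaviour; the variable names na_bound and inf_bound are hypothetical and not part of the original answer.

```r
# Illustrative values only: Cody's row has timestart = 5 and a missing timeend
time      <- 6
timestart <- 5

# With a missing upper bound the combined condition is NA, not TRUE
na_bound  <- time >= timestart & time <= NA_real_

# With Inf as the upper bound the condition is TRUE for any finite time
inf_bound <- time >= timestart & time <= Inf

print(na_bound)
print(inf_bound)
```

Treating a missing timeend as Inf therefore encodes "the collar is still being worn" in a form the range condition can evaluate.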
For the second expected output, use inner_join:
inner_join(movedata, refdata, by)
# # A tibble: 9 × 6
# collar time location person timestart timeend
# <chr> <dbl> <chr> <chr> <dbl> <dbl>
# 1 a 1 skatepark Jess 1 3
# 2 a 2 home Jess 1 3
# 3 a 3 library Jess 1 3
# 4 a 5 crib Cody 5 Inf
# 5 a 6 crib Cody 5 Inf
# 6 a 7 bassinet Cody 5 Inf
# 7 b 2 work Andy 2 4
# 8 b 3 work Andy 2 4
# 9 b 4 home Andy 2 4
Comments
0 upvotes
Nova
9/30/2023
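Yes! This does both tasks correctly. That was easy - I do wonder whether there is any way to specify this within the join itself, so that I don't have to change my data frame.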
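One possible way to address this comment, offered as a sketch rather than as something from the original answer: apply the replacement to a temporary copy of refdata inside the join call, then turn Inf back into NA afterwards. This assumes dplyr 1.1.0 or later (for join_by()) and that timeend holds no genuine Inf values, since na_if() would convert those as well. The name result is hypothetical.

```r
library(dplyr)

refdata <- tibble(person = c("Jess", "Andy", "Cody"),
                  collar = c("a", "b", "a"),
                  timestart = c(1, 2, 5),
                  timeend = c(3, 4, NA))
movedata <- tibble(collar = rep(c("a", "b", "c"), times = c(7, 7, 3)),
                   time = as.numeric(c(1:7, 1:7, 1:3)),
                   location = c("skatepark", "home", "library", "street",
                                "crib", "crib", "bassinet", "street", "work",
                                "work", "home", "street", "street", "street",
                                "street", "street", "street"))

by <- join_by(collar, between(time, timestart, timeend))

result <- movedata %>%
  # mutate() returns a modified copy, so refdata itself is left untouched
  full_join(mutate(refdata, timeend = coalesce(timeend, Inf)), by) %>%
  # restore the missing end time so the output shows NA instead of Inf
  mutate(timeend = na_if(timeend, Inf))

result
```

Swapping full_join for inner_join in the same pipeline should give the second output, again with NA rather than Inf in timeend.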