提问人:bodega18 提问时间:2/24/2022 更新时间:2/24/2022 访问量:209
使用滞后值 r 填充多个 NA
Fill in Multiple NAs with Lagged Values R
问:
我正在尝试使用成本列中最新的非 NA 值填充此数据框中的 NA 值。我想按城市分组 - 所以奥马哈的所有 NA 都应该是 44.50,林肯的 NA 应该是 62.50。这是我一直在使用的代码 - 它用正确的值替换每个组的第一个 NA(四月),但不会填充超过该值。
df <- df %>%
group_by(city) %>%
mutate(cost = ifelse(is.na(cost), lag(cost, na.rm=TRUE), cost))
运行代码前的数据:
year month city cost
2021 January Omaha 45.50
2021 February Omaha 46.75
2021 March Omaha 44.50
2021 April Omaha NA
2021 May Omaha NA
2021 June Omaha NA
2021 January Lincoln 55.25
2021 February Lincoln 53.80
2021 March Lincoln 62.50
2021 April Lincoln NA
2021 May Lincoln NA
2021 June Lincoln NA
答:
3赞
deschen
2/24/2022
#1
用:
library(tidyverse)
df %>%
group_by(city) %>%
fill(cost)
# A tibble: 12 x 4
# Groups: city [2]
year month city cost
<int> <chr> <chr> <dbl>
1 2021 January Omaha 45.5
2 2021 February Omaha 46.8
3 2021 March Omaha 44.5
4 2021 April Omaha 44.5
5 2021 May Omaha 44.5
6 2021 June Omaha 44.5
7 2021 January Lincoln 55.2
8 2021 February Lincoln 53.8
9 2021 March Lincoln 62.5
10 2021 April Lincoln 62.5
11 2021 May Lincoln 62.5
12 2021 June Lincoln 62.5
1赞
AndrewGB
2/24/2022
#2
对于您的代码,您会希望使用而不是(尽管这里是更好的选择)。我们还需要包装.last
lag
fill
cost
na.omit
library(tidyverse)
df %>%
group_by(city) %>%
mutate(cost = ifelse(is.na(cost), last(na.omit(cost)), cost))
输出
year month city cost
<int> <chr> <chr> <dbl>
1 2021 January Omaha 45.5
2 2021 February Omaha 46.8
3 2021 March Omaha 44.5
4 2021 April Omaha 44.5
5 2021 May Omaha 44.5
6 2021 June Omaha 44.5
7 2021 January Lincoln 55.2
8 2021 February Lincoln 53.8
9 2021 March Lincoln 62.5
10 2021 April Lincoln 62.5
11 2021 May Lincoln 62.5
12 2021 June Lincoln 62.5
数据
df <- structure(list(year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L,
2021L, 2021L, 2021L, 2021L, 2021L, 2021L), month = c("January",
"February", "March", "April", "May", "June", "January", "February",
"March", "April", "May", "June"), city = c("Omaha", "Omaha",
"Omaha", "Omaha", "Omaha", "Omaha", "Lincoln", "Lincoln", "Lincoln",
"Lincoln", "Lincoln", "Lincoln"), cost = c(45.5, 46.75, 44.5,
NA, NA, NA, 55.25, 53.8, 62.5, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-12L))
评论