Fill in Multiple NAs with Lagged Values R

Asked by: bodega18 Asked: 2/24/2022 Updated: 2/24/2022 Views: 209

Q:

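I am trying to fill the NA values in this data frame with the most recent non-NA value in the cost column. I want to group by city, so all NAs for Omaha should be 44.50 and the NAs for Lincoln should be 62.50. Here is the code I have been using. It replaces the first NA of each group (April) with the correct value, but it does not fill anything beyond that.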

df <- df %>% 
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), lag(cost, na.rm=TRUE), cost))

Data before running the code:

year   month      city     cost
2021   January    Omaha     45.50  
2021   February   Omaha     46.75
2021   March      Omaha     44.50
2021   April      Omaha     NA
2021   May        Omaha     NA
2021   June       Omaha     NA
2021   January    Lincoln   55.25
2021   February   Lincoln   53.80
2021   March      Lincoln   62.50
2021   April      Lincoln   NA
2021   May        Lincoln   NA
2021   June       Lincoln   NA
r dplyr data-manipulation data-cleaning


A:

3 votes deschen 2/24/2022 #1

Use fill(), which by default carries the last non-NA value downward within each group:

library(tidyverse)

df %>% 
  group_by(city) %>%
  fill(cost)

# A tibble: 12 x 4
# Groups:   city [2]
    year month    city     cost
   <int> <chr>    <chr>   <dbl>
 1  2021 January  Omaha    45.5
 2  2021 February Omaha    46.8
 3  2021 March    Omaha    44.5
 4  2021 April    Omaha    44.5
 5  2021 May      Omaha    44.5
 6  2021 June     Omaha    44.5
 7  2021 January  Lincoln  55.2
 8  2021 February Lincoln  53.8
 9  2021 March    Lincoln  62.5
10  2021 April    Lincoln  62.5
11  2021 May      Lincoln  62.5
12  2021 June     Lincoln  62.5
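
To see why the lag() attempt from the question stops after April, here is a minimal sketch on a standalone vector. The vector x is illustrative only (one value followed by three NAs, mirroring one city's cost column), and the variable names are assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Illustrative vector: one known value followed by a run of NAs
x <- c(44.5, NA, NA, NA)

# lag() shifts the vector down by one position; it does not skip over NAs
shifted <- dplyr::lag(x)
shifted
# NA 44.5 NA NA

# So the ifelse() replacement can only repair the first NA in a run
patched <- ifelse(is.na(x), dplyr::lag(x), x)
patched
# 44.5 44.5 NA NA

# fill() carries the last non-NA value forward across the whole run
filled <- tibble(cost = x) %>% fill(cost)
filled$cost
# 44.5 44.5 44.5 44.5
```

The second and later NAs stay NA because their lagged neighbour is itself NA, which is exactly the behaviour described in the question.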

1 vote AndrewGB 2/24/2022 #2

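For your code, you would want to use last instead of lag (though fill is the better option here). lag only shifts the values by one position, so it can fill just the first NA after a non-NA value. We also need to wrap cost in na.omit, so that last returns the last non-NA value rather than the trailing NA.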

library(tidyverse)

df %>%
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), last(na.omit(cost)), cost))

Output

    year month    city     cost
   <int> <chr>    <chr>   <dbl>
 1  2021 January  Omaha    45.5
 2  2021 February Omaha    46.8
 3  2021 March    Omaha    44.5
 4  2021 April    Omaha    44.5
 5  2021 May      Omaha    44.5
 6  2021 June     Omaha    44.5
 7  2021 January  Lincoln  55.2
 8  2021 February Lincoln  53.8
 9  2021 March    Lincoln  62.5
10  2021 April    Lincoln  62.5
11  2021 May      Lincoln  62.5
12  2021 June     Lincoln  62.5
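
One caveat worth sketching: last(na.omit(cost)) uses the group's final non-NA value for every NA, so it matches fill() only when the NAs sit at the end of each group, as they do in this question. The data below is hypothetical (a gap in the middle of the series) and the column names via_last and via_fill are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical series with an NA in the middle; not the question's data
gap <- tibble(city = "Omaha", cost = c(45.5, NA, 44.5, NA))

result <- gap %>%
  group_by(city) %>%
  mutate(
    # every NA gets the last non-NA value of the whole group
    via_last = ifelse(is.na(cost), last(na.omit(cost)), cost),
    # copy of cost that fill() will complete below
    via_fill = cost
  ) %>%
  # each NA gets the most recent non-NA value above it
  fill(via_fill) %>%
  ungroup()

result$via_last
# 45.5 44.5 44.5 44.5
result$via_fill
# 45.5 45.5 44.5 44.5
```

The two approaches disagree on the second row: via_last pulls in a value from a later month, while via_fill repeats the preceding one. If "most recent" is meant in the time-ordered sense, fill() is the safer choice.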

Data

df <- structure(list(year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
2021L, 2021L, 2021L, 2021L, 2021L, 2021L), month = c("January", 
"February", "March", "April", "May", "June", "January", "February", 
"March", "April", "May", "June"), city = c("Omaha", "Omaha", 
"Omaha", "Omaha", "Omaha", "Omaha", "Lincoln", "Lincoln", "Lincoln", 
"Lincoln", "Lincoln", "Lincoln"), cost = c(45.5, 46.75, 44.5, 
NA, NA, NA, 55.25, 53.8, 62.5, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-12L))