如果 DataFrame 有任何 NA 值,如何使用 Pivot Wide(透视图)

how to use pivot wider if dataframe have any NA value

提问人:Fadhil Dzikri 提问时间:3/14/2023 最后编辑:jay.sfFadhil Dzikri 更新时间:7/25/2023 访问量:49

问:

我有一个这样的数据帧

df = data.frame(day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
                Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
                Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
                duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA))

第 1:5 行是第 1 天,第 6:9 行是第 2 天,如果同一列中有重复值,则值为 NA。

我试着用

df %>% 
  pivot_wider(names_from = Problem, values_from = duration)

但它没有用, 我的预期 DF 是这样的

df1 = data.frame(day = c("1", "1", "2", "2"),
                 Unit = c("unit1", "unit2", "unit1", "unit2"),
                 Oil = c(2, 0, 1.5, 0),
                 Engine = c(5, 0, 0, 0),
                 Electric = c(1, 0, 0, 0),
                 Power = c(0, 0, 3, 0),
                 NoProblem = c(0, 0, 0, 0))
r tidyr 数据操作 重塑

评论


答:

2赞 akrun 3/14/2023 #1

我们用以前的非 NA 元素对 NA 进行 NA,获取行,并用filldistinctpivot_wider

library(dplyr)
library(tidyr)
df %>% 
  fill(day, Unit, Problem) %>%
  distinct(day, Unit, Problem, .keep_all = TRUE) %>% 
  mutate(duration = replace_na(duration, 0)) %>% 
  pivot_wider(names_from = Problem, values_from = duration, 
   values_fill = 0) %>% 
  mutate(NoProblem = 0)

-输出

# A tibble: 4 × 7
  day   Unit    Oil Engine Electric Power NoProblem
  <chr> <chr> <dbl>  <dbl>    <dbl> <dbl>     <dbl>
1 1     unit1   2        5        1     0         0
2 1     unit2   0        0        0     0         0
3 2     unit1   1.5      0        0     3         0
4 2     unit2   0        0        0     0         0
2赞 jay.sf 3/14/2023 #2

基本上,首先用 填充 day 和 unit,然后 。zoo::na.locfreshape

df[1:2] <- zoo::na.locf(df[1:2])
(res <- reshape(df, direction='wide', idvar=c('Unit', 'day'), timevar='Problem'))
#   day  Unit duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 1   1 unit1          2.0               5                 1          NA             NA
# 5   1 unit2           NA              NA                NA          NA             NA
# 6   2 unit1          1.5              NA                NA          NA              3
# 9   2 unit2           NA              NA                NA          NA             NA

在这里抛出警告,因为没有问题。unit2

NAs 可以是 d 和 ,replace0

replace(res, is.na(res), 0)
#   day  Unit duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 1   1 unit1          2.0               5                 1           0              0
# 5   1 unit2          0.0               0                 0           0              0
# 6   2 unit1          1.5               0                 0           0              3
# 9   2 unit2          0.0               0                 0           0              0

但这样做并不真正正确,而且它们实际上并不重要,因为您经常可以在进行诸如na.rm

by(res[3:7], res$Unit, colSums, na.rm=TRUE)
# res$Unit: unit1
#     duration.Oil   duration.Engine duration.Electric       duration.NA    duration.Power 
#              3.5               5.0               1.0               0.0               3.0 
# ------------------------------------------------------------------------------------------------------ 
# res$Unit: unit2
#     duration.Oil   duration.Engine duration.Electric       duration.NA    duration.Power 
#                0                 0                 0                 0                 0