提问人:Fadhil Dzikri 提问时间:3/14/2023 最后编辑:jay.sfFadhil Dzikri 更新时间:7/25/2023 访问量:49
如果 DataFrame 有任何 NA 值,如何使用 Pivot Wide(透视图)
how to use pivot wider if dataframe have any NA value
问:
我有一个这样的数据帧
df = data.frame(day = c("1", NA, NA, NA, NA, "2", NA, NA, NA),
Unit = c("unit1", NA, NA, NA, "unit2", "unit1", NA, NA, "unit2"),
Problem = c("Oil", "Engine", "Electric", NA, NA, "Oil", "Power", NA, NA),
duration = c(2, 5, 1, NA, NA, 1.5, 3, NA, NA))
第 1:5 行是第 1 天,第 6:9 行是第 2 天,如果同一列中有重复值,则值为 NA。
我试着用
df %>%
pivot_wider(names_from = Problem, values_from = duration)
但它没有用, 我的预期 DF 是这样的
df1 = data.frame(day = c("1", "1", "2", "2"),
Unit = c("unit1", "unit2", "unit1", "unit2"),
Oil = c(2, 0, 1.5, 0),
Engine = c(5, 0, 0, 0),
Electric = c(1, 0, 0, 0),
Power = c(0, 0, 3, 0),
NoProblem = c(0, 0, 0, 0))
答:
2赞
akrun
3/14/2023
#1
我们用以前的非 NA 元素对 NA 进行 NA,获取行,并用fill
distinct
pivot_wider
library(dplyr)
library(tidyr)
df %>%
fill(day, Unit, Problem) %>%
distinct(day, Unit, Problem, .keep_all = TRUE) %>%
mutate(duration = replace_na(duration, 0)) %>%
pivot_wider(names_from = Problem, values_from = duration,
values_fill = 0) %>%
mutate(NoProblem = 0)
-输出
# A tibble: 4 × 7
day Unit Oil Engine Electric Power NoProblem
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 unit1 2 5 1 0 0
2 1 unit2 0 0 0 0 0
3 2 unit1 1.5 0 0 3 0
4 2 unit2 0 0 0 0 0
2赞
jay.sf
3/14/2023
#2
基本上,首先用 填充 day 和 unit,然后 。zoo::na.locf
reshape
df[1:2] <- zoo::na.locf(df[1:2])
(res <- reshape(df, direction='wide', idvar=c('Unit', 'day'), timevar='Problem'))
# day Unit duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 1 1 unit1 2.0 5 1 NA NA
# 5 1 unit2 NA NA NA NA NA
# 6 2 unit1 1.5 NA NA NA 3
# 9 2 unit2 NA NA NA NA NA
在这里抛出警告,因为没有问题。unit2
NA
s 可以是 d 和 ,replace
0
replace(res, is.na(res), 0)
# day Unit duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 1 1 unit1 2.0 5 1 0 0
# 5 1 unit2 0.0 0 0 0 0
# 6 2 unit1 1.5 0 0 0 3
# 9 2 unit2 0.0 0 0 0 0
但这样做并不真正正确,而且它们实际上并不重要,因为您经常可以在进行诸如na.rm
by(res[3:7], res$Unit, colSums, na.rm=TRUE)
# res$Unit: unit1
# duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 3.5 5.0 1.0 0.0 3.0
# ------------------------------------------------------------------------------------------------------
# res$Unit: unit2
# duration.Oil duration.Engine duration.Electric duration.NA duration.Power
# 0 0 0 0 0
评论