Asked by: maycca  Asked: 6/25/2018  Last edited by: Anoushiravan R, maycca  Updated: 2/24/2022  Views: 11317
Complete dataframe with missing combinations of values
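Q:
I have a data frame with two factors (distance) and years (years). I would like to complete all years values for every factor, filling the missing ones with 0.
i.e. from this: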
  distance years area
1      NPR     3   10
2      NPR     4   20
3      NPR     7   30
4      100     1   40
5      100     5   50
6      100     6   60
get this:
   distance years area
1       NPR     1    0
2       NPR     2    0
3       NPR     3   10
4       NPR     4   20
5       NPR     5    0
6       NPR     6    0
7       NPR     7   30
8       100     1   40
9       100     2    0
10      100     3    0
11      100     4    0
12      100     5   50
13      100     6   60
14      100     7    0
I tried to apply the expand function:
library(tidyr)
library(dplyr, warn.conflicts = FALSE)
expand(df, years = 1:7)
but this only produces a one-column data frame and does not expand the original one:
# A tibble: 7 x 1
  years
  <int>
1     1
2     2
3     3
4     4
5     5
6     6
7     7
and expand.grid does not work either:
require(utils)
expand.grid(df, years = 1:7)
Error in match.names(clabs, names(xi)) :
names do not match previous names
In addition: Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs
Is there a simple way to expand my data frame? And how can I expand it based on two categories: distance and uniqueLoc?
distance <- rep(c("NPR", "100"), each = 3)
years <-c(3,4,7, 1,5,6)
area <-seq(10,60,10)
uniqueLoc<-rep(c("a", "b"), 3)
df<-data.frame(uniqueLoc, distance, years, area)
> df
  uniqueLoc distance years area
1         a      NPR     3   10
2         b      NPR     4   20
3         a      NPR     7   30
4         b      100     1   40
5         a      100     5   50
6         b      100     6   60
A:
23 votes
talat
6/25/2018
#1
You can use the tidyr::complete function:
complete(df, distance, years = full_seq(years, period = 1), fill = list(area = 0))
# A tibble: 14 x 3
   distance years  area
   <fct>    <dbl> <dbl>
 1 100         1.   40.
 2 100         2.    0.
 3 100         3.    0.
 4 100         4.    0.
 5 100         5.   50.
 6 100         6.   60.
 7 100         7.    0.
 8 NPR         1.    0.
 9 NPR         2.    0.
10 NPR         3.   10.
11 NPR         4.   20.
12 NPR         5.    0.
13 NPR         6.    0.
14 NPR         7.   30.
or slightly shorter:
complete(df, distance, years = 1:7, fill = list(area = 0))
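For the second part of the question (expanding by both distance and uniqueLoc), the same call can probably just take both columns. A minimal sketch, assuming you want every uniqueLoc/distance pair crossed with every year; the name res is only for illustration:

```r
library(tidyr)

# Data from the question, including the uniqueLoc column
df <- data.frame(
  uniqueLoc = rep(c("a", "b"), 3),
  distance  = rep(c("NPR", "100"), each = 3),
  years     = c(3, 4, 7, 1, 5, 6),
  area      = seq(10, 60, 10)
)

# Cross uniqueLoc x distance x every year between min and max of years,
# and fill the area of the newly created rows with 0
res <- complete(
  df,
  uniqueLoc, distance,
  years = full_seq(years, period = 1),
  fill = list(area = 0)
)

nrow(res)  # 2 locations * 2 distances * 7 years
```

If only the uniqueLoc/distance pairs that actually occur in the data should be kept, wrapping them as nesting(uniqueLoc, distance) inside complete() is the usual tidyr way to do that. In this particular data all four pairs occur, so both variants should give the same rows.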
4 votes
jaeyeon
5/19/2020
#2
Combining tidyr::pivot_wider() and tidyr::pivot_longer() also makes implicit missing values explicit.
# Load packages
library(tidyverse)
# Your data
df <- tibble(distance = c(rep("NPR", 3), rep(100, 3)),
             years = c(3, 4, 7, 1, 5, 6),
             area = seq(10, 60, by = 10))
# Solution
df %>%
  pivot_wider(names_from = years,
              values_from = area) %>% # pivot_wider() makes your implicit missing values explicit
  pivot_longer(2:7, names_to = "years",
               values_to = "area") %>% # Turn to your desired format (long)
  mutate(area = replace_na(area, 0)) # Replace missing values (NA) with 0s
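One caveat, as far as I can tell: the wide/long round trip can only create years that appear at least once somewhere in the data, and years comes back as character because it went through the column names. A small sketch to check this on the example data (the name res and the as.numeric() conversion are my additions, not part of the answer above):

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

df <- tibble(distance = c(rep("NPR", 3), rep("100", 3)),
             years = c(3, 4, 7, 1, 5, 6),
             area = seq(10, 60, by = 10))

res <- df %>%
  pivot_wider(names_from = years, values_from = area) %>%
  pivot_longer(-distance, names_to = "years", values_to = "area") %>%
  mutate(years = as.numeric(years),       # column names are character, convert back
         area = replace_na(area, 0)) %>%  # NA created by the round trip becomes 0
  arrange(distance, years)

# Year 2 never occurs in df, so the round trip cannot produce it
sort(unique(res$years))
```

So if a year is missing for every distance (year 2 here), this approach will not add it, whereas complete() with an explicit range such as years = 1:7 will.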
Comments
0 votes
sleepy
5/19/2022
This is great! Especially because it works with non-numeric values.