How do you reduce string combinations across two columns into one column using the mutate function in R?

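Asked by: Jack  Asked: 9/29/2023  Updated: 9/29/2023  Views: 35

Q:

Poor data entry has left me with two columns of similar data that could be merged into one. I have three groups plus NA, and I want to reduce them to two groups plus NA.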

x <- data.frame(
  a= c("Group 1", "Group 2", "Group 1", NA, "Group 1",NA, "Group 3", "Group 3"),
  b= c("Group 1", NA, NA, "Group 2", "Group 1", NA, NA, "Group 3")
)
x

#The combinations of the group and the desired output: 
#Group 1 + Group 1 = Group 1
#Group 2 + NA = Group 2 
# Group 1 + NA = Group 1 
#NA+NA = NA
#Group 3 + NA = Group 1
#Group 3 + Group 3= Unknown
x |> 
  dplyr::mutate(Group = case_when(a == c("Group 1")|b == c("Group 1") ~ "Group 1",
                                  a == c("Group 2")|b == c("NA") ~ "Group 2",
                                  a == c("Group 1")|b == c("NA") ~ "Group 2",
                                  a == c("NA")|b == c("NA") ~ NA,
                                  a == c("Group 3")|b == c("NA") ~ "Group 1",
                                  a == c("Group 3")|b == c("Group 3") ~ "Unknown",
  ))

The result is shown below. Group 3 + Group 3 should be Unknown, but it is classified as Group 1.

        a       b   Group
1 Group 1 Group 1 Group 1
2 Group 2    <NA> Group 2
3 Group 1    <NA> Group 1
4    <NA> Group 2    <NA>
5 Group 1 Group 1 Group 1
6    <NA>    <NA>    <NA>
7 Group 3    <NA> Unknown
8 Group 3 Group 3 Group 1

I would appreciate any help.

r dplyr data-manipulation mutate

Comments

0 votes pbraeutigm 9/29/2023
Maybe you could use and '&' instead of or '|'.
0 votes Ritchie Sacramento 9/29/2023
I think you can simplify this to case_when(a == "Group 3" & b == "Group 3" ~ "Unknown", a == "Group 3" | b == "Group 3" ~ "Group 1", .default = coalesce(a, b))
0 votes Mark 9/30/2023
I noticed that the code you are using, a == c("NA")|b == c("NA"), has "NA"s in it, while your data doesn't contain "NA"s; it contains NAs.
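0 votes Jack 10/5/2023
@Mark - because I want to identify an empty cell, I should have written is.na(a) rather than c("NA"), as I have learned from the helpful comments in this thread.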
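
The difference between the string "NA" and a missing value, raised in the comments above, can be checked in base R. This is a minimal sketch using a made-up vector v (hypothetical, not the question's data): a comparison with == against a missing value returns NA rather than TRUE, so a test such as b == "NA" never matches an empty cell, while is.na() does.

```r
# Hypothetical vector for illustration; not part of the question's data
v <- c("Group 1", NA, "Group 3")

# Comparing with the string "NA" never returns TRUE for a missing value
eq_string <- v == "NA"    # FALSE NA FALSE

# is.na() is the test that actually detects missing values
is_missing <- is.na(v)    # FALSE TRUE FALSE

# With |, a TRUE on one side decides the result even if the other side is NA
or_true  <- TRUE | NA     # TRUE
or_false <- FALSE | NA    # NA

print(eq_string)
print(is_missing)
print(or_true)
print(or_false)
```

Because TRUE | NA is TRUE, a condition like a == "Group 3" | b == "NA" matches every row where a is "Group 3", whatever b holds. case_when() uses the first condition that is TRUE, so such a row never reaches a later, more specific condition.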

A:

0 votes Chris Ruehlemann 9/29/2023 #1

Is this what you need?

x |>
  dplyr::mutate(Group = dplyr::case_when(
    # Conditions are checked in order; the first one that is TRUE wins
    a == "Group 1" & b == "Group 1" ~ "Group 1",
    (a == "Group 2" & is.na(b)) | (is.na(a) & b == "Group 2") ~ "Group 2",
    a == "Group 1" & is.na(b) ~ "Group 1",
    is.na(a) & is.na(b) ~ NA_character_,
    # Group 3 paired with a missing value is recoded to Group 1
    (a == "Group 3" & is.na(b)) | (is.na(a) & b == "Group 3") ~ "Group 1",
    a == "Group 3" & b == "Group 3" ~ "Unknown"
  ))
        a       b   Group
1 Group 1 Group 1 Group 1
2 Group 2    <NA> Group 2
3 Group 1    <NA> Group 1
4    <NA> Group 2 Group 2
5 Group 1 Group 1 Group 1
6    <NA>    <NA>    <NA>
7 Group 3    <NA> Group 1
8 Group 3 Group 3 Unknown
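
For comparison, the shorter form suggested in the comments by Ritchie Sacramento can be written out as a complete script. This is a sketch, assuming dplyr 1.1.0 or later (for the .default argument of case_when()) and R 4.1 or later (for the |> pipe). The name result is introduced here for illustration only.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0, needed for .default

# Data from the question
x <- data.frame(
  a = c("Group 1", "Group 2", "Group 1", NA, "Group 1", NA, "Group 3", "Group 3"),
  b = c("Group 1", NA, NA, "Group 2", "Group 1", NA, NA, "Group 3")
)

result <- x |>
  mutate(Group = case_when(
    # Most specific rule first: both columns say Group 3
    a == "Group 3" & b == "Group 3" ~ "Unknown",
    # Any remaining row with Group 3 in either column becomes Group 1
    a == "Group 3" | b == "Group 3" ~ "Group 1",
    # Everything else: take a, or b when a is missing (NA if both are missing)
    .default = coalesce(a, b)
  ))

print(result)
```

On the question's data this gives the same Group column as the answer above. It is not identical in general: when a and b hold two different non-missing groups, coalesce() silently keeps the value from a, so it may be worth checking for such rows first.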