Filling values by matching values in a second dataframe

Asked by: JRock Asked: 9/20/2023 Updated: 9/20/2023 Views: 23

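Q:

I have two data frames, data and glossary. data has four columns (Code1, Code2, Code3, Code4) with values, some of which match the glossary, plus four blank columns (CodeA, CodeB, CodeC, CodeD).

Each value is listed only once in glossary, but may appear multiple times in data.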

data <- data.frame(ID = c(1, 2, 3, 4, 5),
                   Code1 = c("team", "team", "crew", "group", "crew"),
                   Code2 = c("trust", "trust", "comms", "liking", "comms"),
                   Code3 = c("virtual", "virtual", "intact", "hybrid", "intact"),
                   Code4 = c("pooled", "pooled", "seqent", "recip", "sequent"),
                   CodeA = NA,
                   CodeB = NA,
                   CodeC = NA,
                   CodeD = NA)

glossary <- data.frame(CodeA = c("team", "crew", "group"),
                       CodeB = c("trust", "comms", "liking"),
                       CodeC = c("virtual", "intact", "adhoc"),
                       CodeD = c("pooled", "intensive", "recip"))

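I want to fill CodeA-CodeD with the values from glossary by matching data$Code1 against glossary$CodeA. The glossary values are similar to Code1-Code4 in data, but with some minor changes. Code1 and CodeA will be identical, but Code2-Code4 and CodeB-CodeD will differ slightly. For example, where Code3 in data lists "hybrid", the new CodeC in data should list "adhoc" instead, following the glossary.

The end result would look like this: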

data <- data.frame(ID = c(1, 2, 3, 4, 5),
                   Code1 = c("team", "team", "crew", "group", "crew"),
                   Code2 = c("trust", "trust", "comms", "liking", "comms"),
                   Code3 = c("virtual", "virtual", "intact", "hybrid", "intact"),
                   Code4 = c("pooled", "pooled", "seqent", "recip", "sequent"),
                   CodeA = c("team", "team", "crew", "group", "crew"),
                   CodeB = c("trust", "trust", "comms", "liking", "comms"),
                   CodeC = c("virtual", "virtual", "intact", "adhoc", "intact"),
                   CodeD = c("pooled", "pooled", "intensive", "recip", "intensive"))

I have looked at some solutions using match and mutate in dplyr, but I haven't found anything that lets me fill in all four values in one go.

r dplyr match


A:

1 upvote Onyambu 9/20/2023 #1

Use left_join:

library(dplyr)

# Keep ID and Code1-Code4, then join glossary on a copy of CodeA named Code1
data[1:5] %>%
   left_join(mutate(glossary, Code1 = CodeA), by = 'Code1')

  ID Code1  Code2   Code3   Code4 CodeA  CodeB   CodeC     CodeD
1  1  team  trust virtual  pooled  team  trust virtual    pooled
2  2  team  trust virtual  pooled  team  trust virtual    pooled
3  3  crew  comms  intact  seqent  crew  comms  intact intensive
4  4 group liking  hybrid   recip group liking   adhoc     recip
5  5  crew  comms  intact sequent  crew  comms  intact intensive
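
For comparison, since the question mentions match: a minimal base R sketch that fills all four columns in one assignment. The helper names cols and idx are illustrative and not part of the original answer; it assumes, as the question states, that each CodeA value appears only once in glossary.

```r
# Self-contained copy of the data from the question
data <- data.frame(ID = c(1, 2, 3, 4, 5),
                   Code1 = c("team", "team", "crew", "group", "crew"),
                   Code2 = c("trust", "trust", "comms", "liking", "comms"),
                   Code3 = c("virtual", "virtual", "intact", "hybrid", "intact"),
                   Code4 = c("pooled", "pooled", "seqent", "recip", "sequent"),
                   CodeA = NA, CodeB = NA, CodeC = NA, CodeD = NA)

glossary <- data.frame(CodeA = c("team", "crew", "group"),
                       CodeB = c("trust", "comms", "liking"),
                       CodeC = c("virtual", "intact", "adhoc"),
                       CodeD = c("pooled", "intensive", "recip"))

cols <- c("CodeA", "CodeB", "CodeC", "CodeD")

# For each row of data, the row of glossary whose CodeA equals Code1
# (NA where there is no match)
idx <- match(data$Code1, glossary$CodeA)

# Copy the matched glossary rows into the four target columns at once
data[cols] <- glossary[idx, cols]
data
```

Because match returns only the first matching position, this variant never adds rows, but it would silently ignore any later duplicate keys in glossary.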

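Comments

0 upvotes JRock 9/20/2023
This is a great solution, thank you! I've used joins before but hadn't thought to use one here. The only problem is that the join is creating extra rows. Any idea what's causing that?
0 upvotes JRock 9/20/2023
Adding unique(glossary) seems to have fixed the extra-row problem. Thanks for a simple and elegant solution!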
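
The extra rows described in the comments are what left_join produces when a key value occurs more than once in the right-hand table: the data row is repeated once per matching glossary row. Below is a minimal sketch, assuming the real glossary contained repeated rows. The glossary_dup object and the cut-down data frame are hypothetical, since the sample glossary in the question has no duplicates.

```r
library(dplyr)

# Cut-down, hypothetical version of the question's data
data <- data.frame(ID = 1:3,
                   Code1 = c("team", "crew", "group"))

# Hypothetical glossary in which the "team" row appears twice
glossary_dup <- data.frame(CodeA = c("team", "team", "crew", "group"),
                           CodeB = c("trust", "trust", "comms", "liking"))

# The duplicated key makes the "team" row of data match twice
joined_dup <- left_join(data, mutate(glossary_dup, Code1 = CodeA), by = "Code1")
nrow(joined_dup)  # 4, one more row than data

# Dropping duplicate glossary rows first restores one row per data row
joined_ok <- left_join(data, mutate(unique(glossary_dup), Code1 = CodeA), by = "Code1")
nrow(joined_ok)   # 3
```

unique() only removes rows that are identical in every column. If two glossary rows shared a CodeA value but differed elsewhere, the join would still add rows.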