How to replace NAs in a dataframe with values in another dataframe for non-unique keys?

Asked by: Rara Asked: 7/25/2023 Last edited by: Rara Updated: 7/25/2023 Views: 39

Q:

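I would like to replace the NA values in two columns (board and date) of df2 with the values from the same columns in df1. The key is subject, so in this case the rows for vand and haap. The problem is that a subject (e.g. vand) is not unique in df1, although the values in its board and date columns are always the same for a given subject. I would appreciate your suggestions.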

df1 <- structure(list(nature = c("sop", "dior", "coats", "sem", "wia", 
    "bodo"), subject = c("gank", "vand", "vand", 
    "jav", "vand", "haap"), board = c("REW", "EWW", "EWW", "SSD", 
    "EWW", "MMB"), date = c("2023-07-12", 
    "2023-06-09", "2023-06-09", 
    "2023-06-09", "2023-06-09", 
    "2023-03-05")), row.names = c(NA, -6L), class = c("tbl_df", 
    "tbl", "data.frame"))

df2 <- structure(list(type = c("single", "couple", "couple", "couple", "couple", 
  "couple", "single", "couple", "couple", "couple"), name = c("ZIA", 
  "MIA", "lMIA", "LIA", 
  "LIA", "LIA", "DIA", 
  "LIA", "MIA", "SIA"
  ), subject = c("vand", "vank", "vank", 
  "jav", "tral", "twe", 
  "haap", "der", "leo", 
  "sdee"), board = c(NA, 
  "SSD", "REW", "EWW", "WWS, DDC", "SSD", 
  NA, "QQW", "XXD", "GGH"
  ), date = c(NA, "2023-07-03", "2023-07-03", 
  "2023-07-17", "2023-07-17", 
  "2023-01-16", NA, 
  "2023-07-17", "2023-06-08", 
  "2023-07-17")), class = "data.frame", row.names = c(NA, 
  -10L))

Expected output:

df3 <- structure(list(type = c("single", "couple", "couple", "couple", "couple", 
  "couple", "single", "couple", "couple", "couple"), name = c("ZIA", 
  "MIA", "lMIA", "LIA", 
  "LIA", "LIA", "DIA", 
  "LIA", "MIA", "SIA"
  ), subject = c("vand", "vank", "vank", 
  "jav", "tral", "twe", 
  "haap", "der", "leo", 
  "sdee"), board = c("EWW", 
  "SSD", "REW", "EWW", "WWS, DDC", "SSD", 
  "MMB", "QQW", "XXD", "GGH"
  ), date = c("2023-06-09", "2023-07-03", "2023-07-03", 
  "2023-07-17", "2023-07-17", 
  "2023-01-16", "2023-03-05", 
  "2023-07-17", "2023-06-08", 
  "2023-07-17")), class = "data.frame", row.names = c(NA, 
  -10L))

Tags: r, replace, merge


A:

1 upvote Maël 7/25/2023 #1

You can reduce the first data.frame to its unique rows with unique(df1), then use dplyr::rows_update to replace the values:

dplyr::rows_update(df2, unique(df1), unmatched = "ignore")

Output

#      type name subject    board       date
# 1  single  ZIA    vand      EWW 2023-06-09
# 2  couple  MIA    vank      SSD 2023-07-03
# 3  couple lMIA    vank      REW 2023-07-03
# 4  couple  LIA     jav      SSD 2023-06-09
# 5  couple  LIA    tral WWS, DDC 2023-07-17
# 6  couple  LIA     twe      SSD 2023-01-16
# 7  single  DIA    haap      MMB 2023-03-05
# 8  couple  LIA     der      QQW 2023-07-17
# 9  couple  MIA     leo      XXD 2023-06-08
# 10 couple  SIA    sdee      GGH 2023-07-17
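
Note that in the output above row 4 (jav) was also overwritten with the df1 values, whereas the expected output keeps the original df2 values there. If only the NA cells should be filled, dplyr::rows_patch may be a closer fit. The following is a minimal sketch, not part of the original answer: it assumes dplyr >= 1.1.0 (for the unmatched argument), uses a shortened three-row copy of df2, and introduces a hypothetical helper table named lookup.

```r
library(dplyr)

# df1 as in the question; df2 shortened to three rows for illustration.
df1 <- data.frame(
  nature  = c("sop", "dior", "coats", "sem", "wia", "bodo"),
  subject = c("gank", "vand", "vand", "jav", "vand", "haap"),
  board   = c("REW", "EWW", "EWW", "SSD", "EWW", "MMB"),
  date    = c("2023-07-12", "2023-06-09", "2023-06-09",
              "2023-06-09", "2023-06-09", "2023-03-05")
)
df2 <- data.frame(
  type    = c("single", "couple", "single"),
  name    = c("ZIA", "LIA", "DIA"),
  subject = c("vand", "jav", "haap"),
  board   = c(NA, "EWW", NA),
  date    = c(NA, "2023-07-17", NA)
)

# One row per subject: keep only the key and the columns to fill (assumed names).
lookup <- unique(df1[c("subject", "board", "date")])

# rows_patch() only fills cells that are NA in df2; existing values are kept.
# unmatched = "ignore" skips subjects in lookup that do not occur in df2.
res <- rows_patch(df2, lookup, by = "subject", unmatched = "ignore")
res
```

With this variant the jav row keeps its original board and date, and only the vand and haap rows are filled from df1.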

Comments

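0 upvotes Rara 7/25/2023
Thanks for the great solution. However, I should note that in my real data df1 contains additional columns that are not present in df2, and unfortunately the code throws this error: `y` key values must be unique.
0 upvotes Maël 7/25/2023
Then you can create another df1 that contains only the columns that are in df2.
0 upvotes Rara 7/25/2023
I got the desired output simply by selecting the shared columns in df1. Thanks for the suggestion.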
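
The fix described in this comment thread can be sketched as follows. This is an illustration under stated assumptions, not the commenter's actual code: it assumes dplyr >= 1.1.0, uses a shortened three-row copy of df2, and the names shared and lookup are hypothetical. It also shows why the error appears: because nature differs on every row, unique(df1) removes nothing and vand remains a duplicated key.

```r
library(dplyr)

# df1 as in the question (it has an extra column, nature, that df2 lacks).
df1 <- data.frame(
  nature  = c("sop", "dior", "coats", "sem", "wia", "bodo"),
  subject = c("gank", "vand", "vand", "jav", "vand", "haap"),
  board   = c("REW", "EWW", "EWW", "SSD", "EWW", "MMB"),
  date    = c("2023-07-12", "2023-06-09", "2023-06-09",
              "2023-06-09", "2023-06-09", "2023-03-05")
)
# df2 shortened to three rows for illustration.
df2 <- data.frame(
  type    = c("single", "couple", "single"),
  name    = c("ZIA", "LIA", "DIA"),
  subject = c("vand", "jav", "haap"),
  board   = c(NA, "EWW", NA),
  date    = c(NA, "2023-07-17", NA)
)

# With the extra column every row is distinct, so the key stays duplicated.
n_unique_all <- nrow(unique(df1))

# Keep only the columns that both data frames share, then deduplicate.
shared <- intersect(names(df1), names(df2))
lookup <- distinct(df1[shared])

# Guard: the key must now be unique, otherwise rows_update() would fail.
stopifnot(!anyDuplicated(lookup$subject))

res <- rows_update(df2, lookup, by = "subject", unmatched = "ignore")
res
```

Passing by = "subject" explicitly avoids relying on the default key, which is the first column of the second data frame.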