Counter in new column if a number or string matches from two data-frames R

Asked by: Usman YousafZai Asked: 6/20/2023 Updated: 6/20/2023 Views: 25

Q:

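I will try to give the clearest example I can. I have two data frames: df1 and df2. If a Drug_Code in df1 matches a Drug_Code in df2, a new column called Drug_count should be created in df1 and incremented by 1. If the same Pat_ID has two different drugs, that should count as 2. Each Drug_Code has a unique Drug_Names value. The column values for each Pat_ID should also be merged into one row, separated by a comma ",". Below are examples of df1, df2 and the resultant df. Thanks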

df1

Pat_ID       Date       Drug_Code     Drug_Names
1         2010-12-09     1.1.1          Alpha-21
1         2010-12-15     1.1.2          Alpha
1         2010-12-15     1.1.3          Beta
1         2010-12-15     1.1.3          Beta-29
2         2010-12-09     1.1.3          Beta
2         2010-12-17     1.1.4          Beta-32
1         2010-12-15     1.1.3          Beta
3         2011-02-09     1.2.1          Gamma-21
3         2011-04-25     1.2.2          Gamma
3         2011-04-25     1.3.1          Tango

df2

Drug_Code     Drug_Names
1.1.1           Alpha-21
1.1.2           Alpha
1.1.3           Beta
1.1.4           Beta-2
1.2.1           Gamma-21
1.2.2           Gamma
1.3.1           Tango

Resultant df

Pat_ID  Date                                 Drug_Code            Drug_Names             Count
1       2010-12-09, 2010-12-15, 2010-12-15   1.1.1, 1.1.2, 1.1.3  Alpha-21, Alpha, Beta  3
2       2010-12-09, 2010-12-15               1.1.3, 1.2.1         Beta, Gamma-21         2
3       2011-02-09, 2011-04-25               1.2.1, 1.2.2         Gamma-21, Gamma        2

Tags: r, dataframe, merge, match

A:

1 upvote Phoenix 6/20/2023 #1

This should work:

df1 <- data.frame(
  Pat_ID = c(1, 1, 1, 1, 2, 2, 1, 3, 3, 3),
  Date = c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-15", "2010-12-09",
           "2010-12-17", "2010-12-15", "2011-02-09", "2011-04-25", "2011-04-25"),
  Drug_Code = c("1.1.1", "1.1.2", "1.1.3", "1.1.3", "1.1.3", "1.1.4", "1.1.3", "1.2.1",
                "1.2.2", "1.3.1"),
  Drug_Names = c("Alpha-21", "Alpha", "Beta", "Beta-29", "Beta", "Beta-32", "Beta",
                 "Gamma-21", "Gamma", "Tango")
)

df2 <- data.frame(
  Drug_Code = c("1.1.1", "1.1.2", "1.1.3", "1.1.4", "1.2.1", "1.2.2", "1.3.1"),
  Drug_Names = c("Alpha-21", "Alpha", "Beta", "Beta-2", "Gamma-21", "Gamma", "Tango")
)
library(dplyr)

merged_df <- df1 %>%
  left_join(df2, by = "Drug_Code")

result_df <- merged_df %>%
  group_by(Pat_ID) %>%
  summarise(
    Date = paste(Date, collapse = ", "),
    Drug_Code = paste(Drug_Code, collapse = ", "),
    Drug_Names = paste(`Drug_Names.x`, collapse = ", "),  # Drug_Names.x is df1's column after the join
    Count = n_distinct(Drug_Code)  # Drug_Code is already one collapsed string here, so this is always 1
  )

print(result_df)

The output will be:

  Pat_ID Date                                         Drug_Code Drug_Names Count
   <dbl> <chr>                                        <chr>     <chr>      <int>
1      1 2010-12-09, 2010-12-15, 2010-12-15, 2010-12~ 1.1.1, 1~ Alpha-21,~     1
2      2 2010-12-09, 2010-12-17                       1.1.3, 1~ Beta, Bet~     1
3      3 2011-02-09, 2011-04-25, 2011-04-25           1.2.1, 1~ Gamma-21,~     1

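Comments

0 upvotes Melissa Key 6/20/2023
In this solution, the count is 1 for all 3 patients. You need to count before collapsing the drug codes. You could also use paste with across.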
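
A minimal sketch of what this comment seems to suggest, not a definitive fix. It assumes that "matches" means the Drug_Code appears in df2 (done here with `semi_join`), that repeated codes for the same Pat_ID should be listed once (done with `distinct`), and that the drug names should come from df1. The name `result_fixed` is only illustrative.

```r
library(dplyr)

# Sample data copied from the question
df1 <- data.frame(
  Pat_ID = c(1, 1, 1, 1, 2, 2, 1, 3, 3, 3),
  Date = c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-15", "2010-12-09",
           "2010-12-17", "2010-12-15", "2011-02-09", "2011-04-25", "2011-04-25"),
  Drug_Code = c("1.1.1", "1.1.2", "1.1.3", "1.1.3", "1.1.3", "1.1.4", "1.1.3", "1.2.1",
                "1.2.2", "1.3.1"),
  Drug_Names = c("Alpha-21", "Alpha", "Beta", "Beta-29", "Beta", "Beta-32", "Beta",
                 "Gamma-21", "Gamma", "Tango")
)

df2 <- data.frame(
  Drug_Code = c("1.1.1", "1.1.2", "1.1.3", "1.1.4", "1.2.1", "1.2.2", "1.3.1"),
  Drug_Names = c("Alpha-21", "Alpha", "Beta", "Beta-2", "Gamma-21", "Gamma", "Tango")
)

result_fixed <- df1 %>%
  # Assumption: a "match" means the code exists in df2; semi_join keeps only those rows
  semi_join(df2, by = "Drug_Code") %>%
  # Assumption: keep the first row for each patient/code pair
  distinct(Pat_ID, Drug_Code, .keep_all = TRUE) %>%
  group_by(Pat_ID) %>%
  summarise(
    # Count first, while Drug_Code is still one value per row
    Count = n_distinct(Drug_Code),
    # Then collapse every column with a single paste() via across()
    across(c(Date, Drug_Code, Drug_Names), ~ paste(.x, collapse = ", "))
  ) %>%
  relocate(Count, .after = Drug_Names)

print(result_fixed)
```

With the sample data, this gives counts of 3, 2 and 3 for Pat_ID 1, 2 and 3. That differs from the question's resultant table for patients 2 and 3, whose rows do not match the df1 shown, so the assumptions above may need adjusting.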