Finding Overlapping Dates for Matching Brands in R

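Question:

I have an R data frame and am trying to find rows where the entries in the column data_recalls$Attributed_Brand match and the time periods overlap. The time periods are given by the columns Before_Date_Recall and After_Date_Recall. A possible match looks like this:

(1) data_recalls$Attributed_Brand: Nike; Before_Date_Recall: 2018-09-22; After_Date_Recall: 2018-10-21

(2) data_recalls$Attributed_Brand: Nike; Before_Date_Recall: 2018-10-19; After_Date_Recall: 2018-10-24

Unfortunately, the only solution I could come up with first looks for matches in the column data_recalls$Attributed_Brand and then looks for overlapping dates, which does not produce the desired result. My data frame looks like this: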

data_recalls <- data.frame(Attributed_Brand = c("Nike", "Adidas", "Nike", "Puma"),
                       Before_Date_Recall = c("2018-09-22", "2018-09-20", "2018-10-19", "2018-11-01"),
                       After_Date_Recall = c("2018-10-21", "2018-10-24", "2018-10-31", "2018-11-10"),
                       stringsAsFactors = FALSE)

Thanks for any suggestions!

Tags: r, loops, overlap, matching

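This sounds like a non-equi join, using one of: data.table merges (always), dplyr::join_by (dplyr >= 1.1.0), fuzzyjoin::, or sqldf::. Without any sample data or anything else to go on, that is about as much as I can suggest. See stackoverflow.com/q/64362881/3358272 and stackoverflow.com/q/64539945/3358272. If you need more help here, I suggest making the question reproducible with sample data and code; please read stackoverflow.com/q/5963269, minimal reproducible example, and stackoverflow.com/tags/r/info.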
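As a rough illustration of the dplyr::join_by route mentioned in this comment, here is a minimal sketch of a self-join using overlaps(). It assumes dplyr >= 1.1.0, converts the question's dates to Date class, and adds a helper column id (not in the original data) so that each row's match with itself can be dropped. overlaps() uses inclusive bounds by default.

```r
library(dplyr)

# Sample data from the question, with the dates converted to Date class.
data_recalls <- data.frame(
  Attributed_Brand   = c("Nike", "Adidas", "Nike", "Puma"),
  Before_Date_Recall = as.Date(c("2018-09-22", "2018-09-20", "2018-10-19", "2018-11-01")),
  After_Date_Recall  = as.Date(c("2018-10-21", "2018-10-24", "2018-10-31", "2018-11-10"))
)

# Assumption: a row id is added so self-matches can be filtered out.
data_recalls$id <- seq_len(nrow(data_recalls))

overlap_pairs <- data_recalls %>%
  inner_join(
    data_recalls,
    by = join_by(
      Attributed_Brand,
      # x$ and y$ disambiguate the identically named columns of the two copies.
      overlaps(x$Before_Date_Recall, x$After_Date_Recall,
               y$Before_Date_Recall, y$After_Date_Recall)
    )
  ) %>%
  # Every row overlaps itself, so keep only pairs of different rows.
  filter(id.x != id.y)

overlap_pairs
```

Each overlapping pair appears twice (once in each direction); distinct(overlap_pairs, id.x) would reduce that to the rows involved.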

0 upvotes Niklas Schröder 3/23/2023

Thanks for your reply! I have added some sample data. The code I tried is below; however, it does not find any overlapping dates where the brands in data_recalls$Attributed_Brand match.

library(dplyr)
data_recalls %>%
  group_by(Attributed_Brand) %>%
  filter(
    any(Before_Date_Recall <= max(Before_Date_Recall)) &
      any(After_Date_Recall >= min(After_Date_Recall))
  )

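Answers:

0 upvotes Davis Vaughan 3/23/2023 #1

It sounds like the ivs package would be a good fit for you. The group column lets you identify overlapping rows within each brand name.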

library(ivs)
library(dplyr, warn.conflicts = FALSE)

data_recalls <- tibble(
  Attributed_Brand = c(
    "Nike", "Nike", "Nike", 
    "Adidas", "Adidas", "Adidas", "Adidas"
  ),
  Before_Date_Recall = as.Date(c(
    "2018-09-22", "2018-10-19", "2018-10-26",
    "2018-09-01", "2018-10-05", "2018-09-03", "2018-10-02"
  )),
  After_Date_Recall = as.Date(c(
    "2018-10-21", "2018-10-24", "2018-10-30",
    "2018-09-05", "2018-10-15", "2018-09-07", "2018-10-10"
  ))
)

data_recalls %>%
  mutate(Date_Recall = iv(Before_Date_Recall, After_Date_Recall), .keep = "unused") %>%
  mutate(group = iv_identify_group(Date_Recall), .by = Attributed_Brand) %>%
  filter(n() > 1, .by = c(Attributed_Brand, group))
#> # A tibble: 6 × 3
#>   Attributed_Brand              Date_Recall                    group
#>   <chr>                          <iv<date>>               <iv<date>>
#> 1 Nike             [2018-09-22, 2018-10-21) [2018-09-22, 2018-10-24)
#> 2 Nike             [2018-10-19, 2018-10-24) [2018-09-22, 2018-10-24)
#> 3 Adidas           [2018-09-01, 2018-09-05) [2018-09-01, 2018-09-07)
#> 4 Adidas           [2018-10-05, 2018-10-15) [2018-10-02, 2018-10-15)
#> 5 Adidas           [2018-09-03, 2018-09-07) [2018-09-01, 2018-09-07)
#> 6 Adidas           [2018-10-02, 2018-10-10) [2018-10-02, 2018-10-15)

Here is another approach that may be faster, depending on the structure of your actual dataset:

data_recalls %>%
  mutate(Date_Recall = iv(Before_Date_Recall, After_Date_Recall), .keep = "unused") %>%
  mutate(count = iv_count_overlaps(Date_Recall, Date_Recall), .by = Attributed_Brand) %>%
  filter(count > 1)
#> # A tibble: 6 × 3
#>   Attributed_Brand              Date_Recall count
#>   <chr>                          <iv<date>> <int>
#> 1 Nike             [2018-09-22, 2018-10-21)     2
#> 2 Nike             [2018-10-19, 2018-10-24)     2
#> 3 Adidas           [2018-09-01, 2018-09-05)     2
#> 4 Adidas           [2018-10-05, 2018-10-15)     2
#> 5 Adidas           [2018-09-03, 2018-09-07)     2
#> 6 Adidas           [2018-10-02, 2018-10-10)     2
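One detail worth checking against your own data: the printed intervals above are right-open, written [start, end), so two recalls that merely touch at an endpoint are not counted as overlapping. The sketch below illustrates this with two made-up intervals (the dates are chosen for illustration only). Shifting the end by one day is an assumption about how After_Date_Recall should be read, not something stated in the question.

```r
library(ivs)

# Two illustrative intervals: the first ends on the day the second starts.
first  <- iv(as.Date("2018-09-22"), as.Date("2018-10-21"))
second <- iv(as.Date("2018-10-21"), as.Date("2018-10-24"))

# With right-open intervals, touching at an endpoint is not an overlap.
touching <- iv_overlaps(first, second)

# Assumption: if After_Date_Recall is meant to be inclusive,
# extend the end by one day before building the interval.
first_inclusive <- iv(as.Date("2018-09-22"), as.Date("2018-10-21") + 1)
inclusive <- iv_overlaps(first_inclusive, second)

c(touching = touching, inclusive = inclusive)
```

Here touching should be FALSE and inclusive should be TRUE.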

df_matches <- data.frame(
  Attributed_Brand = c("Adidas", "Adidas", "Adidas", "Adidas", "Nike", "Nike"),
  Before_Date_Recall = as.Date(c("2018-10-05", "2018-10-02", "2018-09-03", "2018-09-01", "2018-10-19", "2018-09-22")),
  After_Date_Recall = as.Date(c("2018-10-15", "2018-10-10", "2018-09-07", "2018-09-05", "2018-10-24", "2018-10-21"))
)

0 upvotes Davis Vaughan 3/23/2023

I have updated the answer with an extra filter() that should produce what you are looking for.

0 upvotes r2evans 3/23/2023 #2

Try this:

sqldf::sqldf("
  select distinct t1.*
  from data_recalls t1
    inner join data_recalls t2 on t1.Attributed_Brand = t2.Attributed_Brand
      and ((t1.Before_Date_Recall > t2.Before_Date_Recall and t1.Before_Date_Recall < t2.After_Date_Recall)
           or (t1.After_Date_Recall > t2.Before_Date_Recall and t1.After_Date_Recall < t2.After_Date_Recall))")
#   Attributed_Brand Before_Date_Recall After_Date_Recall
# 1             Nike         2018-09-22        2018-10-21
# 2             Nike         2018-10-19        2018-10-31

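I first tried SQL's between, but it is inclusive, which causes every row to match itself. We can prevent that by adding a few more conditions, but this is a start.

To allow for overlap, your data needs a unique ID for each row:

data_recalls$id <- seq_len(nrow(data_recalls))

From there, we can use between and then add an inequality on the id field.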

sqldf::sqldf("
  select distinct t1.*
  from data_recalls t1
    inner join data_recalls t2 on t1.Attributed_Brand = t2.Attributed_Brand
      and (t1.Before_Date_Recall between t2.Before_Date_Recall and t2.After_Date_Recall
           or t1.After_Date_Recall between t2.Before_Date_Recall and t2.After_Date_Recall)
      and t1.id <> t2.id")
#   Attributed_Brand Before_Date_Recall After_Date_Recall id
# 1             Nike         2018-09-22        2018-10-21  1
# 2             Nike         2018-10-19        2018-10-31  3
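For readers who prefer to avoid a SQL dependency, the same self-join idea can be sketched in base R with merge(). This is not the code from the answer above: it swaps in the symmetric overlap test (start of one period <= end of the other, and vice versa), which also catches the case where one period fully contains another. The Date conversion and the id column are assumptions added for the sketch.

```r
# Sample data from the question, with the dates converted to Date class.
data_recalls <- data.frame(
  Attributed_Brand   = c("Nike", "Adidas", "Nike", "Puma"),
  Before_Date_Recall = as.Date(c("2018-09-22", "2018-09-20", "2018-10-19", "2018-11-01")),
  After_Date_Recall  = as.Date(c("2018-10-21", "2018-10-24", "2018-10-31", "2018-11-10"))
)
data_recalls$id <- seq_len(nrow(data_recalls))

# Pair every row with every row of the same brand.
pairs <- merge(data_recalls, data_recalls,
               by = "Attributed_Brand", suffixes = c(".x", ".y"))

# Two inclusive periods overlap when each starts no later than the other ends;
# id.x != id.y removes each row's match with itself.
keep <- pairs$id.x != pairs$id.y &
  pairs$Before_Date_Recall.x <= pairs$After_Date_Recall.y &
  pairs$After_Date_Recall.x >= pairs$Before_Date_Recall.y

overlapping <- data_recalls[data_recalls$id %in% pairs$id.x[keep], ]
overlapping
```

With the question's data this keeps the two Nike rows (id 1 and 3).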

data_recalls <- data.frame(
  Attributed_Brand = c("Adidas", "Adidas", "Nike", "Nike"),
  Before_Date_Recall = c("2018-09-22", "2018-09-20", "2018-11-02", "2018-11-03"),
  After_Date_Recall = c("2018-10-21", "2018-10-24", "2018-11-10", "2018-11-09"),
  stringsAsFactors = FALSE
)

0 upvotes r2evans 3/23/2023

Perhaps stackoverflow.com/a/74679562/3358272
