Separate intersecting time intervals by number of segments (R)

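Question:

I am working with a dataset of time intervals, some of which overlap. I would like to take the raw interval data and break it into consecutive intervals according to the number of overlaps. The toy data below contains 3 intervals. The output I want is a data frame with a start and stop for the stretch where only ID 1 is present, then a start and stop where IDs 1 and 2 intersect, then a start and stop where IDs 1-3 intersect, then a start and stop where IDs 1 and 3 intersect, and finally a start and stop for the remainder of ID 1.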

library(lubridate)
library(ggplot2)

df <- structure(list(ID = 1:3, Start = structure(c(1690740180, 1690740480, 
1690741380), class = c("POSIXct", "POSIXt"), tzone = "America/Iqaluit"), 
    End = structure(c(1690751520, 1690742140, 1690742280), class = c("POSIXct", 
    "POSIXt"), tzone = "America/Iqaluit")), row.names = 3:5, class = "data.frame")

ggplot(df) + geom_segment(aes(x = Start, xend = End, y = as.factor(ID), yend = as.factor(ID)))

The desired output should look like this:

  Intervals               Start                 End
         1 2023-07-30 14:03:00 2023-07-30 14:07:59
         2 2023-07-30 14:08:00 2023-07-30 14:22:59
         3 2023-07-30 14:23:00 2023-07-30 14:35:40
         2 2023-07-30 14:35:40 2023-07-30 14:38:00
         1 2023-07-30 14:38:00 2023-07-30 15:06:40

I can do this by interpolating the data to 1-second resolution and checking for intersections, but I am hoping for a cleaner solution.

Tags: r, intervals, lubridate

Answer:

alltimes <- unique(sort(c(df$Start, df$End)))
intervals <- sapply(alltimes[-length(alltimes)],
                    function(tm) df$Start <= tm & tm < df$End)
intervals
#       [,1]  [,2] [,3]  [,4]  [,5]
# [1,]  TRUE  TRUE TRUE  TRUE  TRUE
# [2,] FALSE  TRUE TRUE FALSE FALSE
# [3,] FALSE FALSE TRUE  TRUE FALSE

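In the intervals matrix, each row corresponds to a row of the original df, each column is a time period, and the value indicates whether that df row covers that time period. We can take the sum of each column to create the Intervals column, and the Start and End columns are then just consecutive pairs from our alltimes vector.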

data.frame(
  Intervals = colSums(intervals),
  Start = alltimes[-length(alltimes)],
  End = alltimes[-1]
)
#   Intervals               Start                 End
# 1         1 2023-07-30 14:03:00 2023-07-30 14:08:00
# 2         2 2023-07-30 14:08:00 2023-07-30 14:23:00
# 3         3 2023-07-30 14:23:00 2023-07-30 14:35:40
# 4         2 2023-07-30 14:35:40 2023-07-30 14:38:00
# 5         1 2023-07-30 14:38:00 2023-07-30 17:12:00
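Since the question describes the segments in terms of which IDs overlap, the same matrix can also report the contributing IDs for each segment. The following is only a sketch in base R, not part of the answer above: the function name split_by_overlap and the IDs column are made up for illustration, and it uses outer() on the numeric timestamps in place of sapply() so that the result is a matrix even when df has a single row.

```r
# Toy data from the question
df <- data.frame(
  ID = 1:3,
  Start = structure(c(1690740180, 1690740480, 1690741380),
                    class = c("POSIXct", "POSIXt"), tzone = "America/Iqaluit"),
  End = structure(c(1690751520, 1690742140, 1690742280),
                  class = c("POSIXct", "POSIXt"), tzone = "America/Iqaluit")
)

# Hypothetical helper: split into segments, count overlaps, list the IDs
split_by_overlap <- function(df) {
  alltimes <- sort(unique(c(df$Start, df$End)))
  seg_start <- alltimes[-length(alltimes)]
  seg_end <- alltimes[-1]

  # covers[i, j] is TRUE when row i of df covers the segment starting at seg_start[j]
  covers <- outer(as.numeric(df$Start), as.numeric(seg_start), "<=") &
    outer(as.numeric(df$End), as.numeric(seg_start), ">")

  res <- data.frame(
    Intervals = colSums(covers),
    IDs = apply(covers, 2, function(hit) paste(df$ID[hit], collapse = ",")),
    Start = seg_start,
    End = seg_end,
    stringsAsFactors = FALSE
  )

  # Drop gaps between disjoint intervals (segments covered by no row)
  res[res$Intervals > 0, ]
}

res <- split_by_overlap(df)
res[, c("Intervals", "IDs")]
```

For the toy data, the IDs column should read 1, then 1,2, then 1,2,3, then 1,3, then 1, matching the sequence described in the question.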

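I'm not sure whether each new End should be the same as the next Start or offset by one second; your expected output uses both. Also, I don't know where the 15:06:40 in your last row comes from (it is not in your original data); I suspect it is an artifact of your real data rather than of the sample.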
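If the offset-by-one-second convention is what you want, one possible sketch is to subtract a second from every End except the last. This assumes whole-second timestamps and that the final segment should keep its original End; both are guesses about the intended output, and the name End_closed is made up here.

```r
# Sorted unique breakpoints from the toy data (same values as alltimes)
alltimes <- structure(
  c(1690740180, 1690740480, 1690741380, 1690742140, 1690742280, 1690751520),
  class = c("POSIXct", "POSIXt"), tzone = "America/Iqaluit"
)

Start <- alltimes[-length(alltimes)]
End <- alltimes[-1]

# Subtracting a number from a POSIXct value shifts it by that many seconds.
# Every End except the last moves back one second, so that no segment
# ends at the same instant the next one starts.
is_last <- seq_along(End) == length(End)
End_closed <- End - ifelse(is_last, 0, 1)

data.frame(Start = Start, End = End_closed)
```

With the half-open convention used in the answer, no such adjustment is needed, because each End is already excluded from its own segment.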

Answer:

library(ivs)
library(dplyr, warn.conflicts = FALSE)

start <- structure(
  c(1690740180, 1690740480, 1690741380),
  class = c("POSIXct", "POSIXt"),
  tzone = "America/Iqaluit"
)
end <- structure(
  c(1690751520, 1690742140, 1690742280),
  class = c("POSIXct", "POSIXt"),
  tzone = "America/Iqaluit"
)

x <- iv(start, end)
x
#> <iv<datetime<America/Iqaluit>>[3]>
#> [1] [2023-07-30 14:03:00, 2023-07-30 17:12:00)
#> [2] [2023-07-30 14:08:00, 2023-07-30 14:35:40)
#> [3] [2023-07-30 14:23:00, 2023-07-30 14:38:00)

iv_locate_splits(x) |>
  as_tibble() |>
  mutate(count = lengths(loc))
#> # A tibble: 5 × 3
#>                                          key loc       count
#>                                   <iv<dttm>> <list>    <int>
#> 1 [2023-07-30 14:03:00, 2023-07-30 14:08:00) <int [1]>     1
#> 2 [2023-07-30 14:08:00, 2023-07-30 14:23:00) <int [2]>     2
#> 3 [2023-07-30 14:23:00, 2023-07-30 14:35:40) <int [3]>     3
#> 4 [2023-07-30 14:35:40, 2023-07-30 14:38:00) <int [2]>     2
#> 5 [2023-07-30 14:38:00, 2023-07-30 17:12:00) <int [1]>     1

If you need to get the start/end out of the key column, use iv_start() and iv_end().
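As a rough sketch of that last step, assuming the ivs package is installed, the split result can be flattened into the Intervals/Start/End layout the question asks for. The column names of the final data frame are chosen here to match the question; they are not something ivs produces.

```r
library(ivs)

start <- structure(
  c(1690740180, 1690740480, 1690741380),
  class = c("POSIXct", "POSIXt"),
  tzone = "America/Iqaluit"
)
end <- structure(
  c(1690751520, 1690742140, 1690742280),
  class = c("POSIXct", "POSIXt"),
  tzone = "America/Iqaluit"
)

x <- iv(start, end)

# One row per split: `key` is the sub-interval, `loc` lists the rows of x covering it
splits <- iv_locate_splits(x)

res <- data.frame(
  Intervals = lengths(splits$loc),   # number of overlapping original intervals
  Start = iv_start(splits$key),      # inclusive start of each sub-interval
  End = iv_end(splits$key)           # exclusive end of each sub-interval
)
res
```

The intervals are half-open, so each End equals the next Start here, as in the printed keys above.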
