How to cut datetime rows by midnight?

Asked by: diggi2395 Asked: 10/19/2023 Last edited by: diggi2395 Updated: 10/20/2023 Views: 67

Q:

My DF looks like this:

event           start.time      end.time
connected       04.08.21 20:48  04.08.21 20:59
disconnected    04.08.21 20:59  06.08.21 14:29

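How can I split the rows at midnight, like this? Note that the number of days spanned can vary: in my example it is August 4 to 6, but it could just as well be September 10 to 17. A tidy solution would be much appreciated!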

event           start.time      end.time
connected       04.08.21 20:48  04.08.21 20:59
disconnected    04.08.21 20:59  04.08.21 23:59
disconnected    05.08.21 00:00  05.08.21 23:59
disconnected    06.08.21 00:00  06.08.21 14:29

Here is my reproducible code:

data <- data.frame(
  event = c("connected", "disconnected"),
  start.time = c("04.08.21 20:48", "04.08.21 20:59"),
  end.time = c("04.08.21 20:59", "06.08.21 14:29")
)
r datetime


A:

1 upvote jay.sf 10/19/2023 #1

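Using by, you can build a seq.POSIXt sequence in "min" steps for each event. In the sequence, look for where midnight, or midnight minus one minute, is not duplicated. Replace the last 23:59 value with the last observed timestamp.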

data[-1] <- lapply(data[-1], strptime, format='%d.%m.%y %H:%M')  ## format time first

by(data, data$event, \(x) {
  sq <- seq.POSIXt(x[[2]], x[[3]], 'min')
  r <- data.frame(sq[!duplicated(strptime(sq, '%F'))], 
                  c(sq[!duplicated(strptime(sq, '%F 23:59'))][-1],  as.POSIXct(x[[3]]))) |> 
    setNames(names(x)[-1])
}) |> c(make.row.names=F) |> do.call(what='rbind')
#            start.time            end.time
# 1 2021-08-04 20:48:00 2021-08-04 20:59:00
# 2 2021-08-04 20:59:00 2021-08-04 23:59:00
# 3 2021-08-05 00:00:00 2021-08-05 23:59:00
# 4 2021-08-06 00:00:00 2021-08-06 14:29:00
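The output above no longer carries the event column. As a minimal sketch of a variant that keeps it, the following builds the day boundaries directly with a daily sequence instead of a per-minute one. The helper name split_at_midnight is hypothetical (it is not from the answer), and parsing with tz = "UTC" is an assumption made here to sidestep daylight-saving shifts.

```r
# Hypothetical helper (not from the answer): split one interval at midnight.
split_at_midnight <- function(event, start, end) {
  # Midnight of the first and last calendar day touched by the interval
  first_day <- as.POSIXct(trunc(start, "days"))
  last_day  <- as.POSIXct(trunc(end, "days"))
  days <- seq(first_day, last_day, by = "day")

  # Pieces start at the original start, then at each following midnight
  starts <- c(start, days[-1])
  # Pieces end one minute before each following midnight, then at the original end
  ends <- c(days[-1] - 60, end)

  data.frame(event = event, start.time = starts, end.time = ends)
}

data <- data.frame(
  event = c("connected", "disconnected"),
  start.time = c("04.08.21 20:48", "04.08.21 20:59"),
  end.time = c("04.08.21 20:59", "06.08.21 14:29")
)
# Assumption: times are interpreted as UTC
data[-1] <- lapply(data[-1], as.POSIXct, format = "%d.%m.%y %H:%M", tz = "UTC")

# Split every row and stack the pieces
pieces <- lapply(seq_len(nrow(data)), function(i) {
  split_at_midnight(data$event[i], data$start.time[i], data$end.time[i])
})
res <- do.call(rbind, pieces)
res
```

Because only one timestamp per day is generated, this should stay cheap even for intervals that span many days.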

For older R versions:

do.call(c(by(data, data$event, function(x) {
    sq <- seq.POSIXt(x[[2]], x[[3]], "min")
    r <- setNames(data.frame(sq[!duplicated(strptime(sq, "%F"))], 
        c(sq[!duplicated(strptime(sq, "%F 23:59"))][-1], as.POSIXct(x[[3]]))), 
        names(x)[-1])
}), make.row.names = F), what = "rbind")
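To see why the duplicated() step picks out the day boundaries, here is a small illustration on a four-minute sequence that crosses midnight. It uses format() rather than strptime() to extract the day and the 23:59 marker, which is a simplification of the approach above, and tz = "UTC" is an assumption.

```r
# Four one-minute steps across midnight (UTC is an assumption)
sq <- seq(as.POSIXct("2021-08-04 23:58", tz = "UTC"),
          as.POSIXct("2021-08-05 00:01", tz = "UTC"), by = "min")

# Calendar day of each timestamp
day <- format(sq, "%Y-%m-%d", tz = "UTC")

# First timestamp seen on each day: the interval start, then each midnight
first_of_day <- sq[!duplicated(day)]

# The 23:59 timestamps, which close every day that continues past midnight
last_minute <- sq[format(sq, "%H:%M", tz = "UTC") == "23:59"]

format(first_of_day, "%Y-%m-%d %H:%M", tz = "UTC")
# "2021-08-04 23:58" "2021-08-05 00:00"
format(last_minute, "%Y-%m-%d %H:%M", tz = "UTC")
# "2021-08-04 23:59"
```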

Data:

data <- structure(list(event = c("connected", "disconnected"), start.time = c("04.08.21 20:48", 
"04.08.21 20:59"), end.time = c("04.08.21 20:59", "06.08.21 14:29"
)), class = "data.frame", row.names = c(NA, -2L))

Comments

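0 upvotes diggi2395 10/20/2023
Thanks for your help! Could I ask you for a solution that also works with tibbles? Your code only works with structure(). Also, where does the syntax (e.g., \(x) or "|>") come from?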
1 upvote jay.sf 10/20/2023
@diggi2395 It looks to me like you are using a very outdated R version. I have added alternative code. It is best to keep R up to date.
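
For context on the syntax question: \(x) is shorthand for function(x), and |> is the native pipe. Both were introduced in R 4.1.0, which is why they fail on older installations. A small illustration, with function names chosen only for demonstration:

```r
# \(x) is shorthand for function(x); both define the same anonymous function
square_new <- \(x) x^2
square_old <- function(x) x^2

# x |> f(y) is rewritten by the parser to f(x, y)
piped  <- c(1, 2, 3) |> sapply(square_new) |> sum()
nested <- sum(sapply(c(1, 2, 3), square_old))

identical(piped, nested)
# TRUE
```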
0 upvotes diggi2395 10/20/2023 #2

Thanks to jay.sf's answer, I was able to replicate the logic and create my preferred tidyverse solution:

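# Packages used below: dplyr (pipeline verbs), purrr (map), tidyr (unnest), lubridate (hms)
library(dplyr)
library(purrr)
library(tidyr)
library(lubridate)

data <- data.frame(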
  event = c("connected", "disconnected"),
  start.time = c("04.08.21 20:48", "04.08.21 20:59"),
  end.time = c("04.08.21 20:59", "06.08.21 14:29")
)
data[-1] <- lapply(data[-1], function(x) as.POSIXct(strptime(x, format='%d.%m.%y %H:%M')))

data %>%
  # Group data by event
  group_by(event) %>%
  rowwise() %>%
  # Create rows when times are "overnight"
  mutate(timestamp_seq = list(seq(from = start.time, to = end.time, by = "secs"))) %>%
  ungroup() %>%
  mutate(timestamp_seq = map(timestamp_seq, ~ .x[!duplicated(as.Date(.x)) | (as.POSIXct(.x) == max(.x))])) %>%
  unnest(timestamp_seq) %>%
  filter(!start.time==timestamp_seq) %>%
  # Correct the times
  mutate(start.time = if_else(row_number() == 1, start.time, lag(timestamp_seq)),
         tmp = if_else(lead(hms(format(start.time, format = "%H:%M:%S"))) == hms("00:00:00"), timestamp_seq, end.time),
         end.time = if_else(!is.na(tmp), tmp, end.time)) %>%
  select(-c(timestamp_seq, tmp))