Calculating the time length of each binary/boolean column as reference

Asked by: ambergris  Asked: 5/12/2022  Last edited by: Otto Kässi  Updated: 5/12/2022  Views: 92

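Q:

I have two columns. For a series of data, one is listed as True/False. The whole dataset also has a timestamp column. I want to write code that detects when the boolean column turns true, then measures the time from the timestamp column until the boolean turns back to false. This should repeat over the whole series, and the resulting times should be binned in a data frame as a histogram. Apologies for the poor attempt, I really don't know where to start. Note that the running column is stored as characters. Maybe I need to convert it to boolean for this to work?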

running  <- c("t","t","f","f","t","f","t","t")
time <- c("2022-01-01 00:00:10", "2022-01-01 00:00:20","2022-01-01 00:00:30","2022-01-01 00:00:40","2022-01-01 00:00:50","2022-01-01 00:01:00","2022-01-01 00:01:10","2022-01-01 00:01:20")
dataset <- data.frame(time, running)

datafinal <- data.frame()    
for (i in dataset){
   if running == f,
   result <- sum(i:n)
datafinal <- c(datafinal, result)
}
r for-loop histogram logic


A:

0 votes RobertoT 5/12/2022 #1

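Converting the running column to boolean and using a for loop is one way to do it. However, you can also operate directly on the data frame. You already have one! Here is a solution using the tidyverse library plus some date handling from the lubridate library. I encourage you to learn these libraries for this kind of problem.

The rleid() function from the data.table library adds +1 every time the value in the target column running changes.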
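On the question of converting the character column to boolean: a minimal sketch, assuming the values are exactly "t" and "f" as in the example data. The name is_running is only illustrative and does not appear in the original post.

```r
running <- c("t", "t", "f", "f", "t", "f", "t", "t")

# Comparing against "t" yields a logical (TRUE/FALSE) vector
is_running <- running == "t"

# Number of rows in which running is TRUE
n_true <- sum(is_running)
```

The solution below works on the character values directly, so this conversion is optional there.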
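To see what rleid() produces on its own, here is a small sketch applied to the example vector. The base R lines using rle() are an assumed equivalent added for comparison, not part of the original answer.

```r
running <- c("t", "t", "f", "f", "t", "f", "t", "t")

# Run-length ids: the counter increases each time the value changes
grp <- data.table::rleid(running)
print(grp)  # 1 1 2 2 3 4 5 5

# Base R equivalent built from rle(), for readers without data.table
r <- rle(running)
grp_base <- rep(seq_along(r$lengths), times = r$lengths)
```

Rows that share an id belong to the same uninterrupted state, which is what the group_by(Grp) step relies on.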

running  <- c("t","t","f","f","t","f","t","t")
time <- c("2022-01-01 00:00:10", "2022-01-01 00:00:20","2022-01-01 00:00:30","2022-01-01 00:00:40","2022-01-01 00:00:50","2022-01-01 00:01:00","2022-01-01 00:01:10","2022-01-01 00:01:20")
dataset <- data.frame(time, running)

# times to date time object
dataset$time = lubridate::ymd_hms(dataset$time,tz="UTC")

library(tidyverse)
solution = dataset %>% 
  mutate(Grp=data.table::rleid(running)) %>% # rows in the same state before change get same value
  group_by(Grp) %>% # rows in the same state are grouped together
  slice(1) %>% # keep first row
  ungroup %>%  # you don't need grouping anymore
  mutate(timeLength = difftime(time, lag(time), units="secs")) 
  # calculate the differences between a row and previous one (lag(n=1))

Output:

# A tibble: 5 x 4
  time                running   Grp timeLength
  <dttm>              <chr>   <int> <drtn>    
1 2022-01-01 00:00:10 t           1 NA secs   
2 2022-01-01 00:00:30 f           2 20 secs   
3 2022-01-01 00:00:50 t           3 20 secs   
4 2022-01-01 00:01:00 f           4 10 secs   
5 2022-01-01 00:01:10 t           5 10 secs   

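If you want to drop the first NA row, just add %>% filter(!is.na(timeLength)) to the pipeline.

Updated to show how to do this with a for loop and nested if-else. Note, however, that the code is longer and harder to follow.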
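Note that in the output above each timeLength sits on the row where a state begins but measures how long the previous state lasted. If the goal is the duration of each "t" run, one possible variant (a sketch, not the answerer's code) uses lead() instead of lag() so the duration lands on the run it describes, then keeps only the "t" runs and bins them. The names run_starts, true_runs and runLength, and the 10-second bin width, are assumptions for illustration.

```r
library(dplyr)

running <- c("t", "t", "f", "f", "t", "f", "t", "t")
time <- as.POSIXct(c("2022-01-01 00:00:10", "2022-01-01 00:00:20",
                     "2022-01-01 00:00:30", "2022-01-01 00:00:40",
                     "2022-01-01 00:00:50", "2022-01-01 00:01:00",
                     "2022-01-01 00:01:10", "2022-01-01 00:01:20"), tz = "UTC")
dataset <- data.frame(time, running)

run_starts <- dataset %>%
  mutate(Grp = data.table::rleid(running)) %>%  # id per uninterrupted state
  group_by(Grp) %>%
  slice(1) %>%                                  # first row of each run
  ungroup() %>%
  # time from this run's start to the next run's start, in seconds
  mutate(runLength = as.numeric(difftime(lead(time), time, units = "secs")))

# Keep only the "t" runs; the last run has no end yet, so its NA is dropped
true_runs <- run_starts %>%
  filter(running == "t", !is.na(runLength))

# Histogram of the "t" run durations (bin width of 10 s is an arbitrary choice)
hist(true_runs$runLength, breaks = seq(0, 30, by = 10),
     main = "Duration of running == 't'", xlab = "seconds")
```

With the example data the two completed "t" runs last 20 and 10 seconds. The final run is still open at the end of the series, so whether to drop it or close it at the last timestamp is a choice the data owner has to make.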

dataset$time = lubridate::ymd_hms(dataset$time,tz="UTC")
# empty array for tracking changes in rows
current = c()
# datafinal  empty dataframe
datafinal  = data.frame()
# iterate over row indices rather than over columns
for (i in seq(nrow(dataset))){
  # extract current value of running
  current = c(current,dataset[i,]$running)
  if (i>1){ # the first row has no previous row to compare with
    if (current[i] == current[i-1]){
      next # skip the rest of this iteration while the state is unchanged
    }
    else {  # the state changed: time elapsed since the previous state began
      result = difftime(dataset[i,]$time, previous_time, units="secs")
    }
    # (note: when 'next' fires above, nothing below it runs in that iteration)
    
    # create the outcome row for iteration
    new_row = cbind(dataset[i,],result)
    # add row to final dataframe
    datafinal = rbind(datafinal,new_row)
  }
  # store the time at which the current state began (first row, or a row where the state changed)
  previous_time = dataset[i,]$time 
}

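Comments

0 votes RobertoT 5/12/2022
@ambergris If this helped, please click the green check mark next to the answer. I also added how to do it with a for loop, but note that it is harder. In the end, this for loop mimics the behavior of the tidyverse functions.
0 votes ambergris 6/1/2022
Hi Roberto, thanks again for your help with this. I have a follow-up question: if I try to filter for time differences of a certain length, how would I do that? I'm having trouble with it. Going by your first answer, I believe it would mean adding %>% filter(timeLength>0) or something similar? But I get an error when trying to use the timeLength column, because it is listed as "x secs".
0 votes RobertoT 6/1/2022
@ambergris Applying %>% filter(timeLength>0) should work. Without knowing the error output and what the actual data looks like, I can't help you further. I suggest posting a new question with all the information needed to see the problem. Also, when you write a new question, SO suggests posts that may already have been answered and that might help you.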
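The error itself is not shown in the thread, so the following is only a guess at a workaround: if comparing the difftime column directly causes trouble, converting it to plain numeric seconds first avoids any dependence on the "x secs" representation. The data frame runs is a hypothetical stand-in that mirrors the output above, and the 10-second threshold is arbitrary.

```r
library(dplyr)

# Hypothetical stand-in for the answer's result, with a difftime column
runs <- data.frame(
  running    = c("f", "t", "f", "t"),
  timeLength = as.difftime(c(20, 20, 10, 10), units = "secs")
)

long_runs <- runs %>%
  # as.numeric() on a difftime accepts a units argument
  mutate(secs = as.numeric(timeLength, units = "secs")) %>%
  filter(secs > 10)
```

Here long_runs keeps the two rows whose duration exceeds 10 seconds. Passing units explicitly matters because difftime() otherwise picks its units automatically, so the same number could mean seconds in one data set and minutes in another.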