How to find the mean of a time column and group it using R?

Asked by: Nicolas Grisé  Asked: 10/12/2023  Last edited by: GuedesBF, Nicolas Grisé  Updated: 10/12/2023  Views: 74

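Q:

I have a data frame with a column named ride_length that is already in hh:mm:ss format. I want to compute the mean of that column, grouped by its two categories: member and casual (found in the member_casual column).

I have already tried this pipeline with the lubridate library: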

df %>%
  group_by(member_casual) %>%
  seconds_to_period(mean(period_to_seconds(hms(ride_length))))

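Even though my arguments are the same as in other examples I found online, I still get this message:

Error in seconds_to_period(., mean(period_to_seconds(hms(ride_length)))) : unused argument (mean(period_to_seconds(hms(ride_length))))

I also tried a longer route: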

df$nride_length <- difftime(strptime(df$ride_length,"%H:%M:%S"),
                     strptime("00:00:00","%H:%M:%S"),
                     units="mins")
df.means <- aggregate(df$nride_length,by=list(df$member_casual),mean)
df.means$ride_length <- format(.POSIXct(df.means$x,tz="GMT"), "%H:%M:%S")
df.means

But the result is still wrong:

  Group.1       x ride_length
1  casual NA mins
2  member NA mins

I also tried summarise:

df %>%
  group_by(member_casual) %>%
  summarise(length_mean = seconds_to_period(mean(period_to_seconds(hms(ride_length)))))

But then this is what I get:

# A tibble: 2 × 2
  member_casual length_mean
  <chr>         <Period>   
1 casual        NA         
2 member        NA         

Warning message:
There were 2 warnings in `summarise()`.
The first warning was:
ℹ In argument: `length_mean =
  seconds_to_period(mean(period_to_seconds(hms(ride_length))))`.
ℹ In group 1: `member_casual = "casual"`.
Caused by warning in `.parse_hms()`:
! Some strings failed to parse, or all strings are NAs
ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning. 

Please help.

r dataframe time group-by lubridate

Comments

0 upvotes Nicolas Grisé 10/12/2023
I tried mutate(), but I did not get what I expected: # A tibble: 5,021,491 × 17 # Groups: member_casual [2] ... Warning message: There were 2 warnings in mutate(). The first warning was: ℹ In argument: foo_mean = seconds_to_period(mean(period_to_seconds(hms(ride_length)))). ℹ In group 1: member_casual = "casual". Caused by warning in .parse_hms(): ! Some strings failed to parse, or all strings are NAs ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning.
1 upvote AkselA 10/12/2023
Can you share (part of) your data? Something like dput(head(df))
1 upvote margusl 10/12/2023
It is hard to follow code and errors in comments; please add any further details and code updates to your question, which you can still edit. For aggregation, you want to use summarise() instead of mutate().
0 upvotes Nicolas Grisé 10/12/2023
structure(list(rideable_type = c("electric_bike", "classic_bike", "classic_bike", "electric_bike", "classic_bike", "classic_bike"), day_of_week = c(1, 1, 1, 6, 7, 2), ride_length = structure(c(990, 810, 576, 296, 686, 294), class = c("hms", "difftime"), units = "secs"), member_casual = c("member", "member", "member", "member", "member", "member"), nride_length = structure(c(16.5, 13.5, 9.6, 4.93, 11.43, 4.9), class = "difftime", units = "mins")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

A:

-1 upvotes Ke Liu 10/12/2023 #1

Suppose your data looks like this:

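library(dplyr)      # group_by(), summarise(), %>%
library(lubridate)  # hms(), period_to_seconds(), seconds_to_period()

x <- c("09:10:01", "10:10:02", "09:40:03", "07:10:16", "09:20:02", "08:52:10")
df <- data.frame(member_casual = c(rep("A", 3), rep("B", 3)),
                 ride_length = hms(x), stringsAsFactors = FALSE)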

df
  member_casual ride_length
1             A   9H 10M 1S
2             A  10H 10M 2S
3             A   9H 40M 3S
4             B  7H 10M 16S
5             B   9H 20M 2S
6             B  8H 52M 10S

I tried the code you attempted above, and it works fine for me.

df %>%
   group_by(member_casual) %>%
   summarise(mean=seconds_to_period(mean(period_to_seconds(ride_length))))
# A tibble: 2 × 2
  member_casual mean                    
  <chr>         <Period>                
1 A             9H 40M 2S               
2 B             8H 27M 29.3333333333321S

So please confirm that your data is in the correct format, especially the column named "ride_length": run hms(df$ride_length) on its own and check whether it succeeds.
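One way to carry out that check, sketched under the assumption that the column looks like the dput() sample posted in the question's comments (class "hms"/"difftime", units "secs"). If so, the values are already numeric durations rather than "hh:mm:ss" strings, which would be consistent with the `.parse_hms()` warning. The vector below is a hypothetical three-row excerpt, and only base R is used:

```r
# Hypothetical excerpt mirroring the dput() sample from the comments.
ride_length <- structure(c(990, 810, 576),
                         class = c("hms", "difftime"), units = "secs")

# Inspect the class before choosing a parser.
col_class   <- class(ride_length)
is_duration <- inherits(ride_length, "difftime")

# A difftime already stores a number, so no string parsing is needed:
# convert to plain seconds and average directly.
secs      <- as.numeric(ride_length, units = "secs")
mean_secs <- mean(secs)
```

If `is_duration` is TRUE, the column can be averaged as shown without calling hms() on it first.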

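Comments

0 upvotes Limey 10/12/2023
Your answer does not really answer the OP's question, because it makes an assumption that the OP needs to confirm, namely the format of the data. It would be better to ask the OP for test data in a comment and then base an answer on what they give you.

1 upvote AkselA 10/12/2023 #2

You can use aggregate() on its own. Specify the grouping with a formula, as you would for an ANOVA. I changed the data.frame slightly so that there are three "member" and three "casual" rows.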

dtf <- structure(list(rideable_type=c("electric_bike",
  "classic_bike", "classic_bike", "electric_bike",
  "classic_bike", "classic_bike"), day_of_week=c(1, 1, 1, 6, 7,
  2), ride_length=structure(c(990, 810, 576, 296, 686, 294),
  class=c("hms", "difftime"), units="secs"),
  member_casual=c("member", "member", "member", "casual",
  "casual", "casual"), nride_length=structure(c(16.5, 13.5, 9.6,
  4.93, 11.43, 4.9), class="difftime", units="mins")),
  row.names=c(NA, -6L), class=c("tbl_df", "tbl", "data.frame"))
    
aggregate(ride_length ~ member_casual, data=dtf, mean)
  #   member_casual    ride_length
  # 1        casual 425.33333 secs
  # 2        member 792.00000 secs

Comments

0 upvotes Nicolas Grisé 10/12/2023
The aggregate function worked! Thanks