提问人:Álvaro Méndez Civieta 提问时间:11/13/2023 更新时间:11/13/2023 访问量:49
如何按日期聚合变量
How to aggregate a variable by date
问:
我有一个包含列的 R 数据集:
date
country_code
version
money
该变量是数值的。变量 和 是分类的。和 的每个组合都定义了一个唯一的观测值,其值是可测量的。money
version
country_code
date
country_code
version
money
我想对变量进行分组,并在数据集中计算 3 个新变量,它们是:version
country_code
- money_week:对于每个观测值,此变量存储前 7 天的聚合。
money
money_15
:对于每个观测值,此变量存储过去 15 天的聚合money
money_total
:对于每个观测值,此变量存储截至当前日期的聚合。money
和的不同组合可能具有不同的开始和结束日期,也可能缺少日期。country_code
version
例
data = tibble(
date = seq(as.Date('2022-01-01'), as.Date('2022-01-13'), by=1),
money = 1:13)
data = data[-c(2,3,4,10,11),]
# [Some computations for adding the expected new variables]
# A tibble: 10 × 2
date money money_week
2022-01-01 1 1
2022-01-05 5 6
2022-01-06 6 12
2022-01-07 7 19
2022-01-08 8 26
2022-01-09 9 35
2022-01-12 12 42
2022-01-13 13 49
请注意,每个观测值不仅仅是前 7 个观测值的总和,而是前 7 天的总和。例如,日期观测值是日期 、 、 、 的观测值的总和,它将包括观测值 、 ,但缺少最后一个观测值。2022-01-08
2022-01-08
2022-01-07
2022-01-06
2022-01-05
2022-01-04
2022-01-03
2022-01-02
答:
1赞
dufei
11/13/2023
#1
您可以使用重叠联接将日期中的值合并到日期间隔(分别为 7 天或 15 天):
library(tidyverse)
# define sample data
data = tibble(
date = seq(as.Date('2022-01-01'), as.Date('2022-01-13'), by=1),
money = 1:13)
data = data[-c(2,3,4,10,11),]
# prepare intervals for overlap joins
df <- data |>
mutate(
date = ymd(date),
date_7 = date - days(6),
date_15 = date - days(14)
)
# merge by 7-day interval
sum_7 <- df |>
select(date, date_7) |>
inner_join(df, join_by(between(y$date, x$date_7, x$date))) |>
summarize(money = sum(money), .by = date.x) |>
rename(date = date.x, money_7 = money)
# merge by 15-day interval
sum_15 <- df |>
select(date, date_15) |>
inner_join(df, join_by(between(y$date, x$date_15, x$date))) |>
summarize(money = sum(money), .by = date.x) |>
rename(date = date.x, money_15 = money)
# bind cols
data |>
inner_join(sum_7, join_by(date)) |>
inner_join(sum_15, join_by(date))
#> # A tibble: 8 × 4
#> date money money_7 money_15
#> <date> <int> <int> <int>
#> 1 2022-01-01 1 1 1
#> 2 2022-01-05 5 6 6
#> 3 2022-01-06 6 12 12
#> 4 2022-01-07 7 19 19
#> 5 2022-01-08 8 26 27
#> 6 2022-01-09 9 35 36
#> 7 2022-01-12 12 42 48
#> 8 2022-01-13 13 49 61
创建于 2023-11-12 with reprex v2.0.2
1赞
Seth
11/13/2023
#2
您可以将变量用作滑动窗口函数的索引。date
library(slider)
library(tidyverse)
data %>%
mutate(money_week = slide_index_dbl(money, date, sum, .before = days(6)))
#> # A tibble: 8 × 3
#> date money money_week
#> <date> <int> <dbl>
#> 1 2022-01-01 1 1
#> 2 2022-01-05 5 6
#> 3 2022-01-06 6 12
#> 4 2022-01-07 7 19
#> 5 2022-01-08 8 26
#> 6 2022-01-09 9 35
#> 7 2022-01-12 12 42
#> 8 2022-01-13 13 49
更多测试数据可供显示money_15
set.seed(123)
newdata = tibble(
date = seq(as.Date('2022-01-01'), as.Date('2022-03-31'), by=1),
money = sample(1:20,90, replace = TRUE)) %>%
slice_sample(n = 60) %>%
arrange(date)
newdata %>%
mutate(money_week = slide_index_dbl(money, date, sum, .before = days(6)),
money_15 = slide_index_dbl(money, date, sum, .before = days(14)),
money_total = cumsum(money))
#> # A tibble: 60 × 5
#> date money money_week money_15 money_total
#> <date> <int> <dbl> <dbl> <int>
#> 1 2022-01-01 15 15 15 15
#> 2 2022-01-02 19 34 34 34
#> 3 2022-01-03 14 48 48 48
#> 4 2022-01-05 10 58 58 58
#> 5 2022-01-06 18 76 76 76
#> 6 2022-01-07 11 87 87 87
#> 7 2022-01-08 5 77 92 92
#> 8 2022-01-09 20 78 112 112
#> 9 2022-01-11 5 69 117 117
#> 10 2022-01-13 9 50 126 126
#> # ℹ 50 more rows
评论
transform(data, money_week=ave(money, strftime(date, '%V'), FUN=cumsum))