Asked by: Marco  Asked: 3/10/2023  Last edited by: Marco  Updated: 3/10/2023  Views: 83
Rolling mean per group in tidyverse
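Q:
I aggregate my data per group and calculate the mean per group to simplify visualization. Unfortunately, some of my groups are very large and some are nearly empty. I would like a rolling mean calculation to smooth the picture further. Here is similar data: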
# load packages
library(haven)    # read_dta()
library(dplyr)    # %>%, group_by(), summarise(), across()
library(ggplot2)  # ggplot(), geom_point(), labs()
# read dta file from github
soep <- read_dta("https://github.com/MarcoKuehne/marcokuehne.github.io/blob/main/data/SOEP/soep_lebensz_en/soep_lebensz_en.dta?raw=true")
# mean satisfaction per education level and sex, plotted as points
soep %>%
  group_by(education, sex) %>%
  summarise(across(satisf_org, mean, na.rm = TRUE),
            n = n()) %>%
  ggplot(aes(x = education, y = satisf_org, col = as.factor(sex))) +
  geom_point() +
  labs(title = "Mean Satisfaction per Education Level by Gender",
       x = "Education", y = "Mean Satisfaction", color = "Gender")
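The mean satisfaction for females at education 8.5 looks like an outlier. I assume that people in adjacent years of education are not too different to be summarized together, i.e. I want to calculate the mean satisfaction of all people at education 7, 8.5 and 9 (grouped by sex) and store it as the rolling mean for 8.5 (grouped by sex).
Starting from the standard grouped mean: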
soep %>%
  group_by(education, sex) %>%
  summarise(across(satisf_org, mean, na.rm = TRUE),
            n = n())
# A tibble: 28 × 4
# Groups:   education [14]
   education sex        satisf_org     n
       <dbl> <dbl+lbl>       <dbl> <int>
 1       7   0 [male]         6.16    73
 2       7   1 [female]       6.59   113
 3       8.5 0 [male]         7.16    37
 4       8.5 1 [female]       8.56    18
 5       9   0 [male]         6.88   430
 6       9   1 [female]       7.00   633
 7      10   0 [male]         7.19   144
 8      10   1 [female]       7.36   221
 9      10.5 0 [male]         6.96  1538
10      10.5 1 [female]       7.02  1493
# … with 18 more rows
# ℹ Use `print(n = ...)` to see more rows
Here is the number I expect:
soep %>%
  filter(sex == 1) %>%                         # only look at females
  filter(education %in% c(7, 8.5, 9)) %>%      # take the education level before and after
  summarise(mean(satisf_org))                  # calculate the "rolling mean" per group
# A tibble: 1 × 1
`mean(satisf_org)`
<dbl>
1 6.97
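This is what I expect as the per-group rolling mean for each value, i.e. 6.97 instead of 8.56 for females at education 8.5.
PS: In my real data I look at age in years, and I usually have at least some people at every age. The rolling window can therefore be defined numerically as -1 to +1, rather than by lead/lag neighbors.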
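A window defined on the numeric value itself (age - 1 to age + 1) rather than on neighboring rows might be sketched with slide_index_dbl() from the slider package, which takes a separate index vector. This is only a minimal sketch: the age_means table, its column names and its numbers are made up for illustration, and it averages the per-age means without weighting by group size.

```r
library(dplyr)
library(slider)

# Hypothetical per-age summary (made-up numbers); age 23 is deliberately missing
age_means <- tibble(
  sex   = c(0, 0, 0, 0, 1, 1, 1, 1),
  age   = c(20, 21, 22, 24, 20, 21, 22, 24),
  value = c(6, 7, 8, 10, 5, 6, 7, 9)
)

rolled <- age_means %>%
  arrange(sex, age) %>%   # slide_index_dbl() needs an ascending index within each group
  group_by(sex) %>%
  mutate(
    # window = all rows whose age lies in [age - 1, age + 1]
    rolling_mean = slide_index_dbl(value, age, mean, .before = 1, .after = 1)
  ) %>%
  ungroup()

print(rolled)
```

Because the window follows the age values, the row for age 24 only averages itself here: ages 23 and 25 are absent, and age 22 lies outside the window.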
A:
2 upvotes
Maël
3/10/2023
#1
You can group_by sex and compute a rolling mean within each group:
library(dplyr)
library(slider)
soep %>%
  group_by(education, sex) %>%
  summarise(across(satisf_org, mean, na.rm = TRUE),
            n = n()) %>%
  group_by(sex) %>%
  mutate(rolling_mean = slide_dbl(satisf_org, mean, .before = 1, .after = 1))
Output
# A tibble: 28 × 5
# Groups:   sex [2]
   education sex        satisf_org     n rolling_mean
       <dbl> <dbl+lbl>       <dbl> <int>        <dbl>
 1       7   0 [male]         6.16    73         6.66
 2       7   1 [female]       6.59   113         7.57
 3       8.5 0 [male]         7.16    37         6.73
 4       8.5 1 [female]       8.56    18         7.38
 5       9   0 [male]         6.88   430         7.08
 6       9   1 [female]       7.00   633         7.64
 7      10   0 [male]         7.19   144         7.01
 8      10   1 [female]       7.36   221         7.13
 9      10.5 0 [male]         6.96  1538         7.14
10      10.5 1 [female]       7.02  1493         7.20
# … with 18 more rows
# ℹ Use `print(n = ...)` to see more rows
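Note that rolling_mean here averages the group means with equal weight, which is why the female 8.5 row shows 7.38 rather than the 6.97 from the question. The pooled mean over all people in the window can be recovered by weighting each group mean by its n. The following is a minimal sketch, not part of the original answer: group_means is typed in by hand from the first rows of the summary, and window_total, window_n and rolling_mean_w are illustrative names. It also assumes n counts only non-missing satisf_org values; with missing values, n = sum(!is.na(satisf_org)) would be the safer count.

```r
library(dplyr)
library(slider)

# First six rows of the grouped summary, copied by hand (rounded values)
group_means <- tibble(
  education  = c(7, 7, 8.5, 8.5, 9, 9),
  sex        = c(0, 1, 0, 1, 0, 1),
  satisf_org = c(6.16, 6.59, 7.16, 8.56, 6.88, 7.00),
  n          = c(73, 113, 37, 18, 430, 633)
)

weighted <- group_means %>%
  arrange(sex, education) %>%   # neighbors must be adjacent education levels
  group_by(sex) %>%
  mutate(
    # sum of satisfaction scores in the window (group mean * group size, summed)
    window_total   = slide_dbl(satisf_org * n, sum, .before = 1, .after = 1),
    # number of people in the window
    window_n       = slide_dbl(n, sum, .before = 1, .after = 1),
    rolling_mean_w = window_total / window_n
  ) %>%
  ungroup()

weighted %>% filter(sex == 1, education == 8.5)
```

For females at education 8.5 this gives roughly 6.98 from the rounded group means, in line with the 6.97 computed directly from the raw data in the question.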