Rolling mean per group in tidyverse

Asked by: Marco Asked: 3/10/2023 Last edited by: Marco Updated: 3/10/2023 Views: 83

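Q:

I aggregate my data per group and calculate a mean per group to simplify visualization. Unfortunately, some of my groups are very large and some are nearly empty. I would like a rolling mean calculation to smooth the picture further. Here is similar data: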

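# load packages
library(haven)    # read_dta()
library(dplyr)    # %>%, group_by(), summarise()
library(ggplot2)  # ggplot(), geom_point(), labs()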
# read dta file from github
soep <- read_dta("https://github.com/MarcoKuehne/marcokuehne.github.io/blob/main/data/SOEP/soep_lebensz_en/soep_lebensz_en.dta?raw=true")

soep %>% 
  group_by(education, sex) %>% 
  summarise(across(satisf_org, mean, na.rm = TRUE),
            n = n()) %>% 
  ggplot(aes(x = education, y = satisf_org, col = as.factor(sex))) +
  geom_point() +
  labs(title = "Mean Satisfaction per Education Level by Gender",
       x = "Education", y = "Mean Satisfaction", color = "Gender")

[Image: scatter plot "Mean Satisfaction per Education Level by Gender"]

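The mean satisfaction for females at education 8.5 looks like an outlier. I assume that people in adjacent years of education are not too different to be summarized together, i.e. calculate the mean satisfaction of all people at education 7, 8.5 and 9 (by sex) and store it as the rolling mean for 8.5 (by sex).

Starting from the standard group means: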

soep %>% 
  group_by(education, sex) %>% 
  summarise(across(satisf_org, mean, na.rm = TRUE),
            n = n())

# A tibble: 28 × 4
# Groups:   education [14]
   education sex        satisf_org     n
       <dbl> <dbl+lbl>       <dbl> <int>
 1       7   0 [male]         6.16    73
 2       7   1 [female]       6.59   113
 3       8.5 0 [male]         7.16    37
 4       8.5 1 [female]       8.56    18
 5       9   0 [male]         6.88   430
 6       9   1 [female]       7.00   633
 7      10   0 [male]         7.19   144
 8      10   1 [female]       7.36   221
 9      10.5 0 [male]         6.96  1538
10      10.5 1 [female]       7.02  1493
# … with 18 more rows
# ℹ Use `print(n = ...)` to see more rows

Here is the number I expect:

soep %>% 
  filter(sex == 1) %>%  # only looks at females
  filter(education %in% c(7, 8.5, 9)) %>%  # take education level before and after
  summarise(mean(satisf_org)) # calculate the "rolling mean" per group 

# A tibble: 1 × 1
  `mean(satisf_org)`
               <dbl>
1               6.97

This is the rolling mean per group that I expect for each value, i.e. 6.97 instead of 8.56.
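The expected 6.97 is a pooled mean over all individuals, which is the same as averaging the three group means weighted by their group sizes. A quick check in base R, using the rounded female values from the table above (so the result is only approximate, and it assumes no missing satisfaction values in these groups):

```r
# Rounded group means and group sizes for females, copied from the table above
means  <- c(6.59, 8.56, 7.00)   # education 7, 8.5, 9
counts <- c(113, 18, 633)

# Pooled mean = count-weighted mean of the group means
pooled <- weighted.mean(means, counts)

# Plain (unweighted) mean of the three group means, for comparison
unweighted <- mean(means)

print(round(c(pooled = pooled, unweighted = unweighted), 2))
```

With these rounded inputs the pooled value comes out at about 6.98, close to the 6.97 computed from the raw data, while the unweighted mean of the three group means is about 7.38.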

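PS: In my real data I look at age in years, and I usually have at least some people at every age. So the rolling window can be -1 to +1 in numeric terms, rather than the lead/lag neighbors.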

r tidyverse data-manipulation calculation

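Comments

0 votes Maël 3/10/2023
The fact that you find such a low mean in your example is because you excluded 8.5 in your filter. Is this what you expect for every group?
0 votes Marco 3/10/2023
My mistake, I will update. The mean is still almost the same. Best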

A:

2 votes Maël 3/10/2023 #1

You can do a second group_by there and compute the rolling mean:

library(dplyr)
library(slider)
soep %>% 
  group_by(education, sex) %>% 
  summarise(across(satisf_org, mean, na.rm = TRUE),
            n = n()) %>% 
  group_by(sex) %>%
  mutate(rolling_mean = slide_dbl(satisf_org, mean, .before = 1, .after = 1))

Output

# A tibble: 28 × 5
# Groups:   sex [2]
   education sex        satisf_org     n rolling_mean
       <dbl> <dbl+lbl>       <dbl> <int>        <dbl>
 1       7   0 [male]         6.16    73         6.66
 2       7   1 [female]       6.59   113         7.57
 3       8.5 0 [male]         7.16    37         6.73
 4       8.5 1 [female]       8.56    18         7.38
 5       9   0 [male]         6.88   430         7.08
 6       9   1 [female]       7.00   633         7.64
 7      10   0 [male]         7.19   144         7.01
 8      10   1 [female]       7.36   221         7.13
 9      10.5 0 [male]         6.96  1538         7.14
10      10.5 1 [female]       7.02  1493         7.20
# … with 18 more rows
# ℹ Use `print(n = ...)` to see more rows
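
Note that slide_dbl(satisf_org, mean, ...) gives each neighboring group mean equal weight, which is why education 8.5 for females shows 7.38 here rather than the 6.97 from the question. A possible variant, sketched under assumptions, weights each group mean by its n, and also shows an index-based window for the numeric -1 to +1 case from the PS. The small female_means tibble is illustrative only (rounded rows copied from the output above), and the column names rolling_pos and rolling_idx are made up for this example.

```r
library(dplyr)
library(slider)

# Illustrative data: rounded female rows copied from the output above
female_means <- tibble(
  education  = c(7, 8.5, 9, 10, 10.5),
  sex        = 1,
  satisf_org = c(6.59, 8.56, 7.00, 7.36, 7.02),
  n          = c(113, 18, 633, 221, 1493)
)

result <- female_means %>%
  group_by(sex) %>%
  # Both window types rely on rows being sorted by education within each sex
  arrange(education, .by_group = TRUE) %>%
  mutate(
    # Positional window: previous row, current row, next row,
    # with each group mean weighted by its group size
    rolling_pos = slide_dbl(satisf_org * n, sum, .before = 1, .after = 1) /
      slide_dbl(n, sum, .before = 1, .after = 1),
    # Index window: all rows whose education lies within +/- 1 of the current value
    rolling_idx = slide_index_dbl(satisf_org * n, education, sum,
                                  .before = 1, .after = 1) /
      slide_index_dbl(n, education, sum, .before = 1, .after = 1)
  ) %>%
  ungroup()

print(result)
```

For education 8.5 the positional window pools 7, 8.5 and 9, so rolling_pos lands near the 6.97 from the question. The index window only covers values between 7.5 and 9.5, so it pools 8.5 and 9 and gives a different number. If satisf_org contains missing values, the weights should probably be the count of non-missing observations, e.g. sum(!is.na(satisf_org)) instead of n().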