Count unique values by group

Question:

    DATA = data.frame("TRIMESTER" = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
                      "STUDENT" = c(1,2,3,4,5,6,7,1,2,3,5,9,10,11,3,7,10,6,12,15,17,16,21))
    
    WANT = data.frame("TRIMESTER" = c(1,2,3),
                      "NEW_ENROLL" = c(7,3,5),
                      "TOTAL_ENROLL" = c(7,10,15))

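I have DATA and want to produce WANT, which has three columns. For each TRIMESTER, NEW_ENROLL counts the STUDENT values that have not appeared in any earlier trimester, and TOTAL_ENROLL is the running total of unique STUDENT values seen up to and including that trimester.

My attempt only counts the number of rows in each trimester.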

    library(dplyr)
    DATA %>%
      group_by(TRIMESTER) %>%
      count()

Tags: r, dplyr

distinct(DATA, STUDENT, .keep_all = TRUE) %>%
 group_by(TRIMESTER) %>% 
 summarise(NEW_ENROLL = n(), .groups = 'drop') %>% 
 mutate(TOTAL_ENROLL = cumsum(NEW_ENROLL))
# A tibble: 3 × 3
  TRIMESTER NEW_ENROLL TOTAL_ENROLL
      <dbl>      <int>        <int>
1         1          7            7
2         2          3           10
3         3          5           15
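
Note that this relies on distinct() keeping the first row it meets for each STUDENT, so the rows need to be in TRIMESTER order already, as they are in DATA. A minimal sketch, assuming the rows could arrive in some other order (the `shuffled` data frame is made up for illustration and is not part of the question), sorts before de-duplicating:

```r
library(dplyr)

DATA <- data.frame(TRIMESTER = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
                   STUDENT = c(1,2,3,4,5,6,7,1,2,3,5,9,10,11,3,7,10,6,12,15,17,16,21))

# Hypothetical input: the same rows in reverse order, to mimic unsorted data
shuffled <- DATA[rev(seq_len(nrow(DATA))), ]

result <- shuffled %>%
  arrange(TRIMESTER) %>%                    # earliest trimester first
  distinct(STUDENT, .keep_all = TRUE) %>%   # keep each student's first appearance
  group_by(TRIMESTER) %>%
  summarise(NEW_ENROLL = n(), .groups = "drop") %>%
  mutate(TOTAL_ENROLL = cumsum(NEW_ENROLL))

print(result)
```

Without the arrange() step, the reversed rows would credit each student to the last trimester they appear in rather than the first.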

9 upvotes Rui Barradas 8/15/2022 #2

Here is one way.

suppressPackageStartupMessages(library(dplyr))

DATA <- data.frame("TRIMESTER" = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
                  "STUDENT" = c(1,2,3,4,5,6,7,1,2,3,5,9,10,11,3,7,10,6,12,15,17,16,21))
DATA %>% 
  mutate(NEW_ENROLL = !duplicated(STUDENT)) %>%
  group_by(TRIMESTER) %>%
  summarise(NEW_ENROLL = sum(NEW_ENROLL)) %>%
  ungroup() %>%
  mutate(TOTAL_ENROLL = cumsum(NEW_ENROLL))
#> # A tibble: 3 × 3
#>   TRIMESTER NEW_ENROLL TOTAL_ENROLL
#>       <dbl>      <int>        <int>
#> 1         1          7            7
#> 2         2          3           10
#> 3         3          5           15

Created on 2022-08-14 by the reprex package (v2.0.1)
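
One difference from the distinct() approach may be worth noting: because every row is kept and only flagged, a trimester in which nobody is new should still show up, with a count of 0. A small sketch, using a made-up `DATA2` (not from the question) in which trimester 2 contains only returning students:

```r
library(dplyr)

# Hypothetical data: everyone in trimester 2 was already seen in trimester 1
DATA2 <- data.frame(TRIMESTER = c(1, 1, 1, 2, 2, 3, 3),
                    STUDENT   = c(1, 2, 3, 1, 2, 3, 4))

# Flag first appearances, then sum the flags per trimester
flagged <- DATA2 %>%
  mutate(NEW_ENROLL = !duplicated(STUDENT)) %>%
  group_by(TRIMESTER) %>%
  summarise(NEW_ENROLL = sum(NEW_ENROLL)) %>%
  mutate(TOTAL_ENROLL = cumsum(NEW_ENROLL))

# Drop repeat students first, then count rows per trimester
deduped <- DATA2 %>%
  distinct(STUDENT, .keep_all = TRUE) %>%
  count(TRIMESTER, name = "NEW_ENROLL")

print(flagged)   # keeps a row for trimester 2
print(deduped)   # has no row for trimester 2
```

Which behaviour is preferable depends on whether empty trimesters should be reported.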

9 upvotes Mohamed Desouky 8/15/2022 #3

For variety, we can use base R's aggregate() and transform():

transform(aggregate(. ~ TRIMESTER , DATA[!duplicated(DATA$STUDENT),] , length),
         TOTAL_ENROLL = cumsum(STUDENT))

Output


  TRIMESTER STUDENT TOTAL_ENROLL
1         1       7            7
2         2       3           10
3         3       5           15
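
The count column keeps the name STUDENT in this output. If the column names in WANT matter, one possible tweak (a sketch, not part of the original answer; `first_seen` and `res` are names chosen here for illustration) is to rename it before adding the running total:

```r
DATA <- data.frame(TRIMESTER = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
                   STUDENT = c(1,2,3,4,5,6,7,1,2,3,5,9,10,11,3,7,10,6,12,15,17,16,21))

# Keep only the first row in which each student appears
first_seen <- DATA[!duplicated(DATA$STUDENT), ]

# Count those first appearances per trimester
res <- aggregate(STUDENT ~ TRIMESTER, first_seen, length)

# Rename the count column, then add the cumulative total
names(res)[names(res) == "STUDENT"] <- "NEW_ENROLL"
res$TOTAL_ENROLL <- cumsum(res$NEW_ENROLL)

print(res)
```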
