Count how many values per level in a given factor?

Asked by: Ahsk  Asked: 3/2/2023  Last edited by: Ahsk  Updated: 3/2/2023  Views: 121

Q:

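For each year, I want to create two new columns, temp_count and rh_count, that count the number of occurrences of each category in the temp_catog and humidity_catog columns, respectively. The question "How to count how many values per level in a given factor?" covers grouping by a single variable, but I want to use group_by(year, humidity_catog, temp_catog). Here is a screenshot of my data.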

[Image: screenshot of the data]

With the following code I can create a single column, humidity_count, that counts the occurrences of each category in the humidity_catog column.

# Count rows per year and humidity category
df %>%
  group_by(year, humidity_catog) %>%
  summarize(humidity_count = n())

This is the output:

[Image: screenshot of the output]

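But I want to create another column, temp_count, in the same data frame that counts the occurrences of each category in the temp_catog column. How can I do that? Here is a reproducible example of my data, created with the dput function.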

df <- structure(
  list(
    year = structure(
      c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
        1L, 1L, 1L),
      .Label = c(
        "2006",
        "2007",
        "2012",
        "2013",
        "2014",
        "2014_c",
        "2015_a",
        "2015_b",
        "2016",
        "2017",
        "2020"
      ),
      class = "factor"
    ),
    min_rh = c(47.9, 49, 44.7, 40.2, 50, 52.3, 51.5, 82.8, 73.8,
               47.1),
    min_temp = c(12.4, 14.3, 15.1, 16.1, 12.7, 16.1, 14.4,
                 15.1, 11.8, 9.5),
    temp_catog = structure(
      c(2L, 2L, 3L, 3L,
        2L, 3L, 2L, 3L, 2L, 2L),
      .Label = c("T1(<=8)", "T2(>8, <=15)",
                 "T3(>15)"),
      class = "factor"
    ),
    humidity_catog = structure(
      c(1L,
        1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L),
      .Label = c("RH1(<=65)",
                 "RH2(>65)"),
      class = "factor"
    )
  ),
  class = c("grouped_df",
            "tbl_df", "tbl", "data.frame"),
  row.names = c(NA,-10L),
  groups = structure(
    list(
      year = structure(
        1L,
        .Label = c(
          "2006",
          "2007",
          "2012",
          "2013",
          "2014",
          "2014_c",
          "2015_a",
          "2015_b",
          "2016",
          "2017",
          "2020"
        ),
        class = "factor"
      ),
      .rows = structure(
        list(1:10),
        ptype = integer(0),
        class = c("vctrs_list_of",
                  "vctrs_vctr", "list")
      )
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA,-1L),
    .drop = TRUE
  )
)

Note: I don't want the unique occurrences. I only want to count how many times each category was recorded.

r dataframe dplyr count data-manipulation

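Comments

1 upvote Onyambu 3/2/2023
Use functions such as count(...) or tally, or group_by(...) %>% mutate(n = n()), etc.
0 upvotes Ahsk 3/2/2023
@onyambu These are factor variables, so those won't work. I could use df1 <- df %>% group_by(year, humidity_catog) %>% summarize(humidity_count = n()) and df2 <- df %>% group_by(year, temp_catog) %>% summarize(temp_count = n()) and then combine the two data frames, but I would like an elegant solution.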

A:

1 upvote GuedesBF 3/2/2023 #1

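I'm not sure how the OP wants to combine the two summarized results, but we can call mutate twice in sequence instead of summarise, supplying the grouping variables to the .by argument each time.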

Note: the toy data frame is grouped by year, so I ungroup it first.

library(dplyr) #requires dplyr 1.1.0 for the .by solution

df %>%
    ungroup() %>%
    mutate(rh_count = n(), .by = c(year, humidity_catog)) %>%
    mutate(temp_count = n(), .by = c(year, temp_catog))

# A tibble: 10 × 7
   year  min_rh min_temp temp_catog   humidity_catog rh_count temp_count
   <fct>  <dbl>    <dbl> <fct>        <fct>             <int>      <int>
 1 2006    47.9     12.4 T2(>8, <=15) RH1(<=65)             8          6
 2 2006    49       14.3 T2(>8, <=15) RH1(<=65)             8          6
 3 2006    44.7     15.1 T3(>15)      RH1(<=65)             8          4
 4 2006    40.2     16.1 T3(>15)      RH1(<=65)             8          4
 5 2006    50       12.7 T2(>8, <=15) RH1(<=65)             8          6
 6 2006    52.3     16.1 T3(>15)      RH1(<=65)             8          4
 7 2006    51.5     14.4 T2(>8, <=15) RH1(<=65)             8          6
 8 2006    82.8     15.1 T3(>15)      RH2(>65)              2          4
 9 2006    73.8     11.8 T2(>8, <=15) RH2(>65)              2          6
10 2006    47.1      9.5 T2(>8, <=15) RH1(<=65)             8          6
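
For dplyr versions older than 1.1.0, where .by is not available, add_count() is one possible way to get the same two columns. This is only a sketch and not part of the original answer: the toy data frame and its shortened level names (T2, T3, RH1, RH2) are assumptions that mirror the category columns of the dput() data in the question.

```r
library(dplyr)

# Hypothetical toy data: same category pattern as the question's 10 rows,
# with shortened level names for readability
toy <- tibble(
  year = factor(rep("2006", 10)),
  temp_catog = factor(c("T2", "T2", "T3", "T3", "T2", "T3", "T2", "T3", "T2", "T2")),
  humidity_catog = factor(c("RH1", "RH1", "RH1", "RH1", "RH1", "RH1", "RH1", "RH2", "RH2", "RH1"))
)

# add_count() appends a per-group row count without collapsing the rows
with_counts <- toy %>%
  add_count(year, humidity_catog, name = "rh_count") %>%
  add_count(year, temp_catog, name = "temp_count")

print(with_counts)
```

Since the question's data frame is already grouped by year, it would probably need ungroup() first, as in the answer above.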

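Comments

0 upvotes Ahsk 3/2/2023
Yes, joining the data frames doesn't answer the question. Maybe it's not possible to get two summary statistics of counts. I can do this with mutate as well. My point is to get a column like humidity_count in the screenshot. humidity_catog has two categories, <=65 and >65, so for 2006 I get only two counts. For temperature I have three categories, so I should get only three counts for 2006 (for each year).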
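
One possible reading of this follow-up is a summary table with one row per year and category rather than one row per observation. A minimal sketch of that idea, assuming a long layout is acceptable, stacks two count() results with bind_rows(). The toy data frame, the shortened level names, and the column names variable, level and count are illustrative assumptions, not taken from the thread.

```r
library(dplyr)

# Hypothetical toy data: same category pattern as the question's 10 rows
toy <- tibble(
  year = factor(rep("2006", 10)),
  temp_catog = factor(c("T2", "T2", "T3", "T3", "T2", "T3", "T2", "T3", "T2", "T2")),
  humidity_catog = factor(c("RH1", "RH1", "RH1", "RH1", "RH1", "RH1", "RH1", "RH2", "RH2", "RH1"))
)

# Count each category column separately, then stack the two summaries.
# Levels are converted to character so the two factors can share one column.
long_counts <- bind_rows(
  toy %>%
    count(year, level = as.character(humidity_catog), name = "count") %>%
    mutate(variable = "humidity_catog"),
  toy %>%
    count(year, level = as.character(temp_catog), name = "count") %>%
    mutate(variable = "temp_catog")
)

print(long_counts)
```

This yields one row per observed category and year. Levels with no rows, such as T1 in the question's data, do not appear in this sketch. Keeping the columns as factors and passing .drop = FALSE to count() should retain them with a count of zero, at the cost of also expanding unused year levels.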