Asked by: Ahsk  Asked: 3/2/2023  Last edited by: Ahsk  Updated: 3/2/2023  Views: 121
Count how many values per level in a given factor?
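Q:
For each year, I want to create two new columns, temp_count and rh_count, that count the number of occurrences in the temp_catog and humidity_catog columns, respectively. The question "How to count how many values per level in a given factor?" covers the case of grouping by one variable, but I want to use group_by(year, humidity_catog, temp_catog). Here is a screenshot of my data: [screenshot of the data]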
I can create a single column, humidity_count, that counts the number of occurrences of each category in the humidity_catog column with the following code.
library(dplyr)
df <- df %>%
  group_by(year, humidity_catog) %>%
  summarize(humidity_count = n())
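This is the output: [screenshot of the output]
But I want to create another column, temp_count, in the same data frame that counts the occurrences of each category in the temp_catog column. How can I do that? Here is a reproducible example of my data, created with the dput function.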
df <- structure(
list(
year = structure(
c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L),
.Label = c(
"2006",
"2007",
"2012",
"2013",
"2014",
"2014_c",
"2015_a",
"2015_b",
"2016",
"2017",
"2020"
),
class = "factor"
),
min_rh = c(47.9, 49, 44.7, 40.2, 50, 52.3, 51.5, 82.8, 73.8,
47.1),
min_temp = c(12.4, 14.3, 15.1, 16.1, 12.7, 16.1, 14.4,
15.1, 11.8, 9.5),
temp_catog = structure(
c(2L, 2L, 3L, 3L,
2L, 3L, 2L, 3L, 2L, 2L),
.Label = c("T1(<=8)", "T2(>8, <=15)",
"T3(>15)"),
class = "factor"
),
humidity_catog = structure(
c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L),
.Label = c("RH1(<=65)",
"RH2(>65)"),
class = "factor"
)
),
class = c("grouped_df",
"tbl_df", "tbl", "data.frame"),
row.names = c(NA,-10L),
groups = structure(
list(
year = structure(
1L,
.Label = c(
"2006",
"2007",
"2012",
"2013",
"2014",
"2014_c",
"2015_a",
"2015_b",
"2016",
"2017",
"2020"
),
class = "factor"
),
.rows = structure(
list(1:10),
ptype = integer(0),
class = c("vctrs_list_of",
"vctrs_vctr", "list")
)
),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA,-1L),
.drop = TRUE
)
)
Note: I don't want the unique occurrences. I just want to count how many times each category was recorded.
A:
1 upvote
GuedesBF
3/2/2023
#1
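I'm not sure how the OP wants to combine the two summarised results, but we can call mutate twice in sequence instead of summarise, supplying the grouping variables to the .by argument.
Note: the toy data frame is grouped by year, so I ungroup it first.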
library(dplyr) #requires dplyr 1.1.0 for the .by solution
df %>%
ungroup() %>%
mutate(rh_count = n(), .by = c(year, humidity_catog)) %>%
mutate(temp_count = n(), .by = c(year, temp_catog))
# A tibble: 10 × 7
   year  min_rh min_temp temp_catog   humidity_catog rh_count temp_count
   <fct>  <dbl>    <dbl> <fct>        <fct>             <int>      <int>
 1 2006    47.9     12.4 T2(>8, <=15) RH1(<=65)             8          6
 2 2006    49       14.3 T2(>8, <=15) RH1(<=65)             8          6
 3 2006    44.7     15.1 T3(>15)      RH1(<=65)             8          4
 4 2006    40.2     16.1 T3(>15)      RH1(<=65)             8          4
 5 2006    50       12.7 T2(>8, <=15) RH1(<=65)             8          6
 6 2006    52.3     16.1 T3(>15)      RH1(<=65)             8          4
 7 2006    51.5     14.4 T2(>8, <=15) RH1(<=65)             8          6
 8 2006    82.8     15.1 T3(>15)      RH2(>65)              2          4
 9 2006    73.8     11.8 T2(>8, <=15) RH2(>65)              2          6
10 2006    47.1      9.5 T2(>8, <=15) RH1(<=65)             8          6
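On a dplyr version older than 1.1.0, where the .by argument is not available, roughly the same result can be sketched with add_count(). The compact df below is only a stand-in for the question's data: it keeps the same ten rows but uses shortened category labels (T1/T2/T3, RH1/RH2), which is an assumption made for brevity.

```r
library(dplyr)

# Stand-in for the question's dput() data: same 10 rows, ungrouped,
# with shortened category labels (assumption, for brevity only)
df <- data.frame(
  year = factor(rep("2006", 10)),
  temp_catog = factor(
    c("T2", "T2", "T3", "T3", "T2", "T3", "T2", "T3", "T2", "T2"),
    levels = c("T1", "T2", "T3")
  ),
  humidity_catog = factor(c(rep("RH1", 7), "RH2", "RH2", "RH1"))
)

# add_count() keeps every row and appends the size of the group
# that the row belongs to, so two calls give two count columns
res <- df %>%
  add_count(year, humidity_catog, name = "rh_count") %>%
  add_count(year, temp_catog, name = "temp_count")

res
```

Each add_count() call uses its own grouping columns, so the two counts do not interfere with each other, much like the two mutate(..., .by = ...) calls above.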
Comments
0 upvotes
Ahsk
3/2/2023
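Yes, joining the data frames does not answer the question. Perhaps it is not possible to get two summary statistics of counts. I could also do this with mutate. My point is to get a column like humidity_count, as in the screenshot. There are two humidity_catog categories, <=65 and >65, so for 2006 I have only two counts. For temperature I have three categories, so I should get only three counts for 2006 (for each year).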
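If one row per category per year is what is wanted, a possible sketch is to summarise each factor separately with count(). The two summaries have different numbers of rows (two humidity categories, three temperature categories), so they are kept as two tables here. The df below is a stand-in for the question's data with shortened labels, and the names humidity_counts and temp_counts are hypothetical. Setting .drop = FALSE keeps factor levels with no observations, so the unused T1 level appears with a count of 0.

```r
library(dplyr)

# Stand-in for the question's dput() data: same 10 rows, ungrouped,
# with shortened category labels (assumption, for brevity only)
df <- data.frame(
  year = factor(rep("2006", 10)),
  temp_catog = factor(
    c("T2", "T2", "T3", "T3", "T2", "T3", "T2", "T3", "T2", "T2"),
    levels = c("T1", "T2", "T3")
  ),
  humidity_catog = factor(c(rep("RH1", 7), "RH2", "RH2", "RH1"))
)

# One row per year and humidity category
humidity_counts <- df %>%
  count(year, humidity_catog, name = "humidity_count", .drop = FALSE)

# One row per year and temperature category; the empty T1 level
# is kept with a count of 0 because of .drop = FALSE
temp_counts <- df %>%
  count(year, temp_catog, name = "temp_count", .drop = FALSE)

humidity_counts
temp_counts
```

With the original data, year carries several unused factor levels, so .drop = FALSE would also produce zero-count rows for those years; calling droplevels() on the data first, or leaving .drop at its default, avoids that.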
Comments
count(...)
tally
group_by(...)%>%mutate(n=n())
df1 <- df %>% group_by(year, humidity_catog) %>% summarize(humidity_count = n())
df2 <- df %>% group_by(year, temp_catog) %>% summarize(temp_count = n())