Asked by: ZainNST Asked: 8/11/2023 Updated: 8/11/2023 Views: 40
(R) Bin a numeric column to count occurrences after group by
Q:
Apologies if the title of the post is a bit confusing. Suppose I have the following data frame:
set.seed(123)
test <- data.frame(
  "chr" = rep("chr1", 30),
  "position" = sample(c(1:50), 30, replace = FALSE),
  "info" = sample(c("X", "Y"), 30, replace = TRUE),
  "condition" = sample(c("soft", "stiff"), 30, replace = TRUE)
)
## head(test)
chr position info condition
1 chr1 31 Y soft
2 chr1 15 Y soft
3 chr1 14 X soft
4 chr1 3 X soft
5 chr1 42 X stiff
6 chr1 43 X stiff
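I want to bin the position column, say with a bin size of 10. Then, for each condition (soft or stiff), I want to count the occurrences of each value in the info column. So the data would look like this (not the actual result for the data above):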
chr start end condition count_Y count_X
1 chr1 1 10 soft 2 3
2 chr1 1 10 stiff 0 2
3 chr1 11 20 soft 2 5
4 chr1 11 20 stiff 1 2
5 chr1 21 30 soft 2 0
6 chr1 21 30 stiff 0 4
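For convenience, it might be better to create two data frames based on condition and then apply the binning and counting to each, but I am stuck at this part. Any help is appreciated. Thanks a lot.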
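The split-by-condition idea mentioned in the question could be sketched in base R roughly as follows. This is only an illustration, not one of the answers: the data frame toy and the helper bin_counts are hypothetical, and a small hand-made data set is used so the result does not depend on the random seed.

```r
# Small hand-made example (hypothetical data, not the question's `test`)
toy <- data.frame(
  chr = "chr1",
  position = c(3, 10, 11, 25, 30, 7),
  info = c("X", "Y", "X", "Y", "Y", "X"),
  condition = c("soft", "soft", "stiff", "soft", "stiff", "stiff")
)

# Hypothetical helper: bin positions into 1-10, 11-20, ... and count info values
bin_counts <- function(df, size = 10) {
  # Start of the bin each position falls into (1, 11, 21, ...)
  start <- (df$position - 1) %/% size * size + 1
  # One row per bin start, one column per info value
  table(start = start, info = factor(df$info, levels = c("X", "Y")))
}

# One sub data frame per condition, then the same binning applied to each
by_condition <- split(toy, toy$condition)
counts <- lapply(by_condition, bin_counts)
counts$soft
```

Splitting first is not strictly needed, since the condition can simply be used as an additional grouping variable, which is what the answers do.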
Answers:
3 votes
stefan
8/11/2023
#1
Using cut, or even easier, integer division %/% for the binning (thanks to @MrFlick for the hint), together with dplyr::count and tidyr::pivot_wider, you could do:
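library(dplyr, warn.conflicts = FALSE)
library(tidyr)

test |>
  mutate(
    # Subtract 1 first so that positions 10, 20, ... stay in bins 1-10, 11-20, ...
    bin = (position - 1) %/% 10 + 1,
    start = (bin - 1) * 10 + 1,
    end = bin * 10
  ) |>
  # One row per bin, condition and info value, with the count in column n
  count(chr, start, end, condition, info) |>
  # Spread the info values into count_X and count_Y, filling gaps with 0
  tidyr::pivot_wider(
    names_from = info,
    values_from = n,
    names_prefix = "count_",
    values_fill = 0
  )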
#> # A tibble: 9 × 6
#> chr start end condition count_X count_Y
#> <chr> <dbl> <dbl> <chr> <int> <int>
#> 1 chr1 1 10 soft 4 0
#> 2 chr1 1 10 stiff 2 1
#> 3 chr1 11 20 soft 3 3
#> 4 chr1 21 30 soft 1 1
#> 5 chr1 21 30 stiff 3 1
#> 6 chr1 31 40 soft 0 2
#> 7 chr1 31 40 stiff 2 1
#> 8 chr1 41 50 soft 0 1
#> 9 chr1 41 50 stiff 4 1
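With integer-division binning, the behaviour at the bin edges is worth checking. The sketch below uses a made-up vector pos (not taken from the question) and compares three variants: plain pos %/% 10 + 1, the same after subtracting 1, and cut() with right-closed breaks. Only the last two put 10 into the 1-10 bin.

```r
# Hypothetical edge-case positions
pos <- c(1, 9, 10, 11, 20, 21, 50)
size <- 10

# Plain integer division: multiples of `size` move up into the next bin
bin_naive <- pos %/% size + 1

# Subtracting 1 first keeps multiples of `size` in the lower bin
bin_div <- (pos - 1) %/% size + 1

# cut(): intervals (0,10], (10,20], ... are right-closed by default
bin_cut <- cut(pos, breaks = seq(0, 50, by = size), labels = FALSE)

data.frame(pos, bin_naive, bin_div, bin_cut,
           start = (bin_div - 1) * size + 1,
           end = bin_div * size)
```

Which variant is appropriate depends on whether the bins should be 1-10, 11-20, ... as in the desired output, or 0-9, 10-19, ...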
1 vote
jkatam
8/11/2023
#2
Alternatively, please check the code below for a base R approach:
# Bin the "position" column with a bin size of 10
test$position_bin <- cut(test$position, breaks = seq(0, 50, by = 10), include.lowest = TRUE)
# Count occurrences in the "info" column for each bin and condition,
# then reshape so that each info value gets its own column.
# The native pipe |> (R >= 4.1) keeps this free of package dependencies.
count_result <- table(test$position_bin, test$condition, test$info) |>
  as.data.frame() |>
  setNames(c("position_bin", "condition", "info", "Freq")) |>
  reshape(idvar = c("position_bin", "condition"), timevar = "info", v.names = "Freq", direction = "wide")
count_result
Created on 2023-08-10 with reprex v2.0.2
position_bin condition Freq.X Freq.Y
1 [0,10] soft 4 0
2 (10,20] soft 3 3
3 (20,30] soft 1 1
4 (30,40] soft 0 2
5 (40,50] soft 0 1
6 [0,10] stiff 2 1
7 (10,20] stiff 0 0
8 (20,30] stiff 3 1
9 (30,40] stiff 2 1
10 (40,50] stiff 4 1
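The interval labels from cut() can also be turned into the start and end columns asked for in the question. A minimal sketch, assuming the same breaks (0, 10, ..., 50) and a made-up data frame toy rather than the question's data; the names toy, breaks, res and idx are chosen here for illustration only:

```r
# Small hand-made example (hypothetical data)
toy <- data.frame(
  position = c(3, 10, 11, 25, 30, 7),
  info = c("X", "Y", "X", "Y", "Y", "X"),
  condition = c("soft", "soft", "stiff", "soft", "stiff", "stiff")
)
breaks <- seq(0, 50, by = 10)

# labels = FALSE returns the integer bin index instead of the "(a,b]" label
toy$bin <- cut(toy$position, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Count per bin, condition and info, then spread info into Freq.X / Freq.Y
res <- as.data.frame(table(bin = toy$bin, condition = toy$condition, info = toy$info))
res <- reshape(res, idvar = c("bin", "condition"), timevar = "info",
               v.names = "Freq", direction = "wide")

# table() turned bin into a factor; convert back to the numeric index
idx <- as.integer(as.character(res$bin))
res$start <- breaks[idx] + 1
res$end <- breaks[idx + 1]
res[, c("start", "end", "condition", "Freq.X", "Freq.Y")]
```

Note that bins without any observation do not appear here, because table() only sees the bin indices that actually occur.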