Sample from a matrix by rows

Asked by: Noale · Asked: 11/9/2023 · Last edited by: Darren Tsai, Noale · Updated: 11/9/2023 · Views: 65

Q:

I have the following matrix:

 Mat1 <- structure(c("Procedure_B", "Procedure_C", "Procedure_B", NA, 
"Procedure_B", "Procedure_A", "Procedure_C", "Procedure_B", NA, 
"Procedure_B", NA, "Procedure_B", NA, NA, "Procedure_A", "Procedure_A", 
"Procedure_C", "Procedure_A", "Procedure_A", "Procedure_B", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_B", "Procedure_A", "Procedure_A", 
NA, NA, "Procedure_C", NA, "Procedure_C", NA, "Procedure_A", 
"Procedure_B", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_B", 
"Procedure_A", "Procedure_B", "Procedure_C", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_C", "Procedure_C", "Procedure_A", NA, 
NA, NA, NA, NA, NA, "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", 
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_C", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", 
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", 
"Procedure_C", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), dim = c(53L, 5L))

I want to draw one value from each row, using the following probabilities:

P = c(0.99, 0.002992, 0.003186, 0.003018, 0.000804)

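That is, for each row, one of its 5 values should be sampled with these probabilities (the first probability applies to column 1, the second to column 2, and so on).

The expected output is 53 values.

I tried: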

sample(Mat1, size = nrow(Mat1), prob = rep(P, nrow(Mat1)), replace = T)

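However, the result does not match the expected distribution. I don't want to do this with a loop or apply, because my matrix can have many rows.

What is wrong with this command?

Tags: r, matrix, sample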


A:

3 upvotes · Marcello Zago · 11/9/2023 · #1

You can use the apply function:

# prob needs one weight per column (5 here), so reuse P from the question
apply(Mat1, 1, sample, size = 1, prob = P)

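The 1 tells apply to run a function on each row. Next you name the function to run on each row; in your case that is sample. After that, you pass the arguments for sample (here size and prob).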
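To make the row-wise call concrete, here is a minimal sketch on a small made-up matrix. The name toy and its contents are assumptions for illustration only; P is the probability vector from the question.

```r
# Hypothetical 3 x 5 toy matrix; cell [i, j] holds the label "r<i>c<j>"
toy <- matrix(paste0("r", rep(1:3, times = 5), "c", rep(1:5, each = 3)), nrow = 3)

# Probabilities from the question, one weight per column
P <- c(0.99, 0.002992, 0.003186, 0.003018, 0.000804)

set.seed(42)
# MARGIN = 1 means "once per row"; size and prob are forwarded to sample()
picked <- apply(toy, 1, sample, size = 1, prob = P)

length(picked)  # one sampled value per row
```

Because each call to sample sees exactly one row, the weights line up with the columns of that row. The cost is one R-level function call per row, which is what the comments below are about.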

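Comments

0 upvotes · Noale · 11/9/2023
I would like to avoid apply, because it is loop-based and I have large matrices.
0 upvotes · Marcello Zago · 11/9/2023
Correct me if I'm wrong, but don't you always have to iterate over the rows one way or another? So I don't see a speed issue here.
0 upvotes · Darren Tsai · 11/9/2023
Yes, you always have to iterate over the rows, but iterating with apply or a for-loop in R is slow. If possible, it is better to refactor the code into a vectorized computation.
0 upvotes · Marcello Zago · 11/9/2023
Oh, thanks for the explanation, good to know!

2 upvotes · Darren Tsai · 11/9/2023 · #2
  • You can sample column indices and then extract values from the matrix according to the sampled column numbers.

  • This approach is vectorized.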

set.seed(1)

cols <- sample(1:ncol(Mat1), size = nrow(Mat1), replace = TRUE, prob = P)
Mat1[cbind(1:nrow(Mat1), cols)]

#  [1] "Procedure_B" "Procedure_C" "Procedure_B" NA            "Procedure_B"
#  [6] "Procedure_A" "Procedure_C" "Procedure_B" NA            "Procedure_B"
# [11] NA            "Procedure_B" NA            NA            "Procedure_A"
# [16] "Procedure_A" "Procedure_C" "Procedure_B" "Procedure_A" "Procedure_B"
# [21] "Procedure_C" "Procedure_C" "Procedure_C" "Procedure_B" "Procedure_A"
# [26] "Procedure_A" NA            NA            "Procedure_C" NA           
# [31] "Procedure_C" NA            "Procedure_A" "Procedure_B" "Procedure_A"
# [36] "Procedure_A" "Procedure_A" "Procedure_B" "Procedure_A" "Procedure_B"
# [41] "Procedure_C" "Procedure_B" "Procedure_B" "Procedure_B" "Procedure_C"
# [46] "Procedure_C" "Procedure_A" NA            NA            NA           
# [51] NA            NA            NA
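On the "what is wrong with this command" part: sample(Mat1, ...) pools all 265 cells into one vector, so nothing forces exactly one draw per row. In addition, R stores matrices column by column, so rep(P, nrow(Mat1)) cycles the five weights over consecutive rows of the same column instead of assigning one weight per column. The following is only a rough sketch to check both points on a made-up matrix; the names toy and n are assumptions, and P is taken from the question.

```r
# Probabilities from the question, one weight per column
P <- c(0.99, 0.002992, 0.003186, 0.003018, 0.000804)

# Hypothetical toy matrix in which every cell records its own column number
n <- 10000
toy <- matrix(rep(1:5, each = n), nrow = n)

# Column-major storage: flattening the matrix walks down column 1 first,
# so per-cell weights would have to be rep(P, each = n), not rep(P, n)
stopifnot(identical(as.vector(col(toy)), rep(1:5, each = n)))

set.seed(1)
# One column index per row, drawn with the desired probabilities
cols <- sample(seq_len(ncol(toy)), size = nrow(toy), replace = TRUE, prob = P)

# Matrix indexing with a two-column (row, column) matrix picks one cell per row
picked <- toy[cbind(seq_len(nrow(toy)), cols)]

# Empirical share of each column among the draws; should be close to P
round(tabulate(picked, nbins = 5) / n, 4)
```

Drawing the column indices in a single sample call works because every row uses the same P and the rows are drawn independently. If the probabilities differed from row to row, this shortcut would no longer apply as written.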