Asked by: Noale  Asked: 11/9/2023  Last edited by: Darren Tsai, Noale  Updated: 11/9/2023  Views: 65
Sample from a matrix by rows
Q:
I have the following matrix:
Mat1 <- structure(c("Procedure_B", "Procedure_C", "Procedure_B", NA,
"Procedure_B", "Procedure_A", "Procedure_C", "Procedure_B", NA,
"Procedure_B", NA, "Procedure_B", NA, NA, "Procedure_A", "Procedure_A",
"Procedure_C", "Procedure_A", "Procedure_A", "Procedure_B", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_B", "Procedure_A", "Procedure_A",
NA, NA, "Procedure_C", NA, "Procedure_C", NA, "Procedure_A",
"Procedure_B", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_B",
"Procedure_A", "Procedure_B", "Procedure_C", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_C", "Procedure_C", "Procedure_A", NA,
NA, NA, NA, NA, NA, "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A", "Procedure_A",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B", "Procedure_B",
"Procedure_B", "Procedure_B", "Procedure_B", "Procedure_C", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C",
"Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C", "Procedure_C",
"Procedure_C", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), dim = c(53L, 5L))
I want to sample one value from each row, with the following probabilities:
P = c(0.99, 0.002992, 0.003186, 0.003018, 0.000804)
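That is, for each row, sample one of its 5 values, where the value in column j is chosen with probability P[j].
The expected output is 53 values.
I tried: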
sample(Mat1, size = nrow(Mat1), prob = rep(P, nrow(Mat1)), replace = T)
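However, the result does not match the expected distribution. I do not want to do this in a loop or with apply, because my matrix can have many rows.
What is wrong with this command?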
A:
3 votes
Marcello Zago
11/9/2023
#1
You can use the apply function:
apply(Mat1, 1, sample, size = 1, prob = P)
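The 1 indicates that the function is applied to each row. Next you specify the function to run on each row, which in your case is sample. After that come the arguments passed on to sample.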
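As a minimal sketch of what that call does, here is the same pattern on a small made-up matrix. The names m and p and their values are assumptions for illustration only and are not taken from the question.

```r
# Hypothetical 3 x 2 toy matrix, used only for illustration
m <- matrix(c("a1", "a2", "a3", "b1", "b2", "b3"), nrow = 3)
p <- c(0.9, 0.1)  # assumed probabilities for column 1 and column 2

set.seed(42)  # arbitrary seed, only so the run is reproducible

# MARGIN = 1 walks over the rows; size and prob are forwarded to sample()
picked <- apply(m, 1, sample, size = 1, prob = p)

length(picked)  # one sampled value per row
```

Each element of picked comes from the corresponding row of m, so the result has as many entries as the matrix has rows.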
Comments
0 votes
Noale
11/9/2023
I would like to avoid apply, because it is loop-based and I have large matrices.
0 votes
Marcello Zago
11/9/2023
Correct me if I am wrong, but don't you always have to iterate over the rows one way or another? So I don't see a speed problem here.
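0 votes
Darren Tsai
11/9/2023
Yes, you always have to iterate over the rows, but iterating in R with apply or a for-loop is slow. Where possible, it is better to restructure the code as a vectorized computation.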
0 votes
Marcello Zago
11/9/2023
Oh, thanks for the explanation, good to know!
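A rough way to check the speed claim is to time both approaches on a larger made-up matrix. This is only a sketch: big, n, and the seed are hypothetical, the vectorized variant is the column-index approach from answer #2, and timings will differ from machine to machine.

```r
set.seed(1)  # arbitrary seed
n <- 20000   # hypothetical number of rows
big <- matrix(sample(c("A", "B", "C"), n * 5, replace = TRUE), nrow = n)
P <- c(0.99, 0.002992, 0.003186, 0.003018, 0.000804)

# Row-by-row: one sample() call per row
t_apply <- system.time(
  res_apply <- apply(big, 1, sample, size = 1, prob = P)
)

# Vectorized: one sample() call for all rows, then matrix indexing
t_vec <- system.time({
  cols <- sample(seq_len(ncol(big)), size = nrow(big), replace = TRUE, prob = P)
  res_vec <- big[cbind(seq_len(nrow(big)), cols)]
})

t_apply["elapsed"]
t_vec["elapsed"]
```

Both variants return one value per row; only the number of calls to sample() differs.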
2 votes
Darren Tsai
11/9/2023
#2
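You can sample one column index per row, then extract the values from the matrix using the (row, column) pairs.
This approach is vectorized.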
set.seed(1)
cols <- sample(1:ncol(Mat1), size = nrow(Mat1), replace = TRUE, prob = P)
Mat1[cbind(1:nrow(Mat1), cols)]
# [1] "Procedure_B" "Procedure_C" "Procedure_B" NA "Procedure_B"
# [6] "Procedure_A" "Procedure_C" "Procedure_B" NA "Procedure_B"
# [11] NA "Procedure_B" NA NA "Procedure_A"
# [16] "Procedure_A" "Procedure_C" "Procedure_B" "Procedure_A" "Procedure_B"
# [21] "Procedure_C" "Procedure_C" "Procedure_C" "Procedure_B" "Procedure_A"
# [26] "Procedure_A" NA NA "Procedure_C" NA
# [31] "Procedure_C" NA "Procedure_A" "Procedure_B" "Procedure_A"
# [36] "Procedure_A" "Procedure_A" "Procedure_B" "Procedure_A" "Procedure_B"
# [41] "Procedure_C" "Procedure_B" "Procedure_B" "Procedure_B" "Procedure_C"
# [46] "Procedure_C" "Procedure_A" NA NA NA
# [51] NA NA NA
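On the question of what goes wrong with the original command: R stores matrices in column-major order, so rep(P, nrow(Mat1)) cycles the five probabilities down the columns instead of giving P[j] to every cell of column j. The sketch below shows the misalignment on a small hypothetical matrix; m, w_attempt, and w_intended are illustrative names, not part of the question.

```r
# Hypothetical 4 x 5 matrix in which every cell records its own column number
m <- matrix(rep(1:5, each = 4), nrow = 4)
P <- c(0.99, 0.002992, 0.003186, 0.003018, 0.000804)

# Weights as laid out by the original attempt: P is recycled in storage order
w_attempt <- matrix(rep(P, nrow(m)), nrow = nrow(m))

# Weights aligned with the columns: P[j] repeated for every row of column j
w_intended <- matrix(rep(P, each = nrow(m)), nrow = nrow(m))

w_attempt[, 1]   # P[1], P[2], P[3], P[4]: only the first cell gets 0.99
w_intended[, 1]  # P[1] for every row
```

Even with weights aligned via rep(P, each = nrow(Mat1)), calling sample() on the whole matrix draws from the pool of all cells, so it does not guarantee exactly one value per row. Sampling a column index per row avoids both problems.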