Asked by: cliu  Asked: 7/16/2022  Last edited by: Henrikcliu  Updated: 7/17/2022  Views: 188
Expand a matrix of rankings (1 ~ 4) to a bigger binary matrix
Q:
I have a matrix that I want to convert into a matrix with binary output (0 vs 1). The matrix to convert contains four rows of rankings (1 to 4):
mat1.data <- c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
1, 1, 1, 2, 4, 4, 3, 2, 4, 3)
mat1 <- matrix(mat1.data,nrow=4,ncol=10,byrow=TRUE)
mat1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4 3 3 3 3 2 2 1 1 1
[2,] 3 4 2 4 2 3 1 3 3 2
[3,] 2 2 4 1 1 1 4 4 2 4
[4,] 1 1 1 2 4 4 3 2 4 3
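For each row of the input matrix, I want to create four binary rows, one for each rank value (1-4). In the binary matrix, each row has a 1 at the positions where the focal rank occurs in the input row, and 0 otherwise. Each row of the original matrix should therefore produce 10*4=40 entries in the output matrix.
For example, for the first row of the input matrix...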
4 3 3 3 3 2 2 1 1 1
...the output should be:
0 0 0 0 0 0 0 1 1 1 # Rank 1 in input
0 0 0 0 0 1 1 0 0 0 # Rank 2 in input
0 1 1 1 1 0 0 0 0 0 # Rank 3 in input
1 0 0 0 0 0 0 0 0 0 # Rank 4 in input
Continuing this process, the expected output for all four rows of rankings should look like this:
0 0 0 0 0 0 0 1 1 1 #first row of rankings starts
0 0 0 0 0 1 1 0 0 0
0 1 1 1 1 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 #first row of rankings ends
0 0 0 0 0 0 1 0 0 0 #second row of rankings starts
0 0 1 0 1 0 0 0 0 1
1 0 0 0 0 1 0 1 1 0
0 1 0 1 0 0 0 0 0 0 #second row of rankings ends
0 0 0 1 1 1 0 0 0 0 #third row of rankings starts
1 1 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 1 1 0 1 #third row of rankings ends
1 1 1 0 0 0 0 0 0 0 #fourth row of rankings starts
0 0 0 1 0 0 0 1 0 0
0 0 0 0 0 0 1 0 0 1
0 0 0 0 1 1 0 0 1 0 #fourth row of rankings ends
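How can I achieve this? I have a much larger dataset, so more efficient methods are preferred, but any help would be greatly appreciated!
A:
5 votes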
Zheyuan Li
7/16/2022
#1
matrix(sapply(mat1, \(i) replace(numeric(4), i, 1)), ncol = ncol(mat1))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 0 0 0 0 0 0 0 1 1 1
# [2,] 0 0 0 0 0 1 1 0 0 0
# [3,] 0 1 1 1 1 0 0 0 0 0
# [4,] 1 0 0 0 0 0 0 0 0 0
# [5,] 0 0 0 0 0 0 1 0 0 0
# [6,] 0 0 1 0 1 0 0 0 0 1
# [7,] 1 0 0 0 0 1 0 1 1 0
# [8,] 0 1 0 1 0 0 0 0 0 0
# [9,] 0 0 0 1 1 1 0 0 0 0
#[10,] 1 1 0 0 0 0 0 0 1 0
#[11,] 0 0 0 0 0 0 0 0 0 0
#[12,] 0 0 1 0 0 0 1 1 0 1
#[13,] 1 1 1 0 0 0 0 0 0 0
#[14,] 0 0 0 1 0 0 0 1 0 0
#[15,] 0 0 0 0 0 0 1 0 0 1
#[16,] 0 0 0 0 1 1 0 0 1 0
It takes 2 steps, and the pipe syntax may look clearer:
sapply(mat1, \(i) replace(numeric(4), i, 1)) |> ## each value to binary vector
matrix(ncol = ncol(mat1)) ## reshape
Actually, I don't need the anonymous function \(i). I can pass replace, along with its arguments, directly to sapply.
matrix(sapply(mat1, replace, x = numeric(4), values = 1), ncol = ncol(mat1))
sapply(mat1, replace, x = numeric(4), values = 1) |> matrix(ncol = ncol(mat1))
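To see why the reshape step works, it may help to look at the intermediate pieces. The sketch below is only an illustration; the names `onehot` and `res` are mine and do not come from the answer.

```r
mat1 <- matrix(c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
                 3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
                 2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
                 1, 1, 1, 2, 4, 4, 3, 2, 4, 3),
               nrow = 4, ncol = 10, byrow = TRUE)

# A single rank value becomes a length-4 indicator vector.
replace(numeric(4), 3, 1)
# [1] 0 0 1 0

# sapply() visits mat1 in column-major order and binds the indicator
# vectors as columns: 4 rows, one column per element of mat1.
onehot <- sapply(mat1, replace, x = numeric(4), values = 1)
dim(onehot)
# [1]  4 40

# Refolding to ncol(mat1) columns stacks the four indicator vectors
# that belong to one input column on top of each other.
res <- matrix(onehot, ncol = ncol(mat1))
dim(res)
# [1] 16 10
```

Because both `sapply()` and `matrix()` work in column-major order, rows 1 to 4 of `res` correspond to the first input row, rows 5 to 8 to the second, and so on.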
Miscellaneous
user20650 and I discussed this a bit in the comments. Here is a "vectorized" approach using outer:
matrix(+outer(1:4, c(mat1), "=="), ncol = ncol(mat1))
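As a rough check, the outer call can be split into its parts and compared with the sapply version. The variable names `cmp`, `res_outer` and `res_sapply` are illustrative only.

```r
mat1 <- matrix(c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
                 3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
                 2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
                 1, 1, 1, 2, 4, 4, 3, 2, 4, 3),
               nrow = 4, ncol = 10, byrow = TRUE)

# outer() compares every rank 1:4 with every element of mat1,
# giving a 4 x 40 logical matrix (one column per element).
cmp <- outer(1:4, c(mat1), "==")

# Unary plus turns TRUE/FALSE into 1/0; matrix() refolds to 16 x 10.
res_outer <- matrix(+cmp, ncol = ncol(mat1))

res_sapply <- matrix(sapply(mat1, replace, x = numeric(4), values = 1),
                     ncol = ncol(mat1))

all(res_outer == res_sapply)
# [1] TRUE

# The values agree, but the storage types differ.
typeof(res_outer)
# [1] "integer"
```

One thing to keep in mind is that `identical()` would report a difference here, since the outer result is stored as integer and the sapply result as double.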
Henrik's answer is a more memory-efficient "vectorized" approach, but it overcomplicates the index computation. Here is something simpler:
out <- matrix(0, nrow(mat1) * 4, ncol(mat1))
pos1 <- seq(0, length(mat1) - 1) * 4 + c(mat1)
out[pos1] <- 1
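The index arithmetic can be wrapped in a small function that is not tied to four ranks. This is a minimal sketch: `expand_ranks` and its argument `k` are hypothetical names, and it assumes the input holds whole numbers between 1 and `k`.

```r
# Hypothetical helper: expand a ranking matrix into stacked indicator rows.
# `k` is the number of distinct ranks (assumed parameter, default max(mat)).
expand_ranks <- function(mat, k = max(mat)) {
  stopifnot(all(mat >= 1), all(mat <= k), all(mat == round(mat)))
  out <- matrix(0, nrow(mat) * k, ncol(mat))
  # Element number e (0-based, column-major) with rank r lands at
  # linear position e * k + r of the output.
  pos <- (seq_along(mat) - 1) * k + c(mat)
  out[pos] <- 1
  out
}

# Small example with 2 input rows, 3 columns and 3 ranks.
mat2 <- matrix(c(1, 3,
                 2, 2,
                 3, 1), nrow = 2)
expand_ranks(mat2, k = 3)[, 1]
# [1] 1 0 0 0 0 1
```

The formula works for any number of input rows because each element of the input owns a contiguous block of `k` cells in the column-major layout of the output.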
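So far, every method creates a dense output matrix. That is fine here, because 25% of the elements are nonzero, which is not usually considered sparse. But if we do want a sparse matrix, that is also simple: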
## in fact, this is what Henrik aims to compute
ij <- arrayInd(pos1, c(4 * nrow(mat1), ncol(mat1)))
## sparse matrix
Matrix::sparseMatrix(i = ij[, 1], j = ij[, 2], x = rep(1, length(mat1)))
#16 x 10 sparse Matrix of class "dgCMatrix"
#
# [1,] . . . . . . . 1 1 1
# [2,] . . . . . 1 1 . . .
# [3,] . 1 1 1 1 . . . . .
# [4,] 1 . . . . . . . . .
# [5,] . . . . . . 1 . . .
# [6,] . . 1 . 1 . . . . 1
# [7,] 1 . . . . 1 . 1 1 .
# [8,] . 1 . 1 . . . . . .
# [9,] . . . 1 1 1 . . . .
#[10,] 1 1 . . . . . . 1 .
#[11,] . . . . . . . . . .
#[12,] . . 1 . . . 1 1 . 1
#[13,] 1 1 1 . . . . . . .
#[14,] . . . 1 . . . 1 . .
#[15,] . . . . . . 1 . . 1
#[16,] . . . . 1 1 . . 1 .
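One caveat worth noting: when `dims` is not supplied, `Matrix::sparseMatrix` infers the dimensions from the largest row and column indices. With this data the last row contains a 1, so the result is 16 x 10 as expected, but with other inputs a trailing all-zero row could be dropped. A possible safeguard, assuming the Matrix package is installed, is to pass `dims` explicitly (the names `k`, `out_dim` and `sp` are illustrative):

```r
mat1 <- matrix(c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
                 3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
                 2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
                 1, 1, 1, 2, 4, 4, 3, 2, 4, 3),
               nrow = 4, ncol = 10, byrow = TRUE)

k <- 4                                    # number of ranks
out_dim <- c(k * nrow(mat1), ncol(mat1))  # intended output shape

# Linear positions of the 1s, then converted to (row, column) pairs.
pos1 <- (seq_along(mat1) - 1) * k + c(mat1)
ij <- arrayInd(pos1, out_dim)

# Passing dims fixes the shape even if the last rows hold no 1s.
sp <- Matrix::sparseMatrix(i = ij[, 1], j = ij[, 2],
                           x = rep(1, length(pos1)),
                           dims = out_dim)
dim(sp)
# [1] 16 10
```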
4 votes
Henrik
7/16/2022
#2
Using row, col and matrix indexing:
m = matrix(0, nr = 4 * nrow(mat1), nc = ncol(mat1))
m[cbind(c(row(mat1) + seq(0, by = (4 - 1), len = nrow(mat1)) + (mat1 - 1)),
c(col(mat1)))] = 1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 1 1 1
[2,] 0 0 0 0 0 1 1 0 0 0
[3,] 0 1 1 1 1 0 0 0 0 0
[4,] 1 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 1 0 0 0
[6,] 0 0 1 0 1 0 0 0 0 1
[7,] 1 0 0 0 0 1 0 1 1 0
[8,] 0 1 0 1 0 0 0 0 0 0
[9,] 0 0 0 1 1 1 0 0 0 0
[10,] 1 1 0 0 0 0 0 0 1 0
[11,] 0 0 0 0 0 0 0 0 0 0
[12,] 0 0 1 0 0 0 1 1 0 1
[13,] 1 1 1 0 0 0 0 0 0 0
[14,] 0 0 0 1 0 0 0 1 0 0
[15,] 0 0 0 0 0 0 1 0 0 1
[16,] 0 0 0 0 1 1 0 0 1 0
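The row index in this answer can be read as "block offset plus rank". For input row i and rank r, the expression evaluates to i + 3(i - 1) + (r - 1) = 4(i - 1) + r. The sketch below checks that identity numerically; `row_henrik`, `row_simple` and `k` are names I introduce for illustration.

```r
mat1 <- matrix(c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
                 3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
                 2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
                 1, 1, 1, 2, 4, 4, 3, 2, 4, 3),
               nrow = 4, ncol = 10, byrow = TRUE)
k <- 4  # number of ranks

# Row index as written in the answer (with `len` spelled out).
row_henrik <- row(mat1) +
  seq(0, by = k - 1, length.out = nrow(mat1)) + (mat1 - 1)

# Same index written as block offset plus rank.
row_simple <- (row(mat1) - 1) * k + mat1

all(row_henrik == row_simple)
# [1] TRUE

# Fill the output through a two-column (row, column) index matrix.
m <- matrix(0, nrow = k * nrow(mat1), ncol = ncol(mat1))
m[cbind(c(row_simple), c(col(mat1)))] <- 1
```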
2 votes
ThomasIsCoding
7/17/2022
#3
Perhaps we can benefit from using kronecker + rep, as shown below:
> +(kronecker(mat1, matrix(rep(1, 4))) == rep(1:4, nrow(mat1)))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 1 1 1
[2,] 0 0 0 0 0 1 1 0 0 0
[3,] 0 1 1 1 1 0 0 0 0 0
[4,] 1 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 1 0 0 0
[6,] 0 0 1 0 1 0 0 0 0 1
[7,] 1 0 0 0 0 1 0 1 1 0
[8,] 0 1 0 1 0 0 0 0 0 0
[9,] 0 0 0 1 1 1 0 0 0 0
[10,] 1 1 0 0 0 0 0 0 1 0
[11,] 0 0 0 0 0 0 0 0 0 0
[12,] 0 0 1 0 0 0 1 1 0 1
[13,] 1 1 1 0 0 0 0 0 0 0
[14,] 0 0 0 1 0 0 0 1 0 0
[15,] 0 0 0 0 0 0 1 0 0 1
[16,] 0 0 0 0 1 1 0 0 1 0
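To make the two ingredients of this one-liner visible, here is a small sketch that separates them. The names `rep_rows`, `same` and `res` are illustrative and not part of the answer.

```r
mat1 <- matrix(c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
                 3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
                 2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
                 1, 1, 1, 2, 4, 4, 3, 2, 4, 3),
               nrow = 4, ncol = 10, byrow = TRUE)
k <- 4  # number of ranks

# The Kronecker product with a k x 1 column of ones repeats
# every input row k times.
rep_rows <- kronecker(mat1, matrix(rep(1, k)))
dim(rep_rows)
# [1] 16 10

# Plain row indexing gives the same repeated matrix.
same <- mat1[rep(seq_len(nrow(mat1)), each = k), ]
all(rep_rows == same)
# [1] TRUE

# rep(1:k, nrow(mat1)) is recycled down each column, so output row
# 1 is compared with rank 1, row 2 with rank 2, ..., row 5 with rank 1.
res <- +(rep_rows == rep(1:k, nrow(mat1)))
```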
Comments
0 votes
ThomasIsCoding
7/17/2022
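@ZheyuanLi kronecker helps expand the dimensions of one matrix by another given matrix, which is the same as outer only when both matrices are reduced to column/row vectors.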
Comments
rowsum(1:4 * m, rep(1:4, each=4)), where m is the binary matrix output (and change 4 to generalize the number of ranks)
mat <- matrix(which(m == 1) %% 4, 4); mat[mat == 0] <- 4
matrix((which(m == 1) - 1) %% 4 + 1, 4), with the -1 and +1 shift
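The expressions in these comments map the binary matrix back to the rankings. A round-trip check could look like the sketch below, where `k` (number of ranks) and `n` (number of input rows) are names I add to show which 4 plays which role; both happen to be 4 in this example.

```r
mat1 <- matrix(c(4, 3, 3, 3, 3, 2, 2, 1, 1, 1,
                 3, 4, 2, 4, 2, 3, 1, 3, 3, 2,
                 2, 2, 4, 1, 1, 1, 4, 4, 2, 4,
                 1, 1, 1, 2, 4, 4, 3, 2, 4, 3),
               nrow = 4, ncol = 10, byrow = TRUE)
k <- 4           # number of ranks
n <- nrow(mat1)  # number of input rows

# Binary 16 x 10 matrix, built with the outer() approach.
m <- matrix(+outer(seq_len(k), c(mat1), "=="), ncol = ncol(mat1))

# Weight each indicator row by its rank, then sum within each block
# of k rows: the single 1 per block and column yields the rank.
back1 <- rowsum(seq_len(k) * m, rep(seq_len(n), each = k))

# Alternatively, read the rank off the linear position of each 1.
back2 <- matrix((which(m == 1) - 1) %% k + 1, n)

all(back1 == mat1)
# [1] TRUE
all(back2 == mat1)
# [1] TRUE
```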