Asked by: solarlate | Asked: 11/17/2023 | Modified: 11/17/2023 | Viewed: 45 times
Partitioning Sparse Matrix into Smaller Chunks by the number of nonzero values
Q:
I have the following code:
import io, sys, numpy, scipy, struct, os
import numpy as np
from scipy import io
from scipy import sparse
from numpy import inf
import random as rnd

def myhorsplit(matrix, numPartitions):
    csr = sparse.csr_matrix(matrix)
    input_array = csr.getnnz(0)
    print(input_array)
    # Step 1: Calculate the total
    total = sum(input_array)
    # Step 3: Set number of compute units to 4
    num_compute_units = 4
    partitions = [[] for _ in range(num_compute_units)]
    current_sums = [0] * num_compute_units
    for element in input_array:
        # Find the partition with the smallest current sum
        min_partition = min(range(num_compute_units), key=lambda i: current_sums[i])
        # Add the element to the selected partition
        partitions[min_partition].append(element)
        current_sums[min_partition] += element
    for i, partition in enumerate(partitions):
        print(f"Partition {i}: {partition}")
    print("Rows Result:")
    print(csr.tolil().rows[0])
    print("Get Column Result:")
    print(csr.tolil().getcol(1))
    return partition

# Create an 8x8 adjacency matrix with the modified element
adjacency_matrix = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0, 1, 0]
]
csr_matrix = sparse.csr_matrix(adjacency_matrix)
# horizontalSplit(csr_matrix, 4)
myhorsplit(csr_matrix, 4)
It prints:
[4 2 5 2 3 2 1 2]
Partition 0: [4, 1]
Partition 1: [2, 3]
Partition 2: [5]
Partition 3: [2, 2, 2]
Rows Result:
[0, 1, 2, 3]
Get Column Result:
(0, 0) 1
(2, 0) 1
I can't figure out a way to partition this sparse matrix into smaller chunks such that the first 3 chunks (assuming I want 4 chunks) each hold total nonzeros / 4 nonzero values, and the last chunk holds the remaining nonzeros.
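(A worked illustration of the goal, not from the original post: the printed per-column counts [4 2 5 2 3 2 1 2] sum to 21 nonzeros, so with 4 chunks the first three would each hold about 21 // 4 = 5 nonzeros and the last chunk the remaining 6.)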
A:
1 vote
Reinderien
11/17/2023
#1
You're going through a lot of seemingly unnecessary trouble for a parallelization partitioning problem. This will behave better in terms of setup runtime if you keep the partitions contiguous. (And use arrays, not matrices!)
def myhorsplit(
    matrix: sparse.sparray, n_compute_units: int = 4,
) -> list[sparse.sparray]:
    nnz = matrix.getnnz(axis=1).cumsum()
    total = nnz[-1]
    ideal_breaks = np.arange(0, total, total/n_compute_units)
    break_idx = [*nnz.searchsorted(ideal_breaks), None]
    return [
        matrix[i: j, :]
        for i, j in zip(break_idx[:-1], break_idx[1:])
    ]
For a small array the output is not very accurate:
Partition 0: 4 ones, shape (1, 8)
[[1 1 1 1 0 0 0 0]]
Partition 1: 5 ones, shape (2, 8)
[[1 0 1 0 0 0 0 0]
[1 1 0 1 0 0 0 0]]
Partition 2: 6 ones, shape (3, 8)
[[1 0 1 0 0 0 0 0]
[0 0 1 0 0 1 0 1]
[0 0 0 0 1 0 0 0]]
Partition 3: 6 ones, shape (2, 8)
[[0 0 0 0 1 1 0 1]
[0 0 1 0 1 0 1 0]]
But your real array is large, and accuracy improves as the array size grows; with a large, randomly generated array:
def main() -> None:
    rand = np.random.default_rng(seed=0)

    # Create an 8x8 adjacency matrix with the modified element
    adjacency_matrix = [
        (1, 1, 1, 1, 0, 0, 0, 0),
        (1, 0, 1, 0, 0, 0, 0, 0),
        (1, 1, 0, 1, 0, 0, 0, 0),
        (1, 0, 1, 0, 0, 0, 0, 0),
        (0, 0, 1, 0, 0, 1, 0, 1),
        (0, 0, 0, 0, 1, 0, 0, 0),
        (0, 0, 0, 0, 1, 1, 0, 1),
        (0, 0, 1, 0, 1, 0, 1, 0),
    ]
    # csr_matrix = sparse.csr_array(adjacency_matrix)
    csr_matrix = sparse.csr_array(
        rand.integers(low=0, high=2, size=(10_000, 50), dtype=np.uint8)
    )

    partitions = myhorsplit(csr_matrix)
    for i, partition in enumerate(partitions):
        print(f"Partition {i}: {partition.nnz} ones, shape {partition.shape}")
        # print(partition.toarray())
Partition 0: 62316 ones, shape (2498, 50)
Partition 1: 62339 ones, shape (2506, 50)
Partition 2: 62332 ones, shape (2494, 50)
Partition 3: 62348 ones, shape (2502, 50)
If you want to do better in terms of partition accuracy and allow arbitrary, non-contiguous assignment, this becomes a knapsack problem, which takes its own time to run and would use e.g. scipy.optimize.milp.
Comments:
0 votes
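As a rough illustration (not from the answer), one way such a formulation could look with scipy.optimize.milp is sketched below; the helper name balanced_assignment and the min-max objective are assumptions, and it assumes a scipy recent enough to ship milp (>= 1.9):

import numpy as np
from scipy import sparse
from scipy.optimize import Bounds, LinearConstraint, milp

def balanced_assignment(matrix: sparse.sparray, n_compute_units: int = 4) -> list[list[int]]:
    """Assign rows to partitions so the largest nonzero load is minimised (hypothetical sketch)."""
    nnz = matrix.getnnz(axis=1)           # nonzeros per row
    n_rows = len(nnz)
    n_x = n_rows * n_compute_units        # one binary x[i, j] per (row, partition) pair

    # Objective: minimise t, the last variable; the x variables carry no cost.
    c = np.zeros(n_x + 1)
    c[-1] = 1.0

    # Each row must land in exactly one partition.
    a_assign = np.zeros((n_rows, n_x + 1))
    for i in range(n_rows):
        a_assign[i, i * n_compute_units:(i + 1) * n_compute_units] = 1

    # Each partition's total nonzero count must stay below t.
    a_load = np.zeros((n_compute_units, n_x + 1))
    for j in range(n_compute_units):
        a_load[j, j:n_x:n_compute_units] = nnz
        a_load[j, -1] = -1

    res = milp(
        c,
        constraints=[
            LinearConstraint(a_assign, lb=1, ub=1),
            LinearConstraint(a_load, ub=0),
        ],
        integrality=np.r_[np.ones(n_x), 0],             # x integer, t continuous
        bounds=Bounds(lb=0, ub=np.r_[np.ones(n_x), np.inf]),
    )
    x = res.x[:n_x].round().reshape(n_rows, n_compute_units)
    return [list(np.flatnonzero(x[:, j])) for j in range(n_compute_units)]

Note that this only pays off when non-contiguous grouping can actually beat the contiguous slices; for the tiny 8x8 example both approaches reach the same maximum load of 6 nonzeros.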
solarlate
11/19/2023
It gives an attribute error for sparse.sparray and suggests changing it to sparse.bsr_array. I did google it and installed a newer version of scipy and a downgraded networkx package, but it didn't work. I'm wondering if there is any difference between the two?
0 votes
Reinderien
11/19/2023
"It didn't work" is not enough to go on. But yes, you should be running a modern version of scipy.
0 votes
solarlate
11/19/2023
I fixed it and it works now. Thank you very much, but I have one more short question. The order of the rows in Partition 2 changed because it sorts them via the searchsorted function. Is there a way not to sort those values and just keep them in their original order? Like: Partition 2: [[0 0 0 0 1 0 0 1], [1 0 1 0 0 0 0 0], [0 0 0 0 0 0 1 1 0]]
0 votes
Reinderien
11/19/2023
The "sorted" in searchsorted just refers to the fact that the cumulative sum of nonzero counts is guaranteed to be monotonic. No other sorting is performed.
0 votes
solarlate
11/19/2023
It shouldn't reorder Partition 2; that gives wrong values, since this is a graph representation. Does it have something to do with how the return is set up?
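(A quick sanity check outside the thread, assuming scipy's csr_array and the 8x8 matrix from the question: row slicing itself does not reorder anything, so Partition 2 is simply rows 3 to 5 exactly as they appear in the input.)

import numpy as np
from scipy import sparse

# The 8x8 example from the question.
small = sparse.csr_array(np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0, 1, 0],
]))
print(small[3:6, :].toarray())   # rows 3, 4, 5 in their original order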
Tags: numpy