按非零值的数量将稀疏矩阵划分为更小的块-解网

问：

我有以下代码：

import io, sys, numpy, scipy, struct, os
import numpy as np
from scipy import io
from scipy import sparse
from numpy import inf
import random as rnd
import sys
from scipy import sparse

def myhorsplit(matrix,numPartitions):
    csr = sparse.csr_matrix(matrix)
    input_array = csr.getnnz(0)
    print(input_array)
        # Step 1: Calculate the total
    total = sum(input_array)

    # Step 3: Set number of compute units to 4
    num_compute_units = 4
 
    partitions = [[] for _ in range(num_compute_units)]
    current_sums = [0] * num_compute_units
    for element in input_array:
        # Find the partition with the smallest current sum
        min_partition = min(range(num_compute_units), key=lambda i: current_sums[i])

        # Add the element to the selected partition
        partitions[min_partition].append(element)
        current_sums[min_partition] += element

    for i, partition in enumerate(partitions):
        print(f"Partition {i}: {partition}")
    print("Rows Result:")
    print(csr.tolil().rows[0])
    print("Get Column Result:")
    print(csr.tolil().getcol(1))
    
    return partition

# Create an 8x8 adjacency matrix with the modified element
adjacency_matrix = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0, 1, 0]
]
csr_matrix = sparse.csr_matrix(adjacency_matrix)
# horizontalSplit(csr_matrix,4)
myhorsplit(csr_matrix,4)

打印出来：

[4 2 5 2 3 2 1 2]
Partition 0: [4, 1]
Partition 1: [2, 3]
Partition 2: [5]
Partition 3: [2, 2, 2]
Rows Result:
[0, 1, 2, 3]
Get Column Result:
  (0, 0)        1
  (2, 0)        1

我找不到一种方法将这个稀疏矩阵划分为更小的块，以便前 3 个块（假设我想要 4 个块）具有总非零值/4，最后一个块具有其余的非零值？

python 稀疏矩阵

def myhorsplit(
    matrix: sparse.sparray, n_compute_units: int = 4,
) -> list[sparse.sparray]:
    nnz = matrix.getnnz(axis=1).cumsum()
    total = nnz[-1]
    ideal_breaks = np.arange(0, total, total/n_compute_units)
    break_idx = [*nnz.searchsorted(ideal_breaks), None]
    return [
        matrix[i: j, :]
        for i, j in zip(break_idx[:-1], break_idx[1:])
    ]

对于小数组，输出不是很准确：

Partition 0: 4 ones, shape (1, 8)
[[1 1 1 1 0 0 0 0]]
Partition 1: 5 ones, shape (2, 8)
[[1 0 1 0 0 0 0 0]
 [1 1 0 1 0 0 0 0]]
Partition 2: 6 ones, shape (3, 8)
[[1 0 1 0 0 0 0 0]
 [0 0 1 0 0 1 0 1]
 [0 0 0 0 1 0 0 0]]
Partition 3: 6 ones, shape (2, 8)
[[0 0 0 0 1 1 0 1]
 [0 0 1 0 1 0 1 0]]

但是您的实际数组很大，并且随着数组大小的增长，准确性会提高;使用一个大型的、随机生成的数组：

def main() -> None:
    rand = np.random.default_rng(seed=0)
    # Create an 8x8 adjacency matrix with the modified element
    adjacency_matrix = [
        (1, 1, 1, 1, 0, 0, 0, 0),
        (1, 0, 1, 0, 0, 0, 0, 0),
        (1, 1, 0, 1, 0, 0, 0, 0),
        (1, 0, 1, 0, 0, 0, 0, 0),
        (0, 0, 1, 0, 0, 1, 0, 1),
        (0, 0, 0, 0, 1, 0, 0, 0),
        (0, 0, 0, 0, 1, 1, 0, 1),
        (0, 0, 1, 0, 1, 0, 1, 0),
    ]
    # csr_matrix = sparse.csr_array(adjacency_matrix)
    csr_matrix = sparse.csr_array(
        rand.integers(low=0, high=2, size=(10_000, 50), dtype=np.uint8)
    )

    partitions = myhorsplit(csr_matrix)

    for i, partition in enumerate(partitions):
        print(f"Partition {i}: {partition.nnz} ones, shape {partition.shape}")
        # print(partition.toarray())

Partition 0: 62316 ones, shape (2498, 50)
Partition 1: 62339 ones, shape (2506, 50)
Partition 2: 62332 ones, shape (2494, 50)
Partition 3: 62348 ones, shape (2502, 50)

如果你想在分区精度方面做得更好，并且允许任意的、不连续的分配，这是一个背包问题，它需要自己的时间来运行，并且会使用例如.scipy.optimize.milp

我把它修好了，可以工作了。非常感谢，但我还有一个简短的问题。分区 2 中的分区顺序发生了变化，因为它通过 searchsort 函数对它们进行排序。有没有办法不对这些值进行排序，而只是按顺序排列它们的值。喜欢：分区 2 ： [[0 0 0 0 1 0 0 1]，[1 0 1 0 0 0 0 0] ，[0 0 0 0 0 0 1 1 0]]

0赞 Reinderien 11/19/2023

“排序”是指非零计数的累积总和保证是单调的。没有执行其他排序。searchsorted

0赞 solarlate 11/19/2023

它不应该对分区 2 重新排序，这会导致错误的值，因为这是一个图形表示。这与回报的设置方式有关吗？

上一个：是否有表示排列的数组中 1 的位置的解析公式？

下一个：如何将CSR格式存储的稀疏矩阵转换为VBR格式？

按非零值的数量将稀疏矩阵划分为更小的块

Partitioning Sparse Matrix into Smaller Chunks by the number of nonzero values

评论

评论