How to speed up multiple nested for loops in python?

Asked by: Erwin · Asked: 8/8/2023 · Last edited by: jared · Updated: 8/11/2023 · Views: 154

Question:

I would like to know whether there is another way to speed up the multiple nested for loops in my matrix function.

Here is my function:

import time

import numpy as np
import pandas as pd

def matrix(Xbin, y):
    labels = np.unique(y)
    con_matrix = []
    start = time.time()
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # Crossover
            for u in Xbin[y == labels[i]]:
                for v in Xbin[y == labels[j]]:
                    con_matrix.append(np.bitwise_xor(u, v))
    end = time.time()
    duration = end - start
    print("total time for nested loop: ", duration)
    constraint_matrix = np.array(con_matrix)
    bin_attr_dim = [i for i in range(1, Xbin.shape[1] + 1)]

    df = pd.DataFrame(constraint_matrix, columns=bin_attr_dim)

    return df

Note that Xbin is a numpy.ndarray and y holds the group labels; there are three unique groups (1, 2, 3). The image below (Figure 1) shows this ndarray, with columns a through h:

[Figure 1: the binarized ndarray Xbin, columns a to h]

The matrix function described above produces the output shown in the image below (columns a to h): the XOR combinations between elements of different groups. Figure 2 shows the resulting DataFrame:

[Figure 2: the resulting DataFrame of XOR rows]
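Since the images are not reproduced here, the pairing can be sketched in a few lines (the values below are borrowed from the small example in the second answer, so they match the figures only by assumption):

import numpy as np

# One row from group 1 and one row from group 2 (columns a to h)
u = np.array([0, 1, 0, 1, 0, 1, 1, 0])
v = np.array([1, 0, 1, 0, 1, 0, 1, 1])

# matrix() emits one XOR row per cross-group pair (u, v)
print(np.bitwise_xor(u, v))  # [1 1 1 1 1 1 0 1]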

Below is my code for generating the dataset in binarized format, as shown in Figure 1:

from itertools import combinations

import numpy as np
import pandas as pd

def binarize_dataset(X, y):
    cutpoints = {}

    att = -1
    for row in X.T:
        att += 1
        labels = None  # Previous labels
        u = -9999  # Previous xi

        # Finding transitions
        for v in sorted(np.unique(row)):
            variation = v - u  # Current - Previous

            # Classes where current v appears
            indexes = np.where(row == v)[0]
            # current label
            __labels = set(y[indexes])

            # Main condition
            if labels is not None and variation > 0:
                # Testing for transition to find the essential cut-points
                if (len(labels) > 1 or len(__labels) > 1) or labels != __labels:
                    # cut-point id
                    cid = len(cutpoints)
                    cutpoints[cid] = (att, u + variation / 2.0)

            labels = __labels
            # previous equals current
            u = v
    new_dict = {}
    # Iterate over the values in the original dictionary
    for key, value in cutpoints.items():
        first_element = value[0]
        second_element = value[1]

        # Check if the first_element is already a key in new_dict
        if first_element in new_dict:
            new_dict[first_element].append(second_element)
        else:
            new_dict[first_element] = [second_element]
    # Generate combinations of the second elements within each group
    for key, value in new_dict.items():
        comb = combinations(value, 2)
        # Append the combinations to the value list
        for c in comb:
            new_dict[key].append(c)
    arrays = []
    for attr, cutpoints in new_dict.items():
        for cutpoint in cutpoints:
            row = X.T[attr]
            if isinstance(cutpoint, tuple):
                lowerbound = cutpoint[0] <= row.reshape(X.shape[0], 1)
                upperbound = row.reshape(X.shape[0], 1) < cutpoint[1]
                row = np.logical_and(lowerbound, upperbound)
                arrays.append(row)
            else:
                row = row.reshape(X.shape[0], 1) >= cutpoint
                arrays.append(row)
    
    Xbin = np.concatenate(arrays, axis=1)

    bin_attr_dim = [i for i in range(1, Xbin.shape[1] + 1)]
    df = pd.DataFrame(Xbin, columns=bin_attr_dim)
    start = 0
    dict_parent_children = {}
    for key, list_value in new_dict.items():
        dict_parent_children[key] = list(df.columns[start: start + len(list_value)])
        start += len(list_value)

    return Xbin, df, dict_parent_children

When I test it with the iris dataset (a small dataset), it runs very fast.

from sklearn import datasets

X, y = datasets.load_iris(return_X_y=True)
bin_dataset, data, dict_parent_children = binarize_dataset(X, y)

con_matrix = matrix(bin_dataset, y)

When I test with a larger dataset such as breast cancer, it starts to take much longer.

X, y = datasets.load_breast_cancer(return_X_y=True)
bin_dataset, data, dict_parent_children = binarize_dataset(X, y)
con_matrix = matrix(bin_dataset, y)

Now imagine testing with a dataset even larger than breast cancer. The question is: how can I speed up my matrix function as much as possible, or is there a faster way to rewrite it?

python numpy for-loop optimization

Comments:

0 · 8/8/2023
Does this answer your question? How to speed up nested for loops in Python
1 · Erwin · 8/8/2023
Sadly, no. I had already looked at that one.
1 · jared · 8/8/2023
When it comes to numpy, if you are after speed you should avoid writing for loops at all costs.
1 · jared · 8/8/2023
Well, you will have to write a minimal reproducible example, which means reducing your code to just what matters for the problem and nothing more; your code is longer than most people are willing to read.
1 · hpaulj · 8/8/2023
With numpy, the obvious step for speeding up loops is to replace them with whole-array methods ("vectorization"). They still loop, but in compiled code. Sometimes that is easy to do (with experience); sometimes it takes some tricks. Either focus on one loop at a time, or look for some top-down pattern. But as @jared wrote, if the question/code is too big, most of us move on to a more interesting, more focused question.
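To illustrate the whole-array idea, a minimal toy sketch (not the OP's data):

import numpy as np

a = np.arange(100_000)

# Python-level loop: one interpreted iteration per element
total = 0
for x in a:
    total += x * x

# Whole-array ("vectorized") equivalent: the loop runs in compiled code
total_fast = np.dot(a, a)
assert total == total_fast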

Answers:

0 · Sahil Manchanda · 8/8/2023 · #1

I think we can improve performance by avoiding the inner for loops and using optimized approaches such as vectorization, itertools.combinations, and parallel processing (if the system supports it).

Please try the following code:

import numpy as np
import pandas as pd
import time
from itertools import combinations

def matrix(Xbin, y):
    labels = np.unique(y)
    con_matrix = []
    start = time.time()

    for i, j in combinations(range(len(labels)), 2):
        # Note: np.bitwise_xor is element-wise, so this line only works when
        # both groups have the same number of rows (see the comments below)
        xor_results = np.bitwise_xor(Xbin[y == labels[i]], Xbin[y == labels[j]])
        con_matrix.extend(xor_results)

    end = time.time()
    duration = end - start
    print("total time for nested loop: ", duration)

    constraint_matrix = np.array(con_matrix)
    bin_attr_dim = [i for i in range(1, Xbin.shape[1] + 1)]

    df = pd.DataFrame(constraint_matrix, columns=bin_attr_dim)

    return df
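As the comments below point out, this XOR fails when the two groups contain different numbers of rows. One possible fix, sketched here with a hypothetical helper cross_group_xor (not part of the original answer), is to broadcast over a new axis so that every row of one group is paired with every row of the other; the intermediate array holds m * n * d elements, so this only helps when that fits in memory:

import numpy as np

def cross_group_xor(A, B):
    # A: (m, d), B: (n, d). Broadcasting (m, 1, d) ^ (1, n, d) -> (m, n, d)
    # pairs every row of A with every row of B in compiled code.
    out = np.bitwise_xor(A[:, None, :], B[None, :, :])
    # Flatten the pair axes: one row per (row-of-A, row-of-B) combination
    return out.reshape(-1, A.shape[1])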

Comments:

0 · Erwin · 8/8/2023
@Sahil Manchanda, your code gives me an error. ValueError: operands could not be broadcast together with shapes (212,487179) (357,487179)
1 · Sahil Manchanda · 8/8/2023
OK, let me check and get back to you.
0 · Sahil Manchanda · 8/8/2023
@Erwin The error occurs because the arrays we are trying to XOR have incompatible shapes; np.bitwise_xor requires arrays of the same shape. I have updated the code. Please check it and let me know whether it works for you.
0 · Erwin · 8/8/2023
@Sahil Manchanda With the updated code I still get the same error.
0 · Naphat Amundsen · 8/9/2023 · #2

I tried the breast cancer dataset using the binarization code you provided and ran into problems myself. I had to change my code a little, because the intermediate arrays became very large, so I switched to a less vectorized solution. A potential improvement would be to do this across multiple processes, but I am not sure how efficiently the data would be shared between processes.

By the way, are you aware that the result will take up more than 30 GB of memory?
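That size can be estimated up front from the group counts alone; here is a rough sketch (estimate_result_bytes is an illustrative helper, assuming one byte per entry):

import numpy as np

def estimate_result_bytes(y, n_cols, itemsize=1):
    # Rows produced = sum over label pairs i < j of count_i * count_j
    #               = ((sum of counts)^2 - sum of squared counts) / 2
    counts = np.unique(y, return_counts=True)[1]
    rows = (counts.sum() ** 2 - (counts ** 2).sum()) // 2
    return rows * n_cols * itemsize

# Binarized breast cancer: 212 * 357 = 75684 rows x 487179 columns ~ 36.9 GB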

Anyway, here is the code. The whole script takes about 31 seconds on a MacBook Pro with an M2 Pro CPU:

import numpy as np
import pandas as pd
from numpy.typing import NDArray
from itertools import product, pairwise, chain, combinations
from numpy.lib.stride_tricks import sliding_window_view
from sys import getsizeof


def binarize_dataset(X, y):
    """This is the binarization code you provided"""
    cutpoints = {}

    att = -1
    for row in X.T:
        att += 1
        labels = None  # Previous labels
        u = -9999  # Previous xi

        # Finding transitions
        for v in sorted(np.unique(row)):
            variation = v - u  # Current - Previous

            # Classes where current v appears
            indexes = np.where(row == v)[0]
            # current label
            __labels = set(y[indexes])

            # Main condition
            if labels is not None and variation > 0:
                # Testing for transition to find the essential cut-points
                if (len(labels) > 1 or len(__labels) > 1) or labels != __labels:
                    # cut-point id
                    cid = len(cutpoints)
                    cutpoints[cid] = (att, u + variation / 2.0)

            labels = __labels
            # previous equals current
            u = v
    new_dict = {}
    # Iterate over the values in the original dictionary
    for key, value in cutpoints.items():
        first_element = value[0]
        second_element = value[1]

        # Check if the first_element is already a key in new_dict
        if first_element in new_dict:
            new_dict[first_element].append(second_element)
        else:
            new_dict[first_element] = [second_element]
    # Generate combinations of the second elements within each group
    for key, value in new_dict.items():
        comb = combinations(value, 2)
        # Append the combinations to the value list
        for c in comb:
            new_dict[key].append(c)
    arrays = []
    for attr, cutpoints in new_dict.items():
        for cutpoint in cutpoints:
            row = X.T[attr]
            if isinstance(cutpoint, tuple):
                lowerbound = cutpoint[0] <= row.reshape(X.shape[0], 1)
                upperbound = row.reshape(X.shape[0], 1) < cutpoint[1]
                row = np.logical_and(lowerbound, upperbound)
                arrays.append(row)
            else:
                row = row.reshape(X.shape[0], 1) >= cutpoint
                arrays.append(row)

    Xbin = np.concatenate(arrays, axis=1)

    bin_attr_dim = [i for i in range(1, Xbin.shape[1] + 1)]
    df = pd.DataFrame(Xbin, columns=bin_attr_dim)
    start = 0
    dict_parent_children = {}
    for key, list_value in new_dict.items():
        dict_parent_children[key] = list(df.columns[start : start + len(list_value)])
        start += len(list_value)

    return Xbin, df, dict_parent_children


def xor_grouped_pairs(X: NDArray, y: NDArray, y_is_sorted: bool = False):
    """
    If X is already sorted respective to y you can save some time by specifying that it is already sorted
    """
    if not y_is_sorted:
        X = X[y.argsort()]

    # Group boundaries: cumulative counts give [0, end_1, end_2, ...]
    pairs = np.concatenate(([0], np.cumsum(np.unique(y, return_counts=True)[1])))
    # (start, stop) index range of each group
    pairs = sliding_window_view(pairs, 2)
    # Consecutive group pairs: (group_1, group_2), (group_2, group_3), ...
    pairs = pairwise(pairs)
    # Every (left_row, right_row) index pair across each group pair
    pairs = chain.from_iterable(product(range(*l), range(*r)) for l, r in pairs)

    xor_matrix = np.array([np.bitwise_xor(X[a], X[b]) for a, b in pairs], dtype=np.uint8)
    return pd.DataFrame(data=xor_matrix)


if __name__ == "__main__":
    X = np.array(
        [
            [0, 1, 0, 1, 0, 1, 1, 0],
            [1, 1, 0, 1, 1, 0, 0, 1],
            [0, 1, 1, 0, 1, 0, 0, 1],
            [1, 0, 1, 0, 1, 0, 1, 1],
            [0, 0, 0, 1, 1, 1, 0, 0],
            [1, 1, 0, 1, 0, 1, 0, 1],
            [0, 0, 1, 0, 1, 0, 1, 0],
        ]
    )

    y = np.array([1, 1, 1, 2, 2, 3, 3])
    print(f"Result on the small example:")
    print(xor_grouped_pairs(X, y))

    from sklearn import datasets
    X, y = datasets.load_breast_cancer(return_X_y=True)
    bin_dataset, data, dict_parent_children = binarize_dataset(X, y)
    print(f"The size of the binarized dataset is {bin_dataset.shape}, memory={getsizeof(bin_dataset) / 1e9} GB")
    print("Result on binarized breast cancer dataset:")
    print(result := xor_grouped_pairs(bin_dataset, y))
    print(f"Memory usage to store the result of binarized breast cancer: {getsizeof(result) / 1e9} GB")

Output:

Result on the small example:
   0  1  2  3  4  5  6  7
0  1  1  1  1  1  1  0  1
1  0  1  0  0  1  0  1  0
2  0  1  1  1  0  0  1  0
3  1  1  0  0  0  1  0  1
4  1  1  0  0  0  0  1  0
5  0  1  1  1  0  1  0  1
6  0  1  1  1  1  1  1  0
7  1  0  0  0  0  0  0  1
8  1  1  0  0  1  0  0  1
9  0  0  1  1  0  1  1  0
The size of the binarized dataset is (569, 487179), memory=0.277204979 GB
Result on binarized breast cancer dataset:
       0       1       2       3       4       5       6       ...  487172  487173  487174  487175  487176  487177  487178
0           0       0       0       0       0       0       0  ...       0       0       0       0       0       0       0
1           0       0       0       0       0       0       0  ...       0       0       0       0       0       0       0
2           1       1       1       1       1       1       1  ...       0       0       0       0       0       0       0
3           1       1       1       1       1       1       1  ...       0       0       0       0       0       0       0
4           0       0       0       0       0       0       0  ...       0       0       0       0       0       0       0
...       ...     ...     ...     ...     ...     ...     ...  ...     ...     ...     ...     ...     ...     ...     ...
75679       1       1       1       1       1       1       1  ...       0       0       0       0       0       0       0
75680       0       0       0       0       0       0       0  ...       0       0       0       0       0       0       0
75681       0       0       0       0       0       0       0  ...       0       0       0       0       0       0       0
75682       0       0       0       0       0       0       1  ...       0       0       0       0       0       0       0
75683       1       1       1       1       1       1       1  ...       0       0       0       0       0       0       0

[75684 rows x 487179 columns]
Memory usage to store the result of binarized breast cancer: 36.87165558 GB

Comments:

0 · Erwin · 8/9/2023
Could you test it with an actual dataset such as breast cancer?
0 · Naphat Amundsen · 8/11/2023
I have now tested it; see my updated answer. I had to make a few adjustments to my code.