How to speed up multiple nested for loops in python?

Asked by: Erwin · Asked: 8/8/2023 · Last edited by: jared · Updated: 8/11/2023 · Views: 154

Question:

I would like to know whether there is another way to speed up the multiple nested for loops in my matrix function.

Here is my function:

import time

import numpy as np
import pandas as pd

def matrix(Xbin, y):
    labels = np.unique(y)
    con_matrix = []
    start = time.time()
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # Crossover
            for u in Xbin[y == labels[i]]:
                for v in Xbin[y == labels[j]]:
                    con_matrix.append(np.bitwise_xor(u, v))
    end = time.time()
    duration = end - start
    print("total time for nested loop: ", duration)
    constraint_matrix = np.array(con_matrix)
    bin_attr_dim = [i for i in range(1, Xbin.shape[1] + 1)]

    df = pd.DataFrame(constraint_matrix, columns=bin_attr_dim)

    return df

Note that Xbin is a numpy.ndarray and y holds the group labels; there are three unique groups (1, 2, 3). The image below (Figure 1) shows this ndarray, with columns a through h:

[Figure 1: the binarized ndarray Xbin, columns a to h]

The matrix function described above produces the output shown in the image below (columns a to h): the XOR combinations between elements of different groups. Figure 2 shows the resulting DataFrame:

[Figure 2: the resulting DataFrame of XOR rows]
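Since the images are not reproduced here, the pairing can be sketched in a few lines (the values below are borrowed from the small example in the second answer, so they match the figures only by assumption):

import numpy as np

# One row from group 1 and one row from group 2 (columns a to h)
u = np.array([0, 1, 0, 1, 0, 1, 1, 0])
v = np.array([1, 0, 1, 0, 1, 0, 1, 1])

# matrix() emits one XOR row per cross-group pair (u, v)
print(np.bitwise_xor(u, v))  # [1 1 1 1 1 1 0 1]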

Below is my code for generating the dataset in binarized format, as shown in Figure 1:

from itertools import combinations

import numpy as np
import pandas as pd

def binarize_dataset(X, y):
    cutpoints = {}

    att = -1
    for row in X.T:
        att += 1
        labels = None  # Previous labels
        u = -9999  # Previous xi

        # Finding transitions
        for v in sorted(np.unique(row)):
            variation = v - u  # Current - Previous

            # Classes where current v appears
            indexes = np.where(row == v)[0]
            # current label
            __labels = set(y[indexes])

            # Main condition
            if labels is not None and variation > 0:
                # Testing for transition to find the essential cut-points
                if (len(labels) > 1 or len(__labels) > 1) or labels != __labels:
                    # cut-point id
                    cid = len(cutpoints)
                    cutpoints[cid] = (att, u + variation / 2.0)

            labels = __labels
            # previous equals current
            u = v
    new_dict = {}
    # Iterate over the values in the original dictionary
    for key, value in cutpoints.items():
        first_element = value[0]
        second_element = value[1]

        # Check if the first_element is already a key in new_dict
        if first_element in new_dict:
            new_dict[first_element].append(second_element)
        else:
            new_dict[first_element] = [second_element]
    # Generate combinations of the second elements within each group
    for key, value in new_dict.items():
        comb = combinations(value, 2)
        # Append the combinations to the value list
        for c in comb:
            new_dict[key].append(c)
    arrays = []
    for attr, cutpoints in new_dict.items():
        for cutpoint in cutpoints:
            row = X.T[attr]
            if isinstance(cutpoint, tuple):
                lowerbound = cutpoint[0] <= row.reshape(X.shape[0], 1)
                upperbound = row.reshape(X.shape[0], 1) < cutpoint[1]
                row = np.logical_and(lowerbound, upperbound)
                arrays.append(row)
            else:
                row = row.reshape(X.shape[0], 1) >= cutpoint
                arrays.append(row)
    
    Xbin = np.concatenate(arrays, axis=1)

    bin_attr_dim = [i for i in range(1, Xbin.shape[1] + 1)]
    df = pd.DataFrame(Xbin, columns=bin_attr_dim)
    start = 0
    dict_parent_children = {}
    for key, list_value in new_dict.items():
        dict_parent_children[key] = list(df.columns[start: start + len(list_value)])
        start += len(list_value)

    return Xbin, df, dict_parent_children

When I test it with the iris dataset (a small dataset), it runs very fast.

from sklearn import datasets

X, y = datasets.load_iris(return_X_y=True)
bin_dataset, data, dict_parent_children = binarize_dataset(X, y)

con_matrix = matrix(bin_dataset, y)

When I test with a larger dataset such as breast cancer, it starts to take much longer.

X, y = datasets.load_breast_cancer(return_X_y=True)
bin_dataset, data, dict_parent_children = binarize_dataset(X, y)
con_matrix = matrix(bin_dataset, y)

Now imagine testing with a dataset even larger than breast cancer. The question is: how can I speed up my matrix function as much as possible, or is there a faster way to rewrite it?

python numpy for-loop optimization

Comments:

0 · 8/8/2023
Does this answer your question? How to speed up nested for loops in Python
1 · Erwin · 8/8/2023
Sadly, no. I had already looked at that one.
1 · jared · 8/8/2023
When it comes to numpy, if you are after speed you should avoid writing for loops at all costs.
1 · jared · 8/8/2023
Well, you will have to write a minimal reproducible example, which means reducing your code to just what matters for the problem and nothing more; your code is longer than most people are willing to read.
1 · hpaulj · 8/8/2023
With numpy, the obvious step for speeding up loops is to replace them with whole-array methods ("vectorization"). They still loop, but in compiled code. Sometimes that is easy to do (with experience); sometimes it takes some tricks. Either focus on one loop at a time, or look for some top-down pattern. But as @jared wrote, if the question/code is too big, most of us move on to a more interesting, more focused question.
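To illustrate the whole-array idea, a minimal toy sketch (not the OP's data):

import numpy as np

a = np.arange(100_000)

# Python-level loop: one interpreted iteration per element
total = 0
for x in a:
    total += x * x

# Whole-array ("vectorized") equivalent: the loop runs in compiled code
total_fast = np.dot(a, a)
assert total == total_fast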

Answers:

0 · Sahil Manchanda · 8/8/2023 · #1

I think we can improve performance by avoiding the inner for loops and using optimized approaches such as vectorization, itertools.combinations, and parallel processing (if the system supports it).

Please try the following code:

import numpy as np
import pandas as pd
import time
from itertools import combinations

def matrix(Xbin, y):
    labels = np.unique(y)
    con_matrix = []
    start = time.time()

    for i, j in combinations(range(len(labels)), 2):
        # Note: np.bitwise_xor is element-wise, so this line only works when
        # both groups have the same number of rows (see the comments below)
        xor_results = np.bitwise_xor(Xbin[y == labels[i]], Xbin[y == labels[j]])
        con_matrix.extend(xor_results)

    end = time.time()
    duration = end - start
    print("total time for nested loop: ", duration)

    constraint_matrix = np.array(con_matrix)
    bin_attr_dim = [i for i in range(1, Xbin.shape[1] + 1)]

    df = pd.DataFrame(constraint_matrix, columns=bin_attr_dim)

    return df
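As the comments below point out, this XOR fails when the two groups contain different numbers of rows. One possible fix, sketched here with a hypothetical helper cross_group_xor (not part of the original answer), is to broadcast over a new axis so that every row of one group is paired with every row of the other; the intermediate array holds m * n * d elements, so this only helps when that fits in memory:

import numpy as np

def cross_group_xor(A, B):
    # A: (m, d), B: (n, d). Broadcasting (m, 1, d) ^ (1, n, d) -> (m, n, d)
    # pairs every row of A with every row of B in compiled code.
    out = np.bitwise_xor(A[:, None, :], B[None, :, :])
    # Flatten the pair axes: one row per (row-of-A, row-of-B) combination
    return out.reshape(-1, A.shape[1])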

Comments:

0 · Erwin · 8/8/2023
@Sahil Manchanda, your code gives me an error. ValueError: operands could not be broadcast together with shapes (212,487179) (357,487179)
1 · Sahil Manchanda · 8/8/2023
OK, let me check and get back to you.
0 · Sahil Manchanda · 8/8/2023
@Erwin The error occurs because the arrays we are trying to XOR have incompatible shapes; np.bitwise_xor requires arrays of the same shape. I have updated the code. Please check it and let me know whether it works for you.
0 · Erwin · 8/8/2023
@Sahil Manchanda With the updated code I still get the same error.
0 · Naphat Amundsen · 8/9/2023 · #2

I tried the breast cancer dataset using the binarization code you provided and ran into problems myself. I had to change my code a little, because the intermediate arrays became very large, so I switched to a less vectorized solution. A potential improvement would be to do this across multiple processes, but I am not sure how efficiently the data would be shared between processes.

By the way, are you aware that the result will take up more than 30 GB of memory?
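That size can be estimated up front from the group counts alone; here is a rough sketch (estimate_result_bytes is an illustrative helper, assuming one byte per entry):

import numpy as np

def estimate_result_bytes(y, n_cols, itemsize=1):
    # Rows produced = sum over label pairs i < j of count_i * count_j
    #               = ((sum of counts)^2 - sum of squared counts) / 2
    counts = np.unique(y, return_counts=True)[1]
    rows = (counts.sum() ** 2 - (counts ** 2).sum()) // 2
    return rows * n_cols * itemsize

# Binarized breast cancer: 212 * 357 = 75684 rows x 487179 columns ~ 36.9 GB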

Anyway, here is the code. The whole script takes about 31 seconds on a MacBook Pro with an M2 Pro CPU:

import numpy as np
import pandas as pd
from numpy.typing import NDArray
from itertools import product, pairwise, chain, combinations
from numpy.lib.stride_tricks import sliding_window_view
from sys import getsizeof


def binarize_dataset(X, y):
    """This is the binarization code you provided"""
    cutpoints = {}

    att = -1
    for row in X.T:
        att += 1
        labels = None  # Previous labels
        u = -9999  # Previous xi

        # Finding transitions
        for v in sorted(np.unique(row)):
            variation = v - u  # Current - Previous

            # Classes where current v appears
            indexes = np.where(row == v)[0]
            # current label
            __labels = set(y[indexes])

            # Main condition
            if labels is not None and variation > 0:
                # Testing for transition to find the essential cut-points
                if (len(labels) > 1 or len(__labels) > 1) or labels != __labels:
                    # cut-point id
                    cid = len(cutpoints)
                    cutpoints[cid] = (att, u + variation / 2.0)

            labels = __labels
            # previous equals current
            u = v
    new_dict = {}
    # Iterate over the values in the original dictionary
    for key, value in cutpoints.items():
        first_element = value[0]
        second_element = value[1]

        # Check if the first_element is already a key in new_dict
        if first_element in new_dict:
            new_dict[first_element].append(second_element)
        else:
            new_dict[first_element] = [second_element]
    # Generate combinations of the second elements within each group
    for key, value in new_dict.items():
        comb = combinations(value, 2)
        # Append the combinations to the value list
        for c in comb:
            new_dict[key].append(c)
    arrays = []
    for attr, cutpoints in new_dict.items():
        for cutpoint in cutpoints:
            row = X.T[attr]
            if isinstance(cutpoint, tuple):
                lowerbound = cutpoint[0] <= row.reshape(X.shape[0], 1)
                upperbound = row.reshape(X.shape[0], 1) < cutpoint[1]
                row = np.logical_and(lowerbound, upperbound)
                arrays.append(row)
            else:
                row = row.reshape(X.shape[0], 1) >= cutpoint
                arrays.append(row)

    Xbin = np.concatenate(arrays, axis=1)

    bin_attr_dim = [i for i in range(1, Xbin.shape[1] + 1)]
    df = pd.DataFrame(Xbin, columns=bin_attr_dim)
    start = 0
    dict_parent_children = {}
    for key, list_value in new_dict.items():
        dict_parent_children[key] = list(df.columns[start : start + len(list_value)])
        start += len(list_value)

    return Xbin, df, dict_parent_children


def xor_grouped_pairs(X: NDArray, y: NDArray, y_is_sorted: bool = False):
    """
    If X is already sorted respective to y you can save some time by specifying that it is already sorted
    """
    if not y_is_sorted:
        X = X[y.argsort()]

    # Group boundaries: cumulative counts give [0, end_1, end_2, ...]
    pairs = np.concatenate(([0], np.cumsum(np.unique(y, return_counts=True)[1])))
    # (start, stop) index range of each group
    pairs = sliding_window_view(pairs, 2)
    # Consecutive group pairs: (group_1, group_2), (group_2, group_3), ...
    pairs = pairwise(pairs)
    # Every (left_row, right_row) index pair across each group pair
    pairs = chain.from_iterable(product(range(*l), range(*r)) for l, r in pairs)

    xor_matrix = np.array([np.bitwise_xor(X[a], X[b]) for a, b in pairs], dtype=np.uint8)
    return pd.DataFrame(data=xor_matrix)


if __name__ == "__main__":
    X = np.array(
        [
            [0, 1, 0, 1, 0, 1, 1, 0],
            [1, 1, 0, 1, 1, 0, 0, 1],
            [0, 1, 1, 0, 1, 0, 0, 1],
            [1, 0, 1, 0, 1, 0, 1, 1],
            [0, 0, 0, 1, 1, 1, 0, 0],
            [1, 1, 0, 1, 0, 1, 0, 1],
            [0, 0, 1, 0, 1, 0, 1, 0],
        ]
    )

    y = np.array([1, 1, 1, 2, 2, 3, 3])
    print(f"Result on the small example:")
    print(xor_grouped_pairs(X, y))

    from sklearn import datasets
    X, y = datasets.load_breast_cancer(return_X_y=True)
    bin_dataset, data, dict_parent_children = binarize_dataset(X, y)
    print(f"The size of the binarized dataset is {bin_dataset.shape}, memory={getsizeof(bin_dataset) / 1e9} GB")
    print("Result on binarized breast cancer dataset:")
    print(result := xor_grouped_pairs(bin_dataset, y))
    print(f"Memory usage to store the result of binarized breast cancer: {getsizeof(result) / 1e9} GB")

Output:

Result on the small example:
   0  1  2  3  4  5  6  7
0  1  1  1  1  1  1  0  1
1  0  1  0  0  1  0  1  0
2  0  1  1  1  0  0  1  0
3  1  1  0  0  0  1  0  1
4  1  1  0  0  0  0  1  0
5  0  1  1  1  0  1  0  1
6  0  1  1  1  1  1  1  0
7  1  0  0  0  0  0  0  1
8  1  1  0  0  1  0  0  1
9  0  0  1  1  0  1  1  0
The size of the binarized dataset is (569, 487179), memory=0.277204979 GB
Result on binarized breast cancer dataset:
       0       1       2       3       4       5       6       ...  487172  487173  487174  487175  487176  487177  487178
0           0       0       0       0       0       0       0  ...       0       0       0       0       0       0       0
1           0       0       0       0       0       0       0  ...       0       0       0       0       0       0       0
2           1       1       1       1       1       1       1  ...       0       0       0       0       0       0       0
3           1       1       1       1       1       1       1  ...       0       0       0       0       0       0       0
4           0       0       0       0       0       0       0  ...       0       0       0       0       0       0       0
...       ...     ...     ...     ...     ...     ...     ...  ...     ...     ...     ...     ...     ...     ...     ...
75679       1       1       1       1       1       1       1  ...       0       0       0       0       0       0       0
75680       0       0       0       0       0       0       0  ...       0       0       0       0       0       0       0
75681       0       0       0       0       0       0       0  ...       0       0       0       0       0       0       0
75682       0       0       0       0       0       0       1  ...       0       0       0       0       0       0       0
75683       1       1       1       1       1       1       1  ...       0       0       0       0       0       0       0

[75684 rows x 487179 columns]
Memory usage to store the result of binarized breast cancer: 36.87165558 GB

Comments:

0 · Erwin · 8/9/2023
Could you test it with an actual dataset such as breast cancer?
0 · Naphat Amundsen · 8/11/2023
I have now tested it; see my updated answer. I had to make a few adjustments to my code.