Compute the order of non-unique array elements

Question:

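I am looking for an efficient way to compute the "order" of each item in a numpy array, where the "order" of an element is defined as the number of preceding elements equal to it. Example: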

order([4, 2, 3, 2, 6, 4, 4, 6, 2, 4])
[0 0 0 1 0 1 2 1 2 3]

My current solution loops in pure Python and is not fast enough:

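from collections import defaultdict

import numpy as np

def order(A):
    cnt = defaultdict(int)  # how many times each value has been seen so far
    O = np.zeros_like(A)
    for i, r in enumerate(A):
        O[i] = cnt[r]
        cnt[r] += 1
    return O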

I use order to implement scatter:

def scatter(A, c):
    R = A % c
    I = c * order(R) + R
    B = np.full(np.max(I) + 1, -1)
    B[I] = A
    return B

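This is useful for multithreading. For example, if the scattered array contains addresses to write to, then no two threads processing the array in parallel will see the same address.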
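
To make that property concrete, here is a small sketch that runs scatter on the example array and checks each block of c consecutive slots. The block width c = 4, the padding step, and the helper names rows and conflict_free are illustrative assumptions, not part of the original code; -1 is the same "empty slot" sentinel that scatter uses.

```python
from collections import defaultdict

import numpy as np


def order(A):
    cnt = defaultdict(int)
    O = np.zeros_like(A)
    for i, r in enumerate(A):
        O[i] = cnt[r]
        cnt[r] += 1
    return O


def scatter(A, c):
    R = A % c
    I = c * order(R) + R
    B = np.full(np.max(I) + 1, -1)
    B[I] = A
    return B


A = np.array([4, 2, 3, 2, 6, 4, 4, 6, 2, 4])
c = 4  # assumed block width, e.g. the number of threads
B = scatter(A, c)

# Pad B with the sentinel so it can be viewed as rows of c slots each.
pad = (-len(B)) % c
rows = np.concatenate([B, np.full(pad, -1)]).reshape(-1, c)


def conflict_free(row):
    """True if the non-empty slots in one block hold distinct values."""
    vals = row[row != -1]
    return len(np.unique(vals)) == len(vals)


print(rows)
print(all(conflict_free(row) for row in rows))
```

Slot j of every block can only hold a value whose remainder modulo c is j, so two values in the same block can never be equal.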

My question is whether I am missing a numpy built-in that would make order faster and remove the explicit loop.

python arrays numpy

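Answer:

1. Sort the array, keeping track of where each element came from.
2. Compare each element of the sorted array with the next one. If they differ, a new run starts there. (run)
3. Work out the length of each group. (rep)
4. Take a cumulative sum that increases by 1 for every element that does not start a new run. (out)
5. Find how much of that sum at the start of each group comes from the groups before it; this should not be counted. (out[run])
6. Repeat that value rep times so it can be subtracted from every element of its group.
7. Undo the initial sort to put the elements back in their original positions.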
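
The steps above can be traced on the example array from the question. This is only an illustrative walk-through: the variable names s, out_sorted and result are my own labels, and the values in the comments are what each intermediate array should hold for this particular input.

```python
import numpy as np

a = np.array([4, 2, 3, 2, 6, 4, 4, 6, 2, 4])
n = len(a)

# Sort stably, remembering where each element came from.
ind = np.argsort(a, kind="stable")   # [1 3 8 2 0 5 6 9 4 7]
s = a[ind]                           # [2 2 2 3 4 4 4 4 6 6]

# Mark the start of each run of equal values.
run = np.r_[True, s[:-1] != s[1:]]   # [T F F T T F F F T F]

# Length of each run.
rep = np.diff(np.r_[np.nonzero(run)[0], n])   # [3 1 4 2]

# Cumulative count of elements that do not start a run.
out = (~run).cumsum()                # [0 1 2 2 2 3 4 5 5 6]

# Subtract what had accumulated before each run began,
# so the count restarts at 0 inside every run.
out_sorted = out - np.repeat(out[run], rep)   # [0 1 2 0 0 1 2 3 0 1]

# Undo the sort by sending each value back to its source index.
result = np.empty(n, dtype=out_sorted.dtype)
result[ind] = out_sorted             # [0 0 0 1 0 1 2 1 2 3]

print(result)
```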

Here is how to do the same thing without depending on pandas:

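import numpy as np

def order(array):
    # https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/core/groupby/groupby.py#L1493
    array = np.asarray(array)  # also accept plain Python lists
    count = len(array)
    if count == 0:
        return np.array([], dtype=np.intp)
    # A stable sort keeps equal elements in their original relative order.
    # You can drop kind='stable' for speed if you don't care which of the
    # equal elements receives which order value.
    ind = np.argsort(array, kind='stable')
    array = array[ind]
    # True wherever a new run of equal values starts
    run = np.r_[True, array[:-1] != array[1:]]
    # Length of each run
    rep = np.diff(np.r_[np.nonzero(run)[0], count])
    # Running count of elements that do not start a run
    out = (~run).cumsum()
    # Remove what accumulated before each run so every run restarts at 0
    out -= np.repeat(out[run], rep)
    # rev is the inverse permutation of ind; indexing with it undoes the sort
    rev = np.empty(count, dtype=np.intp)
    rev[ind] = np.arange(count, dtype=np.intp)
    return out[rev]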

I found that for arrays of 1000 or more elements, this is about 10 times faster than the pure-Python loop.
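
The actual speedup depends on the machine, the array size, and how many distinct values there are, so it is worth checking on your own data. The following is a rough sketch of such a check, not a rigorous benchmark: the names order_loop and order_sorted, the random test data, and the repeat count are all assumptions made for illustration.

```python
import timeit
from collections import defaultdict

import numpy as np


def order_loop(A):
    """Pure-Python loop from the question."""
    cnt = defaultdict(int)
    O = np.zeros_like(A)
    for i, r in enumerate(A):
        O[i] = cnt[r]
        cnt[r] += 1
    return O


def order_sorted(array):
    """Sort-based version from the answer."""
    array = np.asarray(array)
    count = len(array)
    if count == 0:
        return np.array([], dtype=np.intp)
    ind = np.argsort(array, kind="stable")
    array = array[ind]
    run = np.r_[True, array[:-1] != array[1:]]
    rep = np.diff(np.r_[np.nonzero(run)[0], count])
    out = (~run).cumsum()
    out -= np.repeat(out[run], rep)
    rev = np.empty(count, dtype=np.intp)
    rev[ind] = np.arange(count, dtype=np.intp)
    return out[rev]


rng = np.random.default_rng(0)           # fixed seed, arbitrary choice
A = rng.integers(0, 50, size=10_000)     # made-up test data

# First confirm that both versions give the same answer.
same = np.array_equal(order_loop(A), order_sorted(A))
print("results agree:", same)

# Then time a few repetitions of each.
t_loop = timeit.timeit(lambda: order_loop(A), number=5)
t_sorted = timeit.timeit(lambda: order_sorted(A), number=5)
print(f"loop: {t_loop:.4f}s  sorted: {t_sorted:.4f}s")
```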
