使用稀疏矩阵的非均匀域移动平均

Non-uniform domain moving average using sparse matrices

提问人:sds 提问时间:10/26/2023 更新时间:10/26/2023 访问量:32

问:

具有非均匀域的移动平均线工作正常:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def convolve(s, f):
    """Compute the convolution of series S with a universal function F
    (see https://numpy.org/doc/stable/reference/ufuncs.html).
    This amounts to a moving average of S with weights F based on S.index."""
    index_v = s.index.values
    weight_mx = f(index_v - index_v[:, np.newaxis])
    weighted_sum = np.sum(s.values[:, np.newaxis] * weight_mx, axis=0)
    normalization = np.sum(weight_mx, axis=0)
    return pd.Series(weighted_sum/normalization, index=s.index)

size = 1000
df = pd.DataFrame({"x":np.random.normal(size=size)},
                  index=np.random.exponential(size=size)).sort_index()
def f(x):
    return np.exp(-x*x*30)
df["avg"] = convolve(df.x, f)
plt.scatter(df.index, df.avg, s=1, label="average")
plt.scatter(df.index, df.x, s=1, label="random")
plt.title("Moving Average for random data")
plt.legend()

Moving Average for random data

但是,这会分配一个数组:O(size^3)

MemoryError:无法为形状(14454、14454、14454)和数据类型为 float64 的数组分配 22.0 TiB

是否可以“稀疏化”函数?convolve

具体而言,对于相当窄的值范围,通常返回非 0。f

python numpy 稀疏矩阵 卷积 滚动计算

评论


答:

2赞 Paul Brodersen 10/26/2023 #1
weighted_sum = np.sum(s.values[:, np.newaxis] * weight_mx, axis=0)

相当于

weighted_sum = weight_mx @ s.values

这是在不创建大型中间数组的情况下计算的。

证据就在布丁里

enter image description here

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def convolve(s, f):
    """Compute the convolution of series S with a universal function F
    (see https://numpy.org/doc/stable/reference/ufuncs.html).
    This amounts to a moving average of S with weights F based on S.index."""
    index_v = s.index.values
    weight_mx = f(index_v - index_v[:, np.newaxis])
    weighted_sum = weight_mx @ s.values
    normalization = np.sum(weight_mx, axis=0)
    return pd.Series(weighted_sum/normalization, index=s.index)

size = 20_000 # 1000
df = pd.DataFrame({"x":np.random.normal(size=size)},
                  index=np.random.exponential(size=size)).sort_index()
def f(x):
    return np.exp(-x*x*30)
df["avg"] = convolve(df.x, f)
plt.scatter(df.index, df.avg, s=1, label="average")
plt.scatter(df.index, df.x, s=1, label="random")
plt.title("Moving Average for random data")
plt.legend()