How to calculate a rolling function of only defined values in pandas/numpy? [closed]

Asked by: David Boshton · Asked: 11/16/2023 · Last edited by: David Boshton · Updated: 11/23/2023 · Views: 64

Q:


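Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.

Closed 7 days ago.

This post was edited and submitted for review yesterday.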

I found "How to ignore NaN when applying rolling with Pandas", but it does not help.

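  • I have arrays in which each row item is a specific value at a specific time index for a specific column.
  • Each row may have several non-NaN entries, but certainly not many.
  • I want to compute a function that calculates a gradient from the defined values with respect to the index.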

I can't figure out how to do this with numpy or pandas.

Suggestions would be helpful. The pandas dropna and skipna options did not help either.
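To make the mismatch concrete, here is a minimal sketch on a toy series (the names `s`, `by_rows` and `wanted_last` are made up for illustration). It assumes the goal is "the last 3 defined values", whereas `rolling(3)` always means "the last 3 rows", even when `min_periods` is relaxed so that NaNs are tolerated.

```python
import numpy as np
import pandas as pd

# Toy series with a gap of two NaNs (illustrative data, not from the question).
s = pd.Series([1.0, np.nan, np.nan, 2.0, 3.0])

# Window = the last 3 ROWS. NaNs are skipped inside the window,
# but they still occupy slots, so the value 1.0 has dropped out by the end.
by_rows = s.rolling(3, min_periods=1).sum()

# What is wanted at the final row: the sum of the last 3 DEFINED values.
wanted_last = s.dropna().iloc[-3:].sum()

print(by_rows.iloc[-1], wanted_last)
```

The two numbers differ at the last row, which is why relaxing `min_periods` alone does not give the desired result.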

Output template:

import numpy as np
import pandas as pd

fa = np.random.randn(10,4)
mask = np.zeros(40, dtype=bool)
mask[:15] = True
np.random.shuffle(mask)
mask = mask.reshape(10,4)
fa[mask] = np.nan
fa
Out[40]:
array([[        nan, -0.57681061,         nan,  0.23047461],
       [ 0.26260072, -0.62024175,  0.35678478,         nan],
       [-0.5781359 , -0.17364336,         nan,         nan],
       [-0.58982883,         nan,  0.07114217,  1.03781762],
       [-0.03906354, -0.49546887,         nan,         nan],
       [-0.3988263 ,  0.21794358,         nan, -0.04167338],
       [ 0.35731643, -0.80956629, -0.29624602,  2.59351753],
       [-0.02804324,         nan,         nan,         nan],
       [        nan,  0.75344618, -0.52145898,         nan],
       [-0.45565981,  0.26946552,         nan,  1.64095417]])
idx = pd.date_range("2018-01-01", periods=10, freq="S")
df = pd.DataFrame(fa, index=idx)
## Apply function 
df.rolling(3).apply(lambda s: s.sum())
Out[52]:
                            0         1   2   3
2018-01-01 00:00:00       NaN       NaN NaN NaN
2018-01-01 00:00:01       NaN       NaN NaN NaN
2018-01-01 00:00:02       NaN -1.370696 NaN NaN
2018-01-01 00:00:03 -0.905364       NaN NaN NaN
2018-01-01 00:00:04 -1.207028       NaN NaN NaN
2018-01-01 00:00:05 -1.027719       NaN NaN NaN
2018-01-01 00:00:06 -0.080573 -1.087092 NaN NaN
2018-01-01 00:00:07 -0.069553       NaN NaN NaN
2018-01-01 00:00:08 -0.126387       NaN NaN NaN
2018-01-01 00:00:09 -0.126387       NaN NaN NaN

## What would be good is:
2018-01-01 00:00:00       NaN       NaN       NaN       NaN
2018-01-01 00:00:01       NaN       NaN       NaN       NaN
2018-01-01 00:00:02       NaN -1.370696       NaN       NaN
2018-01-01 00:00:03 -0.589829 -1.289354       NaN       NaN
2018-01-01 00:00:04 -0.039064 -0.451169       NaN       NaN
2018-01-01 00:00:05 -0.398826 -1.087092       NaN  1.226619
2018-01-01 00:00:06  0.357316       NaN  0.131681  3.589662
2018-01-01 00:00:07 -0.028043       NaN       NaN       NaN
2018-01-01 00:00:08       NaN  0.161823 -0.746563       NaN
2018-01-01 00:00:09 -0.455660 0.2133456       NaN  4.192798

The latter output was produced by running

df[n].dropna().rolling(3).apply(lambda s: s.sum())

on each column and then filling in the values by hand.
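The per-column step can be automated instead of filled in by hand. The following is a minimal sketch, not a definitive solution: `rolling_defined` is a hypothetical helper name, and the small `demo` frame is invented data. It drops the NaNs of each column, rolls over the remaining values, and reindexes so that rows where the column was undefined come back as NaN.

```python
import numpy as np
import pandas as pd

def rolling_defined(df, window, func):
    """Apply `func` to the last `window` non-NaN values of each column.

    Hypothetical helper: rows where a column is NaN stay NaN in the result.
    """
    out = {}
    for col in df.columns:
        defined = df[col].dropna()               # keep only defined values
        out[col] = defined.rolling(window).apply(func, raw=True)
    # Series with different indexes are aligned on their union;
    # reindex restores the original rows and column order.
    return pd.DataFrame(out).reindex(index=df.index, columns=df.columns)

# Invented demo data on a one-second index.
demo_idx = pd.date_range("2023-01-01", periods=5, freq=pd.Timedelta(seconds=1))
demo = pd.DataFrame(
    {"a": [1.0, np.nan, 2.0, 3.0, np.nan],
     "b": [np.nan, 1.0, 1.0, np.nan, 1.0]},
    index=demo_idx,
)
result = rolling_defined(demo, 2, np.sum)
print(result)
```

Looping over columns is a deliberate choice here: each column has its own set of defined rows, so a single rectangular rolling window cannot serve all of them.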

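The actual function I want to run also takes the time index as input, so it is a bit more complicated than this (otherwise it would be easy: just swap all the nan's for 0's and we're done).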
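Since the real function is not shown, the sketch below uses a stand-in: the least-squares slope of value against elapsed seconds over the last `window` defined points of each column. `rolling_slope` and the `demo` data are hypothetical, and the code assumes a unique, sorted `DatetimeIndex`.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_slope(df, window):
    """Slope (value per second) over the last `window` defined points per column.

    Hypothetical stand-in for a function that needs the time index as input.
    """
    out = pd.DataFrame(np.nan, index=df.index, columns=df.columns)
    for col in df.columns:
        s = df[col].dropna()
        if len(s) < window:
            continue                              # not enough defined points
        # Elapsed seconds of the defined rows, measured from the first row.
        t = np.asarray((s.index - df.index[0]).total_seconds(), dtype=float)
        y = s.to_numpy(dtype=float)
        tw = sliding_window_view(t, window)       # shape (n - window + 1, window)
        yw = sliding_window_view(y, window)
        tc = tw - tw.mean(axis=1, keepdims=True)
        yc = yw - yw.mean(axis=1, keepdims=True)
        slope = (tc * yc).sum(axis=1) / (tc ** 2).sum(axis=1)
        # Each slope belongs to the row that closes its window.
        out.loc[s.index[window - 1:], col] = slope
    return out

# Invented demo data: "a" follows y = t, "b" follows y = 2t + 1 where defined.
demo_idx = pd.date_range("2023-01-01", periods=6, freq=pd.Timedelta(seconds=1))
demo = pd.DataFrame(
    {"a": [0.0, np.nan, 2.0, np.nan, np.nan, 5.0],
     "b": [np.nan, 3.0, 5.0, np.nan, 9.0, np.nan]},
    index=demo_idx,
)
slopes = rolling_slope(demo, 3)
print(slopes)
```

Because the timestamps of the defined points are passed alongside the values, unevenly spaced gaps are handled correctly, which replacing NaN with 0 could not do.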

python pandas numpy

Comments

4 upvotes · roganjosh · 11/16/2023
Why can't you at least set up an input and an expected output for people?

A: No answers yet