Numpy variable slice size (possibly zero)

Asked by: beyarkay  Asked: 5/7/2023  Updated: 5/15/2023  Views: 65

Q:

Suppose I have some time series data:

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
x = np.linspace(0, 10, num=100)
time_series = np.sin(x) + np.random.random(100)
plt.plot(x, time_series)

sin curve with some small amount of randomness at each time step

If I want to "delay" the time series by some number of time steps, I can do this:

delay = 10
x_delayed = x[delay:]
time_series_delayed = time_series[:-delay]

plt.plot(x, time_series, label='original')
plt.plot(x_delayed, time_series_delayed, label='delayed')
plt.legend()

Same as previous, but with another orange time series that is the original time series shifted to the right by 10 time steps

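This is all well and good, but I want to keep the code clean while still allowing `delay` to be zero. As it stands, the result is wrong for a zero delay, because the slice `my_arr[:-0]` just evaluates to `my_arr[:0]`, which is always an empty slice rather than the full array.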

>>> time_series[:-0]
array([], dtype=float64)
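
The empty result comes from integer arithmetic rather than anything numpy-specific: `-0` is the same integer as `0`, so the slice stops at index 0. A minimal check, where the array `a` is an illustrative stand-in for `time_series`:

```python
import numpy as np

a = np.arange(5)  # illustrative stand-in for time_series

# -0 is the same integer as 0, so a[:-0] means a[:0]
print(-0 == 0)        # True
print(a[:-0].size)    # 0
print(a[:-1].size)    # 4
print(a[:None].size)  # 5, a stop of None means "up to the end"
```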

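This means that if I want to encode the idea that a zero delay is the same as the original array, I have to special-case it every time I use the slice. This is tedious and error-prone: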

# Make 3 plots, for negative, zero, and positive delays
for delay in (0, 5, -5):

    if delay > 0:
        x_delayed = x[delay:]
        time_series_delayed = time_series[:-delay]

    elif delay < 0:
        # Negative delay is the complement of positive delay
        x_delayed = x[:delay]
        time_series_delayed = time_series[-delay:]

    else:
        # Zero delay just copies the array
        x_delayed = x[:]
        time_series_delayed = time_series[:]
    # Add the delayed time series to the plot
    plt.plot(
        x_delayed, 
        time_series_delayed, 
        label=f'delay={delay}',
        # change the alpha to make things less cluttered
        alpha=1 if delay == 0 else 0.3
    )
plt.legend()

Now there are 3 time series: the original, one which is shifted left by 5 time steps, and one which is shifted right by 5 time steps

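I've looked at numpy's slice objects and `np.s_`, but I can't seem to figure it out.

Is there a neat/pythonic way to encode the idea that a zero delay is the original array?

python numpy indexing slicing

Comments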

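2 upvotes slothrop 5/7/2023
`my_arr[:-delay or len(my_arr)]` works, but I don't know how neat it is!
0 upvotes beyarkay 5/7/2023
+1 for the hack! It's neat, but not very obvious. (I'm guessing you meant `my_arr[:-delay or len(my_arr)]`?) Could you post it as an answer? Unless something more explicit comes along, I'll accept it.
1 upvote slothrop 5/7/2023
Done (and fixed the typo, sorry!)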

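A:

1 upvote slothrop 5/7/2023 #1

I don't know if this is as neat as one would hope, but you can make use of the way Python handles truthiness and falsiness, so that `i or x` is equal to `x` if `i` is 0, but to `i` if `i` is any other integer.

So you can replace the separate branches of the conditional with: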

time_series_delayed = time_series[:-delay or len(time_series)]

When `delay` is 0, this evaluates to `time_series[:len(time_series)]`, which is the same as `time_series` itself.

As a quick demo:

time_series = list(range(10))

def f(i):
    return time_series[:-i or len(time_series)]

print(time_series)
for n in (2, 1, 0):
    print(f"{n}: {f(n)}")

Prints:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2: [0, 1, 2, 3, 4, 5, 6, 7]
1: [0, 1, 2, 3, 4, 5, 6, 7, 8]
0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
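
A possible variation on the same idiom uses `None` as the fallback instead of `len(...)`, since a stop of `None` means "up to the end". The sketch below (the names `trim_end` and `data` are illustrative, not from the answer) also shows the limit of the trick: it only behaves as a delay for non-negative values.

```python
def trim_end(seq, delay):
    # -delay is falsy only when delay == 0, in which case the stop becomes None
    return seq[:-delay or None]

data = list(range(10))

assert trim_end(data, 0) == data
assert trim_end(data, 2) == data[:8]

# A negative delay produces a positive stop, so this keeps the first 2 items
# instead of dropping 2 items from the front.
assert trim_end(data, -2) == data[:2]
```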

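Comments

0 upvotes beyarkay 5/7/2023
Thanks! As mentioned in the comments, I'll wait a while to see if there's a fancy numpy trick for this (surely I can't be the only one who has run into this?), but if there doesn't seem to be a better answer, I'll accept yours.
0 upvotes beyarkay 5/8/2023
Ah, sorry, I've just realised this doesn't work for negative delays, so I can't accept it as the answer. A delay of -2 on the list `[0,1,2,3,4,5,6]` should result in `[0,1,2,3,4]`, while a delay of +2 should result in `[2,3,4,5,6]`.
1 upvote slothrop 5/8/2023
Ah, I see - I can't think of a way to achieve that without a sign-based `if` statement and separate behaviour. There are some recipes here that handle both signs - stackoverflow.com/questions/30399534/... - but (a) they generally have an `if` statement under the hood, and (b) rather than reducing the length of the array, they leave NaNs behind. (Of course, removing those NaNs is easy.)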
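
For illustration only, a NaN-padding shift of the kind described in that comment might look like the sketch below. It is not taken from the linked question; the helper name `shift_with_nan` is hypothetical, and it uses the sign-based `if` the comment mentions.

```python
import numpy as np

def shift_with_nan(arr, delay):
    """Hypothetical helper: shift `arr` by `delay` steps, padding with NaN."""
    out = np.full(len(arr), np.nan)
    if delay > 0:
        out[delay:] = arr[:-delay]
    elif delay < 0:
        out[:delay] = arr[-delay:]
    else:
        out[:] = arr
    return out

shifted = shift_with_nan(np.arange(6.0), 2)

# Dropping the NaNs afterwards recovers the shortened array
trimmed = shifted[~np.isnan(shifted)]
```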

0 upvotes beyarkay 5/15/2023 #2

The solution I went with uses the fact that `my_arr[2:]` is equivalent to `my_arr[2:None]`:

arr[(d if d > 0 else None):(d if d < 0 else None)]

More readably:

arr = [0, 1, 2, 3, 4, 5]
delay = 3

start_delay = delay if delay > 0 else None
finish_delay = delay if delay < 0 else None

delayed_arr = arr[start_delay:finish_delay]
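
The equivalence this relies on can be checked directly. As a small sketch, the two bounds can also be bundled into a built-in `slice` object; the name `window` is only illustrative.

```python
arr = [0, 1, 2, 3, 4, 5]

# A bound of None behaves like an omitted bound
assert arr[2:] == arr[2:None]
assert arr[:-2] == arr[None:-2]
assert arr[:] == arr[None:None]

# The same bounds stored in a slice object
delay = 3
window = slice(delay if delay > 0 else None, delay if delay < 0 else None)
assert arr[window] == [3, 4, 5]
```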

Wrapped up in a nice function, with some assertions to show that it works:

def delay_array(array, delay):
    """Delays the values in `array` by the amount `delay`.

    Regular slicing struggles with this since negative slicing (which goes from
    the end of the array) and positive slicing (going from the front of the
    array) meet at zero and don't play nicely.

    We use the fact that Python's slicing syntax treats `None` as though it
    didn't exist, so `arr[2:]` is equivalent to `arr[2:None]`.

    This can be used on numpy arrays, but also works on native python lists.
    """
    start_index = delay if delay > 0 else None
    finish_index = delay if delay < 0 else None
    return array[start_index:finish_index]

arr = [0, 1, 2, 3, 4, 5]
# Zero delay results in the same array
assert delay_array(arr,  0) == [0, 1, 2, 3, 4, 5]

# Delay greater/less than zero removes `delay` elements from the front/back
# of the array
assert delay_array(arr, +3) == [         3, 4, 5]
assert delay_array(arr, -3) == [0, 1, 2,        ]

# A delay longer than the array results in an empty array
assert delay_array(arr, +6) == []
assert delay_array(arr, -6) == []
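
The assertions above use plain lists. For numpy arrays `==` compares element-wise, so a sketch of the same checks would use `np.array_equal` instead. `delay_array` is repeated here only so the block runs on its own.

```python
import numpy as np

def delay_array(array, delay):
    start_index = delay if delay > 0 else None
    finish_index = delay if delay < 0 else None
    return array[start_index:finish_index]

arr = np.arange(6)

assert np.array_equal(delay_array(arr, 0), arr)
assert np.array_equal(delay_array(arr, 3), [3, 4, 5])
assert np.array_equal(delay_array(arr, -3), [0, 1, 2])
assert delay_array(arr, 6).size == 0
assert delay_array(arr, -6).size == 0

# Basic slicing of a numpy array returns a view rather than a copy,
# so the zero-delay result shares memory with the input.
assert np.shares_memory(delay_array(arr, 0), arr)
```

One consequence worth keeping in mind: with a list the zero-delay result is a copy, while with a numpy array it is a view, so writing into it would also modify the original array.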

Putting it all together:

import numpy as np
import matplotlib.pyplot as plt

def delay_array(array, delay):
    start_index = delay if delay > 0 else None
    finish_index = delay if delay < 0 else None
    return array[start_index:finish_index]

np.random.seed(42)
x = np.linspace(0, 10, num=100)
time_series = np.sin(x) + np.random.random(100)

for delay in (0, 5, -5):
    x_delayed = delay_array(x, delay)
    time_series_delayed = delay_array(time_series, -delay)
    plt.plot(
        x_delayed, 
        time_series_delayed, 
        label=f'delay={delay}',
        alpha=1 if delay == 0 else 0.3
    )
plt.legend()
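
One possible variation, not part of the answer itself: since `x` and the time series are always trimmed with opposite signs, a small wrapper could apply both slices at once so the two results always have matching lengths. The name `delayed_pair` is hypothetical, and plotting is left out so the sketch runs without matplotlib.

```python
import numpy as np

def delay_array(array, delay):
    start_index = delay if delay > 0 else None
    finish_index = delay if delay < 0 else None
    return array[start_index:finish_index]

def delayed_pair(x, y, delay):
    """Hypothetical helper: trim x and y so that y appears shifted by `delay` steps."""
    return delay_array(x, delay), delay_array(y, -delay)

x = np.linspace(0, 10, num=100)
y = np.sin(x)

for delay in (0, 5, -5):
    x_d, y_d = delayed_pair(x, y, delay)
    # Both outputs lose exactly |delay| samples
    assert len(x_d) == len(y_d) == len(x) - abs(delay)
```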
