选择满足条件并连续出现 10 秒的行

Select rows that meet a condition and occur for 10 consecutive seconds

提问人:IDK 提问时间:9/20/2023 最后编辑:IDK 更新时间:9/20/2023 访问量:40

问:

我在下面有一些时间序列数据:

    Truck                       Timestamp           b2_v_fult2_tms_err  tm2_b_fault_level_a tm2_b_fault_code_a
0   251_EV01_PB_579_HD_756828   2023-05-01 13:11:48 NaN                 2.0                 30.0                    
1   251_EV01_PB_579_HD_756828   2023-05-01 13:11:51 1.0                 0.0                 0.0                 
2   251_EV01_PB_579_HD_756828   2023-05-01 13:11:52 1.0                 0.0                 0.0                 
3   251_EV01_PB_579_HD_756828   2023-05-01 13:11:53 1.0                 0.0                 0.0                 
4   251_EV01_PB_579_HD_756828   2023-05-01 13:11:54 1.0                 0.0                 0.0                 
5   251_EV01_PB_579_HD_756828   2023-05-01 13:11:55 1.0                 0.0                 0.0                 
6   251_EV01_PB_579_HD_756828   2023-05-01 13:11:56 1.0                 0.0                 0.0                 
7   251_EV01_PB_579_HD_756828   2023-05-01 13:11:57 1.0                 0.0                 0.0                 
8   251_EV01_PB_579_HD_756828   2023-05-01 13:11:58 1.0                 0.0                 0.0                 
9   251_EV01_PB_579_HD_756828   2023-05-01 13:11:59 1.0                 0.0                 0.0                 
10  251_EV01_PB_579_HD_756828   2023-05-01 13:12:00 1.0                 0.0                 0.0                 
11  251_EV01_PB_579_HD_756828   2023-05-01 14:58:05 1.0                 0.0                 0.0                 
12  251_EV01_PB_579_HD_756828   2023-05-01 14:58:10 0.0                 0.0                 0.0                 
13  251_EV01_PB_579_HD_756828   2023-05-01 14:58:11 0.0                 0.0                 0.0                 
14  251_EV01_PB_579_HD_756828   2023-05-01 14:58:15 0.0                 0.0                 0.0                 
15  251_EV01_PB_579_HD_756828   2023-05-01 14:58:16 0.0                 0.0                 0.0                 
16  251_EV01_PB_579_HD_756828   2023-05-01 14:58:20 0.0                 0.0                 0.0                 
17  251_EV01_PB_579_HD_756828   2023-05-01 14:58:21 0.0                 0.0                 0.0                 
18  251_EV01_PB_579_HD_756828   2023-05-01 14:58:26 0.0                 0.0                 0.0                 
19  251_EV01_PB_579_HD_756828   2023-05-01 14:58:27 0.0                 0.0                 0.0                 
20  251_EV01_PB_579_HD_756828   2023-05-01 14:58:31 0.0                 0.0                 0.0 

我试图完成的是找到“b2_v_fult2_tms_err == 1.0”、“tm2_b_fault_level_a == 0.0”、“tm2_b_fault_code_a == 0.0”的行,创建一个名为“bin”的新列,并在三个值(“b2_v_fult2_tms_err == 1.0”, “tm2_b_fault_level_a == 0.0”、“tm2_b_fault_code_a == 0.0”) 连续出现 10 秒。“bin”列中的所有其他行都可以留空或标记为“False”。

这看起来像是:

    Truck                       Timestamp           b2_v_fult2_tms_err  tm2_b_fault_level_a tm2_b_fault_code_a  bin
0   251_EV01_PB_579_HD_756828   2023-05-01 13:11:48 NaN                 2.0                 30.0                False
1   251_EV01_PB_579_HD_756828   2023-05-01 13:11:51 1.0                 0.0                 0.0                 True
2   251_EV01_PB_579_HD_756828   2023-05-01 13:11:52 1.0                 0.0                 0.0                 True
3   251_EV01_PB_579_HD_756828   2023-05-01 13:11:53 1.0                 0.0                 0.0                 True
4   251_EV01_PB_579_HD_756828   2023-05-01 13:11:54 1.0                 0.0                 0.0                 True
5   251_EV01_PB_579_HD_756828   2023-05-01 13:11:55 1.0                 0.0                 0.0                 True
6   251_EV01_PB_579_HD_756828   2023-05-01 13:11:56 1.0                 0.0                 0.0                 True
7   251_EV01_PB_579_HD_756828   2023-05-01 13:11:57 1.0                 0.0                 0.0                 True
8   251_EV01_PB_579_HD_756828   2023-05-01 13:11:58 1.0                 0.0                 0.0                 True
9   251_EV01_PB_579_HD_756828   2023-05-01 13:11:59 1.0                 0.0                 0.0                 True
10  251_EV01_PB_579_HD_756828   2023-05-01 13:12:00 1.0                 0.0                 0.0                 True
11  251_EV01_PB_579_HD_756828   2023-05-01 14:58:05 1.0                 0.0                 0.0                 False
12  251_EV01_PB_579_HD_756828   2023-05-01 14:58:10 0.0                 0.0                 0.0                 False
13  251_EV01_PB_579_HD_756828   2023-05-01 14:58:11 0.0                 0.0                 0.0                 False
14  251_EV01_PB_579_HD_756828   2023-05-01 14:58:15 0.0                 0.0                 0.0                 False
15  251_EV01_PB_579_HD_756828   2023-05-01 14:58:16 0.0                 0.0                 0.0                 False
16  251_EV01_PB_579_HD_756828   2023-05-01 14:58:20 0.0                 0.0                 0.0                 False
17  251_EV01_PB_579_HD_756828   2023-05-01 14:58:21 0.0                 0.0                 0.0                 False
18  251_EV01_PB_579_HD_756828   2023-05-01 14:58:26 0.0                 0.0                 0.0                 False
19  251_EV01_PB_579_HD_756828   2023-05-01 14:58:27 0.0                 0.0                 0.0                 False
20  251_EV01_PB_579_HD_756828   2023-05-01 14:58:31 0.0                 0.0                 0.0                 False

我已经过滤了数据,然后尝试使用来计算时间戳之间的差异,这是成功的。但是我很难弄清楚如何将 10 次出现的 1 变成“真”值。在那之后,我曾经过滤结果以显示时间增量为一秒的行,然后用于显示之前的行,因为它会省略第一行。bin1_df = results_df.loc[(results_df['b2_v_fult2_tms_err'] == 1.0) & (results_df['tm2_b_fault_level_a'] == 0.0) & (results_df['tm2_b_fault_code_a'] == 0.0) ].reset_index(drop=True)bin1_df['time_diff'] = bin1_df['Timestamp'].diff()m = bin1_df['time_diff'] == pd.Timedelta(seconds=1)bin1_df = bin1_df[m.shift(-1)|m]

我还尝试了其他一些无处可去的事情,但上述尝试是我能得到的最接近的。

有谁知道如何做到这一点?它不一定是这个结果,但可以清楚地传达这些条件满足 10 秒的东西是我最终想要的。

python-3.x pandas dataframe 日期时间

评论

0赞 Suraj Shourie 9/20/2023
你能把你的例子做到最小吗?删除不需要的列并清理格式,数据当前不可读
0赞 IDK 9/20/2023
@SurajShourie我已经清理了格式以使其更具可读性,感谢您的反馈。

答:

0赞 Andrej Kesely 9/20/2023 #1

IIUC,您可以做到:

m1 = df["b2_v_fult2_tms_err"] == 1.0
m2 = df["tm2_b_fault_level_a"] == 0.0
m3 = df["tm2_b_fault_code_a"] == 0.0

m = m1 & m2 & m3

df["bin"] = df.groupby((m != m.shift()).cumsum())["Timestamp"].transform(
    lambda g: ([True] * 10 + [False] * (len(g) - 10))
    if len(g) >= 10
    else [False] * len(g)
)

print(df)

指纹:

                        Truck            Timestamp  b2_v_fult2_tms_err  tm2_b_fault_level_a  tm2_b_fault_code_a    bin
0   251_EV01_PB_579_HD_756828  2023-05-01 13:11:48                 NaN                  2.0                30.0  False
1   251_EV01_PB_579_HD_756828  2023-05-01 13:11:51                 1.0                  0.0                 0.0   True
2   251_EV01_PB_579_HD_756828  2023-05-01 13:11:52                 1.0                  0.0                 0.0   True
3   251_EV01_PB_579_HD_756828  2023-05-01 13:11:53                 1.0                  0.0                 0.0   True
4   251_EV01_PB_579_HD_756828  2023-05-01 13:11:54                 1.0                  0.0                 0.0   True
5   251_EV01_PB_579_HD_756828  2023-05-01 13:11:55                 1.0                  0.0                 0.0   True
6   251_EV01_PB_579_HD_756828  2023-05-01 13:11:56                 1.0                  0.0                 0.0   True
7   251_EV01_PB_579_HD_756828  2023-05-01 13:11:57                 1.0                  0.0                 0.0   True
8   251_EV01_PB_579_HD_756828  2023-05-01 13:11:58                 1.0                  0.0                 0.0   True
9   251_EV01_PB_579_HD_756828  2023-05-01 13:11:59                 1.0                  0.0                 0.0   True
10  251_EV01_PB_579_HD_756828  2023-05-01 13:12:00                 1.0                  0.0                 0.0   True
11  251_EV01_PB_579_HD_756828  2023-05-01 14:58:05                 1.0                  0.0                 0.0  False
12  251_EV01_PB_579_HD_756828  2023-05-01 14:58:10                 0.0                  0.0                 0.0  False
13  251_EV01_PB_579_HD_756828  2023-05-01 14:58:11                 0.0                  0.0                 0.0  False
14  251_EV01_PB_579_HD_756828  2023-05-01 14:58:15                 0.0                  0.0                 0.0  False
15  251_EV01_PB_579_HD_756828  2023-05-01 14:58:16                 0.0                  0.0                 0.0  False
16  251_EV01_PB_579_HD_756828  2023-05-01 14:58:20                 0.0                  0.0                 0.0  False
17  251_EV01_PB_579_HD_756828  2023-05-01 14:58:21                 0.0                  0.0                 0.0  False
18  251_EV01_PB_579_HD_756828  2023-05-01 14:58:26                 0.0                  0.0                 0.0  False
19  251_EV01_PB_579_HD_756828  2023-05-01 14:58:27                 0.0                  0.0                 0.0  False
20  251_EV01_PB_579_HD_756828  2023-05-01 14:58:31                 0.0                  0.0                 0.0  False
0赞 Suraj Shourie 9/20/2023 #2

这里的主要困难是找到 n 个连续出现的 1 秒差异。

首先在秒级上获得差异:

# create a mask based on your filter
df['mask'] = (df['b2_v_fult2_tms_err'] == 1) & (df['tm2_b_fault_code_a']  == 0) & (df['tm2_b_fault_level_a'] ==0)

# get sec difference in the Timestamp column
df['time_diff'] = df['Timestamp'].diff().dt.seconds

接下来,确定连续且低于阈值的时间步长

threshold = 9 # set your threshold for consecutive secs here
# this will identify consecutive occurences of same value (1 second in this example)
df['bin'] = df.time_diff.groupby((df.time_diff != df.time_diff.shift()).cumsum()).transform('size') * df.time_diff
df['bin'] = (df.bin >= threshold).astype(int)
df['bin'] = df['bin'] * df['mask'] # consecutive time and satisfy filter

# i've added helper columns for readability but they are not a necessity
cols_to_keep = ['Truck', 'Timestamp', 'b2_v_fult2_tms_err', 'tm2_b_fault_level_a', 'tm2_b_fault_code_a', 'bin']

df[cols_to_keep]

输出:

卡车 时间戳 b2_v_fult2_tms_err tm2_b_fault_level_a tm2_b_fault_code_a
251_EV01_PB_579_HD_756828 2023-05-01 13:11:48 2 30 0
251_EV01_PB_579_HD_756828 2023-05-01 13:11:51 1 0 0 0
251_EV01_PB_579_HD_756828 2023-05-01 13:11:52 1 0 0 1
251_EV01_PB_579_HD_756828 2023-05-01 13:11:53 1 0 0 1
251_EV01_PB_579_HD_756828 2023-05-01 13:11:54 1 0 0 1
251_EV01_PB_579_HD_756828 2023-05-01 13:11:55 1 0 0 1
251_EV01_PB_579_HD_756828 2023-05-01 13:11:56 1 0 0 1
251_EV01_PB_579_HD_756828 2023-05-01 13:11:57 1 0 0 1
251_EV01_PB_579_HD_756828 2023-05-01 13:11:58 1 0 0 1
251_EV01_PB_579_HD_756828 2023-05-01 13:11:59 1 0 0 1
251_EV01_PB_579_HD_756828 2023-05-01 13:12:00 1 0 0 1
251_EV01_PB_579_HD_756828 2023-05-01 14:58:05 1 0 0 1
251_EV01_PB_579_HD_756828 2023-05-01 14:58:10 0 0 0 0
251_EV01_PB_579_HD_756828 2023-05-01 14:58:11 0 0 0 0
251_EV01_PB_579_HD_756828 2023-05-01 14:58:15 0 0 0 0
251_EV01_PB_579_HD_756828 2023-05-01 14:58:16 0 0 0 0
251_EV01_PB_579_HD_756828 2023-05-01 14:58:20 0 0 0 0
251_EV01_PB_579_HD_756828 2023-05-01 14:58:21 0 0 0 0
251_EV01_PB_579_HD_756828 2023-05-01 14:58:26 0 0 0 0
251_EV01_PB_579_HD_756828 2023-05-01 14:58:27 0 0 0 0
251_EV01_PB_579_HD_756828 2023-05-01 14:58:31 0 0 0 0