提问人:IDK 提问时间:9/20/2023 最后编辑:IDK 更新时间:9/20/2023 访问量:40
选择满足条件并连续出现 10 秒的行
Select rows that meet a condition and occur for 10 consecutive seconds
问:
我在下面有一些时间序列数据:
Truck Timestamp b2_v_fult2_tms_err tm2_b_fault_level_a tm2_b_fault_code_a
0 251_EV01_PB_579_HD_756828 2023-05-01 13:11:48 NaN 2.0 30.0
1 251_EV01_PB_579_HD_756828 2023-05-01 13:11:51 1.0 0.0 0.0
2 251_EV01_PB_579_HD_756828 2023-05-01 13:11:52 1.0 0.0 0.0
3 251_EV01_PB_579_HD_756828 2023-05-01 13:11:53 1.0 0.0 0.0
4 251_EV01_PB_579_HD_756828 2023-05-01 13:11:54 1.0 0.0 0.0
5 251_EV01_PB_579_HD_756828 2023-05-01 13:11:55 1.0 0.0 0.0
6 251_EV01_PB_579_HD_756828 2023-05-01 13:11:56 1.0 0.0 0.0
7 251_EV01_PB_579_HD_756828 2023-05-01 13:11:57 1.0 0.0 0.0
8 251_EV01_PB_579_HD_756828 2023-05-01 13:11:58 1.0 0.0 0.0
9 251_EV01_PB_579_HD_756828 2023-05-01 13:11:59 1.0 0.0 0.0
10 251_EV01_PB_579_HD_756828 2023-05-01 13:12:00 1.0 0.0 0.0
11 251_EV01_PB_579_HD_756828 2023-05-01 14:58:05 1.0 0.0 0.0
12 251_EV01_PB_579_HD_756828 2023-05-01 14:58:10 0.0 0.0 0.0
13 251_EV01_PB_579_HD_756828 2023-05-01 14:58:11 0.0 0.0 0.0
14 251_EV01_PB_579_HD_756828 2023-05-01 14:58:15 0.0 0.0 0.0
15 251_EV01_PB_579_HD_756828 2023-05-01 14:58:16 0.0 0.0 0.0
16 251_EV01_PB_579_HD_756828 2023-05-01 14:58:20 0.0 0.0 0.0
17 251_EV01_PB_579_HD_756828 2023-05-01 14:58:21 0.0 0.0 0.0
18 251_EV01_PB_579_HD_756828 2023-05-01 14:58:26 0.0 0.0 0.0
19 251_EV01_PB_579_HD_756828 2023-05-01 14:58:27 0.0 0.0 0.0
20 251_EV01_PB_579_HD_756828 2023-05-01 14:58:31 0.0 0.0 0.0
我试图完成的是找到“b2_v_fult2_tms_err == 1.0”、“tm2_b_fault_level_a == 0.0”、“tm2_b_fault_code_a == 0.0”的行,创建一个名为“bin”的新列,并在三个值(“b2_v_fult2_tms_err == 1.0”, “tm2_b_fault_level_a == 0.0”、“tm2_b_fault_code_a == 0.0”) 连续出现 10 秒。“bin”列中的所有其他行都可以留空或标记为“False”。
这看起来像是:
Truck Timestamp b2_v_fult2_tms_err tm2_b_fault_level_a tm2_b_fault_code_a bin
0 251_EV01_PB_579_HD_756828 2023-05-01 13:11:48 NaN 2.0 30.0 False
1 251_EV01_PB_579_HD_756828 2023-05-01 13:11:51 1.0 0.0 0.0 True
2 251_EV01_PB_579_HD_756828 2023-05-01 13:11:52 1.0 0.0 0.0 True
3 251_EV01_PB_579_HD_756828 2023-05-01 13:11:53 1.0 0.0 0.0 True
4 251_EV01_PB_579_HD_756828 2023-05-01 13:11:54 1.0 0.0 0.0 True
5 251_EV01_PB_579_HD_756828 2023-05-01 13:11:55 1.0 0.0 0.0 True
6 251_EV01_PB_579_HD_756828 2023-05-01 13:11:56 1.0 0.0 0.0 True
7 251_EV01_PB_579_HD_756828 2023-05-01 13:11:57 1.0 0.0 0.0 True
8 251_EV01_PB_579_HD_756828 2023-05-01 13:11:58 1.0 0.0 0.0 True
9 251_EV01_PB_579_HD_756828 2023-05-01 13:11:59 1.0 0.0 0.0 True
10 251_EV01_PB_579_HD_756828 2023-05-01 13:12:00 1.0 0.0 0.0 True
11 251_EV01_PB_579_HD_756828 2023-05-01 14:58:05 1.0 0.0 0.0 False
12 251_EV01_PB_579_HD_756828 2023-05-01 14:58:10 0.0 0.0 0.0 False
13 251_EV01_PB_579_HD_756828 2023-05-01 14:58:11 0.0 0.0 0.0 False
14 251_EV01_PB_579_HD_756828 2023-05-01 14:58:15 0.0 0.0 0.0 False
15 251_EV01_PB_579_HD_756828 2023-05-01 14:58:16 0.0 0.0 0.0 False
16 251_EV01_PB_579_HD_756828 2023-05-01 14:58:20 0.0 0.0 0.0 False
17 251_EV01_PB_579_HD_756828 2023-05-01 14:58:21 0.0 0.0 0.0 False
18 251_EV01_PB_579_HD_756828 2023-05-01 14:58:26 0.0 0.0 0.0 False
19 251_EV01_PB_579_HD_756828 2023-05-01 14:58:27 0.0 0.0 0.0 False
20 251_EV01_PB_579_HD_756828 2023-05-01 14:58:31 0.0 0.0 0.0 False
我已经过滤了数据,然后尝试使用来计算时间戳之间的差异,这是成功的。但是我很难弄清楚如何将 10 次出现的 1 变成“真”值。在那之后,我曾经过滤结果以显示时间增量为一秒的行,然后用于显示之前的行,因为它会省略第一行。bin1_df = results_df.loc[(results_df['b2_v_fult2_tms_err'] == 1.0) & (results_df['tm2_b_fault_level_a'] == 0.0) & (results_df['tm2_b_fault_code_a'] == 0.0) ].reset_index(drop=True)
bin1_df['time_diff'] = bin1_df['Timestamp'].diff()
m = bin1_df['time_diff'] == pd.Timedelta(seconds=1)
bin1_df = bin1_df[m.shift(-1)|m]
我还尝试了其他一些无处可去的事情,但上述尝试是我能得到的最接近的。
有谁知道如何做到这一点?它不一定是这个结果,但可以清楚地传达这些条件满足 10 秒的东西是我最终想要的。
答:
IIUC,您可以做到:
m1 = df["b2_v_fult2_tms_err"] == 1.0
m2 = df["tm2_b_fault_level_a"] == 0.0
m3 = df["tm2_b_fault_code_a"] == 0.0
m = m1 & m2 & m3
df["bin"] = df.groupby((m != m.shift()).cumsum())["Timestamp"].transform(
lambda g: ([True] * 10 + [False] * (len(g) - 10))
if len(g) >= 10
else [False] * len(g)
)
print(df)
指纹:
Truck Timestamp b2_v_fult2_tms_err tm2_b_fault_level_a tm2_b_fault_code_a bin
0 251_EV01_PB_579_HD_756828 2023-05-01 13:11:48 NaN 2.0 30.0 False
1 251_EV01_PB_579_HD_756828 2023-05-01 13:11:51 1.0 0.0 0.0 True
2 251_EV01_PB_579_HD_756828 2023-05-01 13:11:52 1.0 0.0 0.0 True
3 251_EV01_PB_579_HD_756828 2023-05-01 13:11:53 1.0 0.0 0.0 True
4 251_EV01_PB_579_HD_756828 2023-05-01 13:11:54 1.0 0.0 0.0 True
5 251_EV01_PB_579_HD_756828 2023-05-01 13:11:55 1.0 0.0 0.0 True
6 251_EV01_PB_579_HD_756828 2023-05-01 13:11:56 1.0 0.0 0.0 True
7 251_EV01_PB_579_HD_756828 2023-05-01 13:11:57 1.0 0.0 0.0 True
8 251_EV01_PB_579_HD_756828 2023-05-01 13:11:58 1.0 0.0 0.0 True
9 251_EV01_PB_579_HD_756828 2023-05-01 13:11:59 1.0 0.0 0.0 True
10 251_EV01_PB_579_HD_756828 2023-05-01 13:12:00 1.0 0.0 0.0 True
11 251_EV01_PB_579_HD_756828 2023-05-01 14:58:05 1.0 0.0 0.0 False
12 251_EV01_PB_579_HD_756828 2023-05-01 14:58:10 0.0 0.0 0.0 False
13 251_EV01_PB_579_HD_756828 2023-05-01 14:58:11 0.0 0.0 0.0 False
14 251_EV01_PB_579_HD_756828 2023-05-01 14:58:15 0.0 0.0 0.0 False
15 251_EV01_PB_579_HD_756828 2023-05-01 14:58:16 0.0 0.0 0.0 False
16 251_EV01_PB_579_HD_756828 2023-05-01 14:58:20 0.0 0.0 0.0 False
17 251_EV01_PB_579_HD_756828 2023-05-01 14:58:21 0.0 0.0 0.0 False
18 251_EV01_PB_579_HD_756828 2023-05-01 14:58:26 0.0 0.0 0.0 False
19 251_EV01_PB_579_HD_756828 2023-05-01 14:58:27 0.0 0.0 0.0 False
20 251_EV01_PB_579_HD_756828 2023-05-01 14:58:31 0.0 0.0 0.0 False
这里的主要困难是找到 n 个连续出现的 1 秒差异。
首先在秒级上获得差异:
# create a mask based on your filter
df['mask'] = (df['b2_v_fult2_tms_err'] == 1) & (df['tm2_b_fault_code_a'] == 0) & (df['tm2_b_fault_level_a'] ==0)
# get sec difference in the Timestamp column
df['time_diff'] = df['Timestamp'].diff().dt.seconds
接下来,确定连续且低于阈值的时间步长
threshold = 9 # set your threshold for consecutive secs here
# this will identify consecutive occurences of same value (1 second in this example)
df['bin'] = df.time_diff.groupby((df.time_diff != df.time_diff.shift()).cumsum()).transform('size') * df.time_diff
df['bin'] = (df.bin >= threshold).astype(int)
df['bin'] = df['bin'] * df['mask'] # consecutive time and satisfy filter
# i've added helper columns for readability but they are not a necessity
cols_to_keep = ['Truck', 'Timestamp', 'b2_v_fult2_tms_err', 'tm2_b_fault_level_a', 'tm2_b_fault_code_a', 'bin']
df[cols_to_keep]
输出:
卡车 | 时间戳 | b2_v_fult2_tms_err | tm2_b_fault_level_a | tm2_b_fault_code_a | 站 |
---|---|---|---|---|---|
251_EV01_PB_579_HD_756828 | 2023-05-01 13:11:48 | 南 | 2 | 30 | 0 |
251_EV01_PB_579_HD_756828 | 2023-05-01 13:11:51 | 1 | 0 | 0 | 0 |
251_EV01_PB_579_HD_756828 | 2023-05-01 13:11:52 | 1 | 0 | 0 | 1 |
251_EV01_PB_579_HD_756828 | 2023-05-01 13:11:53 | 1 | 0 | 0 | 1 |
251_EV01_PB_579_HD_756828 | 2023-05-01 13:11:54 | 1 | 0 | 0 | 1 |
251_EV01_PB_579_HD_756828 | 2023-05-01 13:11:55 | 1 | 0 | 0 | 1 |
251_EV01_PB_579_HD_756828 | 2023-05-01 13:11:56 | 1 | 0 | 0 | 1 |
251_EV01_PB_579_HD_756828 | 2023-05-01 13:11:57 | 1 | 0 | 0 | 1 |
251_EV01_PB_579_HD_756828 | 2023-05-01 13:11:58 | 1 | 0 | 0 | 1 |
251_EV01_PB_579_HD_756828 | 2023-05-01 13:11:59 | 1 | 0 | 0 | 1 |
251_EV01_PB_579_HD_756828 | 2023-05-01 13:12:00 | 1 | 0 | 0 | 1 |
251_EV01_PB_579_HD_756828 | 2023-05-01 14:58:05 | 1 | 0 | 0 | 1 |
251_EV01_PB_579_HD_756828 | 2023-05-01 14:58:10 | 0 | 0 | 0 | 0 |
251_EV01_PB_579_HD_756828 | 2023-05-01 14:58:11 | 0 | 0 | 0 | 0 |
251_EV01_PB_579_HD_756828 | 2023-05-01 14:58:15 | 0 | 0 | 0 | 0 |
251_EV01_PB_579_HD_756828 | 2023-05-01 14:58:16 | 0 | 0 | 0 | 0 |
251_EV01_PB_579_HD_756828 | 2023-05-01 14:58:20 | 0 | 0 | 0 | 0 |
251_EV01_PB_579_HD_756828 | 2023-05-01 14:58:21 | 0 | 0 | 0 | 0 |
251_EV01_PB_579_HD_756828 | 2023-05-01 14:58:26 | 0 | 0 | 0 | 0 |
251_EV01_PB_579_HD_756828 | 2023-05-01 14:58:27 | 0 | 0 | 0 | 0 |
251_EV01_PB_579_HD_756828 | 2023-05-01 14:58:31 | 0 | 0 | 0 | 0 |
上一个:消除日期时间格式的小数
评论