Asked by: earnric  Asked: 10/25/2023  Updated: 10/25/2023  Views: 32
pandas -- compute min difference between datetimes in two different dataframes
Q:
I have 2 dataframes made up of times. I want to find the shortest time between ALL df1 times and EACH df2 time, counting only positive differences. For example: df2['Start Time'] - df1['Stop Time'] = dt
df1
Stop Time                Site
2023-10-17 20:10:00.310  P2
2023-10-17 21:20:00.440  P1
2023-10-17 23:30:00.200  P2
2023-10-18 00:00:00.190  P1
2023-10-18 01:00:00.130  P1
2023-10-18 02:00:00.500  P2
2023-10-18 03:00:00.480  P1
2023-10-18 04:00:00.020  P2
2023-10-18 05:00:00.000  P1
2023-10-18 06:00:00.580  P2
df2
Start Time               Site
2023-10-17 16:00:00.190  SMR
2023-10-17 17:05:00.050  SMR
2023-10-17 19:10:00.550  SMR
2023-10-17 21:40:00.530  SMR
2023-10-17 22:21:00.180  SMR
2023-10-18 05:21:00.090  SMR
2023-10-18 09:15:00.360  SMR
2023-10-18 11:54:00.160  SMR
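For anyone who wants to reproduce this, here is a minimal sketch that builds the two frames from the tables above. The values are copied from the post; building them with pd.to_datetime up front is my own choice, not something the question specifies.

```python
import pandas as pd

# Sample data copied from the question; names df1/df2 follow the post.
df1 = pd.DataFrame({
    "Stop Time": pd.to_datetime([
        "2023-10-17 20:10:00.310", "2023-10-17 21:20:00.440",
        "2023-10-17 23:30:00.200", "2023-10-18 00:00:00.190",
        "2023-10-18 01:00:00.130", "2023-10-18 02:00:00.500",
        "2023-10-18 03:00:00.480", "2023-10-18 04:00:00.020",
        "2023-10-18 05:00:00.000", "2023-10-18 06:00:00.580",
    ]),
    "Site": ["P2", "P1", "P2", "P1", "P1", "P2", "P1", "P2", "P1", "P2"],
})

df2 = pd.DataFrame({
    "Start Time": pd.to_datetime([
        "2023-10-17 16:00:00.190", "2023-10-17 17:05:00.050",
        "2023-10-17 19:10:00.550", "2023-10-17 21:40:00.530",
        "2023-10-17 22:21:00.180", "2023-10-18 05:21:00.090",
        "2023-10-18 09:15:00.360", "2023-10-18 11:54:00.160",
    ]),
    "Site": ["SMR"] * 8,
})

print(df1.shape, df2.shape)
```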
So, for this dataset, the first positive difference is for df2: 2023-10-17 21:40:00.530 and df1: 2023-10-17 20:10:00.310 AND 2023-10-17 21:20:00.440. The minimum, which I want to keep in a new dataframe df_best, is 2023-10-17 21:40:00.530 - 2023-10-17 21:20:00.440 = 20 min, with site name P1. So the first entry of df_best would be:
diff_min  Site
5         P1
The last df2 entry, 2023-10-18 11:54:00.160, has its minimum with the last entry in df1... about 5 hours 54 minutes.
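As a quick sanity check of that last figure, the subtraction can be done directly on two timestamps. This is only an illustrative sketch; the variable names start, stop and gap are mine.

```python
import pandas as pd

start = pd.Timestamp("2023-10-18 11:54:00.160")  # last 'Start Time' in df2
stop = pd.Timestamp("2023-10-18 06:00:00.580")   # last 'Stop Time' in df1

gap = start - stop                       # a pandas Timedelta
gap_minutes = gap.total_seconds() / 60   # about 353.993 minutes, i.e. roughly 5 h 54 min

print(gap)
print(round(gap_minutes, 3))
```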
I could do this with a couple of for loops, but I bet there is a cool pandas way to do it quickly.
Thanks
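For reference, the "compare everything with everything" idea from the question can be written without explicit loops by broadcasting. This is a brute-force sketch on a two-row subset of the data, not the asker's code; it builds the full len(df2) x len(df1) matrix of differences, so it only suits small inputs.

```python
import numpy as np
import pandas as pd

# Two-row subsets of the question's data, for illustration only.
stops = pd.to_datetime(["2023-10-17 20:10:00.310", "2023-10-17 21:20:00.440"])
starts = pd.to_datetime(["2023-10-17 19:10:00.550", "2023-10-17 21:40:00.530"])

# diffs[i, j] = starts[i] - stops[j], expressed in minutes
diffs = (starts.values[:, None] - stops.values[None, :]) / np.timedelta64(1, "m")

# Keep only positive gaps; everything else becomes NaN
positive = np.where(diffs > 0, diffs, np.nan)

# Smallest positive gap per start time (NaN when no stop precedes it)
best = pd.DataFrame(positive, index=starts).min(axis=1)
print(best)
```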
A:
3 upvotes
mozway
10/25/2023
#1
You don't need to find all the matches, only the closest one in the desired direction.
For this, use merge_asof:
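import pandas as pd

# make sure both time columns are real datetimes
df1['Stop Time'] = pd.to_datetime(df1['Stop Time'])
df2['Start Time'] = pd.to_datetime(df2['Start Time'])

# merge_asof needs both sides sorted on their keys; by default
# (direction='backward') each Start Time is matched to the last
# Stop Time at or before it
out = (pd
   .merge_asof(df2.sort_values(by='Start Time')
                  .reset_index(),
               df1.sort_values(by='Stop Time'),
               left_on='Start Time', right_on='Stop Time',
               suffixes=(None, '_df1')
               )
   # restore the original row order of df2
   .set_index('index').reindex(df2.index)
   # gap in minutes between each start and its matched stop
   .assign(diff_min=lambda d: d['Start Time'].sub(d['Stop Time'])
                               .dt.total_seconds().div(60))
)
print(out)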
Output:
               Start Time Site               Stop Time Site_df1   diff_min
0 2023-10-17 16:00:00.190  SMR                     NaT      NaN        NaN
1 2023-10-17 17:05:00.050  SMR                     NaT      NaN        NaN
2 2023-10-17 19:10:00.550  SMR                     NaT      NaN        NaN
3 2023-10-17 21:40:00.530  SMR 2023-10-17 21:20:00.440       P1  20.001500
4 2023-10-17 22:21:00.180  SMR 2023-10-17 21:20:00.440       P1  60.995667
5 2023-10-18 05:21:00.090  SMR 2023-10-18 05:00:00.000       P1  21.001500
6 2023-10-18 09:15:00.360  SMR 2023-10-18 06:00:00.580       P2 194.996333
7 2023-10-18 11:54:00.160  SMR 2023-10-18 06:00:00.580       P2 353.993000
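The NaT rows appear because merge_asof matches backward by default, and the first three start times have no earlier stop time. A small sketch of how the direction parameter changes the match, using toy frames (left, right, and their column names are made up for illustration):

```python
import pandas as pd

# Toy frames, both already sorted on their time keys.
left = pd.DataFrame({"t": pd.to_datetime(["2023-01-01 10:00", "2023-01-01 12:00"])})
right = pd.DataFrame({
    "t_right": pd.to_datetime(["2023-01-01 09:30", "2023-01-01 11:45", "2023-01-01 12:10"]),
    "label": ["a", "b", "c"],
})

# Default: last right key at or before each left key
backward = pd.merge_asof(left, right, left_on="t", right_on="t_right")
# First right key at or after each left key
forward = pd.merge_asof(left, right, left_on="t", right_on="t_right", direction="forward")
# Closest right key on either side
nearest = pd.merge_asof(left, right, left_on="t", right_on="t_right", direction="nearest")

print(backward["label"].tolist())  # ['a', 'b']
print(forward["label"].tolist())   # ['b', 'c']
print(nearest["label"].tolist())   # ['a', 'c']
```

If a gap of exactly zero should not count as a match, merge_asof also accepts allow_exact_matches=False.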
If you are only interested in the rows of df2 that have a match, you can additionally call dropna.
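A minimal sketch of that last step, assuming the goal is the two-column df_best (diff_min, Site) described in the question. It uses a two-row subset of the data, and the column selection and rename are my assumptions about the desired shape.

```python
import pandas as pd

# Two-row subsets of the question's data, for illustration only.
df1 = pd.DataFrame({
    "Stop Time": pd.to_datetime(["2023-10-17 20:10:00.310", "2023-10-17 21:20:00.440"]),
    "Site": ["P2", "P1"],
})
df2 = pd.DataFrame({
    "Start Time": pd.to_datetime(["2023-10-17 19:10:00.550", "2023-10-17 21:40:00.530"]),
    "Site": ["SMR", "SMR"],
})

out = pd.merge_asof(df2, df1, left_on="Start Time", right_on="Stop Time",
                    suffixes=(None, "_df1"))
out["diff_min"] = (out["Start Time"] - out["Stop Time"]).dt.total_seconds() / 60

# Drop start times with no earlier stop time, keep the gap and the df1 site
df_best = (out.dropna(subset=["diff_min"])
              [["diff_min", "Site_df1"]]
              .rename(columns={"Site_df1": "Site"}))
print(df_best)
```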