Python pandas two table match to find latest date

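Question:

I want to do some matching in pandas, similar to VLOOKUP in Excel. Based on the conditions in Table 1, I need to find the latest date in Table 2:

Table 1: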

Name  Threshold1   Threshold2
A     9            8
B     14           13

Table 2:

Date   Name   Value   
1/1    A      10
1/2    A      9
1/3    A      9
1/4    A      8
1/5    A      8
1/1    B      15
1/2    B      14
1/3    B      14
1/4    B      13
1/5    B      13

The desired table looks like this:

Name  Threshold1   Threshold1_Date   Threshold2   Threshold2_Date
A     9            1/3               8            1/5
B     14           1/3               13           1/5

Thanks in advance!
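
To try the answers on this page, the two tables can be built as DataFrames roughly as follows. This is only a sketch: it assumes the dates are stored as plain strings such as '1/3', and it uses the names df1 and df2 because the answers refer to the tables that way.

```python
import pandas as pd

# Table 1: one row per Name with its threshold values
df1 = pd.DataFrame({
    "Name": ["A", "B"],
    "Threshold1": [9, 14],
    "Threshold2": [8, 13],
})

# Table 2: dated observations, already in date order within each Name
# (assumption: dates are kept as strings, not parsed datetimes)
df2 = pd.DataFrame({
    "Date": ["1/1", "1/2", "1/3", "1/4", "1/5"] * 2,
    "Name": ["A"] * 5 + ["B"] * 5,
    "Value": [10, 9, 9, 8, 8, 15, 14, 14, 13, 13],
})
```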

python pandas dataframe match lookup

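# reshape df1 to long form, match on (Name, Value), keep the latest row per threshold
# (sorting the string dates is lexical, which is fine for the sample data)
df_out = (df1.melt('Name', value_name='Value')
   .merge(df2, on=['Name', 'Value'])
   .sort_values('Date')
   .drop_duplicates(['Name', 'variable'], keep='last')
   .set_index(['Name', 'variable'])
   .unstack().sort_index(level=1, axis=1))
# flatten the two-level column labels, e.g. ('Date', 'Threshold1') -> 'Date_Threshold1'
df_out = df_out.set_axis(df_out.columns.map('_'.join), axis=1).reset_index()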

Output:

  Name Date_Threshold1  Value_Threshold1 Date_Threshold2  Value_Threshold2
0    A             1/3                 9             1/5                 8
1    B             1/3                14             1/5                13
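
The chain above starts by reshaping df1 to long form, so that each threshold becomes its own row. That is what lets a single merge on Name and Value handle every threshold column at once. A small sketch of just that first step on the question's sample data (the variable name long_df1 is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["A", "B"],
    "Threshold1": [9, 14],
    "Threshold2": [8, 13],
})

# One row per (Name, threshold column); the threshold's column label goes
# into 'variable' and its number into 'Value', ready to merge with df2
long_df1 = df1.melt("Name", value_name="Value")
print(long_df1)
```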

3 upvotes Shubham Sharma 3/8/2023 #2

Code

# assuming df2 is already sorted on `Date`
# drop the duplicates per Name and Value, keeping the row with the max date
cols = ['Name', 'Value']
s = df2.drop_duplicates(cols, keep='last').set_index(cols)['Date']

# for each threshold column, use MultiIndex.map to substitute
# dates from df2 based on matching Name and threshold value
for c in df1.filter(like='Threshold'):
    df1[c + '_date'] = df1.set_index(['Name', c]).index.map(s)

Result

  Name  Threshold1  Threshold2 Threshold1_date Threshold2_date
0    A           9           8             1/3             1/5
1    B          14          13             1/3             1/5
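
The code above relies on df2 already being in date order. If that is not guaranteed, one option is to sort on a parsed copy of the date first. The sketch below assumes month/day strings like '1/3' that all fall in the same year. The helper column _parsed, the format string, and the names latest and out are illustrative choices, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["A", "B"], "Threshold1": [9, 14], "Threshold2": [8, 13]})
# Same data as the question, with rows deliberately shuffled
df2 = pd.DataFrame({
    "Date": ["1/5", "1/3", "1/1", "1/4", "1/2"] * 2,
    "Name": ["A"] * 5 + ["B"] * 5,
    "Value": [8, 9, 10, 8, 9, 13, 14, 15, 13, 14],
})

# Parse 'month/day' so the sort is chronological rather than lexical
# (assumption: no year is given, so all dates are treated as the same year)
parsed = pd.to_datetime(df2["Date"], format="%m/%d")
df2_sorted = df2.assign(_parsed=parsed).sort_values("_parsed")

# Latest Date per (Name, Value) pair, as in the answer
cols = ["Name", "Value"]
latest = df2_sorted.drop_duplicates(cols, keep="last").set_index(cols)["Date"]

# Work on a copy so df1 itself is left unchanged
out = df1.copy()
for c in df1.filter(like="Threshold"):
    out[c + "_date"] = out.set_index(["Name", c]).index.map(latest)
print(out)
```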

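# latest Date per unique (Name, Value) pair; assumes df2 is already sorted by date
latestDtByNameVal = df2.groupby(['Name','Value']).last()
# output column order: Name first, then each threshold followed by its date column
resCols = ['Name'] + [y for x in df1.columns if x != 'Name' for y in [x, f'{x}_Date']]
res = df1.assign(**( df1
    .set_index('Name')
    .pipe(lambda d:
        # for each threshold column, look up the Date at its (Name, Value) index
        {f'{col}_Date': d[[col]]
            .rename(columns={col:'Value'})
            .set_index('Value', append=True)
            .pipe(lambda d2: latestDtByNameVal.Date[d2.index].to_numpy())
        for col in d.columns}) ))[resCols]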

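Explanation:

Use groupby().last() to get latestDtByNameVal, which holds the latest date in df2 indexed by unique Name, Value pairs.
Prepare the result column order, as shown in the question, in resCols: Threshold1, Threshold1_Date, ...
To augment df1 with the threshold date results as columns whose labels end in _Date, pass assign() a dictionary mapping each <threshColName>_Date to the Date values of latestDtByNameVal at the rows indexed by the corresponding Name, Value pairs.
Use resCols to put the columns in the desired order.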
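
To make the first step concrete, the following sketch rebuilds the lookup table on the question's sample data and reads single entries from it. It assumes df2 is already in date order, since last() takes the final row of each group as it appears. The names date_a9 and date_b13 are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    "Date": ["1/1", "1/2", "1/3", "1/4", "1/5"] * 2,
    "Name": ["A"] * 5 + ["B"] * 5,
    "Value": [10, 9, 9, 8, 8, 15, 14, 14, 13, 13],
})

# One row per unique (Name, Value) pair, holding the last Date seen for it
latestDtByNameVal = df2.groupby(["Name", "Value"]).last()

# Look up single (Name, Value) pairs through the MultiIndex
date_a9 = latestDtByNameVal.loc[("A", 9), "Date"]
date_b13 = latestDtByNameVal.loc[("B", 13), "Date"]
```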

Output:

  Name  Threshold1 Threshold1_Date  Threshold2 Threshold2_Date
0    A           9             1/3           8             1/5
1    B          14             1/3          13             1/5
