Merge two DataFrames on the first part of a string

Q:

I have two DataFrames:

import pandas as pd

df1 = pd.DataFrame({'id':['XYZ', 'ABC1', 'CDS'], 'col1':[1,2,3]})
df2 = pd.DataFrame({'id':['XYZ1', 'XYZ2', 'ABC1', 'ABC11', 'CDSS', 'CDS', 'ABC2', 'ABC', 'XYA'], 
                    'col2':[1,2,3,4,5,6,7,8,9]})

    id   col1
0   XYZ     1
1   ABC1    2
2   CDS     3

and

    id   col2
0   XYZ1    1
1   XYZ2    2
2   ABC1    3
3   ABC11   4
4   CDSS    5
5   CDS     6
6   ABC2    7
7   ABC     8
8   XYA     9

I want to merge df1 into df2 so that a row of df2 matches a row of df1 when the df2 id starts with the full df1 id. The result should be this DataFrame:

      id col2  col1
0   XYZ1    1   1.0
1   XYZ2    2   1.0
2   ABC1    3   2.0
3   ABC11   4   2.0
4   CDSS    5   3.0
5   CDS     6   3.0
6   ABC2    7   NaN
7   ABC     8   NaN
8   XYA     9   NaN

How can I do this?

python pandas dataframe merge

ids1 = df1['id'].to_list()

def id_subset(value, ids1):
    # return the first id from df1 that value starts with, or None if there is none
    for s in ids1:
        if value.startswith(s):
            return s
    return None

# add a new column holding the matching df1 id
df2['id2'] = df2['id'].apply(lambda x: id_subset(x, ids1))

# merge and clean
df_out = df2.merge(df1, left_on='id2', right_on='id', how='left')
df_out = df_out.rename(columns={'id_x': 'id'}).drop(columns=['id_y', 'id2'])

Output:

      id  col2  col1
0   XYZ1     1   1.0
1   XYZ2     2   1.0
2   ABC1     3   2.0
3  ABC11     4   2.0
4   CDSS     5   3.0
5    CDS     6   3.0
6   ABC2     7   NaN
7    ABC     8   NaN
8    XYA     9   NaN
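
A loop that returns the first matching id depends on the order of the ids in df1. That is fine for the sample data, but if df1 ever held overlapping ids (say both 'ABC' and 'ABC1', which is an assumption and not the case in the question), you would probably want the longest match. A minimal sketch of that variant, where `longest_prefix` is a hypothetical helper name:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['XYZ', 'ABC1', 'CDS'], 'col1': [1, 2, 3]})
df2 = pd.DataFrame({'id': ['XYZ1', 'XYZ2', 'ABC1', 'ABC11', 'CDSS',
                           'CDS', 'ABC2', 'ABC', 'XYA'],
                    'col2': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

def longest_prefix(value, prefixes):
    """Return the longest entry of `prefixes` that `value` starts with, or None."""
    matches = [p for p in prefixes if value.startswith(p)]
    return max(matches, key=len) if matches else None

prefixes = df1['id'].tolist()

# Build the join key on a copy so that df2 itself is left unchanged.
keyed = df2.assign(key=df2['id'].map(lambda v: longest_prefix(v, prefixes)))

# Left merge keeps every df2 row; rows without a key get NaN in col1.
out = (keyed.merge(df1.rename(columns={'id': 'key'}), on='key', how='left')
            .drop(columns='key'))
print(out)
```

For the data in the question this gives the same rows as the output above; the difference only shows up when one df1 id is itself a prefix of another.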

1 upvote RomanPerekhrest 11/15/2023 #2

Use df1 as a lookup mapping and apply it with pd.Series.map:

id_map = df1.set_index('id')['col1'].to_dict()
new_df = (df2.assign(col1=df2['id'].map(lambda x:
                                        next((v for k, v in id_map.items()
                                              if x.startswith(k)), None))))
print(new_df)

      id  col2  col1
0   XYZ1     1   1.0
1   XYZ2     2   1.0
2   ABC1     3   2.0
3  ABC11     4   2.0
4   CDSS     5   3.0
5    CDS     6   3.0
6   ABC2     7   NaN
7    ABC     8   NaN
8    XYA     9   NaN

1 upvote Shubham Sharma 11/15/2023 #3

Extract a key from each id in df2 that starts with one of the ids from df1, then use those keys to merge df2 with df1:

df2['key'] = df2['id'].str.extract(r'^(%s)' % '|'.join(df1['id']))
result = df2.merge(df1.rename(columns={'id': 'key'}), on='key', how='left')

      id  col2   key  col1
0   XYZ1     1   XYZ   1.0
1   XYZ2     2   XYZ   1.0
2   ABC1     3  ABC1   2.0
3  ABC11     4  ABC1   2.0
4   CDSS     5   CDS   3.0
5    CDS     6   CDS   3.0
6   ABC2     7   NaN   NaN
7    ABC     8   NaN   NaN
8    XYA     9   NaN   NaN
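
One caveat with building the pattern by joining the ids directly: any id containing a regex metacharacter (for example '.' or '+') would change the meaning of the pattern, and with overlapping ids the alternation picks whichever alternative comes first. A hedged sketch that escapes each id with `re.escape` and puts longer ids first, assuming the same sample frames as in the question:

```python
import re
import pandas as pd

df1 = pd.DataFrame({'id': ['XYZ', 'ABC1', 'CDS'], 'col1': [1, 2, 3]})
df2 = pd.DataFrame({'id': ['XYZ1', 'XYZ2', 'ABC1', 'ABC11', 'CDSS',
                           'CDS', 'ABC2', 'ABC', 'XYA'],
                    'col2': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# Longest ids first, so a longer id wins over a shorter id that is its prefix.
ids = sorted(df1['id'], key=len, reverse=True)

# Escape each id so it is matched literally, and anchor at the string start.
pattern = '^(' + '|'.join(re.escape(i) for i in ids) + ')'

# expand=False returns a Series because the pattern has one capture group.
key = df2['id'].str.extract(pattern, expand=False)

result = (df2.assign(key=key)
             .merge(df1.rename(columns={'id': 'key'}), on='key', how='left'))
print(result)
```

Using `assign` rather than writing `df2['key']` keeps the original df2 unmodified; drop the `key` column afterwards if you do not need it.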
