提问人:doraemon 提问时间:11/7/2023 最后编辑:doraemon 更新时间:11/7/2023 访问量:69
有没有办法合并两个具有两个相似键列但不是相同键列的数据集?
Is there anyway to merge two datasets that with two similar key column but not the same key column?
问:
我有两个数据集,如下所示:
一个:
吃午饭 | 吃晚饭 |
---|---|
设置一个 | 套装 1 |
B组 | 套装 2 |
... | ... |
b:
变量 | variable_value | 价值 |
---|---|---|
吃午饭 | 设置一个 | 0.2 |
吃午饭 | B组 | 0.5 |
吃晚饭 | 套装 1 | 1.5 |
吃晚饭 | 套装 2 | 1.7 |
... | ... | ... |
我的预期成果:
c:
吃午饭 | 有午餐价值 | 吃晚饭 | 有晚餐价值 |
---|---|---|---|
设置一个 | 0.2 | 套装 1 | 1.5 |
B组 | 0.5 | 套装 2 | 1.7 |
... | ... | ... | ... |
有没有办法在 python/r 中合并这两个数据集?
我能想到R中的一个解决方案:
library(tidyverse)
library(magrittr)
b %<>% pivot_wider(names_from = variable, values_from = value)
left_join(a,b)
答:
2赞
mozway
11/7/2023
#1
out = (a
.melt(ignore_index=False, value_name='variable_value')
.reset_index()
.merge(b, how='left')
.pivot(index='index', columns='variable')
)
输出:
variable_value value
variable have dinner have lunch have dinner have lunch
index
0 set 1 set a 1.5 0.2
1 set 2 set b 1.7 0.5
若要使格式更接近现有格式,请进一步重新处理列名:
out = (a.melt(ignore_index=False, value_name='variable_value')
.reset_index()
.merge(b, how='left')
.pivot(index='index', columns='variable')
.sort_index(level=1, axis=1)
.rename_axis(index=None)
)
out.columns = out.columns.map(lambda x: f'{x[1]}{" value" if x[0] == "variable_value" else ""}')
输出:
have dinner have dinner value have lunch have lunch value
0 1.5 set 1 0.2 set a
1 1.7 set 2 0.5 set b
另类
或者,使用循环和 concat
:
out = pd.concat([a[[col]].merge(b.loc[b['variable'].eq(col),
['variable_value', 'value']]
.rename(columns={'variable_value': col,
'value': f'{col} value'
}),
how='left'
)
for col in a], axis=1)
输出:
have lunch have lunch value have dinner have dinner value
0 set a 0.2 set 1 1.5
1 set b 0.5 set 2 1.7
评论
0赞
doraemon
11/7/2023
“精确格式”是什么意思?P.S. 我没有对你的回答投反对票。
1赞
mozway
11/7/2023
我的意思是像我第一步一样用/代替 MultiIndex。xx
xx value
1赞
Kirill Kondratenko
11/7/2023
#2
尝试一下:
lunch = b[b['variable']=='have lunch'][['variable_value', 'value']]
dinner = b[b['variable']=='have dinner'][['variable_value', 'value']]
c = a.merge(lunch ,left_on='have lunch', right_on='variable_value', how='left').merge(
dinner,left_on='have dinner', right_on='variable_value', how='left')[
['have lunch', 'variable_value_x', 'have dinner', 'variable_value_y']].set_axis(
['have lunch', 'have lunch value', 'have dinner', 'have dinner value'], axis=1)
print(c)
输出:
have lunch have lunch value have dinner have dinnervalue
0 set a set a set 1 set 1
1 set b set b set2 set2
评论
0赞
doraemon
11/7/2023
感谢您的努力,但我有大约 50+ 个类别,所以你的方法可能太麻烦了......variables
0赞
Kirill Kondratenko
11/7/2023
感谢您的反馈,在撰写未来的回复时,我会牢记这一点。
评论