Asked by: Leon  Asked: 3/18/2023  Last edited by: halferLeon  Updated: 7/3/2023  Views: 50
Assign to a whole column of a Pandas DataFrame with MultiIndex?
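Q:
I have a DataFrame with a MultiIndex (call it midx_df), and I want to assign to it the values of a whole column from another DataFrame that has a single-level index (call it sour_df).
All index values of sour_df exist in the top-level index of midx_df. I need to specify the level-1 index value so that I add/modify the values of all rows that share that level-1 value.
For example: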
import pandas as pd

beg_min = pd.to_datetime('2023/03/18 18:50', yearfirst=True)
end_min = pd.to_datetime('2023/03/18 18:53', yearfirst=True)
minutes = pd.date_range(start=beg_min, end=end_min, freq='1min')
actions = ['Buy', 'Sell']
m_index = pd.MultiIndex.from_product([minutes, actions], names=['time', 'action'])
sour_df = pd.DataFrame(index=minutes, columns=['price'])
sour_df.index.rename('time', inplace=True)
sour_df.loc[minutes[0], 'price'] = 'b0'
sour_df.loc[minutes[1], 'price'] = 'b1'
sour_df.loc[minutes[3], 'price'] = 'b2'
midx_df = pd.DataFrame(index=m_index, columns=['price'])
print(midx_df)
midx_df.loc[(beg_min, 'Buy'), 'price'] = 123 # works but only for one row!
midx_df.loc[(end_min, 'Buy')]['price'] = 124 # doesn't work!
print(midx_df)
midx_df.loc[(slice(None), 'Buy'), 'price'] = sour_df # doesn't work!
print(midx_df)
midx_df.loc[(slice(None), 'Buy'), 'price'] = sour_df['price'] # doesn't work!
print(midx_df)
#midx_df.loc[(slice(None), 'Buy')]['price'] = sour_df['price'] # doesn't work!
#print(midx_df)
midx_df.loc[pd.IndexSlice[:, 'Buy'], :] = sour_df # doesn't work!
print(midx_df)
What is the correct way to do this?
A:
2 upvotes
Corralien
3/18/2023
#1
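This is an interesting question. The problem here is that your indexes are not aligned: ('time', 'action') on one side versus 'time' only on the other, so Pandas can't set the right values.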
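To see the mismatch concretely, here is a minimal sketch that rebuilds a setup like the one in the question (with the prices filled in directly for brevity) and compares the index of the target selection with the index of sour_df. The names target_index and labels_match are only illustrative and do not come from the original post.

```python
import pandas as pd

# Rebuild a setup like the one in the question (values filled in directly).
minutes = pd.date_range("2023-03-18 18:50", "2023-03-18 18:53", freq="1min", name="time")
m_index = pd.MultiIndex.from_product([minutes, ["Buy", "Sell"]], names=["time", "action"])

sour_df = pd.DataFrame({"price": ["b0", "b1", None, "b2"]}, index=minutes)
midx_df = pd.DataFrame(index=m_index, columns=["price"])

# Index of the rows we want to assign to: two levels, ('time', 'action').
target_index = midx_df.loc[(slice(None), "Buy"), "price"].index
print(list(target_index.names))

# Index of the source: one level, 'time' only.
print(list(sour_df.index.names))

# The labels are tuples on one side and timestamps on the other,
# so label-based alignment finds nothing in common.
labels_match = target_index.equals(sour_df.index)
print(labels_match)
```

Because none of the labels match, the values end up aligned to nothing, which is consistent with the attempts in the question that are marked as not working.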
You have to reindex sour_df so that it reuses the index of midx_df. pd.concat can be used to accomplish this:
midx_df.loc[(slice(None), 'Buy'), 'price'] = \
pd.concat([sour_df], keys=['Buy'], names=['action']).swaplevel()
print(midx_df)
# Output
                           price
time                action
2023-03-18 18:50:00 Buy       b0
                    Sell     NaN
2023-03-18 18:51:00 Buy       b1
                    Sell     NaN
2023-03-18 18:52:00 Buy      NaN
                    Sell     NaN
2023-03-18 18:53:00 Buy       b2
                    Sell     NaN
Or use pd.MultiIndex.from_product:
midx_df.loc[(slice(None), 'Buy'), 'price'] = \
sour_df.set_index(pd.MultiIndex.from_product([sour_df.index, ['Buy']]))
Details:
>>> midx_df.loc[(slice(None), 'Buy'), 'price']
time                 action
2023-03-18 18:50:00  Buy       NaN
2023-03-18 18:51:00  Buy       NaN
2023-03-18 18:52:00  Buy       NaN
2023-03-18 18:53:00  Buy       NaN
Name: price, dtype: object

>>> pd.concat([sour_df], keys=['Buy'], names=['action']).swaplevel()
                           price
time                action
2023-03-18 18:50:00 Buy       b0
2023-03-18 18:51:00 Buy       b1
2023-03-18 18:52:00 Buy      NaN
2023-03-18 18:53:00 Buy       b2

>>> sour_df.set_index(pd.MultiIndex.from_product([sour_df.index, ['Buy']]))
                        price
time
2023-03-18 18:50:00 Buy    b0
2023-03-18 18:51:00 Buy    b1
2023-03-18 18:52:00 Buy   NaN
2023-03-18 18:53:00 Buy    b2
Now the indexes are well aligned, so the values can be set.
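For anyone who wants to run the whole thing end to end, here is a self-contained sketch of the pd.concat approach. One deliberate difference from the snippet above: it selects sour_df['price'] as a Series before aligning, which is my own choice to keep the assignment one-dimensional. The names aligned and alt_index are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Setup, following the question.
minutes = pd.date_range("2023-03-18 18:50", "2023-03-18 18:53", freq="1min")
m_index = pd.MultiIndex.from_product([minutes, ["Buy", "Sell"]], names=["time", "action"])

sour_df = pd.DataFrame(index=minutes, columns=["price"])
sour_df.index.rename("time", inplace=True)
sour_df.loc[minutes[0], "price"] = "b0"
sour_df.loc[minutes[1], "price"] = "b1"
sour_df.loc[minutes[3], "price"] = "b2"

midx_df = pd.DataFrame(index=m_index, columns=["price"])

# Add an 'action' level holding 'Buy', then swap so the order is (time, action).
aligned = pd.concat([sour_df["price"]], keys=["Buy"], names=["action"]).swaplevel()

# The from_product variant builds the same labels (level names aside).
alt_index = pd.MultiIndex.from_product([sour_df.index, ["Buy"]])
print(aligned.index.equals(alt_index))

# With matching labels, the assignment fills only the 'Buy' rows.
midx_df.loc[(slice(None), "Buy"), "price"] = aligned
print(midx_df)
```

The 'Sell' rows are left untouched, and the 18:52 'Buy' row stays NaN because sour_df has no value for that minute.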