Asked by: Alexander Zorin  Asked: 10/20/2023  Updated: 10/20/2023  Views: 38
How to change the grouping of a column multi-index in a pandas pivot table?
Q:
Suppose I have a DataFrame like this:
import pandas as pd

data = {
    'City': ['Rochester', 'Anaheim', 'Toledo', 'Rochester', 'Anaheim', 'Anaheim', 'Toledo', 'Rochester', 'Rochester', 'Rochester', 'Toledo', 'Toledo', 'Toledo', 'Anaheim'],
    'PersonID': [4930, 7343, 4368, 6909, 4574, 4086, 5024, 3642, 9997, 4745, 1207, 6081, 7832, 6309],
    'MoneySpent': [100, 1710, 20, 910, 2040, 1100, 490, 70, 1940, 100, 1240, 80, 1420, 2090],
    'StayDuration': ['< 2 days', '2-7 days', '2-7 days', '7-30 days', '7-30 days', '< 2 days', '2-7 days', '7-30 days', '7-30 days', '2-7 days', '7-30 days', '< 2 days', '< 2 days', '7-30 days'],
}
df = pd.DataFrame(data)
         City  PersonID  MoneySpent StayDuration
0   Rochester      4930         100     < 2 days
1     Anaheim      7343        1710     2-7 days
2      Toledo      4368          20     2-7 days
3   Rochester      6909         910    7-30 days
4     Anaheim      4574        2040    7-30 days
5     Anaheim      4086        1100     < 2 days
6      Toledo      5024         490     2-7 days
7   Rochester      3642          70    7-30 days
8   Rochester      9997        1940    7-30 days
9   Rochester      4745         100     2-7 days
10     Toledo      1207        1240    7-30 days
11     Toledo      6081          80     < 2 days
12     Toledo      7832        1420     < 2 days
13    Anaheim      6309        2090    7-30 days
Then I build a pivot table that shows, for each city, the number of people per stay duration and their total spending:
pv = pd.pivot_table(
    df,
    index='City',
    columns='StayDuration',
    values=['PersonID', 'MoneySpent'],
    aggfunc={'PersonID': 'count', 'MoneySpent': 'sum'},
)
What I get has the metrics (head count or spending) on the first column level, with the categories nested under them:
             MoneySpent                  PersonID
StayDuration 2-7 days 7-30 days < 2 days 2-7 days 7-30 days < 2 days
City
Anaheim          1710      4130     1100        1         2        1
Rochester         100      2920      100        1         3        1
Toledo            510      1240     1500        2         1        2
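To see where this layout comes from, it can help to inspect the column MultiIndex directly. The following is a minimal sketch that rebuilds the question's pivot table and prints the level names and the labels on each level; nothing here changes the table.

```python
import pandas as pd

# Data copied from the question.
data = {
    'City': ['Rochester', 'Anaheim', 'Toledo', 'Rochester', 'Anaheim', 'Anaheim', 'Toledo',
             'Rochester', 'Rochester', 'Rochester', 'Toledo', 'Toledo', 'Toledo', 'Anaheim'],
    'PersonID': [4930, 7343, 4368, 6909, 4574, 4086, 5024, 3642, 9997, 4745, 1207, 6081, 7832, 6309],
    'MoneySpent': [100, 1710, 20, 910, 2040, 1100, 490, 70, 1940, 100, 1240, 80, 1420, 2090],
    'StayDuration': ['< 2 days', '2-7 days', '2-7 days', '7-30 days', '7-30 days', '< 2 days', '2-7 days',
                     '7-30 days', '7-30 days', '2-7 days', '7-30 days', '< 2 days', '< 2 days', '7-30 days'],
}
df = pd.DataFrame(data)

pv = pd.pivot_table(
    df,
    index='City',
    columns='StayDuration',
    values=['PersonID', 'MoneySpent'],
    aggfunc={'PersonID': 'count', 'MoneySpent': 'sum'},
)

# The outer level is unnamed and holds the value column names;
# the inner level is named after the `columns=` argument.
print(pv.columns.names)
print(pv.columns.get_level_values(0).unique().tolist())
print(pv.columns.get_level_values(1).unique().tolist())
```

Because the value names sit on the outer level, getting the Excel-style layout means moving the `StayDuration` level outward and then regrouping the columns.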
What I want is the categories on the first level with the metrics nested under them, like this:
          2-7 days            7-30 days           < 2 days
          PersonID MoneySpent PersonID MoneySpent PersonID MoneySpent
Anaheim          1       1710        2       4130        1       1100
Rochester        1        100        3       2920        1        100
Toledo           2        510        1       1240        2       1500
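By the way, this is the default layout of an Excel pivot table.
I have spent a long time trying to get Python to produce the same result. Is it possible to change the grouping order of the columns?
A:
0 votes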
russhoppa
10/20/2023
#1
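One way to solve this is to swap the two column levels and then reindex the columns against a new MultiIndex built in the desired order:
# Target column order: stay duration on the outer level, metric on the inner level.
new_multiindex = [(stay_dur, metric) for stay_dur in df.StayDuration.unique() for metric in ['PersonID', 'MoneySpent']]
# Swap the levels first, then reindex. Assigning the new MultiIndex directly to
# pv.columns would only relabel the existing columns and attach the wrong labels to the data.
pv = pv.swaplevel(0, 1, axis=1).reindex(columns=pd.MultiIndex.from_tuples(new_multiindex, names=('StayDuration', None)))
pv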
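Whichever way the columns are rearranged, a label-based spot check helps confirm that the values moved together with their labels rather than just being renamed. The following is a minimal sketch using the question's data; the names `order` and `reordered` are illustrative and do not come from the answer.

```python
import pandas as pd

# Data copied from the question.
data = {
    'City': ['Rochester', 'Anaheim', 'Toledo', 'Rochester', 'Anaheim', 'Anaheim', 'Toledo',
             'Rochester', 'Rochester', 'Rochester', 'Toledo', 'Toledo', 'Toledo', 'Anaheim'],
    'PersonID': [4930, 7343, 4368, 6909, 4574, 4086, 5024, 3642, 9997, 4745, 1207, 6081, 7832, 6309],
    'MoneySpent': [100, 1710, 20, 910, 2040, 1100, 490, 70, 1940, 100, 1240, 80, 1420, 2090],
    'StayDuration': ['< 2 days', '2-7 days', '2-7 days', '7-30 days', '7-30 days', '< 2 days', '2-7 days',
                     '7-30 days', '7-30 days', '2-7 days', '7-30 days', '< 2 days', '< 2 days', '7-30 days'],
}
df = pd.DataFrame(data)
pv = pd.pivot_table(
    df,
    index='City',
    columns='StayDuration',
    values=['PersonID', 'MoneySpent'],
    aggfunc={'PersonID': 'count', 'MoneySpent': 'sum'},
)

# Desired column order: every stay duration (in order of first appearance)
# paired with both metrics, PersonID first.
order = pd.MultiIndex.from_product(
    [df['StayDuration'].unique(), ['PersonID', 'MoneySpent']],
    names=['StayDuration', None],
)

# swaplevel changes which level is outermost; reindex then moves whole
# columns (labels and data together) into the requested order.
reordered = pv.swaplevel(0, 1, axis=1).reindex(columns=order)

# Spot check: the same cell, addressed by label before and after, must match.
before = pv.loc['Anaheim', ('MoneySpent', '7-30 days')]
after = reordered.loc['Anaheim', ('7-30 days', 'MoneySpent')]
assert before == after
print(reordered)
```

Note that `unique()` returns the durations in order of first appearance, so the outer level here runs `< 2 days`, `2-7 days`, `7-30 days` rather than alphabetically.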
0 votes
Suraj Shourie
10/20/2023
#2
As far as I know, a pandas pivot table always orders the columns this way. You need some extra manipulation to get the desired output:
pv.swaplevel(0,1,axis=1).sort_index(axis=1).reindex(['PersonID', 'MoneySpent'], level=1, axis=1)
Output:
StayDuration 2-7 days            7-30 days           < 2 days
             PersonID MoneySpent PersonID MoneySpent PersonID MoneySpent
City
Anaheim             1       1710        2       4130        1       1100
Rochester           1        100        3       2920        1        100
Toledo              2        510        1       1240        2       1500
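The chained call is easier to follow when split into its three steps. The sketch below rebuilds the question's pivot table and prints the column order after each step; the intermediate names `swapped`, `grouped`, and `result` are illustrative only.

```python
import pandas as pd

# Data copied from the question.
data = {
    'City': ['Rochester', 'Anaheim', 'Toledo', 'Rochester', 'Anaheim', 'Anaheim', 'Toledo',
             'Rochester', 'Rochester', 'Rochester', 'Toledo', 'Toledo', 'Toledo', 'Anaheim'],
    'PersonID': [4930, 7343, 4368, 6909, 4574, 4086, 5024, 3642, 9997, 4745, 1207, 6081, 7832, 6309],
    'MoneySpent': [100, 1710, 20, 910, 2040, 1100, 490, 70, 1940, 100, 1240, 80, 1420, 2090],
    'StayDuration': ['< 2 days', '2-7 days', '2-7 days', '7-30 days', '7-30 days', '< 2 days', '2-7 days',
                     '7-30 days', '7-30 days', '2-7 days', '7-30 days', '< 2 days', '< 2 days', '7-30 days'],
}
df = pd.DataFrame(data)
pv = pd.pivot_table(
    df,
    index='City',
    columns='StayDuration',
    values=['PersonID', 'MoneySpent'],
    aggfunc={'PersonID': 'count', 'MoneySpent': 'sum'},
)

# Step 1: swap the levels. StayDuration becomes the outer level, but the
# columns keep their old physical order (all MoneySpent columns, then all PersonID).
swapped = pv.swaplevel(0, 1, axis=1)
print(swapped.columns.tolist())

# Step 2: sort the columns so that both metrics of one stay duration sit
# next to each other (metrics end up in alphabetical order).
grouped = swapped.sort_index(axis=1)
print(grouped.columns.tolist())

# Step 3: within each stay duration, put the metrics in the requested order.
result = grouped.reindex(['PersonID', 'MoneySpent'], level=1, axis=1)
print(result.columns.tolist())
```

Step 3 is only needed because the requested metric order (`PersonID` before `MoneySpent`) differs from the alphabetical order that sorting produces.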