Asked by Chris White on 3/23/2023; modified 3/23/2023; viewed 66 times
How to melt multiple column pairs, each pair holding separate male and female values for one year, in a Pandas DataFrame
Question:
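I have a DataFrame with columns such as "id", "company name" and "company type", as well as "Year 1 Males Total", "Year 1 Females Total", "Year 2 Males Total", "Year 2 Females Total", and so on. How can I convert this wide format to long format so that I keep the existing columns "id", "company name" and "company type", get columns "Year 1", "Year 2", ..., and add a "Gender" column, with each original row split into two rows, one for males and one for females?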
I tried the following. It melts the years correctly, but all the other columns are NaN for every row:
import pandas as pd

df_copy = df.copy()
for i in range(1, 12):
    # melt one year's Males/Females pair and paste it beside the existing columns
    df_copy = pd.concat([df_copy, df.melt(value_vars=[f'Year {i} Males Total', f'Year {i} Females Total'], var_name='Gender', value_name=f'Year {i}')], axis=1)
    df_copy.drop(columns=[f'Year {i} Males Total', f'Year {i} Females Total', f'Year {i} Total'], inplace=True)
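The NaNs in the attempt above most likely come from index alignment: `df.melt(...)` without `id_vars` returns 2 × len(df) rows with a fresh RangeIndex and no identifier columns, and `pd.concat(..., axis=1)` aligns on that index, so the extra rows have no "id"/"company" values. The sketch below reproduces this and then shows one alternative fix (melt with `id_vars`, split the old column name with `str.extract`, unstack the years). The sample data is hypothetical, uses only two years, and omits the `Year i Total` columns.

```python
import pandas as pd

# Hypothetical sample data in the question's wide layout (two years only).
df = pd.DataFrame({
    "id": [1, 2],
    "company name": ["A", "B"],
    "company type": ["x", "y"],
    "Year 1 Males Total": [10, 30],
    "Year 1 Females Total": [11, 31],
    "Year 2 Males Total": [20, 40],
    "Year 2 Females Total": [21, 41],
})
keys = ["id", "company name", "company type"]

# Melting without id_vars yields 2 * len(df) rows indexed 0..3, so a
# side-by-side concat leaves rows 2 and 3 without any id value.
bad = pd.concat(
    [df[keys], df.melt(value_vars=["Year 1 Males Total", "Year 1 Females Total"])],
    axis=1,
)
print(bad["id"].isna().sum())  # → 2

# Melt all year/gender columns at once, keeping the identifier columns.
long = df.melt(id_vars=keys, var_name="column", value_name="value")
# Split e.g. "Year 1 Males Total" into year "1" and Gender "Males".
parts = long["column"].str.extract(r"Year (?P<year>\d+) (?P<Gender>Males|Females)")
long = long.join(parts)
# One row per company and gender, one column per year.
out = (
    long.set_index(keys + ["Gender", "year"])["value"]
    .unstack("year")
    .add_prefix("Year ")
    .reset_index()
    .rename_axis(None, axis=1)
)
print(out)
```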
Answer:
jezrael
3/23/2023
Use:
import pandas as pd

# move the identifier columns into a MultiIndex on the rows
df1 = df.set_index(['id', 'company name', 'company type'])
# build a MultiIndex on the columns from the extracted year and Males/Females substrings
df1.columns = pd.MultiIndex.from_frame(df1.columns.str.extract(r'(\d+)\s+(Males|Females)'))
# stack the Gender level into the rows, leaving one column per year
df1 = df1.rename_axis([None, 'Gender'], axis=1).stack().add_prefix('Year ').reset_index()
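Below is a self-contained run of the answer's approach on hypothetical sample data. The `filter(regex=...)` line is an addition, not part of the answer: it assumes the frame also contains `Year i Total` columns (as the question's `drop` call suggests), which would not match the extract pattern and would otherwise produce NaN column labels.

```python
import pandas as pd

# Hypothetical sample data; "Year N Total" mimics the extra total columns
# that the question's attempt drops.
df = pd.DataFrame({
    "id": [1, 2],
    "company name": ["A", "B"],
    "company type": ["x", "y"],
    "Year 1 Males Total": [10, 30],
    "Year 1 Females Total": [11, 31],
    "Year 1 Total": [21, 61],
    "Year 2 Males Total": [20, 40],
    "Year 2 Females Total": [21, 41],
    "Year 2 Total": [41, 81],
})

df1 = df.set_index(["id", "company name", "company type"])
# Assumption: keep only the per-gender columns so every label matches the regex.
df1 = df1.filter(regex=r"(Males|Females) Total$")
# Two column levels: the year number and Males/Females.
df1.columns = pd.MultiIndex.from_frame(df1.columns.str.extract(r"(\d+)\s+(Males|Females)"))
# stack() moves the Gender level into the rows; the years stay as columns.
out = df1.rename_axis([None, "Gender"], axis=1).stack().add_prefix("Year ").reset_index()
print(out)
```

On pandas 2.1 and later, `stack()` may emit a FutureWarning about its new implementation; passing `future_stack=True` opts in to the new behavior.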