How to melt multiple column pairs in a Pandas DataFrame, where each pair holds the male and female values for one year

Asked by: Chris White Asked: 3/23/2023 Updated: 3/23/2023 Views: 66

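Q:

I have a DataFrame with columns such as "id", "company name", and "company type", as well as columns such as "Year 1 Males Total", "Year 1 Females Total", "Year 2 Males Total", "Year 2 Females Total", and so on. How can I convert this wide format to long format, keeping the existing columns "id", "company name", and "company type" but adding the columns "Year 1", "Year 2", and so on plus "Gender", so that each original row is split into two rows, one for males and one for females?

I tried the following. It melts the years correctly, but all the other columns are NaN in every row: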

df_copy = df.copy()
for i in range(1,12):
    df_copy = pd.concat([df_copy, df.melt(value_vars=[f'Year {i} Males Total', f'Year {i} Females Total'], var_name='Gender', value_name=f'Year {i}')], axis=1)
    df_copy.drop(columns=[f'Year {i} Males Total',f'Year {i} Females Total', f'Year {i} Total'],axis=1,inplace=True)
python-3.x pandas dataframe manipulation

Comments

0 votes sammywemmy 3/23/2023
Please share sample data with the expected output.

A:

0 votes jezrael 3/23/2023 #1

Use:

# move the identifier columns into the index so they are carried through unchanged
df1 = df.set_index(['id','company name','company type'])
# build a column MultiIndex from the year number and the Males/Females substring
df1.columns = pd.MultiIndex.from_frame(df1.columns.str.extract(r'(\d+)\s+(Males|Females)'))
# stack the Gender level into rows, leaving one 'Year N' column per year
df1 = df1.rename_axis([None, 'Gender'], axis=1).stack().add_prefix('Year ').reset_index()
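
Since the question did not include sample data, here is a minimal sketch of the same approach on a small made-up frame. The company names and numbers are assumptions for illustration only, and the intermediate name `parts` and the result name `long_df` are not from the original answer. If the real frame also contains 'Year N Total' columns, as the drop call in the question suggests, they would not match the regex and would probably need to be dropped before this step.

```python
import pandas as pd

# Hypothetical sample data (not from the question): two companies, two years.
df = pd.DataFrame({
    "id": [1, 2],
    "company name": ["Acme", "Globex"],
    "company type": ["Private", "Public"],
    "Year 1 Males Total": [10, 30],
    "Year 1 Females Total": [12, 28],
    "Year 2 Males Total": [11, 33],
    "Year 2 Females Total": [14, 29],
})

# Move the identifier columns into the index so the reshape leaves them alone.
df1 = df.set_index(["id", "company name", "company type"])

# Split each remaining header into (year number, gender).
parts = df1.columns.str.extract(r"(\d+)\s+(Males|Females)")
df1.columns = pd.MultiIndex.from_frame(parts)

# Stack the Gender level into rows; the year level stays as columns.
long_df = (
    df1.rename_axis([None, "Gender"], axis=1)
       .stack()
       .add_prefix("Year ")
       .reset_index()
)
print(long_df)
```

Each original row becomes two rows, one per gender, with the columns "id", "company name", "company type", "Gender", "Year 1", and "Year 2". The order of the Males and Females rows may differ between pandas versions, so sort explicitly if the order matters.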