提问人:aj95 提问时间:2/24/2022 更新时间:2/24/2022 访问量:922
如何在python中将带有行和列的矩阵数据转换为单列数据框
How to convert matrix data with rows and column to a single column data frame in python
问:
我有一个数据集,其年份与月份值如下
JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
2004 1.9 1.7 1.7 2.3 3.1 3.3 3 2.7 2.5 3.2 3.5 3.3
2005 3 3 3.1 3.5 2.8 2.5 3.2 3.6 4.7 4.3 3.5 3.4
2006 4 3.6 3.4 3.5 4.2 4.3 4.1 3.8 2.1 1.3 2 2.5
2007 2.1 2.4 2.8 2.6 2.7 2.7 2.4 2 2.8 3.5 4.3 4.1
2008 4.3 4 4 3.9 4.2 5 5.6 5.4 4.9 3.7 1.1 0.1
我想使用 Python / Pandas 将其转换为如下所示:
Date Value
Jan-04 1.9
Feb-04 1.7
Mar-04 1.7
Apr-04 2.3
May-04 3.1
Jun-04 3.3
Jul-04 3
Aug-04 2.7
Sep-04 2.5
Oct-04 3.2
Nov-04 3.5
Dec-04 3.3
Jan-05 3
Feb-05 3
Mar-05 3.1
Apr-05 3.5
May-05 2.8
Jun-05 2.5
Jul-05 3.2
Aug-05 3.6
Sep-05 4.7
Oct-05 4.3
Nov-05 3.5
Dec-05 3.4
这怎么能做到?
答:
0赞
jezrael
2/24/2022
#1
使用 DataFrame.stack
进行重塑,然后如果可能,将每秒的最后 2 个值与月份名称联接:year
df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = df.pop('level_1') + '-' +df['date'].astype(str).str[2:]
print (df.head())
date Value
0 JAN-04 1.9
1 FEB-04 1.7
2 MAR-04 1.7
3 APR-04 2.3
4 MAY-04 3.1
或者转换为日期时间:
df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = pd.to_datetime(df.pop('level_1') + df['date'].astype(str) , format='%b%Y')
print (df.head())
date Value
0 2004-01-01 1.9
1 2004-02-01 1.7
2 2004-03-01 1.7
3 2004-04-01 2.3
4 2004-05-01 3.1
df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = pd.to_datetime(df.pop('level_1') + df['date'].astype(str) , format='%b%Y').dt.strftime('%b-%y').str.upper()
print (df.head())
date Value
0 JAN-04 1.9
1 FEB-04 1.7
2 MAR-04 1.7
3 APR-04 2.3
4 MAY-04 3.1
2赞
PrinsEdje80
2/24/2022
#2
或者使用熔体:
cols = ['JAN', 'FEB','MAR','APR','MAY','JUN','JUL','AUG','SEP','OCT','NOV','DEC']
rows=list(range(2004,2009))
#I've used random numbers instead of your values.
df = pd.DataFrame(index=rows,columns=cols,data=np.random.rand(5,12)).reset_index()
tdf = df.melt(id_vars=['index'])
tdf['d'] = pd.to_datetime(tdf['variable']+tdf['index'].astype(str), format='%b%Y')
print(tdf)
输出:
index variable value comb d
0 2004 JAN 0.963338 JAN2004 2004-01-01
1 2005 JAN 0.265815 JAN2005 2005-01-01
2 2006 JAN 0.254360 JAN2006 2006-01-01
3 2007 JAN 0.275372 JAN2007 2007-01-01
4 2008 JAN 0.042116 JAN2008 2008-01-01
柱子的清洁,我留给OP。
评论