How to convert matrix data with rows and columns to a single-column data frame in Python

Asked by: aj95 Asked: 2/24/2022 Updated: 2/24/2022 Views: 922

Q:

I have a dataset of monthly values, with one row per year and one column per month:

    JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
2004    1.9 1.7 1.7 2.3 3.1 3.3 3   2.7 2.5 3.2 3.5 3.3
2005    3   3   3.1 3.5 2.8 2.5 3.2 3.6 4.7 4.3 3.5 3.4
2006    4   3.6 3.4 3.5 4.2 4.3 4.1 3.8 2.1 1.3 2   2.5
2007    2.1 2.4 2.8 2.6 2.7 2.7 2.4 2   2.8 3.5 4.3 4.1
2008    4.3 4   4   3.9 4.2 5   5.6 5.4 4.9 3.7 1.1 0.1

I want to use Python / Pandas to convert it to the following:

Date    Value
Jan-04  1.9
Feb-04  1.7
Mar-04  1.7
Apr-04  2.3
May-04  3.1
Jun-04  3.3
Jul-04  3
Aug-04  2.7
Sep-04  2.5
Oct-04  3.2
Nov-04  3.5
Dec-04  3.3
Jan-05  3
Feb-05  3
Mar-05  3.1
Apr-05  3.5
May-05  2.8
Jun-05  2.5
Jul-05  3.2
Aug-05  3.6
Sep-05  4.7
Oct-05  4.3
Nov-05  3.5
Dec-05  3.4

How can this be done?

Tags: python, pandas, dataframe, matrix, data-manipulation

A:

0 votes jezrael 2/24/2022 #1

Use DataFrame.stack to reshape, then join the month names with the last two digits of each year (this assumes the years are the index of df):

df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = df.pop('level_1') + '-' + df['date'].astype(str).str[2:]
print(df.head())
     date  Value
0  JAN-04    1.9
1  FEB-04    1.7
2  MAR-04    1.7
3  APR-04    2.3
4  MAY-04    3.1
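
The snippet above assumes `df` already exists with the years as its index. A minimal self-contained sketch of the same stack approach follows, using the first two rows of the question's data. The names `wide`, `tidy`, `year` and `month` are illustrative and not part of the original answer, and `str.capitalize()` is added only to match the `Jan-04` casing the question asks for.

```python
import pandas as pd

months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# First two rows of the question's data, with the years as the index.
wide = pd.DataFrame(
    [[1.9, 1.7, 1.7, 2.3, 3.1, 3.3, 3.0, 2.7, 2.5, 3.2, 3.5, 3.3],
     [3.0, 3.0, 3.1, 3.5, 2.8, 2.5, 3.2, 3.6, 4.7, 4.3, 3.5, 3.4]],
    index=[2004, 2005],
    columns=months,
)

# Naming both axes gives reset_index readable column names
# instead of the default 'level_1'.
tidy = (
    wide.rename_axis(index="year", columns="month")
        .stack()
        .reset_index(name="Value")
)

# Build labels such as 'Jan-04' from the month name and the last two year digits.
tidy["Date"] = tidy["month"].str.capitalize() + "-" + tidy["year"].astype(str).str[-2:]
tidy = tidy[["Date", "Value"]]
print(tidy.head())
```

Because `stack` walks the frame row by row, the result is already in chronological order (all of 2004, then all of 2005), so no sorting is needed here.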

Or convert to datetimes:

df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = pd.to_datetime(df.pop('level_1') + df['date'].astype(str), format='%b%Y')
print(df.head())
        date  Value
0 2004-01-01    1.9
1 2004-02-01    1.7
2 2004-03-01    1.7
3 2004-04-01    2.3
4 2004-05-01    3.1

Or parse to datetimes first and then format them back into MON-YY strings:

df = df.rename_axis('date').stack().reset_index(name='Value')
df['date'] = (pd.to_datetime(df.pop('level_1') + df['date'].astype(str), format='%b%Y')
                .dt.strftime('%b-%y').str.upper())
print(df.head())
     date  Value
0  JAN-04    1.9
1  FEB-04    1.7
2  MAR-04    1.7
3  APR-04    2.3
4  MAY-04    3.1

2 votes PrinsEdje80 2/24/2022 #2

Or use melt:

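import numpy as np
import pandas as pd

cols = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
rows = list(range(2004, 2009))
# Random numbers stand in for your values.
df = pd.DataFrame(index=rows, columns=cols, data=np.random.rand(5, 12)).reset_index()
# One row per (year, month) pair; the month names land in 'variable'.
tdf = df.melt(id_vars=['index'])
# Month name + year, e.g. 'JAN2004', then parse it as a date.
tdf['comb'] = tdf['variable'] + tdf['index'].astype(str)
tdf['d'] = pd.to_datetime(tdf['comb'], format='%b%Y')
print(tdf.head())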

Output:

    index variable     value     comb          d
0    2004      JAN  0.963338  JAN2004 2004-01-01
1    2005      JAN  0.265815  JAN2005 2005-01-01
2    2006      JAN  0.254360  JAN2006 2006-01-01
3    2007      JAN  0.275372  JAN2007 2007-01-01
4    2008      JAN  0.042116  JAN2008 2008-01-01

I'll leave cleaning up the columns to the OP.
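
One possible version of that cleanup is sketched below, again on the first two rows of the question's data. It is not part of the original answer: the names `wide`, `result`, `year` and `month` are illustrative. Note that `melt` works column by column (every January first, then every February), so the rows are sorted by date before the labels are built. `strftime('%b-%y')` depends on the locale and is assumed here to produce English month abbreviations.

```python
import pandas as pd

months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# First two rows of the question's data, with the years as the index.
wide = pd.DataFrame(
    [[1.9, 1.7, 1.7, 2.3, 3.1, 3.3, 3.0, 2.7, 2.5, 3.2, 3.5, 3.3],
     [3.0, 3.0, 3.1, 3.5, 2.8, 2.5, 3.2, 3.6, 4.7, 4.3, 3.5, 3.4]],
    index=[2004, 2005],
    columns=months,
)

# Move the years into a column, then melt the month columns into rows.
tdf = (
    wide.rename_axis("year")
        .reset_index()
        .melt(id_vars="year", var_name="month", value_name="Value")
)

# Parse strings such as 'JAN2004' into timestamps and sort chronologically.
tdf["d"] = pd.to_datetime(tdf["month"] + tdf["year"].astype(str), format="%b%Y")
tdf = tdf.sort_values("d", ignore_index=True)

# Format as 'Jan-04' (assumes an English locale) and keep only the two target columns.
tdf["Date"] = tdf["d"].dt.strftime("%b-%y")
result = tdf[["Date", "Value"]]
print(result.head())
```

Keeping the intermediate datetime column `d` until after the sort is the design choice that matters here: sorting on the formatted strings would order the rows alphabetically rather than by date.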