Reshaping a pandas DataFrame by water-year months

Asked by: Michi  Asked: 11/3/2023  Last edited by: Michi  Modified: 11/3/2023  Views: 45

Question:

The main goal is to convert the following data into water-year format:

import pandas as pd
import numpy as np

index = pd.date_range('1999-1-1', periods=1075, freq='D')
df = pd.DataFrame(np.random.randn(1075), index=index, columns=["values"])

df.groupby(by=[df.index.year, df.index.month]).sum()

Output:

            values
1999  1   1.851986
      2   6.919213
      3   1.303098
      4  -2.473795
      5  -3.619660
      6  -3.144469
      7   3.911351
      8  -1.368267
      9  -6.119913
     10  -0.497942
     11  -5.029329
     12   5.557823
2000  1  -0.919850
      2  -2.660170
      3 -14.915212
      4  -7.693769
      5   3.676597
      6   4.019901
      7   2.496312
      8  -4.616666
      9   6.824058
     10   3.076758
     11  -2.330824
     12  -8.058456
2001  1  -7.555463
      2  -3.959312
      3  -2.788833
      4   4.420411
      5   9.289880
      6  -3.070722
      7   5.145892
      8   5.153723
      9  10.993538
     10  -8.572401
     11   1.139847
     12  -4.097670

I tried reshaping my DataFrame with the code below:

df['values'].groupby([df.index.year, df.index.strftime('%b')], sort=False).sum() \
    .unstack()

That gives me the following, but it is not what I need, or at least it is not in water-year format:

       Jan   Feb    Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1999   1.8   6.9    1.3  -2.4  -3.6  -3.1   3.9  -1.3  -6.1  -0.4  -5.0   5.5
2000  -0.9  -2.6  -14.9  -7.6   3.6   4.0   2.4  -4.6   6.8   3.0  -2.3  -8.0
2001  -7.5  -3.9   -2.7   4.4   9.2  -3.0   5.1   5.1  10.9  -8.5   1.1  -4.0

It should look like this, with each water year running from August to July:

            Aug   Sep   Oct   Nov   Dec   Jan   Feb    Mar   Apr   May   Jun   Jul
1999/2000  -1.3  -6.1  -0.4  -5.0   5.5  -0.9  -2.6  -14.9  -7.6   3.6   4.0   2.4
2000/2001  -4.6   6.8   3.0  -2.3  -8.0  -7.5  -3.9   -2.7   4.4   9.2  -3.0   5.1
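The target layout implies a water year that starts in August and is labelled by its start and end years. A minimal sketch of that labelling rule, assuming an August 1 start (inferred from the column order above, not stated outright in the question), applied to a few hand-picked dates:

```python
import pandas as pd

# Hand-picked dates around the assumed August 1 boundary
idx = pd.to_datetime(["1999-07-31", "1999-08-01", "2000-01-15", "2000-07-31", "2000-08-01"])

# Assumption: Aug-Dec start a new water year; Jan-Jul belong to the water
# year that began in the previous calendar year.
start_year = idx.year.where(idx.month >= 8, idx.year - 1)

# Label each water year as "start/end", e.g. "1999/2000"
labels = start_year.map(lambda y: f"{y}/{y + 1}")
print(list(labels))
# ['1998/1999', '1999/2000', '1999/2000', '1999/2000', '2000/2001']
```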

Is there a quick pandas way to get the correct format?

python pandas time-series data-science

Answers:

1 upvote  Suraj Shourie  11/3/2023  #1

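You're on the right track; I think you just need to group by a "water_year" column. I used the logic mentioned here to generate a water-year column and then grouped by that column: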

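# Index.where keeps the calendar year where month < 10 (Jan-Sep) and uses
# year + 1 elsewhere (Oct-Dec): an October-to-September water year
# labelled by the calendar year in which it ends
df['water_year'] = df.index.year.where(df.index.month < 10, df.index.year + 1)
# Sum every (water year, month) pair, then pivot the months into columns
df['values'].groupby([df['water_year'], df.index.strftime('%b')], sort=False).sum() \
    .unstack()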

Output:

                Jan       Feb       Mar       Apr       May        Jun  \
water_year                                                                
1999       -0.543094 -1.331629 -1.026292  5.175553 -5.679639 -12.595839   
2000       -5.404292  1.698754 -9.035793 -5.871088 -5.226419  -8.697871   
2001       -7.719539  3.881611 -6.575974 -3.199067 -1.498846  -5.207359   
2002             NaN       NaN       NaN       NaN       NaN        NaN   

                 Jul        Aug       Sep       Oct        Nov       Dec  
water_year                                                                
1999       -6.846494  -3.857698 -1.074428       NaN        NaN       NaN  
2000        3.038337  -4.461454  2.503393 -5.718663  10.825165  6.027865  
2001       -1.062329  10.221003 -4.396054 -6.107847  -1.656854 -3.305165  
2002             NaN        NaN       NaN  3.967753   1.760093  2.591576  
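The mask `df.index.month < 10` implements the October-to-September convention, so the rows above do not line up with the August-to-July "1999/2000" layout asked for. A sketch of the same groupby-and-unstack approach adapted to that layout, assuming an August 1 start and "start/end" labels; the month reordering and the `dropna()` step mirror what the second answer does:

```python
import calendar

import numpy as np
import pandas as pd

# Same sample data as the question (random, so the numbers will differ)
index = pd.date_range("1999-1-1", periods=1075, freq="D")
df = pd.DataFrame(np.random.randn(1075), index=index, columns=["values"])

# Assumption: water year runs August..July and is labelled "start/end"
start_year = df.index.year.where(df.index.month >= 8, df.index.year - 1)
water_year = start_year.map(lambda y: f"{y}/{y + 1}")

# Sum per (water year, month number), then pivot month numbers into columns
out = df["values"].groupby([water_year, df.index.month]).sum().unstack()

# August first, then rename 8 -> "Aug", 9 -> "Sep", ...
month_order = list(range(8, 13)) + list(range(1, 8))
out = out[month_order]
out.columns = [calendar.month_abbr[m] for m in out.columns]

# Drop water years with at least one month absent from the data
out = out.dropna()
print(out)
```

`dropna()` only catches months that are missing entirely. A month that is only partly covered (December 2001 here, since the 1075 days end on 2001-12-10) would still be summed if its water year were otherwise complete.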

Comments

0 upvotes  Michi  11/3/2023
Yes! I was close, but still so far, haha

0 upvotes  Andrej Kesely  11/3/2023  #2

Another solution:

import calendar

import numpy as np
import pandas as pd

index = pd.date_range("1999-1-1", periods=1075, freq="D")
df = pd.DataFrame(np.random.randn(1075), index=index, columns=["values"])

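# Monthly sums; reset_index() turns the unnamed (year, month) index
# levels into columns called level_0 and level_1
x = df.groupby(by=[df.index.year, df.index.month]).sum().reset_index()

# Every August (month 8) starts a new block; label each block with the
# distinct years it spans, e.g. "1999/2000"
x["idx"] = x.groupby((x.level_1 == 8).cumsum())["level_0"].transform(
    lambda years: "/".join(map(str, years.unique()))
)
# One row per water year, one column per month; dropna() removes the
# partial blocks at the start and end of the data
out = (
    x.pivot(index="idx", columns="level_1", values="values")
    .dropna()
    .rename_axis(index=None, columns=None)
)
# Put Aug..Dec (8-12) before Jan..Jul (1-7), then use month abbreviations
out = out[sorted(out.columns, key=lambda c: c + 100 if c < 8 else c)]
out.columns = [calendar.month_abbr[c] for c in out.columns]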

print(out)

Prints:

                Aug       Sep       Oct       Nov       Dec        Jan       Feb       Mar       Apr        May       Jun       Jul
1999/2000 -0.456728 -2.596009 -1.914507  3.466804  5.765993  -0.345997 -5.835962 -3.768793 -0.974437   1.446318 -0.759709 -8.616992
2000/2001 -6.645369  7.700233 -3.406501 -7.932338 -1.441951  12.786219 -3.285047  4.248312 -2.681908  11.335019  2.811957  1.832149
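The grouping key is the least obvious step in this answer. A small sketch on a hand-built, hypothetical two-year table (monthly values omitted) shows how `(x.level_1 == 8).cumsum()` splits the months into August-to-July blocks and what labels the `transform` produces:

```python
import pandas as pd

# Hypothetical table shaped like the reset_index() result above:
# level_0 = year, level_1 = month number
x = pd.DataFrame({
    "level_0": [1999] * 12 + [2000] * 12,
    "level_1": list(range(1, 13)) * 2,
})

# The counter increments at every August, so each id marks one Aug..Jul
# block (id 0 is the partial block before the first August)
group_id = (x["level_1"] == 8).cumsum()

# Join the distinct years in each block into a label like "1999/2000"
x["idx"] = x.groupby(group_id)["level_0"].transform(
    lambda years: "/".join(map(str, years.unique()))
)
print(x.drop_duplicates("idx"))
```

The partial blocks at either end get single-year labels ("1999" and "2000" in this toy table). In the answer, those rows have missing months after the pivot, which is why `.dropna()` removes them.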

Comments

1 upvote  Michi  11/3/2023
Thank you so much. This is exactly what I wanted to do; I just didn't know how to put it into code. :^)