Asked by: Michi | Asked: 11/3/2023 | Last edited by: Michi | Modified: 11/3/2023 | Views: 45
Reshaping Pandas dataframe by water year's months
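Q:
The main goal is to convert the following data into water-year format, where each water year runs from August through July: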
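import pandas as pd
import numpy as np

# 1075 days of synthetic daily data starting 1999-01-01
index = pd.date_range('1999-1-1', periods=1075, freq='D')
df = pd.DataFrame(np.random.randn(1075), index=index, columns=["values"])

# Monthly sums, indexed by (calendar year, month)
monthly = df.groupby(by=[df.index.year, df.index.month]).sum()
print(monthly)
Output (the data is random, so your numbers will differ):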
            values
1999 1    1.851986
     2    6.919213
     3    1.303098
     4   -2.473795
     5   -3.619660
     6   -3.144469
     7    3.911351
     8   -1.368267
     9   -6.119913
     10  -0.497942
     11  -5.029329
     12   5.557823
2000 1   -0.919850
     2   -2.660170
     3  -14.915212
     4   -7.693769
     5    3.676597
     6    4.019901
     7    2.496312
     8   -4.616666
     9    6.824058
     10   3.076758
     11  -2.330824
     12  -8.058456
2001 1   -7.555463
     2   -3.959312
     3   -2.788833
     4    4.420411
     5    9.289880
     6   -3.070722
     7    5.145892
     8    5.153723
     9   10.993538
     10  -8.572401
     11   1.139847
     12  -4.097670
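It may help to pin down exactly what the sample covers: 1075 daily periods starting on 1999-01-01 end on 2001-12-10. December 2001 is therefore only partly filled, and with an August start neither the first nor the last water year in the data is complete. A quick check:

```python
import pandas as pd

# Same date range as in the question
index = pd.date_range("1999-1-1", periods=1075, freq="D")
print(index[0].date(), index[-1].date())   # 1999-01-01 2001-12-10
```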
I tried the following code to reshape my DataFrame:
df['values'].groupby([df.index.year, df.index.strftime('%b')], sort=False).sum() \
.unstack()
This is what I get, but it is not what I need, or at least it is not in water-year format:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1999 1.8 6.9 1.3 -2.4 -3.6 -3.1 3.9 -1.3 -6.1 -0.4 -5.0 5.5
2000 -0.9 -2.6 -14.9 -7.6 3.6 4.0 2.4 -4.6 6.8 3.0 -2.3 -8.0
2001 -7.5 -3.9 -2.7 4.4 9.2 -3.0 5.1 5.1 10.9 -5.2 1.1 -4.0
It should look like this:
Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul
1999/2000 -1.3 -6.1 -0.4 -5.0 5.5 -0.9 -2.6 -14.9 -7.6 3.6 4.0 2.4
2000/2001 -4.6 6.8 3.0 -2.3 -8.0 -7.5 -3.9 -2.7 4.4 9.2 -3.0 5.1
Is there a quick pandas way to get this format?
A:
1 vote
Suraj Shourie
11/3/2023
#1
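You're on the right track; I think you just need to group by a "water_year" column. I used the logic mentioned here to generate the water-year column (a water year starting in October, so October through December count toward the following year) and then grouped by that column: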
# Oct-Dec are assigned to the following water year (October start)
df['water_year'] = df.index.year.where(df.index.month < 10, df.index.year + 1)
df['values'].groupby([df['water_year'], df.index.strftime('%b')], sort=False).sum() \
    .unstack()
Output:
Jan Feb Mar Apr May Jun \
water_year
1999 -0.543094 -1.331629 -1.026292 5.175553 -5.679639 -12.595839
2000 -5.404292 1.698754 -9.035793 -5.871088 -5.226419 -8.697871
2001 -7.719539 3.881611 -6.575974 -3.199067 -1.498846 -5.207359
2002 NaN NaN NaN NaN NaN NaN
Jul Aug Sep Oct Nov Dec
water_year
1999 -6.846494 -3.857698 -1.074428 NaN NaN NaN
2000 3.038337 -4.461454 2.503393 -5.718663 10.825165 6.027865
2001 -1.062329 10.221003 -4.396054 -6.107847 -1.656854 -3.305165
2002 NaN NaN NaN 3.967753 1.760093 2.591576
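This answer uses an October-start water year, while the question asks for August through July with "1999/2000"-style row labels. A minimal sketch adapting the same groupby/unstack idea to that convention follows; it assumes an English locale (so `%b` yields "Jan", "Feb", ...), and the fixed random seed is there only to make runs repeatable:

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question; the seed only makes runs repeatable
index = pd.date_range("1999-1-1", periods=1075, freq="D")
df = pd.DataFrame(np.random.default_rng(0).standard_normal(1075),
                  index=index, columns=["values"])

# Assumption: the water year starts in August, as in the desired output.
# Aug-Dec belong to the water year starting in that calendar year,
# Jan-Jul to the one that started in the previous year.
start = np.where(df.index.month >= 8, df.index.year, df.index.year - 1)
df["water_year"] = [f"{y}/{y + 1}" for y in start]
df["month"] = df.index.strftime("%b")   # locale-dependent month abbreviation

table = df.groupby(["water_year", "month"])["values"].sum().unstack()

# unstack() leaves the month names sorted alphabetically, so reorder them
# explicitly, then drop the incomplete water years at either end of the data
order = ["Aug", "Sep", "Oct", "Nov", "Dec",
         "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul"]
table = table[order].dropna()
print(table.round(1))
```

Computing the start year with `np.where` keeps the labelling explicit; the explicit `order` list is what produces the Aug ... Jul column layout, since grouping alone does not.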
Comments:
0 votes
Michi
11/3/2023
Yes! I was close, but still far off, haha.
0 votes
Andrej Kesely
11/3/2023
#2
Another solution:
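import calendar
import numpy as np
import pandas as pd

index = pd.date_range("1999-1-1", periods=1075, freq="D")
df = pd.DataFrame(np.random.randn(1075), index=index, columns=["values"])

# Monthly sums; reset_index() exposes the unnamed year/month levels
# as the columns level_0 and level_1
x = df.groupby(by=[df.index.year, df.index.month]).sum().reset_index()

# Start a new group at every August, then label each group with the
# distinct years it spans, e.g. "1999/2000"
x["idx"] = x.groupby((x.level_1 == 8).cumsum())["level_0"].transform(
    lambda years: "/".join(map(str, years.unique()))
)

# One row per water year, one column per month; dropna() removes the
# incomplete water years at the start and end of the data
out = (
    x.pivot(index="idx", columns="level_1", values="values")
    .dropna()
    .rename_axis(index=None, columns=None)
)

# Order the columns Aug..Dec, then Jan..Jul, and use month abbreviations
out = out[sorted(out.columns, key=lambda c: c + 100 if c < 8 else c)]
out.columns = [calendar.month_abbr[c] for c in out.columns]
print(out)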
Prints:
Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul
1999/2000 -0.456728 -2.596009 -1.914507 3.466804 5.765993 -0.345997 -5.835962 -3.768793 -0.974437 1.446318 -0.759709 -8.616992
2000/2001 -6.645369 7.700233 -3.406501 -7.932338 -1.441951 12.786219 -3.285047 4.248312 -2.681908 11.335019 2.811957 1.832149
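The key step in this answer is `(x.level_1 == 8).cumsum()`: the running count goes up by one at every August row, so each August-to-July run of months shares one group id, and `transform` then joins the distinct years found in that group. A small illustration on a few hypothetical year/month rows, reusing the answer's `level_0`/`level_1` column names:

```python
import pandas as pd

# Hypothetical monthly rows spanning one August boundary
x = pd.DataFrame({
    "level_0": [2000, 2000, 2000, 2000, 2001],   # calendar year
    "level_1": [6, 7, 8, 12, 1],                 # month
})

# Running count of Augusts: a new group starts at each August
group_id = (x["level_1"] == 8).cumsum()
print(group_id.tolist())   # [0, 0, 1, 1, 1]

# Label every row with the distinct years in its group
labels = x.groupby(group_id)["level_0"].transform(
    lambda years: "/".join(map(str, years.unique()))
)
print(labels.tolist())     # ['2000', '2000', '2000/2001', '2000/2001', '2000/2001']
```

Because the label is built from the years actually present, the partial water years at the edges of the question's data get single-year labels ("1999" and "2001"); they also have missing months after the pivot, which is why `.dropna()` removes them.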
Comments:
1 vote
Michi
11/3/2023
Thank you very much. This is exactly what I wanted to do; I just didn't know how to write the code. :^)