Asked by: Student · Asked: 8/2/2022 · Last edited by: petezurich, Student · Modified: 8/4/2022 · Viewed: 73 times
Get minimum date of one group from dataframe A using .groupby() and replace the date in dataframe B where conditions are met
Question:
I have two DataFrames, as follows:
import pandas as pd
df_A = pd.DataFrame({'Date': ['1/1/2016', '1/2/2016', '1/3/2016', '1/4/2016', '1/5/2016', '1/6/2016', '1/7/2016', '1/8/2016', '1/9/2016', '1/10/2016', '1/11/2016', '1/12/2016', '1/13/2016', '1/14/2016', '1/15/2016', '1/16/2016', '1/17/2016', '1/18/2016', '1/19/2016', '1/20/2016', '1/21/2016', '1/22/2016', '1/23/2016', '1/24/2016', '1/25/2016', '1/26/2016', '1/27/2016', '1/28/2016', '1/29/2016', '1/30/2016', '1/31/2016', '2/1/2016', '2/2/2016', '2/3/2016', '2/4/2016', '2/5/2016', '2/6/2016', '2/7/2016'],
'445_Week': [20160101, 20160101, 20160101, 20160101, 20160101, 20160101, 20160101, 20160101, 20160102, 20160102, 20160102, 20160102, 20160102, 20160102, 20160102, 20160103, 20160103, 20160103, 20160103, 20160103, 20160103, 20160103, 20160103, 20160104, 20160104, 20160104, 20160104, 20160104, 20160104, 20160104, 20160104, 20160201, 20160201, 20160201, 20160201, 20160201, 20160201, 20160201],
'Week': ['1','1','1','1','1','1','1','2','2','2','2','2','2','2','3','3','3','3','3','3','3','3','4','4','4','4','4','4','4','4','1','1','1','1','1','1','1','1',],
'Sales': ['10', '15', '20', '15','10','20', '10','15', '10', '15','20', '15','10','20', '10','15','10', '15', '20', '15','10','20', '10','15', '10', '15','20', '15','10','20', '10','15', '10','15', '20', '15','10','20']})
df_B = pd.DataFrame({'Date': ['1/1/2016','1/2/2016', '1/3/2016', '1/4/2016','2/1/2016'],
'445_Week': [20160101, 20160102, 20160103, 20160104, 20160201],
'Week': ['1', '2', '3', '4', '5'],
'Sales': ['10','15', '20', '15', '10']})
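I'm using a 4-4-5 calendar, as shown in the '445_Week' column above. My goal is to replace the 'Date' column of df_B, which is in "m/w/yyyy" (month/week/year) format, with the correct date as shown in df_A. I want to do this by taking the minimum date of each '445_Week' group in df_A. This is the desired final result: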
df_C = pd.DataFrame({'Date': ['1/1/2016','1/9/2016', '1/16/2016', '1/24/2016','2/1/2016'],
'445_Week': [20160101, 20160102, 20160103, 20160104, 20160201],
'Week': ['1', '2', '3', '4', '5'],
'Sales': ['114.375','14.285', '14.375', '14.375', '15']})
Note that the Sales column of the final DataFrame is simply the mean of each group's values.
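A minimal sketch of that per-group mean, using only the first two 445_Week groups of df_A for brevity. Because Sales is stored as strings in the question, it is converted with pd.to_numeric before averaging; the subset and the names df and means are illustrative only.

```python
import pandas as pd

# First two 445_Week groups from df_A; Sales is text, as in the question
df = pd.DataFrame({
    '445_Week': [20160101] * 8 + [20160102] * 7,
    'Sales': ['10', '15', '20', '15', '10', '20', '10', '15',
              '10', '15', '20', '15', '10', '20', '10'],
})

# Convert text to numbers first; averaging the raw strings does not work
df['Sales'] = pd.to_numeric(df['Sales'])

# Mean sales per 4-4-5 period
means = df.groupby('445_Week')['Sales'].mean()
print(means)  # 20160101 -> 14.375, 20160102 -> 14.285714...
```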
Here's what I've tried so far:
dfc = df_A.groupby('445_Week')['Date']
new_df = df_A.assign(Date = dfc.transform(min))
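This only creates a new DataFrame from df_A in which every row gets the minimum date of its 445_Week. I believe the next step is to merge the two DataFrames, but I'm not sure whether that is the right approach.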
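For reference, the merge route could look like the sketch below. It is not the answerer's code, just one possible approach under stated assumptions: it needs pandas 0.25 or later (named aggregation), it rebuilds df_A compactly with Date already parsed to datetime and Sales already numeric (the question's string columns would need pd.to_datetime and pd.to_numeric first), and it joins on 445_Week so the result does not depend on row order.

```python
import pandas as pd

# Compact rebuild of the question's df_A (Date parsed, Sales numeric)
df_A = pd.DataFrame({
    'Date': pd.date_range('2016-01-01', '2016-02-07'),
    '445_Week': [20160101] * 8 + [20160102] * 7 + [20160103] * 8
                + [20160104] * 8 + [20160201] * 7,
    'Sales': [10, 15, 20, 15, 10, 20, 10, 15, 10, 15, 20, 15, 10, 20, 10, 15,
              10, 15, 20, 15, 10, 20, 10, 15, 10, 15, 20, 15, 10, 20, 10, 15,
              10, 15, 20, 15, 10, 20],
})
df_B = pd.DataFrame({'Date': ['1/1/2016', '1/2/2016', '1/3/2016', '1/4/2016', '2/1/2016'],
                     '445_Week': [20160101, 20160102, 20160103, 20160104, 20160201],
                     'Week': ['1', '2', '3', '4', '5'],
                     'Sales': ['10', '15', '20', '15', '10']})

# One row per 445_Week: first day of the period and the mean sales
summary = (df_A.groupby('445_Week')
               .agg(Date=('Date', 'min'), Sales=('Sales', 'mean'))
               .reset_index())

# Join on the week key, replacing df_B's old Date and Sales columns
df_C = (df_B.drop(columns=['Date', 'Sales'])
            .merge(summary, on='445_Week', how='left')
            .loc[:, ['Date', '445_Week', 'Week', 'Sales']])
print(df_C)
```

A left merge keeps df_B's row order, and any 445_Week missing from df_A would show up as NaT/NaN instead of silently shifting values.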
Answer:
0 votes
inquirer
8/4/2022
#1
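You can simply overwrite the data in the columns you need and add one more step to compute the averages. I cast the column new_df['Sales'] to float and converted the column df_A['Date'] to datetime. If df_A['Date'] is not converted, the row with index 1 comes out wrong (it becomes 1/10/2016): strings are compared character by character, so '1/10/2016' sorts before '1/9/2016'. Also, your first mean is given as 114.375; it should be 14.375000.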
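To see why the conversion matters, here is a small check on the dates of group 20160102 from df_A (the variable names are only for illustration). On text, min() picks the lexicographically smallest string; after pd.to_datetime it picks the earliest calendar date.

```python
import pandas as pd

# Dates of 445_Week 20160102 in df_A, stored as text
dates = pd.Series(['1/9/2016', '1/10/2016', '1/11/2016', '1/12/2016',
                   '1/13/2016', '1/14/2016', '1/15/2016'])

# Text comparison: '1/10...' < '1/9...' because '1' < '9' at the third character
text_min = dates.min()

# Parsed comparison: real calendar dates
date_min = pd.to_datetime(dates, format='%m/%d/%Y').min()

print(text_min)  # 1/10/2016
print(date_min)  # 2016-01-09 00:00:00
```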
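import pandas as pd

# Parse the strings so that min() compares real dates, not text
df_A['Date'] = pd.to_datetime(df_A['Date'])
dfc = df_A.groupby('445_Week')['Date']
# Give every row the first (minimum) date of its 445_Week group
new_df = df_A.assign(Date=dfc.transform('min'))
new_df['Sales'] = new_df['Sales'].astype(float)
# Mean sales per group start date; the result is sorted by date
aaa = new_df.groupby('Date')['Sales'].mean()
# Positional assignment: assumes df_B's rows are in the same order as the groups
df_B['Date'] = aaa.index
df_B['Sales'] = aaa.values
print(df_B)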
Output
Date 445_Week Week Sales
0 2016-01-01 20160101 1 14.375000
1 2016-01-09 20160102 2 14.285714
2 2016-01-16 20160103 3 14.375000
3 2016-01-24 20160104 4 14.375000
4 2016-02-01 20160201 5 15.000000
If you need df_B['Date'] in its original format, you can convert it back to strings:
df_B['Date'] = df_B['Date'].dt.strftime("%-m/%-d/%Y")
Output
Date 445_Week Week Sales
0 1/1/2016 20160101 1 14.375000
1 1/9/2016 20160102 2 14.285714
2 1/16/2016 20160103 3 14.375000
3 1/24/2016 20160104 4 14.375000
4 2/1/2016 20160201 5 15.000000
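The "%-m" and "%-d" codes (no zero padding) come from the platform's C strftime; they work on Linux and macOS but are not supported on Windows. A portable sketch, assuming the column already holds datetimes, builds the same m/d/yyyy strings from the date parts (the sample dates are taken from the output above):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2016-01-01', '2016-01-09', '2016-02-01']))

# Join month, day and year as plain integers, so no zero padding appears
formatted = (dates.dt.month.astype(str) + '/'
             + dates.dt.day.astype(str) + '/'
             + dates.dt.year.astype(str))
print(formatted.tolist())  # ['1/1/2016', '1/9/2016', '2/1/2016']
```

Assigning the result back with df_B['Date'] = formatted-style code gives the same strings as the strftime call, without depending on the operating system.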