How to compute the average of an index over the x days before a date (counting only non-holiday days) and merge it into the DataFrame?

Asked by: skwolvie Asked: 11/1/2021 Updated: 11/1/2021 Views: 40

Q:

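I have a dataset that contains the traffic index of a location on a given date. For a given date, I want to compute the average of all traffic indexes over the 30 days before that date, counting only the days in that 30-day subset that are not holidays.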

I want to do this computation in Python. I have a screenshot below that visually represents my requirement.

Explanation of the screenshot

On April 1, 2019:
I want to calculate the 30-day non-holiday traffic index average
for a given location and map it to a new column with a similar column name.

The column weekend_holiday is a boolean column that is true (1) for days that are public holidays or weekends.
Such entries must be ignored when computing the average of a location's traffic index.

Link to the sample dataset: https://gist.github.com/skwolvie/f01c027de0816c28337870286ee61a9d


Please suggest Python pandas techniques to achieve this result.

Python Pandas DataFrame numpy data-manipulation


A:

3 upvotes Simone 11/1/2021 #1

You can use pandas rolling to compute a rolling mean; it accepts a window given as a length of time (for example '30D').

The following code computes the desired average for each row of the DataFrame:

import pandas as pd

# Time-based rolling needs a sorted DatetimeIndex, so parse Date and use it as the index
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date').sort_index()

# Mask weekends/holidays as NaN (the rolling mean skips NaN values),
# then average over a 30-day window; by default the window includes the current date
workday = df['weekend_or_holiday'] == 0
df['DELHI'] = df['New Delhi'].where(workday).rolling('30D').mean()
df['MUMBAI'] = df['Mumbai'].where(workday).rolling('30D').mean()

# Get back Date column
df = df.reset_index()
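
If the average should cover only the days strictly before the given date, one possible variation is to pass closed='left' to rolling, which leaves the current date out of the window. The sketch below is self-contained and runs on made-up data, not the dataset from the question: the helper name non_holiday_rolling_mean is hypothetical, the traffic values are synthetic, and only Saturdays and Sundays are flagged as holidays. The column names follow the answer above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's dataset (values are made up for illustration)
dates = pd.date_range("2019-03-01", "2019-04-01", freq="D")
df = pd.DataFrame({
    "Date": dates,
    "New Delhi": np.arange(len(dates), dtype=float),
    # Assumption: in this toy data only Saturdays/Sundays count as holidays
    "weekend_or_holiday": (dates.dayofweek >= 5).astype(int),
})

def non_holiday_rolling_mean(frame, value_col, flag_col="weekend_or_holiday", window="30D"):
    """Hypothetical helper: mean of value_col over `window` before each date, skipping flagged days."""
    indexed = frame.set_index("Date").sort_index()
    # Flagged rows become NaN, which the rolling mean ignores
    masked = indexed[value_col].where(indexed[flag_col] == 0)
    # closed="left" makes the window [t - 30 days, t), so the current date is excluded
    return masked.rolling(window, closed="left").mean()

# Compute the average and merge it back into the original DataFrame on Date
avg = non_holiday_rolling_mean(df, "New Delhi").rename("DELHI")
df = df.merge(avg.reset_index(), on="Date", how="left")
print(df.tail(3))
```

The first row has no earlier days in its window, so its DELHI value is NaN. Merging on Date keeps the original row order and columns, which may be convenient when the averages are computed separately for several locations.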