Speeding up a Python datetime comparison to generate values

Q:

I want to do this in a Pythonic way, without using 1) nested if statements or 2) iterrows.

I have these columns:

Date in | Date Out | 1/22 | 2/22 | ... | 12/22
1/1/19    5/5/22
5/5/22    7/7/22

For a column like "1/22", I want to insert a calculated value, which will be one of the following:

Not Yet Created
Closed
Open

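For the first row, column 1/22 would read "Open", because the item was open in January 2022. This continues until column 5/22, which would be marked "Closed".

For the second row, column 1/22 would read "Not Yet Created" until 5/22, which would read "Open" until 7/22, which would read "Closed".

I don't necessarily need the full table, but I would like to count how many items are Closed / Open / Not Yet Created in each month.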

This is the code I am using. It works, but it takes longer than I would like:

table={}
for i in mcLogsClose.iterrows():
    table[i[0]] = {}
    for month in pd.date_range(start='9/2021', end='9/2022', freq='M'):
        if i[1]['Notif Date'] <= month:
            if i[1]['Completion Date'] <= month:
                table[i[0]][month]="Closed"
            else:
                table[i[0]][month]="Open"
        else:
            table[i[0]][month]="Not Yet Created"

Then I would like to run table['1/22'].value_counts()

Thanks for taking a look!

python pandas datetime comparison analysis

1. Using a loop

import pandas as pd

# The date range you are calculating for
min_date = pd.Period("2022-01")
max_date = pd.Period("2022-12")
span = (max_date - min_date).n + 1

# Strip the "Date In" and "Date Out" columns down to the month
date_in = pd.to_datetime(df["Date In"]).dt.to_period("M")
date_out = pd.to_datetime(df["Date Out"]).dt.to_period("M")

data = []
for d_in, d_out in zip(date_in, date_out):
    if d_in > max_date:
        # If date in is after max date, the whole span is under "Not Created" status
        data.append((span, 0, 0))
    elif d_out < min_date:
        # If date out is before min date, the whole span is under "Closed" status
        data.append((0, span, 0))
    else:
        # Now that we have some overlap between (d_in, d_out) and (min_date,
        # max_date), we need to calculate time spent in each status
        closed = (max_date - min(d_out, max_date)).n
        not_created = (max(d_in, min_date) - min_date).n
        open_ = span - closed - not_created
        data.append((not_created, closed, open_))

cols = ["Not Created Yet", "Closed", "Open"]
df[cols] = pd.DataFrame(data, columns=cols, index=df.index)

2. Using numpy

import numpy as np
import pandas as pd

def to_n(arr: np.ndarray) -> np.ndarray:
    """Convert an array of period differences (month offsets) to an array of integers"""
    return np.array([i.n for i in arr])

# The date range you are calculating for. Since we intend to use vectorized
# code, we need to turn them into numpy arrays
min_date = np.repeat(pd.Period("2022-01"), len(df))
max_date = np.repeat(pd.Period("2022-12"), len(df))
span = to_n(max_date - min_date) + 1

date_in = pd.to_datetime(df["Date In"]).dt.to_period("M")
date_out = pd.to_datetime(df["Date Out"]).dt.to_period("M")

df["Not Created Yet"] = np.where(
    date_in > max_date,
    span,
    to_n(np.max([date_in, min_date], axis=0) - min_date),
)
df["Closed"] = np.where(
    date_out < min_date,
    span,
    to_n(max_date - np.min([date_out, max_date], axis=0)),
)
df["Open"] = span - df["Not Created Yet"] - df["Closed"]
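
The same arithmetic can probably be expressed without object arrays of pd.Period at all. The following is only a sketch of that idea, not the method above: it assumes the dates are in month/day/two-digit-year form, and the helper month_index and the names first and last are made up for illustration. Each date is reduced to an integer month number, and np.clip stands in for the min/max calls and the np.where branches. The sample rows are the five from this answer's result table.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the result table in this answer
df = pd.DataFrame({
    "Date In": ["1/1/19", "5/5/22", "1/1/20", "1/1/23", "6/6/21"],
    "Date Out": ["5/5/22", "7/7/22", "12/12/20", "6/6/23", "6/6/23"],
})

def month_index(dates: pd.Series) -> np.ndarray:
    """Hypothetical helper: turn each date into an integer month count (year * 12 + month - 1)."""
    parsed = pd.to_datetime(dates, format="%m/%d/%y")  # assumed input format
    return (parsed.dt.year * 12 + parsed.dt.month - 1).to_numpy()

first = 2022 * 12 + 0    # January 2022
last = 2022 * 12 + 11    # December 2022
span = last - first + 1  # 12 months

m_in = month_index(df["Date In"])
m_out = month_index(df["Date Out"])

# Clipping to [first, last + 1] yields 0 for rows created before the range
# and the full span for rows created after it.
not_created = np.clip(m_in, first, last + 1) - first

# Clipping to [first - 1, last] yields the full span for rows closed before
# the range and 0 for rows closed after it.
closed = last - np.clip(m_out, first - 1, last)

df["Not Created Yet"] = not_created
df["Closed"] = closed
df["Open"] = span - not_created - closed
print(df)
```

Because everything is a plain integer array, the comparisons and subtractions stay in compiled numpy code, which is likely to matter on large frames.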

Result (I added a few rows for my testing):

  Date In  Date Out  Not Created Yet  Closed  Open
0  1/1/19    5/5/22                0       7     5
1  5/5/22    7/7/22                4       5     3
2  1/1/20  12/12/20                0      12     0
3  1/1/23    6/6/23               12       0     0
4  6/6/21    6/6/23                0       0    12
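
The table above gives totals per row. The question also asks for the number of items in each status per month, and broadcasting may be one way to get that. This is a sketch under assumptions: it uses only the two rows from the question, assumes month/day/two-digit-year dates, and follows the question's rule that an item counts as "Closed" from its closing month onward. The totals above treat the closing month as still open, so the two can differ by one month for rows that close inside the range.

```python
import numpy as np
import pandas as pd

# The two sample rows from the question
df = pd.DataFrame({
    "Date In": ["1/1/19", "5/5/22"],
    "Date Out": ["5/5/22", "7/7/22"],
})

# Truncate every date to its month
m_in = pd.to_datetime(df["Date In"], format="%m/%d/%y").to_numpy().astype("datetime64[M]")
m_out = pd.to_datetime(df["Date Out"], format="%m/%d/%y").to_numpy().astype("datetime64[M]")

# The months to report on: 2022-01 through 2022-12
months = np.arange("2022-01", "2023-01", dtype="datetime64[M]")

# Boolean matrices of shape (rows, months), built by broadcasting
created = m_in[:, None] <= months
closed = m_out[:, None] <= months  # assumes Date Out is never before Date In

# Summing down the rows gives one count per month for each status
counts = pd.DataFrame(
    {
        "Not Yet Created": (~created).sum(axis=0),
        "Closed": closed.sum(axis=0),
        "Open": (created & ~closed).sum(axis=0),
    },
    index=months.astype(str),
)
print(counts)
```

Each row of counts plays the role of table['1/22'].value_counts() for that month, without building the full table of strings.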
