SQL query to aggregate data into shifts based on time of events?

Asked by: crackersNcheese; Asked: 10/27/2023; Last edited by: Charlieface, crackersNcheese; Updated: 10/31/2023; Views: 159

Question:

Updated to specify MS SQL Server.

I have a table with one row for each action a worker performs, and each action has a date/time stamp.

Action_Log

Action_ID Action_DTTM
1 04/19/2023 11:15:37
2 04/19/2023 12:16:14
3 04/19/2023 12:16:14
4 04/19/2023 14:21:03
5 04/19/2023 16:55:41
6 04/19/2023 18:10:43
7 04/19/2023 20:24:15
8 04/19/2023 22:43:29
9 04/20/2023 13:21:04
10 04/20/2023 13:55:57
11 04/20/2023 13:55:57
12 04/20/2023 16:28:11
13 04/20/2023 17:35:45
14 04/20/2023 18:48:01
15 04/20/2023 19:32:26
16 04/20/2023 20:16:39
17 04/20/2023 21:03:25

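I need a query that aggregates the actions into shifts. The table above shows one worker's actions across 2 shifts. Workers can work several different shift types (ShiftA is 11 AM to 11 PM, ShiftB is 1 PM to 9 PM), but I don't have a shift schedule telling me when a worker was scheduled for each shift type. All I have is the action log.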

Example of the kind of output I'm hoping for:

Action_ID Shift_Number Shift_Type
1 1 ShiftA
2 1 ShiftA
3 1 ShiftA
4 1 ShiftA
5 1 ShiftA
6 1 ShiftA
7 1 ShiftA
8 1 ShiftA
9 2 ShiftB
10 2 ShiftB
11 2 ShiftB
12 2 ShiftB
13 2 ShiftB
14 2 ShiftB
15 2 ShiftB
16 2 ShiftB
17 2 ShiftB

I'm not even sure this can be done in SQL. I'm also not sure my example tables are presented clearly; I can add screenshots if that's easier to read.
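A repro script for the sample data may save answerers some typing. This is a sketch under assumptions: the question doesn't show the DDL, so the column types (int and datetime) are guessed, and a temp table named #Action_Log is used so nothing permanent is created. DROP TABLE IF EXISTS needs SQL Server 2016 or later.

```sql
-- Hypothetical repro of the question's sample data; column types are assumed.
DROP TABLE IF EXISTS #Action_Log;

CREATE TABLE #Action_Log
(
    Action_ID   int      NOT NULL PRIMARY KEY,
    Action_DTTM datetime NOT NULL
);

-- ISO 8601 literals ('yyyy-mm-ddThh:mi:ss') are not affected by DATEFORMAT/language settings.
INSERT INTO #Action_Log (Action_ID, Action_DTTM)
VALUES (1,  '2023-04-19T11:15:37'),
       (2,  '2023-04-19T12:16:14'),
       (3,  '2023-04-19T12:16:14'),
       (4,  '2023-04-19T14:21:03'),
       (5,  '2023-04-19T16:55:41'),
       (6,  '2023-04-19T18:10:43'),
       (7,  '2023-04-19T20:24:15'),
       (8,  '2023-04-19T22:43:29'),
       (9,  '2023-04-20T13:21:04'),
       (10, '2023-04-20T13:55:57'),
       (11, '2023-04-20T13:55:57'),
       (12, '2023-04-20T16:28:11'),
       (13, '2023-04-20T17:35:45'),
       (14, '2023-04-20T18:48:01'),
       (15, '2023-04-20T19:32:26'),
       (16, '2023-04-20T20:16:39'),
       (17, '2023-04-20T21:03:25');

SELECT Action_ID, Action_DTTM
FROM #Action_Log
ORDER BY Action_DTTM, Action_ID;
```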

sql-server t-sql gaps-and-islands

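Comments

3 votes Heinz Siahaan 10/27/2023
Please specify your DBMS (SQL Server, MySQL, etc.). Please explain why 9 (13:21) = 'ShiftB'? 13:21 could also fall in 'ShiftA'. And add line breaks when making tables.
0 votes Adrian Maxwell 10/27/2023
Tag with the actual DBMS! If you hover over the sql tag, it says: Structured Query Language (SQL) is a language for querying databases. Questions should include code examples, table structure, sample data, and a tag for the DBMS implementation being used (e.g. MySQL, PostgreSQL, Oracle, MS SQL Server, IBM DB2, etc.). If your question relates solely to a specific DBMS (uses specific extensions/features), use that DBMS's tag instead. Answers to questions tagged with SQL should use ISO/IEC standard SQL.
1 vote Adrian Maxwell 10/27/2023
The actual DBMS matters so much because each DBMS has very different syntax for handling dates and times. That means we can't give you a useful solution, because we can't pick the right syntax to use. So, to get an answer, we really need you to choose the right tag.
1 vote shawnt00 10/27/2023
Your shifts overlap?
1 vote Clockwork-Muse 10/27/2023
If your shifts overlap, how do you want things to be displayed/accounted for?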

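Answers:

0 votes Charlieface 10/29/2023 #1

This is a classic gaps-and-islands problem.

You first need to work out where each shift starts (by looking for gaps of more than 8 hours). Then you need to spread that across the whole shift, and assign the shift type based on it.

I note that your shifts overlap, so the results may not be completely accurate.

One option is to compute a conditional running count of shift starts, which gives each shift a number, and then take the minimum Action_DTTM for each shift.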

WITH Previous AS (
    SELECT *,
      PrevDate = LAG(Action_DTTM) OVER (ORDER BY Action_DTTM)
    FROM Action_Log al
),
Grouped AS (
    SELECT *,
      Shift_Number = COUNT(CASE WHEN PrevDate > DATEADD(hour, -8, Action_DTTM) THEN NULL ELSE 1 END)
                 OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING)
    FROM Previous
),
PerGroup AS (
    SELECT *,
      FirstAction = DATEPART(hour, MIN(Action_DTTM) OVER (PARTITION BY Shift_Number))
    FROM Grouped
)
SELECT
  Action_ID,
  Shift_Number,
  Shift_Type =
    CASE WHEN FirstAction BETWEEN 11 AND 12
            THEN 'A'
         WHEN FirstAction BETWEEN 13 AND 17
            THEN 'B'
         WHEN FirstAction BETWEEN 18 AND 21
            THEN 'D'
         ELSE 'C'
    END
FROM PerGroup;
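To sanity-check the 8-hour gap threshold used above, it can help to look at the raw gap between consecutive actions first. This is a sketch, assuming the question's sample data loaded into a table variable with guessed column types; Action_ID is added to the ORDER BY only to make ties (actions 2/3 and 10/11) deterministic.

```sql
-- Sample data from the question (column types assumed).
DECLARE @Action_Log TABLE (Action_ID int NOT NULL, Action_DTTM datetime NOT NULL);
INSERT INTO @Action_Log (Action_ID, Action_DTTM)
VALUES (1,  '2023-04-19T11:15:37'), (2,  '2023-04-19T12:16:14'),
       (3,  '2023-04-19T12:16:14'), (4,  '2023-04-19T14:21:03'),
       (5,  '2023-04-19T16:55:41'), (6,  '2023-04-19T18:10:43'),
       (7,  '2023-04-19T20:24:15'), (8,  '2023-04-19T22:43:29'),
       (9,  '2023-04-20T13:21:04'), (10, '2023-04-20T13:55:57'),
       (11, '2023-04-20T13:55:57'), (12, '2023-04-20T16:28:11'),
       (13, '2023-04-20T17:35:45'), (14, '2023-04-20T18:48:01'),
       (15, '2023-04-20T19:32:26'), (16, '2023-04-20T20:16:39'),
       (17, '2023-04-20T21:03:25');

-- Minutes since the previous action (NULL for the very first row).
-- Note: DATEDIFF counts minute boundaries crossed, not exact elapsed minutes.
SELECT Action_ID,
       Action_DTTM,
       GapMinutes = DATEDIFF(minute,
                             LAG(Action_DTTM) OVER (ORDER BY Action_DTTM, Action_ID),
                             Action_DTTM)
FROM @Action_Log
ORDER BY Action_DTTM, Action_ID;
```

On the sample data every gap within a shift is at most 154 minutes, while the gap before action 9 is 878 minutes, so any threshold between roughly 3 and 14 hours separates the two shifts. Real data with long breaks inside a shift would need a different threshold.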

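A better (more efficient) option might be to copy Action_DTTM only on the row that starts each shift, then carry it forward to create the shift number and apply it to all the other rows.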

WITH Previous AS (
    SELECT *,
      ShiftStart = CASE WHEN LAG(Action_DTTM) OVER (ORDER BY Action_DTTM)
                   > DATEADD(hour, -8, Action_DTTM) THEN NULL ELSE Action_DTTM END
    FROM Action_Log al
),
Grouped AS (
    SELECT *,
      AllShiftStart =  MAX(ShiftStart) OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING),
      Shift_Number = COUNT(ShiftStart) OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING)
    FROM Previous
),
PerGroup AS (
    SELECT *,
      FirstAction = DATEPART(hour, AllShiftStart)
    FROM Grouped
)
SELECT
  Action_ID,
  Shift_Number,
  Shift_Type =
    CASE WHEN FirstAction BETWEEN 11 AND 12
            THEN 'A'
         WHEN FirstAction BETWEEN 13 AND 17
            THEN 'B'
         WHEN FirstAction BETWEEN 18 AND 21
            THEN 'D'
         ELSE 'C'
    END
FROM PerGroup;


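1 vote ILoveSQL 10/29/2023 #2

OP, action ID 17 runs 205 seconds past the end of its shift. Charlieface's answer makes assumptions (magic numbers) about the first action of a shift; my approach handles this naturally. The code has comments explaining everything. I also added to your sample data set so there is data for all 4 shifts, to prove this works. My solution also supports overlapping shifts, but it assumes only one shift is being worked at any given second. If that is not the case, there is no way to do this.

My recommendation is to set up the shift times in a user table (rather than a table variable), similar to my setup, so you can update the table easily instead of updating magic numbers in the query. You never want magic numbers in production queries. You could go a step further and add two datetime columns, date_begin and date_end, describing the date range during which a shift definition is valid. In that setup you leave date_end null until the row becomes obsolete, then add new rows as needed to describe the new shift times. The reason to do this is that if shift times change and you need to query across that change, you would otherwise have to run multiple queries to cover the different shift-time changes, or use an updated shift table that supports all of them.

Unfortunately the result set is large (many columns showing supporting data, 34 rows deep), and I won't post pictures unless asked. I also don't use db fiddle. I believe some people here are closely associated with whoever owns it and drive traffic to it.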

declare @shift_times table
(
    shift_number tinyint not null,
    shift_type varchar(10) not null,
    time_start time not null,
    time_end time not null
)

declare @action_log table
(
    action_id int not null,
    action_dttm datetime not null
)

-- it is nice to have the action_dttm as a date as well for comparison
-- we use this table in place of @action_log after populating
declare @action_log_expanded table
(
    action_id int not null,
    action_dttm datetime not null,
    action_date date not null
)

-- this holds the min and max action dttm by day (the x means by, like this by that)
declare @min_max_dttm_x_date table
(
    action_date date not null,
    min_dttm_x_day datetime not null,
    max_dttm_x_day datetime not null
)

-- this table holds data guaranteed to match the shift to the STARTING day worked
-- based on comparing shift start time to the first day's transaction (look at the top 1 order by clause)
-- so if a shift went over midnight, this table will hold its action date as 1 day less than the action log dttm
-- this will not account for certain oddities that we make up for later
-- if you select from this table after it is inserted into, you will see what i mean
declare @guaranteed_shift_x_min_action_dttm_x_day table
(
    shift_number tinyint not null,
    shift_type varchar(10) not null,
    time_start time not null,
    time_end time not null,
    action_date date not null,
    min_dttm_x_day datetime not null,
    max_dttm_x_day datetime not null,
    action_date_shift_start datetime not null,
    action_date_shift_end datetime not null,
    action_date_before_midnight datetime not null
)


-- in your sample data, action id 17 went over shift B's end by 3.x minutes (205 seconds)
-- this variable allows you some flexibility in allowing overage.
-- in your sample data (which i did not change, but added to it),
-- this record would otherwise fall outside shift B and match no shift, but this allows it to
-- allocate to shift B. If it does not fit the criteria,
-- the record is then omitted from the result set because it matches no shift.
-- your business would need to make a decision about how to handle these.
declare @max_seconds_after_shift int = 300 -- 5 minutes


insert  into @shift_times
select  1, 'ShiftA', '11:00:00', '23:00:00' union all
select  2, 'ShiftB', '13:00:00', '21:00:00' union all
select  3, 'ShiftC', '22:00:00', '06:00:00' union all
select  4, 'ShiftD', '18:00:00', '06:00:00'


insert  into @action_log
select 1, '04/19/2023 11:15:37' union all
select 2, '04/19/2023 12:16:14' union all
select 3, '04/19/2023 12:16:14' union all
select 4, '04/19/2023 14:21:03' union all
select 5, '04/19/2023 16:55:41' union all
select 6, '04/19/2023 18:10:43' union all
select 7, '04/19/2023 20:24:15' union all
select 8, '04/19/2023 22:43:29' union all
select 9, '04/20/2023 13:21:04' union all
select 10, '04/20/2023 13:55:57' union all
select 11, '04/20/2023 13:55:57' union all
select 12, '04/20/2023 16:28:11' union all
select 13, '04/20/2023 17:35:45' union all
select 14, '04/20/2023 18:48:01' union all
select 15, '04/20/2023 19:32:26' union all
select 16, '04/20/2023 20:16:39' union all
select 17, '04/20/2023 21:03:25' union all
select 18, '04/21/2023 22:21:04' union all
select 19, '04/21/2023 22:55:57' union all
select 20, '04/21/2023 22:55:57' union all
select 21, '04/22/2023 01:28:11' union all
select 22, '04/22/2023 02:35:45' union all
select 23, '04/22/2023 03:48:01' union all
select 24, '04/22/2023 04:32:26' union all
select 25, '04/22/2023 05:16:39' union all
select 26, '04/22/2023 06:03:25' union all
select 27, '04/23/2023 18:15:37' union all
select 28, '04/23/2023 19:16:14' union all
select 29, '04/23/2023 19:16:14' union all
select 30, '04/23/2023 21:21:03' union all
select 31, '04/23/2023 23:55:41' union all
select 32, '04/24/2023 01:10:43' union all
select 33, '04/24/2023 03:24:15' union all
select 34, '04/24/2023 05:43:29' 


-- expand the action log
insert  into @action_log_expanded
select  ac.action_id,
        ac.action_dttm,
        convert(date, ac.action_dttm) as action_date
from    @action_log ac


-- get the min and max action dttm per day
insert  into @min_max_dttm_x_date
select  convert(date, a.action_dttm) as action_date,
        min(a.action_dttm) as min_dttm_x_day,
        max(a.action_dttm) as max_dttm_x_day
from    @action_log a
group by convert(date, a.action_dttm)


-- based on what we know, we can compare shift start time and the first day's transaction
-- to at least get some guaranteed knowns
insert  into @guaranteed_shift_x_min_action_dttm_x_day
select  top 1 with ties 
        s.shift_number,
        s.shift_type,
        s.time_start,
        s.time_end,
        d.action_date,
        d.min_dttm_x_day,
        d.max_dttm_x_day,
        cast(d.action_date as datetime) + cast(s.time_start as datetime) as action_date_shift_start,
        cast(d.action_date as datetime) + cast(s.time_end as datetime) as action_date_shift_end,
        cast(d.action_date as datetime) + cast('23:59:59' as datetime) as action_date_before_midnight
from    @shift_times s
                inner join
        @min_max_dttm_x_date d on 
                    s.time_start <= convert(time, d.min_dttm_x_day)  -- purposefully do not check the end time because data shows (action id 17) that users can go over shift)
order by    ROW_NUMBER() over (partition by d.action_date order by abs(datediff(second, convert(time, s.time_start), convert(time, d.min_dttm_x_day)))) asc 



-- join the action log
-- for the dates that are not on here, we can simply go into the next day in the join within the action log by adding the action date + 1 and checking for time
select  *
from    @guaranteed_shift_x_min_action_dttm_x_day g
            inner join
        @action_log_expanded e on
                (
                    g.action_date = e.action_date and -- the shift's first transaction date equals the action dttm
                    (
                        g.time_start < g.time_end and -- this shift does not overlap midnight
                        (
                            (   -- the action dttm between the shift times - just adding time to a date time
                                e.action_dttm between g.action_date_shift_start and g.action_date_shift_end
                            ) or
                            (
                                -- existence and flexibility of time outside of shift end - see your action log id 17
                                -- this is adjustable
                                datediff(second, g.action_date_shift_end, e.action_dttm) between 0 and @max_seconds_after_shift
                            )
                        )
                    ) or
                    (
                        g.time_start > g.time_end and -- shift spans two days but this join is for start day
                        e.action_dttm between g.action_date_shift_start and g.action_date_before_midnight
                    )                   
                ) or
                (
                    -- the shift's first transaction happened on the previous day
                    cast(dateadd(day, 1, g.action_date) as date) = e.action_date and
                    g.time_start > g.time_end and
                    (
                        (
                            e.action_dttm between cast(e.action_date as datetime) and dateadd(day, 1, g.action_date_shift_end)
                        ) or
                        (
                            datediff(second, dateadd(day, 1, g.action_date_shift_end), e.action_dttm) between 0 and @max_seconds_after_shift

                        )

                    )
                )
order by    e.action_id asc
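The suggestion above about keeping shift definitions in a table with date_begin/date_end could look roughly like the following. This is a minimal sketch, not the answer's code: the table and column layout, the 2023-01-01 effective date, the inclusive date_end, and using date rather than datetime for the range are all assumptions, and @shift_starts stands in for the first action of each shift as produced by any of the gaps-and-islands queries in this thread. The matching rule (latest shift start at or before the first action) is the same rule the answer's TOP 1 WITH TIES query applies.

```sql
-- Hypothetical shift-definition table with effective dates (date_end NULL = still in effect).
DECLARE @shift_times TABLE
(
    shift_type varchar(10) NOT NULL,
    time_start time        NOT NULL,
    time_end   time        NOT NULL,
    date_begin date        NOT NULL,
    date_end   date        NULL
);

INSERT INTO @shift_times (shift_type, time_start, time_end, date_begin, date_end)
VALUES ('ShiftA', '11:00', '23:00', '2023-01-01', NULL),
       ('ShiftB', '13:00', '21:00', '2023-01-01', NULL);

-- First action of each shift (hypothetical input, taken from the question's data).
DECLARE @shift_starts TABLE (Shift_Number int NOT NULL, First_DTTM datetime NOT NULL);
INSERT INTO @shift_starts (Shift_Number, First_DTTM)
VALUES (1, '2023-04-19T11:15:37'),
       (2, '2023-04-20T13:21:04');

DECLARE @classified TABLE (Shift_Number int NOT NULL, First_DTTM datetime NOT NULL, Shift_Type varchar(10) NULL);

-- Pick the definition in effect on that date whose start time is the latest one
-- at or before the first action; OUTER APPLY keeps unmatched shifts with a NULL type.
INSERT INTO @classified (Shift_Number, First_DTTM, Shift_Type)
SELECT ss.Shift_Number, ss.First_DTTM, m.shift_type
FROM @shift_starts ss
OUTER APPLY (
    SELECT TOP (1) t.shift_type
    FROM @shift_times t
    WHERE t.date_begin <= CAST(ss.First_DTTM AS date)
      AND (t.date_end IS NULL OR t.date_end >= CAST(ss.First_DTTM AS date))
      AND t.time_start <= CAST(ss.First_DTTM AS time)
    ORDER BY t.time_start DESC
) m;

SELECT Shift_Number, First_DTTM, Shift_Type
FROM @classified
ORDER BY Shift_Number;
```

When shift times change, you would set date_end on the old rows and insert new rows with a later date_begin; the query itself does not change.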

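Comments

0 votes crackersNcheese 10/29/2023
Holy smokes, this is amazing. Initial testing seems to solve my problem completely. I'm going to try it out (I have 3 additional worker types with more complicated shift types), but I'm cautiously optimistic. I never would have figured this out. Thanks!!
0 votes ILoveSQL 10/29/2023
You're welcome. Keep in mind this is demo code, a way of drafting a solution. But if you want to try copying and pasting it, by all means do. If you have any questions about my code, leave a comment. Whichever answer here works for you, please mark it accordingly.
0 votes crackersNcheese 10/30/2023
Haha, sorry your feelings got hurt. I did say I know the shifts workers can work, just not the actual schedule of their shifts. Chalk it up to not realizing I could create a shift table.
0 votes ILoveSQL 10/30/2023
@topsail The details of the current shift times came up in the comments on the question, along with other details. Thanks for the compliment.
0 votes shawnt00 10/31/2023 #3

If your data really has no long breaks within a shift, then I think you'll want to find those clusters and then use the minimum (and/or maximum) time within each span.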

with data as (
    select *,
        -- flag the first action after a break of more than 3 hours
        -- (lag, so the flag lands on the first row of the new shift)
        case when datediff(minute,
                 lag(Action_DTTM) over (partition by Worker_ID order by Action_DTTM),
                 Action_DTTM) > 180
             then 1 else 0 end as shift_Break
    from Action_Log
), shifts as (
    select *,
        -- running count of breaks = shift number per worker
        sum(shift_Break) over (partition by Worker_ID order by Action_DTTM) as shift_Counter
    from data
), shift_Agg as (
    select *,
        datepart(hour, min(Action_DTTM) over (partition by Worker_ID, shift_Counter)) as first_Hour
    from shifts
)
select Worker_ID, Action_ID, Action_DTTM,
    case when first_Hour < 15 then -- A or B
        case when first_Hour < 13 then 'A' else 'B' end else
        -- D starts at 18:00 and C at 22:00, per the shift times in the other answers
        case when first_Hour < 21 then 'D' else 'C' end
    end as shift_Name
from shift_Agg;

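I assume shifts can start a bit early or late, so I tried to fudge the ranges slightly. Obviously you'll want to play with this logic, or make it a bit more robust by looking at the end time or the total duration. You also seem to be able to tell workers apart, which I assume is some kind of worker ID. All in all, having read it, this is almost identical to Charlie's.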
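One way to use the end time as well, which also addresses the overlap question from the comments (why action 9 is ShiftB and not ShiftA), is to keep only shift windows that contain the whole span from first to last action and prefer the shortest one. This is a sketch under several assumptions: the 15-minute tolerance is hypothetical, the "shortest containing window wins" rule is my own tie-breaker rather than anything stated in the thread, only shifts that don't cross midnight (ShiftA, ShiftB) are handled, and Worker_ID is left out because the sample data has none.

```sql
-- Sample data from the question (column types assumed).
DECLARE @Action_Log TABLE (Action_ID int NOT NULL, Action_DTTM datetime NOT NULL);
INSERT INTO @Action_Log (Action_ID, Action_DTTM)
VALUES (1,  '2023-04-19T11:15:37'), (2,  '2023-04-19T12:16:14'),
       (3,  '2023-04-19T12:16:14'), (4,  '2023-04-19T14:21:03'),
       (5,  '2023-04-19T16:55:41'), (6,  '2023-04-19T18:10:43'),
       (7,  '2023-04-19T20:24:15'), (8,  '2023-04-19T22:43:29'),
       (9,  '2023-04-20T13:21:04'), (10, '2023-04-20T13:55:57'),
       (11, '2023-04-20T13:55:57'), (12, '2023-04-20T16:28:11'),
       (13, '2023-04-20T17:35:45'), (14, '2023-04-20T18:48:01'),
       (15, '2023-04-20T19:32:26'), (16, '2023-04-20T20:16:39'),
       (17, '2023-04-20T21:03:25');

-- Same-day shifts only; overnight shifts would need the window to roll into the next day.
DECLARE @shift_times TABLE (shift_type varchar(10) NOT NULL, time_start time NOT NULL, time_end time NOT NULL);
INSERT INTO @shift_times (shift_type, time_start, time_end)
VALUES ('ShiftA', '11:00', '23:00'),
       ('ShiftB', '13:00', '21:00');

DECLARE @tolerance_minutes int = 15;  -- hypothetical slack for early/late actions

DECLARE @result TABLE (Shift_Number int NOT NULL, First_DTTM datetime NOT NULL,
                       Last_DTTM datetime NOT NULL, Shift_Type varchar(10) NULL);

WITH Flagged AS (
    -- 1 when the gap to the previous action exceeds 3 hours (or there is no previous action)
    SELECT *,
           IsStart = CASE WHEN DATEDIFF(minute,
                                        LAG(Action_DTTM) OVER (ORDER BY Action_DTTM, Action_ID),
                                        Action_DTTM) <= 180
                          THEN 0 ELSE 1 END
    FROM @Action_Log
),
Numbered AS (
    SELECT *, Shift_Number = SUM(IsStart) OVER (ORDER BY Action_DTTM, Action_ID ROWS UNBOUNDED PRECEDING)
    FROM Flagged
),
Spans AS (
    SELECT Shift_Number, First_DTTM = MIN(Action_DTTM), Last_DTTM = MAX(Action_DTTM)
    FROM Numbered
    GROUP BY Shift_Number
)
INSERT INTO @result (Shift_Number, First_DTTM, Last_DTTM, Shift_Type)
SELECT s.Shift_Number, s.First_DTTM, s.Last_DTTM, m.shift_type
FROM Spans s
OUTER APPLY (
    -- keep windows (widened by the tolerance) that contain the whole span; shortest wins
    SELECT TOP (1) t.shift_type
    FROM @shift_times t
    WHERE s.First_DTTM >= DATEADD(minute, -@tolerance_minutes,
              CAST(CAST(s.First_DTTM AS date) AS datetime) + CAST(t.time_start AS datetime))
      AND s.Last_DTTM  <= DATEADD(minute,  @tolerance_minutes,
              CAST(CAST(s.First_DTTM AS date) AS datetime) + CAST(t.time_end AS datetime))
    ORDER BY DATEDIFF(minute, t.time_start, t.time_end)
) m;

SELECT Shift_Number, First_DTTM, Last_DTTM, Shift_Type
FROM @result
ORDER BY Shift_Number;
```

On the sample data the first span (11:15 to 22:43) only fits ShiftA, while the second (13:21 to 21:03) fits both windows and the shorter ShiftB wins, which matches the expected output in the question. Spans that fit no window come back with a NULL Shift_Type instead of being dropped.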