SQL query to aggregate data into shifts based on time of events?

Q:

Updated to specify that I am using MS SQL Server.

I have a table with one row for each action a worker performs, and each action has a date/time stamp.

Action_Log

Action_ID	Action_DTTM
1	04/19/2023 11:15:37
2	04/19/2023 12:16:14
3	04/19/2023 12:16:14
4	04/19/2023 14:21:03
5	04/19/2023 16:55:41
6	04/19/2023 18:10:43
7	04/19/2023 20:24:15
8	04/19/2023 22:43:29
9	04/20/2023 13:21:04
10	04/20/2023 13:55:57
11	04/20/2023 13:55:57
12	04/20/2023 16:28:11
13	04/20/2023 17:35:45
14	04/20/2023 18:48:01
15	04/20/2023 19:32:26
16	04/20/2023 20:16:39
17	04/20/2023 21:03:25

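I need a query that aggregates the actions into shifts. The table above shows a single worker's actions over 2 shifts. Workers can work several different shift types (ShiftA is 11am to 11pm, ShiftB is 1pm to 9pm), but I don't have a shift schedule telling me when a worker is scheduled for each type of shift. All I have is the action log.

An example of the kind of output I'm hoping for: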

Action_ID	Shift_Number	Shift_Type
1	1	ShiftA
2	1	ShiftA
3	1	ShiftA
4	1	ShiftA
5	1	ShiftA
6	1	ShiftA
7	1	ShiftA
8	1	ShiftA
9	2	ShiftB
10	2	ShiftB
11	2	ShiftB
12	2	ShiftB
13	2	ShiftB
14	2	ShiftB
15	2	ShiftB
16	2	ShiftB
17	2	ShiftB

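I'm not even sure whether this can be done in SQL. I'm also not sure my example tables are presented clearly. I can add screenshots if that would be easier to read.

sql-server t-sql gaps-and-islands

Please tag the actual DBMS! If you hover over the sql tag, it says: Structured Query Language (SQL) is a language for querying databases. Questions should include code examples, table structure, sample data, and a tag for the DBMS implementation (e.g. MySQL, PostgreSQL, Oracle, MS SQL Server, IBM DB2, etc.) being used. If your question relates solely to a specific DBMS (uses specific extensions/features), use that DBMS's tag instead. Answers to questions tagged with SQL should use ISO/IEC standard SQL.

1 upvote Adrian Maxwell 10/27/2023

The actual DBMS matters so much because each DBMS has very different syntax for handling dates and times. That means we can't offer you a useful solution, because we can't choose the right syntax to use. So, to get an answer, we really do need you to choose the right tag.

1 upvote shawnt00 10/27/2023

Your shifts overlap?

1 upvote Clockwork-Muse 10/27/2023

If your shifts overlap, how do you want things displayed/accounted for?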

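A:

0 upvotes Charlieface 10/29/2023 #1

This is a classic gaps-and-islands problem.

You first need to work out where each shift starts (by looking for a gap of 8 hours or more since the previous action). Then you need to apply that calculation across the whole shift, and then assign the shift type on that basis.

I note that your shifts overlap, so the results may not be entirely accurate.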
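
To see where that 8-hour rule would fire before building the full query, it can help to list the gap in front of each action. The following is a minimal sketch, assuming the question's Action_Log table holds a single worker's rows. The column aliases PrevDate, GapMinutes and IsShiftStart are illustrative, and Action_ID is added to the ORDER BY only to make ties on Action_DTTM deterministic.

```sql
-- Sketch only: inspect the gap before each action to see where the 8-hour rule fires.
-- Assumes the question's Action_Log (Action_ID, Action_DTTM) with one worker's rows.
SELECT
  Action_ID,
  Action_DTTM,
  PrevDate     = LAG(Action_DTTM) OVER (ORDER BY Action_DTTM, Action_ID),
  GapMinutes   = DATEDIFF(minute,
                          LAG(Action_DTTM) OVER (ORDER BY Action_DTTM, Action_ID),
                          Action_DTTM),
  -- 1 when there is no previous action, or it is 8 hours or more in the past
  IsShiftStart = CASE
                   WHEN LAG(Action_DTTM) OVER (ORDER BY Action_DTTM, Action_ID)
                        > DATEADD(hour, -8, Action_DTTM)
                   THEN 0 ELSE 1
                 END
FROM Action_Log
ORDER BY Action_DTTM, Action_ID;
```

On the question's sample data the only large gap is the one before action 9 (over 14 hours); every other gap is under 3 hours, so actions 1 and 9 should be the only rows flagged as shift starts.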

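One option is to assign a conditional running count of shift starts, which gives you a number for each shift, and then take the minimum Action_DTTM per shift.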

WITH Previous AS (
    SELECT *,
      PrevDate = LAG(Action_DTTM) OVER (ORDER BY Action_DTTM)
    FROM Action_Log al
),
Grouped AS (
    SELECT *,
      Shift_Number = COUNT(CASE WHEN PrevDate > DATEADD(hour, -8, Action_DTTM) THEN NULL ELSE 1 END)
                 OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING)
    FROM Previous
),
PerGroup AS (
    SELECT *,
      FirstAction = DATEPART(hour, MIN(Action_DTTM) OVER (PARTITION BY Shift_Number))
    FROM Grouped
)
SELECT
  Action_ID,
  Shift_Number,
  Shift_Type =
    CASE WHEN FirstAction BETWEEN 11 AND 12
            THEN 'A'
         WHEN FirstAction BETWEEN 13 AND 17
            THEN 'B'
         WHEN FirstAction BETWEEN 18 AND 21
            THEN 'D'
         ELSE 'C'
    END
FROM PerGroup;

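A better (more efficient) option might be to copy Action_DTTM only on the row where a shift starts, then use that to create the shift number and carry it forward to all the other rows.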

WITH Previous AS (
    SELECT *,
      ShiftStart = CASE WHEN LAG(Action_DTTM) OVER (ORDER BY Action_DTTM)
                   > DATEADD(hour, -8, Action_DTTM) THEN NULL ELSE Action_DTTM END
    FROM Action_Log al
),
Grouped AS (
    SELECT *,
      AllShiftStart =  MAX(ShiftStart) OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING),
      Shift_Number = COUNT(ShiftStart) OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING)
    FROM Previous
),
PerGroup AS (
    SELECT *,
      FirstAction = DATEPART(hour, AllShiftStart)
    FROM Grouped
)
SELECT
  Action_ID,
  Shift_Number,
  Shift_Type =
    CASE WHEN FirstAction BETWEEN 11 AND 12
            THEN 'A'
         WHEN FirstAction BETWEEN 13 AND 17
            THEN 'B'
         WHEN FirstAction BETWEEN 18 AND 21
            THEN 'D'
         ELSE 'C'
    END
FROM PerGroup;

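db<>fiddle

1 upvote ILoveSQL 10/29/2023 #2

OP, action ID 17 ran past the end of its shift by 205 seconds. Charlieface's answer makes assumptions (magic numbers) about the first action of a shift. My approach arrives at this naturally. The code is commented throughout to explain everything. I also added to your sample data set to get data for all 4 shifts, to prove that this works. In addition, my solution supports overlapping shifts, but assumes that only one shift is being worked at any given second. If that is not the case, then there is absolutely no way to do this.

My suggestion is to store the shift times in a user table (rather than a table variable), similar to my setup, so you can easily update that table instead of updating the query with magic numbers. You never want magic numbers in a production query. You could even go a step further and add two datetime columns, date_begin and date_end, describing the date range over which a shift definition is valid. In this setup you leave date_end null until the row becomes obsolete, then add new rows as needed to describe the new shift times. The reason for doing this is that if shift times change and you need to query back in time, you would otherwise have to run multiple queries to support the different shift times, whereas a dated shift table supports all of them.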
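
The suggestion above could look roughly like the following. This is only a sketch under stated assumptions: the table name dbo.shift_times, the primary key, the choice to treat date_end as exclusive, and the 2023-01-01 start date are all made up for illustration; only the shift names and times come from this thread.

```sql
-- Hypothetical permanent table; names and constraint are illustrative only.
create table dbo.shift_times
(
    shift_number tinyint     not null,
    shift_type   varchar(10) not null,
    time_start   time        not null,
    time_end     time        not null,
    date_begin   datetime    not null,  -- first moment this definition is valid
    date_end     datetime    null,      -- null while the definition is still current
    constraint pk_shift_times primary key (shift_number, date_begin)
);

-- Example rows; the date_begin value is invented for the sketch.
insert into dbo.shift_times
        (shift_number, shift_type, time_start, time_end, date_begin, date_end)
values  (1, 'ShiftA', '11:00:00', '23:00:00', '20230101', null),
        (2, 'ShiftB', '13:00:00', '21:00:00', '20230101', null);

-- Pick the definitions that were in force at a given moment
-- (date_end is treated as exclusive here, which is an assumption).
declare @as_of datetime = '20230419 12:00:00';

select  s.shift_number, s.shift_type, s.time_start, s.time_end
from    dbo.shift_times s
where   s.date_begin <= @as_of
        and (s.date_end is null or @as_of < s.date_end);
```

When a shift's hours change, the current row would get its date_end filled in and a new row would be inserted with the new times, so historical queries keep matching the definitions that applied at the time.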

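Unfortunately, the result set is large (many columns showing supporting data, 34 rows deep), and I won't post a picture unless asked. I also don't use db fiddle. I believe some people here are closely associated with whoever owns it and drive traffic to it.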

declare @shift_times table
(
    shift_number tinyint not null,
    shift_type varchar(10) not null,
    time_start time not null,
    time_end time not null
)

declare @action_log table
(
    action_id int not null,
    action_dttm datetime not null
)

-- it is nice to have the action_dttm as a date as well for comparison
-- we use this table in place of @action_log after populating
declare @action_log_expanded table
(
    action_id int not null,
    action_dttm datetime not null,
    action_date date not null
)

-- this holds the min and max action dttm by day (the x means by, like this by that)
declare @min_max_dttm_x_date table
(
    action_date date not null,
    min_dttm_x_day datetime not null,
    max_dttm_x_day datetime not null
)

-- this table holds data guaranteed to match the shift to the STARTING day worked
-- based on comparing shift start time to the first day's transaction (look at the top 1 order by clause)
-- so if a shift went over midnight, this table will hold its action date as 1 day less than the action log dttm
-- this will not account for certain oddities that we make up for later
-- if you select from this table after it is insert into, you will see what i mean
declare @guaranteed_shift_x_min_action_dttm_x_day table
(
    shift_number tinyint not null,
    shift_type varchar(10) not null,
    time_start time not null,
    time_end time not null,
    action_date date not null,
    min_dttm_x_day datetime not null,
    max_dttm_x_day datetime not null,
    action_date_shift_start datetime not null,
    action_date_shift_end datetime not null,
    action_date_before_midnight datetime not null
)


-- in your sample data, action id 17 went over by 3.x minutes from shift B end
-- this variable allows you some flexibility in allowing overage.
-- in your sample data (which i did not change, but added to it),
-- this record would have gone to shift C, but this allows it to 
-- allocate to shift B. If it does not fit the criteria, 
-- the record is then omitted from the result set because it matches no shift.
-- your business would need to make a decision about how to handle these. 
declare @max_seconds_after_shift int = 300 -- 5 minutes


insert  into @shift_times
select  1, 'ShiftA', '11:00:00', '23:00:00' union all
select  2, 'ShiftB', '13:00:00', '21:00:00' union all
select  3, 'ShiftC', '22:00:00', '06:00:00' union all
select  4, 'ShiftD', '18:00:00', '06:00:00'


insert  into @action_log
select 1, '04/19/2023 11:15:37' union all
select 2, '04/19/2023 12:16:14' union all
select 3, '04/19/2023 12:16:14' union all
select 4, '04/19/2023 14:21:03' union all
select 5, '04/19/2023 16:55:41' union all
select 6, '04/19/2023 18:10:43' union all
select 7, '04/19/2023 20:24:15' union all
select 8, '04/19/2023 22:43:29' union all
select 9, '04/20/2023 13:21:04' union all
select 10, '04/20/2023 13:55:57' union all
select 11, '04/20/2023 13:55:57' union all
select 12, '04/20/2023 16:28:11' union all
select 13, '04/20/2023 17:35:45' union all
select 14, '04/20/2023 18:48:01' union all
select 15, '04/20/2023 19:32:26' union all
select 16, '04/20/2023 20:16:39' union all
select 17, '04/20/2023 21:03:25' union all
select 18, '04/21/2023 22:21:04' union all
select 19, '04/21/2023 22:55:57' union all
select 20, '04/21/2023 22:55:57' union all
select 21, '04/22/2023 01:28:11' union all
select 22, '04/22/2023 02:35:45' union all
select 23, '04/22/2023 03:48:01' union all
select 24, '04/22/2023 04:32:26' union all
select 25, '04/22/2023 05:16:39' union all
select 26, '04/22/2023 06:03:25' union all
select 27, '04/23/2023 18:15:37' union all
select 28, '04/23/2023 19:16:14' union all
select 29, '04/23/2023 19:16:14' union all
select 30, '04/23/2023 21:21:03' union all
select 31, '04/23/2023 23:55:41' union all
select 32, '04/24/2023 01:10:43' union all
select 33, '04/24/2023 03:24:15' union all
select 34, '04/24/2023 05:43:29' 


-- expand the action log
insert  into @action_log_expanded
select  ac.action_id,
        ac.action_dttm,
        convert(date, ac.action_dttm) as action_date
from    @action_log ac


-- get the min and max action dttm per day
insert  into @min_max_dttm_x_date
select  top 1 with ties convert(date, a.action_dttm) as action_date,
        min(a.action_dttm) over (partition by convert(date, a.action_dttm) order by convert(date, a.action_dttm)) as min_dttm_x_day,
        max(a.action_dttm) over (partition by convert(date, a.action_dttm) order by convert(date, a.action_dttm)) as max_dttm_x_day
from    @action_log a
order by    ROW_NUMBER() over (partition by convert(date, a.action_dttm) order by a.action_dttm) 


-- based on what we know, we can compare shift start time and the first day's transaction
-- to at least get some guaranteed knowns
insert  into @guaranteed_shift_x_min_action_dttm_x_day
select  top 1 with ties 
        s.shift_number,
        s.shift_type,
        s.time_start,
        s.time_end,
        d.action_date,
        d.min_dttm_x_day,
        d.max_dttm_x_day,
        cast(d.action_date as datetime) + cast(s.time_start as datetime) as action_date_shift_start,
        cast(d.action_date as datetime) + cast(s.time_end as datetime) as action_date_shift_end,
        cast(d.action_date as datetime) + cast('23:59:59' as datetime) as action_date_before_midnight
from    @shift_times s
                inner join
        @min_max_dttm_x_date d on 
                    s.time_start <= convert(time, d.min_dttm_x_day)  -- purposefully do not check the end time because data shows (action id 17) that users can go over shift)
order by    ROW_NUMBER() over (partition by d.action_date order by abs(datediff(second, convert(time, s.time_start), convert(time, d.min_dttm_x_day)))) asc 



-- join the action log
-- for the dates that are not on here, we can simply go into the next day in the join within the action log by adding the action date + 1 and checking for time
select  *
from    @guaranteed_shift_x_min_action_dttm_x_day g
            inner join
        @action_log_expanded e on
                (
                    g.action_date = e.action_date and -- the shift's first transaction date equals the action dttm
                    (
                        g.time_start < g.time_end and -- this shift does not overlap midnight
                        (
                            (   -- the action dttm between the shift times - just adding time to a date time
                                e.action_dttm between g.action_date_shift_start and g.action_date_shift_end
                            ) or
                            (
                                -- existence and flexibility of time outside of shift end - see your action log id 17
                                -- this is adjustable
                                datediff(second, g.action_date_shift_end, e.action_dttm) between 0 and @max_seconds_after_shift
                            )
                        )
                    ) or
                    (
                        g.time_start > g.time_end and -- shift spans two days but this join is for start day
                        e.action_dttm between g.action_date_shift_start and g.action_date_before_midnight
                    )                   
                ) or
                (
                    -- the shift's first transaction happened on the previous day
                    cast(dateadd(day, 1, g.action_date) as date) = e.action_date and
                    g.time_start > g.time_end and
                    (
                        (
                            e.action_dttm between cast(e.action_date as datetime) and dateadd(day, 1, g.action_date_shift_end)
                        ) or
                        (
                            datediff(second, dateadd(day, 1, g.action_date_shift_end), e.action_dttm) between 0 and @max_seconds_after_shift

                        )

                    )
                )
order by    e.action_id asc

with data as (
    select *,
        -- flag the first action of a shift: more than 3 hours since the worker's previous action
        case when datediff(hour,
                 lag(Action_DTTM) over (partition by Worker_ID order by Action_DTTM),
                 Action_DTTM) > 3
             then 1 else 0 end as shift_Break
    from T
), shifts as (
    select *,
        -- running total of breaks numbers each worker's shifts 0, 1, 2, ...
        sum(shift_Break) over (partition by Worker_ID order by Action_DTTM) as shift_Counter
    from data
), shift_Agg as (
    select *,
        -- hour of the first action in each shift
        datepart(hour, min(Action_DTTM) over (partition by Worker_ID, shift_Counter)) as first_Hour
    from shifts
)
select Worker_ID, Action_ID, Action_DTTM,
    case when first_Hour < 15 then -- A or B
        case when first_Hour < 13 then 'A' else 'B' end else
        case when first_Hour < 21 then 'C' else 'D' end
    end as shift_Name
from shift_Agg;

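I figured it is possible for a shift to start early or late, so I tried to fudge the ranges a little. Obviously you will want to play with this logic, or make it a bit more robust by looking at the end time or the total duration. You also seem to be able to tell workers apart, which I assume is done with some kind of worker ID. All in all, having read Charlieface's answer, this is almost identical to it.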
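
One way to look at the end time and total duration mentioned above is to collapse each detected shift into a single summary row. The following is a rough sketch rather than a finished classifier: it reuses this answer's assumed table T and Worker_ID column, flags a break on the first action after a gap of more than 3 hours (using lag), and the aliases first_Action, last_Action and observed_Minutes are illustrative.

```sql
-- Sketch only: one row per worker and detected shift, with observed start, end and span.
-- T and Worker_ID are this answer's assumptions, not names from the question.
with data as (
    select Worker_ID, Action_DTTM,
        case when datediff(hour,
                 lag(Action_DTTM) over (partition by Worker_ID order by Action_DTTM),
                 Action_DTTM) > 3
             then 1 else 0 end as shift_Break
    from T
), shifts as (
    select Worker_ID, Action_DTTM,
        sum(shift_Break) over (partition by Worker_ID order by Action_DTTM) as shift_Counter
    from data
)
select  Worker_ID,
        shift_Counter,
        min(Action_DTTM) as first_Action,
        max(Action_DTTM) as last_Action,
        datediff(minute, min(Action_DTTM), max(Action_DTTM)) as observed_Minutes
from    shifts
group by Worker_ID, shift_Counter
order by Worker_ID, shift_Counter;
```

If T held the question's 17 rows for one worker, the first shift would span roughly 11.5 hours (11:15:37 to 22:43:29) and the second roughly 7.7 hours (13:21:04 to 21:03:25). An observed span well over 8 hours cannot be ShiftB, so the duration can be used to confirm or override a guess made from the start hour alone.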
