Asked by: crackersNcheese Asked: 10/27/2023 Last edited by: Charlieface crackersNcheese Updated: 10/31/2023 Views: 159
SQL query to aggregate data into shifts based on time of events?
Q:
Updated to specify that I am using MS SQL Server.
I have a table with one row for each action a worker performs, and each action carries a date/time stamp.
Action_Log
Action_ID | Action_DTTM |
---|---|
1 | 04/19/2023 11:15:37 |
2 | 04/19/2023 12:16:14 |
3 | 04/19/2023 12:16:14 |
4 | 04/19/2023 14:21:03 |
5 | 04/19/2023 16:55:41 |
6 | 04/19/2023 18:10:43 |
7 | 04/19/2023 20:24:15 |
8 | 04/19/2023 22:43:29 |
9 | 04/20/2023 13:21:04 |
10 | 04/20/2023 13:55:57 |
11 | 04/20/2023 13:55:57 |
12 | 04/20/2023 16:28:11 |
13 | 04/20/2023 17:35:45 |
14 | 04/20/2023 18:48:01 |
15 | 04/20/2023 19:32:26 |
16 | 04/20/2023 20:16:39 |
17 | 04/20/2023 21:03:25 |
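I need a query that aggregates the actions into shifts. The table above shows a single worker's actions over 2 shifts. A worker can work several different shift types (ShiftA is 11 AM to 11 PM, ShiftB is 1 PM to 9 PM), but I have no shift schedule telling me when a worker was scheduled for each type of shift. All I have is the action log.
Example of the kind of output I am hoping for: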
Action_ID | Shift_Number | Shift_Type |
---|---|---|
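1 | 1 | ShiftA |
2 | 1 | ShiftA |
3 | 1 | ShiftA |
4 | 1 | ShiftA |
5 | 1 | ShiftA |
6 | 1 | ShiftA |
7 | 1 | ShiftA |
8 | 1 | ShiftA |
9 | 2 | ShiftB |
10 | 2 | ShiftB |
11 | 2 | ShiftB |
12 | 2 | ShiftB |
13 | 2 | ShiftB |
14 | 2 | ShiftB |
15 | 2 | ShiftB |
16 | 2 | ShiftB |
17 | 2 | ShiftB |
I am not even sure whether this can be done in SQL. I am also not sure my example tables are presented clearly. I can add screenshots if that would be easier to read.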
A:
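This is a classic gaps-and-islands problem.
You first need to work out where each shift starts (by looking for gaps of 8 hours or more between consecutive actions). You then need to carry that result across every row of the shift, and assign the shift type based on it.
Note that your shifts overlap, so the results may not be entirely accurate.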
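The idea can be checked by hand outside the database. The following is a minimal Python sketch of the gap-flagging step only, assuming the timestamps are already sorted and that a gap of 8 hours or more opens a new shift; `number_shifts` and `max_gap` are hypothetical names used for illustration, not part of any query in this thread.

```python
from datetime import datetime, timedelta

def number_shifts(timestamps, max_gap=timedelta(hours=8)):
    """Return a 1-based shift number for each timestamp (input sorted ascending)."""
    numbers = []
    current = 0
    previous = None
    for ts in timestamps:
        # A new island starts at the first row, or after a gap of max_gap or more.
        if previous is None or ts - previous >= max_gap:
            current += 1
        numbers.append(current)
        previous = ts
    return numbers

# Four rows taken from the question's sample data (Action_ID 7, 8, 9, 10).
fmt = "%m/%d/%Y %H:%M:%S"
log = [datetime.strptime(s, fmt) for s in [
    "04/19/2023 20:24:15", "04/19/2023 22:43:29",
    "04/20/2023 13:21:04", "04/20/2023 13:55:57",
]]
print(number_shifts(log))  # [1, 1, 2, 2]
```

The overnight gap between 22:43:29 and 13:21:04 is the only one of 8 hours or more, so it is the only place where the shift number increases.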
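One option is to compute a conditional running count of shift starts, which gives you a number for each shift, and then take the minimum Action_DTTM per shift.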
WITH Previous AS (
    -- Fetch the timestamp of the preceding action for every row.
    SELECT *,
      PrevDate = LAG(Action_DTTM) OVER (ORDER BY Action_DTTM)
    FROM Action_Log al
),
Grouped AS (
    -- A row starts a shift when no earlier action falls within the last 8 hours
    -- (a NULL PrevDate also falls through to ELSE 1).
    -- The running count of those starts is the shift number.
    SELECT *,
      Shift_Number = COUNT(CASE WHEN PrevDate > DATEADD(hour, -8, Action_DTTM) THEN NULL ELSE 1 END)
                       OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING)
    FROM Previous
),
PerGroup AS (
    -- Hour of the first action in each shift.
    SELECT *,
      FirstAction = DATEPART(hour, MIN(Action_DTTM) OVER (PARTITION BY Shift_Number))
    FROM Grouped
)
SELECT
  Action_ID,
  Shift_Number,
  Shift_Type =
    CASE WHEN FirstAction BETWEEN 11 AND 12
           THEN 'A'
         WHEN FirstAction BETWEEN 13 AND 17
           THEN 'B'
         WHEN FirstAction BETWEEN 18 AND 21
           THEN 'D'
         ELSE 'C'
    END
FROM PerGroup;
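A better (more efficient) option might be to copy Action_DTTM only on the row that starts a shift, then use that value both to create the shift number and to carry the start time forward to all the other rows of the shift.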
WITH Previous AS (
    -- ShiftStart holds Action_DTTM on the row that opens a shift, NULL elsewhere.
    SELECT *,
      ShiftStart = CASE WHEN LAG(Action_DTTM) OVER (ORDER BY Action_DTTM)
                             > DATEADD(hour, -8, Action_DTTM) THEN NULL ELSE Action_DTTM END
    FROM Action_Log al
),
Grouped AS (
    -- The running MAX carries the latest shift start forward to every row;
    -- the running COUNT of non-NULL starts is the shift number.
    SELECT *,
      AllShiftStart = MAX(ShiftStart) OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING),
      Shift_Number = COUNT(ShiftStart) OVER (ORDER BY Action_DTTM ROWS UNBOUNDED PRECEDING)
    FROM Previous
),
PerGroup AS (
    -- Hour of the first action in each shift.
    SELECT *,
      FirstAction = DATEPART(hour, AllShiftStart)
    FROM Grouped
)
SELECT
  Action_ID,
  Shift_Number,
  Shift_Type =
    CASE WHEN FirstAction BETWEEN 11 AND 12
           THEN 'A'
         WHEN FirstAction BETWEEN 13 AND 17
           THEN 'B'
         WHEN FirstAction BETWEEN 18 AND 21
           THEN 'D'
         ELSE 'C'
    END
FROM PerGroup;
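To see why a running MAX and a running COUNT over the same marker column are enough, here is a small Python sketch of the carry-forward step, assuming sorted input and an 8-hour threshold. The helper names (`shift_start_marks`, `carry_forward`, `running_count`) are made up for this illustration and do not correspond to anything in SQL Server.

```python
from datetime import datetime, timedelta
from itertools import accumulate

def shift_start_marks(timestamps, max_gap=timedelta(hours=8)):
    """Keep the timestamp on rows that open a shift, None on all other rows."""
    marks = []
    for i, ts in enumerate(timestamps):
        opens_shift = i == 0 or ts - timestamps[i - 1] >= max_gap
        marks.append(ts if opens_shift else None)
    return marks

def carry_forward(marks):
    """Latest non-None mark so far; on ascending input this equals a running MAX."""
    return list(accumulate(marks, lambda latest, mark: latest if mark is None else mark))

def running_count(marks):
    """Running count of non-None marks, which numbers the shifts."""
    return list(accumulate(0 if mark is None else 1 for mark in marks))

# Rows for Action_ID 7, 8, 9, 10 from the question's sample data.
log = [
    datetime(2023, 4, 19, 20, 24, 15),
    datetime(2023, 4, 19, 22, 43, 29),
    datetime(2023, 4, 20, 13, 21, 4),
    datetime(2023, 4, 20, 13, 55, 57),
]
marks = shift_start_marks(log)
starts = carry_forward(marks)
numbers = running_count(marks)
print(numbers)  # [1, 1, 2, 2]
```

Because the start time is already on every row after the carry-forward, no second pass partitioned by shift number is needed to find the first action of each shift.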
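OP, action ID 17 runs 205 seconds past the end of its shift. Charlieface's answer makes assumptions (magic numbers) about the first action of a shift. My approach arrives at this naturally. The code is commented throughout to explain everything. I also added to your sample data set so that it covers all 4 shifts, to demonstrate that this works. In addition, my solution supports overlapping shifts, but it assumes that only one shift is being worked at any given second. If that is not the case, there is no way to do this.
My suggestion is to store the shift times in a user table (rather than a table variable), similar to my setup, so that you can update that table easily instead of updating a query full of magic numbers. You never want magic numbers in a production query. You could go a step further and add two datetime columns, date_begin and date_end, describing the date range in which a shift definition is valid. In that setup you leave date_end null until the row becomes obsolete, then add a new row describing the new shift times as needed. The reason to do this is that if shift times change and you need to query back in time, you would otherwise have to run multiple queries to support the different shift times, whereas a dated shift table supports all of them.
Unfortunately the result set is large (many columns of supporting data, 34 rows deep), and I will not post a picture unless asked. I also do not use db fiddle. I believe some people here are closely associated with whoever owns it and drive traffic to it.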
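As a rough illustration of the date_begin / date_end idea, the Python sketch below picks the shift definitions in effect on a given day. Everything here is an assumption made for the example: the `ShiftDefinition` class, the `definitions_on` helper, the use of plain dates instead of datetimes, the inclusive date_end, and in particular the second ShiftB row, whose changed hours are invented purely to show a superseded definition.

```python
from dataclasses import dataclass
from datetime import date, time
from typing import Optional

@dataclass(frozen=True)
class ShiftDefinition:
    shift_type: str
    time_start: time
    time_end: time
    date_begin: date
    date_end: Optional[date] = None  # None means the row is still in effect

def definitions_on(definitions, day):
    """Return the shift definitions valid on `day` (date_end treated as inclusive)."""
    return [
        d for d in definitions
        if d.date_begin <= day and (d.date_end is None or day <= d.date_end)
    ]

# Hypothetical history: ShiftB hours change on 2023-07-01 (invented for illustration).
history = [
    ShiftDefinition("ShiftA", time(11), time(23), date(2023, 1, 1)),
    ShiftDefinition("ShiftB", time(13), time(21), date(2023, 1, 1), date(2023, 6, 30)),
    ShiftDefinition("ShiftB", time(14), time(22), date(2023, 7, 1)),
]

april = definitions_on(history, date(2023, 4, 20))
august = definitions_on(history, date(2023, 8, 1))
print([(d.shift_type, d.time_start) for d in april])
print([(d.shift_type, d.time_start) for d in august])
```

A query against old action logs would join on the validity range in the same way, so one query can cover periods before and after a change in shift times.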
declare @shift_times table
(
shift_number tinyint not null,
shift_type varchar(10) not null,
time_start time not null,
time_end time not null
)
declare @action_log table
(
action_id int not null,
action_dttm datetime not null
)
-- it is nice to have the action_dttm as a date as well for comparison
-- we use this table in place of @action_log after populating
declare @action_log_expanded table
(
action_id int not null,
action_dttm datetime not null,
action_date date not null
)
-- this holds the min and max action dttm by day (the x means by, like this by that)
declare @min_max_dttm_x_date table
(
action_date date not null,
min_dttm_x_day datetime not null,
max_dttm_x_day datetime not null
)
-- this table holds data guaranteed to match the shift to the STARTING day worked
-- based on comparing shift start time to the first day's transaction (look at the top 1 order by clause)
-- so if a shift went over midnight, this table will hold its action date as 1 day less than the action log dttm
-- this will not account for certain oddities that we make up for later
-- if you select from this table after it has been populated, you will see what i mean
declare @guaranteed_shift_x_min_action_dttm_x_day table
(
shift_number tinyint not null,
shift_type varchar(10) not null,
time_start time not null,
time_end time not null,
action_date date not null,
min_dttm_x_day datetime not null,
max_dttm_x_day datetime not null,
action_date_shift_start datetime not null,
action_date_shift_end datetime not null,
action_date_before_midnight datetime not null
)
-- in your sample data, action id 17 went over by 3.x minutes from shift B end
-- this variable allows you some flexibility in allowing overage.
-- in your sample data (which i did not change, but added to it),
-- this record would have gone to shift C, but this allows it to
-- allocate to shift B. If it does not fit the criteria,
-- the record is then omitted from the result set because it matches no shift.
-- your business would need to make a decision about how to handle these.
declare @max_seconds_after_shift int = 300 -- 5 minutes
insert into @shift_times
select 1, 'ShiftA', '11:00:00', '23:00:00' union all
select 2, 'ShiftB', '13:00:00', '21:00:00' union all
select 3, 'ShiftC', '22:00:00', '06:00:00' union all
select 4, 'ShiftD', '18:00:00', '06:00:00'
insert into @action_log
select 1, '04/19/2023 11:15:37' union all
select 2, '04/19/2023 12:16:14' union all
select 3, '04/19/2023 12:16:14' union all
select 4, '04/19/2023 14:21:03' union all
select 5, '04/19/2023 16:55:41' union all
select 6, '04/19/2023 18:10:43' union all
select 7, '04/19/2023 20:24:15' union all
select 8, '04/19/2023 22:43:29' union all
select 9, '04/20/2023 13:21:04' union all
select 10, '04/20/2023 13:55:57' union all
select 11, '04/20/2023 13:55:57' union all
select 12, '04/20/2023 16:28:11' union all
select 13, '04/20/2023 17:35:45' union all
select 14, '04/20/2023 18:48:01' union all
select 15, '04/20/2023 19:32:26' union all
select 16, '04/20/2023 20:16:39' union all
select 17, '04/20/2023 21:03:25' union all
select 18, '04/21/2023 22:21:04' union all
select 19, '04/21/2023 22:55:57' union all
select 20, '04/21/2023 22:55:57' union all
select 21, '04/22/2023 01:28:11' union all
select 22, '04/22/2023 02:35:45' union all
select 23, '04/22/2023 03:48:01' union all
select 24, '04/22/2023 04:32:26' union all
select 25, '04/22/2023 05:16:39' union all
select 26, '04/22/2023 06:03:25' union all
select 27, '04/23/2023 18:15:37' union all
select 28, '04/23/2023 19:16:14' union all
select 29, '04/23/2023 19:16:14' union all
select 30, '04/23/2023 21:21:03' union all
select 31, '04/23/2023 23:55:41' union all
select 32, '04/24/2023 01:10:43' union all
select 33, '04/24/2023 03:24:15' union all
select 34, '04/24/2023 05:43:29'
-- expand the action log
insert into @action_log_expanded
select ac.action_id,
ac.action_dttm,
convert(date, ac.action_dttm) as action_date
from @action_log ac
-- get the min and max action dttm per day
insert into @min_max_dttm_x_date
select top 1 with ties convert(date, a.action_dttm) as action_date,
min(a.action_dttm) over (partition by convert(date, a.action_dttm) order by convert(date, a.action_dttm)) as min_dttm_x_day,
max(a.action_dttm) over (partition by convert(date, a.action_dttm) order by convert(date, a.action_dttm)) as max_dttm_x_day
from @action_log a
order by ROW_NUMBER() over (partition by convert(date, a.action_dttm) order by a.action_dttm)
-- based on what we know, we can compare each shift start time with the first transaction of the day
-- to at least get some guaranteed knowns
insert into @guaranteed_shift_x_min_action_dttm_x_day
select top 1 with ties
s.shift_number,
s.shift_type,
s.time_start,
s.time_end,
d.action_date,
d.min_dttm_x_day,
d.max_dttm_x_day,
cast(d.action_date as datetime) + cast(s.time_start as datetime) as action_date_shift_start,
cast(d.action_date as datetime) + cast(s.time_end as datetime) as action_date_shift_end,
cast(d.action_date as datetime) + cast('23:59:59' as datetime) as action_date_before_midnight
from @shift_times s
inner join
@min_max_dttm_x_date d on
s.time_start <= convert(time, d.min_dttm_x_day) -- deliberately do not check the end time, because the data shows (action id 17) that users can run past the end of a shift
order by ROW_NUMBER() over (partition by d.action_date order by abs(datediff(second, convert(time, s.time_start), convert(time, d.min_dttm_x_day)))) asc
-- join the action log
-- for the dates that are not on here, we can simply go into the next day in the join within the action log by adding the action date + 1 and checking for time
select *
from @guaranteed_shift_x_min_action_dttm_x_day g
inner join
    @action_log_expanded e on
    (
        g.action_date = e.action_date and -- the shift's first transaction date equals the action date
        (
            g.time_start < g.time_end and -- this shift does not cross midnight
            (
                ( -- the action dttm is between the shift times - just adding time to a date time
                    e.action_dttm between g.action_date_shift_start and g.action_date_shift_end
                ) or
                (
                    -- existence and flexibility of time outside of shift end - see your action log id 17
                    -- this is adjustable
                    datediff(second, g.action_date_shift_end, e.action_dttm) between 0 and @max_seconds_after_shift
                )
            )
        ) or
        (
            g.time_start > g.time_end and -- shift spans two days but this join is for the start day
            e.action_dttm between g.action_date_shift_start and g.action_date_before_midnight
        )
    ) or
    (
        -- the shift's first transaction happened on the previous day
        cast(dateadd(day, 1, g.action_date) as date) = e.action_date and
        g.time_start > g.time_end and
        (
            (
                e.action_dttm between cast(e.action_date as datetime) and dateadd(day, 1, g.action_date_shift_end)
            ) or
            (
                datediff(second, dateadd(day, 1, g.action_date_shift_end), e.action_dttm) between 0 and @max_seconds_after_shift
            )
        )
    )
order by e.action_id asc
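The join condition above boils down to one window test per shift: the action must fall between the shift start and the shift end plus a grace period, with the end pushed to the next day when the shift crosses midnight. The Python sketch below restates that test under those assumptions; `in_shift` and `grace_seconds` are hypothetical names (the latter plays the role of @max_seconds_after_shift), and the sketch does not reproduce the step that picks which shift a day belongs to.

```python
from datetime import date, datetime, time, timedelta

def in_shift(action, start_day, time_start, time_end, grace_seconds=300):
    """True if `action` lies within the shift that starts on `start_day`,
    allowing up to `grace_seconds` after the scheduled end."""
    start = datetime.combine(start_day, time_start)
    end = datetime.combine(start_day, time_end)
    if time_end <= time_start:
        end += timedelta(days=1)  # the shift runs past midnight
    return start <= action <= end + timedelta(seconds=grace_seconds)

# Action_ID 17 against ShiftB (13:00 to 21:00) on 04/20/2023.
action_17 = datetime(2023, 4, 20, 21, 3, 25)
shift_b_end = datetime(2023, 4, 20, 21, 0, 0)
overage = (action_17 - shift_b_end).total_seconds()
print(overage)  # 205.0
print(in_shift(action_17, date(2023, 4, 20), time(13), time(21)))                   # True
print(in_shift(action_17, date(2023, 4, 20), time(13), time(21), grace_seconds=0))  # False

# Action_ID 26 against ShiftC (22:00 to 06:00) that started on 04/21/2023.
action_26 = datetime(2023, 4, 22, 6, 3, 25)
print(in_shift(action_26, date(2023, 4, 21), time(22), time(6)))  # True
```

With the grace period set to 0, both action 17 and action 26 would match no shift, which is the situation the comments on @max_seconds_after_shift describe.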
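If your data really has no long breaks within a shift, then I think you will want to find these clusters and then use the minimum (and/or maximum) time within each span.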
with data as (
    -- flag a row as a shift start when more than 3 hours separate it
    -- from the same worker's previous action
    select *,
        case when datediff(hour,
                lag(Action_DTTM) over (partition by Worker_ID order by Action_DTTM),
                Action_DTTM) > 3
            then 1 else 0 end as shift_Break
    from T
), shifts as (
    -- the running total of the flags numbers each worker's shifts
    select *,
        sum(shift_Break) over (partition by Worker_ID order by Action_DTTM) as shift_Counter
    from data
), shift_Agg as (
    -- hour of the first action in each shift
    select *,
        datepart(hour, min(Action_DTTM) over (partition by Worker_ID, shift_Counter)) as first_Hour
    from shifts
)
select Worker_ID, Action_ID, Action_DTTM,
    case when first_Hour < 15 then -- A or B
        case when first_Hour < 13 then 'A' else 'B' end else
        case when first_Hour < 21 then 'C' else 'D' end
    end as shift_Name
from shift_Agg;
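I assumed that shifts can start early or late, so I tried to fudge the ranges a little. Obviously you will want to play with this logic, or make it a bit more robust by looking at end times or total duration. You also seem to be able to tell workers apart, which I assumed means there is some kind of worker ID. All in all, now that I have read it, this is almost identical to Charlieface's answer.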
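One thing to keep in mind when tuning the threshold: in SQL Server, DATEDIFF(hour, a, b) counts the hour boundaries crossed between the two values, not the elapsed time. The Python sketch below emulates that behaviour to show how two gaps of similar length can land on opposite sides of a "> 3" test; `datediff_hour` is a hypothetical helper written for this illustration.

```python
from datetime import datetime, timedelta

def datediff_hour(start, end):
    """Emulate T-SQL DATEDIFF(hour, start, end): number of hour boundaries crossed."""
    def floor_to_hour(d):
        return d.replace(minute=0, second=0, microsecond=0)
    return (floor_to_hour(end) - floor_to_hour(start)) // timedelta(hours=1)

# Elapsed 3h02m, but four boundaries (13:00, 14:00, 15:00, 16:00) are crossed.
short_gap = datediff_hour(datetime(2023, 4, 19, 12, 59), datetime(2023, 4, 19, 16, 1))
# Elapsed 3h59m, but only three boundaries (14:00, 15:00, 16:00) are crossed.
long_gap = datediff_hour(datetime(2023, 4, 19, 13, 0), datetime(2023, 4, 19, 16, 59))
print(short_gap, long_gap)  # 4 3
```

If the exact gap length matters, comparing on a finer unit such as minutes or seconds narrows this effect.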