Asked by: Carl Blunck · Asked: 9/11/2023 · Last edited by: Charlieface, Carl Blunck · Updated: 9/11/2023 · Views: 86
Group records by continuous date periods in SQL
Q:
I have a table like this:
t104f005_employee_no | t104f040_position_no | t104f025_date_effective | t104f030_date_to |
---|---|---|---|
11354 | 89043 | 01/07/1999 | 01/04/2012 |
11354 | 89043 | 02/04/2012 | 31/08/2014 |
11354 | 89043 | 01/09/2014 | 21/09/2014 |
11354 | 89043 | 22/09/2014 | 27/09/2015 |
11354 | 23273 | 04/05/2015 | 27/09/2015 |
11354 | 94040 | 28/09/2015 | 10/01/2016 |
11354 | 94040 | 11/01/2016 | 08/05/2017 |
11354 | 94040 | 09/05/2017 | 24/03/2019 |
11354 | 99406 | 26/02/2018 | 01/07/2018 |
11354 | 99406 | 02/07/2018 | 30/09/2018 |
11354 | 99406 | 01/10/2018 | 01/11/2018 |
11354 | 97293 | 02/11/2018 | 25/11/2018 |
11354 | 99406 | 26/11/2018 | 10/03/2019 |
11354 | 97293 | 11/03/2019 | 24/03/2019 |
11354 | 99406 | 25/03/2019 | 30/09/2019 |
11354 | 99406 | 01/10/2019 | 15/03/2020 |
11354 | 97293 | 11/11/2019 | 24/11/2019 |
11354 | 99406 | 16/03/2020 | 10/05/2020 |
11354 | 101808 | 11/05/2020 | 14/06/2020 |
11354 | 99406 | 11/05/2020 | 14/06/2020 |
11354 | 101808 | 15/06/2020 | 04/10/2020 |
11354 | 99406 | 15/06/2020 | 04/10/2020 |
11354 | 101808 | 05/10/2020 | 25/10/2020 |
11354 | 99406 | 05/10/2020 | 21/11/2021 |
11354 | 101808 | 26/10/2020 | 31/01/2021 |
11354 | 101808 | 01/02/2021 | 28/02/2021 |
11354 | 101808 | 01/03/2021 | 30/06/2021 |
11354 | 101806 | 01/07/2021 | 18/07/2021 |
11354 | 101808 | 19/07/2021 | 21/11/2021 |
11354 | 101808 | 22/11/2021 | 28/11/2021 |
11354 | 99406 | 22/11/2021 | 28/11/2021 |
11354 | 101808 | 29/11/2021 | 20/03/2022 |
11354 | 101806 | 21/03/2022 | 17/04/2022 |
11354 | 101808 | 21/03/2022 | 22/05/2022 |
11354 | 101808 | 23/05/2022 | 09/04/2023 |
11354 | 101903 | 15/08/2022 | 12/02/2023 |
11354 | 105009 | 13/02/2023 | 28/02/2023 |
11354 | 105009 | 01/03/2023 | 09/04/2023 |
11354 | 101808 | 10/04/2023 | NULL |
11354 | 80717 | 15/05/2023 | 12/05/2024 |
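I need to identify the positions that were held continuously, with no gap between periods, and output the minimum start date and maximum end date for each continuous period. The expected result looks like this: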
t104f005_employee_no | t104f040_position_no | t104f025_date_effective | t104f030_date_to |
---|---|---|---|
11354 | 89043 | 01/07/1999 | 27/09/2015 |
11354 | 23273 | 04/05/2015 | 27/09/2015 |
11354 | 94040 | 28/09/2015 | 24/03/2019 |
11354 | 99406 | 26/02/2018 | 01/11/2018 |
11354 | 97293 | 02/11/2018 | 25/11/2018 |
11354 | 99406 | 26/11/2018 | 10/03/2019 |
11354 | 97293 | 11/03/2019 | 24/03/2019 |
11354 | 99406 | 25/03/2019 | 28/11/2021 |
11354 | 97293 | 11/11/2019 | 24/11/2019 |
11354 | 101808 | 11/05/2020 | 30/06/2021 |
11354 | 101806 | 01/07/2021 | 18/07/2021 |
11354 | 101808 | 19/07/2021 | NULL |
11354 | 101806 | 21/03/2022 | 17/04/2022 |
11354 | 101903 | 15/08/2022 | 12/02/2023 |
11354 | 105009 | 13/02/2023 | 28/02/2023 |
11354 | 105009 | 01/03/2023 | 09/04/2023 |
11354 | 80717 | 15/05/2023 | 12/05/2024 |
I tried the following without success, and I'm a bit stuck now.
WITH RankedPositions AS (
SELECT
[t104f005_employee_no],
[t104f040_position_no],
[t104f025_date_effective],
[t104f030_date_to],
ROW_NUMBER() OVER (PARTITION BY [t104f005_employee_no] ORDER BY [t104f025_date_effective]) -
ROW_NUMBER() OVER (PARTITION BY [t104f005_employee_no], [t104f040_position_no] ORDER BY [t104f025_date_effective]) AS grp
FROM [AUR11PROD].[dbo].[t104_employment_history] with (nolock)
WHERE [t104f005_employee_no] = '11354'
)
SELECT
[t104f005_employee_no],
[t104f040_position_no],
MIN([t104f025_date_effective]) AS min_startdate,
MAX([t104f030_date_to]) AS max_enddate
FROM RankedPositions
GROUP BY [t104f005_employee_no], [t104f040_position_no], grp
ORDER BY [t104f005_employee_no], min_startdate;
And this:
WITH RECURSIVE ContinuousPositions AS (
SELECT
employee_no,
position_no,
startdate,
enddate
FROM your_table_name
WHERE NOT EXISTS (
SELECT 1
FROM your_table_name AS t2
WHERE t2.employee_no = your_table_name.employee_no
AND t2.position_no = your_table_name.position_no
AND t2.startdate < your_table_name.startdate
)
UNION ALL
SELECT
cp.employee_no,
cp.position_no,
cp.startdate,
t.enddate
FROM ContinuousPositions AS cp
JOIN your_table_name AS t ON (
cp.employee_no = t.employee_no
AND cp.position_no = t.position_no
AND cp.enddate = DATEADD(day, -1, t.startdate)
)
)
SELECT
employee_no,
position_no,
MIN(startdate) AS min_startdate,
MAX(enddate) AS max_enddate
FROM ContinuousPositions
GROUP BY employee_no, position_no
ORDER BY employee_no, min_startdate;
A:
2 votes
Squirrel
9/11/2023
#1
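This is a gaps-and-islands problem. One way is to use LEAD() or LAG() to identify where the dates stop being continuous and set a flag (g) there. A cumulative sum() over that flag then gives you the required grouping (grp).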
with
cte as
(
select employee_no, position_no, date_effective, date_expiry,
g = case when dateadd(day, -1, date_effective)
<> lag(date_expiry) over (partition by employee_no, position_no
order by date_effective)
then 1
else 0
end
from employment_history
),
cte2 as
(
select employee_no, position_no, date_effective,
date_expiry = isnull(date_expiry, '99991231'),
grp = sum(g) over (partition by employee_no, position_no
order by date_effective)
from cte
)
select employee_no, position_no,
date_effective = min(date_effective),
date_expiry = nullif(max(date_expiry), '99991231')
from cte2
group by employee_no, position_no, grp
order by employee_no, date_effective
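For anyone who wants to watch the flag and the running sum work without a SQL Server instance, here is a minimal sketch of the same idea run through Python's built-in sqlite3 module. It is not the T-SQL above: SQLite has no dateadd() or isnull(), so date(x, '-1 day') and COALESCE stand in for them, and dates are stored as ISO strings. The island_start / island_end aliases and the find_islands helper are names made up for illustration, and the rows are a small hand-picked subset of the question's data. It assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hand-picked subset of the question's rows, dates rewritten as ISO strings.
SAMPLE_ROWS = [
    (11354, 89043, "2014-09-01", "2014-09-21"),
    (11354, 89043, "2014-09-22", "2015-09-27"),   # starts the day after the previous row ends
    (11354, 97293, "2018-11-02", "2018-11-25"),
    (11354, 97293, "2019-03-11", "2019-03-24"),   # gap before this row
    (11354, 101808, "2022-05-23", "2023-04-09"),
    (11354, 101808, "2023-04-10", None),          # open-ended period
]

ISLANDS_SQL = """
WITH cte AS (
    SELECT employee_no, position_no, date_effective, date_expiry,
           -- g = 1 when this row does not start the day after the previous row ended.
           -- For the first row of a partition LAG() is NULL, so the CASE falls to 0.
           CASE WHEN date(date_effective, '-1 day')
                     <> LAG(date_expiry) OVER (PARTITION BY employee_no, position_no
                                               ORDER BY date_effective)
                THEN 1 ELSE 0 END AS g
    FROM employment_history
),
cte2 AS (
    SELECT employee_no, position_no, date_effective,
           -- Sentinel so an open-ended row wins MAX(); turned back into NULL below.
           COALESCE(date_expiry, '9999-12-31') AS date_expiry,
           -- Running sum of the flag: every row of one island shares a grp value.
           SUM(g) OVER (PARTITION BY employee_no, position_no
                        ORDER BY date_effective) AS grp
    FROM cte
)
SELECT employee_no, position_no,
       MIN(date_effective) AS island_start,
       NULLIF(MAX(date_expiry), '9999-12-31') AS island_end
FROM cte2
GROUP BY employee_no, position_no, grp
ORDER BY employee_no, island_start
"""


def find_islands(rows):
    """Load rows into an in-memory table and return one tuple per continuous period."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employment_history ("
            "employee_no INTEGER, position_no INTEGER, "
            "date_effective TEXT, date_expiry TEXT)"
        )
        conn.executemany("INSERT INTO employment_history VALUES (?, ?, ?, ?)", rows)
        return conn.execute(ISLANDS_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in find_islands(SAMPLE_ROWS):
        print(row)
```

On this sample the two adjacent 89043 rows collapse into one period, the two 97293 stints stay separate because of the gap between them, and the open-ended 101808 period comes back with a NULL end date.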
Comments
0 votes
Carl Blunck
9/11/2023
Hey @squirrel, thanks! I tried this but I'm getting this error: Msg 8127, Level 16, State 1, Line 27 Column "cte2.t104f025_date_effective" is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause. I'm using this version - Microsoft SQL Server 2016 (SP3-GDR) (KB5021129) - 13.0.6430.49 (X64) Jan 22 2023 17:38:22 Copyright (c) Microsoft Corporation Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2016 Datacenter 10.0 <X64> (Build 14393: ) (Hypervisor)
0 votes
Squirrel
9/11/2023
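I didn't use exactly the same column names as yours, because my brain can't process those extra t104... prefixes. Just compare against the DDL, find and replace with your actual column names, and it will work.
0 votes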
Carl Blunck
9/11/2023
Cheers @squirrell. I must have edited something incorrectly the first time.
0 votes
Carl Blunck
9/11/2023
@DaleK - I did, still learning.