提问人:xMRi 提问时间:11/29/2022 最后编辑:xMRi 更新时间:11/29/2022 访问量:93
SQL 查询以计算时间范围内的记录数?
SQL query to count the number of records in a time range?
问:
假设您有一个包含用户 ID 和 date+time 的表(为简单起见,步长为 1 小时)
此处的表格按代理和时间戳排序。
Usr Date Comment
1 2022-11-29 12:00 <- Start of a sequence
1 2022-11-29 13:00
1 2022-11-29 14:00
1 2022-11-30 12:00 <- Start of a sequence
1 2022-11-30 16:00 <- Start of a sequence
2 2022-11-29 22:00 <- Start of a sequence
2 2022-11-29 23:00
2 2022-11-30 00:00 <- Start of a sequence
2 2022-11-30 01:00
3 2022-11-29 13:00 <- Start of a sequence
3 2022-11-29 14:00
3 2022-11-30 12:00 <- Start of a sequence
3 2022-11-30 13:00
3 2022-11-30 14:00
4 2022-11-30 12:00 <- Start of a sequence
4 2022-11-30 13:00
4 2022-11-30 14:00
5 2022-11-30 16:00 <- Start of a sequence
- 预期结果是序列的开始及其长度。
- 为简单起见,每个间隙为 1 小时。
- 新一天的开始 (00:00) 总是开始一个新的序列
Usr Date Length
1 2022-11-29 12:00 3
1 2022-11-30 12:00 1
1 2022-11-30 16:00 1
2 2022-11-29 22:00 2
2 2022-11-30 00:00 2
3 2022-11-29 13:00 2
3 2022-11-30 12:00 3
4 2022-11-30 12:00 3
5 2022-11-30 16:00 1
我找到了一些代码示例,但没有得到预期的结果。dense_rank
row_number
我有一个解决方案,在源表中的每条记录上运行并创建结果表,但它很慢。
查询必须在 SQL 2012 或更高版本上运行。
答:
1赞
Arzanis
11/29/2022
#1
这个问题有很多解决方案。这是所需的查询性能的问题。 如果存在速度问题,我建议在录制时计算持续时间。
返回所需结果的查询示例:
DROP TABLE IF EXISTS dbo.test;
CREATE TABLE dbo.test
(
Usr INT, [Date] DATETIME);
DECLARE @gap INT = 1;
INSERT INTO dbo.test (Usr, [Date])
VALUES (1, '2022-11-29T12:00:00')
, (1, '2022-11-29T13:00:00')
, (1, '2022-11-29T14:00:00')
, (1, '2022-11-30T12:00:00')
, (1, '2022-11-30T16:00:00')
, (2, '2022-11-29T22:00:00')
, (2, '2022-11-29T23:00:00')
, (2, '2022-11-30T00:00:00')
, (2, '2022-11-30T01:00:00')
, (3, '2022-11-29T13:00:00')
, (3, '2022-11-29T14:00:00')
, (3, '2022-11-30T12:00:00')
, (3, '2022-11-30T13:00:00')
, (3, '2022-11-30T14:00:00')
, (4, '2022-11-30T12:00:00')
, (4, '2022-11-30T13:00:00')
, (4, '2022-11-30T14:00:00')
, (5, '2022-11-30T16:00:00');
WITH lag_cte AS
(
SELECT *
, LAG([Date], 1, [Date]) OVER (PARTITION BY Usr, CAST([Date] AS DATE) ORDER BY [Date]) lead_date --previous time by usr and date
, DATEADD(HOUR, -@gap, [Date]) gap_date --calc same group time for comprassion
, ROW_NUMBER() OVER (ORDER BY Usr, [Date]) rn --sort of identity
FROM dbo.test
)
SELECT lc.Usr
, MIN(lc.[Date]) AS [date]
, COUNT(1) AS [length]
FROM lag_cte lc
OUTER APPLY (--previous start of sequence
SELECT TOP 1 rn AS grouping_rn
FROM lag_cte li
WHERE li.Usr = lc.Usr
AND li.[Date] <= lc.[Date]
AND li.lead_date != li.gap_date --sequence staert marker
ORDER BY li.[Date] DESC
) g
GROUP BY lc.usr, CAST(lc.[Date] AS DATE), g.grouping_rn
1赞
Dan Guzman
11/29/2022
#2
@Arzanis如前所述,有许多解决方案可以提供所需的结果。下面的示例在复合主键 和 上应该表现得相当好。Usr
Date
WITH time_sequences AS (
SELECT
Usr
,Date
,LAG(Usr) OVER(PARTITION BY Usr ORDER BY Date) AS PrevUsr
,LAG(Date) OVER(PARTITION BY Usr ORDER BY Date) AS PrevDate
,LEAD(Usr) OVER(PARTITION BY Usr ORDER BY Date) AS NextUsr
,LEAD(Date) OVER(PARTITION BY Usr ORDER BY Date) AS NextDate
FROM dbo.test
)
,start_sequences AS (
SELECT
Usr
,Date
,'start' AS comment
,ROW_NUMBER() OVER(PARTITION BY Usr ORDER BY Date) AS seq
FROM time_sequences
WHERE PrevUsr IS NULL OR PrevDate <> DATEADD(hour, -1, Date) OR CAST(Date AS date) <> CAST(PrevDate AS date)
)
,end_sequences AS (
SELECT
Usr
,Date
,'end' AS comment
,ROW_NUMBER() OVER(PARTITION BY Usr ORDER BY Date) AS seq
FROM time_sequences
WHERE NextUsr IS NULL OR NextDate <> DATEADD(hour, 1, Date) OR CAST(Date AS date) <> CAST(NextDate AS date)
)
SELECT ss.Usr, ss.Date, DATEDIFF(hour, ss.Date, es.Date) + 1 AS SeqLength
FROM start_sequences AS ss
JOIN end_sequences AS es ON es.Usr = ss.Usr AND es.seq = ss.seq
ORDER BY
es.Usr
, es.Date;
1赞
RF1991
11/29/2022
#3
您的数据
CREATE TABLE mytable(
Usr INTEGER NOT NULL
,Date DATEtime NOT NULL
,Comment VARCHAR(100)
);
INSERT INTO mytable(Usr,Date,Comment) VALUES
(1,'2022-11-29 12:00','<- Start of a sequence'),
(1,'2022-11-29 13:00',NULL),
(1,'2022-11-29 14:00',NULL),
(1,'2022-11-30 12:00','<- Start of a sequence'),
(1,'2022-11-30 16:00','<- Start of a sequence'),
(2,'2022-11-29 22:00','<- Start of a sequence'),
(2,'2022-11-29 23:00',NULL),
(2,'2022-11-30 00:00','<- Start of a sequence'),
(2,'2022-11-30 01:00',NULL),
(3,'2022-11-29 13:00','<- Start of a sequence'),
(3,'2022-11-29 14:00',NULL),
(3,'2022-11-30 12:00','<- Start of a sequence'),
(3,'2022-11-30 13:00',NULL),
(3,'2022-11-30 14:00',NULL),
(4,'2022-11-30 12:00','<- Start of a sequence'),
(4,'2022-11-30 13:00',NULL),
(4,'2022-11-30 14:00',NULL),
(5,'2022-11-30 16:00','<- Start of a sequence');
您的查询
select usr,
comment,
count(comment) Length
From (
SELECT
usr
,CASE
WHEN Comment IS NULL THEN (
SELECT TOP 1
cast(inner_table.Date as varchar(100))
FROM
mytable as inner_table
WHERE
inner_table.Usr = mytable.Usr
AND inner_table.Date < mytable.Date
AND inner_table.Comment IS NOT NULL
ORDER BY
inner_table.Date DESC
)
ELSE
Date
END as Comment
FROM
mytable) a
group by usr,comment
order by usr,comment
评论