SQL 查询以计算时间范围内的记录数?

SQL query to count the number of records in a time range?

提问人:xMRi 提问时间:11/29/2022 最后编辑:xMRi 更新时间:11/29/2022 访问量:93

问:

假设您有一个包含用户 ID 和 date+time 的表(为简单起见,步长为 1 小时)

此处的表格按代理和时间戳排序。

Usr Date                Comment
1   2022-11-29 12:00    <- Start of a sequence  
1   2022-11-29 13:00
1   2022-11-29 14:00
1   2022-11-30 12:00    <- Start of a sequence
1   2022-11-30 16:00    <- Start of a sequence
2   2022-11-29 22:00    <- Start of a sequence
2   2022-11-29 23:00
2   2022-11-30 00:00    <- Start of a sequence 
2   2022-11-30 01:00    
3   2022-11-29 13:00    <- Start of a sequence 
3   2022-11-29 14:00
3   2022-11-30 12:00    <- Start of a sequence
3   2022-11-30 13:00
3   2022-11-30 14:00    
4   2022-11-30 12:00    <- Start of a sequence
4   2022-11-30 13:00
4   2022-11-30 14:00
5   2022-11-30 16:00    <- Start of a sequence
  1. 预期结果是序列的开始及其长度。
  2. 为简单起见,每个间隙为 1 小时。
  3. 新一天的开始 (00:00) 总是开始一个新的序列
Usr Date                Length
1   2022-11-29 12:00    3
1   2022-11-30 12:00    1
1   2022-11-30 16:00    1
2   2022-11-29 22:00    2
2   2022-11-30 00:00    2
3   2022-11-29 13:00    2
3   2022-11-30 12:00    3
4   2022-11-30 12:00    3
5   2022-11-30 16:00    1

我找到了一些代码示例,但没有得到预期的结果。dense_rankrow_number

我有一个解决方案,在源表中的每条记录上运行并创建结果表,但它很慢。

查询必须在 SQL 2012 或更高版本上运行。

sql-server sql-server-2012

评论


答:

1赞 Arzanis 11/29/2022 #1

这个问题有很多解决方案。这是所需的查询性能的问题。 如果存在速度问题,我建议在录制时计算持续时间。

返回所需结果的查询示例:

DROP TABLE IF EXISTS dbo.test;

CREATE TABLE dbo.test
(
  Usr INT, [Date] DATETIME);

DECLARE @gap INT = 1;

INSERT INTO dbo.test (Usr, [Date])
VALUES (1, '2022-11-29T12:00:00')
     , (1, '2022-11-29T13:00:00')
     , (1, '2022-11-29T14:00:00')
     , (1, '2022-11-30T12:00:00')
     , (1, '2022-11-30T16:00:00')
     , (2, '2022-11-29T22:00:00')
     , (2, '2022-11-29T23:00:00')
     , (2, '2022-11-30T00:00:00')
     , (2, '2022-11-30T01:00:00')
     , (3, '2022-11-29T13:00:00')
     , (3, '2022-11-29T14:00:00')
     , (3, '2022-11-30T12:00:00')
     , (3, '2022-11-30T13:00:00')
     , (3, '2022-11-30T14:00:00')
     , (4, '2022-11-30T12:00:00')
     , (4, '2022-11-30T13:00:00')
     , (4, '2022-11-30T14:00:00')
     , (5, '2022-11-30T16:00:00');

WITH lag_cte AS
(
SELECT *
     , LAG([Date], 1, [Date]) OVER (PARTITION BY Usr, CAST([Date] AS DATE) ORDER BY [Date]) lead_date --previous time by usr and date
     , DATEADD(HOUR, -@gap, [Date]) gap_date --calc same group time for comprassion
     , ROW_NUMBER() OVER (ORDER BY Usr, [Date]) rn --sort of identity
  FROM dbo.test
)
SELECT lc.Usr
     , MIN(lc.[Date]) AS [date]
     , COUNT(1) AS [length]
  FROM lag_cte lc
  OUTER APPLY (--previous start of sequence
               SELECT TOP 1 rn AS grouping_rn
                 FROM lag_cte li
               WHERE li.Usr = lc.Usr
                 AND li.[Date] <= lc.[Date]
                 AND li.lead_date != li.gap_date --sequence staert marker
               ORDER BY li.[Date] DESC
              ) g
GROUP BY lc.usr, CAST(lc.[Date] AS DATE), g.grouping_rn
1赞 Dan Guzman 11/29/2022 #2

@Arzanis如前所述,有许多解决方案可以提供所需的结果。下面的示例在复合主键 和 上应该表现得相当好。UsrDate

WITH time_sequences AS (
    SELECT 
         Usr
        ,Date
        ,LAG(Usr) OVER(PARTITION BY Usr ORDER BY Date) AS PrevUsr
        ,LAG(Date) OVER(PARTITION BY Usr ORDER BY Date) AS PrevDate
        ,LEAD(Usr) OVER(PARTITION BY Usr ORDER BY Date) AS NextUsr
        ,LEAD(Date) OVER(PARTITION BY Usr ORDER BY Date) AS NextDate
    FROM dbo.test
)
,start_sequences AS (
    SELECT
         Usr
        ,Date
        ,'start' AS comment
        ,ROW_NUMBER() OVER(PARTITION BY Usr ORDER BY Date) AS seq
    FROM time_sequences
    WHERE PrevUsr IS NULL OR PrevDate <> DATEADD(hour, -1, Date) OR CAST(Date AS date) <> CAST(PrevDate AS date)
)
,end_sequences AS (
    SELECT
         Usr
        ,Date
        ,'end' AS comment
        ,ROW_NUMBER() OVER(PARTITION BY Usr ORDER BY Date) AS seq
    FROM time_sequences
    WHERE NextUsr IS NULL OR NextDate <> DATEADD(hour, 1, Date) OR CAST(Date AS date) <> CAST(NextDate AS date)
)
SELECT ss.Usr, ss.Date, DATEDIFF(hour, ss.Date, es.Date) + 1 AS SeqLength
FROM start_sequences AS ss
JOIN end_sequences AS es ON es.Usr = ss.Usr AND es.seq = ss.seq
ORDER BY 
      es.Usr
    , es.Date;
1赞 RF1991 11/29/2022 #3

您的数据

CREATE TABLE mytable(
   Usr     INTEGER  NOT NULL 
  ,Date    DATEtime  NOT NULL
  ,Comment VARCHAR(100) 
);
INSERT INTO mytable(Usr,Date,Comment) VALUES 
(1,'2022-11-29 12:00','<- Start of a sequence'),
(1,'2022-11-29 13:00',NULL),
(1,'2022-11-29 14:00',NULL),
(1,'2022-11-30 12:00','<- Start of a sequence'),
(1,'2022-11-30 16:00','<- Start of a sequence'),
(2,'2022-11-29 22:00','<- Start of a sequence'),
(2,'2022-11-29 23:00',NULL),
(2,'2022-11-30 00:00','<- Start of a sequence'),
(2,'2022-11-30 01:00',NULL),
(3,'2022-11-29 13:00','<- Start of a sequence'),
(3,'2022-11-29 14:00',NULL),
(3,'2022-11-30 12:00','<- Start of a sequence'),
(3,'2022-11-30 13:00',NULL),
(3,'2022-11-30 14:00',NULL),
(4,'2022-11-30 12:00','<- Start of a sequence'),
(4,'2022-11-30 13:00',NULL),
(4,'2022-11-30 14:00',NULL),
(5,'2022-11-30 16:00','<- Start of a sequence');

您的查询

select usr,
       comment,
      count(comment) Length
  From (
SELECT
    usr
    ,CASE 
        WHEN Comment IS NULL THEN (
            SELECT TOP 1
                cast(inner_table.Date as varchar(100))
            FROM
                mytable as inner_table
            WHERE
                    inner_table.Usr = mytable.Usr
                AND inner_table.Date < mytable.Date
                AND inner_table.Comment IS NOT NULL
            ORDER BY
                inner_table.Date DESC
        )
    ELSE
        Date
    END as Comment
FROM
    mytable) a
group by usr,comment
order by usr,comment

dbfiddle