提问人:idiocache 提问时间:3/8/2023 最后编辑:jarlhidiocache 更新时间:4/4/2023 访问量:70
识别 SQL 表中的最新步骤序列
Identifying latest sequence of steps in SQL table
问:
我有一个 SQL 表,其中包含两个不同配方的时间戳步骤序列。我试图找出一种识别“当前”或“最新”配方的方法。我的源表看起来有点像这样:
食谱 | 时间戳 | 步 |
---|---|---|
1 | 2023-03-07 21:56:55 | step_3 |
1 | 2023-03-07 21:56:50 | step_3 |
1 | 2023-03-07 21:56:45 | step_3 |
1 | 2023-03-07 21:56:40 | step_2 |
1 | 2023-03-07 21:56:35 | step_2 |
1 | 2023-03-07 21:56:30 | step_2 |
1 | 2023-03-07 21:56:25 | step_1 |
1 | 2023-03-07 21:56:20 | step_1 |
1 | 2023-03-07 21:56:15 | step_1 |
2 | 2023-03-07 21:56:55 | step_3 |
2 | 2023-03-07 21:56:50 | step_3 |
2 | 2023-03-07 21:56:45 | step_3 |
2 | 2023-03-07 21:56:40 | step_2 |
2 | 2023-03-07 21:56:35 | step_2 |
2 | 2023-03-07 21:56:30 | step_2 |
2 | 2023-03-07 21:56:25 | step_1 |
2 | 2023-03-07 21:56:20 | step_1 |
2 | 2023-03-07 21:56:15 | step_1 |
1 | 2023-03-07 21:56:10 | step_3 |
1 | 2023-03-07 21:56:05 | step_3 |
1 | 2023-03-07 21:56:00 | step_3 |
1 | 2023-03-07 21:55:55 | step_2 |
1 | 2023-03-07 21:55:50 | step_2 |
1 | 2023-03-07 21:55:45 | step_2 |
1 | 2023-03-07 21:55:40 | step_1 |
1 | 2023-03-07 21:55:35 | step_1 |
1 | 2023-03-07 21:55:30 | step_1 |
我所追求的状态看起来有点像这样:
食谱 | 时间戳 | 步 | 当前 |
---|---|---|---|
1 | 2023-03-07 21:56:55 | step_3 | 真 |
1 | 2023-03-07 21:56:50 | step_3 | 真 |
1 | 2023-03-07 21:56:45 | step_3 | 真 |
1 | 2023-03-07 21:56:40 | step_2 | 真 |
1 | 2023-03-07 21:56:35 | step_2 | 真 |
1 | 2023-03-07 21:56:30 | step_2 | 真 |
1 | 2023-03-07 21:56:25 | step_1 | 真 |
1 | 2023-03-07 21:56:20 | step_1 | 真 |
1 | 2023-03-07 21:56:15 | step_1 | 真 |
2 | 2023-03-07 21:56:55 | step_3 | 真 |
2 | 2023-03-07 21:56:50 | step_3 | 真 |
2 | 2023-03-07 21:56:45 | step_3 | 真 |
2 | 2023-03-07 21:56:40 | step_2 | 真 |
2 | 2023-03-07 21:56:35 | step_2 | 真 |
2 | 2023-03-07 21:56:30 | step_2 | 真 |
2 | 2023-03-07 21:56:25 | step_1 | 真 |
2 | 2023-03-07 21:56:20 | step_1 | 真 |
2 | 2023-03-07 21:56:15 | step_1 | 真 |
1 | 2023-03-07 21:56:10 | step_3 | 假 |
1 | 2023-03-07 21:56:05 | step_3 | 假 |
1 | 2023-03-07 21:56:00 | step_3 | 假 |
1 | 2023-03-07 21:55:55 | step_2 | 假 |
1 | 2023-03-07 21:55:50 | step_2 | 假 |
1 | 2023-03-07 21:55:45 | step_2 | 假 |
1 | 2023-03-07 21:55:40 | step_1 | 假 |
1 | 2023-03-07 21:55:35 | step_1 | 假 |
1 | 2023-03-07 21:55:30 | step_1 | 假 |
到目前为止,我已经设法生成了一个新列,该列使用以下代码显示每个步骤之间的秒数差异,按配方/步骤分区:
WITH differences AS (
SELECT
*,
TIMESTAMPDIFF(SECOND, timestamp, lag(timestamp) OVER (PARTITION BY recipe, step ORDER BY timestamp DESC)) AS difference
FROM recipe_steps
)
这将导致以下结果(仅显示配方 1 的步骤 1):
食谱 | 时间戳 | 步 | 差异 |
---|---|---|---|
1 | 2023-03-07 21:56:25 | step_1 | 零 |
1 | 2023-03-07 21:56:20 | step_1 | 5 |
1 | 2023-03-07 21:56:15 | step_1 | 5 |
1 | 2023-03-07 21:55:40 | step_1 | 35 |
1 | 2023-03-07 21:55:35 | step_1 | 5 |
1 | 2023-03-07 21:55:30 | step_1 | 5 |
这个想法是,对于特定的配方,配方的当前/最新执行从最新一行之后的行开始,差异大于 5 秒:
食谱 | 时间戳 | 步 | 差异 | 当前 |
---|---|---|---|---|
1 | 2023-03-07 21:56:25 | step_1 | 零 | 真 |
1 | 2023-03-07 21:56:20 | step_1 | 5 | 真 |
1 | 2023-03-07 21:56:15 | step_1 | 5 | 真 |
1 | 2023-03-07 21:55:40 | step_1 | 35 | 假 |
1 | 2023-03-07 21:55:35 | step_1 | 5 | 假 |
1 | 2023-03-07 21:55:30 | step_1 | 5 | 假 |
经过几个小时的研究,我对如何从“差异”列推导出“当前”列完全不知所措,这就是这个问题的原因。
我尝试了各种从“差异”列派生“当前”列的方法,包括使用 和 窗口函数,但没有成功。row_number()
答:
您需要按配方和步骤对行进行分区,然后使用 ROW_NUMBER() 窗口函数为其分区中的每一行分配一个编号,按时间戳降序排序
SELECT
recipe, timestamp, step,
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY recipe, step ORDER BY timestamp DESC) = 1 THEN 'true'
ELSE 'false'
END AS current
从 recipe_steps ORDER BY 配方,时间戳 DESC;
评论
你正朝着正确的方向前进。
从您的差异 cte 开始,我将 FALSE 分配给> 5 个时间戳。
SELECT *,
CASE WHEN difference > 5 THEN 'FALSE' ELSE 'TRUE' END AS firstFalse
FROM diff
使用自连接,我将 FALSE 分配给“firstFalse”之前的每个日期。这将返回所有为 false 的时间戳。
SELECT f1.recipe,
f1.step,
MIN(f2.t_stamp) firstFalse,
f1.t_stamp,
'FALSE' AS Curr
FROM false1 f1
JOIN false1 f2 ON f1.recipe = f2.recipe AND f1.step = f2.step AND f2.firstFalse = 'FALSE'
WHERE f1.t_stamp <= f2.t_stamp
GROUP BY f1.recipe, f1.step, f1.t_stamp
最后,我用左联接把它们放在一起。
SELECT f1.recipe,
f1.t_stamp AS [timestamp],
f1.step,
COALESCE(f2.curr, 'TRUE') AS [current]
FROM false1 f1
LEFT JOIN false2 f2 ON f1.recipe = f2.recipe AND f1.step = f2.step AND f1.t_stamp = f2.t_stamp
ORDER BY [current] desc, f1.recipe asc, f1.step asc, f1.t_stamp asc
WITH diff AS
(SELECT recipe,
t_stamp,
step,
DATEDIFF(SECOND, t_stamp, lag(t_stamp) OVER (PARTITION BY recipe, step ORDER BY t_stamp DESC)) AS difference
FROM recipes
)
, false1 AS
(SELECT *,
CASE WHEN difference > 5 THEN 'FALSE' ELSE 'TRUE' END AS firstFalse
FROM diff
)
, false2 AS
(
SELECT f1.recipe,
f1.step,
MIN(f2.t_stamp) firstFalse,
f1.t_stamp,
'FALSE' AS Curr
FROM false1 f1
JOIN false1 f2 ON f1.recipe = f2.recipe AND f1.step = f2.step AND f2.firstFalse = 'FALSE'
WHERE f1.t_stamp <= f2.t_stamp
GROUP BY f1.recipe, f1.step, f1.t_stamp
)
SELECT f1.recipe,
f1.t_stamp AS [timestamp],
f1.step,
COALESCE(f2.curr, 'TRUE') AS [current]
FROM false1 f1
LEFT JOIN false2 f2 ON f1.recipe = f2.recipe AND f1.step = f2.step AND f1.t_stamp = f2.t_stamp
ORDER BY [current] desc, f1.recipe asc, f1.step asc, f1.t_stamp asc
您可以使用窗口函数和一些匹配技巧通过几个步骤获得所需的结果:
步骤 1。根据您的定义,使用窗口函数计算配方重新启动时的提取行,这意味着它是配方的第 1 步行,并且它与上一步的时间差大于 5 秒。
第 2 步:使用 windwo 函数并标记recipe_restart点来获得recipe_restart_bucket,因为您想按降序获得结果,因此我们做了一个数学技巧来按降序将 sum(#restart) - sum (#seenrestartsofar) 转换为桶索引。
第 3 步:当 bucket_index=0 时添加 current_flag(最近一个按降序排列)
以下是所有步骤的查询:
with recipe_steps_time_diff AS (
select
recipe,
timestamp,
step,
case when step <> 'step_1' or datediff(SECOND, lag(timestamp) over(partition by recipe,step order by timestamp), timestamp) <=5 then 0 else 1 end as recipe_restart
from
recipe_steps
),
recipe_steps_restart_bucket AS (
select
recipe,
timestamp,
step,
sum(recipe_restart) over (partition by recipe) - sum(recipe_restart) over (partition by recipe order by timestamp) as recipe_restart_bucket
from
recipe_steps_time_diff
)
select
recipe,
timestamp,
step,
case when recipe_restart_bucket = 0 then 'True' else 'False' end as current_flag
from
recipe_steps_restart_bucket
order by recipe_restart_bucket, recipe, timestamp desc
食谱 | 时间戳 | 步 | current_flag |
---|---|---|---|
1 | 2023-03-07T21:56:55Z | step_3 | 真 |
1 | 2023-03-07T21:56:50Z | step_3 | 真 |
1 | 2023-03-07T21:56:45Z | step_3 | 真 |
1 | 2023-03-07T21:56:40Z | step_2 | 真 |
1 | 2023-03-07T21:56:35Z | step_2 | 真 |
1 | 2023-03-07T21:56:30Z | step_2 | 真 |
1 | 2023-03-07T21:56:25Z | step_1 | 真 |
1 | 2023-03-07T21:56:20Z | step_1 | 真 |
1 | 2023-03-07T21:56:15Z | step_1 | 真 |
2 | 2023-03-07T21:56:55Z | step_3 | 真 |
2 | 2023-03-07T21:56:50Z | step_3 | 真 |
2 | 2023-03-07T21:56:45Z | step_3 | 真 |
2 | 2023-03-07T21:56:40Z | step_2 | 真 |
2 | 2023-03-07T21:56:35Z | step_2 | 真 |
2 | 2023-03-07T21:56:30Z | step_2 | 真 |
2 | 2023-03-07T21:56:25Z | step_1 | 真 |
2 | 2023-03-07T21:56:20Z | step_1 | 真 |
2 | 2023-03-07T21:56:15Z | step_1 | 真 |
1 | 2023-03-07T21:56:10Z | step_3 | 假 |
1 | 2023-03-07T21:56:05Z | step_3 | 假 |
1 | 2023-03-07T21:56:00Z | step_3 | 假 |
1 | 2023-03-07T21:55:55Z | step_2 | 假 |
1 | 2023-03-07T21:55:50Z | step_2 | 假 |
1 | 2023-03-07T21:55:45Z | step_2 | 假 |
1 | 2023-03-07T21:55:40Z | step_1 | 假 |
1 | 2023-03-07T21:55:35Z | step_1 | 假 |
1 | 2023-03-07T21:55:30Z | step_1 | 假 |
评论