提问人:Anonymous 提问时间:7/27/2023 最后编辑:Simeon PilgrimAnonymous 更新时间:7/28/2023 访问量:41
在特定列显示为非 NULL 值(每个 ID)的表中,首选一条记录而不是另一条记录
Prefer one record over anotherin a table where a specific column shows as not NULL for value (per ID)
问:
使用表格MY_TABLE如下所示:
编号 | 类型 | 开始_ | 结束_ | 加载 | SOURCE_DATE |
---|---|---|---|---|---|
1 | 一个 | 零 | 5 | 2022-03-03 21:57:38.494 | '2022-03-03' |
1 | B | 零 | 7 | 2023-07-20 22:55:38.494 | '2023-07-20' |
1 | 一个 | 5 | 零 | 2023-07-20 22:57:38.494 | '2023-07-20' |
1 | B | 零 | 7 | 2023-07-20 22:59:38.494 | '2023-07-20' |
4 | 一个 | 零 | 20 | 2023-06-30 18:59:38.494 | '2023-06-30' |
4 | 一个 | 20 | 17 | 2023-06-30 19:43:38.494 | '2023-06-30' |
5 | 一个 | 零 | 32 | 2023-05-30 04:43:36.494 | '2023-05-30' |
5 | B | 零 | 48 | 2023-05-30 05:48:32.494 | '2023-05-30' |
7 | 一个 | 零 | 32 | 2023-04-22 08:33:36.494 | '2023-04-22' |
7 | B | 10 | 零 | 2023-04-22 09:58:32.434 | '2023-04-22' |
我已经通过执行操作获得了每个 TYPE 的最新更改
SELECT *
FROM MY_TABLE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE ORDER BY SOURCE_DATE DESC) = 1;
这会产生一个类似
编号 | 类型 | 开始_ | 结束_ | 加载 | SOURCE_DATE |
---|---|---|---|---|---|
1 | 一个 | 5 | 零 | 2023-07-20 22:57:38.494 | '2023-07-20' |
1 | B | 零 | 7 | 2023-07-20 22:59:38.494 | '2023-07-20' |
4 | 一个 | 20 | 17 | 2023-06-30 19:43:38.494 | '2023-06-30' |
5 | 一个 | 零 | 32 | 2023-05-30 04:43:36.494 | '2023-05-30' |
5 | B | 零 | 48 | 2023-05-30 05:48:32.494 | '2023-05-30' |
7 | 一个 | 零 | 32 | 2023-04-22 08:33:36.494 | '2023-04-22' |
7 | B | 10 | 零 | 2023-04-22 09:58:32.434 | '2023-04-22' |
我正在尝试获得一个结果表,如果 ID 的记录中存在 TYPE A,并且 TYPE A 的 END_ 不是 NULL,则首选 TYPE A 的 ID END_,否则保留 TYPE B 的END_,或者如果 ID 只有一条记录,则保留END_。我本质上是想得到一张像这样的桌子
编号 | 类型 | 开始_ | 结束_ | SOURCE_DATE |
---|---|---|---|---|
1 | B | 零 | 7 | '2023-07-20' |
4 | 一个 | 20 | 17 | '2023-06-30' |
5 | 一个 | 零 | 32 | '2023-05-30' |
7 | 一个 | 零 | 32 | '2023-04-22' |
(LOADED 列未在我的最终结果/输出表中显示)有没有办法做到这一点?我试过:
WITH GET_LATEST_CHANGES AS (SELECT *
FROM MY_TABLE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE ORDER BY SOURCE_DATE DESC) = 1
)
SELECT ID, TYPE, START_, END_, SOURCE_DATE,
CASE
WHEN TYPE <> LAG(TYPE, 1) OVER (PARTITION BY ID ORDER BY SOURCE_DATE) AND TYPE = 'A' AND END_ IS NULL THEN COALESCE(LAG(END_, 1) OVER (PARTITION BY ID ORDER BY SOURCE_DATE), END_)
WHEN TYPE <> LAG(TYPE, 1) OVER (PARTITION BY ID ORDER BY SOURCE_DATE) AND TYPE = 'A' AND END_ IS NOT NULL THEN COALESCE(END_, LAG(END_, 1) OVER (PARTITION BY ID ORDER BY SOURCE_DATE))
ELSE END_
END AS END_2
FROM GET_LATEST_CHANGES;
但是,这种方法似乎有点复杂,而且我也遇到了像 ID 5 这样的情况,其中 TYPE A 和 B 的 END_ 值都被填充了,在这种情况下如何首选 TYPE A 记录而不是 B。谢谢!
答:
这几乎可以生成所需的输出,但使用起始列的 null 值效果更好(如果需要,可以调整逻辑)。coalesce
create table my_table(
id int not null,
type char(1) not null,
start_ int,
end_ int,
loaded timestamp not null,
source_date date not null
);
insert into my_table(id, type, start_, end_, loaded, source_date) values
(1, 'A', NULL, 5, '2022-03-03 21:57:38.494', '2022-03-03'),
(1, 'B', NULL, 7, '2023-07-20 22:55:38.494', '2023-07-20'),
(1, 'A', 5, NULL, '2023-07-20 22:57:38.494', '2023-07-20'),
(1, 'B', NULL, 7, '2023-07-20 22:59:38.494', '2023-07-20'),
(4, 'A', NULL, 20, '2023-06-30 18:59:38.494', '2023-06-30'),
(4, 'A', 20, 17, '2023-06-30 19:43:38.494', '2023-06-30'),
(5, 'A', NULL, 32, '2023-05-30 04:43:36.494', '2023-05-30'),
(5, 'B', NULL, 48, '2023-05-30 05:48:32.494', '2023-05-30'),
(7, 'A', NULL, 32, '2023-04-22 08:33:36.494', '2023-04-22'),
(7, 'B', 10, NULL, '2023-04-22 09:58:32.434', '2023-04-22');
with grouped as (
select distinct on (id, type) id, type, start_, end_, source_date
from my_table
order by id, type, source_date desc
)
select coalesce(type_a.id, type_b.id) as id,
case when type_a.end_ is not null then type_a.type else type_b.type end as type,
coalesce(type_a.start_, type_b.start_) as start,
coalesce(type_a.end_, type_b.end_) as end,
case when type_a.end_ is not null then type_a.source_date else type_b.source_date end as source_date
from grouped as type_a
full outer join grouped as type_b on type_a.id = type_b.id
and type_a.type <> type_b.type
where (type_a.type is null or type_a.type = 'A')
and (type_b.type is null or type_b.type = 'B');
id type start end source_date
1 B 5 7 2023-07-20
4 A 20 17 2023-06-30
5 A null 32 2023-05-30
7 A 10 32 2023-04-22
首先要注意的是,您的初始 QUALIFY ROW_NUMBER不稳定,您需要添加另一个术语来对具有 some 的项目进行排序,正如您所看到的那样,您的 ID 4 具有相同的值,因此您将获得当前代码的随机结果。我也选择你使用 LOADED DESC:SORUCE DATE
select *
from my_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE
ORDER BY SOURCE_DATE DESC, loaded DESC) = 1
如果我们将其推送到 CTE(或者可以使用子选择),我们要对上述过滤器输出的四种类型的行进行排名:
select *
,case
when type = 'A' and end_ is not null then 4
when type = 'B' and end_ is not null then 3
when type = 'A' and end_ is null then 2
when type = 'B' and end_ is null then 1
end as type_rank
from last_per_id_type_filter
这给了:
我们可以使这种情况更简单,因为最后一个就是另一个,而第三个必须是空的,所以可以写成:
,case
when type = 'A' and end_ is not null then 4
when type = 'B' and end_ is not null then 3
when type = 'A' then 2
else 1
end as type_rank
此外,如果您喜欢非常紧凑的 SQL,则可以将其压缩为一个 IFF 和两个 NVL2
,iff(type = 'A', NVL2(end_,4,2), NVL2(end_,3,1)) as type_rank
但同样,更明确并不会真正受到伤害,并且有助于其他读者了解意图。
所以现在我们可以添加另一个 QUALIFY,这次使用窗口框架 MAX:
WITH last_per_id_type_filter as (
select *
from my_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE
ORDER BY SOURCE_DATE DESC, loaded DESC) = 1
)
select *
,case
when type = 'A' and end_ is not null then 4
when type = 'B' and end_ is not null then 3
when type = 'A' and end_ is null then 2
when type = 'B' and end_ is null then 1
end as type_rank
from last_per_id_type_filter
QUALIFY max(type_rank) over (partition by id) = type_rank
ORDER BY 1,2;
并且可以通过使用类型秩的紧凑形式来删除:type_rank
QUALIFY max(iff(type = 'A', NVL2(end_,4,2), NVL2(end_,3,1))) over (partition by id) = iff(type = 'A', NVL2(end_,4,2), NVL2(end_,3,1))
或者通过在子选择(或 CTE)上添加排除:
select * exclude (type_rank) from (
select *
,iff(type = 'A', NVL2(end_,4,2), NVL2(end_,3,1)) as type_rank
from last_per_id_type_filter
QUALIFY max(type_rank) over (partition by id) = type_rank
)
评论