在特定列显示为非 NULL 值(每个 ID)的表中,首选一条记录而不是另一条记录

Prefer one record over anotherin a table where a specific column shows as not NULL for value (per ID)

提问人:Anonymous 提问时间:7/27/2023 最后编辑:Simeon PilgrimAnonymous 更新时间:7/28/2023 访问量:41

问:

使用表格MY_TABLE如下所示:

编号 类型 开始_ 结束_ 加载 SOURCE_DATE
1 一个 5 2022-03-03 21:57:38.494 '2022-03-03'
1 B 7 2023-07-20 22:55:38.494 '2023-07-20'
1 一个 5 2023-07-20 22:57:38.494 '2023-07-20'
1 B 7 2023-07-20 22:59:38.494 '2023-07-20'
4 一个 20 2023-06-30 18:59:38.494 '2023-06-30'
4 一个 20 17 2023-06-30 19:43:38.494 '2023-06-30'
5 一个 32 2023-05-30 04:43:36.494 '2023-05-30'
5 B 48 2023-05-30 05:48:32.494 '2023-05-30'
7 一个 32 2023-04-22 08:33:36.494 '2023-04-22'
7 B 10 2023-04-22 09:58:32.434 '2023-04-22'

我已经通过执行操作获得了每个 TYPE 的最新更改

SELECT *
FROM MY_TABLE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE ORDER BY SOURCE_DATE DESC) = 1;

这会产生一个类似

编号 类型 开始_ 结束_ 加载 SOURCE_DATE
1 一个 5 2023-07-20 22:57:38.494 '2023-07-20'
1 B 7 2023-07-20 22:59:38.494 '2023-07-20'
4 一个 20 17 2023-06-30 19:43:38.494 '2023-06-30'
5 一个 32 2023-05-30 04:43:36.494 '2023-05-30'
5 B 48 2023-05-30 05:48:32.494 '2023-05-30'
7 一个 32 2023-04-22 08:33:36.494 '2023-04-22'
7 B 10 2023-04-22 09:58:32.434 '2023-04-22'

我正在尝试获得一个结果表,如果 ID 的记录中存在 TYPE A,并且 TYPE A 的 END_ 不是 NULL,则首选 TYPE A 的 ID END_,否则保留 TYPE B 的END_,或者如果 ID 只有一条记录,则保留END_。我本质上是想得到一张像这样的桌子

编号 类型 开始_ 结束_ SOURCE_DATE
1 B 7 '2023-07-20'
4 一个 20 17 '2023-06-30'
5 一个 32 '2023-05-30'
7 一个 32 '2023-04-22'

(LOADED 列未在我的最终结果/输出表中显示)有没有办法做到这一点?我试过:

WITH GET_LATEST_CHANGES AS (SELECT *
FROM MY_TABLE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE ORDER BY SOURCE_DATE DESC) = 1
)
SELECT ID, TYPE, START_, END_, SOURCE_DATE,
     CASE 
        WHEN TYPE <> LAG(TYPE, 1) OVER (PARTITION BY ID ORDER BY SOURCE_DATE) AND TYPE = 'A' AND END_ IS NULL THEN COALESCE(LAG(END_, 1) OVER (PARTITION BY ID ORDER BY SOURCE_DATE), END_)
        WHEN TYPE <> LAG(TYPE, 1) OVER (PARTITION BY ID ORDER BY SOURCE_DATE) AND TYPE = 'A' AND END_ IS NOT NULL THEN COALESCE(END_, LAG(END_, 1) OVER (PARTITION BY ID ORDER BY SOURCE_DATE))
        ELSE END_
     END AS END_2
FROM GET_LATEST_CHANGES;

但是,这种方法似乎有点复杂,而且我也遇到了像 ID 5 这样的情况,其中 TYPE A 和 B 的 END_ 值都被填充了,在这种情况下如何首选 TYPE A 记录而不是 B。谢谢!

SQL Snowflake-Cloud-Data-Platform 滞后 合并

评论

0赞 Reinderien 7/27/2023
实际的表名是什么?当然不是my_table。您是否有一个具有唯一 ID 值的辅助表?
0赞 Anonymous 7/27/2023
是的,此示例的表名是MY_TABLE,我不确定您所说的辅助表是什么意思,我只有这张表
0赞 Simeon Pilgrim 7/27/2023
啊,我觉得你之前的问题被过度压缩了。

答:

0赞 Reinderien 7/27/2023 #1

这几乎可以生成所需的输出,但使用起始列的 null 值效果更好(如果需要,可以调整逻辑)。coalesce

create table my_table(
    id int not null,
    type char(1) not null,
    start_ int,
    end_ int,
    loaded timestamp not null,
    source_date date not null
);

insert into my_table(id, type, start_, end_, loaded, source_date) values
(1, 'A', NULL,    5, '2022-03-03 21:57:38.494', '2022-03-03'),
(1, 'B', NULL,    7, '2023-07-20 22:55:38.494', '2023-07-20'),
(1, 'A',    5, NULL, '2023-07-20 22:57:38.494', '2023-07-20'),
(1, 'B', NULL,    7, '2023-07-20 22:59:38.494', '2023-07-20'),
(4, 'A', NULL,   20, '2023-06-30 18:59:38.494', '2023-06-30'),
(4, 'A',   20,   17, '2023-06-30 19:43:38.494', '2023-06-30'),
(5, 'A', NULL,   32, '2023-05-30 04:43:36.494', '2023-05-30'),
(5, 'B', NULL,   48, '2023-05-30 05:48:32.494', '2023-05-30'),
(7, 'A', NULL,   32, '2023-04-22 08:33:36.494', '2023-04-22'),
(7, 'B',   10, NULL, '2023-04-22 09:58:32.434', '2023-04-22');

with grouped as (
    select distinct on (id, type) id, type, start_, end_, source_date
    from my_table
    order by id, type, source_date desc
)
select coalesce(type_a.id, type_b.id) as id,
    case when type_a.end_ is not null then type_a.type else type_b.type end as type,
    coalesce(type_a.start_, type_b.start_) as start,
    coalesce(type_a.end_, type_b.end_) as end,
    case when type_a.end_ is not null then type_a.source_date else type_b.source_date end as source_date
from grouped as type_a
full outer join grouped as type_b on type_a.id = type_b.id
                                 and type_a.type <> type_b.type
where (type_a.type is null or type_a.type = 'A')
  and (type_b.type is null or type_b.type = 'B');
id  type    start   end     source_date
1   B       5   7   2023-07-20
4   A       20  17  2023-06-30
5   A   null    32  2023-05-30
7   A       10  32  2023-04-22
0赞 Simeon Pilgrim 7/28/2023 #2

首先要注意的是,您的初始 QUALIFY ROW_NUMBER不稳定,您需要添加另一个术语来对具有 some 的项目进行排序,正如您所看到的那样,您的 ID 4 具有相同的值,因此您将获得当前代码的随机结果。我也选择你使用 LOADED DESC:SORUCE DATE

select *
from my_table
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE 
        ORDER BY SOURCE_DATE DESC, loaded DESC) = 1

如果我们将其推送到 CTE(或者可以使用子选择),我们要对上述过滤器输出的四种类型的行进行排名:

select *
    ,case
        when type = 'A' and end_ is not null then 4
        when type = 'B' and end_ is not null then 3
        when type = 'A' and end_ is null then 2
        when type = 'B' and end_ is null then 1
    end as type_rank
from last_per_id_type_filter

这给了:

enter image description here

我们可以使这种情况更简单,因为最后一个就是另一个,而第三个必须是空的,所以可以写成:

    ,case
        when type = 'A' and end_ is not null then 4
        when type = 'B' and end_ is not null then 3
        when type = 'A' then 2
        else 1
    end as type_rank

此外,如果您喜欢非常紧凑的 SQL,则可以将其压缩为一个 IFF 和两个 NVL2

,iff(type = 'A', NVL2(end_,4,2), NVL2(end_,3,1)) as type_rank

但同样,更明确并不会真正受到伤害,并且有助于其他读者了解意图。

所以现在我们可以添加另一个 QUALIFY,这次使用窗口框架 MAX:

WITH last_per_id_type_filter as (
    select *
    from my_table
    QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE 
        ORDER BY SOURCE_DATE DESC, loaded DESC) = 1
)
select *
    ,case
        when type = 'A' and end_ is not null then 4
        when type = 'B' and end_ is not null then 3
        when type = 'A' and end_ is null then 2
        when type = 'B' and end_ is null then 1
    end as type_rank
from last_per_id_type_filter
QUALIFY max(type_rank) over (partition by id) = type_rank
ORDER BY 1,2;

并且可以通过使用类型秩的紧凑形式来删除:type_rank

QUALIFY max(iff(type = 'A', NVL2(end_,4,2), NVL2(end_,3,1))) over (partition by id) = iff(type = 'A', NVL2(end_,4,2), NVL2(end_,3,1))

或者通过在子选择(或 CTE)上添加排除:

select * exclude (type_rank) from (
    select *
       ,iff(type = 'A', NVL2(end_,4,2), NVL2(end_,3,1)) as type_rank
    from last_per_id_type_filter
    QUALIFY max(type_rank) over (partition by id) = type_rank
)