使用 dbt cte 联接多个表

Joining multiple tables with dbt cte's

提问人:david backx 提问时间:11/3/2023 最后编辑:Isolateddavid backx 更新时间:11/3/2023 访问量:53

问:

所以我有这个表是 duckdb sql。

fan_id attendance_season_1
1 1
2 0
3 0
4 1

现在我每个赛季都有 15 张这样的桌子。现在我想在fan_id上合并这些内容,所以我有下表。

fan_id attendance_season_1 attendance_season_2 attendance_season_3
1 1 1 1
2 0 0 1
3 0 0 1
4 1 0 1

这是 15 个出席行。目前,我的查询如下所示。

season_attendances as (
    select distinct fan_id,
    attendance_season_1, attendance_season_2, attendance_season_3,
    attendance_season_4, attendance_season_5, attendance_season_6,
    attendance_season_7, attendance_season_8, attendance_season_9,
    attendance_season_10, attendance_season_11, attendance_season_12,
    attendance_season_13, attendance_season_14, attendance_season_15
    from join_attendances_1 inner join join_attendances_2 using (fan_id)
    inner join join_attendances_3 using (fan_id)
    inner join join_attendances_4 using (fan_id)
    inner join join_attendances_5 using (fan_id)
    inner join join_attendances_6 using (fan_id)
    inner join join_attendances_7 using (fan_id)
    inner join join_attendances_8 using (fan_id)
    inner join join_attendances_9 using (fan_id)
    inner join join_attendances_10 using (fan_id)
    inner join join_attendances_11 using (fan_id)
    inner join join_attendances_12 using (fan_id)
    inner join join_attendances_13 using (fan_id)
    inner join join_attendances_14 using (fan_id)
    inner join join_attendances_15 using (fan_id)
),

对代码的一些解释。这是来自 dbt 的 CTE。attendance_season_{n} 是统计量 0 或 1,join_attendance 是本文中显示的第一个表。我的 sql 引擎是 duckdb。查询单个表时,它会在 0.3 秒内运行,但在运行包含所有联接的 CTE 时,它需要很长时间。任何人都想如何改进它,以便它以与本文中的第二个表相同的输出快速运行。

sql dbt 鸭子数据库

评论

0赞 Isolated 11/3/2023
每个表都有相同的fan_ids吗?
0赞 david backx 11/3/2023
是的,它们都有 141750 行包含相同的fan_id
0赞 Isolated 11/3/2023
我不熟悉 DuckDB,在加入时也没有使用过上下文,但我想知道当您在两个以上的表上使用时,幕后会发生什么。我想知道如果您使用传统的连接语法,将所有连接到第 1 季,性能是否会有所不同。而且,与一个带有“季节”列的表相比,您拥有这种类型的数据结构太糟糕了。usingusing

答:

0赞 david backx 11/3/2023 #1

找到了解决方案。只需将其放入这样的子查询中即可。

season_attendance_sub as (
    select distinct fan_id,
    (select attendance_season_1
    from join_attendances_1 a
    where fan_id = a.fan_id) as 'attendance_season_1',
    (select attendance_season_2
    from join_attendances_2 a
    where fan_id = a.fan_id) as 'attendance_season_2',
    (select attendance_season_3
    from join_attendances_3 a
    where fan_id = a.fan_id) as 'attendance_season_3',
    (select attendance_season_4
    from join_attendances_4 a
    where fan_id = a.fan_id) as 'attendance_season_4',
    (select attendance_season_5
    from join_attendances_5 a
    where fan_id = a.fan_id) as 'attendance_season_5',
    (select attendance_season_6
    from join_attendances_6 a
    where fan_id = a.fan_id) as 'attendance_season_6',
    (select attendance_season_7
    from join_attendances_7 a
    where fan_id = a.fan_id) as 'attendance_season_7',
    (select attendance_season_8
    from join_attendances_8 a
    where fan_id = a.fan_id) as 'attendance_season_8',
    (select attendance_season_9
    from join_attendances_9 a
    where fan_id = a.fan_id) as 'attendance_season_9',
    (select attendance_season_10
    from join_attendances_10 a
    where fan_id = a.fan_id) as 'attendance_season_10',
    (select attendance_season_11
    from join_attendances_11 a
    where fan_id = a.fan_id) as 'attendance_season_11',
    (select attendance_season_12
    from join_attendances_12 a
    where fan_id = a.fan_id) as 'attendance_season_12',
    (select attendance_season_13
    from join_attendances_13 a
    where fan_id = a.fan_id) as 'attendance_season_13',
    (select attendance_season_14
    from join_attendances_14 a
    where fan_id = a.fan_id) as 'attendance_season_14',
    (select attendance_season_15
    from join_attendances_15 a
    where fan_id = a.fan_id) as 'attendance_season_15'
    from fans
),