无法聚合包含相关子查询的查询的结果而不会超时 - mysql 5.7

Unable to Aggregate Results From Query Containing Correlated SubQuery without Timeout - mysql 5.7

提问人:sweetld540 提问时间:11/17/2023 最后编辑:sweetld540 更新时间:11/17/2023 访问量:23

问:

我下面的初始查询在 9 秒左右运行得相当快,并返回大约 250K 条记录。我想按 ID 计数month_year对结果进行分组,但是,任何按日期对结果进行分组或将查询包装为派生表并再次查询的尝试都会导致大约 6 分钟后超时。我认为相关子查询可能正在重新运行。我无法在 mysql 5.7 中使用 CTE,也无法使用初始结果创建临时表。

初始查询 - 运行良好

select
        valid_sellers.uid
        ,(select min(si.created) as 'temp_name' 
            from sell_item si where si.uid = valid_sellers.uid and si.status in ("For Sale", "Sold")  
            having temp_name >= "2023-01-01" and temp_name <= "2023-12-31") as 'first_sales_activity'
from 
    (
        select
            u.uid
        from
            user u
        where
            u.uid in (
                    select
                        t1.uid
                     from
                        (select distinct si.uid from sell_item si) t1)
            and
            u.since < "2023-04-24"
            and
            u.can_sell = "Y"
            and
            u.last_alive_time >= "2023-01-01"
    ) valid_sellers 

尝试聚合导致超时

select
        count(valid_sellers.uid)
        ,DATE_FORMAT((select min(si.created) as 'temp_name' 
            from sell_item si where si.uid = valid_sellers.uid and si.status in ("For Sale", "Sold")  
            having temp_name >= "2023-01-01" and temp_name <= "2023-12-31"), '%Y-%m') as 'year_month'
from 
    (
        select
            u.uid
        from
            user u
        where
            u.uid in (
                    select
                        t1.uid
                     from
                        (select distinct si.uid from sell_item si) t1)
            and
            u.since < "2023-04-24"
            and
            u.can_sell = "Y"
            and
            u.last_alive_time >= "2023-01-01"
    ) valid_sellers

group by
    2
    

第二次尝试聚合导致超时

select
    count(first_listings.uid)
    ,DATE_FORMAT(first_listings.first_sales_activity, '%Y-%m') as 'first_listing'
    
from (
        select
                valid_sellers.uid
                ,(select min(si.created) as 'temp_name' 
                    from sell_item si where si.uid = valid_sellers.uid and si.status in ("For Sale", "Sold")  
                    having temp_name >= "2023-01-01" and temp_name <= "2023-12-31") as 'first_sales_activity'
        from 
            (
                select
                    u.uid
                from
                    user u
                where
                    u.uid in (
                            select
                                t1.uid
                             from
                                (select distinct si.uid from sell_item si) t1)
                    and
                    u.since < "2023-04-24"
                    and
                    u.can_sell = "Y"
                    and
                    u.last_alive_time >= "2023-01-01"
            ) valid_sellers
    
    ) first_listings 

group by
    2
mysql correlated-subquery 派生表

评论

0赞 nbk 11/17/2023
增加超时
0赞 sweetld540 11/17/2023
@nbk很公平。如果我的运行时间从 9 秒到 +6 分钟聚合在 250K 结果集上,我似乎可以做得更好。
0赞 nbk 11/17/2023
运行 EXPLAIN SELECT ...。并查看查询是否使用了正确的索引,如果没有,请参阅如何优化查询
0赞 Barmar 11/17/2023
避免使用相关的子查询,而是使用带有子查询的 JOIN。

答:

0赞 Barmar 11/17/2023 #1

我不认为MySQL很好地实现了相关的子查询。把它作为一个代替。JOIN

此外,它在优化方面真的很糟糕——这几乎总是一个连接。WHERE uid IN (subquery)

当您加入子查询时,它将过滤掉该子查询未返回的用户,因此您甚至不需要检查。WHERE IN

WITH d AS (
    SELECT uid, MIN(created) AS first_sales_activity
    FROM sell_item
    WHERE status IN ("For Sale", "Sold") AND created BETWEEN '2023-01-01' AND '2023-12-31'
    GROUP BY uid
)

SELECT u.uid, DATE_FORMAT(d.first_sales_activity, '%Y-%m') AS first_sales_activity
FROM users AS u
JOIN d ON u.uid = d.uid
WHERE u.since < '2023-04-014' AND u.can_sell = 'Y' AND u.last_alive_time >= "2023-01-01"