Postgresql IN 与 ANY 运算符与子查询的性能差异

Postgresql IN vs ANY operator performance differences with subquery

提问人:Tiancheng Liu 提问时间:7/7/2019 最后编辑:Bouke VersteeghTiancheng Liu 更新时间:9/9/2021 访问量:11960

问:

我有两个查询做同样的事情。 1

    SELECT *
    FROM "Products_product"
    WHERE ("Products_product"."id" IN
           (SELECT U0."product_id"
            FROM "Products_purchase" U0
            WHERE (U0."state" = 1
                   AND U0."user_id" = 5))
           AND "Products_product"."state" IN (1,
                                              6,
                                              3)
           AND UPPER("Products_product"."title" :: TEXT) LIKE UPPER('%toronto%'))
    ORDER BY "Products_product"."title_index" ASC
    LIMIT 10;

2

SELECT *
FROM "Products_product"
WHERE ("Products_product"."id" = ANY (ARRAY(
       (SELECT U0."product_id"
        FROM "Products_purchase" U0
        WHERE (U0."state" = 1
               AND U0."user_id" = 5))))
       AND "Products_product"."state" IN (1,
                                          6,
                                          3)
       AND UPPER("Products_product"."title" :: TEXT) LIKE UPPER('%toronto%'))
ORDER BY "Products_product"."title_index" ASC
LIMIT 10;

唯一的区别是第一次用于子查询,而第二次用于。但第二个明显比第一个快 10 倍左右。 我运行了解释,得到了以下两个结果: 1IN=ANY(ARRAY())

 Limit  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=3414.185..3414.190 rows=10 loops=1)
   ->  Sort  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=3414.184..3414.185 rows=10 loops=1)
         Sort Key: "Products_product".title
         Sort Method: quicksort  Memory: 57kB
         ->  Nested Loop Semi Join  (cost=92.66..5309.91 rows=1 width=1906) (actual time=3385.153..3414.099 rows=16 loops=1)
               ->  Bitmap Heap Scan on "Products_product"  (cost=13.85..256.32 rows=61 width=1906) (actual time=3381.327..3384.430 rows=63 loops=1)
                     Recheck Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
                     Rows Removed by Index Recheck: 1
                     Heap Blocks: exact=64
                     ->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=3381.001..3381.001 rows=64 loops=1)
                           Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
               ->  Bitmap Heap Scan on "Products_purchase" u0  (cost=78.82..82.84 rows=1 width=4) (actual time=0.467..0.467 rows=0 loops=63)
                     Recheck Cond: ((product_id = "Products_product".id) AND (user_id = 5))
                     Filter: (state = 1)
                     Heap Blocks: exact=16
                     ->  BitmapAnd  (cost=78.82..78.82 rows=1 width=0) (actual time=0.465..0.465 rows=0 loops=63)
                           ->  Bitmap Index Scan on "Products_purchase_product_id"  (cost=0.00..5.06 rows=84 width=0) (actual time=0.265..0.265 rows=30 loops=63)
                                 Index Cond: (product_id = "Products_product".id)
                           ->  Bitmap Index Scan on "Products_purchase_user_id"  (cost=0.00..72.57 rows=3752 width=0) (actual time=0.242..0.242 rows=3335 loops=51)
                                 Index Cond: (user_id = 5)
 Planning time: 7.540 ms
 Execution time: 3414.356 ms
(22 rows)

2

Limit  (cost=7378.07..7378.07 rows=1 width=1906) (actual time=116.559..116.562 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using "Products_purchase_user_id" on "Products_purchase" u0  (cost=0.43..7329.83 rows=3752 width=4) (actual time=0.021..15.535 rows=3335 loops=1)
           Index Cond: (user_id = 5)
           Filter: (state = 1)
   ->  Sort  (cost=48.24..48.25 rows=1 width=1906) (actual time=116.558..116.559 rows=10 loops=1)
         Sort Key: "Products_product".title
         Sort Method: quicksort  Memory: 57kB
         ->  Bitmap Heap Scan on "Products_product"  (cost=44.20..48.23 rows=1 width=1906) (actual time=116.202..116.536 rows=16 loops=1)
               Recheck Cond: ((id = ANY ($0)) AND (upper((title)::text) ~~ '%TORONTO%'::text))
               Filter: (state = ANY ('{1,6,3}'::integer[]))
               Rows Removed by Filter: 2
               Heap Blocks: exact=18
               ->  Bitmap Index Scan on "Products_product_id_upper_idx1"  (cost=0.00..44.20 rows=1 width=0) (actual time=116.103..116.103 rows=18 loops=1)
                     Index Cond: ((id = ANY ($0)) AND (upper((title)::text) ~~ '%TORONTO%'::text))
 Planning time: 1.054 ms
 Execution time: 116.663 ms
(17 rows)

从 doc 来看,或 之间没有实质性区别。但是为什么我得到如此不同的结果。无论如何,它都有优势吗?INANYANYIN

更新: 有人指出这个问题可能与PostgreSQL中的IN与NOY运算符重复。 它们是同一个问题,但那个问题的答案并没有解决我的问题,因为除了这个答案之外,我还有一个更详细的案例。

但是每个的第二个变体并不等同于另一个变体。这 ANY 构造的第二个变体接受一个数组(必须是实际的 array type),而 IN 的第二个变体采用逗号分隔 值列表。这导致传递值时有不同的限制 在特殊情况下,还可能导致不同的查询计划:

https://dba.stackexchange.com/a/125500/3684

将多个值集或值数组传递给函数

在我的问题中,这两个问题都不是这种情况。我只是将一个数组作为子查询传递。我的情况与第一个 URL 正好相反。我的索引只用在 中,而不用在 中。所以基本上,这个答案并没有解决我的问题。ANYIN

UPDATE2: 我更新了索引:.我可以确认两个查询的情况是相等的,这意味着索引存在在那里,但第一个没有使用它。CREATE INDEX ON "Products_product" USING GIST (state, id, upper((title) :: TEXT) gist_trgm_ops);

UPDATE3: 我只是在代码中删除。但结果是一样的。ARRAY

 explain analyze SELECT *
FROM "Products_product"
WHERE ("Products_product"."id" = ANY(
       (SELECT U0."product_id"
        FROM "Products_purchase" U0
        WHERE (U0."state" = 1
               AND U0."user_id" = 5)))
       AND "Products_product"."state" IN (1,
                                          6,
                                          3)
       AND UPPER("Products_product"."title" :: TEXT) LIKE UPPER('%toronto%'))
ORDER BY "Products_product"."title" ASC
LIMIT 10;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=228.980..228.983 rows=10 loops=1)
   ->  Sort  (cost=5309.92..5309.93 rows=1 width=1906) (actual time=228.979..228.980 rows=10 loops=1)
         Sort Key: "Products_product".title
         Sort Method: quicksort  Memory: 57kB
         ->  Nested Loop Semi Join  (cost=92.66..5309.91 rows=1 width=1906) (actual time=216.392..228.913 rows=16 loops=1)
               ->  Bitmap Heap Scan on "Products_product"  (cost=13.85..256.32 rows=61 width=1906) (actual time=214.332..215.260 rows=63 loops=1)
                     Recheck Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
                     Rows Removed by Index Recheck: 1
                     Heap Blocks: exact=64
                     ->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=214.296..214.296 rows=64 loops=1)
                           Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))
               ->  Bitmap Heap Scan on "Products_purchase" u0  (cost=78.82..82.84 rows=1 width=4) (actual time=0.215..0.215 rows=0 loops=63)
                     Recheck Cond: ((product_id = "Products_product".id) AND (user_id = 5))
                     Filter: (state = 1)
                     Heap Blocks: exact=16
                     ->  BitmapAnd  (cost=78.82..78.82 rows=1 width=0) (actual time=0.212..0.212 rows=0 loops=63)
                           ->  Bitmap Index Scan on "Products_purchase_product_id"  (cost=0.00..5.06 rows=84 width=0) (actual time=0.017..0.017 rows=30 loops=63)
                                 Index Cond: (product_id = "Products_product".id)
                           ->  Bitmap Index Scan on "Products_purchase_user_id"  (cost=0.00..72.57 rows=3752 width=0) (actual time=0.239..0.239 rows=3335 loops=51)
                                 Index Cond: (user_id = 5)
 Planning time: 5.083 ms
 Execution time: 229.904 ms
(22 rows)

我认为 or 不是这种情况,这只是 和 之间的区别ANYANY(ARRAY())INANY

数组 postgresql 设置 query-planner

评论

2赞 JGH 7/7/2019
PostgreSQL 中 IN 与 ANY 运算符的可能重复
1赞 Tiancheng Liu 7/8/2019
@JGH我更新问题,我认为它不是重复的。
1赞 7/8/2019
优化器会将具有常量值的 IN 子句重写为 any 子句,例如 将被重写,因为通常比 - 至少对于常量值更快。令我惊讶的是,IN 和子查询也是如此。你可能想在Postgres性能邮件列表上问这个问题,因为核心开发人员已经阅读了这篇文章,并且可能会对此有更多的了解where x in (1,2,3)where x = any (array[1,2,3])= ANY()IN
0赞 7/8/2019
不相关,但是:你真的应该避免那些可怕的引用标识符。从长远来看,他们的麻烦比他们值得的要多得多。

答:

2赞 Laurenz Albe 7/8/2019 #1

不同的计划不是由 vs. 引起的,而是由第二个查询中的子选择周围的附加项引起的。没有这个,计划是相同的。IN= ANYARRAY()

不同之处在于,在缓慢的执行中,索引扫描需要很长时间,而在编辑的(完全相同的)计划中,相同的扫描速度很快:

慢:

->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=3381.001..3381.001 rows=64 loops=1)
      Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))

快:

->  Bitmap Index Scan on "Products_product_state_id_upper_idx"  (cost=0.00..13.83 rows=61 width=0) (actual time=214.296..214.296 rows=64 loops=1)
      Index Cond: ((state = ANY ('{1,6,3}'::integer[])) AND (upper((title)::text) ~~ '%TORONTO%'::text))

同样有趣的是,在慢速计划中,生成索引扫描的第一行需要 3 秒。

无论问题是什么,它都是暂时的。唯一想到的怀疑是被杀死的索引元组:大规模删除留下了许多索引元组,这些元组指向死堆元组,这些元组只需要在第一次扫描,因为之后它们被标记为死元组

你有没有大规模删除?

评论

0赞 Tiancheng Liu 7/8/2019
Products_product_state_id_upper_idx 是 “Products_product”.title 上的卦索引。但它实际上是一个包含 state、title 和 id 的 3 列索引。这两个查询在完全相同的情况下运行。即使我清理缓存,使用 IN 也总是比 ANY 慢 10 倍。
0赞 Laurenz Albe 7/8/2019
我再说一遍,正如我在答案中所说,查询并不相同。如果您编辑问题并添加陈述,事情可能会变得更加清晰。CREATE INDEX
0赞 7/8/2019
嗯,我认为这两种说法是一样的。这不和 和 的重写一样吗?where x in (1,2,3)where x = any(array[1,2,3])
0赞 Laurenz Albe 7/8/2019
是的,但是 (转换为 ) 与 不同。WHERE x IN (SELECT something)WHERE x = ANY (SELECT something)WHERE x = ANY (array(SELECT something))
0赞 Laurenz Albe 7/9/2019
我已经更新了答案。因此,与您所说的相反,删除导致计划变得相同的原因,并且必须在其他地方寻找索引扫描缓慢的原因。array()