Postgres 忽略了我在数组（字符串）列上的 GIN 索引-解网

问：

我在 Students 表上有一个数组列（字符串）。此列可能（通常）包含数百个元素。

对于我的一个查询，我使用运算符将另一个数组（也可能有数百个元素）与该数组进行比较，以查看是否有任何重叠。它看起来像这样：&&

SELECT students.*
FROM students
WHERE (students.read_allow_sids && ARRAY['long', 'list', 'of', 'strings'])
AND NOT (students.read_deny_sids && ARRAY['long', 'list', 'of', 'strings'])

在我的测试数据库中，students 表大约有 10k 条记录。但是，我的查询大约需要 4.5 秒才能完成顺序扫描（太长了！

为了提高性能，我在数组中添加了一个 GIN 索引（在 Rails 迁移中）：

add_index :students, :read_allow_sids, using: 'gin'
add_index :students, :read_deny_sids, using: 'gin'

我注意到使用 EXPLAIN 时，我的新索引被忽略了，转而支持顺序扫描。我暂时禁用了它来检查索引的性能。瞧，一旦查询开始使用索引，它就从 4.5 秒 -> 200 毫秒（大约快 20 倍！SET enable_seqscan TO OFF;

编辑：这是查询的 EXPLAIN（ANALYZE， VERBOSE， BUFFERS）的输出（开启与关闭 seq 扫描）

开启后：

Seq Scan on public.students (cost=0.00..970.57 rows=786 width=522) (actual time=3.429..4319.502 rows=1698 loops=1)
  Output: id, classroom_id, created_at, updated_at, user_id, read_allow_sids, read_deny_sids, write_allow_sids, write_deny_sids
  Filter: ((students.read_allow_sids && '{long, list, of, strings}'::character varying[]) AND (NOT (students.read_deny_sids && '{long, list, of, strings}'::character varying[])))
  Rows Removed by Filter: 9554
  Buffers: shared hit=803
Planning Time: 20.932 ms
Execution Time: 4320.698 ms

关闭时：

Bitmap Heap Scan on public.students  (cost=1734.59..2634.39 rows=786 width=522) (actual time=127.219..137.151 rows=1698 loops=1)
  Output: id, classroom_id, created_at, updated_at, user_id, read_allow_sids, read_deny_sids, write_allow_sids, write_deny_sids
  Recheck Cond: (students.read_allow_sids && '{long, list, of, strings}'::character varying[])
  Filter: (NOT (students.read_deny_sids && '{long, list, of, strings}'::character varying[]))
  Heap Blocks: exact=549
  Buffers: shared hit=1390
  ->  Bitmap Index Scan on index_students_on_read_allow_sids  (cost=0.00..1734.40 rows=6453 width=0) (actual time=126.842..126.843 rows=1698 loops=1)
        Index Cond: (students.read_allow_sids && '{long, list, of, strings}'::character varying[])
        Buffers: shared hit=841
Planning Time: 21.460 ms
Execution Time: 137.890 ms

我希望 postgres 查询规划器在规划查询时选择最佳选择（即 GIN 索引），但事实并非如此。查询一开始非常简单......目前尚不清楚我将如何更改它以使查询计划器更喜欢索引。我已经熟悉 Postgres 没有“提示”的事实。我也已经尝试在 students 表上运行 ANALYZE 来更新其统计信息，但查询计划器仍然选择 seq 扫描。

所以最大的问题是：

如果索引速度如此之快，为什么查询规划器更喜欢 seq 扫描？
更重要的是，我们如何才能让查询规划器使用在生产环境中可行的方法更喜欢索引（即正确计划）？

SQL Ruby-on-Rails PostgreSQL 性能索引

感谢您的建议。遗憾的是，更新这些值不起作用。我确实尝试过在各个方向上更改它们，看看它如何改变成本。我能让它支持索引的唯一方法是设置为（与 at ）。根据 postgres 文档，将值设置为低于没有任何意义。改变对成本没有显着影响。这些结果会打开你脑海中的灯泡吗？我发现它们令人惊讶/困惑random_page_cost0.5seq_page_cost1.0randomseqcpu_index_tuple_cost

0赞 Frank Heikens 8/14/2023

@some_guy我必须查看查询计划并查看成本如何变化。计划者使用成本最低的计划。您可以执行的其他操作是更改此表的统计信息数。规划者确实希望从索引扫描中获得 6453 行，而只能找到 1698 行。您可以使用 ALTER TABLE ...设置。。。统计学？或者您使用 CREATE STATISTICS 为这两列的组合创建一些自定义统计信息。

0赞 Frank Heikens 8/15/2023

@some_guy What is the maximum value you used for cpu_tuple_cost? And what happens to your query plans when you crank it up even further? Check the costs for both plans, that determines what plan will be used

0赞 some_guy 8/15/2023

SETTINGS: seq_page_cost = 1.0; random_page_cost = 0.5; seq_page_cost cpu_tuple_cost = 0.1; cpu_index_tuple_cost = 0.001. --- The EXPLAIN:--- Bitmap Heap Scan on public.enrollments (cost=469.72..1963.41 rows=801 width=541) (actual time=2152.797..2164.859 rows=1698 loops=1) Heap Blocks: exact=569 Buffers: shared hit=1460 -> Bitmap Index Scan on index_enrollments_on_read_allow_sids (cost=0.00..469.52 rows=6578 width=0) (actual time=2152.448..2152.450 rows=1769 loops=1) Buffers: shared hit=891 Planning Time: 27.297 ms Execution Time: 2165.705 ms

0赞 some_guy 8/15/2023

If I do same settings EXCEPT change cpu_index_tuple_cost = 0.0025 --- EXPLAIN --- Seq Scan on public.enrollments (cost=0.00..1984.46 rows=801 width=541) (actual time=2.934..4396.847 rows=1698 loops=1) Rows Removed by Filter: 9554 Buffers: shared hit=803 Planning Time: 27.232 ms Execution Time: 4398.419 ms

上一个：根据关联模型的属性的最大值返回所有实例

下一个：如何查询自引用评论表以查找带有回复的评论，按最新回复排序？

Postgres 忽略了我在数组（字符串）列上的 GIN 索引

Postgres is ignoring my GIN index on an array (strings) column

评论

评论