是否有在启用了 RLS 策略的表中搜索文本列的解决方法-解网

问：

令我惊讶的是，我注意到简单的文本搜索 API 不利用作为 RLS 策略一部分的表的索引。例如，mytable 有一个策略，我正在使用一个名为 text_vector 的列来搜索：

SELECT inst_id, text_vector, ts_rank(text_vector, q) AS rank, q
  FROM mytable, plainto_tsquery('apple & orange') q
 WHERE text_vector @@ q

我猜，这与它们不防漏有关。是否有任何一种在不更改系统表的情况下的解决方法（例如，我在 RDS 中无法做到这一点）？

为了重现，我执行了以下操作：

CREATE TABLE mytable (inst_id INTEGER, text_content TEXT);

ALTER TABLE mytable 
ADD COLUMN text_vector TSVECTOR 
GENERATED ALWAYS AS (to_tsvector('english', text_content)) STORED;

CREATE INDEX idx_text_vector ON mytable USING GIN (text_vector);

INSERT INTO mytable (inst_id, text_content) VALUES (1, 'An apple a day is good for you.');
INSERT INTO mytable (inst_id, text_content) VALUES (2, 'My favorite color is orange.');
INSERT INTO mytable (inst_id, text_content) VALUES (3, 'You can have an orange or apple, not both.');

（又插入了 10K 行......

EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT inst_id, text_vector, ts_rank(text_vector, q) AS rank, q
FROM mytable, plainto_tsquery('apple & orange') q
WHERE text_vector @@ q

Nested Loop  (cost=17.03..264.54 rows=100 width=159) (actual time=0.037..0.038 rows=1 loops=1)
  Output: mytable.inst_id, mytable.text_vector, ts_rank(mytable.text_vector, q.q), q.q
  Buffers: shared hit=8
  ->  Function Scan on pg_catalog.plainto_tsquery q  (cost=0.25..0.26 rows=1 width=32) (actual time=0.019..0.020 rows=1 loops=1)
        Output: q.q
        Function Call: plainto_tsquery('apple & orange'::text)
  ->  Bitmap Heap Scan on myschema.mytable  (cost=16.78..263.03 rows=100 width=123) (actual time=0.013..0.013 rows=1 loops=1)
        Output: mytable.inst_id, mytable.text_content, mytable.text_vector
        Recheck Cond: (mytable.text_vector @@ q.q)
        Heap Blocks: exact=1
        Buffers: shared hit=8
        ->  Bitmap Index Scan on idx_text_vector  (cost=0.00..16.75 rows=100 width=0) (actual time=0.010..0.010 rows=1 loops=1)
              Index Cond: (mytable.text_vector @@ q.q)
              Buffers: shared hit=7
Planning:
  Buffers: shared hit=1
Planning Time: 0.065 ms
Execution Time: 0.065 ms

（在mytable上创建了一个调用函数has_access（）的策略）

CREATE OR REPLACE FUNCTION has_access(p_inst_id BIGINT) RETURNS BOOLEAN
LANGUAGE 'plpgsql' SECURITY DEFINER
AS $has_access$
DECLARE
  v_has_access BOOLEAN;
BEGIN
  v_has_access := (SELECT TRUE WHERE p_inst_id < 4);
  RETURN v_has_access;
END;
$has_access$;

CREATE POLICY pol_has_access 
ON mytable 
FOR SELECT TO PUBLIC USING (has_access(inst_id));
ALTER TABLE mytable ENABLE ROW LEVEL SECURITY;

（以非所有者身份登录系统，在 mytable 上具有 SELECT 权限）

EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT inst_id, text_vector, ts_rank(text_vector, q) AS rank, q
FROM mytable, plainto_tsquery('apple & orange') q
WHERE text_vector @@ q

Nested Loop  (cost=0.25..5816.14 rows=33 width=159) (actual time=0.053..39.206 rows=1 loops=1)
  Output: mytable.inst_id, mytable.text_vector, ts_rank(mytable.text_vector, q.q), q.q
  Join Filter: (mytable.text_vector @@ q.q)
  Rows Removed by Join Filter: 2
  Buffers: shared hit=479
  ->  Function Scan on pg_catalog.plainto_tsquery q  (cost=0.25..0.26 rows=1 width=32) (actual time=0.018..0.020 rows=1 loops=1)
        Output: q.q
        Function Call: plainto_tsquery('apple & orange'::text)
  ->  Seq Scan on myschema.mytable  (cost=0.00..5732.41 rows=6671 width=123) (actual time=0.026..39.181 rows=3 loops=1)
        Output: mytable.inst_id, mytable.text_content, mytable.text_vector
        Filter: has_access((mytable.inst_id)::bigint)
        Rows Removed by Filter: 20010
        Buffers: shared hit=479
Planning Time: 0.048 ms
Execution Time: 39.220 ms

请注意，索引被用作所有者，但当我以非所有者身份执行相同的查询时，RLS 策略会阻止使用该索引。

PostgreSQL 全文搜索行级安全性

是的，那一定是由 RLS 引起的。PostgreSQL 必须防止在行级安全策略施加的条件之前调用非函数。现在操作人员不防漏。因此，即使同一索引扫描（例如，多列 GiST 索引）可以满足 RLS 策略，该运算符也永远不会在索引扫描条件下使用。LEAKPROOF@@

唯一的补救措施是将用户从该表上的 RLS 中免除。对不起，安全不是免费的。

您可能想知道为什么像这样无害的操作员不能防漏。PostgreSQL在这里采用了非常严格的策略：只有当你能证明你提供的任何可能的参数都不会触发一个错误，让你猜测有关其他参数的任何信息时，才会标记一个函数。例如，for 不防漏，因为溢出错误会泄漏有关另一个参数的信息。@@LEAKPROOF+integer

谢谢！我查了一下IMMUTABLE。它表示，它表明该函数“不进行数据库查找或以其他方式使用其参数列表中未直接存在的信息”。在我们的实际用例中（不是此处的简化用例），它正在执行数据库查找以确定用户是否有权访问数据库行。因此，它不可能是不可变的，除非我误解了它的用法。此外，我遇到的问题是使用非 LEAKPROOF 方法查询受 RLS 保护的表（在此示例中为 @@）。如果我遗漏了什么，请告诉我;我还在学习......

上一个：具有用户-租户灵活性的多租户系统中 JWT 管理的最佳实践？

下一个：如何为 Web 应用程序设置 Postgres RLS

是否有在启用了 RLS 策略的表中搜索文本列的解决方法

Is there a workaround for searching text columns in RLS Policy-enabled tables

评论

评论

评论