PostgreSQL 的进程相互阻塞不会被检测为死锁

PostgreSQL's processes blocking each other are not detected as deadlock

提问人:xoip 提问时间:9/26/2023 更新时间:9/26/2023 访问量:47

问:

PostgreSQL 的版本:13.10

配置:deadlock_timeout:1s,log_lock_waits:关闭

我在日志中发现了一些检测到的死锁:

2023-09-22 16:54:19.425 UTC 650dc6ba.5732 497942 22322 ERROR:  deadlock detected
2023-09-22 16:54:19.425 UTC 650dc6ba.5732 497942 22322 DETAIL:  Process 22322 waits for ExclusiveLock on tuple (0,48) of relation 415480 of database 16873; blocked by process 13502.
        Process 13502 waits for ShareLock on transaction 497938; blocked by process 13482.
        Process 13482 waits for ShareLock on transaction 497932; blocked by process 21544.
        Process 21544 waits for ExclusiveLock on tuple (0,48) of relation 415480 of database 16873; blocked by process 22322.
2023-09-22 16:54:20.292 UTC 650dc538.34aa 497938 13482 ERROR:  deadlock detected
2023-09-22 16:54:20.292 UTC 650dc538.34aa 497938 13482 DETAIL:  Process 13482 waits for ShareLock on transaction 497932; blocked by process 21544.
        Process 21544 waits for ExclusiveLock on tuple (0,48) of relation 415480 of database 16873; blocked by process 13502.
        Process 13502 waits for ShareLock on transaction 497938; blocked by process 13482.

几个小时后(在检测到死锁之后),我可以看到上述日志中的两个进程在pg_locks表中提到。

根据官方文档(https://wiki.postgresql.org/wiki/Lock_Monitoring#.D0.A1ombination_of_blocked_and_blocking_activity),我可以使用以下查询查看被阻止和阻止的活动:

查询(锁)

select relation::regclass, * from pg_locks where not granted;

输出(锁)

-[ RECORD 1 ]------+--------------
relation           |
locktype           | transactionid
database           |
relation           |
page               |
tuple              |
virtualxid         |
transactionid      | 497932
classid            |
objid              |
objsubid           |
virtualtransaction | 16/1248
pid                | 13502
mode               | ShareLock
granted            | f
fastpath           | f

-[ RECORD 2 ]------+--------------
relation           |
locktype           | transactionid
database           |
relation           |
page               |
tuple              |
virtualxid         |
transactionid      | 497941
classid            |
objid              |
objsubid           |
virtualtransaction | 41/34
pid                | 21544
mode               | ShareLock
granted            | f
fastpath           | f

查询(阻塞 pids)

SELECT blocked_locks.pid     AS blocked_pid,
     blocked_activity.usename  AS blocked_user,
     blocking_locks.pid     AS blocking_pid,
     blocking_activity.usename AS blocking_user,
     blocked_activity.query    AS blocked_statement,
     blocking_activity.query   AS current_statement_in_blocking_process
FROM  pg_catalog.pg_locks         blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks         blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

输出(阻塞 pids):

-[ RECORD 1 ]-------------------------+-----------------------------------
blocked_pid                           | 13502
blocked_user                          | my_username
blocking_pid                          | 21544
blocking_user                         | my_username
blocked_statement                     | SELECT * FROM my_procedure_1()
current_statement_in_blocking_process | SELECT * FROM my_procedure_2()

-[ RECORD 2 ]-------------------------+-----------------------------------
blocked_pid                           | 21544
blocked_user                          | my_username
blocking_pid                          | 13502
blocking_user                         | my_username
blocked_statement                     | SELECT * FROM my_procedure_2()
current_statement_in_blocking_process | SELECT * FROM my_procedure_1()

Linux 控制台输出:

ps -ef | grep postgres | grep waiting

  postgres   13502     972  0 Sep22 ?        00:00:01 postgres: 127.0.0.1(43736) SELECT waiting
  postgres   21544     972  0 Sep22 ?        00:00:00 postgres: 127.0.0.1(41848) SELECT waiting
strace -p 13502
  strace: Process 13502 attached
  epoll_wait(96, ^Cstrace: Process 13502 detached
   <detached ...>

strace -p 21544
  strace: Process 21544 attached
  epoll_wait(77, ^Cstrace: Process 21544 detached
   <detached ...>

I understand that I have to modify the code in order to avoid the deadlocks. I will do that.

I need answers to the following questions:

  1. 13502 is blocked by 21544 and 21544 is blocked by 13502. Why PostgreSQL didn't detect this situation as a deadlock ? The fact that I'm using procedure calls could mislead PostgreSQL ?
  2. The lock mode used is ShareLock (as mentioned in the locks output). The official documentation (https://www.postgresql.org/docs/13/explicit-locking.html) says that ShareLock can be "Acquired by CREATE INDEX (without CONCURRENTLY)." I'm not using "CREATE INDEX" or something related to indexes in my queries. The question is, why is ShareLock used for the locking system in my situation ?
PostgreSQL 锁定 死锁 PostgreSQL-13

评论

1赞 Laurenz Albe 9/26/2023
I think your query for blocking PIDs is wrong. You should figure out which sessions hold the transactions that the processes are waiting for: .SELECT transactionid, pid FROM pg_locks WHERE transactionid IN (497932, 497941) AND mode = 'ExclusiveLock';
0赞 xoip 9/26/2023
SELECT transactionid, pid FROM pg_locks WHERE transactionid IN (497932, 497941) AND mode = 'ExclusiveLock'; transactionid | pid ---------------+------- 497932 | 21544 497941 | 13502 (2 rows)
0赞 Laurenz Albe 9/26/2023
That looks like a deadlock alright. Strange. If you have , do you see any log messages from the time when the processes started hanging?log_lock_waits = on
0赞 xoip 9/26/2023
As I mentioned at the second line, log_lock_waits is off.
1赞 jjanes 9/26/2023
The documentation you cite is only for table locks. But what you have here is not a table lock, it is a transactionid lock. In that case, ShareLock means you are waiting for some other transaction to end. This doesn't explain why the deadlock occurs or why it is not detected, it just addresses your narrow 2nd question.

答: 暂无答案