Encountered 'failover-abort-slave-timeout' error while using Redis Sentinel

Asked by: Joonseo Lee  Asked: 8/11/2023  Last edited by: Joonseo Lee  Updated: 8/17/2023  Views: 98

Q:

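I am currently setting up Redis Sentinel, but a replica is not being promoted to master as expected. I am testing on a single server, with the instances distinguished by port and laid out as follows: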

7001: master
7002: replica
7003: replica

7101: sentinel
7102: sentinel
7103: sentinel

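From what I can observe, the Sentinel attempts the promotion but receives no response, which results in an endless restart loop. How can I troubleshoot and resolve this?

Thanks in advance.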

[root@ip-172-31-14-169 redis-env]# redis-cli -c -p 7001 cluster nodes
ab72c65d95714772ba0261e5c1a79691ae84fa88 127.0.0.1:7001@17001 myself,master - 0 1691713101000 6 connected 0-16383
103be0fdbfaaba54fc11c779e80c8158f4728537 127.0.0.1:7003@17003 slave ab72c65d95714772ba0261e5c1a79691ae84fa88 0 1691713102605 6 connected
74a6d8b6f82ab2079569f8d0bba606e11a11c3fc 127.0.0.1:7002@17002 slave ab72c65d95714772ba0261e5c1a79691ae84fa88 0 1691713102000 6 connected
[root@ip-172-31-14-169 redis-env]# redis-cli -p 7101 info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=127.0.0.1:7001,slaves=2,sentinels=3
13795:X 10 Aug 2023 23:36:24.464 * Running mode=sentinel, port=7102.
13795:X 10 Aug 2023 23:36:24.467 # Sentinel ID is 2383756333fe3ea0a7982cf2cc2cfd6c7697884b
13795:X 10 Aug 2023 23:36:24.467 # +monitor master mymaster 127.0.0.1 7001 quorum 2
13795:X 10 Aug 2023 23:36:24.467 * +slave slave 127.0.0.1:7002 127.0.0.1 7002 @ mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:36:24.470 * +slave slave 127.0.0.1:7003 127.0.0.1 7003 @ mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:36:24.559 * +sentinel sentinel 6d2bc92e99cf42d5d9ef2d70429a16561b28ab47 127.0.0.1 7101 @ mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:36:28.126 * +sentinel sentinel 822b45c804c370d9a0c93b42da03af3cbea2b5a5 127.0.0.1 7103 @ mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:37:36.237 # +sdown master mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:37:36.290 # +odown master mymaster 127.0.0.1 7001 #quorum 2/2
13795:X 10 Aug 2023 23:37:36.290 # +new-epoch 1
13795:X 10 Aug 2023 23:37:36.290 # +try-failover master mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:37:36.293 # +vote-for-leader 2383756333fe3ea0a7982cf2cc2cfd6c7697884b 1
13795:X 10 Aug 2023 23:37:36.298 # 822b45c804c370d9a0c93b42da03af3cbea2b5a5 voted for 2383756333fe3ea0a7982cf2cc2cfd6c7697884b 1
13795:X 10 Aug 2023 23:37:36.299 # 6d2bc92e99cf42d5d9ef2d70429a16561b28ab47 voted for 2383756333fe3ea0a7982cf2cc2cfd6c7697884b 1
13795:X 10 Aug 2023 23:37:36.359 # +elected-leader master mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:37:36.359 # +failover-state-select-slave master mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:37:36.443 # +selected-slave slave 127.0.0.1:7002 127.0.0.1 7002 @ mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:37:36.443 * +failover-state-send-slaveof-noone slave 127.0.0.1:7002 127.0.0.1 7002 @ mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:37:36.543 * +failover-state-wait-promotion slave 127.0.0.1:7002 127.0.0.1 7002 @ mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:39:36.589 # -failover-abort-slave-timeout master mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:39:36.673 # Next failover delay: I will not start a failover before Thu Aug 10 23:41:37 2023
13795:X 10 Aug 2023 23:41:36.775 # +new-epoch 2
13795:X 10 Aug 2023 23:41:36.778 # +vote-for-leader 6d2bc92e99cf42d5d9ef2d70429a16561b28ab47 2
13795:X 10 Aug 2023 23:41:36.819 # Next failover delay: I will not start a failover before Thu Aug 10 23:45:37 2023
13795:X 10 Aug 2023 23:45:17.170 * +reboot master mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:45:17.253 # -sdown master mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:45:17.253 # -odown master mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:45:47.799 # +sdown master mymaster 127.0.0.1 7001
13795:X 10 Aug 2023 23:45:47.883 # +odown master mymaster 127.0.0.1 7001 #quorum 2/2
13795:X 10 Aug 2023 23:45:47.883 # +new-epoch 3

Sentinel configuration

The Sentinel configuration files are identical apart from the port number.

bind 0.0.0.0
port 7101
daemonize yes

pidfile "/var/run/redis-sentinel_7101.pid"
logfile "/var/log/redis/sentinel_7101.log"
dir "/tmp"

sentinel monitor mymaster 127.0.0.1 7001 2
sentinel down-after-milliseconds mymaster 3000
acllog-max-len 128
sentinel failover-timeout mymaster 120000
sentinel deny-scripts-reconfig yes
sentinel resolve-hostnames no
sentinel announce-hostnames no

# Generated by CONFIG REWRITE
protected-mode no
user default on nopass sanitize-payload ~* &* +@all
sentinel myid 6d2bc92e99cf42d5d9ef2d70429a16561b28ab47
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 9
sentinel current-epoch 9
sentinel known-replica mymaster 127.0.0.1 7002
sentinel known-replica mymaster 127.0.0.1 7003
sentinel known-sentinel mymaster 127.0.0.1 7102 2383756333fe3ea0a7982cf2cc2cfd6c7697884b
sentinel known-sentinel mymaster 127.0.0.1 7103 822b45c804c370d9a0c93b42da03af3cbea2b5a5

A:

0 votes duckoak 8/17/2023 #1

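It looks like you are using Redis Cluster and Redis Sentinel at the same time: the `cluster nodes` output shows 7001 to 7003 running in cluster mode with slots 0-16383 assigned. These two mechanisms are not meant to be combined. A node in cluster mode rejects SLAVEOF/REPLICAOF, which would explain why the SLAVEOF NO ONE that Sentinel sends to 7002 never takes effect and the failover is aborted once the 120000 ms failover-timeout expires.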
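One way to confirm this is to look at the `cluster_enabled` field of `INFO cluster` on each data node. The following is only a minimal sketch: the helper name `cluster_mode_of` is made up for illustration, and the ports are the ones from the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of Redis): reads `INFO cluster` output on
# stdin and prints "cluster" if cluster_enabled is 1, "standalone" otherwise.
cluster_mode_of() {
    # INFO lines end in \r\n, so strip the carriage returns before matching.
    if tr -d '\r' | grep -qx 'cluster_enabled:1'; then
        echo cluster
    else
        echo standalone
    fi
}

# Usage against the nodes from the question (requires them to be running):
#   for port in 7001 7002 7003; do
#       printf '%s: ' "$port"
#       redis-cli -p "$port" info cluster | cluster_mode_of
#   done
```

If a node reports `cluster`, running `redis-cli -p 7002 replicaof no one` by hand should be rejected with an error instead of returning OK, which is presumably what happens to Sentinel's request during `+failover-state-send-slaveof-noone`.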

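What you probably want is either:

  • a Redis Cluster with masters and replicas, or
  • separate (non-cluster) Redis instances that use replication, with Redis Sentinel for high availability.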
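For the second option, the data nodes would be started without cluster mode and the replicas pointed at the master through plain replication. A rough sketch, assuming the same single-host layout as in the question; `node_cmd` is a hypothetical helper that only prints the `redis-server` command lines so they can be reviewed before anything is started:

```shell
#!/bin/sh
# Hypothetical helper: prints the redis-server command line for one
# non-cluster data node. Pass the master's port as $2 to make it a replica.
node_cmd() {
    port=$1
    master_port=${2:-}
    cmd="redis-server --port $port --cluster-enabled no"
    if [ -n "$master_port" ]; then
        # Plain (non-cluster) replication toward the master on localhost.
        cmd="$cmd --replicaof 127.0.0.1 $master_port"
    fi
    echo "$cmd"
}

node_cmd 7001        # master
node_cmd 7002 7001   # replica of 7001
node_cmd 7003 7001   # replica of 7001
```

The same settings can equally go into each node's config file as `cluster-enabled no` and `replicaof 127.0.0.1 7001`. The existing `sentinel monitor mymaster 127.0.0.1 7001 2` line can stay as it is, since Sentinel discovers the replicas from the master.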