MongoDB persistent volume backup and restore error using Velero and MinIO on-premise

Question:

Backing up a three-node MongoDB cluster on an on-premise Kubernetes cluster with Velero, MinIO, and Restic triggers a fatal error on one of the nodes after the backup is restored:

"ERROR","verbose_level_id":-3,"msg":"__wt_block_read_off:226:WiredTigerHS.wt: potential hardware corruption, read checksum error for 4096B block at offset 172032: block header checksum of 0x63755318 doesn't match expected checksum of 0x22b37ec4"
"ERROR","verbose_level_id":-3,"msg":"__wt_block_read_off:235:WiredTigerHS.wt: fatal read error","error_str":"WT_ERROR: non-specific WiredTiger error","error_code":-31802
"ERROR","verbose_level_id":-3,"msg":"__wt_block_read_off:235:the process must exit and restart","error_str":"WT_PANIC: WiredTiger library panic","error_code":-31804
Fatal assertion","attr":{"msgid":50853,"file":"src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp","line":712
\n\n***aborting after fassert() failure\n\n
Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n

Please note:

We run the same application on Azure, where this error does not occur; backup and restore work as expected there.
We tested with versions 4.4.11 and 6.0.5.

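We proceeded with the following steps:

Back up the entire namespace of the application (the application is not in use during this time)
Delete the namespace
Remove the claimRef of all PVs (so that they become available again)
Delete all persistent data stored on the Kubernetes nodes
Restore the entire namespace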
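
For reference, the steps above can be written out as concrete CLI calls. This is only a minimal dry-run sketch, not the exact commands we ran: the helper `build_commands`, the namespace `my-app`, the backup name `my-app-backup`, and the PV names are hypothetical placeholders. The script only prints the commands and executes nothing. Wiping the data on the nodes (step 4) depends on the local storage layout, and the Restic volume opt-in depends on the Velero version, so neither is shown.

```python
import json
import shlex


def build_commands(namespace, backup_name, pv_names):
    """Return the CLI invocations for the procedure as argv lists (nothing is executed)."""
    # JSON patch that removes spec.claimRef from a PersistentVolume.
    remove_claim_ref = json.dumps([{"op": "remove", "path": "/spec/claimRef"}])

    commands = [
        # Step 1: back up the whole namespace and wait until the backup finishes.
        ["velero", "backup", "create", backup_name,
         "--include-namespaces", namespace, "--wait"],
        # Step 2: delete the namespace.
        ["kubectl", "delete", "namespace", namespace],
    ]

    # Step 3: drop claimRef from every PV so it can be bound again.
    for pv in pv_names:
        commands.append(["kubectl", "patch", "pv", pv,
                         "--type", "json", "-p", remove_claim_ref])

    # Step 4 (deleting the data on the nodes) is site-specific and left out here.

    # Step 5: restore the namespace from the backup.
    commands.append(["velero", "restore", "create",
                     "--from-backup", backup_name, "--wait"])
    return commands


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    pvs = ["pv-mongo-0", "pv-mongo-1", "pv-mongo-2"]
    for argv in build_commands("my-app", "my-app-backup", pvs):
        print(shlex.join(argv))
```

Keeping the commands as argv lists makes it easy to review the exact sequence before running anything against a cluster.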

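In this namespace we use Cassandra, RabbitMQ, and MongoDB. Everything is restored correctly (including two of the MongoDB nodes) except for one MongoDB node, which stays in the "Back-off restarting failed container" state most of the time (even after a manual "mongod --repair" was triggered).

Do you know what could be causing this issue and how we can resolve it?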

MongoDB Kubernetes minio velero
