Resyncing a MongoDB Replica Set Member

2024-03-16 15:58  abce

A member of a test replica set ran into a disk space problem, fell too far behind the primary, and could no longer sync its data:

{"t":{"$date":"2024-02-19T16:13:06.387+08:00"},"s":"I",  "c":"REPL",     "id":21799,   "ctx":"ReplCoordExtern-0","msg":"Sync source candidate chosen","attr":{"syncSource":"x.x.x.82:27017"}}
{"t":{"$date":"2024-02-19T16:13:06.388+08:00"},"s":"I",  "c":"REPL",     "id":5579708, "ctx":"ReplCoordExtern-0","msg":"We are too stale to use candidate as a sync source. Denylisting this sync source because our last fetched timestamp is before their earliest timestamp","attr":{"candidate":"x.x.x.82:27017","lastOpTimeFetchedTimestamp":{"$timestamp":{"t":1702315041,"i":1249}},"remoteEarliestOpTimeTimestamp":{"$timestamp":{"t":1707324692,"i":393}},"denylistDurationMinutes":1,"denylistUntil":{"$date":"2024-02-19T08:14:06.388Z"}}}
{"t":{"$date":"2024-02-19T16:13:06.388+08:00"},"s":"I",  "c":"REPL",     "id":21799,   "ctx":"ReplCoordExtern-0","msg":"Sync source candidate chosen","attr":{"syncSource":"x.x.x.81:27017"}}
{"t":{"$date":"2024-02-19T16:13:06.389+08:00"},"s":"I",  "c":"REPL",     "id":5579708, "ctx":"ReplCoordExtern-0","msg":"We are too stale to use candidate as a sync source. Denylisting this sync source because our last fetched timestamp is before their earliest timestamp","attr":{"candidate":"x.x.x.81:27017","lastOpTimeFetchedTimestamp":{"$timestamp":{"t":1702315041,"i":1249}},"remoteEarliestOpTimeTimestamp":{"$timestamp":{"t":1707324526,"i":1115}},"denylistDurationMinutes":1,"denylistUntil":{"$date":"2024-02-19T08:14:06.389Z"}}}
{"t":{"$date":"2024-02-19T16:13:06.389+08:00"},"s":"I",  "c":"REPL",     "id":21798,   "ctx":"ReplCoordExtern-0","msg":"Could not find member to sync from"}
{"t":{"$date":"2024-02-19T16:13:20.998+08:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":"[1708330400:998702][596:0x7ff714c47700], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 15, snapshot max: 15 snapshot count: 0, oldest timestamp: (1702314741, 1249) , meta checkpoint timestamp: (1702315041, 1249) base write gen: 33406434"}}

 

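A replica set member becomes "stale" when its replication falls so far behind that the primary has overwritten oplog entries the member has not yet replicated. In the log above, the member's last fetched timestamp (1702315041) is earlier than the earliest oplog timestamp on both sync source candidates (1707324692 and 1707324526), so it cannot catch up from either of them. When this happens, you must delete the member's data and perform an initial sync to resynchronize it completely.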
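The staleness check described above can be reproduced from a single log entry. The following is a minimal sketch, not MongoDB's own implementation: the function name `staleness_from_log_line` is hypothetical, and the input is a trimmed copy of the first "too stale" entry (id 5579708) from the log above.

```python
import json


def staleness_from_log_line(line):
    """Return (is_stale, gap_seconds) for a 'too stale' REPL log entry.

    The member is stale when its last fetched oplog timestamp is older
    than the earliest oplog timestamp still held by the candidate.
    """
    attr = json.loads(line)["attr"]
    last = attr["lastOpTimeFetchedTimestamp"]["$timestamp"]
    remote = attr["remoteEarliestOpTimeTimestamp"]["$timestamp"]
    # Compare (seconds, increment) pairs so ties on seconds are ordered too.
    is_stale = (last["t"], last["i"]) < (remote["t"], remote["i"])
    return is_stale, remote["t"] - last["t"]


# Trimmed copy of the first id 5579708 entry from the log above.
line = (
    '{"id":5579708,"attr":{"candidate":"x.x.x.82:27017",'
    '"lastOpTimeFetchedTimestamp":{"$timestamp":{"t":1702315041,"i":1249}},'
    '"remoteEarliestOpTimeTimestamp":{"$timestamp":{"t":1707324692,"i":393}}}}'
)
is_stale, gap = staleness_from_log_line(line)
print(is_stale, gap)  # gap is the number of seconds of oplog that are missing
```

For this entry the gap is 5009651 seconds, roughly 58 days of oplog that the candidate no longer holds.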

 

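MongoDB provides two options for performing an initial sync:

1. Restart mongod with an empty data directory and let MongoDB's normal initial sync restore the data. This is the simpler option, but it may take longer to replace the data.

2. Restart the machine with a copy of a recent data directory from another member of the replica set. This procedure replaces the data faster but requires more manual steps.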

 

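When syncing a member, choose a time when the system has enough bandwidth to move a large amount of data. Schedule the sync during a period of low usage or during a maintenance window.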

 

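Method 1: Automatically sync the member

This method relies on MongoDB's regular replica set sync process. A mongod that is already a member of the replica set can be forced to perform an initial sync by restarting the instance with a dbPath directory that has no content: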

 

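1. Shut down the member's mongod instance

To ensure a clean shutdown, use db.shutdownServer() or mongod --shutdown (the latter is available on Linux only).

2. Optional

Back up the data.

3. Delete all data and subdirectories under the dbPath directory

4. Restart the mongod instance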
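Steps 2 and 3 above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `empty_db_path` and any paths passed to it are hypothetical, and it must be run only after mongod has shut down cleanly.

```python
import shutil
from pathlib import Path


def empty_db_path(db_path, backup_dir=None):
    """Remove everything inside db_path while keeping the directory itself.

    If backup_dir is given, the contents are copied there first
    (the optional backup step). backup_dir must not exist yet.
    """
    db_path = Path(db_path)
    if not db_path.is_dir():
        raise NotADirectoryError(str(db_path))
    if backup_dir is not None:
        # copytree creates backup_dir and copies the whole tree into it.
        shutil.copytree(db_path, backup_dir)
    for child in db_path.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)  # subdirectories such as journal/
        else:
            child.unlink()  # regular files and symlinks
```

A call might look like `empty_db_path("/data/db", backup_dir="/backup/db-before-resync")`, where both paths are placeholders for your own dbPath and backup location. Keeping the directory itself, rather than deleting and recreating it, preserves its ownership and permissions.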

 

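After it starts, mongod performs an initial sync. How long the initial sync takes depends on the size of the data and on the network latency between the replica set members.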
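One way to tell whether the initial sync has finished is to look at the member states in the output of the replSetGetStatus command: a member reports STARTUP2 while it is running an initial sync and SECONDARY once it is serving as a normal secondary. The sketch below assumes the status document has already been fetched (for example with pymongo's `client.admin.command("replSetGetStatus")`). The helper names and the sample document are hypothetical, and the address x.x.x.83 stands in for the resyncing member.

```python
def summarize_members(status):
    """Map each member name to its state string from a replSetGetStatus document."""
    return {m["name"]: m["stateStr"] for m in status.get("members", [])}


def initial_sync_finished(status, member_name):
    """Return True once the named member reports the SECONDARY state."""
    return summarize_members(status).get(member_name) == "SECONDARY"


# Hypothetical, heavily trimmed status document for illustration only.
sample = {
    "set": "rs0",
    "members": [
        {"name": "x.x.x.82:27017", "stateStr": "PRIMARY"},
        {"name": "x.x.x.81:27017", "stateStr": "SECONDARY"},
        {"name": "x.x.x.83:27017", "stateStr": "STARTUP2"},
    ],
}
print(summarize_members(sample))
print(initial_sync_finished(sample, "x.x.x.83:27017"))
```

In practice you would poll the status at an interval and stop once the function returns True for the resynced member.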

 

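Method 2: Sync by copying data files from another member

1. Copy the data files from another member

2. Sync the member

After the data files have been copied, start the mongod instance with a new members[n]._id and let it apply the operations from the oplog until it reflects the current state of the replica set.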

 

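Note: when copying the data files, make sure the copy includes the contents of the local database. Also, do not use a mongodump backup for the data files; it must be a consistent snapshot backup.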