Reading the oplog with MongoShake and fixing garbled output in Kafka
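MongoShake is an open-source tool for migrating and synchronizing MongoDB data, and it supports writing to a variety of targets.
Details: https://github.com/alibaba/MongoShake
We had a business scenario that required writing the oplog to Kafka. If you put the Kafka settings directly into collector.conf, the data that lands in Kafka is garbled.
The official explanation is that the oplog emitted by collector carries control information, so anything written straight to Kafka has to have that information stripped before it can be used.
The MongoShake download package ships a receiver that strips the control information. That leaves two possible pipelines: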
mongo --> collector --> kafka --> receiver --> 业务
mongo --> collector --> receiver --> kafka --> 业务
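I prefer the second one, because the business side then reads clean data from Kafka and does not need to run a receiver itself.
For the collector --> receiver hop I use tcp.
Configuration: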
collector.conf
tunnel = tcp
tunnel.address = 127.0.0.1:9300
receiver.conf
tunnel = tcp
tunnel.address = 127.0.0.1:9300
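The two files only work together if the tunnel type and address are identical on both sides. A minimal sketch of that check in Go follows. The helper names parseConf and tunnelsMatch are hypothetical, and the parser assumes plain "key = value" lines with '#' comments; it is not MongoShake's own config loader.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseConf reads "key = value" lines, skipping blank lines and '#' comments.
// (Assumed file format; hypothetical helper, not part of MongoShake.)
func parseConf(text string) map[string]string {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue
		}
		out[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return out
}

// tunnelsMatch reports whether both sides name the same tunnel type and address.
func tunnelsMatch(collector, receiver map[string]string) bool {
	return collector["tunnel"] != "" &&
		collector["tunnel"] == receiver["tunnel"] &&
		collector["tunnel.address"] == receiver["tunnel.address"]
}

func main() {
	collector := parseConf("tunnel = tcp\ntunnel.address = 127.0.0.1:9300\n")
	receiver := parseConf("tunnel = tcp\ntunnel.address = 127.0.0.1:9300\n")
	fmt.Println("tunnel settings match:", tunnelsMatch(collector, receiver))
}
```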
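Note that neither file has anywhere to configure Kafka. With this setup, every stripped oplog entry simply ends up in the receiver's log.
The official answer is that users are expected to modify the source and compile it themselves. The source is written in Go, which I am fairly familiar with, so the change is straightforward.
Download the official source and open:
src/mongoshake/receiver/replayer.go
Then edit handler():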
/*
 * Users should modify this function according to different demands.
 */
func (er *ExampleReplayer) handler() {
	// Kafka producer configuration. conf.Options.KafkaHost and
	// conf.Options.KafkaTopic are read from the receiver configuration.
	config := sarama.NewConfig()
	config.Producer.RequiredAcks = sarama.WaitForAll
	config.Producer.Return.Successes = true // required by SyncProducer
	kafkaClient, err := sarama.NewSyncProducer([]string{conf.Options.KafkaHost}, config)
	if err != nil {
		_ = LOG.Error("create kafka producer failed: %v", err)
		return
	}
	defer kafkaClient.Close()

	for msg := range er.pendingQueue {
		count := uint64(len(msg.message.RawLogs))
		if count == 0 {
			// probe request
			continue
		}

		// parse batched message
		oplogs := make([]*oplog.PartialLog, len(msg.message.RawLogs))
		for i, raw := range msg.message.RawLogs {
			oplogs[i] = new(oplog.PartialLog)
			if err := bson.Unmarshal(raw, oplogs[i]); err != nil {
				// impossible switch, need panic and exit
				LOG.Crashf("unmarshal oplog[%v] failed[%v]", raw, err)
				return
			}
			oplogs[i].RawSize = len(raw)

			// Customization: copy the fields we need into KafkaOpLog,
			// a custom struct (its definition is not shown here).
			kafkaOpLog := KafkaOpLog{}
			kafkaOpLog.Namespace = oplogs[i].Namespace
			kafkaOpLog.Query = oplogs[i].Query
			kafkaOpLog.Object = oplogs[i].Object.Map()
			kafkaOpLog.Operation = oplogs[i].Operation
			kafkaOpLog.Timestamp = oplogs[i].Timestamp

			encoded, err := json.Marshal(kafkaOpLog)
			if err != nil {
				_ = LOG.Error("json.Marshal oplog failed: %v", err)
				continue
			}

			// Named kafkaMsg so it does not shadow the outer msg.
			kafkaMsg := &sarama.ProducerMessage{}
			kafkaMsg.Topic = conf.Options.KafkaTopic
			kafkaMsg.Value = sarama.StringEncoder(encoded)
			kafkaMsg.Key = sarama.StringEncoder(kafkaOpLog.Namespace)
			if _, _, err = kafkaClient.SendMessage(kafkaMsg); err != nil {
				// The handler stops here, so this batch is never acknowledged.
				_ = LOG.Error("send message failed: %v", err)
				return
			}

			// The original source only printed the oplog:
			// LOG.Info(oplogs[i]) // just print for test, users can modify to fulfill different needs
		}

		if callback := msg.completion; callback != nil {
			callback() // exec callback
		}

		// get the newest timestamp
		n := len(oplogs)
		lastTs := utils.TimestampToInt64(oplogs[n-1].Timestamp)
		er.Ack = lastTs

		LOG.Debug("handle ack[%v]", er.Ack)
		// add logical code below
	}
}
Then rebuild the receiver with go build and run it in place of the stock one.
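The KafkaOpLog type used in handler() is not shown in this post. Below is a minimal sketch of what such a type and its JSON round trip might look like, using only the standard library. The field types, the JSON tag names, and the helpers encodeOplog and decodeOplog are all assumptions for illustration; the real struct would use the bson types from oplog.PartialLog.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KafkaOpLog is a hypothetical payload type. Field types and JSON tags are
// assumed; the actual definition is not given in the article.
type KafkaOpLog struct {
	Namespace string                 `json:"ns"`
	Operation string                 `json:"op"`
	Timestamp int64                  `json:"ts"`
	Query     map[string]interface{} `json:"o2,omitempty"`
	Object    map[string]interface{} `json:"o"`
}

// encodeOplog serializes one entry into the JSON bytes used as the Kafka message value.
func encodeOplog(l KafkaOpLog) ([]byte, error) {
	return json.Marshal(l)
}

// decodeOplog parses a Kafka message value back into a KafkaOpLog,
// as a downstream consumer would.
func decodeOplog(value []byte) (KafkaOpLog, error) {
	var l KafkaOpLog
	err := json.Unmarshal(value, &l)
	return l, err
}

func main() {
	in := KafkaOpLog{
		Namespace: "shop.orders", // made-up sample data
		Operation: "i",
		Timestamp: 42,
		Object:    map[string]interface{}{"_id": "a1"},
	}
	value, err := encodeOplog(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(value))

	out, err := decodeOplog(value)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Namespace, out.Operation, out.Timestamp)
}
```

Because the message value is plain JSON, the business side can consume it with any Kafka client and does not need the MongoShake receiver.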
