Flink-cdc同步mysql到iceberg丢失数据排查
一、获取任务信息
任务id:i01f51582-d8be-4262-aefa-000000
任务名称:ods_test1234
丢失的数据时间:2024-09-16 09:28:47
二、数据同步查看日志
1、筛选日志
筛选2024-09-16 09:28:47 的前后5分钟后数据
2、查找快照id,筛选内容
Committed snapshot 7258609197164498019 (BaseRowDelta)

3、查看快照提交的数据和删除文件
筛选内容:Emit iceberg write result dataFiles
Emit iceberg write result dataFiles: [
GenericDataFile{content=data, file_path=ofs://sss/test1235/data/00000-0-8a53aa17-c767-47bd-b865-32e13d54bd8e-02520.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{}, record_count=592, file_size_in_bytes=1067447, column_sizes={1=2113, 2=2077, 3=2393, 4=3018, 5=1667, 6=166, 7=95, 8=1412, 9=1121, 10=163, 11=1409}, value_counts={1=592, 2=592, 3=592, 4=592, 5=592, 6=592, 7=592, 8=592, 9=592, 10=592, 11=592}, null_value_counts={1=0, 2=0, 3=0, 4=0, 5=0, 6=0, 7=0, 8=0, 9=0, 10=0, 11=0}, nan_value_counts={}, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@fcc2b49e, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@1b557080, key_metadata=null, split_offsets=[4], equality_ids=null, sort_order_id=0}], 
result.deleteFiles [GenericDeleteFile{content=equality_deletes, file_path=ofs://sss/test1235/data/00000-0-8a53aa17-c767-47bd-b865-32e13d54bd8e-02521.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{}, record_count=534, file_size_in_bytes=1050981, column_sizes={1=1931}, value_counts={1=534}, null_value_counts={1=0}, nan_value_counts={}, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@471b5f22, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@9fc9501c, key_metadata=null, split_offsets=null, equality_ids=[1], sort_order_id=0}, 
GenericDeleteFile{content=position_deletes, file_path=ofs://sss/test1235/data/00000-0-8a53aa17-c767-47bd-b865-32e13d54bd8e-02522.parquet, file_format=PARQUET, spec_id=0, partition=PartitionData{}, record_count=58, file_size_in_bytes=2004, column_sizes={2147483546=239, 2147483545=153}, value_counts={2147483546=58, 2147483545=58}, null_value_counts={2147483546=0, 2147483545=0}, nan_value_counts={}, lower_bounds=org.apache.iceberg.SerializableByteBufferMap@c337bae4, upper_bounds=org.apache.iceberg.SerializableByteBufferMap@c337baa5, key_metadata=null, split_offsets=null, equality_ids=null, sort_order_id=null}] 
4、查看数据文件内容
hadoop jar ./parquet-tools-1.11.0.jar cat ofs:/路径/data/00000-0-8a53aa17-c767-47bd-b865-32e13d54bd8e-02520.parquet |grep -C 20 '27966798647001472'
下载 hadoop fs -get ofs:///data/00000-0-8a53aa17-c767-47bd-b865-32e13d54bd8e-02520.parquet ~/iceberg

5、查看equality_deletes删除文件内容,可以看到被删除
hadoop jar ./parquet-tools-1.11.0.jar cat ofs:/路径/data/00000-0-8a53aa17-c767-47bd-b865-32e13d54bd8e-02520.parquet |grep '27966798647001472'

6、查找本地快照对应的binlog文件和位点
筛选内容:Binlog offset on checkpoint
mysql-bin.022296  pos从459680089  474797741
2024-09-16 09:25:53.026 [Source: MySQL-CDC- -> Calc(select=[id, course_number, clazz_number, subclazz_number, user_id, CAST(status) AS status, CASE(isdel IS NOT NULL, CASE(isdel, 1, 0), null:INTEGER) AS isdel, CAST(create_time) AS create_time, CAST(update_time) AS update_time, enter_status, CAST(join_clazz_time) AS join_clazz_time]) -> NotNullEnforcer(fields=[id]) (1/1)#0] INFO  
com.ververica.cdc.connectors.mysql.source.reader.MySqlSourceReader - Binlog offset on checkpoint 859: {transaction_id=null, ts_sec=1726449952, file=mysql-bin.022296, pos=459680089, kind=SPECIFIC, gtids=15fad577-6501-11ea-b6b2-b8599fae21fa:9456479883-9480246523, row=1, event=2, server_id=31681949}
2024-09-16 09:30:53.037 [Source: MySQL-CDC- -> Calc(select=[id, course_number, clazz_number, subclazz_number, user_id, CAST(status) AS status, CASE(isdel IS NOT NULL, CASE(isdel, 1, 0), null:INTEGER) AS isdel, CAST(create_time) AS create_time, CAST(update_time) AS update_time, enter_status, CAST(join_clazz_time) AS join_clazz_time]) -> NotNullEnforcer(fields=[id]) (1/1)#0] INFO  
com.ververica.cdc.connectors.mysql.source.reader.MySqlSourceReader - Binlog offset on checkpoint 860: {transaction_id=null, ts_sec=1726450252, file=mysql-bin.022296, pos=474797741, kind=SPECIFIC, gtids=15fad577-6501-11ea-b6b2-b8599fae21fa:9456479883-9480269403, row=1, event=2, server_id=31681949}

7、查询binlog
(1)查看binlog文件是否存在:show binary logs
(2)下载binlog,需要保持本地mysql版本和服务器一致
mysqlbinlog --host=??? --user=???? --password=??? --read-from-remote-server --raw mysql-bin.022296
(3)查看binlog文件内容
mysqlbinlog --base64-output=DECODE-ROWS -v ./mysql-bin.022296 | head -n 10
mysqlbinlog --base64-output=DECODE-ROWS -v ./mysql-bin.022296 | grep -C 20 27966798647001472
show binlog events in 'mysql-bin.022296' from 4 limit 5

 
                     
                    
                 
                    
                
 
 
                
            
         
         浙公网安备 33010602011771号
浙公网安备 33010602011771号