两表hash比较
PG ↔ openGauss 两表 Hash 比较
1. 功能概述
在指定时间范围内,对比 PostgreSQL(源库,小写表名/列名)与 openGauss(目标库,大写表名/列名)同一张表的数据一致性:
- 通过 MD5 行 hash 流式扫描,避免全量加载
- 输出三类差异:PG 独有、OG 独有、hash 不一致(含列级 diff)
- 异步执行:HTTP 立即返回
taskId,结果仅写控制台日志 - 数据源 ID 从
application.yaml读取,连接信息来自已有数据源管理模块
2. 前置依赖
本功能不独立连接数据库,依赖现有数据源管理模块:
| 依赖类 | 用途 |
|---|---|
DataSourceMapper |
按 id 查询 DataSourceDO |
DataSourceService |
getDecryptedPassword(id) 解密密码 |
DataSourceDO |
host/port/database/schema/username/type 等 |
DataSourceTypeEnum |
fromCode、buildJdbcUrl、resolveDriverClassName(须支持 pgsql、opengauss) |
BusinessException |
业务异常 |
ErrorCode |
错误码(至少用到 PARAM_ERROR、NOT_FOUND、BUSINESS_ERROR、CONNECTION_FAILED、SYSTEM_ERROR) |
Result<T> |
统一 HTTP 响应包装 |
Maven 驱动(若尚未引入)
<!-- PostgreSQL -->
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
</dependency>
<!-- openGauss(按公司私服坐标调整) -->
<dependency>
<groupId>org.opengauss</groupId>
<artifactId>opengauss-jdbc</artifactId>
</dependency>
包名替换说明
下文代码中 import 均为 com.liang.learn.*,迁移时全局替换为公司项目包名(如 com.company.xxx)。
3. 配置项
在 application.yaml 增加:
app:
table-hash:
pgsql-data-source-id: 3 # PG 数据源在 data_source_manage 表中的 id
opengauss-data-source-id: 2 # openGauss 数据源 id
detail-log-limit: 500 # 每类差异明细最多打印条数
fetch-batch-size: 500 # 差异 id 批量 IN 查询每批大小
compare-fetch-size: 1000 # 流式 hash 查询 JDBC fetchSize
4. 架构设计
4.1 分层结构
┌─────────────────────────────────────────────────────────────┐
│ TableHashController POST /api/table-hash/compare │
└───────────────────────────────┬─────────────────────────────┘
│ 校验 DTO,提交任务
┌───────────────────────────────▼─────────────────────────────┐
│ TableHashCompareServiceImpl │
│ ThreadPoolTaskExecutor.execute → 异步 compare │
└───────────────────────────────┬─────────────────────────────┘
│
┌───────────────────────────────▼─────────────────────────────┐
│ TableHashCompareExecutor(编排器,同步逻辑) │
│ ┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐ │
│ │TableMetadataLoader│ │HashSqlBuilder │ │ExternalJdbc │ │
│ │(PG information_ │ │(动态 MD5 SQL) │ │Executor │ │
│ │ schema 读列) │ │ │ │(DriverManager)│
│ └─────────────────┘ └──────────────────┘ └─────────────┘ │
│ ┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐ │
│ │HashRowCursor │ │StreamingHashComparer│ │ColumnDiff │ │
│ │(流式 ResultSet) │ │+ IdComparator │ │Calculator │ │
│ └─────────────────┘ └──────────────────┘ └─────────────┘ │
│ ┌─────────────────────────────────────────────────────────┐│
│ │TableHashCompareLogger → 控制台结构化日志 ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
↑ ↑
TableHashProperties DataSourceMapper / Service
(yaml 配置) (数据源 id → 连接信息)
4.2 核心设计决策
| 项 | 结论 |
|---|---|
| id 列 | 表首列(ordinal_position = 1),可为 bigint 或 varchar |
| id 比较 | 整数类型用数值序(BigInteger),与 SQL ORDER BY 一致 |
| 时间范围 | [startTime, endTime) 左闭右开,timestamp + setTimestamp 绑定 |
| Schema | 从各自数据源配置读取,API 只传小写 tableName |
| OG 标识符 | schema 原样引用,table/column 转大写双引号 |
| 排除 hash | blob、clob、bytea、text(varchar 仍参与) |
| 未知类型 | 严格模式,任务失败 |
| 外部连接 | 轻量 DriverManager,不走连接池 |
5. 代码目录结构
src/main/java/{basePackage}/
├── config/
│ ├── AsyncConfig.java # tableHashExecutor 线程池
│ └── TableHashProperties.java # yaml 配置绑定
├── controller/
│ └── TableHashController.java
├── model/
│ ├── dto/
│ │ └── TableHashCompareDTO.java
│ └── vo/
│ └── TableHashTaskVO.java
├── service/
│ ├── TableHashCompareService.java
│ ├── impl/
│ │ └── TableHashCompareServiceImpl.java
│ └── tablehash/
│ ├── ColumnDiff.java
│ ├── ColumnDiffCalculator.java
│ ├── ColumnMeta.java
│ ├── ColumnTypeHashStrategy.java
│ ├── CompareResult.java
│ ├── ExternalJdbcExecutor.java
│ ├── HashRow.java
│ ├── HashRowCursor.java
│ ├── HashSqlBuilder.java
│ ├── IdComparator.java
│ ├── StreamingHashComparer.java
│ ├── TableHashCompareExecutor.java
│ ├── TableHashCompareLogger.java
│ └── TableMetadataLoader.java
src/test/java/{basePackage}/service/tablehash/
├── ColumnTypeHashStrategyTest.java
├── HashSqlBuilderTest.java
└── StreamingHashComparerTest.java
新增文件共 22 个(19 个 main + 3 个 test),另修改 application.yaml。
6. 执行流程说明
6.1 HTTP 提交阶段(同步)
Client
│ POST /api/table-hash/compare
│ Body: { tableName, timeColumn, startTime, endTime }
▼
TableHashController
│ @Valid 校验 DTO(标识符格式、时间格式、start < end)
▼
TableHashCompareServiceImpl
│ 生成 UUID taskId
│ tableHashExecutor.execute(() -> compareExecutor.compare(...))
│ 队列满 → BusinessException「比较任务队列已满」
▼
立即返回 Result { taskId }
6.2 异步比较阶段(后台线程)
TableHashCompareExecutor.compare(taskId, dto)
Step 1 加载数据源
├─ dataSourceMapper.selectById(pgsql-data-source-id)
├─ dataSourceMapper.selectById(opengauss-data-source-id)
└─ jdbcExecutor.openConnection × 2(PG + OG)
Step 2 解析表名
├─ tableName = dto.tableName.toLowerCase()
├─ ogTableName = tableName.toUpperCase()
├─ pgSchema = pgDataSource.schema
└─ ogSchema = ogDataSource.schema
Step 3 读取列元数据(仅 PG)
├─ information_schema.columns WHERE schema + table
├─ validateTimeColumn 存在性校验
└─ 首列 → idColumn + idUdtName
Step 4 构建并执行 hash SQL(双库并行流式)
├─ HashSqlBuilder.buildHashQuery(..., uppercase=false/true)
├─ HashRowCursor.open(connection, sql, start, end, fetchSize)
│ setFetchSize + setTimestamp(1/2) + ORDER BY id
└─ 每行 → HashRow(id, md5_hash)
Step 5 流式归并 StreamingHashComparer.merge(pgCursor, ogCursor, idUdtName)
├─ 双指针,IdComparator 按数值/字典序对齐
├─ pgId < ogId → pgOnlyIds
├─ pgId > ogId → ogOnlyIds
├─ id 相等 hash 不同 → mismatchIds
└─ 产出 CompareResult(三类 id 列表 + 行数统计)
Step 6 差异明细查询 + 日志
├─ pgOnly / ogOnly → fetchRowsByIds(SELECT * WHERE id::text = ANY(?))
├─ mismatch → buildNormalizedColumnsQuery + ColumnDiffCalculator
└─ TableHashCompareLogger(每类最多 detail-log-limit 条)
Step 7 摘要日志 + finally 关闭游标/连接
6.3 Hash SQL 模板
SELECT "<id_col>",
MD5(concat_ws('|', COALESCE(<col_expr_1>, 'NULL'), ...)) AS row_hash
FROM "<schema>"."<table>"
WHERE "<time_col>" >= ? AND "<time_col>" < ?
ORDER BY "<id_col>"
- PG:
"test"."test_hash"."test_id"(小写) - OG:
"DEV"."TEST_HASH"."TEST_ID"(大写,schema 配置dev→ 引用时转DEV)
6.4 控制台日志格式
[TABLE-HASH][taskId=xxx][table=test_hash] 完成 | PG行数=10000 OG行数=10000 | PG独有=0 OG独有=0 不一致=0 | 耗时=1234ms
[TABLE-HASH][taskId=xxx] PG独有 (1/2): id=5 row={...}
[TABLE-HASH][taskId=xxx] 不一致 (1/1): id=3 columnDiffs=[ColumnDiff(column=col_x, pgValue=a, ogValue=b)]
[TABLE-HASH][taskId=xxx] 失败: 不支持的列类型: jsonb
7. API 说明
请求
POST /api/table-hash/compare
Content-Type: application/json
{
"tableName": "test_hash",
"timeColumn": "col_timestamp_without",
"startTime": "2020-01-01 00:00:00",
"endTime": "2030-01-01 00:00:00"
}
响应
{
"code": 0,
"msg": "任务已提交",
"data": {
"taskId": "550e8400-e29b-41d4-a716-446655440000"
}
}
比较结果不在响应中返回,通过 taskId 在应用日志中检索 [TABLE-HASH][taskId=...]。
8. 步骤清单
- 确认数据源管理模块可用,PG/OG 数据源 id 与 yaml 配置一致
- 确认 OG schema 配置正确(如
"DEV"需配dev,程序会转大写) - 按第 5 节目录创建全部 Java 文件(替换包名)
- 添加 yaml 配置
- 运行 3 个单元测试验证 SQL 构建与归并逻辑
- 启动应用,对测试表调用 API,检查日志
9. 完整源代码
说明:以下每个文件不含
package声明,迁移时自行添加。
代码块已包含源码中的 Javadoc 与行内注释。
9.1 config/TableHashProperties.java
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;
/**
* 两表 hash 比较功能配置
*
* @author liang
* @since 2026-05-24
*/
@Data
@Component
@ConfigurationProperties(prefix = "app.table-hash")
public class TableHashProperties {
/**
* PostgreSQL 数据源 ID(源库)
*/
private Long pgsqlDataSourceId = 3L;
/**
* openGauss 数据源 ID(目标库)
*/
private Long opengaussDataSourceId = 2L;
/**
* 每集合控制台明细输出上限
*/
private int detailLogLimit = 500;
/**
* 差异行批量查询 IN 列表大小
*/
private int fetchBatchSize = 500;
/**
* 流式 hash 查询 ResultSet fetchSize
*/
private int compareFetchSize = 1000;
}
9.2 config/AsyncConfig.java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
/**
* 异步任务线程池配置
*
* @author liang
* @since 2026-05-24
*/
@Configuration
public class AsyncConfig {
/**
* 表 hash 比较专用线程池,限制并发避免同时跑多个重型比较任务
*
* @return 异步执行器
*/
@Bean(name = "tableHashExecutor")
public ThreadPoolTaskExecutor tableHashExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(1);
executor.setMaxPoolSize(2);
executor.setQueueCapacity(10);
executor.setThreadNamePrefix("table-hash-");
executor.initialize();
return executor;
}
}
9.3 model/dto/TableHashCompareDTO.java
import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.validation.constraints.AssertTrue;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Pattern;
import lombok.Data;
/**
* 两表 hash 比较请求参数
*
* @author liang
* @since 2026-05-24
*/
@Data
@Schema(description = "两表 hash 比较请求")
public class TableHashCompareDTO {
/**
* 表名(小写,与 PG 一致)
*/
private static final String IDENTIFIER_PATTERN = "^[a-z][a-z0-9_]*$";
@NotBlank(message = "表名不能为空")
@Pattern(regexp = IDENTIFIER_PATTERN, message = "表名只能包含小写字母、数字和下划线,且以字母开头")
@Schema(description = "表名(小写,与 PG 一致)", example = "test_all_types")
private String tableName;
/**
* 时间字段名(小写),用于限定比较数据范围
*/
@NotBlank(message = "时间字段名不能为空")
@Pattern(regexp = IDENTIFIER_PATTERN, message = "时间字段名只能包含小写字母、数字和下划线,且以字母开头")
@Schema(description = "时间字段名(小写)", example = "col_timestamp_without")
private String timeColumn;
/**
* 开始时间(含),格式 yyyy-MM-dd HH:mm:ss
*/
@NotBlank(message = "开始时间不能为空")
@Pattern(regexp = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", message = "开始时间格式须为 yyyy-MM-dd HH:mm:ss")
@Schema(description = "开始时间(含)", example = "2020-01-01 00:00:00")
private String startTime;
/**
* 结束时间(不含),格式 yyyy-MM-dd HH:mm:ss
*/
@NotBlank(message = "结束时间不能为空")
@Pattern(regexp = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", message = "结束时间格式须为 yyyy-MM-dd HH:mm:ss")
@Schema(description = "结束时间(不含)", example = "2030-01-01 00:00:00")
private String endTime;
/**
* 校验时间范围为左闭右开且 start < end
*/
@AssertTrue(message = "开始时间必须早于结束时间")
public boolean isTimeRangeValid() {
if (startTime == null || endTime == null) {
return true;
}
return startTime.compareTo(endTime) < 0;
}
}
9.4 model/vo/TableHashTaskVO.java
import io.swagger.v3.oas.annotations.media.Schema;
import lombok.AllArgsConstructor;
import lombok.Data;
/**
* hash 比较任务提交结果
*
* @author liang
* @since 2026-05-24
*/
@Data
@AllArgsConstructor
@Schema(description = "hash 比较任务提交结果")
public class TableHashTaskVO {
/**
* 任务 ID,用于控制台日志追踪
*/
@Schema(description = "任务 ID,用于日志追踪")
private String taskId;
}
9.5 controller/TableHashController.java
import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.result.Result;
import com.liang.learn.model.vo.TableHashTaskVO;
import com.liang.learn.service.TableHashCompareService;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.Parameter;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
/**
* 两表 hash 比较 Controller
*
* @author liang
* @since 2026-05-24
*/
@Slf4j
@RestController
@RequestMapping("/api/table-hash")
@RequiredArgsConstructor
@Tag(name = "两表 Hash 比较", description = "PG 与 openGauss 表级 hash 一致性比较")
public class TableHashController {
/**
* 两表 hash 比较 Service
*/
private final TableHashCompareService tableHashCompareService;
/**
* 提交 hash 比较任务(异步执行,结果仅输出控制台日志)
*
* @param dto 比较参数
* @return 任务 ID
*/
@PostMapping("/compare")
@Operation(summary = "提交 hash 比较任务", description = "异步执行,结果仅输出控制台日志")
public Result<TableHashTaskVO> compare(
@Parameter(description = "比较参数") @Valid @RequestBody TableHashCompareDTO dto) {
log.info("提交 hash 比较任务, table={}, timeColumn={}, start={}, end={}",
dto.getTableName(), dto.getTimeColumn(), dto.getStartTime(), dto.getEndTime());
TableHashTaskVO task = tableHashCompareService.submitCompare(dto);
return Result.success("任务已提交", task);
}
}
9.6 service/TableHashCompareService.java
import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.vo.TableHashTaskVO;
/**
* 两表 hash 比较 Service
*
* @author liang
* @since 2026-05-24
*/
public interface TableHashCompareService {
/**
* 提交异步 hash 比较任务
*
* @param dto 比较参数
* @return 任务 ID,比较结果输出到控制台日志
*/
TableHashTaskVO submitCompare(TableHashCompareDTO dto);
}
9.7 service/impl/TableHashCompareServiceImpl.java
import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.enums.ErrorCode;
import com.liang.learn.model.vo.TableHashTaskVO;
import com.liang.learn.service.TableHashCompareService;
import com.liang.learn.service.tablehash.TableHashCompareExecutor;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;
import java.util.UUID;
import java.util.concurrent.RejectedExecutionException;
/**
* 两表 hash 比较 Service 实现
*
* @author liang
* @since 2026-05-24
*/
@Service
public class TableHashCompareServiceImpl implements TableHashCompareService {
/**
* 表 hash 比较专用线程池
*/
private final ThreadPoolTaskExecutor tableHashExecutor;
/**
* 比较编排执行器
*/
private final TableHashCompareExecutor compareExecutor;
public TableHashCompareServiceImpl(
@Qualifier("tableHashExecutor") ThreadPoolTaskExecutor tableHashExecutor,
TableHashCompareExecutor compareExecutor) {
this.tableHashExecutor = tableHashExecutor;
this.compareExecutor = compareExecutor;
}
@Override
public TableHashTaskVO submitCompare(TableHashCompareDTO dto) {
String taskId = UUID.randomUUID().toString();
try {
tableHashExecutor.execute(() -> compareExecutor.compare(taskId, dto));
} catch (RejectedExecutionException ex) {
throw new BusinessException(ErrorCode.BUSINESS_ERROR, "比较任务队列已满,请稍后重试");
}
return new TableHashTaskVO(taskId);
}
}
9.8 service/tablehash/ColumnMeta.java
import lombok.AllArgsConstructor;
import lombok.Getter;
/**
* 表列元数据,用于 hash SQL 动态生成
*
* @author liang
* @since 2026-05-24
*/
@Getter
@AllArgsConstructor
public class ColumnMeta {
/**
* 列名(小写)
*/
private final String columnName;
/**
* 列序号,从 1 开始
*/
private final int ordinalPosition;
/**
* PostgreSQL udt_name,如 int8、varchar、timestamp
*/
private final String udtName;
/**
* 是否为首列(作为比较 id)
*
* @return ordinal_position 为 1 时返回 true
*/
public boolean isFirstColumn() {
return ordinalPosition == 1;
}
}
9.9 service/tablehash/ColumnDiff.java
import lombok.AllArgsConstructor;
import lombok.Getter;
/**
* 单列差异:PG 与 openGauss 侧标准化值的对比
*
* @author liang
* @since 2026-05-24
*/
@Getter
@AllArgsConstructor
public class ColumnDiff {
/**
* 列名
*/
private final String column;
/**
* PG 侧标准化值
*/
private final String pgValue;
/**
* openGauss 侧标准化值
*/
private final String ogValue;
}
9.10 service/tablehash/HashRow.java
import lombok.AllArgsConstructor;
import lombok.Getter;
/**
* 单行 hash 查询结果
*
* @author liang
* @since 2026-05-24
*/
@Getter
@AllArgsConstructor
public class HashRow implements Comparable<HashRow> {
/**
* 行 id(首列值,字符串形式)
*/
private final String id;
/**
* MD5 行 hash
*/
private final String rowHash;
@Override
public int compareTo(HashRow other) {
return this.id.compareTo(other.id);
}
}
9.11 service/tablehash/CompareResult.java
import lombok.Getter;
import java.util.ArrayList;
import java.util.List;
/**
* 两库 hash 流式归并比较结果
*
* @author liang
* @since 2026-05-24
*/
@Getter
public class CompareResult {
/**
* PG 独有行的 id 列表
*/
private final List<String> pgOnlyIds = new ArrayList<>();
/**
* openGauss 独有行的 id 列表
*/
private final List<String> ogOnlyIds = new ArrayList<>();
/**
* 两库均有但 hash 不一致的 id 列表
*/
private final List<String> mismatchIds = new ArrayList<>();
/**
* PG 侧扫描行数
*/
long pgRowCount;
/**
* openGauss 侧扫描行数
*/
long ogRowCount;
}
9.12 service/tablehash/IdComparator.java
import java.math.BigInteger;
import java.util.Locale;
import java.util.Set;
/**
* 首列 id 比较器,与 SQL ORDER BY 语义对齐
*
* @author liang
* @since 2026-05-24
*/
public final class IdComparator {
/**
* 按数值序比较 id 的类型(与 PG bigint/int ORDER BY 一致)
*/
private static final Set<String> NUMERIC_ID_TYPES = Set.of(
"int2", "int4", "int8", "int16", "serial", "bigserial", "smallserial");
private IdComparator() {
}
/**
* 比较两个 id 字符串
*
* @param leftId 左侧 id
* @param rightId 右侧 id
* @param idUdtName 首列 udt_name
* @return compareTo 语义下的比较结果
*/
public static int compare(String leftId, String rightId, String idUdtName) {
if (isNumericIdType(idUdtName)) {
return new BigInteger(leftId).compareTo(new BigInteger(rightId));
}
return leftId.compareTo(rightId);
}
private static boolean isNumericIdType(String udtName) {
return NUMERIC_ID_TYPES.contains(udtName.toLowerCase(Locale.ROOT));
}
}
9.13 service/tablehash/ColumnTypeHashStrategy.java
import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.enums.ErrorCode;
import java.util.Locale;
import java.util.Set;
/**
* 列类型到 hash 表达式的映射策略
*
* @author liang
* @since 2026-05-24
*/
public final class ColumnTypeHashStrategy {
/**
* 不参与 hash 的大对象类型
*/
private static final Set<String> EXCLUDED_TYPES = Set.of("blob", "clob", "bytea", "text");
/**
* numeric/decimal 格式化模板
*/
private static final String NUMERIC_FORMAT = "FM99999999999999999990.####################";
/**
* float 格式化模板
*/
private static final String FLOAT_FORMAT = "FM0.####################";
private ColumnTypeHashStrategy() {
}
/**
* 判断列类型是否应跳过 hash 计算
*
* @param udtName PostgreSQL udt_name
* @return 为 blob/clob/bytea/text 时返回 true
*/
public static boolean isExcludedFromHash(String udtName) {
return EXCLUDED_TYPES.contains(udtName.toLowerCase(Locale.ROOT));
}
/**
* 根据列类型生成 hash 用 SQL 表达式(不含 COALESCE)
*
* @param quotedColumn 带双引号的列名,如 "col_integer"
* @param udtName PostgreSQL udt_name
* @return SQL 表达式
*/
public static String buildExpression(String quotedColumn, String udtName) {
String type = udtName.toLowerCase(Locale.ROOT);
return switch (type) {
// 整数 / 序列:直接转 text
case "int2", "int4", "int8", "int16", "serial", "bigserial", "smallserial" ->
quotedColumn + "::text";
// 定点数:统一 to_char 格式,消除尾零差异
case "numeric", "decimal" ->
"to_char(" + quotedColumn + ", '" + NUMERIC_FORMAT + "')";
// 浮点:-0 与 0 统一为 '0'
case "float4", "float8" ->
"CASE WHEN abs(" + quotedColumn + ")=0 THEN '0' ELSE to_char("
+ quotedColumn + "::double precision, '" + FLOAT_FORMAT + "') END";
// 变长文本:先 normalize 换行符
case "varchar", "character varying" ->
normalizeNewlines(quotedColumn + "::text");
// 定长字符:trim 尾部空格
case "bpchar", "char", "character" ->
"trim(trailing ' ' from " + quotedColumn + "::text)";
case "date" ->
"to_char(" + quotedColumn + ", 'YYYY-MM-DD')";
// 时间戳:统一微秒精度格式
case "timestamp", "timestamptz" ->
"to_char(" + quotedColumn + ", 'YYYY-MM-DD HH24:MI:SS.US')";
// 布尔:1/0 代替 t/f 与 true/false
case "bool" ->
"CASE WHEN " + quotedColumn + " THEN '1' ELSE '0' END";
default ->
throw new BusinessException(ErrorCode.BUSINESS_ERROR,
"不支持的列类型: " + udtName);
};
}
/**
* 用 COALESCE 包裹表达式,NULL 统一为 'NULL'
*
* @param expression SQL 表达式
* @return COALESCE(expr, 'NULL')
*/
public static String wrapCoalesce(String expression) {
return "COALESCE(" + expression + ", 'NULL')";
}
/**
* 统一换行符为 \n,避免 PG 与 openGauss 文本序列化差异
*
* @param textExpr 文本表达式
* @return regexp_replace 嵌套表达式
*/
private static String normalizeNewlines(String textExpr) {
return "regexp_replace(regexp_replace(" + textExpr + ", E'\\r\\n', E'\\n', 'g'), E'\\r', E'\\n', 'g')";
}
}
9.14 service/tablehash/HashSqlBuilder.java
import org.springframework.stereotype.Component;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
/**
* 动态生成 hash 比较 SQL
*
* @author liang
* @since 2026-05-24
*/
@Component
public class HashSqlBuilder {
/**
* 构建 hash 查询 SQL:SELECT id, MD5(concat_ws(...)) AS row_hash FROM ... WHERE time >= ? AND time < ?
*
* @param schema schema 名
* @param tableName 表名
* @param columns 列元数据(有序)
* @param timeColumn 时间字段名
* @param uppercaseIdentifiers true 时标识符转大写(openGauss)
* @return 带两个时间占位符的 SQL
*/
public String buildHashQuery(String schema, String tableName, List<ColumnMeta> columns,
String timeColumn, boolean uppercaseIdentifiers) {
// 首列作为 id 输出;所有非排除列(含首列)按类型映射后参与 MD5
String idCol = columns.stream().filter(ColumnMeta::isFirstColumn).findFirst()
.orElseThrow(() -> new IllegalStateException("未找到首列"))
.getColumnName();
String quotedSchema = quoteIdentifier(schema, uppercaseIdentifiers);
String quotedTable = quoteIdentifier(tableName, uppercaseIdentifiers);
String quotedId = quoteColumn(idCol, uppercaseIdentifiers);
String quotedTime = quoteColumn(timeColumn, uppercaseIdentifiers);
// 跳过 blob/clob/bytea/text;每列 COALESCE 保证 NULL 可比较
List<String> hashParts = columns.stream()
.filter(col -> !ColumnTypeHashStrategy.isExcludedFromHash(col.getUdtName()))
.map(col -> {
String quoted = quoteColumn(col.getColumnName(), uppercaseIdentifiers);
String expr = ColumnTypeHashStrategy.buildExpression(quoted, col.getUdtName());
return ColumnTypeHashStrategy.wrapCoalesce(expr);
})
.collect(Collectors.toList());
return "SELECT " + quotedId + ", MD5(concat_ws('|', "
+ String.join(", ", hashParts)
+ ")) AS row_hash FROM " + quotedSchema + "." + quotedTable
+ " WHERE " + quotedTime + " >= ? AND " + quotedTime + " < ?"
+ " ORDER BY " + quotedId;
}
/**
* 构建标准化列查询 SQL,用于 hash 不一致时的列级 diff
*
* @param schema schema 名
* @param tableName 表名
* @param columns 列元数据
* @param uppercaseIdentifiers true 时标识符转大写
* @return SELECT id, norm_col1, norm_col2, ... WHERE id::text = ANY(?)
*/
public String buildNormalizedColumnsQuery(String schema, String tableName, List<ColumnMeta> columns,
boolean uppercaseIdentifiers) {
String quotedSchema = quoteIdentifier(schema, uppercaseIdentifiers);
String quotedTable = quoteIdentifier(tableName, uppercaseIdentifiers);
String quotedId = quoteColumn(columns.get(0).getColumnName(), uppercaseIdentifiers);
// 除首列外,每列输出与 hash 相同的标准化表达式,别名即列名
List<String> selectParts = columns.stream()
.filter(col -> !col.isFirstColumn())
.filter(col -> !ColumnTypeHashStrategy.isExcludedFromHash(col.getUdtName()))
.map(col -> {
String quoted = quoteColumn(col.getColumnName(), uppercaseIdentifiers);
String expr = ColumnTypeHashStrategy.buildExpression(quoted, col.getUdtName());
return ColumnTypeHashStrategy.wrapCoalesce(expr) + " AS "
+ quoteColumn(col.getColumnName(), uppercaseIdentifiers);
})
.collect(Collectors.toList());
return "SELECT " + quotedId + ", " + String.join(", ", selectParts)
+ " FROM " + quotedSchema + "." + quotedTable
+ " WHERE " + buildIdAnyClause(quotedId);
}
/**
* 构建 id 批量匹配条件:id 统一转 text,与 varchar[] 参数对齐(兼容 bigint/varchar 首列)
*
* @param quotedId 带引号的 id 列名
* @return 如 "test_id"::text = ANY(?)
*/
String buildIdAnyClause(String quotedId) {
return quotedId + "::text = ANY(?)";
}
/**
* 引用列名
*
* @param columnName 列名
* @param uppercase true 转大写
* @return 带双引号的列名
*/
public String quoteColumn(String columnName, boolean uppercase) {
return quoteIdentifier(columnName, uppercase);
}
/**
* 引用 schema / 表名 / 列名
*
* @param name 标识符
* @param uppercase true 转大写,false 转小写
* @return 带双引号的标识符
*/
public String quoteIdentifier(String name, boolean uppercase) {
String normalized = uppercase ? name.toUpperCase(Locale.ROOT) : name.toLowerCase(Locale.ROOT);
return "\"" + normalized + "\"";
}
}
9.15 service/tablehash/TableMetadataLoader.java
import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.enums.ErrorCode;
import org.springframework.stereotype.Component;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
/**
* 从 PostgreSQL information_schema 加载表列元数据
*
* @author liang
* @since 2026-05-24
*/
@Component
public class TableMetadataLoader {
/**
* 查询列元数据的 SQL(Java 文本块(Text Block) 语法(Java 15+))
*/
private static final String LOAD_COLUMNS_SQL = """
SELECT column_name, ordinal_position, udt_name
FROM information_schema.columns
WHERE table_schema = ? AND table_name = ?
ORDER BY ordinal_position
""";
/**
* 加载指定表的列元数据(从 PG 源库读取)
*
* @param pgConnection PG 连接
* @param schema schema 名
* @param tableName 表名(小写)
* @return 按 ordinal_position 排序的列列表
*/
public List<ColumnMeta> loadColumns(Connection pgConnection, String schema, String tableName) {
try (PreparedStatement ps = pgConnection.prepareStatement(LOAD_COLUMNS_SQL)) {
ps.setString(1, schema);
ps.setString(2, tableName.toLowerCase());
try (ResultSet rs = ps.executeQuery()) {
List<ColumnMeta> columns = new ArrayList<>();
while (rs.next()) {
columns.add(new ColumnMeta(
rs.getString("column_name"),
rs.getInt("ordinal_position"),
rs.getString("udt_name")));
}
// 表不存在或无列时严格失败,避免生成空 hash SQL
if (columns.isEmpty()) {
throw new BusinessException(ErrorCode.NOT_FOUND, "表不存在或无列: " + tableName);
}
return columns;
}
} catch (BusinessException e) {
throw e;
} catch (Exception e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "读取表结构失败: " + e.getMessage());
}
}
}
9.16 service/tablehash/HashRowCursor.java
import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.enums.ErrorCode;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.Iterator;
import java.util.NoSuchElementException;
/**
* 流式 hash 行游标,持有 PreparedStatement + ResultSet,避免全量加载
*
* @author liang
* @since 2026-05-24
*/
public class HashRowCursor implements Iterator<HashRow>, AutoCloseable {
/**
* hash 查询 PreparedStatement
*/
private final PreparedStatement preparedStatement;
/**
* hash 查询 ResultSet
*/
private final ResultSet resultSet;
/**
* 预读下一行
*/
private HashRow nextRow;
/**
* 是否已读完
*/
private boolean finished;
private HashRowCursor(PreparedStatement preparedStatement, ResultSet resultSet) {
this.preparedStatement = preparedStatement;
this.resultSet = resultSet;
}
/**
* 打开 hash 游标,绑定时间范围参数
*
* @param connection 数据库连接
* @param sql hash 查询 SQL(含两个时间占位符)
* @param startTime 开始时间,格式 yyyy-MM-dd HH:mm:ss
* @param endTime 结束时间(不含)
* @param fetchSize ResultSet fetchSize
* @return 可迭代游标,使用后须 close
*/
public static HashRowCursor open(Connection connection, String sql, String startTime, String endTime,
int fetchSize) {
try {
PreparedStatement ps = connection.prepareStatement(sql);
ps.setFetchSize(fetchSize);
// 左闭右开 [start, end),与 hash 主流程一致
ps.setTimestamp(1, Timestamp.valueOf(startTime));
ps.setTimestamp(2, Timestamp.valueOf(endTime));
ResultSet rs = ps.executeQuery();
return new HashRowCursor(ps, rs);
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 查询失败: " + e.getMessage());
}
}
@Override
public boolean hasNext() {
if (finished) {
return false;
}
// 已预读则直接返回,避免重复消耗 ResultSet
if (nextRow != null) {
return true;
}
try {
if (resultSet.next()) {
nextRow = new HashRow(resultSet.getString(1), resultSet.getString(2));
return true;
}
finished = true;
return false;
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 读取失败: " + e.getMessage());
}
}
@Override
public HashRow next() {
if (!hasNext()) {
throw new NoSuchElementException();
}
HashRow current = nextRow;
nextRow = null;
return current;
}
@Override
public void close() {
finished = true;
nextRow = null;
closeQuietly(resultSet);
closeQuietly(preparedStatement);
}
private static void closeQuietly(AutoCloseable closeable) {
if (closeable == null) {
return;
}
try {
closeable.close();
} catch (Exception ignored) {
// ignore close errors during cleanup
}
}
}
9.17 service/tablehash/StreamingHashComparer.java
import org.springframework.stereotype.Component;
import java.util.Iterator;
/**
* 双库 hash 流式归并比较器
*
* @author liang
* @since 2026-05-24
*/
@Component
public class StreamingHashComparer {
/**
* 对两路按 id 排序的 hash 流做双指针归并,产出三类差异 id
*
* @param pgIter PG 侧 hash 迭代器
* @param ogIter openGauss 侧 hash 迭代器
* @param idUdtName 首列 udt_name,用于选择数值或字典序比较
* @return 比较结果(含 pgOnly / ogOnly / mismatch id 列表及行数统计)
*/
public CompareResult merge(Iterator<HashRow> pgIter, Iterator<HashRow> ogIter, String idUdtName) {
CompareResult result = new CompareResult();
// 双指针:各持当前行,SQL 已 ORDER BY id,等价于两个有序链表归并
HashRow pgRow = pgIter.hasNext() ? pgIter.next() : null;
HashRow ogRow = ogIter.hasNext() ? ogIter.next() : null;
while (pgRow != null || ogRow != null) {
// PG 已耗尽,剩余 OG 行均为 OG 独有
if (pgRow == null) {
result.ogRowCount++;
result.getOgOnlyIds().add(ogRow.getId());
ogRow = ogIter.hasNext() ? ogIter.next() : null;
continue;
}
// OG 已耗尽,剩余 PG 行均为 PG 独有
if (ogRow == null) {
result.pgRowCount++;
result.getPgOnlyIds().add(pgRow.getId());
pgRow = pgIter.hasNext() ? pgIter.next() : null;
continue;
}
// 两路均有数据,按 id 排序规则对齐(数值型与 SQL ORDER BY bigint 一致)
int cmp = IdComparator.compare(pgRow.getId(), ogRow.getId(), idUdtName);
if (cmp < 0) {
// PG id 更小,说明 OG 中不存在该行
result.pgRowCount++;
result.getPgOnlyIds().add(pgRow.getId());
pgRow = pgIter.hasNext() ? pgIter.next() : null;
} else if (cmp > 0) {
// OG id 更小,说明 PG 中不存在该行
result.ogRowCount++;
result.getOgOnlyIds().add(ogRow.getId());
ogRow = ogIter.hasNext() ? ogIter.next() : null;
} else {
// id 相同,比对 hash;一致则跳过,不一致记入 mismatch
result.pgRowCount++;
result.ogRowCount++;
if (!pgRow.getRowHash().equals(ogRow.getRowHash())) {
result.getMismatchIds().add(pgRow.getId());
}
pgRow = pgIter.hasNext() ? pgIter.next() : null;
ogRow = ogIter.hasNext() ? ogIter.next() : null;
}
}
return result;
}
}
9.18 service/tablehash/ExternalJdbcExecutor.java
import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.entity.DataSourceDO;
import com.liang.learn.model.enums.DataSourceTypeEnum;
import com.liang.learn.model.enums.ErrorCode;
import com.liang.learn.service.DataSourceService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
/**
* 外部数据库 JDBC 直连执行器
*
* @author liang
* @since 2026-05-24
*/
@Slf4j
@Component
@RequiredArgsConstructor
public class ExternalJdbcExecutor {
/**
* 数据源 Service,用于解密密码
*/
private final DataSourceService dataSourceService;
/**
* 打开外部数据库 JDBC 连接
*
* @param dataSource 数据源配置
* @return JDBC 连接,使用后须关闭
*/
public Connection openConnection(DataSourceDO dataSource) {
try {
DataSourceTypeEnum typeEnum = DataSourceTypeEnum.fromCode(dataSource.getDataSourceType());
String driver = typeEnum.resolveDriverClassName(dataSource.getDriverClassName());
String password = dataSourceService.getDecryptedPassword(dataSource.getId());
String url = typeEnum.buildJdbcUrl(
dataSource.getHost(), dataSource.getPort(),
dataSource.getDatabaseName(), dataSource.getSchema());
// 轻量直连,不经过连接池
Class.forName(driver);
Properties props = new Properties();
props.setProperty("user", dataSource.getUsername());
props.setProperty("password", password);
return DriverManager.getConnection(url, props);
} catch (BusinessException e) {
throw e;
} catch (Exception e) {
log.warn("外部库连接失败, dataSourceId={}, reason={}", dataSource.getId(), e.getMessage());
throw new BusinessException(ErrorCode.CONNECTION_FAILED, "外部库连接失败");
}
}
/**
* 按 id 列表批量查询全行
*
* @param connection 数据库连接
* @param schema schema 名
* @param tableName 表名
* @param idColumn id 列名
* @param ids id 列表
* @param uppercase true 时标识符转大写(openGauss)
* @return 行数据列表,每行 Map 的 key 为列名
*/
public List<Map<String, Object>> fetchRowsByIds(Connection connection, String schema, String tableName,
String idColumn, List<String> ids, boolean uppercase) {
if (ids.isEmpty()) {
return List.of();
}
// PG 小写 / OG 大写引用标识符
String quotedSchema = uppercase ? quote(schema, true) : quote(schema, false);
String quotedTable = uppercase ? quote(tableName, true) : quote(tableName, false);
String quotedId = uppercase ? quote(idColumn, true) : quote(idColumn, false);
String sql = "SELECT * FROM " + quotedSchema + "." + quotedTable
+ " WHERE " + quotedId + "::text = ANY(?)";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
// id 在 Java 侧为 String,SQL 侧用 id::text 与 varchar[] 匹配
ps.setArray(1, connection.createArrayOf(
uppercase ? "VARCHAR" : "varchar", ids.toArray()));
try (ResultSet rs = ps.executeQuery()) {
List<Map<String, Object>> rows = new ArrayList<>();
int colCount = rs.getMetaData().getColumnCount();
// 按列标签组装 Map,保留原始列值供日志输出
while (rs.next()) {
Map<String, Object> row = new LinkedHashMap<>();
for (int i = 1; i <= colCount; i++) {
row.put(rs.getMetaData().getColumnLabel(i), rs.getObject(i));
}
rows.add(row);
}
return rows;
}
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "全行查询失败: " + e.getMessage());
}
}
/**
* 按 id 列表查询标准化列值(用于列级 diff)
*
* @param connection 数据库连接
* @param sql 标准化列 SQL(含 id = ANY(?) 占位符)
* @param ids id 列表
* @return 行数据列表
*/
public List<Map<String, Object>> fetchNormalizedRows(Connection connection, String sql, List<String> ids) {
if (ids.isEmpty()) {
return List.of();
}
try (PreparedStatement ps = connection.prepareStatement(sql)) {
ps.setArray(1, connection.createArrayOf("varchar", ids.toArray()));
try (ResultSet rs = ps.executeQuery()) {
List<Map<String, Object>> rows = new ArrayList<>();
int colCount = rs.getMetaData().getColumnCount();
// 值为 hash 标准化后的字符串,供 ColumnDiffCalculator 逐列对比
while (rs.next()) {
Map<String, Object> row = new LinkedHashMap<>();
for (int i = 1; i <= colCount; i++) {
row.put(rs.getMetaData().getColumnLabel(i), rs.getObject(i));
}
rows.add(row);
}
return rows;
}
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "标准化列查询失败: " + e.getMessage());
}
}
/**
* 引用 SQL 标识符
*
* @param name 标识符
* @param upper true 转大写
* @return 带双引号的标识符
*/
private String quote(String name, boolean upper) {
return "\"" + (upper ? name.toUpperCase() : name.toLowerCase()) + "\"";
}
}
9.19 service/tablehash/ColumnDiffCalculator.java
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
/**
* 列级 diff 计算器,跳过首列与排除类型
*
* @author liang
* @since 2026-05-24
*/
@Component
public class ColumnDiffCalculator {
/**
* 对比两库同一行的标准化列值,返回有差异的列
*
* @param pgRow PG 侧标准化行(列名小写)
* @param ogRow openGauss 侧标准化行(列名小写)
* @param columns 列元数据
* @return 差异列列表,无差异时为空
*/
public List<ColumnDiff> diff(Map<String, Object> pgRow, Map<String, Object> ogRow,
List<ColumnMeta> columns) {
List<ColumnDiff> diffs = new ArrayList<>();
for (ColumnMeta col : columns) {
// 首列是 id;大对象列未参与 hash,也不做 diff
if (col.isFirstColumn() || ColumnTypeHashStrategy.isExcludedFromHash(col.getUdtName())) {
continue;
}
String key = col.getColumnName();
String pgVal = stringify(pgRow.get(key));
String ogVal = stringify(ogRow.get(key));
if (!Objects.equals(pgVal, ogVal)) {
diffs.add(new ColumnDiff(key, pgVal, ogVal));
}
}
return diffs;
}
/**
* 将列值转为 diff 比较用字符串,NULL 统一为 'NULL'
*
* @param value 列值
* @return 字符串形式
*/
private String stringify(Object value) {
return value == null ? "NULL" : String.valueOf(value);
}
}
9.20 service/tablehash/TableHashCompareLogger.java
import com.liang.learn.properties.TableHashProperties;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import java.util.List;
import java.util.Map;
/**
* 表 hash 比较控制台日志,支持 detailLogLimit 截断
*
* @author liang
* @since 2026-05-24
*/
@Slf4j
@Component
@RequiredArgsConstructor
public class TableHashCompareLogger {
/**
* hash 比较配置
*/
private final TableHashProperties properties;
/**
* 输出比较摘要统计
*
* @param taskId 任务 ID
* @param tableName 表名
* @param result 比较结果
* @param elapsedMs 耗时(毫秒)
*/
public void logSummary(String taskId, String tableName, CompareResult result, long elapsedMs) {
log.info("[TABLE-HASH][taskId={}][table={}] 完成 | PG行数={} OG行数={} | PG独有={} OG独有={} 不一致={} | 耗时={}ms",
taskId, tableName,
result.getPgRowCount(), result.getOgRowCount(),
result.getPgOnlyIds().size(), result.getOgOnlyIds().size(), result.getMismatchIds().size(),
elapsedMs);
}
/**
* 输出 PG 独有行明细
*
* @param taskId 任务 ID
* @param ids PG 独有 id 列表
* @param rows 对应全行数据
*/
public void logPgOnly(String taskId, List<String> ids, List<Map<String, Object>> rows) {
logIdRows(taskId, "PG独有", ids, rows);
}
/**
* 输出 openGauss 独有行明细
*
* @param taskId 任务 ID
* @param ids OG 独有 id 列表
* @param rows 对应全行数据
*/
public void logOgOnly(String taskId, List<String> ids, List<Map<String, Object>> rows) {
logIdRows(taskId, "OG独有", ids, rows);
}
/**
* 输出 hash 不一致行的列级 diff 明细
*
* @param taskId 任务 ID
* @param ids 不一致 id 列表
* @param pgRows PG 侧标准化行
* @param ogRows OG 侧标准化行
* @param columns 列元数据
* @param diffCalculator 列 diff 计算器
*/
public void logMismatch(String taskId, List<String> ids,
List<Map<String, Object>> pgRows, List<Map<String, Object>> ogRows,
List<ColumnMeta> columns, ColumnDiffCalculator diffCalculator) {
int limit = properties.getDetailLogLimit();
int total = ids.size();
int printCount = Math.min(total, limit);
for (int i = 0; i < printCount; i++) {
String id = ids.get(i);
Map<String, Object> pgRow = findRow(pgRows, id);
Map<String, Object> ogRow = findRow(ogRows, id);
List<ColumnDiff> diffs = diffCalculator.diff(pgRow, ogRow, columns);
log.info("[TABLE-HASH][taskId={}] 不一致 ({}/{}): id={} columnDiffs={}",
taskId, i + 1, total, id, diffs);
}
// 超出 detailLogLimit 时仅输出省略提示,摘要统计仍含全量 count
if (total > limit) {
log.info("[TABLE-HASH][taskId={}] 不一致 其余 {} 条已省略", taskId, total - limit);
}
}
/**
* 输出任务失败信息
*
* @param taskId 任务 ID
* @param message 错误信息
*/
public void logError(String taskId, String message) {
log.error("[TABLE-HASH][taskId={}] 失败: {}", taskId, message);
}
/**
* 输出 id 行明细,超出 detailLogLimit 时截断
*/
private void logIdRows(String taskId, String label, List<String> ids, List<Map<String, Object>> rows) {
int limit = properties.getDetailLogLimit();
int total = ids.size();
int printCount = Math.min(total, limit);
// 按 id 顺序输出,最多 detailLogLimit 条
for (int i = 0; i < printCount; i++) {
String id = ids.get(i);
Map<String, Object> row = findRow(rows, id);
log.info("[TABLE-HASH][taskId={}] {} ({}/{}): id={} row={}", taskId, label, i + 1, total, id, row);
}
if (total > limit) {
log.info("[TABLE-HASH][taskId={}] {} 其余 {} 条已省略", taskId, label, total - limit);
}
}
/**
* 按 id 从行列表中查找对应行
*/
private Map<String, Object> findRow(List<Map<String, Object>> rows, String id) {
// 兼容 PG 小写 id 列与 OG 大写 ID 列
return rows.stream()
.filter(r -> id.equals(String.valueOf(r.values().iterator().next()))
|| id.equals(String.valueOf(r.get("id")))
|| id.equals(String.valueOf(r.get("ID"))))
.findFirst()
.orElse(Map.of());
}
}
9.21 service/tablehash/TableHashCompareExecutor.java
import com.liang.learn.exception.BusinessException;
import com.liang.learn.mapper.DataSourceMapper;
import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.entity.DataSourceDO;
import com.liang.learn.model.enums.ErrorCode;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Component;
import java.sql.Connection;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
/**
* 两表 hash 比较编排执行器(同步逻辑)
*
* @author liang
* @since 2026-05-24
*/
@Component
@RequiredArgsConstructor
public class TableHashCompareExecutor {
/**
* 数据源 Mapper
*/
private final DataSourceMapper dataSourceMapper;
/**
* hash 比较配置
*/
private final com.liang.learn.properties.TableHashProperties properties;
/**
* 外部 JDBC 执行器
*/
private final ExternalJdbcExecutor jdbcExecutor;
/**
* 表元数据加载器
*/
private final TableMetadataLoader metadataLoader;
/**
* hash SQL 构建器
*/
private final HashSqlBuilder hashSqlBuilder;
/**
* 流式归并比较器
*/
private final StreamingHashComparer streamingHashComparer;
/**
* 列级 diff 计算器
*/
private final ColumnDiffCalculator diffCalculator;
/**
* 控制台日志输出
*/
private final TableHashCompareLogger logger;
/**
* 执行完整的 hash 比较流程:连接 → 流式归并 → 差异明细 → 日志输出
*
* @param taskId 任务 ID
* @param dto 比较参数
*/
public void compare(String taskId, TableHashCompareDTO dto) {
long startMs = System.currentTimeMillis();
Connection pgConnection = null;
Connection ogConnection = null;
HashRowCursor pgCursor = null;
HashRowCursor ogCursor = null;
try {
// 1. 加载 PG / openGauss 数据源配置
DataSourceDO pgDataSource = dataSourceMapper.selectById(properties.getPgsqlDataSourceId());
if (pgDataSource == null) {
logger.logError(taskId, "PG 数据源不存在, id=" + properties.getPgsqlDataSourceId());
return;
}
DataSourceDO ogDataSource = dataSourceMapper.selectById(properties.getOpengaussDataSourceId());
if (ogDataSource == null) {
logger.logError(taskId, "openGauss 数据源不存在, id=" + properties.getOpengaussDataSourceId());
return;
}
pgConnection = jdbcExecutor.openConnection(pgDataSource);
ogConnection = jdbcExecutor.openConnection(ogDataSource);
// 2. 表名:入参小写;OG 侧转大写;schema 取自各自数据源配置
String tableName = dto.getTableName().toLowerCase(Locale.ROOT);
String ogTableName = tableName.toUpperCase(Locale.ROOT);
String pgSchema = pgDataSource.getSchema();
String ogSchema = ogDataSource.getSchema();
// 3. 从 PG 读取列元数据,首列作为比较 id
List<ColumnMeta> columns = metadataLoader.loadColumns(pgConnection, pgSchema, tableName);
validateTimeColumn(columns, dto.getTimeColumn());
String idColumn = columns.stream()
.filter(ColumnMeta::isFirstColumn)
.findFirst()
.orElse(columns.getFirst())
.getColumnName();
String idUdtName = columns.stream()
.filter(ColumnMeta::isFirstColumn)
.findFirst()
.orElse(columns.getFirst())
.getUdtName();
// 4. 动态生成 hash SQL,打开双库流式游标
String pgHashSql = hashSqlBuilder.buildHashQuery(
pgSchema, tableName, columns, dto.getTimeColumn(), false);
String ogHashSql = hashSqlBuilder.buildHashQuery(
ogSchema, ogTableName, columns, dto.getTimeColumn(), true);
pgCursor = HashRowCursor.open(
pgConnection, pgHashSql, dto.getStartTime(), dto.getEndTime(), properties.getCompareFetchSize());
ogCursor = HashRowCursor.open(
ogConnection, ogHashSql, dto.getStartTime(), dto.getEndTime(), properties.getCompareFetchSize());
// 5. 流式归并,仅收集差异 id(不全量加载 hash 到内存)
CompareResult result = streamingHashComparer.merge(pgCursor, ogCursor, idUdtName);
// 6. 对 PG独有 / OG独有 id 批量查全行并输出日志
int batchSize = properties.getFetchBatchSize();
List<Map<String, Object>> pgOnlyRows = fetchRowsInBatches(
pgConnection, pgSchema, tableName, idColumn, result.getPgOnlyIds(), false, batchSize);
logger.logPgOnly(taskId, result.getPgOnlyIds(), pgOnlyRows);
List<Map<String, Object>> ogOnlyRows = fetchRowsInBatches(
ogConnection, ogSchema, ogTableName, idColumn, result.getOgOnlyIds(), true, batchSize);
logger.logOgOnly(taskId, result.getOgOnlyIds(), ogOnlyRows);
// 7. hash 不一致行:查标准化列值,做列级 diff
if (!result.getMismatchIds().isEmpty()) {
String pgNormalizedSql = hashSqlBuilder.buildNormalizedColumnsQuery(
pgSchema, tableName, columns, false);
String ogNormalizedSql = hashSqlBuilder.buildNormalizedColumnsQuery(
ogSchema, ogTableName, columns, true);
List<Map<String, Object>> pgNormalizedRows = fetchNormalizedInBatches(
pgConnection, pgNormalizedSql, result.getMismatchIds(), batchSize, false);
List<Map<String, Object>> ogNormalizedRows = fetchNormalizedInBatches(
ogConnection, ogNormalizedSql, result.getMismatchIds(), batchSize, true);
logger.logMismatch(taskId, result.getMismatchIds(), pgNormalizedRows, ogNormalizedRows,
columns, diffCalculator);
}
logger.logSummary(taskId, tableName, result, System.currentTimeMillis() - startMs);
} catch (Exception e) {
logger.logError(taskId, e.getMessage());
} finally {
closeQuietly(pgCursor);
closeQuietly(ogCursor);
closeQuietly(ogConnection);
closeQuietly(pgConnection);
}
}
/**
* 校验时间字段存在于表元数据中
*/
private void validateTimeColumn(List<ColumnMeta> columns, String timeColumn) {
boolean exists = columns.stream()
.anyMatch(col -> col.getColumnName().equalsIgnoreCase(timeColumn));
if (!exists) {
throw new BusinessException(ErrorCode.PARAM_ERROR, "时间字段不存在: " + timeColumn);
}
}
/**
* 分批按 id 查询全行
*/
private List<Map<String, Object>> fetchRowsInBatches(Connection connection, String schema, String tableName,
String idColumn, List<String> ids, boolean uppercase,
int batchSize) {
List<Map<String, Object>> allRows = new ArrayList<>();
for (int i = 0; i < ids.size(); i += batchSize) {
List<String> batch = ids.subList(i, Math.min(i + batchSize, ids.size()));
allRows.addAll(jdbcExecutor.fetchRowsByIds(connection, schema, tableName, idColumn, batch, uppercase));
}
return allRows;
}
/**
* 分批查询标准化列值;openGauss 侧列名转小写以便 diff
*/
private List<Map<String, Object>> fetchNormalizedInBatches(Connection connection, String sql, List<String> ids,
int batchSize, boolean lowercaseKeys) {
List<Map<String, Object>> allRows = new ArrayList<>();
for (int i = 0; i < ids.size(); i += batchSize) {
List<String> batch = ids.subList(i, Math.min(i + batchSize, ids.size()));
for (Map<String, Object> row : jdbcExecutor.fetchNormalizedRows(connection, sql, batch)) {
// OG 侧列名大写,转小写后与 PG 侧对齐 diff
allRows.add(lowercaseKeys ? lowercaseKeys(row) : row);
}
}
return allRows;
}
/**
* 将 Map key 统一为小写
*/
private Map<String, Object> lowercaseKeys(Map<String, Object> row) {
Map<String, Object> normalized = new LinkedHashMap<>();
row.forEach((key, value) -> normalized.put(key.toLowerCase(Locale.ROOT), value));
return normalized;
}
/**
* 静默关闭资源
*/
private static void closeQuietly(AutoCloseable closeable) {
if (closeable == null) {
return;
}
try {
closeable.close();
} catch (Exception ignored) {
// ignore close errors during cleanup
}
}
}
10. 单元测试代码
10.1 HashSqlBuilderTest.java
import org.junit.jupiter.api.Test;
import java.util.List;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;
class HashSqlBuilderTest {
private final HashSqlBuilder builder = new HashSqlBuilder();
@Test
void buildHashQuery_shouldExcludeTextAndBytea() {
List<ColumnMeta> columns = List.of(
new ColumnMeta("id", 1, "int8"),
new ColumnMeta("col_text", 2, "text"),
new ColumnMeta("col_varchar", 3, "varchar"),
new ColumnMeta("col_bytea", 4, "bytea"),
new ColumnMeta("col_integer", 5, "int4")
);
String sql = builder.buildHashQuery("test", "test_all_types", columns, "col_timestamp_without", false);
assertTrue(sql.contains("\"id\""));
assertTrue(sql.contains("\"col_varchar\""));
assertFalse(sql.contains("col_text"));
assertFalse(sql.contains("col_bytea"));
assertTrue(sql.contains("MD5(concat_ws('|'"));
assertTrue(sql.contains("\"col_timestamp_without\" >="));
assertTrue(sql.contains("\"col_timestamp_without\" <"));
}
@Test
void buildHashQuery_opengauss_shouldUppercaseIdentifiers() {
List<ColumnMeta> columns = List.of(
new ColumnMeta("id", 1, "int8"),
new ColumnMeta("col_integer", 2, "int4")
);
String sql = builder.buildHashQuery("dev", "test_all_types", columns, "col_timestamp_without", true);
assertTrue(sql.contains("\"TEST_ALL_TYPES\""));
assertTrue(sql.contains("\"ID\""));
assertTrue(sql.contains("\"COL_INTEGER\""));
}
@Test
void quoteColumn_pgsql_shouldLowercase() {
assertEquals("\"col_integer\"", builder.quoteColumn("col_integer", false));
}
@Test
void buildNormalizedColumnsQuery_shouldUseTextCastForIdAny() {
List<ColumnMeta> columns = List.of(
new ColumnMeta("test_id", 1, "int8"),
new ColumnMeta("col_var", 2, "varchar"),
new ColumnMeta("col_bool", 3, "bool")
);
String sql = builder.buildNormalizedColumnsQuery("test", "test_hash", columns, false);
assertTrue(sql.contains("WHERE \"test_id\"::text = ANY(?)"));
}
@Test
void quoteColumn_opengauss_shouldUppercase() {
assertEquals("\"COL_INTEGER\"", builder.quoteColumn("col_integer", true));
}
}
10.2 ColumnTypeHashStrategyTest.java
import com.liang.learn.exception.BusinessException;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;
class ColumnTypeHashStrategyTest {
@Test
void isExcludedFromHash_text_shouldReturnTrue() {
assertTrue(ColumnTypeHashStrategy.isExcludedFromHash("text"));
assertTrue(ColumnTypeHashStrategy.isExcludedFromHash("bytea"));
assertTrue(ColumnTypeHashStrategy.isExcludedFromHash("blob"));
assertTrue(ColumnTypeHashStrategy.isExcludedFromHash("clob"));
}
@Test
void isExcludedFromHash_varchar_shouldReturnFalse() {
assertFalse(ColumnTypeHashStrategy.isExcludedFromHash("varchar"));
}
@Test
void buildExpression_int4_shouldUseCastText() {
String expr = ColumnTypeHashStrategy.buildExpression("\"col_integer\"", "int4");
assertEquals("\"col_integer\"::text", expr);
}
@Test
void buildExpression_unknown_shouldThrow() {
assertThrows(BusinessException.class,
() -> ColumnTypeHashStrategy.buildExpression("\"col_json\"", "jsonb"));
}
}
10.3 StreamingHashComparerTest.java
import org.junit.jupiter.api.Test;
import java.util.Iterator;
import java.util.List;
import static org.junit.jupiter.api.Assertions.assertEquals;
class StreamingHashComparerTest {
private final StreamingHashComparer comparer = new StreamingHashComparer();
@Test
void merge_shouldDetectPgOnlyOgOnlyAndMismatch() {
Iterator<HashRow> pg = List.of(
new HashRow("1", "aaa"),
new HashRow("3", "ccc"),
new HashRow("4", "ddd")
).iterator();
Iterator<HashRow> og = List.of(
new HashRow("2", "bbb"),
new HashRow("3", "xxx"),
new HashRow("4", "ddd")
).iterator();
CompareResult result = comparer.merge(pg, og, "int8");
assertEquals(List.of("1"), result.getPgOnlyIds());
assertEquals(List.of("2"), result.getOgOnlyIds());
assertEquals(List.of("3"), result.getMismatchIds());
assertEquals(3, result.getPgRowCount());
assertEquals(3, result.getOgRowCount());
}
@Test
void merge_numericId_shouldUseNumericOrderNotLexicographic() {
Iterator<HashRow> pg = List.of(
new HashRow("1", "aaa"),
new HashRow("2", "bbb"),
new HashRow("10", "ccc")
).iterator();
Iterator<HashRow> og = List.of(
new HashRow("1", "aaa"),
new HashRow("10", "ccc")
).iterator();
CompareResult result = comparer.merge(pg, og, "int8");
assertEquals(List.of("2"), result.getPgOnlyIds());
assertEquals(List.of(), result.getOgOnlyIds());
assertEquals(List.of(), result.getMismatchIds());
}
}
11. 类型映射速查表
| udt_name | hash 表达式要点 |
|---|---|
| int2/int4/int8/int16/serial* | ::text |
| numeric/decimal | to_char(..., 'FM999...') |
| float4/float8 | abs=0 → '0',否则 to_char |
| varchar | 换行 normalize |
| bpchar/char | trim trailing space |
| date | YYYY-MM-DD |
| timestamp/timestamptz | YYYY-MM-DD HH24:MI:SS.US |
| bool | 1 / 0 |
| blob/clob/bytea/text | 不参与 hash |
| 其他 | 严格失败 |
12. 常见问题
| 现象 | 原因 / 处理 |
|---|---|
| OG 报 schema 不存在 | 数据源 schema 配置需与库中一致;程序会将 table/column 转大写,schema 配置 dev → "DEV" |
| PG 行数少于预期 | 时间范围 [start,end) 左闭右开,边界行可能被过滤 |
| bigint id 归并错误 | 须使用 IdComparator(本文档已包含) |
| 队列已满 | 线程池 queue=10,稍后重试或调大配置 |
| jsonb 等类型 | 当前不支持,需在 ColumnTypeHashStrategy 扩展 |
文档结束
浙公网安备 33010602011771号