两表hash比较

PG ↔ openGauss 两表 Hash 比较


1. 功能概述

在指定时间范围内,对比 PostgreSQL(源库,小写表名/列名)与 openGauss(目标库,大写表名/列名)同一张表的数据一致性:

  • 通过 MD5 行 hash 流式扫描,避免全量加载
  • 输出三类差异:PG 独有OG 独有hash 不一致(含列级 diff)
  • 异步执行:HTTP 立即返回 taskId,结果仅写控制台日志
  • 数据源 ID 从 application.yaml 读取,连接信息来自已有数据源管理模块

2. 前置依赖

本功能不独立连接数据库,依赖现有数据源管理模块:

依赖类 用途
DataSourceMapper 按 id 查询 DataSourceDO
DataSourceService getDecryptedPassword(id) 解密密码
DataSourceDO host/port/database/schema/username/type 等
DataSourceTypeEnum fromCodebuildJdbcUrlresolveDriverClassName(须支持 pgsqlopengauss
BusinessException 业务异常
ErrorCode 错误码(至少用到 PARAM_ERRORNOT_FOUNDBUSINESS_ERRORCONNECTION_FAILEDSYSTEM_ERROR
Result<T> 统一 HTTP 响应包装

Maven 驱动(若尚未引入)

<!-- PostgreSQL -->
<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
</dependency>
<!-- openGauss(按公司私服坐标调整) -->
<dependency>
    <groupId>org.opengauss</groupId>
    <artifactId>opengauss-jdbc</artifactId>
</dependency>

包名替换说明

下文代码中 import 均为 com.liang.learn.*,迁移时全局替换为公司项目包名(如 com.company.xxx)。


3. 配置项

application.yaml 增加:

app:
  table-hash:
    pgsql-data-source-id: 3          # PG 数据源在 data_source_manage 表中的 id
    opengauss-data-source-id: 2      # openGauss 数据源 id
    detail-log-limit: 500            # 每类差异明细最多打印条数
    fetch-batch-size: 500            # 差异 id 批量 IN 查询每批大小
    compare-fetch-size: 1000         # 流式 hash 查询 JDBC fetchSize

4. 架构设计

4.1 分层结构

┌─────────────────────────────────────────────────────────────┐
│  TableHashController          POST /api/table-hash/compare  │
└───────────────────────────────┬─────────────────────────────┘
                                │ 校验 DTO,提交任务
┌───────────────────────────────▼─────────────────────────────┐
│  TableHashCompareServiceImpl                                │
│    ThreadPoolTaskExecutor.execute → 异步 compare            │
└───────────────────────────────┬─────────────────────────────┘
                                │
┌───────────────────────────────▼─────────────────────────────┐
│  TableHashCompareExecutor(编排器,同步逻辑)               │
│  ┌─────────────────┐  ┌──────────────────┐  ┌─────────────┐ │
│  │TableMetadataLoader│ │HashSqlBuilder   │ │ExternalJdbc │ │
│  │(PG information_  │ │(动态 MD5 SQL)    │ │Executor     │ │
│  │ schema 读列)      │ │                  │ │(DriverManager)│
│  └─────────────────┘  └──────────────────┘  └─────────────┘ │
│  ┌─────────────────┐  ┌──────────────────┐  ┌─────────────┐ │
│  │HashRowCursor    │  │StreamingHashComparer│ │ColumnDiff   │ │
│  │(流式 ResultSet) │  │+ IdComparator    │ │Calculator   │ │
│  └─────────────────┘  └──────────────────┘  └─────────────┘ │
│  ┌─────────────────────────────────────────────────────────┐│
│  │TableHashCompareLogger → 控制台结构化日志               ││
│  └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
         ↑                              ↑
  TableHashProperties            DataSourceMapper / Service
  (yaml 配置)                    (数据源 id → 连接信息)

4.2 核心设计决策

结论
id 列 首列ordinal_position = 1),可为 bigint 或 varchar
id 比较 整数类型用数值序BigInteger),与 SQL ORDER BY 一致
时间范围 [startTime, endTime) 左闭右开,timestamp + setTimestamp 绑定
Schema 从各自数据源配置读取,API 只传小写 tableName
OG 标识符 schema 原样引用,table/column 转大写双引号
排除 hash blobclobbyteatextvarchar 仍参与)
未知类型 严格模式,任务失败
外部连接 轻量 DriverManager,不走连接池

5. 代码目录结构

src/main/java/{basePackage}/
├── config/
│   ├── AsyncConfig.java                 # tableHashExecutor 线程池
│   └── TableHashProperties.java         # yaml 配置绑定
├── controller/
│   └── TableHashController.java
├── model/
│   ├── dto/
│   │   └── TableHashCompareDTO.java
│   └── vo/
│       └── TableHashTaskVO.java
├── service/
│   ├── TableHashCompareService.java
│   ├── impl/
│   │   └── TableHashCompareServiceImpl.java
│   └── tablehash/
│       ├── ColumnDiff.java
│       ├── ColumnDiffCalculator.java
│       ├── ColumnMeta.java
│       ├── ColumnTypeHashStrategy.java
│       ├── CompareResult.java
│       ├── ExternalJdbcExecutor.java
│       ├── HashRow.java
│       ├── HashRowCursor.java
│       ├── HashSqlBuilder.java
│       ├── IdComparator.java
│       ├── StreamingHashComparer.java
│       ├── TableHashCompareExecutor.java
│       ├── TableHashCompareLogger.java
│       └── TableMetadataLoader.java

src/test/java/{basePackage}/service/tablehash/
├── ColumnTypeHashStrategyTest.java
├── HashSqlBuilderTest.java
└── StreamingHashComparerTest.java

新增文件共 22 个(19 个 main + 3 个 test),另修改 application.yaml


6. 执行流程说明

6.1 HTTP 提交阶段(同步)

Client
  │ POST /api/table-hash/compare
  │ Body: { tableName, timeColumn, startTime, endTime }
  ▼
TableHashController
  │ @Valid 校验 DTO(标识符格式、时间格式、start < end)
  ▼
TableHashCompareServiceImpl
  │ 生成 UUID taskId
  │ tableHashExecutor.execute(() -> compareExecutor.compare(...))
  │ 队列满 → BusinessException「比较任务队列已满」
  ▼
立即返回 Result { taskId }

6.2 异步比较阶段(后台线程)

TableHashCompareExecutor.compare(taskId, dto)

Step 1  加载数据源
        ├─ dataSourceMapper.selectById(pgsql-data-source-id)
        ├─ dataSourceMapper.selectById(opengauss-data-source-id)
        └─ jdbcExecutor.openConnection × 2(PG + OG)

Step 2  解析表名
        ├─ tableName = dto.tableName.toLowerCase()
        ├─ ogTableName = tableName.toUpperCase()
        ├─ pgSchema = pgDataSource.schema
        └─ ogSchema = ogDataSource.schema

Step 3  读取列元数据(仅 PG)
        ├─ information_schema.columns WHERE schema + table
        ├─ validateTimeColumn 存在性校验
        └─ 首列 → idColumn + idUdtName

Step 4  构建并执行 hash SQL(双库并行流式)
        ├─ HashSqlBuilder.buildHashQuery(..., uppercase=false/true)
        ├─ HashRowCursor.open(connection, sql, start, end, fetchSize)
        │     setFetchSize + setTimestamp(1/2) + ORDER BY id
        └─ 每行 → HashRow(id, md5_hash)

Step 5  流式归并 StreamingHashComparer.merge(pgCursor, ogCursor, idUdtName)
        ├─ 双指针,IdComparator 按数值/字典序对齐
        ├─ pgId < ogId  → pgOnlyIds
        ├─ pgId > ogId  → ogOnlyIds
        ├─ id 相等 hash 不同 → mismatchIds
        └─ 产出 CompareResult(三类 id 列表 + 行数统计)

Step 6  差异明细查询 + 日志
        ├─ pgOnly / ogOnly → fetchRowsByIds(SELECT * WHERE id::text = ANY(?))
        ├─ mismatch → buildNormalizedColumnsQuery + ColumnDiffCalculator
        └─ TableHashCompareLogger(每类最多 detail-log-limit 条)

Step 7  摘要日志 + finally 关闭游标/连接

6.3 Hash SQL 模板

SELECT "<id_col>",
       MD5(concat_ws('|', COALESCE(<col_expr_1>, 'NULL'), ...)) AS row_hash
FROM "<schema>"."<table>"
WHERE "<time_col>" >= ? AND "<time_col>" < ?
ORDER BY "<id_col>"
  • PG:"test"."test_hash"."test_id"(小写)
  • OG:"DEV"."TEST_HASH"."TEST_ID"(大写,schema 配置 dev → 引用时转 DEV

6.4 控制台日志格式

[TABLE-HASH][taskId=xxx][table=test_hash] 完成 | PG行数=10000 OG行数=10000 | PG独有=0 OG独有=0 不一致=0 | 耗时=1234ms

[TABLE-HASH][taskId=xxx] PG独有 (1/2): id=5 row={...}
[TABLE-HASH][taskId=xxx] 不一致 (1/1): id=3 columnDiffs=[ColumnDiff(column=col_x, pgValue=a, ogValue=b)]
[TABLE-HASH][taskId=xxx] 失败: 不支持的列类型: jsonb

7. API 说明

请求

POST /api/table-hash/compare
Content-Type: application/json

{
  "tableName": "test_hash",
  "timeColumn": "col_timestamp_without",
  "startTime": "2020-01-01 00:00:00",
  "endTime": "2030-01-01 00:00:00"
}

响应

{
  "code": 0,
  "msg": "任务已提交",
  "data": {
    "taskId": "550e8400-e29b-41d4-a716-446655440000"
  }
}

比较结果不在响应中返回,通过 taskId 在应用日志中检索 [TABLE-HASH][taskId=...]


8. 步骤清单

  1. 确认数据源管理模块可用,PG/OG 数据源 id 与 yaml 配置一致
  2. 确认 OG schema 配置正确(如 "DEV" 需配 dev,程序会转大写)
  3. 按第 5 节目录创建全部 Java 文件(替换包名)
  4. 添加 yaml 配置
  5. 运行 3 个单元测试验证 SQL 构建与归并逻辑
  6. 启动应用,对测试表调用 API,检查日志

9. 完整源代码

说明:以下每个文件不含 package 声明,迁移时自行添加。
代码块已包含源码中的 Javadoc 与行内注释。


9.1 config/TableHashProperties.java

import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

/**
 * 两表 hash 比较功能配置
 *
 * @author liang
 * @since 2026-05-24
 */
@Data
@Component
@ConfigurationProperties(prefix = "app.table-hash")
public class TableHashProperties {

    /**
     * PostgreSQL 数据源 ID(源库)
     */
    private Long pgsqlDataSourceId = 3L;

    /**
     * openGauss 数据源 ID(目标库)
     */
    private Long opengaussDataSourceId = 2L;

    /**
     * 每集合控制台明细输出上限
     */
    private int detailLogLimit = 500;

    /**
     * 差异行批量查询 IN 列表大小
     */
    private int fetchBatchSize = 500;

    /**
     * 流式 hash 查询 ResultSet fetchSize
     */
    private int compareFetchSize = 1000;
}

9.2 config/AsyncConfig.java

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

/**
 * 异步任务线程池配置
 *
 * @author liang
 * @since 2026-05-24
 */
@Configuration
public class AsyncConfig {

    /**
     * 表 hash 比较专用线程池,限制并发避免同时跑多个重型比较任务
     *
     * @return 异步执行器
     */
    @Bean(name = "tableHashExecutor")
    public ThreadPoolTaskExecutor tableHashExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(1);
        executor.setMaxPoolSize(2);
        executor.setQueueCapacity(10);
        executor.setThreadNamePrefix("table-hash-");
        executor.initialize();
        return executor;
    }
}

9.3 model/dto/TableHashCompareDTO.java

import io.swagger.v3.oas.annotations.media.Schema;
import jakarta.validation.constraints.AssertTrue;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Pattern;
import lombok.Data;

/**
 * 两表 hash 比较请求参数
 *
 * @author liang
 * @since 2026-05-24
 */
@Data
@Schema(description = "两表 hash 比较请求")
public class TableHashCompareDTO {

    /**
     * 表名(小写,与 PG 一致)
     */
    private static final String IDENTIFIER_PATTERN = "^[a-z][a-z0-9_]*$";

    @NotBlank(message = "表名不能为空")
    @Pattern(regexp = IDENTIFIER_PATTERN, message = "表名只能包含小写字母、数字和下划线,且以字母开头")
    @Schema(description = "表名(小写,与 PG 一致)", example = "test_all_types")
    private String tableName;

    /**
     * 时间字段名(小写),用于限定比较数据范围
     */
    @NotBlank(message = "时间字段名不能为空")
    @Pattern(regexp = IDENTIFIER_PATTERN, message = "时间字段名只能包含小写字母、数字和下划线,且以字母开头")
    @Schema(description = "时间字段名(小写)", example = "col_timestamp_without")
    private String timeColumn;

    /**
     * 开始时间(含),格式 yyyy-MM-dd HH:mm:ss
     */
    @NotBlank(message = "开始时间不能为空")
    @Pattern(regexp = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", message = "开始时间格式须为 yyyy-MM-dd HH:mm:ss")
    @Schema(description = "开始时间(含)", example = "2020-01-01 00:00:00")
    private String startTime;

    /**
     * 结束时间(不含),格式 yyyy-MM-dd HH:mm:ss
     */
    @NotBlank(message = "结束时间不能为空")
    @Pattern(regexp = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", message = "结束时间格式须为 yyyy-MM-dd HH:mm:ss")
    @Schema(description = "结束时间(不含)", example = "2030-01-01 00:00:00")
    private String endTime;

    /**
     * 校验时间范围为左闭右开且 start &lt; end
     */
    @AssertTrue(message = "开始时间必须早于结束时间")
    public boolean isTimeRangeValid() {
        if (startTime == null || endTime == null) {
            return true;
        }
        return startTime.compareTo(endTime) < 0;
    }
}

9.4 model/vo/TableHashTaskVO.java

import io.swagger.v3.oas.annotations.media.Schema;
import lombok.AllArgsConstructor;
import lombok.Data;

/**
 * hash 比较任务提交结果
 *
 * @author liang
 * @since 2026-05-24
 */
@Data
@AllArgsConstructor
@Schema(description = "hash 比较任务提交结果")
public class TableHashTaskVO {

    /**
     * 任务 ID,用于控制台日志追踪
     */
    @Schema(description = "任务 ID,用于日志追踪")
    private String taskId;
}

9.5 controller/TableHashController.java

import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.result.Result;
import com.liang.learn.model.vo.TableHashTaskVO;
import com.liang.learn.service.TableHashCompareService;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.Parameter;
import io.swagger.v3.oas.annotations.tags.Tag;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

/**
 * 两表 hash 比较 Controller
 *
 * @author liang
 * @since 2026-05-24
 */
@Slf4j
@RestController
@RequestMapping("/api/table-hash")
@RequiredArgsConstructor
@Tag(name = "两表 Hash 比较", description = "PG 与 openGauss 表级 hash 一致性比较")
public class TableHashController {

    /**
     * 两表 hash 比较 Service
     */
    private final TableHashCompareService tableHashCompareService;

    /**
     * 提交 hash 比较任务(异步执行,结果仅输出控制台日志)
     *
     * @param dto 比较参数
     * @return 任务 ID
     */
    @PostMapping("/compare")
    @Operation(summary = "提交 hash 比较任务", description = "异步执行,结果仅输出控制台日志")
    public Result<TableHashTaskVO> compare(
            @Parameter(description = "比较参数") @Valid @RequestBody TableHashCompareDTO dto) {
        log.info("提交 hash 比较任务, table={}, timeColumn={}, start={}, end={}",
                dto.getTableName(), dto.getTimeColumn(), dto.getStartTime(), dto.getEndTime());
        TableHashTaskVO task = tableHashCompareService.submitCompare(dto);
        return Result.success("任务已提交", task);
    }
}

9.6 service/TableHashCompareService.java

import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.vo.TableHashTaskVO;

/**
 * 两表 hash 比较 Service
 *
 * @author liang
 * @since 2026-05-24
 */
public interface TableHashCompareService {

    /**
     * 提交异步 hash 比较任务
     *
     * @param dto 比较参数
     * @return 任务 ID,比较结果输出到控制台日志
     */
    TableHashTaskVO submitCompare(TableHashCompareDTO dto);
}

9.7 service/impl/TableHashCompareServiceImpl.java

import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.enums.ErrorCode;
import com.liang.learn.model.vo.TableHashTaskVO;
import com.liang.learn.service.TableHashCompareService;
import com.liang.learn.service.tablehash.TableHashCompareExecutor;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;

import java.util.UUID;
import java.util.concurrent.RejectedExecutionException;

/**
 * 两表 hash 比较 Service 实现
 *
 * @author liang
 * @since 2026-05-24
 */
@Service
public class TableHashCompareServiceImpl implements TableHashCompareService {

    /**
     * 表 hash 比较专用线程池
     */
    private final ThreadPoolTaskExecutor tableHashExecutor;

    /**
     * 比较编排执行器
     */
    private final TableHashCompareExecutor compareExecutor;

    public TableHashCompareServiceImpl(
            @Qualifier("tableHashExecutor") ThreadPoolTaskExecutor tableHashExecutor,
            TableHashCompareExecutor compareExecutor) {
        this.tableHashExecutor = tableHashExecutor;
        this.compareExecutor = compareExecutor;
    }

    @Override
    public TableHashTaskVO submitCompare(TableHashCompareDTO dto) {
        String taskId = UUID.randomUUID().toString();
        try {
            tableHashExecutor.execute(() -> compareExecutor.compare(taskId, dto));
        } catch (RejectedExecutionException ex) {
            throw new BusinessException(ErrorCode.BUSINESS_ERROR, "比较任务队列已满,请稍后重试");
        }
        return new TableHashTaskVO(taskId);
    }
}

9.8 service/tablehash/ColumnMeta.java

import lombok.AllArgsConstructor;
import lombok.Getter;

/**
 * 表列元数据,用于 hash SQL 动态生成
 *
 * @author liang
 * @since 2026-05-24
 */
@Getter
@AllArgsConstructor
public class ColumnMeta {

    /**
     * 列名(小写)
     */
    private final String columnName;

    /**
     * 列序号,从 1 开始
     */
    private final int ordinalPosition;

    /**
     * PostgreSQL udt_name,如 int8、varchar、timestamp
     */
    private final String udtName;

    /**
     * 是否为首列(作为比较 id)
     *
     * @return ordinal_position 为 1 时返回 true
     */
    public boolean isFirstColumn() {
        return ordinalPosition == 1;
    }
}

9.9 service/tablehash/ColumnDiff.java

import lombok.AllArgsConstructor;
import lombok.Getter;

/**
 * 单列差异:PG 与 openGauss 侧标准化值的对比
 *
 * @author liang
 * @since 2026-05-24
 */
@Getter
@AllArgsConstructor
public class ColumnDiff {

    /**
     * 列名
     */
    private final String column;

    /**
     * PG 侧标准化值
     */
    private final String pgValue;

    /**
     * openGauss 侧标准化值
     */
    private final String ogValue;
}

9.10 service/tablehash/HashRow.java

import lombok.AllArgsConstructor;
import lombok.Getter;

/**
 * 单行 hash 查询结果
 *
 * @author liang
 * @since 2026-05-24
 */
@Getter
@AllArgsConstructor
public class HashRow implements Comparable<HashRow> {

    /**
     * 行 id(首列值,字符串形式)
     */
    private final String id;

    /**
     * MD5 行 hash
     */
    private final String rowHash;

    @Override
    public int compareTo(HashRow other) {
        return this.id.compareTo(other.id);
    }
}

9.11 service/tablehash/CompareResult.java

import lombok.Getter;

import java.util.ArrayList;
import java.util.List;

/**
 * 两库 hash 流式归并比较结果
 *
 * @author liang
 * @since 2026-05-24
 */
@Getter
public class CompareResult {

    /**
     * PG 独有行的 id 列表
     */
    private final List<String> pgOnlyIds = new ArrayList<>();

    /**
     * openGauss 独有行的 id 列表
     */
    private final List<String> ogOnlyIds = new ArrayList<>();

    /**
     * 两库均有但 hash 不一致的 id 列表
     */
    private final List<String> mismatchIds = new ArrayList<>();

    /**
     * PG 侧扫描行数
     */
    long pgRowCount;

    /**
     * openGauss 侧扫描行数
     */
    long ogRowCount;
}

9.12 service/tablehash/IdComparator.java

import java.math.BigInteger;
import java.util.Locale;
import java.util.Set;

/**
 * 首列 id 比较器,与 SQL ORDER BY 语义对齐
 *
 * @author liang
 * @since 2026-05-24
 */
public final class IdComparator {

    /**
     * 按数值序比较 id 的类型(与 PG bigint/int ORDER BY 一致)
     */
    private static final Set<String> NUMERIC_ID_TYPES = Set.of(
            "int2", "int4", "int8", "int16", "serial", "bigserial", "smallserial");

    private IdComparator() {
    }

    /**
     * 比较两个 id 字符串
     *
     * @param leftId    左侧 id
     * @param rightId   右侧 id
     * @param idUdtName 首列 udt_name
     * @return compareTo 语义下的比较结果
     */
    public static int compare(String leftId, String rightId, String idUdtName) {
        if (isNumericIdType(idUdtName)) {
            return new BigInteger(leftId).compareTo(new BigInteger(rightId));
        }
        return leftId.compareTo(rightId);
    }

    private static boolean isNumericIdType(String udtName) {
        return NUMERIC_ID_TYPES.contains(udtName.toLowerCase(Locale.ROOT));
    }
}

9.13 service/tablehash/ColumnTypeHashStrategy.java

import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.enums.ErrorCode;

import java.util.Locale;
import java.util.Set;

/**
 * 列类型到 hash 表达式的映射策略
 *
 * @author liang
 * @since 2026-05-24
 */
public final class ColumnTypeHashStrategy {

    /**
     * 不参与 hash 的大对象类型
     */
    private static final Set<String> EXCLUDED_TYPES = Set.of("blob", "clob", "bytea", "text");

    /**
     * numeric/decimal 格式化模板
     */
    private static final String NUMERIC_FORMAT = "FM99999999999999999990.####################";

    /**
     * float 格式化模板
     */
    private static final String FLOAT_FORMAT = "FM0.####################";

    private ColumnTypeHashStrategy() {
    }

    /**
     * 判断列类型是否应跳过 hash 计算
     *
     * @param udtName PostgreSQL udt_name
     * @return 为 blob/clob/bytea/text 时返回 true
     */
    public static boolean isExcludedFromHash(String udtName) {
        return EXCLUDED_TYPES.contains(udtName.toLowerCase(Locale.ROOT));
    }

    /**
     * 根据列类型生成 hash 用 SQL 表达式(不含 COALESCE)
     *
     * @param quotedColumn 带双引号的列名,如 "col_integer"
     * @param udtName      PostgreSQL udt_name
     * @return SQL 表达式
     */
    public static String buildExpression(String quotedColumn, String udtName) {
        String type = udtName.toLowerCase(Locale.ROOT);
        return switch (type) {
            // 整数 / 序列:直接转 text
            case "int2", "int4", "int8", "int16", "serial", "bigserial", "smallserial" ->
                    quotedColumn + "::text";
            // 定点数:统一 to_char 格式,消除尾零差异
            case "numeric", "decimal" ->
                    "to_char(" + quotedColumn + ", '" + NUMERIC_FORMAT + "')";
            // 浮点:-0 与 0 统一为 '0'
            case "float4", "float8" ->
                    "CASE WHEN abs(" + quotedColumn + ")=0 THEN '0' ELSE to_char("
                            + quotedColumn + "::double precision, '" + FLOAT_FORMAT + "') END";
            // 变长文本:先 normalize 换行符
            case "varchar", "character varying" ->
                    normalizeNewlines(quotedColumn + "::text");
            // 定长字符:trim 尾部空格
            case "bpchar", "char", "character" ->
                    "trim(trailing ' ' from " + quotedColumn + "::text)";
            case "date" ->
                    "to_char(" + quotedColumn + ", 'YYYY-MM-DD')";
            // 时间戳:统一微秒精度格式
            case "timestamp", "timestamptz" ->
                    "to_char(" + quotedColumn + ", 'YYYY-MM-DD HH24:MI:SS.US')";
            // 布尔:1/0 代替 t/f 与 true/false
            case "bool" ->
                    "CASE WHEN " + quotedColumn + " THEN '1' ELSE '0' END";
            default ->
                    throw new BusinessException(ErrorCode.BUSINESS_ERROR,
                            "不支持的列类型: " + udtName);
        };
    }

    /**
     * 用 COALESCE 包裹表达式,NULL 统一为 'NULL'
     *
     * @param expression SQL 表达式
     * @return COALESCE(expr, 'NULL')
     */
    public static String wrapCoalesce(String expression) {
        return "COALESCE(" + expression + ", 'NULL')";
    }

    /**
     * 统一换行符为 \n,避免 PG 与 openGauss 文本序列化差异
     *
     * @param textExpr 文本表达式
     * @return regexp_replace 嵌套表达式
     */
    private static String normalizeNewlines(String textExpr) {
        return "regexp_replace(regexp_replace(" + textExpr + ", E'\\r\\n', E'\\n', 'g'), E'\\r', E'\\n', 'g')";
    }
}

9.14 service/tablehash/HashSqlBuilder.java

import org.springframework.stereotype.Component;

import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/**
 * 动态生成 hash 比较 SQL
 *
 * @author liang
 * @since 2026-05-24
 */
@Component
public class HashSqlBuilder {

    /**
     * 构建 hash 查询 SQL:SELECT id, MD5(concat_ws(...)) AS row_hash FROM ... WHERE time >= ? AND time < ?
     *
     * @param schema                schema 名
     * @param tableName             表名
     * @param columns               列元数据(有序)
     * @param timeColumn            时间字段名
     * @param uppercaseIdentifiers  true 时标识符转大写(openGauss)
     * @return 带两个时间占位符的 SQL
     */
    public String buildHashQuery(String schema, String tableName, List<ColumnMeta> columns,
                                 String timeColumn, boolean uppercaseIdentifiers) {
        // 首列作为 id 输出;所有非排除列(含首列)按类型映射后参与 MD5
        String idCol = columns.stream().filter(ColumnMeta::isFirstColumn).findFirst()
                .orElseThrow(() -> new IllegalStateException("未找到首列"))
                .getColumnName();
        String quotedSchema = quoteIdentifier(schema, uppercaseIdentifiers);
        String quotedTable = quoteIdentifier(tableName, uppercaseIdentifiers);
        String quotedId = quoteColumn(idCol, uppercaseIdentifiers);
        String quotedTime = quoteColumn(timeColumn, uppercaseIdentifiers);

        // 跳过 blob/clob/bytea/text;每列 COALESCE 保证 NULL 可比较
        List<String> hashParts = columns.stream()
                .filter(col -> !ColumnTypeHashStrategy.isExcludedFromHash(col.getUdtName()))
                .map(col -> {
                    String quoted = quoteColumn(col.getColumnName(), uppercaseIdentifiers);
                    String expr = ColumnTypeHashStrategy.buildExpression(quoted, col.getUdtName());
                    return ColumnTypeHashStrategy.wrapCoalesce(expr);
                })
                .collect(Collectors.toList());

        return "SELECT " + quotedId + ", MD5(concat_ws('|', "
                + String.join(", ", hashParts)
                + ")) AS row_hash FROM " + quotedSchema + "." + quotedTable
                + " WHERE " + quotedTime + " >= ? AND " + quotedTime + " < ?"
                + " ORDER BY " + quotedId;
    }

    /**
     * 构建标准化列查询 SQL,用于 hash 不一致时的列级 diff
     *
     * @param schema                schema 名
     * @param tableName             表名
     * @param columns               列元数据
     * @param uppercaseIdentifiers  true 时标识符转大写
     * @return SELECT id, norm_col1, norm_col2, ... WHERE id::text = ANY(?)
     */
    public String buildNormalizedColumnsQuery(String schema, String tableName, List<ColumnMeta> columns,
                                               boolean uppercaseIdentifiers) {
        String quotedSchema = quoteIdentifier(schema, uppercaseIdentifiers);
        String quotedTable = quoteIdentifier(tableName, uppercaseIdentifiers);
        String quotedId = quoteColumn(columns.get(0).getColumnName(), uppercaseIdentifiers);

        // 除首列外,每列输出与 hash 相同的标准化表达式,别名即列名
        List<String> selectParts = columns.stream()
                .filter(col -> !col.isFirstColumn())
                .filter(col -> !ColumnTypeHashStrategy.isExcludedFromHash(col.getUdtName()))
                .map(col -> {
                    String quoted = quoteColumn(col.getColumnName(), uppercaseIdentifiers);
                    String expr = ColumnTypeHashStrategy.buildExpression(quoted, col.getUdtName());
                    return ColumnTypeHashStrategy.wrapCoalesce(expr) + " AS "
                            + quoteColumn(col.getColumnName(), uppercaseIdentifiers);
                })
                .collect(Collectors.toList());

        return "SELECT " + quotedId + ", " + String.join(", ", selectParts)
                + " FROM " + quotedSchema + "." + quotedTable
                + " WHERE " + buildIdAnyClause(quotedId);
    }

    /**
     * 构建 id 批量匹配条件:id 统一转 text,与 varchar[] 参数对齐(兼容 bigint/varchar 首列)
     *
     * @param quotedId 带引号的 id 列名
     * @return 如 "test_id"::text = ANY(?)
     */
    String buildIdAnyClause(String quotedId) {
        return quotedId + "::text = ANY(?)";
    }

    /**
     * 引用列名
     *
     * @param columnName 列名
     * @param uppercase  true 转大写
     * @return 带双引号的列名
     */
    public String quoteColumn(String columnName, boolean uppercase) {
        return quoteIdentifier(columnName, uppercase);
    }

    /**
     * 引用 schema / 表名 / 列名
     *
     * @param name      标识符
     * @param uppercase true 转大写,false 转小写
     * @return 带双引号的标识符
     */
    public String quoteIdentifier(String name, boolean uppercase) {
        String normalized = uppercase ? name.toUpperCase(Locale.ROOT) : name.toLowerCase(Locale.ROOT);
        return "\"" + normalized + "\"";
    }
}

9.15 service/tablehash/TableMetadataLoader.java

import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.enums.ErrorCode;
import org.springframework.stereotype.Component;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

/**
 * 从 PostgreSQL information_schema 加载表列元数据
 *
 * @author liang
 * @since 2026-05-24
 */
@Component
public class TableMetadataLoader {

    /**
     * 查询列元数据的 SQL(Java 文本块(Text Block) 语法(Java 15+))
     */
    private static final String LOAD_COLUMNS_SQL = """
            SELECT column_name, ordinal_position, udt_name
            FROM information_schema.columns
            WHERE table_schema = ? AND table_name = ?
            ORDER BY ordinal_position
            """;

    /**
     * 加载指定表的列元数据(从 PG 源库读取)
     *
     * @param pgConnection PG 连接
     * @param schema       schema 名
     * @param tableName    表名(小写)
     * @return 按 ordinal_position 排序的列列表
     */
    public List<ColumnMeta> loadColumns(Connection pgConnection, String schema, String tableName) {
        try (PreparedStatement ps = pgConnection.prepareStatement(LOAD_COLUMNS_SQL)) {
            ps.setString(1, schema);
            ps.setString(2, tableName.toLowerCase());
            try (ResultSet rs = ps.executeQuery()) {
                List<ColumnMeta> columns = new ArrayList<>();
                while (rs.next()) {
                    columns.add(new ColumnMeta(
                            rs.getString("column_name"),
                            rs.getInt("ordinal_position"),
                            rs.getString("udt_name")));
                }
                // 表不存在或无列时严格失败,避免生成空 hash SQL
                if (columns.isEmpty()) {
                    throw new BusinessException(ErrorCode.NOT_FOUND, "表不存在或无列: " + tableName);
                }
                return columns;
            }
        } catch (BusinessException e) {
            throw e;
        } catch (Exception e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "读取表结构失败: " + e.getMessage());
        }
    }
}

9.16 service/tablehash/HashRowCursor.java

import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.enums.ErrorCode;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * 流式 hash 行游标,持有 PreparedStatement + ResultSet,避免全量加载
 *
 * @author liang
 * @since 2026-05-24
 */
public class HashRowCursor implements Iterator<HashRow>, AutoCloseable {

    /**
     * hash 查询 PreparedStatement
     */
    private final PreparedStatement preparedStatement;

    /**
     * hash 查询 ResultSet
     */
    private final ResultSet resultSet;

    /**
     * 预读下一行
     */
    private HashRow nextRow;

    /**
     * 是否已读完
     */
    private boolean finished;

    private HashRowCursor(PreparedStatement preparedStatement, ResultSet resultSet) {
        this.preparedStatement = preparedStatement;
        this.resultSet = resultSet;
    }

    /**
     * 打开 hash 游标,绑定时间范围参数
     *
     * @param connection 数据库连接
     * @param sql        hash 查询 SQL(含两个时间占位符)
     * @param startTime  开始时间,格式 yyyy-MM-dd HH:mm:ss
     * @param endTime    结束时间(不含)
     * @param fetchSize  ResultSet fetchSize
     * @return 可迭代游标,使用后须 close
     */
    public static HashRowCursor open(Connection connection, String sql, String startTime, String endTime,
                                     int fetchSize) {
        try {
            PreparedStatement ps = connection.prepareStatement(sql);
            ps.setFetchSize(fetchSize);
            // 左闭右开 [start, end),与 hash 主流程一致
            ps.setTimestamp(1, Timestamp.valueOf(startTime));
            ps.setTimestamp(2, Timestamp.valueOf(endTime));
            ResultSet rs = ps.executeQuery();
            return new HashRowCursor(ps, rs);
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 查询失败: " + e.getMessage());
        }
    }

    @Override
    public boolean hasNext() {
        if (finished) {
            return false;
        }
        // 已预读则直接返回,避免重复消耗 ResultSet
        if (nextRow != null) {
            return true;
        }
        try {
            if (resultSet.next()) {
                nextRow = new HashRow(resultSet.getString(1), resultSet.getString(2));
                return true;
            }
            finished = true;
            return false;
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 读取失败: " + e.getMessage());
        }
    }

    @Override
    public HashRow next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        HashRow current = nextRow;
        nextRow = null;
        return current;
    }

    @Override
    public void close() {
        finished = true;
        nextRow = null;
        closeQuietly(resultSet);
        closeQuietly(preparedStatement);
    }

    private static void closeQuietly(AutoCloseable closeable) {
        if (closeable == null) {
            return;
        }
        try {
            closeable.close();
        } catch (Exception ignored) {
            // ignore close errors during cleanup
        }
    }
}

9.17 service/tablehash/StreamingHashComparer.java

import org.springframework.stereotype.Component;

import java.util.Iterator;

/**
 * 双库 hash 流式归并比较器
 *
 * @author liang
 * @since 2026-05-24
 */
@Component
public class StreamingHashComparer {

    /**
     * 对两路按 id 排序的 hash 流做双指针归并,产出三类差异 id
     *
     * @param pgIter    PG 侧 hash 迭代器
     * @param ogIter    openGauss 侧 hash 迭代器
     * @param idUdtName 首列 udt_name,用于选择数值或字典序比较
     * @return 比较结果(含 pgOnly / ogOnly / mismatch id 列表及行数统计)
     */
    public CompareResult merge(Iterator<HashRow> pgIter, Iterator<HashRow> ogIter, String idUdtName) {
        CompareResult result = new CompareResult();
        // 双指针:各持当前行,SQL 已 ORDER BY id,等价于两个有序链表归并
        HashRow pgRow = pgIter.hasNext() ? pgIter.next() : null;
        HashRow ogRow = ogIter.hasNext() ? ogIter.next() : null;

        while (pgRow != null || ogRow != null) {
            // PG 已耗尽,剩余 OG 行均为 OG 独有
            if (pgRow == null) {
                result.ogRowCount++;
                result.getOgOnlyIds().add(ogRow.getId());
                ogRow = ogIter.hasNext() ? ogIter.next() : null;
                continue;
            }
            // OG 已耗尽,剩余 PG 行均为 PG 独有
            if (ogRow == null) {
                result.pgRowCount++;
                result.getPgOnlyIds().add(pgRow.getId());
                pgRow = pgIter.hasNext() ? pgIter.next() : null;
                continue;
            }
            // 两路均有数据,按 id 排序规则对齐(数值型与 SQL ORDER BY bigint 一致)
            int cmp = IdComparator.compare(pgRow.getId(), ogRow.getId(), idUdtName);
            if (cmp < 0) {
                // PG id 更小,说明 OG 中不存在该行
                result.pgRowCount++;
                result.getPgOnlyIds().add(pgRow.getId());
                pgRow = pgIter.hasNext() ? pgIter.next() : null;
            } else if (cmp > 0) {
                // OG id 更小,说明 PG 中不存在该行
                result.ogRowCount++;
                result.getOgOnlyIds().add(ogRow.getId());
                ogRow = ogIter.hasNext() ? ogIter.next() : null;
            } else {
                // id 相同,比对 hash;一致则跳过,不一致记入 mismatch
                result.pgRowCount++;
                result.ogRowCount++;
                if (!pgRow.getRowHash().equals(ogRow.getRowHash())) {
                    result.getMismatchIds().add(pgRow.getId());
                }
                pgRow = pgIter.hasNext() ? pgIter.next() : null;
                ogRow = ogIter.hasNext() ? ogIter.next() : null;
            }
        }
        return result;
    }
}

9.18 service/tablehash/ExternalJdbcExecutor.java

import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.entity.DataSourceDO;
import com.liang.learn.model.enums.DataSourceTypeEnum;
import com.liang.learn.model.enums.ErrorCode;
import com.liang.learn.service.DataSourceService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/**
 * 外部数据库 JDBC 直连执行器
 *
 * @author liang
 * @since 2026-05-24
 */
@Slf4j
@Component
@RequiredArgsConstructor
public class ExternalJdbcExecutor {

    /**
     * 数据源 Service,用于解密密码
     */
    private final DataSourceService dataSourceService;

    /**
     * 打开外部数据库 JDBC 连接
     *
     * @param dataSource 数据源配置
     * @return JDBC 连接,使用后须关闭
     */
    public Connection openConnection(DataSourceDO dataSource) {
        try {
            DataSourceTypeEnum typeEnum = DataSourceTypeEnum.fromCode(dataSource.getDataSourceType());
            String driver = typeEnum.resolveDriverClassName(dataSource.getDriverClassName());
            String password = dataSourceService.getDecryptedPassword(dataSource.getId());
            String url = typeEnum.buildJdbcUrl(
                    dataSource.getHost(), dataSource.getPort(),
                    dataSource.getDatabaseName(), dataSource.getSchema());
            // 轻量直连,不经过连接池
            Class.forName(driver);
            Properties props = new Properties();
            props.setProperty("user", dataSource.getUsername());
            props.setProperty("password", password);
            return DriverManager.getConnection(url, props);
        } catch (BusinessException e) {
            throw e;
        } catch (Exception e) {
            log.warn("外部库连接失败, dataSourceId={}, reason={}", dataSource.getId(), e.getMessage());
            throw new BusinessException(ErrorCode.CONNECTION_FAILED, "外部库连接失败");
        }
    }

    /**
     * 按 id 列表批量查询全行
     *
     * @param connection 数据库连接
     * @param schema     schema 名
     * @param tableName  表名
     * @param idColumn   id 列名
     * @param ids        id 列表
     * @param uppercase  true 时标识符转大写(openGauss)
     * @return 行数据列表,每行 Map 的 key 为列名
     */
    public List<Map<String, Object>> fetchRowsByIds(Connection connection, String schema, String tableName,
                                                    String idColumn, List<String> ids, boolean uppercase) {
        if (ids.isEmpty()) {
            return List.of();
        }
        // PG 小写 / OG 大写引用标识符
        String quotedSchema = uppercase ? quote(schema, true) : quote(schema, false);
        String quotedTable = uppercase ? quote(tableName, true) : quote(tableName, false);
        String quotedId = uppercase ? quote(idColumn, true) : quote(idColumn, false);
        String sql = "SELECT * FROM " + quotedSchema + "." + quotedTable
                + " WHERE " + quotedId + "::text = ANY(?)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            // id 在 Java 侧为 String,SQL 侧用 id::text 与 varchar[] 匹配
            ps.setArray(1, connection.createArrayOf(
                    uppercase ? "VARCHAR" : "varchar", ids.toArray()));
            try (ResultSet rs = ps.executeQuery()) {
                List<Map<String, Object>> rows = new ArrayList<>();
                int colCount = rs.getMetaData().getColumnCount();
                // 按列标签组装 Map,保留原始列值供日志输出
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int i = 1; i <= colCount; i++) {
                        row.put(rs.getMetaData().getColumnLabel(i), rs.getObject(i));
                    }
                    rows.add(row);
                }
                return rows;
            }
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "全行查询失败: " + e.getMessage());
        }
    }

    /**
     * 按 id 列表查询标准化列值(用于列级 diff)
     *
     * @param connection 数据库连接
     * @param sql        标准化列 SQL(含 id = ANY(?) 占位符)
     * @param ids        id 列表
     * @return 行数据列表
     */
    public List<Map<String, Object>> fetchNormalizedRows(Connection connection, String sql, List<String> ids) {
        if (ids.isEmpty()) {
            return List.of();
        }
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setArray(1, connection.createArrayOf("varchar", ids.toArray()));
            try (ResultSet rs = ps.executeQuery()) {
                List<Map<String, Object>> rows = new ArrayList<>();
                int colCount = rs.getMetaData().getColumnCount();
                // 值为 hash 标准化后的字符串,供 ColumnDiffCalculator 逐列对比
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int i = 1; i <= colCount; i++) {
                        row.put(rs.getMetaData().getColumnLabel(i), rs.getObject(i));
                    }
                    rows.add(row);
                }
                return rows;
            }
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "标准化列查询失败: " + e.getMessage());
        }
    }

    /**
     * 引用 SQL 标识符
     *
     * @param name  标识符
     * @param upper true 转大写
     * @return 带双引号的标识符
     */
    private String quote(String name, boolean upper) {
        return "\"" + (upper ? name.toUpperCase() : name.toLowerCase()) + "\"";
    }
}

9.19 service/tablehash/ColumnDiffCalculator.java

import org.springframework.stereotype.Component;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/**
 * 列级 diff 计算器,跳过首列与排除类型
 *
 * @author liang
 * @since 2026-05-24
 */
@Component
public class ColumnDiffCalculator {

    /**
     * 对比两库同一行的标准化列值,返回有差异的列
     *
     * @param pgRow   PG 侧标准化行(列名小写)
     * @param ogRow   openGauss 侧标准化行(列名小写)
     * @param columns 列元数据
     * @return 差异列列表,无差异时为空
     */
    public List<ColumnDiff> diff(Map<String, Object> pgRow, Map<String, Object> ogRow,
                                 List<ColumnMeta> columns) {
        List<ColumnDiff> diffs = new ArrayList<>();
        for (ColumnMeta col : columns) {
            // 首列是 id;大对象列未参与 hash,也不做 diff
            if (col.isFirstColumn() || ColumnTypeHashStrategy.isExcludedFromHash(col.getUdtName())) {
                continue;
            }
            String key = col.getColumnName();
            String pgVal = stringify(pgRow.get(key));
            String ogVal = stringify(ogRow.get(key));
            if (!Objects.equals(pgVal, ogVal)) {
                diffs.add(new ColumnDiff(key, pgVal, ogVal));
            }
        }
        return diffs;
    }

    /**
     * 将列值转为 diff 比较用字符串,NULL 统一为 'NULL'
     *
     * @param value 列值
     * @return 字符串形式
     */
    private String stringify(Object value) {
        return value == null ? "NULL" : String.valueOf(value);
    }
}

9.20 service/tablehash/TableHashCompareLogger.java

import com.liang.learn.properties.TableHashProperties;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.Map;

/**
 * 表 hash 比较控制台日志,支持 detailLogLimit 截断
 *
 * @author liang
 * @since 2026-05-24
 */
@Slf4j
@Component
@RequiredArgsConstructor
public class TableHashCompareLogger {

    /**
     * hash 比较配置
     */
    private final TableHashProperties properties;

    /**
     * 输出比较摘要统计
     *
     * @param taskId    任务 ID
     * @param tableName 表名
     * @param result    比较结果
     * @param elapsedMs 耗时(毫秒)
     */
    public void logSummary(String taskId, String tableName, CompareResult result, long elapsedMs) {
        log.info("[TABLE-HASH][taskId={}][table={}] 完成 | PG行数={} OG行数={} | PG独有={} OG独有={} 不一致={} | 耗时={}ms",
                taskId, tableName,
                result.getPgRowCount(), result.getOgRowCount(),
                result.getPgOnlyIds().size(), result.getOgOnlyIds().size(), result.getMismatchIds().size(),
                elapsedMs);
    }

    /**
     * 输出 PG 独有行明细
     *
     * @param taskId 任务 ID
     * @param ids    PG 独有 id 列表
     * @param rows   对应全行数据
     */
    public void logPgOnly(String taskId, List<String> ids, List<Map<String, Object>> rows) {
        logIdRows(taskId, "PG独有", ids, rows);
    }

    /**
     * 输出 openGauss 独有行明细
     *
     * @param taskId 任务 ID
     * @param ids    OG 独有 id 列表
     * @param rows   对应全行数据
     */
    public void logOgOnly(String taskId, List<String> ids, List<Map<String, Object>> rows) {
        logIdRows(taskId, "OG独有", ids, rows);
    }

    /**
     * 输出 hash 不一致行的列级 diff 明细
     *
     * @param taskId          任务 ID
     * @param ids             不一致 id 列表
     * @param pgRows          PG 侧标准化行
     * @param ogRows          OG 侧标准化行
     * @param columns         列元数据
     * @param diffCalculator  列 diff 计算器
     */
    public void logMismatch(String taskId, List<String> ids,
                            List<Map<String, Object>> pgRows, List<Map<String, Object>> ogRows,
                            List<ColumnMeta> columns, ColumnDiffCalculator diffCalculator) {
        int limit = properties.getDetailLogLimit();
        int total = ids.size();
        int printCount = Math.min(total, limit);
        for (int i = 0; i < printCount; i++) {
            String id = ids.get(i);
            Map<String, Object> pgRow = findRow(pgRows, id);
            Map<String, Object> ogRow = findRow(ogRows, id);
            List<ColumnDiff> diffs = diffCalculator.diff(pgRow, ogRow, columns);
            log.info("[TABLE-HASH][taskId={}] 不一致 ({}/{}): id={} columnDiffs={}",
                    taskId, i + 1, total, id, diffs);
        }
        // 超出 detailLogLimit 时仅输出省略提示,摘要统计仍含全量 count
        if (total > limit) {
            log.info("[TABLE-HASH][taskId={}] 不一致 其余 {} 条已省略", taskId, total - limit);
        }
    }

    /**
     * 输出任务失败信息
     *
     * @param taskId  任务 ID
     * @param message 错误信息
     */
    public void logError(String taskId, String message) {
        log.error("[TABLE-HASH][taskId={}] 失败: {}", taskId, message);
    }

    /**
     * 输出 id 行明细,超出 detailLogLimit 时截断
     */
    private void logIdRows(String taskId, String label, List<String> ids, List<Map<String, Object>> rows) {
        int limit = properties.getDetailLogLimit();
        int total = ids.size();
        int printCount = Math.min(total, limit);
        // 按 id 顺序输出,最多 detailLogLimit 条
        for (int i = 0; i < printCount; i++) {
            String id = ids.get(i);
            Map<String, Object> row = findRow(rows, id);
            log.info("[TABLE-HASH][taskId={}] {} ({}/{}): id={} row={}", taskId, label, i + 1, total, id, row);
        }
        if (total > limit) {
            log.info("[TABLE-HASH][taskId={}] {} 其余 {} 条已省略", taskId, label, total - limit);
        }
    }

    /**
     * 按 id 从行列表中查找对应行
     */
    private Map<String, Object> findRow(List<Map<String, Object>> rows, String id) {
        // 兼容 PG 小写 id 列与 OG 大写 ID 列
        return rows.stream()
                .filter(r -> id.equals(String.valueOf(r.values().iterator().next()))
                        || id.equals(String.valueOf(r.get("id")))
                        || id.equals(String.valueOf(r.get("ID"))))
                .findFirst()
                .orElse(Map.of());
    }
}

9.21 service/tablehash/TableHashCompareExecutor.java

import com.liang.learn.exception.BusinessException;
import com.liang.learn.mapper.DataSourceMapper;
import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.entity.DataSourceDO;
import com.liang.learn.model.enums.ErrorCode;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Component;

import java.sql.Connection;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * 两表 hash 比较编排执行器(同步逻辑)
 *
 * @author liang
 * @since 2026-05-24
 */
@Component
@RequiredArgsConstructor
public class TableHashCompareExecutor {

    /**
     * 数据源 Mapper
     */
    private final DataSourceMapper dataSourceMapper;

    /**
     * hash 比较配置
     */
    private final com.liang.learn.properties.TableHashProperties properties;

    /**
     * 外部 JDBC 执行器
     */
    private final ExternalJdbcExecutor jdbcExecutor;

    /**
     * 表元数据加载器
     */
    private final TableMetadataLoader metadataLoader;

    /**
     * hash SQL 构建器
     */
    private final HashSqlBuilder hashSqlBuilder;

    /**
     * 流式归并比较器
     */
    private final StreamingHashComparer streamingHashComparer;

    /**
     * 列级 diff 计算器
     */
    private final ColumnDiffCalculator diffCalculator;

    /**
     * 控制台日志输出
     */
    private final TableHashCompareLogger logger;

    /**
     * 执行完整的 hash 比较流程:连接 → 流式归并 → 差异明细 → 日志输出
     *
     * @param taskId 任务 ID
     * @param dto    比较参数
     */
    public void compare(String taskId, TableHashCompareDTO dto) {
        long startMs = System.currentTimeMillis();
        Connection pgConnection = null;
        Connection ogConnection = null;
        HashRowCursor pgCursor = null;
        HashRowCursor ogCursor = null;
        try {
            // 1. 加载 PG / openGauss 数据源配置
            DataSourceDO pgDataSource = dataSourceMapper.selectById(properties.getPgsqlDataSourceId());
            if (pgDataSource == null) {
                logger.logError(taskId, "PG 数据源不存在, id=" + properties.getPgsqlDataSourceId());
                return;
            }
            DataSourceDO ogDataSource = dataSourceMapper.selectById(properties.getOpengaussDataSourceId());
            if (ogDataSource == null) {
                logger.logError(taskId, "openGauss 数据源不存在, id=" + properties.getOpengaussDataSourceId());
                return;
            }

            pgConnection = jdbcExecutor.openConnection(pgDataSource);
            ogConnection = jdbcExecutor.openConnection(ogDataSource);

            // 2. 表名:入参小写;OG 侧转大写;schema 取自各自数据源配置
            String tableName = dto.getTableName().toLowerCase(Locale.ROOT);
            String ogTableName = tableName.toUpperCase(Locale.ROOT);
            String pgSchema = pgDataSource.getSchema();
            String ogSchema = ogDataSource.getSchema();

            // 3. 从 PG 读取列元数据,首列作为比较 id
            List<ColumnMeta> columns = metadataLoader.loadColumns(pgConnection, pgSchema, tableName);
            validateTimeColumn(columns, dto.getTimeColumn());
            String idColumn = columns.stream()
                    .filter(ColumnMeta::isFirstColumn)
                    .findFirst()
                    .orElse(columns.getFirst())
                    .getColumnName();
            String idUdtName = columns.stream()
                    .filter(ColumnMeta::isFirstColumn)
                    .findFirst()
                    .orElse(columns.getFirst())
                    .getUdtName();

            // 4. 动态生成 hash SQL,打开双库流式游标
            String pgHashSql = hashSqlBuilder.buildHashQuery(
                    pgSchema, tableName, columns, dto.getTimeColumn(), false);
            String ogHashSql = hashSqlBuilder.buildHashQuery(
                    ogSchema, ogTableName, columns, dto.getTimeColumn(), true);

            pgCursor = HashRowCursor.open(
                    pgConnection, pgHashSql, dto.getStartTime(), dto.getEndTime(), properties.getCompareFetchSize());
            ogCursor = HashRowCursor.open(
                    ogConnection, ogHashSql, dto.getStartTime(), dto.getEndTime(), properties.getCompareFetchSize());

            // 5. 流式归并,仅收集差异 id(不全量加载 hash 到内存)
            CompareResult result = streamingHashComparer.merge(pgCursor, ogCursor, idUdtName);

            // 6. 对 PG独有 / OG独有 id 批量查全行并输出日志
            int batchSize = properties.getFetchBatchSize();
            List<Map<String, Object>> pgOnlyRows = fetchRowsInBatches(
                    pgConnection, pgSchema, tableName, idColumn, result.getPgOnlyIds(), false, batchSize);
            logger.logPgOnly(taskId, result.getPgOnlyIds(), pgOnlyRows);

            List<Map<String, Object>> ogOnlyRows = fetchRowsInBatches(
                    ogConnection, ogSchema, ogTableName, idColumn, result.getOgOnlyIds(), true, batchSize);
            logger.logOgOnly(taskId, result.getOgOnlyIds(), ogOnlyRows);

            // 7. hash 不一致行:查标准化列值,做列级 diff
            if (!result.getMismatchIds().isEmpty()) {
                String pgNormalizedSql = hashSqlBuilder.buildNormalizedColumnsQuery(
                        pgSchema, tableName, columns, false);
                String ogNormalizedSql = hashSqlBuilder.buildNormalizedColumnsQuery(
                        ogSchema, ogTableName, columns, true);
                List<Map<String, Object>> pgNormalizedRows = fetchNormalizedInBatches(
                        pgConnection, pgNormalizedSql, result.getMismatchIds(), batchSize, false);
                List<Map<String, Object>> ogNormalizedRows = fetchNormalizedInBatches(
                        ogConnection, ogNormalizedSql, result.getMismatchIds(), batchSize, true);
                logger.logMismatch(taskId, result.getMismatchIds(), pgNormalizedRows, ogNormalizedRows,
                        columns, diffCalculator);
            }

            logger.logSummary(taskId, tableName, result, System.currentTimeMillis() - startMs);
        } catch (Exception e) {
            logger.logError(taskId, e.getMessage());
        } finally {
            closeQuietly(pgCursor);
            closeQuietly(ogCursor);
            closeQuietly(ogConnection);
            closeQuietly(pgConnection);
        }
    }

    /**
     * 校验时间字段存在于表元数据中
     */
    private void validateTimeColumn(List<ColumnMeta> columns, String timeColumn) {
        boolean exists = columns.stream()
                .anyMatch(col -> col.getColumnName().equalsIgnoreCase(timeColumn));
        if (!exists) {
            throw new BusinessException(ErrorCode.PARAM_ERROR, "时间字段不存在: " + timeColumn);
        }
    }

    /**
     * 分批按 id 查询全行
     */
    private List<Map<String, Object>> fetchRowsInBatches(Connection connection, String schema, String tableName,
                                                         String idColumn, List<String> ids, boolean uppercase,
                                                         int batchSize) {
        List<Map<String, Object>> allRows = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            List<String> batch = ids.subList(i, Math.min(i + batchSize, ids.size()));
            allRows.addAll(jdbcExecutor.fetchRowsByIds(connection, schema, tableName, idColumn, batch, uppercase));
        }
        return allRows;
    }

    /**
     * 分批查询标准化列值;openGauss 侧列名转小写以便 diff
     */
    private List<Map<String, Object>> fetchNormalizedInBatches(Connection connection, String sql, List<String> ids,
                                                               int batchSize, boolean lowercaseKeys) {
        List<Map<String, Object>> allRows = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            List<String> batch = ids.subList(i, Math.min(i + batchSize, ids.size()));
            for (Map<String, Object> row : jdbcExecutor.fetchNormalizedRows(connection, sql, batch)) {
                // OG 侧列名大写,转小写后与 PG 侧对齐 diff
                allRows.add(lowercaseKeys ? lowercaseKeys(row) : row);
            }
        }
        return allRows;
    }

    /**
     * 将 Map key 统一为小写
     */
    private Map<String, Object> lowercaseKeys(Map<String, Object> row) {
        Map<String, Object> normalized = new LinkedHashMap<>();
        row.forEach((key, value) -> normalized.put(key.toLowerCase(Locale.ROOT), value));
        return normalized;
    }

    /**
     * 静默关闭资源
     */
    private static void closeQuietly(AutoCloseable closeable) {
        if (closeable == null) {
            return;
        }
        try {
            closeable.close();
        } catch (Exception ignored) {
            // ignore close errors during cleanup
        }
    }
}

10. 单元测试代码

10.1 HashSqlBuilderTest.java

import org.junit.jupiter.api.Test;

import java.util.List;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

class HashSqlBuilderTest {

    private final HashSqlBuilder builder = new HashSqlBuilder();

    @Test
    void buildHashQuery_shouldExcludeTextAndBytea() {
        List<ColumnMeta> columns = List.of(
                new ColumnMeta("id", 1, "int8"),
                new ColumnMeta("col_text", 2, "text"),
                new ColumnMeta("col_varchar", 3, "varchar"),
                new ColumnMeta("col_bytea", 4, "bytea"),
                new ColumnMeta("col_integer", 5, "int4")
        );
        String sql = builder.buildHashQuery("test", "test_all_types", columns, "col_timestamp_without", false);
        assertTrue(sql.contains("\"id\""));
        assertTrue(sql.contains("\"col_varchar\""));
        assertFalse(sql.contains("col_text"));
        assertFalse(sql.contains("col_bytea"));
        assertTrue(sql.contains("MD5(concat_ws('|'"));
        assertTrue(sql.contains("\"col_timestamp_without\" >="));
        assertTrue(sql.contains("\"col_timestamp_without\" <"));
    }

    @Test
    void buildHashQuery_opengauss_shouldUppercaseIdentifiers() {
        List<ColumnMeta> columns = List.of(
                new ColumnMeta("id", 1, "int8"),
                new ColumnMeta("col_integer", 2, "int4")
        );
        String sql = builder.buildHashQuery("dev", "test_all_types", columns, "col_timestamp_without", true);
        assertTrue(sql.contains("\"TEST_ALL_TYPES\""));
        assertTrue(sql.contains("\"ID\""));
        assertTrue(sql.contains("\"COL_INTEGER\""));
    }

    @Test
    void quoteColumn_pgsql_shouldLowercase() {
        assertEquals("\"col_integer\"", builder.quoteColumn("col_integer", false));
    }

    @Test
    void buildNormalizedColumnsQuery_shouldUseTextCastForIdAny() {
        List<ColumnMeta> columns = List.of(
                new ColumnMeta("test_id", 1, "int8"),
                new ColumnMeta("col_var", 2, "varchar"),
                new ColumnMeta("col_bool", 3, "bool")
        );
        String sql = builder.buildNormalizedColumnsQuery("test", "test_hash", columns, false);
        assertTrue(sql.contains("WHERE \"test_id\"::text = ANY(?)"));
    }

    @Test
    void quoteColumn_opengauss_shouldUppercase() {
        assertEquals("\"COL_INTEGER\"", builder.quoteColumn("col_integer", true));
    }
}

10.2 ColumnTypeHashStrategyTest.java

import com.liang.learn.exception.BusinessException;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;

class ColumnTypeHashStrategyTest {

    @Test
    void isExcludedFromHash_text_shouldReturnTrue() {
        assertTrue(ColumnTypeHashStrategy.isExcludedFromHash("text"));
        assertTrue(ColumnTypeHashStrategy.isExcludedFromHash("bytea"));
        assertTrue(ColumnTypeHashStrategy.isExcludedFromHash("blob"));
        assertTrue(ColumnTypeHashStrategy.isExcludedFromHash("clob"));
    }

    @Test
    void isExcludedFromHash_varchar_shouldReturnFalse() {
        assertFalse(ColumnTypeHashStrategy.isExcludedFromHash("varchar"));
    }

    @Test
    void buildExpression_int4_shouldUseCastText() {
        String expr = ColumnTypeHashStrategy.buildExpression("\"col_integer\"", "int4");
        assertEquals("\"col_integer\"::text", expr);
    }

    @Test
    void buildExpression_unknown_shouldThrow() {
        assertThrows(BusinessException.class,
                () -> ColumnTypeHashStrategy.buildExpression("\"col_json\"", "jsonb"));
    }
}

10.3 StreamingHashComparerTest.java

import org.junit.jupiter.api.Test;

import java.util.Iterator;
import java.util.List;

import static org.junit.jupiter.api.Assertions.assertEquals;

class StreamingHashComparerTest {

    private final StreamingHashComparer comparer = new StreamingHashComparer();

    @Test
    void merge_shouldDetectPgOnlyOgOnlyAndMismatch() {
        Iterator<HashRow> pg = List.of(
                new HashRow("1", "aaa"),
                new HashRow("3", "ccc"),
                new HashRow("4", "ddd")
        ).iterator();
        Iterator<HashRow> og = List.of(
                new HashRow("2", "bbb"),
                new HashRow("3", "xxx"),
                new HashRow("4", "ddd")
        ).iterator();

        CompareResult result = comparer.merge(pg, og, "int8");
        assertEquals(List.of("1"), result.getPgOnlyIds());
        assertEquals(List.of("2"), result.getOgOnlyIds());
        assertEquals(List.of("3"), result.getMismatchIds());
        assertEquals(3, result.getPgRowCount());
        assertEquals(3, result.getOgRowCount());
    }

    @Test
    void merge_numericId_shouldUseNumericOrderNotLexicographic() {
        Iterator<HashRow> pg = List.of(
                new HashRow("1", "aaa"),
                new HashRow("2", "bbb"),
                new HashRow("10", "ccc")
        ).iterator();
        Iterator<HashRow> og = List.of(
                new HashRow("1", "aaa"),
                new HashRow("10", "ccc")
        ).iterator();

        CompareResult result = comparer.merge(pg, og, "int8");
        assertEquals(List.of("2"), result.getPgOnlyIds());
        assertEquals(List.of(), result.getOgOnlyIds());
        assertEquals(List.of(), result.getMismatchIds());
    }
}

11. 类型映射速查表

udt_name hash 表达式要点
int2/int4/int8/int16/serial* ::text
numeric/decimal to_char(..., 'FM999...')
float4/float8 abs=0 → '0',否则 to_char
varchar 换行 normalize
bpchar/char trim trailing space
date YYYY-MM-DD
timestamp/timestamptz YYYY-MM-DD HH24:MI:SS.US
bool 1 / 0
blob/clob/bytea/text 不参与 hash
其他 严格失败

12. 常见问题

现象 原因 / 处理
OG 报 schema 不存在 数据源 schema 配置需与库中一致;程序会将 table/column 转大写,schema 配置 dev"DEV"
PG 行数少于预期 时间范围 [start,end) 左闭右开,边界行可能被过滤
bigint id 归并错误 须使用 IdComparator(本文档已包含)
队列已满 线程池 queue=10,稍后重试或调大配置
jsonb 等类型 当前不支持,需在 ColumnTypeHashStrategy 扩展

文档结束

posted @ 2026-05-24 20:11  景之1231  阅读(7)  评论(0)    收藏  举报