两表 Hash 比较 — PG 流式 + OG 批量

1. 改造背景(简述)

问题 结论
openGauss JDBC 无法可靠流式 OG 必须 批量 fetchHashRows
PG 仍支持服务端游标 PG 保留 HashRowCursor 流式
双端流式 + 明细 log 废弃;改为返回 CompareResult
超 100 万行 OOM 风险 id keyset 分页max(pgCount,ogCount) 触发)

2. 设计思路

2.1 PG 流式 + OG 批量(非对称)

  • PGautoCommit=false + fetchSize → merge 时 next(),JVM 峰值 ≈ compare-fetch-size 行。
  • OG:整片读入 List<HashRow>,JVM 峰值 ≈ 本片行数。
  • 归并StreamingHashComparer.merge(pgCursor, ogList.iterator()),算法不变。

2.2 片内并行

虚拟线程: OG fetchHashRows  ─┐
主线程:   PG HashRowCursor.open ┘ 库端可重叠
OG 就绪后 → merge(PG 大量 next 在此阶段)

Java 21 Executors.newVirtualThreadPerTaskExecutor(),每片内提交一个 OG 拉数任务。

2.3 分页:COUNT + id keyset

  • 触发:max(pgCount, ogCount) > shardRowThreshold(默认 100 万)。
  • SQL:AND id::text > ? ORDER BY id LIMIT ?不用 OFFSET
  • 翻页:lastId = max(PG lastReadId, OG list 末 id),与 IdComparatorUtil 一致。

2.4 输出与失败

  • 执行器 compare(dto)CompareResult(三类 id + 行数)。
  • 删除 TableHashCompareLogger 及全行/列 diff 查询。
  • 异步 worker:log.error("[TABLE-HASH][taskId=...]") 捕获失败。

2.5 连接事务

连接 行为
PG 每片 compareOnePagesetAutoCommit(false) + setReadOnly(true);片末 rollback()
OG 默认 autocommit,短查询

3. 文件变更一览

操作 路径
新增 service/tablehash/HashPagePlanner.java
删除 service/tablehash/TableHashCompareLogger.java
重写/修改 见 §4 源码
未改 StreamingHashComparerHashRowColumnMetaTableMetadataLoaderTableHashController、DTO/VO

4. 变更源码(完整)

包名 com.liang.learn.* 源码内注释说明非显而易见的设计点。

4.1 properties/TableHashProperties.java

package com.liang.learn.properties;

import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

/**
 * 两表 hash 比较功能配置
 *
 * @author liang
 * @since 2026-05-24
 */
@Data
@Component
@ConfigurationProperties(prefix = "app.table-hash")
public class TableHashProperties {

    /**
     * PostgreSQL 数据源 ID(源库)
     */
    private Long pgsqlDataSourceId = 3L;

    /**
     * openGauss 数据源 ID(目标库)
     */
    private Long opengaussDataSourceId = 2L;

    /**
     * 单片 OG 批量加载行数上限。
     * 当 max(pgCount, ogCount) 超过该值时启用 id keyset 多片,避免 OG 侧 List 过大 OOM。
     */
    private int shardRowThreshold = 1_000_000;

    /**
     * PG 流式 hash 游标 fetchSize(仅 PG 侧 HashRowCursor 使用;OG 不支持流式,不使用此配置)
     */
    private int compareFetchSize = 1000;
}

4.2 service/tablehash/CompareResult.java

package com.liang.learn.service.tablehash;

import lombok.Getter;

import java.util.ArrayList;
import java.util.List;

/**
 * 两库 hash 归并比较结果(仅含差异 id 与行数统计,不含行内容)
 *
 * @author liang
 * @since 2026-05-24
 */
@Getter
public class CompareResult {

    /**
     * PG 独有行的 id 列表
     */
    private final List<String> pgOnlyIds = new ArrayList<>();

    /**
     * openGauss 独有行的 id 列表
     */
    private final List<String> ogOnlyIds = new ArrayList<>();

    /**
     * 两库均有但 hash 不一致的 id 列表
     */
    private final List<String> mismatchIds = new ArrayList<>();

    /**
     * PG 侧扫描行数(多分片时各片 pgRowCount 累加)
     */
    long pgRowCount;

    /**
     * openGauss 侧扫描行数(多分片时各片 ogRowCount 累加)
     */
    long ogRowCount;

    /**
     * 合并另一分片结果:id 列表追加、行数累加(keyset 多片循环调用)
     */
    public void mergeFrom(CompareResult other) {
        pgOnlyIds.addAll(other.getPgOnlyIds());
        ogOnlyIds.addAll(other.getOgOnlyIds());
        mismatchIds.addAll(other.getMismatchIds());
        pgRowCount += other.getPgRowCount();
        ogRowCount += other.getOgRowCount();
    }
}

4.3 service/tablehash/HashPagePlanner.java(新增)

package com.liang.learn.service.tablehash;

import com.liang.learn.util.IdComparatorUtil;
import org.springframework.stereotype.Component;

import java.util.List;

/**
 * hash 比较分片规划:是否 keyset 多片、翻页边界
 *
 * @author liang
 * @since 2026-05-27
 */
@Component
public class HashPagePlanner {

    /**
     * 判断是否需 keyset 多片。
     * 取 max(pgCount, ogCount):任一侧行数更多时单片 OG 批量仍可能 OOM。
     */
    public boolean needsPaging(long pgCount, long ogCount, int threshold) {
        return Math.max(pgCount, ogCount) > threshold;
    }

    /**
     * 本片结束后是否还有下一页。
     * 任一侧读满 limit 说明时间窗内可能还有后续 id,需继续翻页。
     */
    public boolean hasNextPage(long pgRowsRead, int ogRowsRead, int limit) {
        return pgRowsRead >= limit || ogRowsRead >= limit;
    }

    /**
     * 计算下一页 keyset 起点。
     * PG 流式、OG 批量各自 LIMIT 后末 id 可能不同,取较大值避免漏扫中间 id。
     */
    public String nextLastId(String pgLastReadId, List<HashRow> ogRows, String idUdtName) {
        String ogLast = ogRows.isEmpty() ? null : ogRows.get(ogRows.size() - 1).getId();
        if (pgLastReadId == null) {
            return ogLast;
        }
        if (ogLast == null) {
            return pgLastReadId;
        }
        return IdComparatorUtil.compare(pgLastReadId, ogLast, idUdtName) >= 0 ? pgLastReadId : ogLast;
    }
}

4.4 service/tablehash/HashSqlBuilder.java

package com.liang.learn.service.tablehash;

import com.liang.learn.util.ColumnTypeHashStrategyUtil;
import org.springframework.stereotype.Component;

import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/**
 * 动态生成 hash 比较 SQL
 *
 * @author liang
 * @since 2026-05-24
 */
@Component
public class HashSqlBuilder {

    /**
     * 构建 hash 查询 SQL:SELECT id, MD5(concat_ws(...)) AS row_hash FROM ... WHERE time >= ? AND time < ?
     *
     * @param schema                schema 名
     * @param tableName             表名
     * @param columns               列元数据(有序)
     * @param timeColumn            时间字段名
     * @param uppercaseIdentifiers  true 时标识符转大写(openGauss)
     * @return 带两个时间占位符的 SQL
     */
    public String buildHashQuery(String schema, String tableName, List<ColumnMeta> columns,
                                 String timeColumn, boolean uppercaseIdentifiers) {
        return buildHashQuery(schema, tableName, columns, timeColumn, uppercaseIdentifiers, null, null);
    }

    /**
     * 构建 hash 查询 SQL,可选 id keyset 与 LIMIT(多片兜底分页)
     *
     * @param lastId keyset 游标,null 表示第一页
     * @param limit  null 表示不 LIMIT(单片全量)
     */
    public String buildHashQuery(String schema, String tableName, List<ColumnMeta> columns,
                                 String timeColumn, boolean uppercaseIdentifiers,
                                 String lastId, Integer limit) {
        String idCol = columns.stream().filter(ColumnMeta::isFirstColumn).findFirst()
                .orElseThrow(() -> new IllegalStateException("未找到首列"))
                .getColumnName();
        String quotedSchema = quoteIdentifier(schema, uppercaseIdentifiers);
        String quotedTable = quoteIdentifier(tableName, uppercaseIdentifiers);
        String quotedId = quoteColumn(idCol, uppercaseIdentifiers);
        String quotedTime = quoteColumn(timeColumn, uppercaseIdentifiers);

        List<String> hashParts = columns.stream()
                .filter(col -> !ColumnTypeHashStrategyUtil.isExcludedFromHash(col.getUdtName()))
                .map(col -> {
                    String quoted = quoteColumn(col.getColumnName(), uppercaseIdentifiers);
                    String expr = ColumnTypeHashStrategyUtil.buildExpression(quoted, col.getUdtName());
                    return ColumnTypeHashStrategyUtil.wrapCoalesce(expr);
                })
                .collect(Collectors.toList());

        StringBuilder sql = new StringBuilder("SELECT ").append(quotedId).append(", MD5(concat_ws('|', ")
                .append(String.join(", ", hashParts))
                .append(")) AS row_hash FROM ").append(quotedSchema).append(".").append(quotedTable)
                .append(" WHERE ").append(quotedTime).append(" >= ? AND ").append(quotedTime).append(" < ?");
        // keyset 分页:不用 OFFSET,避免大偏移性能问题;与 ORDER BY id 配合
        if (lastId != null) {
            sql.append(" AND ").append(quotedId).append("::text > ?");
        }
        sql.append(" ORDER BY ").append(quotedId);
        if (limit != null) {
            sql.append(" LIMIT ?");
        }
        return sql.toString();
    }

    /**
     * 构建时间窗行数统计 SQL,用于决定是否启用 keyset 多片
     */
    public String buildCountQuery(String schema, String tableName, String timeColumn,
                                  boolean uppercaseIdentifiers) {
        String quotedSchema = quoteIdentifier(schema, uppercaseIdentifiers);
        String quotedTable = quoteIdentifier(tableName, uppercaseIdentifiers);
        String quotedTime = quoteColumn(timeColumn, uppercaseIdentifiers);
        return "SELECT COUNT(*) FROM " + quotedSchema + "." + quotedTable
                + " WHERE " + quotedTime + " >= ? AND " + quotedTime + " < ?";
    }

    /**
     * 构建标准化列查询 SQL,用于 hash 不一致时的列级 diff
     *
     * @param schema                schema 名
     * @param tableName             表名
     * @param columns               列元数据
     * @param uppercaseIdentifiers  true 时标识符转大写
     * @return SELECT id, norm_col1, norm_col2, ... WHERE id::text = ANY(?)
     */
    public String buildNormalizedColumnsQuery(String schema, String tableName, List<ColumnMeta> columns,
                                               boolean uppercaseIdentifiers) {
        String quotedSchema = quoteIdentifier(schema, uppercaseIdentifiers);
        String quotedTable = quoteIdentifier(tableName, uppercaseIdentifiers);
        String quotedId = quoteColumn(columns.get(0).getColumnName(), uppercaseIdentifiers);

        List<String> selectParts = columns.stream()
                .filter(col -> !col.isFirstColumn())
                .filter(col -> !ColumnTypeHashStrategyUtil.isExcludedFromHash(col.getUdtName()))
                .map(col -> {
                    String quoted = quoteColumn(col.getColumnName(), uppercaseIdentifiers);
                    String expr = ColumnTypeHashStrategyUtil.buildExpression(quoted, col.getUdtName());
                    return ColumnTypeHashStrategyUtil.wrapCoalesce(expr) + " AS "
                            + quoteColumn(col.getColumnName(), uppercaseIdentifiers);
                })
                .collect(Collectors.toList());

        return "SELECT " + quotedId + ", " + String.join(", ", selectParts)
                + " FROM " + quotedSchema + "." + quotedTable
                + " WHERE " + buildIdAnyClause(quotedId);
    }

    /**
     * 构建 id 批量匹配条件:id 统一转 text,与 varchar[] 参数对齐(兼容 bigint/varchar 首列)
     */
    String buildIdAnyClause(String quotedId) {
        return quotedId + "::text = ANY(?)";
    }

    /**
     * 引用列名
     */
    public String quoteColumn(String columnName, boolean uppercase) {
        return quoteIdentifier(columnName, uppercase);
    }

    /**
     * 引用 schema / 表名 / 列名
     */
    public String quoteIdentifier(String name, boolean uppercase) {
        String normalized = uppercase ? name.toUpperCase(Locale.ROOT) : name.toLowerCase(Locale.ROOT);
        return "\"" + normalized + "\"";
    }
}

4.5 service/tablehash/HashRowCursor.java(仅 PG 使用)

package com.liang.learn.service.tablehash;

import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.enums.ErrorCode;
import lombok.Getter;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * PG 侧流式 hash 行游标(openGauss 不支持可靠流式,OG 侧改用 fetchHashRows 批量)
 *
 * @author liang
 * @since 2026-05-24
 */
public class HashRowCursor implements Iterator<HashRow>, AutoCloseable {

    private final PreparedStatement preparedStatement;

    private final ResultSet resultSet;

    private HashRow nextRow;

    private boolean finished;

    /**
     * merge 消费过程中 PG 侧最后读到的 id,供 keyset 翻页计算 nextLastId
     */
    @Getter
    private String lastReadId;

    private HashRowCursor(PreparedStatement preparedStatement, ResultSet resultSet) {
        this.preparedStatement = preparedStatement;
        this.resultSet = resultSet;
    }

    /**
     * 打开 hash 游标(单片全量,无 keyset/LIMIT)
     */
    public static HashRowCursor open(Connection connection, String sql, String startTime, String endTime,
                                     int fetchSize) {
        return open(connection, sql, startTime, endTime, null, null, fetchSize);
    }

    /**
     * 打开 hash 游标,可选 id keyset 与 LIMIT(多分片兜底)
     * 调用前须对 PG 连接 configurePgStreamingConnection(autoCommit=false)
     */
    public static HashRowCursor open(Connection connection, String sql,
                                     String startTime, String endTime,
                                     String lastId, Integer limit, int fetchSize) {
        try {
            PreparedStatement ps = connection.prepareStatement(sql);
            ps.setFetchSize(fetchSize);
            // 占位符顺序须与 HashSqlBuilder.buildHashQuery 一致
            int idx = 1;
            ps.setTimestamp(idx++, Timestamp.valueOf(startTime));
            ps.setTimestamp(idx++, Timestamp.valueOf(endTime));
            if (lastId != null) {
                ps.setString(idx++, lastId);
            }
            if (limit != null) {
                ps.setInt(idx, limit);
            }
            ResultSet rs = ps.executeQuery();
            return new HashRowCursor(ps, rs);
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 查询失败: " + e.getMessage());
        }
    }

    @Override
    public boolean hasNext() {
        if (finished) {
            return false;
        }
        if (nextRow != null) {
            return true;
        }
        try {
            if (resultSet.next()) {
                nextRow = new HashRow(resultSet.getString(1), resultSet.getString(2));
                return true;
            }
            finished = true;
            return false;
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 读取失败: " + e.getMessage());
        }
    }

    @Override
    public HashRow next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        HashRow current = nextRow;
        nextRow = null;
        lastReadId = current.getId();
        return current;
    }

    @Override
    public void close() {
        finished = true;
        nextRow = null;
        closeQuietly(resultSet);
        closeQuietly(preparedStatement);
    }

    private static void closeQuietly(AutoCloseable closeable) {
        if (closeable == null) {
            return;
        }
        try {
            closeable.close();
        } catch (Exception ignored) {
            // ignore close errors during cleanup
        }
    }
}

4.6 service/tablehash/ExternalJdbcExecutor.java

package com.liang.learn.service.tablehash;

import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.entity.DataSourceDO;
import com.liang.learn.model.enums.DataSourceTypeEnum;
import com.liang.learn.model.enums.ErrorCode;
import com.liang.learn.service.DataSourceService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/**
 * 外部数据库 JDBC 直连执行器
 *
 * @author liang
 * @since 2026-05-24
 */
@Slf4j
@Component
@RequiredArgsConstructor
public class ExternalJdbcExecutor {

    /**
     * 数据源 Service,用于解密密码
     */
    private final DataSourceService dataSourceService;

    /**
     * 打开外部数据库 JDBC 连接
     *
     * @param dataSource 数据源配置
     * @return JDBC 连接,使用后须关闭
     */
    public Connection openConnection(DataSourceDO dataSource) {
        try {
            DataSourceTypeEnum typeEnum = DataSourceTypeEnum.fromCode(dataSource.getDataSourceType());
            String driver = typeEnum.resolveDriverClassName(dataSource.getDriverClassName());
            String password = dataSourceService.getDecryptedPassword(dataSource.getId());
            String url = typeEnum.buildJdbcUrl(
                    dataSource.getHost(), dataSource.getPort(),
                    dataSource.getDatabaseName(), dataSource.getSchema());
            // 轻量直连,不经过连接池
            Class.forName(driver);
            Properties props = new Properties();
            props.setProperty("user", dataSource.getUsername());
            props.setProperty("password", password);
            return DriverManager.getConnection(url, props);
        } catch (BusinessException e) {
            throw e;
        } catch (Exception e) {
            log.warn("外部库连接失败, dataSourceId={}, reason={}", dataSource.getId(), e.getMessage());
            throw new BusinessException(ErrorCode.CONNECTION_FAILED, "外部库连接失败");
        }
    }

    /**
     * 统计时间窗内行数(PG/OG 各查一次,用于 needsPaging 决策)
     */
    public long countRows(Connection connection, String sql, String startTime, String endTime) {
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setTimestamp(1, Timestamp.valueOf(startTime));
            ps.setTimestamp(2, Timestamp.valueOf(endTime));
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getLong(1) : 0L;
            }
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "行数统计失败: " + e.getMessage());
        }
    }

    /**
     * 一次性加载 hash 行(**openGauss 侧专用**:驱动无法流式,整片读入 List)
     *
     * @param lastId keyset 游标,null 表示第一页
     * @param limit  null 表示不 LIMIT(单片全量)
     */
    public List<HashRow> fetchHashRows(Connection connection, String sql,
                                       String startTime, String endTime,
                                       String lastId, Integer limit) {
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            // 占位符顺序须与 HashSqlBuilder.buildHashQuery 一致
            int idx = 1;
            ps.setTimestamp(idx++, Timestamp.valueOf(startTime));
            ps.setTimestamp(idx++, Timestamp.valueOf(endTime));
            if (lastId != null) {
                ps.setString(idx++, lastId);
            }
            if (limit != null) {
                ps.setInt(idx, limit);
            }
            try (ResultSet rs = ps.executeQuery()) {
                List<HashRow> rows = new ArrayList<>();
                while (rs.next()) {
                    rows.add(new HashRow(rs.getString(1), rs.getString(2)));
                }
                return rows;
            }
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 查询失败: " + e.getMessage());
        }
    }

    /**
     * 单片便捷重载:全时间窗、无 keyset
     */
    public List<HashRow> fetchHashRows(Connection connection, String sql,
                                       String startTime, String endTime) {
        return fetchHashRows(connection, sql, startTime, endTime, null, null);
    }

    /**
     * 按 id 列表批量查询全行
     *
     * @param connection 数据库连接
     * @param schema     schema 名
     * @param tableName  表名
     * @param idColumn   id 列名
     * @param ids        id 列表
     * @param uppercase  true 时标识符转大写(openGauss)
     * @return 行数据列表,每行 Map 的 key 为列名
     */
    public List<Map<String, Object>> fetchRowsByIds(Connection connection, String schema, String tableName,
                                                    String idColumn, List<String> ids, boolean uppercase) {
        if (ids.isEmpty()) {
            return List.of();
        }
        // PG 小写 / OG 大写引用标识符
        String quotedSchema = uppercase ? quote(schema, true) : quote(schema, false);
        String quotedTable = uppercase ? quote(tableName, true) : quote(tableName, false);
        String quotedId = uppercase ? quote(idColumn, true) : quote(idColumn, false);
        String sql = "SELECT * FROM " + quotedSchema + "." + quotedTable
                + " WHERE " + quotedId + "::text = ANY(?)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            // id 在 Java 侧为 String,SQL 侧用 id::text 与 varchar[] 匹配
            ps.setArray(1, connection.createArrayOf(
                    uppercase ? "VARCHAR" : "varchar", ids.toArray()));
            try (ResultSet rs = ps.executeQuery()) {
                List<Map<String, Object>> rows = new ArrayList<>();
                int colCount = rs.getMetaData().getColumnCount();
                // 按列标签组装 Map,保留原始列值供日志输出
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int i = 1; i <= colCount; i++) {
                        row.put(rs.getMetaData().getColumnLabel(i), rs.getObject(i));
                    }
                    rows.add(row);
                }
                return rows;
            }
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "全行查询失败: " + e.getMessage());
        }
    }

    /**
     * 按 id 列表查询标准化列值(用于列级 diff)
     *
     * @param connection 数据库连接
     * @param sql        标准化列 SQL(含 id = ANY(?) 占位符)
     * @param ids        id 列表
     * @return 行数据列表
     */
    public List<Map<String, Object>> fetchNormalizedRows(Connection connection, String sql, List<String> ids) {
        if (ids.isEmpty()) {
            return List.of();
        }
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setArray(1, connection.createArrayOf("varchar", ids.toArray()));
            try (ResultSet rs = ps.executeQuery()) {
                List<Map<String, Object>> rows = new ArrayList<>();
                int colCount = rs.getMetaData().getColumnCount();
                // 值为 hash 标准化后的字符串,供 ColumnDiffCalculator 逐列对比
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int i = 1; i <= colCount; i++) {
                        row.put(rs.getMetaData().getColumnLabel(i), rs.getObject(i));
                    }
                    rows.add(row);
                }
                return rows;
            }
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "标准化列查询失败: " + e.getMessage());
        }
    }

    /**
     * 引用 SQL 标识符
     *
     * @param name  标识符
     * @param upper true 转大写
     * @return 带双引号的标识符
     */
    private String quote(String name, boolean upper) {
        return "\"" + (upper ? name.toUpperCase() : name.toLowerCase()) + "\"";
    }
}

4.7 service/tablehash/TableHashCompareExecutor.java(核心重写)

package com.liang.learn.service.tablehash;

import com.liang.learn.exception.BusinessException;
import com.liang.learn.mapper.DataSourceMapper;
import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.entity.DataSourceDO;
import com.liang.learn.model.enums.ErrorCode;
import com.liang.learn.properties.TableHashProperties;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Component;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * 两表 hash 比较编排执行器:PG 流式 + OG 批量,可选 id keyset 多片
 *
 * @author liang
 * @since 2026-05-24
 */
@Component
@RequiredArgsConstructor
public class TableHashCompareExecutor {

    private final DataSourceMapper dataSourceMapper;

    private final TableHashProperties properties;

    private final ExternalJdbcExecutor jdbcExecutor;

    private final TableMetadataLoader metadataLoader;

    private final HashSqlBuilder hashSqlBuilder;

    private final StreamingHashComparer streamingHashComparer;

    /** 分片决策与 keyset 翻页边界 */
    private final HashPagePlanner hashPagePlanner;

    /**
     * 执行 hash 比较,返回差异 id 与行数统计(不含行内容与列 diff)
     */
    public CompareResult compare(TableHashCompareDTO dto) {
        DataSourceDO pgDataSource = requireDataSource(properties.getPgsqlDataSourceId(), "PG");
        DataSourceDO ogDataSource = requireDataSource(properties.getOpengaussDataSourceId(), "openGauss");

        String tableName = dto.getTableName().toLowerCase(Locale.ROOT);
        String ogTableName = tableName.toUpperCase(Locale.ROOT);
        String pgSchema = pgDataSource.getSchema();
        String ogSchema = ogDataSource.getSchema();

        try (Connection pgConnection = jdbcExecutor.openConnection(pgDataSource);
             Connection ogConnection = jdbcExecutor.openConnection(ogDataSource)) {

            List<ColumnMeta> columns = metadataLoader.loadColumns(pgConnection, pgSchema, tableName);
            validateTimeColumn(columns, dto.getTimeColumn());
            String idUdtName = columns.stream()
                    .filter(ColumnMeta::isFirstColumn)
                    .findFirst()
                    .orElse(columns.getFirst())
                    .getUdtName();

            // 1. COUNT 决定是否 keyset 多片(max 两侧行数)
            String pgCountSql = hashSqlBuilder.buildCountQuery(pgSchema, tableName, dto.getTimeColumn(), false);
            String ogCountSql = hashSqlBuilder.buildCountQuery(ogSchema, ogTableName, dto.getTimeColumn(), true);
            long pgCount = jdbcExecutor.countRows(pgConnection, pgCountSql, dto.getStartTime(), dto.getEndTime());
            long ogCount = jdbcExecutor.countRows(ogConnection, ogCountSql, dto.getStartTime(), dto.getEndTime());

            boolean paging = hashPagePlanner.needsPaging(pgCount, ogCount, properties.getShardRowThreshold());
            // 单片:pageLimit=null,SQL 无 LIMIT;多片:每片最多 shardRowThreshold 行
            Integer pageLimit = paging ? properties.getShardRowThreshold() : null;

            CompareResult total = new CompareResult();
            String lastId = null;
            do {
                String pgHashSql = hashSqlBuilder.buildHashQuery(
                        pgSchema, tableName, columns, dto.getTimeColumn(), false, lastId, pageLimit);
                String ogHashSql = hashSqlBuilder.buildHashQuery(
                        ogSchema, ogTableName, columns, dto.getTimeColumn(), true, lastId, pageLimit);

                PageOutcome outcome = compareOnePage(
                        pgConnection, ogConnection, pgHashSql, ogHashSql,
                        dto.getStartTime(), dto.getEndTime(), lastId, pageLimit, idUdtName);
                total.mergeFrom(outcome.result());

                if (!paging) {
                    break;
                }
                if (!hashPagePlanner.hasNextPage(outcome.result().getPgRowCount(), outcome.ogRowCount(), pageLimit)) {
                    break;
                }
                lastId = hashPagePlanner.nextLastId(outcome.pgLastReadId(), outcome.ogRows(), idUdtName);
            } while (true);

            return total;
        } catch (BusinessException e) {
            throw e;
        } catch (Exception e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 比较失败: " + e.getMessage());
        }
    }

    /**
     * 单片比较:虚拟线程并行拉 OG 批量,主线程 PG 流式游标,OG 就绪后双指针归并
     */
    private PageOutcome compareOnePage(Connection pgConnection, Connection ogConnection,
                                       String pgHashSql, String ogHashSql,
                                       String startTime, String endTime,
                                       String lastId, Integer pageLimit,
                                       String idUdtName) {
        try (ExecutorService virtualExec = Executors.newVirtualThreadPerTaskExecutor()) {
            // OG 批量:与 PG 开游标并行,重叠库端耗时(OG 连接仅在虚拟线程内使用)
            CompletableFuture<List<HashRow>> ogFuture = CompletableFuture.supplyAsync(
                    () -> jdbcExecutor.fetchHashRows(ogConnection, ogHashSql, startTime, endTime, lastId, pageLimit),
                    virtualExec);

            // PG 流式:仅 PG 连接设置 autoCommit=false,OG 保持默认 autocommit
            configurePgStreamingConnection(pgConnection);
            HashRowCursor pgCursor = pageLimit == null
                    ? HashRowCursor.open(pgConnection, pgHashSql, startTime, endTime, properties.getCompareFetchSize())
                    : HashRowCursor.open(pgConnection, pgHashSql, startTime, endTime, lastId, pageLimit,
                    properties.getCompareFetchSize());
            try {
                List<HashRow> ogRows = ogFuture.join();
                CompareResult result = streamingHashComparer.merge(pgCursor, ogRows.iterator(), idUdtName);
                return new PageOutcome(result, ogRows.size(), pgCursor.getLastReadId(), ogRows);
            } finally {
                pgCursor.close();
                endReadOnlyTransaction(pgConnection);
            }
        }
    }

    private DataSourceDO requireDataSource(Long id, String label) {
        DataSourceDO dataSource = dataSourceMapper.selectById(id);
        if (dataSource == null) {
            throw new BusinessException(ErrorCode.PARAM_ERROR, label + " 数据源不存在, id=" + id);
        }
        return dataSource;
    }

    /**
     * 启用 PG JDBC 服务端游标(须 autoCommit=false 且 fetchSize&gt;0)
     */
    private static void configurePgStreamingConnection(Connection connection) {
        try {
            connection.setAutoCommit(false);
            connection.setReadOnly(true);
        } catch (SQLException e) {
            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "配置 PG 流式连接失败: " + e.getMessage());
        }
    }

    /**
     * 片结束后 rollback,释放 PG 只读事务,以便同连接开启下一片游标
     */
    private static void endReadOnlyTransaction(Connection connection) {
        if (connection == null) {
            return;
        }
        try {
            if (!connection.isClosed() && !connection.getAutoCommit()) {
                connection.rollback();
            }
        } catch (SQLException ignored) {
            // ignore rollback errors during cleanup
        }
    }

    private void validateTimeColumn(List<ColumnMeta> columns, String timeColumn) {
        boolean exists = columns.stream()
                .anyMatch(col -> col.getColumnName().equalsIgnoreCase(timeColumn));
        if (!exists) {
            throw new BusinessException(ErrorCode.PARAM_ERROR, "时间字段不存在: " + timeColumn);
        }
    }

    /** 单片比较产出:归并结果、OG 行数(判 hasNextPage)、PG 末 id(算 nextLastId)、OG 行列表(算 nextLastId) */
    private record PageOutcome(CompareResult result, int ogRowCount, String pgLastReadId, List<HashRow> ogRows) {
    }
}

4.8 service/impl/TableHashCompareServiceImpl.java

package com.liang.learn.service.impl;

import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.enums.ErrorCode;
import com.liang.learn.model.vo.TableHashTaskVO;
import com.liang.learn.service.TableHashCompareService;
import com.liang.learn.service.tablehash.CompareResult;
import com.liang.learn.service.tablehash.TableHashCompareExecutor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;

import java.util.UUID;
import java.util.concurrent.RejectedExecutionException;

/**
 * 两表 hash 比较 Service 实现
 *
 * @author liang
 * @since 2026-05-24
 */
@Slf4j
@Service
public class TableHashCompareServiceImpl implements TableHashCompareService {

    /** 表 hash 比较专用线程池,限制并发避免多个重型任务同时跑 */
    private final ThreadPoolTaskExecutor tableHashExecutor;

    private final TableHashCompareExecutor compareExecutor;

    public TableHashCompareServiceImpl(
            @Qualifier("tableHashExecutor") ThreadPoolTaskExecutor tableHashExecutor,
            TableHashCompareExecutor compareExecutor) {
        this.tableHashExecutor = tableHashExecutor;
        this.compareExecutor = compareExecutor;
    }

    @Override
    public TableHashTaskVO submitCompare(TableHashCompareDTO dto) {
        String taskId = UUID.randomUUID().toString();
        try {
            tableHashExecutor.execute(() -> {
                try {
                    CompareResult result = compareExecutor.compare(dto);
                    // CompareResult 当前不落库;debug 便于联调确认任务完成
                    log.debug("hash compare done taskId={} pgRows={} ogRows={} pgOnly={} ogOnly={} mismatch={}",
                            taskId, result.getPgRowCount(), result.getOgRowCount(),
                            result.getPgOnlyIds().size(), result.getOgOnlyIds().size(), result.getMismatchIds().size());
                } catch (BusinessException e) {
                    // 异步线程异常无法回传 HTTP,须 error 日志便于运维发现(替代原 TableHashCompareLogger)
                    log.error("[TABLE-HASH][taskId={}] 失败: code={}, message={}",
                            taskId, e.getCode(), e.getMessage(), e);
                } catch (Exception e) {
                    log.error("[TABLE-HASH][taskId={}] 失败: {}", taskId, e.getMessage(), e);
                }
            });
        } catch (RejectedExecutionException ex) {
            throw new BusinessException(ErrorCode.BUSINESS_ERROR, "比较任务队列已满,请稍后重试");
        }
        return new TableHashTaskVO(taskId);
    }
}

4.9 删除文件

删除 service/tablehash/TableHashCompareLogger.java(整文件移除,无替代)。

TableHashCompareExecutor 中移除对 TableHashCompareLoggerColumnDiffCalculator 的依赖注入,以及 fetchRowsInBatches / fetchNormalizedInBatches / lowercaseKeys 等私有方法。

4.10 application.yaml 片段

app:
  table-hash:
    pgsql-data-source-id: 3          # PG 数据源 id
    opengauss-data-source-id: 2      # openGauss 数据源 id
    shard-row-threshold: 1000000     # max(pg,og) 超过则 id keyset 多片
    compare-fetch-size: 1000         # 仅 PG HashRowCursor.fetchSize

移除(若存在):

    detail-log-limit: 500      # 已删除:不再输出差异明细 log
    fetch-batch-size: 500      # 已删除:不再批量查全行

5. 与旧版差异速查

维度 旧版 本次
OG 加载 HashRowCursor 流式 fetchHashRows 批量
PG 加载 双端流式 HashRowCursor
Executor compare(taskId,dto) void compare(dto)CompareResult
输出 Logger 明细 CompareResult(不落库)
大表 id keyset 分页
配置 detail-log-limit / fetch-batch-size shard-row-threshold

6. 验证

mvn test

重点单测:HashPagePlannerTestHashSqlBuilderTestCompareResultTestStreamingHashComparerTest

联调:POST /api/table-hash/compare,失败时查 [TABLE-HASH][taskId=] error 日志。


posted @ 2026-05-27 20:24  景之1231  阅读(5)  评论(0)    收藏  举报