两表 Hash 比较 — PG 流式 + OG 批量
1. 改造背景(简述)
| 问题 | 结论 |
|---|---|
| openGauss JDBC 无法可靠流式 | OG 必须 批量 fetchHashRows |
| PG 仍支持服务端游标 | PG 保留 HashRowCursor 流式 |
| 双端流式 + 明细 log | 废弃;改为返回 CompareResult |
| 超 100 万行 OOM 风险 | id keyset 分页(max(pgCount,ogCount) 触发) |
2. 设计思路
2.1 PG 流式 + OG 批量(非对称)
- PG:
autoCommit=false+fetchSize→ merge 时next(),JVM 峰值 ≈compare-fetch-size行。 - OG:整片读入
List<HashRow>,JVM 峰值 ≈ 本片行数。 - 归并:
StreamingHashComparer.merge(pgCursor, ogList.iterator()),算法不变。
2.2 片内并行
虚拟线程: OG fetchHashRows ─┐
主线程: PG HashRowCursor.open ┘ 库端可重叠
OG 就绪后 → merge(PG 大量 next 在此阶段)
Java 21 Executors.newVirtualThreadPerTaskExecutor(),每片内提交一个 OG 拉数任务。
2.3 分页:COUNT + id keyset
- 触发:
max(pgCount, ogCount) > shardRowThreshold(默认 100 万)。 - SQL:
AND id::text > ? ORDER BY id LIMIT ?,不用 OFFSET。 - 翻页:
lastId = max(PG lastReadId, OG list 末 id),与IdComparatorUtil一致。
2.4 输出与失败
- 执行器
compare(dto)→CompareResult(三类 id + 行数)。 - 删除
TableHashCompareLogger及全行/列 diff 查询。 - 异步 worker:
log.error("[TABLE-HASH][taskId=...]")捕获失败。
2.5 连接事务
| 连接 | 行为 |
|---|---|
| PG | 每片 compareOnePage 前 setAutoCommit(false) + setReadOnly(true);片末 rollback() |
| OG | 默认 autocommit,短查询 |
3. 文件变更一览
| 操作 | 路径 |
|---|---|
| 新增 | service/tablehash/HashPagePlanner.java |
| 删除 | service/tablehash/TableHashCompareLogger.java |
| 重写/修改 | 见 §4 源码 |
| 未改 | StreamingHashComparer、HashRow、ColumnMeta、TableMetadataLoader、TableHashController、DTO/VO |
4. 变更源码(完整)
包名
com.liang.learn.*源码内注释说明非显而易见的设计点。
4.1 properties/TableHashProperties.java
package com.liang.learn.properties;
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;
/**
* 两表 hash 比较功能配置
*
* @author liang
* @since 2026-05-24
*/
@Data
@Component
@ConfigurationProperties(prefix = "app.table-hash")
public class TableHashProperties {
/**
* PostgreSQL 数据源 ID(源库)
*/
private Long pgsqlDataSourceId = 3L;
/**
* openGauss 数据源 ID(目标库)
*/
private Long opengaussDataSourceId = 2L;
/**
* 单片 OG 批量加载行数上限。
* 当 max(pgCount, ogCount) 超过该值时启用 id keyset 多片,避免 OG 侧 List 过大 OOM。
*/
private int shardRowThreshold = 1_000_000;
/**
* PG 流式 hash 游标 fetchSize(仅 PG 侧 HashRowCursor 使用;OG 不支持流式,不使用此配置)
*/
private int compareFetchSize = 1000;
}
4.2 service/tablehash/CompareResult.java
package com.liang.learn.service.tablehash;
import lombok.Getter;
import java.util.ArrayList;
import java.util.List;
/**
* 两库 hash 归并比较结果(仅含差异 id 与行数统计,不含行内容)
*
* @author liang
* @since 2026-05-24
*/
@Getter
public class CompareResult {
/**
* PG 独有行的 id 列表
*/
private final List<String> pgOnlyIds = new ArrayList<>();
/**
* openGauss 独有行的 id 列表
*/
private final List<String> ogOnlyIds = new ArrayList<>();
/**
* 两库均有但 hash 不一致的 id 列表
*/
private final List<String> mismatchIds = new ArrayList<>();
/**
* PG 侧扫描行数(多分片时各片 pgRowCount 累加)
*/
long pgRowCount;
/**
* openGauss 侧扫描行数(多分片时各片 ogRowCount 累加)
*/
long ogRowCount;
/**
* 合并另一分片结果:id 列表追加、行数累加(keyset 多片循环调用)
*/
public void mergeFrom(CompareResult other) {
pgOnlyIds.addAll(other.getPgOnlyIds());
ogOnlyIds.addAll(other.getOgOnlyIds());
mismatchIds.addAll(other.getMismatchIds());
pgRowCount += other.getPgRowCount();
ogRowCount += other.getOgRowCount();
}
}
4.3 service/tablehash/HashPagePlanner.java(新增)
package com.liang.learn.service.tablehash;
import com.liang.learn.util.IdComparatorUtil;
import org.springframework.stereotype.Component;
import java.util.List;
/**
* hash 比较分片规划:是否 keyset 多片、翻页边界
*
* @author liang
* @since 2026-05-27
*/
@Component
public class HashPagePlanner {
/**
* 判断是否需 keyset 多片。
* 取 max(pgCount, ogCount):任一侧行数更多时单片 OG 批量仍可能 OOM。
*/
public boolean needsPaging(long pgCount, long ogCount, int threshold) {
return Math.max(pgCount, ogCount) > threshold;
}
/**
* 本片结束后是否还有下一页。
* 任一侧读满 limit 说明时间窗内可能还有后续 id,需继续翻页。
*/
public boolean hasNextPage(long pgRowsRead, int ogRowsRead, int limit) {
return pgRowsRead >= limit || ogRowsRead >= limit;
}
/**
* 计算下一页 keyset 起点。
* PG 流式、OG 批量各自 LIMIT 后末 id 可能不同,取较大值避免漏扫中间 id。
*/
public String nextLastId(String pgLastReadId, List<HashRow> ogRows, String idUdtName) {
String ogLast = ogRows.isEmpty() ? null : ogRows.get(ogRows.size() - 1).getId();
if (pgLastReadId == null) {
return ogLast;
}
if (ogLast == null) {
return pgLastReadId;
}
return IdComparatorUtil.compare(pgLastReadId, ogLast, idUdtName) >= 0 ? pgLastReadId : ogLast;
}
}
4.4 service/tablehash/HashSqlBuilder.java
package com.liang.learn.service.tablehash;
import com.liang.learn.util.ColumnTypeHashStrategyUtil;
import org.springframework.stereotype.Component;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
/**
* 动态生成 hash 比较 SQL
*
* @author liang
* @since 2026-05-24
*/
@Component
public class HashSqlBuilder {
/**
* 构建 hash 查询 SQL:SELECT id, MD5(concat_ws(...)) AS row_hash FROM ... WHERE time >= ? AND time < ?
*
* @param schema schema 名
* @param tableName 表名
* @param columns 列元数据(有序)
* @param timeColumn 时间字段名
* @param uppercaseIdentifiers true 时标识符转大写(openGauss)
* @return 带两个时间占位符的 SQL
*/
public String buildHashQuery(String schema, String tableName, List<ColumnMeta> columns,
String timeColumn, boolean uppercaseIdentifiers) {
return buildHashQuery(schema, tableName, columns, timeColumn, uppercaseIdentifiers, null, null);
}
/**
* 构建 hash 查询 SQL,可选 id keyset 与 LIMIT(多片兜底分页)
*
* @param lastId keyset 游标,null 表示第一页
* @param limit null 表示不 LIMIT(单片全量)
*/
public String buildHashQuery(String schema, String tableName, List<ColumnMeta> columns,
String timeColumn, boolean uppercaseIdentifiers,
String lastId, Integer limit) {
String idCol = columns.stream().filter(ColumnMeta::isFirstColumn).findFirst()
.orElseThrow(() -> new IllegalStateException("未找到首列"))
.getColumnName();
String quotedSchema = quoteIdentifier(schema, uppercaseIdentifiers);
String quotedTable = quoteIdentifier(tableName, uppercaseIdentifiers);
String quotedId = quoteColumn(idCol, uppercaseIdentifiers);
String quotedTime = quoteColumn(timeColumn, uppercaseIdentifiers);
List<String> hashParts = columns.stream()
.filter(col -> !ColumnTypeHashStrategyUtil.isExcludedFromHash(col.getUdtName()))
.map(col -> {
String quoted = quoteColumn(col.getColumnName(), uppercaseIdentifiers);
String expr = ColumnTypeHashStrategyUtil.buildExpression(quoted, col.getUdtName());
return ColumnTypeHashStrategyUtil.wrapCoalesce(expr);
})
.collect(Collectors.toList());
StringBuilder sql = new StringBuilder("SELECT ").append(quotedId).append(", MD5(concat_ws('|', ")
.append(String.join(", ", hashParts))
.append(")) AS row_hash FROM ").append(quotedSchema).append(".").append(quotedTable)
.append(" WHERE ").append(quotedTime).append(" >= ? AND ").append(quotedTime).append(" < ?");
// keyset 分页:不用 OFFSET,避免大偏移性能问题;与 ORDER BY id 配合
if (lastId != null) {
sql.append(" AND ").append(quotedId).append("::text > ?");
}
sql.append(" ORDER BY ").append(quotedId);
if (limit != null) {
sql.append(" LIMIT ?");
}
return sql.toString();
}
/**
* 构建时间窗行数统计 SQL,用于决定是否启用 keyset 多片
*/
public String buildCountQuery(String schema, String tableName, String timeColumn,
boolean uppercaseIdentifiers) {
String quotedSchema = quoteIdentifier(schema, uppercaseIdentifiers);
String quotedTable = quoteIdentifier(tableName, uppercaseIdentifiers);
String quotedTime = quoteColumn(timeColumn, uppercaseIdentifiers);
return "SELECT COUNT(*) FROM " + quotedSchema + "." + quotedTable
+ " WHERE " + quotedTime + " >= ? AND " + quotedTime + " < ?";
}
/**
* 构建标准化列查询 SQL,用于 hash 不一致时的列级 diff
*
* @param schema schema 名
* @param tableName 表名
* @param columns 列元数据
* @param uppercaseIdentifiers true 时标识符转大写
* @return SELECT id, norm_col1, norm_col2, ... WHERE id::text = ANY(?)
*/
public String buildNormalizedColumnsQuery(String schema, String tableName, List<ColumnMeta> columns,
boolean uppercaseIdentifiers) {
String quotedSchema = quoteIdentifier(schema, uppercaseIdentifiers);
String quotedTable = quoteIdentifier(tableName, uppercaseIdentifiers);
String quotedId = quoteColumn(columns.get(0).getColumnName(), uppercaseIdentifiers);
List<String> selectParts = columns.stream()
.filter(col -> !col.isFirstColumn())
.filter(col -> !ColumnTypeHashStrategyUtil.isExcludedFromHash(col.getUdtName()))
.map(col -> {
String quoted = quoteColumn(col.getColumnName(), uppercaseIdentifiers);
String expr = ColumnTypeHashStrategyUtil.buildExpression(quoted, col.getUdtName());
return ColumnTypeHashStrategyUtil.wrapCoalesce(expr) + " AS "
+ quoteColumn(col.getColumnName(), uppercaseIdentifiers);
})
.collect(Collectors.toList());
return "SELECT " + quotedId + ", " + String.join(", ", selectParts)
+ " FROM " + quotedSchema + "." + quotedTable
+ " WHERE " + buildIdAnyClause(quotedId);
}
/**
* 构建 id 批量匹配条件:id 统一转 text,与 varchar[] 参数对齐(兼容 bigint/varchar 首列)
*/
String buildIdAnyClause(String quotedId) {
return quotedId + "::text = ANY(?)";
}
/**
* 引用列名
*/
public String quoteColumn(String columnName, boolean uppercase) {
return quoteIdentifier(columnName, uppercase);
}
/**
* 引用 schema / 表名 / 列名
*/
public String quoteIdentifier(String name, boolean uppercase) {
String normalized = uppercase ? name.toUpperCase(Locale.ROOT) : name.toLowerCase(Locale.ROOT);
return "\"" + normalized + "\"";
}
}
4.5 service/tablehash/HashRowCursor.java(仅 PG 使用)
package com.liang.learn.service.tablehash;
import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.enums.ErrorCode;
import lombok.Getter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.Iterator;
import java.util.NoSuchElementException;
/**
* PG 侧流式 hash 行游标(openGauss 不支持可靠流式,OG 侧改用 fetchHashRows 批量)
*
* @author liang
* @since 2026-05-24
*/
public class HashRowCursor implements Iterator<HashRow>, AutoCloseable {
private final PreparedStatement preparedStatement;
private final ResultSet resultSet;
private HashRow nextRow;
private boolean finished;
/**
* merge 消费过程中 PG 侧最后读到的 id,供 keyset 翻页计算 nextLastId
*/
@Getter
private String lastReadId;
private HashRowCursor(PreparedStatement preparedStatement, ResultSet resultSet) {
this.preparedStatement = preparedStatement;
this.resultSet = resultSet;
}
/**
* 打开 hash 游标(单片全量,无 keyset/LIMIT)
*/
public static HashRowCursor open(Connection connection, String sql, String startTime, String endTime,
int fetchSize) {
return open(connection, sql, startTime, endTime, null, null, fetchSize);
}
/**
* 打开 hash 游标,可选 id keyset 与 LIMIT(多分片兜底)
* 调用前须对 PG 连接 configurePgStreamingConnection(autoCommit=false)
*/
public static HashRowCursor open(Connection connection, String sql,
String startTime, String endTime,
String lastId, Integer limit, int fetchSize) {
try {
PreparedStatement ps = connection.prepareStatement(sql);
ps.setFetchSize(fetchSize);
// 占位符顺序须与 HashSqlBuilder.buildHashQuery 一致
int idx = 1;
ps.setTimestamp(idx++, Timestamp.valueOf(startTime));
ps.setTimestamp(idx++, Timestamp.valueOf(endTime));
if (lastId != null) {
ps.setString(idx++, lastId);
}
if (limit != null) {
ps.setInt(idx, limit);
}
ResultSet rs = ps.executeQuery();
return new HashRowCursor(ps, rs);
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 查询失败: " + e.getMessage());
}
}
@Override
public boolean hasNext() {
if (finished) {
return false;
}
if (nextRow != null) {
return true;
}
try {
if (resultSet.next()) {
nextRow = new HashRow(resultSet.getString(1), resultSet.getString(2));
return true;
}
finished = true;
return false;
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 读取失败: " + e.getMessage());
}
}
@Override
public HashRow next() {
if (!hasNext()) {
throw new NoSuchElementException();
}
HashRow current = nextRow;
nextRow = null;
lastReadId = current.getId();
return current;
}
@Override
public void close() {
finished = true;
nextRow = null;
closeQuietly(resultSet);
closeQuietly(preparedStatement);
}
private static void closeQuietly(AutoCloseable closeable) {
if (closeable == null) {
return;
}
try {
closeable.close();
} catch (Exception ignored) {
// ignore close errors during cleanup
}
}
}
4.6 service/tablehash/ExternalJdbcExecutor.java
package com.liang.learn.service.tablehash;
import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.entity.DataSourceDO;
import com.liang.learn.model.enums.DataSourceTypeEnum;
import com.liang.learn.model.enums.ErrorCode;
import com.liang.learn.service.DataSourceService;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
/**
* 外部数据库 JDBC 直连执行器
*
* @author liang
* @since 2026-05-24
*/
@Slf4j
@Component
@RequiredArgsConstructor
public class ExternalJdbcExecutor {
/**
* 数据源 Service,用于解密密码
*/
private final DataSourceService dataSourceService;
/**
* 打开外部数据库 JDBC 连接
*
* @param dataSource 数据源配置
* @return JDBC 连接,使用后须关闭
*/
public Connection openConnection(DataSourceDO dataSource) {
try {
DataSourceTypeEnum typeEnum = DataSourceTypeEnum.fromCode(dataSource.getDataSourceType());
String driver = typeEnum.resolveDriverClassName(dataSource.getDriverClassName());
String password = dataSourceService.getDecryptedPassword(dataSource.getId());
String url = typeEnum.buildJdbcUrl(
dataSource.getHost(), dataSource.getPort(),
dataSource.getDatabaseName(), dataSource.getSchema());
// 轻量直连,不经过连接池
Class.forName(driver);
Properties props = new Properties();
props.setProperty("user", dataSource.getUsername());
props.setProperty("password", password);
return DriverManager.getConnection(url, props);
} catch (BusinessException e) {
throw e;
} catch (Exception e) {
log.warn("外部库连接失败, dataSourceId={}, reason={}", dataSource.getId(), e.getMessage());
throw new BusinessException(ErrorCode.CONNECTION_FAILED, "外部库连接失败");
}
}
/**
* 统计时间窗内行数(PG/OG 各查一次,用于 needsPaging 决策)
*/
public long countRows(Connection connection, String sql, String startTime, String endTime) {
try (PreparedStatement ps = connection.prepareStatement(sql)) {
ps.setTimestamp(1, Timestamp.valueOf(startTime));
ps.setTimestamp(2, Timestamp.valueOf(endTime));
try (ResultSet rs = ps.executeQuery()) {
return rs.next() ? rs.getLong(1) : 0L;
}
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "行数统计失败: " + e.getMessage());
}
}
/**
* 一次性加载 hash 行(**openGauss 侧专用**:驱动无法流式,整片读入 List)
*
* @param lastId keyset 游标,null 表示第一页
* @param limit null 表示不 LIMIT(单片全量)
*/
public List<HashRow> fetchHashRows(Connection connection, String sql,
String startTime, String endTime,
String lastId, Integer limit) {
try (PreparedStatement ps = connection.prepareStatement(sql)) {
// 占位符顺序须与 HashSqlBuilder.buildHashQuery 一致
int idx = 1;
ps.setTimestamp(idx++, Timestamp.valueOf(startTime));
ps.setTimestamp(idx++, Timestamp.valueOf(endTime));
if (lastId != null) {
ps.setString(idx++, lastId);
}
if (limit != null) {
ps.setInt(idx, limit);
}
try (ResultSet rs = ps.executeQuery()) {
List<HashRow> rows = new ArrayList<>();
while (rs.next()) {
rows.add(new HashRow(rs.getString(1), rs.getString(2)));
}
return rows;
}
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 查询失败: " + e.getMessage());
}
}
/**
* 单片便捷重载:全时间窗、无 keyset
*/
public List<HashRow> fetchHashRows(Connection connection, String sql,
String startTime, String endTime) {
return fetchHashRows(connection, sql, startTime, endTime, null, null);
}
/**
* 按 id 列表批量查询全行
*
* @param connection 数据库连接
* @param schema schema 名
* @param tableName 表名
* @param idColumn id 列名
* @param ids id 列表
* @param uppercase true 时标识符转大写(openGauss)
* @return 行数据列表,每行 Map 的 key 为列名
*/
public List<Map<String, Object>> fetchRowsByIds(Connection connection, String schema, String tableName,
String idColumn, List<String> ids, boolean uppercase) {
if (ids.isEmpty()) {
return List.of();
}
// PG 小写 / OG 大写引用标识符
String quotedSchema = uppercase ? quote(schema, true) : quote(schema, false);
String quotedTable = uppercase ? quote(tableName, true) : quote(tableName, false);
String quotedId = uppercase ? quote(idColumn, true) : quote(idColumn, false);
String sql = "SELECT * FROM " + quotedSchema + "." + quotedTable
+ " WHERE " + quotedId + "::text = ANY(?)";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
// id 在 Java 侧为 String,SQL 侧用 id::text 与 varchar[] 匹配
ps.setArray(1, connection.createArrayOf(
uppercase ? "VARCHAR" : "varchar", ids.toArray()));
try (ResultSet rs = ps.executeQuery()) {
List<Map<String, Object>> rows = new ArrayList<>();
int colCount = rs.getMetaData().getColumnCount();
// 按列标签组装 Map,保留原始列值供日志输出
while (rs.next()) {
Map<String, Object> row = new LinkedHashMap<>();
for (int i = 1; i <= colCount; i++) {
row.put(rs.getMetaData().getColumnLabel(i), rs.getObject(i));
}
rows.add(row);
}
return rows;
}
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "全行查询失败: " + e.getMessage());
}
}
/**
* 按 id 列表查询标准化列值(用于列级 diff)
*
* @param connection 数据库连接
* @param sql 标准化列 SQL(含 id = ANY(?) 占位符)
* @param ids id 列表
* @return 行数据列表
*/
public List<Map<String, Object>> fetchNormalizedRows(Connection connection, String sql, List<String> ids) {
if (ids.isEmpty()) {
return List.of();
}
try (PreparedStatement ps = connection.prepareStatement(sql)) {
ps.setArray(1, connection.createArrayOf("varchar", ids.toArray()));
try (ResultSet rs = ps.executeQuery()) {
List<Map<String, Object>> rows = new ArrayList<>();
int colCount = rs.getMetaData().getColumnCount();
// 值为 hash 标准化后的字符串,供 ColumnDiffCalculator 逐列对比
while (rs.next()) {
Map<String, Object> row = new LinkedHashMap<>();
for (int i = 1; i <= colCount; i++) {
row.put(rs.getMetaData().getColumnLabel(i), rs.getObject(i));
}
rows.add(row);
}
return rows;
}
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "标准化列查询失败: " + e.getMessage());
}
}
/**
* 引用 SQL 标识符
*
* @param name 标识符
* @param upper true 转大写
* @return 带双引号的标识符
*/
private String quote(String name, boolean upper) {
return "\"" + (upper ? name.toUpperCase() : name.toLowerCase()) + "\"";
}
}
4.7 service/tablehash/TableHashCompareExecutor.java(核心重写)
package com.liang.learn.service.tablehash;
import com.liang.learn.exception.BusinessException;
import com.liang.learn.mapper.DataSourceMapper;
import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.entity.DataSourceDO;
import com.liang.learn.model.enums.ErrorCode;
import com.liang.learn.properties.TableHashProperties;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Component;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
/**
* 两表 hash 比较编排执行器:PG 流式 + OG 批量,可选 id keyset 多片
*
* @author liang
* @since 2026-05-24
*/
@Component
@RequiredArgsConstructor
public class TableHashCompareExecutor {
private final DataSourceMapper dataSourceMapper;
private final TableHashProperties properties;
private final ExternalJdbcExecutor jdbcExecutor;
private final TableMetadataLoader metadataLoader;
private final HashSqlBuilder hashSqlBuilder;
private final StreamingHashComparer streamingHashComparer;
/** 分片决策与 keyset 翻页边界 */
private final HashPagePlanner hashPagePlanner;
/**
* 执行 hash 比较,返回差异 id 与行数统计(不含行内容与列 diff)
*/
public CompareResult compare(TableHashCompareDTO dto) {
DataSourceDO pgDataSource = requireDataSource(properties.getPgsqlDataSourceId(), "PG");
DataSourceDO ogDataSource = requireDataSource(properties.getOpengaussDataSourceId(), "openGauss");
String tableName = dto.getTableName().toLowerCase(Locale.ROOT);
String ogTableName = tableName.toUpperCase(Locale.ROOT);
String pgSchema = pgDataSource.getSchema();
String ogSchema = ogDataSource.getSchema();
try (Connection pgConnection = jdbcExecutor.openConnection(pgDataSource);
Connection ogConnection = jdbcExecutor.openConnection(ogDataSource)) {
List<ColumnMeta> columns = metadataLoader.loadColumns(pgConnection, pgSchema, tableName);
validateTimeColumn(columns, dto.getTimeColumn());
String idUdtName = columns.stream()
.filter(ColumnMeta::isFirstColumn)
.findFirst()
.orElse(columns.getFirst())
.getUdtName();
// 1. COUNT 决定是否 keyset 多片(max 两侧行数)
String pgCountSql = hashSqlBuilder.buildCountQuery(pgSchema, tableName, dto.getTimeColumn(), false);
String ogCountSql = hashSqlBuilder.buildCountQuery(ogSchema, ogTableName, dto.getTimeColumn(), true);
long pgCount = jdbcExecutor.countRows(pgConnection, pgCountSql, dto.getStartTime(), dto.getEndTime());
long ogCount = jdbcExecutor.countRows(ogConnection, ogCountSql, dto.getStartTime(), dto.getEndTime());
boolean paging = hashPagePlanner.needsPaging(pgCount, ogCount, properties.getShardRowThreshold());
// 单片:pageLimit=null,SQL 无 LIMIT;多片:每片最多 shardRowThreshold 行
Integer pageLimit = paging ? properties.getShardRowThreshold() : null;
CompareResult total = new CompareResult();
String lastId = null;
do {
String pgHashSql = hashSqlBuilder.buildHashQuery(
pgSchema, tableName, columns, dto.getTimeColumn(), false, lastId, pageLimit);
String ogHashSql = hashSqlBuilder.buildHashQuery(
ogSchema, ogTableName, columns, dto.getTimeColumn(), true, lastId, pageLimit);
PageOutcome outcome = compareOnePage(
pgConnection, ogConnection, pgHashSql, ogHashSql,
dto.getStartTime(), dto.getEndTime(), lastId, pageLimit, idUdtName);
total.mergeFrom(outcome.result());
if (!paging) {
break;
}
if (!hashPagePlanner.hasNextPage(outcome.result().getPgRowCount(), outcome.ogRowCount(), pageLimit)) {
break;
}
lastId = hashPagePlanner.nextLastId(outcome.pgLastReadId(), outcome.ogRows(), idUdtName);
} while (true);
return total;
} catch (BusinessException e) {
throw e;
} catch (Exception e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "hash 比较失败: " + e.getMessage());
}
}
/**
* 单片比较:虚拟线程并行拉 OG 批量,主线程 PG 流式游标,OG 就绪后双指针归并
*/
private PageOutcome compareOnePage(Connection pgConnection, Connection ogConnection,
String pgHashSql, String ogHashSql,
String startTime, String endTime,
String lastId, Integer pageLimit,
String idUdtName) {
try (ExecutorService virtualExec = Executors.newVirtualThreadPerTaskExecutor()) {
// OG 批量:与 PG 开游标并行,重叠库端耗时(OG 连接仅在虚拟线程内使用)
CompletableFuture<List<HashRow>> ogFuture = CompletableFuture.supplyAsync(
() -> jdbcExecutor.fetchHashRows(ogConnection, ogHashSql, startTime, endTime, lastId, pageLimit),
virtualExec);
// PG 流式:仅 PG 连接设置 autoCommit=false,OG 保持默认 autocommit
configurePgStreamingConnection(pgConnection);
HashRowCursor pgCursor = pageLimit == null
? HashRowCursor.open(pgConnection, pgHashSql, startTime, endTime, properties.getCompareFetchSize())
: HashRowCursor.open(pgConnection, pgHashSql, startTime, endTime, lastId, pageLimit,
properties.getCompareFetchSize());
try {
List<HashRow> ogRows = ogFuture.join();
CompareResult result = streamingHashComparer.merge(pgCursor, ogRows.iterator(), idUdtName);
return new PageOutcome(result, ogRows.size(), pgCursor.getLastReadId(), ogRows);
} finally {
pgCursor.close();
endReadOnlyTransaction(pgConnection);
}
}
}
private DataSourceDO requireDataSource(Long id, String label) {
DataSourceDO dataSource = dataSourceMapper.selectById(id);
if (dataSource == null) {
throw new BusinessException(ErrorCode.PARAM_ERROR, label + " 数据源不存在, id=" + id);
}
return dataSource;
}
/**
* 启用 PG JDBC 服务端游标(须 autoCommit=false 且 fetchSize>0)
*/
private static void configurePgStreamingConnection(Connection connection) {
try {
connection.setAutoCommit(false);
connection.setReadOnly(true);
} catch (SQLException e) {
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "配置 PG 流式连接失败: " + e.getMessage());
}
}
/**
* 片结束后 rollback,释放 PG 只读事务,以便同连接开启下一片游标
*/
private static void endReadOnlyTransaction(Connection connection) {
if (connection == null) {
return;
}
try {
if (!connection.isClosed() && !connection.getAutoCommit()) {
connection.rollback();
}
} catch (SQLException ignored) {
// ignore rollback errors during cleanup
}
}
private void validateTimeColumn(List<ColumnMeta> columns, String timeColumn) {
boolean exists = columns.stream()
.anyMatch(col -> col.getColumnName().equalsIgnoreCase(timeColumn));
if (!exists) {
throw new BusinessException(ErrorCode.PARAM_ERROR, "时间字段不存在: " + timeColumn);
}
}
/** 单片比较产出:归并结果、OG 行数(判 hasNextPage)、PG 末 id(算 nextLastId)、OG 行列表(算 nextLastId) */
private record PageOutcome(CompareResult result, int ogRowCount, String pgLastReadId, List<HashRow> ogRows) {
}
}
4.8 service/impl/TableHashCompareServiceImpl.java
package com.liang.learn.service.impl;
import com.liang.learn.exception.BusinessException;
import com.liang.learn.model.dto.TableHashCompareDTO;
import com.liang.learn.model.enums.ErrorCode;
import com.liang.learn.model.vo.TableHashTaskVO;
import com.liang.learn.service.TableHashCompareService;
import com.liang.learn.service.tablehash.CompareResult;
import com.liang.learn.service.tablehash.TableHashCompareExecutor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;
import java.util.UUID;
import java.util.concurrent.RejectedExecutionException;
/**
* 两表 hash 比较 Service 实现
*
* @author liang
* @since 2026-05-24
*/
@Slf4j
@Service
public class TableHashCompareServiceImpl implements TableHashCompareService {
/** 表 hash 比较专用线程池,限制并发避免多个重型任务同时跑 */
private final ThreadPoolTaskExecutor tableHashExecutor;
private final TableHashCompareExecutor compareExecutor;
public TableHashCompareServiceImpl(
@Qualifier("tableHashExecutor") ThreadPoolTaskExecutor tableHashExecutor,
TableHashCompareExecutor compareExecutor) {
this.tableHashExecutor = tableHashExecutor;
this.compareExecutor = compareExecutor;
}
@Override
public TableHashTaskVO submitCompare(TableHashCompareDTO dto) {
String taskId = UUID.randomUUID().toString();
try {
tableHashExecutor.execute(() -> {
try {
CompareResult result = compareExecutor.compare(dto);
// CompareResult 当前不落库;debug 便于联调确认任务完成
log.debug("hash compare done taskId={} pgRows={} ogRows={} pgOnly={} ogOnly={} mismatch={}",
taskId, result.getPgRowCount(), result.getOgRowCount(),
result.getPgOnlyIds().size(), result.getOgOnlyIds().size(), result.getMismatchIds().size());
} catch (BusinessException e) {
// 异步线程异常无法回传 HTTP,须 error 日志便于运维发现(替代原 TableHashCompareLogger)
log.error("[TABLE-HASH][taskId={}] 失败: code={}, message={}",
taskId, e.getCode(), e.getMessage(), e);
} catch (Exception e) {
log.error("[TABLE-HASH][taskId={}] 失败: {}", taskId, e.getMessage(), e);
}
});
} catch (RejectedExecutionException ex) {
throw new BusinessException(ErrorCode.BUSINESS_ERROR, "比较任务队列已满,请稍后重试");
}
return new TableHashTaskVO(taskId);
}
}
4.9 删除文件
删除 service/tablehash/TableHashCompareLogger.java(整文件移除,无替代)。
TableHashCompareExecutor 中移除对 TableHashCompareLogger、ColumnDiffCalculator 的依赖注入,以及 fetchRowsInBatches / fetchNormalizedInBatches / lowercaseKeys 等私有方法。
4.10 application.yaml 片段
app:
table-hash:
pgsql-data-source-id: 3 # PG 数据源 id
opengauss-data-source-id: 2 # openGauss 数据源 id
shard-row-threshold: 1000000 # max(pg,og) 超过则 id keyset 多片
compare-fetch-size: 1000 # 仅 PG HashRowCursor.fetchSize
移除(若存在):
detail-log-limit: 500 # 已删除:不再输出差异明细 log
fetch-batch-size: 500 # 已删除:不再批量查全行
5. 与旧版差异速查
| 维度 | 旧版 | 本次 |
|---|---|---|
| OG 加载 | HashRowCursor 流式 |
fetchHashRows 批量 |
| PG 加载 | 双端流式 | 仍 HashRowCursor |
| Executor | compare(taskId,dto) void |
compare(dto) → CompareResult |
| 输出 | Logger 明细 | CompareResult(不落库) |
| 大表 | 无 | id keyset 分页 |
| 配置 | detail-log-limit / fetch-batch-size | shard-row-threshold |
6. 验证
mvn test
重点单测:HashPagePlannerTest、HashSqlBuilderTest、CompareResultTest、StreamingHashComparerTest。
联调:POST /api/table-hash/compare,失败时查 [TABLE-HASH][taskId=] error 日志。
浙公网安备 33010602011771号