记一次海量POI网格统计优化过程
Base
tb_taile_p_gouwu :4059071条记录
CREATE TABLE `tb_taile_p_gouwu` (
`OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
`SHAPE` geometry NOT NULL,
`name` varchar(103) DEFAULT NULL,
`pyname` varchar(23) DEFAULT NULL,
`kind` varchar(4) DEFAULT NULL,
UNIQUE KEY `OGR_FID` (`OGR_FID`),
SPATIAL KEY `SHAPE` (`SHAPE`)
) ENGINE=MyISAM AUTO_INCREMENT=4059072 DEFAULT CHARSET=utf8;
生成数据:
CREATE TABLE `tb_poi_statistics_china_gouwu` (
`id` int(11) NOT NULL,
`index_x` int(4) DEFAULT NULL COMMENT '网格的x坐标',
`index_y` int(4) DEFAULT NULL COMMENT '网格的y坐标',
`vl` double NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_poi_statistics_base` (`index_x`,`index_y`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='POI的聚合数据';
V1
任务执行情况(分页10000条)
总共分了406个子任务,按15个批次完成:
select count(1) from tb_batch_task

select min(bt.create_time) as bgnTime, max(bt.update_time) as endTime, max(bt.update_time)-min(bt.create_time) as time, bt.batch_no, count(1) as cn
from tb_batch_task bt
GROUP BY bt.batch_no

任务失败重试次数:
select count(1) as failed_batch_num, sum(bt.execute_count-1) as failed_cn
from tb_batch_task bt
where bt.execute_count > 1

总共耗时:
select min(bt.create_time) as bgnTime, max(bt.update_time) as endTime, max(bt.update_time)-min(bt.create_time) as time
from tb_batch_task bt
;

SELECT count(1) as cn, sum(vl) as sum_vl
from tb_poi_statistics_china_gouwu

问题点:
1) 分页查询效率低
select
<include refid="Base_Column_List" />
from `${tableName}`
order by OGR_FID
limit #{offset,jdbcType=INTEGER}, #{pageSize,jdbcType=INTEGER}
2) 插入存在deadlock
3) 数据不一致
V2
优化:
1) 分页不指定order by
2) 数据插入方法加锁
任务执行情况(分页10000条)
总共分了406个子任务,按15个批次完成:
select count(1) from tb_batch_task

select min(bt.create_time) as bgnTime, max(bt.update_time) as endTime, max(bt.update_time)-min(bt.create_time) as time, bt.batch_no, count(1) as cn
from tb_batch_task bt
GROUP BY bt.batch_no

任务失败重试次数:
select count(1) as failed_batch_num, sum(bt.execute_count-1) as failed_cn
from tb_batch_task bt
where bt.execute_count > 1

总共耗时:
select min(bt.create_time) as bgnTime, max(bt.update_time) as endTime, max(bt.update_time)-min(bt.create_time) as time
from tb_batch_task bt
;

SELECT count(1) as cn, sum(vl) as sum_vl
from tb_poi_statistics_china_gouwu

其他
如果可以分页可以加上where条件,比如OGR_FID>10000

浙公网安备 33010602011771号