记一次海量POI网格统计优化过程

 

Base

tb_taile_p_gouwu 4059071条记录

CREATE TABLE `tb_taile_p_gouwu` (

  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,

  `SHAPE` geometry NOT NULL,

  `name` varchar(103) DEFAULT NULL,

  `pyname` varchar(23) DEFAULT NULL,

  `kind` varchar(4) DEFAULT NULL,

  UNIQUE KEY `OGR_FID` (`OGR_FID`),

  SPATIAL KEY `SHAPE` (`SHAPE`)

) ENGINE=MyISAM AUTO_INCREMENT=4059072 DEFAULT CHARSET=utf8;

 

生成数据:

CREATE TABLE `tb_poi_statistics_china_gouwu` (

  `id` int(11) NOT NULL,

  `index_x` int(4) DEFAULT NULL COMMENT '网格的x坐标',

  `index_y` int(4) DEFAULT NULL COMMENT '网格的y坐标',

  `vl` double NOT NULL DEFAULT '0',

  PRIMARY KEY (`id`),

  KEY `idx_poi_statistics_base` (`index_x`,`index_y`) USING BTREE

) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='POI的聚合数据';

 

 

V1

任务执行情况(分页10000条)

总共分了406个子任务,按15个批次完成:

select count(1) from tb_batch_task

 

 


select min(bt.create_time) as bgnTime, max(bt.update_time) as endTime, max(bt.update_time)-min(bt.create_time) as time, bt.batch_no, count(1) as cn

 from tb_batch_task bt

GROUP BY bt.batch_no

 

 


任务失败重试次数:

select count(1) as failed_batch_num, sum(bt.execute_count-1) as failed_cn

 from tb_batch_task bt

where bt.execute_count > 1

 

 


总共耗时:

select min(bt.create_time) as bgnTime, max(bt.update_time) as endTime, max(bt.update_time)-min(bt.create_time) as time

 from tb_batch_task bt

;

 

 

 


SELECT count(1) as cn, sum(vl) as sum_vl

from tb_poi_statistics_china_gouwu

 

 


 

问题点:

1 分页查询效率低

    select

    <include refid="Base_Column_List" />

    from `${tableName}`

    order by OGR_FID

    limit #{offset,jdbcType=INTEGER}, #{pageSize,jdbcType=INTEGER}

2 插入存在deadlock

3 数据不一致

 

V2

优化:

1 分页不指定order by

2 数据插入方法加锁

 

任务执行情况(分页10000条)

总共分了406个子任务,按15个批次完成:

select count(1) from tb_batch_task

 

 


select min(bt.create_time) as bgnTime, max(bt.update_time) as endTime, max(bt.update_time)-min(bt.create_time) as time, bt.batch_no, count(1) as cn

 from tb_batch_task bt

GROUP BY bt.batch_no

 

 


任务失败重试次数:

select count(1) as failed_batch_num, sum(bt.execute_count-1) as failed_cn

 from tb_batch_task bt

where bt.execute_count > 1

 

 


总共耗时:

select min(bt.create_time) as bgnTime, max(bt.update_time) as endTime, max(bt.update_time)-min(bt.create_time) as time

 from tb_batch_task bt

;

 

 

 


SELECT count(1) as cn, sum(vl) as sum_vl

from tb_poi_statistics_china_gouwu

 

 


 

 

其他

如果可以分页可以加上where条件,比如OGR_FID>10000

 

 

posted @ 2021-07-27 15:44  又见君  阅读(287)  评论(0)    收藏  举报