Modifications and Optimization of the DOTA Model
Adjusting the learning rate
About warmup:
See "Why is the warmup strategy effective in neural network training? Is there a theoretical explanation?" - answer by Shannon.AI (香侬科技) on Zhihu: https://www.zhihu.com/question/338066667/answer/771252708
In short: at the start of training, a small learning rate is used for the first updates to prevent premature overfitting and keep optimization stable.
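As a minimal sketch of the idea (the function name and the constant-warmup form are illustrative; the defaults mirror the DOTA config values shown below):

```python
def lr_with_warmup(num_update, base_lr=0.0005, warmup_lr=0.00005, warmup_step=1000):
    """Constant warmup: use the small warmup_lr for the first warmup_step
    updates, then switch to the normal base_lr."""
    if num_update < warmup_step:
        return warmup_lr
    return base_lr

# During warmup the rate is 10x smaller than afterwards.
print(lr_with_warmup(0))     # warmup_lr
print(lr_with_warmup(1000))  # base_lr
```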
The learning-rate mechanism in DOTA is configured as follows:
lr: 0.0005
lr_step: '45,52' # the two epochs at which the learning rate is reduced
lr_factor: 0.1
warmup: true
warmup_lr: 0.00005
warmup_step: 1000
begin_epoch: 0
end_epoch: 60
base_lr = lr
lr_factor = config.TRAIN.lr_factor
lr_epoch = [float(epoch) for epoch in lr_step.split(',')]
lr_epoch_diff = [epoch - begin_epoch for epoch in lr_epoch if epoch > begin_epoch]  # begin_epoch = 0, so this equals lr_epoch
lr = base_lr * (lr_factor ** (len(lr_epoch) - len(lr_epoch_diff)))  # still base_lr here
lr_iters = [int(epoch * len(roidb) / batch_size) for epoch in lr_epoch_diff]  # two large integers, iteration counts
print('lr', lr, 'lr_epoch_diff', lr_epoch_diff, 'lr_iters', lr_iters)
lr_scheduler = WarmupMultiFactorScheduler(lr_iters, lr_factor, config.TRAIN.warmup,
                                          config.TRAIN.warmup_lr, config.TRAIN.warmup_step)
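To make the epoch-to-iteration conversion concrete, here is a worked example with assumed numbers (num_images = 10000 and batch_size = 4 are hypothetical stand-ins for len(roidb) and the real batch size, not values from the DOTA config):

```python
lr_step = '45,52'
begin_epoch = 0
num_images = 10000   # hypothetical dataset size, stands in for len(roidb)
batch_size = 4       # hypothetical

lr_epoch = [float(e) for e in lr_step.split(',')]                       # [45.0, 52.0]
lr_epoch_diff = [e - begin_epoch for e in lr_epoch if e > begin_epoch]  # [45.0, 52.0]
lr_iters = [int(e * num_images / batch_size) for e in lr_epoch_diff]
print(lr_iters)  # [112500, 130000]
```

So the lr_step epochs 45 and 52 become two update counts, and those are what the scheduler compares num_update against.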
import logging
from mxnet.lr_scheduler import LRScheduler  # the scheduler extends MXNet's base class

class WarmupMultiFactorScheduler(LRScheduler):
    """Reduce the learning rate by a factor at the steps specified in a list.

    Assume the weight has been updated n times; the learning rate will be

        base_lr * factor^(sum((step/n) <= 1))  # step is an array

    Parameters
    ----------
    step: list of int
        schedule the learning rate after n updates
    factor: float
        the factor by which the learning rate is reduced
    """
    def __init__(self, step, factor=1, warmup=False, warmup_lr=0, warmup_step=0):
        super(WarmupMultiFactorScheduler, self).__init__()
        assert isinstance(step, list) and len(step) >= 1
        for i, _step in enumerate(step):
            if i != 0 and step[i] <= step[i - 1]:
                raise ValueError("Schedule step must be an increasing integer list")
            if _step < 1:
                raise ValueError("Schedule step must be greater or equal than 1 round")
        if factor > 1.0:
            raise ValueError("Factor must be no more than 1 to make lr reduce")
        self.step = step
        self.cur_step_ind = 0
        self.factor = factor
        self.count = 0
        self.warmup = warmup
        self.warmup_lr = warmup_lr
        self.warmup_step = warmup_step

    def __call__(self, num_update):
        """Called to schedule the current learning rate.

        Parameters
        ----------
        num_update: int
            the maximal number of updates applied to a weight.
        """
        # NOTE: use while rather than if (for continuing training via load_epoch)
        if self.warmup and num_update < self.warmup_step:
            return self.warmup_lr
        while self.cur_step_ind <= len(self.step) - 1:
            if num_update > self.step[self.cur_step_ind]:
                self.count = self.step[self.cur_step_ind]
                self.cur_step_ind += 1
                self.base_lr *= self.factor
                logging.info("Update[%d]: Change learning rate to %0.5e",
                             num_update, self.base_lr)
            else:
                return self.base_lr
        return self.base_lr
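Putting warmup and the step decays together, the scheduler's behavior can be checked with a small stateless sketch of the same rule (the function and the step positions 112500/130000 are illustrative, taken from the worked lr_iters numbers assumed above, not from the source code):

```python
def scheduled_lr(num_update, base_lr=0.0005, factor=0.1,
                 steps=(112500, 130000), warmup=True,
                 warmup_lr=0.00005, warmup_step=1000):
    """Stateless equivalent of WarmupMultiFactorScheduler.__call__:
    warmup_lr during warmup, then base_lr decayed once per passed step."""
    if warmup and num_update < warmup_step:
        return warmup_lr
    passed = sum(1 for s in steps if num_update > s)  # how many steps are behind us
    return base_lr * factor ** passed

print(scheduled_lr(500))     # warmup phase     -> warmup_lr
print(scheduled_lr(50000))   # before m1        -> base_lr
print(scheduled_lr(120000))  # between m1, m2   -> base_lr * factor
print(scheduled_lr(140000))  # after m2         -> base_lr * factor**2
```

Because the class compares with `num_update > step`, the rate at exactly update m1 is still base_lr; the decay takes effect one update later.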
To summarize, the learning-rate rule is: for the first 1000 updates, the smaller warmup_lr = 0.00005 is used. After warmup, lr_step maps to two update counts [m1, m2] (lr_iters): before update m1 the rate is base_lr; between m1 and m2 it is base_lr * factor; after m2 the loop is exited and the rate stays at base_lr * factor * factor until the end of training. No matter which epoch training is resumed from, the learning rate in each stage is always determined this way.