Modifications and Optimizations to the DOTA Model

Adjusting the learning rate

About warmup:

  See the Zhihu answer by 香侬科技 to "Why is the warmup strategy effective in neural network training, and is there a theoretical explanation?": https://www.zhihu.com/question/338066667/answer/771252708

 

  In short: at the very beginning of neural-network training, a small learning rate is used first, to avoid overfitting too early and to keep training stable.
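In code form, here is a minimal sketch of this constant-warmup idea (the function name is illustrative, not from the DOTA code; the default values match the config shown below):

def constant_warmup_lr(num_update, base_lr=0.0005, warmup_lr=0.00005, warmup_step=1000):
    # Use the small warmup_lr for the first warmup_step updates, then switch to base_lr.
    return warmup_lr if num_update < warmup_step else base_lr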

The learning-rate mechanism in DOTA is as follows:

 

  lr: 0.0005
  lr_step: '45,52'        # the two epochs at which the learning rate changes
  lr_factor: 0.1
  warmup: true
  warmup_lr: 0.00005
  warmup_step: 1000
  begin_epoch: 0
  end_epoch: 60

 

 

base_lr = lr
lr_factor = config.TRAIN.lr_factor
lr_epoch = [float(epoch) for epoch in lr_step.split(',')]
lr_epoch_diff = [epoch - begin_epoch for epoch in lr_epoch if epoch > begin_epoch]  # with begin_epoch = 0 this equals lr_epoch
lr = base_lr * (lr_factor ** (len(lr_epoch) - len(lr_epoch_diff)))  # still base_lr when begin_epoch = 0
lr_iters = [int(epoch * len(roidb) / batch_size) for epoch in lr_epoch_diff]  # two large integers: the update counts at which lr drops
print('lr', lr, 'lr_epoch_diff', lr_epoch_diff, 'lr_iters', lr_iters)
lr_scheduler = WarmupMultiFactorScheduler(lr_iters, lr_factor, config.TRAIN.warmup, config.TRAIN.warmup_lr, config.TRAIN.warmup_step)
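To make the epoch-to-iteration conversion concrete, here is a worked example; len(roidb) = 20000 and batch_size = 8 are hypothetical values, not taken from the DOTA config:

base_lr = 0.0005
lr_factor = 0.1
lr_epoch = [45.0, 52.0]
begin_epoch = 0
num_images, batch_size = 20000, 8                                         # hypothetical dataset size and batch size
lr_epoch_diff = [e - begin_epoch for e in lr_epoch if e > begin_epoch]    # [45.0, 52.0]
lr = base_lr * (lr_factor ** (len(lr_epoch) - len(lr_epoch_diff)))        # 0.0005, unchanged
lr_iters = [int(e * num_images / batch_size) for e in lr_epoch_diff]      # [112500, 130000]
print('lr', lr, 'lr_epoch_diff', lr_epoch_diff, 'lr_iters', lr_iters)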



import logging

from mxnet.lr_scheduler import LRScheduler


class WarmupMultiFactorScheduler(LRScheduler):
    """Reduce the learning rate by a factor at the update counts given in a list.

    Assume the weights have been updated n times; the learning rate is then

    base_lr * factor^(number of entries in step already passed, i.e. step < n)

    Parameters
    ----------
    step: list of int
        schedule a learning rate change after these numbers of updates
    factor: float
        the factor by which the learning rate is reduced at each step
    warmup: bool
        whether to use a constant warmup learning rate at the start of training
    warmup_lr: float
        the learning rate used during the warmup phase
    warmup_step: int
        the number of updates for which warmup_lr is used
    """
    def __init__(self, step, factor=1, warmup=False, warmup_lr=0, warmup_step=0):
        super(WarmupMultiFactorScheduler, self).__init__()
        assert isinstance(step, list) and len(step) >= 1
        for i, _step in enumerate(step):
            if i != 0 and step[i] <= step[i-1]:
                raise ValueError("Schedule step must be an increasing integer list")
            if _step < 1:
                raise ValueError("Schedule step must be greater or equal than 1 round")
        if factor > 1.0:
            raise ValueError("Factor must be no more than 1 to make lr reduce")
        self.step = step
        self.cur_step_ind = 0
        self.factor = factor
        self.count = 0
        self.warmup = warmup
        self.warmup_lr = warmup_lr
        self.warmup_step = warmup_step

    def __call__(self, num_update):
        """
        Call to schedule current learning rate

        Parameters
        ----------
        num_update: int
            the maximal number of updates applied to a weight.
        """

        if self.warmup and num_update < self.warmup_step:
            return self.warmup_lr
        # NOTE: use while rather than if (so training resumed via load_epoch catches up to the right stage)
        while self.cur_step_ind <= len(self.step) - 1:
            if num_update > self.step[self.cur_step_ind]:
                self.count = self.step[self.cur_step_ind]
                self.cur_step_ind += 1
                self.base_lr *= self.factor
                logging.info("Update[%d]: Change learning rate to %0.5e",
                             num_update, self.base_lr)
            else:
                return self.base_lr
        return self.base_lr
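A short usage sketch of the scheduler above (assuming MXNet is installed; step uses the hypothetical lr_iters = [112500, 130000] computed earlier, and base_lr is set by hand here, whereas during training it comes from the optimizer's learning rate):

import logging
logging.basicConfig(level=logging.INFO)

sched = WarmupMultiFactorScheduler(step=[112500, 130000], factor=0.1,
                                   warmup=True, warmup_lr=0.00005, warmup_step=1000)
sched.base_lr = 0.0005   # in real training this is config.TRAIN.lr

for n in [500, 5000, 120000, 140000]:   # queried in increasing order; the scheduler is stateful
    print(n, sched(n))
# roughly: 500 -> 5e-05 (warmup), 5000 -> 5e-04 (base_lr),
#          120000 -> 5e-05 (base_lr * factor), 140000 -> 5e-06 (base_lr * factor^2)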

 

To summarize, the learning-rate rule is: for the first 1000 updates the smaller warmup_lr = 0.00005 is used. After that, the epochs in lr_step map to two update counts [m1, m2]: up to m1 updates the rate is base_lr; between m1 and m2 it is base_lr * factor; after m2 the loop is exited and the rate stays at base_lr * factor * factor until the end of training. No matter which epoch training is started or resumed from, the learning rate of each stage is always determined in this way.
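The same rule written as a pure function (a hypothetical helper for clarity, not part of the DOTA code; m1 and m2 default to the hypothetical lr_iters from above):

def lr_at(num_update, base_lr=0.0005, factor=0.1,
          warmup_lr=0.00005, warmup_step=1000, m1=112500, m2=130000):
    if num_update < warmup_step:
        return warmup_lr                  # first 1000 updates: warmup
    if num_update <= m1:
        return base_lr                    # after warmup, up to and including m1 updates
    if num_update <= m2:
        return base_lr * factor           # between m1 and m2
    return base_lr * factor * factor      # after m2, constant until the end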

 

posted @ 2020-03-11 16:50  Snow丶Flower