## Parameter Updates with Gradient Descent

$\theta_{t+1} = \theta_{t} - \eta \cdot \nabla J(\theta_t)$

Writing $g_t = \nabla J(\theta_t)$ for the gradient at step $t$, the update becomes

$\theta_{t+1} = \theta_{t} - \eta \cdot g_t$
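The plain gradient descent update above can be sketched in a few lines of NumPy (the quadratic objective is a toy example chosen purely for illustration):

```python
import numpy as np

def gradient_descent(grad, theta0, eta=0.1, steps=100):
    """Repeatedly apply theta_{t+1} = theta_t - eta * grad(theta_t)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta

# Minimize J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3);
# the iterates converge to the minimizer theta = 3.
theta_star = gradient_descent(lambda th: 2 * (th - 3.0), theta0=0.0)
```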

The Adam update rule:

$$\begin{aligned} m_{t} &=\beta_{1} m_{t-1}+\left(1-\beta_{1}\right) g_{t} \\ v_{t} &=\beta_{2} v_{t-1}+\left(1-\beta_{2}\right) g_{t}^{2} \\ \hat{m}_{t} &=\frac{m_{t}}{1-\beta_{1}^{t}} \\ \hat{v}_{t} &=\frac{v_{t}}{1-\beta_{2}^{t}} \\ \theta_{t+1}&=\theta_{t}-\frac{\eta}{\sqrt{\hat{v}_{t}}+\epsilon} \hat{m}_{t} \end{aligned}$$

• The first two lines, the exponential moving averages of the gradient and of its square:

$$\begin{aligned} m_{t} &=\beta_{1} m_{t-1}+\left(1-\beta_{1}\right) g_{t} \\ v_{t} &=\beta_{2} v_{t-1}+\left(1-\beta_{2}\right) g_{t}^{2} \end{aligned}$$

• The middle two lines, the bias-corrected estimates:

$$\begin{aligned} \hat{m}_{t} &=\frac{m_{t}}{1-\beta_{1}^{t}} \\ \hat{v}_{t} &=\frac{v_{t}}{1-\beta_{2}^{t}} \end{aligned}$$

• The last line, the parameter update itself:

$\theta_{t+1}=\theta_{t}-\frac{\eta}{\sqrt{\hat{v}_{t}}+\epsilon} \hat{m}_{t}$
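Putting the five lines together, a minimal NumPy sketch of one Adam step (the hyperparameter defaults follow the common $\beta_1=0.9$, $\beta_2=0.999$, $\epsilon=10^{-8}$ choices; the toy objective is for illustration only):

```python
import numpy as np

def adam_step(theta, g, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, following the five equations above (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * g        # EMA of the gradient
    v = beta2 * v + (1 - beta2) * g ** 2   # EMA of the squared gradient
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize J(theta) = theta^2 starting from theta = 1.0.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    g = 2 * theta                          # gradient of theta^2
    theta, m, v = adam_step(theta, g, m, v, t)
```

Note that the effective step size is roughly $\eta$ per iteration regardless of the gradient's scale, which is exactly the per-parameter adaptivity the quote below refers to.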

As Loshchilov and Hutter put it in [4]: "Since Adam already adapts its parameterwise learning rates it is not as common to use a learning rate multiplier schedule with it as it is with SGD, but as our results show such schedules can substantially improve Adam's performance, and we advocate not to overlook their use for adaptive gradient algorithms."

ReduceLROnPlateau leaves the learning rate untouched as long as val_loss keeps decreasing normally. Only when val_loss fails to drop by at least 1e-4, or even rises, for patience consecutive epochs (10 by default) does it reduce the learning rate, and that reduction can noticeably improve training: you often see a rapid jump in the val_acc curve. Other kinds of learning rate decay are only listed briefly below.
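The plateau logic described above can be sketched framework-free as follows (the class name and exact bookkeeping are my own simplification; Keras's ReduceLROnPlateau differs in some details such as cooldown and min_lr handling):

```python
class PlateauScheduler:
    """Simplified ReduceLROnPlateau: if the monitored loss fails to improve
    by at least `min_delta` for `patience` epochs in a row, multiply the
    learning rate by `factor`."""

    def __init__(self, lr, factor=0.1, patience=10, min_delta=1e-4):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:  # meaningful improvement
            self.best = val_loss
            self.wait = 0
        else:                                      # plateau: count epochs
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr
```

Called once per epoch with the current val_loss, it returns the (possibly decayed) learning rate to use next.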

• exponential_decay：
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

• natural_exp_decay：
decayed_learning_rate = learning_rate * exp(-decay_rate * global_step / decay_steps)

• ReduceLROnPlateau
If the monitored quantity (e.g. 'val_loss') has not decreased for patience epochs, the learning rate is decayed by a multiplicative factor:
decayed_learning_rate = learning_rate * factor
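The two closed-form schedules above translate directly into Python (the optional staircase flag mirrors TensorFlow's behavior of flooring the exponent so the rate drops in discrete steps):

```python
import math

def exponential_decay(lr0, decay_rate, global_step, decay_steps, staircase=False):
    """decayed_lr = lr0 * decay_rate ** (global_step / decay_steps)."""
    p = global_step / decay_steps
    if staircase:
        p = math.floor(p)  # piecewise-constant decay
    return lr0 * decay_rate ** p

def natural_exp_decay(lr0, decay_rate, global_step, decay_steps):
    """decayed_lr = lr0 * exp(-decay_rate * global_step / decay_steps)."""
    return lr0 * math.exp(-decay_rate * global_step / decay_steps)
```

For example, with lr0 = 0.1 and decay_rate = 0.5, exponential decay halves the rate every decay_steps steps.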

## References

[1] An overview of gradient descent optimization algorithms -- Sebastian Ruder
[2] Should we do learning rate decay for adam optimizer - Stack Overflow
[3] Tensorflow中learning rate decay的奇技淫巧 -- Elevanth
[4] Loshchilov, I., & Hutter, F. (2017). Decoupled Weight Decay Regularization. ICLR 2019. Retrieved from http://arxiv.org/abs/1711.05101

posted @ 2019-06-28 17:06  wuliytTaotao