PyTorch Learning Rate Scheduling Strategies
torch.optim.lr_scheduler provides several methods to adjust the learning rate based on the number of epochs, while torch.optim.lr_scheduler.ReduceLROnPlateau allows the learning rate to be reduced dynamically based on some validation measurement. Learning rate scheduling should be applied after the optimizer's update, i.e. call scheduler.step() after optimizer.step().
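Since ReduceLROnPlateau adjusts the learning rate from a monitored metric rather than a fixed schedule, here is a minimal sketch; the mode, factor, patience and the constant dummy validation loss are illustrative choices, not part of the reference notebook:
import torch

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Reduce the lr by a factor of 0.1 once the metric has stopped improving for 2 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2)
for epoch in range(10):
    optimizer.step()
    val_loss = 1.0  # placeholder metric; a real run would compute the validation loss here
    scheduler.step(val_loss)  # unlike the schedulers below, step() receives the metric
    print(optimizer.param_groups[0]['lr'])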
1. LambdaLR
Sets the learning rate to the initial lr multiplied by the value of a given function. When last_epoch=-1, the lr is set to the initial lr:
\[lr_{epoch}=lr_{initial} * Lambda(epoch)
\]
import torch
import matplotlib.pyplot as plt

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.5 ** epoch)
lrs = []
for i in range(10):
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()
plt.plot(range(len(lrs)), lrs)
2. MultiplicativeLR
Multiplies the learning rate by the factor returned by the given function. When last_epoch=-1, the lr is set to the initial lr:
\[lr_{epoch}=lr_{epoch-1} * Lambda(epoch)
\]
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda=lambda epoch: 0.95)
lrs = []
for i in range(10):
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()
plt.plot(range(len(lrs)), lrs)
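Note the difference from LambdaLR: MultiplicativeLR multiplies the previous learning rate by the returned factor (a constant 0.95 here), whereas LambdaLR multiplies the initial learning rate by the returned value.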
3. StepLR
Decays the learning rate by gamma every step_size epochs. When last_epoch=-1, the lr is set to the initial lr:
\[lr_{epoch}=\left\{\begin{array}{ll}
Gamma * lr_{epoch-1}, & \text{if } epoch \ \% \ step\_size = 0 \\
lr_{epoch-1}, & \text{otherwise}
\end{array}\right.\]
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)
lrs = []
for i in range(10):
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()
plt.plot(range(len(lrs)), lrs)
4. MultiStepLR
Decays the learning rate by gamma once the epoch reaches one of the milestones. When last_epoch=-1, the lr is set to the initial lr:
\[lr_{epoch}=\left\{\begin{array}{ll}
Gamma * lr_{epoch-1}, & \text{if } epoch \in \text{milestones} \\
lr_{epoch-1}, & \text{otherwise}
\end{array}\right.\]
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[6, 8, 9], gamma=0.1)
lrs = []
for i in range(10):
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()
plt.plot(range(len(lrs)), lrs)
5. ExponentialLR
Decays the learning rate by gamma every epoch. When last_epoch=-1, the lr is set to the initial lr:
\[lr_{epoch}=Gamma*lr_{epoch-1}
\]
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1)
lrs = []
for i in range(10):
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()
plt.plot(range(len(lrs)), lrs)
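With gamma=0.1 the learning rate is divided by 10 every epoch; in other words, ExponentialLR behaves like StepLR with step_size=1.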
6. WarmupCosineSchedule
import math

from torch.optim import Optimizer
from torch.optim.lr_scheduler import LambdaLR


class WarmupCosineSchedule(LambdaLR):
    """Linear warmup and then cosine decay.
    Based on https://huggingface.co/ implementation.
    """

    def __init__(
        self, optimizer: Optimizer, warmup_steps: int, t_total: int, cycles: float = 0.5, last_epoch: int = -1
    ) -> None:
        """
        Args:
            optimizer: wrapped optimizer.
            warmup_steps: number of warmup iterations.
            t_total: total number of training iterations.
            cycles: cosine cycles parameter.
            last_epoch: the index of last epoch.
        Returns:
            None
        """
        self.warmup_steps = warmup_steps
        self.t_total = t_total
        self.cycles = cycles
        super().__init__(optimizer, self.lr_lambda, last_epoch)

    def lr_lambda(self, step):
        # During warmup, scale the lr linearly from 0 up to the initial lr.
        if step < self.warmup_steps:
            return float(step) / float(max(1.0, self.warmup_steps))
        # Afterwards, decay the lr along a cosine curve towards 0.
        progress = float(step - self.warmup_steps) / float(max(1, self.t_total - self.warmup_steps))
        return max(0.0, 0.5 * (1.0 + math.cos(math.pi * float(self.cycles) * 2.0 * progress)))
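A minimal usage sketch, reusing the imports above; warmup_steps and t_total are arbitrary values chosen for illustration, and the scheduler is stepped once per training iteration:
model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = WarmupCosineSchedule(optimizer, warmup_steps=10, t_total=100)
lrs = []
for step in range(100):
    optimizer.step()
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()
plt.plot(range(len(lrs)), lrs)  # lr ramps up linearly for 10 steps, then decays along a cosine curve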
Reference: https://www.kaggle.com/code/isbhargav/guide-to-pytorch-learning-rate-scheduling
