I am training a PyTorch segmentation model using torch.optim.AdamW together with monai.optimizers.WarmupCosineSchedule.
My optimizer:
import torch

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    weight_decay=1e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
)
My scheduler:
from monai.optimizers import WarmupCosineSchedule
scheduler = WarmupCosineSchedule(
    optimizer=optimizer,
    warmup_steps=1000,
    t_total=10000,
    end_lr=0.0,
    cycles=0.5,
    warmup_multiplier=0.01,
)
My training loop:
for batch in train_loader:
    images, labels = batch
    optimizer.zero_grad()
    outputs = model(images)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    scheduler.step()  # called once per optimizer update
My first question: since this scheduler is step-based (similar to the HuggingFace warmup cosine schedules), is it correct to call
scheduler.step()
once per batch / optimizer update step, not once per epoch?
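
For context, here is the quick sanity check I used to watch the LR move per step. It drives the same scheduler with a throwaway optimizer over a single dummy parameter (the dummy tensor and the probed step numbers are just for illustration):

import torch
from monai.optimizers import WarmupCosineSchedule

# Throwaway optimizer over one dummy parameter, only to probe the schedule.
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.AdamW([param], lr=1e-4)
sched = WarmupCosineSchedule(
    optimizer=opt,
    warmup_steps=1000,
    t_total=10000,
    end_lr=0.0,
    cycles=0.5,
    warmup_multiplier=0.01,
)

for step in range(10000):
    opt.step()    # no gradients needed just to read the LR
    sched.step()  # once per "batch", mirroring my training loop
    if step in (0, 1, 499, 999, 5000, 9999):
        print(step, opt.param_groups[0]["lr"])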
My second question: does WarmupCosineSchedule use the learning rate passed to AdamW (lr=1e-4) as the main / peak learning rate to be reached at the end of the warmup phase? Meaning: start the warmup from lr * warmup_multiplier (i.e. 1e-6), increase toward 1e-4, then cosine-decay to end_lr.
So is the optimizer lr the reference maximum LR used by the scheduler?
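
To make my assumption concrete, this is the per-step LR I think the schedule produces; a sketch of my mental model based on the HF-style formula, not the actual MONAI source:

import math

def expected_lr(step, base_lr=1e-4, warmup_steps=1000, t_total=10000,
                warmup_multiplier=0.01, end_lr=0.0, cycles=0.5):
    # Sketch of my mental model of the schedule, not the MONAI implementation.
    if step < warmup_steps:
        # Linear warmup from base_lr * warmup_multiplier up to base_lr.
        frac = step / max(1, warmup_steps)
        return base_lr * (warmup_multiplier + (1 - warmup_multiplier) * frac)
    # Cosine decay from base_lr down to end_lr.
    progress = (step - warmup_steps) / max(1, t_total - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * 2.0 * cycles * progress))
    return end_lr + (base_lr - end_lr) * cosine

print(expected_lr(0))      # 1e-06 (= 1e-4 * 0.01)
print(expected_lr(1000))   # 1e-04 (peak = optimizer lr)
print(expected_lr(10000))  # 0.0   (= end_lr)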
I checked the MONAI docs and saw that it is based on HuggingFace's cosine warmup scheduler, but I want to confirm the expected usage.
Thanks.