I am training a PyTorch segmentation model using torch.optim.AdamW together with monai.optimizers.WarmupCosineSchedule.
My optimizer:
import torch

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    weight_decay=1e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
)
My scheduler:
from monai.optimizers import WarmupCosineSchedule
scheduler = WarmupCosineSchedule(
    optimizer=optimizer,
    warmup_steps=1000,
    t_total=10000,
    end_lr=0.0,
    cycles=0.5,
    warmup_multiplier=0.01,
)
My training loop:
for batch in train_loader:
    images, labels = batch
    optimizer.zero_grad()
    outputs = model(images)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    scheduler.step()  # called once per optimizer update
My first question: since this scheduler is step-based (similar to the HuggingFace warmup cosine schedules), is it correct to call
scheduler.step()
once per batch / optimizer update step, not once per epoch?
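
For context, here is the quick sanity check I used to watch the LR move per step. It drives the same scheduler with a throwaway optimizer over a single dummy parameter (the dummy tensor and the probed step numbers are just for illustration):

import torch
from monai.optimizers import WarmupCosineSchedule

# Throwaway optimizer over one dummy parameter, only to probe the schedule.
param = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.AdamW([param], lr=1e-4)
sched = WarmupCosineSchedule(
    optimizer=opt,
    warmup_steps=1000,
    t_total=10000,
    end_lr=0.0,
    cycles=0.5,
    warmup_multiplier=0.01,
)

for step in range(10000):
    opt.step()    # no gradients needed just to read the LR
    sched.step()  # once per "batch", mirroring my training loop
    if step in (0, 1, 499, 999, 5000, 9999):
        print(step, opt.param_groups[0]["lr"])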
My second question: does WarmupCosineSchedule use the learning rate passed to AdamW (lr=1e-4) as the main / peak learning rate to be reached at the end of the warmup phase? Meaning: start the warmup from lr * warmup_multiplier (i.e. 1e-6), increase toward 1e-4, then cosine-decay to end_lr.
So is the optimizer lr the reference maximum LR used by the scheduler?
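
To make my assumption concrete, this is the per-step LR I think the schedule produces; a sketch of my mental model based on the HF-style formula, not the actual MONAI source:

import math

def expected_lr(step, base_lr=1e-4, warmup_steps=1000, t_total=10000,
                warmup_multiplier=0.01, end_lr=0.0, cycles=0.5):
    # Sketch of my mental model of the schedule, not the MONAI implementation.
    if step < warmup_steps:
        # Linear warmup from base_lr * warmup_multiplier up to base_lr.
        frac = step / max(1, warmup_steps)
        return base_lr * (warmup_multiplier + (1 - warmup_multiplier) * frac)
    # Cosine decay from base_lr down to end_lr.
    progress = (step - warmup_steps) / max(1, t_total - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * 2.0 * cycles * progress))
    return end_lr + (base_lr - end_lr) * cosine

print(expected_lr(0))      # 1e-06 (= 1e-4 * 0.01)
print(expected_lr(1000))   # 1e-04 (peak = optimizer lr)
print(expected_lr(10000))  # 0.0   (= end_lr)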
I checked the MONAI docs and saw that it is based on HuggingFace's cosine warmup scheduler, but I want to confirm the expected usage.
Thanks.