I am training a PyTorch segmentation model and using:

  • torch.optim.AdamW

  • monai.optimizers.WarmupCosineSchedule

My optimizer:

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    weight_decay=1e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
)

My scheduler:

from monai.optimizers import WarmupCosineSchedule

scheduler = WarmupCosineSchedule(
    optimizer=optimizer,
    warmup_steps=1000,
    t_total=10000,
    end_lr=0.0,
    cycles=0.5,
    warmup_multiplier=0.01,
)

My training loop:

for batch in train_loader:
    images, labels = batch  # unpack inputs and targets from the batch

    optimizer.zero_grad()

    outputs = model(images)
    loss = criterion(outputs, labels)

    loss.backward()
    optimizer.step()
    scheduler.step()

My first question: since this scheduler is step-based (similar to the HuggingFace warmup cosine schedules), is it correct to call:

scheduler.step()

once per batch / optimizer update step, not once per epoch?
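
For context, this is the reasoning behind my warmup_steps=1000 / t_total=10000 values, assuming the per-step interpretation is correct (num_epochs and the 10% warmup fraction are just placeholders of mine, not values from the docs):

num_epochs = 100
steps_per_epoch = len(train_loader)          # one optimizer update per batch
total_steps = num_epochs * steps_per_epoch   # what I pass as t_total
warmup_steps = int(0.1 * total_steps)        # e.g. 10% of updates for warmup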

My second question: does WarmupCosineSchedule use the learning rate passed to AdamW (lr=1e-4) as the main / peak learning rate, i.e. the value reached at the end of the warmup phase?

Meaning:

  • start warmup from lr * warmup_multiplier

  • increase toward 1e-4

  • then cosine decay to end_lr

So is the optimizer lr the reference maximum LR used by the scheduler?
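
If it helps, this is the kind of sanity check I would run to confirm the peak, reusing the exact constructor arguments from my snippet above (the single dummy parameter is just a placeholder):

import torch
from monai.optimizers import WarmupCosineSchedule

dummy_opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-4)
sched = WarmupCosineSchedule(
    optimizer=dummy_opt,
    warmup_steps=1000,
    t_total=10000,
    end_lr=0.0,
    cycles=0.5,
    warmup_multiplier=0.01,
)

lrs = []
for _ in range(10000):
    dummy_opt.step()
    sched.step()
    lrs.append(dummy_opt.param_groups[0]["lr"])

print("max LR:  ", max(lrs))   # I expect ~1e-4 if the optimizer lr is the peak
print("final LR:", lrs[-1])    # and this to approach end_lr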

I checked the MONAI docs and saw it is based on HuggingFace’s cosine warmup scheduler, but I want to confirm the expected usage.

Thanks.