
‘Gradient will not be updated but be accumulated, and updated every N rounds.’ My question is how the gradients are accumulated in the code snippet below: in every iteration of the loop a new gradient is computed by loss.backward() and stored internally, but is this internally stored gradient overwritten in the next iteration? How is the gradient summed up, and how is it then applied every N rounds?

for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset accumulated gradients

1 Answer


The first time you call backward, the .grad attribute of your model's parameters is updated from None to the gradients. If you do not reset the gradients to zero, future calls to .backward() will accumulate (i.e., add) gradients into that attribute (see the docs).

When you call model.zero_grad() you are doing the reset.
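
As a minimal sketch of the add-not-set behaviour (not from the original answer; the single parameter w and the toy losses here are made up for illustration):

import torch

w = torch.tensor([2.0], requires_grad=True)   # a single made-up trainable parameter

loss = (3 * w).sum()
loss.backward()
print(w.grad)      # tensor([3.]) -- first backward fills .grad (it was None before)

loss = (5 * w).sum()
loss.backward()
print(w.grad)      # tensor([8.]) -- second backward ADDS 5 on top of the stored 3

w.grad = None      # playing the role of model.zero_grad() in the training loop
loss = (7 * w).sum()
loss.backward()
print(w.grad)      # tensor([7.]) -- after the reset, accumulation starts fresh

So in the loop from the question, optimizer.step() sees the sum of the last accumulation_steps per-batch gradients (each already divided by accumulation_steps), and the reset is only performed once every N iterations.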

  • Thanks a lot! I thought it was 'set', not 'add'.
    – Lin Zhu
    Commented Mar 21, 2021 at 23:10

