
‘Gradient will not be updated but be accumulated, and updated every N rounds.’ My question is how the gradients are accumulated in the code snippet below: in every iteration of the loop a new gradient is computed by loss.backward() and stored internally, but is this internally stored gradient overwritten in the next iteration? How is the gradient summed up, and how is it then applied every N rounds?

for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset accumulated gradients

1 Answer


The first time you call backward, the .grad attribute of your model's parameters is updated from None to the gradients. If you do not reset the gradients to zero, future calls to .backward() will accumulate (i.e., add) gradients into that attribute (see the docs).

When you call model.zero_grad() you are doing the reset.
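
As a minimal sketch of the add-not-set behaviour (not from the original answer; the single parameter w and the toy losses here are made up for illustration):

import torch

w = torch.tensor([2.0], requires_grad=True)   # a single made-up trainable parameter

loss = (3 * w).sum()
loss.backward()
print(w.grad)      # tensor([3.]) -- first backward fills .grad (it was None before)

loss = (5 * w).sum()
loss.backward()
print(w.grad)      # tensor([8.]) -- second backward ADDS 5 on top of the stored 3

w.grad = None      # playing the role of model.zero_grad() in the training loop
loss = (7 * w).sum()
loss.backward()
print(w.grad)      # tensor([7.]) -- after the reset, accumulation starts fresh

So in the loop from the question, optimizer.step() sees the sum of the last accumulation_steps per-batch gradients (each already divided by accumulation_steps), and the reset is only performed once every N iterations.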

  • Thanks a lot! I thought it was 'set', not 'add'.
    – Lin Zhu
    Commented Mar 21, 2021 at 23:10

