Timeline for answer to How to do gradient clipping in pytorch? by Rahul
Current License: CC BY-SA 4.0
Post Revisions
8 events
| when | what | by | license | comment |
|---|---|---|---|---|
| May 27, 2022 at 20:49 | comment added | russian_spy | | For "args.clip" you can use 0.01; e.g., `torch.nn.utils.clip_grad_norm_(model.parameters(), 0.01)` |
| Mar 29, 2022 at 2:49 | history edited | Mateen Ulhaq | CC BY-SA 4.0 | Move link inline. |
| Jan 28, 2022 at 6:45 | comment added | vdi | | @FarhangAmaji the max_norm (clipping threshold) value from the args (perhaps from the argparse module) |
| Jan 21, 2022 at 20:02 | comment added | Charlie Parker | | Does it matter if you call opt.zero_grad() before the forward pass or not? My guess is that the sooner it's zeroed out, perhaps the sooner memory freeing happens? |
| Dec 3, 2021 at 11:45 | comment added | Farhang Amaji | | What is args.clip? |
| Oct 29, 2020 at 15:33 | comment added | Rahul | | This simply follows a popular pattern, where one can insert `torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)` between the loss.backward() and optimizer.step() |
| Oct 28, 2020 at 11:26 | comment added | Gulzar | | Why is this more complete? I see the more votes, but don't really understand why this is better. Can you explain please? |
| May 10, 2019 at 1:12 | history answered | Rahul | CC BY-SA 4.0 | |