Timeline for answer to How to do gradient clipping in pytorch? by Rahul

Current License: CC BY-SA 4.0

Post Revisions

8 events
May 27, 2022 at 20:49 comment added russian_spy For "args.clip" you can use 0.01; e.g., torch.nn.utils.clip_grad_norm_(model.parameters(), 0.01)
Mar 29, 2022 at 2:49 history edited Mateen Ulhaq CC BY-SA 4.0 (edit comment: "Move link inline.")
Jan 28, 2022 at 6:45 comment added vdi @FarhangAmaji It is the max_norm (clipping threshold) value taken from the args (likely parsed via the argparse module)
Jan 21, 2022 at 20:02 comment added Charlie Parker Does it matter whether you call opt.zero_grad() before or after the forward pass? My guess is that the sooner the gradients are zeroed out, the sooner their memory can be freed?
Dec 3, 2021 at 11:45 comment added Farhang Amaji what is args.clip?
Oct 29, 2020 at 15:33 comment added Rahul This simply follows a popular pattern, where one inserts torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip) between the loss.backward() and optimizer.step() calls
Oct 28, 2020 at 11:26 comment added Gulzar Why is this more complete? I see it has more votes, but I don't really understand why it is better. Could you explain, please?
May 10, 2019 at 1:12 history answered Rahul CC BY-SA 4.0
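
The pattern discussed in the comments above can be sketched as a minimal training step. The model, optimizer, data, and the 0.01 threshold (standing in for args.clip, which russian_spy suggests and which would normally come from argparse) are illustrative assumptions, not part of the original answer:

```python
import torch

# Tiny model, optimizer, and data purely for illustration (assumptions).
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
x = torch.randn(32, 10)
y = torch.randn(32, 1)

max_norm = 0.01  # the value args.clip would carry, e.g. from argparse

optimizer.zero_grad()            # clear gradients from the previous step
loss = loss_fn(model(x), y)
loss.backward()                  # compute gradients
# Clip in place so the total gradient norm is at most max_norm,
# inserted between loss.backward() and optimizer.step() as described:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
optimizer.step()                 # apply the (clipped) gradients
```

After the clip_grad_norm_ call, the 2-norm of all gradients taken together is at most max_norm; optimizer.step() then applies those scaled-down gradients.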