I have some PyTorch code that demonstrates the gradient calculation within PyTorch, but I am thoroughly confused about what gets calculated and how it is used. This post demonstrates its usage, but it does not make sense to me in terms of the backpropagation algorithm. Looking at the gradients of in1 and in2 in the example below, I see that each gradient is the derivative of the loss with respect to the input, but my understanding is that the update also needs to account for the actual loss value. Where does the loss value get used? Am I missing something here?
import torch

in1 = torch.randn(2, 2, requires_grad=True)
in2 = torch.randn(2, 2, requires_grad=True)
target = torch.randn(2, 2)

l1 = torch.nn.L1Loss()
l2 = torch.nn.MSELoss()

out1 = l1(in1, target)  # scalar L1 loss (mean reduction)
out2 = l2(in2, target)  # scalar MSE loss (mean reduction)

out1.backward()  # fills in1.grad with d(out1)/d(in1)
out2.backward()  # fills in2.grad with d(out2)/d(in2)

print(in1.grad)  # sign(in1 - target) / 4 for L1Loss
print(in2.grad)  # 2 * (in2 - target) / 4 for MSELoss
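For context, the kind of update step I have in mind is the sketch below (the learning rate lr is just an assumed value for illustration). Only the gradient appears in it, never the loss value out1 itself, which is exactly what confuses me:

# Minimal gradient-descent update sketch; lr is an assumption.
# Only in1.grad is used here; the loss value out1 never appears.
lr = 0.01
with torch.no_grad():
    in1 -= lr * in1.grad
    in1.grad.zero_()  # clear the gradient before the next backward()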