I am running inference on a trained PyTorch model with the same input tensor, fixed random seeds, and the model in evaluation mode:
import torch

# Seed the CPU RNG and all CUDA device RNGs
torch.manual_seed(42)
torch.cuda.manual_seed_all(42)

# model is a trained nn.Module loaded earlier; eval() disables dropout and batch-norm updates
model.eval()
Despite this, repeated inference calls produce slightly different outputs at the floating-point level.
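For concreteness, here is roughly how I observe the drift (a minimal sketch; model and x are stand-ins for my actual network and input batch):

with torch.no_grad():
    out1 = model(x)
    out2 = model(x)

# Bitwise equality fails; the max absolute difference is small but nonzero
print(torch.equal(out1, out2))
print((out1 - out2).abs().max().item())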
Question
Which PyTorch or CUDA operations are non-deterministic during inference, and what exact configuration is required to guarantee deterministic results across runs?
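For what it's worth, the commonly suggested cuDNN flags look like the sketch below, but I have not been able to confirm whether these two settings alone cover every non-deterministic kernel during inference:

# Often-recommended cuDNN settings (unclear if sufficient on their own)
torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # stop the auto-tuner from choosing different kernels per run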