All Questions

-4 votes
0 answers
30 views

Common practices to mitigate accuracy plateauing at baseline? [closed]

I'm training a deep neural network to detect diabetic retinopathy using EfficientNet-B0, training only the classifier layer with the conv layers frozen. Initially, to mitigate the class imbalance, I ...
Abas jama
0 votes
0 answers
40 views

CUDA stream sync issue in custom activation offloader

I am trying to build an activation offloader, taking inspiration from autograd's save_on_cpu function. To add more asynchrony, I'm trying to offload activations using a separate CUDA ...
Vatsal Joshi
-1 votes
0 answers
29 views

Why does introducing dimension interactions via dot product reduce model performance?

I'm training a deep learning model where my original architecture involves an element-wise product between vectors. Specifically, the computation is straightforward: # Original computation (element-...
KuanLun
1 vote
0 answers
40 views

SGLang server fails on launch due to cuda_fp8.h missing even with --disable-cuda-graph and prebuilt flashinfer wheel

I am trying to launch the SGLang server using python -m sglang.launch_server ... but it consistently fails during the "Capture cuda graph" phase. The error indicates a failure during Just-In-...
Charlie Parker
1 vote
1 answer
40 views

How to perform global structured pruning in PyTorch

I am training the following simple CNN on CIFAR10: class SimpleCNN(nn.Module): def __init__(self): super().__init__() self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1) ...
Noumeno • 171
0 votes
1 answer
60 views

Image rotation: model for angle detection using PyTorch

Basically, I am trying to create a model that will detect the angle by which a specific image is rotated. I also have a dataset of 1500 documents, resulting in images rotated by random.sample([0, 90, -90, ...
Ivan • 11
1 vote
0 answers
136 views

Duplicate GPU detected : rank 0 and rank 1 both on CUDA device 40

I am trying to do QLoRA+FSDP2 on Kaggle using 2x T4 GPUs. This is my training script: def train_fsdp(rank, size): torch.cuda.empty_cache() base_model = "meta-llama/Llama-3.1-8B" ...
Arush Sharma
0 votes
0 answers
64 views

How to apply backward warp (PyTorch's grid_sample) with forward optical flow?

I have been working on optical flow algorithms recently and have been using PyTorch to apply the optical flow field. I have noticed that most libraries have implemented only the backward warp function ...
medfizz
0 votes
1 answer
45 views

TypeError: Dataloader object is not subscriptable

I'm creating an AI model to generate density plots of crowds. When splitting the dataset into two, one for training and one for validation, I create the two data sets and try to load the datasets ...
Tan • 21
2 votes
0 answers
33 views

Conversion of model weights from old Keras version to PyTorch

I want to transfer pretrained weights from an old project on GitHub: https://github.com/ajgallego/staff-lines-removal The original Keras model code is: def get_keras_autoencoder(self, input_size=256, ...
MaxC2 • 373
0 votes
1 answer
62 views

Using zip() on two nn.ModuleList

Is using two different nn.ModuleList() zipped lists correct to build the computational graph for training a neural net in PyTorch? nn.ModuleList is a wrapper around Python's list with a registration ...
Ivan Tishchenko
0 votes
2 answers
90 views

Regression fails with poor initial guess [closed]

Consider a regression task where the parameters of the model differ significantly in magnitude, say: def func(x, p): p1, p2, p3 = p return np.sin(p1*x) * np.exp(p2*x) * p3 # True Parameters: ...
TsurumiTei
0 votes
0 answers
29 views

Data loading for neural network memory optimization

def load_training_data(pgn_file = 'lichess_elite_2022-02.pgn', max_games = 140000): data = [] with open(pgn_file) as a: for i in range(max_games): game = chess.pgn.read_game(a) ...
Azazo8 • 3
0 votes
1 answer
85 views

How to suppress warnings from model.load_state_dict() in PyTorch?

I'm loading a model checkpoint using model.load_state_dict(state_dict, strict=False) because the model architecture does not fully match the weights. As expected, this results in a warning message ...
KuanLun
0 votes
0 answers
35 views

How to update model weights from a custom arbitrary loss function in PyTorch?

Starting with the end in mind: Is there any way I can write that arbitrary loss function by calling some PyTorch functions, thus preserving the autograd graph? How can I ensure my loss function is ...
Alexandre Mahdhaoui
