Questions tagged [pytorch]
For conceptual questions involving the PyTorch library. Note that purely programming questions are off-topic here.
273 questions
1 vote · 0 answers · 52 views
How can I make an LSTM model more efficient on multivariate sine wave time series data?
I am trying to figure out why my first multivariate LSTM model isn't more efficient.
It receives 2 time-series inputs and I would like it to estimate the hidden 0/1 binary state (my target). I have ...
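A minimal PyTorch sketch of the setup this question describes (two input series, a per-timestep 0/1 target): the layer sizes and names below are illustrative assumptions, not the asker's actual model.

```python
import torch
import torch.nn as nn

class BinaryStateLSTM(nn.Module):
    """Sketch: 2 time-series inputs -> per-timestep binary-state logit."""
    def __init__(self, n_features=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                   # x: (batch, seq_len, 2)
        out, _ = self.lstm(x)               # (batch, seq_len, hidden)
        return self.head(out).squeeze(-1)   # logits: (batch, seq_len)

model = BinaryStateLSTM()
x = torch.randn(4, 50, 2)
logits = model(x)
# BCEWithLogitsLoss pairs raw logits with the 0/1 hidden-state target
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (4, 50)).float())
```

Using logits plus `BCEWithLogitsLoss` (rather than a sigmoid plus `BCELoss`) is the numerically stable choice for a binary per-timestep target.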
1 vote · 0 answers · 48 views
Can’t reproduce perplexity distributions/peak in synthetic self-training (‘model collapse’) experiment (OPT-125m)
I investigate this paper: https://www.nature.com/articles/s41586-024-07566-y
They published some code here: https://zenodo.org/records/10866595
My generated text collapses across generations, but my ...
4 votes · 0 answers · 87 views
Identifying and Reducing Persistent Artifacts in Neural Vocoders
I’m evaluating a few neural vocoders (HiFi-GAN, Vocos, etc.) and I’m seeing the same type of artifact across all of them. I’m not sure exactly what it is, and I’m looking for help identifying it and ...
1 vote · 2 answers · 143 views
What is the exact difference between a fully recurrent network (RNN) and an Elman Network?
What is the exact difference between a fully recurrent network (RNN) and an Elman Network? My lecture notes define the Elman Network as
\begin{align}
\textbf{s}(t) &= \textbf{W} \textbf{x}(t) + \textbf{a}(t-...
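For context, a commonly used textbook form of the Elman network is the following (the symbols here are one standard convention and not necessarily the continuation of the truncated notes above):

\begin{align}
\textbf{s}(t) &= \textbf{W}\,\textbf{x}(t) + \textbf{R}\,\textbf{a}(t-1) + \textbf{b}, \\
\textbf{a}(t) &= f\big(\textbf{s}(t)\big), \\
\textbf{y}(t) &= g\big(\textbf{V}\,\textbf{a}(t) + \textbf{c}\big).
\end{align}

The Elman form restricts recurrence to a copy of the previous hidden activations (the context units), whereas a fully recurrent network allows arbitrary recurrent connections among all units.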
2 votes · 2 answers · 212 views
If LLMs like OpenAI / DeepSeek / Gemini exist, why do we still need ML or NLP libraries, now and in the future?
I’m new to AI and NLP, and I’m trying to understand how different tools fit together.
Large Language Models (LLMs) like OpenAI's, DeepSeek, or Gemini can already handle many NLP tasks, such as text ...
0 votes · 0 answers · 22 views
Preventing GPU memory leak due to a custom neural network layer
I am using the MixStyle methodology for domain adaptation and it involves using a custom layer which is inserted after every encoder stage. However, it is causing VRAM to grow linearly, which causes ...
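A common cause of linearly growing VRAM with custom layers is caching tensors that still carry autograd history, which pins every forward graph in memory. The sketch below is a hedged, MixStyle-like layer (not the reference implementation) showing the pattern: anything stored beyond `forward()` is `.detach()`-ed.

```python
import torch
import torch.nn as nn

class MixStyleLike(nn.Module):
    """Hedged sketch of a MixStyle-style feature-statistics mixing layer.
    Memory-relevant point: tensors kept on `self` must be detached,
    otherwise they retain the whole autograd graph and VRAM grows."""
    def __init__(self, alpha=0.1):
        super().__init__()
        self.beta = torch.distributions.Beta(alpha, alpha)
        self.last_stats = None  # kept only for diagnostics

    def forward(self, x):                          # x: (B, C, H, W)
        if not self.training:
            return x
        mu = x.mean(dim=(2, 3), keepdim=True)
        sig = x.std(dim=(2, 3), keepdim=True) + 1e-6
        # store stats WITHOUT graph history; storing `mu` directly would leak
        self.last_stats = (mu.detach(), sig.detach())
        lam = self.beta.sample((x.size(0), 1, 1, 1)).to(x.device)
        perm = torch.randperm(x.size(0), device=x.device)
        mu_mix = lam * mu + (1 - lam) * mu[perm]
        sig_mix = lam * sig + (1 - lam) * sig[perm]
        return sig_mix * (x - mu) / sig + mu_mix

layer = MixStyleLike()
layer.train()
y = layer(torch.randn(4, 3, 8, 8))
```

The same applies to lists of losses or activations accumulated across batches: append `loss.item()` or `t.detach()`, never the live tensor.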
1 vote · 1 answer · 44 views
FastAI results not reproducible (image classification)
I'm trying to reproduce the image classification results obtained with FastAI using a plain PyTorch script and cannot achieve the same numbers. FastAI performs significantly better, so I feel there's ...
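For comparisons like this, a standard first step is pinning down all sources of randomness. The helper below is one common checklist; in practice FastAI-vs-plain-PyTorch gaps usually also come from augmentations, LR schedule, and weight initialization, not seeding alone.

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and (all) torch RNGs and force deterministic cuDNN."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)       # no-op without CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

`DataLoader` workers need seeding too (via `worker_init_fn` and a seeded `generator=`) for batch order to match across runs.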
0 votes · 1 answer · 135 views
Why does YOLOv7 export to ONNX produce raw logits instead of normalized confidence scores?
I trained a YOLOv7 single-class model for rodent detection (class: rat) and exported it to ONNX using the standard export.py provided in the official YOLOv7 repo.
...
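Exported detection graphs often stop at the raw head outputs, leaving post-processing to the caller. As a hedged sketch (assuming the objectness/class channels of the exported head are raw logits; box coordinates need their own decoding), the standard fix is applying a sigmoid before thresholding:

```python
import numpy as np

def to_confidence(raw: np.ndarray) -> np.ndarray:
    """Map raw detection-head logits to (0, 1) confidence scores via sigmoid."""
    return 1.0 / (1.0 + np.exp(-raw))

conf = to_confidence(np.array([0.0, 10.0, -10.0]))
```

After this, the usual objectness-times-class-probability product and NMS apply as in the original Python inference code.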
0 votes · 0 answers · 81 views
Denoising diffusion model is not denoising
I'm starting in the world of denoising diffusion models, so to get used to them I've started with something easy: the MNIST dataset, and fully connected layers for the UNet architecture. I know that ...
0 votes · 0 answers · 41 views
Further fine-tuning a classification LLM
I'm working with a Lilt model, fine-tuned for classification of text within a document. I now want to further fine-tune this already fine-tuned model on additional labeled data. The goal is to adapt ...
1 vote · 1 answer · 162 views
Ensure Monotonic Output in a Neural Network with Variable-Length Sequence Input
I'm designing a neural network that takes input of shape (batch_size, seq_len, features), where seq_len can vary between samples....
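One standard construction for this problem (a sketch, not the asker's architecture) is to predict non-negative per-step increments and take their cumulative sum, which guarantees a non-decreasing output for any sequence length:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicHead(nn.Module):
    """Sketch: enforce a non-decreasing per-timestep output by predicting
    non-negative increments (softplus) and cumulatively summing them.
    Padding positions can be zeroed via a mask before the cumsum."""
    def __init__(self, features: int, hidden: int = 32):
        super().__init__()
        self.rnn = nn.GRU(features, hidden, batch_first=True)
        self.inc = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (B, T, F)
        h, _ = self.rnn(x)
        steps = F.softplus(self.inc(h)).squeeze(-1)  # >= 0 per timestep
        return torch.cumsum(steps, dim=1)    # non-decreasing along T

head = MonotonicHead(3)
y = head(torch.randn(2, 7, 3))
```

Monotonicity here is architectural, so it holds exactly at inference time rather than being encouraged by a penalty term.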
1 vote · 1 answer · 135 views
DQN is not learning in Atari Pong environment and I can't figure out where I'm messing up
I'm trying to implement the findings from this DeepMind DQN paper (2015) from scratch in PyTorch using the Atari Pong environment.
I've tested my Deep Q-Network on a simple test environment, where ...
0 votes · 0 answers · 26 views
Efficient way to ensure high data coverage in stochastic minibatch sampling for a GNN, while minimizing train time?
I am training a Graph Neural Network for inductive link prediction. The final objective is to predict links for unseen nodes. My neural network follows the general GraphSAGE pipeline but I have ...
0 votes · 0 answers · 57 views
BERT Adapter and LoRA for Multi-Label Classification (301 classes)
I’m working on a multi-label classification task with 301 labels. I’m using a BERT model with Adapters and LoRA. My dataset is relatively large (~1.5M samples), but I reduced it to around 1.1M to ...
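Whatever the encoder setup (BERT with Adapters/LoRA here), the multi-label part reduces to a 301-logit head trained with `BCEWithLogitsLoss` rather than softmax cross-entropy. A hedged sketch, with the hidden size assumed to be BERT-base's 768:

```python
import torch
import torch.nn as nn

NUM_LABELS = 301                        # from the question
hidden_size = 768                       # typical BERT-base width (assumption)

head = nn.Linear(hidden_size, NUM_LABELS)
criterion = nn.BCEWithLogitsLoss()      # pos_weight= can counter label imbalance

pooled = torch.randn(8, hidden_size)    # stand-in for the encoder's [CLS] output
logits = head(pooled)                   # (8, 301), one independent logit per label
targets = torch.randint(0, 2, (8, NUM_LABELS)).float()
loss = criterion(logits, targets)
probs = torch.sigmoid(logits)           # threshold per label, e.g. > 0.5
```

Each label gets an independent sigmoid, so a sample can carry any subset of the 301 labels; that independence is what distinguishes multi-label from multi-class.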
4 votes · 1 answer · 154 views
Can torch use NN optimization algorithms other than gradient descent?
PyTorch has a quite sophisticated autograd system. Essentially, it tracks which tensor was built from which one. That is fine, as long as it can be applied to the problem.
However, in the case of my ...
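On the general point: autograd only supplies gradients, and the update rule itself is pluggable. PyTorch ships non-SGD-style optimizers too, e.g. the quasi-Newton `torch.optim.LBFGS`, which re-evaluates the loss through a closure. A toy example on a quadratic:

```python
import torch

# Minimize ||x - 3||^2 with L-BFGS instead of plain gradient descent.
x = torch.zeros(2, requires_grad=True)
opt = torch.optim.LBFGS([x], max_iter=50)

def closure():
    """L-BFGS calls this repeatedly to re-evaluate loss and gradient."""
    opt.zero_grad()
    loss = ((x - 3.0) ** 2).sum()
    loss.backward()
    return loss

opt.step(closure)   # runs up to max_iter internal iterations
```

For fully gradient-free methods (evolution strategies, Bayesian optimization), external libraries are needed, but anything that can consume autograd's gradients can drive `torch` parameters through this same `step(closure)` interface.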