Questions tagged [pytorch]
For conceptual questions involving the PyTorch library. Note that purely programming questions are off-topic here.
273 questions
1 vote · 0 answers · 52 views
How can I make an LSTM model more efficient on multivariate sine wave time series data?
I am trying to figure out why my first multivariate LSTM model isn't more efficient.
It receives 2 time-series inputs and I would like it to estimate the hidden 0/1 binary state (my target). I have ...
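A minimal PyTorch sketch of the setup this question describes (two input series, a per-timestep 0/1 target): the layer sizes and names below are illustrative assumptions, not the asker's actual model.

```python
import torch
import torch.nn as nn

class BinaryStateLSTM(nn.Module):
    """Sketch: 2 time-series inputs -> per-timestep binary-state logit."""
    def __init__(self, n_features=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                   # x: (batch, seq_len, 2)
        out, _ = self.lstm(x)               # (batch, seq_len, hidden)
        return self.head(out).squeeze(-1)   # logits: (batch, seq_len)

model = BinaryStateLSTM()
x = torch.randn(4, 50, 2)
logits = model(x)
# BCEWithLogitsLoss pairs raw logits with the 0/1 hidden-state target
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (4, 50)).float())
```

Using logits plus `BCEWithLogitsLoss` (rather than a sigmoid plus `BCELoss`) is the numerically stable choice for a binary per-timestep target.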
1 vote · 0 answers · 48 views
Can’t reproduce perplexity distributions/peak in synthetic self-training (‘model collapse’) experiment (OPT-125m)
I investigate this paper: https://www.nature.com/articles/s41586-024-07566-y
They published some code here: https://zenodo.org/records/10866595
My generated text collapses across generations, but my ...
4 votes · 0 answers · 87 views
Identifying and Reducing Persistent Artifacts in Neural Vocoders
I’m evaluating a few neural vocoders (HiFi-GAN, Vocos, etc.) and I’m seeing the same type of artifact across all of them. I’m not sure exactly what it is, and I’m looking for help identifying it and ...
1 vote · 2 answers · 143 views
What is the exact difference between a fully recurrent network (RNN) and an Elman Network?
What is the exact difference between a fully recurrent network (RNN) and an Elman Network? My lecture notes define the Elman Network as
\begin{align}
\textbf{s}(t) &= \textbf{W} \textbf{x}(t) + \textbf{a}(t-...
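For context, a commonly used textbook form of the Elman network is the following (the symbols here are one standard convention and not necessarily the continuation of the truncated notes above):

\begin{align}
\textbf{s}(t) &= \textbf{W}\,\textbf{x}(t) + \textbf{R}\,\textbf{a}(t-1) + \textbf{b}, \\
\textbf{a}(t) &= f\big(\textbf{s}(t)\big), \\
\textbf{y}(t) &= g\big(\textbf{V}\,\textbf{a}(t) + \textbf{c}\big).
\end{align}

The Elman form restricts recurrence to a copy of the previous hidden activations (the context units), whereas a fully recurrent network allows arbitrary recurrent connections among all units.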
2 votes · 2 answers · 212 views
If LLMs like OpenAI / DeepSeek / Gemini exist, why do we still need ML or NLP libraries, now and in the future?
I’m new to AI and NLP, and I’m trying to understand how different tools fit together.
Large Language Models (LLMs) like OpenAI's, DeepSeek, or Gemini can already handle many NLP tasks, such as text ...
0 votes · 0 answers · 22 views
Preventing GPU memory leak due to a custom neural network layer
I am using the MixStyle methodology for domain adaptation and it involves using a custom layer which is inserted after every encoder stage. However, it is causing VRAM to grow linearly, which causes ...
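A common cause of linearly growing VRAM with custom layers is caching tensors that still carry autograd history, which pins every forward graph in memory. The sketch below is a hedged, MixStyle-like layer (not the reference implementation) showing the pattern: anything stored beyond `forward()` is `.detach()`-ed.

```python
import torch
import torch.nn as nn

class MixStyleLike(nn.Module):
    """Hedged sketch of a MixStyle-style feature-statistics mixing layer.
    Memory-relevant point: tensors kept on `self` must be detached,
    otherwise they retain the whole autograd graph and VRAM grows."""
    def __init__(self, alpha=0.1):
        super().__init__()
        self.beta = torch.distributions.Beta(alpha, alpha)
        self.last_stats = None  # kept only for diagnostics

    def forward(self, x):                          # x: (B, C, H, W)
        if not self.training:
            return x
        mu = x.mean(dim=(2, 3), keepdim=True)
        sig = x.std(dim=(2, 3), keepdim=True) + 1e-6
        # store stats WITHOUT graph history; storing `mu` directly would leak
        self.last_stats = (mu.detach(), sig.detach())
        lam = self.beta.sample((x.size(0), 1, 1, 1)).to(x.device)
        perm = torch.randperm(x.size(0), device=x.device)
        mu_mix = lam * mu + (1 - lam) * mu[perm]
        sig_mix = lam * sig + (1 - lam) * sig[perm]
        return sig_mix * (x - mu) / sig + mu_mix

layer = MixStyleLike()
layer.train()
y = layer(torch.randn(4, 3, 8, 8))
```

The same applies to lists of losses or activations accumulated across batches: append `loss.item()` or `t.detach()`, never the live tensor.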
1 vote · 1 answer · 44 views
FastAI results not reproducible (image classification)
I'm trying to reproduce the image classification results obtained with FastAI using a plain PyTorch script and cannot achieve the same numbers. FastAI performs significantly better, so I feel there's ...
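For comparisons like this, a standard first step is pinning down all sources of randomness. The helper below is one common checklist; in practice FastAI-vs-plain-PyTorch gaps usually also come from augmentations, LR schedule, and weight initialization, not seeding alone.

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and (all) torch RNGs and force deterministic cuDNN."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)       # no-op without CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

`DataLoader` workers need seeding too (via `worker_init_fn` and a seeded `generator=`) for batch order to match across runs.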
0 votes · 1 answer · 135 views
Why does YOLOv7 export to ONNX produce raw logits instead of normalized confidence scores?
I trained a YOLOv7 single-class model for rodent detection (class: rat) and exported it to ONNX using the standard export.py provided in the official YOLOv7 repo.
...
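Exported detection graphs often stop at the raw head outputs, leaving post-processing to the caller. As a hedged sketch (assuming the objectness/class channels of the exported head are raw logits; box coordinates need their own decoding), the standard fix is applying a sigmoid before thresholding:

```python
import numpy as np

def to_confidence(raw: np.ndarray) -> np.ndarray:
    """Map raw detection-head logits to (0, 1) confidence scores via sigmoid."""
    return 1.0 / (1.0 + np.exp(-raw))

conf = to_confidence(np.array([0.0, 10.0, -10.0]))
```

After this, the usual objectness-times-class-probability product and NMS apply as in the original Python inference code.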
0 votes · 0 answers · 81 views
Denoising diffusion model is not denoising
I'm starting in the world of denoising diffusion models, so to get used to them I've started with something easy: the MNIST dataset, and fully connected layers for the UNet architecture. I know that ...
0 votes · 0 answers · 41 views
Further fine-tuning a classification LLM
I'm working with a Lilt model, fine-tuned for classification of text within a document. I now want to further fine-tune this already fine-tuned model on additional labeled data. The goal is to adapt ...
1 vote · 1 answer · 162 views
Ensure Monotonic Output in a Neural Network with Variable-Length Sequence Input
I'm designing a neural network that takes input of shape (batch_size, seq_len, features), where seq_len can vary between samples....
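One standard construction for this problem (a sketch, not the asker's architecture) is to predict non-negative per-step increments and take their cumulative sum, which guarantees a non-decreasing output for any sequence length:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicHead(nn.Module):
    """Sketch: enforce a non-decreasing per-timestep output by predicting
    non-negative increments (softplus) and cumulatively summing them.
    Padding positions can be zeroed via a mask before the cumsum."""
    def __init__(self, features: int, hidden: int = 32):
        super().__init__()
        self.rnn = nn.GRU(features, hidden, batch_first=True)
        self.inc = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (B, T, F)
        h, _ = self.rnn(x)
        steps = F.softplus(self.inc(h)).squeeze(-1)  # >= 0 per timestep
        return torch.cumsum(steps, dim=1)    # non-decreasing along T

head = MonotonicHead(3)
y = head(torch.randn(2, 7, 3))
```

Monotonicity here is architectural, so it holds exactly at inference time rather than being encouraged by a penalty term.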
1 vote · 1 answer · 135 views
DQN is not learning in Atari Pong environment and I can't figure out where I'm messing up
I'm trying to implement the findings from this DeepMind DQN paper (2015) from scratch in PyTorch using the Atari Pong environment.
I've tested my Deep Q-Network on a simple test environment, where ...
0 votes · 0 answers · 26 views
Efficient way to ensure high data coverage in stochastic minibatch sampling for a GNN, while minimizing train time?
I am training a Graph Neural Network for inductive link prediction. The final objective is to predict links for unseen nodes. My neural network follows the general GraphSAGE pipeline but I have ...
0 votes · 0 answers · 57 views
BERT Adapter and LoRA for Multi-Label Classification (301 classes)
I’m working on a multi-label classification task with 301 labels. I’m using a BERT model with Adapters and LoRA. My dataset is relatively large (~1.5M samples), but I reduced it to around 1.1M to ...
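Whatever the encoder setup (BERT with Adapters/LoRA here), the multi-label part reduces to a 301-logit head trained with `BCEWithLogitsLoss` rather than softmax cross-entropy. A hedged sketch, with the hidden size assumed to be BERT-base's 768:

```python
import torch
import torch.nn as nn

NUM_LABELS = 301                        # from the question
hidden_size = 768                       # typical BERT-base width (assumption)

head = nn.Linear(hidden_size, NUM_LABELS)
criterion = nn.BCEWithLogitsLoss()      # pos_weight= can counter label imbalance

pooled = torch.randn(8, hidden_size)    # stand-in for the encoder's [CLS] output
logits = head(pooled)                   # (8, 301), one independent logit per label
targets = torch.randint(0, 2, (8, NUM_LABELS)).float()
loss = criterion(logits, targets)
probs = torch.sigmoid(logits)           # threshold per label, e.g. > 0.5
```

Each label gets an independent sigmoid, so a sample can carry any subset of the 301 labels; that independence is what distinguishes multi-label from multi-class.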
4 votes · 1 answer · 154 views
Can torch use NN optimization algorithms other than gradient descent?
PyTorch has a quite sophisticated autograd system. Essentially, it tracks which tensor was built from which one. That is fine, as long as it can be applied to the problem.
However, in the case of my ...
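On the general point: autograd only supplies gradients, and the update rule itself is pluggable. PyTorch ships non-SGD-style optimizers too, e.g. the quasi-Newton `torch.optim.LBFGS`, which re-evaluates the loss through a closure. A toy example on a quadratic:

```python
import torch

# Minimize ||x - 3||^2 with L-BFGS instead of plain gradient descent.
x = torch.zeros(2, requires_grad=True)
opt = torch.optim.LBFGS([x], max_iter=50)

def closure():
    """L-BFGS calls this repeatedly to re-evaluate loss and gradient."""
    opt.zero_grad()
    loss = ((x - 3.0) ** 2).sum()
    loss.backward()
    return loss

opt.step(closure)   # runs up to max_iter internal iterations
```

For fully gradient-free methods (evolution strategies, Bayesian optimization), external libraries are needed, but anything that can consume autograd's gradients can drive `torch` parameters through this same `step(closure)` interface.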