All Questions
53 questions
2 votes · 3 answers · 126 views
How does Hydra `_partial_` interact with seeding
In the configuration management library Hydra, it is possible to only partially instantiate classes defined in configuration using the _partial_ keyword. The library explains that this results in a ...
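A minimal sketch of the mechanism in question (config keys illustrative): with _partial_: true, hydra.utils.instantiate returns a functools.partial rather than a constructed object, so any randomness in __init__ runs under whatever seed is active when the partial is finally called.
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {"model": {"_target_": "torch.nn.Linear", "_partial_": True, "in_features": 4}}
)
make_model = instantiate(cfg.model)  # functools.partial(torch.nn.Linear, in_features=4)
model = make_model(out_features=2)   # weights initialized under the RNG state at *this* call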
-2 votes · 1 answer · 238 views
ML model training - system memory usage increasing over epochs
I am implementing a simple MLP on SageMaker, specifically an AWS ml.g4.xlarge instance, and I noticed that RAM usage keeps increasing over epochs. I am using PyTorch Lightning for ML model ...
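Not enough code is shown to pinpoint the leak, but a common cause in PyTorch Lightning is accumulating loss tensors that still carry the autograd graph; a hedged sketch of the usual fix (self.loss_fn and the module body are illustrative):
import pytorch_lightning as pl

class MLP(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self(x), y)  # self.loss_fn is a stand-in
        # Appending `loss` itself to a list retains the whole graph every step
        # and grows memory across epochs; log it or keep a plain float instead.
        self.log("train_loss", loss)
        # self.history.append(loss.item())  # float, no graph attached
        return loss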
0 votes · 0 answers · 54 views
Pruning error: cannot assign 'torch.cuda.FloatTensor' to parameter
I am struggling to work out why I'm getting this error when trying to apply the lottery ticket hypothesis to my model. Clearly this is happening during the pruning callback, and it seems like it's ...
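The traceback isn't shown, but this error text normally means a plain tensor was assigned where torch expects an nn.Parameter; a minimal sketch of the distinction (layer and mask are illustrative):
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
mask = torch.ones_like(layer.weight)
pruned = layer.weight.data * mask
layer.weight = nn.Parameter(pruned)  # OK: wrap the pruned tensor
# layer.weight = pruned              # TypeError: cannot assign ... to parameter
# On a GPU module the rejected tensor is reported as 'torch.cuda.FloatTensor'.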
0 votes · 1 answer · 37 views
Difference between accuracy during training and accuracy during testing
In the model below, the accuracy reported at the end of the final validation stage is 0.46, but the value reported during manual testing is 0.53. What can account for this discrepancy?
import ...
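The model code is truncated, but one frequent source of such a gap is evaluating with the network still in training mode, so dropout and batch-norm behave differently; a hedged sketch of the usual manual-testing pattern (model and data are illustrative):
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.5), nn.Linear(8, 2))
x_test, y_test = torch.randn(64, 8), torch.randint(0, 2, (64,))

model.eval()                       # freeze dropout, use running batch-norm stats
with torch.no_grad():
    preds = model(x_test).argmax(dim=1)
accuracy = (preds == y_test).float().mean()
# Without model.eval(), dropout stays active and the two numbers won't match.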
0 votes · 0 answers · 24 views
Self-supervised model converging to a constant
I was trying to train a Barlow Twins model for image classification. However, I ran into a problem after finishing training: the model seems to have become a constant, always ...
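No code is shown, but collapse to a constant in Barlow Twins usually points at the redundancy-reduction term of the objective; a minimal sketch of the published loss for reference (lam = 5e-3 follows the paper, everything else is illustrative):
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    # z1, z2: embeddings of two augmented views, shape (N, D)
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = (z1.T @ z2) / z1.shape[0]                 # cross-correlation matrix (D, D)
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag               # off-diagonal term fights collapse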
1 vote · 0 answers · 233 views
LSTM: predict_step in PyTorch Lightning
I've developed code for an LSTM model, but I'm uncertain about how to utilize it for predictions in a production environment. Could you please assist? In the provided predict.py script, I aim to ...
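A minimal sketch of the Lightning prediction API the question is about (the LSTM internals are illustrative): trainer.predict drives predict_step over a dataloader and collects the returned values.
import torch
import pytorch_lightning as pl

class LSTMModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
        self.head = torch.nn.Linear(16, 1)

    def predict_step(self, batch, batch_idx, dataloader_idx=0):
        x = batch[0] if isinstance(batch, (tuple, list)) else batch
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict from the last time step

# In predict.py (pred_loader is your DataLoader):
# trainer = pl.Trainer(accelerator="auto")
# preds = trainer.predict(LSTMModel(), dataloaders=pred_loader)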
0 votes · 0 answers · 446 views
UNet PyTorch implementation is too slow
I have a simple TensorFlow-based UNet model used for an optimization program. The program is something like an “untrained” neural net in the sense that once I load input data (experimental low-res images)...
0 votes · 1 answer · 1k views
PyTorch Dataloader - list indices must be integers or slices, not list
I have implemented a COCO dataset as follows:
from torch.utils.data import Dataset
from detr.datasets.coco import CocoDetection
class MyCoco(CocoDetection):
    def __init__(self,
        ...
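That DataLoader error usually means __getitem__ received a list of indices where a single int was expected, e.g. when a BatchSampler is wired in as sampler= instead of batch_sampler=; a hedged sketch that makes the assumption explicit (class body illustrative):
from detr.datasets.coco import CocoDetection  # assumes the DETR repo is importable

class MyCoco(CocoDetection):
    def __getitem__(self, idx):
        # The default sampler passes a single int; a batch sampler passed as
        # `sampler=` hands the dataset a list and triggers "list indices must
        # be integers or slices, not list".
        assert isinstance(idx, int), f"expected int index, got {type(idx)}"
        return super().__getitem__(idx)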
0 votes · 1 answer · 88 views
ValueError: Expected input batch_size (31) to match target batch_size (128)
The error occurred in the code for a spiking neural network. I am using the neuromorphic-MNIST dataset, which is a spiking version of the original MNIST dataset. As libraries I am using tonic (for the ...
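The mismatch means the loss function saw outputs and targets with different leading dimensions; with spiking data the usual culprit is the extra time axis, which has to be reduced before the loss. A hedged sketch (shapes illustrative):
import torch
import torch.nn.functional as F

spikes = torch.rand(25, 128, 10)         # (time_steps, batch, classes)
targets = torch.randint(0, 10, (128,))
logits = spikes.sum(dim=0)               # reduce over time -> (batch, classes)
loss = F.cross_entropy(logits, targets)  # leading dims now match
# Flattening time into batch (e.g. spikes.view(-1, 10)) is what produces
# "Expected input batch_size (N) to match target batch_size (128)".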
2 votes · 0 answers · 691 views
Is MLflow PyTorch Lightning autologging supposed to automatically log metrics?
I am attempting to use MLflow to log a PyTorch Lightning model to experiments in Databricks. According to the documentation, it seems like metrics such as training and validation loss are supposed to be ...
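A minimal sketch of the documented setup (model and datamodule are stand-ins): mlflow.pytorch.autolog() patches the Lightning Trainer, but it only captures metrics the module actually reports via self.log.
import mlflow.pytorch
import pytorch_lightning as pl

mlflow.pytorch.autolog()               # call before creating/fitting the Trainer
trainer = pl.Trainer(max_epochs=5)
with mlflow.start_run():
    trainer.fit(model, datamodule=dm)  # `model` and `dm` are your objects
# Losses that are returned from training_step but never passed to self.log(...)
# will not show up in the MLflow run.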
2 votes · 0 answers · 268 views
PyTorch Lightning - Display per-class metrics (precision, recall, F1) in Trainer.test(model, datamodule)
How would I go about adding per-class metrics inside the test_step method of a PyTorch Lightning module?
I used
self.f1_each = F1Score(task="multiclass", num_classes=self.num_classes,...
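A hedged sketch of one way to do this with torchmetrics (class count and log keys illustrative): average=None makes F1Score return one score per class, which can then be logged individually from test_step.
import pytorch_lightning as pl
from torchmetrics.classification import F1Score

class Module(pl.LightningModule):
    def __init__(self, num_classes=3):
        super().__init__()
        self.num_classes = num_classes
        self.f1_each = F1Score(task="multiclass", num_classes=num_classes, average=None)

    def test_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)                        # forward() assumed to exist
        f1_per_class = self.f1_each(logits, y)  # tensor of shape (num_classes,)
        for i, score in enumerate(f1_per_class):
            self.log(f"test_f1_class_{i}", score)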
2 votes · 1 answer · 761 views
Unable to import the TextDataset class from the aitextgen library
I am trying to build a machine learning model that generates job descriptions by fine-tuning the GPT-Neo model on my dataset of job descriptions.
I tried importing the TextDataset class from aitextgen and ...
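If that import fails, it may be because aitextgen's dataset class is named TokenDataset rather than TextDataset; a hedged sketch of the documented usage (file name and step count illustrative):
from aitextgen import aitextgen
from aitextgen.TokenDataset import TokenDataset

data = TokenDataset("job_descriptions.txt", line_by_line=True)
ai = aitextgen(model="EleutherAI/gpt-neo-125M")
ai.train(data, num_steps=500)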
3 votes · 0 answers · 866 views
PyTorch Lightning manual GPU control
I have a training pipeline with two very big models, such that a single GPU can only fit one of them. I have access to two GPUs, and I decided to put each of the models on its own device. I have ...
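Lightning normally moves the whole LightningModule to a single device, so splitting two large models across GPUs is typically done with explicit placement; a minimal plain-PyTorch sketch of the pattern (layer sizes illustrative):
import torch.nn as nn

class TwoGPUPipeline(nn.Module):
    def __init__(self):
        super().__init__()
        self.model_a = nn.Linear(1024, 1024).to("cuda:0")  # first model on GPU 0
        self.model_b = nn.Linear(1024, 10).to("cuda:1")    # second model on GPU 1

    def forward(self, x):
        h = self.model_a(x.to("cuda:0"))
        return self.model_b(h.to("cuda:1"))  # hand activations across devices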
1 vote · 2 answers · 5k views
Why doesn't PyTorch Lightning module save logged val loss? ModelCheckpoint error
I'm running LSTM-based model training on Kaggle, using PyTorch Lightning with the wandb logger.
This is my model's class:
class Model(pl.LightningModule):
    def __init__(
        self,
        ...
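The usual cause of this ModelCheckpoint error is that the monitored key is never logged; a hedged sketch of the wiring (the key name is arbitrary but must match on both sides, and _shared_step is a hypothetical helper):
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

class Model(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        loss = self._shared_step(batch)  # hypothetical helper
        # Without this call, ModelCheckpoint(monitor="val_loss") fails because
        # the metric it monitors is never recorded.
        self.log("val_loss", loss, on_epoch=True, prog_bar=True)
        return loss

checkpoint = ModelCheckpoint(monitor="val_loss", mode="min")
trainer = pl.Trainer(callbacks=[checkpoint])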
0 votes · 1 answer · 193 views
How to solve my problem with the max_steps parameter in PyTorch?
I'm trying to train the model defined in the source code below.
class mymodel(pl.LightningModule):
    def __init__(self, config, learning_rate=1e-4, max_steps=100000//2):
        super(mymodel, self).__init__()
        self.config ...
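For context, in PyTorch Lightning the step cap is a Trainer argument rather than something the LightningModule consumes itself; a hedged sketch (values illustrative):
import pytorch_lightning as pl

# Training stops at whichever of max_steps / max_epochs is reached first;
# max_epochs=-1 disables the epoch cap so max_steps alone decides.
trainer = pl.Trainer(max_steps=100000 // 2, max_epochs=-1)
# trainer.fit(mymodel(config), datamodule=dm)  # `config` and `dm` are your objects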