609 questions
0 votes, 1 answer, 59 views
How to use NeuralForecast and PyTorch Lightning on Intel GPU (XPU / torch.xpu)?
PyTorch supports Intel GPU through torch.xpu, but PyTorch Lightning does not currently have built-in XPU accelerator support.
Because NeuralForecast uses Lightning under the hood, that also blocks ...
4 votes, 2 answers, 167 views
ModelCheckpoint not saving last validating checkpoint when save_last=True
I am using PyTorch Lightning to train my model. I use the Lightning callback ModelCheckpoint with the following settings:
ModelCheckpoint(
dirpath="path/to/dir",
monitor="...
1 vote, 0 answers, 107 views
Why does ty complain about invalid method override for Lightning Callback hooks?
I’m using PyTorch Lightning and trying to implement a simple callback. The code works at runtime, but the ty type checker reports invalid-method-override errors for on_train_start and on_train_end.
...
0 votes, 1 answer, 141 views
DiffProtect: How to fix "No module named 'numpy.lib.function_base'" when loading PyTorch Lightning model from 2023 checkpoint in Google Colab?
I'm trying to load a pre-trained PyTorch Lightning model from the DiffProtect repository (published in 2023) in Google Colab, but I'm encountering a numpy compatibility error.
Environment:
Google ...
Best practices
0 votes, 0 replies, 33 views
Same data processing for multiple datasets with LightningDataModule
I work with multiple datasets and apply the same preprocessing to each one. A convenient way of working with multiple datasets when using PyTorch is to use the ...
0 votes, 0 answers, 59 views
T5-small generates only padding tokens during validation/test in PyTorch Lightning
I'm fine-tuning T5-small using PyTorch Lightning and encountering a strange issue during validation and test steps.
The Problem:
During validation_step and test_step, model.generate() consistently ...
6 votes, 1 answer, 529 views
Open source AI: How can I use uv or pip to install the *correct* build of PyTorch (CUDA, CPU, ROCm, etc.)?
My project uses PyTorch and Lightning. Since PyTorch is system dependent, users need to install it manually, based on their platform, using the platform-specific pip command provided by the PyTorch ...
1 vote, 1 answer, 123 views
PyTorch Lightning does not terminate on macOS Metal (M4 Max) when num_workers > 0
I've been trying to train some basic models using PyTorch Lightning on an M4 Max Mac Studio. While the training itself goes without a hitch, there appears to be a problem when attempting to terminate ...
0 votes, 0 answers, 104 views
Batching temporal graphs with the PyTorch Geometric data loader
I'm conducting research with temporal graph data using PyTorch Geometric.
I'm running into memory usage issues when building PyG data in dense format (with to_dense_batch() and to_dense_adj()).
I have ...
0 votes, 1 answer, 35 views
RecursionError when using Opacus PrivacyEngine with PyTorch Lightning: maximum recursion depth exceeded
I'm implementing a differentially private recommendation system using PyTorch Lightning and Opacus, but I'm encountering a RecursionError during training. Here's my setup:
Problem
When I run my ...
0 votes, 1 answer, 193 views
KeyError: 'self' in save_hyperparameters() when a custom metaclass is used - PyTorch Lightning
Description
I'm working with LightningDataModule and wanted to ensure that a method (_after_init) runs only once after full initialization, regardless of subclassing. For that, I implemented a custom ...
1 vote, 0 answers, 111 views
Why is the global_step (training step) not in sync with the wandb plot steps?
I'm using a LightningModule with the Lightning Trainer.
I create the trainer with:
trainer = pl.Trainer(max_epochs=3)
Each training epoch has 511 steps (total = 1533) and each validation epoch has 127 steps.
I use ...
1 vote, 1 answer, 123 views
PyTorch Lightning logs separately for train, validation and test datasets
I am trying to log the loss and AUC for all 3 of my datasets - train, validation and test.
The datamodule defines the 3 loaders and I finally invoke the model as:
trainer.fit(model, datamodule)
trainer....
0 votes, 1 answer, 59 views
How to apply min-max scaling on an IterableDataset?
I'm using an IterableDataset because I have massive amounts of data. Since an IterableDataset does not store all data in memory, we cannot directly compute min/max on the entire dataset before ...
2 votes, 3 answers, 458 views
How does Hydra `_partial_` interact with seeding?
In the configuration management library Hydra, it is possible to only partially instantiate classes defined in configuration using the _partial_ keyword. The library explains that this results in a ...