454 questions
0 votes, 1 answer, 177 views
Using DataLoader for efficient model prediction
I'm trying to understand the role/utility of batch_size in torch beyond model training. I already have a trained model, where the batch_size was optimized as a hyperparameter. I want to use the model ...
0 votes, 1 answer, 88 views
What size should an `IterableDataset` report when used in a multi-worker `DataLoader`?
Here's a simple dataset and a data loader that uses it:
import torch
from torch.utils.data import DataLoader, IterableDataset

class Dataset(IterableDataset):
    def __init__(self, size: int):
        ...
0 votes, 0 answers, 104 views
Batching temporal graphs with the PyTorch Geometric data loader
I'm conducting research on temporal graph data using PyTorch Geometric.
I'm running into memory usage issues when converting PyG data to dense format (with to_dense_batch() and to_dense_adj()).
I have ...
1 vote, 1 answer, 77 views
DataLoader workers (num_workers) do not run in parallel
from torch.utils.data import Dataset, DataLoader
import time
import multiprocessing as mp
import torch

class Sleep(Dataset):
    def __len__(self): return 20
    def __getitem__(self, i):
        ...
0 votes, 0 answers, 41 views
Significant overhead when calling DataLoader for a dataset within a FastAPI endpoint using multiprocessing
I am running a machine learning model on a dataset that I have loaded using a torch DataLoader:
class FilesDataset():
    def __init__(self, path):
        file_paths = glob.glob(os.path.join(path, "*....
0 votes, 1 answer, 42 views
Discrepancy in the number of elements returned by a torch Dataset and DataLoader
I have a custom Subset:
class TestSubset2(Subset):
    def __init__(self, dataset, indices, days=False):
        super().__init__(dataset, indices)
        self.days = days
    def __getitem__(self, ...
0 votes, 0 answers, 42 views
PyTorch DataLoader gradually slowing down as training progresses
I noticed my dataset iteration gradually slows down as training progresses. I'm using an A100 Google Colab instance. I removed the model and all the training stuff to try to debug the dataset. With ...
1 vote, 0 answers, 172 views
How to trace PyTorch DataLoader workers with VizTracer?
I'm using VizTracer to debug performance issues in my PyTorch data loading pipeline. Specifically, I'm using a DataLoader with num_workers > 0 to load data in parallel using multiple subprocesses.
...
1 vote, 0 answers, 50 views
Image Tensors Return As Zero When num_workers > 0
I am facing an issue with multiprocessing. I am trying to load my .pt data through DataLoaders. Everything works fine when I set num_workers = 0. But when I set it to a value greater than 0, the tensor ...
0 votes, 1 answer, 57 views
Torch tensor dataloader shape issue
I have a simple application of torch.utils.data.DataLoader that gets a nice performance boost. It's created by the tensor_loader in the following example.
from torch.utils.data import DataLoader, TensorDataset, ...
1 vote, 0 answers, 101 views
Error When Using Batch Size Greater Than 1 in PyTorch
I'm building a neural network to predict how an image will be partitioned during compression using VVC (Versatile Video Coding). The model takes a single Y-frame from a YUV420 image as input and uses ...
1 vote, 0 answers, 108 views
PyTorch Forecasting TimeSeriesDataSet Returns None in DataLoader Batch
I am working with pytorch-forecasting to create a TimeSeriesDataSet where I have 30 target variables that I want to predict.
However, when I pass this dataset to a DataLoader, I encounter an issue:
...
1 vote, 1 answer, 706 views
RuntimeError: Given groups=1, weight of size [64, 3, 3, 7, 7], expected input[1, 8, 3, 112, 112] to have 3 channels, but got 8 channels instead
import os
import shutil
import random
import torch
import torchvision.transforms as transforms
import cv2
import numpy as np
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
...
1 vote, 0 answers, 64 views
How to investigate memory consumption of pytorch_geometric data
I am working on a framework that uses pytorch_geometric graph data stored in the usual way in data.x and data.edge_index. Additionally, the data loading process appends multiple other keys to that data ...
0 votes, 1 answer, 59 views
How to apply min-max scaling on an IterableDataset?
I'm using an IterableDataset because I have massive amounts of data. Since an IterableDataset does not store all data in memory, we cannot directly compute min/max on the entire dataset before ...
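One common way to approach the min-max question above is a two-pass scheme: stream over the data once to track the running minimum and maximum, then stream again and scale each value. The sketch below is not taken from the question or its answer. It uses plain Python generators as a stand-in for an IterableDataset, and the names make_stream, streaming_min_max, and min_max_scaled are hypothetical. It assumes the stream can be re-created for a second pass and yields scalar values.

```python
from typing import Iterable, Iterator, Tuple


def streaming_min_max(stream: Iterable[float]) -> Tuple[float, float]:
    """First pass: track the running min and max without storing the data."""
    lo, hi = float("inf"), float("-inf")
    for x in stream:
        lo = min(lo, x)
        hi = max(hi, x)
    if lo > hi:
        raise ValueError("stream was empty")
    return lo, hi


def min_max_scaled(stream: Iterable[float], lo: float, hi: float) -> Iterator[float]:
    """Second pass: lazily scale each value into [0, 1]."""
    span = hi - lo
    for x in stream:
        # If every value is identical, span is 0; emit 0.0 instead of dividing.
        yield (x - lo) / span if span else 0.0


def make_stream() -> Iterator[float]:
    """Hypothetical data source standing in for an IterableDataset's __iter__."""
    yield from (3.0, 7.0, 5.0, 11.0)


# Pass 1 computes the statistics; pass 2 re-creates the stream and scales it.
lo, hi = streaming_min_max(make_stream())
scaled = list(min_max_scaled(make_stream(), lo, hi))
```

The same idea should carry over to an IterableDataset by computing lo and hi once up front and applying the scaling inside __iter__. If a second pass is too expensive, the statistics could instead be estimated from a sample, at the cost of values occasionally falling outside [0, 1].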