I'm a little confused about the 'best practice' for implementing a PyTorch data pipeline for time series data.
I have an HDF5 file which I read using a custom Dataset. It seems that I should return the data samples as a (features, targets) tuple, with the shape of each being (L, C), where L is seq_len and C is the number of channels - i.e. don't perform batching inside the dataset, just return individual samples.
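For context, my __getitem__ is roughly along these lines (a simplified sketch - the real key names, dtypes and file layout differ):

import h5py
import torch
from torch.utils.data import Dataset

class HD5Dataset(Dataset):
    """Yields one (features, targets) pair per index, each shaped (L, C)."""

    def __init__(self, path):
        self.path = path
        with h5py.File(path, "r") as f:
            self.length = len(f["features"])  # hypothetical key name

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # open the file per call so this also works with num_workers > 0
        with h5py.File(self.path, "r") as f:
            x = torch.from_numpy(f["features"][idx]).float()  # (L, C)
            y = torch.from_numpy(f["targets"][idx]).float()   # (L, C)
        return x, y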
PyTorch modules seem to require a batch dim, though - e.g. Conv1d expects (N, C, L).
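A quick shape check confirms that layout (toy numbers, just to illustrate):

import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=8, out_channels=16, kernel_size=3)
x = torch.randn(4, 8, 100)   # (N, C, L)
print(conv(x).shape)         # torch.Size([4, 16, 98])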
I was under the impression that the DataLoader class would prepend the batch dimension, but it doesn't seem to be doing that: I'm getting data shaped (N, L).
from torch.utils.data import DataLoader

dataset = HD5Dataset(args.dataset)
train_dataloader = DataLoader(dataset,
                              batch_size=N,
                              shuffle=True,
                              pin_memory=is_cuda,
                              num_workers=num_workers)

for i, (x, y) in enumerate(train_dataloader):
    ...
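My mental model of what should happen is something like this (dummy in-memory data, purely to illustrate the behaviour I expected):

import torch
from torch.utils.data import DataLoader, TensorDataset

dummy = TensorDataset(torch.randn(32, 100, 8),   # 32 samples, each (L=100, C=8)
                      torch.randn(32, 100, 8))
loader = DataLoader(dummy, batch_size=4)
x, y = next(iter(loader))
print(x.shape)   # torch.Size([4, 100, 8]) - batch dim prepended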
In my actual loop above, however, the shape of x is (N, C), not (1, N, C), which causes the code below (from a public git repo) to fail on its first line.
def forward(self, x):
    """expected input shape is (N, L, C)"""
    x = x.transpose(1, 2).contiguous()  # input should have dimension (N, C, L)
The documentation states that when automatic batching is enabled, the default collate function "always prepends a new dimension as the batch dimension", which leads me to believe that automatic batching is disabled here - but I don't understand why.
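As I read it, that collate step should be stacking my (L, C) samples into a batch roughly like this (sketch; default_collate is importable from torch.utils.data in recent PyTorch versions):

import torch
from torch.utils.data import default_collate

samples = [(torch.randn(100, 8), torch.randn(100, 8)) for _ in range(4)]  # four (L, C) pairs
x, y = default_collate(samples)
print(x.shape)   # torch.Size([4, 100, 8])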