I am working with pytorch-forecasting to create a TimeSeriesDataSet where I have 30 target variables that I want to predict.
However, when I pass this dataset to a DataLoader, batching fails with a TypeError (details below).
Expected Behavior
Since I have 30 target variables, I expect TimeSeriesDataSet to return:
A batch whose targets form a single torch.Tensor of shape (batch_size, 30).
The dataset should be structured so that the DataLoader can correctly package it into mini-batches without issues.
In other words, I expect each batch to contain:
A dict of inputs with the necessary features.
A torch.Tensor for the targets, with shape (batch_size, 30).
Actual Behavior
Instead, TimeSeriesDataSet returns a tuple with two elements:
The first element is a dict containing the input tensors, which is fine.
The second element is another tuple with two elements:
sample[1][0]: A list of 30 tensors instead of a single tensor.
sample[1][1]: None, which causes an error when passed to PyTorch's default_collate.
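To make the structure concrete, here is a minimal sketch that mimics it with dummy tensors and shows the kind of collation I was hoping for. It uses only plain PyTorch. The sample layout, the "encoder_cont" key, and the helper names make_fake_sample and collate_multi_target are my own assumptions for illustration, not part of pytorch-forecasting.

```python
import torch

N_TARGETS = 30  # number of target columns, as in my dataset
PRED_LEN = 1    # max_prediction_length, as in my configuration


def make_fake_sample():
    """Build a dummy sample laid out like (inputs, (targets, weight)). Assumed layout."""
    inputs = {"encoder_cont": torch.zeros(10, N_TARGETS)}  # placeholder input feature
    targets = [torch.zeros(PRED_LEN) for _ in range(N_TARGETS)]  # list of 30 tensors
    weight = None  # the element that default_collate rejects
    return inputs, (targets, weight)


def collate_multi_target(samples):
    """Hypothetical collate_fn: stack the inputs and merge each target list into one row."""
    inputs = {
        key: torch.stack([sample[0][key] for sample in samples])
        for key in samples[0][0]
    }
    # Per sample: concatenate 30 tensors of shape (PRED_LEN,) into (30 * PRED_LEN,).
    # With PRED_LEN == 1 this gives one row of 30 values; the None weight is dropped.
    targets = torch.stack([torch.cat(sample[1][0]) for sample in samples])
    return inputs, targets


batch_inputs, batch_targets = collate_multi_target(
    [make_fake_sample() for _ in range(4)]
)
print(batch_targets.shape)  # torch.Size([4, 30])
```

A function like this could be passed as collate_fn to DataLoader, but it only handles the fixed-length dummy samples above and ignores the weight entirely, so I am not sure it is the right approach for real TimeSeriesDataSet output.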
Error Message from DataLoader:
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts, or lists; found <class 'NoneType'>
This suggests that PyTorch's default_collate cannot handle the None value returned by TimeSeriesDataSet.
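The same error can be reproduced without pytorch-forecasting, which is why I think the None is the culprit. This is a minimal sketch with dummy samples shaped like the ones described above; the sample layout is my assumption, and the import of default_collate from torch.utils.data assumes a reasonably recent PyTorch version.

```python
import torch
from torch.utils.data import default_collate

# Two dummy samples laid out like (inputs, (targets, weight)) with weight=None.
samples = [
    ({"x": torch.zeros(3)}, ([torch.zeros(1), torch.zeros(1)], None))
    for _ in range(2)
]

try:
    default_collate(samples)
    error_message = None  # no error raised
except TypeError as exc:
    error_message = str(exc)  # message names the offending type

print(error_message)
```

On the PyTorch versions I am aware of, the tensors and the list of tensors collate fine, and the TypeError is raised only when the None entries are reached.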
Dataset Information:
7061 time steps.
Each row contains 30 numerical values (representing different features).
Goal: Predict the values for the next step based on the previous 10 time steps.
The dataset is structured as a single time series (one group).
Code:
import pandas as pd
import torch
from pytorch_forecasting.data.encoders import TorchNormalizer
from pytorch_forecasting import TimeSeriesDataSet, MultiNormalizer
from torch.utils.data import DataLoader
# Load dataset
file_name = 'DATA' # CSV file
data = pd.read_csv(f'{file_name}.CSV')
# Drop unnecessary columns
if "Date" in data.columns:
    data = data.drop(columns=["Date"])
# Add time index
data["time_idx"] = range(len(data))
data["time_idx"] = data["time_idx"].astype(int)
# Add a dummy group column (since all data belongs to one group)
data["group"] = "single_group"
# Rename columns for uniformity
data.columns = ["num_" + str(i+1) for i in range(30)] + ["time_idx", "group"]
# Convert 'group' to category codes
data["group"] = data["group"].astype("category").cat.codes
# Fill any NaN values
if data.isna().sum().sum() > 0:
    print("⚠️ Found NaN values, filling with 0.")
    data.fillna(0, inplace=True)
# TimeSeriesDataSet configuration
max_encoder_length = 10 # Past observations
max_prediction_length = 1 # Future prediction
target_cols = ["num_" + str(i+1) for i in range(30)]
# Create TimeSeriesDataSet
training = TimeSeriesDataSet(
    data=data,
    time_idx="time_idx",
    target=target_cols,  # 30 targets
    group_ids=["group"],
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=target_cols,
    target_normalizer=MultiNormalizer([TorchNormalizer(method="identity") for _ in range(30)]),
    add_relative_time_idx=True,
    add_target_scales=False,
    add_encoder_length=True,
)
# DataLoader
batch_size = 32
train_dataloader = DataLoader(
    training,
    batch_size=batch_size,
    shuffle=False,
)
# DEBUG: Inspect the DataLoader output
for batch in train_dataloader:
    print("🔍 Checking batch from DataLoader")
    print("Batch type:", type(batch))
    inputs, targets = batch  # Unpacking inputs and targets
    print("Target type:", type(targets))
    if isinstance(targets, list):
        print(f"⚠️ Target is a list with {len(targets)} elements!")
        print("First 3 elements:", targets[:3])
    elif isinstance(targets, torch.Tensor):
        print(f"✔️ Target is a tensor with shape: {targets.shape}")
    break  # Only check the first batch
What I Have Tried:
✅ Checked TimeSeriesDataSet.target_names → Confirms all 30 target columns are correctly recognized.
✅ Checked MultiNormalizer → It is applied correctly to all 30 targets.
✅ Removed MultiNormalizer as a test → But TimeSeriesDataSet raises an error that MultiNormalizer is required.
✅ Checked TimeSeriesDataSet output before DataLoader → It returns a tuple where sample[1] contains another tuple.
✅ Printed sample[1] → It contains:
sample[1][0]: A list of tensors instead of a single tensor.
sample[1][1]: None (likely causing the error).
Questions:
Why is TimeSeriesDataSet returning None as the second element of the target tuple?
Is there a way to configure TimeSeriesDataSet to avoid returning None?
Should DataLoader be configured differently to handle this case?
Any insights on how to properly set up TimeSeriesDataSet to return structured data would be greatly appreciated! 🙏