I am working with pytorch-forecasting to create a TimeSeriesDataSet where I have 30 target variables that I want to predict. However, when I pass this dataset to a DataLoader, I encounter an issue:

Expected Behavior:

Since I have 30 target variables, I expect TimeSeriesDataSet to return a batch where the targets are a single torch.Tensor of shape (batch_size, 30), structured so that the DataLoader can package it into mini-batches without issues. In other words, I expect each batch to contain:

A dict of inputs with the necessary features.

A torch.Tensor for the targets, with shape (batch_size, 30).

Actual Behavior:

Instead, TimeSeriesDataSet returns a tuple with two elements:

The first element is a dict containing the input tensors, which is fine.

The second element is another tuple with two elements:

sample[1][0]: A list of 30 tensors instead of a single tensor.

sample[1][1]: None, which causes an error when passed to PyTorch's default_collate.

Error Message from DataLoader:

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts, or lists; found <class 'NoneType'>

This suggests that PyTorch cannot handle the None value being returned by TimeSeriesDataSet.

Dataset Information:

7061 time steps.

Each row contains 30 numerical values (representing different features).

Goal: Predict the values for the next step based on the previous 10 time steps.

The dataset is structured as a single time series (one group).
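To make the windowing goal concrete, here is a minimal pure-Python sketch of how 10 past steps pair with 1 future step. The helper `sliding_windows` and the toy `rows` data are hypothetical illustrations, not part of pytorch-forecasting, and the library's own sampling may differ in detail.

```python
def sliding_windows(series, encoder_length=10, prediction_length=1):
    """Yield (history, future) pairs from a sequence of rows."""
    window = encoder_length + prediction_length
    for start in range(len(series) - window + 1):
        history = series[start:start + encoder_length]
        future = series[start + encoder_length:start + window]
        yield history, future


# Toy stand-in for the real data: 7061 rows of 30 values each (assumed layout).
rows = [[float(t)] * 30 for t in range(7061)]
pairs = list(sliding_windows(rows))

print(len(pairs))                          # number of (history, future) samples
print(len(pairs[0][0]), len(pairs[0][1]))  # history rows and future rows per sample
```

Under these assumptions, each sample's target is one row of 30 values, which is where the expected (batch_size, 30) target shape comes from.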

Code:

import pandas as pd
import torch
from pytorch_forecasting.data.encoders import TorchNormalizer
from pytorch_forecasting import TimeSeriesDataSet, MultiNormalizer
from torch.utils.data import DataLoader

# Load dataset
file_name = 'DATA'  # CSV file
data = pd.read_csv(f'{file_name}.CSV')

# Drop unnecessary columns
if "Date" in data.columns:
    data = data.drop(columns=["Date"])

# Add time index
data["time_idx"] = range(len(data))
data["time_idx"] = data["time_idx"].astype(int)

# Add a dummy group column (since all data belongs to one group)
data["group"] = "single_group"

# Rename columns for uniformity
data.columns = ["num_" + str(i+1) for i in range(30)] + ["time_idx", "group"]

# Convert 'group' to category codes
data["group"] = data["group"].astype("category").cat.codes

# Fill any NaN values
if data.isna().sum().sum() > 0:
    print("⚠️ Found NaN values, filling with 0.")
    data.fillna(0, inplace=True)

# TimeSeriesDataSet configuration
max_encoder_length = 10  # Past observations
max_prediction_length = 1  # Future prediction
target_cols = ["num_" + str(i+1) for i in range(30)]

# Create TimeSeriesDataSet
training = TimeSeriesDataSet(
    data=data,
    time_idx="time_idx",
    target=target_cols,  # 30 targets
    group_ids=["group"],
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=target_cols,
    target_normalizer=MultiNormalizer([TorchNormalizer(method="identity") for _ in range(30)]),
    add_relative_time_idx=True,
    add_target_scales=False,
    add_encoder_length=True
)

# DataLoader
batch_size = 32
train_dataloader = DataLoader(
    training,
    batch_size=batch_size,
    shuffle=False  
)

# DEBUG: Inspect the DataLoader output
for batch in train_dataloader:
    print("🔍 Checking batch from DataLoader")
    print("Batch type:", type(batch))

    inputs, targets = batch  # Unpacking inputs and targets
    print("Target type:", type(targets))

    if isinstance(targets, list):
        print(f"⚠️ Target is a list with {len(targets)} elements!")
        print("First 3 elements:", targets[:3])
    elif isinstance(targets, torch.Tensor):
        print(f"✔️ Target is a tensor with shape: {targets.shape}")

    break  # Only check the first batch

What I Have Tried:

Checked TimeSeriesDataSet.target_names → Confirms all 30 target columns are correctly recognized.

Checked MultiNormalizer → It is applied correctly to all 30 targets.

Removed MultiNormalizer as a test → But TimeSeriesDataSet raises an error that MultiNormalizer is required.

Checked TimeSeriesDataSet output before DataLoader → It returns a tuple where sample[1] contains another tuple.

Printed sample[1] → It contains:

sample[1][0]: A list of tensors instead of a single tensor.

sample[1][1]: None (likely causing the error).

Should DataLoader be configured differently to handle this case?
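One possible direction is to give the DataLoader a custom collate_fn that skips the None weight and stacks the per-target tensors. The sketch below uses only torch. `ToyMultiTargetDataset` and `collate_multi_target` are hypothetical names, and the toy dataset only mimics the sample layout described above (a dict of inputs plus a (list of target tensors, None) tuple), with each target tensor assumed to have shape (1,). pytorch-forecasting also offers TimeSeriesDataSet.to_dataloader(), which builds a DataLoader with the dataset's own collate function and may be the simpler route.

```python
import torch
from torch.utils.data import DataLoader, Dataset, default_collate


class ToyMultiTargetDataset(Dataset):
    """Hypothetical stand-in mimicking the sample layout:
    (dict of inputs, (list of per-target tensors, None))."""

    def __init__(self, n_samples=64, n_targets=30, encoder_length=10):
        self.n_samples = n_samples
        self.n_targets = n_targets
        self.encoder_length = encoder_length

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        x = {"encoder_cont": torch.randn(self.encoder_length, self.n_targets)}
        # One tensor of shape (1,) per target (assumed: prediction length of 1).
        targets = [torch.randn(1) for _ in range(self.n_targets)]
        weight = None  # no sample weights
        return x, (targets, weight)


def collate_multi_target(samples):
    """Collate inputs as usual, stack targets to (batch, n_targets), ignore the None weight."""
    inputs = default_collate([x for x, _ in samples])
    # Concatenating a sample's target tensors gives shape (n_targets,);
    # stacking across samples gives (batch, n_targets).
    targets = torch.stack([torch.cat(t) for _, (t, _weight) in samples])
    return inputs, targets


loader = DataLoader(ToyMultiTargetDataset(), batch_size=32, collate_fn=collate_multi_target)
inputs, targets = next(iter(loader))
print(targets.shape)
```

With the real dataset, the same collate_fn idea would apply only if the samples follow the layout assumed here. A model from pytorch-forecasting may instead expect the list-of-tensors target format that to_dataloader() produces.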

  • Welcome to Stack Overflow. Your question is currently too broad as it asks multiple things; questions on Stack Overflow must ask one thing only per question, so please edit yours to focus on only one thing. If you still have questions once you've received an answer, you can ask another, separate question (and so on). Commented Mar 12, 2025 at 14:12
