

PyTorch Forecasting TimeSeriesDataSet Returns None in DataLoader Batch

I am working with pytorch-forecasting to create a TimeSeriesDataSet where I have 30 target variables that I want to predict. However, when I pass this dataset to a DataLoader, I encounter an issue:

Expected Behavior

Since I have 30 target variables, I expect TimeSeriesDataSet to return:

- A batch where the targets form a single torch.Tensor of shape (batch_size, 30).
- A dataset structured so that the DataLoader can package it into mini-batches without issues.

In other words, I expect each batch to contain:

- A dict of inputs with the necessary features.
- A torch.Tensor for the targets, with shape (batch_size, 30).

Actual Behavior

Instead, TimeSeriesDataSet returns a tuple with two elements:

- The first element is a dict containing the input tensors, which is fine.
- The second element is another tuple with two elements:
  - sample[1][0]: a list of 30 tensors instead of a single tensor.
  - sample[1][1]: None, which causes an error when passed to PyTorch's default_collate.
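To confirm exactly where the None sits before anything reaches the DataLoader, a small stdlib-only helper can walk the nested sample and report every None it finds. The helper name find_none_paths and the stand-in sample (plain lists in place of tensors, an illustrative "encoder_cont" key) are hypothetical; only the (dict, (list of targets, None)) layout is taken from the description above.

```python
from typing import Any, List


def find_none_paths(obj: Any, path: str = "sample") -> List[str]:
    """Return index paths to every None inside nested tuples, lists, and dicts."""
    if obj is None:
        return [path]
    found: List[str] = []
    if isinstance(obj, dict):
        # Recurse into dict values, recording the key in the path.
        for key, value in obj.items():
            found.extend(find_none_paths(value, f"{path}[{key!r}]"))
    elif isinstance(obj, (list, tuple)):
        # Recurse into sequence items, recording the index in the path.
        for i, value in enumerate(obj):
            found.extend(find_none_paths(value, f"{path}[{i}]"))
    return found


# Hypothetical stand-in mimicking (inputs, (targets, weight)); lists replace tensors.
fake_sample = (
    {"encoder_cont": [[0.0] * 30] * 10},
    ([[0.0] for _ in range(30)], None),
)
print(find_none_paths(fake_sample))  # ['sample[1][1]']
```

Running the same helper on a real `training[0]` should point at the slot that default_collate rejects.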

Error Message from DataLoader:

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts, or lists; found <class 'NoneType'>

This suggests that PyTorch's default_collate cannot handle the None value returned by TimeSeriesDataSet.
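A minimal sketch of a custom collate_fn that sidesteps this: it collates the inputs and the per-target tensors with PyTorch's own default_collate but passes a batch of all-None values in the second slot through as a single None. The function name collate_keep_none is hypothetical, and the toy samples (3 targets instead of 30) only imitate the (inputs, (targets, None)) layout. In pytorch-forecasting that second slot appears to be an optional sample weight that is None when no weight column is configured, which is an assumption worth checking against the installed version. Note also that pytorch-forecasting's own TimeSeriesDataSet.to_dataloader() wires in the dataset's collate function, which may make a hand-written one unnecessary.

```python
import torch
from torch.utils.data import DataLoader
from torch.utils.data.dataloader import default_collate


def collate_keep_none(batch):
    """Collate (inputs, (targets, weight)) samples, keeping an all-None weight as None."""
    inputs = default_collate([x for x, _ in batch])
    # default_collate transposes a list of per-target tensors into a list
    # with one stacked tensor per target.
    targets = default_collate([y[0] for _, y in batch])
    weights = [y[1] for _, y in batch]
    weight = None if all(w is None for w in weights) else default_collate(weights)
    return inputs, (targets, weight)


# Hypothetical toy samples imitating the described layout (3 targets, not 30).
samples = [
    ({"encoder_cont": torch.zeros(10, 3)}, ([torch.tensor([float(i)]) for i in range(3)], None))
    for _ in range(4)
]
loader = DataLoader(samples, batch_size=2, collate_fn=collate_keep_none)
inputs, (targets, weight) = next(iter(loader))
print(inputs["encoder_cont"].shape, len(targets), targets[0].shape, weight)
```

With this collate function the batch unpacks cleanly, but the targets still arrive as a list of per-target tensors rather than one (batch_size, n_targets) tensor.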

Dataset Information:

- 7061 time steps.
- Each row contains 30 numerical values (representing different features).
- Goal: predict the values for the next step based on the previous 10 time steps.
- The dataset is structured as a single time series (one group).
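As a rough sanity check on these numbers: with one contiguous series, a 10-step encoder and a 1-step prediction window, the count of full sliding windows and the resulting number of batches of 32 can be computed directly. This assumes every sample uses the full encoder length and that time_idx has no gaps; the actual length of a TimeSeriesDataSet also depends on settings such as min_encoder_length, so treat the result as an estimate.

```python
import math

n_steps = 7061             # rows in the CSV (single group)
max_encoder_length = 10    # past observations per sample
max_prediction_length = 1  # future steps per sample
batch_size = 32

# Each full window spans encoder + prediction steps and slides one step at a time.
window = max_encoder_length + max_prediction_length
n_windows = n_steps - window + 1
n_batches = math.ceil(n_windows / batch_size)
last_batch = n_windows - (n_batches - 1) * batch_size

print(n_windows, n_batches, last_batch)  # 7051 221 11
```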

Code:

import pandas as pd
import torch
from pytorch_forecasting.data.encoders import TorchNormalizer
from pytorch_forecasting import TimeSeriesDataSet, MultiNormalizer
from torch.utils.data import DataLoader

# Load dataset
file_name = 'DATA'  # CSV file
data = pd.read_csv(f'{file_name}.CSV')

# Drop unnecessary columns
if "Date" in data.columns:
    data = data.drop(columns=["Date"])

# Add time index
data["time_idx"] = range(len(data))
data["time_idx"] = data["time_idx"].astype(int)

# Add a dummy group column (since all data belongs to one group)
data["group"] = "single_group"

# Rename columns for uniformity
data.columns = ["num_" + str(i+1) for i in range(30)] + ["time_idx", "group"]

# Convert 'group' to category codes
data["group"] = data["group"].astype("category").cat.codes

# Fill any NaN values
if data.isna().sum().sum() > 0:
    print("⚠️ Found NaN values, filling with 0.")
    data.fillna(0, inplace=True)

# TimeSeriesDataSet configuration
max_encoder_length = 10  # Past observations
max_prediction_length = 1  # Future prediction
target_cols = ["num_" + str(i+1) for i in range(30)]

# Create TimeSeriesDataSet
training = TimeSeriesDataSet(
    data=data,
    time_idx="time_idx",
    target=target_cols,  # 30 targets
    group_ids=["group"],
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=target_cols,
    target_normalizer=MultiNormalizer([TorchNormalizer(method="identity") for _ in range(30)]),
    add_relative_time_idx=True,
    add_target_scales=False,
    add_encoder_length=True
)

# DataLoader
batch_size = 32
train_dataloader = DataLoader(
    training,
    batch_size=batch_size,
    shuffle=False  
)

# DEBUG: Inspect the DataLoader output
for batch in train_dataloader:
    print("🔍 Checking batch from DataLoader")
    print("Batch type:", type(batch))

    inputs, targets = batch  # Unpacking inputs and targets
    print("Target type:", type(targets))

    if isinstance(targets, list):
        print(f"⚠️ Target is a list with {len(targets)} elements!")
        print("First 3 elements:", targets[:3])
    elif isinstance(targets, torch.Tensor):
        print(f"✔️ Target is a tensor with shape: {targets.shape}")

    break  # Only check the first batch

What I Have Tried:

✅ Checked TimeSeriesDataSet.target_names → confirms all 30 target columns are correctly recognized.

✅ Checked MultiNormalizer → it is applied correctly to all 30 targets.

✅ Removed MultiNormalizer as a test → TimeSeriesDataSet then raises an error saying a MultiNormalizer is required.

✅ Checked TimeSeriesDataSet output before DataLoader → it returns a tuple where sample[1] contains another tuple.

✅ Printed sample[1] → it contains:

- sample[1][0]: a list of tensors instead of a single tensor.
- sample[1][1]: None (likely causing the error).
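If the list-of-tensors layout for the targets is kept, a (batch_size, 30) target tensor can still be assembled after batching. A minimal sketch, assuming each batched target has shape (batch_size, max_prediction_length) with max_prediction_length = 1 as in the configuration above; random tensors stand in for a real batch.

```python
import torch

batch_size, n_targets, max_prediction_length = 32, 30, 1

# Hypothetical stand-in for a batched multi-target y: one tensor per target column.
targets = [torch.randn(batch_size, max_prediction_length) for _ in range(n_targets)]

# Concatenate along the last dimension: 30 tensors of (32, 1) -> (32, 30).
y = torch.cat(targets, dim=-1)

# torch.stack keeps the prediction-step axis instead: (32, 1, 30).
y_steps = torch.stack(targets, dim=-1)

print(y.shape, y_steps.shape)
```

With max_prediction_length = 1 both forms carry the same values; for longer horizons torch.stack is the safer choice, because torch.cat would flatten the steps of each target into adjacent columns.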

Questions:

- Why is TimeSeriesDataSet returning None as part of the second tuple?
- Is there a way to configure TimeSeriesDataSet to avoid returning None?
- Should DataLoader be configured differently to handle this case?

Any insights on how to properly set up TimeSeriesDataSet to return structured data would be greatly appreciated! 🙏