
I’m building a PyTorch binary classifier on ~9 months of daily data. The positive rate has extremely strong seasonality, and since I only have 9 months total, a full year of training data unfortunately isn’t possible.

I’m currently doing a rolling/expanding-window time-series CV (train on the first N days, validate on the next M days, then either expand or roll the train window forward by M), and I’ve tried:

Focal loss and weighted BCE

Varying window sizes (30 days vs full season)

Both rolling and expanding splits
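For concreteness, here is a minimal index-generation sketch of the rolling/expanding scheme described above (the function name and parameters are my own, not from any particular library):

```python
def time_series_splits(n_days, train_size, val_size, expanding=False):
    """Yield (train_indices, val_indices) folds over n_days of daily data.

    Rolling:   a fixed-length train window slides forward by val_size each fold.
    Expanding: the train window always starts at day 0 and grows each fold.
    Validation blocks are always strictly after the train window (no leakage).
    """
    splits = []
    start = 0
    while start + train_size + val_size <= n_days:
        train_start = 0 if expanding else start
        train = list(range(train_start, start + train_size))
        val = list(range(start + train_size, start + train_size + val_size))
        splits.append((train, val))
        start += val_size
    return splits

# e.g. 10 days, 4-day train window, 2-day validation blocks
rolling_folds = time_series_splits(10, 4, 2)
expanding_folds = time_series_splits(10, 4, 2, expanding=True)
```

With daily data and a 30-day window this yields one fold per validation block, each trained only on the past, which matches the constraint described above.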

Yet my model simply outputs the current class prevalence each day. Admittedly, reweighting to offset the imbalance isn’t ideal here, since the imbalance is real and the classifier isn’t evaluated on a balanced distribution, but I can’t rule it out as a contributor. For most of my testing I’ve used a 30-day window with plain binary cross-entropy. I’d ideally train on more data, but assessing across seasons gets difficult given the time-series constraints (any approaches here would be greatly appreciated too).
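As a side note on the weighted-BCE attempt: the per-sample loss that e.g. `torch.nn.BCEWithLogitsLoss` applies with `pos_weight` is just the textbook formula below (written in plain Python for clarity; this is an illustration, not the actual training code):

```python
import math

def weighted_bce(p, y, pos_weight=1.0):
    """Per-sample weighted binary cross-entropy.

    p: predicted probability of the positive class (in (0, 1)).
    y: 0/1 label.
    pos_weight > 1 up-weights errors on positive samples.
    """
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(pos_weight * y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
```

Note that with `pos_weight != 1` the loss is no longer minimized by the true conditional probability, which is one reason reweighting can hurt probability calibration when the evaluation distribution keeps the natural imbalance.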

The validation Brier score almost perfectly tracks the p(1−p) curve, so the network appears to learn nothing beyond the prevalence. Validation loss looks similarly dire: when the training and validation blocks have scores over 30%, it actually increases with epoch count almost immediately. I don’t have much experience handling these issues while also respecting the constraints of time-series data, so I’d greatly appreciate any pointers or avenues to explore.

Thank you!
