7,743 questions
Tooling
0 votes, 3 replies, 57 views
What classification models are suitable for time series data?
I have a raw time series dataset. Can I use classification models on time series data? If yes, which classification models can I use without forecasting? Most examples I’ve seen focus on forecasting or ...
Advice
0 votes, 0 replies, 32 views
Why use loss threshold for anomaly detection using AutoEncoders?
Recently I have been practicing with AutoEncoders and trying to use them for anomaly detection. I have been told that using a threshold on the loss to classify anomalies (calibrated with a ROC curve) is ...
Best practices
1 vote, 2 replies, 58 views
Handling imbalanced categorical classification – best approach and evaluation metrics?
I am working on a categorical classification problem using the Portugal Bank Marketing Dataset. The target variable is binary (yes / no), and the dataset is highly imbalanced (far more no than yes).
I ...
Tooling
0 votes, 1 reply, 55 views
Improving precision in LLM-based binary classification by detecting ambiguous cases
I’m working on a prompt-based binary classification task using an LLM, where the main goal is to maximize precision.
Instead of assigning a label to every input, I want the system to:
• Assign a ...
Best practices
0 votes, 2 replies, 46 views
Stepwise Random Forest Classifier - Hack or Bodge
I created a random forest classifier in R intended to identify individual urban tree species/genera. I have a large train/test dataset (n = 200k) with about 30 predictors that are mostly spectral ...
Advice
0 votes, 2 replies, 85 views
Semi-Supervised Learning - Confusion Matrix - R
I am currently trying the semi-supervised classification (SSC) library in R using code from the vignette.
The vignette removes some observations from the Wine dataset such that it's partially labelled ...
1 vote, 0 answers, 56 views
Learn 2D Feature Map for Texture Classification
For a bit of context, I am working in a lab where we use a dissimilarity map, called LDM (Local Dissimilarity Maps) [1], to characterize the differences between textures. Recently, this was further enhanced by ...
Tooling
0 votes, 1 reply, 80 views
How to categorize small/local company names into sectors (Tech, Industry, etc.) when they cannot be found online?
I have a large dataset containing small-scale / local company names, and I need to categorize each company into sectors such as Tech, Industrial, Finance, Retail, etc.
The problem is:
These companies ...
0 votes, 0 answers, 38 views
Can I use fitnet with softmax at the output layer for regression task, where output should be [0, 1] and their sum =1?
I am performing regression analysis using the fitnet function to develop a supervised neural network that acts as a surrogate model. The training target data have specific constraints: all outputs must ...
0 votes, 1 answer, 96 views
Hyperparameter tuning using Wandb or Keras Tuner - 10 fold Cross Validation
If I am using stratified 10-folds for classification/regression tasks, where do I need to define the logic for hyperparameter tuning using Scikit or Wandb?
Should it be inside the loop or outside?
I ...
0 votes, 1 answer, 87 views
Time series patient visits for XGBoost classifier
I’m developing a tree-based model classifier (XGBoost) using some healthcare (patient visits) data. The data has a time dimension, and I want to observe if there is a longitudinal effect for the ...
0 votes, 0 answers, 60 views
Handling seasonality and class imbalance in time-series binary classification
I’m building a PyTorch binary classifier using ~9 months of daily data. There’s extremely strong seasonality in the positive rate, and I only have 9 months total, so a whole year of training data is ...
0 votes, 1 answer, 58 views
Tidymodels: Use step_dummy() for multiple binary classifications?
I am a little bit lost in tidymodels. I have some data from topic modeling:
prevalent_topic: factor variable with most prevalent topic, ranging from "Topic_1" to "Topic_5"
value1 ...
1 vote, 0 answers, 97 views
Why is DecisionTree using same feature and same condition twice
When trying to fit scikit-learn DecisionTreeClassifier on my data, I am observing some weird behavior.
x[54] (a boolean feature) is used to split the 19 samples into 2 and 17 at the top-left node. Then ...
0 votes, 2 answers, 112 views
Missing values in olive oil dataset
I have a dataset of olive oil samples, and my goal is to create a classification model for oil quality. I'm having trouble deciding how to deal with missing data. Have a look at the data here if you ...