Questions tagged [feature-selection]
Methods and principles of selecting a subset of attributes for use in further modelling
959 questions
2 votes, 0 answers, 46 views
Feature selection for unsupervised learning with a One-Class SVM
I am trying to build a solution that detects one particular sound against all other sounds that can occur in nature.
My approach is to train a One-Class SVM only on my class of interest, hoping it will ...
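A minimal sketch of this one-class setup, assuming scikit-learn and precomputed per-clip feature vectors (random stand-ins here, not real audio features). Only the target class is used for fitting, so supervised feature-selection scores are not available. The example uses one label-free filter, dropping (near-)constant features with `VarianceThreshold`, before scaling and fitting `OneClassSVM`. The threshold, `nu`, and the synthetic "other sounds" are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical stand-ins for per-clip audio feature vectors.
X_target = rng.normal(loc=0.0, scale=1.0, size=(200, 8))  # the sound of interest only
X_other = rng.normal(loc=6.0, scale=1.0, size=(50, 8))    # "other" sounds, never used for fitting
X_target[:, 7] = 1.0  # a constant column carries no information about the target class
X_other[:, 7] = 1.0

# Label-free filter: drop (near-)constant features, scale, then fit on the target class alone.
detector = make_pipeline(
    VarianceThreshold(threshold=1e-8),
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
detector.fit(X_target)

# predict() returns +1 for inliers (resembles the target sound) and -1 for outliers.
pred_target = detector.predict(X_target)
pred_other = detector.predict(X_other)
print("features kept:", detector.named_steps["variancethreshold"].get_support().sum())
print("target accepted: %.2f" % np.mean(pred_target == 1))
print("others rejected: %.2f" % np.mean(pred_other == -1))
```

Here `nu` roughly bounds the fraction of training clips treated as outliers. In practice, rejection of "other" sounds can only be checked on held-out recordings that the model never saw.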
6 votes, 1 answer, 73 views
How Do You Balance Feature Search Strategy and HP Optimization Cost?
What I’m trying to figure out
I'm working on a machine learning project and would love to hear your thoughts on two things:
A. How to prioritize feature exploration
B. Whether to fix hyperparameters (...
3 votes, 2 answers, 108 views
Will decision trees discard features that are similar to each other?
In our last machine learning quiz, I answered based on what I had learned: if a machine learning model has similar features, the model should discard the redundant ones.
But the answer is given ...
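A small experiment can make the question concrete. This sketch assumes scikit-learn's `DecisionTreeRegressor` on synthetic data. The tree has no explicit step that removes redundant features. At each split it keeps whichever candidate gives the best impurity decrease, and two identical columns simply tie. Which copy wins a tie depends on the tree's internal feature ordering, which `random_state` controls.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.normal(size=500)
noise = rng.normal(size=500)
y = 3.0 * x + rng.normal(scale=0.1, size=500)

X_single = np.column_stack([x, noise])  # one copy of the informative feature
X_dup = np.column_stack([x, x, noise])  # the same feature duplicated

imp_single = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_single, y).feature_importances_
imp_dup = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_dup, y).feature_importances_
print("single:    ", imp_single.round(3))
# x's importance is split between the two copies; the split depends on random_state.
print("duplicated:", imp_dup.round(3))
```

Neither copy is dropped by design. The credit for `x` is divided between them, which is also why raw importances can understate a feature that has near-duplicates.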
9 votes, 2 answers, 244 views
Is it best practice to remove outliers from transaction data used for training?
I am building a random forest regression model. The goal is to predict the maximum amount each customer will spend in a single transaction during the next 90 days.
I have transaction data for 7 million customers, ...
2 votes, 1 answer, 232 views
Select top numerical features for a classification problem
I have 3 models, and each model solves a set of tasks (say, tasks 1 to 2).
Once these tasks (all of the same type) are solved by the models, I collect 3 numerical features (say feature1 to feature3) for each ...
0 votes, 0 answers, 24 views
How to generate complementary feature subsets given a complementarity measure?
I'm working on a project where I need to generate artificial subsets of features that are complementary to each other. Given a measure of complementarity between features, do you know any algorithms ...
1 vote, 0 answers, 45 views
Regression analysis for histograms
I am working in the field of LIDAR/RADAR and could use your help in exploring certain ideas. I have a scenario where I want to map histograms to a numerical value (the distance of an object in ...
8 votes, 2 answers, 106 views
How to scale numerical features when different output classes have different value ranges
I'm working with a dataset of about 10,000 data points covering 50 different vegetable species, each with about 200 samples, and I’m predicting time to maturation using features like current fruit ...
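One option is to standardize each numerical feature within its species. This assumes the species is an input known at prediction time, which the question implies but does not state. The per-species statistics are estimated on the training split only, so the scaling does not leak test information. The helper names and the `species`/`fruit_size` columns below are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_group_stats(train: pd.DataFrame, group_col: str, cols: list) -> pd.DataFrame:
    """Per-group mean and standard deviation, estimated on the training split only."""
    return train.groupby(group_col)[cols].agg(["mean", "std"])


def apply_group_scaling(df: pd.DataFrame, stats: pd.DataFrame, group_col: str, cols: list) -> pd.DataFrame:
    """Standardize each column within its group; rows from unseen groups become NaN."""
    out = df.copy()
    for c in cols:
        mu = df[group_col].map(stats[(c, "mean")])
        sd = df[group_col].map(stats[(c, "std")]).replace(0.0, 1.0)  # guard constant groups
        out[c] = (df[c] - mu) / sd
    return out


rng = np.random.default_rng(0)
# Hypothetical stand-in: two species whose fruit sizes live on very different scales.
train = pd.DataFrame({
    "species": np.repeat(["tomato", "pumpkin"], 100),
    "fruit_size": np.concatenate([rng.normal(5, 1, 100), rng.normal(300, 50, 100)]),
})
stats = fit_group_stats(train, "species", ["fruit_size"])
scaled = apply_group_scaling(train, stats, "species", ["fruit_size"])
print(scaled.groupby("species")["fruit_size"].agg(["mean", "std"]).round(3))
```

Within-species scaling removes the absolute size information. Keeping the raw column alongside the scaled one is a common compromise when that information is also predictive.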
3 votes, 0 answers, 47 views
Suppose one category of a variable creates data leakage; can we use the other categories of the same variable as dummies to predict?
We are predicting conversion. Conversion means a customer switched from paying one-off to paying regularly (subscribing).
If one feature is the categorical feature "Activity", consisting of 15+ ...
7 votes, 1 answer, 145 views
What are some popular but outdated or ineffective practices in data science?
I was taught stepwise feature selection (like forward and backward selection) in college, and at the time it seemed like a really effective way to pick features. But recently I have been reading ...
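For comparison with the textbook version, here is a sketch of forward selection using scikit-learn's `SequentialFeatureSelector`. It scores each candidate feature by cross-validation rather than by p-values. A frequent criticism of stepwise methods is that running the selection on all the data and then evaluating on that same data gives optimistic results. The second half therefore reruns the selection inside each outer fold. The data are synthetic, and the choice of 3 features is an assumption.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data: 10 features, only 3 of which carry signal.
X, y, coef = make_regression(n_samples=300, n_features=10, n_informative=3,
                             noise=5.0, coef=True, random_state=0)

# Forward selection: greedily add the feature that most improves the CV score.
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                direction="forward", cv=5)
sfs.fit(X, y)
selected = sfs.get_support(indices=True)
print("selected columns:   ", selected)
print("truly informative:  ", np.flatnonzero(coef))

# Honest estimate: the selection step is refit inside each outer training fold.
model = make_pipeline(
    SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                              direction="forward", cv=5),
    LinearRegression(),
)
scores = cross_val_score(model, X, y, cv=5)
print("outer-CV R^2: %.3f" % scores.mean())
```

Setting `direction="backward"` gives backward elimination with the same interface.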
1 vote, 0 answers, 145 views
How to correctly use RFECV for feature selection in a Scikit-Learn pipeline with a Simple Decision Tree?
I am working on the Kaggle House Price Prediction competition and have built a Scikit-Learn pipeline that includes:
Preprocessing (handling missing values, scaling, encoding)
Feature Engineering
...
4 votes, 1 answer, 76 views
SVC for Probability - Feature Selection
For typical models, we might be able to run a significance test (p-values) to see which features are important or should be removed. However, I'm not aware of any such test for an SVC model.
In practice how should we ...
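One model-agnostic option is permutation importance on held-out data. It shuffles one feature at a time and measures how much a chosen score degrades. Log loss is used here because the question is about SVC probabilities. This sketch assumes scikit-learn and synthetic data in which only columns 0 and 1 determine the label. It gives an importance ranking, not a formal significance test.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in: 6 features, only columns 0 and 1 determine the label.
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=0))
clf.fit(X_train, y_train)

# Shuffle each column on the test split and measure the increase in log loss.
result = permutation_importance(clf, X_test, y_test, scoring="neg_log_loss",
                                n_repeats=20, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```

Features whose importance stays within a standard deviation or so of zero are candidates for removal. Correlated features can mask each other, though, because permuting one leaves its twin intact.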
1 vote, 0 answers, 38 views
Implementing the Discriminant Algorithm for Reduced Multidimensional Features in Python
I need to implement the Discriminant Algorithm for Reduced Multidimensional Features proposed in the paper https://www.jstage.jst.go.jp/article/nolta/1/1/1_1_37/_pdf/-char/en as Algorithm 2. I am ...
1 vote, 0 answers, 52 views
Choosing the number of features via cross-validation
I have an algorithm that trains a binary predictive model for a specified number of features from the dataset (the features are all of the same type, but not all of them are important). Thus, the number of features ...
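The question's own training algorithm isn't shown, so this sketch substitutes `SelectKBest(f_classif)` plus logistic regression as a hypothetical stand-in for "train on k features". The idea is to treat k as a hyperparameter. The selection is redone inside every training fold, so the score for each k is not biased by selecting on the validation data. The optional one-standard-error rule then prefers the smallest k whose mean score is close to the best.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif)),  # hypothetical stand-in for the k-feature algorithm
    ("clf", LogisticRegression(max_iter=1000)),
])
ks = list(range(1, 21))
n_splits = 5
search = GridSearchCV(pipe, {"select__k": ks},
                      cv=StratifiedKFold(n_splits, shuffle=True, random_state=0),
                      scoring="roc_auc")
search.fit(X, y)

means = search.cv_results_["mean_test_score"]
stds = search.cv_results_["std_test_score"]
best = int(np.argmax(means))
# One-standard-error rule: smallest k whose mean is within one SE of the best mean.
se = stds[best] / np.sqrt(n_splits)
k_1se = ks[int(np.flatnonzero(means >= means[best] - se)[0])]
print("best k:", search.best_params_["select__k"], "AUC: %.3f" % search.best_score_)
print("one-SE k:", k_1se)
```

An unbiased estimate of the final model's performance would need a further outer CV loop around `search`.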
2 votes, 2 answers, 379 views
Correlation between Continuous and Categorical Variables and Feature Selection
I want to build a classification model. At the end of my preprocessing and feature creation, I end up with 167 continuous features and a discrete target with 5 classes.
I'd like to ...
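With continuous features and a multi-class target, Pearson correlation with the label isn't meaningful. A one-way ANOVA F-test (`f_classif`) checks whether a feature's mean differs across the classes. Mutual information (`mutual_info_classif`) can also capture non-linear dependence. This is a sketch assuming scikit-learn, with 30 synthetic features standing in for the 167 real ones. Only columns 0 and 1 are made informative.

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

rng = np.random.default_rng(0)
n, p = 1000, 30
y = rng.integers(0, 5, size=n)  # 5-class target
X = rng.normal(size=(n, p))     # continuous features, mostly noise
X[:, 0] += 0.8 * y              # mean shifts steadily with the class
X[:, 1] += 2.0 * (y == 2)       # only class 2 stands out on this feature

# One-way ANOVA F-test per feature: do the class means differ?
F, pval = f_classif(X, y)
# Mutual information estimate per feature (k-nearest-neighbour based).
mi = mutual_info_classif(X, y, random_state=0)

for j in np.argsort(F)[::-1][:5]:
    print(f"feature {j:2d}  F={F[j]:8.1f}  p={pval[j]:.2e}  MI={mi[j]:.3f}")
```

With 167 features, the p-values should be adjusted for multiple testing before being used as a cut-off. Either score can also be plugged into `SelectKBest`.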