Questions tagged [random-forest]
Random forest is a machine-learning method based on combining the outputs of many decision trees.
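A minimal sketch of the "combining" step in plain Python. It assumes the per-tree predictions already exist; the tree outputs below are hypothetical. Classification forests typically take a majority vote, and regression forests average the trees' numeric outputs. One common convention for probability-style output is the share of trees voting for each class. Some packages instead average per-tree class frequencies, so check your implementation.

```python
from collections import Counter
from statistics import mean

def combine_classification(tree_votes):
    """Majority vote over the class labels predicted by the individual trees."""
    # most_common(1) returns [(label, count)]; ties keep first-seen order
    return Counter(tree_votes).most_common(1)[0][0]

def class_probabilities(tree_votes):
    """Share of trees voting for each class (one convention for 'prob' output)."""
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    return {label: count / n_trees for label, count in counts.items()}

def combine_regression(tree_outputs):
    """Average of the numeric predictions made by the individual trees."""
    return mean(tree_outputs)

# Hypothetical outputs of five trees for one new observation
votes = ["setosa", "versicolor", "setosa", "setosa", "virginica"]
preds = [2.0, 2.5, 3.0, 2.5, 3.0]

print(combine_classification(votes))  # setosa
print(class_probabilities(votes))     # {'setosa': 0.6, 'versicolor': 0.2, 'virginica': 0.2}
print(combine_regression(preds))      # 2.6
```

Thresholding the vote shares, rather than looking only at the majority label, is one simple way to flag low-confidence predictions.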
2,492 questions
1 vote, 0 answers, 34 views
Confidence threshold for random forest predictions (type = "prob") on new data
I have a good multiclass random forest model in R (built with the ranger and caret packages), but I think this question applies to random forests in general.
When I use my RF to label unknown data, I want to ...
3 votes, 0 answers, 32 views
Software implementations of mixed-effects random forest regression with conditional permutation variable importance
Is there any existing open-source implementation of mixed-effects random forest regression (for clustered data) that uses conditional inference trees as base learners and enables ...
0 votes, 0 answers, 61 views
Research method selection
I used robust linear regression to evaluate the effect of several variables on a dependent variable, after testing and confirming that they are linearly correlated with it. Now I want to compute an importance score of ...
0 votes, 0 answers, 89 views
Can I use a confusion matrix for prediction?
TL;DR: a confusion matrix is used to validate a model, but I also want to make predictions with my models. Can I use the confusion matrix to make predictions? I don't see any other way to do it, but I ...
0 votes, 0 answers, 55 views
MSE loss: which target representation allows better focus on minority-class learning?
Given these two target representations for the same underlying data:
Target A: minority-class samples (Cluster 5) isolated in the distribution tail; majority-class samples (Clusters 3+6) shifted toward ...
1 vote, 0 answers, 62 views
Random forests and time series data
I am currently learning about decision trees and have read about bagging and the random forest method.
Since bagging and random forests rely on the data being IID, so that bootstrapping makes ...
8 votes, 1 answer, 207 views
Bootstrap validation of random forest models
A typical workflow in machine learning is to split data into train and test sets, using the former to develop a model and the latter to evaluate its ability to generalize.
Some dispute this as a best ...
1 vote, 0 answers, 76 views
Calibration with all data: data-poor scenarios [closed]
I’m working on species distribution modeling with binary data (presence / absence, 1 / 0). My target species is extremely rare (prevalence ~0.014), so my dataset is almost all zeros and just a handful ...
0 votes, 0 answers, 46 views
Should I tune models used in feature selection and initial model evaluation?
I'm building a classification pipeline that evaluates multiple predictive models across different feature sets, each generated using a distinct feature selection method.
Feature selection methods: ...
0 votes, 0 answers, 34 views
Is cross-validation necessary when using randomForest in R only for feature selection (not prediction)?
I’m using randomForest in R solely for feature selection, not for prediction. The model is trained on all available data, and variable importance is assessed using ...
2 votes, 0 answers, 70 views
Variable selection with clustered data on large datasets (700k): Cross-validated, scalable, and interpretable models with random effects?
I’m working with a large dataset (≈700k observations) from an experiment, involving ≈5k patients and repeated trials across ≈50 covariates. The data structure includes multiple levels of clustering, ...
1 vote, 1 answer, 61 views
AUC for cforest in case of highly correlated variables
I have a binary outcome and multiple covariates. I am calculating the AUC for a fitted random forest model (using the party::cforest function to fit the random forest). Some of my covariates are ...
0 votes, 0 answers, 55 views
Why are there random spikes in RMSE and Rsq when tuning my random forest?
I am rather new to random forests and have been using the tidymodels framework in R.
For context, I am running a random forest with 7 predictors on a test dataset of 5,122 observations. ...
1 vote, 1 answer, 118 views
How can I reduce overfitting in a random forest model even when cross-validation is used?
I'm fitting a random forest model with the caret package in R, using a repeated cross-validation design to select hyperparameters. I've also experimented with adjusting the number of trees (...
0 votes, 0 answers, 46 views
Ensemble neural network: stacking ensemble accuracy is similar to or lower than that of the base models
Context
I'm trying to create an ensemble survival neural network, with a custom loss function, that consists of 3 base models: Random Survival Forest (RSF), Gradient Boosting Survival Model (GBSM), and a ...