All Questions
367 questions
-2
votes
1
answer
87
views
Why does my RandomForestClassifier overfit despite using cross-validation? [closed]
I'm working on a binary classification problem using RandomForestClassifier from scikit-learn. My dataset has ~10,000 rows and ~20 numerical features. I used train_test_split and cross_val_score, but ...
0
votes
0
answers
71
views
GridSearchCV and cross_val_score with KNeighborsClassifier using the roc_auc metric return an error because of decision_function
I am working on a binary classification problem.
The class distribution is Positive: 30% - Negative: 70%. Because of that, I decided to use roc_auc as the metric.
I am then running a hyperparameter ...
0
votes
1
answer
53
views
How to use GridSearchCV on a customized estimator?
I built a custom estimator using sklearn's BaseEstimator and ClassifierMixin, but when it comes to cross-validation, GridSearchCV gives me NaN values for the score.
Here is the code of the estimator:
...
1
vote
0
answers
29
views
Problem with cross_val_score with Linear Regression
I have this code, and I don't understand where the error is when I calculate the cross_val_score.
You can find the code at the end.
When I pass X and Y into cross_val_score I get this output:
[-1....
1
vote
1
answer
341
views
How to weight samples with sklearn's cross_validate for scoring only?
I am running a regression task on a dataset which is composed of both authentic and augmented samples. The augmented samples are generated by jittering the authentic ones. I would like to select the ...
1
vote
1
answer
613
views
How does scikit's RFECV class compute cv_results_?
My understanding of Recursive Feature Elimination with Cross-Validation (sklearn.feature_selection.RFECV): you provide an algorithm which is trained on the entire dataset and creates a feature importance ...
1
vote
1
answer
780
views
How to perform cross-validation with LightGBM.LGBMRanker, while keeping groups together?
I'm working on a search problem with a dataset of queries and URLs. Each (query, URL) pair has a relevance (the target), a float which should preserve the order of the URLs for a given query.
I would ...
2
votes
1
answer
119
views
Getting different score values between manual cross validation and cross_val_score
I created a Python for loop to split the training dataset into stratified K-folds and trained a classifier inside the loop. I then used the trained model to predict on the validation data. The ...
0
votes
1
answer
99
views
Different Cross-Validation Techniques Yielding Identical Evaluation Metrics
I implement three ML algorithms (K-Nearest Neighbor, Decision Trees, and Random Forest) and use four different cross-validation techniques (Hold-Out Method, Leave-One-Out Method, K-Fold Cross-...
0
votes
0
answers
137
views
Nested and non-nested cross validation exactly the same for different ML techniques
I am using K-fold CV (nested vs non-nested) to figure out whether my classification model is overfitting the data.
The code is essentially the same as the example from sklearn: https://scikit-learn....
0
votes
1
answer
334
views
How to use SelectFromModel with cross_validate in scikit-learn?
I am trying to use SelectFromModel to select features from my dataset before training a DecisionTreeClassifier model. I am also using cross_validate to evaluate the model performance. However, I am ...
0
votes
1
answer
542
views
Is this a proper cross-validation code with the Leave-One-Group-Out method?
Though the below code “works” (in that it does not give an error), I get very high AUCs which makes me wonder if it somehow skips over the actual type of cross-validation I am trying to make it ...
1
vote
0
answers
36
views
RandomizedSearchCV does not compute anything
It's not my first time using RandomizedSearchCV, but weirdly it stops computing, or at least nothing is shown even though I set verbose = 2. It's weird because it only fits the first 2 models and then ...
0
votes
1
answer
181
views
How to apply TimeSeries Cross Validation in Python for Data with irregular Dates of Observations and uneven Observations per Date?
Best Way to Perform TimeSeries Cross Validation with irregular Dates of Observations and uneven Observations per Date?
I have a dataset that I have been trying to utilize for XGBoost Regression. The ...
0
votes
2
answers
895
views
Scikit-Learn LOOCV vs doing it manually give different results, why?
I have built a model for a small dataset, and since the dataset was small, I ran a Leave-One-Out Cross-Validation (LOOCV) check of its accuracy. In short, I would remove one sample manually, ...