
All Questions

-2 votes · 1 answer · 87 views

Why does my RandomForestClassifier overfit despite using cross-validation? [closed]

I'm working on a binary classification problem using RandomForestClassifier from scikit-learn. My dataset has ~10,000 rows and ~20 numerical features. I used train_test_split and cross_val_score, but ...
Eshaan Saha
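Cross-validation measures overfitting; it does not prevent it. A quick way to see the effect is to compare training and validation accuracy on the same folds. This is a minimal sketch on synthetic data (an assumption standing in for the asker's ~20 numeric features), and the `max_depth`/`min_samples_leaf` values are illustrative, not tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the asker's data (assumption: ~20 numeric features, binary target)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)


def train_and_validation_accuracy(model):
    """Mean training and validation accuracy over 5 stratified folds."""
    res = cross_validate(model, X, y, cv=5, scoring="accuracy", return_train_score=True)
    return res["train_score"].mean(), res["test_score"].mean()


models = {
    # Fully grown trees (the default) usually fit the training folds almost perfectly
    "default": RandomForestClassifier(n_estimators=100, random_state=0),
    # Shallower trees with larger leaves trade training fit for lower variance
    "constrained": RandomForestClassifier(
        n_estimators=100, max_depth=5, min_samples_leaf=10, random_state=0
    ),
}

results = {name: train_and_validation_accuracy(m) for name, m in models.items()}
for name, (train_acc, val_acc) in results.items():
    print(f"{name}: train={train_acc:.3f} validation={val_acc:.3f} gap={train_acc - val_acc:.3f}")
```

A large train/validation gap with the default forest is expected; what matters is whether the validation score itself is acceptable.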
0 votes · 0 answers · 71 views

GridSearchCV and cross_val_score with KNeighborsClassifier using the roc_auc metric return an error because of decision_function

I am working on a binary classification problem. The class distribution is Positive: 30% - Negative: 70%. Because of that, I decided to use roc_auc as the metric. I am then running a hyperparameter ...
Mael Fosso
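The error text above is truncated, so this only shows a configuration expected to work. `KNeighborsClassifier` has no `decision_function`, but scikit-learn's built-in `"roc_auc"` scorer falls back to `predict_proba` when `decision_function` is unavailable. A minimal sketch, assuming synthetic data with the same 70/30 imbalance and an illustrative `n_neighbors` grid:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with roughly 70% negatives and 30% positives (assumption)
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.7, 0.3], random_state=0)

# Scaling matters for distance-based models; make_pipeline names the steps in lowercase
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {"kneighborsclassifier__n_neighbors": [5, 15, 31]}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="roc_auc",  # uses predict_proba here, since KNN has no decision_function
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

If the error persists in the real code, passing `error_score="raise"` to `GridSearchCV` shows the full traceback instead of a failed score.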
0 votes · 1 answer · 53 views

How to use GridSearchCV on a customized estimator?

I built a custom estimator using sklearn's BaseEstimator and ClassifierMixin, but when it comes to cross-validation, GridSearchCV gives me nan values for the score. Here is the code of the estimator: ...
Yann
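`GridSearchCV` reports `nan` when fitting or scoring raises, because its default `error_score` is `np.nan`. A custom estimator generally works with it when `__init__` only stores its arguments, `fit` returns `self`, and fitted state lives in trailing-underscore attributes. A minimal sketch; `NearestCentroidLike` and its `shrinkage` parameter are hypothetical, not the asker's estimator:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class NearestCentroidLike(ClassifierMixin, BaseEstimator):
    """Hypothetical estimator: predicts the class whose (shrunken) centroid is closest."""

    def __init__(self, shrinkage=0.0):
        # Only store hyperparameters; no validation or computation here,
        # so get_params/set_params/clone behave as GridSearchCV expects
        self.shrinkage = shrinkage

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.classes_ = np.unique(y)
        overall = X.mean(axis=0)
        centroids = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Pull each class centroid toward the overall mean by `shrinkage`
        self.centroids_ = (1 - self.shrinkage) * centroids + self.shrinkage * overall
        return self  # fit must return self

    def predict(self, X):
        check_is_fitted(self, "centroids_")
        X = check_array(X)
        # Squared distance from every row to every centroid, shape (n_samples, n_classes)
        dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]


X, y = make_classification(n_samples=500, n_features=8, random_state=0)
# error_score="raise" surfaces the real exception instead of silently reporting nan
search = GridSearchCV(NearestCentroidLike(), {"shrinkage": [0.0, 0.5, 0.9]}, cv=5, error_score="raise")
search.fit(X, y)
print(search.best_params_, search.cv_results_["mean_test_score"])

# clone() rebuilds the estimator from get_params(); this must round-trip cleanly
cloned = clone(NearestCentroidLike(shrinkage=0.5))
```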
1 vote · 0 answers · 29 views

Problem with cross_val_score and Linear Regression

I have this code, and I don't understand where the error is when I calculate cross_val_score. You can find the code at the end. When I pass X and Y to cross_val_score, I get this output: [-1....
Marco Di Giacomo
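The asker's output is truncated, so the following only illustrates two common sources of surprising values. The default scorer for a regressor is R², which is at most 1 and can be negative when a fold is predicted worse than its own mean. Scorers named `neg_*` are negated so that higher is better. A minimal sketch on synthetic data (an assumption), with a shuffled `KFold` to avoid folds that follow any ordering in the rows:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (assumption; not the asker's dataset)
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# scoring=None -> LinearRegression.score, i.e. R^2 per fold
r2 = cross_val_score(LinearRegression(), X, y, cv=cv)
# Negated MSE: values are <= 0, and "closer to 0" means better
neg_mse = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="neg_mean_squared_error")
rmse = np.sqrt(-neg_mse)  # flip the sign back before taking the root

print("R^2 per fold:", r2.round(3))
print("RMSE per fold:", rmse.round(3))
```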
1 vote · 1 answer · 341 views

How to weight samples with sklearn's cross_validate for scoring only?

I am running a regression task on a dataset which is composed of both authentic and augmented samples. The augmented samples are generated by jittering the authentic ones. I would like to select the ...
majpark
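Fit parameters passed through `cross_validate` go to `fit`, not to the scorer. Recent scikit-learn versions add metadata routing for scoring weights, but that API depends on the version, so this sketch uses a plain loop instead. It fits on all training rows and scores only authentic rows by giving augmented rows a weight of 0. It also uses `GroupKFold` (a design choice, not from the question) so that an authentic sample and its jittered copies never straddle train and validation. The data is hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
# Hypothetical data: 100 authentic samples, each jittered 4 times
n_auth, n_aug = 100, 4
X_auth = rng.normal(size=(n_auth, 5))
y_auth = X_auth @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=n_auth)
X_aug = np.repeat(X_auth, n_aug, axis=0) + rng.normal(scale=0.05, size=(n_auth * n_aug, 5))
y_aug = np.repeat(y_auth, n_aug)

X = np.vstack([X_auth, X_aug])
y = np.concatenate([y_auth, y_aug])
# Each authentic sample and its copies share one group id
groups = np.concatenate([np.arange(n_auth), np.repeat(np.arange(n_auth), n_aug)])
# Scoring weights: 1 for authentic rows, 0 for augmented rows
score_weight = np.concatenate([np.ones(n_auth), np.zeros(n_auth * n_aug)])

scores = []
for train_idx, val_idx in GroupKFold(n_splits=5).split(X, y, groups):
    # Training uses every row in the fold, unweighted
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    # Only authentic rows contribute to the validation score
    scores.append(mean_squared_error(y[val_idx], pred, sample_weight=score_weight[val_idx]))

print(np.round(scores, 4))
```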
1 vote · 1 answer · 613 views

How does scikit's RFECV class compute cv_results_?

My understanding of Recursive Feature Elimination with Cross-Validation (sklearn.feature_selection.RFECV): you provide an algorithm, which is trained on the entire dataset and creates a feature importance ...
AvanishM
1 vote · 1 answer · 780 views

How to perform cross-validation with LightGBM.LGBMRanker, while keeping groups together?

I'm working on a search problem: I have a dataset of queries and URLs. Each (query, url) pair has a relevance (the target), a float that should preserve the order of the URLs for a given query. I would ...
Durand
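One way to keep each query's rows together is `GroupKFold` with the query id as the group. Ranking APIs such as LightGBM's `LGBMRanker.fit` take a `group` argument of per-query row counts for rows sorted by query, so each fold is re-sorted and its counts recomputed. A minimal sketch with hypothetical data; the ranker call is left as a comment so the block runs without LightGBM installed:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
# Hypothetical data: 30 queries with 3-7 urls each, rows sorted by query id
qid = np.repeat(np.arange(30), rng.integers(3, 8, size=30))
X = rng.normal(size=(len(qid), 4))
relevance = rng.random(len(qid))


def group_sizes(query_ids):
    """Rows per query id, in ascending query-id order (rows must be sorted by query id)."""
    _, counts = np.unique(query_ids, return_counts=True)
    return counts


folds = []
for train_idx, val_idx in GroupKFold(n_splits=5).split(X, relevance, groups=qid):
    # Sort each fold by query so every query forms one contiguous block
    train_idx = train_idx[np.argsort(qid[train_idx], kind="stable")]
    val_idx = val_idx[np.argsort(qid[val_idx], kind="stable")]
    train_groups, val_groups = group_sizes(qid[train_idx]), group_sizes(qid[val_idx])
    folds.append((train_idx, val_idx, train_groups, val_groups))
    # With LightGBM installed, a fold could then be fit roughly as:
    # ranker.fit(X[train_idx], relevance[train_idx], group=train_groups,
    #            eval_set=[(X[val_idx], relevance[val_idx])], eval_group=[val_groups])
```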
2 votes · 1 answer · 119 views

Getting different score values between manual cross validation and cross_val_score

I created a Python for loop to split the training dataset into stratified K folds and trained a classifier inside the loop. I then used the trained model to predict on the validation data. The ...
Tony
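A manual loop and `cross_val_score` agree when they see the same folds, the same metric, and a fresh model per fold. One frequent mismatch: with an integer `cv`, `cross_val_score` uses `StratifiedKFold` without shuffling for classifiers, while a hand-written loop may shuffle. A minimal sketch on synthetic data (an assumption) that passes one splitter object to both:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)
# A fixed integer random_state makes split() yield the same folds on every call
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

manual = []
for train_idx, val_idx in skf.split(X, y):
    fold_model = clone(model)  # fresh, unfitted copy per fold, as cross_val_score does
    fold_model.fit(X[train_idx], y[train_idx])
    manual.append(accuracy_score(y[val_idx], fold_model.predict(X[val_idx])))

auto = cross_val_score(model, X, y, cv=skf, scoring="accuracy")
print(np.round(manual, 4), np.round(auto, 4))
```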
0 votes · 1 answer · 99 views

Different Cross-Validation Techniques Yielding Identical Evaluation Metrics

I implement three ML algorithms (K-Nearest Neighbor, Decision Trees, and Random Forest) and use four different cross-validation techniques (Hold-Out Method, Leave-One-Out Method, K-Fold Cross-...
cigsesgi
0 votes · 0 answers · 137 views

Nested and non-nested cross validation exactly the same for different ML techniques

I am using K-fold CV (nested vs. non-nested) to figure out whether my classification model is overfitting the data. The code I used is essentially the same as the sklearn example: https://scikit-learn....
IbbyR
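For reference, the nested/non-nested comparison has this structure. The non-nested estimate is the best inner-CV score found by the search. The nested estimate re-runs the whole search inside each outer training fold and scores on the untouched outer fold. When hyperparameter choice barely affects performance, the two numbers can be very close. A minimal sketch on the bundled iris data with an illustrative grid (both assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
inner_cv = KFold(n_splits=4, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=2)

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=inner_cv)

# Non-nested: the best inner-CV score doubles as the performance estimate (optimistically biased)
search.fit(X, y)
non_nested_score = search.best_score_

# Nested: the grid search is repeated inside each outer training fold
nested_scores = cross_val_score(search, X, y, cv=outer_cv)

print(f"non-nested={non_nested_score:.3f} nested={nested_scores.mean():.3f}")
```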
0 votes · 1 answer · 334 views

How to use SelectFromModel with cross_validate in scikit-learn?

I am trying to use SelectFromModel to select features from my dataset before training a DecisionTreeClassifier model. I am also using cross_validate to evaluate the model performance. However, I am ...
Aman 007
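Putting `SelectFromModel` and the classifier in one `Pipeline` lets `cross_validate` refit the selection on each training fold, so validation rows never influence which features are kept. `return_estimator=True` exposes the fitted pipeline per fold for inspecting the selected features. A minimal sketch on synthetic data (an assumption):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=15, n_informative=4, random_state=0)

pipe = Pipeline([
    # Refit on every training fold; keeps features with importance >= mean importance
    ("select", SelectFromModel(DecisionTreeClassifier(random_state=0))),
    ("clf", DecisionTreeClassifier(random_state=0)),
])

res = cross_validate(pipe, X, y, cv=5, scoring="accuracy", return_estimator=True)
# Number of features each fold's selector kept
selected_per_fold = [est.named_steps["select"].get_support().sum() for est in res["estimator"]]
print(res["test_score"].round(3), selected_per_fold)
```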
0 votes · 1 answer · 542 views

Is this a proper cross-validation code with the Leave-One-Group-Out method?

Although the code below “works” (in that it does not raise an error), I get very high AUCs, which makes me wonder whether it somehow skips the type of cross-validation I am trying to make it ...
user22409235
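With `LeaveOneGroupOut`, the `groups` array must reach the splitter, and it is worth checking that no held-out group leaks into training. A per-group AUC is undefined when a held-out group contains only one class. This sketch therefore pools out-of-fold probabilities from `cross_val_predict` into one AUC, a choice that mixes scores from different fold models. The data and group labels are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

X, y = make_classification(n_samples=240, n_features=6, random_state=0)
groups = np.repeat(np.arange(8), 30)  # hypothetical group labels (e.g. subjects), 8 groups of 30 rows

logo = LeaveOneGroupOut()
# `groups` must be passed explicitly; LeaveOneGroupOut cannot split without it
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, groups=groups, cv=logo, method="predict_proba"
)[:, 1]
pooled_auc = roc_auc_score(y, proba)

# Sanity check: in every split, the held-out group is absent from the training rows
for train_idx, test_idx in logo.split(X, y, groups):
    assert np.intersect1d(groups[train_idx], groups[test_idx]).size == 0

print(f"{logo.get_n_splits(groups=groups)} splits, pooled AUC={pooled_auc:.3f}")
```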
1 vote · 0 answers · 36 views

RandomizedSearchCV does not compute anything

It's not my first time using RandomizedSearchCV, but it weirdly stops computing, or at least nothing is shown even though I set verbose=2. It's strange because it only fits the first 2 models and then ...
dracule22
0 votes · 1 answer · 181 views

How to apply TimeSeries Cross Validation in Python for Data with irregular Dates of Observations and uneven Observations per Date?

What is the best way to perform time-series cross-validation with irregular observation dates and an uneven number of observations per date? I have a dataset that I have been trying to use for XGBoost regression. The ...
Captain Nemo
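One approach is to run `TimeSeriesSplit` over the sorted unique dates instead of over rows, then map each date split back to row indices. Every observation from a date then lands on the same side, and all training dates precede all test dates, however many rows each date has. A minimal sketch; the dates and the helper `date_based_splits` are hypothetical:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
# Hypothetical data: 40 irregular dates in 2023, each observed 1-5 times
day_offsets = np.sort(rng.choice(365, size=40, replace=False)).astype("timedelta64[D]")
unique_days = np.datetime64("2023-01-01") + day_offsets
row_dates = np.repeat(unique_days, rng.integers(1, 6, size=40))


def date_based_splits(dates, n_splits=5):
    """Yield (train_idx, test_idx) row indices so that rows sharing a date stay together
    and every training date precedes every test date."""
    unique_dates = np.unique(dates)  # sorted
    positions = np.arange(len(unique_dates)).reshape(-1, 1)
    for train_d, test_d in TimeSeriesSplit(n_splits=n_splits).split(positions):
        train_idx = np.flatnonzero(np.isin(dates, unique_dates[train_d]))
        test_idx = np.flatnonzero(np.isin(dates, unique_dates[test_d]))
        yield train_idx, test_idx


splits = list(date_based_splits(row_dates))
for train_idx, test_idx in splits:
    print(len(train_idx), "train rows up to", row_dates[train_idx].max(),
          "|", len(test_idx), "test rows from", row_dates[test_idx].min())
```

Since scikit-learn's `cv` arguments accept an iterable of `(train, test)` index pairs, a list like `splits` can be passed directly to `cross_val_score` or a search object.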
0 votes · 2 answers · 895 views

Scikit-Learn LOOCV vs doing it manually give different results, why?

I built a model for a small dataset, and since the dataset is small, I used Leave-One-Out Cross-Validation (LOOCV) to check its accuracy. In short, I would remove one sample manually, ...
Ne-oL
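With accuracy, each LOOCV split scores 0 or 1, so the mean over splits is the LOOCV accuracy. A manual loop should match `cross_val_score(..., cv=LeaveOneOut())` when every fit, preprocessing included, sees only the remaining samples. A common source of mismatch is fitting a scaler on the full dataset in one version but not the other. A minimal sketch on a 50-row subset of iris (an assumption) with scaling inside the pipeline:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X, y = X[::3], y[::3]  # 50-row subset to mimic a small dataset


def make_model():
    # Scaling lives inside the pipeline, so it is refit without the held-out sample
    return make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


manual_hits = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i  # leave exactly sample i out
    model = make_model().fit(X[mask], y[mask])
    manual_hits.append(model.predict(X[i:i + 1])[0] == y[i])

loo_scores = cross_val_score(make_model(), X, y, cv=LeaveOneOut())
print(np.mean(manual_hits), loo_scores.mean())
```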
