All Questions

0 votes
1 answer
44 views

How to preprocess date in Isolation Forest sklearn [closed]

I am using sklearn's IsolationForest model to detect anomalies in a time-series dataset. One of the features is a date in MM-YYYY format; the other features are numeric. What is the best ...
Mar • 21
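
One possible way to turn an MM-YYYY string into numeric inputs for a model such as IsolationForest is sketched below. This is only an illustration under stated assumptions: the helper name `encode_month_year`, the `origin_year` default, and the choice of a running month index plus a sine/cosine pair for seasonality are all hypothetical and come from neither the question nor scikit-learn.

```python
import math
from datetime import datetime


def encode_month_year(value, origin_year=2000):
    """Convert an 'MM-YYYY' string into numeric features (hypothetical helper)."""
    parsed = datetime.strptime(value, "%m-%Y")
    # Running month count since January of origin_year (captures the trend).
    months_since_origin = (parsed.year - origin_year) * 12 + (parsed.month - 1)
    # Cyclical encoding so that December and January end up close together.
    angle = 2 * math.pi * (parsed.month - 1) / 12
    return {
        "months_since_origin": months_since_origin,
        "month_sin": math.sin(angle),
        "month_cos": math.cos(angle),
    }


features = encode_month_year("03-2021")
print(features["months_since_origin"])  # 254
```

Whether the trend feature, the cyclical pair, or both are useful depends on the data; the sketch only shows the mechanics of the conversion.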
-2 votes
1 answer
87 views

Why does my RandomForestClassifier overfit despite using cross-validation? [closed]

I'm working on a binary classification problem using RandomForestClassifier from scikit-learn. My dataset has ~10,000 rows and ~20 numerical features. I used train_test_split and cross_val_score, but ...
Eshaan Saha
0 votes
1 answer
43 views

Confirm understanding of decision_function in Isolation Forest

I am looking to better understand sklearn IsolationForest decision_function. My understanding from this previous Stack Overflow post, What is the difference between decision function and ...
Mar • 21
1 vote
1 answer
38 views

Why does RandomForestClassifier in scikit-learn predict even on all-NaN input?

I am training a random forest classifier in Python with sklearn; see the code below: from sklearn.ensemble import RandomForestClassifier rf = RandomForestClassifier(random_state=42) rf.fit(X = df.drop("...
lsr729 • 844
0 votes
0 answers
19 views

Get analytical equation of RF regressor model [duplicate]

I have the following dataset:

         X1        X2        X3         y
0  0.548814  0.715189  0.602763  0.264556
1  0.544883  0.423655  0.645894  0.774234
2  0.437587  0.891773  0.963663  0.456150
3  ...
quant • 4,492
2 votes
0 answers
56 views

Different Feature Selection Results Between Local (Ubuntu VM) and Databricks Using sklearn's SequentialFeatureSelector

I am migrating my machine learning pipeline from VS Code on an Ubuntu VM to Databricks. When I test the same dataset using the same code, I get different selected features from ...
Mattie • 31
0 votes
0 answers
51 views

TabPFN feature selection raises KeyError(f"None of [{key}] are in the [{axis_name}]")

I trained a TabPFN model and then tried to apply a sequential feature selector to it to select important features. I've been getting this error: KeyError(f"None of [{key}] are in the [{axis_name}...
Adam • 156
2 votes
1 answer
109 views

Why does SequentialFeatureSelector return at most "n_features_in_ - 1" predictors?

I have a training dataset with six features and I am using SequentialFeatureSelector to find an "optimal" subset of the features for a linear regression model. The following code returns ...
CodingLikeAFox
0 votes
1 answer
152 views

Length of features is not equal to the length of SHAP Values

I'm running a random forest model and, to get some feature importances, I'm trying to run a SHAP analysis. The problem is that every time I try to plot the SHAP values, I keep getting this error: ...
Starterkit07
1 vote
2 answers
206 views

Pipeline FutureWarning: This Pipeline instance is not fitted yet [closed]

I am working on a fairly simple machine learning problem in the form of a practicum. I am using the following code to preprocess the data: from preprocess.date_converter import DateConverter from ...
Santiago
1 vote
1 answer
32 views

Why is VotingClassifier performance with voting set to "hard" different with different weights?

I wanted to test VotingClassifier from sklearn and compare performance with different parameters. I used a param grid and then noticed something I could not explain. I prepared three classifiers gnb = ...
Krzysztof
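
As a rough illustration of why weights can change a hard vote, here is a pure-Python sketch of weighted majority voting. It is not scikit-learn's code: `weighted_hard_vote` is a hypothetical helper, and breaking ties toward the smallest label is an assumption made here so the result is deterministic.

```python
from collections import defaultdict


def weighted_hard_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: one predicted label per classifier
    weights: one non-negative weight per classifier
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Highest total wins; ties go to the smallest label (assumption).
    return min(totals, key=lambda label: (-totals[label], label))


# Three classifiers disagree on one sample: the first says 0, the others say 1.
print(weighted_hard_vote([0, 1, 1], [1, 1, 1]))  # 1
print(weighted_hard_vote([0, 1, 1], [3, 1, 1]))  # 0
```

With equal weights the two classifiers predicting 1 outvote the first; once the first classifier's weight exceeds the combined weight of the other two, its label wins, so the ensemble's accuracy can shift even though every individual prediction is unchanged.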
0 votes
1 answer
24 views

In OrdinalEncoder, what does handle_unknown="use_encoded_value" do?

I've done my research, but I'm not satisfied with the answers I found in the documentation and from Gemini. What does use_encoded_value mean? Do I have to pass an argument to act as an ...
Remian-Feral
0 votes
2 answers
100 views

How to train an sklearn model on different DataFrames?

I have an ML model built with "knn" in scikit-learn and noticed that the more data I have, the more precise my model's predictions become. The problem is, I have lots of DataFrames ...
Guilherme Diniz Queiroz De Car
0 votes
0 answers
39 views

Tweedie Regression: power >= 2 gives "Some value(s) of y are out of the valid range of the loss", but y values are not

I'm running a Tweedie Regression, and for powers >= 2, I get an error telling me that my y values are out of the range of the HalfTweedieLoss. I understand the valid range of y for this loss to be &...
Laura Chutny
0 votes
1 answer
72 views

ValueError: X has 7 features, but ColumnTransformer expects 13 features

I have the following code, in which I try to predict the price of tools using Poisson regression. # --- Load and Prepare Data --- y = train['PriceToday'] X = train.drop(columns=['PriceToday']) # ...
H_H • 17
