All Questions
Tagged with machine-learning scikit-learn
8,604 questions
0 votes, 1 answer, 44 views
How to preprocess date in Isolation Forest sklearn [closed]
I am using sklearn's IsolationForest model to detect anomalies on a time-series dataset. One of the features is date with the format MM-YYYY, the other features are numeric values.
What is the best ...
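One possible encoding for the MM-YYYY feature, offered as a sketch and not as the accepted answer: split the string into a running month count (to keep the trend) and a sine/cosine pair (to keep the seasonal position). The helper name encode_month_year and the base_year default are hypothetical choices made for this illustration.

```python
import math

def encode_month_year(value, base_year=2000):
    """Turn an 'MM-YYYY' string into (months since base_year, sin, cos).

    base_year is an arbitrary reference point assumed for this sketch.
    """
    month_str, year_str = value.split("-")
    month, year = int(month_str), int(year_str)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {value!r}")
    # Monotonic component: preserves ordering and long-term trend.
    months_elapsed = (year - base_year) * 12 + (month - 1)
    # Cyclical component: December and January end up close together.
    angle = 2 * math.pi * (month - 1) / 12
    return months_elapsed, math.sin(angle), math.cos(angle)

# Each date becomes three numeric columns that can sit next to the other features.
features = [encode_month_year(d) for d in ["01-2023", "07-2023", "12-2024"]]
```

The resulting numeric columns can be passed to IsolationForest alongside the other features; whether the trend column, the cyclical pair, or both are useful depends on the data.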
-2 votes, 1 answer, 87 views
Why does my RandomForestClassifier overfit despite using cross-validation? [closed]
I'm working on a binary classification problem using RandomForestClassifier from scikit-learn. My dataset has ~10,000 rows and ~20 numerical features. I used train_test_split and cross_val_score, but ...
0 votes, 1 answer, 43 views
Confirm understanding of decision_function in Isolation Forest
I am looking to better understand sklearn IsolationForest decision_function. My understanding from this previous Stack Overflow post, What is the difference between decision function and ...
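For reference, the relationship between the three scoring methods can be checked directly. This is a small sketch on synthetic data (the data and variable names are made up for illustration): decision_function is score_samples shifted by the fitted offset_, and predict returns -1 wherever decision_function is negative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data assumed for this sketch: a Gaussian blob plus a few shifted rows.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
X[:5] += 6

clf = IsolationForest(random_state=0).fit(X)

scores = clf.score_samples(X)        # lower means more abnormal
decision = clf.decision_function(X)  # score_samples shifted by offset_
labels = clf.predict(X)              # -1 for outliers, 1 for inliers

# decision_function(X) equals score_samples(X) - offset_
relation_holds = np.allclose(decision, scores - clf.offset_)
# predict(X) is -1 exactly where decision_function(X) is negative
labels_match_sign = np.array_equal(labels, np.where(decision < 0, -1, 1))
```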
1 vote, 1 answer, 38 views
Why does RandomForestClassifier in scikit-learn predict even on all-NaN input?
I am training a random forest classifier with sklearn in Python; see the code below:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=42)
rf.fit(X = df.drop("...
0 votes, 0 answers, 19 views
Get analytical equation of RF regressor model [duplicate]
I have the following dataset:
X1 X2 X3 y
0 0.548814 0.715189 0.602763 0.264556
1 0.544883 0.423655 0.645894 0.774234
2 0.437587 0.891773 0.963663 0.456150
3 ...
2 votes, 0 answers, 56 views
Different Feature Selection Results Between Local (Ubuntu VM) and Databricks Using sklearn's SequentialFeatureSelector
I am migrating from running my machine learning pipeline in VS Code with Ubuntu on a VM into Databricks. When I test the same dataset using the same code, I get different selected features from ...
0 votes, 0 answers, 51 views
TabPFN feature selection raises KeyError(f"None of [{key}] are in the [{axis_name}]")
I trained a TabPFN model and then tried applying a sequential feature selector to it to pick the important features. I've been getting this error:
KeyError(f"None of [{key}] are in the [{axis_name}...
2 votes, 1 answer, 109 views
Why does SequentialFeatureSelector return at most "n_features_in_ - 1" predictors?
I have a training dataset with six features and I am using SequentialFeatureSelector to find an "optimal" subset of the features for a linear regression model. The following code returns ...
0 votes, 1 answer, 152 views
Length of features is not equal to the length of SHAP Values
I'm running a random forest model and, to get some feature importance, I'm trying to run a SHAP analysis. The problem is that every time I try to plot the SHAP values, I keep getting this error:
...
1 vote, 2 answers, 206 views
Pipeline FutureWarning: This Pipeline instance is not fitted yet [closed]
I am working on a fairly simple machine learning problem in the form of a practicum. I am using the following code to preprocess the data:
from preprocess.date_converter import DateConverter
from ...
1 vote, 1 answer, 32 views
Why is VotingClassifier performance with voting set to "hard" different with different weights?
I wanted to test VotingClassifier from sklearn and compare performance with different parameters. I used a param grid and then noticed something I could not explain.
I prepared three classifiers
gnb = ...
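As background for this question: the scikit-learn documentation describes weights in hard voting as weighting the occurrences of predicted class labels, so changing them can change the majority. The following is a minimal NumPy sketch of that idea, not the library's own code; weighted_hard_vote and the toy predictions are hypothetical.

```python
import numpy as np

def weighted_hard_vote(predictions, weights):
    """Pick, per sample, the class label with the largest total weight.

    predictions: array of shape (n_samples, n_classifiers) with integer labels.
    weights: one weight per classifier.
    """
    predictions = np.asarray(predictions)
    return np.array(
        [np.bincount(row, weights=weights).argmax() for row in predictions]
    )

# Toy predictions from three classifiers for two samples (made up for this sketch).
preds = np.array([[0, 1, 1],
                  [0, 0, 1]])

equal = weighted_hard_vote(preds, [1, 1, 1])   # plain majority vote
skewed = weighted_hard_vote(preds, [3, 1, 1])  # first classifier outweighs the other two
```

With equal weights the first sample goes to class 1 by two votes to one; with weights [3, 1, 1] the first classifier's single vote for class 0 carries more total weight, so the result flips.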
0 votes, 1 answer, 24 views
In OrdinalEncoder, what does handle_unknown='use_encoded_value' do?
I've done my research, but I'm not satisfied with the answers I found in either the documentation or Gemini. What does use_encoded_value mean? Do I have to pass an argument to act as an ...
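A short illustration of the option, as a sketch with made-up categories: with handle_unknown="use_encoded_value", the encoder does not raise on categories it did not see during fit and instead writes the value given in unknown_value (here -1, an arbitrary choice for this example).

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# unknown_value must be supplied together with handle_unknown="use_encoded_value";
# -1 is an arbitrary sentinel chosen for this sketch.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

# Fit on three known categories (sorted internally: blue=0, green=1, red=2).
enc.fit(np.array([["red"], ["green"], ["blue"]], dtype=object))

# "purple" was never seen during fit, so it is mapped to unknown_value.
encoded = enc.transform(np.array([["green"], ["purple"]], dtype=object))
```

With the default handle_unknown="error", the same transform call would raise a ValueError for "purple".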
0 votes, 2 answers, 100 views
How to train an sklearn model on different DataFrames?
I have an ML model made with "knn" in scikit-learn and noticed that the more data I have, the more precise my model's predictions get. The problem is, I have lots of DataFrames ...
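One common approach, sketched here under the assumption that all DataFrames share the same columns: concatenate them and fit once. Calling fit on a KNeighborsClassifier again replaces what it stored before, so fitting one DataFrame after another does not accumulate data. The column names and values below are made up for illustration.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Two hypothetical DataFrames with identical columns.
df_a = pd.DataFrame({"x1": [0.0, 0.1, 1.0], "x2": [0.0, 0.2, 1.0], "label": [0, 0, 1]})
df_b = pd.DataFrame({"x1": [0.9, 1.1, 0.2], "x2": [1.0, 0.9, 0.1], "label": [1, 1, 0]})

# Stack the rows into one training set, then fit a single model on all of it.
combined = pd.concat([df_a, df_b], ignore_index=True)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(combined[["x1", "x2"]], combined["label"])

# Predict for a new point close to the class-0 rows.
prediction = knn.predict(pd.DataFrame({"x1": [0.05], "x2": [0.05]}))
```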
0 votes, 0 answers, 39 views
Tweedie Regression: power >= 2 gives "Some value(s) of y are out of the valid range of the loss", but y values are not
I'm running a Tweedie Regression, and for powers >= 2, I get an error telling me that my y values are out of the range of the HalfTweedieLoss. I understand the valid range of y for this loss to be &...
0 votes, 1 answer, 72 views
ValueError: X has 7 features, but ColumnTransformer expects 13 features
I have the following code where I try to predict the price of tools using Poisson regression.
# --- Load and Prepare Data ---
y = train['PriceToday']
X = train.drop(columns=['PriceToday'])
# ...