28,285 questions
0 votes, 0 answers, 17 views
HalvingGridSearchCV cannot fit multi label DecisionTreeClassifier
I'm trying to use HalvingGridSearch to find the best DecisionTree model. My model performs a multi-label prediction on a single example, it is trained on a batch of data of size (n_samples x ...
-3 votes, 0 answers, 19 views
When should I use Random Forest instead of XGBoost, and vice versa?
I’ve been using both Random Forest and XGBoost for classification tasks. In most cases, I notice that XGBoost gives slightly better accuracy. However, I’m unsure about the specific scenarios where one ...
0 votes, 2 answers, 33 views
Python Sklearn.Model_Selection giving error numpy.dtype size changed
I have a train test split code
from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(new_cleaned_df, test_size=0.05, random_state=42, shuffle=True)
train_df....
1 vote, 1 answer, 48 views
Linear regression prediction does not display properly
I want to fit 2 different linear regressions for 2 different plots, but on the same figure. I have a problem with y1_pred because it does not span the full range of the y axis where the scatter points are.
model1 = ...
0 votes, 1 answer, 44 views
How to preprocess date in Isolation Forest sklearn [closed]
I am using sklearn's IsolationForest model to detect anomalies on a time-series dataset. One of the features is date with the format MM-YYYY, the other features are numeric values.
What is the best ...
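One common option for the MM-YYYY question above, shown only as a hedged sketch: parse the strings with pandas and split them into a year column plus a cyclic sine/cosine encoding of the month. The sample values and the column names `year`, `month_sin`, and `month_cos` are made up for illustration, and whether this is the best encoding depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of MM-YYYY strings; the asker's data is not shown.
dates = pd.Series(["01-2023", "06-2023", "12-2024"])

# Parse with an explicit format so "01-2023" is read as month-year.
parsed = pd.to_datetime(dates, format="%m-%Y")

# Year as a plain number; month as a point on a circle so that
# December and January end up close together.
features = pd.DataFrame({
    "year": parsed.dt.year,
    "month_sin": np.sin(2 * np.pi * parsed.dt.month / 12),
    "month_cos": np.cos(2 * np.pi * parsed.dt.month / 12),
})
```

These numeric columns can then be concatenated with the other numeric features before fitting IsolationForest.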
-2 votes, 1 answer, 87 views
Why does my RandomForestClassifier overfit despite using cross-validation? [closed]
I'm working on a binary classification problem using RandomForestClassifier from scikit-learn. My dataset has ~10,000 rows and ~20 numerical features. I used train_test_split and cross_val_score, but ...
0 votes, 0 answers, 25 views
Keras SKLearnClassifier wrapper can't fit MNIST data
I'm trying to use the SKLearnClassifier Keras wrapper to do some grid searching and cross validation using the sklearn library but I'm unable to get the model to work properly.
def build_model(X, y, ...
-2 votes, 0 answers, 22 views
Which library is more reliable for LDA and perplexity: gensim or scikit-learn? [closed]
When calculating the perplexity of an LDA model for N topics using train-test split with KFold, I noticed that in Gensim, the perplexity consistently increases as the number of topics grows—resulting ...
2 votes, 1 answer, 35 views
How to fit scaler for different subsets of rows depending on group variable and include it in a Pipeline?
I have a data set like the following and want to scale the data using any of the scalers in sklearn.preprocessing.
Is there an easy way to fit this scaler not over the whole data set, but per group? ...
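A minimal sketch of one way to approach per-group scaling, assuming the input is a pandas DataFrame with a unique index and a column naming the group. `GroupwiseScaler` and the column names `group` and `x` are hypothetical, not part of scikit-learn; the class simply fits one clone of the chosen scaler per group and drops the group column from the output.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class GroupwiseScaler(BaseEstimator, TransformerMixin):
    """Hypothetical helper: fits a separate scaler for each group."""

    def __init__(self, group_col, scaler=None):
        self.group_col = group_col
        self.scaler = scaler

    def fit(self, X, y=None):
        base = self.scaler if self.scaler is not None else StandardScaler()
        self.feature_cols_ = [c for c in X.columns if c != self.group_col]
        # One independent scaler per group value seen during fit.
        self.scalers_ = {
            g: clone(base).fit(part[self.feature_cols_])
            for g, part in X.groupby(self.group_col)
        }
        return self

    def transform(self, X):
        out = X[self.feature_cols_].astype(float).copy()
        # A group not seen during fit raises KeyError here.
        for g, part in X.groupby(self.group_col):
            out.loc[part.index, self.feature_cols_] = (
                self.scalers_[g].transform(part[self.feature_cols_])
            )
        return out


# Made-up data: two groups on very different scales.
df = pd.DataFrame({"group": ["a", "a", "b", "b"],
                   "x": [1.0, 3.0, 10.0, 30.0]})
pipe = Pipeline([("scale", GroupwiseScaler(group_col="group"))])
scaled = pipe.fit_transform(df)
```

Because it follows the fit/transform convention, the class can sit in front of an estimator in a Pipeline; a downstream model then only sees the scaled feature columns.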
0 votes, 1 answer, 43 views
Confirm understanding of decision_function in Isolation Forest
I am looking to better understand sklearn's IsolationForest decision_function. My understanding from this previous Stack Overflow post, What is the difference between decision function and ...
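For reference, a small sketch of how the documented quantities relate, on made-up random data: `decision_function` is `score_samples` shifted by the fitted `offset_`, and `predict` labels negative decision values as outliers (-1). This only checks the relationship numerically; it does not address the rest of the truncated question.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data, purely illustrative.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))

clf = IsolationForest(random_state=0).fit(X)

raw = clf.score_samples(X)           # lower means more abnormal
scores = clf.decision_function(X)    # raw score shifted by offset_
labels = clf.predict(X)              # -1 for outliers, 1 for inliers

# decision_function is score_samples minus the fitted offset_.
shifted = raw - clf.offset_
# predict thresholds the decision function at zero.
thresholded = np.where(scores < 0, -1, 1)
```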
1 vote, 1 answer, 38 views
Why does RandomForestClassifier in scikit-learn predict even on all-NaN input?
I am training a random forest classifier with scikit-learn in Python; see the code below:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(random_state=42)
rf.fit(X = df.drop("...
1 vote, 2 answers, 38 views
reg.predict is telling me I am not providing an array
It seems I have an issue with an array that I thought I had coded correctly. When I ask for reg.score or reg.coef_ the code works fine, but when I try to predict, it throws an error saying it is ...
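The asker's code and the full error are not shown, so this is only a sketch of the most common cause: scikit-learn's `predict` expects a 2D array of shape (n_samples, n_features), and a bare scalar or 1D value is rejected. The data and the name `reg` below are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y = 2 * x, with X already shaped (n_samples, n_features).
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
reg = LinearRegression().fit(X, y)

# A single new sample still needs two dimensions: one row, one column.
new_x = np.array([4.0]).reshape(1, -1)
pred = reg.predict(new_x)
```

Calling `reg.predict(4.0)` or `reg.predict([4.0])` instead raises a ValueError asking for a 2D array.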
0 votes, 0 answers, 19 views
Get analytical equation of RF regressor model [duplicate]
I have the following dataset:
X1 X2 X3 y
0 0.548814 0.715189 0.602763 0.264556
1 0.544883 0.423655 0.645894 0.774234
2 0.437587 0.891773 0.963663 0.456150
3 ...
2 votes, 0 answers, 56 views
Different Feature Selection Results Between Local (Ubuntu VM) and Databricks Using sklearn's SequentialFeatureSelector
I am migrating from running my machine learning pipeline in VS Code with Ubuntu on a VM into Databricks. When I test the same dataset using the same code, I get different selected features from ...
0 votes, 0 answers, 51 views
TabPFN feature selection raises KeyError(f"None of [{key}] are in the [{axis_name}]")
I trained a TabPFN model and then tried applying a sequential feature selector to it to select the important features. I keep getting this error:
KeyError(f"None of [{key}] are in the [{axis_name}...