28,181 questions
Advice
0
votes
5
replies
136
views
What are the key libraries and documentation for implementing anomaly detection in Python?
I am starting a role in industry where I will be working on anomaly detection using machine learning, particularly for data analysis tasks.
I would like to understand which tools and libraries are ...
0
votes
1
answer
59
views
How to make a 2-label confusion matrix and export it to a JSON file?
I have to train a convolutional neural network on a dataset. The NN itself works and does what it's supposed to but now I want to make a confusion matrix and export it into a json file for further ...
2
votes
1
answer
87
views
Sklearn Pipelines, adding features and column transformers
I'm just trying out/experimenting with sklearn. I'm using the California housing dataset, and I'm trying to make a pipeline to create some additional features, then take the logarithm of some features,...
Advice
0
votes
4
replies
80
views
How to cluster data based on a single value in Python?
I have object data stored in a JSON file:
OneDrive link to JSON
These represent markers which I am placing on a 2D map (the lat/lng in the file are YX positions on the map).
In reality the 3D objects ...
Advice
0
votes
2
replies
60
views
Generic sklearn template
I am creating a reusable scikit-learn pipeline for tabular data with numeric and categorical columns.
I want to:
Impute missing numeric values with the median
Scale numeric columns
Impute ...
Best practices
1
vote
1
reply
61
views
XGBoost Pipeline with ColumnTransformer: Handling categorical + numerical data and improving accuracy
I am building a classification model using XGBoost with a Scikit-learn Pipeline that includes preprocessing for both numerical and categorical features.
import pandas as pd
import numpy as np
from ...
Best practices
0
votes
2
replies
64
views
Is this modular scikit-learn pipeline design correct for a binary classification task?
I'm building a modular machine learning pipeline for a binary classification problem using scikit-learn.
I structured the code into separate functions for preprocessing, scaling, and model training.
I ...
3
votes
2
answers
159
views
Fitting linear regression and computing metrics in Python
I have two data series: model predictions and observations. I am able to make line plots of these series. I would like to add a linear regression fit of the two data series. I would also like to add ...
Advice
0
votes
1
reply
75
views
sklearn_boilerplate: should anything else be included for everyday use?
These are the sklearn imports I use in every notebook. Should anything else be included for everyday use?
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn....
0
votes
1
answer
91
views
How to maintain feature alignment when passing a custom model wrapper to SHAP KernelExplainer?
I'm working on an Explainable AI (XAI) project where I compare different model-agnostic frameworks (SHAP, LIME, DALEX). I'm using a custom wrapper to standardize my model's output (similar to a Scikit-...
2
votes
1
answer
118
views
How to identify users' profiles based on the output uplift values
I am currently learning the Causal Forest algorithm in Python. In an exercise, I need to evaluate a marketing campaign where a certain group of users have already received coupons. Given that Y is the ...
Advice
0
votes
7
replies
106
views
NumPy axis rules
I am a Python developer, but I don't understand one thing: what are NumPy axes? Sometimes, when I use sklearn, I get errors about axes. I also need an explanation of values and the reshape functions.
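A minimal sketch of the axis convention asked about above, assuming a small 2-D array. The array contents and variable names are illustrative only. It shows that `axis=0` collapses the rows, `axis=1` collapses the columns, and `reshape(-1, 1)` turns a 1-D array into the single-column 2-D shape that scikit-learn estimators generally expect for one feature.

```python
import numpy as np

# Illustrative 2-D array: 2 rows (samples), 3 columns (features).
a = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]

# axis=0 runs down the rows, so the result has one entry per column.
col_sums = a.sum(axis=0)            # shape (3,)

# axis=1 runs across the columns, so the result has one entry per row.
row_sums = a.sum(axis=1)            # shape (2,)

# A 1-D array has a single axis; reshape adds the second one.
feature = np.array([1.0, 2.0, 3.0])
as_column = feature.reshape(-1, 1)  # 3 samples, 1 feature
as_row = feature.reshape(1, -1)     # 1 sample, 3 features

print(col_sums, row_sums, as_column.shape, as_row.shape)
```

The `-1` asks NumPy to infer that dimension from the array's length, which is why the same call works for a feature of any size.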
1
vote
0
answers
51
views
sklearn's FactorAnalysis varimax orthogonal rotation increases correlation of factors
I'm using Scikit-Learn's FactorAnalysis in an application that relies on the assumption that the factors are uncorrelated. It would be great to have more interpretable factors, and an orthogonal ...
Best practices
0
votes
1
reply
140
views
Best way to leverage Polars multithreading with scikit-learn compatibility
I've been working on a project for rapidly testing thousands of outcome variables on a standard set of predictors and covariates using Polars. It's working very well, with speedups as high as 16x ...
1
vote
3
answers
119
views
What does PoissonRegression.predict() actually return in sklearn?
What is being returned by PoissonRegression.predict() in sklearn when I am predicting target values from data? Is it the actual predicted value of the target?
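A hedged sketch for the question above, assuming it refers to scikit-learn's `PoissonRegressor` in `sklearn.linear_model` (as far as I know, that is the name of the class scikit-learn provides). The synthetic data and coefficients are made up for illustration. With the log link this estimator uses, `predict()` should return the estimated conditional mean of the target, `exp(X @ coef_ + intercept_)`, which is a positive float and not an integer count.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Synthetic count data (illustrative only): the true rate is log-linear in X.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = rng.poisson(np.exp(0.5 + X @ np.array([1.0, -0.5])))

model = PoissonRegressor().fit(X, y)

# predict() gives the expected value E[y | X] on the original target scale.
pred = model.predict(X[:5])

# The same quantity computed by hand from the fitted parameters.
manual = np.exp(X[:5] @ model.coef_ + model.intercept_)

print(pred)
print(np.allclose(pred, manual))
```

Because the output is a mean rate, it can be fractional even though every observed target is an integer.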