Questions tagged [scikit-learn]
scikit-learn is a popular machine learning package for Python that has simple and efficient tools for predictive data analysis. Topics include classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
2,304 questions
0
votes
0
answers
45
views
Qiskit Problem: this solution is a bit slow, is there a way to make it faster and increase the accuracy a little bit?
I'm currently making a small binary classification program using Quantum Machine Learning (EstimatorQNN to be more specific). My program classifies data inside the Wisconsin Breast Cancer database and ...
5
votes
1
answer
92
views
Sklearn ROC Curve not square
I am using sklearn.metrics.roc_curve to calculate the points of a ROC curve.
This is the output I obtain.
This plot does not look as I would expect it to. The line ...
3
votes
1
answer
103
views
Principal Component Analysis - how to determine the key features that contribute to PC1 using scikit-learn in Python
I struggle to select the key features that contribute to PC1. I will use the public breast cancer dataset to illustrate the issue. Please feel free to point me to a previous post if this question has ...
0
votes
0
answers
21
views
Runtime complexity of scikit-learn’s One-vs-Rest LogisticRegression (LBFGS) vs. RidgeClassifier
I’m working through the runtime analysis of scikit-learn’s OneVsRestClassifier for two cases:
LogisticRegression (solver=lbfgs, ...
1
vote
1
answer
92
views
Sklearn's One-hot encoder adds an extra column for NaNs which are not there
I would appreciate your advice on how to resolve the following issue.
I am working with a dataset that contains two categorical features (actually, more than two, but two are enough to illustrate the ...
0
votes
0
answers
30
views
expected the model to forecast resolution time more accurately based on past ticket patterns. I was also hoping to unde
I want to build a model that forecasts ticket resolution time for data science software support tickets. I’ve calculated queuing time and resolution time from ...
2
votes
1
answer
56
views
Clarification about scaling a dataset for an MLP regression model and use of the scaler's inverse transform
I am quite confused about the pre-processing scaling process.
I have a dataset with several meteorological quantities (pressure, temperature, wind direction, etc.) and I am using it to forecast the ...
4
votes
1
answer
77
views
How to build a smooth model from various data points
I am trying to model the arc of a basketball free-throw trajectory. Usually, this dataset has 6 points per person, each giving the height of the basketball at various seconds after the player ...
1
vote
0
answers
40
views
Nested cross-validation: which implementation to use? different purpose?
I am learning Machine Learning and exploring nested cross-validation. I don't understand the example given in scikit-learn as the model seems to learn from the whole dataset and the evaluation is not ...
3
votes
2
answers
144
views
Much higher scoring metrics with classification_report than cross_validate
I'm training a classifier on the DAIGT dataset. The objective is to differentiate human from AI text and so this is a binary classification problem. As a baseline before I move onto an LLM classifier, ...
7
votes
2
answers
159
views
Loan prediction model relying almost entirely on Credit_History and ignoring other features
I'm building a machine learning model to predict loan approval rate. My dataset includes features like:
Credit_History
...
5
votes
1
answer
81
views
"Singular values of x" in LinearRegression
LinearRegression has an attribute singular_ which returns "singular values of x". According to a definition I found: "singularity is ... when a ...
4
votes
0
answers
79
views
Why is DecisionTree using the same feature and the same condition twice
When trying to fit scikit-learn DecisionTreeClassifier on my data, I am observing some weird behavior.
x[54] (a boolean feature) ...
4
votes
0
answers
27
views
Low Accuracy from Geospatial Random Forest ML modeling problem - Training Exported from QGIS, SCP
I am doing a geospatial assessment integrated with ML modeling. The problem is the very low accuracy percentage: as the number of training features increases, it gets lower. What could be the solution to such ...
1
vote
0
answers
38
views
Isolation Forest sample size
I am using sklearn's Isolation Forest as a model to detect anomalies. My dataset is relatively small, 50 records with only 2-3 features.
To prevent any overfitting, what would you recommend to tune ...