Newest 'python+scikit-learn+pandas' Questions

2 votes

1 answer

38 views

How to fit scaler for different subsets of rows depending on group variable and include it in a Pipeline?

I have a data set like the following and want to scale the data using any of the scalers in sklearn.preprocessing. Is there an easy way to fit this scaler not over the whole data set, but per group? ...

ascripter

6,265

asked Apr 16 at 14:58
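One way the per-group fitting described in this question could be approached is with a small custom transformer that keeps one fitted scaler per group value. The following is only a sketch under assumptions: `GroupScaler` and the column names `group` and `x` are hypothetical, the input is a DataFrame with a unique index, and groups unseen during `fit` are not handled (they raise a `KeyError`).

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.preprocessing import StandardScaler


class GroupScaler(BaseEstimator, TransformerMixin):
    """Hypothetical helper: fits one copy of `scaler` per value of `group_col`."""

    def __init__(self, scaler=None, group_col="group"):
        self.scaler = scaler
        self.group_col = group_col

    def fit(self, X, y=None):
        base = StandardScaler() if self.scaler is None else self.scaler
        # Every column except the grouping column is scaled.
        self.feature_cols_ = [c for c in X.columns if c != self.group_col]
        self.scalers_ = {
            key: clone(base).fit(part[self.feature_cols_])
            for key, part in X.groupby(self.group_col)
        }
        return self

    def transform(self, X):
        # Work on a float copy so the caller's DataFrame is left untouched.
        out = X[self.feature_cols_].astype(float)
        for key, part in X.groupby(self.group_col):
            out.loc[part.index, self.feature_cols_] = self.scalers_[key].transform(
                part[self.feature_cols_]
            )
        return out


# Illustrative data (assumption): two groups on very different scales.
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "x": [1.0, 3.0, 10.0, 30.0]})
scaled = GroupScaler().fit_transform(df)
```

Because the class follows the usual `fit`/`transform` interface, it can be placed as a step in a `Pipeline`, provided the grouping column is still present in the data that reaches it.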

1 vote

1 answer

56 views

How to apply different model on different rows of a pandas dataframe?

I have a pandas dataframe that looks like this: import pandas as pd df = pd.DataFrame({'id': [1,2], 'var1': [5,6], 'var2': [20,60], 'var3': [8, -2], 'model_version': ['model_a', 'model_b']}) I have 2 ...

quant

4,492

asked Feb 20 at 10:38
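A possible way to route rows to different models is to keep the fitted models in a dictionary keyed by `model_version` and predict group by group. This is a sketch, not the asker's setup: the two `DummyRegressor` instances are stand-ins (an assumption) that return a constant, so the routing is easy to verify, and the `features` list and `prediction` column are hypothetical names.

```python
import pandas as pd
from sklearn.dummy import DummyRegressor

df = pd.DataFrame({'id': [1, 2], 'var1': [5, 6], 'var2': [20, 60],
                   'var3': [8, -2], 'model_version': ['model_a', 'model_b']})
features = ['var1', 'var2', 'var3']

# Stand-in models (assumption): constant predictors instead of real fitted models.
models = {
    'model_a': DummyRegressor(strategy='constant', constant=1.0).fit(df[features], [0, 0]),
    'model_b': DummyRegressor(strategy='constant', constant=2.0).fit(df[features], [0, 0]),
}

# Predict each group with its own model and write results back by index.
preds = pd.Series(index=df.index, dtype=float)
for version, part in df.groupby('model_version'):
    preds.loc[part.index] = models[version].predict(part[features])
df['prediction'] = preds
```

Grouping first means each model is called once per group rather than once per row, which matters on larger frames.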

-1 votes

1 answer

48 views

Error in Pipeline code in ScikitLearn using Python

In the pipeline code below, even though I have encoded the sex column, I am getting a string-to-float error. from sklearn.compose import ColumnTransformer from sklearn.pipeline import Pipeline from ...

Abubakker Hashmi

9

asked Jan 19 at 13:10

2 votes

0 answers

387 views

Model Training for Segmentation [duplicate]

I want to train and evaluate models to find the best models for my segments, but something is going wrong in sklearn with the tags and the estimators, and I can't figure out the issue. There might be ...

Sdeb

21

asked Dec 18, 2024 at 1:17

1 vote

1 answer

204 views

Ignore NaN to calculate mean_absolute_error

I'm trying to calculate MAE (Mean absolute error). In my original DataFrame, I have 1826 rows and 3 columns. I'm using columns 2 and 3 to calculate MAE. But, in column 2, I have some NaN values. When ...

Daniel M M

75

asked Nov 12, 2024 at 19:20
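A common approach here is to drop the rows where either column is missing before calling `mean_absolute_error`, since the function does not accept NaN. A minimal sketch, assuming hypothetical column names `observed` and `predicted` and made-up values:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Illustrative data (assumption): one NaN in the first column.
df = pd.DataFrame({
    "observed": [1.0, np.nan, 3.0, 4.0],
    "predicted": [1.5, 2.0, 2.0, 4.0],
})

# Keep only rows where both values are present, then score those rows.
valid = df[["observed", "predicted"]].dropna()
mae = mean_absolute_error(valid["observed"], valid["predicted"])
```

Dropping on both columns together keeps the two series aligned, which would not be guaranteed if each were filtered separately.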

-2 votes

1 answer

87 views

Cannot convert dataframe column to a int64 data type

I have a problem. In my Pandas DataFrame, I have a column called 'job'. I've created a simple custom transformer that maps the values in that column to the corresponding type of job. The ...

coffee_programmer

1

asked Nov 11, 2024 at 2:00

0 votes

1 answer

207 views

How to create a scaler applying log transformation and MinMaxScaler in sklearn

I want to apply log() to my DataFrame and MinMaxScaler() together. I want the output to be a pandas DataFrame() with indexes and columns from the original data. I want to use the parameters used to ...

Guilherme Parreira

1,051

asked Nov 7, 2024 at 18:41

3 votes

2 answers

128 views

How to preserve data types when working with pandas and sklearn transformers?

While working with a large sklearn Pipeline (fit using a DataFrame) I ran into an error that led back to a wrong data type of my input. The problem occurred on a single observation coming from an ...

Woodly0

468

asked Oct 23, 2024 at 13:01

-1 votes

1 answer

79 views

How can I achieve accurate imputation of missing values in a dataset?

I'm working with a dataset containing details about used cars, and I've encountered several missing values in the Fuel_Type column. The possible values include 'Gasoline', 'E85 Flex Fuel', 'Hybrid', '...

user27500319

1

asked Sep 27, 2024 at 7:15

0 votes

1 answer

254 views

How do I convert string data to numerical data using Label Encoder?

I was trying to convert string data into numerical data in a CSV Excel sheet. It kept giving me an error about previously unseen labels, so I looked it up and found that we can use Label Encoder to ...

Kevin Phillips

1

asked Sep 2, 2024 at 12:53
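An error about previously unseen labels typically appears when an encoder is asked to transform a category it never saw during `fit`. As a sketch that swaps in a different tool from the one named in the question: for feature columns, `OrdinalEncoder` with `handle_unknown="use_encoded_value"` maps unknown categories to a chosen placeholder instead of raising. The column name `color` and its values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Illustrative data (assumption): "green" appears only in the new data.
train = pd.DataFrame({"color": ["red", "blue", "red"]})
new = pd.DataFrame({"color": ["blue", "green"]})

# Unknown categories are encoded as -1 rather than causing an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
encoder.fit(train[["color"]])
codes = encoder.transform(new[["color"]])
```

`LabelEncoder` itself is intended for target labels, which is why an encoder designed for feature columns is used in this sketch.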

-1 votes

1 answer

123 views

How to Optimize Memory Usage for Cross-Validation of Large Datasets

I have a very large DataFrame (~200 GB) of features, and I want to cross-validate a random forest model on them. The features are from a Hugging Face model in the form of a .arrow file....

youtube

504

asked Aug 18, 2024 at 5:13

0 votes

1 answer

42 views

Error get_features_name_out in getting back the feature name

I want to know the feature importance for my data, so I use permutation_importance. When I get the result, it seems the feature is already decoded, and I want to know the name of my feature using ...

statsbeginner

3

asked Aug 15, 2024 at 8:31

1 vote

2 answers

202 views

Convert Pandas dataframe of objects to a dataframe of vectors

I have a Pandas dataframe (over 1k rows). There are numbers, objects, strings, and Boolean values in my dataframe. I want to convert each 'cell' of the dataframe to a vector, and work with the ...

Tavi

13

asked Aug 7, 2024 at 20:39

3 votes

2 answers

85 views

How can I link the records in the training dataset to the corresponding model predictions?

Using scikit-learn, I've set up a regression model to predict customers' maximum spend per transaction. The dataset I'm using looks a bit like this; the target column is maximum spend per transaction ...

SRJCoding

475

asked Jul 31, 2024 at 12:00

1 vote

1 answer

58 views

How to save single Random Forest model with cross validation?

I am using 10-fold cross validation, trying to predict binary labels (Y) based on the embedding inputs (X). I want to save one of the models (perhaps the one with the highest ROC AUC). I'm not sure ...

youtube

504

asked Jul 2, 2024 at 21:02
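One possible way to keep a single fold's model is `cross_validate` with `return_estimator=True`, which returns the estimator fitted on each training split alongside the scores. The sketch below runs on synthetic data from `make_classification` (an assumption standing in for the embeddings), and the file name `best_fold_model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (assumption) for the embedding inputs and binary labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

cv_results = cross_validate(
    RandomForestClassifier(n_estimators=25, random_state=0),
    X, y, cv=10, scoring="roc_auc", return_estimator=True,
)

# Pick the fold whose held-out ROC AUC was highest and keep its fitted model.
best_idx = int(np.argmax(cv_results["test_score"]))
best_model = cv_results["estimator"][best_idx]

# Persist the chosen model and load it back to confirm the round trip.
model_path = os.path.join(tempfile.mkdtemp(), "best_fold_model.joblib")
joblib.dump(best_model, model_path)
reloaded = joblib.load(model_path)
```

The fold with the best score was trained on only part of the data and its score is an optimistic pick, so refitting on the full dataset after cross-validation is a frequently used alternative.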
