All Questions

2 votes
1 answer
38 views

How to fit scaler for different subsets of rows depending on group variable and include it in a Pipeline?

I have a data set like the following and want to scale the data using any of the scalers in sklearn.preprocessing. Is there an easy way to fit this scaler not over the whole data set, but per group? ...
ascripter
  • 6,265
1 vote
1 answer
56 views

How to apply different model on different rows of a pandas dataframe?

I have a pandas dataframe that looks like this: import pandas as pd df = pd.DataFrame({'id': [1,2], 'var1': [5,6], 'var2': [20,60], 'var3': [8, -2], 'model_version': ['model_a', 'model_b']}) I have 2 ...
quant
  • 4,492
-1 votes
1 answer
48 views

Error in Pipeline code in ScikitLearn using Python

In the pipeline code below, even though I have encoded the sex column, I am getting a string-to-float error. from sklearn.compose import ColumnTransformer from sklearn.pipeline import Pipeline from ...
Abubakker Hashmi
2 votes
0 answers
387 views

Model Training for Segmentation [duplicate]

I want to train and evaluate models to find the best models for my segments, but sklearn is having something go wrong with the tags and the estimators, and I can't figure out the issue. There might be ...
Sdeb
  • 21
1 vote
1 answer
204 views

Ignore NaN to calculate mean_absolute_error

I'm trying to calculate MAE (Mean absolute error). In my original DataFrame, I have 1826 rows and 3 columns. I'm using columns 2 and 3 to calculate MAE. But, in column 2, I have some NaN values. When ...
Daniel M M
-2 votes
1 answer
87 views

Cannot convert dataframe column to an int64 data type

I have a problem. In my Pandas DataFrame, I have a column called 'job'. I've created a simple custom transformer that maps the values in that column according to the type of job. The ...
coffee_programmer
0 votes
1 answer
207 views

How to create a scaler applying log transformation and MinMaxScaler in sklearn

I want to apply log() to my DataFrame and MinMaxScaler() together. I want the output to be a pandas DataFrame() with indexes and columns from the original data. I want to use the parameters used to ...
Guilherme Parreira
3 votes
2 answers
128 views

How to preserve data types when working with pandas and sklearn transformers?

While working with a large sklearn Pipeline (fit using a DataFrame) I ran into an error that led back to a wrong data type of my input. The problem occurred on a single observation coming from an ...
Woodly0
  • 468
-1 votes
1 answer
79 views

How can I achieve accurate imputation of missing values in a dataset?

I'm working with a dataset containing details about used cars, and I've encountered several missing values in the Fuel_Type column. The possible values include 'Gasoline', 'E85 Flex Fuel', 'Hybrid', '...
user27500319
0 votes
1 answer
254 views

How do I convert string data to numerical data using Label Encoder?

I was trying to convert string data into numerical data in a CSV excel sheet. It kept giving me an error about previously unseen labels, so I searched it up and found that we can use Label Encoder to ...
Kevin Phillips
-1 votes
1 answer
123 views

How to Optimize Memory Usage for Cross-Validation of Large Datasets

I have a very large DF (~200GB) of features, and I want to cross-validate a random forest model on them. The features are from a huggingface model in the form of a .arrow file....
youtube
  • 504
0 votes
1 answer
42 views

Error get_features_name_out in getting back the feature name

I want to know the feature importance for my data, so I use permutation_importance. When I get the result, it seems the feature is already decoded, and I want to know the name of my feature using ...
statsbeginner
1 vote
2 answers
202 views

Convert Pandas dataframe of objects to a dataframe of vectors

I have a Pandas dataframe (over 1k rows). There are numbers, objects, strings, and Boolean values in my dataframe. I want to convert each 'cell' of the dataframe to a vector, and work with the ...
Tavi
  • 13
3 votes
2 answers
85 views

How can I link the records in the training dataset to the corresponding model predictions?

Using scikit-learn, I've set up a regression model to predict customers' maximum spend per transaction. The dataset I'm using looks a bit like this; the target column is maximum spend per transaction ...
SRJCoding
  • 475
1 vote
1 answer
58 views

How to save single Random Forest model with cross validation?

I am using 10 fold cross validation, trying to predict binary labels (Y) based on the embedding inputs (X). I want to save one of the models (perhaps the one with the highest ROC AUC). I'm not sure ...
youtube
  • 504
