Newest 'data-imputation' Questions - Data Science Stack Exchange

8 votes

2 answers

87 views

Best practice for handling structured missing data

I'm working with some road traffic accident data and would appreciate advice on how to handle a structured case of missing values. For context, the features involved are: ...

ba5s85

83

asked Jan 6 at 16:54

4 votes

0 answers

46 views

Estimating Final Vehicle Counts from Pairwise Marginals Using Python

I am working with vehicle registration data from a website. The website provides counts for various combinations of vehicle attributes such as Maker, RTO, Fuel, Category, SubCategory, and Emission. ...

Guru Moorthy

41

asked Oct 10, 2025 at 13:59

0 votes

0 answers

79 views

What is the best practice for imputing missing data with patterns over time? (Potential of K-means clustering for imputing missing values?)

Years ago, I read a paper that proposed a K-means-based approach to imputing missing values in energy time-series data. At that point, since I did not have access to that data, I tried to ...

Mario

645

asked Sep 4, 2025 at 19:41

3 votes

2 answers

132 views

How do I fill the null values of a categorical column?

I'm working on a project using an e-commerce dataset. I'm facing an issue in the data cleaning stage. I have the customers dataset, which has approximately 1.6 million rows. One of the features, "...

Mohd Yasser

31

asked Apr 7, 2025 at 15:38

4 votes

1 answer

73 views

How do outliers affect the process of imputing missing data in categorical variables?

When dealing with missing data in categorical variables, common approaches include imputation by mode or predictive models. However, in some cases, certain categories have extremely low frequency or ...

Celine Yvone

441

asked Mar 18, 2025 at 22:11

0 votes

0 answers

51 views

Imputing on Temporal Data

I have a set of non-stationary data in which certain features do not have a value. If this is the case, during imputation of these features, do I need to ensure that I only use previous data to generate ...

user54565

115

asked Mar 11, 2025 at 22:23

0 votes

0 answers

36 views

How do I compare different ML models for imputation if I split the data into train and test datasets?

I have a full dataset and introduce some missingness of one of these types (MCAR, MAR, MNAR), then split the data into train and test datasets. After that, I impute the missing values using different ML ...

zhyan

101

asked Mar 1, 2025 at 23:32

5 votes

1 answer

60 views

Looking to replace missing time series values with values from a competitor that's correlated

I have a dataset for a retailer with the following attributes: Date, Hour, Enters, Exits. I have another dataset with the same attributes for a competitor that is correlated with the original dataset ...

utink

51

asked Jan 28, 2025 at 3:23

0 votes

1 answer

32 views

Would imputing using the target variable and then analysing correlation between variables be bad due to bias?

I have mortality and nutritional data for countries. The mortality data is complete for every year, but the nutritional data is very limited: maybe 2 or 3 years of nutritional data within a 40-year period ...

Jet

1

asked Nov 6, 2024 at 2:15

0 votes

1 answer

78 views

Filling a lot of missing values with an arbitrary value

I have a dataset of say 1 million observations. As a silly example, say we want to predict if a person can become a data scientist or not (0/1). I have variables that have a lot of missing values but ...

Kilkik

101

asked Oct 25, 2024 at 15:13

1 vote

0 answers

43 views

Filling NaNs by mode

I have data with a lot of NaNs: ...

Silvio sjsj

43

asked Aug 7, 2024 at 17:12

2 votes

0 answers

41 views

Should Imputation Models be Cross Validated

I have a project where I am predicting the best schools based on a series of test scores, teacher attendance rates, etc. I would like to predict the best school to go to. Some of the data is of ...

Englishman Bob

133

asked Jul 30, 2024 at 16:40

0 votes

1 answer

84 views

Handling predictions with optional or missing features

We have a few variables that are highly predictive in our modeling task. Is it sound to train models with a superset of features even though some are known NOT to be available at predict time? ...

eliangius

381

asked Apr 4, 2024 at 22:34

2 votes

1 answer

80 views

Change of data shape when using IterativeImputer from sklearn

I am using the IterativeImputer from sklearn and I notice that it changes the data shape. Initially I have an (X,5) array where all columns except for the last one contain the missing value (which has ...

gmaravel

121

asked Apr 1, 2024 at 15:14

2 votes

0 answers

93 views

Best practices for handling "NA" when all NA values exist due to being below the limit of detection?

I am working in R, and have a data set which has a few metabolite concentration values as continuous variables. Anywhere that the concentration was too low to be detected it simply says <LOD. This ...

KLN-RDN

21

asked Mar 7, 2024 at 15:52
