Questions tagged [data-imputation]

Data imputation is the process of replacing missing data with substituted values. This could involve statistically representative data filling (e.g. local averages) or simply replacing the missing data with encoded values (e.g. replace NaNs with zeros).
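As a rough illustration of the two strategies named in the tag description, here is a minimal pandas sketch. The series `s`, the window size of 3, and the choice of a centered rolling mean as the "local average" are all assumptions made for the example, not something the tag description prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series with two missing values.
s = pd.Series([1.0, np.nan, 3.0, 4.0, np.nan, 6.0])

# Strategy 1: replace missing values with an encoded constant (here, zero).
zero_filled = s.fillna(0)

# Strategy 2: fill with a local average. A centered rolling mean over
# 3 points skips NaNs, so each gap is filled from its available neighbours.
local_mean = s.rolling(window=3, center=True, min_periods=1).mean()
local_filled = s.fillna(local_mean)

print(zero_filled.tolist())
print(local_filled.tolist())
```

Zero filling is simple but can distort the distribution of the feature, whereas a local average keeps filled values close to their neighbours; which is appropriate depends on why the data is missing.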

8 votes
2 answers
87 views

I'm working with some road traffic accident data and would appreciate advice on how to handle a structured case of missing values. For context, the features involved are: ...
ba5s85's user avatar
  • 83
4 votes
0 answers
46 views

I am working with vehicle registration data from a website. The website provides counts for various combinations of vehicle attributes such as Maker, RTO, Fuel, Category, SubCategory, and Emission. ...
Guru Moorthy's user avatar
0 votes
0 answers
79 views

Years ago, I read a paper that proposed a K-means-based approach to imputing missing values in energy time data. At the time, since I did not have access to that data, I tried to ...
Mario's user avatar
  • 645
3 votes
2 answers
132 views

I'm working on a project using an E-commerce dataset. I'm facing an issue in the data cleaning stage. I have the customers dataset, which has approximately 1.6 million rows. One of the features, "...
Mohd Yasser's user avatar
4 votes
1 answer
73 views

When dealing with missing data in categorical variables, common approaches include imputation by mode or predictive models. However, in some cases, certain categories have extremely low frequency or ...
Celine Yvone's user avatar
0 votes
0 answers
51 views

I have a set of non-stationary data in which certain features do not have a value. If this is the case, during imputation of these features do I need to ensure that I only use previous data to generate ...
user54565's user avatar
  • 115
0 votes
0 answers
36 views

I have a full dataset and introduce some missingness of one of these types (MCAR, MAR, MNAR), then split the data into train and test datasets. After that, I impute missing values using different ML ...
zhyan's user avatar
  • 101
5 votes
1 answer
60 views

I have a dataset from a retailer with the following attributes: Date, Hour, Enters, Exits. I have another dataset with the same attributes from a competitor that is correlated with the original dataset ...
utink's user avatar
  • 51
0 votes
1 answer
32 views

I have mortality and nutritional data for countries. The mortality data is complete for every year, but the nutritional data is very limited: maybe 2 or 3 years of nutritional data within a 40-year period ...
Jet's user avatar
  • 1
0 votes
1 answer
78 views

I have a dataset of say 1 million observations. As a silly example, say we want to predict if a person can become a data scientist or not (0/1). I have variables that have a lot of missing values but ...
Kilkik's user avatar
  • 101
1 vote
0 answers
43 views

I have data with a lot of NaNs: ...
Silvio sjsj's user avatar
2 votes
0 answers
41 views

I have a project where I am predicting the best schools based on a series of test scores, teacher attendance rates, etc. I would like to predict the best school to go to. Some of the data is of ...
Englishman Bob's user avatar
0 votes
1 answer
84 views

We have a few variables that are highly predictive in our modeling task. Is it sound to train models with a superset of features even though some are known NOT to be available at predict time? ...
eliangius's user avatar
  • 381
2 votes
1 answer
80 views

I am using the IterativeImputer from sklearn and I notice that it changes the data shape. Initially I have an (X,5) array where all columns except for the last one contain the missing value (which has ...
gmaravel's user avatar
  • 121
2 votes
0 answers
93 views

I am working in R, and have a data set which has a few metabolite concentration values as continuous variables. Anywhere that the concentration was too low to be detected it simply says <LOD. This ...
KLN-RDN's user avatar
  • 21
