Preprocessing, EDA, and Feature Engineering

Question

What is the difference between EDA, Feature Engineering, and Preprocessing?

The main purpose is to make the raw data suitable for modeling. In EDA we clean the data, and preprocessing does that too, whereas in FE we scale and impute.

Palak Bansal · Accepted Answer · 2021-08-17 12:04:31Z

EDA (Exploratory Data Analysis), as the name suggests, is an initial analysis of the data: understanding the distributions and getting an idea of the kinds of values and their ranges. It is about getting a feel for the data and its nature before any further analysis. Ideally, this tells you what kind of preprocessing the data will require, which comes after EDA.
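To make "understanding the distributions and ranges" concrete, here is a minimal sketch using only the Python standard library (in practice pandas' `DataFrame.describe()` covers much of this). The function `summarize_column` and the sample "age" values are hypothetical, and `None` is assumed to mark a missing value.

```python
import statistics
from collections import Counter

def summarize_column(values):
    """Quick EDA summary of one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if present and all(isinstance(v, (int, float)) for v in present):
        # Numeric column: look at its range and central tendency.
        summary.update(
            min=min(present),
            max=max(present),
            mean=statistics.mean(present),
            median=statistics.median(present),
        )
    else:
        # Categorical (or empty) column: look at the most frequent values instead.
        summary["top"] = Counter(present).most_common(3)
    return summary

# Hypothetical "age" column with one missing value and one suspicious entry.
print(summarize_column([23, 35, None, 41, 35, 120]))
```

In this toy column, a maximum of 120 next to a median of 35 and one missing value are exactly the kind of observations that tell you what preprocessing will be needed.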

Preprocessing is the next step: it covers the operations that make the data fit for your models and for further analysis. EDA and preprocessing may overlap in some cases.

Feature engineering is identifying and extracting features from the data, i.e. understanding the factors on which the decisions and predictions will be based.
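As a small illustration of extracting features, the sketch below assumes a hypothetical raw order record with `order_time`, `total`, and `n_items` fields and derives a few features a model might base its predictions on. The schema and the chosen features are made up for the example.

```python
from datetime import datetime

def extract_features(record):
    """Derive model features from one raw order record (hypothetical schema)."""
    ts = datetime.fromisoformat(record["order_time"])
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),          # Monday == 0, Sunday == 6
        "is_weekend": ts.weekday() >= 5,  # Saturday or Sunday
        # Ratio feature; assumes n_items > 0 in this toy schema.
        "price_per_item": record["total"] / record["n_items"],
    }

raw = {"order_time": "2021-08-14T18:30:00", "total": 45.0, "n_items": 3}
print(extract_features(raw))
```

None of these features exist as columns in the raw record; they are choices about which factors might matter, which is the point of this step.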

Jonas Mueller · Accepted Answer · 2023-01-10 03:25:35Z

I assume you are talking about a machine learning application. I like to think about the distinction in terms of when a model gets trained:

EDA - no model has been trained yet; you are just exploring the data to see whether the dataset has potential problems (outliers, mislabeled data, unwanted correlations between variables/samples, etc.).

Preprocessing - the steps required to go from raw data to a format suitable as input to your ML model. For a linear/logistic regression model, say, this means converting the input data to vector format (e.g. imputing missing values, one-hot encoding categorical variables, etc.). After preprocessing, you can train a model on the dataset.

Feature engineering - now that the data is in a format on which a model can be trained, train a model and see what happens. After that, start trying out ideas for transforming the data values into a better representation from which the model can more easily learn to output accurate predictions. Here you may train many different versions of your model on differently transformed datasets; the goal is to produce the most accurate model you can by transforming the data values (e.g. re-scaling numeric features, creating interaction terms, etc.). The types of transformations considered during feature engineering are often inspired by discoveries made while inspecting the dataset during the EDA phase.
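The preprocessing and feature-engineering steps described above can be sketched in plain Python (standard library only) to show where the model-training boundary falls. Everything here is hypothetical: the column names, median imputation, and the choice of standardization plus a single interaction term. In a real project you would more likely use scikit-learn's `SimpleImputer`, `OneHotEncoder`, `StandardScaler`, and `PolynomialFeatures(interaction_only=True)`, fitting every statistic on the training split only.

```python
import statistics

# --- Preprocessing: raw rows -> numeric vectors a model can accept ---
def preprocess(rows, numeric_cols, categorical_cols):
    """Median-impute missing numbers and one-hot encode categorical columns."""
    medians = {c: statistics.median([r[c] for r in rows if r[c] is not None])
               for c in numeric_cols}
    vocab = {c: sorted({r[c] for r in rows if r[c] is not None})
             for c in categorical_cols}
    vectors = []
    for r in rows:
        vec = [float(r[c]) if r[c] is not None else float(medians[c])
               for c in numeric_cols]
        for c in categorical_cols:
            # One 0/1 indicator per category seen in the data.
            vec.extend(1.0 if r[c] == cat else 0.0 for cat in vocab[c])
        vectors.append(vec)
    return vectors

# --- Feature engineering: transform the vectors into a (hopefully) better representation ---
def standardize_column(vectors, j):
    """Rescale column j in place to zero mean and unit population std deviation."""
    col = [v[j] for v in vectors]
    mu, sigma = statistics.mean(col), statistics.pstdev(col)
    for v in vectors:
        v[j] = (v[j] - mu) / sigma if sigma else 0.0

def add_interaction(vectors, i, j):
    """Append the product of columns i and j as a new interaction feature."""
    for v in vectors:
        v.append(v[i] * v[j])

rows = [
    {"size": 50, "rooms": 2, "city": "Paris"},
    {"size": None, "rooms": 3, "city": "Oslo"},
    {"size": 110, "rooms": 4, "city": "Paris"},
]
X = preprocess(rows, ["size", "rooms"], ["city"])
# X is now trainable: [[50.0, 2.0, 0.0, 1.0], [80.0, 3.0, 1.0, 0.0], [110.0, 4.0, 0.0, 1.0]]
standardize_column(X, 0)
standardize_column(X, 1)
add_interaction(X, 0, 1)
```

After `preprocess`, a model could already be trained on `X`; the calls that follow are the kind of transformation you would evaluate by retraining on each variant and comparing accuracy.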

Shehab magdy · Accepted Answer · 2025-04-13 15:05:31Z

Let's start with Feature Engineering. Feature Engineering (FE) is a broader concept that includes EDA and Preprocessing: it is the process of selecting, manipulating, and transforming raw data columns into features, which can then be used to derive insights or be fed to a machine learning model.
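The "selecting" part of this definition can be as simple as dropping columns that carry no information. A minimal sketch, assuming a variance cutoff is an acceptable selection criterion (scikit-learn's `VarianceThreshold` implements the same idea); the column names and the `min_variance` value are hypothetical.

```python
import statistics

def select_by_variance(columns, min_variance=1e-3):
    """Keep only the columns whose population variance exceeds min_variance."""
    return {name: values for name, values in columns.items()
            if statistics.pvariance(values) > min_variance}

columns = {
    "age": [23, 35, 41, 29],
    "country_code": [1, 1, 1, 1],  # constant, so it cannot help a model
    "income": [30.5, 52.0, 61.2, 40.0],
}
print(sorted(select_by_variance(columns)))  # ['age', 'income']
```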

But what exactly are EDA and Preprocessing?

EDA (Exploratory Data Analysis) is your first encounter with the data, where you try to understand and investigate it before making any assumptions. EDA tasks include identifying obvious errors, understanding patterns within the data, detecting outliers or anomalous events, and finding interesting relations among the variables, along with data visualizations that reveal the data's basic properties.
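Finding relations among variables often starts with a correlation check. A minimal sketch with Pearson correlation written out by hand (Python 3.10+ also ships `statistics.correlation`); the `threshold` of 0.8 and the sample columns are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan  # undefined when a column is constant
    return cov / (sx * sy)

def strongly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| above threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:  # NaN compares False, so constant columns are skipped
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs

data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same quantity in other units
    "shoe_id": [3, 1, 4, 1, 5],
}
print(strongly_correlated_pairs(data))
```

Only the two height columns are flagged, as near-duplicates of each other; that kind of finding is what the later preprocessing and feature-selection steps act on.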

Suppose that during EDA you found some errors and anomalies. This is where Preprocessing comes in: you process the data into the right, desired format. Preprocessing may include handling missing values, dealing with outliers, and more. These tasks are "manipulating data", which is already part of the formal definition of Feature Engineering above.
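One common way of dealing with outliers during preprocessing is capping (winsorizing) values at Tukey's fences instead of deleting rows. A minimal sketch, assuming k = 1.5 and the default quartile method of `statistics.quantiles` (Python 3.8+); other fence definitions, or simply removing the rows, can be equally valid depending on the data.

```python
import statistics

def cap_outliers(values, k=1.5):
    """Clip values to Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] instead of dropping them."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

# Hypothetical ages where 120 is most likely a data-entry error.
print(cap_outliers([23, 29, 31, 35, 35, 41, 120]))  # [23, 29, 31, 35, 35, 41, 59.0]
```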

So, wrapping up: any change or manipulation of the data that serves a purpose, such as increasing accuracy or data quality, is called Feature Engineering.
