What is the difference between EDA, Feature Engineering, and Preprocessing?
The main purpose is to make the raw data suitable for modeling. In EDA we clean the data, and preprocessing does the same, whereas in FE we scale and impute.
EDA (Exploratory Data Analysis), as the name suggests, is an initial analysis of the data: understanding the distributions and getting an idea of the kinds of values and their ranges. It's about getting a feel for the data and understanding its nature before further analysis. Ideally, this tells you what kind of preprocessing the data will require, which comes after EDA.
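As a rough illustration of that first look, here is a minimal pandas sketch. The toy DataFrame and the `eda_summary` helper are hypothetical; the point is only to show the kind of per-column questions EDA asks (types, missing values, distinct values, ranges).

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset standing in for "the raw data"
df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 51, 38],
    "income": [40_000, 52_000, 88_000, 61_000, 1_000_000, 57_000],
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", None],
})

def eda_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column overview: dtype, missing count, distinct values, numeric range."""
    summary = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(),  # NaN is not counted as a distinct value
    })
    numeric = frame.select_dtypes(include="number")
    # Non-numeric columns get NaN here because the indexes are aligned by name
    summary["min"] = numeric.min()
    summary["max"] = numeric.max()
    return summary

print(eda_summary(df))
print(df["city"].value_counts(dropna=False))  # category frequencies, incl. missing
```

The income of 1,000,000 sitting next to values around 50,000 is exactly the kind of thing this first pass should surface before any preprocessing decision is made.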
Preprocessing is the next step: it covers whatever is needed to make the data fit for your models and for further analysis. EDA and preprocessing may overlap in some cases.
Feature engineering is identifying and extracting features from the data, that is, working out which factors the decisions and predictions should be based on.
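A small sketch of what "extracting features" can look like, assuming hypothetical order records with a timestamp, a total price, and an item count (none of these columns come from the question). The raw columns are turned into quantities a model might plausibly base its predictions on.

```python
import pandas as pd

# Hypothetical transaction records; column names are illustrative assumptions
orders = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-03-04 09:15", "2023-03-06 18:40", "2023-03-11 22:05"]),
    "total_price": [120.0, 45.0, 300.0],
    "n_items": [4, 1, 5],
})

def extract_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive model-ready features from raw columns."""
    feats = pd.DataFrame(index=df.index)
    feats["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday=0 ... Sunday=6
    feats["is_weekend"] = feats["day_of_week"] >= 5
    feats["hour"] = df["timestamp"].dt.hour
    feats["price_per_item"] = df["total_price"] / df["n_items"]  # ratio feature
    return feats

print(extract_features(orders))
```

None of these derived columns exist in the raw table; deciding that "weekend" or "price per item" matters is the engineering part.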
I assume you are talking about a machine learning application. I like to think about the distinction in terms of when a model gets trained:
EDA - no model has been trained yet; you are just exploring the data to see whether the dataset has potential problems (outliers, mislabeled data, unwanted correlations between variables/samples, etc.).
Preprocessing - the steps required to go from raw data to a format suitable as input to your ML model. For, say, a linear or logistic regression model, this means converting the input data to numeric vectors (e.g., imputing missing values, one-hot encoding categorical variables, etc.). After preprocessing, you can train a model on the dataset.
Feature engineering - now that the data is in a format a model can be trained on, train a model and see what happens. After that, start trying out ideas for transforming the data values into a better representation, so that the model can more easily learn to output accurate predictions. Here you may train many versions of your model on differently transformed datasets; the goal is to produce the most accurate model you can by transforming the data values (e.g., re-scaling numeric features, creating interaction terms, etc.). The transformations considered during feature engineering are often inspired by discoveries made while inspecting the dataset during EDA.
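The train-then-transform loop described above can be sketched with numpy and pandas. Everything here is an assumption for illustration: the synthetic data, median imputation plus one-hot encoding as the preprocessing, and an ordinary least-squares model fitted with `np.linalg.lstsq`. Preprocessing produces a matrix the model can train on; feature engineering then adds an `x1*x2` interaction term and compares held-out error against that baseline.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 400

# Synthetic data (an assumption for illustration): the target depends on an
# x1*x2 interaction plus a categorical effect, and some x2 values go missing.
raw = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "color": rng.choice(["red", "green", "blue"], size=n),
})
y = (2.0 * raw["x1"] * raw["x2"]
     + 1.5 * (raw["color"] == "red")
     + rng.normal(scale=0.1, size=n)).to_numpy()
raw.loc[rng.choice(n, size=20, replace=False), "x2"] = np.nan

idx = rng.permutation(n)
train, test = idx[:300], idx[300:]

def preprocess(df: pd.DataFrame, medians: pd.Series) -> pd.DataFrame:
    """Raw data -> all-numeric frame: median imputation + one-hot encoding."""
    filled = df.fillna(medians)
    return pd.get_dummies(filled, columns=["color"], drop_first=True, dtype=float)

def holdout_rmse(features: pd.DataFrame) -> float:
    """Fit ordinary least squares on the train rows, return RMSE on the test rows."""
    X = np.column_stack([np.ones(len(features)), features.to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    residuals = X[test] @ coef - y[test]
    return float(np.sqrt(np.mean(residuals ** 2)))

# Preprocessing: imputation statistics are computed from training rows only
medians = raw.iloc[train].median(numeric_only=True)
base = preprocess(raw, medians)
rmse_base = holdout_rmse(base)

# Feature engineering: add an interaction term and retrain the same model
engineered = base.assign(x1_x2=base["x1"] * base["x2"])
rmse_fe = holdout_rmse(engineered)

print(f"baseline RMSE: {rmse_base:.3f}, with interaction term: {rmse_fe:.3f}")
```

Because the synthetic target was built from an `x1*x2` product, the engineered version should show a clearly lower error. On real data you would compare candidate transformations with cross-validation rather than a single split.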
Let's start with feature engineering. Feature engineering is a broader concept that includes EDA and preprocessing. Feature engineering (FE) is the process of selecting, manipulating, and transforming raw data columns into features; these features can then be used to derive insights or be fed to a machine learning model.
EDA (Exploratory Data Analysis) is your first encounter with the data, where you try to understand and investigate it before making any assumptions. EDA tasks include identifying obvious errors, understanding patterns within the data, detecting outliers or anomalous events, and finding interesting relationships among the variables, using data visualizations to spot these properties early.
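One of the EDA tasks listed above, detecting outliers, can be sketched with Tukey's IQR rule, a common heuristic. The choice of rule and the `k=1.5` multiplier are assumptions, not something this answer prescribes, and the readings are made up.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Hypothetical sensor readings with one suspicious spike
readings = pd.Series([10.2, 9.8, 10.5, 10.1, 9.9, 48.0, 10.3, 10.0])
print(readings[iqr_outliers(readings)])
```

At this stage the values are only flagged, not changed; deciding what to do with them is preprocessing.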
Suppose that during EDA you have found some errors and anomalies. This is where preprocessing comes in: you process the data into the right, desired format. Preprocessing may include handling missing values, dealing with outliers, and more. These tasks are "manipulating data", which is already mentioned in the formal definition of feature engineering.
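The two preprocessing tasks named above, handling missing values and dealing with outliers, can be sketched for a single numeric column. This assumes median imputation and clipping (capping) values to Tukey's IQR fences; both choices are illustrative, and dropping rows or model-based imputation can be equally valid depending on the data.

```python
import numpy as np
import pandas as pd

def clean_numeric(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Impute missing values with the median, then clip values to the IQR fences."""
    # Fences come from the observed values only (quantile skips NaN),
    # so the imputed values do not influence them.
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    filled = s.fillna(s.median())
    return filled.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Hypothetical readings with one missing value and one spike
readings = pd.Series([10.2, np.nan, 10.5, 10.1, 9.9, 48.0, 10.3, 10.0])
print(clean_numeric(readings))
```

Capping keeps the row (and its other columns) while limiting the spike's influence, which is one reason it is often preferred over deletion for small datasets.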
So, wrapping up: any change or manipulation of the data that serves a purpose, such as increasing accuracy or data quality, is called feature engineering.