Questions tagged [exploratory-data-analysis]
EDA stands for "exploratory data analysis". Tukey developed it as a contrast to confirmatory data analysis (CDA), the formal testing of hypotheses. EDA is typically concerned with describing data numerically and graphically to make the data easier to understand and to yield new insights.
333 questions
0
votes
0
answers
26
views
What data mining freeware is available that replicates SAS EMiner's interactive Decision Tree node?
It's 2025, and yes, I'm still using SAS EMiner's Decision Tree... If anyone knows a modern freeware version that replicates the Interactive mode effectively (with controlling split cutoff values, a ...
0
votes
1
answer
51
views
Guidance for communicating insights to inform breakdown companies how to assess breakdown risk [closed]
I come from a machine learning background; however, I am trying to learn more traditional data science. I have a dataset of vehicles and the target is the Breakdown Likelihood (1 to 3, 1 being lowest), ...
3
votes
1
answer
32
views
Why are all my tuned models (DT, GB, SVM) plateauing at ~70% F1 after rigorous data cleaning and feature engineering?
I'm working on a classification problem where the goal is to maximize the F1-score, hopefully above 80%. Despite a very thorough EDA and preprocessing workflow, I've hit a hard performance ceiling ...
4
votes
5
answers
708
views
How Do Quartiles Help Us Understand a Dataset?
It’s confusing to understand how quartile values can actually be used to give insights into a dataset. Please assist with examples. I struggle to interpret the values in the context of providing ...
0
votes
0
answers
36
views
Correlation Analysis prior to PCA [duplicate]
So, I have a general question regarding PCA. As far as I understand, before performing PCA you are supposed to perform a correlation analysis between the features so that redundant features can be ...
1
vote
1
answer
78
views
Lavaan fit measures [closed]
I have done an Exploratory Factor Analysis. I want fit measures of the model. I am using JASP and jamovi. I need the Goodness-of-Fit Index (GFI), Adjusted GFI (AGFI) and Normed Fit Index (NFI). I tried SEM ...
2
votes
1
answer
178
views
Factor Analysis: Theoretically Five Factors Expected, but Only One Emerges – What Should I Do?
I am currently conducting a factor analysis on a scale that theoretically consists of five factors. However, both Principal Component Analysis (PCA) and Maximum Likelihood (ML) extraction methods in ...
0
votes
0
answers
41
views
Is analyzing test scores a clustering problem or an EDA problem?
I have a dataset of 28 personality assessment features, which measures personality attributes like Diligence or Sociability to determine performance in the corporate workplace. I'm tasked with ...
0
votes
0
answers
89
views
association and correlation matrix
For ratio scale data it is relatively simple to create and visualize a correlation matrix, e.g. as shown below. How can I do the same for a data frame that also contains nominal scale data? I would like ...
14
votes
3
answers
794
views
Visual assessment of scatterplots acceptable?
I have a fairly basic question about analyzing a dataset of measurements taken on a number of fish, which I’m doing as part of a student project. So I have measurements of four species of fish of ...
8
votes
1
answer
541
views
Regression techniques for a “triangular” scatterplot
I am doing a regression analysis of environmental data, and I encounter some rather specific relationships between my predictors and the response variable. I am doubtful that a simple linear ...
0
votes
0
answers
99
views
Which statistical model will be best for this data?
I'm trying to identify the relationship between the dependent variable and the independent variables. I've utilized linear regression, but I'm not sure if it's suitable given the distribution of my ...
1
vote
0
answers
59
views
Valid forms of exploratory data analysis for time series that don't assume stationarity?
Let's say we are given a time series sample and want to try to create a model to forecast future values of said time series. When trying to build a model to forecast time series data, many statistics ...
3
votes
1
answer
152
views
Exploratory analysis to find out characteristics of low scorers
I'm currently looking at three specific questions of a feedback survey and have been tasked with finding out the characteristics of the lowest scorers, to see if there are any patterns or common ...
0
votes
0
answers
104
views
Outliers in EDA - With or without?
I'm trying to carry out my first EDA on a Student Performance dataset. The dataset has 395 samples and consists of 33 attributes. After drawing the boxplots and doing some tests I detected outliers in ...