Questions tagged [outliers]
An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset. A discomfiting possibility is that these data come from a different population than the one intended to be studied.
1,383 questions
5
votes
2
answers
477
views
Extreme outlier in real data
I'm looking at the amount of carbon in seven forest pools. For dead trees left on the landscape across many locations and over several harvest retention (logging) treatments, there is an extreme value ...
0
votes
0
answers
42
views
Winsorizing outliers across multiple analyses: once or multiple times? (SPSS)
I have a 2×2 experimental design with four conditions and eight outcome variables. I’m supposed to winsorize outliers, but I’m confused about how many times this needs to be done because I’m ...
1
vote
2
answers
275
views
Outlier detection in classification
I am curious if there are any methods of outlier detection [read: NOT high leverage point detection] that can be used in classification problems without fitting a model.
As I understand it, some commonly ...
1
vote
0
answers
34
views
How to assign an observation to a group but include an out-group option?
I have collected data from a number of known groups, and from individuals that I would like to assign to a group but may be from an unknown group.
For simplicity's sake, I have created an example with ...
5
votes
3
answers
533
views
How to handle outliers when some predictors perform better with them and others without
I’m working on a project where I need to build a predictive model for wine quality based on its chemical properties. The goal is to find which features best explain or predict the quality score.
I’ve ...
8
votes
4
answers
1k
views
Should I transform my data before or after removing outliers? (Highly skewed cortisol example)
I am analyzing cortisol data collected over multiple days, with three samples per day (Cortisol_1, Cortisol_2, Cortisol_3). My data are extremely skewed:
Skewness of Cortisol_1: 26.3
Skewness of ...
2
votes
0
answers
30
views
Hypothesis testing for a weekly seasonal effect in the presence of outliers
Suppose that I have a time series where the mean usually changes smoothly over time, and I want a hypothesis test for whether there is a weekly seasonal pattern to the data. The time series also ...
0
votes
0
answers
65
views
A simple-ish way of estimating the number of modes, and the 'pronounced'-ness of said modes of a discrete, finite distribution
Intuitively, let's say we're given a price $p$ for some product, and we want to compare it with the prices available on the market (e.g., to determine whether we're being ripped off).
We come back ...
0
votes
0
answers
66
views
What does iteration in sigma clipping do?
If I only want the high-SNR data, I apply sigma clipping to an array.
As this link says:
Suppose you have a set of data. Compute its median m and its standard deviation ...
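To make the role of iteration concrete, here is a minimal sketch of iterative sigma clipping in Python with NumPy. The excerpt above is cut off after the median and standard deviation, so the rejection rule (drop points farther than `n_sigma` standard deviations from the median) and the stopping rule (stop when a pass removes nothing) are assumptions taken from the usual description of the technique, not from the linked text. The function name, parameters and sample values are hypothetical.

```python
import numpy as np

def sigma_clip(values, n_sigma=3.0, max_iter=5):
    """Return a boolean mask of the points that survive clipping.

    Assumed rule: on each pass, recompute the median and standard
    deviation from the surviving points only, then reject anything
    farther than n_sigma standard deviations from that median.
    """
    data = np.asarray(values, dtype=float)
    keep = np.ones(data.shape, dtype=bool)
    for _ in range(max_iter):
        center = np.median(data[keep])
        spread = np.std(data[keep])
        new_keep = keep & (np.abs(data - center) <= n_sigma * spread)
        if new_keep.sum() == keep.sum():
            break  # nothing was removed on this pass, so we have converged
        keep = new_keep
    return keep

# Hypothetical data: ten well-behaved values, a moderate outlier (14.0)
# and an extreme one (60.0).
sample = np.array([10.0, 10.2, 9.8, 10.1, 9.9, 10.0,
                   10.3, 9.7, 10.1, 9.9, 14.0, 60.0])

one_pass = sigma_clip(sample, max_iter=1)    # a single clipping pass
converged = sigma_clip(sample, max_iter=10)  # iterate until stable
```

In this sample a single pass removes only 60.0, because that value inflates the standard deviation enough to hide 14.0. Once 60.0 is gone the spread shrinks and the second pass removes 14.0 as well, which is the masking effect that iteration is meant to address.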
8
votes
1
answer
378
views
Does the presence of outliers always mean that robust regression analysis should be used?
I revised my question to be more specific, as suggested by the community. Since my knowledge of statistics is limited, I'm not entirely sure what it means to specialize in this subject—but I'll give ...
3
votes
2
answers
124
views
How to test if a single value in a set of values is higher than the remaining values
I have a set of $8$ participants $P_1, \ldots P_8$. Each participant takes two tasks $A$ and $B$, and each task results in an ordered vector of $6$ positive values. I'll denote the vector recorded ...
0
votes
0
answers
68
views
Should varIdent be used in a linear model with outliers in nlme in R?
I am unsure whether/how to use varIdent from the nlme package to allow different variances across factor levels when analysing a dataset which has outliers.
I am specifically interested in mixed ...
3
votes
1
answer
167
views
What is the difference between the Theil–Sen estimator and repeated median regression?
I am currently learning about robust regression and came across two variants: the Theil–Sen estimator and Repeated Median Regression. However, I got confused when comparing these two algorithms. Both ...
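As a rough illustration of how the two slope estimates differ, the sketch below computes both from the same matrix of pairwise slopes using NumPy. Theil–Sen takes one median over all pairs of points, whereas the repeated median takes a median of slopes for each point and then a median of those per-point medians. The function names and the toy data are hypothetical, intercepts are omitted, and distinct x values are assumed.

```python
import numpy as np

def pairwise_slopes(x, y):
    """Matrix S with S[i, j] = (y[j] - y[i]) / (x[j] - x[i]); NaN where x[i] == x[j]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x[None, :] - x[:, None]
    dy = y[None, :] - y[:, None]
    with np.errstate(divide="ignore", invalid="ignore"):
        slopes = dy / dx
    slopes[dx == 0] = np.nan  # covers the diagonal and any tied x values
    return slopes

def theil_sen_slope(x, y):
    """Single median over all pairs i < j."""
    slopes = pairwise_slopes(x, y)
    upper = np.triu_indices(slopes.shape[0], k=1)
    return np.nanmedian(slopes[upper])

def repeated_median_slope(x, y):
    """Median over i of (median over j != i of the slopes through point i)."""
    slopes = pairwise_slopes(x, y)
    per_point = np.nanmedian(slopes, axis=1)
    return np.nanmedian(per_point)

# Toy data: an exact line y = 2x + 1 with two corrupted responses.
x = np.arange(10)
y = 2.0 * x + 1.0
y[8] = -50.0
y[9] = 100.0

ts = theil_sen_slope(x, y)
rm = repeated_median_slope(x, y)
```

With only two of ten points corrupted, both estimators recover the slope of 2 here. The nested median is what gives the repeated median its higher breakdown point (50%, against roughly 29% for Theil–Sen), so the two can diverge when a larger share of the data is contaminated.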
6
votes
1
answer
214
views
What regression method should I use for non-normal, outlier-heavy biomedical data with a continuous outcome?
I'm working with a large dataset of about 50,000 patients and trying to understand how protein expression levels influence erythrocyte (red blood cell) counts. The outcome variable — erythrocyte count ...
5
votes
1
answer
282
views
Moderation analysis assumption: univariate outliers after centering
I am conducting a moderation analysis for my thesis and am performing assumption testing.
I found a few univariate outliers and transformed any scores whose z-scores exceeded 3.29 in absolute value. I then ...