6
$\begingroup$

I would like to clarify the meaning of this question.

If Z is a standardized random variable, which of the following statements is correct?

A) Its distribution is always Normal.

B) We always have E(Z²) = 1.

C) Mean and variance cannot be determined without additional information.

D) It cannot be a continuous random variable.

In my understanding, standardization implies mean 0 and variance 1, but not necessarily normality. Is option B therefore the correct one?

$\endgroup$
3
  • 2
    $\begingroup$ At least read the tag info before adding; how is mathematica relevant here? $\endgroup$ Commented Jan 27 at 13:27
  • 2
    $\begingroup$ What's the relationship between $E(Z^2)$ and mean and variance? $\endgroup$ Commented Jan 27 at 21:24
  • 1
    $\begingroup$ Here, as often in statistics, an injection of simple ideas from dimensional analysis would help a little. Almost 50 years ago David Finney argued this, and so far as I can see the situation has not much improved since then. Perhaps it's true that the majority of statistical people came through mathematics and not through some field in which thinking about units and dimensions is second nature. Either way, the reference to Finney's paper, a link, and much fruitful discussion can be found at stats.stackexchange.com/questions/604589/… $\endgroup$ Commented Jan 28 at 9:49

3 Answers

6
$\begingroup$

The correct answer is whatever your textbook or course instructor defines as a “standardized” random variable, so you need to look up the working definition for your course.

It is quite common for $Z$ to denote a standard normal random variable, $N(0,1)$, which could explain why your classmates are selecting the first option. That is not what I would mean by standardized, though: to me it means that the mean has been subtracted (giving zero mean) and the result divided by the standard deviation (giving a variance of one). In that case, the correct answer would be B, by squaring the mean and adding the variance. I suspect this is the “full-credit” answer. However, standardization could instead involve dividing by the range of a data set, in which case the second moment would not have to be $1$ in every circumstance.
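
To spell out the step "squaring the mean and adding the variance": assuming the mean-0, variance-1 definition, the usual decomposition of the second moment gives

$$E(Z^2)=\operatorname{Var}(Z)+[E(Z)]^2=1+0^2=1.$$

For contrast, here is an illustrative case of my own choosing (not from the question): if $X$ is uniform on $[0,1]$ and one "standardizes" by subtracting the minimum and dividing by the range, then $Z=X$ and

$$E(Z^2)=\int_0^1 x^2\,dx=\tfrac{1}{3}\neq 1.$$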

$\endgroup$
3
  • 2
    $\begingroup$ +1. The main takeaway is that the term "standardization" has no single commonly accepted meaning in statistics, so whenever it is used, one should explain what exactly is being done. (Problems arise when people believe that "their" notion of standardization - be it remapping to mean 0 and variance 1, or to the interval $[0,1]$ - is "the correct one".) $\endgroup$ Commented Jan 27 at 15:02
  • 2
    $\begingroup$ This to me has the explanation muddled. The standardization $z := $ (value $-$ mean) / SD results in a variable that is unit-free and dimensionless, with the mean and SD of $z$ being 0 and 1, and the variance then also being 1 by the elementary fact $1^2 = 1$. But initial division by variance would defy the principles of dimensional analysis, for a start. $\endgroup$ Commented Jan 28 at 9:35
  • $\begingroup$ I can recall direct, indirect and marginal standardization as methods for scaling tables, but context rules that out. I've also seen scalings like (value $-$ median) / IQR but that is so unusual that it would surely need explanation every time it is used. (It has a big disadvantage in that the IQR can easily be zero.) $\endgroup$ Commented Jan 28 at 9:37
6
$\begingroup$

Standardization typically means rescaling a random variable $X$ as $$Z=\dfrac {(X-\mu_X)} {\sigma_X}$$ This is to give the standardized distribution $Z$ a mean of 0, and a standard deviation of 1 ($\mu_Z=0, \sigma_Z=1$). This is done to give us a common "yardstick" to compare/describe different distributions. See e.g. here.

There may be other definitions (e.g. rescaling to $[0,1]$, or even to $[0,100]$ as percentages), but I would argue that these definitions are inadvisable: calling them standardized is confusing, and the term rescaled should be used instead.

Note that one can standardize any distribution (as long as it has a finite mean and standard deviation). One can of course standardize the normal distribution, but also a uniform one, an exponential, a log-normal, etc. But one cannot rescale all distributions to, say, $[0,1]$: certainly not a normal one, the support of which is $(-\infty,+\infty)$, nor an exponential, etc. So this is another reason why I would call different definitions of "standardized" inadvisable.

Now, the reason classmates picked A) is the use of $Z$ as the name of the variable, because the standard normal distribution is almost always called $Z$. But not all standardized distributions called $Z$ need to be standard normal. Below is a standardized exponential distribution.

[Figure: standardized exponential distribution]

So, the correct answer is B). And the "giveaway" is the word always; it implies that, regardless of the original distribution of $X$, we will have $$\sigma_Z^2=E[(Z-\mu_Z)^2]=E[(Z-0)^2]=E[Z^2]=1$$ because standardization implies $\sigma_Z^2=\sigma_Z=1$.
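
A quick numerical check of this, as a minimal sketch rather than anything definitive: it uses only Python's standard library, and the helper names (`standardize`, `second_moment`), the seed, the sample size and the three source distributions are my own choices for illustration. Each sample is standardized with its own mean and (population) SD, so the mean of $Z$ and the average of $Z^2$ should come out as 0 and 1 up to floating-point error, whatever the original distribution was.

```python
import random
import statistics


def standardize(sample):
    """Return (x - mean) / SD for each x, using the sample's own mean and population SD."""
    mean = statistics.fmean(sample)
    sd = statistics.pstdev(sample)
    return [(x - mean) / sd for x in sample]


def second_moment(sample):
    """Estimate E(Z^2) as the average of the squared values."""
    return statistics.fmean(z * z for z in sample)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n = 20_000              # sample size, also an arbitrary choice

# Three quite different starting distributions (illustrative choices).
samples = {
    "exponential": [rng.expovariate(1.0) for _ in range(n)],
    "uniform": [rng.uniform(0.0, 1.0) for _ in range(n)],
    "normal": [rng.gauss(5.0, 2.0) for _ in range(n)],
}

results = {}
for name, xs in samples.items():
    zs = standardize(xs)
    # Store (mean of Z, average of Z^2, smallest Z) for each distribution.
    results[name] = (statistics.fmean(zs), second_moment(zs), min(zs))
    print(name, results[name])
```

The smallest standardized exponential value should land close to $-1$, since an exponential with rate 1 has minimum 0, mean 1 and SD 1. That lower bound is a reminder that the standardized variable keeps the shape of the original distribution: only its location and scale change.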

$\endgroup$
4
  • 1
    $\begingroup$ A nuance here: calling definitions you dislike or discommend incorrect is to me a little strong; perhaps a better word would be inadvisable. Or say that to use standardized in the ways you deplore would certainly be non-standard. I would want to add that value / mean can be useful too on occasion. (+1 for what I see as a better explanation of the definition than other answers so far, and for pointing to the notion that scaling is a linked idea.) $\endgroup$ Commented Jan 28 at 9:43
  • 1
    $\begingroup$ Incidentally scaling an exponential to have minimum value of $-1$ is something I can't imagine being useful, but I could easily be missing something. $\endgroup$ Commented Jan 28 at 10:33
  • 1
    $\begingroup$ Minor quibble: it's not true that "one can standardize any distribution", since there are distributions that lack a well-defined mean and/or variance. (A popular example is the Cauchy distribution, which appears e.g. as the distribution of the ratio of two independent standard normally distributed random variables.) $\endgroup$ Commented Jan 28 at 16:20
  • $\begingroup$ @IlmariKaronen, true enough! "Well behaved" distributions is more accurate. Thanks for the correction $\endgroup$ Commented Jan 28 at 16:55
5
$\begingroup$

Standardisation usually refers to rescaling so that mean = 0 and variance = 1, but it does not necessarily require that. To be precise, that would be called z-standardisation. So I think the most accurate answer is C: Mean and variance cannot be determined without additional information.

But as Dave indicates in his answer, this is one of those terms that gets defined inconsistently and people are likely to have different opinions based on their experience. So what is deemed correct will depend on the opinion of your instructor.

EDIT: Our tag guidance agrees with my interpretation above. But I'm now persuaded (see other answers as well as comments by Nick Cox) that at this level of training, B would be the most sensible answer for you to give.

$\endgroup$
6
  • $\begingroup$ Thanks for the explanations! One thing I still don’t understand: why did many of my classmates choose option A (that the distribution is always Normal)? Is there a common misunderstanding behind that? $\endgroup$ Commented Jan 27 at 13:33
  • $\begingroup$ @Cristina No idea, you'd have to ask them their reasoning. $\endgroup$ Commented Jan 27 at 13:38
  • $\begingroup$ Please see also my comment under @Dave's answer. $\endgroup$ Commented Jan 28 at 9:38
  • $\begingroup$ @NickCox Thanks, I've read your comments and agree. Did you also mean to imply that any standardisation other than (value - mean) / SD is so non-standard that it would require clarification, and so B is the only plausible correct answer here? I'm sympathetic to that argument, to be honest. $\endgroup$ Commented Jan 28 at 9:59
  • 1
    $\begingroup$ I would go along with the drift of that, just change the wording slightly. In this context and at this level of teaching the only definition of standardization that makes sense is based on mean and SD, but even outside the question any other definition would surely need explanation, however brief, such as scaling the observed range to $[0, 1]$. $\endgroup$ Commented Jan 28 at 10:04
