$\begingroup$

Hello, I know this may seem obvious, but I just need to make sure that I can find the variance for concrete examples such as this one. The class I am taking is extremely compact and rushed, and the instructor teaches us only conceptually, without any examples. As you may be able to infer, complex ideas in statistical vocabulary can be difficult to understand, especially for me.

First, I want to walk you through my thought process, so you can easily see where I am stuck:

  1. I believe this is a famous discrete distribution.
  2. Those who have covid = $M$ (out of a population of $N$).
  3. Those without covid = $N -M$.
  4. The question is asking for the variance of the random variable that counts how many people in the sample of size $K$ are infected.
  5. We had just learned about $$S^2 = \frac{\sum(x_i-\bar{x})^2}{n- 1}$$ so I feel like we use that equation to find the variance. I just do not know what to plug in where.
  6. Lastly, this may be asking for a lot, but what difference would it make to the variance if the conditions shown in part (b) hold?

Thank you a ton. I assure you I am trying my best to explain what I know versus what I want to learn.

$\endgroup$
  • $\begingroup$ Did you cover the hypergeometric and binomial distributions? It sounds like (a) is talking about the former and (b) the latter, but I can't be sure. $\endgroup$ Commented Jul 26, 2021 at 20:50
  • $\begingroup$ One and two seem to be the same, except for setting of numerical values. Specifically $\bar{x}=X/K$, while $x_i=Y_i$. $\endgroup$ Commented Jul 26, 2021 at 20:54

1 Answer
$\begingroup$

The definition of variance that you give is used only (at least by competent statisticians) when estimating a population variance based on a sample. If you want a population variance and $n$ is the population size, and all members of the population are equally likely to be chosen, then the variance would be found by dividing by $n,$ not by $n-1.$ (But the one that involves dividing by $n-1$ is the one reported by most software packages.)
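To see the two divisors side by side, here is a small Python sketch using the standard library's `statistics` module. The data values are made up purely for illustration and do not come from the question; `pvariance` divides by $n$ and `variance` divides by $n-1$.

```python
import statistics

# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat the list as the entire population: divide the sum of squares by n.
population_variance = statistics.pvariance(data)

# Treat the list as a sample from a larger population: divide by n - 1.
sample_variance = statistics.variance(data)

print("divide by n:    ", population_variance)
print("divide by n - 1:", sample_variance)
```

The sum of squared deviations is the same in both calls; only the divisor changes, so the $n-1$ version is always the larger of the two.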

I will assume persons are drawn randomly from the population WITHOUT replacement.

Write $X$ as a sum: $X_1+\cdots+X_K,$ where each $X_i$ is $1$ or $0$ according as the $i$th person sampled is infected or not.

Then \begin{align} & \operatorname{var}(X) = \operatorname{var}(X_1+\cdots+X_K) \\[8pt] = {} & \operatorname{var}(X_1) + \cdots + \operatorname{var}(X_K) \\ & {}+2\operatorname{cov}(X_1,X_2) + \cdots\cdots \\[8pt] = {} & K\operatorname{var}(X_1) + 2\cdot\binom K2 \operatorname{cov}(X_1,X_2). \end{align}

You have $X_1=\begin{cases} 1 & \text{with probability } M/N, \\ 0 & \text{with probability } 1 - M/N = (N-M)/N. \end{cases}$

So $\operatorname{var}(X_1)= \big(M/N\big)\big((N-M)/N\big)$ and \begin{align} & \operatorname{cov}(X_1,X_2) = \operatorname E(X_1X_2) - \operatorname E(X_1)\operatorname E(X_2) \\[8pt] = {} & \Pr(X_1=X_2=1) -\left( \frac M N \right)^2 \\[8pt] = {} & \frac MN \cdot \frac{M-1}{N-1} - \left( \frac MN \right)^2. \end{align} (The covariance is negative since drawing an infected person from the population makes it less probable that the next one you draw will be infected.)
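As a sanity check on the decomposition, the following Python sketch plugs numbers into $K\operatorname{var}(X_1) + 2\binom K2\operatorname{cov}(X_1,X_2)$ and compares the result with a variance computed directly from the distribution of $X$ by counting samples. The values of `N`, `M` and `K` are assumptions picked for illustration, not taken from the question, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from math import comb

# Hypothetical numbers for illustration only (not from the question).
N, M, K = 20, 5, 4  # population size, number infected, sample size

p = Fraction(M, N)
var_x1 = p * (1 - p)
cov_x1_x2 = Fraction(M, N) * Fraction(M - 1, N - 1) - p**2

# K variance terms plus 2 * C(K, 2) covariance terms.
var_from_sum = K * var_x1 + 2 * comb(K, 2) * cov_x1_x2


def pmf(k):
    """P(X = k) when K people are drawn without replacement."""
    return Fraction(comb(M, k) * comb(N - M, K - k), comb(N, K))


# Independent check: mean and variance computed straight from the pmf.
support = range(max(0, K - (N - M)), min(K, M) + 1)
mean = sum(k * pmf(k) for k in support)
var_from_pmf = sum((k - mean) ** 2 * pmf(k) for k in support)

print("covariance:", cov_x1_x2)
print("variance via sum of indicators:", var_from_sum)
print("variance via pmf:", var_from_pmf)
```

With these assumed numbers both routes give the same exact fraction, and the covariance comes out negative, as the parenthetical remark above says it should.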

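In the i.i.d. case, you sample WITH replacement, so the observations are independent. That makes the covariances $0,$ so the variance is bigger: it is just $K\operatorname{var}(X_1) = K\big(M/N\big)\big((N-M)/N\big),$ with no negative covariance term subtracted.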
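To put numbers on the comparison, here is a short Python sketch that evaluates both variances exactly. The function names and the values of `N`, `M` and `K` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def var_without_replacement(N, M, K):
    """Variance of the infected count when K people are drawn without replacement."""
    p = Fraction(M, N)
    cov = Fraction(M * (M - 1), N * (N - 1)) - p**2
    # 2 * C(K, 2) = K * (K - 1) covariance terms.
    return K * p * (1 - p) + K * (K - 1) * cov


def var_with_replacement(N, M, K):
    """Variance in the i.i.d. case: the covariance terms vanish."""
    p = Fraction(M, N)
    return K * p * (1 - p)


# Hypothetical numbers for illustration only (not from the question).
N, M, K = 20, 5, 4
without = var_without_replacement(N, M, K)
with_repl = var_with_replacement(N, M, K)

print("without replacement:", without)
print("with replacement:   ", with_repl)
print("ratio:              ", without / with_repl)
```

For these assumed values the ratio of the two variances equals $(N-K)/(N-1)$, the usual finite population correction, which is below $1$ whenever $K > 1.$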

$\endgroup$
