Confidence intervals and uncertainty estimation of classified polygon map

Question

I am neither a mathematician nor a statistician, but I try to use statistics in my work, so I will do my best to explain the problem I have.

I have a map covering millions of hectares that consists of nine predicted categories, and a second map that serves as the reference (the one I consider the truth). For the predicted map, I want to know the correctly classified area ($\hat{t}$), the proportion of the area that is correctly classified ($\hat{p}$), and their 95% confidence intervals.

To evaluate the uncertainty of the predicted map, the area was split into $N$ small segments, and $n$ of them were sampled using stratified simple random sampling. The first issue is that the segments differ in size (between 10 and 25 hectares), but every segment within a stratum had the same sampling probability. So, within each stratum, the inclusion probability was $\pi_{hi} = n_h/N_h$, where $\pi_{hi}$ is the inclusion probability of segment $i$ in stratum $h$, and $n_h$ and $N_h$ are the sampled and total numbers of segments in that stratum. This is given; there is no way to change this approach.

A segment $i$ in stratum $h$ is correctly classified when its categories in the predicted and reference maps are equal. Then $y_{hi} = a_{hi}$ when the segment is correctly classified and $y_{hi} = 0$ when it is wrongly classified, with $a_{hi}$ the area of the segment. Then $\hat{t}_h = \sum_{i=1}^{n_h} y_{hi}/\pi_{hi}$, and $\hat{t} = \sum_{h=1}^{L} \hat{t}_h$. The proportions are $\hat{p}_h = \hat{t}_h / A_h$, with $A_h$ the area of stratum $h$.

So far so good.

To get the 95% CI, I apply this equation (which I assume is correct):

$CI_h = \hat{t}_h \pm t_{\alpha/2,\, df = n_h - 1} \cdot \sqrt{var(\hat{t}_h)}$

My problem is: is the variance of $\hat{t}_h$ given by $var(\hat{t}_h) = var(y_{hi})/n_h$, or by $var(\hat{t}_h) = var(y_{hi}/\pi_{hi})$?

The first gives nonsensical values. The second is more reasonable, but still quite low, I think. Here is a reproducible example to run in R:

####### for R #############################################################
# this is an example to run for a stratum
set.seed(seed = 1234)
A = 2500000 # area of the stratum
n = 2300     # number of sampled segments within the stratum (by simple random sampling)
N = 175000  # total number of segments within the stratum
a_i = rnorm(n = 2300, mean = 14, sd = 4) # area of the i segments for i = 1, 2, ..., n.
index = rbinom(n = 2300,size = 1, prob = .7) # assume 70% of the segments well classified (1 means reference map and predicted map matched)
y_i = a_i * index # y_i = 0 for wrongly classified; y_i = a_i segment correctly classified
(pi_i = n/N)      # inclusion probability (equal for all segments within the stratum, since their areas (a_i) were ignored)
#[1] 0.01314286
(hatt = sum(y_i/pi_i)) # area well classified
#[1] 1734970
(hatp = hatt/A) # proportion of area well classified
#[1] 0.693988

# Option 1
(var_hatt1 = var(y_i)/n) # variance of hatt (too small, isn't it?)
#[1] 0.02278085
# I think the mistake is in this equation. I think it should be:
# Option 2
(var_hatt2 = var(y_i / pi_i)) # based on my common sense (probably wrong?)
#[1] 303332

# Confidence intervals with option 1
(error = qt(0.975, df = 15 - 1) * sqrt(var_hatt1)) # half-width of the confidence interval
#[1] 0.3237197
(hatt.ul.CI = hatt + error) # area upper limit confidence interval
#[1] 1734970
(hatt.ll.CI = hatt - error) # area lower limit confidence interval
#[1] 1734970
(hatt.ul.CI/A) # proportions
#[1] 0.6939882
(hatt.ll.CI/A)
#[1] 0.6939879

# Confidence intervals with option 2
(error = qt(0.975, df = 15 - 1) * sqrt(var_hatt2)) # half-width of the confidence interval
#[1] 1181.254
(hatt.ul.CI = hatt + error) # area upper limit confidence interval
#[1] 1736151
(hatt.ll.CI = hatt - error) # area lower limit confidence interval
#[1] 1733789
(hatt.ul.CI/A) # proportions
#[1] 0.6944605
(hatt.ll.CI/A)
#[1] 0.6935155
##############################################################################

Is either option (Option 1 or Option 2) correct? Do you see anything else that looks wrong?

Should I compute a prediction interval instead of a confidence interval? If so, how?

Could you calculate the confidence intervals using options 1 and 2 to compare them? – Guillermo Olmedo, commented Jun 26, 2019 at 12:11

Marcos · Accepted Answer · 2019-06-26 15:01:01Z

The correct equation for $var(\hat{t}_h)$ is

$$ var(\hat{t}_h) = N_h^2 \, var(y_{hi}) / n_h $$
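
A short sketch of where the $N^2$ factor comes from, assuming every segment in the stratum has the same inclusion probability $\pi_{hi} = n_h/N_h$, as stated in the question. The symbols $\bar{y}_h$ (sample mean) and $s_h^2$ (sample variance of the $y_{hi}$, which is what `var(y_i)` returns in R) are notation introduced here, not taken from the question:

$$
\begin{aligned}
\hat{t}_h &= \sum_{i=1}^{n_h} \frac{y_{hi}}{\pi_{hi}} = \frac{N_h}{n_h} \sum_{i=1}^{n_h} y_{hi} = N_h \, \bar{y}_h, \\
var(\hat{t}_h) &= N_h^2 \, var(\bar{y}_h) \approx N_h^2 \, \frac{s_h^2}{n_h}.
\end{aligned}
$$

Read this way, Option 1 in the question is the variance of the mean $\bar{y}_h$ without the $N_h^2$ factor, and Option 2 is the variance of a single expanded value $y_{hi}/\pi_{hi}$, which is smaller than the variance of their sum by a factor of $n_h$.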

That solved the problem.
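
For comparison with the two options in the question, here is a minimal sketch of the same single-stratum example using the variance above. It reuses the question's simulated data and variable names. The names `half_width`, `ci_area`, `ci_prop` and `var_hatt_fpc` are introduced here for illustration. It uses `df = n - 1`, as in the question's formula, rather than the `15 - 1` that appears in the question's code. The finite population correction `(1 - n/N)` is the usual adjustment for simple random sampling without replacement; it is not part of the answer above and is shown only as an optional assumption.

```r
# Sketch: 95% CI for the correctly classified area in one stratum,
# using var(hatt) = N^2 * var(y_i) / n. Data are simulated as in the question.
set.seed(1234)
A <- 2500000   # area of the stratum
n <- 2300      # number of sampled segments
N <- 175000    # total number of segments in the stratum
a_i   <- rnorm(n, mean = 14, sd = 4)      # areas of the sampled segments
index <- rbinom(n, size = 1, prob = 0.7)  # 1 = correctly classified
y_i   <- a_i * index                      # a_i if correct, 0 otherwise

pi_i <- n / N                  # equal inclusion probability
hatt <- sum(y_i / pi_i)        # estimated total; equals N * mean(y_i)

var_hatt <- N^2 * var(y_i) / n # variance of the estimated total

# Assumption (not in the answer): optional finite population correction
var_hatt_fpc <- var_hatt * (1 - n / N)

half_width <- qt(0.975, df = n - 1) * sqrt(var_hatt)
ci_area <- c(lower = hatt - half_width, upper = hatt + half_width)
ci_prop <- ci_area / A         # same interval on the proportion scale

print(ci_area)
print(ci_prop)
```

Plugging in the value reported in the question, `var(y_i)/n = 0.02278085`, the standard error is roughly 175000 × 0.151 ≈ 26,400 ha. The 95% half-width is then roughly 52,000 ha, or about ±0.02 on the proportion scale, which is much wider than either option in the question.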
