
There are many standard ways to measure the predictive power of logistic regression (or any method that predicts probabilities, such as probit regression). Some are inspired by R-squared (see for example https://statisticalhorizons.com/r2logistic). You can also use ROC curves.

These indicators reach 100% when the dependent variable $Y$ can be perfectly predicted from $X$: by reading $X$ you know for sure whether $Y$ will be 0 or 1, and your model tells you so without error.

Now assume that $Y$ is genuinely indeterminate given $X$: for the same $X$, $Y$ might be 0 or 1, and there is not enough information in $X$ to decide. But you still expect a well-defined conditional probability, and logistic regression outputs an estimate of this probability $p(Y|X)$.

Now, you want to measure how accurate this estimated probability is. For example, you want to estimate $E\left[\left(p_{model}(y|X)-p_{real}(y|X)\right)^2\right]$, or find an indicator that reaches 100% when the probability is predicted perfectly, i.e. when all the information available in $X$ about $Y$ is used.

Is there a way to do it?
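To make the target quantity concrete, here is a minimal simulation sketch in base R (the coefficients and sample size are arbitrary, purely for illustration): when the true data-generating probability is known, $E\left[\left(p_{model}(y|X)-p_{real}(y|X)\right)^2\right]$ can be estimated directly by Monte Carlo.

```r
# Minimal simulation sketch: estimate E[(p_model - p_real)^2] when p_real is known.
set.seed(1)
n      <- 5000
x      <- rnorm(n)
p_real <- plogis(-0.5 + 1.2 * x)            # true conditional probability P(Y = 1 | X)
y      <- rbinom(n, size = 1, prob = p_real)

fit     <- glm(y ~ x, family = binomial)    # fitted logistic regression
p_model <- predict(fit, type = "response")  # estimated P(Y = 1 | X)

mean((p_model - p_real)^2)                  # Monte Carlo estimate of E[(p_model - p_real)^2]
```

Of course, on real data $p_{real}$ is unknown, which is exactly why I am looking for an indicator that can be computed from $(X, Y)$ alone.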

  • What you are asking for are measures of calibration (sometimes contrasted with measures of discrimination). The simplest approach is to create a calibration plot, with the predicted probabilities on the x-axis and the observed probabilities on the y-axis. The problem is that we do not have observed probabilities, only observed events ('ones' and 'zeroes'). So your choices are some arbitrary way of grouping (e.g. 10 equally sized groups based on predicted probability) or a smoothed curve; see the first code sketch after these comments. For the latter, in R I know the rms package has some good functions and measures available. Commented Sep 8, 2017 at 10:59
  • Thanks a lot. That sounds related to Hosmer-Lemeshow for the grouping method. I still wonder if there might be different techniques based on information theory (without arbitrary grouping). Commented Sep 8, 2017 at 11:17
  • That's right. Do note the Hosmer-Lemeshow test is (arguably) flawed. There are indeed such calibration measures available (the rms package offers some). One I've used sometimes is the Le Cessie-van Houwelingen statistic (see the second sketch after these comments). Do note most of these are overall measures of calibration, and might still be prone to the effects of sample size. Plotting with a smoothed curve is therefore seen not only as an elegant option, but also as the most informative solution (imagine you miscalibrate only within a certain region of predicted probabilities: no overall statistic will tell you this). Commented Sep 8, 2017 at 11:21
  • PS: if this answers your question I might work it into an answer with some references, but I'm quite swamped in work right now, so it may take a while. Commented Sep 8, 2017 at 11:23
  • Yes, an answer would be welcome when you have time. Thanks. Commented Sep 8, 2017 at 11:26
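A minimal sketch of the two calibration-plot approaches described in the comments, reusing the simulated data and fitted model from the question (p_model, y) and assuming the rms package is installed. rms::val.prob() draws a smoothed calibration curve and reports overall indices such as the Brier score and the calibration intercept and slope.

```r
library(rms)

# (1) Arbitrary grouping: 10 equally sized groups based on predicted probability
grp <- cut(p_model,
           breaks = quantile(p_model, probs = seq(0, 1, 0.1)),
           include.lowest = TRUE)
plot(tapply(p_model, grp, mean), tapply(y, grp, mean),
     xlab = "Mean predicted probability", ylab = "Observed proportion",
     main = "Grouped calibration plot")
abline(0, 1, lty = 2)        # the 45-degree line corresponds to perfect calibration

# (2) Smoothed calibration curve plus overall calibration indices
val.prob(p_model, y)
```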
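And a sketch for the Le Cessie-van Houwelingen goodness-of-fit statistic mentioned in the comments; my understanding is that rms exposes it through residuals.lrm() with type = "gof", provided the model was fit with lrm() keeping the design matrix and response.

```r
library(rms)

# Fit the same logistic model with lrm(), storing x and y for the residual methods
f <- lrm(y ~ x, x = TRUE, y = TRUE)

# Le Cessie-van Houwelingen unweighted sum-of-squares test for global goodness of fit
resid(f, type = "gof")
```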
