$\begingroup$

I have a dataset that I have split into training and testing sets, with approximately 160 samples in the training set and 40 in the testing set. I fitted a probability distribution to each set separately and used the negative log-likelihood (NLL) to assess how well the distribution fits each one. I compute the NLL with the following formula:

$$\text{NLL}= -\sum_{i=1}^n\log(P(y_i)) $$

Now I want to compare the NLL values of the two sets. However, there is a problem: the NLL is a sum over samples, so its value grows with the number of samples $n$. Consequently, I believe the raw NLL values for the training and testing sets cannot be compared directly. How can I properly compare this metric across the two sets? Would it be fair to compare the per-sample averages $\dfrac{\text{NLL}_{\text{train}}}{160}$ and $\dfrac{\text{NLL}_{\text{test}}}{40}$?
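
For illustration, the per-sample comparison I have in mind could be sketched as follows. This is only a minimal sketch under assumptions that are not part of my actual setup: the data are synthetic, the fitted distribution is a normal distribution estimated by maximum likelihood, and the helper names (`fit_normal`, `total_nll`, `mean_nll`) are hypothetical.

```python
import math
import random


def normal_logpdf(y, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2) at y."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2.0 * sigma ** 2)


def fit_normal(data):
    """Maximum-likelihood estimates (mean, standard deviation) for a normal model."""
    n = len(data)
    mu = sum(data) / n
    var = sum((y - mu) ** 2 for y in data) / n  # MLE uses 1/n, not 1/(n-1)
    return mu, math.sqrt(var)


def total_nll(data, mu, sigma):
    """NLL as in the formula above: minus the sum of log-densities."""
    return -sum(normal_logpdf(y, mu, sigma) for y in data)


def mean_nll(data, mu, sigma):
    """Per-sample NLL: the total NLL divided by the number of samples."""
    return total_nll(data, mu, sigma) / len(data)


# Synthetic stand-ins for the real data (assumption: 160 train, 40 test samples).
rng = random.Random(0)
train = [rng.gauss(10.0, 2.0) for _ in range(160)]
test = [rng.gauss(10.0, 2.0) for _ in range(40)]

# Fit a distribution to each set separately, as described in the question.
mu_train, sigma_train = fit_normal(train)
mu_test, sigma_test = fit_normal(test)

nll_train = total_nll(train, mu_train, sigma_train)
nll_test = total_nll(test, mu_test, sigma_test)
avg_train = mean_nll(train, mu_train, sigma_train)
avg_test = mean_nll(test, mu_test, sigma_test)

print(f"total NLL: train={nll_train:.2f}, test={nll_test:.2f}")
print(f"mean NLL:  train={avg_train:.4f}, test={avg_test:.4f}")
```

In this sketch the total NLL of the larger set is bigger mainly because it sums over more samples, while dividing by $n$ puts both values on a per-sample scale. Note that each set is scored under the distribution fitted to that same set.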

$\endgroup$
  • $\begingroup$ Can you please edit your post to include the exact formula you are using to compute the log-likelihoods? $\endgroup$ Commented Nov 11 at 10:32
  • $\begingroup$ Why do you want to use the log likelihoods? Are you open to other measures? $\endgroup$ Commented Nov 11 at 10:46
  • $\begingroup$ @StephanKolassa I have updated my post to include the formula for computing the negative log-likelihood. $\endgroup$ Commented Nov 11 at 12:04
  • $\begingroup$ @PeterFlom I would rather use negative log-likelihood because it is commonly used for assessing the goodness of fitted distributions. However, I am also open to other metrics (like continuous ranked probability score) if they help in my situation. $\endgroup$ Commented Nov 11 at 12:08
  • $\begingroup$ In thinking in terms of averages you're on a good track, but there's a lurking issue: how exactly do you fit distributions to the data and, if they are parameterized, how many parameters do you use in each case? $\endgroup$ Commented Nov 11 at 14:47
