I have a dataset that I have split into training and testing sets, with approximately 160 samples in the training set and 40 in the testing set. I fitted a probability distribution to each set separately and used the negative log-likelihood (NLL) to assess how well the distribution fits each set. I am using the following formula for the NLL:
$$\text{NLL}= -\sum_{i=1}^n\log(P(y_i)) $$
Now I want to compare the NLL values of the two datasets. However, there is a problem: the NLL is a sum over all $n$ samples, so its magnitude grows with the sample size. Consequently, I believe the raw NLL values for the training and testing sets cannot be compared directly. How can I properly compare this metric across the two sets? Would it be fair to compare the per-sample averages $\dfrac{\text{NLL}_{\text{train}}}{160}$ and $\dfrac{\text{NLL}_{\text{test}}}{40}$?
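To make the question concrete, here is a minimal sketch of the comparison I have in mind. It assumes a Gaussian model and synthetic data (both are illustrative choices, not my actual dataset), fits the distribution by maximum likelihood, and contrasts the total NLL with the per-sample average:

```python
import numpy as np

# Synthetic stand-ins for my actual data (sizes match: 160 train, 40 test).
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=160)
test = rng.normal(loc=0.0, scale=1.0, size=40)

def gaussian_nll(y, mu, sigma):
    """Total NLL of samples y under N(mu, sigma^2): -sum_i log P(y_i)."""
    log_p = -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)
    return -np.sum(log_p)

# MLE fit of the Gaussian parameters on each set separately.
nll_train = gaussian_nll(train, train.mean(), train.std())
nll_test = gaussian_nll(test, test.mean(), test.std())

# The totals sum 160 vs. 40 terms, so they live on different scales;
# dividing by the sample count puts both on a per-sample scale.
mean_nll_train = nll_train / len(train)
mean_nll_test = nll_test / len(test)
print(mean_nll_train, mean_nll_test)
```

The printed per-sample values are what I am proposing to compare, rather than `nll_train` and `nll_test` directly.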