Talk:Additive smoothing


"Add-one" vs. "additive"

The entry claims that Jurafsky and Martin use the term "additive smoothing", but they actually use the term "add-one smoothing" (as do Russell/Norvig). It would be worth seeing which term Manning/Schütze actually use, but I would have to get the book out of the library to find out. -AlanUS (talk) 18:22, 9 October 2011 (UTC)

Proposed merge with Pseudocount

Same concept. QVVERTYVS (hm?) 15:31, 8 May 2014 (UTC)

Not really the same; rather, the pseudocount is used as a tool in additive smoothing, so it doesn't have independent notability. So, agree that it's worth merging. Klbrain (talk) 18:13, 5 April 2017 (UTC)

Merger proposal

We should merge Bayesian_average with this page. The methods are the same, and only the interpretations differ. Bscan (talk) 16:26, 20 July 2018 (UTC)

Weak oppose on the grounds that the application is sufficiently different to warrant separate discussion. Klbrain (talk) 07:04, 16 August 2019 (UTC)

Disadvantages

The issues with this method, as outlined in Ken Church's paper "What's wrong with adding one", should be discussed here: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.134.2237 - Francis Tyers · 04:47, 14 February 2020 (UTC)

Slightly agree. Is there any reason you don't start a small section about this yourself? BernardoSulzbach (talk) 04:50, 14 February 2020 (UTC)

Overly technical

This article is all jargon and inside baseball, without one explicit statement of the obvious and important point that 0/5 is a different strength of evidence than 0/500, which add-one smoothing is supposed to help address. 2601:647:CD02:88A0:5560:1900:364C:CFB2 (talk) 03:01, 9 October 2023 (UTC)

Simple improved pseudocount for empty bins using empirical Bayesian inference

Conventional additive smoothing adds $1$ (Laplace) or $\tfrac{1}{2}$ (Jeffreys). Both are "non-informative" priors. We can improve on these by applying rudimentary empirical Bayes to estimate a symmetric Dirichlet prior, using the Good-Turing insight that the singleton count ($B_1$) and the empty-bin count ($B_0$) are the most valuable pieces of information for estimating the unseen mass in empty bins, as demonstrated by Gale & Sampson (1995). We can use this insight about singletons to empirically determine the best prior parameter $\alpha$, as shown below.

  • Conventional pseudocount smoothing with a symmetric Dirichlet prior (Lidstone): $p_i = \frac{c_i + \alpha}{N + B_t \alpha}$
  • Good-Turing smoothing for an empty bin: $p_0 = \frac{B_1}{N B_0}$. (Note that this is the probability for a single empty bin, not the cumulative probability for all empty bins. Here I impose an exchangeability assumption by dividing the unseen mass ($B_1/N$) equally between the $B_0$ empty bins.)

where:
$\alpha$ = tunable pseudocount; Jeffreys assumed $\alpha = \tfrac{1}{2}$
$B_r$ = number of bins containing count $r$; e.g. $B_0$ is the number of empty bins, $B_1$ the number of singleton bins
$B_t$ = total number of bins
$c_i$ = count in bin $i$
$N$ = sum of all observed counts in all bins

We want to tune the value of $\alpha$ such that the two expressions above are equal for empty bins:
$\frac{0 + \alpha}{N + B_t \alpha} = \frac{B_1}{N B_0}$, where $c_i = 0$ in empty bins
$\alpha = \frac{B_1 N}{B_0 N - B_1 B_t}$, obtained by solving the matching condition above for $\alpha$ (see the worked derivation below)
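
For completeness, the intermediate algebra (my own working, using only the definitions above): cross-multiplying the matching condition and collecting the $\alpha$ terms gives
$\alpha N B_0 = B_1 (N + B_t \alpha) \;\Rightarrow\; \alpha (B_0 N - B_1 B_t) = B_1 N \;\Rightarrow\; \alpha = \frac{B_1 N}{B_0 N - B_1 B_t}$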

Edge cases:

  • When the denominator $B_0 N - B_1 B_t$ becomes zero or negative, or if $\alpha > 1$, it is an indication that the true distribution is close to uniform; I would recommend limiting the prior to Laplace's $\alpha = 1$ in such situations.
  • When $B_1 = 0$, this method fails, in which case you can fall back on more advanced methods: either (i) Simple Good-Turing (SGT) regression, giving a rigorous data-driven estimate of the unseen mass even when $B_1 = 0$, or (ii) Minka's maximum-likelihood estimate by fixed-point iteration.
  • Incidentally, as $N \to \infty$ (i.e. more Monte Carlo trials) with a fixed number of bins, such that $N \gg B_t$, the value of $\alpha$ asymptotically approaches $B_1 / B_0$, though below I choose not to use this asymptotic simplification; see the worked example after this list.
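
As an illustration, take some hypothetical numbers (chosen only for arithmetic convenience, not from any actual simulation): $B_t = 1000$ bins, $N = 5000$ total counts, $B_0 = 200$ empty bins and $B_1 = 100$ singletons. Then
$\alpha = \frac{B_1 N}{B_0 N - B_1 B_t} = \frac{100 \cdot 5000}{200 \cdot 5000 - 100 \cdot 1000} = \frac{500000}{900000} \approx 0.556$,
compared with the asymptotic value $B_1/B_0 = 0.5$. Each empty bin then gets $p_0 = \frac{0 + \alpha}{N + B_t \alpha} = \frac{5/9}{5000 + 1000 \cdot 5/9} = 10^{-4}$, which matches the Good-Turing target $\frac{B_1}{N B_0} = \frac{100}{5000 \cdot 200} = 10^{-4}$ exactly, as it must by construction.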

Pseudocode (VBA-style) showing how to implement this:

Const B_t As Long = 1000 '1000 bins
Dim N As Long, B_0 As Long, B_1 As Long 'long integers
Dim a As Double 'double-precision pseudocount
Dim denom As Double 'denominator
Dim i As Long 'bin index
Dim c(1 To B_t) As Long 'input discrete count histogram from Monte Carlo simulation
Dim p(1 To B_t) As Double 'output smoothed probability
'... insert code here to precalculate c() with values from the MC simulation, then count N, B_0 and B_1 ...
If B_0 = 0 Then
   a = 0 'Edge case: no unseen mass, so no need to add a pseudocount
ElseIf B_1 = 0 Then
   a = 1E-15 'Edge case: should strictly use SGT or Minka's estimation instead!
Else
   denom = CDbl(B_0) * N - CDbl(B_1) * B_t 'CDbl avoids 32-bit integer overflow in the products
   If denom <= 0 Then
      a = 1 'Edge case: distribution close to uniform
   Else
      a = CDbl(B_1) * N / denom 'additive pseudocount (empirical prior value)
      If a > 1 Then a = 1 'Edge case: limit to max 1, to prevent overzealous smoothing (without more information)
   End If
End If
For i = 1 To B_t 'loop through all bins
   p(i) = (c(i) + a) / (N + B_t * a) 'pseudocount smoothing; ensures all empty bins have non-zero probability
Next i
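
As a quick sanity check, one could append the following (my own addition, in the same VBA-style pseudocode; Debug.Print assumes a VBA-like host). The smoothed probabilities must sum to 1, since the denominator $N + B_t \alpha$ equals the sum of all pseudocount-adjusted counts:

Dim sumP As Double 'accumulator for the total probability mass
For i = 1 To B_t
   sumP = sumP + p(i) 'add up all smoothed bin probabilities
Next i
Debug.Print sumP 'should print 1 (up to floating-point rounding)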

Peter.schild (talk) 11:31, 2 October 2025 (UTC)