9
$\begingroup$

I am working with the following data, originally from Rice’s Mathematical Statistics and Data Analysis (page-256), representing the number of alpha particle counts per 10-second interval:

    (* Number of clicks per 10-second interval *)
clicksPerTenSeconds = Range[0, 17];

(* Observed counts per interval *)
observedCounts = {1, 6, 11, 28, 56, 105, 126, 146, 164, 161, 
  123, 101, 74, 53, 23, 15, 9, 5};

(* Create list of points *)
points = Table[{clicksPerTenSeconds[[i]], observedCounts[[i]]}, 
   {i, Length[clicksPerTenSeconds]}];

(* Plot observed data *)
ListLinePlot[
  points,
  PlotMarkers -> {"o", Medium},
  PlotStyle -> Blue,
  PlotRange -> All,
  Frame -> True,
  FrameLabel -> {"Number of Clicks per 10-Second Interval", "Observed Counts"},
  FrameTicks -> {{Range[0, 160, 20], None}, {Range[0, 17, 1], None}},
  PlotLabel -> "Observed Alpha Particle Emission per 10-Second Interval",
  ImageSize -> Large,
  GridLines -> {Range[0, 17, 1], Range[20, Max[observedCounts], 20]},
  GridLinesStyle -> Directive[Gray, Dashed],
  LabelStyle -> {Black, 16, FontFamily -> "Times"]
]

enter image description here

This produces the plot of observed counts vs. the number of clicks per interval. Now I would like to find the Poisson distribution $e^{-\lambda} \lambda^{k}/k!$ that best fits this data?

Now I wish proceed by calculating the sample mean and take that as the best guess of $\lambda$ and using that value of $\lambda$ plot that fit with the actual data plot given above. How can one achieve this?

$\endgroup$
1
  • $\begingroup$ The data you present isn't exactly the data provided in the referenced text. The first frequency count is associated with 0, 1, and 2 clicks per interval and the last frequency count is associated with 17 or more clicks per interval. Why are you changing the data? $\endgroup$ Commented yesterday

4 Answers 4

6
$\begingroup$

@ydd's answer takes advantage of the fact that the parameter (lambda) in the PoissonDistribution is the sample mean.

Another, more general way to calculate this is to perform a maximum likelihood parameter estimate using EstimatedDistribution:

(* data from original problem *)
clicksPerTenSeconds = Range[0, 17];
observedCounts = {1, 6, 11, 28, 56, 105, 126, 146, 164, 161, 123, 101, 74, 53, 23, 15, 9, 5}; 

(* perform maximum likelihood estimation *)
data = WeightedData[clicksPerTenSeconds, observedCounts];
distribution = EstimatedDistribution[data, PoissonDistribution[lambda]]

(* plot comparison of results *)
Show[
 Histogram[data, {1}, "PDF", AxesLabel -> {"Clicks / 10 seconds", "PMF"}],
 DiscretePlot[PDF[distribution, x], {x, 0, 17}, Joined -> True]]

PoissonDistribution[8.36288] and histogram comparing data and fitted probability distribution

$\endgroup$
0
4
$\begingroup$

Use WeightedData:

w = WeightedData[clicksPerTenSeconds, observedCounts];

λ = Mean[w] // N
(*8.36288*)

We can also calculate the mean ourselves:

clicksPerTenSeconds . observedCounts/Total[observedCounts]//N
(*8.36288*)
$\endgroup$
2
  • $\begingroup$ Thanks @ydd, I'm a beginner to both Mathematica and Probability theory. Could you elaborate on "what actually WeightedData[clicksPerTenSeconds, observedCounts] does here?" $\endgroup$ Commented yesterday
  • $\begingroup$ It represents the observation values in clicksPerTenSeconds (so 0 clicks, 1 click, ... 17 clicks) with weights observedCounts (so 0 clicks was observed 1 time, 1 click was observed 6 times, ..., 17 clicks was observed 5 times). It may help to read the WeightedData documentation $\endgroup$ Commented yesterday
3
$\begingroup$

The data in the referenced text has two complicating factors: (1) the first group has the count for 0, 1, and 2 clicks per interval and (2) the last group has the count for 17 or more clicks per interval. That dataset is from a censored distribution.

If there wasn't that complication (the censoring), the maximum likelihood estimator is just the sample mean as given by @ydd.

To find the maximum likelihood estimator for the data with the censoring one needs to construct the likelihood. Then one finds the value of $\lambda$ that maximizes the likelihood or the log of the likelihood (which is more numerically stable).

To construct the likelihood the first and last count requires the cmf (cumulative mass function) of the Poisson distribution and the rest requires the pmf (probability mass function).

data = {{2, 18}, {3, 28}, {4, 56}, {5, 105}, {6, 126}, {7, 146}, 
  {8, 164}, {9, 161}, {10, 123}, {11, 101}, {12, 74}, {13, 53}, 
  {14, 23}, {15, 15}, {16, 9}, {17, 5}};

likelihood = CDF[PoissonDistribution[\[Lambda]], 2]^data[[1,2]]*
    Product[PDF[PoissonDistribution[\[Lambda]], data[[i, 1]]]^data[[i, 2]], {i, 2, Length[data] - 1}]*
    (1 - CDF[PoissonDistribution[\[Lambda]], 16])^data[[Length[data],2]];
logL = Log[likelihood] //. {Log[a_ b_] -> Log[a] + Log[b], 
    Log[Exp[a_]] -> a, Log[a_^b_] -> b Log[a]};
Plot[logL, {\[Lambda], 5, 12}]

Log of the likelihood function

We see that the maximum occurs around $\lambda=8.5$.

mle = FindMaximum[logL, {{\[Lambda], 8.5}}]
(* {-2990.67, {\[Lambda] -> 8.36919}} *)

Using Mathematica's probability functions one can also do the following to obtain the maximum likelihood estimate:

data = {{2, 18}, {3, 28}, {4, 56}, {5, 105}, {6, 126}, {7, 146}, {8, 
    164}, {9, 161}, {10, 123}, {11, 101}, {12, 74}, {13, 53}, {14, 
    23}, {15, 15}, {16, 9}, {17, 5}};
dist = CensoredDistribution[{2, 17}, PoissonDistribution[\[Lambda]]]
mle = FindDistributionParameters[WeightedData[data[[All, 1]], data[[All, 2]]], 
  dist, {{\[Lambda], 8.5}}]
(* {\[Lambda] -> 8.36919} *)

A next step might be performing a goodness-of-fit test and if the fit for a Poisson seems reasonable followed by finding a measure of precision for the estimate of $\lambda$

$\endgroup$
2
$\begingroup$

This is answer to the other half of the question - how to plot best fit over OP's original plot.

Take your favorite λ estimate from other answers or from the linked PDF file. I have chosen λ -> 8.36.

(*Number of clicks per 10-second interval*)
clicksPerTenSeconds = Range[0, 17];

(*Observed counts per interval*)
observedCounts = {1, 6, 11, 28, 56, 105, 126, 146, 164, 161, 123, 101,
    74, 53, 23, 15, 9, 5};

(*Create list of points*)
points = Table[{clicksPerTenSeconds[[i]], observedCounts[[i]]}, {i, 
    Length[clicksPerTenSeconds]}];

(*Plot observed data*)
ListLinePlot[points, PlotMarkers -> {"o", Medium}, PlotStyle -> Blue, 
  PlotRange -> All, Frame -> True, 
  FrameLabel -> {"Number of Clicks per 10-Second Interval", 
    "Observed Counts"}, 
  FrameTicks -> {{Range[0, 160, 20], None}, {Range[0, 17, 1], None}}, 
  PlotLabel -> 
   "Observed Alpha Particle Emission per 10-Second Interval", 
  ImageSize -> Large, 
  GridLines -> {Range[0, 17, 1], Range[20, Max[observedCounts], 20]}, 
  GridLinesStyle -> Directive[Gray, Dashed], 
  LabelStyle -> {Black, 16, FontFamily -> "Times"}];

(* plot with estimated λ *)

Plot[Total@observedCounts E^-λ λ^k/k! /. λ -> 
    8.36, {k, 0, 17}, PlotStyle -> ColorData[97, 2]];

Show[%%, %]

enter image description here

$\endgroup$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.