5
$\begingroup$

I’m fitting generalized additive mixed models (GAMMs) using mgcv::gam() in R, and I’ve run into a question about the scale of model outputs.

The goal of my analysis was to examine how yield varies across different locations over time. I fit a global smooth term for Year to capture the overall temporal trend, and then included location-specific smooths to see how individual locations deviate from that global trend. I also added a random effect for PlotID to account for repeated measurements within plots nested in each location.

My first model was fitted with a Tweedie distribution (log link):

M1 <- gam(Yield ~ factor(Location) + s(Year) + 
           s(Year, by = factor(Location), k = 14) +  
           s(PlotID, bs = "re"), data = FarmData_A, 
          family = tw(link = "log"), method = "REML")

My second model was fitted with a gaussian distribution:

M2 <- gam(Yield ~ factor(Location) + s(Year) +  
           s(Year, by = factor(Location)) + 
           s(PlotID, bs = "re"), data = FarmData_B, 
           method = "REML")
  • The Tweedie model (M1) (with log link) produces a global smooth figure where the effect ranges only from about –3 to +2.

  • The Gaussian model (M2) produces a global smooth figure where the effect on the y-axis ranges roughly from –200 to +200.

I expected the global smooth plots to be on somewhat comparable scales, but instead they differ by orders of magnitude. Is this normal? Does it just reflect the link function and scale of the response variable?

I want to understand:

  • Is it okay that the ranges of partial effects differ so much?

  • How should I report my results, including the global trend? Specifically, is how I'm reporting it in my attached figures the best way to report results? Should I instead be plotting everything on the response scale rather than the link scale? For the global smooth, should the figure be shown on the link scale (to match the GAM output) or should I back-transform to the original response scale for interpretation?

Finally, any resources with examples showing the correct way to report results visually from similar models would be greatly appreciated.

Attached is the global smooth plot for both M2 (Gaussian distribution) and M1 (Tweedie Distribution), plotted through the "predict" function with type = “terms”. I have also attached the same global smooth plots along with their location-specific plots on a separate plot. Location specific plots and global smooth effect plots

Global smooth effect plots

$\endgroup$
2
  • 2
    $\begingroup$ Is it the credible region that spans -200 to +200 or the fitted function itself? I suspect the former, in which case that's a sure sign that something is wrong with the model in the case of M1. Can you include the partial effect plots you produced so we can see what is going on? Note that M1 doesn't include a global/average smooth of Year; is that just a copy / paste error from your script or a real difference in your models. I would expect identifiability problems with a model that includes a global smooth and location-specific ones if you specify them as you have shown in your post. $\endgroup$ Commented Aug 28 at 12:03
  • $\begingroup$ Thank you! I pasted the wrong M1 model, which should have included the global smooth trend. And I accidentally mixed up the partial effects- M2 (gaussian) is the model with effect ranging from -200 to 200, and M1 (tweedie) ranges from -2 to 2. I updated the post with the correct info and added the plots for the global smooth trends and location specific gam predictions . Could you also please clarify if you mean that a model that includes a global smooth and location-specific ones is not appropriate? $\endgroup$ Commented Aug 28 at 19:11

1 Answer 1

7
$\begingroup$

The different link functions means you'll have different units for the fitted $\eta$ function. Identity for the Gaussian makes it have the units of the response but the Tweedie is on the log scale. As a result any partial effects are not comparable, and any coefficients of interest are in different units as well (in that case, units of $\eta$ divided by units of the predictor) and are also not comparable across different links.

Yes, plot the fitted mean response.

$\endgroup$
1
  • 3
    $\begingroup$ Yes, but -200 to +200 on the log scale is a sign something is very wrong with the Tweedie model if the Gaussian one is a ballpark reasonable fit to the data. $\endgroup$ Commented Aug 28 at 12:05

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.