$\begingroup$

I can't seem to wrap my head around this: what is the glm() equivalent of lm(log(y) ~ x1 + x2, data=data)? Is it:

  • a. glm(y ~ x1 + x2, data=data, family=gaussian(link="log"))
  • b. glm(log(y) ~ x1 + x2, data=data, family=gaussian(link="identity"))
  • c. other
$\endgroup$
  • $\begingroup$ When you try those three models on your data, what results do you get? P.S. gaussian only has 2 s's. $\endgroup$ Commented Nov 2, 2025 at 10:47
  • $\begingroup$ This has been thoroughly discussed here. Note that they are not the same. $\endgroup$ Commented Nov 2, 2025 at 13:43
  • $\begingroup$ Also somewhat relevant stats.stackexchange.com/questions/77579/… $\endgroup$ Commented Nov 2, 2025 at 18:30

2 Answers

$\begingroup$

Model b matches the lm() model. Both assume that log(y) has a Gaussian distribution with mean a0 + a1*x1 + a2*x2.

Model a assumes that y has a Gaussian distribution, with mean exp(a0 + a1*x1 + a2*x2).

In both cases a0, a1, a2 are the coefficients you are estimating, and the Gaussian variance is constant.
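A quick way to see this is to fit all three models on simulated data (the data-generating process below is an assumption for illustration) and compare coefficients:

```r
# Sketch: simulated log-normal data; compare lm(log(y) ~ ...) with the two glm() candidates.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- exp(1 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = 0.2))  # positive response

fit_lm <- lm(log(y) ~ x1 + x2)
fit_b  <- glm(log(y) ~ x1 + x2, family = gaussian(link = "identity"))
fit_a  <- glm(y ~ x1 + x2, family = gaussian(link = "log"))

# Model b reproduces the lm() coefficients exactly (same likelihood, same fit);
# model a estimates a different model, so its coefficients differ in general.
coef(fit_lm)
coef(fit_b)
coef(fit_a)
```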

$\endgroup$
$\begingroup$

B is equivalent; A is not.

Let’s write out the math.

A $$ \log(\mathbb E[y]) = \beta_0+\beta_1x_1+\beta_2x_2\\\iff \mathbb E[y]=e^{\beta_0+\beta_1x_1+\beta_2x_2} $$

B $$ \mathbb E[\log(y)]=\beta_0+\beta_1x_1+\beta_2x_2 $$

B is exactly the lm specification: with the Gaussian family and identity link, glm() maximizes the same Gaussian likelihood that OLS does, i.e. it minimizes squared error on log(y).

By Jensen's inequality for the strictly concave log, $\mathbb E[\log(y)] < \log(\mathbb E[y])$ with strict inequality whenever $y$ is non-degenerate. The two models therefore target different quantities, ruling out the possibility that A is also equivalent to the lm specification.

Overall, B is equivalent to the lm specification while A is not.
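The Jensen gap is easy to check by simulation (log-normal draws here are an assumed example, not from the question):

```r
# Sketch: Jensen's inequality for the concave log on strictly positive draws.
set.seed(2)
y <- exp(rnorm(1e5))  # positive and non-degenerate

log(mean(y))   # estimates log(E[y]);  for this distribution the true value is 0.5
mean(log(y))   # estimates E[log(y)]; for this distribution the true value is 0

# log(E[y]) > E[log(y)]: model A models the former, model B the latter.
```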

$\endgroup$
