
I'm working on a project and need to draw a plot with two logistic regression curves. I'd like to display the probability of the disease status (encoded as 0 and 1) as a function of Age (numeric variable), with one curve per Gender (factor with two levels: "hommes" and "femmes").

I tried this code:

ggplot(fusion)+aes(x = Age, y = Gallstone.Status , color = Gender )+
  geom_point()+
  labs(x= "Age", y = "Statut malade ou non", title = "Probabilité d'être malade selon le sexe et l'âge")+
  geom_smooth(data = fusion, x = Age, y = proba_reindex)

The result is a plot with the points colored by gender and two linear regression curves, which is not what I want. How can I get R to draw the two sigmoid curves instead?

I also tried using the predict function to compute, for each gender, the probability of having the disease given age:

hommes$prob_maladie_age <- predict(reg_logistique_hommes, type = "response")
femmes$prob_maladie_age <- predict(reg_logistique_femmes, type = "response")

I'm not satisfied because their range does not go from 0 to 1. Would it make sense to rescale them with the formula p_new = (p - min(p))/(max(p) - min(p))? And do I need to compute them in order to draw the two sigmoids? Could you give me code that draws the two sigmoids?

I tried to draw a plot that distinguishes the two genders, and to rescale the probabilities.

3 Comments
  • Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including the attempted code (you have this covered, I think); unambiguous sample data using dput(head(fusion,20)), data.frame(..), read.table(..), or similar (please also include something for reg_logistique_*); and often the actual plots/output (with verbatim errors/warnings) versus the intended output. Refs: stackoverflow.com/q/5963269, minimal reproducible example, and stackoverflow.com/tags/r/info. Commented Oct 17 at 16:14
  • BTW, your call to geom_smooth() is inconsistent: (1) since fusion was already given in the original ggplot(fusion), there is no need to include it here, though it does no harm; (2) geom_smooth(x=Age) tries to find (perhaps successfully) a VECTOR named Age in the calling/global environment; it does not use the same-named column in data=. If it works without error, it means you have vectors with those names, and you are exposing yourself to potential data corruption (or at least possibly-different data). Commented Oct 17 at 16:17
  • Finally, ask either about plotting or about being unsatisfied with the ranges; the two are distinct. Questions on SO should stick to one topic/question at a time, please. Commented Oct 17 at 16:19

1 Answer


You want to predict on a grid of new data, built with expand.grid, that covers every age for both sexes.

> new_data <- expand.grid(age=seq(min(gsd$age), max(gsd$age)), fem=0:1) |> split(~fem)
> probs <- sapply(new_data, \(x) predict(g1, newdata=x, type='response'))
> head(probs, 3)  ## 0 = male, 1 = female
            0          1
1 0.006834485 0.04903182
2 0.007316132 0.05233056
3 0.007831455 0.05583820
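
On the rescaling part of the question: with type='response', predict returns the inverse logit of the linear predictor, so the values already lie between 0 and 1. They simply need not reach either end over the observed ages. A min-max rescaling would stretch them so that they no longer are the model's estimated probabilities. The following is only a sketch on simulated data; `sim`, `fit` and the coefficients used to simulate are made up for illustration and are not the asker's data.

```r
# Simulated stand-in data; `sim`, `fit` and the column names are hypothetical.
set.seed(1)
n <- 200
sim <- data.frame(age = sample(18:72, n, replace = TRUE),
                  fem = rbinom(n, 1, 0.5))
sim$stone <- rbinom(n, 1, plogis(-5 + 0.06 * sim$age + 1 * sim$fem))

fit <- glm(stone ~ age + fem, data = sim, family = binomial)

# Linear predictor (log-odds) and fitted probabilities for the observed rows.
eta <- predict(fit, type = "link")
p   <- predict(fit, type = "response")

# The response scale is the inverse logit of the link scale.
same_scale <- isTRUE(all.equal(unname(p), unname(plogis(eta))))

# Fitted probabilities stay inside (0, 1) but need not come close to 0 or 1.
p_range <- range(p)

# Min-max rescaling forces the extremes to exactly 0 and 1, so the result
# can no longer be read as the model's estimated probabilities.
p_rescaled <- (p - min(p)) / (max(p) - min(p))
```

Comparing p_range with range(p_rescaled) shows the distortion; for plotting, the unscaled values are the ones to use.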

Then you can use matplot,

> matplot(probs, type='l', col=c('red', 'blue'), ylim=0:1, las=1, xlab='age', 
+         ylab='risk', xaxt='n')
> axis(1, seq(1, nrow(el(new_data)), len=7), 
+      labels=seq(min(gsd$age), max(gsd$age), len=7))
> legend('topleft', lty=1:2, col=c('red', 'blue'), leg=c('male', 'female'))


or I guess you can work out a ggplot version yourself.
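
One possible ggplot version, as a rough sketch rather than a definitive solution: geom_smooth can fit a binomial glm itself when given method = "glm" and method.args = list(family = "binomial"). Because colour is mapped to the gender, it fits one curve per group, which corresponds to two separate regressions (one per gender) and not to the additive model g1 used above. The data frame `sim` and its columns Age, Gender and Status are simulated placeholders, not the asker's fusion data.

```r
library(ggplot2)

# Simulated stand-in data; `sim` and its column names are hypothetical.
set.seed(1)
n <- 200
sim <- data.frame(
  Age    = sample(18:72, n, replace = TRUE),
  Gender = factor(sample(c("hommes", "femmes"), n, replace = TRUE))
)
# The outcome must be numeric 0/1 for the binomial smoother.
sim$Status <- rbinom(n, 1, plogis(-5 + 0.06 * sim$Age + (sim$Gender == "femmes")))

# All mappings go inside aes(), so the layers use the columns of `sim`
# and not same-named vectors from the global environment.
p <- ggplot(sim, aes(x = Age, y = Status, colour = Gender)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = "binomial"), se = FALSE) +
  labs(x = "Age", y = "Probability of disease",
       title = "Probability of disease by gender and age")

print(p)
```

To plot the curves of a single additive model instead, the predictions computed on the expand.grid data could be drawn with geom_line in place of geom_smooth.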


Data:

gsd <- structure(list(age = c(66L, 54L, 18L, 42L, 27L, 53L, 35L, 66L, 
64L, 41L, 24L, 53L, 42L, 54L, 63L, 37L, 43L, 67L, 64L, 20L, 58L, 
42L, 44L, 53L, 54L, 48L, 62L, 22L, 37L, 51L, 45L, 57L, 20L, 50L, 
59L, 41L, 47L, 60L, 32L, 39L, 25L, 53L, 21L, 39L, 35L, 62L, 45L, 
22L, 21L, 51L, 67L, 52L, 41L, 40L, 66L, 70L, 43L, 67L, 23L, 23L, 
19L, 69L, 20L, 71L, 38L, 19L, 72L, 55L, 27L, 57L, 22L, 50L, 66L, 
56L, 53L, 62L, 59L, 26L, 46L, 29L), fem = c(0L, 0L, 0L, 0L, 1L, 
0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 
1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 
1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 
1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 
1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L), age_c = c(18, 6, 
-30, -6, -21, 5, -13, 18, 16, -7, -24, 5, -6, 6, 15, -11, -5, 
19, 16, -28, 10, -6, -4, 5, 6, 0, 14, -26, -11, 3, -3, 9, -28, 
2, 11, -7, -1, 12, -16, -9, -23, 5, -27, -9, -13, 14, -3, -26, 
-27, 3, 19, 4, -7, -8, 18, 22, -5, 19, -25, -25, -29, 21, -28, 
23, -10, -29, 24, 7, -21, 9, -26, 2, 18, 8, 5, 14, 11, -22, -2, 
-19), stone = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 
0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 
1L, 0L, 0L, 0L)), row.names = c(NA, -80L), class = "data.frame")

g1 <- glm(stone ~ age + fem, gsd, family='binomial')