$\begingroup$

I'm new to R and cannot make plotting work as desired. The problem is that R seems to draw the same four lines over and over again, redundantly. The details of the case I'm having are as follows.

I have a dataset:

> str(dataset)
'data.frame':   57641 obs. of  3 variables:
 $ duration : num  3 8 7 2 4 8 2 2 8 8 ...
 $ graduated: logi  FALSE TRUE TRUE FALSE FALSE TRUE ...
 $ group    : num  651 651 671 671 651 651 651 651 651 651 ...

Then, I fit a Cox proportional hazards regression model to it:

library(survival)
survObj <- Surv(time = dataset$duration / 2, event = dataset$graduated)
model <- coxph(survObj ~ group, data=dataset)

Next, following this example, I create a frame that would hopefully group the survival functions by the group number:

frame <- data.frame(group = dataset$group)

> str(frame)
'data.frame':   57641 obs. of  1 variable:
 $ group: num  651 651 671 671 651 651 651 651 651 651 ...

There are four groups in the data:

> unique(dataset$group)
[1] 651 671 652 681

Using this new frame, I create a fitted survival model:

fitObjGrouped <- survfit(model, newdata = frame)

Finally, I plot the thing:

color_set <- rainbow(4)
plot(fitObjGrouped, col=color_set)

The result has the correct lines, but drawn many times over each other:

[Image: Survival model plotted - overlapping lines]

As you can see, two red lines and two blue lines are drawn last. They're the correct ones, one for each category, but closer inspection reveals green or other-colored lines underneath each of them. When I convert this to PDF, the file is 273 times larger than it should be!

So the question is: why is R drawing the lines so many times, and how can I fit the model correctly and still get a clean plot?

Can somebody please help me to better understand the R commands I'm using? Thanks in advance!

$\endgroup$

2 Answers

$\begingroup$

Note that in the linked presentation, on the slide titled "Plotting the effects", the treat object has only 2 rows.

In your case, frame has 57k rows, so fitObjGrouped contains a predicted curve for every row of newdata. You can verify this with ncol(fitObjGrouped$surv), which has one column per row of newdata. To fix the problem, build newdata with one row per group:

frame <- data.frame(group = unique(dataset$group))
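
To see the effect end to end, here is a minimal sketch on made-up data. The synthetic `dataset` and the names `fit_all`, `fit_grouped`, `n_curves_all` and `n_curves_grouped` are assumptions for illustration, not part of the original question. It counts the curves that survfit() returns for each choice of newdata:

```r
library(survival)

# Synthetic stand-in for the asker's data (values are invented for illustration).
set.seed(1)
n <- 400
dataset <- data.frame(
  duration  = sample(1:8, n, replace = TRUE),
  graduated = sample(c(TRUE, FALSE), n, replace = TRUE),
  group     = sample(c(651, 652, 671, 681), n, replace = TRUE)
)

model <- coxph(Surv(duration / 2, graduated) ~ group, data = dataset)

# newdata with one row per observation: survfit() returns one curve per row.
fit_all <- survfit(model, newdata = data.frame(group = dataset$group))
n_curves_all <- ncol(fit_all$surv)

# newdata with one row per distinct group: one curve per group.
groups <- sort(unique(dataset$group))
fit_grouped <- survfit(model, newdata = data.frame(group = groups))
n_curves_grouped <- ncol(fit_grouped$surv)

# Only the per-group curves are drawn, so nothing is overplotted.
plot(fit_grouped, col = rainbow(length(groups)))
legend("bottomleft", legend = groups, col = rainbow(length(groups)), lty = 1)
```

Comparing `n_curves_all` with `n_curves_grouped` shows where the overplotting (and probably the oversized PDF) comes from: the first fit carries one curve per observation, the second only one per group.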

$\endgroup$
$\begingroup$

Try making group a factor; you might also want to use it as a strata term:

model <- coxph(survObj ~ strata(factor(group)), data=dataset)

I'm not sure whether this will help with your plot, but it will certainly affect your fits. Without the factor, coxph treats group as a numeric covariate, so every unit increase in the group number is assumed to change the log hazard by the same amount. Converting it to a factor and using it as strata estimates a separate baseline survival curve for each group.
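
As a rough illustration of the numeric-versus-factor difference, here is a sketch on simulated data. The data frame `d` and the names `m_num`, `m_fac` and `km` are assumptions for illustration. Note that the last step swaps in plain Kaplan-Meier curves via survfit() on a formula, as a simple way to get fully separate per-group curves, rather than the strata() model shown above:

```r
library(survival)

# Simulated data; the group codes mimic the question but the values are invented.
set.seed(2)
n <- 400
d <- data.frame(
  time  = rexp(n),
  event = sample(c(TRUE, FALSE), n, replace = TRUE),
  group = sample(c(651, 652, 671, 681), n, replace = TRUE)
)

# Numeric group: a single slope, i.e. one log-hazard change per unit of group number.
m_num <- coxph(Surv(time, event) ~ group, data = d)

# Factor group: one coefficient for each non-reference group.
m_fac <- coxph(Surv(time, event) ~ factor(group), data = d)

length(coef(m_num))
length(coef(m_fac))

# Separate Kaplan-Meier curve per group, with no proportional-hazards assumption.
km <- survfit(Surv(time, event) ~ group, data = d)
plot(km, col = rainbow(4))
```

Since the group numbers here look like arbitrary codes rather than a measured quantity, the factor version is likely the more sensible of the two Cox models, though that depends on what the codes actually mean.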

$\endgroup$
