I'm plotting a scatterplot with ggplot() as follows:

library(data.table)
library(plotly)
library(ggplot2)
library(lubridate)

dt.allData <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                         DE = rnorm(365, 4, 1), Austria = rnorm(365, 10, 2), 
                         Czechia = rnorm(365, 1, 2), check.names = FALSE)

## Calculate Pearson correlation coefficient: ##
corrCoeff <- cor(dt.allData$Austria, dt.allData$DE,  method = "pearson", use = "complete.obs")
corrCoeff <- round(corrCoeff, digits = 2)

## Linear regression function extraction by creating linear model: ##
regLine <- lm(DE ~ Austria, data = dt.allData)

## Extract k and d values for the linear function f(x) = kx+d: ##
k <- round(regLine$coef[2], digits = 5)
d <- round(regLine$coef[1], digits = 2)
linRegFunction <- paste0("y = ", d, " + (", k, ")x")

## PLOT: ##
p1 <- ggplot(data = dt.allData, aes(x = Austria, y = DE, 
                                    text = paste("Date: ", date, '\n',
                                                 "Austria: ", Austria, "MWh/h", '\n',
                                                 "DE: ", DE, "\u20ac/MWh"),
                                    group = 1)
      ) +
      geom_point(aes(color = ifelse(date >= now()-weeks(5), "#419F44", "#F07D00"))) +
      scale_color_manual(values = c("#F07D00", "#419F44")) +
      geom_smooth(method = "lm", se = FALSE, color = "#007d3c") +
      annotate("text", x = 10, y = 10,
               label = paste("\u03c1 =", corrCoeff, '\n',
                             linRegFunction), parse = TRUE) +
      theme_classic() +
      theme(legend.position = "none") +
      theme(panel.background = element_blank()) +
      xlab("Austria") +
      ylab("DE")+
      ggtitle("DE vs Austria") +
      theme(plot.title = element_text(hjust = 0.5, face = "bold"))

# Correlation plot converting from ggplot to plotly: #
plot <- plotly::ggplotly(p1, tooltip = "text")

which gives the following plot:

[Image: resulting scatterplot]

I use annotate() to display the correlation coefficient and the regression function. I set the x and y coordinates manually so that the text appears at the top center of the plot. Since I have several such data tables dt.allData with different axis scales, I would like the text to always appear at the top center, whatever the axis scaling, without having to set the x and y coordinates manually each time.

  • Could you include a list of all the packages you are working with? Commented Oct 30, 2020 at 9:45
  • I have added all libraries now. Commented Oct 30, 2020 at 9:56
  • For those interested: I just saw that there is a related question using sup instead of span. Commented Oct 30, 2020 at 10:56

2 Answers

I'd suggest using ggtitle and hjust = 0.5:

Edit: using plotly::layout and a span tag to create the title:

library(data.table)
library(ggplot2)
library(plotly)
library(lubridate)

dt.allData <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 365),
                         DE = rnorm(365, 4, 1), Austria = rnorm(365, 10, 2), 
                         Czechia = rnorm(365, 1, 2), check.names = FALSE)

## Calculate Pearson correlation coefficient: ##
corrCoeff <- cor(dt.allData$Austria, dt.allData$DE,  method = "pearson", use = "complete.obs")
corrCoeff <- round(corrCoeff, digits = 2)

## Linear regression function extraction by creating linear model: ##
regLine <- lm(DE ~ Austria, data = dt.allData)

## Extract k and d values for the linear function f(x) = kx+d: ##
k <- round(regLine$coef[2], digits = 5)
d <- round(regLine$coef[1], digits = 2)
linRegFunction <- paste0("y = ", d, " + (", k, ")x")

## PLOT: ##
p1 <- ggplot(data = dt.allData, aes(x = Austria, y = DE, 
                                    text = paste("Date: ", date, '\n',
                                                 "Austria: ", Austria, "MWh/h", '\n',
                                                 "DE: ", DE, "\u20ac/MWh"),
                                    group = 1)
) +
  geom_point(aes(color = ifelse(date >= now()-weeks(5), "#419F44", "#F07D00"))) +
  scale_color_manual(values = c("#F07D00", "#419F44")) +
  geom_smooth(method = "lm", formula = 'y ~ x', se = FALSE, color = "#007d3c") +
  # ggtitle(label = paste("My pretty useful title", '\n', "\u03c1 =", corrCoeff, '\n', linRegFunction)) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(legend.position = "none") +
  theme(panel.background = element_blank()) +
  xlab("Austria") +
  ylab("DE")

# Correlation plot converting from ggplot to plotly: #
# using span tag (directly in control of font-size):
span_plot <- plotly::ggplotly(p1, tooltip = "text") %>% layout(
    title = paste(
      '<b>My pretty useful title</b>',
      '<br><span style="font-size: 15px;">',
      '\u03c1 =<i>',
      corrCoeff,
      '</i><br>',
      linRegFunction,
      '</span>'
    ),
    margin = list(t = 100)
  )
span_plot

Edit: added the sup alternative as per this answer

# using sup tag:
sup_plot <- plotly::ggplotly(p1, tooltip = "text") %>% layout(
    title = paste(
      '<b>My pretty useful title</b>',
      '<br><sup>',
      "\u03c1 =<i>",
      corrCoeff,
      '</i><br>',
      linRegFunction,
      '</sup>'
    ),
    margin = list(t = 100)
  )
sup_plot

[Image: resulting plot]

Here you can find some related information in the plotly docs.
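
If the existing title should stay untouched, another option might be a plotly annotation positioned in "paper" coordinates. With xref = "paper" and yref = "paper", x and y are fractions of the plotting area (0 to 1), so the text stays at the top center regardless of the axis scaling. This is only a minimal sketch: the toy data frame df and the names label and paper_plot are assumptions made for illustration, not part of the original code.

```r
library(ggplot2)
library(plotly)

# Toy data standing in for dt.allData (assumption for this sketch)
set.seed(1)
df <- data.frame(Austria = rnorm(100, 10, 2), DE = rnorm(100, 4, 1))

# Build the two-line label; plotly uses <br> for line breaks
fit <- lm(DE ~ Austria, data = df)
label <- paste0(
  "\u03c1 = ", round(cor(df$Austria, df$DE), 2), "<br>",
  "y = ", round(coef(fit)[1], 2), " + (", round(coef(fit)[2], 5), ")x"
)

p <- ggplot(df, aes(x = Austria, y = DE)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  ggtitle("DE vs Austria") +
  theme_classic()

paper_plot <- ggplotly(p) %>% layout(
  annotations = list(
    x = 0.5, y = 1,                        # horizontal center, top edge
    xref = "paper", yref = "paper",        # fractions of the plot area, not data units
    xanchor = "center", yanchor = "top",   # anchor the text box at its top center
    text = label,
    showarrow = FALSE
  )
)
paper_plot
```

Because the position no longer depends on the data, the same layout() call can be reused for data tables with different axis ranges.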

8 Comments

Sorry, I already have a title here. This was missing in my question!
Ok, thanks for the clarification. Is modifying each title with two more lines not an option?
I'll try it with a subtitle, I haven't thought of that yet. Thanks for the thought bump.
The subtitle won't work with ggplotly. You'd need to add the lines to the title.
I see, using subtitle = paste("\u03c1 =", corrCoeff, '\n', linRegFunction) doesn't work. How can I do this when my title should be bold but the other two lines should not?

First I would start by seeing if something like this could help you:

annotate("text", 
         x = mean(dt.allData$Austria, na.rm = TRUE), 
         y = max(dt.allData$DE, na.rm = TRUE),
         label = paste("\u03c1 =", 
                       corrCoeff, '\n',
                       linRegFunction), 
         parse = TRUE,
         hjust = .5)

Then, if you want to go through a list of x, y pairs, you'd eventually want to move towards functional programming, where you pass x columns x1, x2, x3 and y columns y1, y2, y3 to a map function that pulls out the relevant information from each pair and plots it.
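
A rough sketch of that idea, using base R's Map() over column-name pairs. The helpers top_centre() and scatter_with_label() and the toy data frame df are hypothetical names introduced here for illustration. Note that this sketch swaps the mean of x for the midpoint of the x range, on the assumption that the axis limits follow the data range with ggplot2's default symmetric expansion; the mean only coincides with that midpoint for roughly symmetric data.

```r
library(ggplot2)

# Hypothetical helper: top-center position derived from the data ranges.
top_centre <- function(x, y) {
  list(
    x = mean(range(x, na.rm = TRUE)),  # midpoint of the x range, not the mean of x
    y = max(y, na.rm = TRUE)           # highest data value on the y axis
  )
}

# Hypothetical helper: scatterplot of two columns with the label at the top center.
scatter_with_label <- function(data, xcol, ycol) {
  pos <- top_centre(data[[xcol]], data[[ycol]])
  rho <- round(cor(data[[xcol]], data[[ycol]], use = "complete.obs"), 2)
  ggplot(data, aes(x = .data[[xcol]], y = .data[[ycol]])) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    annotate("text", x = pos$x, y = pos$y,
             label = paste("\u03c1 =", rho), hjust = 0.5, vjust = 1) +
    labs(x = xcol, y = ycol) +
    theme_classic()
}

# Toy data standing in for dt.allData (assumption for this sketch)
set.seed(1)
df <- data.frame(DE = rnorm(50, 4, 1), Austria = rnorm(50, 10, 2),
                 Czechia = rnorm(50, 1, 2))

# One plot per (x, y) column pair
plots <- Map(function(x, y) scatter_with_label(df, x, y),
             c("Austria", "Czechia"), c("DE", "DE"))
```

Each element of plots is a regular ggplot object, so plots[["Austria"]] can be printed or passed on to ggplotly() as before.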

4 Comments

Thanks, but if I have some extreme outliers, then it is not in the middle of the plot.
Then let's use something that is sensitive to outliers, like the mean!
I've already tried to use mean(). Nevertheless, it is never exactly centered, and it often deviates a lot.
Huh. well, it works for the case that you've shared here!
