I'm trying to use data.table and ggplot2 to create a simple yet powerful function that automatically plots, in one image, all columns (y) of an arbitrary data.table as a function of an input column (x), optionally conditioned on a column (k), so that all data columns can be visualized quickly with a single line like this:

dt <- data.table(diamonds[1:100,])[order(carat),cut:=as.character(cut)] 

plotAllXYbyZ(dt)
plotAllXYbyZ(dt, x="carat", k="color")
plotAllXYbyZ(dt, x=1, y=c(2,8:10), k=3)

CLARIFICATION: The challenge is that columns can be of any type (numeric, character or factor). We want a function that deals with this automatically, i.e. it should be able to plot all requested columns using melt and ggplot, as I attempt in my answer below.

UPDATE: My code is posted below as an answer. It is functional (i.e. it displays the desired plots), but it has one issue: it modifies the original data.table. To address this I asked a new question: Don't want original data.table to be modified when passed to a function

  • Do you mean to pass x, y, and z as indexes or strings? plotAllXYbyZ(dt, x=1, y=3:10, z=2) looks like you want to pass column indexes, but aes(get(x)) looks like it expects strings, i.e. x = "mpg" as an input. Pick one and stick with it. Commented Jun 12, 2017 at 22:27
  • Also, as the diamonds data will illustrate, melting and faceting is a poor solution when you have mixed data types: you'll end up trying to mix categorical and numeric data in the value column. I have no idea how you would want the diamonds data output to look. Take a look at GGally::ggpairs; you can probably hack that function to do what you want. Commented Jun 12, 2017 at 22:32
  • Your conversion with as.numeric(as.character()) doesn't make sense when applied to, say, diamonds$clarity. Nor does a line plot with multiple numericized factors on the y axis and a continuous x axis sound useful to me. Commented Jun 12, 2017 at 22:35
  • Voting to close as "unclear what you're asking" as it doesn't seem like this has been thought through very much. Commented Jun 12, 2017 at 22:37
  • CLARIFICATION: we want to create a plotting function that can plot all of these: NUMERIC, FACTOR, CHARACTER. I.e. it automatically converts any FACTOR or CHARACTER columns to NUMERIC so that they can be plotted (so the user does not need to worry about those). That's why I put as.numeric(as.character()). This line deals with diamonds, where diamonds$cut <- as.character(diamonds$cut). Using just as.numeric() will result in NAs. Commented Jun 19, 2017 at 20:27

2 Answers

I hope this works for you:

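library(data.table)
library(ggplot2)

plotAllXYbyZ <- function(dt, x, y, z) {
  dt <- copy(dt)   # work on a copy so that := does not modify the caller's table
  # make sure all columns to be melted for plotting are numeric
  dt[, (y) := lapply(.SD, function(col) as.numeric(as.character(col))), .SDcols = y]
  dts <- melt(dt, id.vars = c(x, z), measure.vars = y)
  # x and z are column positions; look up their names for the aesthetics
  ggplot(dts, aes(x = .data[[names(dt)[x]]], y = value, colour = .data[[names(dt)[z]]])) +
    geom_line() + facet_wrap(~ variable)
}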

dt <- data.table(mtcars)    

plotAllXYbyZ(dt, x=1, y=3:10, z=2)


1 Comment

Thanks for the effort. That's not the relationship I needed to plot. Please see the answer below for the variables that need to go to facets vs. those that need to be melted. Note that I still haven't found a good way to mix "factors" and "numeric" for plotting. Using ':=' modifies the original data.table...

Thanks to the comments above, here is code that produces the desired output. The figures below show the output for these lines:

    dtDiamonds <- data.table(diamonds[1:100,])[order(carat),cut:=as.character(cut)]
    plotAllXYbyZ(dtDiamonds)
    plotAllXYbyZ(dtDiamonds, x="carat", k="color") 
    plotAllXYbyZ(dtDiamonds, x=1, y=c(2,8:10), k=3)

To do that I had to introduce a function that converts everything to numeric. The only remaining issue was that the original dtDiamonds got modified because of ':='. I asked a separate question about this: Don't want original data.table to be modified when passed to a function. UPDATE: This issue is now resolved by using <-copy(dt) instead of <-dt.
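As a minimal sketch of why copy() matters here (the helper names add_col_in_place and add_col_on_copy are made up for illustration, and the toy table is not from the question): a data.table passed to a function is not copied, so := inside the function changes the caller's table unless the function works on an explicit copy.

```r
library(data.table)

# Hypothetical helpers, for illustration only.
add_col_in_place <- function(dt) {
  dt[, doubled := v * 2]   # := updates by reference: the caller's table changes
  invisible(dt)
}

add_col_on_copy <- function(dt) {
  dt <- copy(dt)           # deep copy: later := calls touch only the local table
  dt[, doubled := v * 2]
  dt
}

orig <- data.table(v = 1:3)

res <- add_col_on_copy(orig)
names_after_copy <- names(orig)      # "v": the original is untouched

add_col_in_place(orig)
names_after_in_place <- names(orig)  # "v" "doubled": the original was modified
```

Plain assignment (dt2 <- dt) only creates a second name for the same table, which is why it does not protect the original.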

# Convert factors and characters to numeric; other types are returned unchanged.
my.as.numeric <- function(x) {
  if (is.factor(x)) {
    num <- suppressWarnings(as.numeric(as.character(x)))
    if (anyNA(num)) return(as.numeric(x))   # levels like "red", "blue": use the level codes
    return(num)                             # levels like "20", "30": return 20, 30, ...
  }
  if (is.character(x)) {
    num <- suppressWarnings(as.numeric(x))
    if (anyNA(num)) return(as.numeric(as.ordered(x)))   # non-numeric strings: codes in sorted order
    return(num)                                         # strings like "20", "30": return 20, 30, ...
  }
  x   # numeric (and any other type) is returned as is
}
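The conversion rules can be checked in base R. This is only a sketch on made-up vectors, not on the diamonds columns themselves: as.numeric() on a factor returns the internal level codes, while going through as.character() first recovers numbers stored as labels and yields NA for labels that are not numbers.

```r
# Factor whose labels are numbers
f_num <- factor(c("20", "30", "20"))
codes_num  <- as.numeric(f_num)                 # 1 2 1   (level codes)
values_num <- as.numeric(as.character(f_num))   # 20 30 20

# Factor whose labels are not numbers (levels sort as "blue", "red")
f_txt <- factor(c("red", "blue", "red"))
codes_txt  <- as.numeric(f_txt)                                    # 2 1 2
values_txt <- suppressWarnings(as.numeric(as.character(f_txt)))    # NA NA NA

# Character vector: order the strings, then take the codes ("Good" < "Ideal")
ch <- c("Ideal", "Good", "Ideal")
codes_ch <- as.numeric(as.ordered(ch))                             # 2 1 2
```

Note that codes derived from a character vector follow alphabetical order, which need not match any meaningful ordering of the categories.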

plotAllXYbyZ <- function(.dt, x = NULL, y = NULL, k = NULL) {
  dt <- copy(.dt)    # NB: without copy(), := below would modify the original data.table
  # Accept column positions as well as column names
  if (is.numeric(x)) x <- names(dt)[x]
  if (is.numeric(y)) y <- names(dt)[y]
  if (is.numeric(k)) k <- names(dt)[k]

  if (is.null(x)) x <- names(dt)[1]
  if (is.null(y)) y <- setdiff(names(dt), c(x, k))   # default: every remaining column

  # Make sure all columns to be melted for plotting are numeric
  dt[, (y) := lapply(.SD, my.as.numeric), .SDcols = y]

  # sprintf() returns character(0) when given NULL, so build the title per case
  if (is.null(k)) {
    title <- sprintf("variable = F (%s)", x)
  } else {
    title <- sprintf("variable = F (%s | %s)", x, k)
  }

  ggplot(melt(dt, id.vars = c(x, k), measure.vars = y)) +
    geom_step(aes(.data[[x]], value, colour = variable)) +
    (if (!is.null(k)) facet_wrap(k)) +   # adding NULL to a plot is a no-op
    labs(x = x, title = title)
}
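To see what ggplot receives from the melt step, here is a small sketch on a toy table (the table and its column names are made up for illustration): every measured column is stacked into a single value column, and a variable column records which original column each row came from. That variable column is what the colour aesthetic is mapped to.

```r
library(data.table)

# Toy table, made up for illustration: one x column and two measured columns.
wide <- data.table(x = 1:3, a = c(10, 20, 30), b = c(1, 2, 3))

# One row per (x, variable) pair; "variable" holds the former column name.
long <- melt(wide, id.vars = "x", measure.vars = c("a", "b"))

long_names  <- names(long)             # "x" "variable" "value"
long_rows   <- nrow(long)              # 6
long_levels <- levels(long$variable)   # "a" "b"
```

Because all measured columns share one value column, they must share one type, which is why the function converts everything to numeric before melting.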

[Figures: plots produced by the three calls above]

2 Comments

To copy a data table so that the original is not modified, use data.table::copy. Lots of details here.
Also, rather than if (T %in% ...), a more common and more readable way is if (any(...))
