
I have a data.table with a set of columns, and I'm trying to create a new column that is a pasted string of several of them. The number of these columns can change from time to time, but they follow a column naming convention (e.g. col1, col2, col3, ...).

I've tried to use the new {data.table} programming interface, but things aren't quite going to plan. Below is a reprex to help explain what I'm trying to achieve.

library(data.table)
dt <- data.table(id = 1:3,
                  col1 = letters[1:3],
                  col2 = letters[3:1],
                  col3 = LETTERS[1:3])

#       id   col1   col2   col3
#    <int> <char> <char> <char>
# 1:     1      a      c      A
# 2:     2      b      b      B
# 3:     3      c      a      C

new_var  <- "col_string"
var_cols <- paste0("col", 1:3)

dt[,
  let(var = paste0(paste0(cols[-length(cols)], collapse = ", "), " or ", cols[length(cols)])),
  env = list(
    var = new_var,
    cols = I(var_cols)
  )]

#       id   col1   col2   col3         col_string
#    <int> <char> <char> <char>             <char>
# 1:     1      a      c      A col1, col2 or col3
# 2:     2      b      b      B col1, col2 or col3
# 3:     3      c      a      C col1, col2 or col3

The structure of col_string is correct, but I would instead expect it to look like:

#       id   col1   col2   col3  col_string
#    <int> <char> <char> <char>      <char>
# 1:     1      a      c      A   a, c or A
# 2:     2      b      b      B   b, b or B
# 3:     3      c      a      C   c, a or C

I'm aware that using I() in env is what causes the column names, rather than their values, to show up in my new variable, but when I remove it I get an error:

Error in list2lang(env) : 
  Character objects provided in the input are not scalar objects, if you need them as character vector rather than a name, then wrap each into 'I' call: [cols]

I've also tried wrapping with as.list() instead of I(), with no success.

Answer


The I(...) wrapper within env= is used "to escape automatic conversion" (https://rdatatable.gitlab.io/data.table/articles/datatable-programming.html#substitute-variables-and-character-values); the second code block in that section gives a clearer example:

substitute2(   # 'a' and 'd' should stay as character
  f(v1, v2),
  list(v1 = I("a"), v2 = list("b", list("c", I("d"))))
)
# f("a", list(b, list(c, "d")))

I don't think we need to use env= here:

dt[, c(new_var) := paste(do.call(paste, c(.SD[, 1:(length(var_cols)-1)], list(sep = ", "))),
                         "or", .SD[[length(var_cols)]]), .SDcols = var_cols]
#       id   col1   col2   col3 col_string
#    <int> <char> <char> <char>     <char>
# 1:     1      a      c      A  a, c or A
# 2:     2      b      b      B  b, b or B
# 3:     3      c      a      C  c, a or C
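If this is needed repeatedly, the same paste logic can be wrapped in a small function. The sketch below assumes a hypothetical helper named join_or() (not part of data.table) and runs on a plain data.frame so it works with base R alone. Note the as.list() call: on a data.table or .SD, a single index such as x[-n] selects rows, not columns, so the columns are first converted to a plain list.

```r
# Hypothetical helper: join any number of columns as "a, b, ... or z"
join_or <- function(cols) {
  # as.list() turns a data.frame / data.table / .SD into a plain list,
  # so cols[-n] drops the last column rather than the last row
  cols <- unname(as.list(cols))
  n <- length(cols)
  if (n == 1L) return(as.character(cols[[1L]]))
  # All but the last column, joined element-wise with ", "
  head_part <- do.call(paste, c(cols[-n], sep = ", "))
  # Append " or <last column>"
  paste(head_part, "or", cols[[n]])
}

df <- data.frame(id   = 1:3,
                 col1 = letters[1:3],
                 col2 = letters[3:1],
                 col3 = LETTERS[1:3],
                 stringsAsFactors = FALSE)
new_var  <- "col_string"
var_cols <- paste0("col", 1:3)

df[[new_var]] <- join_or(df[var_cols])
print(df)
```

With data.table, the same helper can be called as dt[, (new_var) := join_or(.SD), .SDcols = var_cols].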

Wrapping the LHS as c(new_var) makes data.table evaluate new_var, so its value ("col_string") is used as the new column name instead of the literal name "new_var".
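A minimal data.table sketch of this LHS rule, using a throwaway table whose names and values are illustrative only. A bare symbol on the left of := is taken literally as the column name, while c(new_var), or plain parentheses, makes data.table evaluate new_var in the calling scope.

```r
library(data.table)

dt <- data.table(x = 1:3)
new_var <- "y"

# c() forces evaluation: the column is named by the value of new_var ("y")
dt[, c(new_var) := x * 10L]

# Plain parentheses also force evaluation; this overwrites the same "y" column
dt[, (new_var) := x * 100L]

# A bare symbol is used literally: this creates a column called "new_var"
dt[, new_var := x * 2L]

print(names(dt))
```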


Comment

Re "The c(new_var) is the way in data.table": the c is not necessary. Wrapping the LHS in parentheses alone forces evaluation, which is all c() is doing here; see ?"(", as well as the examples in the help for := ("...using forced evaluation", "The parens are enough to stop the LHS being a symbol"). Cheers
