8

I'm having trouble with the following in R, using dplyr and tidyr's separate(). I have a dataframe like this:

x<-c(1,2,12,31,123,2341)
df<-data.frame(x)

and I'd like to split each number into its constituent digits and add a new column for each digit to the dataframe, like so:

     x  a   b   c   d
1    1
2    2
3   12  1   2
4   31  3   1
5  123  1   2   3
6 2341  2   3   4   1

I have tried:

library(dplyr)
library(tidyr)

df <- df |>
  mutate(x = as.character(x)) |>
  separate(x, c("a", "b", "c", "d"), sep = 1, remove = FALSE)

but I get:

Error in names(out) <- as_utf8_character(into) :
  'names' attribute [4] must be the same length as the vector [2]
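One likely reading of the error, going by the tidyr documentation for separate(): a numeric sep is taken as the character positions to split at, and it should be one shorter than into. sep = 1 therefore yields two pieces, which cannot be matched to four names. A minimal sketch, assuming that documented behaviour, passes three positions instead (the hard-coded 1:3 assumes at most four digits):

```r
library(tidyr)

df <- data.frame(x = as.character(c(1, 2, 12, 31, 123, 2341)))

# Split after the 1st, 2nd and 3rd character: three positions give four pieces.
# Numbers with fewer than four digits get empty strings in the trailing columns.
out <- separate(df, x, into = c("a", "b", "c", "d"), sep = 1:3, remove = FALSE)
out
```

Shorter numbers end up with "" rather than NA in the unused columns, which happens to match the blank cells in the desired output.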
Comment:
l = sapply(as.character(x), strsplit, split=''); list2DF(lapply(l, `length<-`, max(lengths(l)))) would be a simple option. Or cbind(x, do.call('rbind', lapply(l, `length<-`, max(lengths(l))))) to match your desired format. Efficient ways might be via read.fwf, textConnection or scan; I haven't thought about it.
Commented Aug 7, 2025 at 8:47
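The idea in this comment can be spelled out step by step. The following is only a sketch of that idea, with illustrative variable names (parts, padded, digits) that do not appear in the comment:

```r
x <- c(1, 2, 12, 31, 123, 2341)

# Split every number into single characters.
parts <- strsplit(as.character(x), "")

# Pad each character vector with NA up to the longest one.
n <- max(lengths(parts))
padded <- lapply(parts, `length<-`, n)

# Stack the padded vectors as rows and name the columns a, b, c, ...
digits <- do.call(rbind, padded)
colnames(digits) <- letters[seq_len(n)]

res <- data.frame(x, digits)
res
```

Using `length<-` as a function is what makes the padding a one-liner: extending a vector's length fills the new positions with NA.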

8 Answers

11

Using separate_longer_position from tidyr:

library(tidyr)
library(dplyr)
df |>
  mutate(value=x) |>
  separate_longer_position(value, width = 1) |>
  mutate(name=letters[row_number()], .by=x) |>
  pivot_wider()
# A tibble: 6 × 5
      x a     b     c     d
  <dbl> <chr> <chr> <chr> <chr>
1     1 1     NA    NA    NA
2     2 2     NA    NA    NA
3    12 1     2     NA    NA
4    31 3     1     NA    NA
5   123 1     2     3     NA
6  2341 2     3     4     1

10

You can also use tstrsplit() from data.table:

library(data.table)
setDT(df)[, letters[1:max(nchar(x))]:=tstrsplit(x,"")]

Output:

       x      a      b      c      d
   <num> <char> <char> <char> <char>
1:     1      1   <NA>   <NA>   <NA>
2:     2      2   <NA>   <NA>   <NA>
3:    12      1      2   <NA>   <NA>
4:    31      3      1   <NA>   <NA>
5:   123      1      2      3   <NA>
6:  2341      2      3      4      1
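A variant of the same call, offered as a sketch rather than as part of the answer: the column names are computed from the table itself instead of from a global x, and fill = "" is passed so that missing digits become empty strings instead of NA. The names dt and cols are illustrative.

```r
library(data.table)

dt <- data.table(x = c(1, 2, 12, 31, 123, 2341))

# One column name per digit position of the longest number.
cols <- letters[seq_len(max(nchar(dt$x)))]

# tstrsplit() splits each value and transposes the pieces into columns;
# fill = "" pads the shorter numbers with empty strings.
dt[, (cols) := tstrsplit(x, "", fill = "")]
dt
```

Wrapping cols in parentheses tells data.table to use the contents of the variable as column names rather than creating a column literally called cols.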

9
Answer recommended by R Language Collective

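separate() has been superseded by separate_wider_position(), separate_wider_delim() and separate_wider_regex().

For your case, you can use separate_wider_position() from the tidyr package. The named vector widths_vec sets the name and width of each new column; here every width is 1. Setting too_few = "align_start" fills the trailing columns with NA when a number has fewer digits than there are columns.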

library(tidyverse)

x <- c(1,2,12,31,123,2341)
df <- data.frame(x)

widths_vec <- setNames(rep(1, max(nchar(x))), letters[1:max(nchar(x))])

df %>%
  separate_wider_position(cols = x, widths = widths_vec, too_few = "align_start", cols_remove = FALSE)

# A tibble: 6 × 5
  a     b     c     d         x
  <chr> <chr> <chr> <chr> <dbl>
1 1     NA    NA    NA        1
2 2     NA    NA    NA        2
3 1     2     NA    NA       12
4 3     1     NA    NA       31
5 1     2     3     NA      123
6 2     3     4     1      2341

6

Here are two base R solutions:

  1. using read.fwf() as per Friede's idea, then converting NA to "" as per your requirement
  2. using strsplit() + rbind() and setting the names to letters afterwards

Both add the original vector x back at the end.

x <- c(1,2,12,31,123,2341)
n <- max(nchar(x))

# 1. fixed-width read, then replace NA with ""
fwf <- read.fwf(textConnection(as.character(x)), widths = rep(1, n), col.names = letters[1:n]) |>
  transform(x = x) |>
  {\(d) {d[is.na(d)] <- ""; d}}()
# 2. pad to n characters, split into single characters, and row-bind
spl <- do.call(rbind, strsplit(sprintf(paste0("%-", n, "s"), x), "")) |>
  data.frame() |>
  setNames(letters[1:n]) |>
  transform(x = x)

Output:

  a b c d    x
1 1          1
2 2          2
3 1 2       12
4 3 1       31
5 1 2 3    123
6 2 3 4 1 2341

5

You can use read.table(), but you need to specify col.names and also enable fill and flush. For example,

read.table(
    text = gsub("(?<=.)(?=.)", ",", x, perl = TRUE),
    col.names = head(letters, max(nchar(x))),
    sep = ",",
    fill = TRUE,
    flush = TRUE
)

This gives

  a  b  c  d
1 1 NA NA NA
2 2 NA NA NA
3 1  2 NA NA
4 3  1 NA NA
5 1  2  3 NA
6 2  3  4  1
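To see what the regular expression does, it can help to look at the intermediate strings. This is a small sketch with illustrative names (csv_lines, digits, res): the pattern matches the empty position between any two characters, so gsub() inserts a comma between neighbouring digits and read.table() can then parse each string as a comma-separated line.

```r
x <- c(1, 2, 12, 31, 123, 2341)

# (?<=.) requires a character before the position, (?=.) one after it,
# so only the gaps between digits are replaced by a comma.
csv_lines <- gsub("(?<=.)(?=.)", ",", x, perl = TRUE)
csv_lines
# "1" "2" "1,2" "3,1" "1,2,3" "2,3,4,1"

digits <- read.table(
    text = csv_lines,
    col.names = head(letters, max(nchar(x))),
    sep = ",",
    fill = TRUE,   # pad short lines with NA
    flush = TRUE
)

# Put the original numbers back in front of the digit columns.
res <- cbind(x, digits)
res
```

Because col.names has four entries, read.table() creates four columns even though the first line contains a single field.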

4

An approach using stringr::str_split() (note that it implicitly converts x to character) and tidyr::unnest_wider():

library(tidyr)

tibble(df, tibble(a = stringr::str_split(df$x, "")) %>%
  unnest_wider(a, names_sep = "", names_repair = ~letters[1:length(.x)]))

Output

# A tibble: 6 × 5
      x a     b     c     d
  <dbl> <chr> <chr> <chr> <chr>
1     1 1     NA    NA    NA
2     2 2     NA    NA    NA
3    12 1     2     NA    NA
4    31 3     1     NA    NA
5   123 1     2     3     NA
6  2341 2     3     4     1

Add %>% dplyr::mutate(across(everything(), ~replace_na(.x, ""))) if you really want empty cells:

# A tibble: 6 × 5
      x a     b     c     d
  <dbl> <chr> <chr> <chr> <chr>
1     1 1     ""    ""    ""
2     2 2     ""    ""    ""
3    12 1     "2"   ""    ""
4    31 3     "1"   ""    ""
5   123 1     "2"   "3"   ""
6  2341 2     "3"   "4"   "1"

3

Your original approach is pretty close; you just need to pad the column to a common width before using separate():

list(x= c(1,2,12,31,123,2341)) |>
  as.data.frame() |>
  dplyr::mutate(xChr = stringr::str_pad(x, max(nchar(x)), side = "right")) |>
  (\(d) tidyr::separate(d, xChr, letters[1:max(nchar(d$x))], sep = "(?<=.)"))()

#>      x a b c d
#> 1    1 1      
#> 2    2 2      
#> 3   12 1 2    
#> 4   31 3 1    
#> 5  123 1 2 3  
#> 6 2341 2 3 4 1

It'd look a bit simpler if we load {dplyr} and use %>%:

df %>% 
  mutate(xChr = stringr::str_pad(x, max(nchar(x)), side = "right")) %>% 
  tidyr::separate(xChr, letters[1:max(nchar(.$x))], sep = "(?<=.)")

Created on 2025-08-07 with reprex v2.1.1

3

With base R math:

splitnum <- function(x) {
  n <- pmax(ceiling(log10(x + 1)), 1)
  n10 <- 10^(0:max(n))
  out <- matrix(NA, length(x), length(n10))
  out[cbind(rep.int(1:length(x), n), sequence(n, n + 1, -1))] <-
    floor(rep.int(x, n)%%n10[sequence(n, 2)]/n10[sequence(n)])
  out[,1] <- x
  out
}

x <- c(1,2,12,31,123,2341,304,1e3,0)
splitnum(x)
#>       [,1] [,2] [,3] [,4] [,5]
#>  [1,]    1    1   NA   NA   NA
#>  [2,]    2    2   NA   NA   NA
#>  [3,]   12    1    2   NA   NA
#>  [4,]   31    3    1   NA   NA
#>  [5,]  123    1    2    3   NA
#>  [6,] 2341    2    3    4    1
#>  [7,]  304    3    0    4   NA
#>  [8,] 1000    1    0    0    0
#>  [9,]    0    0   NA   NA   NA
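The vectorised function above is compact but dense. The arithmetic behind it can be illustrated for a single number; digits_of() below is a hypothetical helper written for this illustration, not part of the answer. It assumes a non-negative whole number small enough to be represented exactly as a double.

```r
digits_of <- function(n) {
  # Number of digits; 0 is treated as having one digit.
  k <- max(ceiling(log10(n + 1)), 1)
  # Integer-divide by 1000, 100, 10, 1 (for k = 4) and keep the last digit.
  (n %/% 10^((k - 1):0)) %% 10
}

digits_of(2341)
# 2 3 4 1
digits_of(304)
# 3 0 4
digits_of(0)
# 0
```

Unlike the string-based answers, this keeps the digits numeric, which is also why splitnum() returns a numeric matrix with NA padding.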
