I wish to efficiently extract values from a matrix (vals), using a single column number (val_col) and a matrix of row numbers (val_rows). Specifically, I want the results returned as a matrix with the same shape as val_rows, where each cell holds the value looked up by the corresponding cell of val_rows.

# Matrix of values.
n_cols <- 3
n_rows <- 5
n_vals <- n_cols * n_rows

vals <- outer(
  seq_len(n_rows), seq_len(n_cols),
  paste, sep = "x"
)

vals
#>      [,1]  [,2]  [,3] 
#> [1,] "1x1" "1x2" "1x3"
#> [2,] "2x1" "2x2" "2x3"
#> [3,] "3x1" "3x2" "3x3"
#> [4,] "4x1" "4x2" "4x3"
#> [5,] "5x1" "5x2" "5x3"


# Column number.
val_col <- 2


# Matrix of row numbers.
n_rows_out <- 4
n_vals_out <- n_cols * n_rows_out

set.seed(111)
val_rows <- sample.int(n_rows, size = n_vals_out, replace = TRUE) |>
  matrix(ncol = n_cols, byrow = TRUE)

val_rows
#>      [,1] [,2] [,3]
#> [1,]    3    4    3
#> [2,]    1    3    5
#> [3,]    3    4    2
#> [4,]    1    5    5

Now the result I want is a matrix like this:

result <- structure(
  c("3x2", "1x2", "3x2", "1x2", "4x2", "3x2", "4x2", "5x2", "3x2", "5x2", "2x2", "5x2"),
  dim = 4:3
)

result
#>      [,1]  [,2]  [,3] 
#> [1,] "3x2" "4x2" "3x2"
#> [2,] "1x2" "3x2" "5x2"
#> [3,] "3x2" "4x2" "2x2"
#> [4,] "1x2" "5x2" "5x2"

But when I simply extract with [ my results are "flattened" into a vector.

result <- vals[val_rows, val_col]

result
#>  [1] "3x2" "1x2" "3x2" "1x2" "4x2" "3x2" "4x2" "5x2" "3x2" "5x2" "2x2" "5x2"

Even when I specify drop = FALSE the matrix is not structured like val_rows.

result <- vals[val_rows, val_col, drop = FALSE]

result
#>       [,1] 
#>  [1,] "3x2"
#>  [2,] "1x2"
#>  [3,] "3x2"
#>  [4,] "1x2"
#>  [5,] "4x2"
#>  [6,] "3x2"
#>  [7,] "4x2"
#>  [8,] "5x2"
#>  [9,] "3x2"
#> [10,] "5x2"
#> [11,] "2x2"
#> [12,] "5x2"

It seems I can simply modify the dim() in a one-liner.

result <- vals[val_rows, val_col] |>
  `dim<-`(c(n_rows_out, n_cols))

result
#>      [,1]  [,2]  [,3] 
#> [1,] "3x2" "4x2" "3x2"
#> [2,] "1x2" "3x2" "5x2"
#> [3,] "3x2" "4x2" "2x2"
#> [4,] "1x2" "5x2" "5x2"


result <- vals[val_rows, val_col, drop = TRUE] |>
  `dim<-`(c(n_rows_out, n_cols))

result
#>      [,1]  [,2]  [,3] 
#> [1,] "3x2" "4x2" "3x2"
#> [2,] "1x2" "3x2" "5x2"
#> [3,] "3x2" "4x2" "2x2"
#> [4,] "1x2" "5x2" "5x2"

And obviously, I can feed the vector into matrix() to restructure it like val_rows.

result <- vals[val_rows, val_col] |>
  matrix(ncol = n_cols, byrow = FALSE)

result
#>      [,1]  [,2]  [,3] 
#> [1,] "3x2" "4x2" "3x2"
#> [2,] "1x2" "3x2" "5x2"
#> [3,] "3x2" "4x2" "2x2"
#> [4,] "1x2" "5x2" "5x2"

But if I recall correctly, populating values via matrix() is inefficient when repeated at scale. And doesn't `dim<-`() receive the entire result by value, simply to modify its dim attribute?

Anyway, these approaches redundantly "repair" the "damage" that should never have occurred in the first place. So what is the most efficient way to extract the values while retaining the structure of val_rows?

  • How about using a transpose: vals[val_rows, val_col] |> `dim<-`(rev(dim(vals))) |> t(). Not sure if it's really faster than matrix() though. Commented Apr 9 at 15:46
  • @AndreWildberg Thanks for the suggestion! Turns out I got my wires crossed, and my original `dim<-`() approach actually does give me what I want. I have updated my post with a sample output and some clarifications. I'm waiting to see if alternatives emerge that are more efficient for speed and/or memory, but I suspect Nadir's answer is best. Commented Apr 9 at 16:25
  • You need to load two tidyverse packages to create your example? What's wrong with outer(seq_len(n_rows), seq_len(n_cols), paste, sep = "x")? Commented Apr 10 at 5:11
  • @Roland Absolutely nothing is wrong with outer()! I simply didn't know it existed. 😂 Commented Apr 10 at 13:50

3 Answers

Yes, the efficient solution is just to restore the dimensions after extraction.

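[ returns a vector here because, when val_rows is supplied as the row index alongside a separate column index, its dim attribute is ignored and it is read as a plain vector in column-major order. Since matrices in R are stored column-wise, the result of: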

vals[val_rows, val_col]

is already in the correct internal order for a matrix with the same dimensions as val_rows.

So you can do:

result <- vals[val_rows, val_col]
dim(result) <- dim(val_rows)

result
#      [,1]  [,2]  [,3]
# [1,] "3x2" "4x2" "3x2"
# [2,] "1x2" "3x2" "5x2"
# [3,] "3x2" "4x2" "2x2"
# [4,] "1x2" "5x2" "5x2"

This gives the correct element-for-element result corresponding to val_rows.
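To see why restoring dim is enough, here is a small check. It is only a sketch: val_rows is typed in from the matrix printed in the question (so nothing depends on the random seed), and the name expected is introduced purely for this illustration. It compares the reshaped result against a naive cell-by-cell lookup.

```r
# Rebuild the question's inputs; val_rows is copied from the printed output.
vals <- outer(seq_len(5), seq_len(3), paste, sep = "x")
val_col <- 2
val_rows <- matrix(c(3, 1, 3, 1, 4, 3, 4, 5, 3, 5, 2, 5), nrow = 4)

# Extract, then restore the shape of the index matrix.
result <- vals[val_rows, val_col]
dim(result) <- dim(val_rows)

# Naive reference: look up each cell of val_rows one at a time.
expected <- matrix(NA_character_, nrow = nrow(val_rows), ncol = ncol(val_rows))
for (i in seq_len(nrow(val_rows))) {
  for (j in seq_len(ncol(val_rows))) {
    expected[i, j] <- vals[val_rows[i, j], val_col]
  }
}

# Passes silently when the two matrices agree exactly.
stopifnot(identical(result, expected))
```

The loop is only there as a reference; it is the slow way to do what the two-line version already does.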

If you want a one-liner:

result <- structure(vals[val_rows, val_col], dim = dim(val_rows))

Or, more explicitly:

result <- structure(vals[cbind(c(val_rows), val_col)], dim = dim(val_rows))

One important point: matrix(..., byrow = TRUE) is not appropriate here. It fills the extracted values in row by row, whereas the indexing result is produced in column-major order.
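A quick way to convince yourself of the byrow point, again with val_rows typed in from the question's printed output. The names flat, target, by_col and by_row are made up for this sketch.

```r
vals <- outer(seq_len(5), seq_len(3), paste, sep = "x")
val_rows <- matrix(c(3, 1, 3, 1, 4, 3, 4, 5, 3, 5, 2, 5), nrow = 4)

flat <- vals[val_rows, 2]               # plain vector, column-major order
target <- `dim<-`(flat, dim(val_rows))  # the shape we actually want

by_col <- matrix(flat, ncol = ncol(val_rows))                # default byrow = FALSE
by_row <- matrix(flat, ncol = ncol(val_rows), byrow = TRUE)  # fills row by row

# Column-wise filling reproduces the target; row-wise filling does not.
stopifnot(identical(by_col, target))
stopifnot(!identical(by_row, target))
```

With byrow = TRUE the first row of the output ends up holding what should have been the start of the first column.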

At the R level, there's no special extraction form that preserves the shape of an index matrix: [ returns a vector, so restoring dim is the standard solution. Since a matrix/array in R is just a vector with a "dim" attribute:

dim(result) <- dim(val_rows)

is already about as direct as it gets in base R.

As for copying: R uses copy-on-modify semantics, so true in-place mutation isn't guaranteed from R code. Setting dim may be cheap when the object is unshared, but whether duplication actually occurs is an implementation detail you can't rely on.
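If you want to see what happens on your own setup, tracemem() reports when an object is duplicated. This is a minimal sketch, assuming your R build has memory profiling enabled (the guard skips tracing otherwise); x is just a stand-in for the extracted values, and what gets printed can differ between R versions and contexts.

```r
x <- as.character(seq_len(12))

# tracemem() only works when R was built with memory profiling support.
can_trace <- capabilities("profmem")[[1]]
if (can_trace) invisible(tracemem(x))

# A "tracemem[...]" line is printed here only if x is duplicated.
dim(x) <- c(4L, 3L)

if (can_trace) untracemem(x)
dim(x)
```

No output is quoted on purpose: whether a duplication is reported is exactly the implementation detail you shouldn't rely on.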

In pure R, dim<- / structure(..., dim = ...) is the idiomatic and efficient answer.
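For anyone who wants numbers on their own machine, here is a rough timing sketch using only base R. The sizes, the big_vals/big_rows inputs, and the via_* and time_it helpers are all made up for illustration; timings depend on hardware and R version, so none are quoted.

```r
set.seed(1)
n_rows <- 1000L
n_cols <- 50L
big_vals <- matrix(as.character(seq_len(n_rows * n_cols)), nrow = n_rows)
big_rows <- matrix(sample.int(n_rows, 200000L, replace = TRUE), ncol = 100L)
val_col <- 2L

# Three ways of getting the extracted vector back into the shape of big_rows.
via_dim <- function() {
  out <- big_vals[big_rows, val_col]
  dim(out) <- dim(big_rows)
  out
}
via_structure <- function() structure(big_vals[big_rows, val_col], dim = dim(big_rows))
via_matrix <- function() matrix(big_vals[big_rows, val_col], ncol = ncol(big_rows))

# All three should produce exactly the same matrix.
stopifnot(identical(via_dim(), via_structure()), identical(via_dim(), via_matrix()))

# Elapsed seconds for repeated calls of each approach.
time_it <- function(f, reps = 50L) system.time(for (i in seq_len(reps)) f())[["elapsed"]]
timings <- c(dim = time_it(via_dim), structure = time_it(via_structure), matrix = time_it(via_matrix))
timings
```

For anything more careful than a sanity check, a dedicated benchmarking package would give more reliable measurements than system.time().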

5 Comments

You might want to use array instead of structure.
Don't array() and structure() both pass by value? I'm hoping that something exists which is (1) similar to `dim<-`() and (2) can redimension "in place", without copying the underlying data.
Hello Nadir, and welcome! You are right that byrow = TRUE is inappropriate, and that helped me realize that my `dim<-`() approach was correct all along. I simply "had my wires crossed" regarding my output. I have updated my post with clearer output and questions.
Thanks Greg. Agreed! `dim<-` is the idiomatic solution here. At the R level, [ returns a vector, so there's no special extraction form that preserves the shape directly; whether resetting dim happens truly in place depends on copy-on-modify semantics.
Another one-liner is `dim<-`(vals[val_rows, val_col], dim(val_rows)).
You could also preallocate the resulting matrix and assign with [] <- :

res2 <- matrix(NA_character_, nrow = n_rows_out, ncol = n_cols)
res2[] <- vals[val_rows, val_col]
res2

#      [,1]  [,2]  [,3] 
# [1,] "3x2" "4x2" "3x2"
# [2,] "1x2" "3x2" "5x2"
# [3,] "3x2" "4x2" "2x2"
# [4,] "1x2" "5x2" "5x2"


First of all, my answer does NOT give an efficient solution (I strongly recommend the method by Rui Barradas).

But I guess the following approach might fit your expected logic:

apply(val_rows, 2, \(k) vals[k,val_col])

which gives

     [,1]  [,2]  [,3] 
[1,] "3x2" "4x2" "3x2"
[2,] "1x2" "3x2" "5x2"
[3,] "3x2" "4x2" "2x2"
[4,] "1x2" "5x2" "5x2"

2 Comments

Hi Thomas, thanks for pointing this out! Would you kindly clarify whether "this [inefficient] answer" refers to my own tentative solutions above, or to the answer posted by @NadirAMMISAID?
Mine is significantly less efficient than your solution and the answer by @NadirAMMISAID, since apply() loops over the index columns one at a time, rather than being fully vectorized like the approaches you both posted.
