Rearrange 2D NumPy Array Efficiently

Question

Let's say I have a 2D NumPy array:

x = np.random.rand(100, 100000)

And I retrieve the column-wise sorted indices (i.e., each column is sorted independently from the others and the indices are returned):

idx = np.argsort(x, axis=0)

Then, for each column, I need the values from indices = [10, 20, 30, 40, 50] to be first the first 5 rows (of that column) and then followed by the rest of the sorted values (not the indices!).

A naive approach might be:

indices = np.array([10, 20, 30, 40, 50])
out = np.empty(x.shape, dtype=int64)

for col in range(x.shape[1]):
    # For each column, fill the first few rows with `indices`
    out[:indices.shape[0], col] = x[indices, col]  # Note that we want the values, not the indices

    # Then fill the rest of the rows in this column with the remaining sorted values excluding `indices`
    n = indices.shape[0]
    for row in range(indices.shape[0], x.shape[0]):
        if idx[row, col] not in indices:
            out[n, col] = x[row, col]  # Again, note that we want the value, not the index
            n += 1

Divakar · Accepted Answer · 2020-05-21 14:37:55Z

1

Approach #1

Here's one based on previous post that doesn't need idx -

xc = x.copy()
xc[indices] = (xc.min()-np.arange(len(indices),0,-1))[:,None]
out = np.take_along_axis(x,xc.argsort(0),axis=0)

Approach #2

Another with np.isin masking that uses idx -

mask = np.isin(idx, indices)
p2 = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out = np.vstack((x[indices],p2))

Approach #2- Alternative If you are continously editing into out to change everything except those indices, an array-assignment might be the one for you -

n = len(indices)
out[:n] = x[indices]

mask = np.isin(idx, indices)
lower = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out[n:] = lower

edited May 21, 2020 at 14:37

answered May 21, 2020 at 14:22

Divakar

222k19 gold badges273 silver badges374 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

slaw Over a year ago

Let's say I had to perform this multiple times (rather than once) but the size of out is the same in all of my iterations. Would it be "better" or more efficient to create an out array once with the appropriate size and then copy x[indices] and p2 into it? This way, I can avoid the costly memory creation?

Divakar Over a year ago

@slaw Think I am getting what you mean. See if Approach #2- Alternative works for you. Will edit Approach #1 accordingly.

slaw Over a year ago

Yes, I think that would work and avoids the np.vstack! I guess, technically, we could just do out[n:] = np.take_along_axis(...) and probably reuse the mask across the iterations as well. I'm going to read up on take_long_axis just so I understand what is going on there.

varagrawal · Accepted Answer · 2020-05-21 14:24:28Z

This should help you get started by eliminating the inner most loop and if condition. To start off, you can pass in x[:, col] as the input parameter x.

def custom_ordering(x, idx, indices):
    # First get only the desired indices at the top
    out = x[indices, :]

    # delete `indices` from `idx` so `idx` doesn't have the values in `indices`
    idx2 = np.delete(idx, indices)

    # select `idx2` rows and concatenate
    out = np.concatenate((out, x[idx2, :]), axis=0)

    return out

amzon-ex · Accepted Answer · 2020-05-21 15:09:37Z

Here is my solution to the problem:

rem_indices = [_ for _ in range(x.shape[0]) if _ not in indices]    # get all remaining indices
xs = np.take_along_axis(x, idx, axis = 0)                                        # the sorted array
out = np.empty(x.shape)

out[:indices.size, :] = xs[indices, :]                                                  # insert specific values at the beginning
out[indices.size:, :] = xs[rem_indices, :]                                         # insert the remaining values after the previous

Tell me if I understood your problem correctly.

Naphat Amundsen · Accepted Answer · 2020-05-22 22:11:28Z

I do this with a smaller array and fewer indices such that I can easily sanity check the results, but it should translate to your use case. I think this solution is decently efficient as everything is done in place.

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([5,7,9])

# Swap top 3 rows with the rows 5,7,9 and vice versa
x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
# Sort the wanted portion of array
x[len(indices):].sort(axis=0)

Here is with the output:

>>> import numpy as np
>>> x = np.random.randint(10, size=(10,3))
>>> indices = np.array([5,7,9])
>>> x
array([[7, 1, 8],
       [7, 4, 6],
       [6, 5, 2],
       [6, 8, 4],
       [2, 0, 2],
       [3, 0, 4],  # 5th row
       [4, 7, 4],
       [3, 1, 1],  # 7th row
       [3, 5, 3],
       [0, 5, 9]]) # 9th row

>>> # We want top of array to be
>>> x[indices]
array([[3, 0, 4],
       [3, 1, 1],
       [0, 5, 9]])

>>> # Swap top 3 rows with the rows 5,7,9 and vice versa
>>> x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
>>> # Assert that rows have been swapped correctly
>>> x
array([[3, 0, 4],  #
       [3, 1, 1],  # Top of array looks like above
       [0, 5, 9],  #
       [6, 8, 4],
       [2, 0, 2],
       [7, 1, 8],  # Previous top row
       [4, 7, 4],
       [7, 4, 6],  # Previous second row
       [3, 5, 3],
       [6, 5, 2]]) # Previous third row

>>> # Sort the wanted portion of array
>>> x[len(indices):].sort(axis=0)
>>> x
array([[3, 0, 4], #
       [3, 1, 1], # Top is the same, below is sorted
       [0, 5, 9], #
       [2, 0, 2],
       [3, 1, 2],
       [4, 4, 3],
       [6, 5, 4],
       [6, 5, 4],
       [7, 7, 6],
       [7, 8, 8]])

EDIT: This version here should handle if any elements in indices is less than len(indices)

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([1,2,4])

tmp = x[indices]

# Here I just assume that there aren't any values less or equal to -1. If you use 
# float, you can use -np.inf, but there is no such equivalent for ints (which I 
# use in my example).
x[indices] = -1

# The -1 will create dummy rows that will get sorted to be on top of the array,
# which can switch with tmp later
x.sort(axis=0) 
x[indices] = tmp

Ohhh, interesting! I really like that this can be done in place AND without extra steps that require no mental gymnastics. A simple swap before sorting is quite elegant
Little nitpick: If you need to guarantee a stable sort this will not work.
@PaulPanzer When you say “stable sort” are you referring to cases in which there is a tie? I thought that only mattered in the case of argsort?
@slaw good point. If you ultimately are only interested in values and values that compare equal are truly indiscriminable then "stable" doesn't mean much.
Hmmm, unfortunately, I'm getting a race condition or a FIFO situation in the swapping line. I think it would be safer to have the swap happen through an intermediate array

Collectives™ on Stack Overflow

Rearrange 2D NumPy Array Efficiently

4 Answers 4

3 Comments

Comments

Comments

14 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

3 Comments

Comments

Comments

14 Comments

Linked

Related