4

Let's say I have a 2D NumPy array:

x = np.random.rand(100, 100000)

And I retrieve the column-wise sorted indices (i.e., each column is sorted independently from the others and the indices are returned):

idx = np.argsort(x, axis=0) 

Then, for each column, I need the values from indices = [10, 20, 30, 40, 50] to be first the first 5 rows (of that column) and then followed by the rest of the sorted values (not the indices!).

A naive approach might be:

indices = np.array([10, 20, 30, 40, 50])
out = np.empty(x.shape, dtype=int64)

for col in range(x.shape[1]):
    # For each column, fill the first few rows with `indices`
    out[:indices.shape[0], col] = x[indices, col]  # Note that we want the values, not the indices

    # Then fill the rest of the rows in this column with the remaining sorted values excluding `indices`
    n = indices.shape[0]
    for row in range(indices.shape[0], x.shape[0]):
        if idx[row, col] not in indices:
            out[n, col] = x[row, col]  # Again, note that we want the value, not the index
            n += 1
0

4 Answers 4

1

Approach #1

Here's one based on previous post that doesn't need idx -

xc = x.copy()
xc[indices] = (xc.min()-np.arange(len(indices),0,-1))[:,None]
out = np.take_along_axis(x,xc.argsort(0),axis=0)

Approach #2

Another with np.isin masking that uses idx -

mask = np.isin(idx, indices)
p2 = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out = np.vstack((x[indices],p2))

Approach #2- Alternative If you are continously editing into out to change everything except those indices, an array-assignment might be the one for you -

n = len(indices)
out[:n] = x[indices]

mask = np.isin(idx, indices)
lower = np.take_along_axis(x,idx.T[~mask.T].reshape(x.shape[1],-1).T,axis=0)
out[n:] = lower
Sign up to request clarification or add additional context in comments.

3 Comments

Let's say I had to perform this multiple times (rather than once) but the size of out is the same in all of my iterations. Would it be "better" or more efficient to create an out array once with the appropriate size and then copy x[indices] and p2 into it? This way, I can avoid the costly memory creation?
@slaw Think I am getting what you mean. See if Approach #2- Alternative works for you. Will edit Approach #1 accordingly.
Yes, I think that would work and avoids the np.vstack! I guess, technically, we could just do out[n:] = np.take_along_axis(...) and probably reuse the mask across the iterations as well. I'm going to read up on take_long_axis just so I understand what is going on there.
0

This should help you get started by eliminating the inner most loop and if condition. To start off, you can pass in x[:, col] as the input parameter x.

def custom_ordering(x, idx, indices):
    # First get only the desired indices at the top
    out = x[indices, :]

    # delete `indices` from `idx` so `idx` doesn't have the values in `indices`
    idx2 = np.delete(idx, indices)

    # select `idx2` rows and concatenate
    out = np.concatenate((out, x[idx2, :]), axis=0)

    return out

Comments

0

Here is my solution to the problem:

rem_indices = [_ for _ in range(x.shape[0]) if _ not in indices]    # get all remaining indices
xs = np.take_along_axis(x, idx, axis = 0)                                        # the sorted array
out = np.empty(x.shape)

out[:indices.size, :] = xs[indices, :]                                                  # insert specific values at the beginning
out[indices.size:, :] = xs[rem_indices, :]                                         # insert the remaining values after the previous

Tell me if I understood your problem correctly.

Comments

0

I do this with a smaller array and fewer indices such that I can easily sanity check the results, but it should translate to your use case. I think this solution is decently efficient as everything is done in place.

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([5,7,9])

# Swap top 3 rows with the rows 5,7,9 and vice versa
x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
# Sort the wanted portion of array
x[len(indices):].sort(axis=0) 

Here is with the output:

>>> import numpy as np
>>> x = np.random.randint(10, size=(10,3))
>>> indices = np.array([5,7,9])
>>> x
array([[7, 1, 8],
       [7, 4, 6],
       [6, 5, 2],
       [6, 8, 4],
       [2, 0, 2],
       [3, 0, 4],  # 5th row
       [4, 7, 4],
       [3, 1, 1],  # 7th row
       [3, 5, 3],
       [0, 5, 9]]) # 9th row

>>> # We want top of array to be
>>> x[indices]
array([[3, 0, 4],
       [3, 1, 1],
       [0, 5, 9]])

>>> # Swap top 3 rows with the rows 5,7,9 and vice versa
>>> x[:len(indices)], x[indices] = x[indices], x[:len(indices)].copy()
>>> # Assert that rows have been swapped correctly
>>> x
array([[3, 0, 4],  #
       [3, 1, 1],  # Top of array looks like above
       [0, 5, 9],  #
       [6, 8, 4],
       [2, 0, 2],
       [7, 1, 8],  # Previous top row
       [4, 7, 4],
       [7, 4, 6],  # Previous second row
       [3, 5, 3],
       [6, 5, 2]]) # Previous third row

>>> # Sort the wanted portion of array
>>> x[len(indices):].sort(axis=0)
>>> x
array([[3, 0, 4], #
       [3, 1, 1], # Top is the same, below is sorted
       [0, 5, 9], #
       [2, 0, 2],
       [3, 1, 2],
       [4, 4, 3],
       [6, 5, 4],
       [6, 5, 4],
       [7, 7, 6],
       [7, 8, 8]])

EDIT: This version here should handle if any elements in indices is less than len(indices)

import numpy as np
x = np.random.randint(10, size=(12,3)) 
indices = np.array([1,2,4])

tmp = x[indices]

# Here I just assume that there aren't any values less or equal to -1. If you use 
# float, you can use -np.inf, but there is no such equivalent for ints (which I 
# use in my example).
x[indices] = -1

# The -1 will create dummy rows that will get sorted to be on top of the array,
# which can switch with tmp later
x.sort(axis=0) 
x[indices] = tmp

14 Comments

Ohhh, interesting! I really like that this can be done in place AND without extra steps that require no mental gymnastics. A simple swap before sorting is quite elegant
Little nitpick: If you need to guarantee a stable sort this will not work.
@PaulPanzer When you say “stable sort” are you referring to cases in which there is a tie? I thought that only mattered in the case of argsort?
@slaw good point. If you ultimately are only interested in values and values that compare equal are truly indiscriminable then "stable" doesn't mean much.
Hmmm, unfortunately, I'm getting a race condition or a FIFO situation in the swapping line. I think it would be safer to have the swap happen through an intermediate array
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.