3

Let two ndarrays: A of shape (n, *m), and B of shape (n, ). Is there a way to sort A in-place using the order that would sort B?

Sorting A with B is easy using np.argsort, but this is not done in-place:

A = A[np.argsort(B)]

Comments:

  • A and B have different dtypes, and A can have more than two dimensions. Hence they can’t be stacked to use ndarray.sort().
  • A takes up a lot of space, which is why it needs to be sorted in-place. Any solution requiring twice the space occupied by A would therefore defeat this purpose.
  • The title of this question “Re-arranging numpy array in place” may sound related, but the question itself is not very clear, and the answers do not match my question.
2
  • Looks like this is just an advanced indexing task, A[arr,:], indexing the rows of A. That necessarily is a copy. Even if you use A[:] = ... to write the new values back 'on-to' A, there will be a temporary buffer with the data. Commented Oct 21, 2018 at 17:02
  • @hpaulj: indeed, I was wondering if there is a way to do this without indexing and hence without creating such a buffer. Commented Oct 21, 2018 at 17:13

2 Answers 2

1

Here is a solution that works by following cycles in the index array. It can optionally be compiled using pythran giving a significant speedup if rows are small (80x for 10 elements) and a small speedup if rows are large (30% for 1000 elements).

To keep it pythran compatible I had to simplify it a bit, so it only accepts 2D arrays and it only sorts along axis 0.

Code:

import numpy as np

#pythran export take_inplace(float[:, :] or int[:, :], int[:])

def take_inplace(a, idx):
    n, m = a.shape
    been_there = np.zeros(n, bool)
    keep = np.empty(m, a.dtype)
    for i in range(n):
        if been_there[i]:
            continue
        keep[:] = a[i]
        been_there[i] = True
        j = i
        k = idx[i]
        while not been_there[k]:
            a[j] = a[k]
            been_there[k] = True
            j = k
            k = idx[k]
        a[j] = keep

Sample run using compiled version. As indicated above compilation is only required for small rows, for larger rows pure python should be fast enough.

>>> from timeit import timeit
>>> import numpy as np
>>> import take_inplace
>>> 
>>> a = np.random.random((1000, 10))
>>> idx = a[:, 4].argsort()
>>>
>>> take_inplace.take_inplace(a, idx)
>>>
# correct
>>> np.all(np.arange(1000) == a[:, 4].argsort())
True
>>>
# speed
>>> timeit(lambda: take_inplace.take_inplace(a, idx), number=1000)
0.011950935004279017
>>>
# for comparison
>>> timeit(lambda: a[idx], number=1000)
0.02985276997787878
Sign up to request clarification or add additional context in comments.

1 Comment

This works perfectly, thanks a lot! I replaced the 1st line of take_inplace by n, *m = a.shape so that a could have 2 or more dimensions.
1

If you can set A beforehand as a structured array whose datatype is composed of a subarray of shape (m, ) and a scalar of the same type (e.g., np.int32), then you can sort it in-place with respect to B. For example:

import numpy as np

B = np.array([3, 1, 2])
A = np.array([[10, 11], [20, 21], [30, 31]])

(n, m) = A.shape

dt = np.dtype([('a', np.int32, (m, )), ('b', int)])
A2 = np.array([(a, b) for a, b in zip(A, B)], dtype=dt)

A2.sort(order='b')
print A2

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.