Move values to back of an np.array()

Question

I'm trying to create the most efficient (not nicest/prettiest) way of moving particular values within an np.array() to the back of said array.

I currently have two different solutions:

import numpy as np
from numba import jit

@jit(nopython = True)
def move_to_back_a(a, value):
    new_a = []

    total_values = 0

    for v in a:
        if v == value:
            total_values += 1
        else:
            new_a.append(v)

    return new_a + [value] * total_values

@jit(nopython = True)
def move_to_back_b(a, value):
    total_values = np.count_nonzero(a == value)

    ind = a == value
    a = a[~ind]

    return np.append(a, [value] * total_values)

Which give the following output:

In [7]: move_to_back_a(np.array([2,3,24,24,24,1]), 24)
Out[7]: [2, 3, 1, 24, 24, 24]

In [8]: move_to_back_b(np.array([2,3,24,24,24,1]), 24)
Out[8]: array([ 2,  3,  1, 24, 24, 24], dtype=int64)

It doesn't really matter whether I get back my output as a list or as an array, though I expect that returning an array will be more helpful in my future code.

The current timing on these tests is as follows:

In [9]: %timeit move_to_back_a(np.array([2,3,24,24,24,1]), 24)
2.28 µs ± 20.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [10]: %timeit move_to_back_b(np.array([2,3,24,24,24,1]), 24)
3.1 µs ± 50.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Is there any way to make this code even faster?

Which Python/NumPy/Numba versions are you using? When I run it with 3.6.9/1.13.3/0.45.1, I get a Numba error about np.count_nonzero being used incorrectly.
– Graipher, Commented Jan 13, 2020 at 16:26
@Graipher I'm using 3.7/1.16.4/0.46.0 for Python, NumPy and Numba, respectively.
– Menno Van Dijk, Commented Jan 13, 2020 at 16:28
Is this the typical size of your arrays or can they get larger?
– AlexV, Commented Jan 13, 2020 at 16:41
@MennoVanDijk: OK, I just updated to Numba 0.47.0, which fixed the problem.
– Graipher, Commented Jan 13, 2020 at 16:45

Graipher · Accepted Answer · 2020-01-13 16:56:24Z

Your second function is faster for me when simplified like this:

@jit(nopython=True)
def move_to_back_c(a, value):
    mask = a == value
    return np.append(a[~mask], a[mask])

In addition, Python's official style guide, PEP 8, recommends not surrounding the = with spaces when it is used for keyword arguments, as in nopython=True.
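For comparison, the mask idea behind move_to_back_c can also be written in plain NumPy without Numba. This is only a sketch: the name move_to_back_mask is made up for illustration, and np.concatenate is swapped in for np.append (np.append on 1-D inputs ends up concatenating anyway). No claim is made here about how it compares in speed to the jitted versions.

```python
import numpy as np


def move_to_back_mask(a, value):
    """Return a copy of `a` with every element equal to `value` moved to the end.

    Hypothetical helper name; plain NumPy, no Numba required.
    """
    a = np.asarray(a)
    mask = a == value  # True where the element should go to the back
    # Boolean indexing preserves the relative order of the remaining elements.
    return np.concatenate((a[~mask], a[mask]))


result = move_to_back_mask(np.array([2, 3, 24, 24, 24, 1]), 24)
print(result)
```

Because a[mask] already holds exactly the matching elements, there is no need to count them separately.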

Since numba apparently recently gained generator support, this might also be worth checking out:

@jit(nopython=True)
def _move_to_back_d(a, value):
    count = 0
    for x in a:
        if x != value:
            yield x
        else:
            count += 1
    for _ in range(count):
        yield value

@jit(nopython=True)
def move_to_back_d(a, value):
    return list(_move_to_back_d(a, value))
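The generator version returns a list. If an array is preferred, one possible variation (not part of the timings in this answer, so its speed is untested here) is to write directly into a preallocated output array. The sketch below assumes a 1-D input; the name move_to_back_prealloc is hypothetical, and the fallback decorator only exists so the example also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs as plain Python without Numba.
    def njit(func):
        return func


@njit
def move_to_back_prealloc(a, value):
    """Copy non-matching elements to the front of a new array, then fill the rest.

    Assumes `a` is a 1-D NumPy array; hypothetical helper name.
    """
    out = np.empty_like(a)
    j = 0  # next free position in `out`
    for i in range(a.shape[0]):
        if a[i] != value:
            out[j] = a[i]
            j += 1
    # Every remaining slot corresponds to an element that equalled `value`.
    out[j:] = value
    return out


result = move_to_back_prealloc(np.array([2, 3, 24, 24, 24, 1]), 24)
print(result)
```

This does a single pass over the input and allocates the output once, which avoids building an intermediate list or mask.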

The timings I get on my machine for the given testcase are:

move_to_back_a    1.63 µs ± 14.5 ns
move_to_back_b    2.33 µs ± 21 ns
move_to_back_c    1.92 µs ± 17.5 ns
move_to_back_d    1.66 µs ± 9.69 ns

What matters at least as much in the end is the scaling behavior, though. Here are some timings using larger arrays:

np.random.seed(42)
x = [(np.random.randint(0, 100, n), 42) for n in np.logspace(1, 7, dtype=int)]

The mask approach is slightly slower for small arrays but consistently faster for larger ones.
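The setup line above only builds the inputs. A minimal sketch of how per-size timings could be collected with the standard timeit module is shown below. The helper names time_over_sizes and move_to_back_mask are hypothetical, the sizes are kept small so the example finishes quickly, and a plain NumPy function stands in for the jitted ones. When timing Numba functions, each one should be called once beforehand so that compilation time is not measured.

```python
import timeit

import numpy as np


def move_to_back_mask(a, value):
    # Plain NumPy stand-in for the functions being compared (hypothetical name).
    mask = a == value
    return np.concatenate((a[~mask], a[mask]))


def time_over_sizes(func, inputs, repeat=3, number=10):
    """Return (array size, best seconds per call) for each (array, value) pair."""
    results = []
    for a, value in inputs:
        runs = timeit.repeat(lambda: func(a, value), repeat=repeat, number=number)
        results.append((a.size, min(runs) / number))
    return results


rng = np.random.RandomState(42)
sizes = np.logspace(1, 4, num=4, dtype=int)  # much smaller than 1e7, for a quick run
inputs = [(rng.randint(0, 100, n), 42) for n in sizes]

timings = time_over_sizes(move_to_back_mask, inputs)
for n, seconds in timings:
    print(f"n={n:>6}: {seconds * 1e6:.2f} µs per call")
```

Taking the minimum over repeats reduces the influence of background noise on the reported time.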

edited Jan 13, 2020 at 16:56

answered Jan 13, 2020 at 16:32


Thanks for the solution as well as the recommendation on style. I do get ~6.5 µs on this solution, which is quite a bit slower than move_to_back_a().
– Menno Van Dijk, Commented Jan 13, 2020 at 16:37
@MennoVanDijk: Did you compare it either with both @jit or both without, so it is fair?
– Graipher, Commented Jan 13, 2020 at 16:37
Just did, unfortunately, this still consistently runs about 0.3 µs above move_to_back_a()
– Menno Van Dijk, Commented Jan 13, 2020 at 16:39
@MennoVanDijk: Added timings and another function.
– Graipher, Commented Jan 13, 2020 at 16:49
Was waiting with accepting this answer in hopes of receiving other answers. I have just accepted it since there does not seem to be more coming my way. Thanks a bunch for helping me with this.
– Menno Van Dijk, Commented Jan 17, 2020 at 13:41
