It doesn’t make sense to me. It looks like nonzero(arr != 0) just creates an intermediate array, allocating more memory. No way it is faster, otherwise why doesn’t NumPy optimize it? But here is my benchmark:
import numpy as np
from timeit import timeit
arr = np.random.randint(0, 2, 10_000_000)
a = timeit(lambda: np.nonzero(arr != 0), number=10)
b = timeit(lambda: np.nonzero(arr), number=10)
print(f"nonzero(arr != 0): {a}")
print(f"nonzero(arr): {b}")
Results:
nonzero(arr != 0): 0.20066774962469935
nonzero(arr): 0.5988789172843099
It seems that nonzero is much faster if you convert the input into a Boolean array first. I also tested it with a smaller integer type (uint8), and the results are consistent:
arr = np.random.randint(0, 2, 10_000_000, dtype=np.uint8)
a = timeit(lambda: np.nonzero(arr != 0), number=10)
b = timeit(lambda: np.nonzero(arr), number=10)
c = timeit(lambda: np.nonzero(arr.view(bool)), number=10)
print(f"nonzero(arr != 0): {a}")
print(f"nonzero(arr): {b}")
print(f"nonzero(view): {c}")
Results:
nonzero(arr != 0): 0.15293325018137693
nonzero(arr): 0.5660374169237912
nonzero(view): 0.13332204101607203
So an intermediate array does add some overhead, but the overhead is much smaller than the nonzero() overhead on non-Boolean arrays.
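As a sanity check on the comparison above, here is a minimal sketch that first confirms the three calls return the same indices and then times them with repeated runs. The names `variants`, `reference`, and `timings`, the seed, and the smaller array size are my own choices for illustration and are not part of the original benchmark. Note that `arr.view(bool)` is only a fair substitute here because the array is a 1-byte type holding nothing but 0 and 1.

```python
import numpy as np
from timeit import repeat

# Assumption: a fixed seed and a smaller array than in the post, to keep the run short.
rng = np.random.default_rng(0)
arr = rng.integers(0, 2, 1_000_000, dtype=np.uint8)

# The three variants compared in the question.
variants = {
    "nonzero(arr != 0)": lambda: np.nonzero(arr != 0),
    "nonzero(arr)": lambda: np.nonzero(arr),
    # Reinterprets the uint8 bytes as bool without copying;
    # only meaningful because every value is 0 or 1.
    "nonzero(view)": lambda: np.nonzero(arr.view(bool)),
}

# All variants should yield identical index arrays.
reference = np.nonzero(arr)[0]
for name, fn in variants.items():
    assert np.array_equal(fn()[0], reference), name

# Take the best of several repeats to reduce noise from other processes.
timings = {name: min(repeat(fn, number=5, repeat=3)) for name, fn in variants.items()}
for name, t in timings.items():
    print(f"{name}: {t:.4f}")
```

The absolute numbers will depend on the NumPy version and the machine, so only the relative ordering is worth comparing against the results above.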
Is there a reason for this? My guess is that nonzero() is optimized (maybe using SIMD?) for Boolean arrays, but somehow that optimization doesn't apply to other types.
!= is a faster implementation than nonzero for integer inputs, and nonzero is just returning the input when it's already booleans.