All your code could be rewritten as:
    from numpy import random
    from scipy.spatial import distance

    def closest_node(node, nodes):
        closest_index = distance.cdist([node], nodes).argmin()
        return nodes[closest_index]

    a = random.randint(1000, size=(50000, 2))
    some_pt = (1, 2)
    closest_node(some_pt, a)
You can just write randint(1000) instead of randint(0, 1000); the documentation of randint says:

    If high is None (the default), then results are from [0, low).
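A quick sketch confirming the two spellings draw from the same range (the sample size here is arbitrary):

```python
import numpy as np

# With high=None, randint samples from [0, low), so these two calls
# draw from the same half-open range [0, 1000):
a = np.random.randint(1000, size=10)
b = np.random.randint(0, 1000, size=10)

print(a.min() >= 0 and a.max() < 1000)
```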
You can use the size argument to randint instead of the loop and two function calls. So:
    a = []
    for x in range(50000):
        a.append((np.random.randint(0, 1000), np.random.randint(0, 1000)))

Becomes:

    a = np.random.randint(1000, size=(50000, 2))
It's also much faster (twenty times faster in my tests).
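You can measure the difference yourself with timeit; exact ratios depend on the machine, but the vectorized call avoids 100,000 Python-level function calls and list appends:

```python
import timeit
import numpy as np

def loop_version():
    # Original approach: one randint call per coordinate, in a Python loop.
    a = []
    for x in range(50000):
        a.append((np.random.randint(0, 1000), np.random.randint(0, 1000)))
    return a

def vectorized_version():
    # Single call generating all 100,000 values at once.
    return np.random.randint(1000, size=(50000, 2))

t_loop = timeit.timeit(loop_version, number=3)
t_vec = timeit.timeit(vectorized_version, number=3)
print(t_loop / t_vec)
```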
More importantly, scipy has the scipy.spatial.distance module that contains the cdist function:
    cdist(XA, XB, metric='euclidean', p=2, V=None, VI=None, w=None)

    Computes distance between each pair of the two collections of inputs.
So calculating the distance in a loop is no longer needed.
You also use the for loop to find the position of the minimum, but this can be done with the argmin method of the ndarray object.
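A small worked example of the two pieces together (the coordinates here are made up for illustration):

```python
import numpy as np
from scipy.spatial.distance import cdist

nodes = np.array([[0, 0], [3, 4], [1, 1]])
node = (1, 2)

# cdist returns a 1 x len(nodes) matrix of pairwise distances.
dists = cdist([node], nodes)
print(dists.shape)     # (1, 3)

# argmin over the (flattened) array gives the index of the nearest node:
# distances are ~2.24, ~2.83 and 1.0, so the minimum is at index 2.
print(dists.argmin())  # 2 -> node (1, 1)
```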
Therefore, your closest_node function can be defined simply as:
    from scipy.spatial.distance import cdist

    def closest_node(node, nodes):
        return nodes[cdist([node], nodes).argmin()]
I've compared the execution times of all the closest_node functions defined in this question:
Original:

    1 loop, best of 3: 1.01 sec per loop

Jaime v1:

    100 loops, best of 3: 3.32 msec per loop

Jaime v2:

    1000 loops, best of 3: 1.62 msec per loop

Mine:

    100 loops, best of 3: 2.07 msec per loop
All vectorized functions perform hundreds of times faster than the original solution.
The cdist version is outperformed only by Jaime's second function, and only slightly; it is certainly the simplest.
scipy.spatial.KDTree is the way to go for such approaches.
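A minimal sketch of that approach, using the faster cKDTree variant (the data here is illustrative): the tree is built once, after which each nearest-neighbour query is roughly O(log n) instead of the O(n) scan that cdist performs.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
nodes = rng.integers(0, 1000, size=(50000, 2))

# Build the spatial index once.
tree = cKDTree(nodes)

# query returns (distance, index) of the nearest neighbour.
dist, index = tree.query((1, 2))
closest = nodes[index]
```

This pays off when you have many query points: you amortize the one-time cost of building the tree over all the queries.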