Here's a way of packing the calculation into one call to einsum.

import numpy as np

def findAnglesBetweenTwoVectors1(v1s, v2s):
    # one einsum over the stacked operands gives v1.v2, v1.v1 and v2.v2 as rows
    dot = np.einsum('ijk,ijk->ij', [v1s,v1s,v2s], [v2s,v1s,v2s])
    return np.arccos(dot[0,:]/(np.sqrt(dot[1,:])*np.sqrt(dot[2,:])))
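
To see what that single einsum call computes (my own check, not part of the original answer): with v1s and v2s of shape (N, 3), the list operands become (3, N, 3) arrays, and summing over the last axis gives a (3, N) result whose rows are the row-wise v1.v2, v1.v1 and v2.v2.

import numpy as np

v1s = np.random.rand(10000, 3)
v2s = np.random.rand(10000, 3)

dot = np.einsum('ijk,ijk->ij', [v1s, v1s, v2s], [v2s, v1s, v2s])
print(dot.shape)   # (3, 10000)

# row 0 is the row-wise dot product, rows 1 and 2 are the squared norms
assert np.allclose(dot[0], np.einsum('ij,ij->i', v1s, v2s))
assert np.allclose(dot[1], (v1s**2).sum(axis=1))
assert np.allclose(dot[2], (v2s**2).sum(axis=1))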

For random vectors of length 10, the speed of this function and yours is basically the same. But with 10000 vectors (corrected times):

In [62]: timeit findAnglesBetweenTwoVectors0(v1s,v2s)
100 loops, best of 3: 2.26 ms per loop

In [63]: timeit findAnglesBetweenTwoVectors1(v1s,v2s)
100 loops, best of 3: 4.24 ms per loop

Yours is clearly faster.

We could try to figure out why the more complex einsum is slower (assuming the issue is there rather than the indexing in the last line). But this result is consistent with other einsum testing that I've seen and done. einsum on simple 2D cases is as fast as np.dot, and in some cases faster. But in more complex cases (3 or more indexes) it is slower than the equivalent constructed from multiple calls to np.dot or einsum.
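
findAnglesBetweenTwoVectors0 isn't reproduced here, but the "multiple simple calls" version I have in mind looks roughly like this (my sketch, using np as imported above; the OP's actual function may differ):

def findAnglesBetweenTwoVectors0(v1s, v2s):
    # three simple 2-index einsum calls instead of one 3-index call
    d12 = np.einsum('ij,ij->i', v1s, v2s)   # row-wise v1.v2
    d11 = np.einsum('ij,ij->i', v1s, v1s)   # row-wise |v1|^2
    d22 = np.einsum('ij,ij->i', v2s, v2s)   # row-wise |v2|^2
    return np.arccos(d12 / np.sqrt(d11 * d22))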


I tried factoring out the array construction, and got a speed improvement:

In [12]: V1=np.array([v1s,v1s,v2s])  # shape (3,10000,3)
In [13]: V2=np.array([v2s,v1s,v2s])
In [22]: timeit findAnglesBetweenTwoVectors2(V1,V2)
100 loops, best of 3: 2.63 ms per loop
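
findAnglesBetweenTwoVectors2 isn't shown either; the idea is just to move the list-to-array conversion out of the timed function, so it would be something like this (again my reconstruction):

def findAnglesBetweenTwoVectors2(V1, V2):
    # V1, V2 are already stacked (3, N, 3) arrays, so the timed work is
    # only the einsum, the sqrt and the arccos
    dot = np.einsum('ijk,ijk->ij', V1, V2)
    return np.arccos(dot[0] / (np.sqrt(dot[1]) * np.sqrt(dot[2])))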

I've corrected the times above, using v1s.shape == (10000, 3).
