Here's a way of packing the calculation into one call to einsum:
    import numpy as np

    def findAnglesBetweenTwoVectors1(v1s, v2s):
        # row 0: v1.v2, row 1: v1.v1, row 2: v2.v2 for every vector pair
        dot = np.einsum('ijk,ijk->ij', [v1s, v1s, v2s], [v2s, v1s, v2s])
        return np.arccos(dot[0, :] / (np.sqrt(dot[1, :]) * np.sqrt(dot[2, :])))
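To sanity-check what the packed call computes — row 0 of `dot` holds the v1·v2 products, rows 1 and 2 the squared norms — here is a minimal, self-contained comparison against a plain per-pair loop (the random test data is mine, not from the question):

```python
import numpy as np

def findAnglesBetweenTwoVectors1(v1s, v2s):
    # row 0: v1.v2, row 1: v1.v1, row 2: v2.v2 for every vector pair
    dot = np.einsum('ijk,ijk->ij', [v1s, v1s, v2s], [v2s, v1s, v2s])
    return np.arccos(dot[0, :] / (np.sqrt(dot[1, :]) * np.sqrt(dot[2, :])))

rng = np.random.default_rng(0)
v1s = rng.standard_normal((10, 3))
v2s = rng.standard_normal((10, 3))

# reference: one angle per pair, computed vector by vector
ref = np.array([np.arccos(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
                for a, b in zip(v1s, v2s)])
angles = findAnglesBetweenTwoVectors1(v1s, v2s)
```

Both give one angle per row, so the output shape is (10,) here and (10000,) for the timed case below.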
For random vectors of length 10, the speed of this function and yours is basically the same. But with 10000 vectors (corrected times):
    In [62]: timeit findAnglesBetweenTwoVectors0(v1s,v2s)
    100 loops, best of 3: 2.26 ms per loop

    In [63]: timeit findAnglesBetweenTwoVectors1(v1s,v2s)
    100 loops, best of 3: 24 ms per loop
Yours is clearly faster.
We could try to figure out why the more complex einsum is slower (assuming the issue is there rather than the indexing in the last line). But this result is consistent with other einsum testing that I've seen and done. einsum on simple 2D cases is as fast as np.dot, and in some cases faster. But in more complex cases (3 or more indexes) it is slower than the equivalent constructed from multiple calls to np.dot or einsum.
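The simple 2D case referred to here is the single-summed-index form; a quick equivalence check (timings omitted, since they depend on the NumPy build and BLAS in use):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10000, 3))
B = rng.standard_normal((10000, 3))

# row-wise dot products via einsum: sum over j, keep i
d1 = np.einsum('ij,ij->i', A, B)
# the same reduction written out as elementwise multiply + sum
d2 = (A * B).sum(axis=1)
```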
I tried factoring out the array construction and got a speed improvement:
    In [12]: V1=np.array(v1s)  # shape (10000,3)
In [13]: V2=np.array(v2s)
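`findAnglesBetweenTwoVectors2` isn't defined above; my assumption is that it is `findAnglesBetweenTwoVectors1` unchanged, just called with the pre-built arrays so einsum no longer has to convert nested Python lists on every call. A sketch under that assumption:

```python
import numpy as np

def findAnglesBetweenTwoVectors2(V1, V2):
    # same einsum as before; V1 and V2 are already ndarrays, so the
    # [V1, V1, V2] stacking is cheap relative to list conversion
    dot = np.einsum('ijk,ijk->ij', [V1, V1, V2], [V2, V1, V2])
    return np.arccos(dot[0, :] / (np.sqrt(dot[1, :]) * np.sqrt(dot[2, :])))

# tiny worked example: a perpendicular pair, then a 45-degree pair
V1 = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
V2 = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
angles = findAnglesBetweenTwoVectors2(V1, V2)  # -> [pi/2, pi/4]
```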
    In [22]: timeit findAnglesBetweenTwoVectors2(V1,V2)
    100 loops, best of 3: 2.63 ms per loop
I've corrected the times, using v1s.shape==(10000,3); with that input the output shape is (10000,), one angle per pair of vectors.