I would like to build a new list of strings from two columns of a numpy array. However, I can't get this to work without looping through each element:

import numpy as np
test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(100000,1))
print(test_list[:,0])
print(test_list[:,1])

def dumbstring(points):
    # Loop through the rows and append each formatted string to a list
    string_pnts = []
    for x in points:
        string_pnts.append("X co-ordinate is %g and y is %g" % (x[0], x[1]))
    return string_pnts

def dumbstring2(points):
    # Pre-fill a list of the right length, then assign each string by index
    string_pnts = [""] * len(points)
    i = 0
    for x in points:
        string_pnts[i] = ("X co-ordinate is %g and y is %g" % (x[0], x[1]))
        i += 1
    return string_pnts

def numpystring(points):
    return ("X co-ordinate is %g and y is %g" % (points[:,0], points[:,1]))    

def numpystring2(point_x, point_y):
    return ("X co-ordinate is %g and y is %g" % (point_x, point_y))

The first two work. I would have expected pre-filling to be faster than appending, but the timings are the same:

%timeit tdumbstring = dumbstring(test_list) # 239ms
%timeit tdumbstring2 = dumbstring2(test_list) # 239ms
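Pre-filling rarely helps because list.append is amortised constant time, so the per-row %g formatting is the more likely bottleneck. A rough sketch to check that: time a loop that only appends against one that only indexes and formats. It assumes a smaller array (10,000 repetitions instead of 100,000) so it runs quickly; `append_only` and `format_only` are hypothetical helper names, and the numbers will vary by machine.

```python
import timeit

import numpy as np

# Smaller array than in the question, so the sketch runs quickly
points = np.tile(np.array([[1, 2], [3, 4], [5, 6]]), (10000, 1))
fmt = "X co-ordinate is %g and y is %g"


def append_only(pts):
    # Loop over rows and grow a list, but do no string formatting
    out = []
    for row in pts:
        out.append(row)
    return out


def format_only(pts):
    # Loop over rows, index both columns and format, but keep nothing
    for row in pts:
        fmt % (row[0], row[1])


t_append = timeit.timeit(lambda: append_only(points), number=3)
t_format = timeit.timeit(lambda: format_only(points), number=3)
print(f"loop + append: {t_append:.3f} s")
print(f"loop + index + format: {t_format:.3f} s")
```

If the second figure is much larger than the first, pre-allocating the list cannot buy much.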

However, the last two do not. Is there really no way to vectorise this function?

tnumpystring = numpystring(test_list) # Error
tnumpystring2 = numpystring2(test_list[:,0],test_list[:,1]) # Error
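The two numpy versions fail because %-formatting with %g expects one number per placeholder, and a whole column cannot be converted to a single float, so Python raises a TypeError. A minimal sketch on a three-row array that reproduces the error and shows the closest working form, which still formats one (x, y) pair at a time:

```python
import numpy as np

points = np.array([[1, 2], [3, 4], [5, 6]])
fmt = "X co-ordinate is %g and y is %g"

# %g needs one number per placeholder; passing whole columns raises TypeError
try:
    fmt % (points[:, 0], points[:, 1])
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)

# Formatting one pair at a time works; zip walks the two columns in step
strings = [fmt % (x, y) for x, y in zip(points[:, 0], points[:, 1])]
print(strings)
```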

Edit:

I tried pandas, since I don't actually need NumPy, but it was a bit slower:

import pandas as pd
df = pd.DataFrame(test_list)
df.columns = ['x','y']
%time pdtest = ("X co-ordinate is " + df.x.map(str) + " and y is " + df.y.map(str)).tolist()
print(pdtest[:5])
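A small self-contained version of the pandas approach, assuming `astype(str)` in place of `map(str)` (for integer columns both give the same text). As far as I know this is still element-wise Python work on object/string data rather than a true vectorised string operation, which would fit the timings above.

```python
import numpy as np
import pandas as pd

points = np.array([[1, 2], [3, 4], [5, 6]])
df = pd.DataFrame(points, columns=["x", "y"])

# astype(str) turns each integer into text; + on string Series concatenates
# element-wise, and a plain str on the left is broadcast to every row
pd_strings = ("X co-ordinate is " + df["x"].astype(str)
              + " and y is " + df["y"].astype(str)).tolist()
print(pd_strings)
```

For float columns, str() and %g can produce different text, so the output may not match the loop versions exactly.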

I also tried map, but that was also slower than looping over the NumPy array:

def mappy(pt_x,pt_y):
    return("X co-ordinate is %g and y is %g" % (pt_x, pt_y))
%time mtest1 = list(map(lambda x: mappy(x[0],x[1]),test_list))
print(mtest1[:5])
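A variant of the map attempt, assuming the rows are first converted with `tolist()` so the loop works on plain Python ints instead of NumPy rows and scalars. `itertools.starmap` unpacks each [x, y] pair into `mappy`'s two arguments, which removes the lambda. I have not timed this against the versions above.

```python
from itertools import starmap

import numpy as np

points = np.tile(np.array([[1, 2], [3, 4], [5, 6]]), (2, 1))


def mappy(pt_x, pt_y):
    return "X co-ordinate is %g and y is %g" % (pt_x, pt_y)


# tolist() gives nested lists of Python ints; starmap calls mappy(x, y)
# for each inner [x, y] list
mtest2 = list(starmap(mappy, points.tolist()))
print(mtest2[:3])
```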

Timings:

(Timing screenshot not reproduced.)

  • I tried calling map instead of using a for loop, but that didn't do much. From what I see, the string formatting of the two points takes most of the time. I also toyed around with numpy.savetxt and a virtual StringIO "file", but that only slowed everything down. Take a look here for a related discussion: stackoverflow.com/questions/2721521/… Commented Feb 25, 2016 at 10:09
  • Thanks Greg, I also tried map and found it a bit slower. Oddly, pandas was slower too. Commented Feb 25, 2016 at 10:24

1 Answer


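Here's a solution using numpy.core.defchararray.add. First convert the array to str. (Newer NumPy releases deprecate importing from numpy.core; the same function is available as np.char.add.)

import numpy as np
from numpy.core.defchararray import add  # same function as np.char.add
test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(100000,1)).astype(str)

def stringy_arr(points):
    # Build "prefix + x" and " and y ... + y" element-wise, then join the halves
    return add(add('X coordinate is ', points[:,0]),add(' and y coordinate is ', points[:,1]))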
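Since the question asks for a list rather than an array, here is a sketch of the same idea written against np.char.add with a final `tolist()` call. It assumes Python 3, where `astype(str)` produces a unicode ('<U...') dtype; the '|S85' byte-string dtype in the output below probably comes from Python 2. The name `stringy_list` is hypothetical.

```python
import numpy as np

# On Python 3, astype(str) gives a fixed-width unicode dtype
points = np.array([[1, 2], [3, 4], [5, 6]]).astype(str)


def stringy_list(points):
    # np.char.add concatenates element-wise and broadcasts the str prefix
    joined = np.char.add(
        np.char.add("X coordinate is ", points[:, 0]),
        np.char.add(" and y coordinate is ", points[:, 1]),
    )
    # tolist() turns the fixed-width array into a list of Python str
    return joined.tolist()


print(stringy_list(points))
```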

The timing is slightly faster:

%timeit stringy_arr(test_list)
1 loops, best of 3: 216 ms per loop

array(['X coordinate is 1 and y coordinate is 2',
       'X coordinate is 3 and y coordinate is 4',
       'X coordinate is 5 and y coordinate is 6', ...,
       'X coordinate is 1 and y coordinate is 2',
       'X coordinate is 3 and y coordinate is 4',
       'X coordinate is 5 and y coordinate is 6'], 
      dtype='|S85')

# Previously tried functions
%timeit dumbstring(test_list)
1 loops, best of 3: 340 ms per loop

%timeit tdumbstring2 = dumbstring2(test_list)
1 loops, best of 3: 320 ms per loop

%timeit mtest1 = list(map(lambda x: mappy(x[0],x[1]),test_list))
1 loops, best of 3: 340 ms per loop

EDIT

You could also just use pure Python with a list comprehension, which is much faster than my first proposed solution:

test_list = np.tile(np.array([[1,2],[3,4],[5,6]]),(10000000,1)).astype(str)  # 10M repetitions (30M rows)
test_list = test_list.tolist()

def comp(points):
    return ['X coordinate is %s Y coordinate is %s' % (x,y) for x,y in points]
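The comments report that the astype(str) step itself costs time. A variant, assuming the numeric array is converted with `tolist()` and formatted directly with %g so no string array is ever built; `comp_numeric` is a hypothetical name and I have not benchmarked it.

```python
import numpy as np

points = np.tile(np.array([[1, 2], [3, 4], [5, 6]]), (2, 1))


def comp_numeric(points):
    # tolist() yields plain Python ints, which %g formats directly,
    # so there is no separate astype(str) pass over the array
    return ["X coordinate is %g Y coordinate is %g" % (x, y)
            for x, y in points.tolist()]


print(comp_numeric(points)[:3])
```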

%timeit comp(test_list)
1 loops, best of 3: 6.53 s per loop

['X coordinate is 1 Y coordinate is 2',
 'X coordinate is 3 Y coordinate is 4',
 'X coordinate is 5 Y coordinate is 6',
 'X coordinate is 1 Y coordinate is 2',
 'X coordinate is 3 Y coordinate is 4',
 'X coordinate is 5 Y coordinate is 6',...

%timeit dumbstring(test_list)
1 loops, best of 3: 30.7 s per loop

4 Comments

Thanks! Oddly, when I checked with 10,000,000 I got: loop-append-list: 25.1 s, prefill_list: 24.7 s, map-lambda: 28 s, pandas_df: 72 s, stringy-including-time-to-str: 71 s, stringy-already_array_string: 77 s
Just ran it at 10,000,000: %timeit dumbstring(test_list) gave "1 loops, best of 3: 31.3 s per loop" and %timeit stringy_arr(test_list) gave "1 loops, best of 3: 21.5 s per loop". I don't know if either is really ideal, which isn't surprising, because the solution I gave is still element-wise...
Kevin, apologies, but I added a screenshot to my original post because I feel I'm going crazy: the basic for loop appears to be the fastest for me...
In your image, you missed a step for the list comprehension function: convert the array to a list with test_list = test_list.tolist() before timing it. See if that helps.
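The tolist() step matters because iterating a NumPy array produces a NumPy row array per iteration, and indexing that row produces a NumPy scalar, whereas tolist() converts everything to plain Python lists and ints once, up front. A small sketch that shows the difference in types:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

# Iterating a 2-D array yields 1-D row arrays whose items are NumPy scalars
first_row = next(iter(arr))
print(type(first_row), type(first_row[0]))

# tolist() converts everything to nested lists of Python ints in one call
as_list = arr.tolist()
print(type(as_list[0]), type(as_list[0][0]))
```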
