
I have a function gen() that returns a NumPy array of nElements floats. I'm looking for a more Pythonic (one-liner?) way to do the following:

a = zeros((nSamples, nElements))
for i in xrange(nSamples):
    a[i,:] = gen()

This is one way to do it:

a = array([gen() for i in xrange(nSamples)]).reshape((nSamples, nElements))

But, understandably, it is slower (roughly 3.6x in the timing below) because it does not preallocate the NumPy array:

import time
from numpy import *

nSamples  = 100000
nElements = 100

start = time.time()
a = array([gen() for i in xrange(nSamples)]).reshape((nSamples, nElements))
print (time.time() - start)

start = time.time()
a = zeros((nSamples, nElements))
for i in xrange(nSamples):
    a[i,:] = gen()
print (time.time() - start)

Output:

1.82166719437
0.502261161804

Is there a way to achieve the same one-liner while keeping the preallocated array for speed?

  • I'm no great guru of pythonicity, but I would use empty() rather than zeros() to save time, avoiding one useless pass over the entire array. Commented May 9, 2011 at 14:58
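A minimal sketch of the empty() suggestion above, written for Python 3 (range instead of xrange). The gen() here is a hypothetical stand-in for the asker's function, and the small sizes are chosen only for illustration:

```python
import numpy as np

def gen(n_elements=4):
    # Hypothetical stand-in for the question's gen(): returns n_elements floats.
    return np.random.rand(n_elements)

n_samples, n_elements = 3, 4

# np.empty allocates the array without initializing it, so the contents
# are arbitrary until every row has been overwritten.
a = np.empty((n_samples, n_elements))
for i in range(n_samples):
    a[i, :] = gen(n_elements)
```

This is only safe because the loop writes every row; any row left unassigned would hold arbitrary values rather than zeros.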

2 Answers


This may not answer your question directly, but since you mentioned Pythonic in the title... Please understand that Pythonic isn't necessarily a "one-liner" or the most clever and short (keystroke-wise) way of doing something. Quite the contrary - Pythonic code strives for clarity.

In the case of your code, I find:

a = zeros((nSamples, nElements))
for i in xrange(nSamples):
    a[i,:] = gen()

much clearer than:

a = array([gen() for i in xrange(nSamples)]).reshape((nSamples, nElements))

Hence I wouldn't say the second one is more Pythonic. Probably less so.



I believe this will do what you want:

a = vstack([ gen() for _ in xrange(nSamples) ])

As I don't have access to your gen function, I can't run timing tests. Also, this (like your one-liner) is not as memory-friendly as your for-loop version: the one-liners store all gen() outputs and then construct the array, whereas the for loop only needs one gen() result in memory at a time (along with the NumPy array).
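One possible way to get a single expression that does not hold every gen() result at once is numpy.fromiter with an explicit count, which lets NumPy allocate the output up front. This is a sketch under assumptions, not a benchmarked recommendation: gen() below is a hypothetical stand-in, the code targets Python 3, and since the values pass through a Python-level iterator one element at a time, it may well be slower than the plain for loop.

```python
from itertools import chain

import numpy as np

def gen(n_elements=4):
    # Hypothetical stand-in for the question's gen(): returns n_elements floats.
    return np.arange(n_elements, dtype=float)

n_samples, n_elements = 3, 4

# Lazily chain the elements of each gen() result; only one result is alive at a time.
flat = chain.from_iterable(gen(n_elements) for _ in range(n_samples))

# Passing count tells fromiter the final size, so it can allocate the output once.
a = np.fromiter(flat, dtype=float, count=n_samples * n_elements).reshape(n_samples, n_elements)
```

Whether this counts as clearer than the explicit loop is debatable; it mainly trades readability for the lower peak memory use described above.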

1 Comment

Thanks for the input! This is indeed a bit slower than the for loop, but it works well for my purposes.