I'm analysing some imaging data that consists of large 3-dimensional arrays of pixel intensities with dimensions [frame, x, y]. Since these are usually too big to hold in memory, they reside on the hard disk as PyTables arrays.
What I'd like to be able to do is read out the intensities in an arbitrary subset of pixels across all frames. The natural way to do this seems to be list indexing:
import numpy as np
import tables

# throwaway file holding a big random stand-in for the imaging data
tmph5 = tables.open_file('temp.hdf5', 'w')
bigarray = tmph5.create_array('/', 'bigarray', np.random.randn(1000, 200, 100))

# x- and y-coordinates of the pixels in the region of interest
roipixels = [[0, 1, 2, 4, 6], [34, 35, 36, 40, 41]]
roidata = bigarray[:, roipixels[0], roipixels[1]]
# IndexError: Only one selection list is allowed
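The error message suggests that a single selection list is fine, and indeed reading whole rows that way works for me - it's just not the selection I'm after:

# a single list in one dimension is accepted...
rows = bigarray[:, roipixels[0], :]   # shape (1000, 5, 100)
# ...but this reads every y value in each row, not my (x, y) pairs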
Unfortunately it seems that PyTables currently only supports a single set of list indices. A further problem is that a list index can't contain duplicates - I couldn't simultaneously read pixels [1, 2] and [1, 3], since my list of pixel x-coordinates would contain [1, 1]. I know that I can iterate over rows in the array:
# correct, but this costs one read from disk per frame
roidata = np.asarray([row[roipixels[0], roipixels[1]] for row in bigarray])
but these iterative reads become quite slow for the large number of frames I'm processing.
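The best workaround I've found so far combines the two: one on-disk read using the single selection list that PyTables does allow, followed by duplicate-friendly fancy indexing in memory with NumPy. A rough sketch (it assumes all frames for the unique x-rows fit in memory at once):

xs = np.asarray(roipixels[0])
ys = np.asarray(roipixels[1])

# sorted, duplicate-free row indices, plus a map back to the original order
unique_x, inverse = np.unique(xs, return_inverse=True)

# one on-disk read: every frame, but only the unique rows of interest
subset = bigarray[:, unique_x.tolist(), :]

# in-memory numpy indexing is fine with duplicates and arbitrary (x, y) pairs
roidata = subset[:, inverse, ys]

Still, this reads whole rows from disk only to throw most of each row away, so I suspect there's something more idiomatic.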
Is there a nicer way of doing this? I'm relatively new to PyTables, so if you have any tips on organising datasets in large arrays, I'd love to hear them.
h5py is much more natural for non-table-like data. However, it has the same limitation in this case.
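As far as I know, h5py's fancy indexing also accepts only one index list per selection, and the list must be sorted with no duplicates. What does work well is reading contiguous blocks of frames and letting NumPy's unrestricted fancy indexing pick out the pixels in memory. A minimal sketch, reusing the file and dataset names from the question (the block size of 100 frames is an arbitrary figure to tune against your memory budget):

import h5py
import numpy as np

roipixels = ([0, 1, 2, 4, 6], [34, 35, 36, 40, 41])

with h5py.File('temp.hdf5', 'r') as f:
    dset = f['bigarray']
    nframes = dset.shape[0]
    roidata = np.empty((nframes, len(roipixels[0])), dtype=dset.dtype)
    step = 100  # frames per on-disk read
    for start in range(0, nframes, step):
        stop = min(start + step, nframes)
        # one large contiguous read per block, instead of one read per frame
        block = dset[start:stop]
        # in-memory numpy fancy indexing: duplicates and any order are fine
        roidata[start:stop] = block[:, roipixels[0], roipixels[1]]

Each iteration costs one big contiguous read rather than one read per frame, which is usually where the speedup comes from.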