
I have a pandas dataframe of shape (75,9).

Only one of those columns holds NumPy arrays, each of shape (100, 4, 3).

I have a strange phenomenon:

    data = self.df[self.column_name].values[0]

is of shape (100, 4, 3), but

    data = self.df[self.column_name].values

is of shape (75,), and its min and max are reported as 'not a numeric object'.

I expected data = self.df[self.column_name].values to be of shape (75, 100, 4, 3), with some min and max.

How can I make a column of numpy arrays behave like a numpy array of a higher dimension (with length=number of rows in the dataframe)?


Reproducing:

    import numpy as np
    import pandas as pd

    some_df = pd.DataFrame(columns=['A'])
    for i in range(10):
        some_df.loc[i] = [np.random.rand(4, 6)]
    print some_df['A'].values.shape
    print some_df['A'].values[0].shape

This prints (10L,) and (4L, 6L) instead of the desired (10L, 4L, 6L) and (4L, 6L).

  • Holy cow, people are still writing new Python 2 code? Commented Jun 16, 2019 at 10:10
  • Hopefully not for long. I believe a solution would be the same for any python
    – Gulzar
    Commented Jun 16, 2019 at 10:12
  • np.stack(....values) may create an array with the desired shape. It doesn't change the dataframe's own storage.
    – hpaulj
    Commented Jun 16, 2019 at 10:21
  • @hpaulj That's it! I'll accept if you post it as an answer. I'm guessing it isn't the best performance-wise, but still works for me
    – Gulzar
    Commented Jun 16, 2019 at 10:27

2 Answers

In [42]: some_df = pd.DataFrame(columns=['A']) 
    ...: for i in range(4): 
    ...:         some_df.loc[i] = [np.random.randint(0,10,(1,3))] 
    ...:                                                                                  
In [43]: some_df                                                                          
Out[43]: 
             A
0  [[7, 0, 9]]
1  [[3, 6, 8]]
2  [[9, 7, 6]]
3  [[1, 6, 3]]

The numpy values of the column are an object dtype array, containing arrays:

In [44]: some_df['A'].to_numpy()                                                          
Out[44]: 
array([array([[7, 0, 9]]), array([[3, 6, 8]]), array([[9, 7, 6]]),
       array([[1, 6, 3]])], dtype=object)

If those arrays all have the same shape, stack does a nice job of concatenating them on a new dimension:

In [45]: np.stack(some_df['A'].to_numpy())                                                
Out[45]: 
array([[[7, 0, 9]],

       [[3, 6, 8]],

       [[9, 7, 6]],

       [[1, 6, 3]]])
In [46]: _.shape                                                                          
Out[46]: (4, 1, 3)

This only works on one column at a time. np.stack, like np.concatenate and its relatives, treats its input argument as an iterable, effectively a list of arrays:

In [48]: some_df['A'].to_list()                                                           
Out[48]: 
[array([[7, 0, 9]]),
 array([[3, 6, 8]]),
 array([[9, 7, 6]]),
 array([[1, 6, 3]])]
In [50]: np.stack(some_df['A'].to_list()).shape                                           
Out[50]: (4, 1, 3)
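
As a rough check against the shapes in the question, the sketch below builds a column of equally shaped arrays, stacks it, and confirms that the result is an ordinary numeric array on which min and max work. The random data and the names `column` and `stacked` are illustrative assumptions, not taken from the original post. It also shows that np.stack raises ValueError when the shapes differ.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten arrays of shape (4, 6), one per row.
rng = np.random.default_rng(0)
some_df = pd.DataFrame({"A": [rng.random((4, 6)) for _ in range(10)]})

# The column itself is a 1-D object array whose elements are arrays.
column = some_df["A"].to_numpy()
print(column.dtype, column.shape)

# Stacking copies the elements into one numeric array with a new leading axis.
stacked = np.stack(column)
print(stacked.dtype, stacked.shape)
print(stacked.min(), stacked.max())

# np.stack requires every element to have the same shape.
try:
    np.stack([np.zeros((4, 6)), np.zeros((4, 5))])
except ValueError as exc:
    print("cannot stack:", exc)
```

Because the stacked array is a copy, changes to it are not reflected in the DataFrame, and the stack has to be redone if the column changes.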
  • After over a year, we meet again. I remember this method giving me many headaches, and wonder if this is the wrong way to go. Is there a standard way of handling tabular data made up of long lists of multidimensional arrays? [each with its own title, and the same shape]
    – Gulzar
    Commented Oct 26, 2020 at 16:01

What you're asking for is not quite possible. Pandas DataFrames are 2D. Yes, you can store NumPy arrays as objects (references) inside DataFrame cells, but this is not really well supported, and expecting to get a shape which has one dimension from the DataFrame and two from the arrays inside is not possible at all.

You should consider storing your data either entirely in NumPy arrays of the appropriate shape, or in a single, properly 2D DataFrame with a MultiIndex. For example, you can "pivot" a column of 1D arrays to become a column of scalars if you move the extra dimension to a new level of a MultiIndex on the rows:

  A
x [2, 3]
y [5, 6]

becomes:

    A
x 0 2
  1 3
y 0 5
  1 6

or pivot to the columns:

  A
  0 1
x 2 3
y 5 6
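
One way the two layouts above could be produced, as a minimal sketch: it assumes every array in the column has the same length, and the names `arr`, `long_df` and `wide_df` are made up for illustration. Both frames are built explicitly from the stacked array; calling DataFrame.stack() on the wide frame is another route to the long layout.

```python
import numpy as np
import pandas as pd

# The small example from above: one column of 1-D arrays.
df = pd.DataFrame({"A": [np.array([2, 3]), np.array([5, 6])]}, index=["x", "y"])

# Combine the cells into one 2-D array of shape (n_rows, array_length).
arr = np.stack(df["A"].to_numpy())
positions = range(arr.shape[1])

# Pivot to the rows: (row label, position) becomes a two-level row index.
long_index = pd.MultiIndex.from_product([df.index, positions])
long_df = pd.DataFrame({"A": arr.ravel()}, index=long_index)

# Pivot to the columns: ("A", position) becomes a two-level column index.
wide_columns = pd.MultiIndex.from_product([["A"], positions])
wide_df = pd.DataFrame(arr, index=df.index, columns=wide_columns)

print(long_df)
print(wide_df)
```

For higher-dimensional cells such as (100, 4, 3), the same idea would need one index level per extra axis, which is where keeping the data in a plain NumPy array may be simpler.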
  • Now I have time to make this right. What is the code that pivots in each direction?
    – Gulzar
    Commented Jun 23, 2019 at 12:33
  • DataFrame.stack(), after you break the lists into separate columns (see stackoverflow.com/questions/35491274/… for that). Commented Jun 23, 2019 at 14:37
