Pandas Multi-Index DataFrame to Numpy Ndarray

Question

I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:

               s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

I would like the resulting numpy.ndarray to be the following with np.shape() = (2,2,4):

[[[ 0.0  0.0  0.8  0.2 ]
  [ 0.1  0.0  0.9  0.0 ]]

 [[ 0.0  0.0  0.9  0.1 ]
  [ 0.0  0.0  1.0  0.0]]]

I have tried df.as_matrix() but this returns:

 [[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]
  [ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]

How do I return a list of lists for the first level, with each inner list holding the records for one Action?

Just reshape afterwards?

– Divakar, Commented Sep 6, 2017 at 15:25
The shape in your result looks like (2, 2, 4).

– Brad Solomon, Commented Sep 6, 2017 at 20:08

Brad Solomon · Accepted Answer · 2017-09-06 20:06:17Z

5

You could use the following:

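# Number of distinct values in the first index level (Action)
dim1 = len(df.index.get_level_values(0).unique())
# -1 lets NumPy infer the size of the second axis (State)
result = df.values.reshape((dim1, -1, df.shape[1]))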
print(result)
[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]

The first line counts the distinct values in the first index level, i.e. the number of groups that groupby(level=0) would produce.

Why this (or groupby) is needed: as soon as you use .values, you lose the dimensionality of the MultiIndex from pandas. So you need to re-pass that dimensionality to NumPy in some way.
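A minimal sketch of passing that dimensionality back to NumPy, using one axis per index level plus a final axis for the columns. The frame is rebuilt from the values in the question. The names n_actions and n_states are illustrative, and the sketch assumes the index is sorted and that every Action/State combination is present:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# Count the distinct values in each index level.
n_actions = df.index.get_level_values("Action").nunique()
n_states = df.index.get_level_values("State").nunique()

# Flatten to 2-D first, then restore the (Action, State, column) layout.
result = df.to_numpy().reshape(n_actions, n_states, df.shape[1])
print(result.shape)
```

Because the reshape only reinterprets the row order, it silently produces a wrong layout if the rows are not sorted by Action and then State.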

1 Comment

g.a Over a year ago

Note that .to_numpy() is now the recommended replacement for .values, and that this method assumes every Action/State combination is present in your DataFrame.

Zero · Accepted Answer · 2017-09-06 15:28:22Z

1

One way

In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2], 
        [0.1, 0.0, 0.9, 0.0]],
       [[0.0, 0.0, 0.9, 0.1],
        [0.0, 0.0, 1.0, 0.0]]], dtype=object)
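The result above is a one-dimensional object array holding nested lists. If a true numeric 3-D array is wanted from a groupby, one possible variation (a sketch, not part of the original answer) is to collect one 2-D block per Action and stack them with np.stack. This assumes every Action has the same number of rows:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# One 2-D block per Action, stacked along a new leading axis.
blocks = [group.to_numpy() for _, group in df.groupby(level="Action")]
stacked = np.stack(blocks)
print(stacked.shape, stacked.dtype)
```

np.stack raises a ValueError when the blocks differ in shape, which makes an incomplete index visible instead of hiding it.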

1 Comment

sccrthlt Over a year ago

Unfortunately, this array does not have the intended dimensions: np.shape() of your result gives (2,), whereas the intended shape is (2, 2, 4).

sccrthlt · Accepted Answer · 2017-09-06 20:15:55Z

0

Using Divakar's suggestion, np.reshape() worked:

>>> print(P)

              s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

>>> result = np.reshape(P.to_numpy(), (2, 2, -1))
>>> print(result)

[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]

>>> np.shape(result)

(2, 2, 4)

1 Comment

Brad Solomon Over a year ago

Thought you wanted a more generic solution ... whatever works!

Vadb · Accepted Answer · 2021-06-03 12:05:46Z

Elaborating on Brad Solomon's answer, a slightly more generic solution that handles index levels of different sizes and any number of levels could look like this:

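def df_to_numpy(df):
    # Assumes a sorted index in which every combination of level values is present.
    try:
        # MultiIndex: one axis per index level
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        # Flat index: a single row axis
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        # Keep a trailing column axis only when there is more than one column
        shape.append(ncol)
    return df.to_numpy().reshape(shape)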

If df is missing some index combinations, reshape will fail. One way to add them (there may be better solutions):

import pandas as pd

def enforce_df_shape(df):
    try:
        # Full Cartesian product of the values in every index level
        ind = pd.MultiIndex.from_product([level.values for level in df.index.levels])
    except AttributeError:
        # Flat index: nothing to fill in
        return df
    fulldf = pd.DataFrame(-1, columns=df.columns, index=ind)  # remove -1 to fill fulldf with nan
    fulldf.update(df)
    return fulldf
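As an alternative sketch of the same idea (not the answer's method), DataFrame.reindex can fill in the missing combinations directly, leaving NaN in the new rows. The frame below is a hypothetical variant of the question's data with the (2, "s1") row removed:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the question's data with the (2, "s1") row missing.
index = pd.MultiIndex.from_tuples(
    [(1, "s1"), (1, "s2"), (2, "s2")], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9, 0.0],
     [0.0, 0.0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# Every combination of the observed level values.
full_index = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)

# Rows absent from df are created and filled with NaN.
filled = df.reindex(full_index)

# levshape gives the length of each index level, here (2, 2).
result = filled.to_numpy().reshape(*full_index.levshape, df.shape[1])
print(result.shape)
```

Passing fill_value to reindex would substitute a sentinel such as -1 for NaN, matching the behaviour of the function above.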

Tom Johnson · Accepted Answer · 2022-12-20 04:41:07Z

0

If you are just trying to pull out one column, say s1, and get an array with shape (2, 2), you can use .index.levshape like this:

x = df.s1.to_numpy().reshape(df.index.levshape)

This gives you a (2, 2) array containing the values of s1.
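A short sketch of the single-column case on the question's data, shown next to Series.unstack as a possible alternative that pivots the State level into columns. The variable names by_reshape and by_unstack are illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# Reshape the flat column using the length of each index level.
by_reshape = df["s1"].to_numpy().reshape(df.index.levshape)

# Pivot State into columns; missing combinations would become NaN.
by_unstack = df["s1"].unstack("State").to_numpy()

print(by_reshape.shape, by_unstack.shape)
```

The reshape version relies on a complete, sorted index, whereas unstack aligns on the index labels themselves.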
