
I have a pandas DataFrame that I build with a process like this:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1':['A','B','C','D','E'],'c2':[1,2,3,4,5]})
df2 = pd.DataFrame({'c1':['A','B','C'],'c2':[1,2,3],'c3': [np.array((1,2,3,4,5,6)),np.array((6,7,8,9,10,11)),np.full((6,),np.nan)]})
df3 = df1.merge(df2,how='left',on=['c1','c2'])

The result looks like this:

c1 c2 c3
A 1 [1,2,3,4,5,6]
B 2 [6,7,8,9,10,11]
C 3 [nan,nan,nan,nan,nan,nan]
D 4 NaN
E 5 NaN
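Rows C, D and E all display NaN, but they hold different kinds of objects. A quick check, assuming the df3 built above (rebuilt here so the snippet runs on its own), is to test each cell with isinstance: rows A to C hold NumPy arrays, while the left join filled D and E with a plain scalar NaN in an object-dtype column.

```python
import numpy as np
import pandas as pd

# Rebuild df3 as in the question
df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'c1': ['A', 'B', 'C'], 'c2': [1, 2, 3],
                    'c3': [np.array((1, 2, 3, 4, 5, 6)),
                           np.array((6, 7, 8, 9, 10, 11)),
                           np.full((6,), np.nan)]})
df3 = df1.merge(df2, how='left', on=['c1', 'c2'])

# True where the cell holds an ndarray, False where the join left a scalar NaN
is_array = df3['c3'].map(lambda x: isinstance(x, np.ndarray))
print(is_array.tolist())  # [True, True, True, False, False]
print(df3['c3'].dtype)    # object
```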

To run the next step of my code, I need every array in c3 to have the same length. For the rows that matched in the join (rows 1 through 3), this is already taken care of. However, the rows that were missing from df2 (rows 4 and 5) now hold only a single NaN value, and I need to replace each of those NaNs with an array of NaN values like the one in row 3. The problem is that I can't figure out how to do that.

I've tried a number of things, starting with the obvious:

df3.loc[pd.isnull(df3.c3),'c3'] = np.full((6,),np.nan)

This gave me:

ValueError: Must have equal len keys and value when setting with an iterable

Fair enough; I understand this error and why Python is confused about what I'm trying to do. How about this?

for i in df3.index:
    df3.at[i,'c3'] = np.full((6,),np.nan) if all(pd.isnull(df3.c3)) else df3.c3

That code runs without error, but when I then print df3 (or use it), I get this error:

RecursionError: maximum recursion depth exceeded

That one I don't understand. Moving on: what if I preassign a column full of my NaN arrays, so that I can apply some logic after the join?

for i in df1.index: df1.at[i,'c4'] = np.full((6,),np.nan)

This gives me the understandable error:

ValueError: setting an array element with a sequence 
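This most likely fails because c4 does not exist yet, so pandas first creates it with a float dtype, and a float cell cannot hold a whole array. A minimal sketch, assuming an object-dtype placeholder column is acceptable for the next step, creates c4 with object dtype first (assigning None does that) so that each cell can store an array:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})

# Create the placeholder column with object dtype first;
# a float column cannot store an array in a single cell
df1['c4'] = None

for i in df1.index:
    df1.at[i, 'c4'] = np.full((6,), np.nan)

print(df1['c4'].dtype)  # object
```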

How about another variation of the same idea:

df1['c4'] = np.full((6,),np.nan)

This one gives a different, also understandable error:

ValueError: Length of values (6) does not match length of index (5)
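Assigning to a whole column expects one value per row, so a single 6-element array is read as six separate row values. A sketch, assuming each row should get its own independent NaN array, builds a list containing one array per row:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})

# One separate 6-element NaN array for each of the len(df1) rows
df1['c4'] = [np.full((6,), np.nan) for _ in range(len(df1))]

print(df1['c4'].dtype)              # object
print(df1['c4'].map(len).tolist())  # [6, 6, 6, 6, 6]
```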

Hence the question: how do I replace values in my DataFrame (in this case, null values) with a NumPy array of NaN values of a given length?

For clarity, the desired final result is this:

c1 c2 c3
A 1 [1,2,3,4,5,6]
B 2 [6,7,8,9,10,11]
C 3 [nan,nan,nan,nan,nan,nan]
D 4 [nan,nan,nan,nan,nan,nan]
E 5 [nan,nan,nan,nan,nan,nan]
  • Maybe you should use df3.at[i,'c3'] (or df3.loc[i,'c3']) instead of df3.c3, because df3.c3 gives all the values in the column, but you need only the value from the current row.
    – furas
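Following that suggestion, a minimal sketch of the loop that reads only the current cell with df3.at[i, 'c3']. It tests each cell with isinstance(..., np.ndarray) rather than all(pd.isnull(...)), because pd.isnull on a scalar NaN returns a single bool that all() cannot iterate over. The original loop stored the entire df3.c3 Series into a cell, so the column most likely ended up containing a reference to itself, which would explain the RecursionError when printing.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'c1': ['A', 'B', 'C'], 'c2': [1, 2, 3],
                    'c3': [np.array((1, 2, 3, 4, 5, 6)),
                           np.array((6, 7, 8, 9, 10, 11)),
                           np.full((6,), np.nan)]})
df3 = df1.merge(df2, how='left', on=['c1', 'c2'])

for i in df3.index:
    value = df3.at[i, 'c3']  # value in the current row only
    # A scalar NaN left by the join is replaced with a fresh array of 6 NaNs
    if not isinstance(value, np.ndarray):
        df3.at[i, 'c3'] = np.full((6,), np.nan)

print(df3['c3'].map(len).tolist())  # [6, 6, 6, 6, 6]
```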

2 Answers


A possible solution:

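import numpy as np
import pandas as pd

# the array with the 6 nan values (its length is that of the longest array in c3)
arr_nan = np.full(
    df3['c3'].map(
        lambda x: np.size(x) if isinstance(x, np.ndarray) else 0).max(), np.nan)

# assign() returns a new DataFrame, so store the result back in df3
df3 = df3.assign(c3 = df3['c3'].map(
    lambda y: arr_nan if not isinstance(y, np.ndarray) else y))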

This solution first determines the length of the arrays in c3 (the longest one present), and then replaces every non-array entry in c3 with the array of 6 np.nan values.

Output:

  c1  c2                              c3
0  A   1              [1, 2, 3, 4, 5, 6]
1  B   2            [6, 7, 8, 9, 10, 11]
2  C   3  [nan, nan, nan, nan, nan, nan]
3  D   4  [nan, nan, nan, nan, nan, nan]
4  E   5  [nan, nan, nan, nan, nan, nan]
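With this approach, every replaced row refers to the same arr_nan object, so modifying one of those arrays in place would change all of them. A variant, assuming independent arrays are preferred, builds a fresh NaN array for each non-array cell while still taking the length from the data:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'c1': ['A', 'B', 'C'], 'c2': [1, 2, 3],
                    'c3': [np.array((1, 2, 3, 4, 5, 6)),
                           np.array((6, 7, 8, 9, 10, 11)),
                           np.full((6,), np.nan)]})
df3 = df1.merge(df2, how='left', on=['c1', 'c2'])

# Length of the arrays already present in c3 (6 here)
n = df3['c3'].map(lambda x: x.size if isinstance(x, np.ndarray) else 0).max()

# Build a new NaN array for every non-array cell instead of sharing one object
df3['c3'] = df3['c3'].map(
    lambda x: x if isinstance(x, np.ndarray) else np.full(n, np.nan))

print(df3.at[3, 'c3'] is df3.at[4, 'c3'])  # False
```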

Get the index of the rows that have NaN values in c3, then assign a Series with the same number of rows and the same index, so that pandas aligns one array to each of those rows.

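idx = df3[df3['c3'].isna()].index  # rows where c3 is a scalar NaN
# one NaN array per selected row (the same array object is reused), aligned on idx
df3.loc[idx, 'c3'] = pd.Series([np.full((6,), np.nan)] * len(idx), index=idx)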

End result:

c1  c2                             c3
 A   1             [1, 2, 3, 4, 5, 6]
 B   2           [6, 7, 8, 9, 10, 11]
 C   3 [nan, nan, nan, nan, nan, nan]
 D   4 [nan, nan, nan, nan, nan, nan]
 E   5 [nan, nan, nan, nan, nan, nan]
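Once every cell in c3 holds an array of the same length, the column can be stacked into a single 2-D array. This is only one plausible form of the "next step" mentioned in the question, which is not specified, so treat it as an assumption. A short sketch that applies the index-aligned assignment above and then calls np.vstack:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'c1': ['A', 'B', 'C', 'D', 'E'], 'c2': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'c1': ['A', 'B', 'C'], 'c2': [1, 2, 3],
                    'c3': [np.array((1, 2, 3, 4, 5, 6)),
                           np.array((6, 7, 8, 9, 10, 11)),
                           np.full((6,), np.nan)]})
df3 = df1.merge(df2, how='left', on=['c1', 'c2'])

# Replace the scalar NaNs with 6-element NaN arrays, aligned on the index
idx = df3[df3['c3'].isna()].index
df3.loc[idx, 'c3'] = pd.Series([np.full((6,), np.nan)] * len(idx), index=idx)

# Stack the equal-length arrays into one (rows x 6) float array
stacked = np.vstack(df3['c3'].tolist())
print(stacked.shape)  # (5, 6)
```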
