Iterating over lists in pandas dataframe to remove everything after certain value (if the value exists) in list

Question

I want to filter my dataframe values based on the occurrence of '1' in my column events. When a 1 occurs, everything after the 1 should be removed.

I want to do this for my whole dataframe, which looks like this:

import pandas as pd

df = pd.DataFrame([['00000000000 ', [4, 5, 5, 3, 2, 1, 5]],
                   ['00000000001', [4, 5, 5, 1, 2, 1, 5, 5, 5]],
                   ['00000000002 ', [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]]],
                  columns=['session_id', 'events'])

The following solution, as answered in this question, works:

df['events_short'] = ""
for i, row in df.iterrows():
    df.at[i, 'events_short'] = row['events'][:row['events'].index(1)]

This only works if a 1 occurs in the list. When it doesn't, I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-175-e4d3f228e32f> in <module>()
      1 df['events_short'] = ""
      2 for i, row in df.iterrows():
----> 3     df.at[i, 'events_short'] = row['events'][:row['events'].index(1)]

ValueError: 1 is not in list

Therefore, I need to handle the case where the 1 does not occur in the list. Can someone help me set this up? Thanks!

Please share a reproducible code and dataframe and edit your question — OnY, Commented Jan 6, 2022 at 13:12
This could be helpful: stackoverflow.com/questions/20109391/… — Rivers, Commented Jan 6, 2022 at 13:15

OnY · Accepted Answer · 2022-01-06 13:18:48Z

2

You can use apply to find the first 1 in each list and truncate the list there.

df['events_short'] = df['events'].apply(lambda x: x[:x.index(1)] if 1 in x else None)

If you want to include the 1:

df['events_short'] = df['events'].apply(lambda x: x[:x.index(1) + 1] if 1 in x else None)

Note that apply is preferred over (and generally faster than) iterrows.
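
A minimal, self-contained sketch of the approach above. The fourth row (session '00000000003', a list without a 1) is not in the question's data and is added here only to show where None appears:

```python
import pandas as pd

df = pd.DataFrame(
    [['00000000000', [4, 5, 5, 3, 2, 1, 5]],
     ['00000000001', [4, 5, 5, 1, 2, 1, 5, 5, 5]],
     ['00000000002', [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]],
     ['00000000003', [0, 2, 3]]],  # assumed extra row: no 1 in the list
    columns=['session_id', 'events'])

# Keep everything before the first 1; None when the list contains no 1.
df['events_short'] = df['events'].apply(
    lambda x: x[:x.index(1)] if 1 in x else None)

print(df['events_short'].tolist())
# [[4, 5, 5, 3, 2], [4, 5, 5], [4, 5], None]
```

If the untruncated list is wanted instead of None for rows without a 1, replacing `None` with `x` in the lambda should do it.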


When I try this, I only get 'None' values, with an error message: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead – Oldbighorn, Commented Jan 6, 2022 at 13:24
@Oldbighorn I get no error, and None only where there is no 1 in the list. Try opening a new console and make sure you are using the exact example you uploaded and my code. – OnY, Commented Jan 6, 2022 at 13:27
Alright! Thanks a lot, now it works. – Oldbighorn, Commented Jan 6, 2022 at 13:35


mozway · Accepted Answer · 2022-01-06 13:46:23Z

1

While @OnY's answer is nice, it reads each list twice (once to check whether a 1 is present, once to find its index).

A more efficient approach might be to use a helper function with try/except:

def upto1(l):
    try:
        return l[:l.index(1)]
    except ValueError:
        return l
    
df['events2'] = df['events'].apply(upto1)

Example:

    session_id                          events          events2
0  00000000000           [4, 5, 5, 3, 2, 1, 5]  [4, 5, 5, 3, 2]
1  00000000001     [4, 5, 5, 1, 2, 1, 5, 5, 5]        [4, 5, 5]
2  00000000002  [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]           [4, 5]
3  00000000003                       [0, 2, 3]        [0, 2, 3]
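
The output above includes a fourth row that is not in the question's data. A sketch that should reproduce it, assuming that row was simply appended to the original frame (and with the trailing spaces in session_id dropped):

```python
import pandas as pd

def upto1(l):
    """Return the items before the first 1, or the list itself if it has no 1."""
    try:
        return l[:l.index(1)]
    except ValueError:  # list.index raises ValueError when 1 is absent
        return l

df = pd.DataFrame(
    [['00000000000', [4, 5, 5, 3, 2, 1, 5]],
     ['00000000001', [4, 5, 5, 1, 2, 1, 5, 5, 5]],
     ['00000000002', [4, 5, 1, 3, 2, 1, 5, 5, 5, 1]],
     ['00000000003', [0, 2, 3]]],  # assumed extra row without a 1
    columns=['session_id', 'events'])

df['events2'] = df['events'].apply(upto1)
print(df)
```

Note that when no 1 is present, upto1 returns the original list object rather than a copy; use `return l[:]` in the except branch if an independent list is needed.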


@Oldbighorn you don't have to accept my answer, you can keep OnY's. This was more of a general comment; the true gain in speed should be checked in a real use case. – mozway, Commented Jan 6, 2022 at 13:56


Ben G · Accepted Answer · 2022-01-06 14:14:05Z

0

Building on @mozway's answer: it is (generally) good practice to avoid having the program intentionally raise an exception and catch it, since raising and handling an exception can be slower than logic that never fails:

def upto1(l):
    return l[:l.index(1)] if 1 in l else l

df['events2'] = df['events'].apply(upto1)
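
Which variant is faster likely depends on how often the 1 is missing, so it is worth measuring on real data. A rough sketch using the standard-library timeit module; the names upto1_try and upto1_check and the two sample lists are made up for illustration:

```python
import timeit

def upto1_try(l):
    # try/except variant: one scan, pays for an exception when 1 is absent
    try:
        return l[:l.index(1)]
    except ValueError:
        return l

def upto1_check(l):
    # membership-check variant: scans twice when 1 is present, never raises
    return l[:l.index(1)] if 1 in l else l

with_one = [4, 5, 5, 3, 2, 1, 5]
without_one = [0, 2, 3]

for name, func in [('try/except', upto1_try), ('membership check', upto1_check)]:
    for label, data in [('1 present', with_one), ('1 absent', without_one)]:
        seconds = timeit.timeit(lambda: func(data), number=100_000)
        print(f'{name:17s} {label:10s} {seconds:.4f} s')
```

The numbers will vary by machine and list length, so treat them as a comparison on your own data rather than a general result.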

