How to split dataframe into multiple sub-dataframes based on column value

Question

I have a DataFrame df1 that looks like this:

Column1	Column2
13	1
12	1
15	0
16	0
15	1
14	1
12	1
11	0
21	1
45	1
44	0

The 1s mark rows that belong to a measurement. I don't know in advance how many 1s one measurement contains, nor how many 0s lie between two measurements. What I want are sub-dataframes, one for each uninterrupted run of 1s. In my example that would be:

df2

Column1	Column2
13	1
12	1

df3

Column1	Column2
15	1
14	1
12	1

and df4

Column1	Column2
21	1
45	1

Alternatively, it would be acceptable to count up on Column2, so I can later split based on that value of Column2:

df5

Column1	Column2
13	1
12	1
15	0
16	0
15	2
14	2
12	2
11	0
21	3
45	3
44	0

I have no idea how to approach this, and googling did not turn up a suitable approach either. Thanks for any help.

Use .diff() and .cumsum().

– Reinderien, commented 2025-11-25 15:49:08 +00:00

simon · Accepted Answer · 2025-11-27 09:05:07Z

I would use the hint provided in Reinderien's comment to realize your alternative approach (i.e. counting up to get df5 from your question):

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Column1": [13,12,15,16,15,14,12,11,21,45,44],
    "Column2": [1,1,0,0,1,1,1,0,1,1,0]
})

df5 = df1.copy()
starts = (np.diff(df5["Column2"], prepend=0) == 1)
df5["Column2"] = np.cumsum(starts) * df5["Column2"]

This produces in df5:

    Column1  Column2
0        13        1
1        12        1
2        15        0
3        16        0
4        15        2
5        14        2
6        12        2
7        11        0
8        21        3
9        45        3
10       44        0

Main ideas:

With np.diff(…, prepend=0), we get the difference between each value and its predecessor, which is nonzero exactly where consecutive values differ:
```
[1 0 -1 0 1 0 0 -1 1 0 -1]
```
With np.diff(…) == 1, we only keep those where the value changes from 0 to 1 (as opposed to changing from 1 to 0), i.e. we only keep the starts of segments containing measurements:
```
[1 0 0 0 1 0 0 0 1 0 0]
```
or rather, [True False False …], since we have boolean values at this point.
With np.cumsum(…), we "spread out" this information to subsequent rows, at the same time incrementing the value at the start of each new segment:
```
[1 1 1 1 2 2 2 2 3 3 3]
```

With np.cumsum(…) * df5["Column2"], we suppress the gaps between segments again:

  [1 1 1 1 2 2 2 2 3 3 3]
* [1 1 0 0 1 1 1 0 1 1 0]

= [1 1 0 0 2 2 2 0 3 3 0]

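If you prefer a Pandas-only solution, you could replace the last two lines of the code at the top of this answer (the ones that compute starts and overwrite df5["Column2"]) with: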

starts = df5["Column2"].diff() == 1
starts[0] = (df5["Column2"][0] == 1)
df5["Column2"] = starts.cumsum() * df5["Column2"]

Here, some extra effort (starts[0] = …) is necessary to get the correct value for df5's first row, because .diff() yields NaN there. In the previous solution, np.diff(…, prepend=0) achieved the same thing.
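As a rough check, the following sketch computes the labels with both variants, confirms that they agree, and then splits df1 on those labels to obtain the sub-dataframes from the question. The names starts_np, starts_pd, labels_np, labels_pd and sub_frames are only illustrative and do not appear in the answer; the sketch also assumes the default 0, 1, 2, … index.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Column1": [13, 12, 15, 16, 15, 14, 12, 11, 21, 45, 44],
    "Column2": [1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0],
})

# NumPy variant: prepend=0 makes a leading 1 count as a segment start.
starts_np = np.diff(df1["Column2"], prepend=0) == 1
labels_np = np.cumsum(starts_np) * df1["Column2"]

# Pandas variant: diff() leaves NaN in the first row, so patch it by hand.
starts_pd = df1["Column2"].diff() == 1
starts_pd.iloc[0] = df1["Column2"].iloc[0] == 1
labels_pd = starts_pd.cumsum() * df1["Column2"]

# Both variants should label every row identically.
assert labels_np.tolist() == labels_pd.tolist()

# Group the original rows by label and skip label 0 (the gaps).
sub_frames = {
    label: group
    for label, group in df1.groupby(labels_pd)
    if label != 0
}

print(sub_frames[2])
```

Grouping df1 (rather than df5) by the labels keeps the original 1s in Column2, so sub_frames[1], sub_frames[2] and sub_frames[3] correspond to df2, df3 and df4 from the question.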

Panda Kim · Accepted Answer · 2025-11-25 16:44:59Z

grp = df['Column2'].ne(df['Column2'].shift()).cumsum()
cond = df['Column2'].ne(0)
out = [d for _, d in df[cond].groupby(grp)]

out

[   Column1  Column2
 0       13        1
 1       12        1,

    Column1  Column2
 4       15        1
 5       14        1
 6       12        1,

    Column1  Column2
 8       21        1
 9       45        1]
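The answer gives no explanation, so here is a minimal sketch that prints the two intermediate Series to show what seems to be going on: grp numbers every run of equal values (runs of 0s included), and cond then filters out the 0 rows before grouping. The sample frame is the one from the question; nothing else is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": [13, 12, 15, 16, 15, 14, 12, 11, 21, 45, 44],
    "Column2": [1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0],
})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum turns those run starts into run numbers.
grp = df["Column2"].ne(df["Column2"].shift()).cumsum()
print(grp.tolist())   # [1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6]

# Rows that belong to a measurement (non-zero flag).
cond = df["Column2"].ne(0)
print(cond.tolist())

# Only the filtered rows are grouped, so the runs of 0s (2, 4, 6) drop out.
out = [d for _, d in df[cond].groupby(grp)]
print(len(out))       # 3
```

Because the runs of 0s are numbered too, the surviving group keys are 1, 3 and 5 rather than 1, 2 and 3. That does not matter here, since the keys are discarded in the list comprehension.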


PaulS · Accepted Answer · 2025-11-25 17:22:12Z

Another possible solution:

m = df['Column2'].eq(1)
starts = df['Column2'].diff().fillna(1).eq(1) & m
df['grp_id'] = starts.cumsum()
[df[(df['grp_id'].eq(i)) & m].drop(columns='grp_id') 
 for i in df['grp_id'][m].unique()]

This solution identifies contiguous measurement blocks in four steps. First, it creates a boolean mask m of the rows where Column2 equals 1. Second, it detects the start of each block: diff() finds the transitions from 0 to 1 (with fillna(1) handling the first row), and combining the result with the mask keeps only true beginnings. Third, cumsum() converts these boolean start flags into sequential group IDs. Finally, a list comprehension iterates over the unique() group IDs and slices the dataframe via boolean indexing, producing one sub-dataframe per measurement block and dropping the temporary group column.

Output:

[   Column1  Column2
 0       13        1
 1       12        1,
    Column1  Column2
 4       15        1
 5       14        1
 6       12        1,
    Column1  Column2
 8       21        1
 9       45        1]
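To make the role of fillna(1) and the "& m" step more concrete, here is a small sketch that wraps the start detection in a helper and prints the group IDs. The helper name block_ids and the variable blocks are hypothetical, and the final groupby is a swapped-in alternative to the list comprehension in the answer, not the author's code.

```python
import pandas as pd


def block_ids(flags: pd.Series) -> pd.Series:
    """Running count of 0 -> 1 transitions (hypothetical helper)."""
    m = flags.eq(1)
    # fillna(1) treats the first row as a possible start; "& m" cancels
    # that again if the first row is actually 0.
    starts = flags.diff().fillna(1).eq(1) & m
    return starts.cumsum()


df = pd.DataFrame({
    "Column1": [13, 12, 15, 16, 15, 14, 12, 11, 21, 45, 44],
    "Column2": [1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0],
})

m = df["Column2"].eq(1)
grp_id = block_ids(df["Column2"])
print(grp_id.tolist())  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]

# A column that begins with 0: the first block still gets ID 1.
print(block_ids(pd.Series([0, 1, 1, 0, 1])).tolist())  # [0, 1, 1, 1, 2]

# Same split as in the answer, but via groupby and without a temporary column.
blocks = [g for _, g in df[m].groupby(grp_id[m])]
```

The 0 rows inherit the ID of the preceding block, which is why the mask m is applied again before splitting.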

Bhumika Aggarwal · Accepted Answer · 2025-11-25 15:44:12Z

Here’s one way to solve it using pandas. The key idea is to assign a “measurement number” to each consecutive block of 1s in Column2. Then, if you want, you can split them into separate DataFrames.

import pandas as pd

# Sample DataFrame
data = {
    "Column1": [13,12,15,16,15,14,12,11,21,45,44],
    "Column2": [1,1,0,0,1,1,1,0,1,1,0]
}

df = pd.DataFrame(data)

# Step 1: Add a new column for measurement numbers
df['Measurement'] = 0

measurement = 0       # Counter for each measurement
in_measurement = False  # Flag to track if we are inside a block of 1s

# Step 2: Iterate over rows to assign measurement numbers
for i in range(len(df)):
    if df.loc[i, 'Column2'] == 1:
        if not in_measurement:
            measurement += 1   # Start of a new measurement
            in_measurement = True
        df.loc[i, 'Measurement'] = measurement
    else:
        in_measurement = False  # End of the current measurement

print(df)

Output:

    Column1  Column2  Measurement
0        13        1            1
1        12        1            1
2        15        0            0
3        16        0            0
4        15        1            2
5        14        1            2
6        12        1            2
7        11        0            0
8        21        1            3
9        45        1            3
10       44        0            0

Split into separate DataFrames

Once you have the Measurement column, you can split the DataFrame like this:

# Get unique measurement numbers
measurements = df['Measurement'].unique()

# Create a list of sub-DataFrames
dfs = [df[df['Measurement']==m].drop(columns='Measurement') for m in measurements if m != 0]

# Example: first measurement
df2 = dfs[0]
print(df2)

This will give you:

   Column1  Column2
0       13        1
1       12        1

You can similarly access df3, df4, etc.

How this works

in_measurement keeps track of whether you are currently inside a block of 1s.
measurement increments only when a new block starts.
Every 1 in the same block gets the same number.
0s are ignored but mark the end of a block.
After that, splitting into sub-DataFrames is easy.
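The same bookkeeping can be sketched as a small function that walks over the column values instead of indexing with df.loc[i, …]. The function name label_measurements is made up for this illustration, and the groupby split is an alternative to the list comprehension above rather than part of the original answer.

```python
import pandas as pd


def label_measurements(flags):
    """Return one label per flag: 0 outside blocks, 1, 2, 3, ... inside."""
    labels = []
    measurement = 0         # counter for each measurement
    in_measurement = False  # are we inside a block of 1s?
    for flag in flags:
        if flag == 1:
            if not in_measurement:
                measurement += 1  # start of a new measurement
                in_measurement = True
            labels.append(measurement)
        else:
            in_measurement = False  # end of the current measurement
            labels.append(0)
    return labels


df = pd.DataFrame({
    "Column1": [13, 12, 15, 16, 15, 14, 12, 11, 21, 45, 44],
    "Column2": [1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0],
})
df["Measurement"] = label_measurements(df["Column2"])

# One sub-DataFrame per measurement number, keyed by that number.
dfs = {
    m: block.drop(columns="Measurement")
    for m, block in df[df["Measurement"] != 0].groupby("Measurement")
}
print(dfs[1])
```

Iterating over the values does not depend on the index labels, whereas df.loc[i, …] with range(len(df)) assumes the default 0, 1, 2, … index.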
