I'm relatively new to pandas and I don't know the best approach to my problem. I have a DataFrame with an index, the data in a column called 'Data', and an empty column called 'Sum'.
I need help writing a function that puts the sum of each variable-length group of rows of the 'Data' column into the 'Sum' column. A group is a run of consecutive non-empty rows: an empty row ends the group.
Here is an example:
index  Data  Sum
0      1
1      1     2
2
3
4      1
5      1
6      1     3
7
8      1
9      1     2
10
11     1
12     1
13     1
14     1
15     1     5
16
17     1     1
18
19     1     1
20
As you can see, the length of each group in 'Data' varies: it can be a single row or any number of rows. The sum must always be placed on the last row of the group. For example, the sum of rows 4, 5 and 6 of the 'Data' column should appear in row 6 of the 'Sum' column.
Any insight will be appreciated.
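A minimal sketch of the original requirement (sum on the last row of each group), assuming the empty cells are stored as NaN. `add_block_sums` is a hypothetical helper name, and this is not necessarily the Method 3 mentioned in the update. The idea: every empty row bumps a counter, so consecutive data rows share a block id; `groupby(...).transform('sum')` spreads each block total over its rows, and a row is kept only if the next row is empty (or there is no next row).

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty cells are assumed to be NaN
nan = np.nan
df = pd.DataFrame({'Data': [1, 1, nan, nan, 1, 1, 1, nan, 1, 1, nan,
                            1, 1, 1, 1, 1, nan, 1, nan, 1, nan]})


def add_block_sums(df, col='Data', out='Sum'):
    """Hypothetical helper: put the sum of each run of consecutive
    non-NaN values in `col` into `out`, on the run's last row."""
    is_empty = df[col].isna()
    # Every empty row bumps the counter, so the rows of one run share an id
    blocks = is_empty.cumsum().rename('block')
    # Total of each block, broadcast back to every row of that block
    totals = df.groupby(blocks)[col].transform('sum')
    # A run ends on a non-empty row whose next row is empty (or absent)
    is_last = ~is_empty & is_empty.shift(-1, fill_value=True)
    result = df.copy()
    result[out] = totals.where(is_last)
    return result


result = add_block_sums(df)
print(result)
```

The helper returns a copy instead of modifying `df` in place, so the input frame keeps only its 'Data' column.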
UPDATE
The problem was solved by implementing Method 3 suggested by ansev. However, due to a change in the main program, the sum of each block now needs to be at the beginning of the block (when the block has more than one row). So I use the df = df.iloc[::-1]
instruction twice: once to reverse the order of the rows, and once to restore the original order. Thank you very much!
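# Reverse the rows so that the first row of each block becomes its last
df = df.iloc[::-1]
# Every NaN row bumps the counter, so each run of data rows shares one block id
blocks = df['Data'].isnull().cumsum()
# True for all but the last row of each block (in reversed order)
m = blocks.duplicated(keep='last')
# Running total per block; keep only its final value, i.e. the block total
df['Sum'] = df.groupby(blocks)['Data'].cumsum().mask(m)
# Restore the original row order
df = df.iloc[::-1]
print(df)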
    Data  Sum
0    1.0  2.0
1    1.0  NaN
2    NaN  NaN
3    NaN  NaN
4    1.0  3.0
5    1.0  NaN
6    1.0  NaN
7    NaN  NaN
8    1.0  2.0
9    1.0  NaN
10   NaN  NaN
11   1.0  5.0
12   1.0  NaN
13   1.0  NaN
14   1.0  NaN
15   1.0  NaN
16   NaN  NaN
17   1.0  1.0
18   NaN  NaN
19   1.0  1.0
20   NaN  NaN
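As an alternative to reversing the frame twice, the start of each block can be detected directly by comparing each row with the previous one. This is a different technique from the reversal approach above, sketched under the same assumptions (empty cells are NaN, columns named 'Data' and 'Sum'); it should reproduce the 'Sum' column shown in the output above.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question; empty cells are assumed to be NaN
nan = np.nan
df = pd.DataFrame({'Data': [1, 1, nan, nan, 1, 1, 1, nan, 1, 1, nan,
                            1, 1, 1, 1, 1, nan, 1, nan, 1, nan]})

is_empty = df['Data'].isna()
# A run starts on a non-empty row whose previous row is empty (or absent)
is_first = ~is_empty & is_empty.shift(1, fill_value=True)
# Each run start bumps the counter; trailing empty rows join the run before them
run_id = is_first.cumsum()
# Total of each run, broadcast to all of its rows (sum skips NaN)
totals = df['Data'].groupby(run_id).transform('sum')
# Keep the total only on the first row of each run
df['Sum'] = totals.where(is_first)
print(df)
```

For a single-row block the first and last row coincide, so such blocks get their value on that row either way.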