How to summarize values in DataFrame between defined values in column in Python Pandas?

Question

I have a DataFrame in Python Pandas like the one below:

Data types:

MONTH_NR - numeric
MONTH_NAME - object
VALUE - numeric

MONTH_NR	MONTH_NAME	VALUE
1	JANUARY	10
2	FEBRYARY	20
3	MARCH	15
4	APRIL	10
5	MAY	11
6	JUN	100
7	JULY	200
8	AUGUST	12
9	SEPTEMBER	20
10	OCTOBER	50
11	NOVEMBER	30
12	DECEMBER	50

I need to add 3 new rows containing:

sum of the "VALUE" column for the first 6 months (MONTH_NR 1 through 6)
sum of the "VALUE" column for the last 6 months (MONTH_NR 7 through 12)
sum of the "VALUE" column for all 12 months (MONTH_NR 1 through 12)

So as a result I need something like below:

    MONTH_NR       | MONTH_NAME  | VALUE
    ---------------|-------------|------
    1              | JANUARY     | 10
    2              | FEBRYARY    | 20
    3              | MARCH       | 15
    4              | APRIL       | 10
    5              | MAY         | 11
    6              | JUN         | 100
    SUM_AFTER_1_6  |             | 166
    7              | JULY        | 200
    8              | AUGUST      | 12
    9              | SEPTEMBER   | 20
    10             | OCTOBER     | 50
    11             | NOVEMBER    | 30
    12             | DECEMBER    | 50
    SUM_AFTER_7_12 |             | 362
    SUM_ALL        |             | 528

How can I do that in Python Pandas?

Answered. But why do you need such a heterogeneous dataframe? Is it only for viewing the data visually? – inquirer, Commented Dec 10, 2022 at 10:44

inquirer · Accepted Answer · 2022-12-09 15:14:26Z

Here a nested list aaa is created that holds, for each summary row, its name and the two index labels bounding the range to be summed through loc. Note that slices in loc include both endpoints: df.loc[0:5, 'VALUE'] takes rows 0 through 5, that is, six rows.

Then the list comprehension builds a second nested list, bbb, where each inner list holds the name, an empty string, and the sum. np.insert inserts these rows into the underlying array, the result is wrapped in a new DataFrame that replaces df, and the original column names are restored.

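import pandas as pd
import numpy as np

df = pd.read_csv('df1.csv', header=0)

# [row name, first index label, last index label]; loc slices include both ends
aaa = [['SUM_AFTER_1_6', 0, 5], ['SUM_AFTER_7_12', 6, 11], ['SUM_ALL', 0, 11]]
# One row per entry: name, empty MONTH_NAME, sum of VALUE over the range
bbb = [[name, '', df.loc[start:stop, 'VALUE'].sum()] for name, start, stop in aaa]

# Insert positions refer to the original array: before row 6, then twice at the end
df = pd.DataFrame(np.insert(df.values, [6, 12, 12], values=bbb, axis=0))

# np.insert works on a plain array, so restore the column names
df.rename(columns={0: 'MONTH_NR', 1: 'MONTH_NAME', 2: 'VALUE'}, inplace=True)

print(df)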

Output

          MONTH_NR MONTH_NAME VALUE
0                1    JANUARY    10
1                2   FEBRYARY    20
2                3      MARCH    15
3                4      APRIL    10
4                5        MAY    11
5                6        JUN   100
6    SUM_AFTER_1_6              166
7                7       JULY   200
8                8     AUGUST    12
9                9  SEPTEMBER    20
10              10    OCTOBER    50
11              11   NOVEMBER    30
12              12   DECEMBER    50
13  SUM_AFTER_7_12              362
14         SUM_ALL              528
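
The remark above about loc slices being inclusive can be checked with a small sketch. It assumes the frame has the default integer index 0..11 (as read_csv would produce) and rebuilds only the VALUE column inline instead of reading df1.csv; the variable names are illustrative.

```python
import pandas as pd

# Toy frame with the default RangeIndex 0..11, using the VALUE column from the question.
df = pd.DataFrame({"VALUE": [10, 20, 15, 10, 11, 100, 200, 12, 20, 50, 30, 50]})

# loc slices by label and includes both endpoints: labels 0..5 are six rows.
first_half_loc = df.loc[0:5, "VALUE"].sum()

# iloc slices by position and excludes the end: positions 0..5 need the slice 0:6.
first_half_iloc = df["VALUE"].iloc[0:6].sum()

second_half = df.loc[6:11, "VALUE"].sum()
total = df["VALUE"].sum()

print(first_half_loc, first_half_iloc, second_half, total)
```

With a non-default index (for example after filtering or sorting), the labels 0..5 may no longer be the first six rows, so the positional iloc form may be the safer choice there.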

bracko · Accepted Answer · 2022-12-06 21:34:31Z

Suggested approach:

add column order_num to df, by calculating month_nr * 10
calculate summary rows (into another df), with order_num = 65 for 1-6 sum, 125 for 7-12 sum and 130 for 1-12 sum
add calculated rows to original df
output df ordered by order_num
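
The steps above could be sketched roughly as follows. This is one possible reading of the suggestion, not code from the answer: the data is rebuilt inline from the question, the helper range_sum and the use of pd.concat are assumptions, and order_num is dropped at the end so the result has the original three columns.

```python
import pandas as pd

# Data from the question, built inline (month names kept as posted).
df = pd.DataFrame({
    "MONTH_NR": list(range(1, 13)),
    "MONTH_NAME": ["JANUARY", "FEBRYARY", "MARCH", "APRIL", "MAY", "JUN",
                   "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"],
    "VALUE": [10, 20, 15, 10, 11, 100, 200, 12, 20, 50, 30, 50],
})

# Step 1: sort key with gaps between months (10, 20, ..., 120).
ordered = df.assign(order_num=df["MONTH_NR"] * 10)

def range_sum(first, last):
    """Sum VALUE for MONTH_NR in [first, last], both ends included (hypothetical helper)."""
    mask = df["MONTH_NR"].between(first, last)
    return int(df.loc[mask, "VALUE"].sum())

# Step 2: summary rows whose order_num places them after month 6 and after month 12.
summary = pd.DataFrame({
    "MONTH_NR": ["SUM_AFTER_1_6", "SUM_AFTER_7_12", "SUM_ALL"],
    "MONTH_NAME": ["", "", ""],
    "VALUE": [range_sum(1, 6), range_sum(7, 12), range_sum(1, 12)],
    "order_num": [65, 125, 130],
})

# Steps 3 and 4: append the summary rows, sort by order_num, then drop the helper column.
result = (
    pd.concat([ordered, summary], ignore_index=True)
    .sort_values("order_num")
    .drop(columns="order_num")
    .reset_index(drop=True)
)

print(result)
```

Because the summary labels are strings, MONTH_NR becomes an object column after the concat, the same trade-off as in the np.insert answer; VALUE stays numeric.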


bracko, could you present example code please? :) – dingaro, Commented Dec 6, 2022 at 21:39
