0

I have DataFrame in Python Pandas like below:

Data types:

  • MONTH_NR - numeric

  • MONTH_NAME - object

  • VALUE - numeric

    MONTH_NR MONTH_NAME VALUE
    1 JANUARY 10
    2 FEBRYARY 20
    3 MARCH 15
    4 APRIL 10
    5 MAY 11
    6 JUN 100
    7 JULY 200
    8 AUGUST 12
    9 SEPTEMBER 20
    10 OCTOBER 50
    11 NOVEMBER 30
    12 DECEMBER 50

And I need to add 3 new ROWS where will be:

  1. sum of values from column "VALUE" after 6 month (from 1 till 6)
  2. sum of values from column "VALUE" after 12 month (from 7 till 12)
  3. sum of values from column "VALUE" after 12 month (from 1 till 12)

So as a result I need somethin like below:

    MONTH_NR | MONTH_NAME  | VALUE
    ---------|-------------|---------
    1        | JANUARY     |  10
    2        | FEBRYARY    |  20
    3        | MARCH       |  15
    4        | APRIL       |  10
    5        | MAY         |  11
    6        | JUN         |  100
SUM_AFTER_1_6|             |  166
    7        | JULY        |  200
    8        | AUGUST      |  12
    9        | SEPTEMBER   |  20
    10       | OCTOBER     |  50
    11       | NOVEMBER    |  30
    12       | DECEMBER    |  50
SUM_AFTER_7_12|             |  362
SUM_ALL      |             |  528

How can I do that in Python Pandas ?

1
  • answered. But why do you need such a heterogeneous dataframe, visually view the data?
    – inquirer
    Commented Dec 10, 2022 at 10:44

2 Answers 2

0

Here a nested list aaa is created in which the name and two indexes for the range in which the sums will be turn out through loc(note that slices in loc are taken inclusive. For example df.loc[0:5, 'VALUE'] would take rows 0 through 5, that is six rows.).

Further, in the List comprehension, a nested list is also created where the name, an empty string and the sum in each value inside the list. With np.insert, rows are inserted and the dataframe is overwritten and the original column names are set.

import pandas as pd
import numpy as np

df = pd.read_csv('df1.csv', header=0)

aaa = [['SUM_AFTER_1_6', 0, 5], ['SUM_AFTER_7_12', 6, 11], ['SUM_ALL', 0, 11]]
bbb = [[aaa[i][0], '', df.loc[aaa[i][1]:aaa[i][2], 'VALUE'].sum()] for i in range(len(aaa))]

df = pd.DataFrame(np.insert(df.values, [6, 12, 12], values=bbb, axis=0))

df.rename(columns={0: 'MONTH_NR', 1: 'MONTH_NAME', 2: 'VALUE'}, inplace=True)

print(df)

Output

          MONTH_NR MONTH_NAME VALUE
0                1    JANUARY    10
1                2   FEBRYARY    20
2                3      MARCH    15
3                4      APRIL    10
4                5        MAY    11
5                6        JUN   100
6    SUM_AFTER_1_6              166
7                7       JULY   200
8                8     AUGUST    12
9                9  SEPTEMBER    20
10              10    OCTOBER    50
11              11   NOVEMBER    30
12              12   DECEMBER    50
13  SUM_AFTER_7_12              362
14         SUM_ALL              528
0

Suggested approach:

  • add column order_num to df, my calculating month_nr * 10
  • calculate summary rows (into another df), with order_num = 65 for 1-6 sum, 125 for 7-12 sum and 130 for 1-12 sum
  • add calculated rows to original df
  • output df ordered by order_num
1
  • bracko could you present example code please ? :)
    – dingaro
    Commented Dec 6, 2022 at 21:39

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.