Python Pandas: Sorting Pivot Table column by another column

Question

I am trying to pivot some data with the pandas pivot_table function, but I need the columns returned in a specific, bespoke order, determined by a Setting_Order field that is already in the dataframe. Take this test example:


import pandas as pd

raw_data = {'Support_Reason' : ['LD', 'Mental Health', 'LD', 'Mental Health', 'LD', 'Physical', 'LD'],
            'Setting' : ['Nursing', 'Nursing', 'Residential', 'Residential', 'Community', 'Prison', 'Residential'],
            'Setting_Order' : [1, 1, 2, 2, 3, 4, 2],
            'Patient_ID' : [6789, 1234, 4567, 5678, 7890, 1235, 3456]}

Data = pd.DataFrame(raw_data, columns = ['Support_Reason', 'Setting', 'Setting_Order', 'Patient_ID'])

Data

Then pivot:

pivot = pd.pivot_table(Data, values='Patient_ID', index=['Support_Reason'],
                   columns=['Setting'], aggfunc='count',dropna = False)
pivot  = pivot.reset_index()

pivot

This is exactly how I want my table to look, except that the columns have defaulted to A-Z ordering. I would like them ordered ascending by the Setting_Order column, which gives Nursing, Residential, Community, then Prison. Is there some additional syntax I could add to my pd.pivot_table call that would make this possible?

I realise there are a few different work-arounds for this, the simplest being re-ordering the columns afterwards(!) but I want to avoid having to hard-code column names as these will change over time (both the headings and their order) and the Setting and Setting_Order fields will be managed in a separate reference table. So any form of answer that will avoid having to list Settings in code would be ideal really.

Quick remark: When you create the dataframe with Data = pd.DataFrame(raw_data, columns = ['Support_Reason', 'Setting', 'Setting_Order', 'Patient_ID']), you don't have to specify the column names, as they are already included in the dictionary raw_data. In this way, you can avoid hard-coding the column names at that place. — Flursch, Commented Apr 1, 2022 at 14:19
Thanks Flursch - this is just my lack of expertise showing. The real-world example is imported from a flat-file csv anyway — nnn1234, Commented Apr 1, 2022 at 14:26

not_speshal · Accepted Answer · 2022-04-01 14:36:04Z

2

Try:

ordered = Data.sort_values("Setting_Order")["Setting"].drop_duplicates().tolist()
pivot = pivot[list(pivot.columns.difference(ordered))+ordered]
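For anyone who wants to run this end to end, here is a minimal sketch that combines the question's sample data with the approach above. It assumes the dataframe is named Data, as in the question. The `leading` variable is only an illustrative name for whatever non-Setting columns the pivot contains after reset_index.

```python
import pandas as pd

# Sample data from the question
raw_data = {
    "Support_Reason": ["LD", "Mental Health", "LD", "Mental Health", "LD", "Physical", "LD"],
    "Setting": ["Nursing", "Nursing", "Residential", "Residential", "Community", "Prison", "Residential"],
    "Setting_Order": [1, 1, 2, 2, 3, 4, 2],
    "Patient_ID": [6789, 1234, 4567, 5678, 7890, 1235, 3456],
}
Data = pd.DataFrame(raw_data)

# Pivot exactly as in the question; columns come back in A-Z order
pivot = pd.pivot_table(Data, values="Patient_ID", index=["Support_Reason"],
                       columns=["Setting"], aggfunc="count", dropna=False)
pivot = pivot.reset_index()

# Derive the column order from the data instead of hard-coding names
ordered = Data.sort_values("Setting_Order")["Setting"].drop_duplicates().tolist()

# Keep any non-Setting columns (here only Support_Reason) at the front
leading = [c for c in pivot.columns if c not in ordered]
pivot = pivot[leading + ordered]

print(list(pivot.columns))
# ['Support_Reason', 'Nursing', 'Residential', 'Community', 'Prison']
```

Because the order is read from Setting_Order, new or renamed settings should be picked up without any code changes, provided each Setting maps to a single Setting_Order value.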

edited Apr 1, 2022 at 14:36

answered Apr 1, 2022 at 14:26


Thanks @not_speshal - this works neatly like Flursch's example below, but also means the Support_Reason field is missing. It's vital that this field remains so the matrix type format makes sense – nnn1234, Commented Apr 1, 2022 at 14:34
@nnn1234 - See the edited answer. – not_speshal, Commented Apr 1, 2022 at 14:36


Flursch · Accepted Answer · 2022-04-01 14:49:03Z

1

col_order = list(Data.sort_values('Setting_Order')['Setting'].unique())
pivot[col_order+['Support_Reason']]

Does this help?
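A different route, not proposed in either answer, is to make Setting an ordered categorical before pivoting, so that pivot_table itself should return the columns in the desired order. This is only a sketch under assumptions: `setting_ref` is a hypothetical stand-in for the separate reference table mentioned in the question, built here from the sample data. The column labels are converted back to a plain list before reset_index, since inserting a new column into a categorical column index can fail in some pandas versions.

```python
import pandas as pd

Data = pd.DataFrame({
    "Support_Reason": ["LD", "Mental Health", "LD", "Mental Health", "LD", "Physical", "LD"],
    "Setting": ["Nursing", "Nursing", "Residential", "Residential", "Community", "Prison", "Residential"],
    "Setting_Order": [1, 1, 2, 2, 3, 4, 2],
    "Patient_ID": [6789, 1234, 4567, 5678, 7890, 1235, 3456],
})

# Hypothetical reference table: one row per Setting, sorted by its order
setting_ref = (Data[["Setting", "Setting_Order"]]
               .drop_duplicates()
               .sort_values("Setting_Order"))

# Ordered categorical: the category order follows Setting_Order
Data["Setting"] = pd.Categorical(Data["Setting"],
                                 categories=setting_ref["Setting"].tolist(),
                                 ordered=True)

pivot = pd.pivot_table(Data, values="Patient_ID", index="Support_Reason",
                       columns="Setting", aggfunc="count", observed=False)

# Plain labels make reset_index safe across pandas versions
pivot.columns = pivot.columns.tolist()
pivot = pivot.reset_index()

print(list(pivot.columns))
```

The trade-off is that the Setting column in Data changes dtype, which may matter to later code that expects plain strings.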

edited Apr 1, 2022 at 14:49

answered Apr 1, 2022 at 14:13


This certainly provides the correct column order in the Pivot dataframe thanks @Flursch, although it's lost the Support_Reason field which is also crucial – nnn1234, Commented Apr 1, 2022 at 14:29
@nnn1234 I have edited the code in my answer to also include the Support_Reason column. – Flursch, Commented Apr 1, 2022 at 14:51

