I am VERY new to the world of Python/pandas/matplotlib, but I have been using it recently to create box and whisker plots. I was curious how to create a box and whisker plot for each sheet using a specific column of data, i.e. I have 17 sheets, and I have columns called HMB and DV on each sheet. I want to plot 17 data sets on one box and whisker plot for HMB and another 17 data sets on the DV plot. Below is what I have so far.
I can open the file, and get all the sheets into list_dfs, but then don't know where to go from there. I was going to try and manually slice each set (as I started below before coming here for help), but when I have more data in the future, I don't want to have to do that by hand. Any help would be greatly appreciated!
import pandas as pd
import numpy as np
import xlrd
import matplotlib.pyplot as plt
%matplotlib inline
from pandas import ExcelWriter
from pandas import ExcelFile
from pandas import DataFrame
excel_file = 'Project File Merger.xlsm'
list_dfs = []
xls = xlrd.open_workbook(excel_file,on_demand=True)
for sheet_name in xls.sheet_names():
    df = pd.read_excel(excel_file,sheet_name)
    list_dfs.append(df)
d_psppm = {}
for i, sheet_name in enumerate(xls.sheet_names()):
    df = pd.read_excel(excel_file,sheet_name)
    d_psppm["PSPPM" + str(i)] = df.loc[:,['PSPPM']]
values_list = list(d_psppm.values())
print(values_list[:])
A sample of the output looks like this. There are 17 list entries, each with a different number of rows.
PSPPM
0 0.246769
1 0.599589
2 0.082420
3 0.250000
4 0.205140
5 0.850000,
PSPPM
0 0.500887
1 0.475255
2 0.472711
3 0.412953
4 0.415883
5 0.703716,...
The next thing I want to do is create a box and whisker plot: 1 plot with 17 boxes. I am not sure how to plot the dictionary so that each entry's values become one box and its key becomes the label. I have tried to dig, and figure out how to convert the dictionary to a list and then plot each element in the list, but have had no luck.
Thanks for the help!
fig, ax = plt.subplots() and then iterate with multiple ax.boxplot() calls for each box. Personally, I would avoid boxplots (definitely make them notched if you decide to use them), and overlaying the data as a jittered scatter is almost always better.
dict into a DataFrame?) then pd.melt to create long-form data and then plot sns.boxplot(x="variable", y="value", data=df)
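To make the two suggestions above concrete, here is a minimal sketch of both. It assumes a dictionary shaped like d_psppm from the question (one single-column DataFrame per sheet, with differing row counts). The random data is only a stand-in for the Excel sheets, and the names labels, data, wide and long_df are made up for illustration. It uses one ax.boxplot() call on a list of arrays rather than one call per box, which is a small variation on the comment.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the per-sheet frames built in the question (assumption:
# each value is a DataFrame with a single 'PSPPM' column of varying length).
rng = np.random.default_rng(0)
d_psppm = {
    "PSPPM" + str(i): pd.DataFrame({"PSPPM": rng.random(5 + i)})
    for i in range(3)
}

# Option 1 (matplotlib): pass a list of 1-D arrays, one per box.
# The arrays may have different lengths.
labels = list(d_psppm.keys())
data = [df["PSPPM"].dropna().to_numpy() for df in d_psppm.values()]

fig, ax = plt.subplots()
result = ax.boxplot(data)
ax.set_xticks(range(1, len(labels) + 1))  # boxes sit at positions 1..N
ax.set_xticklabels(labels, rotation=90)
ax.set_ylabel("PSPPM")

# Option 2 (long form): build a wide frame from the dict, then melt it.
# Shorter columns are padded with NaN on alignment, so drop those rows.
wide = pd.DataFrame({name: df["PSPPM"] for name, df in d_psppm.items()})
long_df = pd.melt(wide).dropna()  # columns: 'variable', 'value'
```

In a notebook you would drop the matplotlib.use("Agg") line and let the figure render inline. If seaborn is installed, long_df should be usable as the data argument in the sns.boxplot(x="variable", y="value", data=long_df) call mentioned above.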