Active questions tagged python-2.7+matplotlib+dataframe - Stack Overflow

How do I categorise and plot data from a Pandas dataframe using Matplotlib?

2019-03-23T16:17:51Z

I have a DataFrame of Tweet values and want to plot a graph of 'Favourites' against 'Date' and categorise/colour-code the data by 'User'.

I am able to get a scatter or bar plot of the data but cannot get a working solution to categorise based on the 'User'. The 'Date' also comes out as messy in the graph and I am unable to understand the cause of this problem.

I have tried following this tutorial to get a line graph, but I don't understand how to apply it to my DataFrame.

DataFrame Structure

data_frame = pandas.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])

data_frame['User'] = numpy.array([tweet.user.screen_name for tweet in tweets])
data_frame['ID'] = numpy.array([tweet.id for tweet in tweets])
data_frame['Length'] = numpy.array([len(tweet.text) for tweet in tweets])
data_frame['Date'] = numpy.array([tweet.created_at for tweet in tweets])
data_frame['Source'] = numpy.array([tweet.source for tweet in tweets])
data_frame['Favourites'] = numpy.array([tweet.favorite_count for tweet in tweets])
data_frame['Retweets'] = numpy.array([tweet.retweet_count for tweet in tweets])

return data_frame

Plotting

x = result.Date
y = result.Favourites

plt.xlabel("Date", fontsize=10)
plt.ylabel("Favourites", fontsize=10)


plt.figure(figsize=(30,30))

fig, ax = plt.subplots()


plt.scatter(x,y)

plt.savefig('plot.png')

I want the graph to show Favourites against time as a line graph, with the different Users colour-coded, something like the example below:

My current output is this:

Sample Data

Raw paste
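A minimal sketch of one way to colour-code by 'User', assuming 'Date' holds datetime values. The sample rows are invented for illustration and are not the asker's data. The figure and axes are created once, before anything is drawn or labelled, and one line is drawn per user:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the tweet DataFrame (values are made up).
result = pd.DataFrame({
    "User": ["alice", "bob", "alice", "bob", "alice"],
    "Date": pd.to_datetime(["2019-03-01", "2019-03-02", "2019-03-05",
                            "2019-03-07", "2019-03-09"]),
    "Favourites": [3, 10, 7, 2, 12],
})

fig, ax = plt.subplots(figsize=(10, 6))

# One line per user; matplotlib picks a new colour for each plot call.
for user, group in result.sort_values("Date").groupby("User"):
    ax.plot(group["Date"], group["Favourites"], marker="o", label=user)

ax.set_xlabel("Date", fontsize=10)
ax.set_ylabel("Favourites", fontsize=10)
ax.legend(title="User")
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
# fig.savefig("plot.png")  # uncomment to write the image to disk
```

If 'Date' is stored as strings, converting it with pd.to_datetime first may also help with the messy axis.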

Plotting multiple dates on year in scatterplot Python

2018-11-26T16:41:03Z

I have data with a Date column that I would like to split automatically by year in a scatterplot. It looks like this:

Date            Level        Price
2008-01-01      56           11
2008-01-03      10           12
2008-01-05      52           13
2008-02-01      66           14
2008-05-01      20           10
..
2009-01-01      12           11
2009-02-01      70           11
2009-02-05      56           12
..
2018-01-01      56           10
2018-01-11      10           17
..

The only way I know how to tackle this is to manually select the rows using iloc, eyeballing the dates in the dataframe, like this:

fig = plt.figure(figsize=(15,10))
ax1 = fig.add_subplot(111)

ax1.scatter(df.iloc[____, 1], df.iloc[_____, 2], s=10, c='r', marker="o", label='2008')
ax1.scatter(df.iloc[____, 1],df.iloc[____, 2], s=10, c='mediumblue', marker="o", label='2009')

.
.
. (for each year I want)

plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()

But this takes a lot of time.

I'd like to automatically loop through each Date's year and plot Level (Y) against Price (X), coloured by year, with a legend label for each year.

What would be a good strategy to do this?
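One possible strategy, sketched under the assumption that Date is (or can be converted to) a datetime column: group the rows by the year of Date and issue one scatter call per group. The rows below are a subset of the sample in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-01-01", "2008-01-03", "2009-01-01",
                            "2009-02-01", "2018-01-01", "2018-01-11"]),
    "Level": [56, 10, 12, 70, 56, 10],
    "Price": [11, 12, 11, 11, 10, 17],
})

fig, ax = plt.subplots(figsize=(15, 10))

# groupby on the year yields one sub-frame per year; each scatter call
# takes the next colour from the default colour cycle.
for year, group in df.groupby(df["Date"].dt.year):
    ax.scatter(group["Price"], group["Level"], s=10, marker="o",
               label=str(year))

ax.set_xlabel("Price", fontsize=14)
ax.set_ylabel("Level", fontsize=14)
ax.legend(loc="upper left", prop={"size": 12})
```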

Function does not finish executing in `hist` function only on second time

2018-10-02T06:48:47Z

I'm trying to generate a histogram from a pandas DataFrame. It is generated the first time the function is called. However, when the create_histogram function is called a second time, it gets stuck at h = df.hist(bins=3, column="amount"). By "stuck" I mean that the statement never finishes executing and execution does not continue to the next line, yet no error is raised and the program does not break out. What exactly is the problem here, and how can I fix it?

import matplotlib.pyplot as plt
...
...
    def create_histogram(self, field):
        df = self.main_df    # This is DataFrame
        h = df.hist(bins=20, column="amount")
        fileContent = StringIO()
        plt.savefig(fileContent, dpi=None, facecolor='w', edgecolor='w',
                    orientation='portrait', papertype=None, format="png",
                    transparent=False, bbox_inches=None, pad_inches=0.5,
                    frameon=None)
        content = fileContent.getvalue()
        return content
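The cause of the hang is not clear from the snippet. The sketch below is a variant that may be worth trying, on the guess that the problem relates to an interactive backend or to figures piling up in pyplot's global state. It selects the non-interactive Agg backend, draws on an explicitly created figure, writes the PNG into a bytes buffer and closes the figure afterwards. The standalone function signature and the sample data are assumptions made for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no GUI event loop involved
import matplotlib.pyplot as plt
import pandas as pd


def create_histogram(df, column="amount", bins=20):
    """Return a PNG image (as bytes) of a histogram of df[column]."""
    fig, ax = plt.subplots()
    df[column].plot.hist(bins=bins, ax=ax)
    buf = io.BytesIO()  # PNG data is binary, so use a bytes buffer
    fig.savefig(buf, format="png", facecolor="w")
    plt.close(fig)  # release the figure so repeated calls do not accumulate
    return buf.getvalue()


# Hypothetical data; the real DataFrame comes from self.main_df.
main_df = pd.DataFrame({"amount": [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]})
first = create_histogram(main_df)
second = create_histogram(main_df)  # a second call should also return
```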

Dataframe to plot different charts from groups in columns

2018-02-11T22:48:12Z

I have a dataframe that has multiple columns, containing different categories (['A'], ['1','2','3','4'])

Index1 Index2    X    Y
A      '1'       1    2
A      '1'       5    3
A      '1'       3    4
A      '2'       3    1
A      '2'       4    1
A      '2'       3    5 
A      '2'       1    2
A      '3'       5    3
A      '3'       3    4
A      '4'       3    1
A      '4'       4    1
A      '4'       3    5

I need to loop over it so that it gives me four different scatter charts, one for each pair of indexes (in the future there will be a B index, which is the reason for the MultiIndex).

My code at the moment gives me one chart for every row (12 of them in this example); if I break at the end of the loop, it gives me only one.

I tried .iterrows() and .itertuples(); both gave me the same result (maybe I have been using them wrong too).

import pandas as pd
from matplotlib import pyplot as plt

Index1 = ['A','A','A','A','A','A','A','A','A','A','A','A']
Index2 = ['1','1','1','2','2','2','2','3','3','4','4','4']
X = [1,5,3,3,4,3,1,5,3,3,4,3]
Y = [2,3,4,1,1,5,2,3,4,1,1,5]
df = pd.DataFrame(Index1)
df = df.assign(Index2 = Index2,X=X,Y=Y)
df.set_index(['Index1','Index2'])

second_index = 1     
for index in df.itertuples():
    df = df.groupby('Index2').get_group(second_index)
    df.plot.scatter(x = 'X', y = 'Y')
    plt.show()
    break

I have similar code running on a dictionary that follows the same logic, and it gives me all the charts that I need.

P.S.: That's not the real code, just the general idea, and I might have made some mistakes.
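A sketch of one way to get one chart per (Index1, Index2) pair: iterate over the groups instead of the rows. It assumes the two index values are kept as ordinary columns; the `charts` dictionary is only a convenience introduced here to hold the resulting axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Index1": ["A"] * 12,
    "Index2": ["1", "1", "1", "2", "2", "2", "2", "3", "3", "4", "4", "4"],
    "X": [1, 5, 3, 3, 4, 3, 1, 5, 3, 3, 4, 3],
    "Y": [2, 3, 4, 1, 1, 5, 2, 3, 4, 1, 1, 5],
})

charts = {}
# groupby yields each distinct (Index1, Index2) pair once, together with
# all rows belonging to it, so the loop body runs four times here.
for (index1, index2), group in df.groupby(["Index1", "Index2"]):
    ax = group.plot.scatter(x="X", y="Y",
                            title="Index1=%s, Index2=%s" % (index1, index2))
    charts[(index1, index2)] = ax

# plt.show()  # in an interactive session this would display all four
```

Note that the Index2 values in this sample are strings, so a lookup such as get_group(1) with an integer key would not match them.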

Match 2 Graphs in an only image matplotlib

2018-01-11T02:23:09Z

I'm trying to create some graphs from a dataframe I imported. The problem is that I can only create a single image containing both graphs. I have this output:

And I'm looking for this output:

Here is the code:

from pandas_datareader import data
import pandas as pd
import datetime
import matplotlib.pyplot as plt
df = pd.read_csv('csv.csv', index_col = 'Totales', parse_dates=True)
df.head()
df['Subastas'].plot()
plt.title('Subastadas')
plt.xlabel('Fechas')
plt.ylabel('Cant de Subastadas')
plt.subplot()

df['Impresiones_exchange'].plot()
plt.title('Impresiones_exchange')
plt.xlabel('Fechas')
plt.ylabel('Cant de Impresiones_exchange')
plt.subplot()
plt.show()

CSV data:

Totales,Subastas,Impresiones_exchange,Importe_a_pagar_a_medio,Fill_rate,ECPM_medio
Total_07/01/2017,1596260396,30453841,19742.04,3.024863813,0.733696498
Total_07/12/2017,1336604546,57558106,43474.29,9.368463445,0.656716233
Total_07/01/2018,1285872189,33518075,20614.4,4.872889166,0.678244085

Also, I would like to save the output in an xlsx file too!
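A sketch of one way to put each series in its own panel of a single figure, using plt.subplots to create both axes up front. It assumes the dates in the Totales column are day/month/year (the sample is ambiguous) and strips the "Total_" prefix before parsing. Only two of the CSV columns are reproduced here.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

csv_text = u"""Totales,Subastas,Impresiones_exchange
Total_07/01/2017,1596260396,30453841
Total_07/12/2017,1336604546,57558106
Total_07/01/2018,1285872189,33518075
"""
df = pd.read_csv(io.StringIO(csv_text), index_col="Totales")

# Drop the "Total_" prefix (6 characters) and parse the rest as a date.
# ASSUMPTION: the format is day/month/year.
df.index = pd.to_datetime(df.index.str[6:], format="%d/%m/%Y")

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))

df["Subastas"].plot(ax=ax1, title="Subastadas")
ax1.set_ylabel("Cant de Subastadas")

df["Impresiones_exchange"].plot(ax=ax2, title="Impresiones_exchange")
ax2.set_ylabel("Cant de Impresiones_exchange")
ax2.set_xlabel("Fechas")

fig.tight_layout()
```

The table itself could then be written with df.to_excel('salida.xlsx') (a hypothetical file name), which requires an Excel writer engine such as openpyxl to be installed.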

pandas: count the occurrence of month of years

2017-05-18T09:08:07Z

I have a dataframe (df_m) with a large number of rows, shown below. I want to plot the number of occurrences of each month in the date_m column for each year. The years in date_m range from 2010 to 2017.

   db   num       date_a      date_m      date_c      zip_b    zip_a
0  old  HKK10032  2010-07-14  2010-07-26  NaT         NaN      NaN
1  old  HKK10109  2011-07-14  2011-09-15  NaT         NaN      NaN
2  old  HNN10167  2012-07-15  2012-08-09  NaT         177-003  NaN
3  old  HKK10190  2013-07-15  2013-09-02  NaT         NaN      NaN
4  old  HKK10251  2014-07-16  2014-05-02  NaT         NaN      NaN
5  old  HKK10253  2015-07-16  2015-05-01  NaT         NaN      NaN
6  old  HNN10275  2017-07-16  2017-07-18  2010-07-18  1070062  NaN
7  old  HKK10282  2017-07-16  2017-08-16  NaT         NaN      NaN
............................................................

First, I extract the number of occurrences of each month (1-12) for every year (2010-2017), but there is an error in my code:

lst_all = []
for i in range(2010, 2018):
    lst_num = [sum(df_m.date_move.dt.month == j & df_m.date_move.dt.year == i) for j in range(1, 13)]
    lst_all.append(lst_num)
print lst_all
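One thing that may be worth checking: in Python, & binds more tightly than ==, so a combined mask needs parentheses around each comparison, as in (month == j) & (year == i). A sketch that sidesteps the nested loops by counting with groupby is shown below. It assumes the column is named date_m, as in the printed table (the snippet uses date_move), and only that column is reproduced.

```python
import pandas as pd

df_m = pd.DataFrame({
    "date_m": pd.to_datetime([
        "2010-07-26", "2011-09-15", "2012-08-09", "2013-09-02",
        "2014-05-02", "2015-05-01", "2017-07-18", "2017-08-16",
    ]),
})

year = df_m["date_m"].dt.year.rename("year")
month = df_m["date_m"].dt.month.rename("month")

# Count rows per (year, month), then spread the months into columns.
counts = df_m.groupby([year, month]).size().unstack(fill_value=0)

# Make sure every year 2010-2017 and every month 1-12 is present.
counts = counts.reindex(index=range(2010, 2018), columns=range(1, 13),
                        fill_value=0)

lst_all = counts.values.tolist()  # same shape as the nested list above
```

Each row of `counts` is one year and each column one month, so counts.T.plot() would draw one line per year.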

Python: plot different kinds of colors [duplicate]

2017-05-18T10:01:29Z

I want to plot 8 lines in one figure, each with a different color and label. However, there are two problems: 1) some lines come out in almost the same color, and I want a distinct color for each line; 2) the legend is in the middle, and I want to put it in the upper right and make it smaller.

My code is as follows:

    import matplotlib.cm as cm

    colors = iter(cm.rainbow(np.linspace(0, 1, len(lst_year))))
    for i in range(len(lst_year)):
         plt.plot(lst_month, lst_all[i], label='201{i} year'.format(i=i), color=next(colors))
    plt.legend(loc='best')

and my figure is as follows:
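A sketch of one option: take the colors from a qualitative colormap such as tab10, whose entries are designed to be easy to tell apart, and pass loc and fontsize to legend. The contents of lst_year, lst_month and lst_all below are placeholders, since the real values are not shown.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data standing in for the asker's lists.
lst_year = list(range(2010, 2018))
lst_month = list(range(1, 13))
rng = np.random.RandomState(0)
lst_all = rng.randint(0, 50, size=(len(lst_year), len(lst_month)))

# tab10 provides ten visually distinct colors; eight are used here.
colors = plt.get_cmap("tab10").colors

fig, ax = plt.subplots()
for year, counts, color in zip(lst_year, lst_all, colors):
    ax.plot(lst_month, counts, label="%d year" % year, color=color)

# Fixed position and a smaller font for the legend.
ax.legend(loc="upper right", fontsize="small")
```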

ValueError: invalid literal for float(): when adding annotation in pandas

2017-05-14T12:58:56Z

I get this error when I try to add an annotation to my plot - ValueError: invalid literal for float(): 10_May.

my dataframe:

my code (I use to_datetime and strftime before plotting, as I needed to sort dates that were stored as strings):

# dealing with dates as strings
grouped.index = pd.to_datetime(grouped.index, format='%d_%b')
grouped = grouped.sort_index()
grouped.index = grouped.index.strftime('%d_%b')
plt.annotate('Peak',
             (grouped.index[9], grouped['L'][9]),
             xytext=(15, 15), 
             textcoords='offset points',
             arrowprops=dict(arrowstyle='-|>'))
grouped.plot()

grouped.index[9] returns u'10_May' while grouped['L'][9] returns 10.0. I know that pandas expects the index to be a float, but I thought I could access it with df.index[]. I would appreciate your suggestions.
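A sketch of one possible workaround, based on the assumption that pandas draws a line plot with a string index against integer positions 0, 1, 2, ... and only uses the strings as tick labels. Under that assumption the annotation can be placed at the integer position rather than at the label, and it is added to the axes returned by plot(), after plotting. The data values are invented, apart from position 9 being '10_May' with value 10.0 as in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frame: string labels already sorted chronologically.
labels = ["{:02d}_May".format(day) for day in range(1, 13)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 4.0, 2.0]
grouped = pd.DataFrame({"L": values}, index=labels)

ax = grouped.plot()  # plot first, so the annotation lands on these axes

peak_pos = 9  # integer position of '10_May' in the index
ann = ax.annotate("Peak",
                  xy=(peak_pos, grouped["L"].iloc[peak_pos]),
                  xytext=(15, 15),
                  textcoords="offset points",
                  arrowprops=dict(arrowstyle="-|>"))
```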

Stacking scatter_matrix and matshow

2017-05-11T12:04:51Z

I was using the iris data from scikit-learn to obtain the following data frame:

df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= iris['feature_names'] + ['target'])

Plotting the scatter_matrix and using matshow to plot the correlation matrix give me the graphs scatter_matrix plot and matshow(df.corr()), respectively.

My question is the following: is there a way to stack these graphs? In other words, can I plot the scatter_matrix over the matshow(df.corr())?

Thanks in advance.
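One possible reading of "stacking", sketched below, is to keep the scatter_matrix grid and color the background of each panel by the corresponding correlation value, which approximates the matshow image underneath. This is an interpretation, not an established recipe. Random data is used instead of iris so that the sketch does not depend on scikit-learn, and the column names are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import Normalize

# Placeholder data: three independent columns plus one correlated with "a".
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(50, 3), columns=["a", "b", "c"])
df["d"] = 2 * df["a"] + 0.1 * rng.rand(50)

corr = df.corr()
axes = pd.plotting.scatter_matrix(df, figsize=(6, 6))

cmap = plt.get_cmap("coolwarm")
norm = Normalize(vmin=-1, vmax=1)  # correlations lie in [-1, 1]

# Panel (i, j) shows column j against column i, so it takes corr[i, j].
for i, row_name in enumerate(corr.index):
    for j, col_name in enumerate(corr.columns):
        axes[i, j].set_facecolor(cmap(norm(corr.loc[row_name, col_name])))
```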

Python - Create 2 line graph from pandas DataFrame

2017-03-26T16:08:48Z

I am trying to create a line graph from my Pandas Dataframe. The Pandas Dataframe I have looks as follows:

    date    Interrupts_Person   Interrupts_Mean
0   20122013-100-3  0   11.727273
1   20122013-100-6  1   5.428571
2   20122013-17-6   6   8.900000
3   20122013-17-9   0   4.062500
4   20122013-21-4   4   5.637931
5   20122013-22-8   0   5.637931
6   20122013-3-8    0   4.846154
7   20122013-32-6   0   2.727273
8   20122013-32-6   0   2.727273
9   20122013-48-23  0   4.875000
10  20122013-48-23  0   4.875000

It has 51 rows in total, but I copied only the first few to keep things readable. I know how to make a simple line graph from a pandas dataframe, but now I want to do the following:

I want a line graph with the date on the X-axis and 2 lines in my graph, one for the column 'Interrupts_Person' and one for the column 'Interrupts_Mean'. If someone is familiar with how to make a line graph like this, I would be really thankful for some help so that I can continue my progress!
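A minimal sketch, assuming the date column is kept as plain strings (the values look like identifiers rather than parseable dates): DataFrame.plot accepts a list of column names for y and draws one line per column. Only the first four rows of the sample are reproduced.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "date": ["20122013-100-3", "20122013-100-6",
             "20122013-17-6", "20122013-17-9"],
    "Interrupts_Person": [0, 1, 6, 0],
    "Interrupts_Mean": [11.727273, 5.428571, 8.900000, 4.062500],
})

# x names the column for the horizontal axis; y lists the lines to draw.
ax = df.plot(x="date", y=["Interrupts_Person", "Interrupts_Mean"],
             figsize=(10, 5))
ax.set_ylabel("Interrupts")
```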

matplotlib: Plot multiple small figures in one big plot

2016-09-21T15:38:18Z

I have a pandas dataframe pandas_df with 6 input columns, column_1, column_2, ..., column_6, and one result column, result. I used the following code to draw a scatter plot for every pair of input columns (so 6*5/2 = 15 figures in total). I ran the code 15 times, and each run generated one big figure.

I am wondering whether there is a way to iterate over all possible column pairs and plot all 15 figures as small figures in one big plot. Thanks!

%matplotlib notebook
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')

pandas_df.plot(x='column_1', y='column_2', kind = 'scatter', c = 'result')
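A sketch of one way to do this: itertools.combinations enumerates the 15 column pairs, and plt.subplots creates a 3x5 grid of small axes to draw them on. The random data stands in for the real pandas_df, and the 3x5 layout is an arbitrary choice.

```python
from itertools import combinations

import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data with the column names described in the question.
columns = ["column_%d" % i for i in range(1, 7)]
rng = np.random.RandomState(0)
pandas_df = pd.DataFrame(rng.rand(30, 6), columns=columns)
pandas_df["result"] = rng.rand(30)

pairs = list(combinations(columns, 2))  # 6 choose 2 = 15 pairs

fig, axes = plt.subplots(3, 5, figsize=(20, 12))
for ax, (x_col, y_col) in zip(axes.ravel(), pairs):
    # Color each point by the result column, as in the original call.
    ax.scatter(pandas_df[x_col], pandas_df[y_col],
               c=pandas_df["result"], s=10)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)

fig.tight_layout()
```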

Pandas plot without specifying index

2016-10-08T21:39:29Z

Given the data:

Column1; Column2; Column3
1; 4; 6
2; 2; 6
3; 3; 8
4; 1; 1
5; 4; 2

I can plot it via:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
titles = list(df)
for title in titles:
    if title == titles[0]:
        continue
    df.plot(titles[0],title, linestyle='--', marker='o')
    plt.savefig(title+'.png')

But what if, instead, the data was missing Column1, like this:

Column2; Column3
4; 6
2; 6
3; 8
1; 1
4; 2

How do I plot it?

Maybe something like df.plot(title, linestyle='--', marker='o')?
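A sketch of one way this could work: when only y is given, DataFrame.plot uses the DataFrame's index for the horizontal axis, which for a freshly read CSV is 0, 1, 2, and so on. The frame is built inline here instead of being read from test0.csv, and the `axes_by_title` dictionary is a helper introduced for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Column2": [4, 2, 3, 1, 4],
    "Column3": [6, 6, 8, 1, 2],
})

axes_by_title = {}
for title in df.columns:
    # No x given: the index (0..4) is used as the horizontal axis.
    ax = df.plot(y=title, linestyle="--", marker="o")
    axes_by_title[title] = ax
    # ax.figure.savefig(title + ".png")  # uncomment to write the images
```

Note that these x values start at 0, whereas Column1 in the first file started at 1.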

Pandas plot dataframe by index, how it works?

2016-10-08T20:21:44Z

Given the data:

Column1; Column2; Column3
1; 4; 6
2; 2; 6
3; 3; 8
4; 1; 1
5; 4; 2

With the following code I get the following graphic:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
df.plot(0,0)
plt.savefig('fig0.png')

And, with the following code I get the following graphic:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
df.plot(0,1)
plt.savefig('fig1.png')

What's the logic in df.plot(m, n)? Let's say I want to plot Column2 against Column3: what are m and n? Is it df.plot(2, 3)?
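As far as I can tell, the first two positional arguments of df.plot are x and y, and integers are treated as zero-based column positions when the column labels are not themselves integers. On that reading, df.plot(1, 2) would correspond to Column2 against Column3. The sketch below avoids relying on that behavior by looking the names up explicitly from df.columns; the frame is built inline rather than read from test0.csv.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5],
    "Column2": [4, 2, 3, 1, 4],
    "Column3": [6, 6, 8, 1, 2],
})

# Positions are zero-based: 0 -> Column1, 1 -> Column2, 2 -> Column3.
x_name = df.columns[1]
y_name = df.columns[2]

ax = df.plot(x=x_name, y=y_name)
```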

How to plot a Python Dataframe with category values like this picture?

2016-07-19T09:04:00Z

How can I achieve that using matplotlib?

subplots only plotting 1 plot using pandas

2016-07-13T14:20:28Z

I am trying to get two plots on one figure using matplotlib's subplots() command. I want the two plots to share an x-axis and have one legend for the whole plot. The code I have right now is:

observline = mlines.Line2D([], [], color=(1,0.502,0),\
markersize=15, label='Observed',linewidth=2)
wrfline=mlines.Line2D([], [], color='black',\
markersize=15, label='WRF',linewidth=2)
fig,axes=plt.subplots(2,1,sharex='col',figsize=(18,10))
df08.plot(ax=axes[0],linewidth=2, color=(1,0.502,0))\
.legend(handles=[observline,wrfline],loc='lower center', bbox_to_anchor=(0.9315, 0.9598),prop={'size':16})
axes[0].set_title('WRF Model Comparison Near %.2f,%.2f' %(lat,lon),fontsize=24)
axes[0].set_ylim(0,360)
axes[0].set_yticks(np.arange(0,361,60))
df18.plot(ax=axes[1],linewidth=2, color='black').legend_.remove()
plt.subplots_adjust(hspace=0)
axes[1].set_ylim(0,360)
axes[1].set_yticks(np.arange(0,361,60))
plt.ylabel('Wind Direction [Degrees]',fontsize=18,color='black')
axes[1].yaxis.set_label_coords(-0.05, 1)
plt.xlabel('Time',fontsize=18,color='black')
#plt.savefig(df8graphfile, dpi = 72)
plt.show()

This produces four figures, each with two subplots. The top subplot is always empty. The bottom subplot is filled in three of them with my second dataframe. The index of each dataframe is a DatetimeIndex in the format YYYY-mm-DD HH:MM:SS. The data are values from 0 to 360, varying nearly randomly across the whole time series, which covers two months.

Here is an example of each figure produced:
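The snippet does not show why four figures appear, so the sketch below only illustrates the intended layout: two single-column frames drawn on two stacked axes that share an x-axis, with one figure-level legend built from proxy lines. The data is random, the column name "wdir" is a placeholder, and the hex color "#ff8000" approximates the RGB tuple (1, 0.502, 0).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: two days of hourly wind directions per frame.
index = pd.date_range("2016-01-01", periods=48, freq="60min")
rng = np.random.RandomState(0)
df08 = pd.DataFrame({"wdir": rng.uniform(0, 360, len(index))}, index=index)
df18 = pd.DataFrame({"wdir": rng.uniform(0, 360, len(index))}, index=index)

fig, axes = plt.subplots(2, 1, sharex="col", figsize=(18, 10))

# Each frame is drawn on the axes passed via ax=; per-axes legends are off.
df08.plot(ax=axes[0], linewidth=2, color="#ff8000", legend=False)
df18.plot(ax=axes[1], linewidth=2, color="black", legend=False)

for ax in axes:
    ax.set_ylim(0, 360)
    ax.set_yticks(np.arange(0, 361, 60))

# One legend for the whole figure, built from proxy lines.
observline = mlines.Line2D([], [], color="#ff8000", linewidth=2,
                           label="Observed")
wrfline = mlines.Line2D([], [], color="black", linewidth=2, label="WRF")
fig.legend(handles=[observline, wrfline], loc="upper right",
           prop={"size": 16})

axes[1].set_xlabel("Time", fontsize=18)
fig.subplots_adjust(hspace=0)
```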

how to draw a multiline chart using python pandas?

2016-04-21T06:17:59Z

Dataframe:

Dept,Date,Que
ece,2015-06-25,96
ece,2015-06-24,89
ece,2015-06-26,88
ece,2015-06-19,87
ece,2015-06-23,82
ece,2015-06-30,82
eee,2015-06-24,73
eee,2015-06-23,71
eee,2015-06-25,70
eee,2015-06-19,66
eee,2015-06-27,60
eee,2015-06-22,56
mech,2015-06-27,10
mech,2015-06-22,8
mech,2015-06-25,8
mech,2015-06-19,7

I need a multiline chart with a grid, based on the Dept column, with each Dept drawn as its own line. For example, for ece the line should go through 96, 89, 88, 87, 82, 82, and likewise for the other Depts.
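A sketch of one way to draw one line per Dept: group the rows by Dept and plot each group against Date. It sorts by Date first, on the assumption that the lines should run chronologically, so the values appear in date order rather than in the file's order.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

csv_text = u"""Dept,Date,Que
ece,2015-06-25,96
ece,2015-06-24,89
ece,2015-06-26,88
ece,2015-06-19,87
ece,2015-06-23,82
ece,2015-06-30,82
eee,2015-06-24,73
eee,2015-06-23,71
eee,2015-06-25,70
eee,2015-06-19,66
eee,2015-06-27,60
eee,2015-06-22,56
mech,2015-06-27,10
mech,2015-06-22,8
mech,2015-06-25,8
mech,2015-06-19,7
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

fig, ax = plt.subplots(figsize=(10, 5))
# One line per department, each in chronological order.
for dept, group in df.sort_values("Date").groupby("Dept"):
    ax.plot(group["Date"], group["Que"], marker="o", label=dept)

ax.grid(True)
ax.set_xlabel("Date")
ax.set_ylabel("Que")
ax.legend(title="Dept")
fig.autofmt_xdate()
```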

How to plot a multiindex dataframe having suplots for the first level index?

2015-08-05T10:10:05Z

I have a pandas MultiIndex dataframe with quarters 1-4 and hours 0-23 as the index. The data looks like this:

quarter hour    value1  value2  value3
1   0   0.06    0.47    0.50
1   1   0.65    0.04    0.65
1   2   0.58    0.10    0.60
1   3   0.51    0.07    0.17
...

4   20  0.82    0.17    0.96
4   21  0.08    0.98    0.09
4   22  0.73    0.43    0.73
4   23  0.99    0.85    0.42

How can I plot 4 linegraphs as subplots in a 2x2 arrangement having Q1 and Q4 on the top and Q2 and Q3 on the bottom?

I have been trying with

    f,  ((ax1, ax4), (ax2, ax3)) = plt.subplots(2, 2, sharex='col', sharey='row')
    ax1.plot(df.loc[1])

But it doesn't seem to work.
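A sketch of one way to fill the 2x2 grid: select each quarter with df.loc[quarter], which leaves a frame indexed by hour, and pass the matching axes to its plot method. The random values stand in for the real data; the unpacking of the axes mirrors the attempt above, so Q1 and Q4 are on top and Q2 and Q3 on the bottom.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data with a (quarter, hour) MultiIndex.
index = pd.MultiIndex.from_product([range(1, 5), range(24)],
                                   names=["quarter", "hour"])
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(len(index), 3), index=index,
                  columns=["value1", "value2", "value3"])

fig, ((ax1, ax4), (ax2, ax3)) = plt.subplots(2, 2, sharex="col",
                                             sharey="row", figsize=(10, 8))

for quarter, ax in zip([1, 2, 3, 4], [ax1, ax2, ax3, ax4]):
    # df.loc[quarter] drops the first index level, leaving hour as index.
    df.loc[quarter].plot(ax=ax, title="Q%d" % quarter)

fig.tight_layout()
```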