most recent 30 from stackoverflow.com2025-04-27T03:51:55Zhttps://stackoverflow.com/feeds/tag/python-2.7+matplotlib+dataframehttps://creativecommons.org/licenses/by-sa/4.0/rdfhttps://stackoverflow.com/q/553157961SwampCrawfordhttps://stackoverflow.com/users/79132922019-03-23T16:17:51Z2019-03-23T17:19:58Z
<p>I have a <code>DataFrame</code> of Tweet values and want to plot a graph of <code>'Favourites'</code> against <code>'Date'</code> and categorise/colour-code the data by <code>'User'</code>. </p>
<p>I can produce a scatter or bar plot of the data, but I cannot find a working way to categorise the points by <code>'User'</code>. The <code>'Date'</code> axis also comes out messy in the graph, and I do not understand why.</p>
<p>I have tried using <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#basic-plotting-plot" rel="nofollow noreferrer">this tutorial</a> to get a line graph, but I don't understand how to apply it to my <code>DataFrame</code>.</p>
<h3>DataFrame Structure</h3>
<pre><code>data_frame = pandas.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])
data_frame['User'] = numpy.array([tweet.user.screen_name for tweet in tweets])
data_frame['ID'] = numpy.array([tweet.id for tweet in tweets])
data_frame['Length'] = numpy.array([len(tweet.text) for tweet in tweets])
data_frame['Date'] = numpy.array([tweet.created_at for tweet in tweets])
data_frame['Source'] = numpy.array([tweet.source for tweet in tweets])
data_frame['Favourites'] = numpy.array([tweet.favorite_count for tweet in tweets])
data_frame['Retweets'] = numpy.array([tweet.retweet_count for tweet in tweets])
return data_frame
</code></pre>
<h3>Plotting</h3>
<pre><code>x = result.Date
y = result.Favourites
plt.xlabel("Date", fontsize=10)
plt.ylabel("Favourites", fontsize=10)
plt.figure(figsize=(30,30))
fig, ax = plt.subplots()
plt.scatter(x,y)
plt.savefig('plot.png')
</code></pre>
<p>I want a line graph of <code>Favourites</code> against time, with the different <code>User</code>s colour-coded, something like the example below: <img src="https://pandas.pydata.org/pandas-docs/stable/_images/frame_plot_basic.png" alt="this example"></p>
<p>My current output is this: <img src="https://i.sstatic.net/NlWph.png" alt="this"></p>
<h3>Sample Data</h3>
<p><img src="https://i.sstatic.net/csqDo.png" alt="Output"></p>
<p><a href="https://pastebin.com/sdwryY1Y" rel="nofollow noreferrer">Raw paste</a></p>
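<p>A minimal sketch of one possible approach, assuming the <code>Date</code> column can be converted with <code>pd.to_datetime</code>: group the rows by <code>User</code> and draw one line per group on a single axes. The sample values and user names below are made up for illustration; only the column names come from the question.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the tweet DataFrame (values are invented).
data_frame = pd.DataFrame({
    "User": ["alice", "bob", "alice", "bob", "alice"],
    "Date": ["2019-03-01", "2019-03-02", "2019-03-05", "2019-03-06", "2019-03-09"],
    "Favourites": [3, 10, 7, 2, 12],
})
# Real datetimes let matplotlib space and label the x-axis properly.
data_frame["Date"] = pd.to_datetime(data_frame["Date"])

# Create the figure once, then draw one line per user on the same axes.
fig, ax = plt.subplots(figsize=(10, 6))
for user, group in data_frame.sort_values("Date").groupby("User"):
    ax.plot(group["Date"], group["Favourites"], marker="o", label=user)

ax.set_xlabel("Date", fontsize=10)
ax.set_ylabel("Favourites", fontsize=10)
ax.legend(title="User")
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
```

<p>Each call to <code>ax.plot</code> takes the next colour from the default colour cycle, so every user gets a distinct colour. Calling <code>fig.savefig('plot.png')</code> afterwards would write the figure to disk.</p>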
https://stackoverflow.com/q/534855070HelloToEarthhttps://stackoverflow.com/users/83788852018-11-26T16:41:03Z2018-11-26T17:01:18Z
<p>I have data with dates that I would like to split automatically by year in a scatter plot. It looks like this:</p>
<pre><code>Date Level Price
2008-01-01 56 11
2008-01-03 10 12
2008-01-05 52 13
2008-02-01 66 14
2008-05-01 20 10
..
2009-01-01 12 11
2009-02-01 70 11
2009-02-05 56 12
..
2018-01-01 56 10
2018-01-11 10 17
..
</code></pre>
<p>The only way I know to tackle this is to select the rows manually with <code>iloc</code>, eyeballing the dates in the dataframe, like this:</p>
<pre><code>fig = plt.figure(figsize=(15,10))
ax1 = fig.add_subplot(111)
ax1.scatter(df.iloc[____, 1], df.iloc[_____, 2], s=10, c='r', marker="o", label='2008')
ax1.scatter(df.iloc[____, 1],df.iloc[____, 2], s=10, c='mediumblue', marker="o", label='2009')
.
.
. (for each year I want)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
</code></pre>
<p>But this takes a lot of time.</p>
<p>I'd like to loop automatically over each date's <strong>Year</strong>, plot <strong>Level</strong> (Y) against <strong>Price</strong> (X) with one colour per year, and add a legend label for each year.</p>
<p>What would be a good strategy to do this?</p>
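<p>One possible strategy, sketched under the assumption that <code>Date</code> is (or can be converted to) a datetime column: group by <code>df['Date'].dt.year</code> and call <code>scatter</code> once per group. The rows below are the ones visible in the question; the elided rows are not reproduced.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Only the rows shown in the question; the rows hidden behind ".." are omitted.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2008-01-01", "2008-01-03", "2008-01-05", "2008-02-01", "2008-05-01",
        "2009-01-01", "2009-02-01", "2009-02-05", "2018-01-01", "2018-01-11",
    ]),
    "Level": [56, 10, 52, 66, 20, 12, 70, 56, 56, 10],
    "Price": [11, 12, 13, 14, 10, 11, 11, 12, 10, 17],
})

fig, ax = plt.subplots(figsize=(15, 10))
# One scatter call per year; each call picks the next colour in the cycle.
for year, group in df.groupby(df["Date"].dt.year):
    ax.scatter(group["Price"], group["Level"], s=10, marker="o", label=str(year))

ax.set_xlabel("Price", fontsize=14)
ax.set_ylabel("Level", fontsize=14)
ax.legend(loc="upper left", prop={"size": 12})
```

<p>No row positions need to be looked up by hand, and a new year in the data automatically gets its own colour and legend entry.</p>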
https://stackoverflow.com/q/52603247-1Temp O'raryhttps://stackoverflow.com/users/26423512018-10-02T06:48:47Z2018-10-03T12:09:38Z
<p>I'm trying to generate a histogram from a pandas DataFrame. It is generated correctly the first time the function is called. However, when <code>create_histogram</code> is called a second time, it gets stuck at <code>h = df.hist(bins=3, column="amount")</code>. By "stuck" I mean that the statement never finishes executing and execution does not continue to the next line, yet no error is raised and the program does not exit. What exactly is the problem here, and how can I fix it?</p>
<pre><code>import matplotlib.pyplot as plt
...
...
def create_histogram(self, field):
    df = self.main_df # This is DataFrame
    h = df.hist(bins=20, column="amount")
    fileContent = StringIO()
    plt.savefig(fileContent, dpi=None, facecolor='w', edgecolor='w',
                orientation='portrait', papertype=None, format="png",
                transparent=False, bbox_inches=None, pad_inches=0.5,
                frameon=None)
    content = fileContent.getvalue()
    return content
</code></pre>
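<p>The question does not contain enough information to pin down the cause of the hang. A common precaution in code that renders images repeatedly (for example inside a server process) is to select a non-interactive backend, draw on an explicit figure instead of the global <code>pyplot</code> state, and close the figure after saving. The sketch below shows that pattern as a standalone function. It is an assumption that this addresses the problem, and it is written for Python 3 (<code>io.BytesIO</code>) rather than the Python 2.7 <code>StringIO</code> of the question. The <code>papertype</code> and <code>frameon</code> arguments are left out because, as far as I know, recent matplotlib releases no longer accept them.</p>

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no GUI event loop involved
import matplotlib.pyplot as plt
import pandas as pd


def create_histogram(df, column="amount", bins=20):
    """Render a histogram of df[column] and return it as PNG bytes."""
    fig, ax = plt.subplots()
    ax.hist(df[column].dropna(), bins=bins)
    ax.set_title(column)

    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", facecolor="w", edgecolor="w")
    plt.close(fig)  # release the figure so repeated calls do not accumulate state
    return buffer.getvalue()


# Hypothetical data; calling the function twice mirrors the failing scenario.
main_df = pd.DataFrame({"amount": [1.0, 2.5, 2.5, 4.0, 10.0]})
first_png = create_histogram(main_df)
second_png = create_histogram(main_df)
```

<p>Because every call creates and closes its own figure, the second call does not depend on anything left behind by the first.</p>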
https://stackoverflow.com/q/487373250Gabriel_Kochhttps://stackoverflow.com/users/92627882018-02-11T22:48:12Z2018-02-11T23:27:32Z
<p>I have a <code>dataframe</code> with multiple columns, two of which hold categories (<code>['A']</code> and <code>['1','2','3','4']</code>):</p>
<pre><code>Index1 Index2 X Y
A '1' 1 2
A '1' 5 3
A '1' 3 4
A '2' 3 1
A '2' 4 1
A '2' 3 5
A '2' 1 2
A '3' 5 3
A '3' 3 4
A '4' 3 1
A '4' 4 1
A '4' 3 5
</code></pre>
<p>I need to loop over it so that I get four different scatter charts, one for each pair of index values (in the future there will be a B index, which is the reason for the MultiIndex).</p>
<p>My code at the moment gives me one chart for every row (12 in this example); if I <code>break</code> at the end, it gives me only one.</p>
<p>I tried <code>.iterrows()</code> and <code>.itertuples()</code>; both gave me the same result (maybe I have been using them wrong too).</p>
<pre><code>import pandas as pd
from matplotlib import pyplot as plt
Index1 = ['A','A','A','A','A','A','A','A','A','A','A','A']
Index2 = ['1','1','1','2','2','2','2','3','3','4','4','4']
X = [1,5,3,3,4,3,1,5,3,3,4,3]
Y = [2,3,4,1,1,5,2,3,4,1,1,5]
df = pd.DataFrame(Index1)
df = df.assign(Index2 = Index2,X=X,Y=Y)
df.set_index(['Index1','Index2'])
second_index = 1
for index in df.itertuples():
    df = df.groupby('Index2').get_group(second_index)
    df.plot.scatter(x = 'X', y = 'Y')
    plt.show()
    break
</code></pre>
<p>I have similar code running on a <code>dictionary</code> that works on the same logic, and it gives me all the charts that I need.</p>
<p>P.S.: that's not the real code, just the general idea, and I might have made some mistakes.</p>
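<p>A minimal sketch of one way to get one chart per index pair: iterate over <code>groupby(['Index1', 'Index2'])</code> instead of over the rows, so the loop body runs once per group. The data are the lists from the question; the <code>figures</code> dictionary is a hypothetical helper used only to keep a handle on each chart.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Index1": ["A"] * 12,
    "Index2": ["1", "1", "1", "2", "2", "2", "2", "3", "3", "4", "4", "4"],
    "X": [1, 5, 3, 3, 4, 3, 1, 5, 3, 3, 4, 3],
    "Y": [2, 3, 4, 1, 1, 5, 2, 3, 4, 1, 1, 5],
})

figures = {}  # hypothetical container: (Index1, Index2) -> Figure
# groupby yields each (key, sub-frame) pair exactly once, so there is
# one iteration, and therefore one chart, per pair of index values.
for (first, second), group in df.groupby(["Index1", "Index2"]):
    fig, ax = plt.subplots()
    group.plot.scatter(x="X", y="Y", ax=ax)
    ax.set_title("{} / {}".format(first, second))
    figures[(first, second)] = fig
```

<p>When a <code>B</code> value is added to <code>Index1</code> later, the same loop produces the additional charts without changes.</p>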
https://stackoverflow.com/q/481989380Martin Bouhierhttps://stackoverflow.com/users/87175072018-01-11T02:23:09Z2018-01-11T04:52:49Z
<p>I'm trying to create some graphs from a dataframe I imported. The problem is that I can only create a single image containing both graphs. I get this output:
<a href="https://i.sstatic.net/7szrH.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/7szrH.png" alt="enter image description here"></a></p>
<p>And I'm looking for this output:
<a href="https://i.sstatic.net/X6ArH.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/X6ArH.png" alt="enter image description here"></a></p>
<p>Here is the code:</p>
<pre><code>from pandas_datareader import data
import pandas as pd
import datetime
import matplotlib.pyplot as plt
df = pd.read_csv('csv.csv', index_col = 'Totales', parse_dates=True)
df.head()
df['Subastas'].plot()
plt.title('Subastadas')
plt.xlabel('Fechas')
plt.ylabel('Cant de Subastadas')
plt.subplot()
df['Impresiones_exchange'].plot()
plt.title('Impresiones_exchange')
plt.xlabel('Fechas')
plt.ylabel('Cant de Impresiones_exchange')
plt.subplot()
plt.show()
</code></pre>
<p>CSV data:</p>
<pre><code>Totales,Subastas,Impresiones_exchange,Importe_a_pagar_a_medio,Fill_rate,ECPM_medio
Total_07/01/2017,1596260396,30453841,19742.04,3.024863813,0.733696498
Total_07/12/2017,1336604546,57558106,43474.29,9.368463445,0.656716233
Total_07/01/2018,1285872189,33518075,20614.4,4.872889166,0.678244085
</code></pre>
<p>I would also like to save the output to an xlsx file.</p>
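<p>A minimal sketch of one way to get a separate image per column: create a new figure for each column and pass its axes to <code>plot</code>. The CSV text is the sample from the question, read from a string instead of <code>csv.csv</code>; the <code>figures</code> dictionary is a hypothetical helper.</p>

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """Totales,Subastas,Impresiones_exchange,Importe_a_pagar_a_medio,Fill_rate,ECPM_medio
Total_07/01/2017,1596260396,30453841,19742.04,3.024863813,0.733696498
Total_07/12/2017,1336604546,57558106,43474.29,9.368463445,0.656716233
Total_07/01/2018,1285872189,33518075,20614.4,4.872889166,0.678244085
"""
df = pd.read_csv(io.StringIO(csv_text), index_col="Totales")

figures = {}  # hypothetical container: column name -> Figure
for column, ylabel in [
    ("Subastas", "Cant de Subastadas"),
    ("Impresiones_exchange", "Cant de Impresiones_exchange"),
]:
    fig, ax = plt.subplots()  # a fresh figure for every column
    df[column].plot(ax=ax)
    ax.set_title(column)
    ax.set_xlabel("Fechas")
    ax.set_ylabel(ylabel)
    figures[column] = fig
```

<p>Each figure can then be written with its own <code>fig.savefig(...)</code> call. For the xlsx part, <code>df.to_excel('output.xlsx')</code> writes the table itself (the file name is a placeholder, and pandas needs an Excel writer package such as openpyxl installed for that call).</p>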
https://stackoverflow.com/q/440433571tktktk0711https://stackoverflow.com/users/64284882017-05-18T09:08:07Z2017-05-18T10:31:15Z
<p>I have a dataframe (df_m) with a large number of rows, shown below. I want to plot the number of occurrences of each month, per year, for the date_m column. The years in date_m range from 2010 to 2017.</p>
<pre><code> db num date_a date_m date_c zip_b zip_a
0 old HKK10032 2010-07-14 2010-07-26 NaT NaN NaN
1 old HKK10109 2011-07-14 2011-09-15 NaT NaN NaN
2 old HNN10167 2012-07-15 2012-08-09 NaT 177-003 NaN
3 old HKK10190 2013-07-15 2013-09-02 NaT NaN NaN
4 old HKK10251 2014-07-16 2014-05-02 NaT NaN NaN
5 old HKK10253 2015-07-16 2015-05-01 NaT NaN NaN
6 old HNN10275 2017-07-16 2017-07-18 2010-07-18 1070062 NaN
7 old HKK10282 2017-07-16 2017-08-16 NaT NaN NaN
............................................................
</code></pre>
<p>First, I extract the number of occurrences of each month (1-12) for every year (2010-2017), but there is an error in my code:</p>
<pre><code>lst_all = []
for i in range(2010, 2018):
    lst_num = [sum(df_m.date_move.dt.month == j & df_m.date_move.dt.year == i) for j in range(1, 13)]
    lst_all.append(lst_num)
print lst_all
</code></pre>
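<p>In Python, <code>&amp;</code> binds more tightly than <code>==</code>, so an expression of the form <code>a == j &amp; b == i</code> is not parsed as two comparisons joined by "and". Each comparison needs its own parentheses. The sketch below shows the parenthesised form and, as an alternative to the nested loop, a <code>groupby</code> that builds the whole year-by-month table at once. It uses the <code>date_m</code> values visible in the question; the names <code>year</code>, <code>month</code> and <code>counts</code> are hypothetical.</p>

```python
import pandas as pd

# The date_m values visible in the question (the elided rows are omitted).
df_m = pd.DataFrame({"date_m": pd.to_datetime([
    "2010-07-26", "2011-09-15", "2012-08-09", "2013-09-02",
    "2014-05-02", "2015-05-01", "2017-07-18", "2017-08-16",
])})

dates = df_m["date_m"]
year = dates.dt.year.rename("year")
month = dates.dt.month.rename("month")

# Rows are years, columns are months; combinations that never occur become 0.
counts = (
    df_m.groupby([year, month]).size()
    .unstack(fill_value=0)
    .reindex(index=range(2010, 2018), columns=range(1, 13), fill_value=0)
)
lst_all = counts.values.tolist()  # same shape as the nested list in the question

# One cell computed the loop way, with each comparison parenthesised.
july_2017 = int(((dates.dt.month == 7) & (dates.dt.year == 2017)).sum())
```

<p><code>counts.T.plot()</code> would then draw one line per year with the months on the x-axis.</p>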
https://stackoverflow.com/q/440446290tktktk0711https://stackoverflow.com/users/64284882017-05-18T10:01:29Z2017-05-18T10:01:29Z
<p>I want to plot 8 lines in one figure, each with a different colour and label. However, there are two problems:
1) Some lines have almost the same colour; I want a clearly different colour for each line.
2) The legend is in the middle; I want to put it in the upper right and make it smaller.</p>
<p>My code is as follows:</p>
<pre><code> import matplotlib.cm as cm
colors = iter(cm.rainbow(np.linspace(0, 1, len(lst_year))))
for i in range(len(lst_year)):
plt.plot(lst_month, lst_all[i], label='201{i} year'.format(i=i), color=next(colors))
plt.legend(loc='best')
</code></pre>
<p>This is the resulting figure:</p>
<p><a href="https://i.sstatic.net/5vdXv.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/5vdXv.png" alt="figure"></a></p>
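<p>A minimal sketch of one way to address both points, assuming a qualitative colormap is acceptable: <code>tab10</code> provides ten visually distinct colours, and <code>legend</code> accepts an explicit <code>loc</code> and <code>fontsize</code>. The counts are random placeholders, since the real <code>lst_all</code> is not shown in the question.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

lst_year = list(range(2010, 2018))
lst_month = list(range(1, 13))
# Placeholder data standing in for the real monthly counts.
rng = np.random.default_rng(0)
lst_all = rng.integers(0, 50, size=(len(lst_year), 12)).tolist()

# tab10 is a qualitative colormap: its colours are chosen to be distinct,
# unlike neighbouring samples of a continuous map such as rainbow.
colors = plt.get_cmap("tab10").colors

fig, ax = plt.subplots()
for year, counts, color in zip(lst_year, lst_all, colors):
    ax.plot(lst_month, counts, label="{} year".format(year), color=color)

# A fixed location and a smaller font instead of loc='best'.
ax.legend(loc="upper right", fontsize="small")
```

<p>Using the year itself in the label also avoids building it from the loop counter.</p>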
https://stackoverflow.com/q/439641571avisshttps://stackoverflow.com/users/59678862017-05-14T12:58:56Z2017-05-14T13:13:24Z
<p>I get this error when I try to add an annotation to my plot - <code>ValueError: invalid literal for float(): 10_May</code>. </p>
<p>my dataframe:</p>
<p><a href="https://i.sstatic.net/OtNFh.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/OtNFh.png" alt="enter image description here"></a></p>
<p>My code (I use <code>to_datetime</code> and <code>strftime</code> before plotting because I needed to sort dates that were stored as strings):</p>
<pre><code># dealing with dates as strings
grouped.index = pd.to_datetime(grouped.index, format='%d_%b')
grouped = grouped.sort_index()
grouped.index = grouped.index.strftime('%d_%b')
plt.annotate('Peak',
(grouped.index[9], grouped['L'][9]),
xytext=(15, 15),
textcoords='offset points',
arrowprops=dict(arrowstyle='-|>'))
grouped.plot()
</code></pre>
<p><code>grouped.index[9]</code> returns <code>u'10_May'</code>, while <code>grouped['L'][9]</code> returns <code>10.0</code>.
I know that the x-coordinate is expected to be a float, but I thought I could get it from <code>df.index[]</code>. I would appreciate your suggestions.</p>
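<p>A minimal sketch of one possible fix, based on how pandas draws a line plot over a string index: the points are placed at integer positions 0, 1, 2, ... and the strings are used only as tick labels. The annotation can therefore use the integer position as its x-coordinate. The values below are invented, apart from the tenth entry, which matches the <code>10_May</code> / <code>10.0</code> pair quoted in the question.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with a string index like the one produced by strftime.
grouped = pd.DataFrame(
    {"L": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0, 6.0, 4.0]},
    index=["{:02d}_May".format(day) for day in range(1, 13)],
)

# Plot first, then annotate on the axes that plot() returns.
ax = grouped.plot()

peak_pos = 9  # integer position of the row to annotate
annotation = ax.annotate(
    "Peak",
    xy=(peak_pos, grouped["L"].iloc[peak_pos]),  # x is the position, not the label
    xytext=(15, 15),
    textcoords="offset points",
    arrowprops=dict(arrowstyle="-|>"),
)
```

<p>An alternative would be to keep the <code>DatetimeIndex</code> for plotting and only format the tick labels, in which case the datetime value itself could be used as the x-coordinate.</p>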
https://stackoverflow.com/q/439149110ℂyberneticianhttps://stackoverflow.com/users/29483342017-05-11T12:04:51Z2017-05-11T12:51:53Z
<p>I was using the iris data from scikit-learn to obtain the following data frame:</p>
<pre><code>df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
columns= iris['feature_names'] + ['target'])
</code></pre>
<p>Plotting the <code>scatter_matrix</code> and using <code>matshow</code> to plot the correlation matrix give me the graphs <a href="https://i.sstatic.net/9zzCX.png" rel="nofollow noreferrer">scatter_matrix plot</a> and
<a href="https://i.sstatic.net/R6pzR.png" rel="nofollow noreferrer">matshow(df.corr())</a>, respectively.</p>
<p>My question is the following. Is there a way to stack these graphs? In other words, plot the <code>scatter_matrix</code> over the <code>matshow(df.corr())</code> ?</p>
<p>Thanks in advance.</p>
https://stackoverflow.com/q/430311620Fave frrhttps://stackoverflow.com/users/77596622017-03-26T16:08:48Z2017-03-26T16:41:27Z
<p>I am trying to create a line graph from my Pandas Dataframe. The Pandas Dataframe I have looks as follows:</p>
<pre><code> date Interrupts_Person Interrupts_Mean
0 20122013-100-3 0 11.727273
1 20122013-100-6 1 5.428571
2 20122013-17-6 6 8.900000
3 20122013-17-9 0 4.062500
4 20122013-21-4 4 5.637931
5 20122013-22-8 0 5.637931
6 20122013-3-8 0 4.846154
7 20122013-32-6 0 2.727273
8 20122013-32-6 0 2.727273
9 20122013-48-23 0 4.875000
10 20122013-48-23 0 4.875000
</code></pre>
<p>It has 51 rows in total, but I copied only the first ones to keep things readable. I know how to make a simple line graph from a pandas dataframe, but now I want to do the following:</p>
<p>I want a line graph with the date on the x-axis and two lines, one for the column 'Interrupts_Person' and one for the column 'Interrupts_Mean'.
If someone knows how to make a line graph like this, I would be really thankful for some help.</p>
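<p>A minimal sketch using <code>DataFrame.plot</code> with <code>x</code> set to the date column and <code>y</code> set to a list of the two value columns, which draws both lines on the same axes. Only the first five rows from the question are reproduced.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the table in the question.
df = pd.DataFrame({
    "date": ["20122013-100-3", "20122013-100-6", "20122013-17-6",
             "20122013-17-9", "20122013-21-4"],
    "Interrupts_Person": [0, 1, 6, 0, 4],
    "Interrupts_Mean": [11.727273, 5.428571, 8.900000, 4.062500, 5.637931],
})

# x selects the column for the horizontal axis; a list for y gives one line each.
ax = df.plot(x="date", y=["Interrupts_Person", "Interrupts_Mean"], rot=90)
ax.set_ylabel("Interrupts")
```

<p>The <code>rot=90</code> argument only rotates the tick labels, which helps with long labels such as these.</p>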
https://stackoverflow.com/q/396209751Edamamehttps://stackoverflow.com/users/39932702016-09-21T15:38:18Z2016-11-07T08:47:48Z
<p>I have a pandas dataframe pandas_df with 6 input columns: <code>column_1, column_2, ... , column_6</code>, and one result column <code>result</code>. I used the following code to draw a scatter plot for every pair of input columns (6*5/2 = 15 figures in total). I ran the code 15 times, and each run generated one big figure.</p>
<p>Is there a way to iterate over all possible column pairs and plot all 15 figures as small subplots in one big figure? Thanks!</p>
<pre><code>%matplotlib notebook
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
pandas_df.plot(x='column_1', y='column_2', kind = 'scatter', c = 'result')
</code></pre>
https://stackoverflow.com/q/399376500KcFnMihttps://stackoverflow.com/users/50824632016-10-08T21:39:29Z2016-10-08T21:50:09Z
<p>Given the data:</p>
<pre><code>Column1; Column2; Column3
1; 4; 6
2; 2; 6
3; 3; 8
4; 1; 1
5; 4; 2
</code></pre>
<p>I can plot it via:</p>
<pre><code>import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
titles = list(df)
for title in titles:
    if title == titles[0]:
        continue
    df.plot(titles[0],title, linestyle='--', marker='o')
    plt.savefig(title+'.png')
</code></pre>
<p>But what if, instead, the data were missing <code>Column1</code>, like this:</p>
<pre><code>Column2; Column3
4; 6
2; 6
3; 8
1; 1
4; 2
</code></pre>
<p>How do I plot it?</p>
<p>Maybe something like <code>df.plot(title, linestyle='--', marker='o')</code>?</p>
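<p>A minimal sketch of one option: when no x column exists, plotting a single column as a Series uses the DataFrame index (0, 1, 2, ... by default) on the x-axis. The DataFrame is built inline here instead of being read from <code>test0.csv</code>, and the <code>axes_by_title</code> dictionary is a hypothetical helper.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The data from the question without Column1.
df = pd.DataFrame({"Column2": [4, 2, 3, 1, 4], "Column3": [6, 6, 8, 1, 2]})

axes_by_title = {}  # hypothetical container: column name -> Axes
for title in df.columns:
    fig, ax = plt.subplots()
    # A Series plots its values against its index, so no x column is needed.
    df[title].plot(ax=ax, linestyle="--", marker="o")
    ax.set_title(title)
    axes_by_title[title] = ax
```

<p>Adding <code>fig.savefig(title + '.png')</code> inside the loop would save one image per column, as in the original code.</p>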
https://stackoverflow.com/q/399369831KcFnMihttps://stackoverflow.com/users/50824632016-10-08T20:21:44Z2016-10-08T21:08:34Z
<p>Given the data:</p>
<pre><code>Column1; Column2; Column3
1; 4; 6
2; 2; 6
3; 3; 8
4; 1; 1
5; 4; 2
</code></pre>
<p>With the following code I get the following graphic:</p>
<pre><code>import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
df.plot(0,0)
plt.savefig('fig0.png')
</code></pre>
<p><a href="https://i.sstatic.net/tEQ6A.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/tEQ6A.png" alt="enter image description here"></a></p>
<p>And, with the following code I get the following graphic:</p>
<pre><code>import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
df.plot(0,1)
plt.savefig('fig1.png')
</code></pre>
<p><a href="https://i.sstatic.net/dAtwA.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/dAtwA.png" alt="enter image description here"></a></p>
<p>What's the logic in <code>df.plot(m,n)</code>? Say I want to plot <code>Column2</code> against <code>Column3</code>: what are <code>m</code> and <code>n</code>? Is it <code>df.plot(2,3)</code>?</p>
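<p>The first two positional arguments of <code>DataFrame.plot</code> are <code>x</code> and <code>y</code>, the columns used for the horizontal and vertical axes. As far as I know, integers are treated as zero-based column positions when the column labels are not integers themselves, which would make <code>Column2</code> against <code>Column3</code> <code>df.plot(1, 2)</code>. The sketch below uses column names instead, which avoids depending on that positional behaviour.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The data from the question, built inline instead of read from test0.csv.
df = pd.DataFrame({
    "Column1": [1, 2, 3, 4, 5],
    "Column2": [4, 2, 3, 1, 4],
    "Column3": [6, 6, 8, 1, 2],
})

# x is the column for the horizontal axis, y the column for the vertical axis.
ax = df.plot(x="Column2", y="Column3")
line = ax.get_lines()[0]
```

<p>The points are connected in row order, so a non-monotonic <code>x</code> column such as <code>Column2</code> produces a line that doubles back on itself; a scatter plot may be easier to read in that case.</p>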
https://stackoverflow.com/q/38453990-4xingluohttps://stackoverflow.com/users/65265022016-07-19T09:04:00Z2016-07-20T07:47:59Z
<p><img src="https://i.sstatic.net/sXOdd.png" alt="enter image description here"></p>
<p>How can I achieve that using matplotlib?</p>
https://stackoverflow.com/q/383543140Colorful Edhttps://stackoverflow.com/users/64644202016-07-13T14:20:28Z2016-07-13T14:20:28Z
<p>I am trying to get two plots on one figure using matplotlib's <code>subplots()</code> command. I want the two plots to share an x-axis and have one legend for the whole plot. The code I have right now is:</p>
<pre><code>observline = mlines.Line2D([], [], color=(1,0.502,0),\
markersize=15, label='Observed',linewidth=2)
wrfline=mlines.Line2D([], [], color='black',\
markersize=15, label='WRF',linewidth=2)
fig,axes=plt.subplots(2,1,sharex='col',figsize=(18,10))
df08.plot(ax=axes[0],linewidth=2, color=(1,0.502,0))\
.legend(handles=[observline,wrfline],loc='lower center', bbox_to_anchor=(0.9315, 0.9598),prop={'size':16})
axes[0].set_title('WRF Model Comparison Near %.2f,%.2f' %(lat,lon),fontsize=24)
axes[0].set_ylim(0,360)
axes[0].set_yticks(np.arange(0,361,60))
df18.plot(ax=axes[1],linewidth=2, color='black').legend_.remove()
plt.subplots_adjust(hspace=0)
axes[1].set_ylim(0,360)
axes[1].set_yticks(np.arange(0,361,60))
plt.ylabel('Wind Direction [Degrees]',fontsize=18,color='black')
axes[1].yaxis.set_label_coords(-0.05, 1)
plt.xlabel('Time',fontsize=18,color='black')
#plt.savefig(df8graphfile, dpi = 72)
plt.show()
</code></pre>
<p>and it produces four figures, each with two subplots. The top subplot is always empty. In three of the figures, the bottom subplot is filled with my second dataframe. The index of each dataframe is a DatetimeIndex in the format YYYY-mm-DD HH:MM:SS. The data are values from 0 to 360 that vary almost randomly across the whole time series, which covers two months.</p>
<p>Here is an example of each figure produced:</p>
<p><a href="https://i.sstatic.net/ehTEx.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/ehTEx.png" alt="enter image description here"></a>
<a href="https://i.sstatic.net/75RrF.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/75RrF.png" alt="enter image description here"></a></p>
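<p>The question does not show how <code>df08</code> and <code>df18</code> are built, so the cause of the extra figures cannot be determined from it. As a reference point, the sketch below shows a pattern that yields exactly one figure with two stacked subplots, a shared x-axis and a single figure-level legend: both frames are drawn with <code>ax=</code> and <code>legend=False</code>, and the legend is created once with <code>fig.legend</code>. The two single-column frames hold random placeholder data, and <code>#ff8000</code> approximates the orange <code>(1, 0.502, 0)</code> from the question.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: two months of daily wind directions per frame.
rng = np.random.default_rng(0)
index = pd.date_range("2016-05-01", periods=60, freq="D")
df08 = pd.DataFrame({"observed": rng.uniform(0, 360, len(index))}, index=index)
df18 = pd.DataFrame({"wrf": rng.uniform(0, 360, len(index))}, index=index)

fig, axes = plt.subplots(2, 1, sharex="col", figsize=(18, 10))

# legend=False keeps pandas from adding a legend to each subplot.
df08.plot(ax=axes[0], linewidth=2, color="#ff8000", legend=False)
df18.plot(ax=axes[1], linewidth=2, color="black", legend=False)

for ax in axes:
    ax.set_ylim(0, 360)
    ax.set_yticks(np.arange(0, 361, 60))
axes[1].set_xlabel("Time", fontsize=18)
fig.subplots_adjust(hspace=0)

# One legend for the whole figure, built from the two plotted lines.
handles = [axes[0].get_lines()[0], axes[1].get_lines()[0]]
fig.legend(handles, ["Observed", "WRF"], loc="upper right", prop={"size": 16})
```

<p>If the real frames contain several columns, each <code>plot</code> call draws one line per column on the given axes, but it still does not open additional figures as long as <code>ax=</code> is passed.</p>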
https://stackoverflow.com/q/367611211Sriramhttps://stackoverflow.com/users/58868792016-04-21T06:17:59Z2016-04-21T07:46:15Z
<p>Dataframe:</p>
<pre><code>Dept,Date,Que
ece,2015-06-25,96
ece,2015-06-24,89
ece,2015-06-26,88
ece,2015-06-19,87
ece,2015-06-23,82
ece,2015-06-30,82
eee,2015-06-24,73
eee,2015-06-23,71
eee,2015-06-25,70
eee,2015-06-19,66
eee,2015-06-27,60
eee,2015-06-22,56
mech,2015-06-27,10
mech,2015-06-22,8
mech,2015-06-25,8
mech,2015-06-19,7
</code></pre>
<p>I need a multi-line chart with a grid, based on the Dept column, with one line per Dept.
For example, for ece the line should follow the values 96, 89, 88, 87, 82, 82, and likewise for the other Depts.</p>
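<p>A minimal sketch of one possible approach: group by <code>Dept</code> and draw one line per group on the same axes, with the grid switched on. The sketch assumes the points should be ordered by <code>Date</code> along the x-axis, which changes the order of the values compared with the row order quoted in the question.</p>

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """Dept,Date,Que
ece,2015-06-25,96
ece,2015-06-24,89
ece,2015-06-26,88
ece,2015-06-19,87
ece,2015-06-23,82
ece,2015-06-30,82
eee,2015-06-24,73
eee,2015-06-23,71
eee,2015-06-25,70
eee,2015-06-19,66
eee,2015-06-27,60
eee,2015-06-22,56
mech,2015-06-27,10
mech,2015-06-22,8
mech,2015-06-25,8
mech,2015-06-19,7
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

fig, ax = plt.subplots()
# Assumption: sort by date so each line runs left to right in time.
for dept, group in df.sort_values("Date").groupby("Dept"):
    ax.plot(group["Date"], group["Que"], marker="o", label=dept)

ax.grid(True)
ax.set_xlabel("Date")
ax.set_ylabel("Que")
ax.legend(title="Dept")
fig.autofmt_xdate()
```

<p>To keep the original row order instead, the <code>sort_values</code> call can be dropped and each group plotted against <code>range(len(group))</code>.</p>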
https://stackoverflow.com/q/318294580Markus Whttps://stackoverflow.com/users/21488452015-08-05T10:10:05Z2015-08-05T10:10:05Z
<p>I have a pandas MultiIndex dataframe with quarters 1-4 and hours 0-23 as the index.
The data looks like this:</p>
<pre><code>quarter hour value1 value2 value3
1 0 0.06 0.47 0.50
1 1 0.65 0.04 0.65
1 2 0.58 0.10 0.60
1 3 0.51 0.07 0.17
...
4 20 0.82 0.17 0.96
4 21 0.08 0.98 0.09
4 22 0.73 0.43 0.73
4 23 0.99 0.85 0.42
</code></pre>
<p>How can I plot 4 line graphs as subplots in a 2x2 arrangement, with Q1 and Q4 on the top and Q2 and Q3 on the bottom?</p>
<p>I have been trying with</p>
<pre><code> f, ((ax1, ax4), (ax2, ax3)) = plt.subplots(2, 2, sharex='col', sharey='row')
ax1.plot(df.loc[1])
</code></pre>
<p>But it doesn't seem to work.</p>
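<p>A minimal sketch of one way to do this: select each quarter with <code>xs</code> (which leaves a frame indexed by hour) and let pandas draw it on the matching axes through <code>ax=</code>. The random values and the level names <code>quarter</code> and <code>hour</code> are assumptions; the axes unpacking is the one from the question.</p>

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with the same shape as in the question: 4 quarters x 24 hours.
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product([[1, 2, 3, 4], range(24)], names=["quarter", "hour"])
df = pd.DataFrame(rng.random((96, 3)), index=index, columns=["value1", "value2", "value3"])

# Top row: Q1 and Q4. Bottom row: Q2 and Q3.
fig, ((ax1, ax4), (ax2, ax3)) = plt.subplots(2, 2, sharex="col", sharey="row")

for quarter, ax in zip([1, 2, 3, 4], [ax1, ax2, ax3, ax4]):
    # xs drops the quarter level, so the remaining index (hour) becomes the x-axis.
    quarter_df = df.xs(quarter, level="quarter")
    quarter_df.plot(ax=ax, title="Q{}".format(quarter), legend=(quarter == 1))
```

<p>Only the first subplot shows a legend here, since all four share the same three columns.</p>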