most recent 30 from stackoverflow.com 2025-04-27T03:51:55Z https://stackoverflow.com/feeds/tag/python-2.7+matplotlib+dataframe https://creativecommons.org/licenses/by-sa/4.0/rdf

https://stackoverflow.com/q/55315796 1 SwampCrawford https://stackoverflow.com/users/7913292 2019-03-23T16:17:51Z 2019-03-23T17:19:58Z
<p>I have a <code>DataFrame</code> of Tweet values and want to plot a graph of <code>'Favourites'</code> against <code>'Date'</code> and categorise/colour-code the data by <code>'User'</code>.</p>
<p>I can get a scatter or bar plot of the data but cannot find a working way to categorise it by <code>'User'</code>. The <code>'Date'</code> axis also comes out messy in the graph, and I don't understand why.</p>
<p>I have tried using <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#basic-plotting-plot" rel="nofollow noreferrer">this tutorial</a> to get a line graph but don't understand how to apply it to my <code>DataFrame</code>.</p>
<h3>DataFrame Structure</h3>
<pre><code>data_frame = pandas.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])
data_frame['User'] = numpy.array([tweet.user.screen_name for tweet in tweets])
data_frame['ID'] = numpy.array([tweet.id for tweet in tweets])
data_frame['Length'] = numpy.array([len(tweet.text) for tweet in tweets])
data_frame['Date'] = numpy.array([tweet.created_at for tweet in tweets])
data_frame['Source'] = numpy.array([tweet.source for tweet in tweets])
data_frame['Favourites'] = numpy.array([tweet.favorite_count for tweet in tweets])
data_frame['Retweets'] = numpy.array([tweet.retweet_count for tweet in tweets])
return data_frame
</code></pre>
<h3>Plotting</h3>
<pre><code>x = result.Date
y = result.Favourites
plt.xlabel("Date", fontsize=10)
plt.ylabel("Favourites", fontsize=10)
plt.figure(figsize=(30,30))
fig, ax = plt.subplots()
plt.scatter(x,y)
plt.savefig('plot.png')
</code></pre>
<p>I want the graph to show a line graph 
of <code>Favourites</code> against time, with the different <code>User</code>s colour-coded, something like the example below: <img src="https://pandas.pydata.org/pandas-docs/stable/_images/frame_plot_basic.png" alt="this example"></p>
<p>My current output is this: <img src="https://i.sstatic.net/NlWph.png" alt="this"></p>
<h3>Sample Data</h3>
<p><img src="https://i.sstatic.net/csqDo.png" alt="Output"></p>
<p><a href="https://pastebin.com/sdwryY1Y" rel="nofollow noreferrer">Raw paste</a></p>

https://stackoverflow.com/q/53485507 0 HelloToEarth https://stackoverflow.com/users/8378885 2018-11-26T16:41:03Z 2018-11-26T17:01:18Z
<p>I have data with dates that I would like to split automatically in a scatter plot. It looks like this:</p>
<pre><code>Date        Level  Price
2008-01-01  56     11
2008-01-03  10     12
2008-01-05  52     13
2008-02-01  66     14
2008-05-01  20     10
..
2009-01-01  12     11
2009-02-01  70     11
2009-02-05  56     12
..
2018-01-01  56     10
2018-01-11  10     17
..
</code></pre>
<p>The only way I know to tackle this is to manually select rows using <code>iloc</code> and eyeball the dates in the dataframe, like this:</p>
<pre><code>fig = plt.figure(figsize=(15,10))
ax1 = fig.add_subplot(111)
ax1.scatter(df.iloc[____, 1], df.iloc[_____, 2], s=10, c='r', marker="o", label='2008')
ax1.scatter(df.iloc[____, 1],df.iloc[____, 2], s=10, c='mediumblue', marker="o", label='2009')
. . . 
(for each year I want)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()
</code></pre>
<p>But this takes a lot of time.</p>
<p>I'd like to loop automatically over each date's <strong>Year</strong>, plot <strong>Level</strong> (Y) against <strong>Price</strong> (X) in a color specific to that year, and add a legend label for each year.</p>
<p>What would be a good strategy to do this?</p>

https://stackoverflow.com/q/52603247 -1 Temp O'rary https://stackoverflow.com/users/2642351 2018-10-02T06:48:47Z 2018-10-03T12:09:38Z
<p>In Python, I'm trying to generate a histogram from a DataFrame. The histogram is generated the first time the function is called. However, when the <code>create_histogram</code> function is called a second time, it gets stuck at <code>h = df.hist(bins=3, column="amount")</code>. By "stuck" I mean that the statement never finishes executing and execution does not continue to the next line, yet no error is raised and execution does not break out. What exactly is the problem here, and how can I fix it?</p>
<pre><code>import matplotlib.pyplot as plt
...
...
def create_histogram(self, field):
    df = self.main_df # This is DataFrame
    h = df.hist(bins=20, column="amount")
    fileContent = StringIO()
    plt.savefig(fileContent, dpi=None, facecolor='w', edgecolor='w', orientation='portrait', papertype=None, format="png", transparent=False, bbox_inches=None, pad_inches=0.5, frameon=None)
    content = fileContent.getvalue()
    return content
</code></pre>

https://stackoverflow.com/q/48737325 0 Gabriel_Koch https://stackoverflow.com/users/9262788 2018-02-11T22:48:12Z 2018-02-11T23:27:32Z
<p>I have a <code>dataframe</code> with multiple columns containing different categories (['A'], ['1','2','3','4']):</p>
<pre><code>Index1 Index2 X Y
A      '1'    1 2
A      '1'    5 3
A      '1'    3 4
A      '2'    3 1
A      '2'    4 1
A      '2'    3 5
A      '2'    1 2
A      '3'    5 3
A      '3'    3 4
A      '4'    3 1
A      '4'    4 1
A      '4'    3 5
</code></pre>
<p>I need to loop over it so that I get four different scatter charts, one for each pair of indexes (in the future there will be a B index, which is the reason for the multiindex).</p>
<p>My code at the moment gives me one chart for every row (12 of them in this example); if I <code>break</code> at the end, it gives me only one.</p>
<p>I tried <code>.iterrows()</code> and <code>.itertuples()</code>; both gave me the same result (maybe I have been using them wrong too).</p>
<pre><code>import pandas as pd
from matplotlib import pyplot as plt
Index1 = ['A','A','A','A','A','A','A','A','A','A','A','A']
Index2 = ['1','1','1','2','2','2','2','3','3','4','4','4']
X = [1,5,3,3,4,3,1,5,3,3,4,3]
Y = [2,3,4,1,1,5,2,3,4,1,1,5]
df = pd.DataFrame(Index1)
df = df.assign(Index2 = Index2,X=X,Y=Y)
df.set_index(['Index1','Index2'])
second_index = 1
for index in df.itertuples():
    df = df.groupby('Index2').get_group(second_index)
    df.plot.scatter(x = 'X', y = 'Y')
    plt.show()
    break
</code></pre>
<p>I have similar code running on a <code>dictionary</code> that works on the same logic, and it gives me all the charts that I need. 
</p>
<p>P.S.: that's not the real code, just the general idea, and I might have made some mistakes.</p>

https://stackoverflow.com/q/48198938 0 Martin Bouhier https://stackoverflow.com/users/8717507 2018-01-11T02:23:09Z 2018-01-11T04:52:49Z
<p>I'm trying to create some graphs from a dataframe I imported. The problem is that I can only create a single image containing both graphs. I get this output: <a href="https://i.sstatic.net/7szrH.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/7szrH.png" alt="enter image description here"></a></p>
<p>And I'm looking for this output: <a href="https://i.sstatic.net/X6ArH.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/X6ArH.png" alt="enter image description here"></a></p>
<p>Here is the code:</p>
<pre><code>from pandas_datareader import data
import pandas as pd
import datetime
import matplotlib.pyplot as plt
df = pd.read_csv('csv.csv', index_col = 'Totales', parse_dates=True)
df.head()
df['Subastas'].plot()
plt.title('Subastadas')
plt.xlabel('Fechas')
plt.ylabel('Cant de Subastadas')
plt.subplot()
df['Impresiones_exchange'].plot()
plt.title('Impresiones_exchange')
plt.xlabel('Fechas')
plt.ylabel('Cant de Impresiones_exchange')
plt.subplot()
plt.show()
</code></pre>
<p>CSV data:</p>
<pre><code>Totales,Subastas,Impresiones_exchange,Importe_a_pagar_a_medio,Fill_rate,ECPM_medio
Total_07/01/2017,1596260396,30453841,19742.04,3.024863813,0.733696498
Total_07/12/2017,1336604546,57558106,43474.29,9.368463445,0.656716233
Total_07/01/2018,1285872189,33518075,20614.4,4.872889166,0.678244085
</code></pre>
<p>Also, I would like to save the output in an xlsx file too!</p>

https://stackoverflow.com/q/44043357 1 tktktk0711 https://stackoverflow.com/users/6428488 2017-05-18T09:08:07Z 2017-05-18T10:31:15Z
<p>I have a dataframe (df_m) with a large number of rows, shown below. I want to plot the number of occurrences of each month, per year, in the date_m column. The years in date_m range from 2010 to 2017. 
</p>
<pre><code>    db       num      date_a      date_m      date_c    zip_b  zip_a
0  old  HKK10032  2010-07-14  2010-07-26         NaT      NaN    NaN
1  old  HKK10109  2011-07-14  2011-09-15         NaT      NaN    NaN
2  old  HNN10167  2012-07-15  2012-08-09         NaT  177-003    NaN
3  old  HKK10190  2013-07-15  2013-09-02         NaT      NaN    NaN
4  old  HKK10251  2014-07-16  2014-05-02         NaT      NaN    NaN
5  old  HKK10253  2015-07-16  2015-05-01         NaT      NaN    NaN
6  old  HNN10275  2017-07-16  2017-07-18  2010-07-18  1070062    NaN
7  old  HKK10282  2017-07-16  2017-08-16         NaT      NaN    NaN
............................................................
</code></pre>
<p>First, I extract the number of occurrences of each month (1-12) for every year (2010-2017). But there is an error in my code:</p>
<pre><code>lst_all = []
for i in range(2010, 2018):
    lst_num = [sum(df_m.date_move.dt.month == j &amp; df_m.date_move.dt.year == i) for j in range(1, 13)]
    lst_all.append(lst_num)
print lst_all
</code></pre>

https://stackoverflow.com/q/44044629 0 tktktk0711 https://stackoverflow.com/users/6428488 2017-05-18T10:01:29Z 2017-05-18T10:01:29Z
<p>I want to plot 8 lines in one figure with different colors and labels. However, there are two problems: 1) some lines have almost the same color, and I want a clearly different color for each line; 2) the legend with the labels is in the middle, and I want to put it in the upper right and make it smaller.</p>
<p>My code is as follows:</p>
<pre><code>import matplotlib.cm as cm
colors = iter(cm.rainbow(np.linspace(0, 1, len(lst_year))))
for i in range(len(lst_year)):
    plt.plot(lst_month, lst_all[i], label='201{i} year'.format(i=i), color=next(colors))
plt.legend(loc='best')
</code></pre>
<p>and my figure is as follows:</p>
<p><a href="https://i.sstatic.net/5vdXv.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/5vdXv.png" alt="figure"></a></p>

https://stackoverflow.com/q/43964157 1 aviss https://stackoverflow.com/users/5967886 2017-05-14T12:58:56Z 2017-05-14T13:13:24Z
<p>I get this error when I try to add an annotation to my plot: <code>ValueError: invalid literal for float(): 10_May</code>. 
</p>
<p>My dataframe:</p>
<p><a href="https://i.sstatic.net/OtNFh.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/OtNFh.png" alt="enter image description here"></a></p>
<p>My code (I use <code>to_datetime</code> and <code>strftime</code> before plotting because I needed to sort dates that were stored as strings):</p>
<pre><code># dealing with dates as strings
grouped.index = pd.to_datetime(grouped.index, format='%d_%b')
grouped = grouped.sort_index()
grouped.index = grouped.index.strftime('%d_%b')
plt.annotate('Peak', (grouped.index[9], grouped['L'][9]), xytext=(15, 15), textcoords='offset points', arrowprops=dict(arrowstyle='-|&gt;'))
grouped.plot()
</code></pre>
<p><code>grouped.index[9]</code> returns <code>u'10_May'</code> while <code>grouped['L'][9]</code> returns <code>10.0</code>. I know that pandas expects the index to be float, but I thought I could access it via <code>df.index[]</code>. I would appreciate your suggestions.</p>

https://stackoverflow.com/q/43914911 0 ℂybernetician https://stackoverflow.com/users/2948334 2017-05-11T12:04:51Z 2017-05-11T12:51:53Z
<p>I was using the iris data from scikit-learn to obtain the following data frame:</p>
<pre><code>df = pd.DataFrame(data= np.c_[iris['data'], iris['target']], columns= iris['feature_names'] + ['target'])
</code></pre>
<p>Plotting the <code>scatter_matrix</code> and using <code>matshow</code> to plot the correlation matrix give me the graphs <a href="https://i.sstatic.net/9zzCX.png" rel="nofollow noreferrer">scatter_matrix plot</a> and <a href="https://i.sstatic.net/R6pzR.png" rel="nofollow noreferrer">matshow(df.corr())</a>, respectively.</p>
<p>My question is the following. Is there a way to stack these graphs? 
In other words, can I plot the <code>scatter_matrix</code> over the <code>matshow(df.corr())</code>?</p>
<p>Thanks in advance.</p>

https://stackoverflow.com/q/43031162 0 Fave frr https://stackoverflow.com/users/7759662 2017-03-26T16:08:48Z 2017-03-26T16:41:27Z
<p>I am trying to create a line graph from my pandas DataFrame, which looks as follows:</p>
<pre><code>              date  Interrupts_Person  Interrupts_Mean
0   20122013-100-3                  0        11.727273
1   20122013-100-6                  1         5.428571
2    20122013-17-6                  6         8.900000
3    20122013-17-9                  0         4.062500
4    20122013-21-4                  4         5.637931
5    20122013-22-8                  0         5.637931
6     20122013-3-8                  0         4.846154
7    20122013-32-6                  0         2.727273
8    20122013-32-6                  0         2.727273
9   20122013-48-23                  0         4.875000
10  20122013-48-23                  0         4.875000
</code></pre>
<p>It has 51 rows in total, but I copied only the first 10 to keep things readable. I know how to make a simple line graph from a pandas DataFrame, but now I want to do the following:</p>
<p>I want a line graph with the date on the x-axis and two lines, one for the column 'Interrupts_Person' and one for the column 'Interrupts_Mean'. If someone is familiar with how to make a line graph like this, I would be really thankful for some help!</p>

https://stackoverflow.com/q/39620975 1 Edamame https://stackoverflow.com/users/3993270 2016-09-21T15:38:18Z 2016-11-07T08:47:48Z
<p>I have a pandas dataframe pandas_df with 6 input columns: <code>column_1, column_2, ... , column_6</code>, and one result column <code>result</code>. I used the following code to draw a scatter plot for every pair of input columns (so I have 6*5/2 = 15 figures in total). I ran the code 15 times, and each run generated a big figure.</p>
<p>Is there a way to iterate over all possible column pairs and plot all 15 figures as small figures in one big plot? Thanks! 
</p>
<pre><code>%matplotlib notebook
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
pandas_df.plot(x='column_1', y='column_2', kind = 'scatter', c = 'result')
</code></pre>

https://stackoverflow.com/q/39937650 0 KcFnMi https://stackoverflow.com/users/5082463 2016-10-08T21:39:29Z 2016-10-08T21:50:09Z
<p>Given the data:</p>
<pre><code>Column1; Column2; Column3
1; 4; 6
2; 2; 6
3; 3; 8
4; 1; 1
5; 4; 2
</code></pre>
<p>I can plot it via:</p>
<pre><code>import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
titles = list(df)
for title in titles:
    if title == titles[0]:
        continue
    df.plot(titles[0],title, linestyle='--', marker='o')
    plt.savefig(title+'.png')
</code></pre>
<p>But if, instead, the data were missing <code>Column1</code>, like:</p>
<pre><code>Column2; Column3
4; 6
2; 6
3; 8
1; 1
4; 2
</code></pre>
<p>How do I plot it?</p>
<p>Maybe something like <code>df.plot(title, linestyle='--', marker='o')</code>?</p>

https://stackoverflow.com/q/39936983 1 KcFnMi https://stackoverflow.com/users/5082463 2016-10-08T20:21:44Z 2016-10-08T21:08:34Z
<p>Given the data:</p>
<pre><code>Column1; Column2; Column3
1; 4; 6
2; 2; 6
3; 3; 8
4; 1; 1
5; 4; 2
</code></pre>
<p>With the following code I get the following graphic:</p>
<pre><code>import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
df.plot(0,0)
plt.savefig('fig0.png')
</code></pre>
<p><a href="https://i.sstatic.net/tEQ6A.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/tEQ6A.png" alt="enter image description here"></a></p>
<p>And, with the following code I get the following graphic:</p>
<pre><code>import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test0.csv',delimiter='; ', engine='python')
df.plot(0,1)
plt.savefig('fig1.png')
</code></pre>
<p><a href="https://i.sstatic.net/dAtwA.png" rel="nofollow noreferrer"><img 
src="https://i.sstatic.net/dAtwA.png" alt="enter image description here"></a></p>
<p>What's the logic in <code>df.plot(m,n)</code>? Let's say I want to plot <code>Column2 X Column3</code>: what are <code>m</code> and <code>n</code> (<code>df.plot(2,3)</code>)?</p>

https://stackoverflow.com/q/38453990 -4 xingluo https://stackoverflow.com/users/6526502 2016-07-19T09:04:00Z 2016-07-20T07:47:59Z
<p><img src="https://i.sstatic.net/sXOdd.png" alt="enter image description here"></p>
<p>How can I achieve that using matplotlib?</p>

https://stackoverflow.com/q/38354314 0 Colorful Ed https://stackoverflow.com/users/6464420 2016-07-13T14:20:28Z 2016-07-13T14:20:28Z
<p>I am trying to get two plots on one figure using matplotlib's <code>subplots()</code> command. I want the two plots to share an x-axis and have one legend for the whole plot. The code I have right now is:</p>
<pre><code>observline = mlines.Line2D([], [], color=(1,0.502,0),\
    markersize=15, label='Observed',linewidth=2)
wrfline=mlines.Line2D([], [], color='black',\
    markersize=15, label='WRF',linewidth=2)
fig,axes=plt.subplots(2,1,sharex='col',figsize=(18,10))
df08.plot(ax=axes[0],linewidth=2, color=(1,0.502,0))\
    .legend(handles=[observline,wrfline],loc='lower center', bbox_to_anchor=(0.9315, 0.9598),prop={'size':16})
axes[0].set_title('WRF Model Comparison Near %.2f,%.2f' %(lat,lon),fontsize=24)
axes[0].set_ylim(0,360)
axes[0].set_yticks(np.arange(0,361,60))
df18.plot(ax=axes[1],linewidth=2, color='black').legend_.remove()
plt.subplots_adjust(hspace=0)
axes[1].set_ylim(0,360)
axes[1].set_yticks(np.arange(0,361,60))
plt.ylabel('Wind Direction [Degrees]',fontsize=18,color='black')
axes[1].yaxis.set_label_coords(-0.05, 1)
plt.xlabel('Time',fontsize=18,color='black')
#plt.savefig(df8graphfile, dpi = 72)
plt.show()
</code></pre>
<p>and it produces four figures, each with two subplots. The top is always empty. The bottom is filled for three of them with my 2nd dataframe. 
The index of each dataframe is a DatetimeIndex in the format YYYY-mm-DD HH:MM:SS. The data consists of values from 0 to 360, spread nearly randomly across the whole time series, which covers two months.</p>
<p>Here is an example of each figure produced:</p>
<p><a href="https://i.sstatic.net/ehTEx.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/ehTEx.png" alt="enter image description here"></a> <a href="https://i.sstatic.net/75RrF.png" rel="nofollow noreferrer"><img src="https://i.sstatic.net/75RrF.png" alt="enter image description here"></a></p>

https://stackoverflow.com/q/36761121 1 Sriram https://stackoverflow.com/users/5886879 2016-04-21T06:17:59Z 2016-04-21T07:46:15Z
<p>Dataframe:</p>
<pre><code>Dept,Date,Que
ece,2015-06-25,96
ece,2015-06-24,89
ece,2015-06-26,88
ece,2015-06-19,87
ece,2015-06-23,82
ece,2015-06-30,82
eee,2015-06-24,73
eee,2015-06-23,71
eee,2015-06-25,70
eee,2015-06-19,66
eee,2015-06-27,60
eee,2015-06-22,56
mech,2015-06-27,10
mech,2015-06-22,8
mech,2015-06-25,8
mech,2015-06-19,7
</code></pre>
<p>I need a multi-line chart with a grid, based on the Dept column, with one line per Dept. For example, for ece the line should be 96, 89, 88, 87, 82, 82, and likewise for the other Depts.</p>

https://stackoverflow.com/q/31829458 0 Markus W https://stackoverflow.com/users/2148845 2015-08-05T10:10:05Z 2015-08-05T10:10:05Z
<p>I have a pandas multiindex dataframe with quarters 1-4 and hours 0-23 as the index. The data looks like this:</p>
<pre><code>quarter hour value1 value2 value3
1       0    0.06   0.47   0.50
1       1    0.65   0.04   0.65
1       2    0.58   0.10   0.60
1       3    0.51   0.07   0.17
...
4       20   0.82   0.17   0.96
4       21   0.08   0.98   0.09
4       22   0.73   0.43   0.73
4       23   0.99   0.85   0.42
</code></pre>
<p>How can I plot 4 line graphs as subplots in a 2x2 arrangement, with Q1 and Q4 on the top and Q2 and Q3 on the bottom?</p>
<p>I have been trying with</p>
<pre><code>f, ((ax1, ax4), (ax2, ax3)) = plt.subplots(2, 2, sharex='col', sharey='row')
ax1.plot(df.loc[1])
</code></pre>
<p>But it doesn't seem to work.</p>
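<p>A minimal sketch of one way the 2x2 arrangement asked about in the last question might be produced, assuming the index levels are named <code>quarter</code> and <code>hour</code> as in the printed table. The random stand-in data, the <code>layout</code> dictionary and the figure size are illustrative assumptions and do not come from the question. If the quarter is the first index level, <code>df.loc[1]</code> should select the same rows as the <code>xs</code> call used here.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: quarters 1-4 x hours 0-23, three value columns.
index = pd.MultiIndex.from_product([range(1, 5), range(24)],
                                   names=["quarter", "hour"])
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(len(index), 3), index=index,
                  columns=["value1", "value2", "value3"])

fig, axes = plt.subplots(2, 2, sharex="col", sharey="row", figsize=(10, 6))

# Requested layout: Q1 and Q4 on the top row, Q2 and Q3 on the bottom row.
layout = {1: axes[0, 0], 4: axes[0, 1], 2: axes[1, 0], 3: axes[1, 1]}

for quarter, ax in layout.items():
    # xs() selects one quarter and drops that level, leaving hour as the index.
    df.xs(quarter, level="quarter").plot(ax=ax, legend=False)
    ax.set_title("Q%d" % quarter)

# One shared legend built from the line handles of the first subplot.
handles, labels = axes[0, 0].get_legend_handles_labels()
fig.legend(handles, labels, loc="upper right")
```

<p>Passing <code>ax=ax</code> to the pandas <code>plot</code> call is what keeps each quarter on its own subplot; calling <code>plt.show()</code> or <code>fig.savefig(...)</code> afterwards displays or stores the result.</p>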