Error 'AttributeError: 'DataFrameGroupBy' object has no attribute' when using groupby on a dataframe

Question

I have a dataframe news_count. Here are its column names, from the output of news_count.columns.values:

 [('date', '') ('EBIX UW Equity', 'NEWS_SENTIMENT_DAILY_AVG') ('Date', '')
  ('day', '') ('month', '') ('year', '')]

I need to group by year and month and sum the values of 'NEWS_SENTIMENT_DAILY_AVG'. Below are the two attempts I tried, but neither works:

Attempt 1

news_count.groupby(['year','month']).NEWS_SENTIMENT_DAILY_AVG.values.sum()

'AttributeError: 'DataFrameGroupBy' object has no attribute'

Attempt 2

news_count.groupby(['year','month']).iloc[:,1].values.sum()

AttributeError: Cannot access callable attribute 'iloc' of 'DataFrameGroupBy' objects, try using the 'apply' method

Input data:

    ticker        date            EBIX UW Equity  month  year
    field               NEWS_SENTIMENT_DAILY_AVG
    0       2007-05-25                    0.3992      5  2007
    1       2007-11-06                    0.3936     11  2007
    2       2007-11-07                    0.2039     11  2007
    3       2009-01-14                    0.2881      1  2014

And did you try news_count.groupby(['year','month']).NEWS_SENTIMENT_DAILY_AVG.sum()?
– coldspeed95, Commented Oct 2, 2017 at 22:38
The problem is that it is not identifying the NEWS_SENTIMENT_DAILY_AVG column. Error message: AttributeError: 'DataFrameGroupBy' object has no attribute 'NEWS_SENTIMENT_DAILY_AVG'
– Arvinth Kumar, Commented Oct 2, 2017 at 22:50
I'm not sure I can, because I'm not 100% sure I understand the structure of your dataframe; those columns look bad. Try explicitly reassigning them: df.columns = ['date', 'avg', 'day', 'month', 'year', ...] and so on. If you can do that, please update your dataframe and try the suggestion from my first comment again.
– coldspeed95, Commented Oct 2, 2017 at 23:27
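
A minimal sketch of the column reassignment suggested in the last comment. It assumes the columns form a two-level MultiIndex, as the tuples printed by news_count.columns.values suggest, and rebuilds a small frame from the question's sample rows. The flattening rule (use the second-level label when it is non-empty, otherwise the first) is just one possible choice, not something confirmed in the thread.

```python
import pandas as pd

# Rebuild the sample rows from the question with two-level column labels
# (assumed layout, based on the tuples shown by news_count.columns.values).
columns = pd.MultiIndex.from_tuples([
    ('date', ''),
    ('EBIX UW Equity', 'NEWS_SENTIMENT_DAILY_AVG'),
    ('month', ''),
    ('year', ''),
])
news_count = pd.DataFrame(
    [
        ['2007-05-25', 0.3992, 5, 2007],
        ['2007-11-06', 0.3936, 11, 2007],
        ['2007-11-07', 0.2039, 11, 2007],
        ['2009-01-14', 0.2881, 1, 2014],
    ],
    columns=columns,
)

# Flatten each (top, bottom) pair to one string: keep the bottom label
# when it is non-empty, otherwise fall back to the top label.
news_count.columns = [bottom or top for top, bottom in news_count.columns]

# With plain string labels, the column can be selected after groupby.
monthly_sum = news_count.groupby(['year', 'month'])['NEWS_SENTIMENT_DAILY_AVG'].sum()
print(monthly_sum)
```

Once the labels are plain strings, both bracket selection and attribute access on the grouped object should find the column.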

Mehrdad Pedramfar · Accepted Answer · 2019-11-12 06:10:53Z

Extract the required columns from the dataframe into a news_count_res variable, then apply the aggregation function:

# assumes the columns have single-level names
news_count_res = news_count[['year','month','NEWS_SENTIMENT_DAILY_AVG']]
news_count_res.groupby(['year','month']).sum()

edited Nov 12, 2019 at 6:10 by Mehrdad Pedramfar
answered Nov 12, 2019 at 5:51 by SRG

1 Comment

P E Over a year ago

Thanks for this...but I'm getting "AttributeError: 'SeriesGroupBy' object has no attribute 'sample'" at "df_sample = df.groupby("persons").sample(frac=percentage_to_flag, random_state=random_state)". If I can figure out why, maybe it'll work for me...

P E · Accepted Answer · 2021-10-19 06:12:08Z

Thanks for the answers so far (I've commented there, as I haven't got those solutions to work; maybe I'm not understanding something). In the meantime, I've come up with another approach, which I still suspect isn't very Pythonic. It gets the job done and doesn't take too long for my purposes, but it would be great to figure out how to tweak the approaches suggested above to get them to work. Any thoughts are very welcome!

Here's what I've got:

    import math
    import numpy as np
    import pandas as pd

    y = ['Alex'] * 2321 + ['Doug'] * 34123 + ['Chuck'] * 2012 + ['Bob'] * 9281
    z = ['xyz'] * len(y)
    df = pd.DataFrame({'persons': y, 'data': z})
    percent = 10  # CHANGE AS NEEDED

    # add a 'helper' column with random numbers
    df['rand'] = np.random.random(df.shape[0])
    df = df.sample(frac=1)  # optional: this shuffles data, just to show order doesn't matter

    # CREATE A HELPER LIST of [person, row count] pairs
    helper = pd.DataFrame(df.groupby('persons')['rand'].count()).reset_index().values.tolist()
    for row in helper:
        df_temp = df[df['persons'] == row[0]][['persons', 'rand']]
        lim = math.ceil(len(df_temp) * percent * 0.01)
        # append the cutoff: the smallest of the top `lim` random numbers
        row.append(df_temp.nlargest(lim, 'rand')['rand'].iloc[-1])

    # flag a row 'yes' when its random number is at or above its person's cutoff
    def flag(name, num):
        for row in helper:
            if row[0] == name:
                if num >= row[2]:
                    return 'yes'
                else:
                    return 'no'

    df['flag'] = df.apply(lambda x: flag(x['persons'], x['rand']), axis=1)

And to check the results:

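piv = df.pivot_table(index="persons", columns="flag", values="data", aggfunc='count', fill_value=0)
# DataFrame.append was removed in pandas 2.0, so add the 'Total' row with pd.concat
total_row = piv.sum().rename('Total').to_frame().T
piv = pd.concat([piv, total_row]).rename_axis(index='persons')
piv = piv.assign(Total=lambda x: x.sum(axis=1))
piv['% selected'] = 100 * piv.yes / piv.Total
print(piv)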

OUTPUT:
flag        no   yes  Total  % selected
persons                                
Alex      2088   233   2321   10.038776
Bob       8352   929   9281   10.009697
Chuck     1810   202   2012   10.039761
Doug     30710  3413  34123   10.002051
Total    42960  4777  47737   10.006913

Seems to work with different %s and different numbers of persons...but it would be nice to make it simpler, I think.
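
One possibly simpler route, sketched on the assumption that pandas 1.1 or newer is available. GroupBy.sample was added in that release, which may be why older installs raise "'SeriesGroupBy' object has no attribute 'sample'". The sketch samples a fraction of rows within each person and flags them. Note that sample(frac=...) rounds the per-group count instead of taking the ceiling, so the counts can differ by one row per person from the output above. The variable names (sampled, counts) and the fixed random_state are illustrative.

```python
import pandas as pd

persons = ['Alex'] * 2321 + ['Doug'] * 34123 + ['Chuck'] * 2012 + ['Bob'] * 9281
df = pd.DataFrame({'persons': persons, 'data': 'xyz'})
percent = 10  # share of rows to flag within each person

# Draw `percent` percent of the rows inside every group (needs pandas >= 1.1).
sampled = df.groupby('persons').sample(frac=percent / 100, random_state=0)

# Mark the sampled rows by their original index labels.
df['flag'] = 'no'
df.loc[sampled.index, 'flag'] = 'yes'

# Per-person counts of flagged and unflagged rows.
counts = pd.crosstab(df['persons'], df['flag'])
print(counts)
```

This avoids the helper list and the row-wise apply, at the cost of requiring a newer pandas.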

Arco · Accepted Answer · 2023-11-26 12:28:46Z

df = df.groupby(['col1', 'col2'], as_index=False).agg({'value1': 'sum', 'value2': 'sum'})

news_count = news_count.groupby(['year', 'month'],as_index = False).agg({'NEWS_SENTIMENT_DAILY_AVG':'sum'})
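
To illustrate what as_index=False changes here, a small sketch using the question's sample values. It assumes the columns are already single-level strings; the frame below is hand-built for illustration, and by_index and as_columns are illustrative names. With the default as_index=True the group keys end up in the index, while as_index=False keeps them as ordinary columns.

```python
import pandas as pd

# Hand-built frame with flat column names (an assumption for this sketch).
news_count = pd.DataFrame({
    'year': [2007, 2007, 2007, 2014],
    'month': [5, 11, 11, 1],
    'NEWS_SENTIMENT_DAILY_AVG': [0.3992, 0.3936, 0.2039, 0.2881],
})

# Default: 'year' and 'month' become a two-level row index.
by_index = news_count.groupby(['year', 'month']).agg({'NEWS_SENTIMENT_DAILY_AVG': 'sum'})

# as_index=False: 'year' and 'month' stay as regular columns.
as_columns = news_count.groupby(['year', 'month'], as_index=False).agg(
    {'NEWS_SENTIMENT_DAILY_AVG': 'sum'}
)

print(by_index.index.names)
print(list(as_columns.columns))
```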

edited Nov 26, 2023 at 12:28 by Arco
answered May 19, 2023 at 15:20 by Manoj Nahak
