KeyError: Selecting text from a dataframe based on values of another dataframe

Question

I have the following two dataframes, badges and comments. From the badges dataframe I have created a list of 'gold users', i.e. users holding a badge whose Class is 1.

Here, Name is the name of the badge and Class is the badge level (1 = Gold, 2 = Silver, 3 = Bronze).

I have already done the text preprocessing on comments['Text'] and now want to find the counts of the top 10 words in comments['Text'] for gold users.
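For reference, a minimal sketch of how such a gold-user list might be built, assuming badges has the UserId and Class columns shown below. The rows are hypothetical, and the astype(int) cast is an assumption that covers a Class column stored as strings. unique() keeps each gold user once even if they hold several gold badges.

```python
import pandas as pd

# Hypothetical rows shaped like the badges table; Class is stored as strings here
badges = pd.DataFrame({
    "UserId": [23, 22, 23, 5, 5, 7],
    "Class": ["3", "3", "1", "1", "1", "1"],
})

# Class: 1 = Gold, 2 = Silver, 3 = Bronze. Casting to int makes the comparison
# work whether the column holds strings or integers.
is_gold = badges["Class"].astype(int) == 1

# unique() keeps each gold user once, even if they hold several gold badges
gold_users = badges.loc[is_gold, "UserId"].unique().tolist()
print(gold_users)  # [23, 5, 7]
```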

I tried the code below, but I am getting this error:

KeyError: "None of [Index(['1532', '290', '1946', '1459', '6094', '766', '10446', '3106', '1',
       '1587',
       ...
       '35760', '45979', '113061', '35306', '104330', '40739', '4181', '58888',
       '2833', '58158'],
      dtype='object', length=1708)] are in the [index]"

Please suggest a way to fix this.

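A minimal reproduction of this error, assuming (as dtype='object' in the message suggests) that gold_users holds UserIds as strings. Series.loc looks up row labels, and comments has a default integer index, so none of the UserIds match. Filtering on the UserId column with isin avoids this, but only once both sides share a dtype; astype(str) here is one assumed way to align them. The two-row frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame shaped like the comments table (default 0..n-1 row index)
comments = pd.DataFrame({
    "UserId": [3, 1],
    "Text": [["course", "allen"], ["also", "theo"]],
})
gold_users = ["1"]  # strings, matching dtype='object' in the error message

# .loc looks up *row labels* (0, 1), not UserId values, so this raises KeyError
try:
    comments.Text.loc[gold_users]
except KeyError as err:
    print("KeyError:", err)

# Select rows by the UserId column instead, aligning dtypes first
mask = comments["UserId"].astype(str).isin(gold_users)
print(comments.loc[mask, "Text"].tolist())  # [['also', 'theo']]
```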
Dataframe 1 (badges)

   Id | UserId | Name           | Date                    | Class | TagBased
   2  | 23     | Autobiographer | 2016-01-12T18:44:49.267 | 3     | False
   3  | 22     | Autobiographer | 2016-01-12T18:44:49.267 | 3     | False
   4  | 21     | Autobiographer | 2016-01-12T18:44:49.267 | 3     | False
   5  | 20     | Autobiographer | 2016-01-12T18:44:49.267 | 3     | False
   6  | 19     | Autobiographer | 2016-01-12T18:44:49.267 | 3     | False

Dataframe 2 (comments)

   Id | Text                                               | UserId
   6  | [2006, course, allen, knutsons, 2001, course, ...  | 3
   8  | [also, theo, johnsonfreyd, note, mark, haimans...  | 1

Code

for index,rows in comments.iterrows():
  gold_comments = rows[comments.Text.loc[gold_users]]
  Counter(gold_comments)
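A sketch of a loop-free replacement for the code above, assuming Text holds token lists (as in the comments table) and gold_users has the same dtype as comments['UserId']. It filters rows on the UserId column with isin instead of .loc, flattens the lists with itertools.chain.from_iterable, and counts with collections.Counter. The sample data is hypothetical.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical sample: Text already holds preprocessed token lists
comments = pd.DataFrame({
    "UserId": [3, 1, 1, 4],
    "Text": [
        ["course", "allen", "course"],
        ["note", "mark", "note"],
        ["note", "course"],
        ["theo"],
    ],
})
gold_users = [1, 4]  # assumed to share the dtype of comments["UserId"]

# Keep only comments written by gold users (filter on the column, not .loc labels)
gold_comments = comments.loc[comments["UserId"].isin(gold_users), "Text"]

# Flatten all token lists into one stream and count every occurrence
word_counts = Counter(chain.from_iterable(gold_comments))
print(word_counts.most_common(10))
```

Counting over the flattened stream gives one total per word across all gold users' comments, which is what most_common(10) needs.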

Deepak · Accepted Answer · 2020-07-03 15:32:27Z


Consider this simple example and adapt it to your problem. I have a dataset of quotes about animals and fruits, and I need to find the most frequent words in each category. CountVectorizer is useful here.

Consider data:

Code Snippet:

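from sklearn.feature_extraction.text import CountVectorizer

def return_word_count_segment_wise(data, segment):
    # Keep only the rows whose 'Type' matches the requested segment
    texts = data.loc[data['Type'] == segment, 'description']

    # max_features=5 keeps the 5 most frequent words across these rows
    count_vec = CountVectorizer(max_features=5)
    counts = count_vec.fit_transform(texts)  # document-term matrix

    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    feature_list = count_vec.get_feature_names_out()
    count_list = counts.toarray().sum(axis=0).tolist()  # total count per word
    return dict(zip(feature_list, count_list))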


return_word_count_segment_wise(data, 'Animal')

Outputs: {'cats': 3, 'is': 2, 'love': 4, 'my': 4, 'than': 3}

return_word_count_segment_wise(data, 'Fruits')

Outputs: {'fruit': 8, 'of': 5, 'that': 2, 'the': 3, 'we': 3}

Answering the question asked in the comments:

Try merging both dataframes, and then call the function, selecting the user segment by badge Class (1/2/3):

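import pandas as pd

# One row per (UserId, Class) so a user's comments are not repeated per badge
merged_df = pd.merge(badges[['UserId', 'Class']].drop_duplicates(), comments, on='UserId')
merged_df['Type'] = merged_df['Class'].astype(int)          # Class may be stored as strings
merged_df['description'] = merged_df['Text'].str.join(' ')  # assumes Text holds token lists

return_word_count_segment_wise(merged_df, 1)  # Most frequent words for the Gold class
return_word_count_segment_wise(merged_df, 2)  # Most frequent words for the Silver class
return_word_count_segment_wise(merged_df, 3)  # Most frequent words for the Bronze class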
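One caveat with merging, sketched on hypothetical frames: badges has one row per badge, so a plain merge on UserId repeats each comment once per matching badge row, which inflates word counts. Deduplicating (UserId, Class) before merging keeps one copy of each comment per class; a user holding badges of several classes still appears in each of those segments.

```python
import pandas as pd

# Hypothetical data: user 7 holds two gold badges and one bronze badge
badges = pd.DataFrame({"UserId": [7, 7, 7, 9], "Class": [1, 1, 3, 1]})
comments = pd.DataFrame({"UserId": [7, 9], "Text": [["nice", "proof"], ["thanks"]]})

# A plain merge repeats user 7's comment once per badge row
naive = pd.merge(badges, comments, on="UserId")
print(len(naive))  # 4

# Deduplicating (UserId, Class) first keeps one copy per class
dedup = pd.merge(badges[["UserId", "Class"]].drop_duplicates(), comments, on="UserId")
gold_rows = dedup[dedup["Class"] == 1]
print(len(gold_rows))  # 2
```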

If you can't merge, you can instead filter the comments dataframe using the badges dataframe:

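gold_ids = badges.loc[badges['Class'].astype(int) == 1, 'UserId']  # Class may be strings
to_check = comments[comments['UserId'].isin(gold_ids)].assign(
    Type=1, description=lambda d: d['Text'].str.join(' '))  # columns the function expects
return_word_count_segment_wise(to_check, 1)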

edited Jul 3, 2020 at 15:32

answered Jul 3, 2020 at 15:09

Deepak


I am quite new to NLP. Can you please write the code based on my data? In my case I have two dataframes, but yours is based on just one. This is a little too complicated for me to write on my own.

– Ishan Dutta, commented Jul 3, 2020 at 15:13

You can merge both dataframes using pd.merge() so that you have a single dataframe, and then you should be able to dig deeper. Let me know if anything is unclear. I have updated the answer based on your question.

– Deepak, commented Jul 3, 2020 at 15:26

I tried the first approach of merging the dataframes, but it did not process. I tried the second method ("to_check"), but I get an error: KeyError: 'Type'. I should also mention that the Class column holds values of object dtype, not integers.

– Ishan Dutta, commented Jul 4, 2020 at 12:00

@Ishan, let's catch up later today in Stack Overflow chat; we can discuss and resolve it there. What do you say?

– Deepak, commented Jul 4, 2020 at 13:00

That would be great.

– Ishan Dutta, commented Jul 4, 2020 at 13:02
