All Questions

0 votes
1 answer
59 views

Catalog sentences into 5 words that represent them

I have a dataframe with 1000 text rows in df['text']. I also have 5 words, and for each one I want to know how much it represents the text (a score between 0 and 1). Every score will be in df["word1&...
rafine • 471
0 votes
1 answer
84 views

Counting the Frequency of Some Words within some other Key Words in Text

I have two sets of word lists - first one I called search words and the second one I called key words. My goal is to calculate the frequency of search words within 10 words of key words. For example, ...
Sharif • 391
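
The windowed count described in this question can be sketched in plain Python. This is only a minimal illustration, not the asker's code: the function name `count_near_keywords`, the regex tokenizer, and the choice to count each search-word occurrence once (even when several key words are nearby) are all assumptions.

```python
import re


def count_near_keywords(text, search_words, key_words, window=10):
    """Count search-word tokens that lie within `window` tokens of any key word.

    Assumption: each occurrence of a search word is counted at most once,
    no matter how many key words fall inside its window.
    """
    # Simple lowercase tokenizer (an assumption; swap in a real tokenizer if needed).
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    search = {w.lower() for w in search_words}
    keys = {w.lower() for w in key_words}

    # Positions of every key-word token in the text.
    key_positions = [i for i, tok in enumerate(tokens) if tok in keys]

    counts = {w: 0 for w in search}
    for i, tok in enumerate(tokens):
        if tok not in search:
            continue
        # A token does not count as being "near" itself.
        if any(k != i and abs(i - k) <= window for k in key_positions):
            counts[tok] += 1
    return counts


sample = ("The bank raised interest rates while inflation "
          "stayed high and the market fell")
near = count_near_keywords(sample, ["rates", "market"], ["bank"], window=5)
far = count_near_keywords(sample, ["rates", "market"], ["bank"], window=10)
```

With a window of 5 only "rates" is close enough to "bank"; widening the window to 10 also picks up "market". The scan is quadratic in the worst case, which is usually fine for short documents.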
-1 votes
2 answers
106 views

With spaCy, how can I get all lemmas from a string?

I have a pandas data frame with a column of text values (documents). I want to apply lemmatization on these values with the spaCy library using the pandas apply function. I've defined my to_lemma ...
Patrick • 2,347
3 votes
1 answer
49 views

Text summarization with deep learning

I'm fine-tuning the mT5 model on the Arabic part of the XL-Sum dataset for ten epochs, and the resulting model was stored in the Hugging Face library. There were good results on the training ...
Noor • 31
1 vote
2 answers
70 views

Identify starting row of actual data in Pandas DataFrame with merged header cells

My original df looks like this: df. Note that in the data frame the headers run through row 3, and the values for those headers start from row 4 onwards. The numbers of rows & columns ...
Debojit Roy
0 votes
0 answers
79 views

Named entity recognition (NER) task on a large dataset from a data frame column using chunking, and append the results to the original data frame

I want to perform a NER task on a column of a dataframe. The shape of the dataframe, from df.shape, is (1312, 12). Now the column I want to use is called the TEXT column for the ...
ARJ • 2,080
-1 votes
1 answer
181 views

Removing paywall language from piece of text (pandas) [closed]

I'm trying to do some preprocessing on my dataset. Specifically, I'm trying to remove paywall language from the text (in bold below) but I keep getting an empty string as my output. Here is the sample ...
Yves • 47
0 votes
2 answers
94 views

How to optimize the function which uses looping on lists on pandas dataframe?

I am using a function on a pandas dataframe as:

    import spacy
    from collections import Counter

    # Load English language model
    nlp = spacy.load("en_core_web_sm")

    # Function to filter out only ...
Atom Store • 1,016
0 votes
1 answer
61 views

Matching strings containing 'and' in different languages and ampersands

Suppose that in 2 different data frames df1, df2 I have 2 columns:

    df1['film'] = pd.Series(['Beavis & Butthead', 'Bonnie e Clyde', 'Adam & Eve'])
    df2['film'] = pd.Series(['Beavis und Butthead', ...
Azamat Bagatov
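
One possible approach to the matching problem in this question is to normalize every connector word to a single token before comparing titles. The sketch below is an assumption-laden illustration, not the accepted answer: the `CONNECTORS` set and the `normalize_title` helper are hypothetical, and the connector list would need extending for other languages.

```python
import re

# Hypothetical connector list ("and" in a few languages plus the ampersand).
CONNECTORS = {"&", "and", "und", "e", "et", "y"}


def normalize_title(title):
    """Lowercase a title and map interior connector tokens to 'and'.

    Assumption: a connector only counts when it sits between two other
    tokens, so a trailing 'Y' or leading 'E' is left untouched.
    """
    tokens = re.findall(r"&|\w+", title.lower())
    last = len(tokens) - 1
    normalized = [
        "and" if 0 < i < last and tok in CONNECTORS else tok
        for i, tok in enumerate(tokens)
    ]
    return " ".join(normalized)


left = normalize_title("Beavis & Butthead")
right = normalize_title("Beavis und Butthead")
```

Here both titles normalize to the same string, so they can be joined on the normalized value, for example after applying the helper to each column with `Series.map`. Single-letter connectors such as "e" and "y" can still collide with real words inside a title, so the result is worth spot-checking.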
0 votes
0 answers
57 views

Is there a faster method to process pandas list of string values

There are approximately 13000 values in a given column. The function below takes a list of strings as input and does NER tagging for each word in the list. On average there ...
srinivas muralidharan
1 vote
1 answer
49 views

Error in unit testing on pre-processing raw data

    import pandas as pd
    import spacy
    from spacy.lang.en.stop_words import STOP_WORDS
    import nltk

    nlp = spacy.load("en_core_web_md")

    class fileread:
        def readfile(self):
            file_path = '...
vinamrata
0 votes
1 answer
102 views

Keras ValueError: cannot reshape array of size

I'm facing an error which I can't understand using Keras for a prediction task. Here is my code:

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    from keras.models ...
sobhan soleimani
0 votes
1 answer
296 views

NLP preprocessing text in Data Frame, what is the correct order?

I'm trying to preprocess a data frame with two columns, called "title" and "body". Each cell contains a string. Based on this article I tried to reproduce the preprocessing. ...
Louis • 341
0 votes
1 answer
131 views

NLP pre-processing on two columns in data frame gives error

I have the following data frame: gmeDateDf.head(2)

title | score | id | url | comms_num | body | timestamp
It's not about the money, it's about sending a... | 55.0 | l6ulcx | https://v.redd.it/6j75regs72e61 | 6.0 | NaN | ...
Louis • 341
1 vote
1 answer
50 views

Using a Word Counter in Python is understating results

As a complete preface, I am a beginner and learning. But, here's the sample schema of my products review table. Record_ID Product_ID Review Comment 1234 89847457 I love this product it was shipped ...
user14452102
