All Questions
693 questions
0 votes, 1 answer, 59 views
Catalog sentences into 5 words that represent them
I have a dataframe with 1000 text rows, df['text'].
I also have 5 words, and for each of them I want to know how well it represents the text (a score between 0 and 1).
Every score will be in df["word1&...
0 votes, 1 answer, 84 views
Counting the Frequency of Some Words within some other Key Words in Text
I have two word lists: the first I call search words and the second I call key words. My goal is to calculate the frequency of search words within 10 words of key words. For example, ...
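One way to read the goal described in this excerpt is as a windowed count over tokens. The sketch below is only an illustration under stated assumptions: the function name `count_near_keywords`, the simple `\w+` tokenizer, and the choice to count each search-word occurrence once (even if several key words are nearby) are all hypothetical and do not come from the question.

```python
import re

def count_near_keywords(text, search_words, key_words, window=10):
    """Count search-word occurrences within `window` tokens of any key word.

    Assumptions (not from the question): lowercase matching, a plain
    regex tokenizer, and each occurrence counted at most once.
    """
    tokens = re.findall(r"\w+", text.lower())
    search = {w.lower() for w in search_words}
    keys = {w.lower() for w in key_words}

    # Positions of every key word in the token stream.
    key_positions = [i for i, tok in enumerate(tokens) if tok in keys]

    counts = {w: 0 for w in search}
    for i, tok in enumerate(tokens):
        if tok not in search:
            continue
        # A hit needs a *different* token position holding a key word nearby.
        if any(k != i and abs(i - k) <= window for k in key_positions):
            counts[tok] += 1
    return counts
```

For a dataframe column, the same function could be applied row by row, for example with `df['text'].apply(...)`; for long documents a sliding-window index would likely scale better than the pairwise position check used here.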
-1 votes, 2 answers, 106 views
With spaCy, how can I get all lemmas from a string?
I have a pandas data frame with a column of text values (documents). I want to apply lemmatization on these values with the spaCy library using the pandas apply function. I've defined my to_lemma ...
3 votes, 1 answer, 49 views
Text summarization with deep learning
I'm fine-tuning the mT5 model on the Arabic part of the XL-Sum dataset for ten epochs, and the resulting model was stored in the Hugging Face library. There were good results on the training ...
1 vote, 2 answers, 70 views
Identify starting row of actual data in Pandas DataFrame with merged header cells
My original df looks like this:
df
Note, in the data frame:
The headers run through row 3, and from row 4 onward the values for those headers begin.
The number of rows & columns ...
0 votes, 0 answers, 79 views
Named entity recognition (NER) task on a large dataset from a data frame column using chunking, and append the results to the original data frame
I want to perform a NER task on a column of a dataframe. The shape of the dataframe is:
import pandas
df.shape
(1312, 12)
Now the column I wanted to use is called the TEXT column for the ...
-1 votes, 1 answer, 181 views
Removing paywall language from a piece of text (pandas) [closed]
I'm trying to do some preprocessing on my dataset. Specifically, I'm trying to remove paywall language from the text (in bold below) but I keep getting an empty string as my output.
Here is the sample ...
0 votes, 2 answers, 94 views
How to optimize a function that loops over lists in a pandas dataframe?
I am using a function on a pandas dataframe as follows:
import spacy
from collections import Counter
# Load English language model
nlp = spacy.load("en_core_web_sm")
# Function to filter out only ...
0 votes, 1 answer, 61 views
Matching strings containing 'and' in different languages and ampersands
Suppose that in 2 different data frames df1, df2 I have 2 columns
df1['film'] = pd.Series(['Beavis & Butthead', 'Bonnie e Clyde', 'Adam & Eve'])
df2['film'] = pd.Series(['Beavis und Butthead', ...
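A possible direction for this kind of matching is to normalize every conjunction to a single token before comparing titles. The sketch below is a hedged illustration only: the `AND_WORDS` list and the `normalize_title` helper are hypothetical names, the set of languages is an assumption, and short conjunctions such as "e" or "y" can produce false matches in titles where they are not conjunctions.

```python
import re

# Hypothetical list of conjunction spellings treated as equivalent to "&".
AND_WORDS = ("and", "und", "e", "et", "y")

# Match an ampersand or one of the conjunctions when surrounded by whitespace.
_AND_RE = re.compile(
    r"\s+(?:&|" + "|".join(map(re.escape, AND_WORDS)) + r")\s+",
    re.IGNORECASE,
)

def normalize_title(title):
    """Return a lowercase title with every conjunction rewritten as ' & '."""
    return _AND_RE.sub(" & ", title).strip().lower()
```

With pandas, the normalized form could be stored in a helper column, for example `df1['film'].map(normalize_title)`, and the two frames compared or merged on that column.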
0 votes, 0 answers, 57 views
Is there a faster method to process a pandas list of string values?
There are approximately 13,000 values in a given column. The function below takes a list of strings as input and does NER tagging for each word in the list. On average there ...
1 vote, 1 answer, 49 views
Error in unit testing on pre-processing raw data
import pandas as pd
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
import nltk
nlp = spacy.load("en_core_web_md")
class fileread:
    def readfile(self):
        file_path = '...
0 votes, 1 answer, 102 views
Keras ValueError: cannot reshape array of size
I'm facing an error which I can't understand using Keras for a prediction task.
Here is my code:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models ...
0 votes, 1 answer, 296 views
NLP preprocessing text in Data Frame, what is the correct order?
I’m trying to preprocess a data frame with two columns, called "title" and "body". Each cell contains a string.
Based on this article I tried to reproduce the preprocessing. ...
0 votes, 1 answer, 131 views
NLP pre-processing on two columns in data frame gives error
I have the following data frame:
gmeDateDf.head(2)
title | score | id | url | comms_num | body | timestamp
It's not about the money, it's about sending a... | 55.0 | l6ulcx | https://v.redd.it/6j75regs72e61 | 6.0 | NaN | ...
1 vote, 1 answer, 50 views
Using a Word Counter in Python is understating results
As a preface, I am a beginner and still learning. Here's a sample schema of my product reviews table.
Record_ID | Product_ID | Review Comment
1234 | 89847457 | I love this product it was shipped ...