All Questions
1,765 questions
0
votes
0
answers
44
views
Count of Combination of bigrams
I have created a dataset as follows using bigrams
index | product_action
-------------------------------------------------------|
('customer', 'called') | action
('customer', '...
1
vote
1
answer
69
views
How do I remove escape characters from output of nltk.word_tokenize?
How do I get rid of non-printing (escaped) characters from the output of the nltk.word_tokenize method? I am working through the book 'Natural Language Processing with Python' and am following the ...
1
vote
0
answers
28
views
I am getting an error while running this line of code: gnb.fit(df_train, y_train)
Title: ValueError: could not convert string to float when training GaussianNB for SMS Spam Detection
Body:
I'm building an SMS spam detection tool and encountering an error while predicting with a ...
0
votes
0
answers
77
views
Issues with nltk's ne_chunk
I have been trying to use nltk's entity chunker, and tried different approaches but I keep getting the error:
LookupError Traceback (most recent call last)
...
...
1
vote
1
answer
55
views
Getting all leaf words (reverse stemming) into one Python List
On the same lines as the solution provided in this link, I am trying to get all leaf words of one stem word. I am using the community-contributed (@Divyanshu Srivastava) package get_word_forms
Imagine ...
0
votes
2
answers
74
views
Determining most popular words in the English dictionary within a dictionary of words
Forgive me if my wording is awful, but I'm trying to figure out how to determine the most used words in the English language from a set of words in a dictionary I've made. I've done some research on ...
1
vote
1
answer
107
views
How to extract specific entities from unstructured text
Given a generic text sentence (in a specific context), how can I extract words/entities of interest belonging to a specific "category" using Python and any NLP library?
For example given a ...
1
vote
0
answers
47
views
How to parse multiple chunks in nltk?
Is it possible to parse multiple chunks in a single nltk.regexp parser?
Can a grammar have multiple chunks defined like this?
def parser(s):
grammar = """
NP: {<DT>?<JJ>...
7
votes
2
answers
11k
views
Unable to use nltk functions
I was trying to run some nltk functions on the UCI spam message dataset but ran into this problem of word_tokenize not working even after downloading dependencies.
import nltk
nltk.download('punkt')
...
0
votes
0
answers
59
views
Why isn't tf.keras.layers.TextVectorization accepting standardization=None?
I'm still trying to get this to work (and to learn!) so I am using a tiny corpus.
I do some preprocessing on the text in order to get specific bi-gram collocations using nltk (not relevant here but I ...
1
vote
0
answers
1k
views
How do I install the nltk library's "averaged_perceptron_tagger" on railway server?
Hi, I am building an API with Django REST Framework for generating a PowerPoint slide using the python-pptx package. I'm also using the NLTK (Natural Language Toolkit) library to process text by tokenizing and ...
0
votes
1
answer
47
views
How to optimize this function and improve running time?
I have a function aimed at creating a data frame with three columns: bigram phrase, count (of the bigram phrase), and PMI score (for the bigram phrase). Since I want to run this on a large dataset with ...
0
votes
2
answers
330
views
Extracting only technical keywords from a text using RAKE library in Python
I want to use RAKE to extract technical keywords from a job description that I've found on LinkedIn, which looks like this:
input = "In-depth understanding of the Python software development ...
1
vote
1
answer
43
views
How can I get the first content of a Python synsets list?
I have scraped text stored under the variable "message".
I have removed the stop words and stored the result in the variable "without_stop_words".
I ...
-1
votes
1
answer
181
views
Removing paywall language from a piece of text (pandas) [closed]
I'm trying to do some preprocessing on my dataset. Specifically, I'm trying to remove paywall language from the text (in bold below) but I keep getting an empty string as my output.
Here is the sample ...