All Questions
599 questions
0 votes, 0 answers, 78 views
Creating regular expression(s) that find capitalization errors
This is a Sentence which contains
Some capitalization errors.
So far I have this: (?<![.!?]\s)(?<!^)(?<!\sI\s)(?!I['’][a-z])(?!\b(?:Dr|Mr|Mrs)\.[\s\r\n])\b(?!I\b)[A-Z]\w*
It will find "...
-5 votes, 1 answer, 108 views
How to Separate Text and Code in Python Strings?
I've run into an issue in Python. I have a string that contains both a message and code, and I need to separate them and pass each to a different function. An example:
text = """
Can ...
1 vote, 0 answers, 47 views
How to parse multiple chunks in NLTK?
Is it possible to parse multiple chunks with a single nltk.RegexpParser?
Can a grammar define multiple chunks like this?
def parser(s):
    grammar = """
NP: {<DT>?<JJ>...
1 vote, 1 answer, 80 views
Extracting dates from a sentence in spaCy
I have a string like so:
"The dates are from 30 June 2019 to 1 January 2022 inclusive"
I want to extract the dates from this string using spaCy.
Here is my function so far:
def ...
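The asker's spaCy function is cut off above, so it is not reproduced here. As a rough illustration of the goal only, the sketch below pulls "day Month year" dates out of the example sentence with a plain regular expression instead of spaCy. The names `MONTHS`, `DATE_RE` and `extract_dates` are made up for this example, and the pattern assumes English month names written in full.

```python
import re
from datetime import date

# Hypothetical helper names; assumes dates look like "30 June 2019".
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November", "December"],
        start=1,
    )
}

# One group each for day, month name and four-digit year.
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def extract_dates(text):
    """Return a list of datetime.date objects found in text, in order."""
    return [
        date(int(year), MONTHS[month], int(day))
        for day, month, year in DATE_RE.findall(text)
    ]


sentence = "The dates are from 30 June 2019 to 1 January 2022 inclusive"
dates = extract_dates(sentence)
```

Note that an impossible date such as "31 February 2020" would raise a ValueError in this sketch; whether to skip or report such matches depends on the data.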
1 vote, 1 answer, 101 views
Detect rows in a column containing only emojis in a data frame
How can I detect rows in a data frame column that contain only emojis? Rows that mix text with emojis should not be counted.
Given DF:
content
😎🤘🏾
Wow Amazing!!!
I am loving it😍😘
🤘🏾 ...
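One possible way to approach this, sketched with the standard library only: test each value against a character class of common emoji code point ranges. The ranges below are an approximation chosen for this example, not a complete definition of "emoji", and the names `EMOJI_ONLY` and `is_emoji_only` are hypothetical. Whitespace between emojis is allowed here, which is also an assumption.

```python
import re

# Approximate emoji ranges (assumption, not exhaustive):
#   U+1F300..U+1FAFF  pictographs, emoticons, skin-tone modifiers
#   U+2600..U+27BF    miscellaneous symbols and dingbats
#   U+FE0F, U+200D    variation selector and zero-width joiner
EMOJI_ONLY = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D\\s]+")


def is_emoji_only(text):
    """True if text is non-empty and made up solely of emoji characters."""
    stripped = text.strip()
    return bool(stripped) and EMOJI_ONLY.fullmatch(stripped) is not None


rows = ["😎🤘🏾", "Wow Amazing!!!", "I am loving it😍😘"]
flags = [is_emoji_only(row) for row in rows]
```

With pandas, the same function could be applied to the column with something like `df["content"].map(is_emoji_only)`, assuming the column holds strings.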
0 votes, 1 answer, 644 views
How to use spaCy Matcher to create a pattern for rule-based matching for a sequence that is only interpreted as a single token
I am new to NLP and spaCy, but I am using spaCy for my project. I am trying to use its Matcher class to create a pattern that extracts information from clinical summaries, specifically mentions of IQ ...
2 votes, 1 answer, 61 views
str_count in R: counting occurrences of words within a certain distance, e.g. 1 to 30 words apart
In a text document, I want to count the instances where uncertainty|unclear occurs within 1 to 30 words of global|decrease in demand|fall in demand. However, my code as below seems to ...
1 vote, 1 answer, 56 views
Regex to identify a keyword followed by a complex pattern (from variable human inputs)
I am working on an NLP project with data that requires some cleaning of PII. I have dates and names mostly taken care of using spaCy NER, but I need to find instances of (case insensitive) Room followed ...
2 votes, 1 answer, 22 views
Can't get the text split into words when doing data cleaning in NLP
I'm working on an NLP exercise on Kaggle. When I clean the text that I have to use to predict the output, I can't get it split into words; instead I get one ...
0 votes, 1 answer, 59 views
Regex to detect words based on the words Action, Object, Subject, etc. in the middle of a text
I have the following text and I would like to detect the words after the subject, action and capabilities using regular expressions:
For this text:
T1 Subject num num xxx
T2 Action num num xxx
A1 ...
0 votes, 1 answer, 101 views
Identify abbreviations in a string column
Given the following data frame, for instance (note that the original data for this column has dtype('O')):
df = pd.DataFrame({'product_description': ["CUTLERY HVY DUTY FORKS", "XYZ DISP ...
-1 votes, 2 answers, 83 views
How to count specific keywords in a transcript with a condition
I have a large data frame with a "Transcript" column that holds conversations between a bot and a user.
I need to count how many times in the transcript the user asks for an agent/representative before giving the ...
0 votes, 1 answer, 52 views
Suggestions on the best way to work with NLP mixed with some numerical and categorical features
I'm working with a dataset of medicinal products across different countries, with each country having its own data source. This results in the data not always being quite 'standardized' (for a lack ...
2 votes, 0 answers, 251 views
Extract tables from text using regular expression
I want to extract a table from the example text below, but my output is not correct. My sample text, code, current output and expected output are as follows. Any help is appreciated.
dic = {'ID': ':...
0 votes, 0 answers, 53 views
Extraction of tables and their preceding words
I have a list of words (case-sensitive), and each word appears in a text where it is sometimes followed by a table. I extracted the text using PyPDF2.
How to pull pairs of each table and given word ...