All Questions

0 votes
0 answers
78 views

Creating regular expression(s) that find capitalization errors

This is a Sentence which contains Some capitalization errors. So far I have this: (?<![.!?]\s)(?<!^)(?<!\sI\s)(?!I['’][a-z])(?!\b(?:Dr|Mr|Mrs)\.[\s\r\n])\b(?!I\b)[A-Z]\w* It will find "...
Stan Duncan
-5 votes
1 answer
108 views

How to Separate Text and Code in Python Strings?

I've encountered an issue in Python. I have a string that contains both a message and code, and I need to separate them and pass each to different functions. An example: text = """ Can ...
Callme-Milad
1 vote
0 answers
47 views

How to parse multiple chunks in nltk?

Is it possible to parse multiple chunks in a single nltk.regexp parser? Can a grammar have multiple chunks defined like this? def parser(s): grammar = """ NP: {<DT>?<JJ>...
konto
  • 11
1 vote
1 answer
80 views

Extracting dates from a sentence in spaCy

I have a string like so: "The dates are from 30 June 2019 to 1 January 2022 inclusive" I want to extract the dates from this string using spaCy. Here is my function so far: def ...
Muhammad Kamil
1 vote
1 answer
101 views

Detect rows in a column containing only emojis in a data frame

How to detect rows in a column containing only emojis in a data frame? The rows containing text with emojis will not be considered. Given DF: content 😎🤘🏾 Wow Amazing!!! I am loving it😍😘 🤘🏾 ...
hxgx_0990
0 votes
1 answer
644 views

How to use spaCy Matcher to create a pattern for rule-based matching for a sequence that is only interpreted as a single token

I am new to NLP and spaCy but I am using it for my project. I am trying to use spaCy's Matcher class to create a pattern to extract information from clinical summaries, specifically mentions of IQ ...
Tarran
  • 11
2 votes
1 answer
61 views

In R Str_count: Counting occurrences of words at a certain distance e.g. 1 to 30 words apart

In a text document, I want to count the instances when uncertainty|unclear has occurred at a distance of 1 to 30 words from global|decrease in demand|fall in demand. However, my code as below seems to ...
Mohsin
  • 39
1 vote
1 answer
56 views

Regex code to identify Keyword followed by complex pattern (from variable human inputs)

I am working on an NLP project with data that requires some cleaning of PII. I have dates and names mostly taken care of using spaCy NER, but I need to find instances of (case insensitive) Room followed ...
ClaytonSummitt
2 votes
1 answer
22 views

Can't get the text separated by words when I'm doing data cleaning in NLP

I'm trying to do an NLP exercise on Kaggle. When I clean the text that I have to use to predict the output, I can't get it separated into words; instead I get one ...
Francisco Vives
0 votes
1 answer
59 views

Regex to detect words based on the words Action, Object, Subject, etc. in the middle of a text

I have the following text and I would like to detect the words after the subject, action and capabilities using regular expressions: For this text: T1 Subject num num xxx T2 Action num num xxx A1 ...
John Angelopoulos
0 votes
1 answer
101 views

Identify abbreviations in a string column

Given the following data frame, for instance (mind you, the original data for this column is a dtype('O')) df = pd.DataFrame({'product_description': ["CUTLERY HVY DUTY FORKS", "XYZ DISP ...
Adelore Similoluwa Gloria
-1 votes
2 answers
83 views

How to count specific keywords in a transcript with a condition

I have a big data frame with a "Transcript" column of conversations between a bot and a user. I need to count how many times in the transcript the user is asking for an agent/representative before giving the ...
Tal1992
  • 73
0 votes
1 answer
52 views

Suggestions on the best way to work with NLP mixed with some numerical and categorical features

I'm working with a dataset of medicinal products across different countries, with each country having its own data source. This results in the data not always being quite 'standardized' (for a lack ...
Pedro Domingues
2 votes
0 answers
251 views

Extract tables from text using regular expression

I want to extract a table from the example text below, but my output is not correct. My sample text, code, current output and expected output are as follows. Any help is appreciated. dic = {'ID': ':...
ella
  • 201
0 votes
0 answers
53 views

Extraction of tables and their preceding words

I have a list of words (it is case-sensitive), and each word appears in a text and is sometimes followed by a table. I extracted the text using pypdf2. How to pull pairs of each table and given word ...
ella
  • 201
