All Questions
216 questions
0
votes
0
answers
36
views
Converting data into spaCy format with "convert_to_spacy_format" in a named entity recognition model
Can somebody help me convert the data for an NER model into spaCy format?
The dataset format is shown in the screenshot here (https://www.kaggle.com/datasets/naseralqaydeh/...
-5
votes
1
answer
108
views
How to Separate Text and Code in Python Strings?
I've encountered an issue in Python. I have a string that contains both a message and code, and I need to separate them and pass each to a different function. An example:
text = """
Can ...
0
votes
1
answer
61
views
Matching strings containing 'and' in different languages and ampersands
Suppose that in 2 different data frames df1, df2 I have 2 columns
df1['film'] = pd.Series(['Beavis & Butthead', 'Bonnie e Clyde', 'Adam & Eve'])
df2['film'] = pd.Series(['Beavis und Butthead', ...
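A minimal sketch for the question above, assuming the titles differ only in the connector word. The connector list (`&`, `and`, `und`, `e`, `et`, `y`) and the helper name `normalize_title` are illustrative assumptions, not something given in the question.

```python
import re

# Assumed connector words; extend this list for other languages as needed.
CONNECTORS = re.compile(r"\s+(?:&|and|und|e|et|y)\s+", flags=re.IGNORECASE)

def normalize_title(title):
    """Replace a standalone connector with ' and ' and lowercase the title."""
    return CONNECTORS.sub(" and ", title).lower().strip()

print(normalize_title("Beavis & Butthead") == normalize_title("Beavis und Butthead"))
```

The normalized strings could then serve as a join key between the two data frames. Short connectors such as `e` or `y` may also match legitimate single-letter words in a title, so the list needs care.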
0
votes
1
answer
111
views
Shorten product title to a specific length using python nlp libraries
I have a collection of products for which I need a specific product name shorter than 40 characters. My input product name is a string column longer than 40 characters per item, so I need to make this ...
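As a baseline before reaching for an NLP library, plain word-boundary truncation can be sketched with the standard library's `textwrap.shorten`. This is not summarization: it only cuts at the last whole word that fits. The helper name `shorten_title` and the sample title are hypothetical.

```python
import textwrap

def shorten_title(title, max_len=39):
    """Cut a title at a word boundary so it stays shorter than 40 characters."""
    # placeholder="" suppresses the default " [...]" suffix.
    return textwrap.shorten(title, width=max_len, placeholder="")

# Hypothetical product name, longer than 40 characters.
title = "Stainless steel insulated water bottle with bamboo lid 750 ml"
print(shorten_title(title))
```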
0
votes
0
answers
47
views
How to load all non-digit letters of a particular language, let's say Russian?
So, a function is returning some garbage as text output, and all the characters in that output are in Russian Cyrillic. To avoid this, I need to do some checks, for which a list of all characters in ...
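One way to sketch such a check, assuming "Russian letters" means the 33-letter modern Russian alphabet in both cases: the basic block U+0410 to U+044F covers А to я, and Ё/ё sit outside it at U+0401 and U+0451. The names `RUSSIAN_LETTERS` and `is_russian_text` are illustrative.

```python
# А..я (U+0410-U+044F) plus Ё (U+0401) and ё (U+0451), which lie outside that range.
RUSSIAN_LETTERS = {chr(cp) for cp in range(0x0410, 0x0450)} | {"\u0401", "\u0451"}

def is_russian_text(text):
    """True if the text has at least one letter and every letter is Russian."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(ch in RUSSIAN_LETTERS for ch in letters)

print(len(RUSSIAN_LETTERS))
print(is_russian_text("Привет, мир!"))
```

Digits, punctuation, and whitespace are ignored by this check; only alphabetic characters are tested against the set.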
1
vote
1
answer
101
views
Detect rows in a column containing only emojis in a data frame
How to detect rows in a column containing only emojis in a data frame? The rows containing text with emojis will not be considered.
Given DF:
content
😎🤘🏾
Wow Amazing!!!
I am loving it😍😘
🤘🏾 ...
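A rough sketch of the check, assuming a regular expression over a few common emoji code point ranges is good enough. The ranges below are an approximation and not the full Unicode emoji set, and `is_emoji_only` is an illustrative name.

```python
import re

# Approximate emoji coverage (an assumption, not the complete Unicode list):
# pictographs U+1F300-U+1FAFF (includes skin-tone modifiers), symbols and
# dingbats U+2600-U+27BF, plus the zero-width joiner and variation selector-16.
EMOJI_ONLY = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF\u200d\ufe0f\s]+")
HAS_EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def is_emoji_only(text):
    """True if the text contains at least one emoji and nothing but emojis/whitespace."""
    return bool(EMOJI_ONLY.fullmatch(text)) and bool(HAS_EMOJI.search(text))

rows = ["😎🤘🏾", "Wow Amazing!!!", "I am loving it😍😘", "🤘🏾"]
print([is_emoji_only(row) for row in rows])
```

On a data frame, the same function could be applied to the column, for example with `df["content"].map(is_emoji_only)`, to get a boolean mask.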
1
vote
3
answers
4k
views
How to highlight the differences between two strings in Python?
I want to highlight the differences between two strings in a colour using Python code.
Example 1:
sentence1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
sentence2 ...
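A minimal sketch using the standard library's `difflib.SequenceMatcher` on word lists, colouring the words of the second string that differ from the first with ANSI escape codes (so the colour only shows in a terminal that supports them). The second sentence in the usage lines is hypothetical, since the excerpt is cut off; `highlight_diff` is an illustrative name.

```python
import difflib

RED, RESET = "\033[91m", "\033[0m"

def highlight_diff(old, new):
    """Return `new` with the words that differ from `old` wrapped in red."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    out = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        segment = new_words[j1:j2]  # empty for pure deletions
        if tag == "equal":
            out.extend(segment)
        else:
            out.extend(f"{RED}{word}{RESET}" for word in segment)
    return " ".join(out)

sentence1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
# Hypothetical second sentence for illustration only.
sentence2 = "I'm enjoying the summer breeze on the beach while I do some yoga."
print(highlight_diff(sentence1, sentence2))
```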
0
votes
1
answer
52
views
Suggestions on the best way to work with NLP mixed with some numerical and categorical features
I'm working with a dataset of medicinal products across different countries, with each country having its own data source. This results in the data not always being quite 'standardized' (for a lack ...
1
vote
1
answer
796
views
How to stop spaCy tokenizer from tokenizing words enclosed within brackets
I'm trying to make the spaCy tokenizer avoid certain words enclosed by brackets, like [intervention]. However, no matter what I try, I cannot get the right code to include a rule or an exception. ...
1
vote
1
answer
109
views
How to create a column as a list of similar strings onto a new column?
I've been trying to get a new column in a pandas dataframe which encapsulates, as a list, all the similar strings for its original matching row.
This is the original pandas dataframe:
import pandas as ...
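A minimal sketch of the grouping step on a plain list of strings, assuming "similar" means a `difflib` similarity ratio above a cutoff. The cutoff of 0.8, the sample values, and the name `similar_strings` are assumptions; the question's own dataframe is cut off in the excerpt.

```python
from difflib import get_close_matches

def similar_strings(values, cutoff=0.8):
    """Map each string to the list of other strings that closely match it."""
    result = {}
    for value in values:
        matches = get_close_matches(value, values, n=len(values), cutoff=cutoff)
        # Drop the string itself, which always matches with ratio 1.0.
        result[value] = [m for m in matches if m != value]
    return result

# Hypothetical sample values.
print(similar_strings(["apple", "apples", "banana"]))
```

The resulting lists could be written back as a new column, for example by mapping the dictionary over the original column.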
1
vote
1
answer
183
views
Find if one column of dataframe contains text from column of another dataframe and add a third column where there is a match
I am working on an NLP problem in Python. I have two dataframes -
DF1 -
Problem                 | Region
I have wrong product    | A
I have excess payment   | A
address problem         | B
I have delayed delivery | C
DF2 -
Key
...
2
votes
1
answer
80
views
How do you detect a keyword in a sentence regardless of tense or form in Python?
I am trying to use spaCy in Python to detect the word "grief" no matter the form, whether it is "I am grieving", "going through grief.", "I grieved over __", if ...
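As a fallback that does not use spaCy at all, the three forms quoted above share the stem "grie" followed by "f" or "v", so a regular expression can catch them. This is a pattern-matching sketch, not lemmatization; `mentions_grief` is an illustrative name.

```python
import re

# Matches grief, griefs, grieve, grieved, grieving, ...
# Caveat: it also matches unrelated words on the same stem, such as "grievance".
GRIEF_FORMS = re.compile(r"\bgrie(?:f|v)\w*", flags=re.IGNORECASE)

def mentions_grief(sentence):
    """True if the sentence contains a word starting with 'grief' or 'griev'."""
    return bool(GRIEF_FORMS.search(sentence))

for s in ["I am grieving", "going through grief.", "I grieved over it"]:
    print(s, mentions_grief(s))
```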
3
votes
1
answer
127
views
Split string into segments according to the alphabet
I want to split the given string into segments according to the alphabet each part is written in. So for example, if the following string is given:
Los eventos automovilísticos comenzaron poco después de la ...
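One possible sketch, assuming the first word of a character's Unicode name (for example `LATIN` or `CYRILLIC`) is an acceptable stand-in for its alphabet. Non-letter characters such as spaces and punctuation stay attached to the segment they follow. The function names and the sample string are illustrative.

```python
import unicodedata

def script_of(ch):
    """Return the first word of the Unicode name for letters, else None."""
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def split_by_script(text):
    """Split text into (script, segment) pairs at each change of alphabet."""
    segments = []
    current_script, buf = None, []
    for ch in text:
        script = script_of(ch)
        # A letter in a different script closes the current segment.
        if script is not None and current_script is not None and script != current_script:
            segments.append((current_script, "".join(buf)))
            buf = []
        if script is not None:
            current_script = script
        buf.append(ch)
    if buf:
        segments.append((current_script, "".join(buf)))
    return segments

# Hypothetical mixed-alphabet sample.
print(split_by_script("hello привет"))
```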
0
votes
2
answers
142
views
AttributeError saying 'list' object has no attribute 'lower'
I am working on an example where the training data and training labels are lists, but when I fit on them in my code, it throws an error. I guess the problem is with the text pre-processing class.
Below is ...
1
vote
1
answer
308
views
Can I set a minimum and maximum number of words per sentence when splitting a text file with the nltk tokenizer?
This is how I am converting a large text file into sentences:
import nltk.data
tokenizer = nltk.data.load("tokenizers/punkt/english.pickle")
text_file = """
CHARACTER. EXPANSION....
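A sketch of one post-processing approach, assuming the sentences have already been produced by a tokenizer (for instance the list returned by `tokenizer.tokenize(text_file)`): short sentences are merged with the following one, and over-long ones are cut at the word limit. The function name `regroup` and the default limits are assumptions, and the final chunk may still fall below the minimum.

```python
def regroup(sentences, min_words=3, max_words=20):
    """Merge sentences shorter than min_words and cut chunks longer than max_words."""
    chunks = []
    pending = []  # words carried over until the chunk is long enough
    for sentence in sentences:
        pending.extend(sentence.split())
        # Emit full-size chunks while the buffer exceeds the maximum.
        while len(pending) > max_words:
            chunks.append(" ".join(pending[:max_words]))
            pending = pending[max_words:]
        if len(pending) >= min_words:
            chunks.append(" ".join(pending))
            pending = []
    if pending:
        # Attach leftovers to the last chunk if that keeps it within the limit.
        if chunks and len(chunks[-1].split()) + len(pending) <= max_words:
            chunks[-1] += " " + " ".join(pending)
        else:
            chunks.append(" ".join(pending))
    return chunks

print(regroup(["Hi.", "How are you today?", "Fine."], min_words=3, max_words=5))
```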