All Questions

0 votes
0 answers
36 views

Converting data into spaCy format "convert_to_spacy_format" in a named entity recognition model

Can somebody help me convert the data into spaCy format for an NER model? The dataset format is shown in the screenshot here (https://www.kaggle.com/datasets/naseralqaydeh/...
Rohit Gupta's user avatar
-5 votes
1 answer
108 views

How to Separate Text and Code in Python Strings?

I've encountered an issue in Python. I have a string that contains both a message and code, and I need to separate them and pass each to a different function. An example: text = """ Can ...
Callme-Milad's user avatar
0 votes
1 answer
61 views

Matching strings containing 'and' in different languages and ampersands

Suppose that in 2 different data frames df1, df2 I have 2 columns df1['film'] = pd.Series(['Beavis & Butthead', 'Bonnie e Clyde', 'Adam & Eve']) df2['film'] = pd.Series(['Beavis und Butthead', ...
Azamat Bagatov's user avatar
0 votes
1 answer
111 views

Shorten product title to a specific length using Python NLP libraries

I have a collection of products for which I need a specific product name shorter than 40 characters. My input product name is a string column longer than 40 characters per item, so I need to make this ...
jmed1987's user avatar
0 votes
0 answers
47 views

How to load all non-digit letters of a particular language, let's say Russian?

So, a function is returning some garbage as text output, and all the characters in that output are in Russian Cyrillic. To avoid this, I need to do some checks, for which a list of all characters in ...
Pixel_Bear's user avatar
1 vote
1 answer
101 views

Detect rows in a column containing only emojis in a data frame

How to detect rows in a column containing only emojis in a data frame? The rows containing text with emojis will not be considered. Given DF: content 😎🤘🏾 Wow Amazing!!! I am loving it😍😘 🤘🏾 ...
hxgx_0990's user avatar
1 vote
3 answers
4k views

How to highlight the differences between two strings in Python?

I want to highlight the differences between two strings in a colour using Python code. Example 1: sentence1 = "I'm enjoying the summer breeze on the beach while I do some pilates." sentence2 ...
Oliver's user avatar
0 votes
1 answer
52 views

Suggestions on the best way to work with NLP mixed with some numerical and categorical features

I'm working with a dataset of medicinal products across different countries, with each country having its own data source. As a result, the data is not always quite 'standardized' (for a lack ...
Pedro Domingues's user avatar
1 vote
1 answer
796 views

How to stop spaCy tokenizer from tokenizing words enclosed within brackets

I'm trying to make the spaCy tokenizer avoid certain words enclosed by brackets, like [intervention]. However, no matter what I try, I cannot get the right code to include a rule or an exception. ...
ignacioct's user avatar
1 vote
1 answer
109 views

How to create a column as a list of similar strings onto a new column?

I've been trying to get a new row in a pandas dataframe which encapsulates, as a list, all the similar strings into its original matching row. This is the original pandas dataframe: import pandas as ...
AlSub's user avatar
1 vote
1 answer
183 views

Find if one column of dataframe contains text from column of another dataframe and add a third column where there is a match

I am working on an NLP problem in Python. I have two dataframes - DF1 - Problem Region I have wrong product A I have excess payment A address problem B I have delayed delivery C DF2 - Key ...
puja's user avatar
2 votes
1 answer
80 views

How do you detect a keyword in a sentence, no matter the tense or form, in Python?

I am trying to use spaCy in Python to detect the word "grief" no matter the form, whether it is "I am grieving", "going through grief.", "I grieved over __", if ...
Alvino123's user avatar
3 votes
1 answer
127 views

Split string into segments according to the alphabet

I want to split the given string into alphabet segments that the string contains. So for example, if the following string is given: Los eventos automovilísticos comenzaron poco después de la ...
Sirojiddin Komolov's user avatar
0 votes
2 answers
142 views

AttributeError thrown saying list object has no attribute lower

I am working on an example where the training data and training labels are lists, but when I fit on them in my code, it throws an error. I guess the problem is with the text pre-processing class. Below is ...
Shivam Panchal's user avatar
1 vote
1 answer
308 views

Can I set a minimum and maximum number of words per sentence when splitting a text file using the NLTK tokenizer?

This is how I am converting a large text file into sentences: import nltk.data tokenizer = nltk.data.load("tokenizers/punkt/english.pickle") text_file = """ CHARACTER. EXPANSION....
greenm8rix's user avatar
