199 questions
Advice
0
votes
0
replies
37
views
Looking for a model that can robustly segment handwritten text lines (curved, overlapping, camera-captured images)
I’m currently working on extracting / segmenting text lines from handwritten documents. Most of the input images are camera-captured, which introduces several challenges:
Lines may be curved or ...
1
vote
2
answers
232
views
Benepar for syntactic segmentation
I want to use Benepar with a French model to do a syntactic segmentation.
I followed the tutorial, but I always get this error:
RuntimeError: Error(s) in loading state_dict for ChartParser:
...
0
votes
1
answer
216
views
Text segmenting, line segmentation, from line to words, from words to characters with Python & OpenCV
Given an image of text, I need to break the text into segments using the OpenCV library.
Say the image has 4 lines of text; I need to write a function that breaks down and cuts the lines and ...
3
votes
0
answers
95
views
How to combine icu4x word segmenter with additional dictionary
The icu4x icu_segmenter::WordSegmenter seems like the best word segmenter out there.
I don't understand how data providers work with word segmentation at all. It seems very complicated to me and I ...
1
vote
1
answer
721
views
How to count the number of "words" in Chinese/Japanese content in JavaScript
I'm trying to write a method to count the number of words when the content is in Chinese or Japanese. This should exclude special characters, punctuation, and whitespace.
I tried creating a regex ...
1
vote
0
answers
39
views
Solving Imbalance Classification on Video Transcript dataset
I am currently working on a problem that requires segmenting a video lecture transcript based on the topics present within the video. My dataset consists of sentence-wise labels where 1 indicates the ...
0
votes
1
answer
593
views
How to split connected characters on image for further OCR?
OriginalImage1
BinarizedImage1
OriginalImage2
BinarizedImage2
OriginalImage3
BinarizedImage3
OriginalImage4
BinarizedImage4
I'm preparing an image for OCR with Tesseract (pre-trained for this custom font) ...
2
votes
1
answer
476
views
Custom segmentation and overriding segmentation rules in spaCy
I want to split a large corpus (.txt) into sentences with a custom rule, i.e. {SENT}, using spaCy 3.1.
My main issue is that I want to "disable" the segmentation from the pretrained spaCy ...
1
vote
0
answers
47
views
segmenting bs4.element.Tag
Is it possible to segment a bs4.element.Tag into several bs4.element.Tag?
Consider the following application:
1- The original bs4.element.Tag contains a paragraph.
2- We want to segment ...
1
vote
1
answer
2k
views
How to get the best merger from symspellpy word segmentation of many languages in Python?
The following code uses SymSpell in Python, see the symspellpy guide on word_segmentation.
It uses the "de-100k.txt" and "en-80k.txt" frequency dictionaries from a GitHub repo, you ...
9
votes
1
answer
3k
views
Difference between tokenization and segmentation
What is the difference between tokenization and segmentation in NLP? I searched for them but didn't really find any differences.
0
votes
1
answer
89
views
How do I replace multiple consecutive parts of an array?
The question revolves around character segmentation. My problem is the following:
I want to segment characters based on y-axis pixel numbers, following this (in Python): source
What I already ...
0
votes
1
answer
199
views
How to extract a whole word from a sentence by a specific fragment in C#?
How can I obtain a whole word within a string-type sentence?
For instance, if the given string was:
The app has been updated to 88.0.1234.141 which contains a number of fixes and improvements.
And ...
-1
votes
2
answers
710
views
How to convert plain text into segmented chunks (bytes) in Python? [duplicate]
Is there a simple way to convert plain text into a segmented array of chunks in Python? Each chunk should be, for example, 16 bytes. If the last part of the plain text is smaller than 16 bytes it should ...
2
votes
3
answers
436
views
Remove timestamps in parentheses from text in Python
I'd like to remove all the timestamps in parentheses from the sample text data below.
Input:
Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) Customer: I
have a question about X. ( 8m 1s ) ...