Extracting text segments from image based on python module layoutparser



I'm trying to extract text from an image such as this one:

However, if I just run OCR on the whole image, the extracted text starts with the first line of the first column and then continues with the first line of the second column, which is wrong. The OCR should read all lines of the first column, and then all lines of the second column, separately.
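To make the desired reading order concrete, here is a minimal sketch that orders text boxes column by column instead of line by line. It assumes each detected region is available as an `(x1, y1, x2, y2)` tuple; the helper names `group_into_columns` and `reading_order` and the `min_overlap` threshold are hypothetical and not part of layoutparser.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def group_into_columns(boxes: List[Box], min_overlap: float = 0.5) -> List[List[Box]]:
    """Group boxes into columns based on horizontal overlap (hypothetical helper)."""
    columns: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[0]):
        width = box[2] - box[0]
        for col in columns:
            # Horizontal extent covered by the column so far.
            col_x1 = min(b[0] for b in col)
            col_x2 = max(b[2] for b in col)
            overlap = min(box[2], col_x2) - max(box[0], col_x1)
            if width > 0 and overlap / width >= min_overlap:
                col.append(box)
                break
        else:
            # No existing column overlaps enough: start a new one.
            columns.append([box])
    # Order the columns from left to right.
    columns.sort(key=lambda col: min(b[0] for b in col))
    return columns


def reading_order(boxes: List[Box], min_overlap: float = 0.5) -> List[Box]:
    """Return boxes column by column, top to bottom within each column."""
    ordered: List[Box] = []
    for col in group_into_columns(boxes, min_overlap):
        ordered.extend(sorted(col, key=lambda b: b[1]))
    return ordered


# Two columns with two lines each, given in arbitrary order.
example_boxes = [
    (310, 10, 590, 40),  # column 2, line 1
    (10, 10, 290, 40),   # column 1, line 1
    (10, 50, 290, 80),   # column 1, line 2
    (310, 50, 590, 80),  # column 2, line 2
]
ordered_boxes = reading_order(example_boxes)
```

With these example boxes, both lines of the left column come before both lines of the right column, whereas a plain top-to-bottom sort would interleave the two columns. The overlap threshold is a guess and would likely need tuning for real scans.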

Searching the web, I found this question on Stack Overflow: How to detect figures in a paper news image in Python? It is based on this article: https://www.linkedin.com/pulse/how-segment-figures-text-region-newspaper-using-layout-mohammad-oghli

In both articles you can clearly see that all "columns" are detected with layoutparser.

However, if I run the same code on the image above, the boxes drawn on the image are totally wrong.

These are the packages that need to be installed:

pip install layoutparser # Install the base layoutparser library
pip install "layoutparser[layoutmodels]" # Install DL layout model toolkit
pip install "layoutparser[ocr]" # Install OCR toolkit

Then we need to install Detectron2, the backend that the deep learning layout models depend on:

pip install layoutparser torchvision && pip install "git+https://github.com/facebookresearch/[email protected]#egg=detectron2"

And here is the code:

import layoutparser as lp
import cv2
import matplotlib.pyplot as plt

# cv2 loads images in BGR channel order; reverse the last axis to get RGB
image = cv2.imread("test.jpg")
if image is None:
    raise FileNotFoundError("Could not read test.jpg")
image = image[..., ::-1]

# Load the deep layout model through the layoutparser API.
# For all supported models, see the Model Zoo page:
# https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html

model = lp.models.Detectron2LayoutModel('lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config',
                                 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.7],
                                 label_map={1:"TextRegion", 2:"ImageRegion", 3:"TableRegion", 4:"MathsRegion", 5:"SeparatorRegion", 6:"OtherRegion"})

# Detect layout
layout = model.detect(image)

# Draw and display results
visualized_image = lp.draw_box(image, layout, box_width=10)
plt.figure(figsize=(12, 8))
plt.imshow(visualized_image)
plt.axis('off')

plt.show()

Does anyone have an idea of how to tackle this issue?

Hopefully someone can help me with my question. Thanks in advance.

edited Sep 30 at 17:09

asked Sep 30 at 13:36

Meerkat


What do you use to extract it? Where is your code? What does "the image are totally wrong" mean? We can't see your code or your computer, and we can't read your mind. Put all the details in the question (not in comments). It would also help if you created minimal working code that we could use for tests.

– furas, 2025-09-30 15:30:36 +00:00

Totally right! I updated my post with the code, together with the input and output.

– Meerkat, 2025-09-30 17:11:47 +00:00
