Commonmark migration

edited Jun 10, 2020 at 13:24

Community Bot

Read the image and convert it to grayscale.
Apply the bitwise_not() function from OpenCV to separate the background from the foreground.
Apply an adaptive mean threshold to remove as much noise as possible (and eventually to whiten the background).

At this stage, the background is almost white and the document is black but contains some white gaps.

I applied erosion to fill the gaps.

Read each row of the image: if more than 20% of it is black, keep it; if it is white, delete it. Do the same with each column of the image (the code uses a 15% threshold for columns).
Crop the image according to the min and max indices of the black rows and columns.
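It may not be obvious why erosion, rather than dilation, fills the gaps here. Erosion takes the local minimum over the kernel window, so on a white background the black regions grow and swallow small white holes. The following is a minimal NumPy-only sketch of that idea, not the cv2.erode implementation: the helper name erode_min_filter, the 3x3 window and the synthetic 7x7 page are assumptions chosen for illustration, and the border is padded with white so that the edges do not create artificial black pixels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode_min_filter(binary, ksize=3):
    """Erosion written as a local minimum over a ksize x ksize window (ksize odd)."""
    pad = ksize // 2
    # Pad with white (255) so the image border never introduces black pixels.
    padded = np.pad(binary, pad, mode="constant", constant_values=255)
    # One ksize x ksize window per original pixel.
    windows = sliding_window_view(padded, (ksize, ksize))
    return windows.min(axis=(2, 3))


# Hypothetical test page: white background, black document, one white gap inside it.
page = np.full((7, 7), 255, dtype=np.uint8)
page[2:5, 2:5] = 0
page[3, 3] = 255

eroded = erode_min_filter(page, ksize=3)
print(eroded)
```

After the call, the white pixel inside the black block has turned black and the block itself has grown by one pixel on each side, which is the effect the 15x15 kernel with two iterations has at a larger scale.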

Here is my code with some comments:

import cv2
import numpy as np

def crop(filename):
    #Read the image
    img = cv2.imread(filename)
    #Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    #Separate the background from the foreground
    bit = cv2.bitwise_not(gray)
    #Apply adaptive mean thresholding
    amtImage = cv2.adaptiveThreshold(bit, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 35, 15)
    #Apply erosion to fill the gaps
    kernel = np.ones((15,15),np.uint8)
    erosion = cv2.erode(amtImage,kernel,iterations = 2)
    #Take the height and width of the image
    (height, width) = img.shape[0:2]
    #Ignore the limits/extremities of the document (sometimes are black, so they distract the algorithm)
    image = erosion[50:height - 50, 50: width - 50]
    (nheight, nwidth) = image.shape[0:2]
    #Create a list to save the indexes of lines containing more than 20% of black.
    index = []
    for x in range(0, nheight):
        line = []
        for y in range(0, nwidth):
            if (image[x, y] < 150):
                line.append(image[x, y])
        if (len(line) / nwidth > 0.2):
            index.append(x)
    #Create a list to save the indexes of columns containing more than 15% of black.
    index2 = []
    for a in range(0, nwidth):
        line2 = []
        for b in range(0, nheight):
            if image[b, a] < 150:
                line2.append(image[b, a])
        if (len(line2) / nheight > 0.15):
            index2.append(a)
   
    #Crop the original image according to the max and min of black lines and columns.
    img = img[min(index):max(index) + min(250, (height - max(index))* 10 // 11) , max(0, min(index2)): max(index2) + min(250, (width - max(index2)) * 10 // 11)]
    #Save the image
    cv2.imwrite('res_' + filename, img)
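
The two nested loops above visit every pixel in Python, which is probably where most of the time goes. As a rough sketch of how the same row and column scan might be expressed with NumPy (the function name dark_line_indices and its parameters are made up for illustration; the thresholds 150, 0.2 and 0.15 are taken from the code above), a boolean mask and a mean along each axis should give the same indices:

```python
import numpy as np


def dark_line_indices(image, dark_below=150, row_ratio=0.2, col_ratio=0.15):
    """Return the indices of rows and columns whose share of dark pixels exceeds the ratios."""
    # Boolean mask: True where the pixel counts as black.
    dark = image < dark_below
    # Fraction of dark pixels per row (axis=1) and per column (axis=0).
    row_share = dark.mean(axis=1)
    col_share = dark.mean(axis=0)
    rows = np.flatnonzero(row_share > row_ratio)
    cols = np.flatnonzero(col_share > col_ratio)
    return rows, cols


# Synthetic stand-in for the eroded image: white page with a black block.
image = np.full((10, 20), 255, dtype=np.uint8)
image[3:7, 5:15] = 0

rows, cols = dark_line_indices(image)
print(rows.min(), rows.max(), cols.min(), cols.max())
```

The arrays rows and cols play the role of index and index2, so the final crop expression could stay as it is, using rows.min(), rows.max(), cols.min() and cols.max().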

adding more details

edited Feb 11, 2019 at 14:18

singrium

deleted 102 characters in body; edited title

edited Jan 30, 2019 at 3:26

Jamal

Trimming blank space from images

I am working on scanned documents (ID cards, driver's licenses, ...). The problem I face when preprocessing them is that the document occupies only a small area of the image; the rest is either white/blank space or noisy space. For that reason I wanted to write Python code that automatically trims the unwanted area and keeps only the zone where the document is located (without my having to predefine the resolution). That is possible using findContours() from OpenCV. However, not all documents (especially the old ones) have a clear contour, and not all of the blank space is white, so this will not work.

The idea that came to me is:

Read the image and convert it to grayscale.
Apply the bitwise_not() function from OpenCV to separate the background from the foreground.
Apply an adaptive mean threshold to remove as much noise as possible (and eventually to whiten the background).

I applied erosion to fill the gaps.
Read each row of the image: if more than 20% of it is black, keep it; if it is white, delete it. Do the same with each column of the image.
Crop the image according to the min and max indices of the black rows and columns.

Here is an example. I used an image from the internet to avoid any confidentiality problem. Note that the image quality is much better (the white space does not contain noise) than in the examples I work on.

Input: 1920x1080

Output: 801x623

I tested this code with different documents, and it works well. The problem is that it takes a long time to process a single document (because of the loops, which read each pixel of the image twice: once by row and once by column). I am sure the code can be modified to reduce the processing time, but I am a beginner with Python and code optimization.

Maybe using NumPy for the matrix calculations or optimizing the loops would improve the code quality.

added 37 characters in body

edited Jan 29, 2019 at 15:40

singrium

added 57 characters in body

edited Jan 29, 2019 at 9:59

singrium

adding more details

edited Jan 29, 2019 at 8:23

singrium

Tweeted twitter.com/StackCodeReview/status/1089946208506994689

occurred Jan 28, 2019 at 18:00

added 41 characters in body

edited Jan 28, 2019 at 16:27

singrium

adding more details

edited Jan 28, 2019 at 16:21

singrium

asked Jan 28, 2019 at 15:57

singrium
