Skip to main content
Grammar corrections
Source Link
Toby Speight
  • 88.7k
  • 14
  • 104
  • 327

splitSplit the code

not thisThis long method does a lot of things:

  • it reads the file
  • preprocessedpreprocesses it
  • searches for the bounding box of the area of interest
  • cropcrops the result
  • writewrites the result to a file

Better would be to split this into more parts. This way, you don't need to comment as much, but let the function namenames speak for itselfthemselves.

If you do feel the need to comment, you can do that in the docstring. If you want to comment on the code, explain whywhy you do it, not howhow.

is not. It doesn't explain what problem this adaptive thresholding solves, and how you got to the parameters you use: (255, ADAPTIVE_THRESH_MEAN_C, 35 and 15

indexingIndexing

thereThere is also no need to put the parentheses around the height, width tuple.

magicMagic numbers

thereThere are a lot of magic number in your code: 15 and 35 in the adaptive threshold, 15 in the kerneling, 50 in the cropping,...

Better would be to give them a namenames, and define them in the function or use them as a parameterparameters to pass into the function.

keywordKeyword arguments

vectorizingVectorizing

does the same, but vectorized and about a 1000 times faster.

 

finalFinal result

split the code

not this long method does a lot of things:

  • it reads the file
  • preprocessed it
  • searches for the bounding box of the area of interest
  • crop the result
  • write the result to a file

Better would be to split this into more parts. This way, you don't need to comment as much, but let the function name speak for itself.

If you do feel the need to comment, you can do that in the docstring. If you want to comment on the code, explain why you do it, not how.

is not. It doesn't explain what problem this adaptive thresholding solves, and how you got to the parameters you use: (255, ADAPTIVE_THRESH_MEAN_C, 35 and 15

indexing

there is also no need to put the parentheses around the height, width tuple

magic numbers

there are a lot of magic number in your code: 15 and 35 in the adaptive threshold, 15 in the kerneling, 50 in the cropping,...

Better would be to give them a name, and define them in the function or use them as a parameter to pass into the function

keyword arguments

vectorizing

does the same, but vectorized and about a 1000 times faster

final result

Split the code

This long method does a lot of things:

  • it reads the file
  • preprocesses it
  • searches for the bounding box of the area of interest
  • crops the result
  • writes the result to a file

Better would be to split this into more parts. This way, you don't need to comment as much, but let the function names speak for themselves.

If you do feel the need to comment, you can do that in the docstring. If you want to comment on the code, explain why you do it, not how.

is not. It doesn't explain what problem this adaptive thresholding solves, and how you got to the parameters you use: 255, ADAPTIVE_THRESH_MEAN_C, 35 and 15

Indexing

There is also no need to put the parentheses around the height, width tuple.

Magic numbers

There are a lot of magic number in your code: 15 and 35 in the adaptive threshold, 15 in the kerneling, 50 in the cropping,...

Better would be to give them names, and define them in the function or use them as parameters to pass into the function.

Keyword arguments

Vectorizing

does the same, but vectorized and about 1000 times faster.

 

Final result

Source Link
Maarten Fabré
  • 9.4k
  • 1
  • 16
  • 27

split the code

not this long method does a lot of things:

  • it reads the file
  • preprocessed it
  • searches for the bounding box of the area of interest
  • crop the result
  • write the result to a file

Better would be to split this into more parts. This way, you don't need to comment as much, but let the function name speak for itself.

Comments

If you do feel the need to comment, you can do that in the docstring. If you want to comment on the code, explain why you do it, not how.

#Ignore the limits/extremities of the document (sometimes are black, so they distract the algorithm)

is a useful comment.

# Apply adaptive mean thresholding

is not. It doesn't explain what problem this adaptive thresholding solves, and how you got to the parameters you use: (255, ADAPTIVE_THRESH_MEAN_C, 35 and 15

indexing

negative indices start counting from the back of a sequence, so

(height, width) = img.shape[0:2]
image = erosion[50:height - 50, 50: width - 50]

can be replaced by erosion[50:-50, 50:-50]

there is also no need to put the parentheses around the height, width tuple

magic numbers

there are a lot of magic number in your code: 15 and 35 in the adaptive threshold, 15 in the kerneling, 50 in the cropping,...

Better would be to give them a name, and define them in the function or use them as a parameter to pass into the function

keyword arguments

amtImage = cv2.adaptiveThreshold(bit, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 35, 15)

would be a lot clearer as:

blocksize = 35
constant = 15
max_value = 255 # 8 bits
amtImage = cv2.adaptiveThreshold(
    src=bit, 
    maxValue=max_value ,
    adaptiveMethod=cv2.ADAPTIVE_THRESH_MEAN_C, 
    thresholdType=cv2.THRESH_BINARY, 
    blockSize=blocksize , 
    C=constant,
)

vectorizing

opencv uses numpy arrays internally, so you can use all the vectorisation goodies, instead of iterating over each pixel twice in python land.

bw_threshold = 150
limits = 0.2, 0.15

mask = image < bw_threshold
edges = []
for axis in (0, 1):
    count = mask.sum(axis=axis)
    limit = limits[axis] * image.shape[axis]
    index = np.where(count > limit)
    _min, _max = index[0][0], index[0][-1]
    edges.append((_min, _max))

does the same, but vectorized and about a 1000 times faster

final result

def preproces_image(
    image,
    *,
    kernel_size=15,
    crop_side=50,
    blocksize=35,
    constant=15,
    max_value=255,
):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    bit = cv2.bitwise_not(gray)
    image_adapted = cv2.adaptiveThreshold(
        src=bit,
        maxValue=max_value,
        adaptiveMethod=cv2.ADAPTIVE_THRESH_MEAN_C,
        thresholdType=cv2.THRESH_BINARY,
        blockSize=blocksize,
        C=constant,
    )
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    erosion = cv2.erode(image_adapted, kernel, iterations=2)
    return erosion[crop_side:-crop_side, crop_side:-crop_side]

def find_edges(image_preprocessed, *, bw_threshold=150, limits=(0.2, 0.15)):
    mask = image_preprocessed < bw_threshold
    edges = []
    for axis in (1, 0):
        count = mask.sum(axis=axis)
        limit = limits[axis] * image_preprocessed.shape[axis]
        index_ = np.where(count >= limit)
        _min, _max = index_[0][0], index_[0][-1]
        edges.append((_min, _max))
    return edges


def adapt_edges(edges, *, height, width):
    (x_min, x_max), (y_min, y_max) = edges
    x_min2 = x_min
    x_max2 = x_max + min(250, (height - x_max) * 10 // 11)
    # could do with less magic numbers
    y_min2 = max(0, y_min)
    y_max2 = y_max + min(250, (width - y_max) * 10 // 11)
    return (x_min2, x_max2), (y_min2, y_max2)

if __name__ == "__main__":

    filename_in = "NHnV7.png"
    filename_out = "res_NHnV7.png"

    image = cv2.imread(str(filename_in))
    height, width = image.shape[0:2]
    image_preprocessed = preproces_image(image)
    edges = find_edges(image_preprocessed)
    (x_min, x_max), (y_min, y_max) = adapt_edges(
        edges, height=height, width=width
    )
    image_cropped = image[x_min:x_max, y_min:y_max]
    cv2.imwrite(str(filename_out), image_cropped)