The receipt clip contains a structured background:

[Image: original receipt clip]

I tried to remove it with the textcleaner ImageMagick wrapper script, as suggested in an answer to "Remove receipt image border using ImageMagick", running it with code from an answer to "How to use imagemagick.net in .net?":

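using System.Diagnostics;

// A .sh script cannot be started directly on Windows, so run it through bash
// (assumes bash, e.g. from Git for Windows or WSL, is on the PATH).
// Redirecting stderr requires UseShellExecute = false; with true, Start() throws.
using var proc = new Process
{
    StartInfo = new ProcessStartInfo
    {
        FileName = "bash",
        Arguments = "textcleaner.sh -f 20 -o 10 -e normalize krooningtaust.jpg result.jpg",
        UseShellExecute = false,
        RedirectStandardError = true,
        CreateNoWindow = true
    }
};

proc.Start();
string error = proc.StandardError.ReadToEnd(); // read before waiting to avoid a full-pipe deadlock
proc.WaitForExit();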

In the result the background is still present, and Tesseract in single-block page segmentation mode does not recognize the text:

[Image: textcleaner output, background still visible]

How can I remove the background completely using C# (.NET 9), or make Tesseract recognize the text in single-block PSM mode?

It looks like the text has been converted to blue. How can I remove all non-blue pixels, or make Tesseract use only the blue pixels for OCR?

1 Answer

There are probably better ways to approach this problem, but I don't have the benefit of time today, so I'll just answer your question about removing all non-blue pixels using ImageMagick.

A suitable command would be:

magick receipt.jpg -fill white -fuzz 25% +opaque "rgb(18,90,198)" result.jpg

That says: "take my image, fill with white every pixel that is not within 25% of rgb(18,90,198), and save as result.jpg". The + in +opaque inverts the match; -opaque would instead paint the pixels that do match.

[Image: result.jpg, non-blue pixels filled with white]
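To show what `+opaque` combined with `-fuzz` does to each pixel, here is a minimal C# sketch that works on an in-memory array of RGB tuples rather than a real image file. It assumes the fuzz definition given at the end of this answer (Euclidean RGB distance as a percentage of the black-to-white diagonal); the helper names `WithinFuzz` and `InverseOpaque` and the sample pixel values are hypothetical, and ImageMagick's own implementation may differ in details such as alpha handling.

```csharp
using System;

// Hypothetical helper: true when color `a` lies within `fuzzPercent` of color `b`,
// using Euclidean RGB distance as a percentage of the black-to-white diagonal
// (255 * sqrt(3) for 8-bit channels), per the fuzz definition in this answer.
static bool WithinFuzz((byte R, byte G, byte B) a, (byte R, byte G, byte B) b, double fuzzPercent)
{
    double dr = a.R - b.R, dg = a.G - b.G, db = a.B - b.B;
    double distance = Math.Sqrt(dr * dr + dg * dg + db * db);
    return distance / (255.0 * Math.Sqrt(3.0)) * 100.0 <= fuzzPercent;
}

// Emulates "-fill <fill> -fuzz <fuzz>% +opaque <target>": every pixel that does NOT
// match the target within the fuzz distance is overwritten with the fill color.
static void InverseOpaque((byte R, byte G, byte B)[] buffer, (byte R, byte G, byte B) target,
                          (byte R, byte G, byte B) fill, double fuzzPercent)
{
    for (int i = 0; i < buffer.Length; i++)
    {
        if (!WithinFuzz(buffer[i], target, fuzzPercent))
            buffer[i] = fill;
    }
}

// Illustrative pixel values, not sampled from the actual receipt.
(byte R, byte G, byte B)[] pixels =
{
    (18, 90, 198),   // the ink color itself
    (40, 100, 210),  // a slightly different blue (e.g. JPEG noise)
    (200, 200, 200), // light gray background pattern
    (120, 60, 40),   // brownish background pattern
};

InverseOpaque(pixels, (18, 90, 198), (255, 255, 255), 25.0);
Console.WriteLine(string.Join(" ", pixels));
```

The near-blue pixel (about 6% away from the ink color) survives, while the gray and brown pixels (roughly 48% and 43% away) are painted white.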

You may also want to convert to grayscale, threshold to pure black and white, and save as a lossless PNG:

magick receipt.jpg -fill white -fuzz 25% +opaque "rgb(18,90,198)" -colorspace gray -threshold 50% result.png

[Image: result.png, thresholded to black and white]
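The two extra steps can be sketched the same way: reduce each pixel to a single intensity, then map it to pure black or white around 50%. This is an illustrative C# sketch, not ImageMagick's code; the Rec. 709 luma weights and the helper names `Intensity` and `Threshold` are assumptions.

```csharp
using System;

// Assumption: Rec. 709 luma weights for the gray conversion. ImageMagick's actual
// default for "-colorspace gray" may differ in detail (e.g. linear vs. non-linear RGB).
static double Intensity((byte R, byte G, byte B) p) =>
    (0.2126 * p.R + 0.7152 * p.G + 0.0722 * p.B) / 255.0;

// "-threshold 50%": pixels brighter than the threshold become white (255),
// everything else black (0), giving a two-level image for OCR.
static byte[] Threshold((byte R, byte G, byte B)[] input, double thresholdPercent)
{
    var result = new byte[input.Length];
    for (int i = 0; i < input.Length; i++)
        result[i] = Intensity(input[i]) * 100.0 > thresholdPercent ? (byte)255 : (byte)0;
    return result;
}

// Pixels as they might look after the +opaque step: blue ink on white.
(byte R, byte G, byte B)[] sample =
{
    (18, 90, 198),
    (255, 255, 255),
    (40, 100, 210),
};

byte[] bw = Threshold(sample, 50.0);
Console.WriteLine(string.Join(",", bw));
```

With these weights the ink blue has an intensity of roughly 32%, so it lands well below the 50% threshold and becomes black text on a white page.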

Further idea:

You might consider taking the image produced by the above command, running a median filter over it to get rid of outlier pixels, and then getting the "cropbox" of the result:

magick result.png -statistic median 5 -format "%@" info:
739x1974+420+220

That tells you where to crop the image to exclude borders. You can then apply that to your original image:

magick receipt.jpg -crop 739x1974+420+220 cropped.png

[Image: cropped.png]
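The median-filter-then-bounding-box idea can be sketched in C# on a toy grayscale array: a single noise pixel widens the bounding box, and a 5x5 median filter removes it before the box is measured. The helpers `MedianFilter`, `TrimBox` and `Crop` are hypothetical; edge replication and a fixed white background are simplifying assumptions (ImageMagick's `%@` may determine the background and handle borders differently).

```csharp
using System;
using System.Collections.Generic;

// Median filter with a size x size window; edge pixels are replicated
// (an assumption; ImageMagick uses its -virtual-pixel setting at the borders).
static byte[,] MedianFilter(byte[,] img, int size)
{
    int h = img.GetLength(0), w = img.GetLength(1), r = size / 2;
    var result = new byte[h, w];
    var window = new List<byte>(size * size);
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            window.Clear();
            for (int dy = -r; dy <= r; dy++)
                for (int dx = -r; dx <= r; dx++)
                    window.Add(img[Math.Clamp(y + dy, 0, h - 1), Math.Clamp(x + dx, 0, w - 1)]);
            window.Sort();
            result[y, x] = window[window.Count / 2];
        }
    return result;
}

// Tightest box around every pixel that differs from the background, formatted like
// the "%@" output (WxH+X+Y); null when nothing but background is found.
static string? TrimBox(byte[,] img, byte background)
{
    int h = img.GetLength(0), w = img.GetLength(1);
    int minX = w, minY = h, maxX = -1, maxY = -1;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            if (img[y, x] != background)
            {
                minX = Math.Min(minX, x); maxX = Math.Max(maxX, x);
                minY = Math.Min(minY, y); maxY = Math.Max(maxY, y);
            }
    return maxX < 0 ? null : $"{maxX - minX + 1}x{maxY - minY + 1}+{minX}+{minY}";
}

// Crops using a "WxH+X+Y" string (non-negative offsets, box inside the image).
static byte[,] Crop(byte[,] img, string geometry)
{
    string[] p = geometry.Split('x', '+');
    int w = int.Parse(p[0]), h = int.Parse(p[1]), x0 = int.Parse(p[2]), y0 = int.Parse(p[3]);
    var result = new byte[h, w];
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            result[y, x] = img[y0 + y, x0 + x];
    return result;
}

// Toy 12x12 "thresholded receipt": white background, a 6x6 block of ink,
// and one isolated noise pixel in the top-right corner.
var image = new byte[12, 12];
for (int y = 0; y < 12; y++)
    for (int x = 0; x < 12; x++)
        image[y, x] = (y >= 3 && y <= 8 && x >= 4 && x <= 9) ? (byte)0 : (byte)255;
image[0, 11] = 0;

string? rawBox = TrimBox(image, 255);                      // the noise pixel widens the box
string? cleanBox = TrimBox(MedianFilter(image, 5), 255);   // the median filter removes it
Console.WriteLine($"without median: {rawBox}, with median: {cleanBox}");

if (cleanBox != null)
{
    byte[,] cropped = Crop(image, cleanBox);   // crop the unfiltered image, as in the answer
    Console.WriteLine($"cropped to {cropped.GetLength(1)}x{cropped.GetLength(0)}");
}
```

As in the answer, the box is measured on the filtered image but applied to the unfiltered one, so the crop keeps all of the original detail inside the box.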


In case you want to understand the "fuzz" distance: it is expressed as a percentage of the distance from the black corner to the white corner of the RGB color cube. So, in a unit cube (one with sides of length 1.0), the black-to-white distance is sqrt(3), and distances between colours (i.e. fuzz) are expressed as a percentage of that.
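Written out for 8-bit channels, the definition above gives the following distance between a pixel p and the target colour t = rgb(18,90,198); the two example pixels are illustrative, not sampled from the receipt.

```latex
d(p,t) = \frac{\sqrt{(R_p-R_t)^2 + (G_p-G_t)^2 + (B_p-B_t)^2}}{255\sqrt{3}} \times 100\%,
\qquad 255\sqrt{3} \approx 441.67

p = (40,100,210):\quad
d = \frac{\sqrt{22^2+10^2+12^2}}{441.67} = \frac{\sqrt{728}}{441.67}
  \approx \frac{26.98}{441.67} \approx 6.1\% \le 25\% \;\Rightarrow\; \text{kept}

p = (200,200,200):\quad
d = \frac{\sqrt{182^2+110^2+2^2}}{441.67} = \frac{\sqrt{45228}}{441.67}
  \approx \frac{212.67}{441.67} \approx 48.2\% > 25\% \;\Rightarrow\; \text{filled white}
```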

4 Comments

Great. How can I merge textcleaner and your command so that only plain ImageMagick is used? According to its documentation, textcleaner runs:
convert \( $infile -colorspace gray -type grayscale -contrast-stretch 0 \) \( -clone 0 -colorspace gray -negate -lat ${filtersize}x${filtersize}+${offset}% -contrast-stretch 0 \) \
    -compose copy_opacity -composite -fill "$bgcolor" -opaque none +matte \
    -deskew 40% -sharpen 0x1 \
    $outfile
If you remove $outfile from the end of the textcleaner command, you'll be in exactly the same position as my command is after magick receipt.jpg, so you can just continue with my -fill white -fuzz ... +opaque ... $outfile
How can I convert this to C# code using the ImageMagick NuGet package or some other method? I tried OpenCV, but the background is not removed, as described in stackoverflow.com/questions/79765653/…
The conversion parameters in the question produced an empty image; modified parameters worked, as posted in stackoverflow.com/questions/79766914/… After conversion the text quality is poor, noticeably worse than in the original image. How can I improve the image quality?
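One way to run the answer's command from C# without a shell or a NuGet package is to start ImageMagick directly and pass each argument through `ProcessStartInfo.ArgumentList`, which quotes every argument for you, so `%`, spaces and parentheses need no manual escaping. A minimal sketch, assuming ImageMagick 7's `magick` executable is on the PATH; `BuildMagickStartInfo` and the file names are hypothetical, and the process is only launched when the program is run with a `--run` argument.

```csharp
using System;
using System.Diagnostics;

// Builds a ProcessStartInfo for the answer's +opaque / gray / threshold command.
// Assumes "magick" (ImageMagick 7) is on the PATH; input/output names are placeholders.
static ProcessStartInfo BuildMagickStartInfo(string input, string output)
{
    var psi = new ProcessStartInfo("magick")
    {
        UseShellExecute = false,      // required for redirecting stderr
        RedirectStandardError = true,
        CreateNoWindow = true,
    };
    foreach (var arg in new[]
    {
        input,
        "-fill", "white", "-fuzz", "25%", "+opaque", "rgb(18,90,198)",
        "-colorspace", "gray", "-threshold", "50%",
        output,
    })
    {
        psi.ArgumentList.Add(arg);    // each entry is passed as one argument, no shell parsing
    }
    return psi;
}

var startInfo = BuildMagickStartInfo("receipt.jpg", "result.png");
Console.WriteLine($"{startInfo.FileName} {string.Join(" ", startInfo.ArgumentList)}");

// Only launch when explicitly requested, since it needs ImageMagick and the input file.
if (args.Length > 0 && args[0] == "--run")
{
    using var proc = Process.Start(startInfo)!;
    string error = proc.StandardError.ReadToEnd();
    proc.WaitForExit();
    if (proc.ExitCode != 0)
        Console.Error.WriteLine($"magick failed ({proc.ExitCode}): {error}");
}
```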
