
Erase barcode from Image

Posted: 2016-10-30T11:54:32-07:00
by Muttu
Hi,
I have a PDF that is the result of scanning some pages. Some pages have printed bar codes, and others had barcode stickers stuck to the paper before scanning.
There is some text surrounding the bar codes that needs to be extracted using OCR. The bar code causes interference, and tesseract is unable to extract the required text.
Please let me know if there is a way to erase barcode-like structures from the image without affecting the nearby text (above and below the bar code).
Here is a portion of the image.
https://www.dropbox.com/s/4lqdb5k498b8a ... 1.tif?dl=0

Thanks,
Muttu

Re: Erase barcode from Image

Posted: 2016-10-30T13:10:06-07:00
by snibgo
I don't have time to examine this in detail, but a possible approach is:

1. Slight blur. (Perhaps just a horizontal blur.)

2. Find rectangles that are around 50% density. These are either text or barcodes.

3. For each rectangle, scale down to one row and back up again. If this greatly changes the rectangle, it is text. If it doesn't change much, it is barcode.

4. For each barcode found in (3), composite a white rectangle over the input image.
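
The test in step (3) can be sketched in plain Python. This is only an illustration, not ImageMagick code: it assumes a candidate rectangle has already been cut out as a list of rows of grayscale values (0 to 255), that "scale down to one row" means averaging each column, and that "back up again" means repeating that row. The function names and the threshold of 20 are hypothetical and would need tuning on real scans.

```python
def column_collapse_change(region):
    """Mean absolute pixel change after squashing the region to one row and stretching it back.

    Squashing is modelled as averaging every column; stretching back is
    modelled as repeating that averaged row for the full height.
    """
    height = len(region)
    width = len(region[0])
    col_means = [sum(row[x] for row in region) / height for x in range(width)]
    total = sum(abs(row[x] - col_means[x]) for row in region for x in range(width))
    return total / (height * width)


def looks_like_barcode(region, threshold=20.0):
    """Guess 'barcode' when the round trip barely changes the region (threshold is an assumption)."""
    return column_collapse_change(region) < threshold
```

Vertical bars survive the round trip almost untouched, while text loses its shape, so a low change value suggests a barcode. A skewed scan would raise the value for real barcodes, so the rectangle may need deskewing first.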

Re: Erase barcode from Image

Posted: 2016-10-30T13:14:14-07:00
by GeeMack
Muttu wrote: There is some text surrounding the bar codes that needs to be extracted using OCR. The bar code causes interference, and tesseract is unable to extract the required text.
Please let me know if there is a way to erase barcode-like structures from the image without affecting the nearby text (above and below the bar code).
There may be a few ways to approach this problem according to different criteria. Are the bar codes in the exact same places on every document? If so, you can just run your image through ImageMagick and add some white rectangular overlays in the right locations. Piece of cake. If the pages are scanned with more random placement of the bar codes, or if the locations of the codes aren't known in advance, it would be more complicated, maybe much more.

If you can provide the version of ImageMagick you're using, what platform you're running it on, and maybe a bit more detail about your required tolerances, etc., someone here can surely point you in the right direction.
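
For the simple case described above, where the bar codes sit in the same place on every page, the overlay idea can be sketched in plain Python. This is not an ImageMagick command; it assumes the page is already loaded as a list of rows of grayscale values (0 to 255), and the function name and the (left, top, width, height) box format are made up for illustration.

```python
def paint_white(image, boxes, white=255):
    """Return a copy of the image with each (left, top, width, height) box filled with white.

    Boxes that extend past the image edges are clipped rather than rejected.
    """
    out = [row[:] for row in image]
    for left, top, width, height in boxes:
        for y in range(max(top, 0), min(top + height, len(out))):
            for x in range(max(left, 0), min(left + width, len(out[y]))):
                out[y][x] = white
    return out
```

The box coordinates would be measured once from a sample page and reused for every scan, which only works if the scanner places pages consistently.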

Re: Erase barcode from Image

Posted: 2016-10-30T13:17:38-07:00
by snibgo
Another possible approach, that might be really simple and quick:

1. Blur, only in the vertical direction.

2. Pixels that haven't changed significantly should be painted white.

Step (2) probably needs refining so that only the pixels that are part of a large group of unchanged pixels get painted.
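
A rough sketch of the two steps in plain Python, not ImageMagick code. It assumes a grayscale image held as a list of rows (0 to 255), uses a simple box average as the vertical blur, and treats a pixel as "unchanged" when it moves by no more than a tolerance. The function names, the radius, and the tolerance are all hypothetical, and the refinement about large groups of unchanged pixels is not implemented here.

```python
def vertical_box_blur(image, radius):
    """Average each pixel with its neighbours up to `radius` rows above and below."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        lo, hi = max(0, y - radius), min(height, y + radius + 1)
        blurred.append(
            [sum(image[r][x] for r in range(lo, hi)) / (hi - lo) for x in range(width)]
        )
    return blurred


def erase_vertically_uniform(image, radius=2, tolerance=10):
    """Paint white every pixel that the vertical blur leaves (almost) unchanged."""
    blurred = vertical_box_blur(image, radius)
    return [
        [255 if abs(b - p) <= tolerance else p for p, b in zip(row, blurred_row)]
        for row, blurred_row in zip(image, blurred)
    ]
```

Tall bars are the same colour along their whole height, so the blur does not change them and they get painted white. Text strokes are short, so the blur alters them and they are kept. In practice the radius would need to be larger than the text height but smaller than the bar height, and long vertical strokes in large type could still be erased by mistake.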