Hi,
I have a PDF produced by scanning some pages. Some pages have printed bar codes, and others have barcode stickers that were stuck to the paper before scanning.
There is some text surrounding the bar codes that needs to be extracted using OCR. The bar codes interfere with recognition, and Tesseract is unable to extract the required text.
Please let me know if there is a way to erase barcode-like structures from the image without affecting the nearby text (above and below the bar code).
Here is a portion of the image.
https://www.dropbox.com/s/4lqdb5k498b8a ... 1.tif?dl=0
Thanks,
Muttu
Erase barcode from Image
- snibgo
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Erase barcode from Image
I don't have time to examine this in detail, but a possible approach is:
1. Slight blur. (Perhaps just a horizontal blur.)
2. Find rectangles that are around 50% density. These are either text or barcodes.
3. For each rectangle, scale down to one row and back up again. If this greatly changes the rectangle, it is text. If it doesn't change much, it is barcode.
4. For each barcode found in (3), composite a white rectangle over the input image.
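Steps (3) and (4) above can be sketched in plain Python. This is only an illustration of the idea, not ImageMagick code: the image is assumed to be a list of rows of grayscale values (0 = black, 255 = white), the candidate rectangles from step (2) are assumed to be already known, and the function names and the threshold of 20 are made up for the example.

```python
def column_means(region):
    """Scale a region down to one row by averaging each column."""
    height = len(region)
    return [sum(column) / height for column in zip(*region)]


def collapse_change(region):
    """Mean absolute difference between a region and its one-row
    version scaled back up to the original height."""
    means = column_means(region)
    total = sum(abs(px - m) for row in region for px, m in zip(row, means))
    return total / (len(region) * len(means))


def erase_barcodes(image, rectangles, threshold=20.0):
    """Paint white every rectangle that barely changes when collapsed.

    rectangles: (left, top, width, height) tuples, assumed to come from
    an earlier density search. The threshold is a hypothetical value
    and would need tuning on real scans. Modifies image in place and
    returns the rectangles that were erased.
    """
    erased = []
    for left, top, width, height in rectangles:
        region = [row[left:left + width] for row in image[top:top + height]]
        if collapse_change(region) < threshold:
            for y in range(top, top + height):
                for x in range(left, left + width):
                    image[y][x] = 255
            erased.append((left, top, width, height))
    return erased


# Tiny synthetic page: 4 rows of vertical bars, then 4 rows of a
# checkerboard standing in for text.
bars = [0, 255, 0, 0, 255, 0, 255, 255]
barcode_rows = [bars[:] for _ in range(4)]
text_rows = [[0 if (x + y) % 2 == 0 else 255 for x in range(8)]
             for y in range(4)]
page = barcode_rows + text_rows
erased = erase_barcodes(page, [(0, 0, 8, 4), (0, 4, 8, 4)])
```

Every row of a barcode is the same, so collapsing it to one row loses nothing and the change is close to zero. Text varies from row to row, so the change is large. A skewed scan would make the bars less uniform, so the threshold probably needs some slack.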
snibgo's IM pages: im.snibgo.com
- GeeMack
- Posts: 718
- Joined: 2015-12-01T22:09:46-07:00
- Authentication code: 1151
- Location: Central Illinois, USA
Re: Erase barcode from Image
Muttu wrote: "There is some text surrounding the bar codes that need to be extracted using OCR. The bar code is causing interference and tesseract is unable to extract required text
Please let me know if there is a way to erase barcode like structure from image without impacting on the text near by (under and above bar code)"
There may be a few ways to approach this problem according to different criteria. Are the bar codes in the exact same places on every document? If so, you can just run your image through ImageMagick and add some white rectangular overlays in the right locations. Piece of cake. If the pages are scanned with more random placement of the bar codes, or if the locations of the codes aren't known in advance, it would be more complicated, maybe much more.
If you can provide the version of ImageMagick you're using, what platform you're running it on, and maybe a bit more detail about your required tolerances, etc., someone here can surely point you in the right direction.
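For the simple case where the bar codes are always in the same places, the overlay idea amounts to something like the sketch below. It is plain Python rather than an ImageMagick command, it assumes the page is a list of rows of grayscale values (255 = white), and the function name and box coordinates are invented for the example.

```python
def overlay_white(image, boxes):
    """Return a copy of the image with each box painted white.

    boxes: (left, top, width, height) tuples in pixels, assumed to be
    known in advance. Boxes that run past the page edge are clipped.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for left, top, box_w, box_h in boxes:
        for y in range(max(0, top), min(height, top + box_h)):
            for x in range(max(0, left), min(width, left + box_w)):
                result[y][x] = 255
    return result


# A small all-black page with two hypothetical barcode locations.
page = [[0] * 6 for _ in range(4)]
cleaned = overlay_white(page, [(1, 1, 3, 2), (4, 3, 10, 10)])
```

The same list of boxes can be reused for every page, as long as the scans are aligned consistently.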
- snibgo
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Erase barcode from Image
Another possible approach, that might be really simple and quick:
1. Blur, only in the vertical direction.
2. Pixels that haven't changed significantly should be painted white.
Step (2) probably needs refining so that only pixels that are part of a large group of unchanged pixels get painted.
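A rough sketch of this vertical-blur idea, including one possible refinement of step (2), is below. It is plain Python, not ImageMagick code. The image is assumed to be a list of rows of grayscale values (0 = black, 255 = white), the blur is a simple box average, and the names and the values for radius, tolerance and min_run are guesses for illustration. Here a "large group" is taken to mean a long vertical run of unchanged pixels, which is only one way to read it.

```python
def vertical_blur(image, radius=2):
    """Box blur along the vertical direction only."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        lo, hi = max(0, y - radius), min(height, y + radius + 1)
        for x in range(width):
            blurred[y][x] = sum(image[k][x] for k in range(lo, hi)) / (hi - lo)
    return blurred


def erase_unchanged(image, radius=2, tolerance=10, min_run=5):
    """Paint white the pixels that a vertical blur leaves unchanged,
    but only where they form a vertical run of at least min_run pixels.
    Returns a new image; the input is not modified."""
    blurred = vertical_blur(image, radius)
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for x in range(width):
        y = 0
        while y < height:
            if abs(image[y][x] - blurred[y][x]) > tolerance:
                y += 1
                continue
            start = y
            while y < height and abs(image[y][x] - blurred[y][x]) <= tolerance:
                y += 1
            if y - start >= min_run:
                for k in range(start, y):
                    result[k][x] = 255
    return result


# Synthetic 12x6 image: full-height bars in columns 0 and 2, and a
# short 3-pixel stroke (standing in for text) in column 4.
image = [[255] * 6 for _ in range(12)]
for y in range(12):
    image[y][0] = 0
    image[y][2] = 0
for y in range(5, 8):
    image[y][4] = 0
cleaned = erase_unchanged(image)
```

Plain background is also unchanged by the blur, but painting it white does no harm. In a real scan the top and bottom tips of each bar do change under the blur, so a thin residue may remain at the ends of the barcode and might need a separate clean-up.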