Hey guys,
Thanks for this active forum, helped me out a few times. But now I have a little issue myself:
I have a kind of 'exploded view': an image with numbers on it: https://ibin.co/3sMyGGx8AT7Z.png
The problem is that the numbers are a bit fuzzy: they are made up of multiple shades of red.
I want to extract the numbers from this image with OCR. At the moment OCR reads around 70% of the numbers correctly, but I want to improve on that.
I know it's possible to remove everything but one color from an image, but I have some difficulty getting this to work for this image.
I have tried a lot of solutions, including:
convert image-bg.png -fuzz 22% -fill black -opaque "#da392f" image-clr.png
convert image-bg.png -fill white -fuzz 26% +opaque "#dd4337" image-clr.png
but the result contains a lot of noise, which makes OCR difficult. Can anybody point me in the right direction? The best result would be a black-and-white image with only the numbers.
Thanks!
How to remove all colors except 'redish'
Re: How to remove all colors except 'redish'
Three methods are useful here:
1. "-level-colors" can change a particular colour and white, to black and white respectively. This will change the grays to something else. See http://www.imagemagick.org/script/comma ... vel-colors
2. Convert to HCL, separate the "G" channel (which holds chroma after the conversion), and negate. The result is white where the input has zero saturation: white or black or any shade of gray. Hence it can be used as a mask, to turn all those gray shades into white.
3. A morphology method can isolate the thin lines, to distinguish the red text from the thin red lines.
A combination of these will solve the problem.
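As a rough sketch of method 2 only: the helper below thresholds the chroma channel so that strongly coloured pixels become black and everything gray, black, or white becomes white. The function name isolate_red and the 25% threshold are assumptions made for illustration, not values tuned for this image. It keeps any saturated colour, not only red, and it does not remove the thin red lines; that would still need method 3.

```shell
# Hypothetical helper: isolate_red INPUT OUTPUT
# Writes a black-and-white image: black where INPUT is strongly coloured,
# white where INPUT is gray, black, or white.
isolate_red() {
  # After -colorspace HCL, chroma sits in the channel ImageMagick calls "G".
  # The 25% threshold is an assumed starting point; adjust it per image.
  convert "$1" -colorspace HCL -channel G -separate +channel \
    -threshold 25% -negate "$2"
}
```

It could be called as: isolate_red image-bg.png image-clr.png. A lower threshold keeps more of the pale anti-aliased edge pixels, a higher one keeps only the core of each stroke.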
snibgo's IM pages: im.snibgo.com
Re: How to remove all colors except 'redish'
Thanks.
In the end I did not succeed. Cleaning up the image well enough for OCR to read it turned out to be a bit far-fetched.
I tried your recommendations but did not get a clear result. The best result I got was when I used an alpha channel to remove all but a few colors and then removed the alpha channel:
convert img-big.png -channel A -fuzz 7% -transparent "#dd4337" -transparent "#d82528" -transparent "#f7d7ca" -transparent "#ea9179" -transparent "#ecaa94" -transparent "#df5944" -transparent "#e47664" -negate +channel -alpha remove img-trans.png
But this was not clean enough for OCR to read: there is too much noise from the red lines. If I try to remove the noise and the red lines, the numbers get too damaged to read.
But thanks anyway.