I have to prep an image for OCR. The text in the image has variable kerning, so I either need to remove the blobs that are going to be completely unreadable or insert enough space between the letters for OCR to be able to read them.
In the attached example, the word brown in caps, the letters BROW, the numbers 789, and again the letters azy in lazy either need to be removed or spaced apart. A little degradation is acceptable as long as the basic shape is still intact.
Ideas?
PS: I am a complete newbie to IM but I catch on quick.
Removing blobs of unreadabletext
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Removing blobs of unreadabletext
Unfortunately, as far as I know, IM is a pixel processor and knows nothing about text in an image. So it won't help space out the text better as it does not even know it is text.
If this image is a PDF, then there may be some other tools to help. But if it is just a JPG, PNG, TIFF or the like, then I doubt there is much hope.
The only IM processing that might help is the morphological operators, such as thinning or erosion, but I have no experience using them on such tightly packed text. See http://www.imagemagick.org/Usage/morphology/
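To make the erosion idea concrete, here is a minimal sketch in plain Python rather than in IM itself. It assumes the image has already been reduced to a binary grid (1 = ink, 0 = background), and the toy grid and the `erode` helper are made up for illustration. It is comparable in spirit to one pass of `-morphology Erode` with a small diamond-shaped kernel, not a reproduction of it.

```python
def erode(img):
    """One pass of binary erosion with a plus-shaped (4-neighbour) element.

    A pixel stays ink only if it and all four direct neighbours are ink.
    Pixels outside the image are treated as background.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            keep = True
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or img[ny][nx] != 1:
                    keep = False
                    break
            out[y][x] = 1 if keep else 0
    return out


# Hypothetical example: two 3x3 "letters" joined by a one-pixel bridge.
touching = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

eroded = erode(touching)
for row in eroded:
    print("".join(str(v) for v in row))
# Middle row prints 001101100: the bridge is gone and the two pieces are separate.
```

The catch is visible even in this toy case: the strokes themselves get thinner along with the bridge, so on real, tightly packed text the letters may break up before they separate.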
Re: Removing blobs of unreadabletext
OK, so that is out. The other method I was thinking of was removing the shadow and highlights, then using morphology to produce a skeleton. So far I have not been able to remove the shadow, as it varies greatly once I resample to bring up the DPI.
- fmw42
Re: Removing blobs of unreadabletext
What happens if you threshold to try to get the bright highlights, with everything else either black or transparent? Does that give you enough of the characters to recognize?
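A minimal sketch of that thresholding idea, in plain Python on a made-up grayscale grid rather than in IM (where the corresponding operator is `-threshold`). The pixel values, the 80% cutoff and the `threshold` helper are all assumptions chosen for illustration; a real scan would need the cutoff tuned by eye.

```python
def threshold(gray, cutoff):
    """Return a binary grid: 1 where a pixel is strictly brighter than cutoff, else 0."""
    return [[1 if value > cutoff else 0 for value in row] for row in gray]


# Hypothetical 8-bit values: background 90, highlight 240, letter body 120, shadow 40.
gray = [
    [90,  90,  90, 90, 90],
    [90, 240, 120, 40, 90],
    [90, 240, 120, 40, 90],
    [90,  90,  90, 90, 90],
]

cutoff = int(0.8 * 255)  # keep only pixels brighter than roughly 80% of full scale
highlights = threshold(gray, cutoff)
for row in highlights:
    print(row)
```

Only the highlight column survives here, so whether this is useful depends on the highlights alone tracing enough of each character.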
Re: Removing blobs of unreadabletext
I haven't been able to successfully threshold it to that point yet, though I am still trying.
I did finally get a skeleton out of it, and though some of the letters are still connected, they are readable, I believe. My final step is to remove the waviness of the text. This file does not show it because it is just a replica of what my boss will be sending me once everything is scanned.
- fmw42
Re: Removing blobs of unreadabletext
Have you tried one of the distance metrics in -morphology?
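For anyone unfamiliar with the term, a distance metric here means replacing every ink pixel with its distance to the nearest background pixel, so thick stroke centres get high values and thin joins between letters get low ones. The sketch below is plain Python on a toy binary grid, not IM; it uses the classic two-pass city-block (Manhattan) transform, and the `manhattan_distance` helper, the grid and the cutoff of 2 are assumptions for illustration.

```python
def manhattan_distance(img):
    """City-block distance from each ink pixel (1) to the nearest background pixel (0)."""
    h, w = len(img), len(img[0])
    far = h + w  # larger than any possible distance inside the grid
    dist = [[0 if img[y][x] == 0 else far for x in range(w)] for y in range(h)]
    # Forward pass: propagate distances from the top and left neighbours.
    for y in range(h):
        for x in range(w):
            if dist[y][x]:
                if y > 0:
                    dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
                if x > 0:
                    dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if dist[y][x]:
                if y < h - 1:
                    dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
                if x < w - 1:
                    dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist


# Hypothetical example: two 3x3 "letters" joined by a one-pixel bridge.
touching = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

dist = manhattan_distance(touching)
print(dist[2])  # [0, 1, 2, 2, 1, 2, 2, 1, 0]

# Keep only pixels at least 2 away from background: the bridge (distance 1) drops out.
cores = [[1 if d >= 2 else 0 for d in row] for row in dist]
print(cores[2])  # [0, 0, 1, 1, 0, 1, 1, 0, 0]
```

The low value in the middle of the row marks the join between the two shapes, which is the kind of spot where touching letters could be cut apart.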
Re: Removing blobs of unreadabletext
Distance metrics are over my head at the moment. Studying...
Re: Removing blobs of unreadabletext
OK, after some studying I think it would take some custom metrics to do what I want, and I don't have the skillz...
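For the simpler option raised at the start of the thread, dropping the unreadable blobs instead of separating them, one possible approach that needs no custom metric is to label connected blobs and delete any that are too wide to be a single character. This is a rough sketch in plain Python on a toy binary grid, not an IM command; the `remove_wide_blobs` helper, the grid and the `max_width` value are hypothetical, and a real page would need the width limit derived from the font size.

```python
from collections import deque


def remove_wide_blobs(img, max_width):
    """Erase every 4-connected ink blob whose bounding box is wider than max_width."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Flood fill to collect all pixels of this blob.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            columns = [px for _, px in pixels]
            if max(columns) - min(columns) + 1 > max_width:
                for py, px in pixels:
                    out[py][px] = 0
    return out


# Hypothetical example: blobs 2, 5 and 1 pixels wide; only the 5-wide one is "merged text".
page = [
    [1, 1, 0, 1, 1, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1, 1, 1, 0, 1],
]

cleaned = remove_wide_blobs(page, max_width=3)
for row in cleaned:
    print(row)
```

Width alone is a crude test: a wide single letter such as W could be removed and two narrow touching letters could slip through, so treat the limit as something to tune against real scans.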