Removing blobs of unreadable text

Twisty
Posts: 5
Joined: 2010-12-29T21:02:26-07:00
Authentication code: 8675308


Post by Twisty »

I have to prep an image for OCR. The text in the image has variable kerning, so I need to either completely remove any blob that is going to be unreadable or insert enough space between the letters for OCR to be able to read it.

In the attached example, the word brown in caps, the letters BROW, the numbers 789, and again the letters azy in lazy either need to be removed or spaced apart. A little degradation is acceptable as long as the basic shape is still intact.


Ideas?

Image

PS: I am a complete newbie to IM but I catch on quick. ;)
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Authentication code: 1152
Location: Sunnyvale, California, USA


Post by fmw42 »

Unfortunately, as far as I know, IM is a pixel processor and knows nothing about text in an image. So it won't help space out the text better as it does not even know it is text.

If this image is a PDF, then there may be some other tools to help. But if it is just a JPG, PNG, TIFF, or the like, then I doubt there is much hope.

The only IM processing that might help is the morphological operators, such as thinning or erosion, but I have no experience using them on such tightly compressed text. See http://www.imagemagick.org/Usage/morphology/
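
To make the erosion idea concrete, here is a minimal pure-Python sketch, not ImageMagick's implementation. It assumes the image has already been reduced to a binary grid ('#' for ink, '.' for background), and the names `erode`, `CROSS`, and `SAMPLE` are made up for illustration. It shows how one erosion pass with a small cross-shaped element can break a thin bridge between two touching shapes, and why too many passes wipe the shapes out.

```python
# Offsets of a 3x3 cross: the pixel itself plus its 4 direct neighbours.
CROSS = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))

# Two 3x3 "letters" joined by a one-pixel bridge in the middle row.
SAMPLE = [
    ".........",
    ".###.###.",
    ".#######.",
    ".###.###.",
    ".........",
]


def erode(grid, iterations=1):
    """Binary erosion of a grid of '#' (ink) and '.' (background).

    A pixel stays ink only if it and all 4 direct neighbours are ink.
    Anything outside the grid is treated as background.
    """
    h, w = len(grid), len(grid[0])
    current = [list(row) for row in grid]
    for _ in range(iterations):
        nxt = [["."] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if all(
                    0 <= y + dy < h
                    and 0 <= x + dx < w
                    and current[y + dy][x + dx] == "#"
                    for dy, dx in CROSS
                ):
                    nxt[y][x] = "#"
        current = nxt
    return ["".join(row) for row in current]


if __name__ == "__main__":
    print("\n".join(erode(SAMPLE)))
```

After one pass the bridge pixel is gone and the two shapes no longer touch, but each has also shrunk. A second pass removes them entirely, so the number of passes has to be balanced against how much degradation the OCR step can tolerate.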

Post by Twisty »

OK, so that is out. The other method I was thinking of was removing the shadow and highlights, then using morphology to produce a skeleton. So far, though, I have not been able to remove the shadow, as it varies greatly once I resample to bring up the DPI.

Post by fmw42 »

What happens if you threshold to keep just the bright highlights and make everything else either black or transparent? Does that give you enough of the characters to recognize?
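
As a rough illustration of what that threshold would do, here is a small pure-Python sketch. It assumes 8-bit greyscale values and a made-up sample in which the shadow is dark, the letter body is mid-grey, and the highlight is bright; the function name `threshold` and the 80% cutoff are illustrative choices, not values taken from the actual image.

```python
# Hypothetical 8-bit greyscale patch: dark shadow (~20-40),
# mid-grey letter body (~110-125), bright highlight (~230-240).
PIXELS = [
    [20, 40, 230, 240, 30],
    [25, 120, 235, 110, 35],
    [20, 115, 125, 120, 30],
]


def threshold(pixels, percent, max_value=255):
    """Return a binary image: 1 where a pixel exceeds the cutoff, else 0.

    The cutoff is given as a percentage of max_value, so 80 means
    "keep only pixels brighter than 80% of full white".
    """
    cutoff = max_value * percent / 100
    return [[1 if value > cutoff else 0 for value in row] for row in pixels]


if __name__ == "__main__":
    for row in threshold(PIXELS, 80):
        print(row)
```

With a high cutoff only the highlight pixels survive, which is the point of the question: whether those pixels alone trace enough of each character. If the shadow brightness varies after resampling, a single global cutoff like this may not separate it cleanly.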

Post by Twisty »

I haven't been able to successfully threshold it to that point yet, though I am still trying.

I did finally get a skeleton out of it, and though some of the letters are still connected, they are readable, I believe. My final step is to remove the waviness of the text. This file does not show it because it is just a replica of what my boss will be sending me once everything is scanned.

Post by fmw42 »

Have you tried one of the distance metrics in -morphology?
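
For anyone unsure what a distance metric gives you, here is a minimal pure-Python sketch of a Manhattan (city-block) distance transform. It is only an illustration of the concept, not how ImageMagick computes it, and it assumes a binary grid with '#' for ink and '.' for background; the names `manhattan_distance` and `SHAPE` are made up. Each ink pixel is labelled with the number of up/down/left/right steps to the nearest background pixel, so stroke centres get the highest values.

```python
from collections import deque

# A solid 3x5 block of ink surrounded by background.
SHAPE = [
    ".......",
    ".#####.",
    ".#####.",
    ".#####.",
    ".......",
]


def manhattan_distance(grid):
    """Label each pixel with its 4-connected distance to the nearest '.'.

    Background pixels get 0. This sketch does not treat the area outside
    the grid as background, so a grid with no '.' at all stays unlabelled
    (every entry is None).
    """
    h, w = len(grid), len(grid[0])
    dist = [[0 if grid[y][x] == "." else None for x in range(w)] for y in range(h)]
    # Multi-source breadth-first search starting from every background pixel.
    queue = deque((y, x) for y in range(h) for x in range(w) if dist[y][x] == 0)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


if __name__ == "__main__":
    for row in manhattan_distance(SHAPE):
        print(row)
```

Keeping only pixels whose distance is at or above some value leaves the cores of thick strokes and drops thin joins between letters, which is one way such a metric could help with touching characters.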

Post by Twisty »

Distance metrics are over my head at the moment. Studying...

Post by Twisty »

OK, after some studying I think it would take some custom metrics to do what I want, and I don't have the skillz... :(