Cleaning up noise around text
Posted: 2011-05-09T09:14:23-07:00
by mark0978
I've tried -noise radius and -noise geometry and they don't seem to do what I want at all. I have some b&w images (TIFF G4 Fax compression) with lots of noise around the characters. This noise takes the form of pixel blobs that are 1 pixel wide in most cases.
My desire is to do the following 3 steps (in this order):
Whiteout all black pixels that are 1 pixel wide
Whiteout all black pixels that are 1 pixel tall
Whiteout all black pixels that are 1 pixel wide
So the question is, do I have to crack out my C++ skills, or can I do this with imagemagick?
Re: Cleaning up noise around text
Posted: 2011-05-09T10:01:30-07:00
by fmw42
see -morphology close (it would be open if your image was white letters on black, but you need to use close for black letters on white background)
http://www.imagemagick.org/Usage/morphology/#basic
You will have to pick the shape and size of the filter to match the noise you want to remove. For tall noise use a short, wide filter, and vice versa.
Can you post a link to your image? It would help to have that to know if this is a viable approach.
Re: Cleaning up noise around text
Posted: 2011-05-09T16:02:25-07:00
by mark0978
Here is a snippet from the image. I've read quite a bit about morphology, but still haven't managed to come up with something that helps with cleanup without doing more damage than it fixes.
http://www.imagehawk.com/images/cleanup.tif
Re: Cleaning up noise around text
Posted: 2011-05-09T16:07:15-07:00
by fmw42
I don't think anything is going to help as the noise is nearly as big as the thickness of the text characters and the noise is too close to the characters. If they were further away, then perhaps something might be done.
This is not too bad using morphology close with a square shape. But you can try other shapes.
convert cleanup.tif -morphology close square:1 cleanup_close1.gif
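To make it clearer what a close with a small square does to black-on-white text, here is a minimal Python sketch. It is not ImageMagick's implementation. It assumes a binary image stored as nested lists with 1 = white (the foreground for morphology) and 0 = black, a 3x3 neighbourhood standing in for square:1, and neighbours outside the image simply ignored. The function names are made up for illustration.

```python
def _morph(img, pick):
    """Apply pick (max or min) over each pixel's 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Neighbours that fall outside the image are ignored (assumption).
            vals = [img[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))]
            row.append(pick(vals))
        out.append(row)
    return out


def dilate(img):
    """White grows by one pixel, so black specks and thin strokes shrink."""
    return _morph(img, max)


def erode(img):
    """White shrinks by one pixel, so the surviving black grows back."""
    return _morph(img, min)


def close(img):
    """Dilate then erode: black shapes too small to survive the dilate are gone."""
    return erode(dilate(img))
```

A lone black pixel disappears, while a 3x3 black block comes back unchanged. Strokes thinner than the kernel are wiped out too, which is why this can damage thin characters.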
Re: Cleaning up noise around text
Posted: 2011-05-09T16:24:08-07:00
by anthony
Most of the noise is cleaned up using
convert cleanup.tif -morphology close diamond show:
However, Fred is right: it is very hard when the noise is so close to the original text.
However, you specified exactly what you want to do, and adding specific pixels (making them white) can be done using a Thicken morphology operation.
For example remove black pixels that are one pixel wide
convert cleanup.tif -morphology thicken '3x1:1,0,1' show:
remove black pixels that are one pixel high
convert cleanup.tif -morphology thicken '1x3:1,0,1' show:
Or do both, one following the other (two rotated kernels)
convert cleanup.tif -morphology thicken '1x3>:1,0,1' show:
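As a rough illustration of what the 1,0,1 kernels match, here is a small Python sketch. It is only a sketch under assumptions, not how ImageMagick does it internally: the image is a nested list with 1 = white and 0 = black, pixels on the border are never changed, and the function names are hypothetical. A black pixel with white on both sides along the kernel direction is turned white.

```python
def whiten_isolated(img, dx, dy):
    """Whiten black pixels whose two neighbours along (dx, dy) are both white."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # matches are tested on the input, not the output
    for y in range(h):
        for x in range(w):
            ay, ax = y - dy, x - dx
            by, bx = y + dy, x + dx
            # Border pixels lack one neighbour; leave them alone (assumption).
            if not (0 <= ay < h and 0 <= by < h and 0 <= ax < w and 0 <= bx < w):
                continue
            if img[y][x] == 0 and img[ay][ax] == 1 and img[by][bx] == 1:
                out[y][x] = 1
    return out


def remove_one_pixel_wide(img):
    """Counterpart of the '3x1:1,0,1' kernel: left and right neighbours white."""
    return whiten_isolated(img, 1, 0)


def remove_one_pixel_high(img):
    """Counterpart of the '1x3:1,0,1' kernel: upper and lower neighbours white."""
    return whiten_isolated(img, 0, 1)
```

Doing both, one following the other, would be remove_one_pixel_high(remove_one_pixel_wide(img)). Note that a one-pixel-high horizontal stroke is removed by the second pass, so thin parts of characters are at risk as well.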
The real problem however is your source image. It looks like the text was a JPEG that has been thresholded.
However, it looks like the threshold level was wrong, leaving ringing artefacts in the resulting image.
Re: Cleaning up noise around text
Posted: 2011-05-10T16:20:48-07:00
by mark0978
The morphology really does clean up the image for human readability, but when I zoom in, I think square is possibly going to hurt the OCR. However, diamond may actually help quite a bit.
I get an invalid argument for -morphology when I use this command:
convert cleanup.tif -morphology thicken '3x1:1,0,1'
so I'll try to update tonight and see if that will do the trick.
Version: ImageMagick 6.6.9-4 2011-04-01 Q16
http://www.imagemagick.org
Copyright: Copyright (C) 1999-2011 ImageMagick Studio LLC
Features: OpenMP
These images were made on a really expensive ($250,000) scanner. I'm guessing they didn't know how to use it properly. We are working with them to do a better job on future scans (including 300 dpi).
Thanks for the help.
Re: Cleaning up noise around text
Posted: 2011-05-10T16:31:05-07:00
by fmw42
You need to specify an output image!
convert cleanup.tif -morphology thicken '3x1:1,0,1' result.gif
The following as Anthony suggested with diamond rather than square works well.
convert cleanup.tif -morphology close diamond:1 cleanup_close1.gif
Re: Cleaning up noise around text
Posted: 2011-05-10T16:39:12-07:00
by HugoRune
Due to the nature of the ringing noise, all black noise specks are separated by at least 1 pixel from the letters.
One good approach to remove this noise would be to dilate the image so that at least one "seed" part of each letter remains, then erode these seeds while using the original image as a mask; in effect a flood-fill for each letter.
This way the shape of the letters and other large blobs is preserved perfectly, and smaller blobs disappear.
The biggest dilate that still leaves a part of each letter shape seems to be a 3x4 rectangle for the example data; perhaps use something smaller to be on the safe side.
This command first dilates with that 3x4 rectangle, and then erodes until the letters are all whole again
convert cleanup.tif -write MPR:source ^
-morphology close rectangle:3x4 ^
-morphology erode square MPR:source -compose Lighten -composite ^
-morphology erode square MPR:source -composite ^
-morphology erode square MPR:source -composite ^
-morphology erode square MPR:source -composite ^
-morphology erode square MPR:source -composite ^
-morphology erode square MPR:source -composite ^
-morphology erode square MPR:source -composite ^
-morphology erode square MPR:source -composite ^
-morphology erode square MPR:source -composite ^
cleaned.png
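The seed-and-regrow idea can be sketched in a few lines of Python. This is a simplified illustration, not a translation of the command: black pixels are stored as a set of (row, column) tuples, the seed step keeps only black pixels whose whole 3x3 neighbourhood is black (a stand-in for the 3x4 rectangle), and the function names are invented for the example.

```python
def find_seeds(black):
    """Keep black pixels whose entire 3x3 neighbourhood is black (assumed seed rule)."""
    return {(y, x) for (y, x) in black
            if all((y + dy, x + dx) in black
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1))}


def regrow(original_black, seed_black):
    """Grow the seeds one pixel at a time, never leaving the original black area."""
    current = set(seed_black) & set(original_black)
    while True:
        grown = {(y + dy, x + dx)
                 for (y, x) in current
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)}
        masked = grown & original_black  # the original image acts as the mask
        if masked == current:            # nothing new was added: shapes are whole
            return current
        current = masked


def clean(original_black):
    """Drop every black blob that contains no seed; keep the rest pixel for pixel."""
    return regrow(original_black, find_seeds(original_black))
```

A blob that keeps a seed comes back exactly as it was, thin tails included, while a speck that is at least one pixel away from it never gets reached and stays white.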
Re: Cleaning up noise around text
Posted: 2011-05-10T16:44:39-07:00
by fmw42
Very clever approach!
Fred
Re: Cleaning up noise around text
Posted: 2011-05-10T19:19:13-07:00
by anthony
HugoRune wrote:Due to the nature of the ringing noise, all black noise specks are separated by at least 1 pixel from the letters.
One good approach to remove this noise would be to dilate the image so that at least one "seed" part of each letter remains, then erode these seeds while using the original image as a mask; in effect a flood-fill for each letter.
This is basically known as "conditional dilation" (or, for a negated image, "conditional erode"), and while I have not explored this enough to generate examples, it should actually be available RIGHT NOW!
The trick is to use a 'write mask' (the original image) on the 'seed image' and then dilate to infinity.
At this time I only have quick notes on using image write masks in
http://www.imagemagick.org/Usage/maskin ... ping_masks
For morphology I would make sure the write mask was boolean by specifying it using -clip-mask
The clip mask should be white where you do not want the image to be updated.
Hmmm... This is my first attempt at conditional morphology, exactly as I envisaged!
convert cleanup.tif -write MPR:source \
-morphology close rectangle:3x4 \
-clip-mask MPR:source \
-morphology erode:8 square \
+clip-mask cleaned.png
Hey it works!!!
This is the equivalent of HugoRune's conditional erode and gets the same result.
NOTE: do not use an infinite erode (iteration count = -1), as it will never end (for a long time). Morphology does not actually understand write masks, so it sees pixel changes even though they are never written, and as such it never sees a final 'static' image. In IMv7 (yet to fork) use of infinite iterations to 'seed flood fill' may be possible.
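To show why the stopping test matters, here is a small Python sketch of a masked grow with an iteration cap. It is only an illustration of the idea and says nothing about ImageMagick's internals: pixels are (row, column) tuples in sets, grow and conditional_grow are hypothetical names, and max_iters plays the role of the iteration count.

```python
def grow(pixels):
    """One unmasked step: every pixel spreads to its 3x3 neighbourhood."""
    return {(y + dy, x + dx)
            for (y, x) in pixels
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)}


def conditional_grow(seed, mask, max_iters=8):
    """Grow seed inside mask; return the result and the number of steps that changed it."""
    current = set(seed) & set(mask)
    for i in range(1, max_iters + 1):
        proposed = grow(current)   # the unmasked step always proposes new pixels
        written = proposed & mask  # only these actually get written
        if written == current:     # stop test on what was written, not proposed
            return current, i - 1
        current = written
    return current, max_iters
```

If the stop test compared proposed with current instead, it would never succeed, because the unmasked step always wants to add pixels outside the mask. With a fixed count the loop just has to be long enough for the widest shape to fill in.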
Re: Cleaning up noise around text
Posted: 2011-05-10T19:27:33-07:00
by anthony
fmw42 wrote:You need to specify an output image!
convert cleanup.tif -morphology thicken '3x1:1,0,1' result.gif
The following as Anthony suggested with diamond rather than square works well.
convert cleanup.tif -morphology close diamond:1 cleanup_close1.gif
Both of you missed the '>' in my example to remove 1 pixel width and height pixels.
And that is not quite the same as a 'diamond'.
As for the use of the scanner: yes, I'd say they should scan a sample image in a number of ways so that you can figure out what is best. Either that or have them deliver a raw grayscale (color?) scan so you can adjust thresholding and other parameters yourself.
Re: Cleaning up noise around text
Posted: 2011-05-10T19:41:03-07:00
by fmw42
I did not miss it -- just finished your first example to replace show: with an image.
Re: Cleaning up noise around text
Posted: 2011-05-10T23:13:00-07:00
by anthony
Doesn't show: work on a Mac?
Re: Cleaning up noise around text
Posted: 2011-05-11T10:58:05-07:00
by fmw42
anthony wrote:Doesn't show: work on a Mac?
Yes, it does (and your commands show just fine), but the user left off both show: and an output and was complaining of getting errors.
I get an invalid argument for -morphology when I use this command:
convert cleanup.tif -morphology thicken '3x1:1,0,1'
So all I was trying to do was remind him of the need for an output image.
Re: Cleaning up noise around text
Posted: 2011-05-11T18:45:02-07:00
by anthony
Fair enough.... Back to the problem at hand.
mark0978... Are you satisfied with the solutions provided?