Trying to get a better conversion from color to bw

Posted: 2012-07-26T14:49:37-07:00
by izavorin
I need to reprocess a bunch of old images. These are document images scanned at 400 dpi. What I need is fairly simple: convert them to black and white (NOT grayscale) at 300 dpi and save them as TIFF with Group4 compression, preserving quality as much as possible. Quality is measured not by visual appearance but by how good the results are when the converted images are pushed through OCR.

Here is the command I am using now:

convert -filter Lanczos -resample 300x300 -compress Group4 -monochrome input.tif output.tif
The result is average. We noticed that some original images, even though they were scanned from regular white office paper, appear to have a light "textured" background. When such images are converted to bitonal, the texture in large white regions (i.e. regions without text) somehow gets enhanced and turns into large black "clouds". These clouds almost appear lighter than the areas with text, although this is more of an optical illusion since the image has only 2 colors.

Anyway, am I doing the conversion incorrectly? Do you think the problem I described above can be solved easily by some type of prefiltering before the images get converted to b/w?

thanks much!

Re: Trying to get a better conversion from color to bw

Posted: 2012-07-26T15:17:37-07:00
by fmw42
convert -filter Lanczos -resample 300x300 -compress Group4 -monochrome input.tif output.tif
To my knowledge, -resample does not know about -filter. Your input image should also come right after convert for proper IM 6 syntax. In addition, -monochrome will dither your image, and it should probably come before -compress, which should be right before the output.

So you may want to use -resize to change the width and height if desired, then -density to change the dpi, and then -type bilevel and/or -threshold.
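
A minimal sketch of that ordering, untested against the images in question: input first, then the resize, the density, the bilevel conversion, and -compress last before the output. The function name to_bw is hypothetical, and the 75% figure assumes a 400 dpi source going to 300 dpi.

```shell
# Hypothetical helper; 75% assumes a 400 dpi source and a 300 dpi target.
# Arguments: $1 = input file, $2 = output file.
to_bw() {
  convert "$1" \
    -filter Lanczos -resize 75% \
    -units PixelsPerInch -density 300 \
    -type bilevel \
    -compress Group4 "$2"
}
```

It would be called as `to_bw input.tif output.tif`. Whether -type bilevel alone handles a textured background well is something to check on real scans.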

Do you have to change the width and height of the image rather than just keeping it the same and just changing density? For printing only the density matters.


If you can post a link to your input and output examples, perhaps we can test with it and find the appropriate commands

Re: Trying to get a better conversion from color to bw

Posted: 2012-07-26T15:38:32-07:00
by izavorin
fmw42 wrote: If you can post a link to your input and output examples, perhaps we can test with it and find the appropriate commands
Unfortunately, I can't provide any samples (the images are sensitive).
fmw42 wrote:
Do you have to change the width and height of the image rather than just keeping it the same and just changing density?
Well, if the image is 400dpi and I want to change it to 300dpi, I have to change its size to 75% otherwise the actual and specified DPIs will be different. The problem I just discovered, though, is that some originals are actually 300dpi, not 400. Which means that if I specify -resize instead of -resample for the image that is actually 300, the result will be incorrect, right? Is there a way to run ImageMagick on an image to output its DPI to stdout? If so, I might be able to write some type of script, although that would be a bit of a pain.
fmw42 wrote: So you may want to use -resize to change the width and height if desired and then -density to change the dpi and then use -type bilevel and or -threshold.
OK, thanks, I'll try it.
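
On the 75% figure mentioned above: the resize factor is just the target dpi divided by the actual dpi, so a 300 dpi original needs 100% (no resize) while a 400 dpi original needs 75%. A small sketch of that arithmetic, with a hypothetical function name:

```shell
# Hypothetical helper: percentage to pass to -resize so that an image
# scanned at $1 dpi ends up at $2 dpi with the same physical page size.
scale_percent() {
  # Integer arithmetic, rounded to the nearest whole percent.
  echo $(( ($2 * 100 + $1 / 2) / $1 ))
}

scale_percent 400 300   # prints 75
scale_percent 300 300   # prints 100
```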

Re: Trying to get a better conversion from color to bw

Posted: 2012-07-26T18:06:17-07:00
by fmw42
My understanding, though I am no expert on this topic, is that if you are printing, the width and height do not matter. It is only the density in dpi. So you can easily just change the density to whatever suits you and it should be fine.

You can get the density from the string format (http://www.imagemagick.org/script/escape.php)

convert image -format "%x x %y" info:

%x x resolution (density)
%y y resolution (density)
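
For scripting, the value printed by %x may need a little cleanup before it can be compared as a number. The sketch below assumes the output can carry a unit suffix or decimals depending on the ImageMagick version, and that the units are pixels per inch; the function name parse_density and the file name scan.tif are placeholders.

```shell
# Hypothetical helper: reduce the string printed by %x to a whole number.
# Assumption: the value may look like "400 PixelsPerInch" or "400.0".
# The fractional part is truncated, not rounded.
parse_density() {
  d=${1%% *}   # drop anything after the first space
  d=${d%%.*}   # drop a fractional part
  echo "$d"
}

# Usage (requires ImageMagick; scan.tif is a placeholder file name):
#   dpi=$(parse_density "$(convert scan.tif -format "%x" info:)")
#   [ "$dpi" -eq 300 ] && echo "already 300 dpi, no resize needed"
```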

You might want to review:

http://www.imagemagick.org/script/comma ... p#resample
http://www.imagemagick.org/script/comma ... php#resize
http://www.imagemagick.org/script/comma ... hp#density
http://www.imagemagick.org/script/comma ... .php#units
http://www.imagemagick.org/Usage/resize/
http://www.imagemagick.org/Usage/quantize/#two_color

Re: Trying to get a better conversion from color to bw

Posted: 2012-07-27T06:46:48-07:00
by izavorin
fmw42 wrote: My understanding, though I am no expert on this topic, is that if you are printing, the width and height do not matter. It is only the density in dpi. So you can easily just change the density to whatever suits you and it should be fine.
We're not printing anything out. In fact, the work I am doing will hopefully save a few trees by avoiding re-printing and re-scanning of a bunch of poorly scanned images. For a typical OCR engine, having correct DPI settings is essential; if they are out of whack, this can affect OCR quality significantly.

Thanks, I'll take a look

Re: Trying to get a better conversion from color to bw

Posted: 2012-07-27T10:03:04-07:00
by fmw42
izavorin wrote: We're not printing anything out. In fact, the work I am doing will hopefully save a few trees by avoiding re-printing and re-scanning of a bunch of poorly scanned images. For a typical OCR engine, having correct DPI settings is essential; if they are out of whack, this can affect OCR quality significantly.
OK. Sorry for my misunderstanding. To get the best quality, I would then use a combination of -resize, -density and -type bilevel, optionally with -threshold before the -type if scattered grayscale noise remains. If you threshold, then you probably do not need the -type bilevel.
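
Since the right threshold depends on the scans, one way to pick it is to write several candidates and compare them through the OCR engine. This is only a sketch: the function name threshold_sweep and the output naming are made up for illustration, the threshold values are guesses to tune, and the resize and density steps are left out for brevity.

```shell
# Hypothetical helper: write one bilevel Group4 TIFF per threshold value so
# the candidates can be compared by running each through the OCR engine.
# Arguments: $1 = input file, remaining arguments = threshold percentages.
threshold_sweep() {
  in=$1; shift
  for t in "$@"; do
    # Convert to grayscale first so the threshold acts on intensity.
    convert "$in" -colorspace Gray -threshold "${t}%" \
      -compress Group4 "${in%.*}_t${t}.tif"
  done
}
```

For example, `threshold_sweep scan.tif 50 60 70` would write scan_t50.tif, scan_t60.tif and scan_t70.tif next to the input.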

For resizing see the following options:
http://www.imagemagick.org/script/comma ... p#geometry