I need to reprocess a batch of old document images scanned at 400 dpi. What I need is fairly simple: convert them to black and white (NOT grayscale) at 300 dpi and save them as TIFF files with Group 4 compression, preserving quality as much as possible. Quality here is measured not by visual appearance but by how well the converted images perform when they are pushed through OCR.
Here is the command I am using now:
convert -filter Lanczos -resample 300x300 -compress Group4 -monochrome input.tif output.tif
The result is average. We noticed that some of the original images have a light "textured" background, even though they were scanned from regular white office paper. When such images are converted to bitonal, the texture in large white regions (regions without text) somehow gets enhanced and turns into large black "clouds". These clouds can appear almost lighter than the areas with text, although that is largely an optical illusion, since the image has only two colors.
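One possible cause worth ruling out first (an assumption, not a confirmed diagnosis): as far as I know, ImageMagick's `-monochrome` dithers by default (its documentation mentions `+dither` to turn that off), and dithering a light gray, textured background scatters black dots across it, which could be what builds up into the clouds. A minimal sketch to test this replaces `-monochrome` with an explicit grayscale conversion followed by a hard `-threshold`, so every pixel is simply classified as black or white. `to_bilevel_g4` is a hypothetical helper name, and the 50% cutoff is an assumed starting point, not a tuned value.

```shell
#!/bin/sh
# Hypothetical helper: grayscale -> resample -> hard global threshold -> Group 4 TIFF.
# Assumes ImageMagick's "convert" is on PATH and that the input TIFF carries its
# 400 dpi resolution tag (-resample scales relative to that tag).
to_bilevel_g4() {
  src=$1
  dst=$2
  level=${3:-50%}   # pixels brighter than this become white, the rest black

  convert "$src" \
    -colorspace Gray \
    -filter Lanczos -resample 300x300 \
    -threshold "$level" \
    -type Bilevel \
    -compress Group4 \
    "$dst"
}

# Example: to_bilevel_g4 input.tif output.tif 55%
```

Because no dithering is involved, a uniformly light background should come out as solid white; the cutoff can then be nudged up or down on a few sample pages and the OCR results compared.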
Anyway, am I doing the conversion incorrectly? Do you think the problem described above can be solved easily with some kind of prefiltering before the images get converted to black and white?
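One prefiltering option that may be worth trying is a local adaptive threshold (ImageMagick's `-lat`), which compares each pixel with the mean of its own neighbourhood instead of a single global cutoff, so slow variation in background brightness is judged locally. Because `-lat` turns pixels brighter than the local mean plus an offset white, the page is negated first (ink becomes bright) and negated back afterwards. This is a sketch under assumptions: `to_bilevel_lat` is a hypothetical helper name, and the 25x25 window and 10% offset are guessed starting values that would need tuning against OCR results.

```shell
#!/bin/sh
# Hypothetical helper: grayscale -> resample -> local adaptive threshold -> Group 4 TIFF.
# Assumes ImageMagick's "convert" is on PATH and the input carries its dpi tag.
to_bilevel_lat() {
  src=$1
  dst=$2
  win=${3:-25x25}    # neighbourhood size in output (300 dpi) pixels, assumed default
  offset=${4:-10%}   # how far above the local mean a negated pixel must be to count as ink

  # Negate so ink is bright, let -lat keep only pixels clearly above their
  # local mean, then negate back to black ink on white paper.
  convert "$src" \
    -colorspace Gray \
    -filter Lanczos -resample 300x300 \
    -negate -lat "${win}+${offset}" -negate \
    -type Bilevel \
    -compress Group4 \
    "$dst"
}

# Example: to_bilevel_lat input.tif output.tif 31x31 8%
```

A known trade-off of local thresholding is that solid black areas wider than the window can come out hollow, which matters little for thin text strokes but could for large logos or filled boxes.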
Thanks very much!