
cleaning up fax-quality images for legibility -- performance

Posted: 2009-04-17T04:20:13-07:00
by JoeBackward
Hello. ImageMagick is really great, thank you, thank you.

I'm using it to clean up images so people can read them.

It's a medical records application. We receive tons of unsearchable scanned pdf files -- with nothing but 300x300 dpi bitonal images in them. Really, tons of them. Too many to try to OCR them all. Plus we kind of like them unsearchable -- it helps maintain patient confidentiality. We don't have any control over this format.

Sometimes a user wants us to fax a document to somebody. We make a SOAP call to a fax service and transmit the PDF. They resample it to fax resolution with Ghostscript and stuff it into the fax modem.

Here's the problem. The resampling decimates the image ... drops scan rows and scan lines. So, if the text is small on the document, it's illegible on the fax. Not good, especially when it says "give Mrs. Robinson 1MG of morphine an hour" (or was that 100MG, I can't read it) ???!!!

Also, sometimes the images have a stippled background.

We've done this, with ImageMagick, to the images with good effect:

convert -density 400x400 -blur 1.0 -threshold 80% inputfile outputfile

This raises the resolution of the pdfs a bit, but with g4 compression the file size expansion is tolerable. Then, the decimation downsampling for faxing doesn't completely wreck the image. We've also tried this to diminish the stippled background:

convert -density 400x400 -despeckle -blur 1.0 -threshold 80% inputfile outputfile

That works well, but it takes a long time -- too long for our production app. Mrs. Robinson needs her morphine today, not tomorrow! (Half a minute per page is a problem: we need to fax out a few thousand pages in a peak hour.)

So, here's the question for you ImageMagick wizards... Is it possible to do this kind of thing faster?

Is it true that convert -density 400x400 turns the bitonal input image into a 24-bit continuous tone RGB image? Is there a way to tell it to use an 8-bit grayscale image instead? That should make the -blur and -despeckle operations quite a bit faster.

Is there a faster upsample-antialias operation than -blur?

I'd be grateful for any advice. So will Mrs. Robinson, when she recovers. Thanks.
Ollie Jones
Curaspan Health Group

Re: cleaning up fax-quality images for legibility -- performance

Posted: 2009-04-17T05:43:39-07:00
by magick
  • convert -density 400x400 -despeckle -blur 1.0 -threshold 80% inputfile outputfile
ImageMagick is only half of the performance equation. If your input file is PDF, ImageMagick uses the Ghostscript delegate program or library to render the PDF. ImageMagick then despeckles, blurs, and thresholds the image and writes it back out. To improve performance you can always add memory. If you have a multi-core system, recent versions of ImageMagick can run in parallel, which can speed up the process linearly. We have had some posts where a command took 2 minutes on a user's host. We took the same command to our 4-core Xeon system with Redhat Linux and OpenMP-enabled ImageMagick, and it took 20 seconds.
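
To see whether an installed build is OpenMP-enabled, something like the following may help. It is only a sketch: it assumes that `convert -version` prints a "Features:" line naming OpenMP, as recent builds do. Older builds may print no such line at all. The helper name `has_openmp` is made up for illustration.

```shell
#!/bin/sh
# has_openmp: succeed if the version text on stdin advertises OpenMP.
# It reads stdin so the check can be exercised without ImageMagick installed.
# Assumption: the build reports its features on a "Features:" line.
has_openmp() {
    grep -i 'features' | grep -qi 'openmp'
}

if command -v convert >/dev/null 2>&1; then
    if convert -version | has_openmp; then
        echo "OpenMP-enabled build"
    else
        echo "no OpenMP reported; an upgrade or rebuild may be needed"
    fi
else
    echo "convert not found on PATH"
fi
```

If no OpenMP feature is reported, the multi-core speedup described above would not be expected from that build.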

Re: cleaning up fax-quality images for legibility -- performance

Posted: 2009-04-17T08:42:00-07:00
by JoeBackward
This is helpful, I'll upgrade. We have the 6.2.8 64-bit version now.

Still, what about getting the gaussian and despeckle operations to run on only one color channel? Is my assumption that rendering the pdf generates three color channels correct?

Thanks.

Re: cleaning up fax-quality images for legibility -- performance

Posted: 2009-04-17T09:25:40-07:00
by magick
ImageMagick processes the RGB channels. Our design decisions are documented here: http://www.imagemagick.org/script/architecture.php.

Re: cleaning up fax-quality images for legibility -- performance

Posted: 2009-04-22T21:27:24-07:00
by anthony
If you add a -channel setting, many operators will limit themselves to just that one channel, and you get up to a 3 times performance increase, at least on operators that honor the setting. At the end, just before saving, you can then -separate the one channel you have been working on, junking the other channels.

See IM examples, Channels, Masks, and Transparency
http://www.imagemagick.org/Usage/channels/

-channel does apply to -blur. This operator is a 2-pass, 1-dimensional blur, producing a close equivalent to Gaussian, and is a lot faster than a 2-dimensional -gaussian blur.
See Blur vs Gaussian Operator
http://www.imagemagick.org/Usage/convol ... r_gaussian
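
Applied to the command from the first post, the -channel suggestion might look roughly like this. It is a sketch, not a tested recipe: the file names are placeholders, the helper name `channel_args` is made up, and whether -despeckle actually honors -channel is an assumption to check on your own build (-blur does, as noted above).

```shell
#!/bin/sh
# channel_args: print the options that confine work to the red channel and
# then keep only that channel as the result.
# Assumption: -despeckle may still process every channel on some builds.
channel_args() {
    printf '%s' '-channel R -despeckle -blur 1.0 -threshold 80% -separate'
}

in=${1:-input.pdf}     # hypothetical input file
out=${2:-output.tif}   # hypothetical output file

if command -v convert >/dev/null 2>&1 && [ -f "$in" ]; then
    # The options follow the input file so they act on the rendered pages.
    # $(channel_args) is deliberately unquoted so it splits into words.
    convert -density 400x400 "$in" $(channel_args) "$out"
else
    echo "would run: convert -density 400x400 $in $(channel_args) $out"
fi
```

Since the scans are bitonal, all three channels should hold the same data after rendering, so keeping only the red one should lose nothing. Timing this against the original command on a few representative pages would show whether the gain is worth it.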