cleaning up fax-quality images for legibility -- performance

Post by JoeBackward »

Hello. ImageMagick is really great, thank you, thank you.

I'm using it to clean up images so people can read them.

It's a medical records application. We receive tons of unsearchable scanned pdf files -- with nothing but 300x300 dpi bitonal images in them. Really, tons of them. Too many to try to OCR them all. Plus we kind of like them unsearchable -- it helps maintain patient confidentiality. We don't have any control over this format.

Sometimes a user wants us to fax a document to somebody. We make a SOAP call to a fax service and transmit the PDF. They resample it to fax resolution with Ghostscript and stuff it into the fax modem.

Here's the problem. The resampling decimates the image ... it drops pixel columns and scan lines. So, if the text is small on the document, it's illegible on the fax. Not good, especially when it says "give Mrs. Robinson 1MG of morphine an hour" (or was that 100MG, I can't read it) ???!!!

Also, sometimes the images have a stippled background.

We've done this, with ImageMagick, to the images with good effect:

convert -density 400x400 -blur 1.0 -threshold 80% inputfile outputfile

This raises the resolution of the pdfs a bit, but with g4 compression the file size expansion is tolerable. Then, the decimation downsampling for faxing doesn't completely wreck the image. We've also tried this to diminish the stippled background:

convert -density 400x400 -despeckle -blur 1.0 -threshold 80% inputfile outputfile

That works well, but it takes a long time -- too long for our production app. Mrs. Robinson needs her morphine today, not tomorrow! (Half a minute per page is a problem: we need to fax out a few thousand pages in a peak hour.)

So, here's the question for you ImageMagick wizards... Is it possible to do this kind of thing faster?

Is it true that convert -density 400x400 turns the bitonal input image into a 24-bit continuous tone RGB image? Is there a way to tell it to use an 8-bit grayscale image instead? That should make the -blur and -despeckle operations quite a bit faster.

Is there a faster upsample-antialias operation than -blur?

I'd be grateful for any advice. So will Mrs. Robinson, when she recovers. Thanks.
Ollie Jones
Curaspan Health Group

Post by magick »

  • convert -density 400x400 -despeckle -blur 1.0 -threshold 80% inputfile outputfile
ImageMagick is only half of the performance equation. If your input file is a PDF, ImageMagick uses the Ghostscript delegate program or library to render it; ImageMagick then despeckles, blurs, and thresholds the image and writes it back out. To improve performance you can always add memory. If you have a multi-core system, recent versions of ImageMagick can run in parallel, which can speed up the process roughly linearly with the number of cores. We have had posts where a command took 2 minutes on a user's host. We ran the same command on our 4-core Xeon system with Red Hat Linux and an OpenMP-enabled ImageMagick, and it took 20 seconds.
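
A minimal sketch of how one might check for OpenMP support and cap the thread count, assuming a build recent enough to print a Features line from `convert -version` and to accept `-limit thread`. The function names, the file arguments, and the choice of thread count are hypothetical. The operators are placed after the input file, which is the ordering current versions expect.

```shell
#!/bin/sh
# Succeeds if this ImageMagick build lists OpenMP among its features.
has_openmp() {
    convert -version | grep -q 'OpenMP'
}

# Run the cleanup from this thread with an explicit thread cap.
# $1 = thread count, $2 = input PDF, $3 = output file (hypothetical names).
cleanup_with_threads() {
    convert -limit thread "$1" -density 400x400 "$2" \
        -despeckle -blur 1.0 -threshold 80% "$3"
}
```

For example, `has_openmp && cleanup_with_threads 4 input.pdf output.tif`. Setting the `MAGICK_THREAD_LIMIT` environment variable is another way to apply the same cap.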

Post by JoeBackward »

This is helpful, I'll upgrade. We have the 6.2.8 64-bit version now.

Still, what about getting the gaussian and despeckle operations to run on only one color channel? Is my assumption that rendering the pdf generates three color channels correct?

Thanks.

Post by magick »

ImageMagick processes the RGB channels. Our design decisions are documented here: http://www.imagemagick.org/script/architecture.php.
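
One way to see what the renderer actually hands back, rather than guess, might be to ask `identify` for the storage class, colorspace, and bit depth of the first rendered page. This is only a sketch: the function name and file argument are hypothetical, and what it reports will depend on the ImageMagick and Ghostscript versions installed.

```shell
#!/bin/sh
# Print class/colorspace (%r) and bit depth (%z) of the first rendered page.
# $1 = input PDF (hypothetical name); the [0] suffix selects the first page.
describe_render() {
    identify -density 400x400 -format '%r %z-bit\n' "${1}[0]"
}
```

Called as `describe_render input.pdf`, it prints one line describing the in-memory image after rendering at 400 dpi.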

Post by anthony »

If you add a -channel setting, many operators will limit themselves to just that one channel and run about three times faster, at least those operators that honor the setting. At the end, just before saving, you can -separate the one channel you have been working on, junking the other channels.

See IM examples, Channels, Masks, and Transparency
http://www.imagemagick.org/Usage/channels/
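
A sketch of how the suggestion above might be applied to the command from this thread. It assumes the red channel is as good as any for a bitonal scan; the function name and file arguments are hypothetical. Whether -despeckle honors -channel depends on the ImageMagick version, so that part of the speedup is an assumption to verify by timing.

```shell
#!/bin/sh
# Work on the red channel only, then keep just that channel in the output.
# $1 = input PDF, $2 = output file (hypothetical names).
clean_one_channel() {
    convert -density 400x400 "$1" \
        -channel R -despeckle -blur 1.0 -threshold 80% \
        -separate "$2"
}
```

With -channel R still in effect, -separate writes a single grayscale image taken from the red channel, so the untouched green and blue data never reach the output file.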

-channel does apply to -blur. This operator is a two-pass, one-dimensional blur that produces a close equivalent to a Gaussian and is a lot faster than the two-dimensional -gaussian blur.
See Blur vs Gaussian Operator
http://www.imagemagick.org/Usage/convol ... r_gaussian
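
To compare the two operators on your own pages, something like the following might do. It is a sketch with a hypothetical function name and an assumed sigma of 1 (the `0x1` argument lets ImageMagick choose the radius). The reason to expect a difference is that two one-dimensional passes touch about 2(2r+1) pixels per output pixel, while a full two-dimensional kernel touches (2r+1)^2.

```shell
#!/bin/sh
# Apply one blur operator so the two variants can be timed side by side.
# $1 = operator name (blur or gaussian-blur), $2 = input image, $3 = output image.
blur_with() {
    convert "$2" "-$1" 0x1 "$3"
}
```

For example, run `blur_with blur page.png a.png` and `blur_with gaussian-blur page.png b.png` under your shell's `time` and compare the wall-clock figures.
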
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/