Hello. ImageMagick is really great, thank you, thank you.
I'm using it to clean up images so people can read them.
It's a medical records application. We receive tons of unsearchable scanned pdf files -- with nothing but 300x300 dpi bitonal images in them. Really, tons of them. Too many to try to OCR them all. Plus we kind of like them unsearchable -- it helps maintain patient confidentiality. We don't have any control over this format.
Sometimes a user wants us to fax a document to somebody. We make a SOAP call to a fax service and transmit the PDF. They resample it to fax resolution with ghostscript and stuff it into the fax modem.
Here's the problem. The resampling decimates the image: it simply drops whole scan rows and columns. So, if the text on the document is small, it's illegible on the fax. Not good, especially when it says "give Mrs. Robinson 1MG of morphine an hour" (or was that 100MG? I can't read it) ???!!!
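A toy model makes the failure easy to see. The sketch below assumes the resampler simply keeps the nearest source row for each output row (ghostscript's real resampling may well differ), and the function names and the 300-to-100 dpi ratio are hypothetical, picked only for round numbers. At 3:1 a stroke one pixel thick can fall entirely between the kept rows, while a stroke three pixels thick always keeps at least one row.

```bash
#!/usr/bin/env bash
# Toy model (an assumption, not ghostscript's actual algorithm): nearest-row
# decimation keeps source row floor(d * src_dpi / dst_dpi) for output row d.

# Print the source rows that survive when `rows` rows at src_dpi are
# reduced to dst_dpi.
surviving_rows() {
  local src_dpi=$1 dst_dpi=$2 rows=$3 d
  local dst_rows=$(( rows * dst_dpi / src_dpi ))
  local -a kept=()
  for (( d = 0; d < dst_rows; d++ )); do
    kept+=( $(( d * src_dpi / dst_dpi )) )
  done
  echo "${kept[*]}"
}

# Succeed (status 0) if a horizontal stroke covering source rows
# first..last keeps at least one row after decimation.
stroke_survives() {
  local src_dpi=$1 dst_dpi=$2 first=$3 last=$4 d r
  for (( d = 0; ; d++ )); do
    r=$(( d * src_dpi / dst_dpi ))
    (( r > last )) && return 1
    (( r >= first )) && return 0
  done
}

surviving_rows 300 100 9              # prints: 0 3 6
stroke_survives 300 100 1 1 || echo "1-pixel stroke on row 1: lost"
stroke_survives 300 100 1 3 && echo "3-pixel stroke on rows 1-3: kept"
```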
Also, sometimes the images have a stippled background.
We've done this, with ImageMagick, to the images with good effect:
convert -density 400x400 -blur 1.0 -threshold 80% inputfile outputfile
This raises the resolution of the PDFs a bit, but with G4 compression the growth in file size is tolerable, and the decimating downsample for faxing no longer completely wrecks the image. We've also tried this to diminish the stippled background:
convert -density 400x400 -despeckle -blur 1.0 -threshold 80% inputfile outputfile
That works well, but it takes a long time -- too long for our production app. Mrs. Robinson needs her morphine today, not tomorrow! (Half a minute per page is a problem: we need to fax out a few thousand pages in a peak hour.)
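To put rough numbers on the peak-hour problem: with a hypothetical 3,000 pages per peak hour (standing in for "a few thousand") at 30 seconds each, the work adds up to 90,000 seconds of conversion per hour, i.e. about 25 conversions running in parallel for the whole hour. The helper below is a back-of-envelope sketch that assumes a fixed cost per page and no contention between workers, which is optimistic on shared CPUs.

```bash
#!/usr/bin/env bash
# Back-of-envelope sizing (assumption: fixed cost per page, no contention
# between workers): how many conversions must run concurrently to keep up.
workers_needed() {
  local pages_per_hour=$1 seconds_per_page=$2
  # Ceiling division, so a partial worker counts as a whole one.
  echo $(( (pages_per_hour * seconds_per_page + 3599) / 3600 ))
}

workers_needed 3000 30   # prints: 25
workers_needed 3000 3    # prints: 3
```

Under this model, cutting the per-page time tenfold takes the same peak from about 25 concurrent conversions down to 3.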
So, here's the question for you ImageMagick wizards... Is it possible to do this kind of thing faster?
Is it true that convert -density 400x400 turns the bitonal input image into a 24-bit continuous tone RGB image? Is there a way to tell it to use an 8-bit grayscale image instead? That should make the -blur and -despeckle operations quite a bit faster.
Is there a faster upsample-antialias operation than -blur?
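For reference, here is what the -blur / -threshold 80% pair does to thin dark strokes, modelled on a single scan line of 8-bit gray (255 = white paper, 0 = black ink). The [1 2 1]/4 kernel is an assumed stand-in for ImageMagick's Gaussian blur, not its actual kernel, and the function name is hypothetical. Because the threshold sits close to white, any pixel the blur has darkened even slightly ends up at or below it and turns black, so a one-pixel stroke comes out three pixels wide. Any faster replacement for -blur would need to produce the same thickening.

```bash
#!/usr/bin/env bash
# Toy 1-D model of "-blur ... -threshold 80%" on one scan line of 8-bit gray.
# Assumption: a [1 2 1]/4 kernel stands in for ImageMagick's Gaussian blur;
# edge pixels are replicated.
blur_threshold_row() {
  local -a px=( "$@" )
  local -a out=()
  local n=${#px[@]} i l r blurred
  local limit=$(( 255 * 80 / 100 ))      # 80% of full scale = 204
  for (( i = 0; i < n; i++ )); do
    l=$(( i > 0 ? i - 1 : 0 ))
    r=$(( i < n - 1 ? i + 1 : n - 1 ))
    blurred=$(( (px[l] + 2 * px[i] + px[r]) / 4 ))
    # -threshold: values above the limit become white, the rest black.
    out+=( $(( blurred > limit ? 255 : 0 )) )
  done
  echo "${out[*]}"
}

blur_threshold_row 255 255 0 255 255   # prints: 255 0 0 0 255
```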
I'd be grateful for any advice. So will Mrs. Robinson, when she recovers. Thanks.
Ollie Jones
Curaspan Health Group
cleaning up fax-quality images for legibility -- performance
Re: cleaning up fax-quality images for legibility -- performance
- convert -density 400x400 -despeckle -blur 1.0 -threshold 80% inputfile outputfile
Re: cleaning up fax-quality images for legibility -- performance
This is helpful, I'll upgrade. We have the 6.2.8 64-bit version now.
Still, what about getting the gaussian and despeckle operations to run on only one color channel? Is my assumption that rendering the pdf generates three color channels correct?
Thanks.
Re: cleaning up fax-quality images for legibility -- performance
ImageMagick processes the RGB channels. Our design decisions are documented here: http://www.imagemagick.org/script/architecture.php.
- anthony
- Joined: 2004-05-31T19:27:03-07:00
- Location: Brisbane, Australia
Re: cleaning up fax-quality images for legibility -- performance
If you add a -channel setting, many operators will limit themselves to just that one channel and so run about 3 times faster (at least the operators that respect that limit). At the end, just before saving, you can then -separate the one channel you have been working on, junking the other channels.
See IM examples, Channels, Masks, and Transparency
http://www.imagemagick.org/Usage/channels/
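Applied to the cleanup command from the original post, that advice might look like the sketch below. Assumptions: ImageMagick 6 `convert` is on the PATH; the red channel is the one kept (for a grayscale render R, G and B should hold the same values, so any one would do); and `-compress Group4` is added to match the G4 output mentioned in the question. The `fax_cleanup` function and the `DRY_RUN` variable are hypothetical conveniences. -despeckle is left out because this thread doesn't establish whether it honours -channel.

```bash
#!/usr/bin/env bash
# Sketch of the -channel / -separate advice applied to the cleanup command.
# Assumptions: ImageMagick 6 "convert" on the PATH, red channel kept,
# Group4 output. Set DRY_RUN=1 to print the command instead of running it.
fax_cleanup() {
  local input=$1 output=$2
  local -a cmd=(
    convert -density 400x400 "$input"   # density before input sets render dpi
    -channel R          # following operators touch only the red channel
    -blur 1.0
    -threshold 80%
    -separate           # keep just the red channel, as one grayscale image
    +channel            # reset the channel setting
    -compress Group4    # bilevel G4 output keeps the files small
    "$output"
  )
  if [[ -n ${DRY_RUN:-} ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `fax_cleanup scan.pdf scan-fax.pdf`. Timing it with and without the `-channel R` line on a few real pages would show how much of the estimated 3x speed-up materialises on a given ImageMagick version.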
-channel does apply to -blur. This operator is a 2-pass, 1-dimensional blur that produces a close equivalent to a Gaussian, and it is a lot faster than a 2-dimensional -gaussian blur.
See Blur vs Gaussian Operator
http://www.imagemagick.org/Usage/convol ... r_gaussian
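A rough way to see why the two-pass -blur wins, and why the saving compounds with -channel, is to count multiply-adds per pixel: a 2-D kernel `width` pixels across touches width×width pixels per channel, while two 1-D passes touch 2×width. The sketch below is a simplified cost model (an assumption: it ignores memory traffic, caching and ImageMagick's actual kernel sizes), and the width of 7 in the example is arbitrary.

```bash
#!/usr/bin/env bash
# Simplified cost model (assumption: counts multiply-adds only, ignores
# memory traffic and caching) for blurring with a kernel `width` pixels wide.

# One 2-D convolution, as with -gaussian: width*width taps per channel.
cost_2d() {
  local width=$1 channels=$2
  echo $(( channels * width * width ))
}

# Two 1-D passes, as with -blur: width taps per pass, per channel.
cost_separable() {
  local width=$1 channels=$2
  echo $(( channels * 2 * width ))
}

cost_2d 7 3          # three channels, one 2-D pass: prints 147
cost_separable 7 1   # one channel, two 1-D passes: prints 14
```

Under this model the one-channel separable blur does roughly a tenth of the arithmetic of a three-channel 2-D blur; real timings will differ, so it is worth measuring on actual pages.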
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/