cleaning up fax-quality images for legibility -- performance
Posted: 2009-04-17T04:20:13-07:00
Hello. ImageMagick is really great, thank you, thank you.
I'm using it to clean up images so people can read them.
It's a medical records application. We receive tons of unsearchable scanned PDF files -- with nothing but 300x300 dpi bitonal images in them. Really, tons of them. Too many to try to OCR them all. Plus we kind of like them unsearchable -- it helps maintain patient confidentiality. We don't have any control over this format.
Sometimes a user wants us to fax a document to somebody. We make a SOAP call to a fax service and transmit the PDF. They resample it to fax resolution with Ghostscript and stuff it into the fax modem.
Here's the problem. The resampling decimates the image: it simply drops pixel columns and scan lines. So, if the text is small on the document, it's illegible on the fax. Not good, especially when it says "give Mrs. Robinson 1MG of morphine an hour" (or was that 100MG, I can't read it) ???!!!
Also, sometimes the images have a stippled background.
We've done this, with ImageMagick, to the images with good effect:
convert -density 400x400 -blur 1.0 -threshold 80% inputfile outputfile
This raises the resolution of the PDFs a bit, but with Group 4 (G4) compression the file size expansion is tolerable. Then, the decimation downsampling for faxing doesn't completely wreck the image. We've also tried this to diminish the stippled background:
convert -density 400x400 -despeckle -blur 1.0 -threshold 80% inputfile outputfile
That works well, but it takes a long time -- too long for our production app. Mrs. Robinson needs her morphine today, not tomorrow! (Half a minute per page is a problem: we need to fax out a few thousand pages in a peak hour.)
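To put rough numbers on that, here is a back-of-the-envelope calculation. The 3000 pages per hour is only an assumed stand-in for "a few thousand", and the variable names are made up for illustration:

```shell
#!/bin/sh
# Rough capacity estimate. pages_per_hour is an ASSUMED figure standing in
# for "a few thousand"; seconds_per_page is the half minute mentioned above.
pages_per_hour=3000
seconds_per_page=30

# Total processing time that has to fit into one peak hour.
cpu_seconds=$((pages_per_hour * seconds_per_page))

# Number of convert processes that would have to run in parallel for the
# whole hour (integer ceiling of cpu_seconds / 3600).
workers=$(( (cpu_seconds + 3599) / 3600 ))

echo "CPU-seconds needed per peak hour: $cpu_seconds"
echo "Parallel convert processes needed: $workers"
```

With those assumed numbers that is 90000 CPU-seconds, or 25 convert processes running flat out for the whole hour.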
So, here's the question for you ImageMagick wizards... Is it possible to do this kind of thing faster?
Is it true that convert -density 400x400 turns the bitonal input image into a 24-bit continuous tone RGB image? Is there a way to tell it to use an 8-bit grayscale image instead? That should make the -blur and -despeckle operations quite a bit faster.
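What I have in mind is something like the sketch below. It is untested, and whether -colorspace Gray actually reduces the work done by -blur is exactly what I don't know. The file names are placeholders, and I've put the input file ahead of the operators, which as far as I can tell is the order the ImageMagick documentation describes:

```shell
#!/bin/sh
# Untested sketch: ask for a grayscale working image before blurring.
# "inputfile" and "outputfile" are placeholders, as in the commands above.
in="inputfile"
out="outputfile"

# ASSUMPTION: converting to gray first makes the later operators cheaper.
gray_opts="-colorspace Gray"

# Only run if ImageMagick and the input file are actually present.
if command -v convert >/dev/null 2>&1 && [ -f "$in" ]; then
    # $gray_opts is deliberately unquoted so it splits into separate arguments.
    convert -density 400x400 "$in" $gray_opts -blur 1.0 -threshold 80% "$out"
else
    echo "skipping: convert or $in not found"
fi
```

The blur and threshold settings are the same ones we use now, so any timing difference should come from the colorspace change alone.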
Is there a faster upsample-antialias operation than -blur?
I'd be grateful for any advice. So will Mrs. Robinson, when she recovers. Thanks.
Ollie Jones
Curaspan Health Group