
Reducing Noise in scanned documents

Posted: 2011-02-25T02:27:10-07:00
by olaeblue
I am scanning some old documents and need to pre-process them before OCR. In Paint.NET I found a function called Reduce Noise; with a radius of 200 and a strength of 1 (strength ranges from 0 to 1) it takes me from image 1 to image 2, basically cleaning up the grey background to white. (These are low-res versions of the large TIFF files I actually have, but they give the general impression.)

Image http://www.yrc.org.uk/data/files/downloads/orig.jpg
Image 1

Image http://www.yrc.org.uk/data/files/downloads/clean.jpg
Image 2

I have about 600 of these to do! What is the equivalent function in ImageMagick? -despeckle doesn't seem to do it, and I think -blur might be right, but I can't get the right effect.

Any help gratefully received.

Re: Reducing Noise in scanned documents

Posted: 2011-02-25T09:08:41-07:00
by el_supremo
To clean up grayscale scans of documents, I've been using this command - although I wasn't then running them through an OCR program:

Code:

convert Scan10007.bmp -threshold 65% -deskew 40% scan_7.png
You would have to play with the threshold value to suit your images; it converts the image to black and white. The output file format should not be JPG, because JPEG compression introduces artifacts that will probably confuse the OCR engine.
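
Since there are about 600 scans to do, the command above could be wrapped in a loop. This is only a rough sketch, assuming a POSIX-style shell (bash or similar) and that all the scans are .bmp files in one directory. The function name clean_scans, its arguments, and the CONVERT override variable are made up for illustration; they are not part of ImageMagick.

```shell
# Hypothetical helper: apply the threshold + deskew command to every .bmp in a directory.
# Usage: clean_scans SRC_DIR OUT_DIR [THRESHOLD]
# CONVERT is an assumed override for the ImageMagick binary (defaults to "convert").
clean_scans() {
    src_dir=$1
    out_dir=$2
    threshold=${3:-65%}          # same starting value as the single-file command

    mkdir -p "$out_dir"

    for f in "$src_dir"/*.bmp; do
        [ -e "$f" ] || continue  # skip the literal pattern when no files match
        base=$(basename "$f" .bmp)
        # Write PNG rather than JPG to avoid compression artifacts
        ${CONVERT:-convert} "$f" -threshold "$threshold" -deskew 40% "$out_dir/$base.png"
    done
}
```

For example, `clean_scans scans cleaned 60%` would write cleaned/NAME.png for each scans/NAME.bmp. Running it first with CONVERT=echo prints the commands instead of executing them, which is a cheap way to check the file names before touching 600 images.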

Pete

Re: Reducing Noise in scanned documents

Posted: 2011-02-25T10:53:45-07:00
by fmw42
You might take a look at my bash script textcleaner at the link below.

Re: Reducing Noise in scanned documents

Posted: 2011-03-01T15:46:58-07:00
by olaeblue
Thanks. I couldn't use the bash script as I'm a Windows person, but the explanation of what it does allowed me to build a suitable command line. :D