I am scanning some old documents and need to pre-process them before OCR. In paint.net I found a function called Reduce Noise which, with a radius of 200 and a strength of 1 (the strength range is 0 to 1), takes me from image 1 to image 2, basically cleaning up the grey background to white. (These are low-res versions of the large TIF files I actually have, but they give the general impression.)
http://www.yrc.org.uk/data/files/downloads/orig.jpg
Image 1
http://www.yrc.org.uk/data/files/downloads/clean.jpg
Image 2
I have about 600 of these to do! What is the equivalent function in ImageMagick? -despeckle doesn't seem to do it, and I think -blur might be right, but I can't get the right effect.
Any help gratefully received.
Reducing Noise in scanned documents
- Posts: 1015
- Joined: 2005-03-21T21:16:57-07:00
Re: Reducing Noise in scanned documents
To clean up grayscale scans of documents, I've been using this command - although I wasn't then running them through an OCR program:
Code:
convert Scan10007.bmp -threshold 65% -deskew 40% scan_7.png
You would have to play with the threshold value to suit your images - it converts the image to black and white. The output file format should not be JPG because it will introduce compression artifacts which will probably confuse the OCR engine.
Pete
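Since there are about 600 scans to process, the single-file command can be wrapped in a loop. The following is only a sketch under some assumptions: a Unix-like shell (on Windows that would mean something like Cygwin or WSL, or rewriting the loop as a cmd `for`), ImageMagick's `convert` on the PATH, and scans stored as `*.tif` in one directory. The names `clean_scans`, `run`, `THRESHOLD` and `DRY_RUN` are made up for illustration and are not ImageMagick features.

```shell
#!/bin/sh
# Sketch: apply the threshold + deskew command to every .tif in a directory.
# Assumes ImageMagick's "convert" is installed; all names here are illustrative.

# Print the command instead of running it when DRY_RUN is set (for checking).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# clean_scans SRC_DIR OUT_DIR
# Writes PNG output, since JPG compression artifacts may confuse the OCR engine.
clean_scans() {
    src_dir=$1
    out_dir=$2
    threshold=${THRESHOLD:-65%}      # tune this per batch of scans
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.tif; do
        [ -e "$f" ] || continue      # the glob matched nothing
        name=$(basename "$f" .tif)
        run convert "$f" -threshold "$threshold" -deskew 40% "$out_dir/$name.png"
    done
}
```

Calling `DRY_RUN=1; clean_scans scans cleaned` first prints the commands without touching any files, which makes it easy to check the file names and threshold before committing to the full run.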
Sorry, my ISP shut down all personal webspace, so my MagickWand Examples in C is offline.
See my message in this topic for a link to a zip of all the files.
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Location: Sunnyvale, California, USA
Re: Reducing Noise in scanned documents
You might take a look at my bash script, textcleaner, at the link below.
Re: Reducing Noise in scanned documents
Thanks. I couldn't use the bash script as I'm a Windows person, but the explanation of what it does allowed me to build a suitable command line.
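The command line that was eventually built is not shown in the thread. For readers after a starting point, one commonly used ImageMagick approach to turning a grey page background white is to divide the image by a heavily blurred copy of itself. This is a different technique from the poster's unknown command and is not claimed to be what textcleaner does internally. The sketch below assumes `convert` is on the PATH; `flatten_background`, `run`, `DRY_RUN` and the default blur sigma of 20 are illustrative choices that would need tuning on real scans.

```shell
#!/bin/sh
# Sketch: whiten an uneven grey background by dividing the image by a
# blurred copy of itself. Names and the default sigma are illustrative.

# Print the command instead of running it when DRY_RUN is set (for checking).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# flatten_background SRC DST [SIGMA]
# A larger SIGMA estimates the background over a wider area.
flatten_background() {
    src=$1
    dst=$2
    sigma=${3:-20}
    run convert "$src" -colorspace Gray \
        \( +clone -blur "0x$sigma" \) \
        -compose Divide_Src -composite "$dst"
}
```

The blurred clone acts as an estimate of the local background, so dividing by it pushes background pixels towards white while the darker text stays dark. A `-threshold` step could follow if the OCR engine prefers pure black and white.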