help with image processing - remove noise, rotate, crop

Posted: 2016-11-09T07:56:16-07:00
by hristopeev
Hello,

I have a large number of scanned images from a book of exam questions. Examples:
https://dl.dropboxusercontent.com/u/639 ... ge-236.png
https://dl.dropboxusercontent.com/u/639 ... ge-237.png
https://dl.dropboxusercontent.com/u/639 ... ge-238.png
https://dl.dropboxusercontent.com/u/639 ... ge-329.png
https://dl.dropboxusercontent.com/u/639 ... ge-240.png
https://dl.dropboxusercontent.com/u/639 ... ge-239.png

What I am trying to achieve is the following:
1. Clean the scanner noise - I mean the little dots and dashes around the text
2. Rotate the image - the middle vertical line should be perpendicular to the image's top and bottom edges
3. Crop each question into a separate image
4. Remove the surrounding white space from each individual image

I managed to partially achieve step 1 (cleaning the scanner noise) using the following commands:

convert source.png -write MPR:source -morphology close rectangle:3x2 result.png
convert source.png -write MPR:source -morphology close diamond result.png

Both give reasonably satisfactory results. The problem appears when there is a drawing on the page: the cleaning also removes some pixels from the drawing. If someone can recommend a better method for cleaning the noise, that would be great.
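
Since there are a lot of pages, here is a minimal sketch of how the first command could be run over a whole folder. The function name clean_pages and the directory names are made up for illustration. I left out -write MPR:source because nothing in this sketch reads that register back. It does not solve the problem with drawings, it only batches the same clean-up.

```shell
#!/usr/bin/env bash
# Sketch only: apply the rectangle:3x2 "close" clean-up to every PNG in a folder.
# clean_pages, src_dir and dst_dir are hypothetical names, not part of ImageMagick.
clean_pages() {
  local src_dir=$1 dst_dir=$2 f
  mkdir -p "$dst_dir"
  for f in "$src_dir"/*.png; do
    [ -e "$f" ] || continue   # skip the literal pattern when no PNG matches
    convert "$f" -morphology close rectangle:3x2 "$dst_dir/$(basename "$f")"
  done
}

# Usage (assumed directory names):
#   clean_pages scans cleaned
```

The originals are left untouched, so the diamond kernel can be compared by swapping the kernel name and writing to a second output folder.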

For step 2 (rotating the image) I tried http://fmwconcepts.com/imagemagick/unrotate/index.php from Fred's scripts, but I didn't manage to make it work. Can someone advise how I can approach this?
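
For reference, ImageMagick also has a built-in -deskew option, which is a different technique from Fred's unrotate script. Below is a minimal sketch of how it could be called. It assumes the skew is small and that the scanner noise does not throw off the angle estimate, and I have not verified it on the pages linked above. deskew_page is a hypothetical helper name, and 40% is the threshold the ImageMagick documentation suggests as a starting point.

```shell
#!/usr/bin/env bash
# Sketch only: straighten a scanned page with ImageMagick's built-in -deskew.
# deskew_page is a hypothetical helper name.
deskew_page() {
  local src=$1 dst=$2
  # -background white fills the corners exposed by the rotation;
  # +repage resets the virtual canvas after the rotation.
  convert "$src" -background white -deskew 40% +repage "$dst"
}

# Usage (assumed file names):
#   deskew_page source.png deskewed.png
```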

For step 3 (cropping each question into a separate image) I am not even sure this is possible with ImageMagick alone. Maybe I will need some OCR that detects where each question starts and ends, and then, having these coordinates, I can use ImageMagick to crop the image into several pieces? Any suggestions for tools/libraries will be highly appreciated.
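
To show what I mean by the second half of that idea, here is a minimal sketch of the cropping part, assuming the coordinates of a question are already known from some other tool. crop_region is a hypothetical helper name and the numbers in the usage line are made up. Finding the coordinates is the part I do not know how to do.

```shell
#!/usr/bin/env bash
# Sketch only: cut one rectangle out of a page, given its size and top-left offset.
# crop_region is a hypothetical helper name.
crop_region() {
  local src=$1 dst=$2 width=$3 height=$4 x=$5 y=$6
  # Geometry is WIDTHxHEIGHT+X+Y; +repage drops the original page offset.
  convert "$src" -crop "${width}x${height}+${x}+${y}" +repage "$dst"
}

# Usage (made-up coordinates and file names):
#   crop_region page.png question-1.png 900 250 60 140
```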

Step 4 is clear; I have done it before.

I am using ImageMagick's command-line tool convert on macOS Sierra, version:
Version: ImageMagick 6.9.6-3 Q16 x86_64 2016-10-31 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2016 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC Modules
Delegates (built-in): bzlib freetype jng jpeg ltdl lzma png tiff xml zlib


If you need more information about the tools I am using or about the images, I am happy to provide it.

Any help or pointers toward achieving this output would be really appreciated.

Thanks!