This topic was moved from the Users section.
I have a large number of scanned images from a book of exam questions. Examples:
https://dl.dropboxusercontent.com/u/639 ... ge-236.png
https://dl.dropboxusercontent.com/u/639 ... ge-237.png
https://dl.dropboxusercontent.com/u/639 ... ge-238.png
https://dl.dropboxusercontent.com/u/639 ... ge-329.png
https://dl.dropboxusercontent.com/u/639 ... ge-240.png
https://dl.dropboxusercontent.com/u/639 ... ge-239.png
What I am trying to achieve is the following:
1. Clean the noise from the scanner - I mean the little dots and dashes around the text
2. Rotate the image - the middle vertical line should be perpendicular to the image's top and bottom edges
3. Crop each question into a separate image
4. Remove the white space from each individual image
I managed to partially achieve step 1 (cleaning the scanner noise) using the following commands:
convert file.png \
-write MPR:source \
-morphology close rectangle:3x2 \
file_rectangle_3x2.png
OR
convert file.png \
-write MPR:source \
-morphology close diamond \
-morphology erode square MPR:source -compose Lighten -composite \
-morphology erode square MPR:source -composite \
-morphology erode square MPR:source -composite \
-morphology erode square MPR:source -composite \
-morphology erode square MPR:source -composite \
-morphology erode square MPR:source -composite \
-morphology erode square MPR:source -composite \
-morphology erode square MPR:source -composite \
-morphology erode square MPR:source -composite \
file_diamond.png
For step 2 (rotating the image) I tried http://fmwconcepts.com/imagemagick/unrotate/index.php from Fred's scripts, but I didn't manage to make it work. Can someone advise how I can approach this?
For step 3 (cropping each question into a separate image) I am not even sure this is possible with ImageMagick alone. Maybe I will need some OCR that detects where each question starts and ends, and with those coordinates I can use ImageMagick to crop the image into several pieces? Any suggestions for tools/libraries will be highly appreciated.
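To make the idea for step 3 more concrete, here is a minimal sketch of the coordinate part only. It assumes I can somehow obtain, for each pixel row of a page (or of one column of a page), the number of dark pixels in that row. The helper name find_blocks, the MIN_GAP threshold, and the assumption that questions are separated by taller blank gaps than ordinary text lines are all mine and are not tested on the real scans.

```shell
#!/bin/sh
# find_blocks MIN_GAP WIDTH   (hypothetical helper, not part of ImageMagick)
#
# stdin : one integer per line, the number of dark pixels in each pixel row
#         (first line = row 0).
# stdout: one crop geometry "WIDTHxHEIGHT+0+Y" per block of text.
#
# Rows with ink are merged into one block as long as the run of blank rows
# between them is shorter than MIN_GAP. A blank run of MIN_GAP rows or more
# starts a new block. ASSUMPTION: the gap between two questions is taller
# than the gap between two lines of the same question.
find_blocks() {
    min_gap=${1:-10}
    width=${2:-0}
    awk -v gap="$min_gap" -v w="$width" '
        {
            row = NR - 1
            if ($1 + 0 > 0) {                      # this row contains ink
                if (!inblk) {                      # first inked row on the page
                    start = row
                    inblk = 1
                } else if (row - last > gap) {     # blank run of >= gap rows
                    printf "%dx%d+0+%d\n", w, last - start + 1, start
                    start = row
                }
                last = row                         # remember last inked row
            }
        }
        END {
            if (inblk) printf "%dx%d+0+%d\n", w, last - start + 1, start
        }
    '
}
```

Each printed geometry could then be passed to something like `convert page.png -crop GEOMETRY +repage question-N.png`. How to produce the per-row counts reliably is still open; one option I have not verified is to threshold the page, scale it down to a one-pixel-wide column and dump it with the `txt:` output format. Since my pages have two columns, this would probably have to be run once per column.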
Step 4 is clear; I have done it before.
I am using ImageMagick's command-line tool convert on macOS Sierra, version:
Version: ImageMagick 6.9.6-3 Q16 x86_64 2016-10-31 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2016 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC Modules
Delegates (built-in): bzlib freetype jng jpeg ltdl lzma png tiff xml zlib
Since the number of scanned images is huge, the processing will be moved to an Ubuntu server.
If you need more information about the tools I am using or about the images, I am happy to provide it.
Any help or directions for achieving this output will be really appreciated, and we are ready to pay for it.
Thanks!