Scanned book: removing spots
Posted: 2016-04-30T01:37:54-07:00
Version: ImageMagick 6.9.3-7 Q8 x86_64 2016-04-29 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2016 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP
Delegates (built-in): bzlib freetype jng jp2 jpeg lcms ltdl lzma png webp wmf xml zlib
Hello,
following this forum and the manual, I have been able to process a scanned book and get nice BW images.
However, some spots remain near the border. They have to be removed before the greedy trimming, because they cause the trimming to stop too early. I tried my hand at morphology but didn't get it right. (The morphology used in the script below closes little white gaps in the black characters.)
To remove these spots, the algorithm would need to:
1. Find grayscale clusters
2. Delete a grayscale cluster if there is no other grayscale pixel close to it.
3. Here "close" means within a numeric distance of 20 px. This would allow meaningful dots (above the character i etc.) to remain untouched.
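The three steps above could perhaps be approximated with ImageMagick alone. The following is only a rough sketch, not a tested solution for these scans: it dilates the ink so that clusters within about 20 px of each other merge, drops merged groups below an area limit via connected-components labeling, and whites out everything outside the surviving groups. The function name `remove_isolated_spots` and the values of `RADIUS` and `MIN_AREA` are assumptions made for illustration, and it assumes the installed build supports the `connected-components:area-threshold` and `connected-components:mean-color` defines.

```sh
#!/bin/sh
# Sketch only: remove_isolated_spots, RADIUS and MIN_AREA are hypothetical
# names/values chosen for illustration, not part of the original script.
RADIUS=10      # half of the 20 px "close" distance; two clusters merge when <= ~20 px apart
MIN_AREA=1500  # dilated groups smaller than this many pixels are treated as spots

remove_isolated_spots() {
    # $1 = grayscale input (dark ink on white background), $2 = output file
    convert "$1" \
        \( +clone -threshold 50% -negate \
           -morphology Dilate "Disk:$RADIUS" \
           -define connected-components:area-threshold="$MIN_AREA" \
           -define connected-components:mean-color=true \
           -connected-components 8 \
           -threshold 50% -negate \) \
        -compose Lighten -composite "$2"
    # Inside the parentheses a mask is built: ink becomes white, is dilated,
    # small white groups are merged into the black background, and the result
    # is negated so that regions to keep are black. Lighten then keeps the
    # original pixels where the mask is black and paints white elsewhere.
}
```

It would be called between the grayscale step and the greedy trim, for example `remove_isolated_spots w1/page.png w1/page.png` (the file name is a placeholder). `MIN_AREA` would need tuning per scan resolution: too high and isolated page numbers disappear, too low and larger specks survive.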
Please refer to the script below. Any help would be appreciated.
Code:
#!/bin/sh
# DEPENDS ON:
# brew install imagemagick --with-jp2 --with-openmp --with-quantum-depth-8
# brew install parallel
# http://www.fmwconcepts.com/imagemagick/autotrim/
mkdir -p w1/
FORMAT=png # output format
BWFUZZ=60 # higher values result in more black and less white area
TRIMFUZZ=80 # higher values result in more greedy trimming
LIMIT="-limit memory 300MB -limit map 600MB"
# To grayscale
find in/ -name "*.jp2" -exec basename {} .jp2 \; | parallel --bar -j 4 convert in/{}.jp2 $LIMIT -strip -flatten -alpha off -colorspace gray -fuzz $BWFUZZ% -fill white +opaque black +repage -morphology Open diamond -format $FORMAT w1/{}.$FORMAT
# Greedy trim
find w1/ -name "*.png" -exec basename {} \; | parallel --bar -j 4 autotrim -t -5 -b 5 -l -5 -r 5 -f $TRIMFUZZ w1/{} w1/{} > /dev/null
# To black-and-white
find w1/ -name "*.png" -exec basename {} \; | parallel --bar -j 4 convert w1/{} $LIMIT -quantize gray +dither -colors 2 -depth 2 +repage w1/{}
http://drive.google.com/uc?export=view& ... DZOYXhWSmM
To Grayscale
http://drive.google.com/uc?export=view& ... 2ZmNHNSTUE
After this step, the dots near the border need to be removed.
To Black-White
http://drive.google.com/uc?export=view& ... lNoYXFSOEU