Clean scan text flatbed scanner ?

Posted: 2019-03-16T14:54:18-07:00
by rmortensen84
I am trying to find some standard settings for cleaning up scanned text documents on Linux. Is there a standard ImageMagick command I can use in a script that will do part of the work?

I've seen the following:

for img in *.jpg; do mogrify -normalize -level 10%,90% -sharpen 0x1 "$img"; done
from the following link:

https://dikant.de/2013/05/01/optimizing ... agemagick/


But I am looking for inspiration. (Maybe there is no universal standard and you have to adjust from scan to scan?)


I am currently using this script to scan documents:

#!/bin/bash

# Get the current date and time for the file name
DATE=$(date +%Y-%m-%d-%H:%M:%S)

# Output file
OUT="/media/Nextcloud/rene/files/Documents/Scan/scan-$DATE.png"

# Scan
scanimage --format=png --mode 'True Gray' --resolution 400 > "$OUT"

# Optimize the image
mogrify -normalize -level 10%,90% -sharpen 0x1 "$OUT"

# Refresh Nextcloud
bash /home/rene/nextcloud-refresh.sh

Re: Clean scan text flatbed scanner ?

Posted: 2019-03-16T17:51:52-07:00
by fmw42
There is no need to loop over each image with mogrify. Its purpose is to process all the files in a directory in a single call. If you want to loop, use convert rather than mogrify.
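
As a rough sketch of the two approaches, using the options quoted earlier in the thread: the function names and the `cleaned` output directory are made up for illustration, and on ImageMagick 7 the `magick` command takes the place of `convert`.

```shell
# Variant 1: a single mogrify call handles every matching file.
# -path sends the results to another directory so the originals are kept
# (without -path, mogrify overwrites its input files).
clean_with_mogrify() {
    mkdir -p cleaned
    mogrify -path cleaned -normalize -level 10%,90% -sharpen 0x1 ./*.jpg
}

# Variant 2: an explicit loop, using convert with a separate output file.
clean_with_convert() {
    mkdir -p cleaned
    for img in ./*.jpg; do
        [ -e "$img" ] || continue   # skip the literal pattern if nothing matches
        convert "$img" -normalize -level 10%,90% -sharpen 0x1 "cleaned/${img##*/}"
    done
}
```

Run either function from the directory that holds the scans, e.g. `clean_with_convert`. Quoting `"$img"` keeps file names with spaces intact.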

Also, -normalize and -level should not be needed together. You should be able to set the arguments to -contrast-stretch to achieve the same result, but you will have to play with them. I suspect that the code and suggestion were designed for a specific kind of scanned document. They may not be general enough to apply to your scans.
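
A minimal sketch of what experimenting with -contrast-stretch could look like. The function name and the default percentages (2% black, 1% white) are assumptions chosen only as a starting point, not values tuned for any particular scan.

```shell
# stretch_contrast SRC DST [BLACK_PERCENT] [WHITE_PERCENT]
# Clips the given share of the darkest pixels to black and of the
# brightest pixels to white, then stretches the rest over the full range.
stretch_contrast() {
    src=$1
    dst=$2
    black=${3:-2}   # assumed default: clip 2% of the darkest pixels
    white=${4:-1}   # assumed default: clip 1% of the brightest pixels
    convert "$src" -contrast-stretch "${black}%x${white}%" "$dst"
}
```

For example, `stretch_contrast scan.png out.png 5 10` clips more aggressively, which may suit a gray paper background better. Comparing a few values on the same scan is probably the quickest way to find a usable setting.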

If you post an example scan (upload it to some free hosting service that won't change the image and does not require a password, and put the URL here), perhaps we can suggest a better approach to cleaning it. Since you are on a Unix-like system, you might want to look at my script, textcleaner, at my link below.

Re: Clean scan text flatbed scanner ?

Posted: 2019-03-17T07:13:45-07:00
by rmortensen84
Hello, and thanks for the reply / help :)

I scan many different documents in black and white (gray): receipts, payslips, and other documents. That is why I want an ImageMagick command that is universal, but maybe that is not possible? If I have to keep changing the command to suit each document, it takes a long time :/ :) An example could be this document:

https://nextcloud.rmortensen84.tk/index ... YQ32CLQTn6



Or this:


https://nextcloud.rmortensen84.tk/index ... Hb8tLFzWgz

https://nextcloud.rmortensen84.tk/index ... NoZQG89fTy



The last one is a PDF. I scan to .png now, so it is just an example :)

Re: Clean scan text flatbed scanner ?

Posted: 2019-03-17T11:56:03-07:00
by fmw42
With PDF files that have images embedded, you should perhaps extract the image from the PDF. Otherwise, try rasterizing with a large density value. But that example looks pretty clean to me. Your PNG file has noise in it. You could try some noise removal such as -enhance or -morphology. For example:

convert scan-2019-03-17-15_07_58.png -enhance -enhance -enhance -enhance -enhance result.png

You can include some sharpening using -unsharp after removing noise and also some contrast increase.
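
Put together, that order (denoise, then contrast, then sharpen) might be sketched as follows. The function name and every numeric value here are assumptions to be adjusted per scan.

```shell
# clean_scan SRC DST
# Order of operations: remove noise first, so that the later contrast
# and sharpening steps do not amplify it.
clean_scan() {
    src=$1
    dst=$2
    convert "$src" \
        -enhance \
        -contrast-stretch 2%x1% \
        -unsharp 0x1 \
        "$dst"
}
```

Usage would be along the lines of `clean_scan scan.png scan-clean.png`. Repeating `-enhance` removes more noise at the cost of some detail.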

But I suspect you will not find any method that works for all your scans, especially when they are scanned at different resolutions and formats.
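
For the PDF case, one way to bring mixed inputs to a common format is to rasterize each PDF at a fixed density before any cleaning. This is only a sketch: the function name and the default of 300 dpi are assumptions, and ImageMagick needs Ghostscript installed (and a security policy that permits PDF) to read PDF files.

```shell
# rasterize_pdf FILE.pdf [DENSITY]
# Writes one PNG per page: FILE-000.png, FILE-001.png, ...
# -density must come before the input file so it applies while reading.
rasterize_pdf() {
    pdf=$1
    density=${2:-300}   # assumed default resolution in dpi
    convert -density "$density" "$pdf" "${pdf%.pdf}-%03d.png"
}
```

For example, `rasterize_pdf payslip.pdf 400` would match the 400 dpi used by the scan script earlier in the thread.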