Hi there,
I am a newbie. I would like to extract text from some images captured by a webcam (1280x720 resolution). An example (in Italian) is given here:
https://drive.google.com/file/d/0B-X1ZT ... sp=sharing
I need to preprocess the images before OCR with Tesseract. I plan to use the textcleaner script, but I am wondering about its parameters and options.
Any ideas?
Thanks
help with OCR preprocess
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: help with OCR preprocess
My scripts can only be run on Unix systems. What are your IM version and platform? See viewtopic.php?f=1&t=9620
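A quick way to gather that information, as a sketch only: it assumes ImageMagick is on the PATH as either `magick` (IM 7) or `convert` (IM 6) and that `uname` is available, which is typical on Unix-like systems. The function name `report_im_env` is just an illustrative label.

```shell
# Report the platform and the ImageMagick version, using whichever front end is installed.
report_im_env() {
    echo "Platform: $(uname -s) $(uname -r)"
    if command -v magick >/dev/null 2>&1; then
        magick -version | head -n 1
    elif command -v convert >/dev/null 2>&1; then
        convert -version | head -n 1
    else
        echo "ImageMagick: not found on PATH"
    fi
}

report_im_env
```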
If you scan the whole page showing the page borders, you could perspectively correct the page first. See -distort perspective or my script unperspective.
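As a rough sketch of the -distort perspective route: the operator takes a list of control-point pairs, each mapping a source pixel to its destination. The corner coordinates below are made-up placeholders for a 1280x720 frame, not measured from the example image, and `perspective_points` and `s3_flat.png` are hypothetical names chosen for illustration. Replace the corners with the page corners read off your own capture.

```shell
# Build the control-point string "sx,sy dx,dy ..." for -distort perspective.
# Arguments: the four page corners in the source image, clockwise from
# top-left, as x,y pairs, followed by the output width and height.
perspective_points() {
    c_tl=$1; c_tr=$2; c_br=$3; c_bl=$4; w=$5; h=$6
    echo "$c_tl 0,0 $c_tr $((w - 1)),0 $c_br $((w - 1)),$((h - 1)) $c_bl 0,$((h - 1))"
}

# Placeholder corners (assumed, not measured from the example image).
points=$(perspective_points 110,60 1180,85 1230,690 70,660 1280 720)
echo "$points"

# Apply the correction only if ImageMagick and the input file are present.
if command -v convert >/dev/null 2>&1 && [ -f s3.jpg ]; then
    convert s3.jpg -distort perspective "$points" s3_flat.png
fi
```

The corrected file could then be passed to textcleaner in place of the raw webcam frame.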
What are your questions about the script usage? It should be self-explanatory from the documentation and examples at http://www.fmwconcepts.com/imagemagick/ ... /index.php.
The main arguments are -f and -o.
"-f filtersize ... FILTERSIZE is the size of the filter used to clean up the background. Values are integers>0. The filtersize needs to be larger than the thickness of the writing, but the smaller the better beyond this. Making it larger will increase the processing time and may lose text. The default is 15."
"-o offset ... OFFSET is the offset threshold in percent used by the filter to eliminate noise. Values are integers>=0. Values too small will leave much noise and artifacts in the result. Values too large will remove too much text leaving gaps. The default is 5."
The best approach is to start with only a few arguments and test for the best -f and -o. Then add other arguments as needed.
Try this to start:
Code:
textcleaner -f 25 -o 5 s3.jpg result.png
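One way to test for the best -f and -o is a small sweep, sketched below. It only prints the commands so they can be inspected first. The value ranges, the function name `sweep_commands`, and the output names (`result_f25_o5.png` and so on) are arbitrary choices for illustration, and the `tesseract ... -l ita` step assumes Tesseract is installed with the Italian language data.

```shell
# Print one textcleaner + tesseract command pair per (filtersize, offset) combination.
sweep_commands() {
    input=$1
    for f in 15 25 35; do
        for o in 5 10 15; do
            out="result_f${f}_o${o}"
            echo "textcleaner -f $f -o $o $input $out.png && tesseract $out.png $out -l ita"
        done
    done
}

# Inspect the generated commands; pipe them to sh to actually run them.
sweep_commands s3.jpg
# sweep_commands s3.jpg | sh
```

Comparing the resulting text files side by side should show which combination loses the least text while leaving the least noise.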