Trying to manipulate captcha for pre-OCR

Posted: 2016-07-28T08:17:18-07:00
by wakamura
I have a captcha that I'm trying to read with Tesseract OCR.
The problem is that dots and lines run across the letters. I have been experimenting with ImageMagick to remove them so that the OCR can actually read the letters, but so far without success.

(Original image)
http://imgur.com/a/b3osy

(After image)
http://imgur.com/a/1sedw

The process I used to get to the after image:

convert captcha1.png -level 20000,0,20000 captcha1.png
convert captcha1.png captcha1.pgm
convert captcha1.pgm -black-threshold 65000 captcha1.tif
convert captcha1.tif -negate captcha1.tif
convert captcha1.tif -threshold 90% captcha1.tif
convert captcha1.tif -morphology Convolve "3x3: 0.1,0.0,05 0.0,0.5,0.5 0.1,0.1,0.1" captcha1.tif
convert captcha1.tif -blur 1 captcha1.tif
convert captcha1.tif -threshold 80% captcha1.tif
convert captcha1.tif -morphology Convolve "3x3: 0.1,0.0,05 0.0,0.5,0.5 0.1,0.1,0.1" captcha1.tif
convert captcha1.tif -morphology Convolve "3x3: 0.1,0.0,05 0.0,0.5,0.5 0.1,0.1,0.1" captcha1.tif
convert captcha1.tif -threshold 80% captcha1.tif
convert captcha1.tif -morphology Convolve "3x3: 0.1,0.0,05 0.0,0.5,0.5 0.1,0.1,0.1" captcha1.tif
convert captcha1.tif -blur 1 captcha1.tif
convert captcha1.tif -threshold 90% captcha1.tif
convert captcha1.png -level 20000,0,20000 captcha1.png
convert captcha1.png captcha1.pgm
convert captcha1.pgm -black-threshold 65000 captcha1.tif
convert captcha1.tif -negate captcha1.tif
convert captcha1.tif -threshold 90% captcha1.tif
convert captcha1.tif -morphology Dilate rectangle:3x3 captcha1.tif
convert captcha1.tif -morphology Erode rectangle:5x1 captcha1.tif
As you can see, I am very new to this. For the first part I followed a guide I found online, then spent a few days on trial and error, still to no avail.
I have been running the commands from the Command Prompt first, to try to get the OCR to read the letters correctly.
Can anyone point me in the right direction for removing the leftover lines, or suggest a better method?

EDIT: Images weren't showing.

Re: Trying to manipulate captcha for pre-OCR

Posted: 2016-07-28T08:37:53-07:00
by snibgo
I won't help attempts to defeat captcha.

Re: Trying to manipulate captcha for pre-OCR

Posted: 2016-07-28T09:37:58-07:00
by wakamura
snibgo wrote: I won't help attempts to defeat captcha.
okay.