
Processing tick boxes on forms. A method.

Posted: 2009-08-17T06:28:58-07:00
by jaffamuffin
Using Windows cmd scripting and ImageMagick, here is a working method to process a form's tick boxes and produce a confidence level as well as the form data.

It could probably be improved A LOT, but it works and may serve as a basis for someone else to use.

Image
http://i30.tinypic.com/sbo9yo.gif (fullsize)

1. Crop out the questions.
2. Resize and threshold the boxes and output to text.
3. Count occurrences of black and white and log to file (lots of black = high confidence).
4. Some awkward batch file text processing.
5. output.txt looks like the listing below.

Processing one image takes about 45 seconds on my P4 Win2k machine.

Code: Select all

[QUESTION: 101 ANSWER: E CONFIDENCE: 5 ] 
[QUESTION: 102 ANSWER: D CONFIDENCE: 3 ] 
[QUESTION: 103 ANSWER: C CONFIDENCE: 3 ] 
[QUESTION: 104 ANSWER: B CONFIDENCE: 3 ] 
[QUESTION: 105 ANSWER: A CONFIDENCE: 2 ] 
[QUESTION: 106 ANSWER: B CONFIDENCE: 3 ] 
[QUESTION: 107 ANSWER: C CONFIDENCE: 4 ] 
[QUESTION: 108 ANSWER: D CONFIDENCE: 4 ] 
[QUESTION: 109 ANSWER: E CONFIDENCE: 6 ] 
[QUESTION: 110 ANSWER: D CONFIDENCE: 4 ] 
[QUESTION: 111 ANSWER: C CONFIDENCE: 3 ] 
[QUESTION: 112 ANSWER: B CONFIDENCE: 3 ] 
[QUESTION: 113 ANSWER: A CONFIDENCE: 3 ] 
[QUESTION: 114 ANSWER: B CONFIDENCE: 3 ] 
[QUESTION: 115 ANSWER: C CONFIDENCE: 3 ] 
[QUESTION: 116 ANSWER: D CONFIDENCE: 4 ] 
[QUESTION: 117 ANSWER: E CONFIDENCE: 1 ] 
[QUESTION: 118 ANSWER: D CONFIDENCE: 4 ] 
[QUESTION: 119 ANSWER: C CONFIDENCE: 2 ] 
[QUESTION: 120 ANSWER: A CONFIDENCE: 4 ] 
[QUESTION: 120 ANSWER: B CONFIDENCE: 4 ] 
[QUESTION: 120 ANSWER: C CONFIDENCE: 3 ] 
[QUESTION: 120 ANSWER: D CONFIDENCE: 5 ] 
[QUESTION: 120 ANSWER: E CONFIDENCE: 2 ] 
[QUESTION: 121 ANSWER: A CONFIDENCE: 4 ] 
[QUESTION: 122 ANSWER: B CONFIDENCE: 3 ] 
[QUESTION: 123 ANSWER: C CONFIDENCE: 2 ] 
[QUESTION: 124 ANSWER: D CONFIDENCE: 2 ] 
[QUESTION: 125 ANSWER: E CONFIDENCE: 2 ] 
[QUESTION: 126 ANSWER: D CONFIDENCE: 3 ] 
[QUESTION: 127 ANSWER: C CONFIDENCE: 3 ] 
[QUESTION: 128 ANSWER: B CONFIDENCE: 3 ] 
[QUESTION: 129 ANSWER: A CONFIDENCE: 5 ] 
[QUESTION: 130 ANSWER: B CONFIDENCE: 5 ] 
[QUESTION: 131 ANSWER: C CONFIDENCE: 4 ] 
[QUESTION: 132 ANSWER: C CONFIDENCE: 1 ] 
[QUESTION: 132 ANSWER: D CONFIDENCE: 5 ] 
[QUESTION: 133 ANSWER: E CONFIDENCE: 5 ] 
[QUESTION: 134 ANSWER: D CONFIDENCE: 5 ] 
[QUESTION: 135 ANSWER: C CONFIDENCE: 5 ] 
[QUESTION: 136 ANSWER: B CONFIDENCE: 5 ] 
[QUESTION: 137 ANSWER: A CONFIDENCE: 5 ] 
[QUESTION: 138 ANSWER: B CONFIDENCE: 5 ] 
[QUESTION: 139 ANSWER: C CONFIDENCE: 5 ] 
[QUESTION: 140 ANSWER: A CONFIDENCE: 3 ] 
[QUESTION: 140 ANSWER: B CONFIDENCE: 5 ] 
[QUESTION: 140 ANSWER: C CONFIDENCE: 3 ] 
[QUESTION: 140 ANSWER: D CONFIDENCE: 4 ] 
[QUESTION: 140 ANSWER: E CONFIDENCE: 5 ] 
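
For anyone post-processing output.txt, the line format above is regular enough to read with a single pattern. This is a minimal sketch in Python rather than batch, purely for brevity; the names `LINE_RE` and `parse_output_line` are made up for illustration and are not part of the original scripts.

```python
import re

# Matches lines such as "[QUESTION: 101 ANSWER: E CONFIDENCE: 5 ]".
# The optional whitespace before "]" allows for the trailing space
# that appears in the sample output.
LINE_RE = re.compile(
    r"\[QUESTION:\s*(\d+)\s+ANSWER:\s*([A-E])\s+CONFIDENCE:\s*(\d+)\s*\]"
)


def parse_output_line(line):
    """Return (question, answer, confidence), or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), int(match.group(3))


if __name__ == "__main__":
    print(parse_output_line("[QUESTION: 101 ANSWER: E CONFIDENCE: 5 ] "))
```

Returning None for non-matching lines lets a caller skip blank or damaged lines without raising.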

Code: Select all

@echo off
SETLOCAL ENABLEDELAYEDEXPANSION

cd /d w:\tmp\

SET coord=1459
SET size=89
FOR /L %%A IN (101,1,120) DO (
	echo %%A
	convert %1 -crop "892x!size!+228+!coord!" +repage q%%A.tif
	SET /A coord=!coord! + !size!
	)
	
SET coord=1459
FOR /L %%A IN (121,1,140) DO (
	echo %%A
	convert %1 -crop "892x!size!+1328+!coord!" +repage q%%A.tif
	SET /A coord=!coord! + !size!
	)

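REM Delayed expansion is switched off here so that the literal ! in the -resize geometry below is passed to convert unchanged.
SETLOCAL DISABLEDELAYEDEXPANSION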
FOR /L %%A IN (101,1,140) DO (
		echo %%A
		REM convert q%%A.tif -resize 892x1! +repage -threshold 210 T%%A.tif
		convert q%%A.tif -resize 892x1! +repage -threshold 210 O%%A.txt
	
	)
goto:eof
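
The two cropping loops above step down each column of the form in 89-pixel rows, starting at y=1459, with the left column at x=228 and the right column at x=1328. As a rough illustration of that arithmetic only (Python instead of batch, and `crop_geometries` is a hypothetical helper name), the geometry strings passed to `-crop` can be listed like this:

```python
def crop_geometries(size=89, top=1459, width=892, left_x=228, right_x=1328):
    """Map question number -> crop geometry string "WxH+X+Y".

    The default values are the constants used in the batch file above:
    questions 101-120 are in the left column, 121-140 in the right column.
    """
    geometries = {}
    for column_x, first_question in ((left_x, 101), (right_x, 121)):
        for row in range(20):
            y = top + row * size
            geometries[first_question + row] = f"{width}x{size}+{column_x}+{y}"
    return geometries


if __name__ == "__main__":
    for question, geometry in sorted(crop_geometries().items()):
        print(question, geometry)
```

Printing the whole table first is a cheap way to check the offsets against the scan before running 40 crops.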

Code: Select all

@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
SET wcount=0
SET bcount=0
IF EXIST marks.txt DEL marks.txt
FOR /L %%A IN (101,1,140) DO (
	FOR /F "skip=1 tokens=1,2,7 delims=,:() " %%B IN (O%%A.txt) DO (
		SET question=%%A
		SET pixel=%%B
		SET col=%%D
		IF "!col!"=="white" SET /A wcount=!wcount!+1
		IF "!col!"=="black" SET /A bcount=!bcount!+1
		IF !wcount! GTR 40 SET bcount=0 && SET wcount=0
		IF !bcount! GTR 8 (
			ECHO MARK DETECTED !pixel! 
			ECHO !question!,!pixel! >> marks.txt
			SET bcount=0
			ECHO !wcount!
		)
	)
)

REM Column ranges (x pixel) of the answer boxes:
REM    A: 0-155    B: 155-340    C: 340-535    D: 535-725    E: 725-892
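
The counting logic in the script above can be hard to follow in batch syntax, so here is a sketch of the same idea in Python. It assumes the row has already been reduced to a list of "black"/"white" colour names, one per pixel, and `detect_marks` is an illustrative name. One deliberate difference: this sketch starts the counters at zero for every row, whereas the batch file sets them once before the loop over all questions.

```python
def detect_marks(colours, white_reset=40, black_run=8):
    """Return the x positions at which a mark is reported.

    colours: one entry per pixel, left to right, each "black" or "white".
    A mark is reported whenever more than `black_run` black pixels have
    accumulated; more than `white_reset` white pixels clears both counters,
    so isolated specks separated by long white gaps never add up to a mark.
    """
    marks = []
    wcount = 0
    bcount = 0
    for x, colour in enumerate(colours):
        if colour == "white":
            wcount += 1
        elif colour == "black":
            bcount += 1
        if wcount > white_reset:
            wcount = 0
            bcount = 0
        if bcount > black_run:
            marks.append(x)
            bcount = 0
    return marks


if __name__ == "__main__":
    row = ["white"] * 50 + ["black"] * 27 + ["white"] * 50
    print(detect_marks(row))
```

Because the black counter restarts after each report, a wide tick produces several marks, which is what later becomes the confidence number.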

Code: Select all

@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
if exist marks2.txt del marks2.txt

FOR /F "tokens=1,2 delims=," %%A IN (marks.txt) DO (
 	echo %%A %%B
	SET question=%%A
	SET /A pixel=%%B+1000
	IF !pixel! LSS 1155 (
		SET mark=A
	) ELSE (
		IF !pixel! LSS 1340 (
			SET mark=B
		) ELSE (
			IF !pixel! LSS 1535 (
				SET mark=C
			) ELSE (
				IF !pixel! LSS 1725 (
					SET mark=D
				) ELSE (
					SET mark=E
				)
			)
		)
	)
echo !question!,!mark! >> marks2.txt

)
	

REM Column ranges (x pixel) of the answer boxes:
REM    A: 0-155    B: 155-340    C: 340-535    D: 535-725    E: 725-892
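
The nested IF/ELSE chain above is a lookup of an x position in a list of column boundaries. A compact way to express the same mapping, sketched in Python with the boundary values taken from the REM lines (the names `BOUNDARIES`, `ANSWERS` and `answer_for_x` are illustrative):

```python
from bisect import bisect_right

# Right-hand edges of boxes A-D; anything at or beyond 725 is E.
BOUNDARIES = [155, 340, 535, 725]
ANSWERS = "ABCDE"


def answer_for_x(x):
    """Return the answer letter whose column contains pixel position x."""
    return ANSWERS[bisect_right(BOUNDARIES, x)]


if __name__ == "__main__":
    for x in (0, 154, 155, 400, 724, 725, 891):
        print(x, answer_for_x(x))
```

As in the batch version, a position exactly on a boundary (for example 155) falls into the box to its right.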

Code: Select all

@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
if exist marks3.txt del marks3.txt
if exist marks5.txt del marks5.txt
FOR /L %%A IN (101,1,140) DO (
FOR %%D  IN (A B C D E ) DO (
	set count%%D=0
	)

	FOR /F "tokens=1,2 delims=," %%B IN (marks2.txt) DO (
		if %%A==%%B (
		rem echo "%%C"
			FOR %%D  IN (A B C D E ) DO (
				IF "%%C"=="%%D " (
				SET /A count%%D=count%%D + 1
			ECHO %%A,%%D,!count%%D! >> marks3.txt
				)
			)
		)
	)
)
sort /r marks3.txt > marks4.txt

FOR /L %%A IN (140,-1,101) DO (
	FOR %%B  IN (E D C B A ) DO (
		call :checkline %%A %%B
	)	
)		
sort marks5.txt > output.txt

call :cleanup

goto:eof

:checkline
FOR /F "tokens=1,2,3 delims=," %%C IN (marks4.txt) DO (
	IF %%C%%D==%1%2 (	
		ECHO [QUESTION: %%C ANSWER: %%D CONFIDENCE: %%E] >> marks5.txt
			goto :eof
	)
)
		
goto:eof
		
		
:cleanup
FOR /L %%A IN (101,1,140) DO (
		DEL O%%A.txt
		DEL q%%A.tif
		)
DEL marks.txt

FOR /L %%A IN (2,1,5) DO (
		DEL marks%%A.txt
		)
 goto:eof
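
What the last script does, stripped of the batch workarounds, is count how many marks were logged for each question/answer pair and print that count as the confidence. A minimal Python sketch of that tallying step, assuming the marks are already available as (question, answer) pairs; `tally` and `format_results` are hypothetical names:

```python
from collections import Counter


def tally(marks):
    """Count marks per (question, answer) pair.

    marks: iterable of (question, answer) tuples, one per detected mark.
    """
    return Counter(marks)


def format_results(counts):
    """Return result lines sorted by question, then by answer letter."""
    lines = []
    for (question, answer), confidence in sorted(counts.items()):
        lines.append(
            f"[QUESTION: {question} ANSWER: {answer} CONFIDENCE: {confidence}]"
        )
    return lines


if __name__ == "__main__":
    marks = [(132, "D")] * 5 + [(132, "C")] + [(101, "E")] * 5
    for line in format_results(tally(marks)):
        print(line)
```

Like the batch version, this prints one line for every answer that received at least one mark, so a question can appear more than once.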


Re: Processing tick boxes on forms. A method.

Posted: 2009-08-17T08:12:00-07:00
by HugoRune
Nice work! Thanks for sharing it!

I am having some trouble figuring out the batch file structure, but I think the labels are missing. Could you add them?
(:checkline, :processfile, ...) [edit: I think they are all there now, but I am still a bit confused about the program flow]
Is this all one big file?

Do you have some procedure for aligning the input images first?

Re: Processing tick boxes on forms. A method.

Posted: 2009-08-17T08:21:41-07:00
by jaffamuffin
HugoRune wrote:Nice work! Thanks for sharing it!

I am having some trouble figuring out the batch file structure, but I think the labels are missing. Could you add them?
(:checkline, :processfile, ...) [edit: looking closer, I found :checkline, but :pocessfile?]
Is this all one big file?

Do you have some procedure for aligning the input images first?

Sorry, it's a bit rough. I just deleted a few of the REM lines; that was old code that isn't used any more. :checkline has to be called like that because a goto is used to break out of the loop, which ensures that only the highest confidence number is printed. It's a batch file weakness. The whole thing could probably be done in bash or PHP in about 20 lines or so.

It could probably be condensed down to 1 file, but as it is, each code section is a batch file. Just run each one in turn, and they generate the files required for the next one (e.g. marks.txt). Each file does a different job.
You can run them all as one by using a 'master file' like this:

Code: Select all

@echo off
rem markread
rem %1 is image file dropped on, %~dp1 is drive and path to image file
echo %time%
cd /d "%~dp1"
CALL mark.bat %1
CALL tmp.bat
CALL tmp2.bat
CALL tmp3.bat
echo %time%
pause
That is just one example of a master file.

As for aligning the input images, I do it by using a very expensive, high-quality scanner: the registration is within 10 px (300 dpi, A4), just because of the feed mechanism. I would love to know of a way (using ImageMagick) to register documents, for example by using the registration marks in the corners.
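
One simple possibility, sketched here in Python and not something the scripts above do, is to locate one registration mark, compare its position with where it sits on a reference scan, and shift the crop offsets by the difference. This assumes the corner region has already been thresholded into rows of characters with '#' for black, and it only corrects translation, not rotation or scaling. `mark_centroid` and `shifted_offset` are made-up names.

```python
def mark_centroid(region):
    """Return the (x, y) centroid of '#' cells in a list of strings, or None."""
    xs = []
    ys = []
    for y, row in enumerate(region):
        for x, cell in enumerate(row):
            if cell == "#":
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


def shifted_offset(x, y, expected, found):
    """Shift a crop offset (x, y) by the mark displacement found - expected."""
    dx = round(found[0] - expected[0])
    dy = round(found[1] - expected[1])
    return x + dx, y + dy


if __name__ == "__main__":
    corner = [
        "......",
        "..##..",
        "..##..",
        "......",
    ]
    found = mark_centroid(corner)
    # (1.5, 0.5) stands in for the mark position on a reference scan.
    print(shifted_offset(228, 1459, expected=(1.5, 0.5), found=found))
```

With a feed error under 10 px, a pure shift like this may already be enough to keep the 89-pixel rows lined up.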

Re: Processing tick boxes on forms. A method.

Posted: 2009-08-17T18:03:02-07:00
by anthony
Great effort! And thank you for submitting.

Does your 'confidence' take into account a 'lack of answer' in other boxes?

That is, for a specific question, box A was marked but the other boxes were definitely not marked? Or C is dark, but B has a smudge that could reduce the confidence.

Watch out for those smudges; they could be deadly when dealing with averages.
Running it through an edge detector, or dividing by a slightly blurred version of the image, will help remove 'smudge' problems.
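
To illustrate why dividing by a blurred copy helps, here is a small one-dimensional sketch in Python. It uses a plain box blur on a single row of grey values (0-255) as a stand-in for a real image blur; the function names, the radius and the 0.5 cut-off are all assumptions chosen for the example.

```python
def box_blur(row, radius):
    """Average each pixel with its neighbours within `radius` (edges clamped)."""
    blurred = []
    for i in range(len(row)):
        window = row[max(0, i - radius): i + radius + 1]
        blurred.append(sum(window) / len(window))
    return blurred


def divide_by_blur(row, radius=5):
    """Return pixel / local average, clipped to 1.0 (1.0 means background)."""
    blurred = box_blur(row, radius)
    return [min(1.0, p / b) if b > 0 else 1.0 for p, b in zip(row, blurred)]


if __name__ == "__main__":
    # White paper, a broad light-grey smudge, white paper, a thin dark stroke.
    row = [255] * 10 + [200] * 20 + [255] * 10 + [30] * 3 + [255] * 10
    normalised = divide_by_blur(row)
    print([i for i, v in enumerate(normalised) if v < 0.5])
```

The broad smudge is close to its own local average, so it divides out to nearly 1.0, while the thin dark stroke stays far below its surroundings. A fixed cut-off of 210 on the raw values would have counted the 200-valued smudge as black.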


Also, why does your output have 5 results for question 140, all with high confidence? Shouldn't that be a 'low confidence'?
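
One way to act on both points, sketched in Python under the assumption that the results are available as (question, answer, confidence) tuples: compare the best answer for each question with the runner-up and refuse to pick one when the gap is small. `pick_answers` and the `min_margin` value of 2 are illustrative choices, not part of the posted scripts.

```python
from collections import defaultdict


def pick_answers(results, min_margin=2):
    """Choose one answer per question, or None when the result is ambiguous.

    results: iterable of (question, answer, confidence) tuples.
    Returns {question: (answer_or_None, margin)}, where margin is the best
    confidence minus the runner-up (0 is used when there is no runner-up).
    """
    by_question = defaultdict(list)
    for question, answer, confidence in results:
        by_question[question].append((confidence, answer))

    picked = {}
    for question, entries in by_question.items():
        entries.sort(reverse=True)
        best_confidence, best_answer = entries[0]
        runner_up = entries[1][0] if len(entries) > 1 else 0
        margin = best_confidence - runner_up
        picked[question] = (best_answer if margin >= min_margin else None, margin)
    return picked


if __name__ == "__main__":
    sample = [
        (101, "E", 5),
        (132, "C", 1), (132, "D", 5),
        (140, "A", 3), (140, "B", 5), (140, "C", 3), (140, "D", 4), (140, "E", 5),
    ]
    for question, choice in sorted(pick_answers(sample).items()):
        print(question, choice)
```

With the figures from the sample output, question 132 resolves to D with a margin of 4, while question 140 has two answers tied at 5 and is reported as ambiguous.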