I am running a website and asking people to fill out a grid, scan it, and send it back to us. We need to process the grid automatically: trim away all the whitespace around the black-bordered grid, then cut it up into squares. The cutting-up part works fine, and until we rolled it out the initial unrotate/trim step was working too. Now, however, we're starting to receive scans of widely varying quality, and our system does not cope well with the variation. Is there a bulletproof way of taking the scanned image, unrotating it, and trimming out the white border? There could be all sorts of noise in the whitespace, but the border around the grid is a thin, continuous black line.
Thanks for the help!
Dave
Processing a scanned grid
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Processing a scanned grid
Can you post some links to example files that you are having trouble with?
Re: Processing a scanned grid
Hi Fred,
It's actually the same grid I had asked you about a few months ago: http://stepaheadtech.com/grid.png
I received an answer on StackOverflow about this question a few months ago as well: http://stackoverflow.com/questions/1007 ... dual-files
Do you think that's the right solution? It makes a lot of sense to me. Is there a way to do this without creating intermediary files?
Thanks!
Dave
- fmw42
Re: Processing a scanned grid
Can you provide a real scanned example with the grid filled in, or at least one filled-in grid where you have trouble? I am still not sure what you want to retrieve. Is it just the information in the boxes? If so, can the boxes be filled in with some color other than white or black?
Re: Processing a scanned grid
Sure, here's an example: http://stepaheadtech.com/grid-scan.jpg
Up until now, I would take this kind of scanned image and start by using your textcleaner script to clean it up and unrotate it. Then I used blur to autotrim and cut out the white border. Next I'd resize it so it's always the same width and height. Then I chopped it up by cropping out the rows, and then the cells. The trouble I'm hitting is that the blur/autotrim step is inconsistent: sometimes it cuts out the white border and sometimes it doesn't. When it doesn't, every subsequent step is off and the result is garbage. This needs to be automated, so I need to figure out how to cut out the whitespace around the grid consistently; then I think the rest is easy.
Thanks,
Dave
- fmw42
Re: Processing a scanned grid
Replace the blur with -morphology close to remove the small black noise specks, which are mostly outside the box. That way -fuzz -trim should work better when you trim after unrotating. In other words, use -morphology close to remove small specks before trying noisecleaner (see http://www.imagemagick.org/Usage/morphology/#close). Choose a shape and size that is larger than your speck noise but smaller than the thin lines or any text thickness. Alternatively, use a small shape and iterate it a few times.
e.g.
-morphology close diamond:1
or
-morphology close diamond:1 -morphology close diamond:1
or
-morphology close diamond:2
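For anyone following along, here is a rough sketch of how the close-then-trim suggestion could be wired together on the command line. It is not taken from the posts above: the function name close_and_trim, its argument order, and the 10% fuzz default are illustrative assumptions. It also assumes the input has already been unrotated and that `convert` comes from an ImageMagick build that supports -morphology (with ImageMagick 7 the command may be `magick` instead).

```shell
#!/bin/sh
# Sketch only: the function name, argument order, and defaults are assumptions.

# close_and_trim INPUT OUTPUT [FUZZ] [KERNEL]
# Closes away dark specks smaller than KERNEL, then trims the near-white margin.
close_and_trim() {
    in=$1
    out=$2
    fuzz=${3:-10%}          # colour tolerance for -trim (assumed default)
    kernel=${4:-diamond:1}  # should be larger than the specks, smaller than the grid lines
    convert "$in" \
        -morphology close "$kernel" \
        -fuzz "$fuzz" -trim +repage \
        "$out"
}
```

It could be called as `close_and_trim unrotated.png trimmed.png 15% diamond:2`. Making the fuzz and kernel arguments rather than fixed values is deliberate, since the right values are likely to differ between clean and noisy scans.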
Re: Processing a scanned grid
Unfortunately this does not seem to work consistently with different scan qualities. I'm stuck right now, I don't know how to put together something that can consistently deal with varying levels of scanning quality.
- fmw42
Re: Processing a scanned grid
dlauer wrote: Unfortunately this does not seem to work consistently with different scan qualities. I'm stuck right now, I don't know how to put together something that can consistently deal with varying levels of scanning quality.
Can you post two extreme examples so we can see what the issue might be?
Another approach:
Apply -deskew 40% to your image to unrotate it. Then use -scale to reduce it to a single row, and separately to a single column, and threshold each result to be sure it is binary. Convert each to txt format and search for the first and last black pixel in the row or column. That gives you the bounding box to which you can crop your deskewed image, eliminating the outer (mostly) white region.
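The projection approach described above might look roughly like the following. This is a sketch under stated assumptions, not a tested recipe from the thread: the function names black_extent and crop_to_grid and the 50% default threshold are hypothetical, and it assumes `convert` and `identify` from ImageMagick are on the PATH. It writes one temporary file for the deskewed image so that all three later steps see the same pixels.

```shell
#!/bin/sh
# Sketch only: assumes ImageMagick's `convert` and `identify` are on PATH.
# The function names and the 50% default threshold are illustrative assumptions.

# Read ImageMagick "txt:" pixel output on stdin and print the first and last
# coordinate along the chosen axis ("x" or "y") whose pixel is pure black.
black_extent() {
    awk -v axis="$1" '
        /^#/ { next }                        # skip the enumeration header line
        {
            split($1, xy, /[,:]/)            # "12,0:" gives xy[1]=12, xy[2]=0
            pos = (axis == "x") ? xy[1] : xy[2]
            for (i = 2; i <= NF; i++)
                if ($i ~ /^#0+$/) {          # hex colour made only of zeros is black
                    if (first == "") first = pos
                    last = pos
                }
        }
        END { if (first != "") print first, last }
    '
}

# crop_to_grid INPUT OUTPUT [THRESHOLD]
# Deskew the scan, locate the grid from the two 1-pixel projections, and crop.
crop_to_grid() {
    in=$1; out=$2; thresh=${3:-50%}
    tmp=$(mktemp) || return 1
    convert "$in" -background white -deskew 40% +repage "png:$tmp" \
        || { rm -f "$tmp"; return 1; }
    set -- $(identify -format '%w %h' "png:$tmp")
    w=$1; h=$2
    # Average every column into one row; dark averages mark vertical lines.
    set -- $(convert "png:$tmp" -alpha off -colorspace Gray -scale "${w}x1!" \
                 -threshold "$thresh" -depth 8 txt:- | black_extent x)
    x0=$1; x1=$2
    # Average every row into one column; dark averages mark horizontal lines.
    set -- $(convert "png:$tmp" -alpha off -colorspace Gray -scale "1x${h}!" \
                 -threshold "$thresh" -depth 8 txt:- | black_extent y)
    y0=$1; y1=$2
    if [ -z "$x0" ] || [ -z "$y0" ]; then rm -f "$tmp"; return 1; fi
    convert "png:$tmp" -crop "$((x1 - x0 + 1))x$((y1 - y0 + 1))+${x0}+${y0}" \
        +repage "$out"
    rm -f "$tmp"
}
```

Usage would be something like `crop_to_grid grid-scan.jpg grid-cropped.png 60%`. The threshold probably needs tuning per batch: a border line only averages out dark when it spans most of the scaled dimension, so a grid that fills less of the page, or dark scanner shadows along the page edge, could throw off the first/last search.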