Filling gaps in vertical edges, removing notches, etc.

Posted: 2013-07-10T10:13:07-07:00
by aporthog
I'm trying to "neaten" vertical and horizontal edges for bitonal scanned text. I'm sort of playing around, not sure if it will amount to anything. Sometimes we have to do some heavy processing of bad scans that "damages" the text and maybe I can use what I learn for that. Anyway, say I take this fragment:

Image

And I want to fill in the 1-pixel gaps and remove 1-pixel notches in the vertical and horizontal edges outlined in red. Is there a simple way to do that? I've tried various morphology techniques:

convert source.tif -verbose -morphology Thinning "3x7:0,0,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,0,1" out.tif
This fills in part of the gap in the "i":

Image

Adding another thinning step fills the gap in a bit more:

convert source.tif -verbose -morphology Thinning "3x7:0,0,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,0,1" -morphology Thinning "3x4:0,0,1, 0,1,1, 0,1,1, 0,0,1" out.tif
Image

Adding a final Thinning completely fills that particular gap:

convert source.tif -verbose -morphology Thinning "3x7:0,0,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,0,1" -morphology Thinning "3x4:0,0,1, 0,1,1, 0,1,1, 0,0,1" -morphology Thinning "3x3:0,0,1, 0,1,1, 0,0,1" out.tif
Image

Piling on more steps fills in some of the other gaps and removes some of the protruding notches (e.g., inside the second "n"), but writing a kernel for every possible situation is obviously impossible. So I'm looking for a better understanding of morphology to reduce the number of steps or, better yet, a completely different approach. I want to leave deviations larger than 1 pixel alone, and to require a minimum edge length so that things that shouldn't change, e.g., the upper curve on the "f", are left untouched.

Some of my other morphology attempts did undesirable things, like connecting separate parts such as the two bottom serifs on the "n".

The source document is here:

http://ucblibrary4.berkeley.edu/~apollo ... source.tif

Re: Filling gaps in vertical edges, removing notches, etc.

Posted: 2013-07-25T18:49:24-07:00
by anthony
The problem I see is that you really need to include more 'do not care' points in your morphology kernel.

You really do not care about the size of the notch, only that it is actually a notch.
So instead of the kernel you used...
0,0,1,
0,1,1,
0,1,1,
0,1,1,
0,1,1,
0,1,1,
0,0,1

What you really want is a kernel that will match more pixels within the notch:

0,0,1,
0,0,1,
0,-,1,
0,-,1,
0,1,1,
0,-,1,
0,-,1,
0,0,1,
0,0,1,

Now flip this kernel horizontally to match the notch in the 'f' and 'u'
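
To see how the 'do not care' entries behave, here is a minimal pure-Python sketch of hit-and-miss thinning. It is only an illustration under stated assumptions, not ImageMagick's implementation: 1 is taken to mean white, 0 black, and `None` stands in for the `-` entries; the kernel origin is assumed to be the centre cell; positions where the kernel would overhang the image are simply skipped. The names `thin`, `matches` and `flip_horizontal` are hypothetical helpers made up for this example.

```python
# Sketch only: 1 = white, 0 = black, None = "do not care" (the "-" above).

# The 3x9 kernel from the reply, row by row.
NOTCH_KERNEL = [
    [0, 0, 1],
    [0, 0, 1],
    [0, None, 1],
    [0, None, 1],
    [0, 1, 1],      # centre row: the white notch pixel to be filled
    [0, None, 1],
    [0, None, 1],
    [0, 0, 1],
    [0, 0, 1],
]


def matches(image, kernel, top, left):
    """True if every non-None kernel cell equals the pixel under it."""
    for ky, row in enumerate(kernel):
        for kx, want in enumerate(row):
            if want is not None and image[top + ky][left + kx] != want:
                return False
    return True


def thin(image, kernel):
    """Set the centre pixel to 0 (black) wherever the kernel matches."""
    kh, kw = len(kernel), len(kernel[0])
    out = [row[:] for row in image]          # never modify the input
    for top in range(len(image) - kh + 1):   # assumption: skip overhanging positions
        for left in range(len(image[0]) - kw + 1):
            if matches(image, kernel, top, left):
                out[top + kh // 2][left + kw // 2] = 0
    return out


def flip_horizontal(kernel):
    """Mirror the kernel left-to-right, for edges facing the other way."""
    return [row[::-1] for row in kernel]


# A black stroke (columns 0-1) with white to its right and a 1-pixel notch.
edge = [[0, 0, 1] for _ in range(9)]
edge[4][1] = 1
fixed = thin(edge, NOTCH_KERNEL)   # fixed[4] is now [0, 0, 1]
```

The flipped kernel matches the mirrored case (white on the left, black on the right), while the unflipped one leaves that image unchanged.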

A similar kernel for 'Thicken' will handle the 'bumps' in the 'n' and 'u'.
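
The reply does not spell out the thicken kernel, so the one below is a guess at what "similar" could look like: the same shape, but with a white middle column and a black centre pixel that gets turned white. This is a pure-Python sketch under the same assumptions as before (1 = white, 0 = black, `None` = do not care, origin at the centre cell, overhanging positions skipped); `BUMP_KERNEL` and `thicken` are hypothetical names, not part of ImageMagick.

```python
# Sketch only: a guessed 3x9 "bump" kernel, not taken from the thread.
BUMP_KERNEL = [
    [0, 1, 1],
    [0, 1, 1],
    [0, None, 1],
    [0, None, 1],
    [0, 0, 1],      # centre row: the black bump pixel to be removed
    [0, None, 1],
    [0, None, 1],
    [0, 1, 1],
    [0, 1, 1],
]


def thicken(image, kernel):
    """Set the centre pixel to 1 (white) wherever the kernel matches."""
    kh, kw = len(kernel), len(kernel[0])
    out = [row[:] for row in image]          # never modify the input
    for top in range(len(image) - kh + 1):   # assumption: skip overhanging positions
        for left in range(len(image[0]) - kw + 1):
            hit = all(
                want is None or image[top + ky][left + kx] == want
                for ky, row in enumerate(kernel)
                for kx, want in enumerate(row)
            )
            if hit:
                out[top + kh // 2][left + kw // 2] = 1
    return out


# A black stroke (column 0) with a 1-pixel black bump sticking out to the right.
bumpy = [[0, 1, 1] for _ in range(9)]
bumpy[4][1] = 0
smooth = thicken(bumpy, BUMP_KERNEL)   # smooth[4] is now [0, 1, 1]
```

In this sketch, a bump with another black pixel in the rows the kernel requires to be white (for example, a nearby serif) does not match at all, so a tall kernel leaves it alone.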

The small bump in the final 'n' may need a much smaller kernel because of its proximity to the lower serif.