Filling gaps in vertical edges, removing notches, etc.
Posted: 2013-07-10T10:13:07-07:00
I'm trying to "neaten" vertical and horizontal edges for bitonal scanned text. I'm sort of playing around, not sure if it will amount to anything. Sometimes we have to do some heavy processing of bad scans that "damages" the text and maybe I can use what I learn for that. Anyway, say I take this fragment:
And I want to fill in the 1-pixel gaps and remove 1-pixel notches in the vertical and horizontal edges outlined in red. Is there a simple way to do that? I've tried various morphology techniques:
Code: Select all
convert source.tif -verbose -morphology Thinning "3x7:0,0,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,0,1" out.tif
Which fills in a bit of the gap in the "i":
Adding another thinning step fills the gap in a bit more:
Code: Select all
convert source.tif -verbose -morphology Thinning "3x7:0,0,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,0,1" -morphology Thinning "3x4:0,0,1, 0,1,1, 0,1,1, 0,0,1" out.tif
Adding a final Thinning completely fills that particular gap:
Code: Select all
convert source.tif -verbose -morphology Thinning "3x7:0,0,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,0,1" -morphology Thinning "3x4:0,0,1, 0,1,1, 0,1,1, 0,0,1" -morphology Thinning "3x3:0,0,1, 0,1,1, 0,0,1" out.tif
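To make it easier to see why each extra step closes a little more of the gap, here is a minimal pure-Python sketch of hit-and-miss thinning applied to a toy stroke edge. It is not ImageMagick's implementation. It assumes white (1) is the foreground and black (0) is the ink, that the kernel origin is the centre cell (rounded down for the even-height kernel), and that positions where the kernel overhangs the image are simply skipped. The names parse_kernel, thin, and stroke are made up for this illustration.

```python
def parse_kernel(spec):
    """Parse a 'WxH:v,v,...' kernel string into (rows, origin_x, origin_y)."""
    size, values = spec.split(":")
    width, height = (int(n) for n in size.split("x"))
    flat = [int(v) for v in values.replace(",", " ").split()]
    if len(flat) != width * height:
        raise ValueError("kernel size does not match the number of values")
    rows = [flat[r * width:(r + 1) * width] for r in range(height)]
    # Assumption: origin is the centre cell, rounded down for even sizes.
    return rows, (width - 1) // 2, (height - 1) // 2


def thin(image, spec):
    """One thinning pass: set to 0 every pixel whose neighbourhood matches the kernel."""
    rows, ox, oy = parse_kernel(spec)
    kh, kw = len(rows), len(rows[0])
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # matches are tested against the input only
    for y in range(h):
        for x in range(w):
            top, left = y - oy, x - ox
            # Simplification: skip positions where the kernel overhangs the image.
            if top < 0 or left < 0 or top + kh > h or left + kw > w:
                continue
            if all(image[top + j][left + i] == rows[j][i]
                   for j in range(kh) for i in range(kw)):
                out[y][x] = 0
    return out


K7 = "3x7:0,0,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,1,1, 0,0,1"
K4 = "3x4:0,0,1, 0,1,1, 0,1,1, 0,0,1"
K3 = "3x3:0,0,1, 0,1,1, 0,0,1"

# Toy right-hand edge of a black stroke with a 5-pixel-tall, 1-pixel-deep white gap.
stroke = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
]

step1 = thin(stroke, K7)
step2 = thin(step1, K4)
step3 = thin(step2, K3)


def middle_column(image):
    """Return the edge column as a string, top to bottom."""
    return "".join(str(row[1]) for row in image)


print(middle_column(stroke))  # 001111100
print(middle_column(step1))   # 001101100
print(middle_column(step2))   # 000100100
print(middle_column(step3))   # 000000000
```

Under these assumptions each kernel only clears the single pixel at its origin, so the 3x7 kernel splits the 5-pixel gap into two 2-pixel gaps, the 3x4 kernel shrinks each of those to 1 pixel, and the 3x3 kernel closes them.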
Piling on more and more steps will fill in some of the other gaps and remove some of the protruding notches (e.g., inside the second "n"), but creating a kernel for every possible situation is obviously impossible. So I'm looking for a better understanding of morphology to reduce the number of steps or, better yet, a completely different approach. I want to leave any deviation larger than 1 pixel alone, and to require that an edge have a minimum length before it is touched, to avoid changing things that shouldn't be changed, e.g., the upper curve on the "f".
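The three kernels above follow one pattern: a black column, a white gap of some height capped by black above and below, and a white column. As a rough sketch, the kernel strings and the chained command could be generated rather than typed by hand. This does not reduce the number of morphology steps, it only automates writing them. The helper names gap_kernel and thinning_command are hypothetical, and the -verbose flag is left out for brevity.

```python
import shlex


def gap_kernel(gap_height):
    """Kernel string for a 1-pixel-deep white gap of the given height in a vertical edge."""
    rows = ["0,0,1"] + ["0,1,1"] * gap_height + ["0,0,1"]
    return "3x{}:{}".format(gap_height + 2, ", ".join(rows))


def thinning_command(src, dst, gap_heights):
    """Build a convert command line with one Thinning step per gap height."""
    args = ["convert", src]
    for height in gap_heights:
        args += ["-morphology", "Thinning", gap_kernel(height)]
    args.append(dst)
    return shlex.join(args)  # quotes the kernel strings for the shell


command = thinning_command("source.tif", "out.tif", [5, 2, 1])
print(command)
```

With gap heights 5, 2 and 1 this reproduces the 3x7, 3x4 and 3x3 kernels used in the commands above, in the same order.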
Some of my other morphology attempts did undesirable things, like connecting separate parts such as the two bottom serifs on the "n".
The source document is here:
http://ucblibrary4.berkeley.edu/~apollo ... source.tif