Removing corners from scanned cards
Posted: 2016-12-25T08:43:50-07:00
by pkz
Hi! I'm looking at preprocessing scanned catalog cards before doing OCR. To reduce OCR noise I want to remove the black areas at the top right and top left (the rounded corners). They differ in size, and sometimes additional dark areas appear from misaligned cards (see the first image below, top left and bottom right).
I would also like to remove the black circle in the lower center part. The roundness varies depending on card types. It would of course be possible to add a sufficiently large polygon for the corners but is there some other strategy I could use? I am looking for something like "fill dark areas from the outside".
Examples of cards:
(see more examples at
https://data.kb.se/datasets/2016/09/hs_nominalkatalog/)
Re: Removing corners from scanned cards
Posted: 2016-12-25T12:08:38-07:00
by fmw42
If you convert those areas to transparency, then user snibgo has a hole-filling script. See
http://im.snibgo.com/fillholespri.htm and
http://im.snibgo.com/fillholes.htm
Alternatively, you can do a fuzzy floodfill at each region:
Code:
convert image -fuzz XX% -fill somecolor -draw "color x,y floodfill" resultimage
where XX% determines how much tolerance to use to fill the region located at x,y and somecolor is your desired background color (the tan color in your image). See
http://www.imagemagick.org/Usage/draw/#color
Another way is to make the image into a binary mask and use connected components to label each isolated region and then discard those regions, which will be the larger ones. Then use the filtered mask to recolor those regions with your tan background color. See
http://magick.imagemagick.org/script/co ... onents.php
This is a very simple way, but it leaves a small border around the regions. It gets the average color of your image, creates a mask by thresholding, and uses the mask to recolor the black parts of the image. Unix syntax.
Code:
color=`convert Nominal_20151207_103630_000098.jpg -scale 1x1 -format "%[pixel:u.p{0,0}]" info:`
convert Nominal_20151207_103630_000098.jpg \
\( -clone 0 -fill "$color" -colorize 100 \) \
\( -clone 0 -threshold 35% -negate \) \
-compose over -composite result.jpg
Please always provide your IM version and platform when asking questions, since syntax may vary.
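The threshold-and-recolor idea can be sketched in plain Python on a tiny grayscale grid. This only illustrates the logic and is not how ImageMagick implements it; the function name `recolor_dark` and the sample values are made up for the example.

```python
def recolor_dark(pixels, threshold=0.35, max_value=255):
    """Replace pixels at or below the threshold with the image's mean gray value."""
    flat = [v for row in pixels for v in row]
    # Mean over the whole image, dark areas included (like -scale 1x1).
    mean = round(sum(flat) / len(flat))
    cutoff = threshold * max_value
    # The mask is "pixel is dark"; masked pixels get the mean color.
    return [[mean if v <= cutoff else v for v in row] for row in pixels]


card = [
    [0, 0, 200, 200],
    [0, 200, 200, 200],
    [200, 200, 60, 200],
]
cleaned = recolor_dark(card)
```

Note that the mean includes the dark pixels themselves, so the fill comes out darker than the paper (138 here against a paper value of 200).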
Re: Removing corners from scanned cards
Posted: 2016-12-25T16:23:53-07:00
by sgbotsford
I'm not fully familiar with ImageMagick yet. But when I was building pipes out of NetPBM, you could do this by adding white rectangles at the appropriate offsets. It's been nearly 20 years, but it would be something like
pnmadd originalfile.pgm, whitefile.pgm, -top -left | pnmadd - whitefile.pgm -top -right | ....
Offsets were handled in reasonably flexible manners.
The math would do a pixel by pixel addition, then clip, so adding a white box made that part of the image white.
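As a rough illustration of that add-then-clip arithmetic, here is a small Python sketch. It is not NetPBM code; `add_clipped`, the grid, and the offsets are hypothetical names and values chosen for the example.

```python
def add_clipped(image, overlay, top, left, max_value=255):
    """Add overlay onto image at (top, left), clipping each sum to max_value."""
    result = [row[:] for row in image]
    for dy, overlay_row in enumerate(overlay):
        for dx, value in enumerate(overlay_row):
            y, x = top + dy, left + dx
            # Ignore overlay pixels that fall outside the image.
            if 0 <= y < len(result) and 0 <= x < len(result[y]):
                result[y][x] = min(max_value, result[y][x] + value)
    return result


scan = [
    [0, 10, 180, 180],
    [5, 180, 180, 180],
    [180, 180, 180, 180],
]
white_box = [[255, 255], [255, 255]]
patched = add_clipped(scan, white_box, top=0, left=0)
```

Because every sum is clipped at the maximum, anything under the white box ends up white regardless of what was there before.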
Re: Removing corners from scanned cards
Posted: 2016-12-25T20:26:05-07:00
by fmw42
You can overlay color boxes the same color as your background color at any point in the background image. So yes, you can do the same thing. But you have to know how big to make each box to cover each black region. That is where connected components comes in. It can tell you the bounding box of every isolated black area in your image or even make an overlay mask for each actual shaped region.
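To make the bounding-box idea concrete, here is a minimal Python sketch that labels 4-connected dark regions on a small grayscale grid and reports a box for each. It only illustrates the concept and does not reproduce ImageMagick's connected-components output; `dark_region_boxes` and the cutoff value are assumptions for the example.

```python
from collections import deque


def dark_region_boxes(pixels, cutoff=90):
    """Return (left, top, right, bottom, area) for each 4-connected dark region."""
    height, width = len(pixels), len(pixels[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or pixels[y][x] > cutoff:
                continue
            # Breadth-first walk over one dark region, tracking its extent.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom, area = x, y, x, y, 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and pixels[ny][nx] <= cutoff):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom, area))
    return boxes


card = [
    [0, 0, 200, 200, 200],
    [0, 200, 200, 200, 200],
    [200, 200, 200, 30, 30],
    [200, 200, 200, 30, 30],
]
boxes = dark_region_boxes(card)
```

Each box could then be covered with a rectangle in the background color, or the region's own pixels could be used as an overlay mask.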
Re: Removing corners from scanned cards
Posted: 2016-12-26T03:32:56-07:00
by pkz
Thank you! The fuzzy fill works very well. If I add a black 10px border around the image first it will touch all black areas (corners, skew gaps etc) and the fill will work for many scenarios.
Code:
convert mypic.jpg \
-bordercolor black -border 10x10 \
-fuzz 30% -fill white -draw "color 5,5 floodfill" \
mypic.clean.jpg
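Why the border trick works can be shown with a small Python sketch: the border joins every dark area that touches the image edge into one region, so a single seed reaches them all. This is a simplified grayscale model, not ImageMagick's floodfill (which measures fuzz as a color distance); `add_border`, `flood_fill`, and the sample values are hypothetical.

```python
from collections import deque


def add_border(pixels, size, value=0):
    """Surround the image with a constant-valued border."""
    width = len(pixels[0]) + 2 * size
    top = [[value] * width for _ in range(size)]
    bottom = [[value] * width for _ in range(size)]
    body = [[value] * size + row + [value] * size for row in pixels]
    return top + body + bottom


def flood_fill(pixels, seed, fill, fuzz):
    """Fill the 4-connected region whose values lie within fuzz of the seed value."""
    height, width = len(pixels), len(pixels[0])
    sx, sy = seed
    target = pixels[sy][sx]
    result = [row[:] for row in pixels]
    seen = {(sx, sy)}
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        result[y][x] = fill
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height and (nx, ny) not in seen
                    and abs(pixels[ny][nx] - target) <= fuzz):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return result


card = [
    [10, 200, 200, 20],
    [200, 200, 200, 200],
    [200, 15, 200, 200],
    [200, 200, 200, 200],
]
# fuzz=76 is roughly 30% of 255.
filled = flood_fill(add_border(card, 1), seed=(0, 0), fill=255, fuzz=76)
```

Both dark corners are filled through the border, while the isolated dark pixel in the interior (value 15) is untouched because it is not connected to the outside. A dark area like that needs its own seed point.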
Re: Removing corners from scanned cards
Posted: 2016-12-26T07:55:49-07:00
by pkz
After the area is filled, is there a simple way to shave off e.g. 2px or so to clean the darker part of the remaining paper edge? I guess one option would be to trace the contour and try to add an inset border some way (preferably a few pixels wide with gradually diminishing transparency).
Re: Removing corners from scanned cards
Posted: 2016-12-26T11:34:07-07:00
by fmw42
Try this. But I suggest that if you are going to run more than one command on an image, you do not save intermediate results as JPG, since it is lossy and constant colors do not remain constant.
Unix syntax.
Code:
convert YimUDze.jpg \
\( -clone 0 -fill white -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:3 \) \
-compose over -composite result.jpg
It is best to combine this operation with your first operation in one command line.
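The mask step in that command selects the pure-white (already filled) pixels, grows that selection by a few pixels, and paints white wherever the grown mask is set. A rough Python sketch of the same idea follows; it uses a square neighborhood as a stand-in for the octagon kernel, and `dilate`, `composite`, and the sample values are hypothetical.

```python
def dilate(mask, radius=1):
    """Binary dilation with a square window: set every cell within radius of a set cell."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def composite(image, fill, mask):
    """Paint fill wherever the mask is set; keep the image elsewhere."""
    return [[fill if m else v for v, m in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]


# 255 = already filled corner, 120 = darker paper edge, 200 = paper.
image = [
    [255, 255, 120, 200, 200],
    [255, 120, 200, 200, 200],
    [120, 200, 200, 200, 200],
]
filled_mask = [[1 if v == 255 else 0 for v in row] for row in image]
grown_mask = dilate(filled_mask, radius=1)
shaved = composite(image, 255, grown_mask)
```

The darker edge pixels (120) next to the filled corner are painted over, at the cost of also losing a thin strip of paper.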
Re: Removing corners from scanned cards
Posted: 2016-12-26T11:44:19-07:00
by fmw42
So you can do this as one command.
Input:
Code:
convert Nominal_20151207_103630_000098.jpg \
-bordercolor black -border 10x10 \
-fuzz 30% -fill white -draw "color 5,5 floodfill" \
\( -clone 0 -fill white -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:4 \) \
-compose over -composite result1.jpg
Or even this to fill with nearly the same as your background color.
Code:
color=`convert Nominal_20151207_103630_000098.jpg \
-fuzz 50% -transparent black -scale 1x1 -alpha off -format "%[pixel:u.p{0,0}]" info:`
convert Nominal_20151207_103630_000098.jpg \
-bordercolor black -border 10x10 \
-fuzz 30% -fill white -draw "color 5,5 floodfill" \
\( -clone 0 -fill "$color" -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:12 \) \
-compose over -composite result2.jpg
Re: Removing corners from scanned cards
Posted: 2016-12-26T14:44:44-07:00
by pkz
Thank you! The complete cleanup script will look something like this. The hole is floodfilled as well. Hopefully this will remove a lot of garbage from the Tesseract OCR output.
Code:
# coordinates for hole fill
hole_x=`convert "$1" -format "%[fx:50*w/100]" info:`
hole_y=`convert "$1" -format "%[fx:88*h/100]" info:`
# fill color
color=`convert "$1" -fuzz 50% -transparent black -scale 1x1 -alpha off -format "%[pixel:u.p{0,0}]" info:`
convert "$1" \
-bordercolor black -border 10x10 \
-fuzz 40% -fill "$color" -draw "color $hole_x,$hole_y floodfill" \
-fuzz 30% -fill white -draw "color 5,5 floodfill" \
\( -clone 0 -fill "$color" -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:12 \) \
-compose over -composite \
-blur 1x65535 \
-contrast -contrast \
-normalize \
-despeckle \
-sharpen 1 \
-posterize 2 \
-colorspace Gray \
"$1.clean.jpg"
Re: Removing corners from scanned cards
Posted: 2016-12-26T16:08:55-07:00
by fmw42
If you are going to do OCR, then try my script, textcleaner, at my link below. For example on my result2.jpg
Code:
textcleaner -e normalize -f 20 -o 10 result2.jpg result3.jpg
Re: Removing corners from scanned cards
Posted: 2016-12-27T02:12:53-07:00
by pkz
Thank you again. Will definitely look into that.