Removing corners from scanned cards
Hi! I'm looking at preprocessing scanned catalog cards before doing OCR. To reduce OCR noise I want to remove the black areas at the top right and top left (the rounded corners). They differ in size, and sometimes additional dark areas appear from misaligned cards (see the top left and bottom right of the first image below).
I would also like to remove the black circle in the lower center part. The roundness varies depending on the card type. It would of course be possible to add a sufficiently large polygon for the corners, but is there some other strategy I could use? I am looking for something like "fill dark areas from the outside".
Examples of cards:
(see more examples at https://data.kb.se/datasets/2016/09/hs_nominalkatalog/)
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Removing corners from scanned cards
If you convert those areas to transparency, then user snibgo has a hole-filling script. See http://im.snibgo.com/fillholespri.htm and http://im.snibgo.com/fillholes.htm
Alternately, you can do a fuzzy floodfill at each region
Code: Select all
convert image -fuzz XX% -fill somecolor -draw "color x,y floodfill" resultimage
where XX% determines how much tolerance to use to fill the region located at x,y, and somecolor is your desired background color (the tan color in your image). See http://www.imagemagick.org/Usage/draw/#color
Another way is to make the image into a binary mask, use connected components to label each isolated region, and then pick out the regions to discard, which will be the larger ones. Then use the filtered mask to recolor those regions with your tan background color. See http://magick.imagemagick.org/script/co ... onents.php
This is a very simple way, but it leaves a small border around the regions. It simply gets the average color of your image, then creates a mask by thresholding and uses the mask to recolor the black parts of the image. Unix syntax.
Code: Select all
color=`convert Nominal_20151207_103630_000098.jpg -scale 1x1 -format "%[pixel:u.p{0,0}]" info:`
convert Nominal_20151207_103630_000098.jpg \
\( -clone 0 -fill "$color" -colorize 100 \) \
\( -clone 0 -threshold 35% -negate \) \
-compose over -composite result.jpg
Please always provide your IM version and platform when asking questions, since syntax may vary.
-
- Posts: 3
- Joined: 2016-12-16T10:58:22-07:00
- Authentication code: 1151
Re: Removing corners from scanned cards
I'm not fully familiar with ImageMagick yet. But when I was building pipes out of NetPBM, you could do this by adding white rectangles at the appropriate offsets. It's been nearly 20 years, but it would be something like
pnmadd originalfile.pgm, whitefile.pgm, -top -left | pnmadd - whitefile.pgm -top -right | ....
Offsets were handled in a reasonably flexible manner.
The math would do a pixel by pixel addition, then clip, so adding a white box made that part of the image white.
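The same additive idea should carry over to ImageMagick: `-compose plus` adds the pixel values of two images, and anything above white is clipped (at the latest when the result is written to an integer format such as PNG or JPEG), so adding a white box turns that part of the image white. A minimal sketch, assuming ImageMagick 6 style `convert` on a Unix shell; the function name `add_white_box` and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: whiten a WIDTHxHEIGHT box at offset X,Y by adding
# a white image on top of it (pixel-wise addition, then clipping).
# Usage: add_white_box in.png out.png WIDTH HEIGHT X Y
add_white_box() {
    convert "$1" \
        \( -size "$3x$4" xc:white \) \
        -geometry "+$5+$6" \
        -compose plus -composite \
        "$2"
}
```

You still have to know the size and offset of each box, which is where the connected components suggestion further down helps.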
- fmw42
Re: Removing corners from scanned cards
You can overlay color boxes the same color as your background color at any point in the background image. So yes, you can do the same thing. But you have to know how big to make each box to cover each black region. That is where connected components comes in. It can tell you the bounding box of every isolated black area in your image or even make an overlay mask for each actual shaped region.
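Once connected components has reported a bounding box, turning it into a box to draw over the region is plain string handling. A minimal sketch, assuming the box comes in ImageMagick's WIDTHxHEIGHT+X+Y geometry form with non-negative offsets (the form used in the verbose connected-components listing); the function name `bbox_to_rectangle` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a bounding box in geometry form
# (WIDTHxHEIGHT+X+Y, non-negative offsets only) into a
# -draw "rectangle x0,y0 x1,y1" primitive.
bbox_to_rectangle() {
    geom=$1
    w=${geom%%x*}       # WIDTH: text before the first 'x'
    rest=${geom#*x}     # HEIGHT+X+Y
    h=${rest%%+*}       # HEIGHT
    rest=${rest#*+}     # X+Y
    x=${rest%%+*}       # X
    y=${rest#*+}        # Y
    # rectangle corners are inclusive, hence the -1
    echo "rectangle $x,$y $((x + w - 1)),$((y + h - 1))"
}
```

For example, `convert card.png -fill "$color" -draw "$(bbox_to_rectangle 120x80+0+0)" out.png` would paint that box in the background color. The boxes themselves can be listed with something like `convert card.png -threshold 35% -define connected-components:verbose=true -connected-components 8 null:`, though the exact listing format differs between ImageMagick versions.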
Re: Removing corners from scanned cards
Thank you! The fuzzy fill works very well. If I first add a 10px black border around the image, it connects all the black areas along the edge (corners, skew gaps, etc.), so a single fill works for many scenarios.
Code: Select all
convert mypic.jpg \
-bordercolor black -border 10x10 \
-fuzz 30% -fill white -draw "color 5,5 floodfill" \
mypic.clean.jpg
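The border trick can be wrapped into a small reusable function. This is a minimal sketch assuming ImageMagick 6 style `convert`; the function name `fill_from_outside` is hypothetical, and the final `-shave` (which removes the temporary border again so the output keeps the original size) is an addition that is not in the command above.

```shell
#!/bin/sh
# Hypothetical helper: fill every dark area connected to the image edge.
# Usage: fill_from_outside in.jpg out.png FUZZ FILLCOLOR
#   FUZZ       floodfill tolerance, e.g. 30%
#   FILLCOLOR  replacement color, e.g. white
fill_from_outside() {
    # The black frame connects all dark areas that touch the edge, so a
    # single floodfill seeded inside the frame (at 5,5) reaches them all.
    # -shave then removes the 10 px frame from every side again.
    convert "$1" \
        -bordercolor black -border 10x10 \
        -fuzz "$3" -fill "$4" -draw "color 5,5 floodfill" \
        -shave 10x10 \
        "$2"
}
```

Writing the result as PNG rather than JPG keeps the filled areas exactly one color if you process it further.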
Re: Removing corners from scanned cards
After the area is filled, is there a simple way to shave off e.g. 2px or so to clean up the darker part of the remaining paper edge? I guess one option would be to trace the contour and add an inset border in some way (preferably a few pixels wide with gradually diminishing transparency).
- fmw42
Re: Removing corners from scanned cards
Try this. But if you are going to run more than one command on an image, I suggest not saving intermediate results as JPG, since it is lossy and constant colors do not remain constant.
Best to combine this operation with your first operation in one command line.
Unix syntax.
Code: Select all
convert YimUDze.jpg \
\( -clone 0 -fill white -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:3 \) \
-compose over -composite result.jpg
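To see or tune how far the inset reaches, the mask built in the second clone can be written out on its own. This is a minimal sketch assuming ImageMagick 6 style `convert` and that the filled areas are white; the function name `edge_mask` is hypothetical. Only pixels within about 1% of white end up white in the mask, and the dilation then grows those areas by roughly the octagon radius, which is what eats a few pixels into the dark paper edge.

```shell
#!/bin/sh
# Hypothetical helper: write the recolor mask from the command above
# as its own image so the inset width can be inspected.
# Usage: edge_mask filled.png mask.png RADIUS
edge_mask() {
    # -negate:            white -> black, everything else lighter than black
    # -threshold 1%:      anything above 1% -> white, the rest -> black
    # -negate:            so only the (near-)white filled areas end up white
    # -morphology dilate: grow those white areas by about RADIUS pixels
    convert "$1" \
        -negate -threshold 1% -negate \
        -morphology dilate "octagon:$3" \
        "$2"
}
```

An untested idea for the gradual transition: blurring the mask after the dilate (for example with `-blur 0x2`) should make the final composite blend partially wherever the mask is gray, instead of switching hard from paper to fill color.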
- fmw42
Re: Removing corners from scanned cards
So you can do this as one command.
Code: Select all
convert Nominal_20151207_103630_000098.jpg \
-bordercolor black -border 10x10 \
-fuzz 30% -fill white -draw "color 5,5 floodfill" \
\( -clone 0 -fill white -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:4 \) \
-compose over -composite result1.jpg
Or even this to fill with nearly the same as your background color.
Code: Select all
color=`convert Nominal_20151207_103630_000098.jpg \
-fuzz 50% -transparent black -scale 1x1 -alpha off -format "%[pixel:u.p{0,0}]" info:`
convert Nominal_20151207_103630_000098.jpg \
-bordercolor black -border 10x10 \
-fuzz 30% -fill white -draw "color 5,5 floodfill" \
\( -clone 0 -fill "$color" -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:12 \) \
-compose over -composite result2.jpg
Re: Removing corners from scanned cards
Thank you! The total cleanup script will look something like this. The hole will be floodfilled as well. This script will hopefully remove a lot of garbage from the Tesseract OCR output.
Code: Select all
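# coordinates for hole fill: 50% across and 88% down the original card,
# plus 10 px because the -border below shifts every pixel by 10 px
hole_x=`convert "$1" -format "%[fx:50*w/100+10]" info:`
hole_y=`convert "$1" -format "%[fx:88*h/100+10]" info:`
# fill color: average of the card with the black areas made transparent
color=`convert "$1" -fuzz 50% -transparent black -scale 1x1 -alpha off -format "%[pixel:u.p{0,0}]" info:`
convert "$1" \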
-bordercolor black -border 10x10 \
-fuzz 40% -fill "$color" -draw "color $hole_x,$hole_y floodfill" \
-fuzz 30% -fill white -draw "color 5,5 floodfill" \
\( -clone 0 -fill "$color" -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:12 \) \
-compose over -composite \
-blur 1x65535 \
-contrast -contrast \
-normalize \
-despeckle \
-sharpen 1 \
-posterize 2 \
-colorspace Gray \
"$1.clean.jpg"
- fmw42
Re: Removing corners from scanned cards
If you are going to do OCR, then try my script, textcleaner, at my link below. For example, on my result2.jpg:
Code: Select all
textcleaner -e normalize -f 20 -o 10 result2.jpg result3.jpg
Re: Removing corners from scanned cards
Thank you again. Will definitely look into that.