Separating characters in a font file
- guy lateur
- Posts: 12
- Joined: 2018-05-28T14:07:47-07:00
- Authentication code: 1152
Separating characters in a font file
I'm using ImageMagick 7.0.7-35 Q16 x64 2018-05-21 on win7pro.
I'm looking to separate the characters in a 'bitmap-font-file'. Suppose I had a file like this: http://users.telenet.be/guy.lateur/IM/t ... ropped.png
I want to calculate the bounding rectangles around each character. Let's assume that every adjacent pair of letters has at least 1 vertical line between them that contains only the background (BG) color (black in this example). Here would be the pseudo code:
horizontal (main) scan:
. start at horizontal position (HP) 0, ie, the left of the image
. increase HP by 1 until you reach a vertical line that contains more than just the BG color
. mark this HP as the start/left of the current char
. increase HP by 1 until you reach a vertical line that contains only the BG color
. mark (this HP - 1) as the end/right of the current char
this can easily be extended to cover the entire horizontal range, ie all characters
for each char found, do a vertical scan:
. start at vertical position (VP) 0, ie, the top of the image
. increase the VP by 1 (ie, go down) until you reach a line that contains more than just the BG color
. mark this VP as the top of the character
. do the same for the bottom VP of the character, going up from the bottom VP of the image
Can I do this using IM? If so, can somebody please point me to a couple of command line options I could look into?
As a bonus question: would it be possible to generate an output image containing the lines depicting these bounding rectangles? That way I could easily check the algorithm works, ie, by overlaying it onto the original in my favourite gimp application..
Total noob, here, sorry.
TIA for any pointers!
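The scan described in the pseudo code above can be sketched in plain Python. This is only an illustration under stated assumptions: the image is assumed to be already loaded into a list of rows (`grid[y][x]`), with 0 as the background value, and the names `column_runs` and `char_boxes` are made up for this sketch. It does not call ImageMagick.

```python
from typing import List, Tuple

Grid = List[List[int]]  # grid[y][x]; 0 is assumed to be the background value


def column_runs(grid: Grid, bg: int = 0) -> List[Tuple[int, int]]:
    """Return inclusive (left, right) column ranges that contain non-BG pixels."""
    width = len(grid[0]) if grid else 0
    runs = []
    start = None
    for x in range(width):
        has_ink = any(row[x] != bg for row in grid)
        if has_ink and start is None:
            start = x                      # left edge of the current char
        elif not has_ink and start is not None:
            runs.append((start, x - 1))    # right edge is the previous column
            start = None
    if start is not None:                  # char touching the right image edge
        runs.append((start, width - 1))
    return runs


def char_boxes(grid: Grid, bg: int = 0) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) per character, all inclusive."""
    boxes = []
    for left, right in column_runs(grid, bg):
        # Vertical scan restricted to this character's column range.
        rows = [y for y, row in enumerate(grid)
                if any(v != bg for v in row[left:right + 1])]
        boxes.append((left, rows[0], right, rows[-1]))
    return boxes
```

Because the vertical scan only looks at the columns of one character, a detached dot (as on an "i") ends up inside the same box as its stem.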
- snibgo
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Separating characters in a font file
If you just want the bounding boxes, I would do it like this:
The first bounding box is the background. All the following are the individual letters.
Code: Select all
magick ^
testFonts-002-cropped.png ^
-fill White +opaque Black ^
-fill Red -draw "color 0,0 floodfill" ^
-fill White +opaque Red ^
-define connected-components:verbose=true ^
-connected-components 4 ^
x.png
Objects (id: bounding-box centroid area mean-color):
0: 918x220+0+0 484.3,95.1 107409 srgba(255,0,0,1.61014)
2: 141x200+458+9 537.4,125.7 20750 srgba(255,255,255,4.15831)
1: 141x200+158+9 218.1,125.4 20555 srgba(255,255,255,4.18828)
6: 139x142+7+66 77.2,141.5 15527 srgba(255,255,255,5.22071)
5: 141x144+608+65 674.2,132.7 14507 srgba(255,255,255,5.51747)
3: 143x199+765+9 823.3,89.7 11861 srgba(255,255,255,6.52525)
4: 144x143+308+65 370.3,134.5 11351 srgba(255,255,255,6.7735)
If you want to go further, eg to chop the image up into pieces, I have Windows BAT scripts for that.
snibgo's IM pages: im.snibgo.com
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Separating characters in a font file
You can use -connected-components to get the bounding boxes of each character as follows:
Code: Select all
magick characters.png -auto-level -fill white +opaque black -type bilevel \
-define connected-components:verbose=true \
-define connected-components:area-threshold=5000 \
-define connected-components:mean-color=true \
-connected-components 4 \
ccl_binary.png
Code: Select all
Objects (id: bounding-box centroid area mean-color):
0: 918x220+0+0 484.3,95.1 107409 gray(0)
2: 141x200+458+9 537.4,125.7 20750 gray(255)
1: 141x200+158+9 218.1,125.4 20555 gray(255)
6: 139x142+7+66 77.2,141.5 15527 gray(255)
5: 141x144+608+65 674.2,132.7 14507 gray(255)
3: 143x199+765+9 823.3,89.7 11861 gray(255)
4: 144x143+308+65 370.3,134.5 11351 gray(255)
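For anyone scripting this from Python rather than a shell, the verbose listing above can be parsed with a regular expression. The following is a minimal sketch that assumes the line layout shown in the listing (id, WxH+X+Y geometry, centroid, area, colour); `Component` and `parse_components` are hypothetical names chosen for this example.

```python
import re
from typing import List, NamedTuple


class Component(NamedTuple):
    id: int
    w: int
    h: int
    x: int
    y: int
    area: int
    color: str


# Matches e.g. "2: 141x200+458+9 537.4,125.7 20750 gray(255)"
_LINE = re.compile(
    r"^\s*(\d+): (\d+)x(\d+)\+(\d+)\+(\d+) \S+ (\d+) (\S+)\s*$"
)


def parse_components(text: str) -> List[Component]:
    """Parse connected-components verbose output; non-matching lines are skipped."""
    comps = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            *nums, color = m.groups()
            comps.append(Component(*map(int, nums), color))
    return comps
```

With the listing above, the letters would be the entries whose `color` is `gray(255)`; the header line is skipped because it does not start with a numeric id.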
You can also draw the bounding boxes on your original image if you want. But that is OS dependent. In Unix, I would do:
Code: Select all
# save the field separator and split the command output on newlines only
OLDIFS=$IFS
IFS=$'\n'
drawcmds=""
arr=(`magick characters.png -auto-level -fill white +opaque black -type bilevel \
-define connected-components:verbose=true \
-define connected-components:area-threshold=5000 \
-define connected-components:mean-color=true \
-connected-components 4 null: | tail -n +2 | sed 's/^[ ]*//'` )
IFS=$OLDIFS
num=${#arr[*]}
echo "$num"
for ((i=0; i<num; i++)); do
  # fields: id, bounding box (WxH+X+Y), centroid, area, colour
  color=`echo "${arr[$i]}" | cut -d' ' -f5`
  bbox=`echo "${arr[$i]}" | cut -d' ' -f2`
  # white regions are the characters; skip the black background
  if [ "$color" = "gray(255)" ]; then
    size=`echo "$bbox" | cut -d+ -f1`
    ww=`echo "$size" | cut -dx -f1`
    hh=`echo "$size" | cut -dx -f2`
    offx=`echo "$bbox" | cut -d+ -f2`
    offy=`echo "$bbox" | cut -d+ -f3`
    x1=$offx
    y1=$offy
    x2=$((x1+ww-1))
    y2=$((y1+hh-1))
    drawcmds="$drawcmds rectangle $x1,$y1 $x2,$y2"
  fi
done
magick characters.png -fill none -stroke red -draw "$drawcmds" -alpha off characters_bbox.png
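The drawing step of the shell script above could be translated to Python roughly as follows. This is a sketch, not a tested port: it assumes the boxes are available as (w, h, x, y) tuples in the same order as the WxH+X+Y geometry, and `draw_rectangles` and `bbox_command` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (w, h, x, y), matching WxH+X+Y geometry


def draw_rectangles(boxes: Iterable[Box]) -> str:
    """Build the argument string for -draw, one rectangle per box."""
    parts = []
    for w, h, x, y in boxes:
        # -draw rectangle takes two inclusive corners, hence the "- 1".
        parts.append(f"rectangle {x},{y} {x + w - 1},{y + h - 1}")
    return " ".join(parts)


def bbox_command(src: str, dst: str, boxes: Iterable[Box]) -> List[str]:
    """Return the argument list for the magick call used in the script above."""
    return ["magick", src, "-fill", "none", "-stroke", "red",
            "-draw", draw_rectangles(boxes), "-alpha", "off", dst]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`. Passing a list instead of a single shell string avoids the quoting differences between Windows and Unix shells.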
- guy lateur
Re: Separating characters in a font file
Thank you both for your suggestions; that's already a very big step in the right direction!
Unfortunately, I don't think the 'connected-components' approach is going to take us all the way there. Consider this source font file: http://users.telenet.be/guy.lateur/IM/t ... ropped.png
If I run snibgo's script on it, I get a separate bounding box for the dot on the i, which is undesirable. So hopefully all common characters are always connected horizontally (are they really, though?), but definitely not vertically. So we'll need to go a bit more fine grained, vertically, I think.
Btw, I'd like to end up with the result that fmw42's script produces (red line bounding boxes on top of original), but I'm on windows, so I'll need to 'translate' this first. Give me a couple of minutes to do that, please.. I'm using python scripts, btw.
- snibgo
Re: Separating characters in a font file
Ah, yes, I remembered about dotted ij etc but then forgot. Just blur vertically, eg with "-morphology Convolve Blur:0x10,90", before changing the colours. But that messes with the top and bottom of the bounding rectangle. Hmm... thinking ...
- guy lateur
Re: Separating characters in a font file
But surely that will give bounding boxes that are too high, won't it?
- fmw42
Re: Separating characters in a font file
You could create an image of each character and use -debug annotate to find the glyph characteristics. See http://www.imagemagick.org/Usage/text/#font_info.
Alternately, you could do the blur and then reduce the bounding box by the blur distance.
- guy lateur
Re: Separating characters in a font file
fmw42 wrote: ↑2018-05-30T14:32:58-07:00
You could create an image of each character and use -debug annotate to find the glyph characteristics. See http://www.imagemagick.org/Usage/text/#font_info.
Alternately, you could do the blur and then reduce the bounding box by the blur distance.
Thanks, I'll look into the font info thing. I'm starting from a .ttf file for now, indeed, so that should definitely come in handy. Heck, I even installed some apps (eg, ttfedit) to extract that very info from the .ttf file. I didn't get far with that approach, though.
I'm still not convinced about the blurring approach, btw. What about a colon (:) character? You may have to blur more than you have room on top/bottom of your character, which would prevent you from 'calculating back'.
Maybe I should just leave it as it is, and do a postprocessing step that checks all bounding boxes, and merges them if they overlap horizontally. That shouldn't be too hard, should it?
Anyway, thanks again for the input. I'll leave it here for today. I'll let you know how this goes.
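The postprocessing step mentioned above (merging boxes that overlap horizontally) might look something like this in Python. It is a minimal sketch that assumes each box is an inclusive (left, top, right, bottom) tuple; `merge_x_overlaps` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), all inclusive


def merge_x_overlaps(boxes: Iterable[Box]) -> List[Box]:
    """Merge boxes whose horizontal extents overlap, e.g. an 'i' and its dot."""
    merged: List[Box] = []
    for box in sorted(boxes):  # sorted by left edge first
        if merged and box[0] <= merged[-1][2]:
            # Overlaps the previous box along X: grow it to cover both.
            left, top, right, bottom = merged[-1]
            merged[-1] = (left, min(top, box[1]),
                          max(right, box[2]), max(bottom, box[3]))
        else:
            merged.append(box)
    return merged
```

This also handles a colon, since both dots share the same columns. Two neighbouring characters whose columns overlap (such as kerned letters) would be merged too, so it relies on the original assumption of at least one background column between characters.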
- snibgo
Re: Separating characters in a font file
You can certainly do it more manually. From the white and black image, scale to a single row. Pixels in this Nx1 image that are black are between the characters, so that gives you the left/right boundaries. Then crop the white/black image to those boundaries, and trim gives you the top/bottom boundaries.
- guy lateur
Re: Separating characters in a font file
snibgo wrote: ↑2018-05-30T15:14:59-07:00
You can certainly do it more manually. From the white and black image, scale to a single row. Pixels in this Nx1 image that are black are between the characters, so that gives you the left/right boundaries. Then crop the white/black image to those boundaries, and trim gives you the top/bottom boundaries.
Yes, a (vertical) trim sounds about right! I'll try to figure out how to do that.. (;-)
- snibgo
Re: Separating characters in a font file
My page Subimage rectangles shows how your input image can be guillotined in vertical slices, with each cut midway between the characters. The script does a "+repage" on each image. You could knock that out, and replace it with "-trim". Then identify on all the images gives you the bounding rectangles.
- fmw42
Re: Separating characters in a font file
You should be able to average the black and white image to one row using -scale and threshold at 0, then add a 1 pixel border of black. Then use -connected-components to find the offset and width of each white region. Then, for each region, crop the full black and white image to that width at full height, and trim each sub-image to get the bounding box dimensions.
- guy lateur
Re: Separating characters in a font file
Ok, here's the result I'm getting:
I've used snibgo's advice to get the bounding boxes and fmw42's example to draw them. I'm using a custom BBox class for cleaning up of the bounding boxes -- eg, make sure the 'i' character only produces 1 BBox. Any BBoxes that overlap horizontally are merged into one BBox. The fact that f & g are united within 1 single BBox is because there is not at least 1 vertical line of space/background colour between them. I'll need to sort that out at the input side, because this does not satisfy the input requirements I'd put up initially.
Here's the python script that generates this result:
Code: Select all
import subprocess
import BBox # custom Bounding Box class BBox & utilities
filename = "\"" + r"F:\Amiga\_bmp\Fonts\IN3.png" + "\""
fileoutbboxes = "\"" + r"F:\Amiga\_bmp\Fonts\OUT3.png" + "\""
# get bounding boxes as illustrated by snibgo
cmd = "magick"
cmd += " " + filename
cmd += " " + "-fill White +opaque Black"
cmd += " " + "-fill Red -draw \"color 0,0 floodfill\""
cmd += " " + "-fill White +opaque Red"
cmd += " " + "-define connected-components:verbose=true"
cmd += " " + "-connected-components 4"
cmd += " " + "null:" # discard the output image; only the verbose listing is needed
strOut = subprocess.check_output(cmd, shell=True).decode("utf-8").strip()
# generate & clean up bounding boxes -- see BBox.py
BBoxes = BBox.listFromIM(strOut) # im-connected-components to BBox list
BBox.sortX(BBoxes) # order list in BBox.x
BBoxes = BBox.uniteListX(BBoxes) # unite BBoxes that overlap along X
BBox.reprList(BBoxes) # print BBoxes
# draw bounding boxes as illustrated by fmw42
drawcmds = BBox.imDrawCommandsFromList(BBoxes) # IM rect draw commands
drawcmd = "magick"
drawcmd += " " + filename
drawcmd += " " + "-fill none -stroke red -draw " + drawcmds
drawcmd += " " + "-alpha off"
drawcmd += " " + fileoutbboxes
subprocess.check_output(drawcmd, shell=True)
Here's the BBox.py file: http://users.telenet.be/guy.lateur/IM/BBox.py
There's still some things to discover and some work to be done, obviously. I'm quite happy with what we've achieved so far, though, so thanks again for the help! If I make any other interesting progress, I'll be sure to post about it here. (:-)
- fmw42
Re: Separating characters in a font file
If I use my method of scaling the binary image to 1 row and then do connected components, I get the same as you above, since the "f" and "g" overlap.
Code: Select all
infile="characters2.png"
inname=`convert "$infile" -format "%t" info:`
ht=`convert "$infile" -format "%h" info:`
# save the field separator and split the command output on newlines only
OLDIFS=$IFS
IFS=$'\n'
drawcmds=""
# scale the binary image to one row, then label the white runs
arr=(`convert "$infile" -auto-level -fill white +opaque black -scale x1! -type bilevel \
-define connected-components:verbose=true \
-define connected-components:mean-color=true \
-connected-components 4 null: | tail -n +2 | sed 's/^[ ]*//'` )
IFS=$OLDIFS
num=${#arr[*]}
for ((i=0; i<num; i++)); do
  color=`echo "${arr[$i]}" | cut -d' ' -f5`
  if [ "$color" = "gray(255)" ]; then
    bbox1=`echo "${arr[$i]}" | cut -d' ' -f2`
    size=`echo "$bbox1" | cut -d+ -f1`
    ww1=`echo "$size" | cut -dx -f1`
    offx1=`echo "$bbox1" | cut -d+ -f2`
    # crop the full-height column range and get its trim box via %@
    newbbox=`convert "${infile}" +repage -crop ${ww1}x${ht}+${offx1}+0 +repage -format "%@" info:`
    size2=`echo "$newbbox" | cut -d+ -f1`
    ww2=`echo "$size2" | cut -dx -f1`
    hh2=`echo "$size2" | cut -dx -f2`
    offx2=`echo "$newbbox" | cut -d+ -f2`
    offy2=`echo "$newbbox" | cut -d+ -f3`
    x1=$((offx1+offx2))
    y1=$offy2
    x2=$((x1+ww2-1))
    y2=$((y1+hh2-1))
    drawcmds="$drawcmds rectangle $x1,$y1 $x2,$y2"
  fi
done
convert "$infile" -fill none -stroke red -draw "$drawcmds" -alpha off "${inname}_bbox.png"
But if I now threshold the binary 1 row image at 7%, I can separate the "f" and "g" at the expense of the red boxes being slightly too small.
Code: Select all
infile="characters.png"
inname=`convert "$infile" -format "%t" info:`
ht=`convert "$infile" -format "%h" info:`
# save the field separator and split the command output on newlines only
OLDIFS=$IFS
IFS=$'\n'
drawcmds=""
# as before, but threshold the one-row image at 7% so sparse columns count as gaps
arr=(`convert "$infile" -auto-level -fill white +opaque black -scale x1! -threshold 7% -type bilevel \
-define connected-components:verbose=true \
-define connected-components:mean-color=true \
-connected-components 4 null: | tail -n +2 | sed 's/^[ ]*//'` )
IFS=$OLDIFS
num=${#arr[*]}
for ((i=0; i<num; i++)); do
  color=`echo "${arr[$i]}" | cut -d' ' -f5`
  if [ "$color" = "gray(255)" ]; then
    bbox1=`echo "${arr[$i]}" | cut -d' ' -f2`
    size=`echo "$bbox1" | cut -d+ -f1`
    ww1=`echo "$size" | cut -dx -f1`
    offx1=`echo "$bbox1" | cut -d+ -f2`
    # crop the full-height column range and get its trim box via %@
    newbbox=`convert "${infile}" +repage -crop ${ww1}x${ht}+${offx1}+0 +repage -format "%@" info:`
    size2=`echo "$newbbox" | cut -d+ -f1`
    ww2=`echo "$size2" | cut -dx -f1`
    hh2=`echo "$size2" | cut -dx -f2`
    offx2=`echo "$newbbox" | cut -d+ -f2`
    offy2=`echo "$newbbox" | cut -d+ -f3`
    x1=$((offx1+offx2))
    y1=$offy2
    x2=$((x1+ww2-1))
    y2=$((y1+hh2-1))
    drawcmds="$drawcmds rectangle $x1,$y1 $x2,$y2"
  fi
done
convert "$infile" -fill none -stroke red -draw "$drawcmds" -alpha off "${inname}_bbox_t7.png"
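The effect of the 7% threshold can be illustrated without ImageMagick. Scaling a binary image to one row averages each column, so the averaged value is roughly the fraction of foreground pixels in that column; thresholding then treats sparsely filled columns as gaps. The sketch below assumes a `grid[y][x]` list with 0 as background and uses a hypothetical function name; it models the idea only and is not guaranteed to match ImageMagick's exact rounding.

```python
from typing import List, Tuple

Grid = List[List[int]]  # grid[y][x]; 0 is assumed to be the background value


def column_runs_by_coverage(grid: Grid, threshold: float = 0.07,
                            bg: int = 0) -> List[Tuple[int, int]]:
    """Return inclusive (left, right) runs of columns whose foreground
    fraction exceeds `threshold`; sparser columns count as gaps."""
    height = len(grid)
    width = len(grid[0]) if grid else 0
    runs = []
    start = None
    for x in range(width):
        coverage = sum(1 for row in grid if row[x] != bg) / height
        inked = coverage > threshold
        if inked and start is None:
            start = x
        elif not inked and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, width - 1))
    return runs
```

With `threshold=0` this reduces to the plain "any foreground pixel" scan. A higher threshold splits characters joined by a thin overlap, but it also drops thin edge columns of each character, which is why the resulting boxes come out slightly too small.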
- guy lateur
Re: Separating characters in a font file
Thanks, interesting idea. Even if it doesn't really solve our problem, I should definitely check out that route!