Separating characters in a font file

Post by **guy lateur** » 2018-05-30T12:39:47-07:00

I'm using ImageMagick 7.0.7-35 Q16 x64 2018-05-21 on win7pro.

I'm looking to separate the characters in a 'bitmap-font-file'. Suppose I had a file like this: http://users.telenet.be/guy.lateur/IM/t ... ropped.png

I want to calculate the bounding rectangles around each character. Let's assume that every adjacent pair of letters has at least 1 vertical line between them that contains only the background (BG) color (black in this example). Here is the pseudocode:

horizontal (main) scan:
. start at horizontal position (HP) 0, i.e., the left edge of the image
. increase HP by 1 until you reach a vertical line that contains more than just the BG color
. mark this HP as the start/left of the current char
. increase HP by 1 until you reach a vertical line that contains only the BG color, or the right edge of the image
. mark (this HP - 1) as the end/right of the current char
. repeat from the current HP until the whole horizontal range, i.e., all characters, has been covered

for each char found, do a vertical scan, looking only at the columns between its left and right:
. start at vertical position (VP) 0, i.e., the top of the image
. increase VP by 1 (i.e., go down) until you reach a horizontal line that contains more than just the BG color
. mark this VP as the top of the character
. do the same for the bottom of the character, going up from the bottom row of the image
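
The two scans above can be sketched in Python. This is a minimal sketch under stated assumptions: the image is already loaded as a list of rows of pixel values with the background as `0` (reading the PNG, e.g. with a library such as Pillow, is not shown), and `find_char_boxes` is a hypothetical helper name. Coordinates are inclusive `(left, top, right, bottom)`.

```python
def find_char_boxes(img, bg=0):
    """Column scan, then a row scan inside each character's columns.

    img: list of rows (lists of pixel values); bg: the background value.
    Returns inclusive (left, top, right, bottom) tuples, left to right.
    """
    height, width = len(img), len(img[0])

    def column_is_bg(x):
        return all(img[y][x] == bg for y in range(height))

    boxes = []
    hp = 0
    while hp < width:
        # skip background-only columns until a character starts
        while hp < width and column_is_bg(hp):
            hp += 1
        if hp == width:
            break
        left = hp
        # advance to the next background-only column (or the right edge)
        while hp < width and not column_is_bg(hp):
            hp += 1
        right = hp - 1

        def row_has_ink(y):
            return any(img[y][x] != bg for x in range(left, right + 1))

        # vertical scan restricted to this character's columns;
        # the span has ink, so both loops stop inside the image
        top = 0
        while not row_has_ink(top):
            top += 1
        bottom = height - 1
        while not row_has_ink(bottom):
            bottom -= 1
        boxes.append((left, top, right, bottom))
    return boxes


# demo: a dotted "i" (columns 2) and an "o" (columns 5-7)
glyphs = [
    "..#......",
    ".........",
    "..#..###.",
    "..#..#.#.",
    "..#..###.",
]
img = [[1 if c == "#" else 0 for c in row] for row in glyphs]
print(find_char_boxes(img))  # → [(2, 0, 2, 4), (5, 2, 7, 4)]
```

Because the vertical scan only looks inside each character's columns, the dot of the "i" lands in the same box as its stem, which is exactly what the "at least 1 background column between letters" assumption buys.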

Can I do this using IM? If so, can somebody please point me to a couple of command line options I could look into?

As a bonus question: would it be possible to generate an output image containing the lines depicting these bounding rectangles? That way I could easily check that the algorithm works, e.g., by overlaying it onto the original in my favourite application, GIMP.

Total noob here, sorry.

TIA for any pointers!

Post by **snibgo** » 2018-05-30T12:59:57-07:00

If you just want the bounding boxes, I would do it like this:

magick ^
  testFonts-002-cropped.png ^
  -fill White +opaque Black ^
  -fill Red -draw "color 0,0 floodfill" ^
  -fill White +opaque Red ^
  -define connected-components:verbose=true ^
  -connected-components 4 ^
  x.png

Objects (id: bounding-box centroid area mean-color):
  0: 918x220+0+0 484.3,95.1 107409 srgba(255,0,0,1.61014)
  2: 141x200+458+9 537.4,125.7 20750 srgba(255,255,255,4.15831)
  1: 141x200+158+9 218.1,125.4 20555 srgba(255,255,255,4.18828)
  6: 139x142+7+66 77.2,141.5 15527 srgba(255,255,255,5.22071)
  5: 141x144+608+65 674.2,132.7 14507 srgba(255,255,255,5.51747)
  3: 143x199+765+9 823.3,89.7 11861 srgba(255,255,255,6.52525)
  4: 144x143+308+65 370.3,134.5 11351 srgba(255,255,255,6.7735)

The first bounding box is the background. All the following are the individual letters.
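
If the verbose listing is consumed by a script rather than read by eye, it can be parsed line by line. A minimal Python sketch, assuming the line layout shown above (`id: WxH+X+Y cx,cy area color`) and, following the note above, that the first object listed is the background; `parse_cc_objects` is a hypothetical name:

```python
import re

# Matches object lines such as
#   "  2: 141x200+458+9 537.4,125.7 20750 srgba(255,255,255,4.15831)"
OBJECT_LINE = re.compile(
    r"^\s*(\d+):\s+(\d+)x(\d+)\+(\d+)\+(\d+)\s+\S+\s+(\d+)\s+(\S+)\s*$"
)


def parse_cc_objects(text, skip_background=True):
    """Return (id, width, height, x, y, area, color) tuples in listing order."""
    objects = []
    for line in text.splitlines():
        m = OBJECT_LINE.match(line)
        if not m:
            continue  # the "Objects (id: ...)" header or any other line
        oid, w, h, x, y, area = (int(g) for g in m.groups()[:6])
        objects.append((oid, w, h, x, y, area, m.group(7)))
    if skip_background and objects:
        objects = objects[1:]  # assumption: the first object is the background
    return objects
```

The listing above is ordered by area, so sorting the result on the `x` field, e.g. `sorted(objs, key=lambda o: o[3])`, gives the characters in left-to-right order.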

If you want to go further, eg to chop the image up into pieces, I have Windows BAT scripts for that.

Post by **fmw42** » 2018-05-30T13:07:33-07:00

You can use -connected-components to get the bounding boxes of each character as follows:

magick characters.png -auto-level -fill white +opaque black -type bilevel \
-define connected-components:verbose=true \
-define connected-components:area-threshold=5000 \
-define connected-components:mean-color=true \
-connected-components 4 \
ccl_binary.png

Objects (id: bounding-box centroid area mean-color):
  0: 918x220+0+0 484.3,95.1 107409 gray(0)
  2: 141x200+458+9 537.4,125.7 20750 gray(255)
  1: 141x200+158+9 218.1,125.4 20555 gray(255)
  6: 139x142+7+66 77.2,141.5 15527 gray(255)
  5: 141x144+608+65 674.2,132.7 14507 gray(255)
  3: 143x199+765+9 823.3,89.7 11861 gray(255)
  4: 144x143+308+65 370.3,134.5 11351 gray(255)

You can also draw the bounding boxes on your original image if you want, but the scripting for that is OS dependent. In Unix, I would do:

# save IFS, then split the command output on newlines only (one object per array element)
OLDIFS=$IFS
IFS=$'\n'
drawcmds=""
arr=(`magick characters.png -auto-level -fill white +opaque black -type bilevel \
-define connected-components:verbose=true \
-define connected-components:area-threshold=5000 \
-define connected-components:mean-color=true \
-connected-components 4 null: | tail -n +2 | sed 's/^[ ]*//'` )
IFS=$OLDIFS
num=${#arr[*]}
echo "$num"
for ((i=0; i<num; i++)); do
color=`echo "${arr[$i]}" | cut -d\  -f5`
bbox=`echo "${arr[$i]}" | cut -d\  -f2`
if [ "$color" = "gray(255)" ]; then
size=`echo "$bbox" | cut -d+ -f1`
ww=`echo "$size" | cut -dx -f1`
hh=`echo "$size" | cut -dx -f2`
offx=`echo "$bbox" | cut -d+ -f2`
offy=`echo "$bbox" | cut -d+ -f3`
x1=$offx
y1=$offy
x2=$(($x1+ww-1))
y2=$(($y1+hh-1))
drawcmds="$drawcmds rectangle $x1,$y1 $x2,$y2"
fi
done
magick characters.png -fill none -stroke red -draw "$drawcmds" -alpha off characters_bbox.png
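
On Windows the same drawing step can be driven from Python without any shell quoting by passing `magick` an argument list. A sketch, assuming `magick` is on the PATH and the boxes have already been extracted as `(width, height, x, y)` tuples; `draw_rectangles_args` is a hypothetical helper. As in the script above, a `WxH+X+Y` box spans columns `X` to `X+W-1` and rows `Y` to `Y+H-1`:

```python
def draw_rectangles_args(infile, outfile, boxes):
    """Build a magick argument list that outlines each box in red.

    boxes: (width, height, x, y) tuples as reported by -connected-components.
    A list of arguments (no shell) avoids Windows/Unix quoting differences.
    """
    rects = []
    for w, h, x, y in boxes:
        # inclusive corners: the last column is x+w-1, the last row is y+h-1
        rects.append(f"rectangle {x},{y} {x + w - 1},{y + h - 1}")
    return ["magick", infile, "-fill", "none", "-stroke", "red",
            "-draw", " ".join(rects), "-alpha", "off", outfile]
```

It would be run with `subprocess.run(draw_rectangles_args("characters.png", "characters_bbox.png", boxes), check=True)`. All rectangles go into one `-draw` string, just as the script builds `$drawcmds`.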

Post by **guy lateur** » 2018-05-30T14:03:52-07:00

Thank you both for your suggestions; that's already a very big step in the right direction!

Unfortunately, I don't think the 'connected-components' approach is going to take us all the way there. Consider this source font file: http://users.telenet.be/guy.lateur/IM/t ... ropped.png

If I run snibgo's script on it, I get a separate bounding box for the dot on the i, which is undesirable. So hopefully all common characters are connected horizontally (are they really, though?), but they are definitely not all connected vertically. So I think we'll need a finer-grained approach vertically.

Btw, I'd like to end up with the result that fmw42's script produces (red bounding boxes on top of the original), but I'm on Windows, so I'll need to 'translate' it first. Give me a couple of minutes to do that, please.

I'm using python scripts, btw.

Post by **snibgo** » 2018-05-30T14:16:50-07:00

Ah, yes, I remembered about dotted ij etc but then forgot. Just blur vertically, eg with "-morphology Convolve Blur:0x10,90", before changing the colours. But that messes with the top and bottom of the bounding rectangle. Hmm... thinking ...

Post by **guy lateur** » 2018-05-30T14:19:48-07:00

snibgo wrote: ↑2018-05-30T14:16:50-07:00 Ah, yes, I remembered about dotted ij etc but then forgot. Just blur vertically, eg with "-morphology Convolve Blur:0x10,90", before changing the colours. But that messes with the top and bottom of the bounding rectangle. Hmm... thinking ...

But surely that will give bounding boxes that are too high, won't it?

Post by **fmw42** » 2018-05-30T14:32:58-07:00

You could create an image of each character and use -debug annotate to find the glyph characteristics. See http://www.imagemagick.org/Usage/text/#font_info.

Alternatively, you could do the blur and then reduce the bounding box by the blur distance.

Post by **guy lateur** » 2018-05-30T14:51:16-07:00

fmw42 wrote: ↑2018-05-30T14:32:58-07:00 You could create an image of each character and use -debug annotate to find the glyph characteristics. See http://www.imagemagick.org/Usage/text/#font_info.

Alternatively, you could do the blur and then reduce the bounding box by the blur distance.

Thanks, I'll look into the font info thing. I'm starting from a .ttf file for now, indeed, so that should definitely come in handy. Heck, I even installed some apps (eg, ttfedit) to extract that very info from the .ttf file. I didn't get far with that approach, though.

I'm still not convinced about the blurring approach, btw. What about a colon (:) character? You may have to blur more than you have room for above/below your character, which would prevent you from 'calculating back'.

Maybe I should just leave it as it is, and do a postprocessing step that checks all bounding boxes, and merges them if they overlap horizontally. That shouldn't be too hard, should it?
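
That postprocessing step could look like the following minimal Python sketch, assuming boxes as inclusive `(left, top, right, bottom)` tuples (an IM box `WxH+X+Y` maps to `(X, Y, X+W-1, Y+H-1)`); `merge_overlapping_x` is a hypothetical name, not the BBox.py code linked later in the thread:

```python
def merge_overlapping_x(boxes):
    """Merge inclusive (left, top, right, bottom) boxes whose columns overlap.

    After sorting by left edge, one pass suffices: each box either extends
    the current merged box or starts a new one.
    """
    merged = []
    for left, top, right, bottom in sorted(boxes):
        if merged and left <= merged[-1][2]:
            l0, t0, r0, b0 = merged[-1]
            merged[-1] = (l0, min(t0, top), max(r0, right), max(b0, bottom))
        else:
            merged.append((left, top, right, bottom))
    return merged


# demo: the dot and stem of an "i" share columns; the next glyph does not
print(merge_overlapping_x([(20, 12, 30, 30), (10, 12, 14, 30), (10, 5, 13, 8)]))
# → [(10, 5, 14, 30), (20, 12, 30, 30)]
```

The same rule will also merge two different glyphs whose columns genuinely overlap (for example a kerned pair), so it only works under the "background column between letters" assumption.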

Anyway, thanks again for the input. I'll leave it here for today. I'll let you know how this goes.

Post by **snibgo** » 2018-05-30T15:14:59-07:00

You can certainly do it more manually. From the white and black image, scale to a single row. Pixels in this Nx1 image that are black are between the characters, so that gives you the left/right boundaries. Then crop the white/black image to those boundaries, and -trim gives you the top/bottom boundaries.

Post by **guy lateur** » 2018-05-30T15:21:35-07:00

snibgo wrote: ↑2018-05-30T15:14:59-07:00 You can certainly do it more manually. From the white and black image, scale to a single row. Pixels in this Nx1 image that are black are between the characters, so that gives you the left/right boundaries. Then crop the white/black image to those boundaries, and trim gives you the top/bottom boundaries.

Yes, a (vertical) trim sounds about right! I'll try to figure out how to do that. (;-)

Post by **snibgo** » 2018-05-30T16:28:15-07:00

My page Subimage rectangles shows how your input image can be guillotined into vertical slices, with each cut midway between the characters. The script does a "+repage" on each image. You could knock that out and replace it with "-trim". Then running identify on all the images gives you the bounding rectangles.
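
The cut positions for such a guillotine can be computed from the character column ranges. A minimal Python sketch, assuming the inclusive `(left, right)` column range of each character is already known and sorted left to right; `midway_cuts` is a hypothetical name and not the script on snibgo's page:

```python
def midway_cuts(spans, width):
    """Return inclusive (x0, x1) column ranges, one slice per character.

    spans: sorted (left, right) inclusive column ranges of the characters.
    width: image width; the first slice starts at column 0 and the last
    ends at column width-1. Each cut falls midway in the background gap.
    """
    slices = []
    start = 0
    for i, (left, right) in enumerate(spans):
        if i + 1 < len(spans):
            next_left = spans[i + 1][0]
            end = (right + next_left) // 2  # last column of this slice
        else:
            end = width - 1
        slices.append((start, end))
        start = end + 1
    return slices


print(midway_cuts([(2, 2), (5, 7)], 9))  # → [(0, 3), (4, 8)]
```

Each slice `(x0, x1)` could then be cropped out at full height (e.g. `-crop WxH+X+0` with `W = x1 - x0 + 1`) and trimmed to find that character's top and bottom.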

Post by **fmw42** » 2018-05-30T16:38:36-07:00

You should be able to average the black and white image to one row using -scale, threshold it at 0, then add a 1-pixel black border. Then use -connected-components to find the width and offset of each white region. Then, for each white region, crop the full black and white image at full height to that width and offset, and trim each subimage to get the bounding box dimensions.

Post by **guy lateur** » 2018-05-31T15:52:35-07:00

Ok, here's the result I'm getting:

Here's the python script that generates this result:

import subprocess

import BBox     # custom Bounding Box class BBox & utilities

filename        = "\"" + r"F:\Amiga\_bmp\Fonts\IN3.png"     + "\""
fileoutbboxes   = "\"" + r"F:\Amiga\_bmp\Fonts\OUT3.png"    + "\""

# get bounding boxes as illustrated by snibgo
cmd =           "magick"
cmd += " " +    filename
cmd += " " +    "-fill White +opaque Black"
cmd += " " +    "-fill Red -draw \"color 0,0 floodfill\""
cmd += " " +    "-fill White +opaque Red"
cmd += " " +    "-define connected-components:verbose=true"
cmd += " " +    "-connected-components 4"
cmd += " " +    "null:"             # null: discards the image; only the verbose text on stdout is needed
strOut = subprocess.check_output(cmd, shell=True).decode("utf-8").strip()

# generate & clean up bounding boxes -- see BBox.py
BBoxes = BBox.listFromIM(strOut)    # im-connected-components to BBox list
BBox.sortX(BBoxes)                  # order list in BBox.x
BBoxes = BBox.uniteListX(BBoxes)    # unite BBoxes that overlap along X
BBox.reprList(BBoxes)               # print BBoxes

# draw bounding boxes as illustrated by fmw42
drawcmds = BBox.imDrawCommandsFromList(BBoxes) # IM rect draw commands
drawcmd =           "magick"
drawcmd += " " +    filename
drawcmd += " " +    "-fill none -stroke red -draw " + drawcmds
drawcmd += " " +    "-alpha off"
drawcmd += " " +    fileoutbboxes
subprocess.check_output(drawcmd, shell=True)

I've used snibgo's advice to get the bounding boxes and fmw42's example to draw them. I'm using a custom BBox class to clean up the bounding boxes, e.g., to make sure the 'i' character only produces 1 BBox: any BBoxes that overlap horizontally are merged into one. The f and g end up united in a single BBox because there is not at least 1 vertical line of space/background colour between them. I'll need to sort that out on the input side, because this does not satisfy the input requirement I set out initially.

Here's the BBox.py file: http://users.telenet.be/guy.lateur/IM/BBox.py

There are still some things to discover and some work to be done, obviously. I'm quite happy with what we've achieved so far, though, so thanks again for the help! If I make any other interesting progress, I'll be sure to post about it here. (:-)

Post by **fmw42** » 2018-05-31T17:13:48-07:00

If I use my method of scaling the binary image to 1 row and then do connected components, I get the same as you above, since the "f" and "g" overlap.

Input:

infile="characters2.png"
inname=`convert "$infile" -format "%t" info:`
ht=`convert "$infile" -format "%h" info:`
OLDIFS=$IFS
IFS=$'\n'
drawcmds=""
# scale the binary image to a single row; each white run is one character's column range
arr=(`convert "$infile" -auto-level -fill white +opaque black -scale x1! -type bilevel \
-define connected-components:verbose=true \
-define connected-components:mean-color=true \
-connected-components 4 null: | tail -n +2 | sed 's/^[ ]*//'` )
IFS=$OLDIFS
num=${#arr[*]}
for ((i=0; i<num; i++)); do
color=`echo "${arr[$i]}" | cut -d\  -f5`
if [ "$color" = "gray(255)" ]; then
bbox1=`echo "${arr[$i]}" | cut -d\  -f2`
size=`echo "$bbox1" | cut -d+ -f1`
ww1=`echo "$size" | cut -dx -f1`
offx1=`echo "$bbox1" | cut -d+ -f2`
newbbox=`convert "${infile}" +repage -crop ${ww1}x${ht}+${offx1}+0 +repage -format "%@" info:`
size2=`echo "$newbbox" | cut -d+ -f1`
ww2=`echo "$size2" | cut -dx -f1`
hh2=`echo "$size2" | cut -dx -f2`
offx2=`echo "$newbbox" | cut -d+ -f2`
offy2=`echo "$newbbox" | cut -d+ -f3`
x1=$((offx1+offx2))
y1=$offy2
x2=$(($x1+ww2-1))
y2=$(($y1+hh2-1))
drawcmds="$drawcmds rectangle $x1,$y1 $x2,$y2"
fi
done
convert "$infile" -fill none -stroke red -draw "$drawcmds" -alpha off "${inname}_bbox.png"

But if I now threshold the binary 1 row image at 7%, I can separate the "f" and "g" at the expense of the red boxes being slightly too small.

infile="characters.png"
inname=`convert "$infile" -format "%t" info:`
ht=`convert "$infile" -format "%h" info:`
OLDIFS=$IFS
IFS=$'\n'
drawcmds=""
# as above, but columns whose averaged value is 7% or less are treated as background
arr=(`convert "$infile" -auto-level -fill white +opaque black -scale x1! -threshold 7% -type bilevel \
-define connected-components:verbose=true \
-define connected-components:mean-color=true \
-connected-components 4 null: | tail -n +2 | sed 's/^[ ]*//'` )
IFS=$OLDIFS
num=${#arr[*]}
for ((i=0; i<num; i++)); do
color=`echo "${arr[$i]}" | cut -d\  -f5`
if [ "$color" = "gray(255)" ]; then
bbox1=`echo "${arr[$i]}" | cut -d\  -f2`
size=`echo "$bbox1" | cut -d+ -f1`
ww1=`echo "$size" | cut -dx -f1`
offx1=`echo "$bbox1" | cut -d+ -f2`
newbbox=`convert "${infile}" +repage -crop ${ww1}x${ht}+${offx1}+0 +repage -format "%@" info:`
size2=`echo "$newbbox" | cut -d+ -f1`
ww2=`echo "$size2" | cut -dx -f1`
hh2=`echo "$size2" | cut -dx -f2`
offx2=`echo "$newbbox" | cut -d+ -f2`
offy2=`echo "$newbbox" | cut -d+ -f3`
x1=$((offx1+offx2))
y1=$offy2
x2=$(($x1+ww2-1))
y2=$(($y1+hh2-1))
drawcmds="$drawcmds rectangle $x1,$y1 $x2,$y2"
fi
done
convert "$infile" -fill none -stroke red -draw "$drawcmds" -alpha off "${inname}_bbox_t7.png"
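
What `-scale x1! -threshold 7%` does to the columns can be mimicked in a few lines of Python, which also shows why the boxes shrink. A sketch, assuming a binary grid with background `0` and treating each column's ink fraction as its averaged value (IM's own resampling and quantum rounding are not reproduced exactly); `column_runs` is a hypothetical name:

```python
def column_runs(img, threshold=0.0, bg=0):
    """Approximate '-scale x1! -threshold T' on a binary grid.

    Each column is reduced to its ink fraction; columns whose fraction
    exceeds the threshold are kept. Returns inclusive (left, right) runs.
    """
    height, width = len(img), len(img[0])
    keep = []
    for x in range(width):
        ink = sum(1 for y in range(height) if img[y][x] != bg)
        keep.append(ink / height > threshold)
    runs, start = [], None
    for x, k in enumerate(keep + [False]):  # sentinel closes a trailing run
        if k and start is None:
            start = x
        elif not k and start is not None:
            runs.append((start, x - 1))
            start = None
    return runs


# demo: two 3-column glyphs joined by a 1-pixel bridge in a 20-row image
img = [[1, 1, 1, 1 if y == 10 else 0, 1, 1, 1] for y in range(20)]
print(column_runs(img))                  # → [(0, 6)]
print(column_runs(img, threshold=0.07))  # → [(0, 2), (4, 6)]
```

The `[False]` sentinel plays a similar role to the 1-pixel black border suggested earlier: it closes a run that touches the right edge. Raising the threshold drops thin bridges between glyphs, but it equally drops the sparse outermost columns of each glyph, which is why the red boxes come out slightly too small.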

Post by **guy lateur** » 2018-05-31T17:32:41-07:00

Thanks, interesting idea. Even if it doesn't really solve our problem, I should definitely check out that route!

Legacy ImageMagick Discussions Archive
