converting fuzzy outlined text for ocr
i've been struggling for a few days to get tesseract ocr to recognize white text, mostly outlined in black, on various backgrounds. what i've come up with works most of the time but sometimes mixes up numbers. when the background has any white in it, it mostly fails. can anyone help me come up with something that works better? thanks in advance.
convert test1.tif -channel RGB -threshold 99% -fill black +opaque white -negate result1.tif
[attached images: three original/result pairs]
Last edited by inorkuo on 2018-10-14T05:44:40-07:00, edited 1 time in total.
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: converting fuzzy outlined text for ocr
Try the following. Here I use mogrify to process all 3 files, which I put into folder test1, and write the output to folder test2.
First, I add a 1-pixel white border around the image, so the floodfill can reach white regions touching any edge.
Next, I floodfill starting at 0,0 with black to cover over the solid white background, if it exists.
Next, I do a fuzzy change of near-white to pure white.
Next, I change anything not white to black.
Next, I negate it so that you have black text on a white background.
Finally, I save the results to PNG to avoid further JPG compression losses.
Code:
mogrify -path /Users/fred/desktop/test2 -format png -bordercolor white -border 1 -fuzz 20% -fill black -draw "color 0,0 floodfill" -fill white -opaque white -fill black +opaque white -negate *.jpg
Re: converting fuzzy outlined text for ocr
thanks for the advice. the results look much better than mine, but why did the "y" and "ft" disappear in the image with "497y 44ft"? is there any way to keep all of the numbers and letters?
- fmw42
Re: converting fuzzy outlined text for ocr
Because those letters were not completely enclosed by black. The floodfill leaked into the white part of the letters and changed them to black, which, when negated, became part of the white background.
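The leak can be reproduced on a toy grid. The following is only a minimal sketch in plain Python, not ImageMagick's implementation: the `flood_fill` helper, the 4-connectivity, and the two tiny "letters" are all assumptions made for illustration. One letter has a closed black outline, the other has a one-pixel gap in it.

```python
from collections import deque


def flood_fill(grid, start, new):
    """Return a copy of grid with the 4-connected region at start set to new."""
    rows, cols = len(grid), len(grid[0])
    r0, c0 = start
    old = grid[r0][c0]
    out = [row[:] for row in grid]
    if old == new:
        return out
    out[r0][c0] = new
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == old:
                out[nr][nc] = new
                queue.append((nr, nc))
    return out


def parse(lines):
    """Turn a list of strings into a mutable grid of characters."""
    return [list(line) for line in lines]


# "." is white, "#" is black (hypothetical 5x7 test patterns).
closed = parse([
    ".......",
    ".#####.",
    ".#...#.",
    ".#####.",
    ".......",
])
gap = parse([
    ".......",
    ".##.##.",  # one-pixel gap in the outline
    ".#...#.",
    ".#####.",
    ".......",
])

# Fill the white background with black, starting at the top-left corner.
closed_filled = flood_fill(closed, (0, 0), "#")
gap_filled = flood_fill(gap, (0, 0), "#")

for name, grid in (("closed outline", closed_filled), ("outline with gap", gap_filled)):
    print(name)
    print("\n".join("".join(row) for row in grid))
```

With the closed outline the white interior survives the fill; with the gap the fill reaches the interior and the whole letter turns black, which is what happened to the "y" and "ft".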
Re: converting fuzzy outlined text for ocr
if i do the following, it fills in the lines enough so that if i play with the fuzz % i can get the "y" and "ft" to stay but then the triangle between the 9 and 7 doesn't get filled. any ideas?
convert test1.tif -morphology erode diamond:1 -morphology open octagon:2 result1.tif
- fmw42
Re: converting fuzzy outlined text for ocr
I do not see any triangle between 9 and 7? Please post your images and the results of your extra processing.
Re: converting fuzzy outlined text for ocr
i would either have the "y" disappear or i would have this triangle shown below because increasing the black to enclose the letters would also enclose the white space between the 9 and 7.
i've abandoned the idea of fully enclosing the letters and my new strategy is to oversaturate the image. the problematic backgrounds are cloudy skies. the image looks mostly white but there are almost always hints of blue. i'm doing a convert -modulate 100,5000 and i get this.
there are still some white splotches left in the image. also, the white will not always be in the same place. can you help me isolate the text from here?
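Why oversaturating separates the sky from the text can be seen on two single pixels. This is a rough sketch using Python's standard `colorsys` module, assuming the saturation boost happens in HSL; it is not ImageMagick's exact arithmetic, and the `boost_saturation` helper and the sample "cloudy sky" pixel value are hypothetical.

```python
import colorsys


def boost_saturation(rgb, factor):
    """Scale the HSL saturation of an 8-bit RGB pixel, clamping at 1.0."""
    r, g, b = (v / 255.0 for v in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    s = min(1.0, s * factor)
    return tuple(round(v * 255) for v in colorsys.hls_to_rgb(h, l, s))


def spread(rgb):
    """Difference between the largest and smallest channel (0 for pure gray/white)."""
    return max(rgb) - min(rgb)


pure_white = (255, 255, 255)   # text pixel
cloudy_sky = (235, 240, 245)   # hypothetical near-white pixel with a hint of blue

# A saturation argument of 5000 (percent) corresponds to a factor of 50.
boosted_white = boost_saturation(pure_white, 50)
boosted_sky = boost_saturation(cloudy_sky, 50)

print("white:", pure_white, "->", boosted_white, "spread", spread(boosted_white))
print("sky:  ", cloudy_sky, "->", boosted_sky, "spread", spread(boosted_sky))
```

Pure white has zero saturation, so multiplying it leaves it white, while a pixel with even a slight tint moves further away from white. That is also consistent with the leftover splotches: any part of the background that is truly white is not affected by the boost.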
- fmw42
Re: converting fuzzy outlined text for ocr
Try this on your black/white version above.
I do not advise saving intermediate images to JPG, since it is a lossy format. Save intermediate results as PNG or TIFF.
Code:
convert tri.jpg -morphology open octagon:4 -threshold 60% -negate result.png
Re: converting fuzzy outlined text for ocr
thanks for the advice. here's what i ended up with that seems to be working for various backgrounds.
Code:
convert in.tif -modulate 100,5000 -bordercolor white -border 1 -fuzz 17% -fill blue -draw "color 0,0 floodfill" -flatten -alpha OFF -format png temp.png
convert temp.png -fuzz 5% -fill black +opaque white -negate -morphology erode octagon -format tif out.tif
Last edited by inorkuo on 2018-10-15T12:33:31-07:00, edited 1 time in total.
- fmw42
Re: converting fuzzy outlined text for ocr
You should not need two convert commands. This should work as one long command. (Unix syntax for line continuations)
Code:
convert in.tif \
-modulate 100,5000 \
-bordercolor white -border 1 \
-fuzz 17% -fill blue -draw "color 0,0 floodfill" \
-flatten -alpha OFF \
-fuzz 5% -fill black +opaque white -negate \
-morphology erode octagon \
out.tif
Re: converting fuzzy outlined text for ocr
great thanks again. i have merged the two lines now.
i missed a "-fill" before "black +opaque white -negate". i edited my original post.
- fmw42
Re: converting fuzzy outlined text for ocr
I edited my command accordingly.