I'm reviving this old thread to ask whether anything new has come up in the last two years regarding image segmentation.
I'd like to be able to extract all the words from an image, so that each word image can be used as if it were a character in a text, allowing text reflow even in scanned pages.
Example image:
[Image]
By copying a line and pasting it 1 pixel to the right and again 2 pixels to the right, all the characters in a word get "melted" together, while the words stay separated. Melting the characters should make it easier to identify single words, i.e. to determine the boxes that enclose them. Once I have one box per word, it's easy to extract and save them.
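To make the idea concrete, here is a minimal sketch in plain Python of what I mean. It is not ImageMagick code and not a finished solution: the image is assumed to be already binarized into rows of 0/1 values (1 = ink), the names `smear_right` and `word_boxes` are made up for illustration, and finding the boxes by flood-filling connected blobs is my own assumption about how the "box per word" step could work.

```python
from collections import deque

def smear_right(img, shifts=(1, 2)):
    """OR the image with copies of itself shifted right by each offset."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for s in shifts:
                    if x + s < w:
                        out[y][x + s] = 1
    return out

def word_boxes(img):
    """Return inclusive (left, top, right, bottom) boxes of 8-connected ink blobs."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one blob, tracking its extent as we go.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            left, top, right, bottom = x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)

# Toy "page": two words of two thin characters each.
# Characters are 1 pixel apart, words are 4 pixels apart.
page = [
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0],
]
boxes = word_boxes(smear_right(page))
print(boxes)  # [(0, 0, 4, 1), (7, 0, 11, 1)]
```

Without the smear, the same toy page yields four boxes, one per character. Note that the smear also widens each box by up to 2 pixels on the right, so the boxes would need trimming before cropping words from the original image, and the shift amounts would have to be tuned so that they bridge the gaps between characters but not the gaps between words.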
The script I'm trying to write picks out each word and saves it to a separate file.
I think I should use the -segment option, but I'm not sure. Can anybody confirm, and give me some hints about where to start?
http://www.tiem.utk.edu/~sylv/HTML/Imag ... gment.html