
Split text lines into multiple images

Posted: 2015-11-13T03:06:01-07:00
by kaliber
Hello. I would like to split an image into multiple images according to its text lines.
I need to cut them horizontally, into n pieces.

Example:
Image

Output:
Image

Image

Image

The following image shows the pixel rows that contain ONLY white (highlighted in yellow). Maybe this could help.
Image


Recap
Preconditions:
- the background of the input image is always white #ffffff (it never changes, i.e. I am not scanning text from a book).
- text lines never overlap (there is a clear separation between text lines)

Postconditions (let n be the number of text lines):
- it should produce n images, one per text line
- the resulting images should be trimmed (this is easy with the -trim option)

Thanks for help :)

Re: Split text lines into multiple images

Posted: 2015-11-13T03:17:38-07:00
by snibgo
Fred probably has a script that can do this.

You could do it like this:

1. Scale to a single column. This will be white only where there is no text in the row.

2. Output this as text.

3. In a script, find the y-coordinate of first non-white pixel. Then look for the next white pixel (or end of file) and back-track one. This gives you the start and end y-coordinates of the first line of text.

4. Crop the image, the entire width but only between these y-values. Trim.

5. Repeat (3) and (4) until there are no more non-white pixels (no more text).

If you have my process modules, you wouldn't need to write the column as text, but you would still need a scripted loop.
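The steps above could be sketched roughly as the shell script below. This is only an illustration under some assumptions, not a tested general solution: the function names `find_text_rows` and `split_lines` are made up for this example, `convert` and `identify` are assumed to be on the PATH, and a row counts as white only if its averaged pixel is written as all-F hex in the `txt:` output. A row with very little ink in a very wide image might average out to white and be missed.

```shell
#!/bin/sh
# Sketch only: find_text_rows and split_lines are hypothetical helper names.

# Read an ImageMagick "txt:" enumeration of a 1-pixel-wide column on stdin and
# print one "start end" pair (inclusive y-coordinates) per run of non-white rows.
find_text_rows() {
  awk -F'[,:]' '
    BEGIN { start = -1 }
    /^#/  { next }                                   # skip the header line
    {
      y = $2 + 0                                     # lines look like "0,y: (...)  #HEX  name"
      white = (toupper($0) ~ /#F+([ \t]|$)/)         # hex colour made only of F digits
      if (!white && start < 0) start = y             # a text line begins here
      if (white && start >= 0) { print start, y - 1; start = -1 }
      last = y
    }
    END { if (start >= 0) print start, last }        # text runs to the bottom edge
  '
}

# Usage: split_lines input.png prefix
# Writes prefix_0.png, prefix_1.png, ... with one trimmed text line each.
split_lines() {
  img=$1
  prefix=$2
  # Image width and height in pixels.
  set -- $(identify -format '%w %h' "$img")
  width=$1
  height=$2
  n=0
  # Step 1 and 2: squash to a single column and dump it as text.
  convert "$img" -scale "1x${height}!" txt:- | find_text_rows |
  while read -r y0 y1; do
    # Step 4: crop the full width between the two y-values, then trim.
    convert "$img" -crop "${width}x$((y1 - y0 + 1))+0+${y0}" +repage \
      -trim +repage "${prefix}_${n}.png"
    n=$((n + 1))
  done
}
```

For example, after sourcing the file, `split_lines ex1.png line` would be expected to write line_0.png, line_1.png and so on. The image is read once per text line, which is simple but not fast for large inputs.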

Re: Split text lines into multiple images

Posted: 2015-11-13T10:45:33-07:00
by fmw42
snibgo's method is exactly what I would have suggested. I do not have a script that will do that. ImageMagick has -connected-components, which would allow you to find the bounding box of each word (connected set of characters). See viewtopic.php?f=4&t=26493

Re: Split text lines into multiple images

Posted: 2015-11-16T12:00:38-07:00
by fmw42
Because your lines are equally spaced with wide gaps, and the margins at top and bottom are about half that gap, this seems to work. However, I do not know how universal it would be for all your images, and you would need to know in advance how many lines you have.

Code:

convert ex1.png -crop 1x3@ -trim +repage ext_%d.png
See cropping into equal parts at http://www.imagemagick.org/Usage/crop/#crop_equal

I think you would be better off scripting a loop as user snibgo has suggested.