Split text lines into multiple images

kaliber
Posts: 5
Joined: 2015-11-13T01:29:10-07:00

Post by kaliber »

Hello. I would like to split an image into multiple images, one per line of text.
I need to cut it horizontally into n pieces.

Example:
Image

Output:
Image

Image

Image

The following image shows the pixel rows that contain ONLY white (highlighted in yellow). Maybe this could help.
Image


Recap
Preconditions:
- the background of the input image is always white #ffffff (it never changes, i.e. I am not scanning text from a book)
- text lines never overlap (there is a clear separation between text lines)

Postconditions (let n be the number of text lines):
- it should produce n images, one per text line
- the resulting images should be trimmed (this is easy using the -trim option)

Thanks for help :)
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Split text lines into multiple images

Post by snibgo »

Fred probably has a script that can do this.

You could do it like this:

1. Scale to a single column. This will be white only where there is no text in the row.

2. Output this as text.

3. In a script, find the y-coordinate of the first non-white pixel. Then look for the next white pixel (or end of file) and back-track one. This gives you the start and end y-coordinates of the first line of text.

4. Crop the image, the entire width but only between these y-values. Trim.

5. Repeat (3) and (4) until there are no more non-white pixels (no more text).

If you have my process modules, you wouldn't need to write the column as text, but you would still need a scripted loop.
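
The steps above can be sketched as a small shell script. This is only a rough sketch, not a tested tool: it assumes ImageMagick 6's `convert` is on the PATH (on ImageMagick 7 the command would be `magick`), a pure white background as stated in the question, and the function names `find_line_ranges` and `split_lines` are made up for illustration. It negates the image before scaling, so in the single column blank rows come out black and text rows come out white.

```shell
#!/bin/sh
# Sketch only: split an image on a white background into one image per text line.
# Assumes ImageMagick's `convert`; function names here are hypothetical.

# Read ImageMagick "txt:" output for a 1-pixel-wide column on stdin.
# Rows whose hex colour is all zeros are blank; anything else is text.
# Prints one "first_row last_row" pair per run of consecutive text rows.
find_line_ranges() {
    awk '
        BEGIN { start = -1 }
        /^#/ { next }                          # skip the txt: header line
        {
            split($1, xy, /[,:]/)              # "0,12:" gives xy[2] = 12
            y = xy[2] + 0
            hex = ""
            for (i = 1; i <= NF; i++)
                if ($i ~ /^#[0-9A-Fa-f]+$/) hex = $i
            is_text = (hex !~ /^#0+$/)
            if (is_text && start < 0) start = y
            if (!is_text && start >= 0) { print start, y - 1; start = -1 }
            last = y
        }
        END { if (start >= 0) print start, last }   # text runs to the last row
    '
}

# Steps 1 to 5: scale to a column, dump it as text, then crop and trim each range.
split_lines() {
    src=$1
    prefix=$2
    convert "$src" -alpha off -colorspace Gray -negate \
            -scale '1x!' -threshold 0 -depth 8 txt:- |
    find_line_ranges | {
        n=0
        while read -r top bottom; do
            height=$((bottom - top + 1))
            # A width of 0 is taken to mean the full image width.
            convert "$src" -crop "0x${height}+0+${top}" +repage \
                    -trim +repage "${prefix}_${n}.png"
            n=$((n + 1))
        done
    }
}

# Only run when an input file is given, e.g.: sh split.sh ex1.png line
if [ "$#" -ge 1 ]; then
    split_lines "$1" "${2:-line}"
fi
```

Very faint anti-aliased pixels may average away to nothing when the row is scaled, so rows containing only such pixels could be treated as blank.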
snibgo's IM pages: im.snibgo.com
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Split text lines into multiple images

Post by fmw42 »

snibgo's method is exactly what I would have suggested. I do not have a script that will do that. ImageMagick has -connected-components, which would allow you to find the bounding box of each word (connected set of characters). See viewtopic.php?f=4&t=26493
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Split text lines into multiple images

Post by fmw42 »

Because your lines are equally spaced, with wide gaps between them and margins of about half that size at the top and bottom, this seems to work. However, I do not know how universal it would be for all your images, and you would need to know how many lines you have.

Code:

convert ex1.png -crop 1x3@ -trim +repage ext_%d.png
See cropping into equal parts at http://www.imagemagick.org/Usage/crop/#crop_equal

I think you would be better off scripting a loop as user snibgo has suggested.