Split image horizontally while avoiding to cut text
Hi guys,
I have images which are mostly text, black on white.
I need to cut them horizontally, into two pieces (nearly 50/50), but text should not be cut in the middle.
Example (red is where the image gets cut):
Bad:
Good:
What is the easiest way to achieve this?
Thanks!
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Split image horizontally while avoiding to cut text
Average the image down to one column using -scale 1xH!
Convert to text
Find the brightest (whitest) pixel near the middle, which then should be the space between lines of text
Crop at that location.
see
http://www.imagemagick.org/script/comma ... .php#scale
http://www.imagemagick.org/Usage/files/#txt
To make it easier, you can also automatically trim the outer white, then the outer black, then the next area of white outside the text. That way the black stripes around the sides and bottom will not contribute, and you should then be able to find a white pixel in the column near the middle. That coordinate should then be used to crop (compensated by the trim size).
If you start by cropping in half and just use the bottom part, then the first white pixel (or the middle of the first set of white pixels) can be found and used for the crop coordinates of the original (after adjusting for the size of the top section).
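The "convert to text, find the brightest pixel near the middle" step could be scripted roughly as follows. This is only a sketch: the function name brightest_row_near_middle is made up here, it assumes the one-column image has been dumped in txt: format, and it treats the first channel value on each line as the brightness. The exact txt: layout varies a little between ImageMagick versions, so check your own output first.

```shell
# Hypothetical helper: reads ImageMagick txt: output for a 1-pixel-wide image
# on stdin and prints the y of the brightest row closest to the middle.
# Assumes lines shaped like "0,3: (255,255,255)  #FFFFFF  white".
brightest_row_near_middle() {
  awk -F'[,:() ]+' '
    BEGIN { n = 0; max = 0 }
    /^#/ { next }                      # skip the header line
    {
      y[n] = $2; v[n] = $3 + 0; n++    # row number and first channel value
      if ($3 + 0 > max) max = $3 + 0
    }
    END {
      mid = (n - 1) / 2; best = -1
      for (i = 0; i < n; i++) {
        if (v[i] < max) continue       # keep only the brightest rows
        d = y[i] - mid; if (d < 0) d = -d
        if (best < 0 || d < bestd) { best = y[i]; bestd = d }
      }
      print best
    }'
}
```

Usage would be something like piping the txt: dump of the scaled column into brightest_row_near_middle. Requiring rows to equal the maximum is strict; with noisy scans a tolerance below the maximum is probably more practical.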
Re: Split image horizontally while avoiding to cut text
Thanks for your help.
I couldn't figure out how to "Find the brightest (whitest) pixel near the middle".
I ended up using a PHP CLI script to do the work, as I'm a bit more familiar with PHP than ImageMagick.
In general, an ImageMagick solution would require a script too, right?
I probably wouldn't be able to do it just with command line arguments.
- snibgo
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Split image horizontally while avoiding to cut text
The result of fmw42's process will be an image 1 pixel wide by (n) pixels high. Suppose this is "w2.png". To find the first (highest) white pixel:
Code: Select all
compare -metric RMSE -subimage-search w2.png xc:White NULL:
The result (sent to stderr) might be:
Code: Select all
0 (0) @ 0,5
So the pixel at y=5 (counting the first row as zero) is white.
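To use that result in a script, the y value has to be pulled out of the text that compare prints. A small sketch, assuming the output keeps the "... @ x,y" shape shown above; the helper name compare_y_offset is hypothetical:

```shell
# Hypothetical helper: reads compare's "metric (normalized) @ x,y" text on
# stdin and prints only the y value after the comma.
compare_y_offset() {
  sed -n 's/.*@ *[0-9][0-9]*,\([0-9][0-9]*\).*/\1/p'
}
```

Since compare writes the result to stderr, the command would need 2>&1 before the pipe into compare_y_offset. Anchoring on the "@" avoids picking up digits from a non-zero metric value.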
snibgo's IM pages: im.snibgo.com
- fmw42
Re: Split image horizontally while avoiding to cut text
Quote: In general, an ImageMagick solution would require a script too, right? I probably wouldn't be able to do it just with command line arguments.
Yes, that is correct, except in the simple case where you use only the bottom half of the image and do what user snibgo suggested above.
- snibgo
Re: Split image horizontally while avoiding to cut text
The subimage-search technique could find the white pixel nearest the centre: crop into two, "-flip" the top half, "+append" them together side by side, then search for the first white pixel. The x-coordinate tells if it is in the top or bottom half, and the y-coordinate is the distance from the centre.
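Turning a match in that folded image back into a row of the original column is simple arithmetic. A sketch under the assumption that the flipped top half is given first to "+append" (so it becomes column 0 and the bottom half column 1); the function name fold_to_row is invented for illustration:

```shell
# Hypothetical helper: maps a match at (x, y) in the folded 2-pixel-wide image
# back to a row of the original one-column image.
# $1 = height of the top half, $2 = x of the match, $3 = y of the match
fold_to_row() {
  if [ "$2" -eq 0 ]; then
    echo $(( $1 - 1 - $3 ))   # column 0: flipped top half, counted upwards from the centre
  else
    echo $(( $1 + $3 ))       # column 1: bottom half, counted downwards from the centre
  fi
}
```

For example, with a top half 10 rows high, a match at 0,2 would correspond to row 7 of the original and a match at 1,3 to row 13.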
snibgo's IM pages: im.snibgo.com
- fmw42
Re: Split image horizontally while avoiding to cut text
Actually using compare needs some modification. If you let it run its full course, it would find the pixel with the largest match score (closest to white), which may be further down the image. You need to choose some threshold in the rmse value and use -similarity-threshold, so it stops at the first acceptable value. So you would need to get the column stats first and decide on an rmse value for the -similarity-threshold.
see
http://www.imagemagick.org/script/comma ... -threshold
- fmw42
Re: Split image horizontally while avoiding to cut text
Here is a short set of command lines.
I first took your second image and removed the red line. So note that there is a line there that will be brighter than where you want the split. Also the bottom of the image will be brighter in the column, since it has no black border there. Thus one needs to stop the compare at the first closest match. The use of -dissimilarity-threshold is there so that the compare does not stop because the white pixel has too large an rmse when compared to any black pixel. This forces the search not to stop for too large a mismatch.
Input:
Commands:
Crop the image into two nearly equal halves vertically.
Get the image width for use later when doing the final crop
Get the height of the top half for use when computing where to do the final crop
Scale the bottom half to one column
Get the y offset from the results of the compare
Add the y offset to the height of the top half to compute the y location in the full image to do the crop
Crop the original image into two parts defined by the compare offset.
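Code: Select all
# Split into top and bottom halves: Fql1c1_0.png and Fql1c1_1.png
convert Fql1c1.png -crop 1x2@ +repage Fql1c1_%d.png
# Full width and top-half height, needed for the final crop
WW=$(convert Fql1c1.png -format "%w" info:)
topH=$(convert Fql1c1_0.png -format "%h" info:)
# Average the bottom half down to a single column
convert Fql1c1_1.png -scale 1x! Fql1c1_1_col.png
# y offset of the first near-white row, taken from the "@ x,y" part of the compare output
yoff=$(compare -metric rmse -subimage-search -similarity-threshold 0.01% \
-dissimilarity-threshold 100% Fql1c1_1_col.png xc:white null: \
2>&1 | sed -n 's/.*@ [0-9]*,\([0-9]*\).*/\1/p')
# y location of the split in the full image
newH=$((topH+yoff))
convert Fql1c1.png -crop ${WW}x${newH} +repage Fql1c1_crop_%d.png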
If you know the thickness of the spacing, you can add half the spacing to the newH computation so that it splits it in the middle of the spacing.
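If the thickness of the spacing is not known in advance, the middle of the first white gap could be measured from the column itself. This is a rough sketch, not part of the commands above: the helper name first_gap_middle and the threshold argument are assumptions, and it expects the txt: dump of the bottom-half column on stdin, with the first channel value taken as brightness.

```shell
# Hypothetical helper: prints the middle row of the first run of rows whose
# first channel value is at least the given threshold.
# $1 = brightness threshold, in the same units as the txt: values
first_gap_middle() {
  awk -F'[,:() ]+' -v t="$1" '
    /^#/ { next }                                    # skip the header line
    $3 + 0 >= t + 0 { if (start == "") start = $2; end = $2; next }
    start != "" { exit }                             # first dark row after the gap
    END { if (start != "") print int((start + end) / 2) }'
}
```

The printed value would take the place of yoff when computing newH. The threshold has to match the value range of your txt: output (for example 0 to 255 or 0 to 65535).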
Last edited by fmw42 on 2013-07-28T17:42:53-07:00, edited 1 time in total.
Re: Split image horizontally while avoiding to cut text
Cool results. I'm working on a way to split the left page from the right page. Following this logic, I could find the binding, which should be the darkest point between the two pages.
- fmw42
Re: Split image horizontally while avoiding to cut text
Caution:
My solution above works only for a nearly ideal situation.
If you are scanning text from a book, it may not work well, because the text may not end up perfectly horizontal. The spaces between lines of text will then not be distinguishable when averaged down to one column. If both pages are scanned together, each page would need to be separated and then unrotated (possibly with -deskew if the rotation is small). Even so, the curvature near the spine may distort the text so that the lines curve, and after unrotating you still have a similar problem when averaging down to one column.
- fmw42
Re: Split image horizontally while avoiding to cut text
B_Gaspar wrote: Cool results. I'm working on way to split left page from right page. Assuming this logic I could find the binding which should be the darkest point between the two pages
Looking at the image, if you average down to one row, that may not be the case, because you have some very large, dark text, so some other column may average down to a pixel that is darker than the center column. But if the margins are wide, you should be able to find the darkest pixel near the center that has a rather light area on either side. Also, the image is rotated, so the binding will not fall in any single column. You would need to find the center of the darker region near the middle of the picture that is surrounded by very light pixels.
I would also suggest that you floodfill the outside to white before looking for the center line, so that you do not have extra black around the outside, which would make all results darker and make it harder to distinguish pixels in the averaged-down row.
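Restricting the search to a window around the center is one way to keep dark text or borders elsewhere from winning. A minimal sketch, assuming the image has been averaged down to one row and dumped in txt: format with the columns in order; the helper name darkest_near_center and the window argument are hypothetical, and it does not handle the rotation problem mentioned above:

```shell
# Hypothetical helper: prints the x of the darkest pixel within +/- window
# columns of the center of a one-row image.
# $1 = half-width of the search window, in pixels
darkest_near_center() {
  awk -F'[,:() ]+' -v w="$1" '
    BEGIN { n = 0 }
    /^#/ { next }                      # skip the header line
    { v[n] = $3 + 0; n++ }             # first channel value, indexed by x
    END {
      c = int(n / 2); best = -1
      for (i = c - w; i <= c + w; i++) {
        if (i < 0 || i >= n) continue
        if (best < 0 || v[i] < v[best]) best = i
      }
      print best
    }'
}
```

The window should be narrower than the inner margins so that text columns stay outside it.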