Use hough-lines to detect scanned book pages


Post by prohtex »

Hello forum,

I'm working on a series of scripts for postprocessing scanned images. I've made use of a lot of ImageMagick functionality so far, and the tips in this forum have been invaluable. One area where I'm stuck is identifying the edges of the scanned pages in the images for later cropping. My application requires non-destructive editing, so I'm trying to iterate over a directory of images and analyze each one to see if I can identify the edges of the paper and export them as coordinates or a bounding box.

I've been somewhat successful in tweaking canny and hough to create an image that identifies the edges of the paper, along with a whole lot of other stuff. The trick has been to scale the original image down quite a lot; there is a sweet spot.

Below is my code and some sample images. I'm not really sure where to go from here. Is there a way to retrieve these lines as coordinates? Is there a way to disregard lines whose angles fall within certain ranges (to discard all the wacky 45 degree lines)? Most importantly, does anybody have any ideas for how I might identify the bounds of the page?

Many thanks in advance for any assistance!

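Code:

// $files (input paths) and $dest (output directory) are set elsewhere in the script.
$size  = "1400";         // read modifier: fit the image within 1400x1400 px
$canny = "0x1+10%+30%";  // radius x sigma + lower% + upper% thresholds
$hough = "9x9+200";      // accumulator peak filter width x height + vote threshold

foreach ($files as $file) {
	$name   = pathinfo($file, PATHINFO_FILENAME);
	$output = $dest."/".$name.".jpg";
	$msg = "Detecting edges in ".$file;
	// Run canny and hough on a clone, save both intermediates, then overlay the red lines.
	// File paths are quoted with escapeshellarg() so spaces and brackets survive the shell.
	$cmd  = "convert ".escapeshellarg($file."[".$size."x".$size."]")." -auto-orient \( +clone -canny ".$canny." -write ".escapeshellarg("/tmp/".basename($file)."_canny.png")." ";
	$cmd .= "-background none -fill red -stroke red -strokewidth 1 -hough-lines ".$hough." -write ".escapeshellarg("/tmp/".basename($file)."_lines.png")." \) ";
	$cmd .= "-composite ".escapeshellarg($output);
	// Queue the command; it is executed later by the calling script.
	$thread[] = array($msg, $cmd);
}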
[Four sample images]

Post by snibgo »

To get line coordinates, see http://www.imagemagick.org/script/comma ... ough-lines
A text file listing the endpoints and counts may be created by using the suffix, .mvg, for the output image.
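To get the angle, you need to do a simple calculation: arctan(dx/dy), where dx and dy are the differences between the endpoint coordinates. This gives the angle measured from the vertical.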
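
As a rough sketch of the two steps above (reading the endpoints from the .mvg text and computing each line's angle), something like the following Python could be used. The sample text is only an assumed illustration of the layout: the fields after the `#` differ between ImageMagick versions, so only the count is read. The names `parse_mvg`, `angle_from_vertical` and `keep_axis_aligned`, and the 10 degree tolerance, are hypothetical choices rather than anything ImageMagick provides.

```python
import math
import re

# Assumed layout of a hough-lines .mvg file (illustrative values only).
SAMPLE_MVG = """\
# Hough line transform: 9x9+200
viewbox 0 0 1400 1050
line 0,52.5 1400,47.1  # 412
line 101.3,0 96.8,1050  # 655
line 0,0 1050,1050  # 230
"""

LINE_RE = re.compile(
    r"^line\s+([-\d.]+),([-\d.]+)\s+([-\d.]+),([-\d.]+)\s+#\s*(\d+)"
)


def parse_mvg(text):
    """Return a list of (x1, y1, x2, y2, count) tuples from MVG text."""
    lines = []
    for row in text.splitlines():
        m = LINE_RE.match(row.strip())
        if m:
            x1, y1, x2, y2 = (float(g) for g in m.groups()[:4])
            lines.append((x1, y1, x2, y2, int(m.group(5))))
    return lines


def angle_from_vertical(x1, y1, x2, y2):
    """Angle in degrees between the line and the vertical axis, 0..90."""
    dx, dy = x2 - x1, y2 - y1
    # atan2 on absolute values avoids dividing by zero when dy is 0
    return math.degrees(math.atan2(abs(dx), abs(dy)))


def keep_axis_aligned(lines, tolerance=10.0):
    """Keep only lines within `tolerance` degrees of vertical or horizontal."""
    kept = []
    for x1, y1, x2, y2, count in lines:
        a = angle_from_vertical(x1, y1, x2, y2)
        if a <= tolerance or a >= 90.0 - tolerance:
            kept.append((x1, y1, x2, y2, count))
    return kept


lines = parse_mvg(SAMPLE_MVG)
filtered = keep_axis_aligned(lines)  # the diagonal line is discarded
```

Filtering on the angle this way is one possible answer to the question about discarding the 45 degree lines; the tolerance would need tuning against real scans.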

For finding edges, you can narrow the search. For example, the left edge will be a long nearly-vertical line in the left 50% (or smaller) of the image. This is usually enough, but two other features may be useful:

1. The left edge will be the line that is furthest to the left, etc.

2. The two sides of this line will be of contrasting intensity (e.g. light paper, black background).
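
The narrowing and "furthest to the left" rules could be sketched in Python roughly as follows. This assumes the lines are already available as (x1, y1, x2, y2) tuples in pixel coordinates; the function names, the slope limit (about 10 degrees from vertical) and the 50% search fraction are all hypothetical values for illustration. The contrast check from point 2 is not included here.

```python
def x_at_mid_height(line, height):
    """X coordinate where the line crosses the horizontal mid-line of the image."""
    x1, y1, x2, y2 = line
    if y1 == y2:  # horizontal line: no single crossing, use its midpoint
        return (x1 + x2) / 2.0
    t = (height / 2.0 - y1) / (y2 - y1)
    return x1 + t * (x2 - x1)


def is_near_vertical(line, max_slope=0.18):
    """True if |dx/dy| is small, i.e. within roughly 10 degrees of vertical."""
    x1, y1, x2, y2 = line
    dx, dy = abs(x2 - x1), abs(y2 - y1)
    return dy > 0 and dx / dy <= max_slope


def find_left_edge(lines, width, height, search_fraction=0.5):
    """Leftmost near-vertical line inside the left part of the image, or None."""
    limit = width * search_fraction
    candidates = [
        ln for ln in lines
        if is_near_vertical(ln) and 0 <= x_at_mid_height(ln, height) <= limit
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda ln: x_at_mid_height(ln, height))


# Illustrative lines for a 1400x1050 image: two near-vertical lines on the left,
# one on the right, and one near-horizontal line.
example_lines = [
    (240.0, 0, 236.0, 1050),
    (101.3, 0, 96.8, 1050),
    (1290.0, 0, 1301.0, 1050),
    (0, 52.5, 1400, 47.1),
]
left_edge = find_left_edge(example_lines, 1400, 1050)
```

The other three edges would follow the same pattern with the search region and the comparison mirrored.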
snibgo's IM pages: im.snibgo.com

Post by fmw42 »

Why not average the image down (-scale) to one column, and then again to one row, and save each in txt: format? Then step along the txt: column and row to find the transition between black (very dark) and white (very bright), using some fuzz value for the difference between successive pixels. This should work for your text image. For your other image, first threshold everything that is not essentially background black to white, then do the same.
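
A minimal Python sketch of the scanning step, assuming the grey values of the one-row (or one-column) image have already been read from the txt: output and normalised to the range 0..1. The function name `find_page_span` and the default fuzz of 0.25 are hypothetical and would need adjusting for real scans.

```python
def find_page_span(profile, fuzz=0.25):
    """Return (start, end) indices of the bright run bounded by the first
    dark-to-bright jump and the last bright-to-dark jump, or None."""
    start = end = None
    for i in range(1, len(profile)):
        step = profile[i] - profile[i - 1]
        if step >= fuzz and start is None:
            start = i        # first pixel after the jump to bright
        elif step <= -fuzz:
            end = i - 1      # last pixel before the drop; later drops overwrite
    if start is None or end is None or end < start:
        return None
    return start, end


# Illustrative averaged row: dark background, bright page, dark background.
row = [0.05, 0.06, 0.05, 0.85, 0.90, 0.88, 0.90, 0.10, 0.05]
span = find_page_span(row)
```

Running the same function on the one-column profile gives the top and bottom bounds. Because it compares only successive pixels, a very gradual transition would not be detected without a smaller fuzz value.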

Post by prohtex »

@snibgo Thanks for this information - very helpful! When you say "The two sides of this line will be of contrasting intensity", how would I check that? Crop the image to the coordinates of that line?

@fmw42 That's an excellent idea, thanks! I didn't know it was possible to output to a txt file. My issue, though, is that a great many pages have photos and dark backgrounds that would be difficult to distinguish. Also, the hough lines approach seems to identify (sometimes) the correct page edge rather than the edge of the fanned-out pages behind it. That would not be possible when evaluating pixel by pixel.

Post by snibgo »

prohtex wrote: When you say "The two sides of this line will be of contrasting intensity", how would I check that? Crop the image to the coordinates of that line?
Yes. For example: suppose the line is nearly vertical. Crop to the full height, 10% or so of the image width. Rotate to put the line exactly vertical. Crop to get rid of the extra added triangles. Scale down to 2x1 pixels. This result has two pixels, being the average to the left and right of the line. If the left pixel is very dark (background scanner black) and the right pixel is light (close to paper white), we have found the left margin of the page.
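
The final comparison could be sketched in plain Python as below. This only approximates what scaling to 2x1 pixels does: it assumes the strip has already been cropped and rotated so the line runs down its centre, and that it is available as rows of grey values in the range 0..1. The function names and the two thresholds are hypothetical.

```python
def side_means(strip):
    """Average the left and right halves of a grayscale strip (rows of 0..1 values)."""
    half = len(strip[0]) // 2
    left = [v for row in strip for v in row[:half]]
    right = [v for row in strip for v in row[half:]]
    return sum(left) / len(left), sum(right) / len(right)


def looks_like_left_margin(strip, dark_max=0.2, light_min=0.6):
    """True if the left half is scanner-dark and the right half is paper-light."""
    left, right = side_means(strip)
    return left <= dark_max and right >= light_min


# Illustrative 4x2 strip: dark background on the left, light paper on the right.
strip = [
    [0.05, 0.04, 0.90, 0.92],
    [0.06, 0.05, 0.88, 0.91],
]
is_left_margin = looks_like_left_margin(strip)
```

For the right margin the test would be reversed (light on the left, dark on the right).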