Use hough-lines to detect scanned book pages
Posted: 2017-02-01T13:02:07-07:00
Hello forum,
I'm working on a series of scripts for postprocessing scanned images. I've made use of a lot of ImageMagick functionality so far, and the tips in this forum have been invaluable. One area where I'm stuck is identifying the edges of the scanned pages in the images so they can be cropped later. My application requires non-destructive editing, so I'm trying to iterate over a directory of images, analyze each one to find the edges of the paper, and export these as coordinates or a bounding box.
I've been somewhat successful in tweaking -canny and -hough-lines to create an image that identifies the edges of the paper, but it also picks up a whole lot of other stuff. The trick has been to scale the original image down quite a lot. There is a sweet spot, which for my images is around 1400 pixels (see the code below).
Below is my code and some sample images. I'm not really sure where to go from here. Is there a way to retrieve these lines as coordinates? Is there a way to discard lines whose angles fall within certain ranges (to drop all the wacky 45 degree lines)? Most importantly, does anybody have any ideas for how I might be able to identify the bounds of the page?
Many thanks in advance for any assistance!
Code:
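// $files, $dest and $thread are defined elsewhere in the script.
$size = "1400";          // read-time resize: fit within 1400x1400
$canny = "0x1+10%+30%";  // -canny radius x sigma + lower% + upper%
$hough = "9x9+200";      // -hough-lines width x height + threshold
foreach ($files as $file) {
    // PATHINFO_FILENAME also handles files that have no extension.
    $output = $dest."/".pathinfo($file, PATHINFO_FILENAME).".jpg";
    $msg = "Detecting edges in ".$file;
    $tmp = "/tmp/".basename($file);
    // Quote every path so file names with spaces survive the shell.
    $cmd = "convert ".escapeshellarg($file."[".$size."x".$size."]")." -auto-orient \( +clone -canny ".$canny." -write ".escapeshellarg($tmp."_canny.png")." ";
    $cmd .= "-background none -fill red -stroke red -strokewidth 1 -hough-lines ".$hough." -write ".escapeshellarg($tmp."_lines.png")." \) ";
    $cmd .= "-composite ".escapeshellarg($output);
    $thread[] = array($msg, $cmd);
}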
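One possible direction for the coordinate and angle questions, offered as a rough sketch rather than a tested solution: it assumes that writing the -hough-lines result to a file with an .mvg extension produces text with one "line x1,y1 x2,y2" entry per detected line. That layout is an assumption here and should be checked against your ImageMagick version. The function names (parseHoughLines, keepAxisAligned, outerBounds), the 5 degree tolerance, and the sample data are all made up for illustration. The angle is computed from the endpoints, so nothing depends on any trailing comment fields. Taking the outermost surviving lines as the page bounds is only a heuristic and will fail if strong edges exist outside the page.

```php
<?php
// Sketch only: the MVG text layout parsed here is an assumption.

// Parse "line x1,y1 x2,y2" entries. Each result carries the endpoints and
// an angle in degrees folded into [0, 180): 0 = horizontal, 90 = vertical.
function parseHoughLines(string $mvg): array
{
    $lines = [];
    $pattern = '/^line\s+(-?[\d.]+),(-?[\d.]+)\s+(-?[\d.]+),(-?[\d.]+)/m';
    preg_match_all($pattern, $mvg, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        list($x1, $y1, $x2, $y2) = array_map('floatval', array_slice($m, 1, 4));
        $angle = fmod(rad2deg(atan2($y2 - $y1, $x2 - $x1)) + 180.0, 180.0);
        $lines[] = compact('x1', 'y1', 'x2', 'y2', 'angle');
    }
    return $lines;
}

// Keep only lines within $tolerance degrees of horizontal or vertical,
// which drops the diagonal (e.g. 45 degree) lines.
function keepAxisAligned(array $lines, float $tolerance = 5.0): array
{
    $kept = ['horizontal' => [], 'vertical' => []];
    foreach ($lines as $line) {
        $a = $line['angle'];
        if ($a <= $tolerance || $a >= 180.0 - $tolerance) {
            $kept['horizontal'][] = $line;
        } elseif (abs($a - 90.0) <= $tolerance) {
            $kept['vertical'][] = $line;
        }
    }
    return $kept;
}

// Heuristic: treat the outermost horizontal and vertical lines as the page
// edges, using each line's midpoint. Returns null if either set is empty.
function outerBounds(array $kept): ?array
{
    if (!$kept['horizontal'] || !$kept['vertical']) {
        return null;
    }
    $ys = array_map(function ($l) { return ($l['y1'] + $l['y2']) / 2; }, $kept['horizontal']);
    $xs = array_map(function ($l) { return ($l['x1'] + $l['x2']) / 2; }, $kept['vertical']);
    return ['left' => min($xs), 'top' => min($ys), 'right' => max($xs), 'bottom' => max($ys)];
}

// Made-up sample in the assumed layout: four near-axis lines and one diagonal.
$sample = <<<MVG
# Hough line transform: 9x9+200
viewbox 0 0 320 240
line 0,20 320,22  # 210
line 0,220 320,218  # 198
line 25,0 27,240  # 180
line 300,0 298,240  # 175
line 0,0 240,240  # 90
MVG;

$bounds = outerBounds(keepAxisAligned(parseHoughLines($sample)));
// $bounds: left=26, top=21, right=299, bottom=219 (the diagonal is ignored)
```

Because the lines are detected on the downscaled copy, the resulting bounds would still need to be multiplied by the ratio of original size to scaled size before being applied to the full-resolution image.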