
How to measure white area ratio

Posted: 2007-02-01T18:23:00-07:00
by Dilshod
Dear developers,

I'd like to pose a challenging question to you. How can I use ImageMagick to identify a blank fax page, i.e. one that is almost entirely white? This is a common problem that arises when someone sends a page the wrong way up, so that only its blank side gets scanned.

Here is the identify -verbose output for a blank page:

Code:

Image: some.tif
  Format: TIFF (Tagged Image File Format)
  Geometry: 1728x1064
  Class: DirectClass
  Type: Bilevel
  Endianess: MSB
  Colorspace: Gray
  Channel depth:
    Gray: 1-bits
  Channel statistics:
    Red:
      Min: 0 (0)
      Max: 1(1)
      Mean: 0.998839  (0.998839)
      Standard deviation: 0.0340568 (0.0340568)
  Colors: 2
  Histogram:
      2135: (  0,  0,  0)       black
   1836457: (255,255,255)       white
  Rendering-intent: Undefined
  Resolution: 204x98
  Units: PixelsPerInch
  Filesize: 10kb
  Interlace: None
  Background Color: white
  Border Color: #DFDFDF
  Matte Color: grey74
  Dispose: Undefined
  Iterations: 0
  Scene: 0 of 2
  Compression: Fax
  Orientation: TopLeft
  Signature: 2c03ed86e50eb5f4aad60447d070fd2b273daf427df3040d17604b4dac1518ee
  Tainted: False
  Version: ImageMagick 6.2.4 10/28/05 Q16 http://www.imagemagick.org
Image: some.tif
  Format: TIFF (Tagged Image File Format)
  Geometry: 1728x1064
  Class: DirectClass
  Type: Bilevel
  Endianess: MSB
  Colorspace: Gray
  Channel depth:
    Gray: 1-bits
  Channel statistics:
    Red:
      Min: 0 (0)
      Max: 1(1)
      Mean: 0.998745  (0.998745)
      Standard deviation: 0.0354081 (0.0354081)
  Colors: 2
  Histogram:
      2308: (  0,  0,  0)       black
   1836284: (255,255,255)       white
  Rendering-intent: Undefined
  Resolution: 204x98
  Units: PixelsPerInch
  Filesize: 10kb
  Interlace: None
  Background Color: white
  Border Color: #DFDFDF
  Matte Color: grey74
  Dispose: Undefined
  Iterations: 0
  Scene: 1 of 2
  Compression: Fax
  Orientation: TopLeft
  Signature: 3950b17e68c37f07b6db24458ecd905cb0252fc06a08bb93ea967d903d4128ef
  Tainted: False
  User Time: 1.340u
  Elapsed Time: 0:02
  Pixels per second: 1.3mb
  Version: ImageMagick 6.2.4 10/28/05 Q16 http://www.imagemagick.org

And here is the identify -verbose output for a normal fax page:

Code:

Image: some_1.tif
  Format: TIFF (Tagged Image File Format)
  Geometry: 1728x1051
  Class: DirectClass
  Type: Bilevel
  Endianess: MSB
  Colorspace: Gray
  Channel depth:
    Gray: 1-bits
  Channel statistics:
    Red:
      Min: 0 (0)
      Max: 1(1)
      Mean: 0.913866  (0.913866)
      Standard deviation: 0.280562 (0.280562)
  Colors: 2
  Histogram:
    156431: (  0,  0,  0)       black
   1659697: (255,255,255)       white
  Rendering-intent: Undefined
  Resolution: 204x98
  Units: PixelsPerInch
  Filesize: 98kb
  Interlace: None
  Background Color: white
  Border Color: #DFDFDF
  Matte Color: grey74
  Dispose: Undefined
  Iterations: 0
  Scene: 0 of 2
  Compression: Fax
  Orientation: TopLeft
  Signature: 0bfae87cd62b1ac9e6392c6ab2d200f3780ee0e3ef180a0764af9c22afd2d4b5
  Tainted: False
  Version: ImageMagick 6.2.4 10/28/05 Q16 http://www.imagemagick.org
Image: some_1.tif
  Format: TIFF (Tagged Image File Format)
  Geometry: 1728x1052
  Class: DirectClass
  Type: Bilevel
  Endianess: MSB
  Colorspace: Gray
  Channel depth:
    Gray: 1-bits
  Channel statistics:
    Red:
      Min: 0 (0)
      Max: 1(1)
      Mean: 0.939581  (0.939581)
      Standard deviation: 0.238261 (0.238261)
  Colors: 2
  Histogram:
    109833: (  0,  0,  0)       black
   1708023: (255,255,255)       white
  Rendering-intent: Undefined
  Resolution: 204x98
  Units: PixelsPerInch
  Filesize: 98kb
  Interlace: None
  Background Color: white
  Border Color: #DFDFDF
  Matte Color: grey74
  Dispose: Undefined
  Iterations: 0
  Scene: 1 of 2
  Compression: Fax
  Orientation: TopLeft
  Signature: 4e8d4e22656cde6c1a8333474c3cd8acd03d71f25c6c8eea74226e139bdd4077
  Tainted: False
  User Time: 1.300u
  Elapsed Time: 0:02
  Pixels per second: 1.3mb
  Version: ImageMagick 6.2.4 10/28/05 Q16 http://www.imagemagick.org
Thank you,

dt

Posted: 2007-02-01T23:11:55-07:00
by anthony
You can attempt to '-trim' the page with a '-fuzz' setting of, say, 10%.

If the image size did not change (before IM v6.3.2), or you get the special 'Null' image (current release), then the page was blank. It could be any color, but it is basically uniform in that color.
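A minimal pure-Python sketch of what a fuzzy trim decides, assuming the page is already loaded as rows of 8-bit gray values (0 = black, 255 = white). It is a simplification: it compares every pixel against the top-left corner color only, whereas ImageMagick's -trim looks at the image corners, and the name fuzzy_trim_box is hypothetical. A result of None corresponds to the "nothing left after trimming" case described above.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of gray values, 0 (black) .. 255 (white)


def fuzzy_trim_box(img: Image, fuzz: float = 0.10) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (left, top, right, bottom) of the pixels that differ from the
    top-left corner color by more than `fuzz` (a fraction of the 0..255 range).
    Returns None when every pixel is within the fuzz, i.e. the page is uniform."""
    background = img[0][0]
    limit = fuzz * 255
    left = top = None
    right = bottom = -1
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if abs(v - background) > limit:
                left = x if left is None else min(left, x)
                top = y if top is None else top  # first row with content
                right = max(right, x)
                bottom = y                       # last row with content so far
    if left is None:
        return None
    return (left, top, right, bottom)
```

A single header line or speck is enough to produce a bounding box, which is exactly the limitation raised in the next post.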

Posted: 2007-02-02T16:41:44-07:00
by Dilshod
Anthony, thanks for the response.

I tried your suggestion; however, it won't work because the page isn't entirely white. Virtually all pages include fax headers, and sometimes there are small black spots.

Do you think the ratio of black to white pixels will do? And how can I get this information using MagickWand?

Code:

  Histogram:
      2135: (  0,  0,  0)       black
   1836457: (255,255,255)       white 
thanks,

-dt
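Because every pixel is either black or white, the ratio can be read straight off the histogram. For the first blank page, 2135 / (1728 × 1064) = 2135 / 1838592 ≈ 0.00116, about 0.12% black. The first page of the normal fax gives 156431 / 1816128 ≈ 0.086, about 8.6% black. For a bilevel image, the Mean in the channel statistics is simply the white fraction (1 − 0.00116 ≈ 0.998839), so it carries the same information.

The Python sketch below parses histogram lines in the form shown above, for example the output of `convert some.tif -format %c histogram:info:-` (assuming an IM version that supports that idiom), and returns the black fraction. The function name and the 1% cutoff in the test are hypothetical. From MagickWand, MagickGetImageHistogram() together with PixelGetColorCount() should give the same per-color counts.

```python
import re

# Matches histogram lines such as "   2135: (  0,  0,  0)       black"
HIST_LINE = re.compile(r"^\s*(\d+):\s*\(([^)]*)\)")


def black_ratio(histogram_text: str) -> float:
    """Fraction of pixels whose histogram entry is pure black (all channels 0)."""
    total = 0
    black = 0
    for line in histogram_text.splitlines():
        m = HIST_LINE.match(line)
        if not m:
            continue  # skip headers and any non-histogram lines
        count = int(m.group(1))
        channels = [int(v) for v in m.group(2).split(",") if v.strip()]
        total += count
        if all(v == 0 for v in channels):
            black += count
    if total == 0:
        raise ValueError("no histogram lines found")
    return black / total
```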

Posted: 2007-02-04T07:25:17-07:00
by anthony
Use -shave to strip off the border along with those fax-added bits. Then use a very small blur trim to discard single dots.
http://www.cit.gu.edu.au/~anthony/graph ... #trim_blur
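The shave-then-blur-trim idea can be sketched in pure Python, assuming the page is a list of rows of 8-bit gray values. This is only an approximation. A box blur stands in for ImageMagick's Gaussian -blur, and the "trim" step simply checks that nothing remains darker than white by more than the fuzz. The function names and the default radius and fuzz are hypothetical. Blurring spreads a lone dot into a pale gray smudge that falls inside the fuzz, while real text stays dark.

```python
from typing import List

Image = List[List[int]]  # rows of gray values, 0 (black) .. 255 (white)


def shave(img: Image, dx: int, dy: int) -> Image:
    """Remove dx columns from the left and right, and dy rows from the top and bottom."""
    rows = img[dy:len(img) - dy]
    return [row[dx:len(row) - dx] for row in rows]


def box_blur(img: Image, radius: int) -> Image:
    """Average over a (2r+1)x(2r+1) window; coordinates outside the image are
    clamped to the nearest edge pixel (similar in spirit to 'edge' virtual pixels)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for oy in range(-radius, radius + 1):
                yy = min(max(y + oy, 0), h - 1)
                for ox in range(-radius, radius + 1):
                    xx = min(max(x + ox, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / (2 * radius + 1) ** 2)
        out.append(row)
    return out


def blank_after_blur_trim(img: Image, dx: int, dy: int,
                          radius: int = 1, fuzz: float = 0.15) -> bool:
    """True if, after shaving and blurring, no pixel is darker than white by more than fuzz."""
    blurred = box_blur(shave(img, dx, dy), radius)
    limit = fuzz * 255
    return all(255 - v <= limit for row in blurred for v in row)
```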

Posted: 2007-02-04T15:28:24-07:00
by Dilshod
Thanks, Anthony. It works. I used -median instead of -blur; it seems better this way. However, this method is a bit costly CPU-wise: it takes 4 seconds to process 2 pages. I may play some more with -blur to get it right.

This is what I used:

Code:

convert some.tif -shave 5%x5% -median 2 some_2.tif
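The -median variant can be sketched the same way in pure Python, with a hypothetical function name and edge clamping for pixels outside the image. The window is (2r+1)×(2r+1) pixels. Assuming IM's `-median 2` uses a radius-2 (5×5) neighbourhood, that is 25 pixel reads for every output pixel, or roughly 46 million for one 1728x1064 page, which gives a feel for why it is costly. A median removes an isolated speck outright, because its white neighbours outvote it. The interior of a larger dark area survives.

```python
from statistics import median
from typing import List

Image = List[List[int]]  # rows of gray values, 0 (black) .. 255 (white)


def median_filter(img: Image, radius: int) -> Image:
    """Replace each pixel by the median of its (2r+1)x(2r+1) neighbourhood,
    clamping out-of-image coordinates to the nearest edge pixel."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + oy, 0), h - 1)][min(max(x + ox, 0), w - 1)]
                for oy in range(-radius, radius + 1)
                for ox in range(-radius, radius + 1)
            ]
            row.append(median(window))  # odd window size, so this is a real pixel value
        out.append(row)
    return out
```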

Posted: 2007-02-04T16:48:13-07:00
by anthony
Both -median and -blur are neighbourhood operators (blur is a convolution, median a rank filter) that overlay a 'neighbourhood' array over each and every pixel in the image (watch out for virtual pixels). As such, both are very costly in terms of processing power.

As for the ratio you asked about, have a look at the IM Examples section on 'comparing images' and at the verbose identification of images. The faster way may be to do a -shave, then an identify, and if the number of black pixels is below some threshold, treat the page as blank.
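A sketch of that cheaper test: shave, then count black pixels and compare the count against a threshold. It assumes a bilevel page held as rows of 0/1 values, matching the Min 0 / Max 1 statistics above. It also assumes that `-shave 5%x5%` removes 5% of the width from each side and 5% of the height from top and bottom. The function names and the max_black threshold are hypothetical and would need tuning against real faxes, such as the black-pixel counts in the histograms earlier in the thread.

```python
from typing import List

Image = List[List[int]]  # rows of 0 (black) / 1 (white) values, as in a bilevel fax


def shave_percent(img: Image, px: float, py: float) -> Image:
    """Drop px% of the width from the left and right, and py% of the height from top and bottom."""
    h, w = len(img), len(img[0])
    dx, dy = int(w * px / 100), int(h * py / 100)
    return [row[dx:w - dx] for row in img[dy:h - dy]]


def looks_blank(img: Image, px: float = 5, py: float = 5, max_black: int = 0) -> bool:
    """Blank if the shaved page contains at most max_black black pixels."""
    shaved = shave_percent(img, px, py)
    black = sum(row.count(0) for row in shaved)
    return black <= max_black
```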

It is all very relative, and you need to decide which solution is best for you.

WARNING: check whether you can differentiate between a blank fax that contains only constant vertical lines and a more normal one with horizontal lines of text.
Such streaked blank pages are fairly common; they are produced by fax machines with a bad scanner.
(Do you have a practical example?)

My idea for telling them apart is to divide the image into separate rows, average all the row images together, and then apply a fuzzy trim (not a blurred trim) to see whether there is any sharp vertical content in the image. That constant pattern can then be removed from every row of the original image before applying whichever 'blank fax' test you finally go with.
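A pure-Python sketch of the row-averaging idea, assuming a bilevel page stored as rows of 0/1 values. As a swapped-in simplification, it does not fuzzy-trim the averaged row. Instead, it marks any column that is black in at least a given fraction of rows as a streak. The names and the 0.9 fraction are hypothetical. A scanner line stays dark in every row, while text spread over many rows averages out to a much lighter value. The cleaned page can then go through whatever blank test you settle on.

```python
from typing import List, Set

Image = List[List[int]]  # rows of 0 (black) / 1 (white) values


def streak_columns(img: Image, min_fraction: float = 0.9) -> Set[int]:
    """Columns that are black in at least min_fraction of all rows, i.e. columns
    that stay dark when every row is averaged together."""
    h, w = len(img), len(img[0])
    black_per_column = [sum(1 for y in range(h) if img[y][x] == 0) for x in range(w)]
    return {x for x in range(w) if black_per_column[x] >= min_fraction * h}


def remove_streaks(img: Image, min_fraction: float = 0.9) -> Image:
    """Return a copy of the page with the detected streak columns painted white (1)."""
    cols = streak_columns(img, min_fraction)
    return [[1 if x in cols else v for x, v in enumerate(row)] for row in img]
```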

Posted: 2007-02-05T17:56:24-07:00
by ridera
Here is a snippet of my PHP code that comes close to what I think you are looking for. I don't recall checking how long it takes to execute.

Code:

	$convert= CONVERT;	//path to IM's convert binary, defined elsewhere

//	exec("$identify -verbose $fp_filename", $img_info);	//limited to 1024 colors for histogram

	//print one "count: (r,g,b) #hex name" line per color
	exec("$convert " . escapeshellarg($fp_filename) . " -format %c histogram:info:-", $img_info);

	if(isset($_GET['IMstats'])) echo "stats: <pre>" . print_r($img_info, TRUE) . '</pre>';

//***** Get the background color (the most frequent one) ***/
	$pattern= '^\s*(\d+):';		//pixel count at the start of a histogram line

	$max= 0;
	$density= array();
	$max_str= '';

	foreach($img_info as $value){

		if(!preg_match("%$pattern%", $value, $density)) continue;

		if((int)$density[1] < $max) continue;

		$max= (int)$density[1];

		$max_str= $value;
	}//end foreach

	$pattern= '(?<=:)\s*(\(([\d,\040]+)\)\s*\#*[a-z0-9]+)';	//e.g. "247286: (237,233,225,   0) #EDE9E1"

	$white_fail= TRUE;	//fail unless a background color is found and passes

	if(preg_match("%$pattern%i", $max_str, $bkgr_color)){	//$bkgr_color[1] is complete; [2] is just numbers

		//match channel values of any length; only the first three (r,g,b)
		//are checked, so a trailing alpha value is ignored
		preg_match_all("%(\d+)%", $bkgr_color[2], $wcolors);

		$white_fail= FALSE;

		foreach(array_slice($wcolors[1], 0, 3) as $value){

			if((int)$value > $resize_args['white_min']) continue;	//min white value, set elsewhere

			$white_fail= TRUE;
		}//end foreach
	}
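For readers who would rather not follow the PHP: the snippet appears to pick the most frequent histogram entry as the page background. It then flags the page if any of the first color channels is at or below a minimum "white" value. The Python sketch below restates that reading. The function names and the white_min values in the test are hypothetical, and only the first three channels are checked, so a trailing alpha value is ignored. Note that this tests whether the background is white, not how much of the page is black. A nearly blank fax and a normal one would both pass, so it would need to be paired with a black-pixel count or ratio to detect blank pages.

```python
import re
from typing import Optional, Tuple

# "count: (r,g,b[,a]) ..." as printed in a histogram listing
HIST_LINE = re.compile(r"^\s*(\d+):\s*\(([\d,\s]+)\)")


def background_color(histogram_text: str) -> Optional[Tuple[int, ...]]:
    """Color tuple of the most frequent histogram entry (later entries win ties),
    or None if no line parses."""
    best_count, best = -1, None
    for line in histogram_text.splitlines():
        m = HIST_LINE.match(line)
        if not m:
            continue
        count = int(m.group(1))
        if count >= best_count:
            best_count = count
            best = tuple(int(v) for v in m.group(2).split(",") if v.strip())
    return best


def background_is_white(histogram_text: str, white_min: int) -> bool:
    """True if the first three channels of the background color all exceed white_min."""
    color = background_color(histogram_text)
    return color is not None and all(v > white_min for v in color[:3])
```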

Posted: 2007-02-05T20:36:34-07:00
by anthony
Sorry, I have used PHP before, but I am not familiar enough with it to follow the algorithm implemented in that PHP.