connected-components different verbose results with threshold

Posted: 2015-10-26T12:12:19-07:00
by gvandyk
Hi,

I am using the following command:

convert img.gif -define connected-components:verbose=true -connected-components 4 null: > out.txt
On my black & white image, this reports only srgb(255,255,255) and srgb(0,0,0).

But when I add an area threshold, such as

convert img.gif -define connected-components:area-threshold=1000 -define connected-components:verbose=true -connected-components 4 null: > out.txt
it reports srgb values in various shades of gray. The areas are still the same size, but the srgb values are now different.

What has changed? I believed area-threshold only keeps the larger areas and does not otherwise change the output.

How can I add a threshold and still only get srgb(255,255,255) and srgb(0,0,0) lines back?

My version of IM is 6.9.0-10

Re: connected-components different verbose results with threshold

Posted: 2015-10-26T13:57:13-07:00
by fmw42
After an area threshold is applied, the pixels that are thresholded out are merged into their respective background regions, and new gray levels are computed for the merged regions. For example, if a white pixel is merged into a black region, that region then consists of the original black pixels plus the white pixel. The white pixel is changed to black, but the reported color is recomputed as the average of all the original pixels that the region now includes.

I feel this behavior should be changed, or a new option added, so that after merging the new region keeps the gray level it had before the merge rather than the average over the merged pixels. Black areas would then still be black and white areas still white, and only the counts would change. I already have a request in to the developers for this enhancement.
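
To make the averaging concrete, here is a small illustrative calculation. It is plain arithmetic, not ImageMagick's actual code. The function name merged_gray and its two arguments are made up for this example, and it assumes the merged region consists only of pure black (0) and pure white (255) pixels on an 8-bit scale.

```shell
# merged_gray BLACK_COUNT WHITE_COUNT  (hypothetical helper, for illustration only)
# Prints the mean gray level, on a 0-255 scale, of a region made up of
# BLACK_COUNT pixels of value 0 and WHITE_COUNT pixels of value 255.
merged_gray() {
    awk -v b="$1" -v w="$2" 'BEGIN {
        # Guard against an empty region to avoid dividing by zero.
        if (b + w == 0) { print "empty region"; exit 1 }
        printf "%.1f\n", 255 * w / (b + w)
    }'
}
```

For instance, merged_gray 9000 1000 prints 25.5, so a black region that absorbed 1000 white pixels out of 10000 in total would be reported as a dark gray rather than as srgb(0,0,0).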

Re: connected-components different verbose results with threshold

Posted: 2015-10-27T09:52:53-07:00
by gvandyk
Is there a way to limit the results from the connected-components verbose call?

I am getting the "too many objects" error, and would like to get only the top results (the largest areas) without losing the "black & white" values.

I assume I can filter the srgb results by treating values above 128 as white and values below as black. Is this assumption correct?

Re: connected-components different verbose results with threshold

Posted: 2015-10-27T09:57:44-07:00
by fmw42
I have done post-processing in unix to filter and even recolor the results from the textual output. I am not sure why you are getting "too many objects". Are you using Q16 IM? If so, that should allow 65535 different regions. Perhaps you can post your input image and some of the textual results you are getting.

See my unix bash shell script, kmeans, at my link below for color reduction and the filtering that I used to recolor after the regions were changed.
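
As a rough sketch of that kind of post-processing (not the code from the kmeans script), the filter below reads the verbose listing, labels each region black or white by comparing its mean gray level with 128, and keeps only the largest regions. The function name cc_black_white and its argument are invented here. It assumes each region line has the form "id: WxH+X+Y cx,cy area srgb(r,g,b)" with 0-255 channel values, like the srgb(255,255,255) values mentioned above. Other color notations would need extra handling.

```shell
# cc_black_white [N]  (hypothetical helper; reads the verbose listing on stdin)
# Prints "area label bounding-box" for the N largest regions (default 10),
# largest first, where label is "white" if the mean gray level is >= 128.
# Assumption: region lines look like
#   "  2: 120x135+104+18 159.5,106.5 8690 srgb(250,250,250)"
cc_black_white() {
    top=${1:-10}
    awk '
        # Region lines start with a numeric id and a colon; this skips the header.
        $1 ~ /^[0-9]+:$/ && $5 ~ /^srgb\(/ {
            color = $5
            sub(/^srgb\(/, "", color)      # strip the leading "srgb("
            sub(/\)$/, "", color)          # strip the trailing ")"
            n = split(color, ch, ",")
            if (n == 0) next               # nothing to average
            sum = 0
            for (i = 1; i <= n; i++) sum += ch[i]
            label = (sum / n >= 128) ? "white" : "black"
            print $4, label, $2
        }
    ' | sort -k1,1nr | head -n "$top"
}
```

It would be used by piping the listing into it, for example:
convert img.gif -define connected-components:area-threshold=1000 -define connected-components:verbose=true -connected-components 4 null: | cc_black_white 20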

Re: connected-components different verbose results with threshold

Posted: 2015-10-28T09:57:27-07:00
by gvandyk
A link to a file with the "too many objects" run on my Q16 IM:

https://www.dropbox.com/s/yew6v4a6utrvn ... s.gif?dl=0

I am trying to extract pages from images that I have scanned with a book scanner (cameras), and I need to detect the proper page.

To do this, I loop through gray values from 255 down to 150 with a fuzz value of 25%, using:

convert infile.jpg -fuzz 25% -fill white -opaque "gray(grayValue)" -fill black +opaque white tmp.gif
The first white area extracted at the highest gray level that is more than 50% of the width of the scanned image is assumed to be a page. This page gives me the margin of one of the page edges (the page end). I then carry on looping until I find the closest other edge that is greater than "glassmargin". This determines the middle of the book.

Once these 2 margins are found, the page can be extracted.

I need the loop because I have thousands of pages with differing backgrounds and page sizes.

It is within these loops that I get the "too many objects" error for some gray/fuzz values.

Is there a better way to determine the page boundaries? This method works, but it takes a very long time.

Re: connected-components different verbose results with threshold

Posted: 2015-10-28T10:04:44-07:00
by gvandyk
In addition to the above, I am stepping the grayvalue down by 5 each time I go through the loop to find the edges.

One of the original images:

https://www.dropbox.com/s/97957iwifwi5n ... 3.JPG?dl=0

The above gif was generated with a gray value of 250, so the command below created the "too many objects" gif:

convert IMG_0073.JPG -fuzz 25% -fill white -opaque "gray(250)" -fill black +opaque white manyObjs.gif
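
Putting the two posts above together, the search loop might look roughly like the sketch below. It is a guess at the structure, not the poster's actual script. The function names first_wide_white and find_page_gray are invented, the image width is read with identify, and the listing is assumed to have lines of the form "id: WxH+X+Y cx,cy area srgb(r,g,b)". The later steps (finding the second edge and comparing against glassmargin) are left out.

```shell
# first_wide_white WIDTH  (hypothetical helper; reads the verbose listing on stdin)
# Prints the bounding box of the first pure-white region that is wider
# than half of WIDTH, or nothing if there is none.
first_wide_white() {
    awk -v w="$1" '
        $1 ~ /^[0-9]+:$/ && $5 == "srgb(255,255,255)" {
            split($2, box, /[x+]/)     # box[1]=width box[2]=height box[3]=x box[4]=y
            if (box[1] > w / 2) { print $2; exit }
        }'
}

# find_page_gray IMAGE  (hypothetical; needs ImageMagick's convert and identify)
# Steps the gray level from 255 down to 150 in steps of 5 and prints
# "gray bounding-box" for the first level that yields a wide white region.
find_page_gray() {
    img=$1
    width=$(identify -format "%w" "$img")
    gray=255
    while [ "$gray" -ge 150 ]; do
        # Same thresholding command as in the post, with the gray level filled in.
        convert "$img" -fuzz 25% -fill white -opaque "gray($gray)" \
            -fill black +opaque white tmp.gif
        box=$(convert tmp.gif -define connected-components:verbose=true \
            -connected-components 4 null: | first_wide_white "$width")
        if [ -n "$box" ]; then
            echo "$gray $box"
            return 0
        fi
        gray=$((gray - 5))
    done
    return 1
}
```

Calling find_page_gray IMG_0073.JPG would then report the first gray level and bounding box that satisfy the 50% width rule, if any level does.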

Re: connected-components different verbose results with threshold

Posted: 2015-10-28T10:06:55-07:00
by fmw42
CCL is likely reporting too many objects because of all the noise in the image. Does it abort, or does it still give you the textual data up to the point where it reaches too many objects?

You could try using -morphology with open, close, or smooth to get rid of the noise first. See http://www.imagemagick.org/Usage/morphology/#basic

Re: connected-components different verbose results with threshold

Posted: 2015-10-28T10:09:32-07:00
by fmw42
Also try adding -depth 16 to your CCL command right after reading the input image.

Re: connected-components different verbose results with threshold

Posted: 2015-10-28T14:48:09-07:00
by fmw42
Using -morphology close octagon:1 seems to avoid the "too many objects" message:

convert manyObjs.gif -morphology close octagon:1 -define connected-components:verbose=true -connected-components 4 null:
But I would like to see your original image before any processing.

Re: connected-components different verbose results with threshold

Posted: 2015-10-29T07:19:22-07:00
by gvandyk
The original image that the gif was generated from can be found here:

https://www.dropbox.com/s/97957iwifwi5n ... 3.JPG?dl=0

Re: connected-components different verbose results with threshold

Posted: 2015-10-29T11:18:03-07:00
by fmw42
Here is a potentially different approach using my unix bash shell script, textcleaner. After running textcleaner, I assume that all pages extracted from the same book have the same scale and the same region size for the text. So I measured an area around the text that includes some white border, but not too much. I then resized the image to 1/16 size and did a subimage compare search to find the region that best matches a mid-gray rectangle of the estimated text area at the reduced size. Once I had the offsets, I scaled them back up by 16 and did a crop. (A more exact result could be achieved by repeating the subimage search at full resolution on a region that is somewhat bigger, but not the full image size.)

textcleaner -f 25 -o 10 -g IMG_0073.JPG tmp.png
http://www.fmwconcepts.com/misc_tests/p ... ct/tmp.png

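# Estimated size of the text region, including a small white border
width=1850
height=2750
# Downscale factor used to speed up the subimage search
factor=16
pct=$(convert xc: -format "%[fx:100/$factor]" info:)
ww=$(convert xc: -format "%[fx:round($width/$factor)]" info:)
hh=$(convert xc: -format "%[fx:round($height/$factor)]" info:)
convert tmp.png -resize "$pct%" tmp2.png
http://www.fmwconcepts.com/misc_tests/p ... t/tmp2.png

# Search the reduced image for the best match to a mid-gray rectangle of the reduced text-region size
vals=$(compare -metric rmse -subimage-search -dissimilarity-threshold 1 tmp2.png \( -size "${ww}x${hh}" xc:gray \) null: 2>&1)
# The match offset follows the "@" in the compare output, as x,y
coords=$(echo "$vals" | sed -n 's/^.*[@] \(.*,.*\)/\1/p')
xx=$(echo "$coords" | cut -d, -f1)
yy=$(echo "$coords" | cut -d, -f2)
# Scale the offsets back up to full resolution and crop
xoff=$((xx*factor))
yoff=$((yy*factor))
convert tmp.png -crop "${width}x${height}+${xoff}+${yoff}" +repage tmp3.png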
http://www.fmwconcepts.com/misc_tests/p ... t/tmp3.png

Sorry. If you are on Windows, I do not have a corresponding script, but you can process the images similarly using -lat 25x25+10%. The rest of my code would need to be converted to Windows equivalents. I am not a Windows user, so I cannot help with that.