
Finding clipped (over exposed) images

Posted: 2011-11-10T08:36:14-07:00
by jonrescca
Hi list members,

This is my first question on the forum, so spare me ;-)

I want to write a bash script that uses ImageMagick to find overexposed images in a pile of about 10,000. Can anyone point me in the right direction? I understand that we have luminosity and RGB histograms; which would be best to use? If possible, I would also love to extend the script to detect color moiré.

I've started by getting the 20 most-used colors from an image (mainly black/white) to detect color moiré:

Code: Select all

convert "$inputfile" -colorspace rgb -colors 20 -unique-colors txt:
and then grep through that output for certain RGB values, but that could be an utterly stupid way to do it, of course ;-)

Thanks for any advice,
Jon

Re: Finding clipped (over exposed) images

Posted: 2011-11-10T10:36:15-07:00
by fmw42
With regard to overexposed images, perhaps just look at the mean and standard deviation of the grayscale version of the image (or the overall statistics from all channels combined). You can see those in the verbose information

identify -verbose yourimage

or you can use string format

convert yourimage -format "%[mean]" info:
convert yourimage -format "%[standard_deviation]" info:

see string formats
http://www.imagemagick.org/script/escape.php

I would perhaps expect that if your image is overexposed, it will have a large mean and a small standard deviation. (The histogram will be pushed to the white end and have a narrow peak).

I don't know if the above will work for your tests or not, but you can look into it if you want. It really depends upon how overexposed your images are.

Such an image will also have a low contrast. So you could also try looking at the difference between the min and max values from the verbose information or the string formats.
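
A minimal sketch of the mean/standard-deviation idea above, in case it helps. The function names and both cut-off values are hypothetical starting points, not tested recommendations, and the sketch assumes that `%[fx:mean]` and `%[fx:standard_deviation]` report values normalized to the range 0 to 1.

```shell
#!/bin/sh
# Heuristic from the post above: a high mean together with a low standard
# deviation suggests overexposure. Both cut-offs are hypothetical.
MEAN_MIN=0.85     # assumed: a normalized mean above this counts as "large"
STDDEV_MAX=0.10   # assumed: a normalized std. dev. below this counts as "small"

# looks_overexposed MEAN STDDEV   (both normalized to 0..1)
# prints 1 if the pair matches the heuristic, otherwise 0
looks_overexposed() {
  awk -v m="$1" -v s="$2" -v mmin="$MEAN_MIN" -v smax="$STDDEV_MAX" \
    'BEGIN { print ((m > mmin && s < smax) ? 1 : 0) }'
}

# image_stats FILE
# prints "mean stddev" of the grayscale version of FILE, normalized to 0..1
image_stats() {
  convert "$1" -colorspace Gray \
    -format "%[fx:mean] %[fx:standard_deviation]" info:
}
```

To try it on one file, feed the two numbers from image_stats into looks_overexposed, for example with set -- $(image_stats yourimage) followed by looks_overexposed "$1" "$2". The cut-offs would need tuning against images known to be good and bad.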

Re: Finding clipped (over exposed) images

Posted: 2011-11-10T16:58:24-07:00
by anthony
Overexposed images are ones which have a very high percentage of maximum, or near-maximum, values.

I would threshold the images and get the verbose information of the result.
convert image.jpg -channel RGB -threshold 99% -separate -append -verbose info:
The separate/append ensures you look at the actual values of the image, and not colors.

Images which are overexposed should have a much larger percentage of near-maximum colors (grayscale mean) in the above results. Check the above against a photo of, say, a book page that was not overexposed and has no light reflections on it. This should give you a fairly good cut-off, as such a photo should not contain too many maximum values.

And please let us know what you learn, and particularly what 'mean' you eventually use to find your over exposed photos.
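
As a rough sketch of how to read that mean: after thresholding, every channel sample is either zero or full scale, so the mean of the separated and appended channels is the fraction of samples above the 99% mark. The helper names below are hypothetical, and the default quantum range of 65535 assumes a Q16 build of ImageMagick in which `%[mean]` is reported in raw quantum units.

```shell
#!/bin/sh
# clipped_percent FILE
# prints the percentage of channel samples above 99% of full scale.
# %[fx:mean] is assumed to be normalized to 0..1, hence the factor of 100.
clipped_percent() {
  convert "$1" -channel RGB -threshold 99% -separate -append \
    -format "%[fx:mean*100]" info:
}

# raw_mean_to_percent MEAN [QUANTUM_RANGE]
# converts a raw %[mean] value into the same percentage.
# The default range of 65535 is an assumption (Q16 build).
raw_mean_to_percent() {
  awk -v m="$1" -v q="${2:-65535}" 'BEGIN { printf "%.3f\n", 100 * m / q }'
}
```

Under the Q16 assumption, raw_mean_to_percent 80 prints 0.122, meaning that roughly one channel sample in 800 is clipped.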

Re: Finding clipped (over exposed) images [Solved]

Posted: 2011-11-14T00:30:12-07:00
by jonrescca
The hint by Anthony gave me excellent results. After a little tweaking of the threshold, I could even pick out the images with just a few clipped pixels.

The basic script I wrote around this is the following:

Code: Select all

#!/bin/sh

# usage: find . -type f -name "*.tif" -a ! -name ".*" | while IFS= read -r i ; do clipping_detector.sh "$i" clippers.csv 80 ; done

ImageFile=$1
ResultsFile=$2
ClippingTreshhold=$3   # cut-off for the mean, passed as the third argument

# Mean of the thresholded channels: it rises with the number of clipped values
LuminosityMean=$(convert "$ImageFile" -channel RGB -threshold 99% -separate -append -format "%[mean]" info:)
ImageFileName=$(basename "$ImageFile")
ClippingDetected=$(perl -e "if ( $LuminosityMean > $ClippingTreshhold ) { print 1; } else { print 0; }")

if [ "$ClippingDetected" -eq 1 ]; then
	echo "$ImageFileName $LuminosityMean"
	echo "$ImageFile $ImageFileName $LuminosityMean" >> "$ResultsFile"
else
	echo "$ImageFileName -"
	echo "$ImageFile $ImageFileName -" >> "$ResultsFile"
fi
The resulting csv can be used to open the clipped images and edit their clipped channels.
I found the threshold to be around 80-120, sometimes more, depending on the subject. I must note that we use batches of around 500 to 10,000 images of the same type, like scans of newspapers.
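
The floating-point comparison is the one step plain sh cannot do by itself, which is why the script calls out to perl. A possible alternative, sketched here with a hypothetical helper name, hands the two numbers to awk as variables so they are treated as data instead of being pasted into program text:

```shell
#!/bin/sh
# is_above VALUE LIMIT
# prints 1 if VALUE > LIMIT (floating-point comparison), otherwise 0.
# Adding 0 forces awk to compare the two as numbers, not as strings.
is_above() {
  awk -v v="$1" -v limit="$2" 'BEGIN { print ((v + 0 > limit + 0) ? 1 : 0) }'
}
```

In the script this would be called as is_above "$LuminosityMean" "$ClippingTreshhold", and its output tested with -eq 1 as before.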

Re: Finding clipped (over exposed) images [Solved]

Posted: 2011-11-14T19:03:24-07:00
by anthony
jonrescca wrote: The hint by Anthony gave me excellent results. After a little tweaking of the threshold, I could even pick out the images with just a few clipped pixels.
A small expansion of the script to make it work with less extra shell wrapping.

Code: Select all

#!/bin/sh
#
#  usage:  clipping_detector.sh threshold *.tif > clippers.csv
#
ClippingTreshhold="$1"
shift;

detect_clipping() {
  ImageFile="$1"

  LuminosityMean=$( convert "$ImageFile" -channel RGB -threshold 99% -separate -append -format "%[mean]" info:)
  ImageFileName=$( basename "$ImageFile" )
  ClippingDetected=$( perl -e "if ( $LuminosityMean > $ClippingTreshhold ) { print 1; } else { print 0; }" )

  if [ "$ClippingDetected" -eq 1 ]; then
	echo >&2 "$ImageFileName $LuminosityMean"
	echo "$ImageFile,$ImageFileName,$LuminosityMean"
  else
	echo >&2 "$ImageFileName -"
	echo "$ImageFile,$ImageFileName,-"
  fi
}

for i in "$@"; do
   detect_clipping "$i"
done
The output is split into standard output (for the CSV, comma-separated values, file) and standard error (for the progress report).

If you want to do recursion, use...
find . -name '*.tif' -print0 | xargs -0 clipping_detector.sh 80 > clippers.csv

Most importantly, I quote filename variables and use -print0 with xargs -0. This allows the script to correctly handle files with spaces and other unusual characters.
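
Once the CSV exists, the clipped images can be pulled back out of it with a small filter. This is only a sketch: the function name is hypothetical, and it assumes that no file name contains a comma, since the script writes plain comma-separated fields without quoting.

```shell
#!/bin/sh
# list_clipped < clippers.csv
# prints the full path (column 1) of every row whose third column holds a
# mean value instead of the "-" placeholder written for unclipped images.
# Assumes file names contain no commas.
list_clipped() {
  awk -F, '$3 != "-" { print $1 }'
}
```

For example, list_clipped < clippers.csv > to_fix.txt would collect the paths that need attention.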



What I really would like to do is replace the shell function in the above with a single "convert" call that outputs the specified results to stdout and stderr. Currently "convert" can output to either or both, for example:

Code: Select all

    convert "$ImageFile" -channel RGB -threshold 99% -separate -append \
                -format "%f %[mean]" -write info:fd:2  -format "%d/%f,%f,%[mean]" info:
will write the results to stderr (for progress) and stdout (for result recording).

See Basics: Alternative identify output handling
http://www.imagemagick.org/Usage/basics/#identify_alt
And File Handling, File Descriptor output
http://www.imagemagick.org/Usage/files/#fd

The problem is that at this time IM cannot select different strings based on the threshold. It can output a different number (using %[fx:...]) or color string (using %[pixel:]), but it cannot output different strings based on either a number or a string comparison. That is, it currently cannot handle the 'IF' condition that follows.
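
A rough sketch of how far the numeric route goes: the comparison itself can be pushed into the %[fx:...] escape so that IM prints 1 or 0, but turning that number into a string still needs the shell. The function names are hypothetical, LIMIT is assumed to be a normalized value between 0 and 1, and the use of the FX ternary operator inside -format is an assumption to verify against your IM version.

```shell
#!/bin/sh
# clip_flag FILE LIMIT
# prints 1 if the thresholded mean (normalized 0..1) exceeds LIMIT, else 0.
# The comparison is evaluated by ImageMagick's %[fx:...] escape.
clip_flag() {
  convert "$1" -channel RGB -threshold 99% -separate -append \
    -format "%[fx:mean>$2?1:0]" info:
}

# flag_to_field FLAG VALUE
# maps the numeric flag to the string the CSV expects: VALUE when the image
# is clipped, "-" otherwise. This is the step left to the shell.
flag_to_field() {
  case "$1" in
    1) printf '%s\n' "$2" ;;
    *) printf '%s\n' "-" ;;
  esac
}
```

This still costs a second shell step per image, which is exactly the limitation described above.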

Sorry for my rant. This is something that is of major concern to me in the IMv7 CLI redevelopment: I feel a need for a better form of string macro processing than simple percent escapes.