Hi there,
I searched for copy/duplicate/detect duplicate but didn't find anything here. If I overlooked something (that is, if this has already been answered somewhere on the board), please let me know. I also read https://www.imagemagick.org/Usage/compare/. I am looking for a way to automate this.
Long story short: about a year ago I lost my photos as well as my backups, and ended up with a folder containing most of them along with modified (denoised, gamma-corrected, sharpened, scaled) duplicates of the originals. Now I need to get rid of the duplicates. For now I really just want to detect duplicates; choosing which of the duplicates to keep isn't that important yet.
So I tried the following:
1. Simple IM fingerprint (storing every photo's fingerprint in an array and, while iterating over all my photos, checking whether something matches). That seems to work quite well.
2. Downscale to 64x64 (I also tested 32x32), convert to grayscale, create 3 additional versions rotated in 90-degree steps, and take the fingerprints of those to check for duplicates.
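The rotation part of approach 2 can be sketched without Imagick on a plain 2D array of gray values. This is only a minimal sketch, assuming the photo has already been downscaled and converted to grayscale. The helper names `rotate90`, `gridHash` and `rotationInvariantKey` are hypothetical, and a SHA-256 over the raw values stands in for the IM fingerprint.

```php
<?php
/** Rotate a rectangular grid of values by 90 degrees clockwise. */
function rotate90(array $grid): array
{
    $rows = count($grid);
    $cols = count($grid[0]);
    $out = [];
    for ($c = 0; $c < $cols; $c++) {
        $row = [];
        // Read column $c from bottom to top to build the new row.
        for ($r = $rows - 1; $r >= 0; $r--) {
            $row[] = $grid[$r][$c];
        }
        $out[] = $row;
    }
    return $out;
}

/** Exact-match hash of one orientation (stand-in for the IM fingerprint). */
function gridHash(array $grid): string
{
    $lines = [];
    foreach ($grid as $row) {
        $lines[] = implode(',', $row);
    }
    // Include the dimensions so a 2x3 and a 3x2 grid never collide by layout.
    return hash('sha256', count($grid) . 'x' . count($grid[0]) . ':' . implode(';', $lines));
}

/** Smallest hash over the four 90-degree orientations of the grid. */
function rotationInvariantKey(array $grid): string
{
    $hashes = [];
    for ($i = 0; $i < 4; $i++) {
        $hashes[] = gridHash($grid);
        $grid = rotate90($grid);
    }
    sort($hashes, SORT_STRING); // plain string order, no numeric-string surprises
    return $hashes[0];
}

$photo   = [[10, 20, 30], [40, 50, 60]];
$rotated = rotate90($photo);
var_dump(rotationInvariantKey($photo) === rotationInvariantKey($rotated)); // bool(true)
```

Storing one canonical key per photo means a lookup in the fingerprint array needs a single comparison instead of four. It still only matches when the downscaled pixels are identical.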
I might need a helping hand or an idea about 2. To downscale:
- First I used sample. That is pretty fast, but no copies are detected.
- Then I used scale. That is a little slower, but still no copies are detected.
- Then I used resize with POINT and BOX. A little slower again, and still no copies.
- Then I used resize with GAUSSIAN and HERMITE. GAUSSIAN is the slowest(!); HERMITE is a bit slower than the variants above. THIS one detects some duplicates (so yes, it does work; it's just a little too slow).
Using sample/scale followed by a Gaussian blur is still faster than resize with GAUSSIAN, but it does not detect duplicates. So I'm curious: why do resize with GAUSSIAN and resize with HERMITE work, while sample/scale plus Gaussian blur (or blur) does not?
By the way, the fingerprint I am using is the one PHP's \Imagick::getImageSignature() returns. Is that perhaps the wrong thing to use for what I want to do? I'm not limited to PHP; Bash would be fine as well. How do you do that?
I noticed that auto-level does not change the fingerprint. I am looking for a way to still detect color-distorted or gamma-corrected photos as copies. That is why I do the grayscale conversion. I also thought about creating an edge mask and tried using that; however, creating the mask takes way too long.
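For context on the signature question, here is a small sketch of how an exact digest reacts to a tiny edit. It assumes the signature behaves like a SHA-256 over the raw pixel values, which is my understanding of the IM signature but is stated here as an assumption. `pixelDigest` is a hypothetical helper, not an Imagick call.

```php
<?php
/** Digest of raw 8-bit gray values; a stand-in for an exact-match signature. */
function pixelDigest(array $pixels): string
{
    // pack('C*', ...) turns the values into one byte each before hashing.
    return hash('sha256', pack('C*', ...$pixels));
}

$original = [12, 200, 37, 90, 90, 14, 255, 0];
$tweaked  = $original;
$tweaked[3] = 91; // a single pixel, one gray level brighter

$a = pixelDigest($original);
$b = pixelDigest($tweaked);

// Count the hex positions at which the two digests differ.
$diff = 0;
for ($i = 0; $i < strlen($a); $i++) {
    if ($a[$i] !== $b[$i]) {
        $diff++;
    }
}
echo 'equal: ', var_export($a === $b, true),
     ", differing hex digits: $diff of ", strlen($a), "\n";
```

If that assumption holds, the signature can only find copies whose processed pixels end up bit-identical, so any downscale or blur step has to erase every last difference before the digests agree.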
Thanks in advance,
Jean
How do you detect duplicates? And how does IM Fingerprint work?
Okay, I wrote something that seems to work, based on what I read about aHash. Here's the PHP code:
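// Load the photo and shrink it to a 16x16 thumbnail.
$im = new \Imagick($file);
$im->sampleImage(16, 16);
// Drop color information so that color shifts matter less.
$im->transformImageColorspace(\Imagick::COLORSPACE_GRAY);
// In a grayscale image all channels are equal, so the red mean is the gray mean.
$data = $im->getImageChannelMean(\Imagick::CHANNEL_RED);
$mean = $data['mean'];
// aHash step: pixels brighter than the mean become white, the rest black.
$im->thresholdImage($mean);
// The signature of the thresholded thumbnail serves as the hash.
$hash = $im->getImageSignature();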
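The signature of the thresholded thumbnail only matches when every one of the 16x16 bits agrees. A possible variation, sketched here in plain PHP on an array of gray values, keeps the bits themselves and compares them by Hamming distance so that a few flipped bits are tolerated. `averageHashBits` and `hammingDistance` are hypothetical helpers, and any cut-off for "close enough" is an assumption that would need tuning on real photos.

```php
<?php
/**
 * Average hash of a grid of gray values: one bit per pixel,
 * '1' if the pixel is brighter than the mean, otherwise '0'.
 */
function averageHashBits(array $gray): string
{
    $flat = array_merge(...$gray); // flatten the rows into one list
    $mean = array_sum($flat) / count($flat);
    $bits = '';
    foreach ($flat as $value) {
        $bits .= $value > $mean ? '1' : '0';
    }
    return $bits;
}

/** Number of positions at which two equal-length bit strings differ. */
function hammingDistance(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Hashes must have the same length.');
    }
    $distance = 0;
    for ($i = 0; $i < strlen($a); $i++) {
        if ($a[$i] !== $b[$i]) {
            $distance++;
        }
    }
    return $distance;
}

$photo  = [[10, 200], [220, 30]];
$edited = [[25, 210], [235, 45]]; // same picture, uniformly brightened
echo hammingDistance(averageHashBits($photo), averageHashBits($edited)), "\n"; // 0
```

Because each bit is relative to the image's own mean, a uniform brightness shift leaves the hash unchanged in this toy case. Stronger edits flip some bits, which the distance then reports as a small number rather than a complete mismatch.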
- fmw42
You could use perceptual hash techniques. ImageMagick has a color phash. See https://imagemagick.org/discourse-serve ... =4&t=24906
I have built some other perceptual hash scripts at http://www.fmwconcepts.com/imagemagick/ ... /index.php.