Find identical images in bulk?
-
- Posts: 16
- Joined: 2016-01-11T05:12:11-07:00
- Authentication code: 1151
Find identical images in bulk?
Imagemagick version 6.9.3
Windows Platform
I have a folder with a number of images. I want to check whether each image in the folder has an identical image elsewhere in the same folder.
I would prefer a text output.
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Find identical images in bulk?
I've never used it, but hash should do the trick. I understand two identical images should give the same hash value. See http://www.imagemagick.org/script/escape.php
Create a loop that calls convert for every image, like this:
Code: Select all
convert file.ext -format "%%# %%f\n" info:
(The doubled %% is for use inside a Windows BAT file; typed directly at the command prompt, use a single %.)
Arrange to write them all out to a text file. That file has two fields: the hash, and the filename. Your computer may have tools to find duplicate hash values (eg sort, remove duplicates, and compare files).
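The "find duplicate hash values" step could be sketched in Python roughly as follows. This is only an illustration: it assumes each line of the collected output holds the hash, one space, and then the filename, as the -format string above produces. The function name group_by_hash and the short sample hashes are made up for the example; real signatures are much longer.

```python
from collections import defaultdict


def group_by_hash(lines):
    """Map each hash to the filenames sharing it, keeping only repeated hashes."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # First field is the hash; the rest of the line is the filename,
        # which may itself contain spaces.
        digest, _, filename = line.partition(" ")
        groups[digest].append(filename)
    return {h: names for h, names in groups.items() if len(names) > 1}


# Placeholder listing standing in for the text file written by the loop.
sample = [
    "abc123 a.png",
    "def456 b.png",
    "abc123 copy of a.png",
]
duplicates = group_by_hash(sample)
for digest, names in duplicates.items():
    print(digest, *names, sep="\t")
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.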
snibgo's IM pages: im.snibgo.com
-
- Posts: 16
- Joined: 2016-01-11T05:12:11-07:00
- Authentication code: 1151
Re: Find identical images in bulk?
snibgo wrote:I've never used it, but hash should do the trick. I understand two identical images should give the same hash value. See http://www.imagemagick.org/script/escape.php
I think you are talking about the image metadata being identical. I want to check whether the images are visually similar, and to what degree.
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Find identical images in bulk?
You first said "identical". I believe hash gives that: two identical images give the same hash value.
Now you say you want "the degree of similarity". If you want that for every pair of images, the only way is to compare every pair of images.
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Find identical images in bulk?
You would then have to use the IM compare function pair-by-pair.
See
http://www.imagemagick.org/script/compare.php
http://www.imagemagick.org/Usage/compare/
Or use the phash values stored in the verbose data. See viewtopic.php?f=4&t=24906. I also have unix bash shell scripts to generate a simpler hash and also do the comparison. See my scripts, phashconvert and phashcompare at the links below. These work primarily on color images (sRGB).
See also identify -verbose -moments at http://www.imagemagick.org/script/identify.php
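As a rough sketch of the pair-by-pair approach, the Python below builds one compare call per unordered pair of files, which is n(n-1)/2 calls for n images. Several things here are assumptions rather than anything stated in this thread: the function names are hypothetical, the RMSE metric is just one possible choice, and the parser expects the metric on stderr in the shape "value (normalized)". Images with different dimensions would probably need resizing to a common size first.

```python
import itertools
import re
import subprocess


def compare_commands(filenames, metric="RMSE"):
    """Return one `compare` argument list per unordered pair of files."""
    return [
        ["compare", "-metric", metric, a, b, "null:"]
        for a, b in itertools.combinations(filenames, 2)
    ]


def parse_normalized(stderr_text):
    """Pull the bracketed value out of text shaped like '1234.5 (0.0188)'."""
    match = re.search(r"\((\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\)", stderr_text)
    return float(match.group(1)) if match else None


def run_all(filenames, metric="RMSE"):
    """Run every comparison and return (file_a, file_b, score) tuples."""
    results = []
    for cmd in compare_commands(filenames, metric):
        # compare is expected to exit non-zero for dissimilar images,
        # so the return code is deliberately not treated as a failure.
        done = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd[3], cmd[4], parse_normalized(done.stderr)))
    return results
```

Sorting the tuples returned by run_all by score would put the closest pairs first (lower is closer for RMSE), and the result is easy to write out as text.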
Re: Find identical images in bulk?
ccleaner has a dup finder feature which works great on Windows and does produce a text report:
https://www.piriform.com/ccleaner/download
Tools -> dup_finder. And, it's free
Creating an md5 on everything will work but is gross overkill. Look for dup sizes first and only hash those.
Any file with a unique size can not be a dup.
In Perl: something like...
Code: Select all
use strict;
use warnings;
my $mydir = @ARGV ? $ARGV[0] : '.';              # Directory to scan (default: current)
my @file = `find "$mydir" -type f`; chomp @file; # Needs a Unix-style find on the PATH
@file = grep(/\.(jpg|tiff|png)$/i, @file);       # Filter IN your img types
my %s2fa = ();                                   # Size to file array hash (hash of arrays)
foreach my $file (@file) {
    my $size = -s $file;                         # Get file size quickly
    push @{$s2fa{$size}}, $file;                 # Populate size -> @file hash
}
foreach my $size (keys %s2fa) {
    my @same = @{$s2fa{$size}};                  # Get array of all files with this size
    next unless scalar @same > 1;                # Unique size, not a dup
    # Do an MD5 on all files in @same to find dups...
    # Create another hash of arrays (exactly like size -> @file)
    # except with md5 rather than size as key.
}
And, there is a great deal of speed difference in md5 programs
Code: Select all
use Digest::MD5::File qw(dir_md5_hex file_md5_hex);
# file_md5_hex($file) may be faster than shelling out to the
# OS to do the crunching...
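The step left as comments in the Perl above (hash only the files that share a size, then group by MD5) might look something like this in Python. It is a sketch under assumptions, not a tested tool: the function names are made up, the extension list simply mirrors the Perl filter, and it finds byte-identical files only.

```python
import hashlib
import os
from collections import defaultdict

IMAGE_EXTENSIONS = (".jpg", ".tiff", ".png")  # same filter as the Perl sketch


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(top):
    """Return lists of paths under `top` whose contents are byte-identical."""
    by_size = defaultdict(list)
    for folder, _dirs, names in os.walk(top):
        for name in names:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                path = os.path.join(folder, name)
                by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size, cannot be a duplicate file
        by_md5 = defaultdict(list)
        for path in paths:
            by_md5[md5_of_file(path)].append(path)
        duplicates.extend(sorted(g) for g in by_md5.values() if len(g) > 1)
    return duplicates
```

Calling find_duplicate_files on a folder returns one list per group of identical files, which can be printed line by line for a text report.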
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Find identical images in bulk?
I think he wants to find images that are actually the same subject, but slightly changed, such as blurred or shifted a little, etc, not actually identical in every way or even duplicates in different directories.
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Find identical images in bulk?
BrianP007 wrote:Creating an md5 on everything will work but is gross overkill. Look for dup sizes first and only hash those.
Any file with a unique size can not be a dup.
This is true for comparisons of files. It is not true for comparisons of images.
For example, two files may be entirely different but contain identical images. An MD5 hash on files, and file sizes, tells us nothing about whether the images are the same.
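A small Python illustration of the file-versus-image distinction, using two hand-built PPM files invented for this example: one stores a 2x1 image as plain text (P3), the other stores the same pixels as raw bytes (P6). The decoder is a deliberately minimal assumption that handles only comment-free 8-bit PPM data; it is not how ImageMagick reads or hashes images.

```python
import hashlib
import re


def decode_ppm(data):
    """Return (width, height, samples) for a comment-free 8-bit P3 or P6 PPM."""
    if data.startswith(b"P3"):
        # Plain PPM: every value after the magic number is ASCII decimal.
        width, height, _maxval, *samples = (int(t) for t in data.split()[1:])
        return width, height, tuple(samples)
    header = re.match(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s", data)
    if header is None:
        raise ValueError("not a P3 or P6 PPM")
    # Raw PPM: after the header, each sample is a single byte.
    return int(header.group(1)), int(header.group(2)), tuple(data[header.end():])


# One red pixel and one blue pixel, written two different ways.
ascii_file = b"P3\n2 1\n255\n255 0 0 0 0 255\n"
raw_file = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])

files_equal = hashlib.md5(ascii_file).hexdigest() == hashlib.md5(raw_file).hexdigest()
images_equal = decode_ppm(ascii_file) == decode_ppm(raw_file)
print("same file bytes:", files_equal)
print("same pixels:", images_equal)
```

The two files differ in both size and MD5, yet decode to the same pixels, so a size-first filter would wrongly rule this pair out.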
-
- Posts: 16
- Joined: 2016-01-11T05:12:11-07:00
- Authentication code: 1151
Re: Find identical images in bulk?
fmw42 wrote:I think he wants to find images that are actually the same subject, but slightly changed, such as blurred or shifted a little, etc, not actually identical in every way or even duplicates in different directories.
Yes, you are correct. Even resolution might differ in some cases but the subject is more or less the same.
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Find identical images in bulk?
joshuafinny wrote:Yes, you are correct. Even resolution might differ in some cases but the subject is more or less the same.
The only direct way in ImageMagick is perceptual hash. See viewtopic.php?f=4&t=24906