
Deterministic images / hashes?

Posted: 2015-09-18T18:33:56-07:00
by mbradber
We're using ImageMagick for resizing images. Our dependency system relies on MD5 hashes to store signatures of assets to determine whether they have changed. I've noticed that the 'convert' tool does not deterministically generate the same images (the hashes are different every time you regenerate an image using 'convert' with the same arguments).

I've viewed this thread with a similar issue
http://imagemagick.org/discourse-server ... hp?t=27227

It suggests using the -strip option. However, this hasn't solved the problem: the MD5 of the generated image still changes every time. Is there any way to get a deterministic hash for PNG files that go through the 'convert' tool for resizing?

Thanks

Re: Deterministic images / hashes?

Posted: 2015-09-18T19:15:51-07:00
by snibgo
If you use the same version of IM and delegates, with no randomness, the same command should make the same image each time it is run. But the file may still differ byte for byte because of metadata.

To ensure you are not chasing the wrong problem, you should first check that the images are the same:

compare -metric AE dir1/in.png dir2/in.png NULL:
If this returns 0, no pixels are different.
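The AE ("absolute error") metric counts how many pixels differ. Below is a rough Python sketch of that count, assuming both images are already decoded into equal-length lists of (R, G, B) tuples. The function name is hypothetical, and the optional fuzz is a simplified per-channel tolerance, not ImageMagick's exact -fuzz colour-distance rule.

```python
def absolute_error_count(pixels_a, pixels_b, fuzz=0):
    """Count pixels where any channel differs by more than `fuzz` (simplified AE)."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    count = 0
    for pa, pb in zip(pixels_a, pixels_b):
        # A pixel counts once, no matter how many of its channels differ.
        if any(abs(ca - cb) > fuzz for ca, cb in zip(pa, pb)):
            count += 1
    return count


a = [(10, 20, 30), (40, 50, 60), (70, 80, 90)]
b = [(10, 20, 30), (40, 51, 60), (70, 80, 90)]
print(absolute_error_count(a, a))          # 0
print(absolute_error_count(a, b))          # 1
print(absolute_error_count(a, b, fuzz=1))  # 0
```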

Then use exiftool to find which metadata is different. I don't know what metadata exiftool picks up from the filesystem, as opposed to the data in the file; I suppose the exiftool documentation will say.
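If only metadata differs, one option is to hash the PNG's content rather than its raw bytes. Here is a minimal sketch, assuming Python 3 with only the standard library. It walks the chunk list, keeps the critical IHDR and PLTE chunks plus the decompressed IDAT stream, and skips ancillary chunks such as tEXt or tIME, where timestamps and comments live. The names content_md5, iter_chunks and make_png are hypothetical. This only neutralises metadata. If the pixels themselves differ, as the compare check above would reveal, or if an encoder chooses different per-row filter types, the hash still changes.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data):
    """Yield (chunk_type, payload) pairs from a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + payload + 4-byte CRC
        if ctype == b"IEND":
            break


def content_md5(data):
    """MD5 over IHDR, PLTE and the decompressed IDAT stream only.

    Ancillary chunks (tEXt, zTXt, tIME, ...) are skipped, so metadata-only
    differences do not change the result.
    """
    digest = hashlib.md5()
    idat = bytearray()
    for ctype, payload in iter_chunks(data):
        if ctype in (b"IHDR", b"PLTE"):
            digest.update(ctype + payload)
        elif ctype == b"IDAT":
            idat += payload
    digest.update(zlib.decompress(bytes(idat)))
    return digest.hexdigest()


def make_png(rows, comment=None):
    """Hypothetical test helper: build an 8-bit greyscale PNG, optionally with a tEXt chunk."""
    def chunk(ctype, payload):
        crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
        return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)

    ihdr = struct.pack(">IIBBBBB", len(rows[0]), len(rows), 8, 0, 0, 0, 0)
    raw = b"".join(b"\x00" + bytes(row) for row in rows)  # filter type 0 (None) per row
    png = PNG_SIGNATURE + chunk(b"IHDR", ihdr)
    if comment is not None:
        png += chunk(b"tEXt", b"Comment\x00" + comment.encode("latin-1"))
    return png + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b"")


run1 = make_png([[0, 128], [255, 64]], comment="generated run 1")
run2 = make_png([[0, 128], [255, 64]], comment="generated run 2")
print(hashlib.md5(run1).hexdigest() == hashlib.md5(run2).hexdigest())  # False
print(content_md5(run1) == content_md5(run2))  # True
```

The two synthetic files have identical pixels but different comments, so their byte-level MD5s differ while the content hash matches.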

Re: Deterministic images / hashes?

Posted: 2015-09-18T20:26:55-07:00
by mbradber
Thanks for the advice. I ran that compare command, and for the two images (generated exactly the same way with convert) it did return 0. I also stripped any superfluous metadata. However, when I run the identify command on each file and compare the output with a diff tool, for some reason these differences show up between the two files:

http://ibin.co/2G8vCAyuM6d5

Something very relevant: I am using convert to resize images to 25% and to 12.5%. The images scaled to 25% do not have these differences; when I convert at 25%, the MD5s are the same. The differences above only show up when scaling to 12.5%, so I'm wondering whether scaling down that far starts to introduce some randomness into the files?

Re: Deterministic images / hashes?

Posted: 2015-09-18T20:50:26-07:00
by snibgo
The two images have the same number of pixels but slightly different values in the RGB channels, as shown by the mean of each channel. So the images are different, and the compare command I gave should tell you how many pixels are different. Green is the worst channel, where the pixels differ by about 0.05 (out of 255) on average. This is a small number but will show up on MD5, of course.
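To see why such a small mean difference still breaks the MD5, here is a made-up 20-pixel example (not the poster's data). Changing a single green value by 1 shifts the green mean by only 1/20 = 0.05, yet the digest of the raw pixel bytes changes completely. The helper names are hypothetical.

```python
import hashlib


def channel_means(pixels):
    """Mean of each channel over a list of (R, G, B) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def pixel_md5(pixels):
    """MD5 of the pixel values packed as one byte per channel."""
    return hashlib.md5(bytes(v for p in pixels for v in p)).hexdigest()


# Hypothetical 20-pixel image; the second copy differs by 1 in one green value.
img1 = [(100, 150, 200)] * 20
img2 = list(img1)
img2[7] = (100, 151, 200)

means1, means2 = channel_means(img1), channel_means(img2)
print([round(m2 - m1, 3) for m1, m2 in zip(means1, means2)])  # [0.0, 0.05, 0.0]
print(pixel_md5(img1) == pixel_md5(img2))  # False
```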

If the two images came from the same source image, resized the same amount (whether 25%, 12.5%, 1.4% or anything else), I don't see where any randomness should enter.

I don't quite understand your comments about resizing. If one image was resized by 25% and then by 50% of that, whereas the other image was resized by 12.5% in one step, that would account for the difference, because of quantisation to integers.
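A toy illustration of that quantisation effect, assuming a plain box filter (the average of each block of samples) with round-half-up to integers. ImageMagick's -resize uses more elaborate filters by default, so this only shows the principle: rounding the intermediate 25% result can change the final 12.5% value. The function name and the sample row are hypothetical.

```python
def box_downscale(row, factor):
    """Average each block of `factor` samples, rounding half up to an integer."""
    out = []
    for i in range(0, len(row) - factor + 1, factor):
        mean = sum(row[i:i + factor]) / factor
        out.append(int(mean + 0.5))  # values are non-negative, so this rounds half up
    return out


row = [10, 10, 11, 11, 10, 10, 10, 11]

one_step = box_downscale(row, 8)                    # 12.5% directly
two_step = box_downscale(box_downscale(row, 4), 2)  # 25%, then 50% of that
print(one_step, two_step)  # [10] [11]
```

The direct path gives 83/8 = 10.375, which rounds to 10. The two-step path rounds 10.5 up to 11 at both stages.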

Can you share the input image, and command?

Re: Deterministic images / hashes?

Posted: 2015-09-21T12:05:26-07:00
by mbradber
I agree with you and would expect that process of resizing to produce deterministic images, but it doesn't look like it does.

Here is an example image of the same size as my actual image. The same result occurs with it, as shown below (the MD5 changes for the 12.5% resize but not for the 25% resize).

Hash changing:
http://ibin.co/2GRWgTuVBO0A

Example image:
http://ibin.co/2GRX0Sn8WfTu

And as I showed earlier, if you ran the command for 12.5% resize on that image to generate 2 different images, those 2 images would have differences when examining them through the 'identify' tool. Very strange.

Re: Deterministic images / hashes?

Posted: 2015-09-21T12:27:45-07:00
by snibgo
Your syntax is incorrect. You should read, then resize and strip, then write the output.

What version of IM are you using? I'm on 6.9.1-6. Repeating your 12.5% resize, either with your commands or the correct order, gives identical bytes in repeated conversions.

Re: Deterministic images / hashes?

Posted: 2015-09-21T15:27:32-07:00
by mbradber
I was using a slightly older version of 6.9.1. I just upgraded to 6.9.2-3 and I am no longer seeing this issue. Kind of disappointing that this was an issue just because of a version difference. I'm still curious as to what was causing this in the first place (maybe some floating point issues? I noticed it happening when scaling by decimal amounts more than other cases).

However, thanks for your time and patience snibgo.

Re: Deterministic images / hashes?

Posted: 2015-09-21T16:08:33-07:00
by fmw42
Perhaps it was the version of libpng?