
[solved] Make "compare" command deterministic?

Posted: 2015-03-19T23:41:21-07:00
by fotinakis
A weird one for you...

It looks like the compare command produces different results if the base file's ctime changes between runs (i.e. if the file is touched). That is, it produces a file with different bytes, although the image signature itself doesn't change.

Is there any way to make compare produce deterministic output (i.e. ignore filesystem metadata)? Caveats: I need the file bytes to be the same; I cannot easily use the image signature for what I'm doing. Also, I can't just set ctime, because it is the inode change time, which the filesystem maintains itself and which cannot be changed easily: http://stackoverflow.com/q/4537291/128597.

Here's my "proof" of this behavior:

Code:

$ compare base_image.png other_image.png diff_image.png
$ md5 *
MD5 (base_image.png) = 8f50e3814de43750547facfc1c36851b
MD5 (other_image.png) = 448632ae88cb97de138f0cdc5ad1bba4
MD5 (diff_image.png) = ce5715a887fd1585688889f021b3a7b6
^ This can be repeated any number of times and will deterministically create the same diff_image.png.

But, merely touching the base image will change the resulting file:

Code:

$ touch base_image.png 
$ compare base_image.png other_image.png diff_image.png
$ md5 *
MD5 (base_image.png) = 8f50e3814de43750547facfc1c36851b
MD5 (other_image.png) = 448632ae88cb97de138f0cdc5ad1bba4
MD5 (diff_image.png) = fc320022b8fbf1d3d390c3ca093a17e2
Interestingly, touching other_image.png does not affect the compare output.
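
The check above compares MD5 sums by eye. A minimal sketch of the same check as a byte-for-byte comparison follows; the helper name `same_bytes` and the output names `diff_1.png` and `diff_2.png` are made up for illustration and do not appear in the original post.

```shell
# same_bytes FILE_A FILE_B
# Succeeds (exit status 0) only if the two files are byte-for-byte identical.
# cmp -s prints nothing and reports the result through its exit status.
same_bytes() {
    cmp -s "$1" "$2"
}

# Usage sketch (requires ImageMagick; input names follow the post above):
#   compare base_image.png other_image.png diff_1.png
#   touch base_image.png
#   compare base_image.png other_image.png diff_2.png
#   same_bytes diff_1.png diff_2.png && echo "identical" || echo "differs"
```

Using `cmp` rather than a hash avoids depending on whether the system provides `md5` or `md5sum`.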

Re: Make "compare" command deterministic?

Posted: 2015-03-20T00:01:01-07:00
by snibgo
I'm not sure I understand the question. MD5 uses all the file bytes, including date metadata. To get a hash value of just the image:

Code:

convert image.png -format "%#" info:
(image.png is a placeholder for the file you want to hash.) However, this can return different hash values for two images that are visually identical, e.g. 8-bit and 16-bit versions of the same image.

If you have two generations of a file, you can use compare to find if the images are visually identical.

Re: Make "compare" command deterministic?

Posted: 2015-03-20T00:09:26-07:00
by fotinakis
Understandable confusion. To be clear, above I'm talking about ctime (the "changed time" of the file), not any image metadata. Since it's filesystem metadata, it is not part of the file's bytes and therefore not included in the MD5 hash. Check the base_image.png MD5 above and you'll notice it's the same in both runs, even after being touched, but the touch does affect what compare produces.

Re: Make "compare" command deterministic?

Posted: 2015-03-20T00:46:29-07:00
by snibgo
diff_image.png contains image metadata, including (according to exiftool) two different modification dates and two different create dates. The "PNG:datemodify" seems to come from the filesystem timestamp of the first input file, in your case base_image.png. This is why touching other_image.png doesn't change the MD5 checksum.
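
One way to look at these fields without exiftool is to filter ImageMagick's own verbose dump for date-related lines. This is only a sketch: the helper name `show_date_fields` is hypothetical, and the exact property names printed may differ between ImageMagick versions.

```shell
# show_date_fields
# Keeps only the lines of a metadata dump that mention a date.
# Reads text on stdin so it can sit at the end of a pipeline.
show_date_fields() {
    grep -i 'date'
}

# Usage sketch (requires ImageMagick; file name follows the thread):
#   identify -verbose diff_image.png | show_date_fields
```

Running this on the diff image before and after touching base_image.png should show which values move.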

Re: Make "compare" command deterministic?

Posted: 2015-03-20T10:40:08-07:00
by fotinakis
Aha! That makes sense, thank you. I've successfully used "convert -strip" to strip the PNG metadata out of diff_image, and now the results are deterministic:

Code:

$ compare base_image other_image diff_image
$ md5 diff_image 
MD5 (diff_image) = ca723dbec4b3dbc11184f0aba416799f
$ compare base_image other_image diff_image
$ md5 diff_image 
MD5 (diff_image) = ca723dbec4b3dbc11184f0aba416799f
$ touch base_image 
$ compare base_image other_image diff_image
$ md5 diff_image 
MD5 (diff_image) = 28f792ff8bdaee3b9df0d7090b4b9d19
$ convert -strip diff_image diff_image 
$ md5 diff_image 
MD5 (diff_image) = 60a0b0799560d3066a2ce131145f4a34
$ touch base_image 
$ compare base_image other_image diff_image
$ convert -strip diff_image diff_image 
$ md5 diff_image 
MD5 (diff_image) = 60a0b0799560d3066a2ce131145f4a34
Thanks for the help!
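
The two steps above can be wrapped into a single helper. This is a sketch under assumptions: the function name `deterministic_compare` is made up, and it relies on compare's documented exit codes (0 for similar, 1 for dissimilar, 2 for an error). Whether `-strip` removes every source of variation may depend on the ImageMagick version and output format.

```shell
# deterministic_compare BASE OTHER DIFF
# Runs compare, then strips metadata from the diff image so that its bytes
# do not depend on the input files' timestamps.
deterministic_compare() {
    compare "$1" "$2" "$3"
    rc=$?
    # 0 and 1 both mean a diff image was written; anything higher is an error.
    if [ "$rc" -gt 1 ]; then
        return "$rc"
    fi
    # Rewrite the diff image in place without profiles, comments or date fields.
    # The function's exit status is that of this convert call.
    convert "$3" -strip "$3"
}

# Usage sketch (requires ImageMagick):
#   deterministic_compare base_image.png other_image.png diff_image.png
#   md5 diff_image.png
```

Note that the helper returns the status of the strip step, so a caller that needs to know whether the images differed would have to capture compare's status separately.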