compare TIFF and PDF files

Posted: 2009-07-13T13:47:07-07:00
by john.l.lopez
Colleagues
I am currently doing a mass TIFF to PDF conversion project for our PLM system, about 150k files. All files are contained in the database. I am using a batch processing macro in Acrobat 9 ProX to effect the conversion. While the conversions appear to go OK, reviewing the document properties with IDENTIFY, I see that the original TIFF image may be 300x300 DPI but the resultant PDF is 72x72, which is the default PDF resolution (PPI).

I need to create the resultant conversion at the same resolution as the original, create it in PDF/A compliant format (ISO 19005-1), and validate the image between the original file (TIFF) and the converted file (PDF).

I have used COMPARE to validate differences between TIFF files and it works great. However, when attempting to do the same between the original TIFF file and the PDF, I get the following error:

# compare -fuzz 10% 1460D0100_SH1.tif 1460D0100_SH1.pdf cmpr.gif
compare: image size differs `1460D0100_SH1.tif' @ compare.c/CompareImageChannels/153.

This of course makes sense if the file resolution is being changed. We are still determining if the change in resolution will affect the downstream readability of the file, yet intuitively one would want to maintain the same resolution.

One thought was that the User coordinate system in the PDF was obscuring the actual raster image resolution. However, that can not be validated, at least with the options I have tried with VALIDATE.

My attempted course of action was to convert both the TIFF and PDF to a common image format, and then compare them. I attempted this, but found the PDF to GIF conversion was very bad, and of course failed in comparison (note file sizes):
07/07/2009 01:26 PM 150,745 1460D0100_SH1.pdf
06/02/2009 04:26 PM 148,590 1460D0100_SH1.tif
07/13/2009 04:23 PM 23,245 cmpr_pdf.gif
07/13/2009 04:23 PM 360,890 cmpr_tiff.gif

I had also thought that if I could extract a raw uncompressed raster image from each and then do an MD5 checksum, that might also offer some way of validating the images. However, I am not familiar enough with the tools to do that with either the TIFF or the PDF.
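A minimal sketch of the checksum idea, assuming ImageMagick's `identify` is on the PATH. The `%#` format escape prints the same pixel-data signature that appears as `signature:` in `identify -verbose` output; it is a hash of the decoded pixels, not of the file bytes. The function names `pixel_signature` and `same_pixels` are hypothetical helpers. Note that this is an exact-match test: any resampling or anti-aliasing when the PDF is rasterized will change the signature, so it is more likely to be useful for TIFF-to-TIFF checks than for TIFF-to-PDF.

```shell
# Print ImageMagick's pixel signature (a hash of the decoded pixel data).
pixel_signature() {
  identify -format '%#\n' "$1"
}

# Succeed only when both files decode to identical signatures.
same_pixels() {
  [ "$(pixel_signature "$1")" = "$(pixel_signature "$2")" ]
}
```

Usage would be along the lines of `same_pixels original.tif copy.tif && echo identical`.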

Hence, I am at a loss for a course of action, to both make the PDF conversion yield a proper resolution and be PDF/A compliant, and for a mechanism to provide quality assurance that the TIFF and the resulting PDF images are the same, within a reasonable tolerance.

Thank you in advance for your assistance.

Regards
John Lopez
PDM Architect
Goodrich Corp.

Re: compare TIFF and PDF files

Posted: 2009-07-13T16:14:44-07:00
by fmw42
are the tif and pdf exactly the same width and height? that message, I believe, should only occur if the two images are not the same width and height. might you have different virtual canvases?

what do you get from

identify -verbose imagename

on both images?


can you provide an example of a tif and pdf?

Re: compare TIFF and PDF files

Posted: 2009-07-14T05:39:33-07:00
by john.l.lopez
FMW42 - thanks for your reply.

the images are slightly different in size which could account for the error. The TIFF is:
Resolution: 400x400
Print size: 17.92x12.8675

and the PDF is:
Resolution: 72x72
Print size: 17.9167x12.8611

Also note the differential in geometry specifications:
Tiff: Geometry: 7168x5147+0+0 (17.92*400 x 12.8675*400)
PDF: Geometry: 1290x926+0+0 (17.9167*72 x 12.8611*72)

Per one of the questions above, the resolution change is a quandary. Is this an actual change to the image resolution, as rendered at conversion time, or is this the PDF user coordinate system obfuscating the actual image resolution with the PDF canvas normalization of 72x72 PPI? From the geometry specs it looks like the image rendering is actually losing resolution on conversion, and perhaps changing size as well.
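The geometry figures above follow from pixels = print size (inches) x resolution (pixels per inch). A small sketch of that arithmetic, using only `awk`; the helper name `pixels` is hypothetical. One thing worth checking before concluding that resolution was lost: ImageMagick rasterizes a PDF page at its `-density` setting, which defaults to 72, so the 1290x926 geometry may simply reflect the page size in points rather than the resolution of the image embedded in the PDF.

```shell
# pixels = inches * pixels-per-inch, rounded to the nearest integer
pixels() {
  awk -v inches="$1" -v ppi="$2" 'BEGIN { printf "%d\n", inches * ppi + 0.5 }'
}

pixels 17.92 400     # TIFF width  -> 7168
pixels 12.8675 400   # TIFF height -> 5147
pixels 17.9167 72    # PDF width   -> 1290
pixels 12.8611 72    # PDF height  -> 926
```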

Unfortunately, I can not supply the file(s) as it is export controlled.

Thanks for your help on this. Below is the output from identify.

regards,

John

# identify -verbose 1460D0100_SH1.tif
Image: 1460D0100_SH1.tif
Format: TIFF (Tagged Image File Format)
Class: DirectClass
Geometry: 7168x5147+0+0
Resolution: 400x400
Print size: 17.92x12.8675
Units: PixelsPerInch
Type: Bilevel
Base type: Bilevel
Endianess: MSB
Colorspace: RGB
Depth: 1-bit
Channel depth:
gray: 1-bit
Channel statistics:
gray:
min: 0 (0)
max: 1 (1)
mean: 0.972868 (0.972868)
standard deviation: 0.162467 (0.162467)
Histogram:
35892705: (255,255,255) #FFFFFF white
1000991: ( 0, 0, 0) #000000 black
Rendering intent: Undefined
Interlace: None
Background color: white
Border color: rgb(223,223,223)
Matte color: grey74
Transparent color: black
Page geometry: 7168x5147+0+0
Dispose: Undefined
Iterations: 0
Compression: Group4
Orientation: TopLeft
Properties:
create-date: 2009-07-13T18:43:00+00:00
modify-date: 2009-06-02T20:26:08+00:00
signature: cebe4df0d2246fb0a85a41b6cf77a96661085d6bd0c3575e4f9e5bce80663701
tiff:rows-per-strip: 5147
tiff:timestamp: 2000:12:13 10:38:09
Artifacts:
verbose: true
Tainted: False
Filesize: 145kb
Number pixels: 35.18mb
Pixels per second: 46.91mb
User time: 0.750u
Elapsed time: 0:01
Version: ImageMagick 6.4.7 2008-12-15 Q16 http://www.imagemagick.org

# identify -verbose 1460D0100_SH1.pdf
Image: 1460D0100_SH1.pdf
Format: PDF (Portable Document Format)
Class: DirectClass
Geometry: 1290x926+0+0
Resolution: 72x72
Print size: 17.9167x12.8611
Units: Undefined
Type: Bilevel
Base type: Bilevel
Endianess: Undefined
Colorspace: RGB
Depth: 16/1-bit
Channel depth:
gray: 1-bit
Channel statistics:
gray:
min: 0 (0)
max: 65535 (1)
mean: 63729.8 (0.972455)
standard deviation: 10725.9 (0.163666)
Histogram:
1161636: (65535,65535,65535) #FFFFFFFFFFFF white
32904: ( 0, 0, 0) #000000000000 black
Rendering intent: Undefined
Interlace: None
Background color: white
Border color: rgb(223,223,223)
Matte color: grey74
Transparent color: black
Page geometry: 1290x926+0+0
Dispose: Undefined
Iterations: 0
Compression: Undefined
Orientation: Undefined
Properties:
create-date: 2009-07-14T12:22:06+00:00
modify-date: 2009-07-14T12:22:06+00:00
pdf:HiResBoundingBox: 1290.24x926.46+0+0
pdf:Version: PDF-1.6
signature: 4e807e125a11596f3ce5b604b05745c122e58aa455a6ea655b320170f785b59b
Artifacts:
verbose: true
Tainted: False
Filesize: 147kb
Number pixels: 1.139mb
Version: ImageMagick 6.4.7 2008-12-15 Q16 http://www.imagemagick.org

Re: compare TIFF and PDF files

Posted: 2009-07-14T06:53:05-07:00
by john.l.lopez
I did get release from Export Control to post a redacted drawing for evaluation. The statistics follow. Note that this one was done with Acrobat 5 vs. the first example with Acrobat 9, both using a custom Batch Processing macro.


# compare -fuzz 10% RedactedCopy.tif RedactedCopy.pdf cmpr01.gif
compare: image size differs `RedactedCopy.tif' @ compare.c/CompareImageChannels/153.

# identify -verbose RedactedCopy.tif
Image: RedactedCopy.tif
Format: TIFF (Tagged Image File Format)
Class: DirectClass
Geometry: 7168x5147+0+0
Resolution: 96x96
Print size: 74.6667x53.6146
Units: PixelsPerInch
Type: Bilevel
Base type: Bilevel
Endianess: MSB
Colorspace: RGB
Depth: 1-bit
Channel depth:
gray: 1-bit
Channel statistics:
gray:
min: 0 (0)
max: 1 (1)
mean: 0.980462 (0.980462)
standard deviation: 0.138407 (0.138407)
Histogram:
36172854: (255,255,255) #FFFFFF white
720842: ( 0, 0, 0) #000000 black
Rendering intent: Undefined
Interlace: None
Background color: white
Border color: rgb(223,223,223)
Matte color: grey74
Transparent color: black
Page geometry: 7168x5147+0+0
Dispose: Undefined
Iterations: 0
Compression: LZW
Orientation: TopLeft
Properties:
create-date: 2009-07-14T13:10:58+00:00
modify-date: 2009-07-14T13:10:58+00:00
signature: 9d4259d169f955ed4c7772e6e31eca5927f85768a9ee8821957cbbbffa7d5c6c
tiff:rows-per-strip: 13
tiff:timestamp: 2000:12:13 10:38:09
Artifacts:
verbose: true
Tainted: False
Filesize: 250kb
Number pixels: 35.18mb
Pixels per second: 161.4mb
User time: 0.219u
Elapsed time: 0:01
Version: ImageMagick 6.4.7 2008-12-15 Q16 http://www.imagemagick.org

# identify -verbose RedactedCopy.pdf
Image: RedactedCopy.pdf
Format: PDF (Portable Document Format)
Class: DirectClass
Geometry: 5376x3860+0+0
Resolution: 72x72
Print size: 74.6667x53.6111
Units: Undefined
Type: Bilevel
Base type: Bilevel
Endianess: Undefined
Colorspace: RGB
Depth: 16/1-bit
Channel depth:
gray: 1-bit
Channel statistics:
gray:
min: 0 (0)
max: 65535 (1)
mean: 64260.2 (0.980547)
standard deviation: 9051.03 (0.13811)
Histogram:
20347689: (65535,65535,65535) #FFFFFFFFFFFF white
403671: ( 0, 0, 0) #000000000000 black
Rendering intent: Undefined
Interlace: None
Background color: white
Border color: rgb(223,223,223)
Matte color: grey74
Transparent color: black
Page geometry: 5376x3860+0+0
Dispose: Undefined
Iterations: 0
Compression: Undefined
Orientation: Undefined
Properties:
create-date: 2009-07-14T13:13:28+00:00
modify-date: 2009-07-14T13:13:29+00:00
pdf:HiResBoundingBox: 5376x3860.25+0+0
pdf:Version: PDF-1.4
signature: be0be6a9c1f87eacf5626027c46b238bd8c3a35bb0e8131169ac33f74442592d
Artifacts:
verbose: true
Tainted: False
Filesize: 2.474mb
Number pixels: 19.79mb
Pixels per second: 158.3mb
User time: 0.125u
Elapsed time: 0:01
Version: ImageMagick 6.4.7 2008-12-15 Q16 http://www.imagemagick.org

Re: compare TIFF and PDF files

Posted: 2009-07-14T06:57:16-07:00
by john.l.lopez
PS: how do I upload files to the forum? I assume the [IMG] tag is for web accessible references.

Re: compare TIFF and PDF files

Posted: 2009-07-14T10:47:12-07:00
by fmw42
Your last set of images appears to have been modified so that the print size is the same in inches. That is, the pixel dimensions and resolution (dpi) were adjusted to make them print to the same size. However, each image has a different pixel dimension. That is why you are getting the compare message, as it only looks at the pixel dimensions (geometry).

So the problem is in your Adobe processing where you are somehow ensuring that the print size is the same rather than keeping the pixel dimensions the same.

Check out the IM function -resample which does the same thing.

What has probably happened is that both the dpi and the pixel dimensions were changed so that the images print to the same size.


# identify -verbose RedactedCopy.tif
Image: RedactedCopy.tif
Geometry: 7168x5147+0+0
Resolution: 96x96
Print size: 74.6667x53.6146

# identify -verbose RedactedCopy.pdf
Geometry: 5376x3860+0+0
Resolution: 72x72
Print size: 74.6667x53.6111


Note that Geometry and Resolution are different, but the print size is (nearly) the same.

If you resize one image by the ratio of the Resolution, you can make the two images have the same Geometry.

5376*96/72 = 7168 and 3860*96/72 = 5146.67, which rounds to 5147

so 96/72=1.33333333 or 72/96 = .75
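The same ratio can be turned into a rendering density: the density at which the PDF page would rasterize to the TIFF's pixel width is 72 x (TIFF width / PDF width at 72 dpi). A minimal sketch using `awk`; the helper name `matching_density` is hypothetical, and the inputs are the widths reported by identify above (including the `pdf:HiResBoundingBox` width of 1290.24 for the first drawing).

```shell
# Density (dpi) at which a page that is pdf_px wide at 72 dpi
# would render tiff_px pixels wide.
matching_density() {
  awk -v tiff_px="$1" -v pdf_px="$2" 'BEGIN { printf "%.0f\n", 72 * tiff_px / pdf_px }'
}

matching_density 7168 5376      # RedactedCopy  -> 96
matching_density 7168 1290.24   # 1460D0100_SH1 -> 400
```

Both results match the Resolution reported for the corresponding TIFF.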


Why don't you let IM do the conversion? It should keep the pixel sizes the same.

Otherwise, you will need to use -resize with one of the ratios above to convert one image to the same size as the other in order to use compare. But that will introduce another level of resampling and thus some slight additional blurring.
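A hedged sketch of that workflow, assuming `identify`, `convert` and `compare` are on the PATH and a PDF delegate (Ghostscript) is installed. The function name `compare_tiff_pdf` and the output names `pdf_render.png` and `diff.png` are hypothetical. It rasterizes the first PDF page at a chosen density, forces the result to the TIFF's exact pixel size with `-resize WxH!` (which absorbs any one-pixel rounding difference), then counts the pixels that differ by more than the fuzz with `-metric AE`.

```shell
# Usage: compare_tiff_pdf drawing.tif drawing.pdf 96
compare_tiff_pdf() {
  tiff=$1 pdf=$2 density=$3
  # Pixel dimensions of the first TIFF frame, e.g. 7168x5147
  size=$(identify -format '%wx%h' "${tiff}[0]")
  # Rasterize page 1 of the PDF and force it to the TIFF's geometry
  convert -density "$density" "${pdf}[0]" -resize "${size}!" pdf_render.png
  # Report the number of pixels differing beyond the fuzz; write a difference image
  compare -metric AE -fuzz 10% "$tiff" pdf_render.png diff.png
}
```

Rendering at the matching density, rather than at 72 and then enlarging, keeps the extra resampling to a minimum.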

You cannot upload images to the forum. You have to put them on some (free) server and then link to them either with the URL or the IMG tag buttons.

Re: compare TIFF and PDF files

Posted: 2009-07-15T06:29:46-07:00
by john.l.lopez
Hi

yes ... that is exactly the problem.

What I need is a way to configure Acrobat so that it will NOT change the resolution on conversion of the TIFF to PDF.

Again, I am using a Batch Processing macro to do the conversion. I have not found anything in any of the available sequences or commands that allows me to force Acrobat to retain the source raster image's resolution. Likewise, I also need to force it to conform to PDF/A (ISO 19005-1).

Thanks again for your help.

Re: compare TIFF and PDF files

Posted: 2009-07-15T10:50:57-07:00
by fmw42
you can batch process your conversion in IM.
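A minimal sketch of such a batch run, assuming ImageMagick's `convert` is on the PATH and the TIFFs have been exported from the database into one directory. The function name `batch_tiff_to_pdf` and the directory arguments are hypothetical. `-compress Group4` is chosen here because the source drawings are bilevel; this sketch does not by itself address PDF/A conformance, which would need to be checked separately.

```shell
# Usage: batch_tiff_to_pdf /path/to/tiffs /path/to/pdfs
batch_tiff_to_pdf() {
  src_dir=$1 out_dir=$2
  mkdir -p "$out_dir"
  for tiff in "$src_dir"/*.tif; do
    [ -e "$tiff" ] || continue          # skip when no .tif files match
    name=$(basename "$tiff" .tif)
    # Wrap the TIFF's pixels in a PDF without resampling them
    convert "$tiff" -compress Group4 "$out_dir/$name.pdf"
  done
}
```

Running `identify` on a few of the results, with `-density` set to the source resolution, is a quick check that the pixel dimensions survived.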