resolution of PDF files forced to 72x72 even when -density is used

Post by **atariZen** » 2018-03-04T08:35:50-07:00

I have a 600 dpi PBM file, and I simply want to wrap it in a PDF with no changes. I would expect this to work, but it gets downsampled to 72x72:

$ convert source.pbm target.pdf
$ identify -verbose target.pdf | grep -i reso
  Resolution: 72x72

So I also tried to brute-force it to maintain the 600 dpi:

$ convert -units pixelsperinch source.pbm -density 600 target.pdf
$ identify -verbose target.pdf | grep -i reso
  Resolution: 72x72

I also tried replacing "-density" with "-resample" in the above attempt, with the same result. I first wondered whether the output was in fact correct and the "identify" command was simply misreporting the resolution for PDF containers. So I tried extracting the images:

$ pdfimages -all target.pdf img
$ identify -verbose img-000.jpg | grep -i reso
<no output>

The first bizarre finding is that the extracted images are JPG, considering the input images were pbm and no changes should be made to them. Then the next astonishment is that these JPG images have no resolution (yet they display just fine). However, in the GUI tool that displays the JPG, the properties are said to be 300 dpi.

Post by **snibgo** » 2018-03-04T09:41:04-07:00

When raster images are wrapped inside a PDF, there are multiple resolutions: that of each raster image, and that of the overall PDF. "identify" reports just the overall PDF resolution. pdfimages reports the resolution of each raster image.

You should use "-units", e.g.:

f:\web\im>%IM%convert toes.png -density 600 -units pixelsperinch t.pdf

f:\web\im>pdfimages -list t.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     267   233  rgb     3   8  image  no         8  0   600   601  156K  86%

Why is y-ppi 601? I don't know.

"-compress" will direct the compression method, if any.

Some (old) versions of pdfimages only create JPG outputs.

Post by **atariZen** » 2018-03-05T11:51:07-07:00

Thanks snibgo, you've cleared some things up. I'm happy to hear about "pdfimages -list". That's quite useful.

I now have a working solution that's verifiable. But I will mention some annoyances to warn others, perhaps to also serve as a note to developers:

* ImageMagick alters the resolution when it's not told to do so. This fails the rule of least astonishment. E.g.

$ convert source_600dpi.pbm target_72dpi.pdf
$ pdfimages -list target_72dpi.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    5100  6601  gray    1   8  image  no         8  0    72    72  690K 2.1%

That's not good. I didn't tell it to downsample 600dpi to 72dpi. But I'm glad I can at least force convert to do the right thing using -density.

* The "identify -verbose" command often gives no resolution for raster images, which must have a resolution. And for PDFs, you say it gives the "overall" resolution, but when the PDF contains a single raster image and nothing else, I expect the overall resolution to match that of the embedded image. Since it's always showing 72dpi, I suspect the PDF may contain a separate resolution property for rendering/display. No big deal, but a PDF is perhaps the one case where it would actually be sensible for the identify command to omit the resolution, and instead it reports a value that doesn't match the objects inside.

* Regarding pdfimages, my version supports the "-all" parameter, which extracts images without conversion. When I convert a pbm to a pdf, ImageMagick apparently converts the pbm to a png file before wrapping it in a PDF even if I supply "-compress none". Unless perhaps there is some inherent problem with embedding pbm files in a PDF, this is unexpected.

Anyway, I can live with these things. Thanks for the help.

Post by **muccigrosso** » 2018-03-05T12:40:13-07:00

atariZen wrote: The "identify -verbose" command often gives no resolution for raster images, which must have a resolution. And for PDFs, you say it gives the "overall" resolution, but when the PDF is nothing other than a single raster image and nothing else, I expect the overall resolution to match that of the embedded image. Since it's always showing 72dpi, I suspect the PDF may contain a resolution for rendering/display property. No big deal, but a PDF is perhaps the one case where it would actually be sensible for the identify command to omit resolution, and in fact it's giving something that mismatches the objects inside.

Yeah, PDFs are a pain. But it does make sense that they have their own resolution. Imagine taking a small 1" square high-res image and putting it into a letter-sized PDF where it occupies the whole page. What's the resolution of the image now? It's not the same as it would be if you extracted the image from the PDF. Or take the opposite case, in which you squeeze a large low-res image down so that it fits onto a PDF page.
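
To put numbers on that, here is a minimal sketch of the arithmetic, assuming the effective resolution is just the image's pixel count divided by the width it is drawn at on the page (PDF user space is 72 points per inch). The helper name `effective_ppi` and the 600-pixel example image are illustrative only; they do not come from ImageMagick or pdfimages.

```shell
# effective_ppi PIXELS POINTS
# Pixels per inch of an image PIXELS wide drawn across POINTS of page width.
# Hypothetical helper: plain arithmetic, not an ImageMagick or pdfimages call.
effective_ppi() {
    awk -v px="$1" -v pt="$2" 'BEGIN { printf "%.2f\n", px * 72 / pt }'
}

# A 600-pixel-wide image (1 inch at 600 ppi) drawn 1 inch (72 pt) wide:
effective_ppi 600 72     # prints 600.00
# The same image stretched across a letter page, 8.5 inches (612 pt) wide:
effective_ppi 600 612    # prints 70.59
```

The pixels are identical in both cases; only the size they are drawn at differs, which is why the resolution seen inside the PDF need not match the resolution of the extracted file.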

In any case, I always use pdfimages to get images out of PDFs. You can extract most of them in their native formats, though not jbig2, which is pesky because in my experience it gets used a lot for true bilevel (1-bit) images. A fax-quality tiff comes close, but is maybe double the size.

Post by **snibgo** » 2018-03-06T01:55:45-07:00

atariZen wrote: ImageMagick alters the resolution when it's not told to do so. [...]
$ convert source_600dpi.pbm target_72dpi.pdf

The PBM format has no resolution metadata. Including "600dpi" in the name doesn't make it so. So the image resolution hasn't been changed: there was none to begin with, and it has merely been set to the default, 72 DPI.
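
As a small illustration of that point, the sketch below writes a tiny plain (ASCII) PBM and reads its header back. The header consists of the magic number and the pixel dimensions, and there is no density field. The helper name `pbm_header` is hypothetical, and it assumes the simplest header layout (no comment lines, width and height on the second line).

```shell
# pbm_header FILE
# Prints the magic number and dimensions of a plain PBM whose header is laid
# out as "P1", newline, "WIDTH HEIGHT" (no comment lines). Hypothetical helper.
pbm_header() {
    { read -r magic; read -r width height; } < "$1"
    echo "$magic ${width}x${height}"
}

# Build a tiny 2x2 plain PBM: magic number, dimensions, then the bits.
tmp=$(mktemp)
printf 'P1\n2 2\n1 0\n0 1\n' > "$tmp"
header=$(pbm_header "$tmp")
rm -f "$tmp"
echo "$header"    # prints: P1 2x2
```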

atariZen wrote: I didn't tell it to downsample 600dpi to 72dpi.

"Downsample" implies that pixels have been re-sampled. When this happens, the number of pixels changes. Merely setting a different density does not re-sample anything: the pixels stay the same and only the metadata changes. The "-resample" operation changes both the density and the number of pixels.
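
The difference can be sketched with the 5100-pixel width from the listing quoted earlier in this thread. This is plain arithmetic under stated assumptions, not ImageMagick's actual code: the helper names `inches` and `resampled` are hypothetical, and the rounding used for the resampled pixel count is an assumption.

```shell
# inches PIXELS PPI
# Nominal size when only the density metadata changes (pixels untouched).
inches() {
    awk -v px="$1" -v ppi="$2" 'BEGIN { printf "%.2f\n", px / ppi }'
}

# resampled PIXELS OLD_PPI NEW_PPI
# Approximate pixel count after resampling to NEW_PPI at the same nominal
# size, rounded to the nearest pixel (rounding rule is an assumption).
resampled() {
    awk -v px="$1" -v old="$2" -v new="$3" \
        'BEGIN { printf "%d\n", px * new / old + 0.5 }'
}

inches 5100 600          # prints 8.50  (5100 pixels read as 600 ppi)
inches 5100 72           # prints 70.83 (same 5100 pixels read as 72 ppi)
resampled 5100 600 72    # prints 612   (a true downsample would shrink it)
```

The 72 ppi listing still shows 5100 pixels across, so the image was relabelled, not downsampled.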

atariZen wrote: ... when the PDF is nothing other than a single raster image and nothing else, I expect the overall resolution to match that of the embedded image.

That's a common expectation, but mistaken.

Legacy ImageMagick Discussions Archive