PDF to JPG page size and resolution issues

Posted: 2007-07-21T18:28:39-07:00
by Clive Webster
I'm using RMagick with ImageMagick version 6.3.0 11/22/06 Q8.

My app requires that a JPG preview has the same physical size (i.e. inches or centimeters) as the source image. The source images may be TIFF, PDF, EPS, JPG, etc., of ANY size. I'm trying to produce RGB JPG previews at 300 DPI, but the output dimensions MUST equal the input dimensions: a PDF of 85mm x 55mm (business card size) must result in a 300 dpi JPG of 85mm x 55mm, and a letterhead PDF of 210mm x 297mm must result in a 300 dpi JPG of 210mm x 297mm. In other words, the input files do NOT have any fixed PostScript page size.
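To make the requirement concrete, here is a minimal sketch of the pixel dimensions a 300 dpi preview would need for the two sizes mentioned. The helper name `mm_to_pixels` is hypothetical, and rounding to the nearest pixel is an assumption (a renderer may round up instead).

```ruby
MM_PER_INCH = 25.4

# Pixels needed to cover a length in millimetres at a given dpi.
# Rounding to the nearest whole pixel is an assumption of this sketch.
def mm_to_pixels(mm, dpi = 300)
  (mm / MM_PER_INCH * dpi).round
end

puts "business card: #{mm_to_pixels(85)} x #{mm_to_pixels(55)} px"
puts "letterhead:    #{mm_to_pixels(210)} x #{mm_to_pixels(297)} px"
```

So a preview only has the right physical size if both the pixel count and the stored resolution tag agree with these figures.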

For example: I'm reading a PDF called 'bate_optionsub_spot.pdf'

If I do an identify I get:-

Code: Select all

identify -verbose bate_optionsub_spot.pdf
Image: bate_optionsub_spot.pdf
  Format: PDF (Portable Document Format)
  Class: DirectClass
  Geometry: 255x103
  Type: ColorSeparation
  Endianess: Undefined
  Colorspace: CMYK
  Channel depth:
    Cyan: 8-bit
    Magenta: 8-bit
    Yellow: 8-bit
    Black: 8-bit
  Channel statistics:
    Cyan:
      Min: 0 (0)
      Max: 250 (0.980392)
      Mean: 3.70805 (0.0145414)
      Standard deviation: 29.1035 (0.114131)
    Magenta:
      Min: 0 (0)
      Max: 144 (0.564706)
      Mean: 2.13547 (0.00837437)
      Standard deviation: 16.7621 (0.0657337)
    Yellow:
      Min: 0 (0)
      Max: 33 (0.129412)
      Mean: 0.48597 (0.00190576)
      Standard deviation: 3.83109 (0.0150239)
    Black:
      Min: 0 (0)
      Max: 255 (1)
      Mean: 37.3345 (0.14641)
      Standard deviation: 85.9152 (0.336922)
  Total ink density: 169%
  Colors: 41
  Histogram:
     21146: #00000000 black
        79: #00000022 black
       191: #00000044 black
       358: #00000088 black
        53: #00000055 black
        42: #00000077 black
        67: #00000066 black
        72: #000000DD black
      3027: #000000FF black
       111: #000000EE black
        80: #00000099 black
       124: #00000011 black
        89: #00000033 black
        84: #000000CC black
        50: #000000AA black
       198: #000000BB black
         2: #000000D8 black
         1: #000000F0 black
         1: #000000E2 black
         2: #000000D0 black
         1: #000000E6 black
         1: #000000E1 black
         1: #000000C9 black
         1: #000000B8 black
         1: #000000D7 black
         1: #000000E9 black
         5: #10090200 cmyk(16,9,2,0)
         2: #21130400 cmyk(33,19,4,0)
         7: #321C0600 cmyk(50,28,6,0)
        84: #42260801 cmyk(66,38,8,1)
         2: #53300B01 cmyk(83,48,11,1)
         2: #64390D01 cmyk(100,57,13,1)
         6: #75430F01 cmyk(117,67,15,1)
        22: #854D1102 cmyk(133,77,17,2)
         1: #96561302 cmyk(150,86,19,2)
         2: #A7601602 cmyk(167,96,22,2)
         1: #B86A1802 cmyk(184,106,24,2)
         1: #C8731A03 cmyk(200,115,26,3)
         4: #D97D1C03 cmyk(217,125,28,3)
         8: #EA861E03 cmyk(234,134,30,3)
       335: #FA902104 cmyk(250,144,33,4)
  Rendering intent: Undefined
  Resolution: 28.35x28.35
  Units: PixelsPerCentimeter
  Filesize: 107.211kb
  Interlace: None
  Background color: white
  Border color: cmyk(223,223,223,0)
  Matte color: grey74
  Transparent color: black
  Page geometry: 255x103+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: Undefined
  Orientation: Undefined
  Signature: 414b03ff74f1c58fc3c05701ce7ab774b1ea5f26160f7deea522bc1e066e670b
  Tainted: False
  Version: ImageMagick 6.3.0 11/22/06 Q8 http://www.imagemagick.org

Which, when I look at the original MediaBox/CropBox in the PDF (or view it in Acrobat Professional), seems to be correct, i.e. 3.55 x 1.43 inches = 255x103 pixels at 28.35 pixels per centimeter.
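The arithmetic behind that check can be sketched as follows. The helper name `physical_inches` is hypothetical; the inputs are the Geometry and Resolution values from the identify output, so the result is only accurate to within the rounding of the integer pixel geometry.

```ruby
CM_PER_INCH = 2.54

# Physical length in inches of a pixel count at a density in pixels per centimetre.
def physical_inches(pixels, pixels_per_cm)
  pixels / pixels_per_cm / CM_PER_INCH
end

width_in  = physical_inches(255, 28.35)
height_in = physical_inches(103, 28.35)

puts format("%.2f x %.2f in", width_in, height_in)
# 28.35 pixels per centimetre is the usual 72 dpi default expressed per centimetre.
puts format("%.1f dpi", 28.35 * CM_PER_INCH)
```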

If I enter:

Code: Select all

convert -density 300x300 -units PixelsPerInch -verbose bate_optionsub_spot.pdf out.jpg
Then the verbose listing gives:

Code: Select all

[ghostscript library] -q -dBATCH -dSAFER -dMaxBitmap=500000000 -dNOPAUSE -dAlign
ToPixels=0 "-sDEVICE=bmpsep8" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-g1063x42
9" "-r300x300"  "-sOutputFile=C:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-Xma6
kF6d" "-fC:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-HmTuK5fw" "-fC:/DOCUME~1/
CLIVEW~1.CLI/LOCALS~1/Temp/magick-johD5.JT"C:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Tem
p/magick-Xma6kF6d[0] BMP 1063x429 1063x429+0+0 PseudoClass 256c 8-bit 1.74535mb
0.040u 0:01
C:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-Xma6kF6d[1] BMP 1063x429 1063x429+
0+0 PseudoClass 256c 8-bit 1.74535mb 0.030u 0:01
C:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-Xma6kF6d[2] BMP 1063x429 1063x429+
0+0 PseudoClass 256c 8-bit 1.74535mb 0.020u 0:01
C:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-Xma6kF6d[3] BMP 1063x429 1063x429+
0+0 PseudoClass 256c 8-bit 1.74535mb 0.010u 0:01
bate_optionsub_spot.pdf PDF 1063x429 1063x429+0+0 DirectClass 8-bit 1.74535mb 0.
150u 0:01
bate_optionsub_spot.pdf=>out.jpg PDF 1063x429 1063x429+0+0 DirectClass 8-bit 69.
4238kb 0.090u 0:01
And the resultant image (in Photoshop) is:
  • pixels: 1063 x 429
    resolution: 762 dpi
    dimension: 1.395 x 0.563 inches
Which is obviously WRONG. Where does that 762 DPI come from?
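The figure is at least numerically suggestive: 762 is exactly 300 x 2.54, which is what 300 pixels per centimeter looks like when expressed per inch. The sketch below only checks that arithmetic against the Photoshop numbers; it is an observation about the values, not a confirmed explanation of what ImageMagick does internally. The helper name `per_cm_to_per_inch` is hypothetical.

```ruby
CM_PER_INCH = 2.54

# Convert a density in pixels per centimetre to pixels per inch.
def per_cm_to_per_inch(density_per_cm)
  density_per_cm * CM_PER_INCH
end

dpi = per_cm_to_per_inch(300)
puts dpi.round                                        # 762
puts format("%.3f x %.3f in", 1063 / dpi, 429 / dpi)  # 1.395 x 0.563 in
```

Both results match what Photoshop reported, which would be consistent with the value 300 being kept alongside the PDF's PixelsPerCentimeter units rather than PixelsPerInch.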


So I tried to do the same thing direct in Ghostscript:

Code: Select all

GSWIN32C -dNOPAUSE -r300 -dBATCH -sDEVICE=jpeg -sOutputFile="out.jpg" -dFirstPage=1 -dLastPage=1 -dGraphicsAlphaBits=4 bate_optionsub_spot.pdf
And got the output

Code: Select all

AFPL Ghostscript 8.54 (2006-05-17)
Copyright (C) 2005 artofcode LLC, Benicia, CA.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Loaded into Photoshop this gave me:-
  • pixels: 1064 x 428
    resolution: 300 dpi
    dimension: 3.547 x 1.427 inches
Which is CORRECT !


So Ghostscript seems to work out the source page size from the CropBox, whereas ImageMagick doesn't (or at least I can't figure out why it ends up at 762 dpi).


Then I discovered that I can change what ImageMagick does by changing the PostScript page size. This seems bizarre, since ImageMagick already knows what the page size is (see the output of identify above). I'm guessing that this may relate to other posts where each page in a PDF has a different page size. Surely IM can ascertain the page size from either the MediaBox or the CropBox for each page in the PDF?!

So my solution to this (via ruby) in order to get a 300dpi image is to:

Code: Select all

img=Magick::Image::read(srcFolder + '\\' + file).first
followed by

Code: Select all

if img.format == "PDF"
  images = Magick::Image::read(srcFolder + '\\' + file){
    self.density = "300x300"  # re-read the image sampling at 300 dpi - else the image is an exploded 72 dpi and is 'fuzzy'
    self.units = Magick::PixelsPerInchResolution
    }
  img = images.first
  img.x_resolution = 300  # reset the parameters - else the image is saved at some obscure DPI
  img.y_resolution = 300
  img.units = Magick::PixelsPerInchResolution
end
So my questions are:-
1. When you've told IM to read at 300dpi why do you need to reset the resolution on output to remind it what it has received? Is there a better way that doesn't require a new 'read' of the PDF?
2. For other image types (TIFF etc.) you just need 'img=Magick::Image::read(srcFolder + '\\' + file).first' followed by the code below. Why is this different for the PDF format?

Code: Select all

          
# Change resolution to DPI
if img.units == Magick::PixelsPerCentimeterResolution
  img.x_resolution = img.x_resolution * 2.54
  img.y_resolution = img.y_resolution * 2.54
  img.units = Magick::PixelsPerInchResolution
end

# Write the hi-res image (limited to 300 dpi) in the original color space, at 100% quality
if img.x_resolution.to_i > 300
  img = img.resample(300, 300)
end
img.write(hiFolder + '\\' + destfile + ".jpg") { self.quality = 100 }
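
On question 1, one possible way to avoid the second read is to decide from the file name, before reading, whether a density is needed at all. This is only a sketch: the helper `needs_density?` and the extension list are assumptions, and the RMagick call is shown in a comment in the same form used above rather than executed.

```ruby
# Extensions assumed to be rasterised through Ghostscript (an assumption of this sketch).
VECTOR_EXTENSIONS = %w[.pdf .eps .ps].freeze

# Hypothetical helper: true if the file should be read with an explicit density.
def needs_density?(path)
  VECTOR_EXTENSIONS.include?(File.extname(path).downcase)
end

puts needs_density?("bate_optionsub_spot.pdf")
puts needs_density?("photo.tif")

# Intended use with RMagick (not executed here):
#   img = if needs_density?(path)
#           Magick::Image::read(path) { self.density = "300x300" }.first
#         else
#           Magick::Image::read(path).first
#         end
```

This does not explain why the resolution has to be reset after the read; it only removes the initial read that was used to discover the format.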