PDF to JPG page size and resolution issues
Posted: 2007-07-21T18:28:39-07:00
I'm using RMagick (Ruby), i.e. ImageMagick version 6.3.0 11/22/06 Q8.
My app requires that a JPG preview has the same physical size (i.e. inches or centimeters) as the source image. The source images may be TIFF, PDF, EPS, JPG, etc. of ANY size. I'm trying to produce RGB JPG previews at 300 DPI, but the output dimensions MUST equal the input dimensions, i.e. a PDF of 85mm x 55mm (business card size) must result in a 300 DPI JPG of 85mm x 55mm, and a letterhead PDF of 210mm x 297mm must result in a 300 DPI JPG of 210mm x 297mm. In other words, the input files do NOT have any fixed PostScript page size.
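To make the requirement concrete, the pixel dimensions a 300 DPI preview needs follow directly from the physical size. A minimal sketch in plain Ruby (the helper name `pixels_for` is made up for illustration and is not part of RMagick):

```ruby
MM_PER_INCH = 25.4

# Hypothetical helper: pixel dimensions needed to show a given
# physical size (in millimeters) at a target DPI.
def pixels_for(width_mm, height_mm, dpi = 300)
  [width_mm, height_mm].map { |mm| (mm / MM_PER_INCH * dpi).round }
end

p pixels_for(85, 55)    # business card  => [1004, 650]
p pixels_for(210, 297)  # A4 letterhead  => [2480, 3508]
```

Any preview whose pixel count and stored resolution disagree with these figures will open at the wrong physical size.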
For example: I'm reading a PDF called 'bate_optionsub_spot.pdf'
If I do an identify I get:-
Code:
identify -verbose bate_optionsub_spot.pdf
Image: bate_optionsub_spot.pdf
Format: PDF (Portable Document Format)
Class: DirectClass
Geometry: 255x103
Type: ColorSeparation
Endianess: Undefined
Colorspace: CMYK
Channel depth:
Cyan: 8-bit
Magenta: 8-bit
Yellow: 8-bit
Black: 8-bit
Channel statistics:
Cyan:
Min: 0 (0)
Max: 250 (0.980392)
Mean: 3.70805 (0.0145414)
Standard deviation: 29.1035 (0.114131)
Magenta:
Min: 0 (0)
Max: 144 (0.564706)
Mean: 2.13547 (0.00837437)
Standard deviation: 16.7621 (0.0657337)
Yellow:
Min: 0 (0)
Max: 33 (0.129412)
Mean: 0.48597 (0.00190576)
Standard deviation: 3.83109 (0.0150239)
Black:
Min: 0 (0)
Max: 255 (1)
Mean: 37.3345 (0.14641)
Standard deviation: 85.9152 (0.336922)
Total ink density: 169%
Colors: 41
Histogram:
21146: #00000000 black
79: #00000022 black
191: #00000044 black
358: #00000088 black
53: #00000055 black
42: #00000077 black
67: #00000066 black
72: #000000DD black
3027: #000000FF black
111: #000000EE black
80: #00000099 black
124: #00000011 black
89: #00000033 black
84: #000000CC black
50: #000000AA black
198: #000000BB black
2: #000000D8 black
1: #000000F0 black
1: #000000E2 black
2: #000000D0 black
1: #000000E6 black
1: #000000E1 black
1: #000000C9 black
1: #000000B8 black
1: #000000D7 black
1: #000000E9 black
5: #10090200 cmyk(16,9,2,0)
2: #21130400 cmyk(33,19,4,0)
7: #321C0600 cmyk(50,28,6,0)
84: #42260801 cmyk(66,38,8,1)
2: #53300B01 cmyk(83,48,11,1)
2: #64390D01 cmyk(100,57,13,1)
6: #75430F01 cmyk(117,67,15,1)
22: #854D1102 cmyk(133,77,17,2)
1: #96561302 cmyk(150,86,19,2)
2: #A7601602 cmyk(167,96,22,2)
1: #B86A1802 cmyk(184,106,24,2)
1: #C8731A03 cmyk(200,115,26,3)
4: #D97D1C03 cmyk(217,125,28,3)
8: #EA861E03 cmyk(234,134,30,3)
335: #FA902104 cmyk(250,144,33,4)
Rendering intent: Undefined
Resolution: 28.35x28.35
Units: PixelsPerCentimeter
Filesize: 107.211kb
Interlace: None
Background color: white
Border color: cmyk(223,223,223,0)
Matte color: grey74
Transparent color: black
Page geometry: 255x103+0+0
Dispose: Undefined
Iterations: 0
Compression: Undefined
Orientation: Undefined
Signature: 414b03ff74f1c58fc3c05701ce7ab774b1ea5f26160f7deea522bc1e066e670b
Tainted: False
Version: ImageMagick 6.3.0 11/22/06 Q8 http://www.imagemagick.org
When I look at the original MediaBox/CropBox in the PDF (or view it in Acrobat Professional), this seems to be correct, i.e. 3.55 x 1.43 inches = 255x103 pixels at 28.35 pixels per centimeter (72 DPI).
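The conversion from the identify figures to a physical size can be checked with a few lines of plain Ruby. This is only a sketch; the helper name `inches_from_ppcm` is hypothetical and not an RMagick call:

```ruby
CM_PER_INCH = 2.54

# Hypothetical helper: physical size in inches from pixel dimensions
# and a resolution given in pixels per centimeter.
def inches_from_ppcm(width_px, height_px, ppcm)
  [width_px, height_px].map { |px| (px / ppcm / CM_PER_INCH).round(2) }
end

p inches_from_ppcm(255, 103, 28.35)   # => [3.54, 1.43]
puts((28.35 * CM_PER_INCH).round(2))  # the same resolution in DPI: 72.01
```

So the geometry and resolution reported by identify describe a page of roughly 3.54 x 1.43 inches rendered at the default 72 DPI.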
If I enter:
Code:
convert -density 300x300 -units PixelsPerInch -verbose bate_optionsub_spot.pdf out.jpg
Then the verbose listing gives:
Code:
[ghostscript library] -q -dBATCH -dSAFER -dMaxBitmap=500000000 -dNOPAUSE -dAlignToPixels=0 "-sDEVICE=bmpsep8" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-g1063x429" "-r300x300" "-sOutputFile=C:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-Xma6kF6d" "-fC:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-HmTuK5fw" "-fC:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-johD5.JT"
C:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-Xma6kF6d[0] BMP 1063x429 1063x429+0+0 PseudoClass 256c 8-bit 1.74535mb 0.040u 0:01
C:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-Xma6kF6d[1] BMP 1063x429 1063x429+0+0 PseudoClass 256c 8-bit 1.74535mb 0.030u 0:01
C:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-Xma6kF6d[2] BMP 1063x429 1063x429+0+0 PseudoClass 256c 8-bit 1.74535mb 0.020u 0:01
C:/DOCUME~1/CLIVEW~1.CLI/LOCALS~1/Temp/magick-Xma6kF6d[3] BMP 1063x429 1063x429+0+0 PseudoClass 256c 8-bit 1.74535mb 0.010u 0:01
bate_optionsub_spot.pdf PDF 1063x429 1063x429+0+0 DirectClass 8-bit 1.74535mb 0.150u 0:01
bate_optionsub_spot.pdf=>out.jpg PDF 1063x429 1063x429+0+0 DirectClass 8-bit 69.4238kb 0.090u 0:01
And the resultant image (in Photoshop) is:
- pixels: 1063 x 429
resolution: 762 dpi
dimension: 1.395 x 0.563 inches
So I tried to do the same thing direct in Ghostscript:
Code:
GSWIN32C -dNOPAUSE -r300 -dBATCH -sDEVICE=jpeg -sOutputFile="out.jpg" -dFirstPage=1 -dLastPage=1 -dGraphicsAlphaBits=4 bate_optionsub_spot.pdf
And got the output:
Code:
AFPL Ghostscript 8.54 (2006-05-17)
Copyright (C) 2005 artofcode LLC, Benicia, CA. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Loaded into Photoshop this gave me:-
- pixels: 1064 x 428
resolution: 300 dpi
dimension: 3.547 x 1.427 inches
So Ghostscript seems to work out the source page size from the CropBox whereas ImageMagick doesn't (or at least I can't figure out why it ends up at 762dpi).
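One arithmetic observation about the 762 figure: 300 x 2.54 = 762, and 1063 / 762 is about 1.395. That would be consistent with the value 300 ending up stored against centimeter units (the units identify reports for this PDF) and then being converted to DPI by the viewer. This is an inference from the numbers only, not confirmed ImageMagick behaviour. A sketch in plain Ruby, with a hypothetical helper name:

```ruby
CM_PER_INCH = 2.54

# Hypothetical helper: convert a pixels-per-centimeter value to DPI.
def ppcm_to_dpi(ppcm)
  ppcm * CM_PER_INCH
end

dpi = ppcm_to_dpi(300)        # assumption: 300 is read as pixels per centimeter
puts dpi.round                # 762
puts((1063 / dpi).round(3))   # width in inches:  1.395
puts((429 / dpi).round(3))    # height in inches: 0.563
```

These values match the Photoshop readout for the ImageMagick output (762 dpi, 1.395 x 0.563 inches), which is why the units are worth checking alongside the resolution value.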
Then I discovered that I can change what ImageMagick does by changing the PostScript page size. This seems bizarre, since ImageMagick already knows what the page size is (see the output of identify above), and I'm guessing that this may relate to other posts where each page of a PDF has a different page size. Surely IM can ascertain the page size from either the MediaBox or the CropBox for each page in the PDF?!
So my solution to this (via ruby) in order to get a 300dpi image is to:
Code:
img=Magick::Image::read(srcFolder + '\\' + file).first
followed by
Code:
if img.format == "PDF"
  images = Magick::Image::read(srcFolder + '\\' + file) {
    self.density = "300x300" # re-read the image sampling at 300 dpi - else the image is an exploded 72 dpi and is 'fuzzy'
    self.units = Magick::PixelsPerInchResolution
  }
  img = images.first
  img.x_resolution = 300 # reset the parameters - else the image is saved at some obscure DPI
  img.y_resolution = 300
  img.units = Magick::PixelsPerInchResolution
end
So my questions are:-
1. When you've told IM to read at 300dpi, why do you need to reset the resolution on output to remind it what it has received? Is there a better way that doesn't require a new 'read' of the PDF?
2. For other image types (TIFF etc.) you just need 'img=Magick::Image::read(srcFolder + '\\' + file).first' followed by the code below. Why is this different for the PDF format?
Code:
# change resolution to DPI
if img.units == Magick::PixelsPerCentimeterResolution
  img.x_resolution = img.x_resolution * 2.54
  img.y_resolution = img.y_resolution * 2.54
  img.units = Magick::PixelsPerInchResolution
end
# Write the hires image (limited to 300 dpi) in the original color space, at 100% quality
if img.x_resolution.to_i > 300
  img = img.resample(300,300)
end
img.write(hiFolder + '\\' + destfile + ".jpg"){ self.quality = 100 }