Convert many images to a pdf: which is first, convertion or combination?
I would like to convert about ~250 images (png and jpg files) in a directory to a pdf.
Can you provide the best way to do it?
What are the pros and cons of the following two ways:
- first use "convert" to convert each image from png or jpg to pdf, and then use "convert" to combine the pdf files into one;
- use "convert * my.pdf" to do it all in one command.
My requirement is that the whole process
- takes a reasonable amount of disk space, see viewtopic.php?f=1&t=26544
- and doesn't degrade the image quality, see viewtopic.php?f=1&t=26546
- fmw42
Re: Convert many images to a pdf: which is first, convertion or combination?
Are your pdf files totally vector or do they have raster images in them? Note that IM will rasterize each vector image and the result will be a vector shell around a raster image. So IM is not a very good tool to combine vector images. see http://www.imagemagick.org/Usage/formats/#vector
Re: Convert many images to a pdf: which is first, convertion or combination?
fmw42 wrote: Are your pdf files totally vector or do they have raster images in them? Note that IM will rasterize each vector image and the result will be a vector shell around a raster image. So IM is not a very good tool to combine vector images. see http://www.imagemagick.org/Usage/formats/#vector
All images are raster.
- fmw42
Re: Convert many images to a pdf: which is first, convertion or combination?
Then, besides snibgo's suggestion of using a Q8 compile of IM (ImageMagick), the only other way might be to use some of the memory control features of IM (see below). Q8 is an 8-bit compile of IM; Q16 is a 16-bit compile. So Q8 will use only half the memory of Q16.
see -limit and http://www.imagemagick.org/Usage/files/#massive, but I am not an expert on that and do not know if that will help.
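As a rough illustration of the -limit option mentioned above, the following sketch caps ImageMagick's memory use while combining images. The function name combine_limited and the 512MiB / 1GiB values are assumptions chosen for illustration, not tested recommendations; whether this helps with ~250 images would need to be measured.

```shell
# Sketch only: combine images into one PDF while capping ImageMagick's
# in-memory pixel cache. The limit values are placeholders; once the
# limits are exceeded, IM falls back to (slower) disk-based caching.
combine_limited() {
    out=$1
    shift
    convert -limit memory 512MiB -limit map 1GiB "$@" "$out"
}

# Usage (assumes the images are in the current directory):
#   combine_limited all.pdf *.png *.jpg
```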
Re: Convert many images to a pdf: which is first, convertion or combination?
Tim wrote: I would like to convert about ~250 images (png and jpg files) in a directory to a pdf.
The best approach very much depends on the answers to the following questions:
- Are the individual images within this set of 250 small or big in terms of their {width}x{height} dimensions?
- If they are big, do you need to scale them down to keep your PDF file from growing too large?
So, what are the sizes of your PNG and JPEG input files? What is the output of the following command, executed in your image directory?
Code: Select all
du -hsc *.png *.jpg *.jpeg
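The du command reports file sizes only. To answer the first question (pixel dimensions), something like the following sketch could be used. It relies on ImageMagick's identify tool with its %w, %h and %f format escapes; the function name list_dimensions is made up for this example.

```shell
# Sketch: print "width x height  filename" for each image given.
list_dimensions() {
    identify -format '%w x %h  %f\n' "$@"
}

# Usage (assumes the images are in the current directory):
#   list_dimensions *.png *.jpg
```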
Re: Convert many images to a pdf: which is first, convertion or combination?
pipitas wrote: The best approach very much depends on the answers to the following questions:
- Are the individual images within this set of 250 small or big in terms of their {width}x{height} dimensions?
- If they are big, do you need to scale them down to keep your PDF file from growing too large?
So, what are the sizes of your PNG and JPEG input files? What is the output of the following command, executed in your image directory?
Code: Select all
du -hsc *.png *.jpg *.jpeg
In general, a PDF created with 'convert my.jpg my.pdf' will have roughly the same size as the input JPEG. If you do not (need to) down-sample your images, the PDF you create will be roughly the same size as the 'du -hsc' command suggests...
84M.
I am not sure if what you ask me to consider is what I plan to do:
I will use -resize 2500x3072 to resize all the images (some are just 800x983) to the largest one among all the images, and use -density 300x300 for the spatial resolution for images in the resulting pdf file.
Is that okay? Do I miss anything?
Re: Convert many images to a pdf: which is first, convertion or combination?
I suggest you test the complete command chain you have in mind with a single test image.
The first command chain I have in mind is the following:
Code: Select all
convert 2500x3072.jpg 2500x3072.pdf
Now test what resolution and what PDF page size you got:
Code: Select all
pdfimages -list 2500x3072.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 2500 3072 rgb 3 8 jpeg no 8 0 72 72 47.0K 0.2%
Within a very large PDF page, your (very large) image is displayed by any PDF viewer (if set to 100% zoom) at 72 PPI.
Now let's make that PDF page smaller:
Code: Select all
gs -o 2500x3072-downsized.pdf -sDEVICE=pdfwrite -g8420x5950 -dPDFFitPage 2500x3072.pdf
and test the outcome again:
Code: Select all
pdfimages -list 2500x3072-downsized.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 2500 3072 rgb 3 8 image no 8 0 247 372 27.9K 0.1%
The tool now reports three things:
- The embedded image still has the same number of pixels (2500x3072).
- In relation to the new page size, the image has a resolution of 247 PPI in the X-direction and 372 PPI in the Y-direction.
- The image's file size has changed from 47.0kB to 27.9kB.
The third point can be handled by adding additional parameters to the Ghostscript scaling command, should this concern you for quality reasons.
My advice is: do not mess with 'density 300' for smaller images. It will only blow up your individual PDF's file size without giving you any gain in quality. Convert each image as is, directly. Then use the Ghostscript command to scale all PDFs to the same standard (A4?, letter?) size each (if you need that). This way you maintain the best control over the quality you'll get and do not blow up file sizes unnecessarily.
(For more hints about Ghostscript usage when downsampling images within files, changing their color spaces, or downscaling pages, see e.g. the following link: "How to downsample images within PDF file?": http://stackoverflow.com/a/9571488/359307 )
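The "convert each image as is" advice could be scripted roughly as follows. This is a minimal sketch, not a tested recipe: the function name images_to_pdfs is made up, and it assumes each output PDF should sit next to its image with the extension swapped.

```shell
# Sketch: convert each image to a one-page PDF without -resize or -density,
# so the pixels are embedded unchanged.
images_to_pdfs() {
    for img in "$@"; do
        [ -e "$img" ] || continue          # skip patterns that matched nothing
        convert "$img" "${img%.*}.pdf"     # a.png -> a.pdf, b.jpg -> b.pdf
    done
}

# Usage (assumes the images are in the current directory):
#   images_to_pdfs *.png *.jpg
```

Quoting "$img" keeps file names with spaces intact. Note that two images differing only in extension (a.png and a.jpg) would write to the same PDF name.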
Re: Convert many images to a pdf: which is first, convertion or combination?
pipitas wrote: ... pdfimages ...
Ah, very useful, excellent, thanks.
This answers the problem of extracting images from PDF files. That program writes one PAM file per image, no fuss, no worries about density. IM can read the PAM files in the usual way.
snibgo's IM pages: im.snibgo.com
Re: Convert many images to a pdf: which is first, convertion or combination?
snibgo wrote: Ah, very useful, excellent, thanks. This answers the problem of extracting images from PDF files. That program writes one PAM file per image, no fuss, no worries about density. IM can read the PAM files in the usual way.
The Windows version does not have all the options I used in my examples. It's still based on the original XPDF code base. On Linux, Mac OSX and Unix you can get a forked version (based on the "Poppler" fork of XPDF), which added some more options (like the '-list' I made use of). Unfortunately, the Poppler version is not readily available for Windows as a pre-compiled binary.
Re: Convert many images to a pdf: which is first, convertion or combination?
pipitas wrote: Now let's make that PDF page smaller:
Code: Select all
gs -o 2500x3072-downsized.pdf -sDEVICE=pdfwrite -g8420x5950 -dPDFFitPage 2500x3072.pdf
The third point can be handled by adding additional parameters to the Ghostscript scaling command, should this concern you for quality reasons.
My advice is: do not mess with 'density 300' for smaller images. It will only blow up your individual PDF's file size without giving you any gain in quality. Convert each image as is, directly. Then use the Ghostscript command to scale all PDFs to the same standard (A4?, letter?) size each (if you need that). This way you maintain the best control over the quality you'll get and do not blow up file sizes unnecessarily.
(For more hints about Ghostscript usage when downsampling images within files, changing their color spaces, or downscaling pages, see e.g. the following link: "How to downsample images within PDF file?": http://stackoverflow.com/a/9571488/359307 )
Thanks. I don't quite understand your reply yet.
According to your reply in the other question viewtopic.php?f=1&t=26544, I ran
Code: Select all
for i in *.png; do convert "${i}" -resize 2500x3080 -units PixelsPerInch -density 300x300 "${i/.png/.pdf}" ; done
for i in *.jpg; do convert "${i}" -resize 2500x3080 -units PixelsPerInch -density 300x300 "${i/.jpg/.pdf}" ; done
pdftk *.pdf cat output all.pdf
The all.pdf file seems okay, about 82MB, and in a pdf viewer, all the pages take up the same-size space on the screen.
2500x3080 is the largest size (both largest width and largest height) in pixels among all the image files, so I guess I don't lose any quality, and I enlarge some small images.
Is there some problem with it?
Do you recommend running the gs command on the pdf file(s) before or after combining the pdf files into one using pdftk?
Why?
Re: Convert many images to a pdf: which is first, convertion or combination?
Tim wrote: The all.pdf file seems okay
What's also interesting for future readers of this thread: how much faster was this method compared to your original one, which did it as
Code: Select all
convert * out.pdf
Tim wrote: in a pdf viewer, all the pages take up the same-size space on the screen,
This is because your PDF viewer is very likely set to "scale to fit" by default (maybe that is even the setting contained within the PDF files -- I didn't check; if so, it will only take effect if the viewer does not override it).
Tim wrote: all the pages take up the same-size space on the screen
Watch what the "Zoom" control in the window menu bar tells you when you toggle between pages which originated from smaller or larger images.
Tim wrote: Do you recommend to run the gs command on pdf file(s), before or after combining the pdf files into one using pdftk? Why?
I do not recommend it. I'm offering it as an optional step, in case it bothers you that the individual page sizes are different, or too big, formally speaking. But you may be happy with the file as it is...
To check for the real page sizes (which are hidden from you by the viewer's "scale to fit page" behavior), as well as for possible CropBox settings, run this command:
Code: Select all
pdfinfo -box -f 1 -l 300 all.pdf | less
In real life it doesn't matter how big the pages are scaled. As you see, there is always the "scale to page" setting in viewers -- the same is true for printer driver settings ("scale to page" or "scale to fit" -- mostly preselected as the default setting).
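To avoid paging through 250 entries in less, the pdfinfo output could be condensed into a count of distinct page sizes. The sketch below assumes Poppler's pdfinfo prints one line per page of the form "Page    1 size: 2500 x 3072 pts" when -f and -l are given; the function name page_size_summary and the default last page of 300 (taken from the command above) are assumptions.

```shell
# Sketch: count how many pages share each page size.
# $1 = PDF file, $2 = last page to inspect (defaults to 300).
page_size_summary() {
    pdfinfo -f 1 -l "${2:-300}" "$1" |
        awk '/^Page +[0-9]+ size:/ { print $4, $5, $6, $7 }' |
        sort | uniq -c
}

# Usage:
#   page_size_summary all.pdf
```

A single output line would mean all pages already have the same size; several lines show which sizes occur and how often.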
Re: Convert many images to a pdf: which is first, convertion or combination?
pipitas wrote: If you want to scale the pages to equal (and smaller) sizes, do it after combining the individual PDFs. Why? It's only one command then. Otherwise you have to do it for each page individually...
If the pages in a pdf file have different physical sizes (regardless of which zoom level is in use), which command can I use to scale all the pages to have the same physical size?
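One possible answer, sketched here with Ghostscript's pdfwrite device: force a fixed media size and fit every page into it. The flags -sPAPERSIZE, -dFIXEDMEDIA and -dPDFFitPage are standard Ghostscript options; the function name scale_to_a4 and the choice of A4 are assumptions for illustration. Since pdfwrite may re-encode the embedded images, the result is worth checking with pdfimages -list.

```shell
# Sketch: rewrite a combined PDF so that every page is A4,
# with the original page contents scaled to fit.
# $1 = input PDF, $2 = output PDF.
scale_to_a4() {
    gs -o "$2" -sDEVICE=pdfwrite -sPAPERSIZE=a4 -dFIXEDMEDIA -dPDFFitPage "$1"
}

# Usage:
#   scale_to_a4 all.pdf all-a4.pdf
```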
Re: Convert many images to a pdf: which is first, convertion or combination?
pipitas wrote: Unfortunately, the Poppler-version is not readily available for Windows as a pre-compiled binary.
It is, through Cygwin.
snibgo's IM pages: im.snibgo.com
Re: Convert many images to a pdf: which is first, convertion or combination?
snibgo wrote: It is, through Cygwin.
Ah. Good to know! -- Thanks for the hint.