How to convert a large PDF with limited resources
We are using ImageMagick in our program to convert PDF files to images, with -limit disk and -limit memory set.
We do not want the program to consume too much memory or disk space.
When the PDF files are very large, this naturally leads to the error CacheResourcesExhausted.
Is "stream" the only way to handle this? Or can we use some function to convert the PDF portion by portion and then merge the portions into one image?
Looking forward to your suggestions.
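For reference, the kind of command we run looks roughly like this (the limit values and file names are only illustrative):
Code: Select all
convert -limit memory 256MiB -limit disk 1GiB -density 200 input.pdf output.png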
- GeeMack
Re: How to convert a large PDF with limited resources
liys_0 wrote: Is "stream" the only way to handle this? Or can we use some function to convert the PDF portion by portion and then merge the portions into one image?
It would be helpful if you could describe your set-up, like what platform you're working on and what version of ImageMagick.
I use ImageMagick version 7 on a Windows 7 64-bit machine. I haven't done much with PDF files, but if it's a multi-page document you can open just one page, several individual pages, or a range of pages by specifying the page(s) in square brackets, as in these examples.
This would convert just page 5. (The indexing starts at number 0.)...
Code: Select all
convert -density 300 document1.pdf[4] -resize 2550x3300 document1_04.jpg
This would convert pages 7 through 12 and output 6 individual files. (The operator "-scene 6" will start numbering the output files at 06.)...
Code: Select all
convert -density 300 document1.pdf[6-11] -resize 2550x3300 -scene 6 document1_%02d.jpg
This would convert pages 7 through 12 then append them to a single very long file...
Code: Select all
convert -density 300 document1.pdf[6-11] -resize 2550x3300 -append document1_06-11.jpg
Maybe you can stay inside your resource limit by building a "for" loop to process your PDFs in pieces smaller than the entire file at once.
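A minimal sketch of such a loop as a BAT file, assuming a 30-page PDF processed six pages at a time (the page count, chunk size, and file names are placeholders):
Code: Select all
@echo off
rem chunks.bat -- sketch only: convert a 30-page PDF six pages at a time.
setlocal enabledelayedexpansion
for /L %%S in (0,6,24) do (
  set /A E=%%S+5
  convert -density 300 document1.pdf[%%S-!E!] -resize 2550x3300 -scene %%S document1_%%02d.jpg
)
Each convert call only asks for six pages, so the peak resource usage should stay closer to a six-page job than to the whole document.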
Re: How to convert a large PDF with limited resources
ImageMagick may not be the best tool for this, depending on the source file.
If your source PDF container is simply one large raster image per page, with no vector graphics or other PDF features, then it's better to use the pdfimages tool to extract each image to a file, which is not computationally intensive, and in fact non-lossy as well. Of course, if these are fancier vector PDFs, then you must render each page, in which case you'll need ImageMagick, LaTeX, or the burst feature of pdftk.
If disk space is important and marginal lossiness is acceptable, consider converting the documents to the DjVu format.
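For example, the pdfimages tool from poppler-utils can pull the embedded images out of a raster-only PDF without re-rendering anything (the output prefix here is just a placeholder):
Code: Select all
pdfimages -j document1.pdf document1_img
The -j option writes embedded JPEG images out as JPEG instead of converting them to PNM.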
Re: How to convert a large PDF with limited resources
GeeMack wrote: It would be helpful if you could describe your set-up, like what platform you're working on and what version of ImageMagick.
Thanks GeeMack. Your suggestion is helpful in some cases. :)
We are now working on Windows 7 with ImageMagick-6.9.3-4-Q8-x86.
Some of our PDF files contain only one very large page.
For example, when we use -density 200 to convert one such one-page PDF to a .png file, the generated file is 26400x35200 pixels.
We have to increase the disk limit to 4GB for that conversion, and as far as we know it is not even the largest file our users may send.
In this case, can we convert one page portion by portion to use fewer resources? :)
Re: How to convert a large PDF with limited resources
atariZen wrote: If disk space is important and marginal lossiness is acceptable, consider converting the documents to the DjVu format.
Thanks for your suggestions. :)
The PDF files are vector PDFs. Do you mean that we should convert the PDFs to DjVu first and then convert the DjVu files to images?
- GeeMack
Re: How to convert a large PDF with limited resources
liys_0 wrote: In this case, can we convert one page portion by portion to use fewer resources? :)
If you know the PDF is just a single-page document, you could start by using ImageMagick to determine the output dimensions of the image with something like this from the command line...
Code: Select all
convert -density 200 largefile.pdf info:
Then when you know the size in pixels, you can read just a part of the PDF into IM's "convert" by putting the geometry of the requested portion in square brackets at the end of the file name.
For example, I checked the dimensions of my "largefile.pdf" using the command above and found it will make a 2400x2400 pixel PNG. Then if I want to get just the top left quarter of the PDF I would use a command like this...
Code: Select all
convert -density 200 largefile.pdf[1200x1200+0+0] -flatten part1A.png
Similar commands with adjusted offsets would produce all four quarters...
Code: Select all
convert -density 200 largefile.pdf[1200x1200+0+0] -flatten part1A.png
convert -density 200 largefile.pdf[1200x1200+1200+0] -flatten part1B.png
convert -density 200 largefile.pdf[1200x1200+0+1200] -flatten part2A.png
convert -density 200 largefile.pdf[1200x1200+1200+1200] -flatten part2B.png
Then the four parts can be re-assembled into one large image...
Code: Select all
convert ( part1A.png part1B.png +append ) ( part2A.png part2B.png +append ) -append largefile.png
Automating the disassembly so it can handle varying sizes of input files would require a slightly tricky BAT file with some nested "for" loops to get the image size into variables, calculate the part geometries from those variables, and create unique, meaningful file names for all the output files. But that's a Windows programming issue, not an ImageMagick one.
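A rough sketch of such a BAT file, assuming the file names from the example above and that the rendered dimensions divide evenly in half:
Code: Select all
@echo off
rem split4.bat -- sketch only: read the rendered size, crop the PDF into quarters, re-assemble.
setlocal
for /F "tokens=1,2" %%A in ('identify -density 200 -format "%%w %%h" largefile.pdf') do (
  set /A HALFW=%%A/2
  set /A HALFH=%%B/2
)
convert -density 200 largefile.pdf[%HALFW%x%HALFH%+0+0] -flatten part1A.png
convert -density 200 largefile.pdf[%HALFW%x%HALFH%+%HALFW%+0] -flatten part1B.png
convert -density 200 largefile.pdf[%HALFW%x%HALFH%+0+%HALFH%] -flatten part2A.png
convert -density 200 largefile.pdf[%HALFW%x%HALFH%+%HALFW%+%HALFH%] -flatten part2B.png
convert ( part1A.png part1B.png +append ) ( part2A.png part2B.png +append ) -append largefile.png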
I don't know how to run a "convert" command on a particular page of a multi-page PDF and have it use just a segment of the page, since both those processes use square brackets at the end of the file names to specify the details. I tried using two sets of square brackets and had no success. Someone else here might know a way to make that happen. It may not be possible.
Re: How to convert a large PDF with limited resources
liys_0 wrote: The PDF files are vector PDFs. Do you mean that we should convert the PDFs to DjVu first and then convert the DjVu files to images?
Since you're starting with vector PDFs, and must have non-DjVu images in the end, I don't think my suggestions will help much. I would only use the lossy DjVu format as an intermediate step if you desperately needed to hack around a problem, but I don't think that will help you. In fact, DjVu processing is very resource intensive.
Re: How to convert a large PDF with limited resources
GeeMack wrote: I don't know how to run a "convert" command on a particular page of a multi-page PDF and have it use just a segment of the page... It may not be possible.
Thanks very much for your detailed explanation! This is just what I thought.
Re: How to convert a large PDF with limited resources
atariZen wrote: Since you're starting with vector PDFs, and must have non-DjVu images in the end, I don't think my suggestions will help much...
Thanks.
- snibgo
Re: How to convert a large PDF with limited resources
GeeMack wrote: convert -density 200 largefile.pdf[1200x1200+0+0] -flatten part1A.png
But does that help the resource problem? With "-verbose" we can see the Ghostscript command. I get the same command with or without the geometry spec.
Putting [0] as the page spec, I get a different GS command that includes "-dFirstPage=1 -dLastPage=1".
With the geometry spec, I conclude that IM tells GS to render all the pages, and all of each page, and then IM discards much of the data. So it may not help much (depending on how much memory GS uses).
For this type of problem, I would ignore IM, and see what options GS has for rendering parts of pages.
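For instance, Ghostscript's -g option (device size in pixels), -dFIXEDMEDIA, and the PageOffset page-device parameter can be combined to render one fixed-size window of a page at a time. This is only a rough sketch; the executable name and the offsets (in points, measured from the lower-left corner) would need checking on your system:
Code: Select all
rem Render one 1200x1200 pixel window of largefile.pdf at 200 dpi; shift PageOffset to reach other regions.
gswin64c -dBATCH -dNOPAUSE -sDEVICE=png16m -r200 -g1200x1200 -dFIXEDMEDIA -sOutputFile=tile_0_0.png -c "<</PageOffset [0 0]>> setpagedevice" -f largefile.pdf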
snibgo's IM pages: im.snibgo.com
- GeeMack
Re: How to convert a large PDF with limited resources
snibgo wrote: But does that help the resource problem?
I just drive the thing. I don't know what goes on under the hood. It stands to reason there would be no substantial savings if IM is calling the same GS command in either instance.
snibgo wrote: For this type of problem, I would ignore IM, and see what options GS has for rendering parts of pages.
Sounds like a good plan. GS can break a PDF document into pieces and output them directly as PNG images without any help from IM at all.
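A minimal example of that, assuming the Windows Ghostscript console executable and placeholder file names:
Code: Select all
rem One PNG per page at 200 dpi, straight from Ghostscript, with no IM involved.
gswin64c -dBATCH -dNOPAUSE -sDEVICE=png16m -r200 -sOutputFile=document1_%03d.png document1.pdf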
Re: How to convert a large PDF with limited resources
snibgo wrote: With "-verbose" we can see the Ghostscript command... For this type of problem, I would ignore IM, and see what options GS has for rendering parts of pages.
Thank you. This sounds good. Is there any way to check the memory used by Ghostscript when doing the PDF-to-image conversion?
- snibgo
Re: How to convert a large PDF with limited resources
Your operating system will have tools to monitor memory usage.
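On Windows, for example, something like this shows the memory column for a running Ghostscript process (the process name is an assumption; a 32-bit install may use gswin32c.exe instead):
Code: Select all
tasklist /FI "IMAGENAME eq gswin64c.exe"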
snibgo's IM pages: im.snibgo.com
Re: How to convert a large PDF with limited resources
snibgo wrote: Your operating system will have tools to monitor memory usage.
Thanks, I have already done this, and found that Ghostscript uses far fewer resources than ImageMagick.
To convert a PDF file with dimensions 132 in x 176 in at density 200:
IM consumes about 4GB of disk; GS takes less than 200MB.
Their memory usage is almost the same.