How to get IM to process one page at a time in PDF file

Posted: 2014-05-02T16:41:08-07:00
by x0054
I have a large PDF file (100+ pages). It is a scanned document, scanned in color, so it's rather large. I am using IM to compress it to Group 4 TIFF. The problem I am running into is that IM is eating up all my RAM, and I have 15 GB of it! It looks like it is loading all 100+ pages into memory as uncompressed bitmaps, then working on them, and then saving them to TIFF. Is there a way to instruct IM to work on only one page at a time? That is, load the first page into memory, apply filters, save to TIFF, load the next page, and so on?

I am sure there is a bash way to do that, but is there any way it can be done internally?

- Bogdan

Re: How to get IM to process one page at a time in PDF file

Posted: 2014-05-02T17:43:46-07:00
by fmw42
You can loop over each layer (page) using image.pdf[x], where x=0 is the first layer, and so on. But I am not sure whether IM will still try to load the full image with all its layers just to get layer 0. I don't think there are any controls for reading only certain portions of the image, as there are with jpg. But I cannot say for sure. You can try

convert image.pdf[0] -compress group4 image.tiff

and see if it runs faster. If so, then just build a bash shell script to loop over each layer using the above command, where 0 is replaced with the layer number from the looping structure.
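The loop described above could be sketched roughly as follows. This is only a minimal sketch: the function name pdf_to_g4_pages, the page-NNNN.tiff output naming, and passing the page count in by hand are all assumptions for illustration, not anything IM provides. The page count is an argument because asking IM itself to count the pages may trigger the same full load you are trying to avoid.

```shell
# Hypothetical helper: convert a PDF to one Group 4 TIFF per page,
# running a separate `convert` for each page so that only one page
# is requested from IM at a time.
#   $1 = input PDF
#   $2 = number of pages (supplied by the caller; an assumption of this sketch)
#   $3 = output directory (defaults to the current directory)
# The body runs in a subshell "( ... )" so its variables stay local.
pdf_to_g4_pages() (
    pdf=$1
    pages=$2
    outdir=${3:-.}

    mkdir -p "$outdir" || exit 1

    i=0
    while [ "$i" -lt "$pages" ]; do
        # image.pdf[i] selects page i (zero-based), as in the command above.
        out=$(printf '%s/page-%04d.tiff' "$outdir" "$i")
        convert "${pdf}[${i}]" -compress group4 "$out" || exit 1
        i=$((i + 1))
    done
)
```

For a 120-page scan this would be called as: pdf_to_g4_pages scan.pdf 120 out. Whether it actually lowers peak memory depends on whether IM reads only the requested page, so it is worth timing the single-page command first, as suggested above.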

IM relies upon Ghostscript to read PDF files. So you might see if you can run Ghostscript directly to do what you want in a loop.
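If you go the Ghostscript route, something like the sketch below may be a starting point. It assumes your Ghostscript build includes the tiffg4 output device; the function name pdf_to_g4_gs, the 300 dpi default, and the output naming are made up for illustration. With %04d in the output file name, Ghostscript writes one file per page, so an explicit loop may not even be needed. The 1-bit result may not look the same as what IM produces, so check a few pages first.

```shell
# Hypothetical helper: let Ghostscript render the PDF straight to
# Group 4 TIFF, one output file per page.
#   $1 = input PDF
#   $2 = output directory (defaults to the current directory)
#   $3 = resolution in dpi (defaults to 300; an assumption of this sketch)
# The body runs in a subshell "( ... )" so its variables stay local.
pdf_to_g4_gs() (
    pdf=$1
    outdir=${2:-.}
    dpi=${3:-300}

    mkdir -p "$outdir" || exit 1

    # -dNOPAUSE -dBATCH : run without prompting and exit when done
    # -sDEVICE=tiffg4   : 1-bit TIFF output with Group 4 compression
    # %04d              : replaced by the page number (starting at 1)
    gs -q -dNOPAUSE -dBATCH \
       -sDEVICE=tiffg4 -r"$dpi" \
       -sOutputFile="$outdir/page-%04d.tiff" \
       "$pdf"
)
```

Called as: pdf_to_g4_gs scan.pdf out 300. To process a single page instead, Ghostscript also accepts -dFirstPage=N -dLastPage=N, which could be used inside a loop like the one suggested above.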

I do not know if this will help but see http://cilab.math.upatras.gr/mikeagn/co ... e-pdf-file