Convert dies when working with multi-gigabyte files. It is a memory problem: convert keeps all of the images in memory instead of paging them out to disk.
What I am doing: I am creating PDF files.
Methodology: Scan -> huge PDF file -> extract images as PNGs -> edit the individual images -> create a multi-page TIFF file -> create the final PDF
I am trying to do the extractions and creations as batch jobs. I'm scanning at 600dpi, and the software I have does the scanning and the initial creation of the PDF document. For my example I will use stats from a multi-thousand-page document.

First: my software will not handle over 1,000 pages per PDF document, so I reduced the number of pages per document to 750. The initial scan created a 4.5GB PDF document. I extracted it to a set of PNG images with a program called pdftopng. It worked the first time, but now it just dies whenever it attempts the extraction again; I think I broke it. Anyway, I extract to PNG so that the images are exactly what was scanned.

I then run some batch programs I wrote that let me do various things to the individual images: rotating them, renumbering all of the pages, moving pages from one location to another, lightening or darkening the images, and so on. Once I am through doing that, I create a multi-page TIFF file. I do this because the original 4.5GB PDF document became around a 950MB TIFF file.

Finally I use convert to turn the TIFF into a PDF file. Only, convert dies. It gobbles up all available memory, then all of the page file space, and then it dies. It takes about two hours for it to die.
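The pipeline above can be sketched as a handful of commands. This is only an illustration under my assumptions: pdftopng (from the xpdf/poppler utilities) and ImageMagick's convert are the tools, and the filenames are stand-ins; the per-image editing step is whatever batch programs you use.

```shell
# Hypothetical sketch of the pipeline; filenames are stand-ins.

# 1. Extract the scanned PDF to one PNG per page at 600dpi
#    (pdftopng writes page-000001.png, page-000002.png, ...)
pdftopng -r 600 scan.pdf page

# 2. ...run your own batch edits here (rotate, renumber, levels)...

# 3. Collect the edited PNGs into one multi-page TIFF
#    (Group4 compression suits black-and-white scans)
convert page-*.png -compress Group4 book.tif

# 4. Convert the multi-page TIFF to the final PDF
convert book.tif book.pdf
```

Step 4 is the one that dies on multi-gigabyte inputs, as described below.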
I did a lot of reading up on this. Convert uses GhostScript to do the translation to PDF, and the ImageMagick people say "Go talk to the GhostScript people about this." I did so.
However, I also kept looking around the internet for an alternate solution, because the GhostScript people's response was "Don't do that." Yeah. Ever so helpful. One person did suggest that I go out and buy a 64-bit computer and just put lots and lots of memory in the machine. I'm seeing thousands of dollars flying away simply because the GhostScript people don't want to make GhostScript disk-based instead of memory-based (which it is so that it goes as fast as possible). The advantage of a disk-based system, though, is that as long as you have space on your hard drive you can keep building a larger file. So these huge PDF files really could be handled on a 32-bit system if it were disk-based.
But I digress. I went out and looked, and I wanted to point out to ImageMagick that there is a DLL out there called imPDF that can be included in any open-source software for free. imPDF is part of IrfanView and it will create PDFs; ImageMagick might want to check it out, as it could become part of the convert program. imPDF also has a huge number of options available, like built-in security, passwords, etc. I'm not sure whether it extracts from PDFs, but the write-up does say it will create them. I'm trying out IrfanView to see if it can handle these huge files I have, and I just thought that maybe ImageMagick would like to check imPDF out as an alternative to GhostScript.
Later!
convert : Working with multi-gigabyte files
Re: convert : Working with multi-gigabyte files
ImageMagick can process thousands of images to / from PDF. Just force the pixel processing to disk rather than memory, for example:

    convert -monitor -limit memory 2GiB -limit map 4GiB -define registry:temporary-path=/data/tmp image.pdf image%02d.pdf
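The same resource limits can be applied to the TIFF-to-PDF step from the original post. This is a sketch, not a tested recipe: `book.tif` is a stand-in filename, and the 2GiB/4GiB limits and `/data/tmp` path are examples to size for your own machine and a disk with enough free space.

```shell
# Cap the pixel cache in RAM (memory) and memory-mapped files (map);
# anything beyond those limits spills to files under temporary-path.
# -monitor prints progress so a long-running job isn't silent.
convert -monitor \
        -limit memory 2GiB -limit map 4GiB \
        -define registry:temporary-path=/data/tmp \
        book.tif book.pdf
```

With these limits the job runs slower (disk-bound instead of RAM-bound), but it no longer exhausts memory and the page file on a 32-bit machine.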