
Unable to parallelize pdf -> png conversion

Posted: 2019-06-25T09:27:15-07:00
by Jerleth
Hi,

I have the following problem:

I am trying to speed up single page conversions from a pdf to a png image by starting several magick processes in parallel, one for each CPU core I have available on my machine.

I am passing the pdf document as stdin (with a page number) and reading the stdout of the magick process to retrieve a specific page.

Basically I am running something like the following in parallel:

Code:

magick -density 300 -[0] png:-
magick -density 300 -[5] png:-
magick -density 300 -[15] png:-

The pdf that I pipe in is the same for all calls.

This all works fine, but I don't see a speedup or any additional CPU load. It is just as if I had executed them one after the other.

That's strange, because I have successfully achieved speedups (and of course higher CPU load) for rotating PNG images by invoking magick the same way, in parallel, using:

Code:

magick - -rotate 90 -

Since this approach works fine for rotation, I don't think it's my code. So I was wondering whether there is something peculiar about PDFs: maybe magick takes a system-wide lock instead of a per-process lock, or something like that?

I tried to look this up in the code on GitHub, but my crappy C++ skills failed me.

Does somebody have an idea?

Thanks in advance!

I am running on Win10 x64 and using the magick commandline with the following version:

Code:

Version: ImageMagick 7.0.7-6 Q16 x64 2017-10-04 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Visual C++: 180040629
Features: Cipher DPC Modules OpenMP
Delegates (built-in): bzlib cairo flif freetype jng jp2 jpeg lcms lqr openexr pangocairo png ps rsvg tiff webp xml zlib

Take care,
Martin

Re: Unable to parallelize pdf -> png conversion

Posted: 2019-06-25T09:40:51-07:00
by snibgo
You are using the same input PDF for all processes. Have you tried this with different PDFs? What happens then?

IM delegates the work of rasterizing the PDF to Ghostscript. "-verbose" will show the command used. I think IM creates a temporary file for the PDF, copies the input PDF to that, and calls Ghostscript for that temporary file. So I doubt a file locking problem.

You might try running Ghostscript directly. Can it run in multiple processes simultaneously? If not, then that's the problem.
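
A test along those lines could look like the sketch below. It assumes a POSIX shell and that the Ghostscript executable is called gs (on Windows the console executable is normally gswin64c, so set GS accordingly). The function names render_page and render_pages and the output names page-N.png are made up for illustration. Note that Ghostscript's -dFirstPage and -dLastPage count pages from 1, whereas IM's [0] index starts at 0.

```shell
#!/bin/sh
# Assumption: Ghostscript is on the PATH as "gs"; override GS if it is not.
GS="${GS:-gs}"

# render_page PDF PAGE OUT
# Rasterize a single (1-based) page of PDF to a 24-bit PNG at 300 dpi.
render_page() {
    "$GS" -q -dSAFER -sDEVICE=png16m -r300 \
        -dFirstPage="$2" -dLastPage="$2" -o "$3" "$1"
}

# render_pages PDF PAGE...
# Start one background Ghostscript process per page, then wait for all of them.
render_pages() {
    pdf=$1
    shift
    for page in "$@"; do
        render_page "$pdf" "$page" "page-$page.png" &
    done
    wait
}

# Example (pages 0, 5 and 15 in IM numbering):
#   render_pages input.pdf 1 6 16
```

If the parallel run takes about as long as rendering the same pages one after another, the bottleneck is in Ghostscript (or in disk or memory) rather than in IM.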

Re: Unable to parallelize pdf -> png conversion

Posted: 2019-06-25T11:36:59-07:00
by bratpit
Use Ghostscript directly.
On my Linux machine it is about 3 times faster than IM itself.

Ghostscript cannot use multiple threads to process a single multi-page PDF file.

But if you have several PDF files, you can process each one on its own processor core.
I don't know Win10, but on Linux this is pretty simple with the find and xargs commands.
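
A rough sketch of the find and xargs approach, assuming GNU or BSD xargs (the -0 and -P options are extensions, not POSIX). The function name convert_all and the output naming are hypothetical, and GS can be set to point at a different Ghostscript executable.

```shell
#!/bin/sh
# Assumption: Ghostscript is on the PATH as "gs"; override GS if it is not.
GS="${GS:-gs}"

# convert_all DIR JOBS
# Find every *.pdf under DIR and run up to JOBS Ghostscript processes at once,
# one process per PDF file. Each page is written as <file>.pdf-NNN.png.
convert_all() {
    find "$1" -type f -name '*.pdf' -print0 |
        xargs -0 -P "$2" -I{} \
            "$GS" -q -dSAFER -sDEVICE=png16m -r300 -o '{}-%03d.png' '{}'
}

# Example: convert all PDFs below ./pdfs using four processes.
#   convert_all ./pdfs 4
```

This only helps when there are several PDF files. Each individual file is still handled by a single Ghostscript process.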

Re: Unable to parallelize pdf -> png conversion

Posted: 2019-06-25T13:05:01-07:00
by snibgo
Another possibility is a memory problem. When IM is used, Ghostscript rasterizes the entire document and passes that to IM which reads the entire rasterized document into memory. This might be hundreds of pages.

When using Ghostscript directly, I don't suppose it needs to hold all rasterized pages in memory at the same time.

Re: Unable to parallelize pdf -> png conversion

Posted: 2019-06-25T13:21:51-07:00
by Jerleth
Thank you guys!
That's a lot of valuable input!
I will investigate it in the next couple of days.

Take care,
Martin