Efficiently generating thumbnails of a pdf in php

Posted: 2014-01-16T17:38:39-07:00
by chadcf
Hi all,

We have some legacy code that generates a pdf preview by converting the pdf to 250px png thumbnails of each page (so 10 thumbnails for a 10 page pdf). This has been thrashing our server royally as load has grown, because the original developer did the thumbnails by writing the source pdf to disk and then doing a system call (from PHP) 3 times for each file. We've switched over to using Imagick for PHP instead. Currently we're doing something simple like in the PHP manual:

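// Rasterize every page of the rendered PDF from memory
$image = new Imagick();
$image->readImageBlob($pdf->render());
$image->resetIterator();
// Join all pages into a single image (false = left to right)
$image = $image->appendImages(false);
$image->setImageFormat("png");
// Scale to 250px high; 0 lets the width follow the aspect ratio
$image->scaleImage(0, 250);
return array(base64_encode($image->getImageBlob()));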
After doing some load testing this has been a HUGE improvement in terms of server load. However, while it's much less likely to kill the server, we've found it's not actually any faster than doing it via command line calls. I suppose this somewhat makes sense... But what I'm trying to figure out is if there is any way we can optimize this process to be quicker. I've tried a few things like resizing the pdf before converting or calling $image->setSize() before reading it in, but those didn't seem to apply to pdfs.
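
One read-time setting that does apply to PDFs is the resolution: Imagick's setResolution(), called before the blob is read, sets the density at which each page is rasterized. The sketch below is only an illustration of that idea, not a benchmarked fix. The function name pdfPageThumbnails, the $pdfBytes parameter (standing in for the result of $pdf->render()) and the 36 dpi default are assumptions made for the example; the right density depends on page size and on how sharp the thumbnails need to be.

```php
<?php
// Hypothetical helper: one base64-encoded PNG thumbnail per PDF page.
// $pdfBytes is assumed to hold the raw PDF data, e.g. from $pdf->render().
function pdfPageThumbnails(string $pdfBytes, int $height = 250, float $dpi = 36.0): array
{
    $image = new Imagick();
    // Has to happen before reading: it sets the density used to rasterize
    // the pages. 36 dpi is a guess to tune, not a recommendation.
    $image->setResolution($dpi, $dpi);
    $image->readImageBlob($pdfBytes);

    $thumbnails = [];
    foreach ($image as $page) {
        $page->setImageFormat('png');
        // 0 lets the width follow the aspect ratio; thumbnailImage() also
        // removes profiles, which keeps the output small.
        $page->thumbnailImage(0, $height);
        $thumbnails[] = base64_encode($page->getImageBlob());
    }
    $image->clear();

    return $thumbnails;
}
```

Called as pdfPageThumbnails($pdf->render()), this returns an array with one entry per page instead of a single appended strip. Whether a lower density actually shortens the conversion is something to measure under your own load.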

Any tips would be appreciated!

Re: Efficiently generating thumbnails of a pdf in php

Posted: 2014-01-16T18:35:22-07:00
by fmw42
I am not an expert with Imagick. But your code seems to be not much different from the following command line that could be run from PHP exec().

convert image.pdf -thumbnail 250x250 image_%d.png

Using -thumbnail or the Imagick equivalent will make the files smaller by automatically removing any profiles and other metadata. You can also do this by using -strip -resize 250x250 or -strip -scale 250x250.
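
For reference, a rough sketch of what the "Imagick equivalent" of those options could look like in PHP. The helper names thumbnailFit and stripAndScale are made up for the example; the method calls themselves (thumbnailImage, stripImage, scaleImage) are part of the Imagick extension. Treat the mapping to the command-line options as approximate.

```php
<?php
// Approximate counterpart of "-thumbnail 250x250": fit the image inside
// the box and drop profiles in a single call.
function thumbnailFit(Imagick $image, int $box = 250): void
{
    $image->thumbnailImage($box, $box, true); // true = best fit, keeps aspect ratio
}

// Approximate counterpart of "-strip -scale 250x250".
function stripAndScale(Imagick $image, int $box = 250): void
{
    $image->stripImage();                 // remove profiles and comments
    $image->scaleImage($box, $box, true); // true = best fit, keeps aspect ratio
}
```

Both helpers modify the Imagick object in place, so they can be applied to each page while iterating over a multi-page PDF.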

Re: Efficiently generating thumbnails of a pdf in php

Posted: 2014-01-16T21:00:01-07:00
by chadcf
Yes, that's another option. We currently do it that way; however, like I said, the problem we are running into is that at high load it's thrashing the server. Our metrics indicate the system calls to imagemagick are by far the most expensive part of our app. However, the person who wrote it did it in a rather terrible way where it basically loops over each page of the pdf and does 3 system calls to generate a thumbnail, per page of the pdf. Of course this also creates files that need to be cleaned up (they're one-time use) and a lot of disk IO.

So my thought was, if we can just do this all in memory we might be better off. It's just proving somewhat slow to do it all in memory sadly... Though for all I know it's still writing temp files or something, so I was hoping there were some options we could use to either speed things up or avoid disk writes or something.
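
On the temp-file question: ImageMagick keeps pixel data in RAM only up to its configured resource limits and moves to memory-mapped or on-disk caches beyond them, so "in memory" is not guaranteed. PDF pages are also normally rasterized by an external delegate (typically Ghostscript), which may involve files of its own. The sketch below only reads the limits the extension reports, as a starting point for checking; the function name imagickCacheLimits is hypothetical, and the units of the returned values depend on the ImageMagick build.

```php
<?php
// Hypothetical helper: collect the resource limits reported by Imagick.
function imagickCacheLimits(): array
{
    return [
        'area'   => Imagick::getResourceLimit(Imagick::RESOURCETYPE_AREA),
        'memory' => Imagick::getResourceLimit(Imagick::RESOURCETYPE_MEMORY),
        'map'    => Imagick::getResourceLimit(Imagick::RESOURCETYPE_MAP),
        'disk'   => Imagick::getResourceLimit(Imagick::RESOURCETYPE_DISK),
    ];
}

print_r(imagickCacheLimits());
```

If large PDFs exceed the memory and map limits, some disk activity would be expected even when no files are written explicitly from PHP.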

Re: Efficiently generating thumbnails of a pdf in php

Posted: 2014-01-16T21:21:35-07:00
by fmw42
The command line command is about as efficient as I can see. It reads the image once and writes out one image per page. I am not sure whether any intermediate images are being used. But I am not an expert on speed and memory use.

Perhaps I do not understand in what way you want to do it more efficiently in memory. You have to read in each image and write out multiple images, one per pdf page. Did you have some other issue in mind?

If you have a lot of these to do and are willing to append all the thumbnails, you can use mogrify, though that might digress from your objectives. You could then crop the appended thumbnails into their individual images.

Example:

# create multipage pdf
convert rose: rose: rose: rose.pdf

# process it
mogrify -format png -thumbnail 25x25 -append rose.pdf
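
The cropping step mentioned above could be sketched in PHP roughly as follows. This assumes the thumbnails were stacked top to bottom (as -append does) and that every page has the same height, which will not hold for PDFs with mixed page sizes. The function name splitAppendedStrip and its parameters are made up for the example.

```php
<?php
// Hypothetical helper: cut a vertically appended strip back into
// equal-height tiles, one per page.
function splitAppendedStrip(Imagick $strip, int $pageCount): array
{
    $width      = $strip->getImageWidth();
    $tileHeight = intdiv($strip->getImageHeight(), $pageCount);

    $tiles = [];
    for ($i = 0; $i < $pageCount; $i++) {
        $tile = clone $strip;
        $tile->cropImage($width, $tileHeight, 0, $i * $tileHeight);
        // Reset the virtual canvas so the tile does not keep the crop offset.
        $tile->setImagePage(0, 0, 0, 0);
        $tiles[] = $tile;
    }

    return $tiles;
}
```

Each clone holds a full copy of the strip before it is cropped, so for long documents this trades memory for simplicity.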

Re: Efficiently generating thumbnails of a pdf in php

Posted: 2014-01-17T00:58:29-07:00
by Bonzo
chadcf wrote: "doing a system call (from PHP) 3 times for each file"

That is strange; I wonder what he was doing.

Anyway, as fmw42 says, it should be possible to do it all in one line with exec() in PHP.
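
A minimal sketch of that single exec() call, using the convert command given earlier in the thread. The helper names buildThumbnailCommand and runThumbnailCommand are hypothetical, and the example assumes convert is on the PATH of a POSIX system.

```php
<?php
// Hypothetical helper: build the one-line command from the thread.
// Note: on Windows escapeshellarg() replaces % characters, so the %d
// output pattern would need different handling there.
function buildThumbnailCommand(string $pdfPath, string $outPattern, int $size = 250): string
{
    return sprintf(
        'convert %s -thumbnail %dx%d %s',
        escapeshellarg($pdfPath),
        $size,
        $size,
        escapeshellarg($outPattern)
    );
}

// Runs the command once for the whole PDF; returns true on exit status 0.
function runThumbnailCommand(string $pdfPath, string $outPattern, int $size = 250): bool
{
    exec(buildThumbnailCommand($pdfPath, $outPattern, $size) . ' 2>&1', $output, $status);
    return $status === 0;
}
```

For example, runThumbnailCommand('image.pdf', 'image_%d.png') would make one system call per PDF rather than three per page, though it still writes the thumbnails to disk.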