
Converting PDF to JPG when using searchable images

Posted: 2018-04-01T20:47:38-07:00
by Scarred Sun
I have a PDF-to-multiple-JPG conversion set up on my website, and until now I have been able to convert PDFs to multiple JPGs just fine. That changed when I moved to searchable-image PDFs. These PDFs are set up very simply: one JPG image per page plus the searchable text layer. Ideally I'd keep the searchable text in the PDFs when running them through ImageMagick. Is there any way to have the converter just ignore the text and make JPGs?

Edit: the exact error message I receive when trying to do this is
Error creating thumbnail: convert: no decode delegate for this image format `' @ error/constitute.c/ReadImage/504. convert: no images defined `/tmp/transform_2dc9f6e73735.jpg' @ error/convert.c/ConvertImageCommand/3258.

Re: Converting PDF to JPG when using searchable images

Posted: 2018-04-01T21:42:17-07:00
by fmw42

Re: Converting PDF to JPG when using searchable images

Posted: 2018-04-07T13:10:09-07:00
by Scarred Sun
So, I tried switching from Ghostscript to pdfimages and am running into a similar problem. Upon running

Code: Select all

(/usr/bin/pdfimages -f 1 -l 1 -j -p /path/to/myfile.pdf pdf | '/usr/bin/convert' '-depth' '8' '-quality' '95' '-resize' '190' '-' '/tmp/transform_6ee557c14986.jpg')
I still get

Code: Select all

Error creating thumbnail: I/O Error: Couldn't open image file 'pdf-001-000.jpg' convert: no decode delegate for this image format `' @ error/constitute.c/ReadImage/504. convert: no images defined `/tmp/transform_17fcb2522cee.jpg' @ error/convert.c/ConvertImageCommand/3258.
When I run

Code: Select all

identify -list format
I get JPG listed as rw- (read and write supported), and

Code: Select all

convert -list configure
does list a JPEG delegate, so I'm at a bit of a loss as to how to debug this from here.

Re: Converting PDF to JPG when using searchable images

Posted: 2018-04-07T13:36:12-07:00
by snibgo
pdfimages writes to files. Your command assumes it writes to stdout, so that its output can be piped.
Scarred Sun wrote:'/usr/bin/convert' '-depth' '8' '-quality' '95' '-resize' '190' '-' '/tmp/transform_6ee557c14986.jpg'
What version of IM are you running? In v7, you cannot operate on an image before reading it, so "-resize" should come after "-".
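A minimal sketch of running the two steps separately, assuming a POSIX shell with poppler's pdfimages and IM 6's convert on the PATH. The helper name pdf_page_thumb and the temporary-directory layout are hypothetical; the convert options are the ones from the command above, with the input moved in front of -resize. Writing into a private temporary directory may also help with the "Couldn't open image file" error, which looks like pdfimages failing to create its output file in the current working directory.

```shell
#!/bin/sh
# Hypothetical helper: pdf_page_thumb INPUT.pdf PAGE OUTPUT.jpg
# Assumes poppler's pdfimages and ImageMagick 6's convert are on the PATH.
pdf_page_thumb() {
    pdf=$1 page=$2 out=$3
    tmp=$(mktemp -d) || return 1

    # pdfimages cannot write to stdout: it always creates files named
    # <root>-NNN.<ext>, so give it a root inside a private temp directory.
    if ! pdfimages -f "$page" -l "$page" -j "$pdf" "$tmp/img"; then
        rm -rf "$tmp"
        return 1
    fi

    # Take the first extracted image (glob results are sorted by name).
    set -- "$tmp"/img-*
    if [ ! -e "$1" ]; then
        echo "no images found on page $page of $pdf" >&2
        rm -rf "$tmp"
        return 1
    fi

    # Read the input before the -resize operator (required in IM 7, fine in IM 6).
    convert "$1" -depth 8 -quality 95 -resize 190 "$out"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

For example: pdf_page_thumb /path/to/myfile.pdf 1 /tmp/thumb.jpg. Underneath it is still two separate commands, so a caller that insists on a single pipeline would need to run this as a script.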

Re: Converting PDF to JPG when using searchable images

Posted: 2018-04-07T17:29:02-07:00
by Scarred Sun
I'm using 6.9.7-4 in this case.

Re: Converting PDF to JPG when using searchable images

Posted: 2018-04-07T18:01:31-07:00
by fmw42
I do not think pdfimages can write to stdout for piping to convert (as snibgo said). So try running the pdfimages command separately from the convert command; pdfimages saves the extracted images to files automatically. Also, as snibgo said, proper IM syntax is to read the input image right after convert.

For example:

Code: Select all

pdfimages -f 1 -l 1 -png lena1.pdf lena1
creates lena1-000.png

It seems not to respect the absence of -p and still writes page numbers anyway.

Then do

Code: Select all

convert lena1-000.png ...

Re: Converting PDF to JPG when using searchable images

Posted: 2018-04-07T19:24:17-07:00
by snibgo
pdfimages always includes the image number within the filename. With option "-p", it also includes the page number. (A pdf file may contain pages with more than one image, and pages with no images.)
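To make the naming concrete: with -p, the files are named <root>-PPP-NNN.<ext> (page number, then image number), which matches the pdf-001-000.jpg in the earlier error; without -p they are just <root>-NNN.<ext>, as in lena1-000.png. A small shell sketch that splits -p style names back into those two numbers; describe_extracted is a hypothetical helper and assumes every name carries both fields.

```shell
#!/bin/sh
# Hypothetical helper: split pdfimages -p output names (<root>-PPP-NNN.<ext>)
# into their page and image numbers. Roots containing '-' are fine, because
# only the last two '-'-separated fields are used.
describe_extracted() {
    for f in "$@"; do
        name=${f##*/}       # strip the directory part
        stem=${name%.*}     # strip the extension
        img=${stem##*-}     # last field: image number
        rest=${stem%-*}
        page=${rest##*-}    # second-to-last field: page number
        printf '%s page=%s image=%s\n' "$name" "$page" "$img"
    done
}
```

For example, describe_extracted /tmp/out/pdf-001-000.jpg prints pdf-001-000.jpg page=001 image=000.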

Re: Converting PDF to JPG when using searchable images

Posted: 2018-04-07T19:48:16-07:00
by fmw42
snibgo wrote: 2018-04-07T19:24:17-07:00 pdfimages always includes the image number within the filename. With option "-p", it also includes the page number. (A pdf file may contain pages with more than one image, and pages with no images.)
Thanks for the clarification. I misunderstood the meaning of -000.

Re: Converting PDF to JPG when using searchable images

Posted: 2018-04-08T13:33:45-07:00
by Scarred Sun
The lack of a pipe leaves me in a bind: I'm actually using this with MediaWiki to convert PDFs to JPG (hence the earlier gs use). Are there any other options besides gs and pdfimages for this task? I'm assuming ImageMagick can't do this natively.

Re: Converting PDF to JPG when using searchable images

Posted: 2018-04-08T13:49:28-07:00
by fmw42
ImageMagick uses Ghostscript to process PDF files, so no, not natively.
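For comparison, letting ImageMagick read the PDF itself looks roughly like the sketch below; IM hands the file to its Ghostscript delegate, so gs still has to be installed. The helper name and the 150 dpi -density value are assumptions, and "[0]" selects the first page. In an OCR-style searchable PDF the text layer is usually drawn invisibly, so the rendered JPG should normally show only the page image.

```shell
#!/bin/sh
# Hypothetical helper: pdf_first_page_thumb INPUT.pdf OUTPUT.jpg
# Requires Ghostscript, which ImageMagick uses as its PDF delegate.
pdf_first_page_thumb() {
    # -density is a setting, so it goes before the input it affects;
    # "[0]" selects the first page (IM page indexes start at 0).
    convert -density 150 "$1[0]" -quality 95 -resize 190 "$2"
}
```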