PDF to images as STDOUT
Posted: 2013-03-08T12:53:50-07:00
by ldegruchy
I'm trying to use the convert command to convert a PDF into a series of images on STDOUT. Thus far, I've only been successful in doing this when I call the command with the destination filename(s).
results in the following files on the filesystem:
test-0.jpg
test-1.jpg
test-2.jpg
However, this doesn't work:
Instead of seeing a combination of the bytes of all three jpg files, as I would expect, I can open the resulting test.jpg file in an image viewer and it shows only the first page of the PDF. Calling the process from Java yields the same results.
This happens using either JPG or PNG as the destination format.
I referred to the following post to try to get this to work, but the poster was working with a TIFF, not a PDF:
viewtopic.php?f=1&t=15913
The reason I'm doing this is that I want to call convert from a Java process that needs to send the bytes to a calling application. I'd rather not constantly write files to the filesystem, read them and then delete them.
Is there a command line option I'm missing, or am I doing something else incorrectly?
Re: PDF to images as STDOUT
Posted: 2013-03-08T13:57:31-07:00
by magick
ImageMagick creates 3 jpeg images and concatenates them. You only see one image with your viewers because that is all that is supported for JPEG. It is not a multi-frame image format like TIFF or GIF, for example.
Re: PDF to images as STDOUT
Posted: 2013-03-08T14:05:10-07:00
by ldegruchy
You misunderstood me. I was able to create 3 separate jpegs when outputting the results to files. When I send the results to STDOUT, I only get the first page of the PDF. Nothing is concatenated in either case.
Also, from what I understand, ImageMagick uses Ghostscript behind the scenes for PDF conversion. The following command worked when run from Java, as it produced 3 jpg files on STDOUT, just like convert did directly to the filesystem.
Code:
/usr/bin/gs -dSAFER -dBATCH -dNOPAUSE -r150 -sDEVICE=jpeg -dTextAlphaBits=4 -sOutputFile=- -f test.pdf jpg:-
I'm evaluating multiple tools for PDF to image conversion for my company, and part of my evaluation includes different types of open source licenses, which is why I didn't just drop everything and choose Ghostscript.
Re: PDF to images as STDOUT
Posted: 2013-03-08T14:42:03-07:00
by magick
What version of ImageMagick are you using? We're using ImageMagick 6.8.3-8 and we're getting the same results as your Ghostscript command, that is, 3 JPEG images concatenated together. Here is our use case:
convert rose: wizard: logo: test.pdf
gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -r150 -sDEVICE=jpeg -dTextAlphaBits=4 -sOutputFile=- -f test.pdf jpg:- > gs.jpg
convert test.pdf jpg:- > im.jpg
If we inspect gs.jpg and im.jpg, we find 3 JFIF markers in each, suggesting that each file holds 3 concatenated JPEG images.
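One way to repeat that inspection without an image viewer is to count the JFIF identifier in the raw bytes. The following is only a minimal sketch: the function names are made up for illustration, and it assumes each image carries a JFIF APP0 header (JPEGs written with only an EXIF header would not be counted).

```python
from pathlib import Path

# The JFIF APP0 segment contains the ASCII identifier "JFIF" followed by a NUL byte.
JFIF_TAG = b"JFIF\x00"


def count_jfif_markers(data: bytes) -> int:
    """Count non-overlapping occurrences of the JFIF identifier in a byte string."""
    return data.count(JFIF_TAG)


def count_jfif_markers_in_file(path: str) -> int:
    """Read a file (e.g. gs.jpg or im.jpg) and count the JFIF identifiers in it."""
    return count_jfif_markers(Path(path).read_bytes())
```

Calling count_jfif_markers_in_file("im.jpg") on a file produced from a three-page PDF should then report one identifier per page, provided the assumption above holds.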
Re: PDF to images as STDOUT
Posted: 2013-03-08T15:35:07-07:00
by ldegruchy
Thanks for responding so quickly and helpfully.
I was running older versions of ghostscript and imagemagick. I'm on RHEL 6 and the default versions of those programs are older.
I installed the newer versions of those programs from source:
Code:
$ gs -version
GPL Ghostscript 9.07 (2013-02-14)
Copyright (C) 2012 Artifex Software, Inc. All rights reserved.
Code:
$ convert -version
Version: ImageMagick 6.8.3-3 2013-03-08 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2013 ImageMagick Studio LLC
Features: DPC OpenMP
Delegates: bzlib fontconfig freetype jng jp2 jpeg lcms pango png ps tiff x xml zlib
However, I'm not getting the same results as you are. (I ran the commands below in a newly created directory to ensure there were no leftover artifacts.)
Code:
$ convert rose: wizard: logo: test.pdf
I see: rose image page 1, wizard at table image page 2, wizard standing up page 3
Code:
$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -r150 -sDEVICE=jpeg -dTextAlphaBits=4 -sOutputFile=- -f test.pdf jpg:- > gs.jpg
GPL Ghostscript 9.07: Unrecoverable error, exit code 1
Code:
$ ls -la im.jpg
-rw-r--r-- 1 {OMITTED} {OMITTED} 104744 Mar 8 17:30 im.jpg
I see the rose image in the JPG
I compiled ImageMagick before Ghostscript, but I doubt that has anything to do with it, since I'm getting a failure with Ghostscript that you are not.
Again, thanks for your help. If you have any diagnostics you think I should run, I'll try them out.
Re: PDF to images as STDOUT
Posted: 2013-03-08T18:31:38-07:00
by magick
You will get the same results from gs.jpg as well. Most / all JPEG viewers expect one and only one JPEG image per image file. In both the Ghostscript and ImageMagick cases, 3 JPEG images are concatenated into 1 file but the viewers only see the first one.
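Since the pages arrive as one concatenated byte stream, the calling process has to split them itself. Below is a rough sketch in Python (the same logic should port to Java). The function names are hypothetical, and the split relies on a simplifying assumption: each image ends with the JPEG end-of-image marker (FF D9) and the next one starts immediately with the start-of-image marker (FF D8). An embedded thumbnail or metadata segment that happens to contain that four-byte sequence would fool it, so a stricter version would walk the JPEG segments instead.

```python
import subprocess
from typing import List

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_concatenated_jpegs(data: bytes) -> List[bytes]:
    """Split a stream of back-to-back JPEGs at each EOI that is directly followed by SOI."""
    boundary = EOI + SOI
    images = []
    start = 0
    while True:
        idx = data.find(boundary, start)
        if idx == -1:
            break
        end = idx + len(EOI)  # keep the EOI with the image it terminates
        images.append(data[start:end])
        start = end
    if start < len(data):
        images.append(data[start:])  # last (or only) image in the stream
    return images


def pdf_pages_as_jpegs(pdf_path: str) -> List[bytes]:
    """Run `convert <pdf> jpg:-` and return one bytes object per page (no temp files)."""
    result = subprocess.run(
        ["convert", pdf_path, "jpg:-"],
        stdout=subprocess.PIPE,
        check=True,  # raise if convert exits with a non-zero status
    )
    return split_concatenated_jpegs(result.stdout)
```

With this, pdf_pages_as_jpegs("test.pdf") would be expected to return three byte strings for the three-page test.pdf above, each of which can be handed to the calling application without touching the filesystem.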