convert producing invalid pdfs from jpgs
Posted: 2017-05-01T23:16:57-07:00
by kraftydevil
I've got 2 Mac OS X machines with ImageMagick installed.
One machine always works and one machine produces unreadable pdfs about 80% of the time. Either they don't open or there are blank pages after a certain point.
I'm using the basic convert command for jpg > pdf:
Code:
convert path/to/images/*.jpg name_of_pdf.pdf
There are too many pdfs to open each one to check, so I am using JHOVE to verify them, as shown here:
https://superuser.com/a/1204692/379229.
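A minimal sketch of that kind of batch check, assuming `jhove` is on the PATH and that a good file is reported with the status string "Well-Formed and valid". The function name `report_invalid_pdfs` is made up for this illustration and is not taken from the linked answer.

```shell
# Print the path of every PDF in a directory that JHOVE does not
# report as "Well-Formed and valid".
# Assumption: `jhove` is installed and on PATH.
report_invalid_pdfs() {
  local dir="${1:-.}" pdf status
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue            # glob matched nothing
    # Keep only the first "Status:" line of the PDF-hul report.
    status=$(jhove -m pdf-hul "$pdf" | grep -m1 'Status:')
    case "$status" in
      *"Well-Formed and valid"*) ;;      # valid: print nothing
      *) printf '%s\n' "$pdf" ;;         # anything else: report it
    esac
  done
}
```

Calling `report_invalid_pdfs path/to/pdfs` would then list only the files that need a closer look.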
At first I thought it might be a version issue, but it always works on an older version:
Machine 1 (always works)
Code:
$ convert --version
Version: ImageMagick 6.9.2-6 Q16 x86_64 2015-11-15 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC Modules
Delegates (built-in): bzlib freetype jng jpeg ltdl lzma png tiff xml zlib
$
$ which gs
/usr/local/bin/gs
$ /usr/local/bin/gs --version
9.18
Machine 2 (mostly doesn't work)
Code:
$ convert --version
Version: ImageMagick 7.0.5-5 Q16 x86_64 2017-04-25 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC HDRI Modules
Delegates (built-in): bzlib freetype jng jpeg ltdl lzma png tiff xml zlib
$
$ which gs
/usr/local/bin/gs
$ /usr/local/bin/gs --version
9.21
So the question is...
Are there any other dependencies or configurations I should check to ensure I get the same results no matter what machine I'm using?
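One way to compare two machines is to dump the same set of facts on each and diff the results. The sketch below is only a starting point: the helper name `dump_im_env` is hypothetical, it uses the documented single-dash `-version` spelling, and it relies on the `-list configure` and `-list delegate` options to show build settings and delegate commands.

```shell
# Print ImageMagick and Ghostscript details in a fixed order so that
# the output from two machines can be compared with diff.
# Assumption: `convert` and `gs` are on PATH.
dump_im_env() {
  echo "== convert -version =="
  convert -version 2>&1
  echo "== convert -list configure =="
  convert -list configure 2>&1
  echo "== convert -list delegate =="
  convert -list delegate 2>&1
  echo "== gs --version =="
  gs --version 2>&1
}
```

Run `dump_im_env > machine1.txt` on one machine and `dump_im_env > machine2.txt` on the other, then `diff machine1.txt machine2.txt`.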
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-01T23:47:26-07:00
by Bonzo
kraftydevil wrote:
At first I thought it might be a version issue, but it always works on an older version:
Bugs can be written in by mistake and it is not as though both versions are V6 versions. Are you using the same version of Ghostscript?
V7 prefers magick rather than convert; give that a go.
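For scripts that have to run on both the v6 and the v7 machine, a small helper can pick whichever entry point is installed. This is a sketch only; `im_cmd` is a hypothetical name.

```shell
# Print the ImageMagick entry point to use: `magick` when it is
# available (v7), otherwise fall back to `convert` (v6).
im_cmd() {
  if command -v magick >/dev/null 2>&1; then
    echo magick
  else
    echo convert
  fi
}
```

Usage would look like `"$(im_cmd)" path/to/images/*.jpg name_of_pdf.pdf`.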
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-02T00:17:48-07:00
by snibgo
IM uses Ghostscript to read PDFs, but not to write PDFs.
This may be a bug introduced in v7.0.5-5. kraftydevil: can you make a reproducible example? A test command that, when run, makes a bad PDF? Then other people can test.
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-02T03:49:07-07:00
by kraftydevil
Bonzo wrote: ↑2017-05-01T23:47:26-07:00
At first I thought it might be a version issue, but it always works on an older version:
Bugs can be written in by mistake and it is not as though both versions are V6 versions. Are you using the same version of Ghostscript?
V7 prefers magick rather than convert; give that a go.
gs is 9.18 on the working machine and 9.21 on the one that isn't. Updated OP with gs info.
I tried the magick command with the same syntax. It almost worked. For whatever reason the last page simply said "PDF" with a white background on it. When I tried it 2 more times I got a corrupt pdf.
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-02T03:53:34-07:00
by kraftydevil
snibgo wrote: ↑2017-05-02T00:17:48-07:00
IM uses Ghostscript to read PDFs, but not to write PDFs.
This may be a bug introduced in v7.0.5-5. kraftydevil: can you make a reproducible example? A test command that, when run, makes a bad PDF? Then other people can test.
This is reproducible about 80% of the time:
Code:
convert path/to/images/*.jpg name_of_pdf.pdf
Will that work? Unfortunately I can't share the image files.
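Because the failure is intermittent, it can help to put a number on it. The sketch below repeats the conversion and counts how often JHOVE rejects the result. It assumes `convert` and `jhove` are on the PATH and that a good file yields the status "Well-Formed and valid"; the function name `count_bad_runs` is hypothetical.

```shell
# Run the same conversion several times and count the runs whose
# output JHOVE does not accept.
# Usage: count_bad_runs RUNS OUTPUT.pdf INPUT.jpg...
count_bad_runs() {
  local runs="$1" out="$2" bad=0 i=1
  shift 2
  while [ "$i" -le "$runs" ]; do
    rm -f "$out"
    # A run is bad if convert fails or JHOVE does not report validity.
    if ! convert "$@" "$out" ||
       ! jhove -m pdf-hul "$out" | grep -q 'Well-Formed and valid'; then
      bad=$((bad + 1))
    fi
    i=$((i + 1))
  done
  echo "$bad/$runs runs produced a PDF that JHOVE did not accept"
}
```

For example, `count_bad_runs 10 name_of_pdf.pdf path/to/images/*.jpg` repeats the command from this thread ten times.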
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-02T04:07:51-07:00
by snibgo
By "reproducible" I mean by other people. If no-one can reproduce your problem, they can't fix it.
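When the real images cannot be shared, one option is to generate stand-in JPGs with ImageMagick's built-in `gradient:` image and test with those. This is a sketch under assumptions: the page sizes are arbitrary, `make_test_jpgs` is a hypothetical name, and there is no guarantee that synthetic images trigger the same fault as the originals.

```shell
# Create four synthetic JPG "pages" in a directory so that other
# people can run the same conversion on the same inputs.
# Assumption: `convert` is on PATH; sizes are arbitrary examples.
make_test_jpgs() {
  local dir="${1:-.}" i=0 size
  mkdir -p "$dir"
  for size in 736x1091 1057x1500 1584x2129 1000x1409; do
    i=$((i + 1))
    # gradient: is a built-in image source, so no input files are needed.
    convert -size "$size" gradient: "$dir/page$i.jpg" || return 1
  done
}
```

After `make_test_jpgs testpages`, running `convert testpages/*.jpg test.pdf` gives a command anyone can repeat.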
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-02T04:25:44-07:00
by Bonzo
kraftydevil wrote:
Will that work? Unfortunately I can't share the image files.
Can you make a pdf file from the original software that does not contain sensitive information?
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-02T06:05:52-07:00
by kraftydevil
snibgo wrote: ↑2017-05-02T04:07:51-07:00
By "reproducible" I mean by other people. If no-one can reproduce your problem, they can't fix it.
I suppose I can only share my personal experience and frequency when reproducing this issue.
Here's a formal write-up for others to try. It's not much more complex than what I've mentioned in the OP:
Environment
Code:
$ convert --version
Version: ImageMagick 7.0.5-5 Q16 x86_64 2017-04-25 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC HDRI Modules
Delegates (built-in): bzlib freetype jng jpeg ltdl lzma png tiff xml zlib
$
$ which gs
/usr/local/bin/gs
$ /usr/local/bin/gs --version
9.21
Example Images Used
https://drive.google.com/file/d/0Bx20O2 ... zhpd2RJbWM
Steps to reproduce
- Put the jpgs into a directory
- cd to the directory
- Run: convert *.jpg Example.pdf
Expected Result
convert creates a PDF document named Example.pdf in the same directory. It should contain one page for each JPG matched by the "*.jpg" argument, and it should be viewable in a PDF reader.
Actual Result
The pdf document is created, but it cannot be opened by the Preview app. Attempting to do so yields an error message: 'The file “Example.pdf” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.'
Adobe Reader can open it, but several pages are blank.
Here's some output from the jhove validation tool I mentioned in my OP:
Code:
$ jhove -m pdf-hul Example.pdf
Jhove (Rel. 1.16.6, 2017-04-27)
Date: 2017-05-02 08:39:43 EDT
RepresentationInformation: Example.pdf
ReportingModule: PDF-hul, Rel. 1.8 (2017-03-14)
LastModified: 2017-05-02 08:39:03 EDT
Size: 3640291
Format: PDF
Version: 1.3
Status: Well-Formed, but not valid
SignatureMatches:
PDF-hul
ErrorMessage: Invalid page tree node
Offset: 1779234
MIMEtype: application/pdf
PDFMetadata:
Objects: 118
FreeObjects: 1
IncrementalUpdates: 0
DocumentCatalog:
PageLayout: SinglePage
PageMode: UseNone
Info:
Title: Example
Producer: /usr/local/Cellar/imagemagick/7.0.5-5/share/doc/ImageMagick-7//index.html
CreationDate: Tue May 02 08:39:03 EDT 2017
ModDate: Tue May 02 08:39:03 EDT 2017
ID: 0xb7539159b1a1e74cfd5e38ebde538b4a9caa08e091b6e318b7b4264f57747113, 0xb7539159b1a1e74cfd5e38ebde538b4a9caa08e091b6e318b7b4264f57747113
Filters:
FilterPipeline: DCTDecode
Images:
Image:
NisoImageMetadata:
CompressionScheme: JPEG
ImageWidth: 736
ImageHeight: 1091
BitsPerSample: 8
BitsPerSampleUnit: integer
Name: Im0
Image:
NisoImageMetadata:
CompressionScheme: JPEG
ImageWidth: 1057
ImageHeight: 1500
BitsPerSample: 8
BitsPerSampleUnit: integer
Name: Im1
Image:
NisoImageMetadata:
CompressionScheme: JPEG
ImageWidth: 1584
ImageHeight: 2129
BitsPerSample: 8
BitsPerSampleUnit: integer
Name: Im2
Image:
NisoImageMetadata:
CompressionScheme: JPEG
ImageWidth: 1000
ImageHeight: 1409
BitsPerSample: 8
BitsPerSampleUnit: integer
Name: Im3
Pages:
Page:
Sequence: 1
Thumb: true
Page:
Sequence: 2
Thumb: true
Page:
Sequence: 3
Thumb: true
Page:
Sequence: 4
Thumb: true
The main points I see in jhove's output are that Example.pdf is "Well-Formed, but not valid" and that the error given is "Invalid page tree node". There may be more, but I don't know everything I should be looking for there.
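To pull just those fields out of a long report, a small filter is enough. This sketch reads JHOVE's text output on standard input and keeps the Status, ErrorMessage and Offset lines, matching the field names shown in the report above; `jhove_summary` is a hypothetical name.

```shell
# Reduce a JHOVE text report (read from stdin) to the lines that
# describe validity: Status, ErrorMessage and Offset.
jhove_summary() {
  grep -E '^[[:space:]]*(Status|ErrorMessage|Offset):' |
    sed 's/^[[:space:]]*//'     # drop the report's indentation
}
```

It can be used as `jhove -m pdf-hul Example.pdf | jhove_summary`.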
Maybe it's flat out broken, but I'm guessing it's more related to my environment.
I don't have a huge knowledge of the ImageMagick space or even general image processing so for those who do, please request more specific information.
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-02T07:08:50-07:00
by snibgo
In your Zip file, you included Example.pdf. Adobe Acrobat Reader DC says "There was a problem reading this document (14)."
When I "convert *.jpg x.pdf" with IM v6.9.5-3, or "magick *.jpg x2.pdf" with v7.0.3-5, Adobe Reader reports no problems.
Can someone confirm that v7.0.5-5 creates a bad PDF?
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-02T09:43:36-07:00
by fmw42
I have tested the folder of jpegs on IM 7.0.5-5 Q16 Mac OSX with GS 9.21 and it seems to work fine. I get no error messages and Identify says there are 8 pages. But when I try to view the pdf on several viewers including Acrobat Reader, only the first 4 pages show. The last 4 are blank.
Oddly, it also fails the same way in IM 6.9.8-4 Q16 Mac OSX and GS 9.21.
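A page-count comparison like the one described can be scripted, with the caveat this post already illustrates: `identify` may report the expected number of pages even when viewers show some of them blank, so a matching count does not prove the PDF is good. The sketch assumes `identify` is on the PATH (it reads PDFs through Ghostscript); both function names are hypothetical.

```shell
# Print the number of pages identify sees in a PDF.
# %n is the image count; it is printed once per page, so keep one line.
pdf_page_count() {
  identify -format '%n\n' "$1" 2>/dev/null | head -n 1
}

# Succeed only if the PDF has as many pages as input files were given.
# Usage: check_page_count OUTPUT.pdf INPUT.jpg...
check_page_count() {
  local pdf="$1" expected actual
  shift
  expected=$#
  actual=$(pdf_page_count "$pdf")
  [ "$actual" = "$expected" ]
}
```

For example, `check_page_count x.pdf *.jpg || echo "page count mismatch"`.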
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-02T15:25:11-07:00
by GeeMack
snibgo wrote: ↑2017-05-02T07:08:50-07:00
Can someone confirm that v7.0.5-5 creates a bad PDF?
I haven't tried the examples above, but I just encountered a possibly related issue today with 7.0.5-5 on Windows 10. I created a four-layer document with Photoshop Elements: four simple 8.5 by 11 inch images. I can read the PSD file with IM and correctly convert it into four PNGs or four JPGs, etc. When I try to convert the PSD to a PDF with a simple, no-frills conversion...
Code:
magick fourlayers.psd[1-4] fourpages.pdf
... the command runs without reporting any errors, and produces a PDF, but the finished file generates an error when opening in Acrobat. Then after I click the "OK" button to clear the error message, the file appears to be four blank pages. The first page is full size, and the remaining three just look like tiny squares vertically aligned below the first page.
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-04T22:53:06-07:00
by d-ph
Re: convert producing invalid pdfs from jpgs
Posted: 2017-05-07T13:51:51-07:00
by dlemstra
We can reproduce it and will have a patch to fix it in the Git master branch at https://github.com/ImageMagick/ImageMagick later today. The patch will be available in the beta releases of ImageMagick at http://www.imagemagick.org/download/beta/ by sometime tomorrow.