“JPEG” algo treats JPEGs differently when combining to a PDF
Posted: 2018-06-21T08:15:11-07:00
I wanted to find the best way to make a PDF from a bunch of JPEGs, so that quality is preserved as fully as possible.
The JPEGs are scans of a book. They are only 1200 px high and were saved at quality 60, so ideally I'd want them placed into the PDF untouched. Since the process may alter metadata, I can't use md5sum to compare the original image with the one extracted from the PDF. What I need is for them to be visually identical, so I decided to run an SSIM comparison.
For 00000001.jpg, ssim.sh reported ssim=0.998. An SSIM not equal to 1 means that the two images aren't visually identical.
I took another image, and SSIM returned 1. For every image except this one, what ends up in the PDF is visually identical to the original. Then I remembered that 00000001.jpg had been edited in GIMP: it is a fabric cover that was originally scanned too dark, so I adjusted the levels. As an experiment, I got the source image as it was in the library, named it 00000001.orig.jpg and reran the test.
Oh wonder! SSIM returned 1! This suggests that a JPEG with levels adjusted in GIMP cannot be put as is into a PDF by the “convert” utility. OK, let's not use GIMP. But what if there happen to be source images that aren't compatible with convert's “JPEG” algorithm? What limitations does it impose? How do I make sure that it adds the images as they are, with no visual quality loss?
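One possible way to check for any loss at all, rather than for visual similarity, is to compare the decoded pixels of the original and the extracted image. This is only a sketch: it assumes ImageMagick's identify is on PATH and relies on its `%#` format escape, which prints a hash of the pixel values, so metadata changes should not affect it. The function names pixel_signature and same_pixels are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare decoded pixel data instead of file bytes.
# Assumption: ImageMagick's identify is on PATH; '%#' prints its pixel signature.

# Print the pixel signature of the first image in a file.
pixel_signature() {
    identify -format '%#\n' "$1" | head -n 1
}

# Succeed only if both files decode to the same pixels.
same_pixels() {
    a=$(pixel_signature "$1") || return 2
    b=$(pixel_signature "$2") || return 2
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, `same_pixels 00000001.jpg testpage-000.jpg && echo identical` would print "identical" only when nothing in the image data changed. This is stricter than SSIM, so a re-encoded page should fail it even when the SSIM value is reported as 1.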
00000001.jpg
00000001.orig.jpg
OS: Gentoo, x86_64
ssim.sh can be found on Fred Weinhaus's website.
***
I know that the “Zip” compression algorithm is recommended for combining images into a PDF, but it increases the size 5–7 times, and the JPEGs that make up the book already take 100 MiB.
Test with the GIMP-edited cover (00000001.jpg):
$ convert 00000001.jpg -compress JPEG 00000001.pdf
$ pdfimages -f 1 -l 1 -j 00000001.pdf testpage
$ ssim.sh 00000001.jpg testpage-000.jpg
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpM1.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpM2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpI2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpM2.mpc]: 2 of 3, 100% complete
Fx/Image//tmp/SSIM.17975[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.17975[tmpC12.mpc]: 4 of 5, 100% complete
Fx/Image//tmp/SSIM.17975[tmpM1.mpc]: 1199 of 1200, 100% complete
ssim=0.998 dssim=0.002
Test with the unedited library scan (00000001.orig.jpg):
$ convert 00000001.orig.jpg -compress JPEG "00000001.orig.pdf"
$ pdfimages -f 1 -l 1 -j 00000001.orig.pdf testpage.orig
$ ssim.sh 00000001.orig.jpg testpage.orig-000.jpg
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpM1.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpM2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI2.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpI2.mpc]: 1 of 2, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpM2.mpc]: 2 of 3, 100% complete
Fx/Image//tmp/SSIM.21724[tmpI1.mpc]: 1199 of 1200, 100% complete
Mogrify/Image//tmp/SSIM.21724[tmpC12.mpc]: 4 of 5, 100% complete
Fx/Image//tmp/SSIM.21724[tmpM1.mpc]: 1199 of 1200, 100% complete
ssim=1 dssim=0
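The two manual runs above could be repeated for every page with a small script. The following is a rough sketch, not a tested tool: it assumes convert, pdfimages and ssim.sh are on PATH, that pdfimages names the extracted file with a -000.jpg suffix as in the runs above, and that ssim.sh prints a line starting with "ssim=". The function names are invented for this example.

```shell
#!/bin/sh
# Sketch: repeat the convert -> pdfimages -> ssim.sh round trip for many pages.
# Assumption: the tools behave as in the manual runs shown above.

# Extract the ssim value from text containing a line like "ssim=0.998 dssim=0.002".
parse_ssim() {
    printf '%s\n' "$1" | tr '\r' '\n' | sed -n 's/^ssim=\([0-9.]*\).*/\1/p' | tail -n 1
}

# Succeed only if the value is non-empty and numerically equal to 1.
ssim_is_one() {
    awk -v v="$1" 'BEGIN { exit !(v != "" && v + 0 == 1) }'
}

# Round-trip one JPEG through a single-page PDF and print its ssim value.
roundtrip_ssim() {
    src=$1
    tmp=$(mktemp -d) || return 1
    convert "$src" -compress JPEG "$tmp/page.pdf" &&
        pdfimages -f 1 -l 1 -j "$tmp/page.pdf" "$tmp/out" &&
        parse_ssim "$(ssim.sh "$src" "$tmp/out-000.jpg" 2>/dev/null)"
    status=$?
    rm -rf "$tmp"
    return $status
}

# Print every file whose round trip does not give ssim=1.
check_pages() {
    for f in "$@"; do
        s=$(roundtrip_ssim "$f")
        ssim_is_one "$s" || printf '%s: ssim=%s\n' "$f" "${s:-unknown}"
    done
}
```

Calling `check_pages *.jpg` in the directory with the scans would then list only the pages that did not come back with ssim=1. Since the reported value looks rounded to three decimals, a page that passes this check may still have been re-encoded.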
Versions:
$ convert -version | head -n 1
Version: ImageMagick 7.0.7-35 Q16 x86_64 2018-06-04 https://www.imagemagick.org
$ pdfimages --version |& head -n1
pdfimages version 0.65.0