dowcet wrote:
Despite the much lower resolution, the resulting file is actually a little bit larger (and 16-bit):
Code:
output.pdf PDF 764x1266 764x1266+0+0 16-bit Bilevel DirectClass 122KB 0.010u 0:00.010
BTW, you should never use 'identify' if you want to get some reliable information about a PDF input file.
The reason is that 'identify' never gets to see the original PDF. It employs Ghostscript as its delegate, which renders the PDF to raster, and only that raster is then "identified" by 'identify'.
To see more reliable metadata about the images embedded in a PDF and their respective compression, use `pdfimages -list` instead (but please use a very recent version of Poppler's `pdfimages` for the most detailed reporting):
Code:
$ pdfimages -list output3.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    1728  2292  gray    1   8  image  no         8  0   300   300 4820K 125%
This is the report from the conversion command that used `-compress JBIG2`. As one can easily see, there is no compression at all applied to the embedded image (see column `enc`: "image" means "pure raster data here, no image-specific compression scheme is applied"). The ratio is 125% (i.e. expansion, not compression!). Compare this to the `-compress Group4` output, which clearly shows CCITT compression being applied and condensing the original raster data to only 7.3% of its size:
Code:
$ pdfimages -list output2.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    1728  2292  gray    1   1  ccitt  no         8  0   300   300 35.4K 7.3%
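If you want to sanity-check the `ratio` column by hand, here is a rough sketch. It assumes (my reading of the numbers, not something taken from the Poppler docs) that the ratio is the stored size divided by the raw raster size, i.e. width × height × comp × bpc / 8 bytes, and that "K" means 1024 bytes. The helper name `ratio_pct` is made up for this post.

```shell
#!/bin/sh
# ratio_pct WIDTH HEIGHT COMP BPC SIZE_K   (hypothetical helper)
# Prints the stored size as a percentage of the uncompressed raster size.
# Assumption: SIZE_K is in units of 1024 bytes, as the reports above suggest.
ratio_pct() {
    awk -v w="$1" -v h="$2" -v c="$3" -v b="$4" -v k="$5" 'BEGIN {
        raw = w * h * c * b / 8                   # uncompressed size in bytes
        printf "%.1f%%\n", k * 1024 / raw * 100   # stored size relative to raw
    }'
}

ratio_pct 1728 2292 1 8 4820    # row from the -compress JBIG2 attempt
ratio_pct 1728 2292 1 1 35.4    # row from the -compress Group4 output
```

Under those assumptions the two calls give roughly 124.6% and 7.3%, which matches the 125% and 7.3% that `pdfimages -list` reported. Note that the first row is measured against an 8 bpc raster, so in absolute terms the gap between the two files (4820K vs. 35.4K) is even larger than the percentages suggest.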