Tiff to JBIG2 PDF?

Post by **dowcet** » 2015-05-01T07:57:38-07:00

I have a bunch of scanned monochrome images like this:

input.tif TIFF 3184x5276 3184x5276+0+0 1-bit Bilevel DirectClass 104KB 0.000u 0:00.000

I want to make a PDF which will not be so large. From what I understand JBIG2 compression is the best available for monochrome PDFs. So I tried this:

Code: Select all

convert input.tif -compress JBIG2 -density 300 output.pdf

Despite the much lower resolution, the resulting file is actually a little bit larger (and 16-bit):

Code: Select all

output.pdf PDF 764x1266 764x1266+0+0 16-bit Bilevel DirectClass 122KB 0.010u 0:00.010

What am I doing wrong?

Post by **snibgo** » 2015-05-01T08:03:06-07:00

I don't know much about compression within PDF files. You might try all the "-compress" options to see which is the best.

Post by **fmw42** » 2015-05-01T08:35:18-07:00

Add `-depth 8` to your command to keep the result as 8-bit.

At http://www.imagemagick.org/script/comma ... p#compress, JBIG2 is not an option listed for -compress, though that could be a documentation error.

However, it does show up in the output of the following command if you have the JBIG delegate installed:

Code: Select all

convert -list compress
B44
B44A
BZip
DXT1
DXT3
DXT5
Fax
Group4
JBIG1
JBIG2
JPEG
JPEG2000
Lossless
LosslessJPEG
LZMA
LZW
None
Piz
Pxr24
RLE
Zip

You can tell whether the delegate is installed using:

convert -version

Post by **dowcet** » 2015-05-01T20:17:35-07:00

I do see JBIG2 on the list when I do `convert -list compress` but every compression option I have tried seems to give the exact same result. I have also tried the depth option:

Code: Select all

convert input.tif -compress JBIG2 -density 300 -depth 8 output.pdf

But again, the resulting output is also exactly the same as in the example I gave in my original post, still 16-bit.

This made me wonder if neither compress nor depth work in the case of PDF output. So I tried:

Code: Select all

convert input.tif -compress JBIG2 -resize 30% output.tif

But the result is still larger in file size than the original, despite the much lower resolution. (This image also looks terribly degraded when I view it.)

Code: Select all

output.tif TIFF 955x1583 955x1583+0+0 1-bit Bilevel DirectClass 190KB 0.000u 0:00.000

I would not think it difficult to get a monochrome PDF under 100 KB per page, but I am stumped.

In all cases, I notice that `identify -verbose` lists `Compression: Undefined`, but this seems to be normal?

Post by **fmw42** » 2015-05-01T21:34:09-07:00

I suspect that there is no compression for PDF output. IM does not create vector PDFs, only raster images wrapped in a vector shell. I might expect that the TIFF would be compressed before the vector shell is wrapped around it, but I do not know if that is true.

Also, the density will likely change the PDF file size. I do not know if TIFF supports JBIG2 compression. You can test compression methods by converting your TIFF to another TIFF and seeing if the file size changes. But I do not know if that carries over to PDF. Perhaps PDF does not support compressed raster input files.

Bilevel TIFFs may not be compressible. I just do not know.

You could also try converting with -monochrome. That will dither the output, but the result might be smaller.

Code: Select all

convert input.tif -compress XXX -density 300 -type bilevel -depth 8  output.tif

or

Code: Select all

convert input.tif -density 300 -monochrome -depth 8  output.tif

You could try to do the following also:

Code: Select all

convert input.tif -compress XXX -density 300 -type bilevel -depth 8  TIFF:- | convert - output.pdf

See if that changes your results.
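The "try each compression and compare sizes" test suggested above can be scripted. The following is only a sketch: the helper name `compress_cmds` and the `out_<name>.tif` naming are made up for illustration, and the compression names are taken from the `convert -list compress` output earlier in this thread. It prints the commands instead of running them so they can be reviewed first.

```shell
# Print one convert command per compression name, for review before running.
# Output files are named out_<name>.tif (a naming choice invented for this sketch).
# Note: file names containing spaces are not handled here.
compress_cmds() {
  input=$1
  shift
  for c in "$@"; do
    printf 'convert %s -compress %s out_%s.tif\n' "$input" "$c" "$c"
  done
}

compress_cmds input.tif None LZW Zip Group4 Fax
```

Piping the printed lines to `sh` and then running `ls -l out_*.tif` should show which settings actually change the file size.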

See http://www.imagemagick.org/Usage/formats/#vector and http://www.imagemagick.org/Usage/formats/#ps

One of the IM developers may have to comment here on your question or someone more versed with IM created PDFs.

Post by **dowcet** » 2015-05-01T23:57:14-07:00

These are helpful suggestions. I am already finding that resizing the TIFF first and then converting to PDF in a separate step seems to make a difference. I'll keep experimenting for now.

Post by **dowcet** » 2015-05-02T07:12:46-07:00

This one gives me a file about the same size as the original:

Code: Select all

convert input.tif -compress Group4 -density 300 -type bilevel TIFF:- | convert - output.pdf

I saved a fair bit of space without too much loss in quality like so:

Code: Select all

convert input.tif -compress Group4 -adaptive-resize 75% -density 200 -type bilevel TIFF:- | convert - output.pdf

But if I do the first command using JBIG2 instead of Group4, the pdf file is over 20 meg! Clearly that just isn't working right. But at least I have a workable solution. Thanks fmw42!
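A rough size estimate helps put the 20 MB figure in context. The sketch below (the helper name `raw_mib` is made up for illustration) computes how large the 3184x5276 page would be as an uncompressed raster at 1 and at 8 bits per pixel. A PDF in the tens of megabytes would be consistent with the image data being stored at 8 bits per pixel without effective compression, although that is only an inference from the numbers.

```shell
# Size in MiB of an uncompressed raster: width * height * bits / 8 bytes.
raw_mib() {
  awk -v w="$1" -v h="$2" -v bits="$3" \
    'BEGIN { printf "%.1f\n", w * h * bits / 8 / 1048576 }'
}

raw_mib 3184 5276 1   # 1-bit bilevel: prints 2.0
raw_mib 3184 5276 8   # 8 bits per pixel: prints 16.0
```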

Post by **pipitas** » 2015-05-04T08:07:37-07:00

snibgo wrote: I don't know much about compression within PDF files.

JBIG2 is a compression scheme for bi-level images only. Image data within a PDF can be compressed with JBIG2.

However, because JBIG2 uses "pattern matching & replacement" algorithms, it poses the danger of swapping the glyphs of letters and numbers that look similar enough. Once applied, this change to the source raster image can only be discovered by close comparison.

Last December, David Kriesel gave a talk at the 31st Chaos Communication Congress (31C3) in Hamburg about how he discovered that Xerox scanners produced "wrong" scan images when the output PDFs were using JBIG2 compression for the embedded raster pages:

https://media.ccc.de/browse/congress/20 ... html#video (Unfortunately, in German only)
https://www.youtube.com/watch?v=zXXmhxbQ-hk (Apparently with English translation of audio -- I haven't checked how good or bad it is...)

BTW, interesting facts about JBIG2:

The JBIG2 compression algorithm is not standardized. Algorithms can be implemented badly (and the JBIG2 spec document explicitly warns about this).
What IS standardized for JBIG2 is the way you uncompress the respective file in order to display it.

In Germany, the Federal Office for Information Security (Bundesamt für Sicherheit in der Informationstechnik, BSI) has now banned the use of JBIG2 compression for any certified workflow for scanning official documents where the paper originals are destroyed. This is in their most recent guideline for RESISCAN, TR-03138:

https://www.bsi.bund.de/DE/Publikatione ... x_htm.html

So, as advice to the OP: think twice before you really convert your TIFFs to JBIG2 PDFs. And if you do, think about how to minimize the risks, and how you can discover whether a conversion introduced errors...

Post by **pipitas** » 2015-05-04T08:12:48-07:00

dowcet wrote: I have a bunch of scanned monochrome images like this:
Code: Select all
input.tif TIFF 3184x5276 3184x5276+0+0 1-bit Bilevel DirectClass 104KB 0.000u 0:00.000

Could you please post (a link to) the sample TIFF files you are using?

Post by **pipitas** » 2015-05-04T08:30:08-07:00

dowcet wrote: Despite the much lower resolution, the resulting file is actually a little bit larger (and 16-bit):
Code: Select all
output.pdf PDF 764x1266 764x1266+0+0 16-bit Bilevel DirectClass 122KB 0.010u 0:00.010

BTW, you should never use 'identify' if you want to get some reliable information about a PDF input file.

This is because 'identify' never gets to see the original PDF. It employs Ghostscript as its delegate, which converts the PDF back to raster, and that raster is what 'identify' then "identifies".

To see more reliable metadata about the images embedded in a PDF and their respective compressions, use `pdfimages -list` instead (but please use a very recent version of the Poppler fork of `pdfimages` for the most detailed reporting):
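The 764x1266 size reported for the PDF fits this picture. Assuming the PDF was rasterized at ImageMagick's default density of 72 dpi (the thread does not state this), a 3184x5276 page scanned at 300 dpi would come out at about those dimensions. A small sketch of the arithmetic, with `render_size` as a made-up helper name:

```shell
# Pixel size of a page after being re-rendered at a different density.
# The 72 dpi render density used below is an assumption, not stated in the thread.
render_size() {
  # $1 = width px, $2 = height px, $3 = source dpi, $4 = render dpi
  awk -v w="$1" -v h="$2" -v src="$3" -v dst="$4" \
    'BEGIN { printf "%dx%d\n", w * dst / src, h * dst / src }'
}

render_size 3184 5276 300 72   # prints 764x1266
```

So the "much lower resolution" is likely a property of how the PDF was rendered for inspection, not of the image stored inside it. Passing a matching density, for example `identify -density 300 output.pdf`, should make the reported dimensions comparable with the TIFF.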

Code: Select all

$ pdfimages -list output3.pdf 

  page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
  --------------------------------------------------------------------------------------------
     1     0 image    1728  2292  gray    1   8  image  no         8  0   300   300 4820K 125%

This is the report from the conversion command that used `-compress JBIG2`. As one can easily see, there is no compression at all applied to the embedded image (see column `enc`: "image" means "pure raster data here, no image-specific compression scheme is applied"). The compression ratio is 125% (i.e. expansion, not compression!). Compare this to the `-compress Group4` output, which clearly shows the CCITT compression being applied, where the compression condensed the original raster data size to only 7.3%:

Code: Select all

$ pdfimages -list output2.pdf 

  page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
  --------------------------------------------------------------------------------------------
     1     0 image    1728  2292  gray    1   1  ccitt  no         8  0   300   300 35.4K 7.3%
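To run the same check over many files, the `enc` column can be pulled out of the listing and the ratio recomputed from the reported dimensions. This is a rough sketch: the helper names `list_encodings` and `ratio_pct` are made up here, the field positions assume the `pdfimages -list` layout shown above, and the "K" in the size column is assumed to mean 1024 bytes.

```shell
# Print the "enc" column (9th field) for each image row of
# `pdfimages -list` output read from stdin. Header and separator
# lines are skipped because their first field is not a page number.
list_encodings() {
  awk '$1 ~ /^[0-9]+$/ { print $9 }'
}

# Stored size as a percentage of the raw raster size.
# $1 = stored size in KiB, $2 = width, $3 = height, $4 = bits per pixel
ratio_pct() {
  awk -v kib="$1" -v w="$2" -v h="$3" -v bits="$4" \
    'BEGIN { printf "%.1f\n", 100 * kib * 1024 * 8 / (w * h * bits) }'
}

ratio_pct 35.4 1728 2292 1   # Group4 page: prints 7.3
ratio_pct 4820 1728 2292 8   # "JBIG2" page: prints 124.6
```

Typical use would be `pdfimages -list output.pdf | list_encodings`. A value of `image` means no image-specific compression was applied, while `ccitt` indicates Group4 or Fax.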

Post by **dowcet** » 2015-05-05T20:47:29-07:00

Thanks for the tips, pipitas. I'm just going to stick with Group4 and forget about JBIG2. Knowing about `pdfimages -list` is also a big help.

Legacy ImageMagick Discussions Archive