Tiff to JBIG2 PDF?
Posted: 2015-05-01T07:57:38-07:00
by dowcet
I have a bunch of scanned monochrome images like this:
Code:
input.tif TIFF 3184x5276 3184x5276+0+0 1-bit Bilevel DirectClass 104KB 0.000u 0:00.000
I want to make a PDF that will not be so large. From what I understand, JBIG2 compression is the best available for monochrome PDFs. So I tried this:
Code:
convert input.tif -compress JBIG2 -density 300 output.pdf
Despite the much lower resolution, the resulting file is actually a little bit larger (and 16-bit):
Code:
output.pdf PDF 764x1266 764x1266+0+0 16-bit Bilevel DirectClass 122KB 0.010u 0:00.010
What am I doing wrong?
Re: Tiff to JBIG2 PDF?
Posted: 2015-05-01T08:03:06-07:00
by snibgo
I don't know much about compression within PDF files. You might try all the "-compress" options to see which is the best.
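A minimal way to act on this suggestion, sketched as a POSIX shell function: it writes one PDF per type reported by `convert -list compress` and prints the resulting sizes, smallest first. It assumes ImageMagick 6's `convert` is on the PATH; the function name `try_pdf_compressions` and the `out-<type>.pdf` file names are hypothetical. Types that make `convert` fail are skipped rather than aborting the loop.

```shell
#!/bin/sh
# Sketch: write one PDF per compression type this ImageMagick build lists,
# then print "<bytes> <file>" lines sorted from smallest to largest.
# Assumes ImageMagick 6 `convert`; output names out-<type>.pdf are hypothetical.
try_pdf_compressions() {
  in=$1
  for c in $(convert -list compress); do
    # Skip types the PDF writer rejects instead of aborting the loop.
    convert "$in" -compress "$c" "out-$c.pdf" 2>/dev/null || continue
  done
  for f in out-*.pdf; do
    [ -f "$f" ] || continue
    # tr strips the padding some wc implementations add.
    printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done | sort -n
}
```

Usage: `try_pdf_compressions input.tif`. Size alone only shows the outcome, not which compression the PDF writer actually applied.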
Re: Tiff to JBIG2 PDF?
Posted: 2015-05-01T08:35:18-07:00
by fmw42
Add -depth 8 to your command to keep the result 8-bit.
At http://www.imagemagick.org/script/comma ... p#compress, JBIG2 is not listed as an option for -compress, though that could be a documentation error.
However, it does show up when running the following, if you have the JBIG delegate installed:
Code:
convert -list compress
B44
B44A
BZip
DXT1
DXT3
DXT5
Fax
Group4
JBIG1
JBIG2
JPEG
JPEG2000
Lossless
LosslessJPEG
LZMA
LZW
None
Piz
Pxr24
RLE
Zip
You can tell if it is installed using
convert -version
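A small sketch that turns this check into a yes/no answer. It assumes that `convert -version` in ImageMagick 6 prints the compiled-in libraries on a line starting with `Delegates` and names the JBIG library `jbig` (the exact wording of that line varies between versions). `has_jbig_delegate` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if `convert -version` lists a "jbig" delegate.
# Assumes the delegate list is on a line starting with "Delegates".
has_jbig_delegate() {
  convert -version | grep -i '^delegates' | grep -qiw 'jbig'
}
```

Usage: `if has_jbig_delegate; then echo "JBIG delegate present"; fi`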
Re: Tiff to JBIG2 PDF?
Posted: 2015-05-01T20:17:35-07:00
by dowcet
I do see JBIG2 on the list when I do `convert -list compress`, but every compression option I have tried seems to give exactly the same result. I have also tried the depth option:
Code:
convert input.tif -compress JBIG2 -density 300 -depth 8 output.pdf
But again, the output is exactly the same as in the example from my original post, still 16-bit.
This made me wonder whether -compress and -depth have any effect at all on PDF output. So I tried:
Code:
convert input.tif -compress JBIG2 -resize 30% output.tif
But the results are still larger in file size than the original, despite the much lower resolution. (This image also looks terribly degraded when I view it.)
Code:
output.tif TIFF 955x1583 955x1583+0+0 1-bit Bilevel DirectClass 190KB 0.000u 0:00.000
I would not think it difficult to get a monochrome PDF under 100 KB per page, but I am stumped.
In all cases, `identify -verbose` lists `Compression: Undefined`. Is that normal?
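For the TIFF files at least, the compression actually stored can be read back with ImageMagick's `%C` format escape (compression type) instead of scanning the `-verbose` output. This is a sketch with a hypothetical helper name. It says nothing reliable about PDFs, because `identify` on a PDF reports on a Ghostscript re-rendering rather than on the embedded image.

```shell
#!/bin/sh
# Sketch: print "<file>: <compression>" for each TIFF given (one line per frame),
# using identify's %f (file name) and %C (compression type) format escapes.
show_tiff_compression() {
  for f in "$@"; do
    identify -format '%f: %C\n' "$f"
  done
}
```

Usage: `show_tiff_compression input.tif output.tif`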
Re: Tiff to JBIG2 PDF?
Posted: 2015-05-01T21:34:09-07:00
by fmw42
I suspect that there is no compression for PDF output. IM does not create vector PDFs, only raster images wrapped in a vector shell. I would expect the TIFF to be compressed before the vector shell is wrapped around it, but I do not know if that is true.
Also, the density will likely change the PDF file size. I do not know if TIFF supports JBIG2 compression. You can test compression methods by converting your TIFF to another TIFF and checking whether the file size changes, but I do not know if that carries over to PDF. Perhaps PDF does not support compressed raster input.
Bilevel TIFFs may not be compressible. I just do not know.
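The TIFF-to-TIFF test can be scripted roughly like this. The list of types (None, LZW, Zip, Fax, Group4) is an assumption, picked because they are common lossless TIFF options for bilevel data; the function name and the `test-<type>.tif` file names are hypothetical. If Group4 comes out far smaller than None, bilevel TIFFs are clearly compressible.

```shell
#!/bin/sh
# Sketch: re-save one TIFF with several lossless compression types and
# print "<bytes> <type>" lines, smallest first.
tiff_size_by_compression() {
  in=$1
  for c in None LZW Zip Fax Group4; do
    out="test-$c.tif"
    # -type bilevel keeps the image 1-bit so Fax/Group4 are applicable.
    convert "$in" -type bilevel -compress "$c" "$out" || continue
    printf '%s %s\n' "$(wc -c < "$out" | tr -d ' ')" "$c"
  done | sort -n
}
```

Usage: `tiff_size_by_compression input.tif`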
You could also try converting with -monochrome. That will dither the output, but it might be smaller.
Code:
convert input.tif -compress XXX -density 300 -type bilevel -depth 8 output.tif
or
Code:
convert input.tif -density 300 -monochrome -depth 8 output.tif
You could also try the following:
Code:
convert input.tif -compress XXX -density 300 -type bilevel -depth 8 TIFF:- | convert - output.pdf
See if that changes your results.
See http://www.imagemagick.org/Usage/formats/#vector and http://www.imagemagick.org/Usage/formats/#ps
One of the IM developers may have to comment on your question here, or someone more versed in IM-created PDFs.
Re: Tiff to JBIG2 PDF?
Posted: 2015-05-01T23:57:14-07:00
by dowcet
These are helpful suggestions. I am already finding that resizing the TIFF first and then converting to PDF in a separate step seems to make a difference. I'll keep experimenting for now.
Re: Tiff to JBIG2 PDF?
Posted: 2015-05-02T07:12:46-07:00
by dowcet
This one gives me a file about the same size as the original:
Code:
convert input.tif -compress Group4 -density 300 -type bilevel TIFF:- | convert - output.pdf
I saved a fair bit of space without too much loss in quality like so:
Code:
convert input.tif -compress Group4 -adaptive-resize 75% -density 200 -type bilevel TIFF:- | convert - output.pdf
But if I run the first command with JBIG2 instead of Group4, the PDF file is over 20 MB! Clearly that just isn't working right. But at least I have a workable solution. Thanks fmw42!
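Since the goal was a whole bunch of scans, the Group4 pipeline that worked here can be wrapped in a loop. This sketch reuses the first command exactly as posted and writes `<name>.pdf` next to each `<name>.tif`; the function name is hypothetical and the 300 dpi density is simply carried over from that command.

```shell
#!/bin/sh
# Sketch: run the posted Group4 TIFF -> PDF pipeline on every file given.
group4_pdfs() {
  for f in "$@"; do
    # ${f%.*} drops the extension, so scan01.tif becomes scan01.pdf.
    convert "$f" -compress Group4 -density 300 -type bilevel TIFF:- |
      convert - "${f%.*}.pdf"
  done
}
```

Usage: `group4_pdfs *.tif`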
Re: Tiff to JBIG2 PDF?
Posted: 2015-05-04T08:07:37-07:00
by pipitas
snibgo wrote: I don't know much about compression within PDF files.
JBIG2 is a compression scheme for bi-level images only. Image data within a PDF can be compressed with JBIG2.
JBIG2, however, when using "pattern matching & replacement" algorithms, poses the danger of swapping the glyphs of letters and numbers that look similar enough. Once this has happened, the change to the source raster image can only be discovered by close comparison.
Last December, David Kriesel gave a talk at the 31st Chaos Communication Congress (31C3) in Hamburg about how he discovered that Xerox scanners produced "wrong" scan images when the output PDFs were using JBIG2 compression for the embedded raster pages.
BTW, interesting facts about JBIG2:
- The JBIG2 compression algorithm is not standardized. Algorithms can be implemented badly (and the JBIG2 spec document explicitly warns about this).
- What IS standardized for JBIG2 is the way you uncompress the respective file in order to display it.
In Germany, the Federal Office for Information Security (Bundesamt für Sicherheit in der Informationstechnik, BSI) has now banned the use of JBIG2 compression in any certified workflow for scanning official documents where the paper originals are destroyed afterwards. This is in their most recent guideline for RESISCAN, TR-03138.
So, my advice to the OP: think twice before you really convert your TIFFs to JBIG2 PDFs. And if you do, think about how to minimize the risks, and how you can discover whether a conversion introduced errors...
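One way to check a converted page for changed pixels, sketched under several assumptions: Poppler's `pdfimages` is installed (by default it writes 1-bit images as PBM files named `<root>-000.pbm`, `<root>-001.pbm`, ...), the PDF was made without resizing so both rasters have the same dimensions, and ImageMagick's `compare` is available. `compare -metric AE` reports the number of differing pixels, so anything other than 0 means the embedded image is not a pixel-exact copy of the source. If the count is close to width × height, check whether the extracted PBM is simply inverted before drawing conclusions. `pixel_diff` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count pixels that differ between a source TIFF and the first image
# embedded in page 1 of the PDF made from it. Prints the AE count.
pixel_diff() {
  tif=$1
  pdf=$2
  tmpdir=$(mktemp -d) || return 1
  # Extract only page 1; Poppler names the first image "$tmpdir/img-000.pbm".
  pdfimages -f 1 -l 1 "$pdf" "$tmpdir/img"
  # compare prints the AE metric on stderr; null: discards the difference image.
  compare -metric AE "$tif" "$tmpdir/img-000.pbm" null: 2>&1
  rm -rf "$tmpdir"
}
```

Usage: `pixel_diff input.tif output.pdf`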
Re: Tiff to JBIG2 PDF?
Posted: 2015-05-04T08:12:48-07:00
by pipitas
dowcet wrote: I have a bunch of scanned monochrome images like this:
Code:
input.tif TIFF 3184x5276 3184x5276+0+0 1-bit Bilevel DirectClass 104KB 0.000u 0:00.000
Could you please post (a link to) the sample TIFF files you are using?
Re: Tiff to JBIG2 PDF?
Posted: 2015-05-04T08:30:08-07:00
by pipitas
dowcet wrote:
Despite the much lower resolution, the resulting file is actually a little bit larger (and 16-bit):
Code:
output.pdf PDF 764x1266 764x1266+0+0 16-bit Bilevel DirectClass 122KB 0.010u 0:00.010
BTW, you should never use 'identify' if you want to get some reliable information about a PDF input file.
Because 'identify' never gets to see and identify the original PDF. It employs Ghostscript as its delegate, which converts the PDF back to raster, which then is "identified" by 'identify'.
To see more reliable metadata about the images embedded in a PDF and their respective compressions, use `pdfimages -list` instead (but please use a very recent version of the Poppler fork of `pdfimages` for the most detailed reporting):
Code:
$ pdfimages -list output3.pdf
page  num type  width height color comp bpc enc   interp object ID x-ppi y-ppi  size ratio
------------------------------------------------------------------------------------------
   1    0 image  1728   2292 gray     1   8 image no          8  0   300   300 4820K  125%
This is the report from the conversion command that used `-compress JBIG2`. As one can easily see, no compression at all is applied to the embedded image (see column `enc`: "image" means "pure raster data here, no image-specific compression scheme is applied"). The compression ratio is 125% (i.e. expansion, not compression!). Compare this to the `-compress Group4` output, which clearly shows CCITT compression being applied, condensing the original raster data to only 7.3% of its size:
Code:
$ pdfimages -list output2.pdf
page  num type  width height color comp bpc enc   interp object ID x-ppi y-ppi  size ratio
------------------------------------------------------------------------------------------
   1    0 image  1728   2292 gray     1   1 ccitt no          8  0   300   300 35.4K  7.3%
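To run the same check over several PDFs at once, the `enc` and `ratio` columns can be pulled out of the table with `awk`. This is a sketch that assumes the column layout shown above, where the `object ID` column is printed as two numbers, so `enc` is field 9 and `ratio` is the last field; a different `pdfimages` version may lay the table out differently. `image_encodings` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print "<pdf> page <n> enc <encoding> ratio <r>" for every image
# listed by `pdfimages -list`, skipping the two header lines.
image_encodings() {
  for pdf in "$@"; do
    pdfimages -list "$pdf" |
      awk -v f="$pdf" 'NR > 2 { print f, "page", $1, "enc", $9, "ratio", $NF }'
  done
}
```

Usage: `image_encodings output2.pdf output3.pdf`. For the two files above, this should report `enc ccitt` for the Group4 file and `enc image` for the JBIG2 attempt.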
Re: Tiff to JBIG2 PDF?
Posted: 2015-05-05T20:47:29-07:00
by dowcet
Thanks for the tips, pipitas. I'm just going to stick with Group4 and forget about JBIG2. Knowing about `pdfimages -list` is also a big help.