Converting PDF to Monochrome
Posted: 2013-01-24T17:43:54-07:00
by howard39
I'm working with a lot of PDF files that were obtained by scanning 1 to 10 page documents consisting mostly of text, and saving as PDF. The ones that were saved in black and white mode are nice and compact, but the ones that were saved in grayscale or color mode are too large. So I'd like to use a tool to convert the latter to monochrome.
The following does what I want:
>convert inputpath -monochrome outputpath
*except* that it degrades the resolution too much. Looks like it's working with about 72 dpi and I'd like 600, or at least 300.
Using the -identify switch, an input file gives, "PDF 612x792 612x792+0+0 16-bit ColorSeparation CMYK 1.939MB" and the output file from the above convert operation gives, "PDF 734x950 734x950+0+0 16-bit sRGB 37.7KB". The actual file lengths are 1.1 MB and 77 KB.
What do I need to do to convert a multipage PDF file from a scanned document to a monochrome version without loss of resolution?
Re: Converting PDF to Monochrome
Posted: 2013-01-24T19:23:04-07:00
by snibgo
1. Review the "density" option. The default 72 is generally too low.
convert -density 288 in.pdf -resize 25% out.png
Or higher, if you want.
2. I generally don't like text that has been through "-monochrome", because it removes the anti-aliasing of the edges. Better quality possibilities include "-level 25x75%", etc.
Re: Converting PDF to Monochrome
Posted: 2013-01-24T19:31:59-07:00
by fmw42
-monochrome dithers by default. See http://www.imagemagick.org/Usage/quantize/#monochrome. You would be better off using -threshold or something else and setting the -type to bilevel.
Re: Converting PDF to Monochrome
Posted: 2013-01-25T18:08:14-07:00
by howard39
Unfortunately I haven't had much luck.
convert -density 288 in.pdf -resize 25% out.pdf makes the file twice as large with about 1/3 the resolution, with dithering and grayscale.
convert in.pdf -threshold 50% -type bilevel outfile.pdf makes the file 15 times smaller with 1/8 the resolution, bilevel.
The images in my input PDF file, which was produced by a scanner, appear to be compressed, but those in the ImageMagick outputs may not be compressed.
I would have thought that convert in.pdf out.pdf would produce an output file that is the same as the input file, but actually the output file is twice as large and samples at approx 1/8 the resolution.
Re: Converting PDF to Monochrome
Posted: 2013-01-25T18:55:58-07:00
by snibgo
If you provide a sample of your input file, we won't need to guess.
A PDF file is generally vector data, which includes text, though it can contain bitmaps. IM is bitmap software. If it sees vector data, it will convert it to bitmaps. If your output file is PDF, it will contain one bitmap for each page. "-density" is the general way of getting higher resolution from vector data, but at the expense of filesize. Vector data can contain almost any level of detail, so you might need a massive value of "-density" to see it all.
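The trade-off between "-density" and file size can be sketched with plain arithmetic. This is only an illustration: the function names are made up for this example, the 612x792 page size is the one reported earlier in the thread, and the byte counts are for raw uncompressed bitmaps, not for what ends up in a compressed PDF.

```shell
# Pixels along one side of a page rendered at a given density.
# PDF page sizes are in points, and 72 points = 1 inch.
# (pixels_at_density is a hypothetical helper, not an ImageMagick command.)
pixels_at_density() {
    points=$1; density=$2
    echo $(( points * density / 72 ))
}

# Uncompressed size in bytes of one page bitmap at a given bit depth.
# (raw_page_bytes is likewise hypothetical.)
raw_page_bytes() {
    width_px=$1; height_px=$2; bits_per_pixel=$3
    echo $(( width_px * height_px * bits_per_pixel / 8 ))
}

# A 612x792 point page rendered at 600 dpi.
w=$(pixels_at_density 612 600)
h=$(pixels_at_density 792 600)
echo "${w}x${h} pixels"
echo "$(raw_page_bytes "$w" "$h" 1) bytes at 1 bit per pixel"
echo "$(raw_page_bytes "$w" "$h" 8) bytes at 8 bits per pixel"
```

The raw size grows with the square of the density and linearly with the bit depth, so both the bit depth and the compression of the output matter a great deal at 600 dpi.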
It would be nice if IM recognised the special case of PDF files with one bitmap image per page, all at the same resolution, and could report the appropriate detail. Perhaps your file is like this. I can't tell. There are workarounds to get this information, so you can convert using the exact density it was scanned at.
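One possible workaround, sketched here under assumptions: if you know the pixel width of the scanned image on a page and the page width in points, the scan density follows directly. The thread does not say how to obtain the pixel width; a tool such as poppler's `pdfimages -list` is one option, but that is a suggestion from outside this thread. The function name is hypothetical.

```shell
# Estimate the scan density (dpi) from the embedded image's pixel width
# and the page width in points (72 points = 1 inch), rounded to nearest.
# (scan_density is a hypothetical helper, not part of ImageMagick.)
scan_density() {
    image_px=$1; page_points=$2
    echo $(( (image_px * 72 + page_points / 2) / page_points ))
}

# Example: an image 5100 pixels wide on a 612 point wide page.
scan_density 5100 612
```

The result can then be passed to "-density" so that IM rasterises each page at the resolution it was scanned at, rather than at a guess.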
Re: Converting PDF to Monochrome
Posted: 2013-01-26T13:30:36-07:00
by howard39
Maybe the trick is to get ImageMagick to compress the bitmaps that it embeds in the output PDF file. Can it do this?
Re: Converting PDF to Monochrome
Posted: 2013-01-26T13:44:43-07:00
by snibgo
Re: Converting PDF to Monochrome
Posted: 2013-01-26T14:48:08-07:00
by howard39
OK, this seems to work pretty well
convert -density 600 in.pdf -threshold 15% -type bilevel -compress fax out.pdf
It compresses the input PDF file by a factor of about 8 and changes it to black and white. I just need to test on a larger collection of files to make sure it can handle a variety of different scanned documents.
It's a bit slow -- takes 10-15 sec for a single page doc on a fast pc.
Thanks for the help.
Re: Converting PDF to Monochrome
Posted: 2013-01-26T16:55:51-07:00
by fmw42
The lack of speed is due to setting the density to 600. But that is important to get quality.
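For testing on a larger collection of files, the command that worked above could be wrapped in a loop. This is a minimal sketch, not a tested workflow: the function name, the directory arguments, and the DRY_RUN variable are all assumptions made for this example. By default it only prints the commands it would run.

```shell
# Convert every PDF in src_dir to a bilevel, fax-compressed PDF in out_dir,
# using the options howard39 settled on in this thread.
# convert_scans and DRY_RUN are hypothetical names for this sketch.
# DRY_RUN defaults to 1, which prints each command instead of running it.
convert_scans() {
    src_dir=$1; out_dir=$2
    for f in "$src_dir"/*.pdf; do
        [ -e "$f" ] || continue    # no PDFs found: skip the unexpanded pattern
        out="$out_dir/$(basename "$f")"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "convert -density 600 $f -threshold 15% -type bilevel -compress fax $out"
        else
            mkdir -p "$out_dir"
            convert -density 600 "$f" -threshold 15% -type bilevel -compress fax "$out"
        fi
    done
}

# Usage (paths are placeholders):
#   convert_scans ./scans ./mono              # print the commands only
#   DRY_RUN=0 convert_scans ./scans ./mono    # actually convert
```

The 15% threshold suited the files tested in this thread; lighter or darker scans may need a different value, so it is worth checking a few outputs by eye before converting a whole collection.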