I'm working with a lot of PDF files that were obtained by scanning 1 to 10 page documents consisting mostly of text, and saving as PDF. The ones that were saved in black and white mode are nice and compact, but the ones that were saved in grayscale or color mode are too large. So I'd like to use a tool to convert the latter to monochrome.
The following does what I want:
>convert inputpath -monochrome outputpath
*except* that it degrades the resolution too much. Looks like it's working with about 72 dpi and I'd like 600, or at least 300.
Using the -identify switch, an input file gives, "PDF 612x792 612x792+0+0 16-bit ColorSeparation CMYK 1.939MB" and the output file from the above convert operation gives, "PDF 734x950 734x950+0+0 16-bit sRGB 37.7KB". The actual file lengths are 1.1 MB and 77 KB.
What do I need to do to convert a multipage PDF file from a scanned document to a monochrome version without loss of resolution?
Converting PDF to Monochrome
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Converting PDF to Monochrome
1. Review the "density" option. The default 72 is generally too low.
convert -density 288 in.pdf -resize 25% out.png
Or higher, if you want.
2. I generally don't like text that has been through "-monochrome", because it removes the anti-aliasing of the edges. Better quality possibilities include "-level 25x75%", etc.
snibgo's IM pages: im.snibgo.com
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Converting PDF to Monochrome
-monochrome dithers by default. See http://www.imagemagick.org/Usage/quantize/#monochrome. You would be better off using -threshold or something else and setting the -type to bilevel.
Re: Converting PDF to Monochrome
Unfortunately I haven't had much luck.
convert -density 288 in.pdf -resize 25% out.pdf makes the file twice as large with about 1/3 the resolution, with dithering and grayscale.
convert in.pdf -threshold 50% -type bilevel outfile.pdf makes the file 15 times smaller with 1/8 the resolution, bilevel.
The images in my input PDF file, which was produced by a scanner, appear to be compressed, but those in the ImageMagick outputs may not be compressed.
I would have thought that convert in.pdf out.pdf would produce an output file that is the same as the input file, but actually the output file is twice as large and samples at approx 1/8 the resolution.
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Converting PDF to Monochrome
If you provide a sample of your input file, we won't need to guess.
A PDF file is generally vector data, which includes text, though it can contain bitmaps. IM is bitmap software. If it sees vector data, it will convert it to bitmaps. If your output file is PDF, it will contain one bitmap for each page. "-density" is the general way of getting higher resolution from vector data, but at the expense of filesize. Vector data can contain almost any level of detail, so you might need a massive value of "-density" to see it all.
It would be nice if IM recognised the special case of PDF files with one bitmap image per page, all at the same resolution, and could report the appropriate detail. Perhaps your file is like this. I can't tell. There are workarounds to get this information, so you can convert using the exact density it was scanned at.
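As a rough illustration of the density arithmetic above (a sketch, not something taken from IM's source): a PDF point is 1/72 inch, so a page dimension in points times the density, divided by 72, gives approximately the pixel size IM renders. The inverse gives the density a scan was made at, if the pixel width of the embedded bitmap is known. The function names below are made up for this example, and the comment about `pdfimages` assumes poppler-utils is installed.

```shell
# Approximate pixel size rendered for a page dimension given in PDF points
# (1 point = 1/72 inch) at a given -density. Rounded to the nearest pixel.
pixels_at_density() {
  local points=$1 density=$2
  echo $(( (points * density + 36) / 72 ))
}

# Inverse: the scan density implied by an embedded bitmap that is `pixels`
# wide and fills a page dimension of `points`.
# One workaround for finding `pixels` (assumes poppler-utils is installed):
#   pdfimages -list in.pdf
# which lists width, height, x-ppi and y-ppi for each embedded image.
density_from_pixels() {
  local pixels=$1 points=$2
  echo $(( (pixels * 72 + points / 2) / points ))
}

pixels_at_density 612 600     # 612-point (8.5 inch) page width at -density 600
density_from_pixels 2550 612  # a 2550-pixel-wide scan filling the same width
```

For a 612x792 point page, the default density of 72 gives 612x792 pixels, which matches the low resolution reported earlier in the thread, while 600 gives about 5100x6600 pixels per page.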
snibgo's IM pages: im.snibgo.com
Re: Converting PDF to Monochrome
Maybe the trick is to get ImageMagick to compress the bitmaps that it embeds in the output PDF file. Can it do this?
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Converting PDF to Monochrome
Yes. See http://www.imagemagick.org/script/comma ... p#compress
Also see -quality for the jpeg compression.
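A minimal sketch of how these two options might be chosen, assuming the output is either bilevel (where a fax-style compression such as Group4 applies) or grayscale/color (where JPEG with a -quality setting applies). The helper name `compress_args` and the default quality of 75 are made up for this example; the best values depend on the scans.

```shell
# Print the -compress / -quality arguments for a given output mode.
# Hypothetical helper, for illustration only.
compress_args() {
  case "$1" in
    bilevel)    echo "-compress Group4" ;;                  # 1-bit images only
    gray|color) echo "-compress JPEG -quality ${2:-75}" ;;  # lossy, tunable
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Possible use (left unquoted on purpose so the arguments split into words):
#   convert -density 300 in.pdf -threshold 50% -type bilevel \
#           $(compress_args bilevel) out.pdf
```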
snibgo's IM pages: im.snibgo.com
Re: Converting PDF to Monochrome
OK, this seems to work pretty well:
convert -density 600 in.pdf -threshold 15% -type bilevel -compress fax out.pdf
It compresses the input PDF file by a factor of about 8 and changes it to black and white. I just need to test on a larger collection of files to make sure it can handle a variety of different scanned documents.
It's a bit slow; it takes 10-15 sec for a single-page doc on a fast PC.
Thanks for the help.
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: Converting PDF to Monochrome
The lack of speed is due to setting the density to 600. But that is important to get quality.
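For testing on a larger collection of files, something along these lines might do. It is only a sketch: the function names and the `-bw.pdf` output suffix are made up for this example, and the 15% threshold is the value that worked for the scans in this thread, so other documents may need a different one.

```shell
# Derive the output name: scans/report.pdf -> scans/report-bw.pdf
bw_name() {
  printf '%s-bw.pdf\n' "${1%.pdf}"
}

# Convert each PDF given as an argument, skipping files that already
# carry the -bw.pdf suffix, and reporting failures without stopping.
to_bilevel_all() {
  local in_pdf out_pdf
  for in_pdf in "$@"; do
    case "$in_pdf" in *-bw.pdf) continue ;; esac
    out_pdf=$(bw_name "$in_pdf")
    convert -density 600 "$in_pdf" -threshold 15% -type bilevel \
            -compress fax "$out_pdf" || echo "failed: $in_pdf" >&2
  done
}

# Possible use:
#   to_bilevel_all scans/*.pdf
```

At 10-15 seconds per page, a large batch will take a while; dropping to -density 300 is the obvious trade-off if 600 turns out to be more than needed.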