Convert part of PDF at high quality

Posted: 2013-06-21T07:45:20-07:00
by Mongoose1021
Hello, ImageMagick,
I have a large batch of poor-quality scanned forms in PDF format. I want one specific field from the form, which is always in the same place, as a high-resolution .png for OCR. My current convert command is:
convert -quality 100 -density 800 -resize x4400 -crop 3200x1200+0+2300 $target $target.png
This, so far as I can tell, converts the entire pdf to a high-quality .png and then cuts out my desired segment. Is there a way to pull out a specific part of a pdf for rendering, rather than resampling the entire image at density 800?
The issue of a pdf's "resolution" is confusing and definitely part of my problem.
Thanks in advance!

Re: Convert part of PDF at high quality

Posted: 2013-06-21T10:00:33-07:00
by snibgo
The standard command format is...

convert input options output
... so your command should be...

convert -density 800 $target -resize x4400 -crop 3200x1200+0+2300 -quality 100 $target.png
("-density" is part of the input and must be placed before the filename, and "-quality" refers to the output file so I'd put it at the end.)
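
A note on the numbers, offered as a sketch rather than a recommendation: "-density" is in pixels per inch and PDF page sizes are in points (1/72 inch), so the density that gives a wanted pixel height directly is target_px * 72 / page_height_pt. The 792 pt (US Letter) page height below is an assumption, since the thread does not state the page size, and the function name is hypothetical.

```shell
#!/bin/sh
# Density (pixels per inch) that renders a page of the given height
# directly at the target pixel height, with no -resize step.
density_for_height() {
    target_px=$1   # wanted output height in pixels
    page_pt=$2     # page height in points; 792 = 11 in (US Letter, ASSUMED)
    echo $(( target_px * 72 / page_pt ))
}

density_for_height 4400 792   # prints 400
```

Under that assumption, rendering at density 800 and then resizing to x4400 is a 2x downsample, and "-density 400" without "-resize" would give the same pixel height from a quarter of the rendered pixels. Rendering large and shrinking does smooth the result, so it may be worth comparing OCR accuracy both ways.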
Mongoose1021 wrote: Is there a way to pull out a specific part of a pdf for rendering, rather than resampling the entire image at density 800?
I don't think so. If there is, it would have to be a Ghostscript option.

Re: Convert part of PDF at high quality

Posted: 2013-06-21T10:20:28-07:00
by GreenKoopa
Since this is a scan, you are better off using a pdf-specific tool to extract the scanned image from the pdf wrapper. This will give you the image at the resolution at which it was scanned. Then ImageMagick can be used to crop, resize, or whatever processing you need.
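
One way to try this, as a sketch only: pdfimages from poppler-utils (my choice of tool, not one named in this thread) writes the embedded images out as <prefix>-000.png, <prefix>-001.png, and so on. The sketch assumes a single-page PDF whose only embedded image is the scan and that the scan fills the page. The crop geometry is the one from the first post, which was measured on a 4400-pixel-high render, so it is rescaled to the scan's real height. Both function names are hypothetical.

```shell
#!/bin/sh
# Rescale a crop geometry measured on an image ref_h pixels tall so that
# it selects the same region on an image new_h pixels tall.
scale_geometry() {
    w=$1; h=$2; x=$3; y=$4; ref_h=$5; new_h=$6
    echo "$(( w * new_h / ref_h ))x$(( h * new_h / ref_h ))+$(( x * new_h / ref_h ))+$(( y * new_h / ref_h ))"
}

# Pull the embedded scan out of the PDF, then crop the field from it.
# ASSUMES the first extracted image is the full-page scan.
extract_field() {
    pdf=$1
    prefix=${pdf%.pdf}
    pdfimages -png "$pdf" "$prefix" || return 1
    scan="$prefix-000.png"
    scan_h=$(identify -format '%h' "$scan") || return 1
    geom=$(scale_geometry 3200 1200 0 2300 4400 "$scan_h")
    convert "$scan" -crop "$geom" +repage "$prefix-field.png"
}
```

Calling extract_field form.pdf would write form-field.png. If the scan turns out to be, say, 2200 pixels high, the geometry becomes 1600x600+0+1150, and no Ghostscript rendering is involved at all.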

If the pdf is multi-page, you can limit IM to one page. See
http://www.imagemagick.org/Usage/files/#read_mods
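
A minimal sketch of that read modifier, reusing the options from the corrected command earlier in the thread: a zero-based page index in square brackets after the filename selects the page, so only that page is rendered. The function names are hypothetical, and "+repage" is my addition to drop the crop offset from the output.

```shell
#!/bin/sh
# Build an ImageMagick input name that selects one page (pages count from 0).
page_spec() {
    printf '%s[%s]\n' "$1" "$2"
}

# Render only the chosen page of the PDF and crop the field from it.
render_page() {
    pdf=$1
    page=$2
    convert -density 800 "$(page_spec "$pdf" "$page")" -resize x4400 \
        -crop 3200x1200+0+2300 +repage -quality 100 "$pdf.png"
}
```

For example, render_page form.pdf 0 reads form.pdf[0], the first page. When typing the bracketed name directly at a shell prompt, quote it so the shell does not treat the brackets as a glob pattern.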

IM does offer ways of handling very large images. I don't know if this helps with PDFs, as Ghostscript is used for rendering.
http://www.imagemagick.org/Usage/basics/#stream