Convert part of PDF at high quality

Mongoose1021
Posts: 1
Joined: 2013-06-21T07:37:53-07:00

Convert part of PDF at high quality

Post by Mongoose1021 »

Hello, ImageMagick,
I have a large batch of poor-quality scanned forms in PDF format. I want a specific field from this form, which is consistently located, as a high-resolution .png for OCR. My current convert command is:
convert -quality 100 -density 800 -resize x4400 -crop 3200x1200+0+2300 $target $target.png
This, so far as I can tell, converts the entire pdf to a high-quality .png and then cuts out my desired segment. Is there a way to pull out a specific part of a pdf for rendering, rather than resampling the entire image at density 800?
The issue of a pdf's "resolution" is confusing and definitely part of my problem.
Thanks in advance!
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Convert part of PDF at high quality

Post by snibgo »

The standard command format is...

convert input options output
... so your command should be...

convert -density 800 $target -resize x4400 -crop 3200x1200+0+2300 -quality 100 $target.png
("-density" is part of the input and must be placed before the filename, and "-quality" refers to the output file so I'd put it at the end.)
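On the "resolution" confusion: a PDF page has a physical size, and -density sets how many pixels per inch it is rasterized at, so the pixel size is simply inches times density. A minimal sketch of that arithmetic, assuming a US Letter page (8.5 x 11 in), which the thread does not confirm; page_pixels is a hypothetical helper:

```shell
#!/bin/sh
# Pixel dimensions of a rasterized page: size in inches * density (ppi).
# Page size defaults to US Letter (an assumption), given in tenths of an
# inch so the calculation stays in integer arithmetic.
page_pixels() {
    density=$1
    width_tenths=${2:-85}
    height_tenths=${3:-110}
    echo "$(( width_tenths * density / 10 ))x$(( height_tenths * density / 10 ))"
}

page_pixels 800   # 6800x8800
page_pixels 400   # 3400x4400
```

Under that assumption, -density 800 followed by -resize x4400 ends at the same pixel size as rendering at -density 400 directly. The larger render mostly buys smoother downsampling, at the cost of time and memory.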
Mongoose1021 wrote:Is there a way to pull out a specific part of a pdf for rendering, rather than resampling the entire image at density 800?
I don't think so. If there is, it would have to be an option to ghostscript.
snibgo's IM pages: im.snibgo.com
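For what it is worth, Ghostscript can be asked to rasterize only a window of the page by fixing the output size with -g and shifting the page with PageOffset. The sketch below only builds and prints such a command. It is untested against these forms; gs_region_cmd is a hypothetical helper, and the 4400 px page height and the file names are assumptions:

```shell
#!/bin/sh
# Build a Ghostscript command that rasterizes only a window of the page.
# Inputs follow ImageMagick crop conventions (pixels from the top-left
# corner) at the target dpi; page_h is the full page height in pixels.
gs_region_cmd() {
    dpi=$1; w=$2; h=$3; x=$4; y=$5; page_h=$6; in=$7; out=$8
    # PostScript measures from the bottom-left corner in points (1/72 in),
    # so flip the vertical offset and convert pixels to points.
    off_x=$(( -(x * 72 / dpi) ))
    off_y=$(( -((page_h - y - h) * 72 / dpi) ))
    printf '%s\n' "gs -q -o $out -sDEVICE=png16m -r$dpi -g${w}x${h} -c \"<</PageOffset [$off_x $off_y]>> setpagedevice\" -f $in"
}

# Region from the thread (3200x1200+0+2300) on an assumed 4400 px tall
# page at 400 dpi. This prints the command; it does not run Ghostscript.
gs_region_cmd 400 3200 1200 0 2300 4400 form.pdf field.png
```

Printing the command first makes it easy to check the offsets by eye before rendering a whole batch. The integer division rounds toward zero, so offsets can be off by a fraction of a point.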
GreenKoopa
Posts: 457
Joined: 2010-11-04T17:24:08-07:00

Re: Convert part of PDF at high quality

Post by GreenKoopa »

Since this is a scan, you are better off using a pdf-specific tool to extract the scanned image from the pdf wrapper. This will give you the image at the resolution at which it was scanned. Then ImageMagick can be used to crop, resize, or whatever processing you need.
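One way to do the extraction is pdfimages from poppler-utils, which writes out the embedded images at their stored resolution. A rough sketch: the crop box then has to be rescaled to that resolution, and the 300 ppi below is a made-up figure, as are the file names and the scale_geometry helper:

```shell
#!/bin/sh
# Rescale an ImageMagick crop geometry from one resolution to another.
# Usage: scale_geometry W H X Y from_ppi to_ppi
scale_geometry() {
    echo "$(( $1 * $6 / $5 ))x$(( $2 * $6 / $5 ))+$(( $3 * $6 / $5 ))+$(( $4 * $6 / $5 ))"
}

# The thread's crop box corresponds to 400 ppi on an assumed Letter page.
# 300 ppi is a placeholder; check the real value with `pdfimages -list`.
geom=$(scale_geometry 3200 1200 0 2300 400 300)
echo "$geom"

# Only runs if poppler-utils and ImageMagick are installed and form.pdf exists.
if command -v pdfimages >/dev/null && command -v convert >/dev/null && [ -f form.pdf ]; then
    pdfimages -list form.pdf        # embedded images with pixel size and ppi
    pdfimages -png form.pdf scan    # writes scan-000.png, scan-001.png, ...
    convert scan-000.png -crop "$geom" +repage field.png
fi
```

This skips Ghostscript rendering entirely, so no resampling happens before the crop. It assumes each page is a single embedded image, which is typical for scans but not guaranteed.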

If the pdf is multi-page, you can limit IM to one page. See
http://www.imagemagick.org/Usage/files/#read_mods
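The read modifier in question is a 0-based page index in square brackets after the filename. A small sketch applied to snibgo's command over a batch, assuming the field is on the first page of every PDF; page_spec is a hypothetical helper:

```shell
#!/bin/sh
# Append an ImageMagick read modifier selecting one page (0-based index).
page_spec() {
    printf '%s[%d]\n' "$1" "$2"
}

# Only runs if ImageMagick is installed; skips cleanly when no PDFs are present.
if command -v convert >/dev/null; then
    for target in *.pdf; do
        [ -f "$target" ] || continue
        # Only page 0 is rasterized; other pages are never rendered.
        convert -density 800 "$(page_spec "$target" 0)" -resize x4400 \
            -crop 3200x1200+0+2300 +repage -quality 100 "${target%.pdf}.png"
    done
fi
```

The +repage is an addition to the command in the thread: it drops the virtual-canvas offset that -crop would otherwise store in the PNG.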

IM does offer ways of handling very large images. I don't know if this helps with pdf, as ghostscript is used for rendering.
http://www.imagemagick.org/Usage/basics/#stream