Hello, ImageMagick,
I have a large batch of poor-quality scanned forms in PDF format. I want one specific field from each form, which is always in the same place, as a high-resolution .png for OCR. My current convert command is:
convert -quality 100 -density 800 -resize x4400 -crop 3200x1200+0+2300 $target $target.png
This, so far as I can tell, converts the entire pdf to a high-quality .png and then cuts out my desired segment. Is there a way to pull out a specific part of a pdf for rendering, rather than resampling the entire image at density 800?
The issue of a pdf's "resolution" is confusing and definitely part of my problem.
Thanks in advance!
Convert part of PDF at high quality
-
- Posts: 1
- Joined: 2013-06-21T07:37:53-07:00
- Authentication code: 6789
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Convert part of PDF at high quality
The standard command format is...
Code:
convert input options output
... so your command should be...
Code:
convert -density 800 $target -resize x4400 -crop 3200x1200+0+2300 -quality 100 $target.png
("-density" is part of the input and must be placed before the filename, and "-quality" refers to the output file so I'd put it at the end.)
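On the "resolution" confusion, a rough worked example may help. It assumes a US Letter page (612 x 792 points); the thread doesn't give the page size, so treat that as a guess. The helper names are made up for illustration.

```shell
#!/bin/sh
# Pixel size of a page dimension rendered at a given density (dpi).
# PDF page sizes are in points, 72 points per inch.
# Usage: render_px POINTS DENSITY
render_px() {
    echo $(( $1 * $2 / 72 ))
}

# Effective density after "-resize xH": the target pixel height
# divided by the page height in inches.
# Usage: effective_density PX_HEIGHT PAGE_HEIGHT_POINTS
effective_density() {
    echo $(( $1 * 72 / $2 ))
}

# Assumed US Letter page: 612 x 792 points.
echo "800 dpi render: $(render_px 612 800) x $(render_px 792 800) px"
echo "after -resize x4400: about $(effective_density 4400 792) dpi"
```

If the page really is Letter-sized, "-density 400" with no "-resize" would land on about the same 3400 x 4400 grid while rendering a quarter of the pixels. Rendering at 800 and shrinking does give some extra smoothing, but for a scan it can't add detail beyond the resolution the page was scanned at.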
Mongoose1021 wrote: Is there a way to pull out a specific part of a pdf for rendering, rather than resampling the entire image at density 800?
I don't think so. If there is, it would have to be an option to Ghostscript.
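One candidate on the Ghostscript side, as an untested sketch: fix the output size with -g and -dFIXEDMEDIA, then shift the page with PageOffset so that only the wanted region falls on the canvas. The function below only prints the command. Its name, the 792-point page height, and the file names are assumptions for illustration, and the sign of the Y offset is worth checking on one page before running a batch.

```shell
#!/bin/sh
# Print (not run) a Ghostscript command that renders only a region.
# Usage: gs_region_cmd DPI W H X Y PAGE_H_PT SRC DST
#   W H X Y:   region in pixels at DPI, measured from the top-left
#   PAGE_H_PT: page height in points (792 assumed for US Letter)
gs_region_cmd() {
    dpi=$1; w=$2; h=$3; x=$4; y=$5; page_h=$6; src=$7; dst=$8
    # PostScript measures from the bottom-left in points, so convert
    # pixels to points (72 per inch) and flip the Y axis.
    offsets=$(awk -v d="$dpi" -v h="$h" -v x="$x" -v y="$y" -v p="$page_h" \
        'BEGIN { printf "%g %g", 0 - x * 72 / d, (y + h) * 72 / d - p }')
    printf '%s\n' "gs -q -o $dst -sDEVICE=pnggray -r$dpi -g${w}x${h} -dFIXEDMEDIA -c \"<</PageOffset [$offsets]>> setpagedevice\" -f $src"
}

# The 3200x1200+0+2300 crop, taken at an assumed effective 400 dpi.
gs_region_cmd 400 3200 1200 0 2300 792 scan.pdf field.png
```

If this works as hoped, Ghostscript only has to allocate a 3200 x 1200 raster instead of the whole page, which is the saving the question asks about.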
snibgo's IM pages: im.snibgo.com
- GreenKoopa
- Posts: 457
- Joined: 2010-11-04T17:24:08-07:00
- Authentication code: 8675308
Re: Convert part of PDF at high quality
Since this is a scan, you are better off using a pdf-specific tool to extract the scanned image from the pdf wrapper. This will give you the image at the resolution at which it was scanned. Then ImageMagick can be used to crop, resize, or whatever processing you need.
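A minimal sketch of that route, assuming Poppler's pdfimages is the extraction tool (the post doesn't name one) and that each page holds a single scanned image. The function names and the 300 dpi scan resolution are hypothetical. The crop geometry has to be rescaled to the scan's own pixel grid, which is what the first helper does.

```shell
#!/bin/sh
# Rescale a crop region from one pixel grid to another.
# Usage: scale_geometry W H X Y FROM_DPI TO_DPI  -> prints WxH+X+Y
scale_geometry() {
    echo "$(( $1 * $6 / $5 ))x$(( $2 * $6 / $5 ))+$(( $3 * $6 / $5 ))+$(( $4 * $6 / $5 ))"
}

# Extract the embedded image of page 1 and crop the field from it.
# Usage: extract_field INPUT.pdf GEOMETRY OUTPUT.png
extract_field() {
    tmp=$(mktemp -d) || return 1
    # -png writes files named <root>-000.png, <root>-001.png, ...
    pdfimages -png -f 1 -l 1 "$1" "$tmp/page" &&
        convert "$tmp/page-000.png" -crop "$2" +repage "$3"
    status=$?
    rm -rf "$tmp"
    return $status
}

# Example: the 400 dpi region mapped onto a hypothetical 300 dpi scan.
scale_geometry 3200 1200 0 2300 400 300
```

Running "pdfimages -list input.pdf" first should show the pixel size and ppi of each embedded image, which gives the real value to use in place of the 300 above.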
If the pdf is multi-page, you can limit IM to one page. See
http://www.imagemagick.org/Usage/files/#read_mods
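As a small sketch of the read modifier: IM counts pages from 0, so page 1 is written as "file.pdf[0]". The helper names are made up, and the options are simply the ones from the command earlier in the thread, with "+repage" added so the crop offset isn't stored in the .png.

```shell
#!/bin/sh
# Turn a 1-based page number into an ImageMagick read modifier.
# Usage: page_spec FILE PAGE  -> prints FILE[PAGE-1]
page_spec() {
    echo "${1}[$(( $2 - 1 ))]"
}

# Render only the first page of each PDF given, then crop the field.
# Usage: crop_first_pages a.pdf b.pdf ...
crop_first_pages() {
    for pdf in "$@"; do
        convert -density 800 "$(page_spec "$pdf" 1)" \
            -resize x4400 -crop 3200x1200+0+2300 +repage \
            -quality 100 "${pdf%.pdf}.png"
    done
}

page_spec scan.pdf 1
```

For the batch, something like "crop_first_pages *.pdf" would write one .png next to each PDF.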
IM does offer ways of handling very large images. I don't know if this helps with pdf, as ghostscript is used for rendering.
http://www.imagemagick.org/Usage/basics/#stream