[SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-01T14:38:54-07:00
by wpK
Hello,
I'm using ImageMagick's (v6.3.7) convert tool on Ubuntu to convert PDFs to images. Although it works great, I'm running into a bit of a problem with text quality.
I've played around with the -unsharp parameter but still can't seem to get it right. I have two images, what I'm outputting and what I'd like to output, and I was hoping someone could give me a tip.
Original PDF:
http://www.irs.gov/pub/irs-pdf/fw9.pdf
Command:
Code:
convert fw9.pdf[0] -thumbnail 675x -quality 100 -units pixels-per-inch -density 72 page0.png
Image I can produce:
Image I'm trying to achieve:
Any help would be greatly appreciated.
Thanks in advance.
Re: PDF to PNG - Text Sharpness
Posted: 2009-09-01T16:12:09-07:00
by magick
Try this
- convert -density 400% fw9.pdf[0] -resize 25% page0.png
Re: PDF to PNG - Text Sharpness
Posted: 2009-09-01T17:02:38-07:00
by wpK
magick wrote:Try this
- convert -density 400% fw9.pdf[0] -resize 25% page0.png
That actually worked 100% for the quality. I did:
Code:
convert fw9.pdf[0] -density 400% -resize 680x page0.png
and it came out perfect. The only problem is that it takes a long time, especially when I convert each page to an image. I realize you trade speed for quality, and there might not be anything I can do about it, but is there anything I can add to speed it up?
Thanks again for your help, you helped so much.
Edit: Maybe something was wrong because it's now going a lot faster.
Thanks Magick! Also, great software, I'm very happy that something like ImageMagick exists as it made my life so much easier.
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-01T17:52:32-07:00
by anthony
The recommended method is
Code:
convert -density 288 fw9.pdf[0] -resample 72 page0.png
This reads the image at the higher density, then resamples it down to the density specified.
This has the advantage of ensuring the density meta-data setting of the final image is correct.
However if that output density is not important, whatever works for you is fine!
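A minimal sketch of that recommended method as a reusable helper. The function name pdf_page_png is hypothetical. It ends with -resample 72 because -resample is ImageMagick's operator for scaling the pixels to a new resolution, whereas a bare -density placed after the input only changes the stored resolution and leaves the pixel size alone.

```shell
# Hypothetical helper: render one PDF page at 288 dpi (4x the 72 dpi
# default) and resample it back to 72 dpi, so both the pixel size and
# the stored resolution end up at 72 dpi.
# Usage: pdf_page_png fw9.pdf 0 page0.png   (page index is 0-based)
pdf_page_png() {
  pdf=$1 page=$2 out=$3
  # -density must come BEFORE the input file: it is a read setting that
  # tells the PDF delegate what resolution to rasterize at.
  convert -density 288 "${pdf}[${page}]" -resample 72 "$out"
}
```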
See IM Examples, Postscript and PDF input:
http://www.imagemagick.org/Usage/text/#postscript
http://www.imagemagick.org/Usage/formats/#pdf
The former shows various techniques for modifying input documents, especially the page background. The latter includes special options such as limiting the size to the 'mediabox', or other factors.
wpK wrote:Edit: Maybe something was wrong because it's now going a lot faster.
It is going faster because you were reading the image BEFORE you set the input density! That means you are not reading the larger image (16 times as many pixels), and thus you lose the quality you originally wanted!
The order of the operations you are performing is critical!
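The 16x figure follows from simple arithmetic: a PDF page size is given in points (1/72 inch), so rendering at d dpi gives points × d / 72 pixels per side, and 4x the density in each direction means 16x the pixels. A sketch, assuming a US Letter page of 612 × 792 points (which matches the 1020x1320 images wpK later gets from Ghostscript at 120 dpi); the function names are hypothetical.

```shell
# Pixel size of a page rendered at a given dpi.
# Arguments: width and height in PDF points (1/72 inch), then the dpi.
page_pixels() {
  w_pt=$1 h_pt=$2 dpi=$3
  echo "$(( w_pt * dpi / 72 ))x$(( h_pt * dpi / 72 ))"
}

# How many times more pixels the higher density produces
# (integer result, fine for 72 vs 288).
pixel_ratio() {
  low=$1 high=$2
  echo $(( (high * high) / (low * low) ))
}
```

For example, page_pixels 612 792 72 prints 612x792, page_pixels 612 792 288 prints 2448x3168, and pixel_ratio 72 288 prints 16.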
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-01T17:55:33-07:00
by wpK
anthony wrote:However if that output density is not important, whatever works for you is fine!
Yeah, I just realized that the reason the speed changed was that I moved -density 400% before the PDF file. I see your point; the output density doesn't matter, what matters is quality and speed. Right now it takes 4-5 seconds just for that first page.
Is there anything I could be doing to possibly speed it up?
Thanks for the tip anthony.
Edit: I just noticed the links you posted, going through them now. Thanks
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-01T18:05:25-07:00
by anthony
wpK wrote:Is there anything I could be doing to possibly speed it up?
See that first link. At the end is how to call ghostscript directly to handle the initial conversion to a raster format. You can also see what IM is doing by looking at the delegate commands.
http://www.imagemagick.org/Usage/files/#delegates
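One way to look at those delegate commands is to list them and filter for Ghostscript. A sketch, assuming IM 6's `convert -list delegate` output and that the PostScript/PDF delegate calls an executable named `gs` (the usual default on Linux; the exact listing format varies between versions). The function name is hypothetical.

```shell
# Print the configured delegate entries that invoke Ghostscript ("gs"),
# to see the exact command line IM runs when reading a PDF.
show_gs_delegates() {
  convert -list delegate | grep -w 'gs'
}
```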
As a security measure for 'delegates', IM performs a number of extra file copies, and for large files this can be very slow. Directly using ghostscript to generate ANY raster image output, which can then be processed by IM "convert" as appropriate, will be a lot faster.
Also you can speed up ghostscript by extracting the one page from the PDF before passing it to ghostscript, so that ghostscript itself only handles ONE page.
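Ghostscript can also restrict rendering to a page range itself with -dFirstPage and -dLastPage, which may be enough without a separate extraction step (it still opens the whole PDF, but only rasterizes the requested page). A sketch; the function name is hypothetical, and page numbers here are 1-based, unlike IM's [0].

```shell
# Render a single page (1-based) of a PDF straight to PNG with Ghostscript,
# bypassing ImageMagick's delegate and its extra file copies.
# Usage: gs_page_png fw9.pdf 1 288 page1.png
gs_page_png() {
  pdf=$1 page=$2 dpi=$3 out=$4
  gs -q -dSAFER -dBATCH -dNOPAUSE \
     -sDEVICE=png16m -r"$dpi" \
     -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
     -dFirstPage="$page" -dLastPage="$page" \
     -sOutputFile="$out" "$pdf"
}
```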
Basically Imagemagick does not really know much about vector file formats like PDF. It is after all a raster image processor. See
A word about Vector File Formats
http://www.imagemagick.org/Usage/formats/#vector
As such, using the specialized tools more intelligently to get exactly what you want will always produce faster results.
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-01T18:18:05-07:00
by wpK
anthony wrote:wpK wrote:Is there anything I could be doing to possibly speed it up?
So your recommendation is to use GhostScript directly to create an image (at 400% density, for instance) and then use IM to resize and save it, which is faster because IM uses GS anyway and I can skip the other things IM does (like the file copies).
If that's what you're saying, it sounds good; I'll definitely switch to that as soon as possible. Thanks again for your help, Anthony. Speed is a very important factor and I honestly wouldn't have thought about using GS directly to speed things up.
It also seems there is a lot of information on the site I should be reading; you guys did a good job filling the Usage section.
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-01T18:29:05-07:00
by anthony
My full suggestion is
- use some pdf tool to extract ONE PAGE
- run it through ghostscript to some form of raster image at a high density/resolution
- use "convert" to read, resize, and do whatever other processing you need, before saving the image in the image file format wanted.
If you can do the above in a pipeline without saving to temporary files then even better, as that will save a lot of IO usage too.
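A sketch of those three steps, assuming pdftk as the page-extraction tool (any tool that writes a single-page PDF would do; the function name is hypothetical). The extracted page goes to a temporary file because Ghostscript generally needs random access to a PDF, but the Ghostscript-to-convert step is a pipe: -sOutputFile=- sends the raster to stdout, -q keeps Ghostscript's messages out of that stream, and convert reads it as ppm:- (the ppmraw device skips PNG compression on the intermediate image).

```shell
# Step 1: extract one page; step 2: rasterize it at 288 dpi to stdout;
# step 3: resize in ImageMagick and write the final format.
# Usage: render_page fw9.pdf 1 page1.png   (page is 1-based for pdftk)
render_page() {
  pdf=$1 page=$2 out=$3
  tmp=$(mktemp) || return 1
  pdftk "$pdf" cat "$page" output "$tmp" &&
    gs -q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=ppmraw -r288 \
       -sOutputFile=- "$tmp" |
    convert ppm:- -resize 25% "$out"
  status=$?
  rm -f "$tmp"   # clean up the single-page PDF whatever happened
  return "$status"
}
```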
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-02T06:53:48-07:00
by wpK
Well I actually need to convert every single page of each PDF to an image. The reason I used [0] in my example was because I was trying to get the two pages to look the same and didn't want to wait for all 4 pages to convert.
Thank you for your suggestions, I'll definitely have to use GhostScript directly. Whenever I get to switching everything over I'll post back here with the result for anyone else with the same question.
Thanks again Anthony, you've been very helpful.
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-02T08:29:17-07:00
by wpK
You weren't lying about GhostScript being faster. It's converting MUCH faster with the following command:
Code:
gs -sDEVICE=pngalpha -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -dDOINTERPOLATE -sOutputFile=page%d.png -dSAFER -dBATCH -dNOPAUSE -r120% fw9.pdf
and the quality is coming out amazing.
It still seems like resizing is slow even when I'm giving it PNGs instead of a PDF. GhostScript is actually generating these images faster than ImageMagick is able to resize them.
The image I'm trying to resize is black and white (same page as the first post above) and I'm telling it to resize from 1020x1320 to 810x1048 and it takes around 1500ms per resize. This could be normal, I'm not 100% sure.
The server is Ubuntu with a Pentium D 2.8GHz and 3GB of RAM. (Not the fastest computer in the entire world so it could be that.)
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-02T19:29:01-07:00
by anthony
Why do you have a percent symbol for the -r or (resolution / density) option?
Also, for comparison, you would want to use -r480 and then pipe the output into IM for the final resize.
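For comparison with the 120 dpi run, a sketch that renders each page at 480 dpi and pipes it straight into convert for a 25% resize (giving 120 dpi output), one Ghostscript call per page so no intermediate files are written. It assumes pdfinfo (from poppler-utils) for the page count; the function name and the PREFIX-N.png naming are hypothetical.

```shell
# Render every page of a PDF at 480 dpi and shrink it to 25% (120 dpi).
# Ghostscript writes each page to stdout; convert reads it from stdin.
# Usage: render_all_pages fw9.pdf page   -> page-1.png, page-2.png, ...
render_all_pages() {
  pdf=$1 prefix=$2
  pages=$(pdfinfo "$pdf" | awk '/^Pages:/ { print $2 }')
  [ -n "$pages" ] || return 1
  p=1
  while [ "$p" -le "$pages" ]; do
    gs -q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=ppmraw -r480 \
       -dFirstPage="$p" -dLastPage="$p" -sOutputFile=- "$pdf" |
      convert ppm:- -resize 25% "${prefix}-${p}.png" || return 1
    p=$((p + 1))
  done
}
```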
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-03T05:48:33-07:00
by wpK
Well, here's the thing: I'll be generating an image per page of the PDF with GhostScript. Is it really possible to pipe all of those images from GhostScript into ImageMagick?
Let me explain in a bit more detail what I'm doing exactly.
I have a Java application that takes in a file; that file gets converted to a PDF (with OpenOffice), which then gets sent to GhostScript to turn it into images (thanks to you).
After that, I run mogrify three times (and if there is a way around this I'd be VERY happy) to create three versions of each page image. A small, medium, and large sized versions of each page image.
After that I take these images and send them to a storage server and update the database.
Short story:
If there is a way to pipe multiple images from GhostScript into mogrify (or something similar) that can generate three different versions of each page with as little I/O as possible, I'd be VERY happy. If you know something off the top of your head it would be appreciated, although you have already helped so much, I'm not expecting anything (nor was I before).
Thanks again Anthony.
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-03T08:10:43-07:00
by HugoRune
If you can redirect the output of ghostscript to a pipe, you can then handle it with
"GHOSTSCRIPT_COMMAND | convert - -resize 25% outfile.png".
Use "-" instead of the input filename.
wpK wrote:
After that, I run mogrify three times (and if there is a way around this I'd be VERY happy) to create three versions of each page image. A small, medium, and large sized versions of each page image.
You can generate multiple versions in one command:
convert infile.png -resize 50% -write large.png -resize 50% -write medium.png -resize 50% small.png
Not sure about the syntax in mogrify.
The quality of the smaller image might be slightly lower than if you resize them directly, especially if you are using scaling factors that are not multiples of 2
To avoid this use the following command which is slightly slower and needs more ram:
convert infile.png \( +clone -resize 50% -write large.png +delete \) \( +clone -resize 25% -write medium.png +delete \) -resize 12.5% small.png
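Putting HugoRune's two suggestions together: a sketch that pipes one Ghostscript-rendered page into a single convert call writing all three sizes, so each page is rasterized once and nothing is re-read from disk. It follows the +clone form above, so every size is resized from the full-resolution render; the 50/25/12.5% factors come from HugoRune's example, and the function and output names are hypothetical.

```shell
# Rasterize one page (1-based) at 288 dpi and write large/medium/small
# versions in one convert run; each size is resized from the original.
# Usage: page_three_sizes fw9.pdf 1 page1   -> page1-large.png, ...
page_three_sizes() {
  pdf=$1 page=$2 base=$3
  gs -q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=ppmraw -r288 \
     -dFirstPage="$page" -dLastPage="$page" -sOutputFile=- "$pdf" |
    convert ppm:- \
      \( +clone -resize 50% -write "${base}-large.png" +delete \) \
      \( +clone -resize 25% -write "${base}-medium.png" +delete \) \
      -resize 12.5% "${base}-small.png"
}
```

Calling it once per page number covers the whole document.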
If speed of the resize is a concern, check out the various options at
http://www.imagemagick.org/Usage/resize/
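If -resize itself is the bottleneck, ImageMagick's -scale (pixel block averaging) and -sample (pixel subsampling, no averaging) are generally quicker than -resize's default filter, at some cost in quality; text in particular can look jagged with -sample, so compare the results. A sketch of a hypothetical helper that switches operator so the trade-off can be timed, e.g. with `time shrink scale page-1.png small.png 810x1048`:

```shell
# Shrink an image with a chosen ImageMagick operator (resize, scale or
# sample) so speed and quality of each can be compared on the same input.
shrink() {
  op=$1 in=$2 out=$3 geometry=$4
  case $op in
    resize|scale|sample) ;;
    *) echo "shrink: unknown operator '$op'" >&2; return 2 ;;
  esac
  convert "$in" "-$op" "$geometry" "$out"
}
```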
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-03T09:55:25-07:00
by wpK
Thanks HugoRune,
That is very helpful. I'll have to check ImageMagick's and GhostScript's documentation to see if GhostScript could somehow pipe multiple images into ImageMagick.
Thanks!
Re: [SOLVED] PDF to PNG - Text Sharpness
Posted: 2009-09-03T10:10:14-07:00
by fmw42
mogrify does not appear to let you do a -write for the other sizes. I tried but was unsuccessful. Naming of outputs is an issue in mogrify; you have little control.
The other way is to write all your images into a directory, then run mogrify multiple times, writing to a different directory for each size.
Or write a script that uses convert in place of mogrify, as HugoRune suggested, to write all the sizes in one command; the script needs to loop over each starting image in the directory.
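A sketch of that script, assuming the page images are PNGs in one directory and that each size goes to its own subdirectory (the directory names, the function name, and the 50/25/12.5% factors taken from HugoRune's example are all assumptions):

```shell
# For each PNG in the source directory, write large/medium/small copies
# with one convert call per image instead of three mogrify passes.
# Usage: make_sizes pages/   -> pages/large/, pages/medium/, pages/small/
make_sizes() {
  dir=${1%/}
  mkdir -p "$dir/large" "$dir/medium" "$dir/small" || return 1
  for f in "$dir"/*.png; do
    [ -e "$f" ] || continue   # no PNGs: the glob stays unexpanded
    name=${f##*/}             # file name without the directory part
    convert "$f" \
      \( +clone -resize 50% -write "$dir/large/$name" +delete \) \
      \( +clone -resize 25% -write "$dir/medium/$name" +delete \) \
      -resize 12.5% "$dir/small/$name" || return 1
  done
}
```

The glob only matches files directly in the source directory, so the new size subdirectories are not picked up on a second run.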