
Getting Resolution of Multi-Page PDFs *Before* Splitting?

Posted: 2007-10-25T07:33:26-07:00
by RadicalBender
I want to split a multipage PDF into individual pages, but I need to know the PDF's resolution before I split it (so that each page has the same resolution as the original file). The catch is that I'm supposed to set the resolution before I read the file, yet I can't know which resolution to set until I've read the file and *gotten* its resolution.

I tried to work around this by creating one MagickWand resource to read and then a second one to split the PDF. The problem is when I try to get the resolution of a multipage PDF with MagickGetImageResolution(), it returns no value (and neither does MagickGetImageHeight(), MagickGetImageWidth() or MagickGetImageFormat()). Why don't these values show up in a multipage PDF? (Height and width I can understand, but shouldn't format work at least?) And how can I work around this?

--Ben

Re: Getting Resolution of Multi-Page PDFs *Before* Splitting?

Posted: 2007-10-25T07:58:51-07:00
by magick
PDF is resolution independent. You set the resolution you want (default 72 DPI) and the PDF is rendered at that resolution.
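To make "rendered at that resolution" concrete: a PDF page has a physical size in points (1 pt = 1/72 inch), and the raster you get back is that size multiplied by the density you set before reading. The PHP sketch below is a hypothetical helper (not part of MagickWand) and assumes the page size in points is already known.

```php
<?php
// Hypothetical helper: pixel size of a PDF page rasterized at $dpi.
// Assumes the page size is given in PDF points (1 pt = 1/72 inch).
function pdfPagePixels(float $widthPt, float $heightPt, float $dpi): array
{
    // Points -> inches (divide by 72), then inches -> pixels (multiply by dpi).
    return [
        (int) round($widthPt / 72 * $dpi),
        (int) round($heightPt / 72 * $dpi),
    ];
}

// A US Letter page is 612 x 792 points.
vprintf("%d x %d\n", pdfPagePixels(612, 792, 72));   // 612 x 792
vprintf("%d x %d\n", pdfPagePixels(612, 792, 150));  // 1275 x 1650
```

So there is no "original" resolution to preserve: the same page simply produces a larger pixel grid at a higher density, which is also why raising the density makes each page's raster bigger.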

Re: Getting Resolution of Multi-Page PDFs *Before* Splitting?

Posted: 2007-10-25T09:45:22-07:00
by RadicalBender
Okay...maybe I'm looking at the problem wrong. Let me explain my situation and see if maybe my approach is off.

I have several multipage PDFs that I'm testing. (I pulled several from the internet at random and created several of my own to get a broad range of possibilities.) What I'm trying to do is split the multipage PDF into individual PDF pages and then turn them into JPG thumbnails.

However, I'm having three sets of problems:

1. About half of the multipage PDF files fail miserably. They don't get read, they return no information (no height, width, image format, resolution, etc.), it doesn't loop through any pages, no files are written and the whole script fails. I can't figure out why these files specifically have this problem. They open fine in Acrobat Reader (and Preview on the Mac), but ImageMagick refuses to do anything with these files. This code returns no information from these kinds of multipage PDFs:

$w = NewMagickWand();
MagickReadImage($w,$UploadFile);
$OrigFileWidth = MagickGetImageWidth($w);
$OrigFileHeight = MagickGetImageHeight($w);
$OrigFileFormat = MagickGetImageFormat($w);
$OrigFileResolution = MagickGetImageResolution($w);
$OrigFileResolutionX = $OrigFileResolution[0];
$OrigFileResolutionY = $OrigFileResolution[1];
ClearMagickWand($w);

echo '<p>File: ' . $FileName . '<br />'; // Pulled from PHP -- This actually displays
echo 'Upload Location: ' . $UploadDir . '<br />'; // Pulled from PHP -- This actually displays
echo 'File Format: ' . $OrigFileFormat . '<br />'; // Displays no information
echo 'Dimensions: ' . $OrigFileWidth . '&times;' . $OrigFileHeight . '<br />'; // Displays no information 
echo 'Resolution: ' . $OrigFileResolutionX . '&times;' . $OrigFileResolutionY . '</p>'; // Displays no information

2. A few work exactly the way I want, except that the resulting files range in quality from "almost acceptable" to "horribly disfigured." This is why I was asking about the resolution, thinking that might be the problem. However, bumping the resolution up from 72x72 to 150x150 (or even just 96x96) causes the thumbnails to shrink to a tiny portion of the resulting image. I don't know what's happening here. Is there a setting to get text anti-aliasing in the rasterized PDF document? That seems to be the biggest problem I'm having with these files. See the image below (right is the local version, left is ImageMagick's):
[Image: ImageMagick's rendering (left) next to the local rendering (right)]

3. In one file that I'm testing, I'm having problems with what I'm assuming is a character encoding issue: lots of boxes and other weird characters. What causes this and can I fix it or work around it somehow? Screenshot:
[Screenshot: rendered page showing boxes and other garbled characters]

Any help you can give me will save me a lot of frustration. Thanks!

--Ben

Re: Getting Resolution of Multi-Page PDFs *Before* Splitting?

Posted: 2007-10-25T09:51:32-07:00
by magick
ImageMagick may not be the best solution for near-real-time rendering of PDF. To get better quality, you need the very latest version of Ghostscript; then type:
  • convert -density 400x400 image.pdf -resize 25% image.png
You'll see a much-improved rendering. Another option is to get more/better fonts for your version of Ghostscript, since the default fonts are not very good. Finally, consider a commercial PDF engine such as Adobe's products or Appligent.
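Rendering at 400 DPI and then resizing to 25% leaves the pixel size of a 100 DPI render, but each output pixel averages roughly a 4x4 block of rendered pixels, which is what smooths the text. Applied to one JPEG thumbnail per page, the suggestion could look roughly like the PHP sketch below. It assumes the `convert` command-line tool is installed and that the page count is already known (three pages here, purely for illustration); the helper names and output file names are hypothetical. It only builds the command strings; running them (e.g. with `exec()`) is left out.

```php
<?php
// Hypothetical helper: effective resolution of the final image,
// i.e. render density scaled by the resize percentage.
function effectiveDpi(float $density, float $resizePercent): float
{
    return $density * $resizePercent / 100;
}

// Hypothetical helper: one convert command for one page.
// "file.pdf[N]" selects page N (0-based). -density comes before the
// input file so Ghostscript renders the page at that resolution.
function pageToJpegCommand(string $pdf, int $page, int $density, int $resizePercent): string
{
    // Quote arguments so the shell does not treat "[0]" as a glob pattern.
    $input  = escapeshellarg(sprintf('%s[%d]', $pdf, $page));
    $output = escapeshellarg(sprintf('page-%03d.jpg', $page + 1));
    return sprintf('convert -density %dx%d %s -resize %d%% %s',
        $density, $density, $input, $resizePercent, $output);
}

echo effectiveDpi(400, 25), "\n";  // 100

// Assumed page count of 3, for illustration only.
for ($page = 0; $page < 3; $page++) {
    echo pageToJpegCommand('image.pdf', $page, 400, 25), "\n";
}
```

Higher densities give smoother text at the cost of rendering time and memory, so 400 DPI is a starting point rather than a fixed rule.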