
How do I determine if a pdf is raster or vector

Posted: 2012-09-30T07:28:26-07:00
by yazinsai
Hello,

I'd like to know if there's a way to determine if a PDF file is raster or vector, using ImageMagick. I've spent the better part of half an hour looking for this and could not find the solution.

To clarify, I mean how do I determine if there are any raster elements in a PDF file.

Thanks!

Re: How do I determine if a pdf is raster or vector

Posted: 2012-09-30T10:17:56-07:00
by whugemann
I am pretty sure that there is no way of doing that with ImageMagick, because IM is essentially a raster image processor and hands PDF rendering over to Ghostscript.

Basically, you could search the PDF for the markers that introduce raster images. Much of a PDF's object structure is stored as plain text (the stream data itself is usually compressed binary), so you could use a simple text processor such as SED or even the operating system's text filter commands to perform this job.

You could also use pdfimages from Xpdf and try to extract images from the file. If that produces no output files, the PDF is very likely purely vector.

Re: How do I determine if a pdf is raster or vector

Posted: 2012-09-30T10:27:14-07:00
by yazinsai
Thanks. Unfortunately, I'm running on a shared host and there's no way they are going to allow me to install Xpdf.
I'm sure the pros have some way of figuring this out.

===
To give you some context on the application: I'm running a website where users can upload their files and we print them as large posters. I want to be able to check whether a user has uploaded a PDF that is purely vector, in which case no size restrictions apply (whereas for a raster PDF I would have to go by the resolution of the file).

Re: How do I determine if a pdf is raster or vector

Posted: 2012-10-01T09:51:43-07:00
by yazinsai
whugemann wrote: ... you could use a simple text processor such as SED or even the operating system's text filter commands to perform this job
Could you please suggest how this might work? I'm not so good with SED and I'm not sure what to search for.

Thanks a ton!
Yazin

Re: How do I determine if a pdf is raster or vector

Posted: 2012-10-01T12:41:00-07:00
by whugemann
yazinsai wrote:Could you please suggest how this might work?
Remember, this is a forum on ImageMagick, not on PDF treatment. But I think you will quickly find out if you just open several PDFs that contain raster graphics in a text editor. A quick check by myself offered "image/width" as a possible string to search for. If this is not contained in the PDF file, it's probably a pure vector PDF. But I have no experience of how foolproof this simple check is. You will possibly have to check for alternative methods of embedding raster graphics in the PDF, i.e. other clue words in the PDF.

If you take this approach, please let us know how successful it is.
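
For anyone who wants to survey their own files before settling on a search string, here is a minimal sketch that counts a few candidate clue words in a PDF. The function name count_pdf_markers and the marker list are assumptions chosen for illustration, not an exhaustive or authoritative set. It also relies on the -a and -o options of GNU/BSD grep, which may not exist in every grep.

```shell
# Count how often a few candidate "clue words" occur in a PDF.
# The marker list below is an assumption for illustration only.
count_pdf_markers() {
    pdf=$1
    for marker in '/Subtype[[:space:]]*/Image' '/ImageMask' '/DCTDecode'; do
        # -a: treat binary data as text, -o: one output line per match,
        # -E: extended regular expressions. wc -l then counts the matches.
        n=$(LC_ALL=C grep -a -o -E "$marker" "$pdf" | wc -l)
        # $((n)) strips any padding that wc may add around the number.
        printf '%s\t%s\n' "$marker" "$((n))"
    done
}

# Usage: count_pdf_markers thisfile.pdf
```

Running it over a handful of known raster and known vector PDFs should show which markers actually separate the two groups in your own uploads.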

Re: How do I determine if a pdf is raster or vector

Posted: 2012-10-01T19:33:45-07:00
by yazinsai
whugemann wrote:A quick check by myself offered "image/width" as a possible string to search for.
That's brilliant! Makes for a lovely hack. I'll try with a few pdf files and see what works best; thanks for the lead!! I'll update this when I find something out.

Re: How do I determine if a pdf is raster or vector

Posted: 2012-10-16T05:37:47-07:00
by yazinsai
This worked perfectly. I used the following grep command to determine if a raster object existed:


grep -c -i "/image" thisfile.pdf
whugemann, you ROCK!
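
One caveat on the grep above, offered as a sketch rather than a definitive fix: a case-insensitive search for "/image" also matches the /ImageB, /ImageC and /ImageI names that many PDFs list in a /ProcSet array, even when the file contains no images, so it may over-report. A stricter variant looks for the image XObject marker itself. The function name pdf_has_image_xobject is made up for this example, and the check will still miss inline images hidden inside compressed content streams.

```shell
# Report whether a PDF appears to contain an image XObject.
# Exit status 0: marker found, 1: no marker found.
# Assumption: image dictionaries are stored uncompressed in the file.
pdf_has_image_xobject() {
    # -a: treat binary data as text, -q: no output, only the exit status.
    # The trailing group rejects longer names that merely begin with /Image.
    LC_ALL=C grep -a -q -E '/Subtype[[:space:]]*/Image([^A-Za-z]|$)' "$1"
}

# Usage:
#   if pdf_has_image_xobject thisfile.pdf; then
#       echo "raster content likely"
#   else
#       echo "no image XObject marker found"
#   fi
```

Treat a negative result as "probably vector", not as proof.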