trouble with pdf pages of different sizes.

Posted: 2007-04-04T11:27:09-07:00
by dlink
Hi.
We are having trouble extending our use of ImageMagick (version 6.0.7) to converting PDF pages into JPEG images.

Currently we are doing this, and it works fine. It generates a JPEG for each page:
mogrify -rotate "90<" -resize 800 -format jpeg document.pdf

However, we now have PDF documents whose first page is a different size (larger) than the subsequent pages. ImageMagick seems to use the geometry of that first page no matter what we do. For example, this does not work:

mogrify -rotate "90<" -crop 567x439+0+0 -resize 800 -format jpeg document.pdf

The cropped image is correct, but it is not getting resized up to 800. Any suggestions would be great. Thanks.

Re: trouble with pdf pages of different sizes.

Posted: 2007-04-04T16:36:45-07:00
by anthony
Yes, don't use mogrify. mogrify is fine for simple operations on simple single images.
I suggest you use a looped convert instead. Just make sure you read the image in before
you modify it with operation options.
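
A looped convert along these lines might look like the following sketch. It assumes a POSIX shell, that identify prints one line per page (as in the identify listings later in this thread), and an output naming scheme of page-N.jpg. The CONVERT and IDENTIFY variables and the function name are hypothetical conveniences, not ImageMagick features.

```shell
#!/bin/sh
# Sketch only: convert each page of a PDF on its own, so that every page
# is read in before any operator is applied to it.
# CONVERT / IDENTIFY are hypothetical override hooks (assumptions).
CONVERT=${CONVERT:-convert}
IDENTIFY=${IDENTIFY:-identify}

pdf_to_jpegs() {
    pdf=$1
    # identify prints one line per page; $(( )) strips any padding from wc
    pages=$(( $("$IDENTIFY" "$pdf" | wc -l) ))
    i=0
    while [ "$i" -lt "$pages" ]; do
        # Order matters: input page first, then operators, then output.
        "$CONVERT" "${pdf}[$i]" -rotate '90<' -resize 800 "page-$i.jpg"
        i=$((i + 1))
    done
}
```

Calling pdf_to_jpegs document.pdf would then run one convert per page, each page being sized from its own geometry rather than from the first page.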

Re: trouble with pdf pages of different sizes.

Posted: 2007-04-05T13:36:24-07:00
by dlink
I am trying with the convert utility now, but still having trouble. Adding a resize after the crop seems to make the crop not work. For example:

convert -rotate "90<" -crop 819x646+0+0 -format jpeg document.pdf[2] x.jpg

crops the correct image area. However, after adding the -resize option, the crop no longer seems to have any effect and the entire image is resized:

convert -rotate "90<" -crop 819x646+0+0 -resize 800 -format jpeg document.pdf[2] x.jpg

If I place the resize before the crop they both work, but this is not a convenient way of working:

convert -rotate "90<" -resize 2237 -crop 800x620+0+0 -format jpeg document.pdf[2] x.jpg

Thanks for your help.

Re: trouble with pdf pages of different sizes.

Posted: 2007-04-06T04:29:57-07:00
by anthony
It may be an interaction with the 'virtual canvas' information that crop leaves behind.
Try adding a +repage after the crop.

See Removing Canvas/Page Geometry
http://www.imagemagick.org/Usage/crop/#crop_repage

After the resize you can also try -set page A4
to set the image's 'virtual canvas' or 'page' for the PDF.

Please let me know how it goes.
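
As a sketch of this suggestion (not a confirmed fix), the command could read the page first, crop, drop the virtual canvas with +repage, and only then resize. The function name crop_then_resize and the CONVERT override variable are hypothetical; the geometry values in the usage line come from the earlier post.

```shell
#!/bin/sh
# CONVERT is a hypothetical hook for overriding the binary (an assumption).
CONVERT=${CONVERT:-convert}

# crop_then_resize INPUT CROP_GEOMETRY WIDTH OUTPUT
crop_then_resize() {
    # Read the page, rotate it if it is taller than wide, crop, then
    # +repage so the crop's leftover virtual canvas is cleared before
    # the resize runs.
    "$CONVERT" "$1" -rotate '90<' -crop "$2" +repage -resize "$3" "$4"
}
```

For example: crop_then_resize 'document.pdf[2]' 819x646+0+0 800 x.jpg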

Re: trouble with pdf pages of different sizes.

Posted: 2007-04-06T08:50:13-07:00
by dlink
Thank you Anthony for your posts.

The +repage doesn't have any effect.

I think the real problem is that the page geometry is being set by the first page in the PDF, which is unusually wide. The code we have working for the subsequent pages now needs to be modified to treat those pages as though they were as wide as the first page. A complex exercise in measurement transformations.

Thanks again for your help.

Re: trouble with pdf pages of different sizes.

Posted: 2007-04-07T05:24:44-07:00
by anthony
Sorry to hear it didn't work. Check with -identify before your save to PDF and see
what IM has to work with at that point.

In any case, let us know what you come up with. Don't just leave us hanging, as others may have similar problems.

Re: trouble with pdf pages of different sizes.

Posted: 2007-07-05T09:58:58-07:00
by nathanziarek
I think I am experiencing something similar.

Basically, I have a bunch of PDFs that I want to create thumbnails for. Some are docs, but many more are PowerPoints or Excel charts that may have odd sizes. IM appears to be taking the PDF, rasterizing it, and then fitting it to an A4 sheet of paper.

What I'd like:
Image

What I'm getting:
Image

...or in the case of a document:

What I'd like:
Image

What I'm getting:
Image

For the document, it seems to just be adding height to the page (width seems OK), and then centering the content. For the PowerPoint PDF, it adds white space up top.

The PDFs look like the "What I'd Like" thumbs. Any further suggestions?

Nate

Re: trouble with pdf pages of different sizes.

Posted: 2007-07-05T10:02:37-07:00
by nathanziarek
I've tried a number of different options with "+repage" and "-size", with the same result.

A sample "identify" from a PDF is listed below:

Code:

Image: 1090.pdf
  Format: PDF (Portable Document Format)
  Geometry: 612x842
  Class: DirectClass
  Type: TrueColor
  Endianess: Undefined
  Colorspace: RGB
  Channel depth:
    Red: 8-bits
    Green: 8-bits
    Blue: 8-bits
  Channel statistics:
    Red:
      Min: 0 (0)
      Max: 255 (1)
      Mean: 242.061 (0.949257)
      Standard deviation: 44.952 (0.176282)
    Green:
      Min: 0 (0)
      Max: 255 (1)
      Mean: 242.593 (0.951345)
      Standard deviation: 42.9759 (0.168533)
    Blue:
      Min: 0 (0)
      Max: 255 (1)
      Mean: 245.41 (0.96239)
      Standard deviation: 35.6866 (0.139947)
  Colors: 1605
  Rendering-intent: Undefined
  Resolution: 72x72
  Units: Undefined
  Filesize: 1.5mb
  Interlace: None
  Background Color: white
  Border Color: #DFDFDF
  Matte Color: grey74
  Page geometry: 612x842+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: Undefined
  Orientation: Undefined
  Comment:  Image generated by ESP Ghostscript (device=pnmraw)

  Signature: 6475bcba2a703dd4e34d24ef04b07ebd3ad26be2f6486b1b6305d7bd1017da16
  Tainted: False
  Version: ImageMagick 6.2.4 02/16/07 Q16 http://www.imagemagick.org
Image: 1090.pdf
  Format: PDF (Portable Document Format)
  Geometry: 612x842
  Class: PseudoClass
  Type: Bilevel
  Endianess: Undefined
  Colorspace: Gray
  Channel depth:
    Gray: 1-bits
  Channel statistics:
    Gray:
      Min: 1 (1)
      Max: 1 (1)
      Mean: 1 (1)
      Standard deviation: 0 (0)
  Colors: 2
  Histogram:
    515304: (255,255,255)       white
  Rendering-intent: Undefined
  Resolution: 72x72
  Units: Undefined
  Filesize: 1.5mb
  Interlace: None
  Background Color: white
  Border Color: #DFDFDF
  Matte Color: grey74
  Page geometry: 612x842+0+0
  Dispose: Undefined
  Iterations: 0
  Scene: 1
  Compression: Undefined
  Orientation: Undefined
  Comment:  Image generated by ESP Ghostscript (device=pnmraw)

  Signature: 5bb6259c6b6bc718959ca769757433d97c00d3da7e974fbdf0f8e45f84823597
  Tainted: False
  User Time: 0.830u
  Elapsed Time: 0:02
  Pixels per second: 503kb
  Version: ImageMagick 6.2.4 02/16/07 Q16 http://www.imagemagick.org

Re: trouble with pdf pages of different sizes.

Posted: 2007-07-05T16:57:36-07:00
by anthony
It may be that IM always uses a default page size for PDF and PostScript conversion. That is, it is not sizing the page to the size given in the PDF document itself. The image size above is that of an A4 page at 72 dpi resolution.

It is probably caused by the way IM uses Ghostscript to do the image conversion.

This could be classed as a bug or a feature, though in your case it is probably a bug.

The only solution I can see is to try to set the page size and density correctly based on information in the PDF file. How IM can do this, or how you can do this, that is the question.

If you would like to see this fixed, all I can suggest is to try to determine how to get the page info, and put in a 'bug report' with the solution to be incorporated into IM. Without a solution or method to fix it, any report is likely to take time, as it will go into a todo.
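
One possible way to get at that page info is a separate tool. The sketch below assumes pdfinfo (from Xpdf or Poppler; it is not mentioned in this thread and is not part of ImageMagick) is installed and prints a line of the form "Page size: 720 x 540 pts". The PDFINFO variable and the function name pdf_page_size are hypothetical.

```shell
#!/bin/sh
# PDFINFO is a hypothetical hook for overriding the binary (an assumption).
PDFINFO=${PDFINFO:-pdfinfo}

# Print the page size reported by pdfinfo, in points, as WIDTHxHEIGHT.
pdf_page_size() {
    # The line looks like "Page size:  720 x 540 pts", so the width is
    # field 3 and the height is field 5.
    "$PDFINFO" "$1" | awk '/^Page size:/ { print $3 "x" $5; exit }'
}
```

At 72 dpi one point is one pixel, so the value printed could be compared directly with the geometry that identify reports.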

Re: trouble with pdf pages of different sizes.

Posted: 2007-08-22T10:04:18-07:00
by nathanziarek
I've kept plugging away at this with no resolution until today. Very, very small step forward, but a step forward nonetheless. On a whim I tried using the ImageMagick Studio online to see what results it gave...and it works perfectly.

Example: 1373.pdf is a PowerPoint converted to PDF with OpenOffice.org 2...

on my local system, typing "identify 1373.pdf" provides:

Code:

1373.pdf[0] PDF 720x842 720x842+0+0 DirectClass 26.0mb 0.660u 0:02
1373.pdf[1] PDF 720x842 720x842+0+0 DirectClass 26.0mb 0.620u 0:02
1373.pdf[2] PDF 720x842 720x842+0+0 DirectClass 26.0mb 0.580u 0:02
1373.pdf[3] PDF 720x842 720x842+0+0 DirectClass 26.0mb 0.540u 0:02
1373.pdf[4] PDF 720x842 720x842+0+0 DirectClass 26.0mb 0.490u 0:02
1373.pdf[5] PDF 720x842 720x842+0+0 DirectClass 26.0mb 0.450u 0:02
1373.pdf[6] PDF 720x842 720x842+0+0 DirectClass 26.0mb 0.400u 0:02
1373.pdf[7] PDF 720x842 720x842+0+0 DirectClass 26.0mb 0.360u 0:02
1373.pdf[8] PDF 720x842 720x842+0+0 DirectClass 26.0mb 0.340u 0:02
1373.pdf[9] PDF 720x842 720x842+0+0 DirectClass 26.0mb 0.290u 0:02
1373.pdf[10] PDF 720x842 720x842+0+0 DirectClass 26.0mb 
1373.pdf[11] PDF 720x842 720x842+0+0 DirectClass 26.0mb 
1373.pdf[12] PDF 720x842 720x842+0+0 DirectClass 26.0mb 
1373.pdf[13] PDF 720x842 720x842+0+0 DirectClass 26.0mb 
1373.pdf[14] PDF 720x842 720x842+0+0 DirectClass 26.0mb 
...the important part is the 720x842. This particular PowerPoint file is wider than it is tall and opening it in any PDF viewer seems to work (i.e. it looks as it should).

Using the ImageMagick Studio identify command, I get:

Code:

Image: 1373.pdf
  Base filename: MagickStudio.mpc
  Format: pdf (Portable Document Format)
  Class: DirectClass
  Geometry: 720x540+0+0
  Type: Palette
  Endianess: Undefined...
...or the correct size (at least in ratio). The conversion process, then, also produces a correctly sized image.
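
A quick arithmetic check suggests where the two geometries above may come from. At 72 dpi one pixel is one point (1/72 inch). The paper sizes used here are standard values, but that this particular slide is 10 x 7.5 inches is an assumption; the helper names are hypothetical.

```shell
#!/bin/sh
# Convert lengths to points (1 pt = 1/72 inch), rounded to whole points.
in_to_pt() { awk -v v="$1" 'BEGIN { printf "%.0f\n", v * 72 }'; }
mm_to_pt() { awk -v v="$1" 'BEGIN { printf "%.0f\n", v / 25.4 * 72 }'; }

echo "10 x 7.5 in slide:       $(in_to_pt 10)x$(in_to_pt 7.5)"
echo "A4 (210 x 297 mm):       $(mm_to_pt 210)x$(mm_to_pt 297)"
echo "US Letter (8.5 x 11 in): $(in_to_pt 8.5)x$(in_to_pt 11)"
```

This gives 720x540, 595x842 and 612x792. The 842 height reported locally therefore matches A4 height rather than the slide, and the 612x842 seen earlier in the thread looks like Letter width combined with A4 height.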

So, the question is, where is my system out of date in comparison to IMStudio? I am running IM 6.2.4 and Ghostscript 815.04, both of which seem to be the most recent stable builds. Is it possible that IMStudio isn't using GS? What would I substitute?

Hopefully this will shed some light on the matter.

Thanks!

Nate

Re: trouble with pdf pages of different sizes.

Posted: 2007-08-22T10:44:12-07:00
by nathanziarek
...continuing...

My box uses ESP Ghostscript and IM Studio uses GNU Ghostscript (although the difference I know not). I would guess the problem lies in the ESP Ghostscript converter.

Any ideas on how to install the GNU Ghostscript converter in Ubuntu and have IM use it instead of the ESP converter?

Nate

Re: trouble with pdf pages of different sizes.

Posted: 2007-08-22T11:29:10-07:00
by nathanziarek
... Removing gs-esp from the machine and installing gs-gpl (apparently the new name of GNU GS) has changed the "Image Generated By..." line to

Code:

Comment:  Image generated by GPL Ghostscript (device=pnmraw)
...installing the AFPL Ghostscript is the same yet again.

Code:

Comment:  Image generated by AFPL Ghostscript (device=pnmraw)
In all cases, the information stays the same...that is to say, wrong. IMStudio translates the files great though, so I must be close-ish.

Nate