PDF to image causes 1st page margin
I am trying to convert a PDF to a jpg/png. They need to be exact every time, and the first page of the PDF is producing a margin on the right as shown in the attached picture. Anyone have any ideas why this might be happening?
[php]
$Page = new Imagick();
$Page -> setResolution(450,450);
$Page -> readImage("temp2.pdf");
$Count = $Page -> getNumberImages();
for($C = 0; $C < $Count; $C++) {
$Page -> readImage("temp2.pdf[$C]");
$Page -> setImageFormat('png');
$Page -> writeImages("Results-$C.png", false);
}
[/php]
I also used the below code first, but neither works.
[php]
$Page = new Imagick();
$Page -> setResolution(450,450);
$Page -> readImage("temp2.pdf");
$Page -> setImageFormat('png');
$Page -> writeImages("Results.png", false);
[/php]
- Attachments
- margin.jpg (39.42 KiB)
- fmw42
- Posts: 25562
- Joined: 2007-07-02T17:14:51-07:00
- Authentication code: 1152
- Location: Sunnyvale, California, USA
Re: PDF to image causes 1st page margin
Just some guesses. The pdf may have a larger canvas than the image it contains. If this is a virtual canvas, then it can be removed by adding +repage after reading the image. See http://www.imagemagick.org/script/comma ... php#repage
However, I am more inclined to believe that the pdf has a clip path that IM is not using. See http://www.imagemagick.org/script/comma ... #clip-path and http://www.imagemagick.org/script/comma ... s.php#clip
Can you post either your pdf file or the results from the command line command of:
identify -verbose image.pdf
You should be able to get that from PHP exec() on the command.
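A minimal sketch of the virtual-canvas check and the +repage idea, assuming ImageMagick 6's `identify` and `convert` are on the PATH. The function names and the `IM_IDENTIFY` / `IM_CONVERT` override variables are made up for illustration, and `temp2.pdf` is the file name from the question. Whether +repage actually removes the margin depends on the PDF.

```shell
#!/bin/sh
# Binaries can be overridden (hypothetical variables), e.g. IM_CONVERT=echo for a dry run.
IM_IDENTIFY="${IM_IDENTIFY:-identify}"
IM_CONVERT="${IM_CONVERT:-convert}"

# Print, for each page: index, pixel size, and virtual canvas (page) geometry.
# A page geometry that differs from the pixel size hints at a virtual canvas.
show_page_geometry() {
  "$IM_IDENTIFY" -format '%p: %wx%h page=%g\n' "$1"
}

# Rasterize at the given density and drop any virtual canvas with +repage.
# A printf-style output name such as Results-%d.png gives one file per page.
convert_with_repage() {
  "$IM_CONVERT" -density "$1" "$2" +repage "$3"
}

# Usage (not run here):
#   show_page_geometry temp2.pdf
#   convert_with_repage 450 temp2.pdf 'Results-%d.png'
```

From PHP, either command could be run through exec() as suggested above.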
Re: PDF to image causes 1st page margin
Thanks for the help! I have attached the PDF and the result PNGs. The PDF is zipped to allow for upload.
I'm not real sure how to apply repage or clip with the PHP I'm using. Just run an exec right after reading the image? If either of those is the issue, could you show me example code to mix in with mine?
- Attachments
- Examples.zip (134.21 KiB)
- Page 2.png (174.9 KiB)
- Page 1.png (298.7 KiB)
- fmw42
Re: PDF to image causes 1st page margin
I do not see any extra margins in the files you have posted. I also downloaded your PDF and converted to PNG and I do not see any margins in my conversion either. The verbose information from your file shows that both pages of the pdf are size 614x1008 and there is a pdf:HiResBoundingBox: 614x1008. So they both match.
Please clarify about the margins.
When converting PDF to any other format, ImageMagick uses Ghostscript as a delegate. You may want to try upgrading that if it is old and try again. Also, if going to PNG, you may want to upgrade the libpng delegate as well.
On my system:
libpng @1.4.11_0 (which is not the most current)
ghostscript @9.06_1
Try doing the conversion from the command line via PHP exec() and see whether the same issue appears. If it does, then one of the delegates, or ImageMagick itself, may need upgrading. If not, that would point to Imagick.
What versions of ImageMagick and Imagick are you using?
Re: PDF to image causes 1st page margin
You'll have to save the image as the margin doesn't show in the browser. I saved it via browser and opened it in Fireworks and I can see it.
I just installed it yesterday with yum, so I believe they are up to date. I'll check on the GS too, and try out the exec method.
Re: PDF to image causes 1st page margin
Attached is another example of what I'm seeing in Fireworks. The canvas sizes are the same, but the first page's document is a different size.
- Attachments
- example.jpg (71.36 KiB)
Re: PDF to image causes 1st page margin
And to add a bit more... I did try : exec('gs -q -dNOPAUSE -sDEVICE=tiffg4 -sOutputFile=temp3.tif temp2.pdf -c quit');
This converted the PDF to the same exact size. I really don't care how it's done as long as I get the same results every time. This works, although I don't know how to change the resolution. It would still be interesting to know why the margin was produced with IM.
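On the resolution question: Ghostscript takes the output resolution through its `-r` option (in dpi). A rough sketch, assuming `gs` is on the PATH; the `render_pdf` name and the `GS` override variable are hypothetical, and `png16m` (opaque 24-bit PNG) is only one possible device, chosen here instead of the `tiffg4` used above.

```shell
#!/bin/sh
# Ghostscript binary; can be overridden (hypothetical variable), e.g. GS=echo for a dry run.
GS="${GS:-gs}"

# Render every page of a PDF to its own file at the given resolution.
#   $1 = resolution in dpi, $2 = input PDF, $3 = output name pattern
# %d in the output name is replaced by the page number.
render_pdf() {
  "$GS" -q -dNOPAUSE -dBATCH -r"$1" -sDEVICE=png16m -sOutputFile="$3" "$2"
}

# Usage (not run here):
#   render_pdf 450 temp2.pdf 'Results-%d.png'
```

`-dBATCH` makes Ghostscript exit after the last page, which serves the same purpose as the trailing `-c quit`.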
- fmw42
Re: PDF to image causes 1st page margin
BigLittle wrote:You'll have to save the image as the margin doesn't show in the browser. I saved it via browser and opened it in Fireworks and I can see it.
I just installed it yesterday with yum, so I believe they are up to date. I'll check on the GS too, and try out the exec method.
I did download the images and they looked fine for me. Perhaps it is your Fireworks viewer.
- fmw42
Re: PDF to image causes 1st page margin
I looked again in another viewer and it does show the margin. What I think is happening from looking at the verbose information is that your pdf has a section of transparency in it. So when you convert it you want to flatten it against a white background.
convert image.pdf -background white -flatten image.png
See if that helps.
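One caveat for multi-page PDFs: `-flatten` merges all pages of the input into a single image. A sketch of a per-page alternative, assuming a version of ImageMagick that supports `-alpha remove`; the `flatten_pages_white` name and the `IM_CONVERT` override variable are made up for illustration.

```shell
#!/bin/sh
# convert binary; can be overridden (hypothetical variable), e.g. IM_CONVERT=echo for a dry run.
IM_CONVERT="${IM_CONVERT:-convert}"

# Rasterize each page at the given density and composite it over white,
# page by page, instead of merging every page into one image.
#   $1 = density in dpi, $2 = input PDF, $3 = output name pattern
flatten_pages_white() {
  "$IM_CONVERT" -density "$1" "$2" -background white -alpha remove -alpha off "$3"
}

# Usage (not run here):
#   flatten_pages_white 450 temp2.pdf 'Results-%d.png'
```

This only fills transparent areas with white. It does not change the page dimensions, so it would not by itself make the first page line up with the second.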
Re: PDF to image causes 1st page margin
I attached the result. It looks like it replaces the margin with a white background. This is still not possible to work with, since I need the first page to convert the same way as the second. If all the pages had the same margin, it wouldn't matter. It's really odd that this is happening, because it didn't happen before. I'm going to try to determine what changed, but if I can't I may just use Ghostscript.
- Attachments
- image.tiff (76.19 KiB)
Re: PDF to image causes 1st page margin
Well I did determine that it is something unique about the PDF that's causing it. Attached are two PDFs. Dumb.pdf is one that produces a margin, and Dumb2.pdf does not. It does it with this conversion: gs -q -dNOPAUSE -sDEVICE=pngalpha -sOutputFile=O4.png dumb.pdf -c quit
Any ideas why?
- fmw42
Re: PDF to image causes 1st page margin
I think it is because in the bad one, there is transparency (alpha channel) and perhaps in the other there is not.
I would think that both pages from your first example would end up the same size with white filled for transparency. IM verbose information shows the pdf pages and resulting pngs images being the same size. So I am not sure what is still the issue.
Note that at one time, I was told that IM could not handle multipage PDFs with transparency. You had to choose an sDevice (pngalpha or pnmraw) in the delegates.xml file to suit one situation or the other: pngalpha if you had only one page with transparency, and pnmraw if you wanted to process multipage PDFs without transparency. This may have changed, but it is the device that is sent to Ghostscript to do the processing, so it was likely a Ghostscript issue. With more recent versions of Ghostscript that may no longer be the case.
Re: PDF to image causes 1st page margin
My goal with this was to overlay an inverted watermark and drown the original out enough that I could OCR a bunch of documents for research. The watermark is there for printing the documents, which I don't care about.
Most people run from "remove watermark" requests, but I'm not doing this to get around what the watermark is for. I may have to try to find better OCR software or something, but if there is a way to drown the watermark out enough to OCR the text, that would probably be the better solution.
- fmw42
Re: PDF to image causes 1st page margin
Do you want to enhance or remove the watermark?
If you want to remove it, try using -morphology operators if the watermark dots are small enough. The image you show is at so low a resolution that I doubt you could OCR the text even if the watermark were removed. Do you have a higher resolution image or the source file? If so, post a link to that, not inside <code>...</code> tags, so it is easy to download.
I tried working with your first example, but got nowhere. Do you have a blank page with the watermark on it? If so you can use that to remove (via -compose subtract or -compose divide ) the watermark in the text image.
Re: PDF to image causes 1st page margin
I'm trying to remove it enough for OCR to read the text correctly. I uploaded a compressed folder of all the documents at http://www.filedropper.com/files_2
Sorry for the download site, but it's 9MB and I couldn't upload it here. *Be sure to click the gray "Download this file" and not the big green "Start Download" ad button.
The Result2-0.jpg is a completed merge and shows how it's supposed to look. It OCRs perfectly. The other two were off a bit, which just makes things worse: the OCR is horrible where the watermark passes through text. This is the code I used to do it:
[php]
$Page = new Imagick();
$Page -> setResolution(500,500);
$Page -> readImage("temp2.pdf");
$Count = $Page -> getNumberImages();
$Page -> setImageFormat('png');
$Page -> writeImages("Result.png", false);
for($C = 0; $C < $Count; $C++) {
    // $Page -> contrastImage(5);
    $img = imagecreatefrompng('Result-'.$C.'.png');
    $img2 = imagecreatefrompng('PNGMasterOverlay.png');
    imagecopymerge($img, $img2, 0, 0, 0, 0, imagesx($img), imagesy($img), 50);
    imagejpeg($img, 'Result2-'.$C.'.jpg');
}
[/php]
I used GD because I couldn't get it to work with IM.
I know it's possible to remove a watermark, but since the original is a secured PDF I can't find a way to do it.