Wrong image size when reading ImageInfo from PDF
Hello,
I want to get the image size from an image which is embedded in a PDF file.
The size of the image itself is 1024 x 768.
If I open the PDF in Acrobat Reader, select the image and copy it to e.g. Paint, then the image size is correct (1024x768).
However, if I use the API function PingImage, the columns and rows (which I expected to hold the correct size) are 1475x1106, which is definitely wrong.
Any hints?
Does anyone know how to get the correct image size?
-
- Posts: 12159
- Joined: 2010-01-23T23:01:33-07:00
- Authentication code: 1151
- Location: England, UK
Re: Wrong image size when reading ImageInfo from PDF
Does the image occupy the entire page? IM will rasterize the page, and return those dimensions. If the page contains a smaller raster image, IM won't give that to you. Pdfimages is a better tool for that.
snibgo's IM pages: im.snibgo.com
Re: Wrong image size when reading ImageInfo from PDF
But I need to get the image out of the PDF with IM. Is there another way?
The requirement is to use only the IM C API for that.
Re: Wrong image size when reading ImageInfo from PDF
IM will rasterize the entire page, at whatever resolution you give it in "-density".
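To make the density arithmetic concrete, here is a minimal sketch in C. PDF lengths are in points (1/72 inch), and IM's default density is 72. The thread never states the page size, so the 50 DPI placement below is only an assumption that happens to reproduce the reported 1475x1106; rounding to the nearest pixel is also an assumption about the rasterizer. The function names are hypothetical.

```c
/* Assumption: the rasterizer rounds to the nearest pixel. */

/* Pixels produced when a length in PDF points (1/72 inch)
   is rasterized at density_dpi. Valid for non-negative input. */
long points_to_pixels(double points, double density_dpi)
{
    return (long)(points * density_dpi / 72.0 + 0.5);
}

/* Length in points that an image of `pixels` occupies on the page
   when it is placed at image_dpi (hypothetical placement value). */
double pixels_to_points(double pixels, double image_dpi)
{
    return pixels * 72.0 / image_dpi;
}
```

Under that assumption, a 1024x768 image placed at 50 DPI covers 1474.56 x 1105.92 points, which rasterizes at the default density of 72 to 1475x1106. The embedded image keeps its own 1024x768 pixels; only the page rendering changes with the density.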
snibgo's IM pages: im.snibgo.com
Re: Wrong image size when reading ImageInfo from PDF
The information about the image is stored in the image dictionary of the PDF.
Is there a way to get the dictionary and the stream from the pdf itself as raw data?
Code: Select all
<</BitsPerComponent 8/ColorSpace/DeviceRGB/DecodeParms<</Blend 1/ColorTransform 1/Colors 3/Columns 1024/HSamples[1 1 1 1]/QFactor 0.0/Rows 768/VSamples[1 1 1 1]>>/Filter/DCTDecode/Height 768/ImageName/ab123.jpg/Intent/RelativeColorimetric/Length 599237/Name/ab123.jpg/Subtype/Image/Type/XObject/Width 1024>>stream
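If the dictionary text is already in memory, the pixel size can be read from its /Width and /Height entries. The following is only a rough sketch in plain C, not an IM API call: the function name is hypothetical, it assumes the dictionary is stored uncompressed in the file (not inside an object stream), that the key is followed by a single space, and that the value is a direct integer rather than an indirect reference.

```c
#include <stdio.h>
#include <string.h>

/* Look for "<key> <integer>" in PDF dictionary text.
   Returns 1 and stores the integer in *value, or 0 if not found.
   Assumption: one space separates key and value, as in the dump above. */
int pdf_dict_int(const char *dict, const char *key, long *value)
{
    size_t len = strlen(key);
    const char *p = dict;

    while ((p = strstr(p, key)) != NULL) {
        const char *after = p + len;
        /* The space check stops "/Width" from matching a longer name. */
        if (*after == ' ' && sscanf(after, "%ld", value) == 1)
            return 1;
        p = after;
    }
    return 0;
}
```

For the dictionary shown above, looking up "/Width" and "/Height" this way would give 1024 and 768.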
Re: Wrong image size when reading ImageInfo from PDF
IM doesn't read the PDF file. It simply passes the PDF to Ghostscript, which does the rasterizing.
Perhaps Ghostscript has a command to do what you want.
I think pdfimages can do what you want. (You can test this.) Perhaps you can add an entry to delegates.xml to call pdfimages and pass the results back to IM.
snibgo's IM pages: im.snibgo.com
Re: Wrong image size when reading ImageInfo from PDF
snibgo wrote: Perhaps you can add an entry to delegates.xml to call pdfimages and pass the results back to IM.
Yes, that works, for IM v6.9.5-3 on Windows 8.1. Assuming convert and pdfimages are both in the system path, I add this to delegates.xml:
Code: Select all
<delegate decode="expdf" command="cmd.exe /c (pdfimages -all &quot;%i&quot; &quot;%u&quot; ) &amp; (convert &quot;%u-*.*&quot; &quot;miff:%o&quot; )" />
Now, when an input file to convert is prefixed with "expdf:" I get all the raster images from the PDF, with no resizing. For example:
Code: Select all
f:\web\im>convert expdf:pump.pdf info:
expdf:pump.pdf[0] MIFF 1620x1306 1620x1306+0+0 8-bit TrueColor sRGB 11.41MB 0.031u 0:00.023
expdf:pump.pdf[1] MIFF 2000x633 2000x633+0+0 8-bit TrueColor sRGB 11.41MB 0.141u 0:00.141
expdf:pump.pdf[2] MIFF 2000x633 2000x633+0+0 8-bit Palette Gray 2c 11.41MB 0.156u 0:00.147
EDIT: This leaves the images from pdfimages as files on disk (%u). They should probably be deleted.
EDIT2: Some versions of IM sanitise delegate commands, removing wildcards "*" and "?". For those, we can put the commands in a shell script, which we call with %i, %o and %u.
snibgo's IM pages: im.snibgo.com
Re: Wrong image size when reading ImageInfo from PDF
Thanks for the hint
Are there C API calls too?
Re: Wrong image size when reading ImageInfo from PDF
If you edit delegates.xml as I have shown, C API calls such as PingImage will indirectly do the extraction.
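As a rough sketch of what that could look like from C: the helper below builds the "expdf:" prefixed name, and the guarded part passes it to PingImage and walks the returned image list. The names make_expdf_name, print_embedded_sizes and the USE_MAGICKCORE guard are hypothetical. The guarded part assumes IM 7 headers (IM 6 uses <magick/MagickCore.h> and MaxTextExtent) and that the caller has already run MagickCoreGenesis.

```c
#include <stdio.h>
#include <string.h>

/* Build "expdf:<path>" in out. Returns 0 on success, -1 if it does not fit. */
int make_expdf_name(const char *pdf_path, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "expdf:%s", pdf_path);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

#ifdef USE_MAGICKCORE /* hypothetical guard: define it when building against IM */
#include <MagickCore/MagickCore.h>

/* Ping every image the expdf delegate returns and print its size.
   Assumes MagickCoreGenesis() has been called by the caller. */
int print_embedded_sizes(const char *pdf_path)
{
    ExceptionInfo *exception = AcquireExceptionInfo();
    ImageInfo *info = CloneImageInfo(NULL);
    int rc = -1;

    if (make_expdf_name(pdf_path, info->filename, MagickPathExtent) == 0) {
        Image *images = PingImage(info, exception);
        for (Image *p = images; p != NULL; p = GetNextImageInList(p))
            printf("%zu x %zu\n", p->columns, p->rows);
        if (images != NULL) {
            DestroyImageList(images);
            rc = 0;
        }
    }
    DestroyImageInfo(info);
    DestroyExceptionInfo(exception);
    return rc;
}
#endif
```

PingImage returns a list here because the delegate hands back one image per raster image found in the PDF, so the first entry alone may not be the one you are after.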
snibgo's IM pages: im.snibgo.com
Re: Wrong image size when reading ImageInfo from PDF
After I changed the XML file, I got the output in the first code block below on the command line.
In my program I call PingImage (second code block below), but I don't see any results.
Code: Select all
>convert expdf:pdf_test.pdf info:
pdfimages version 3.04
Copyright 1996-2014 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
-f <int> : first page to convert
-l <int> : last page to convert
-j : write JPEG images as JPEG files
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-q : don't print any messages or errors
-cfg <string> : configuration file to use in place of .xpdfrc
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
convert: unable to open image 'C:\Users\user1\AppData\Local\Temp\magick-113316WkMC-8vc-UNR-*.*': Invalid argument @ error/blob.c/OpenBlob/3094.
convert: no images defined `miff:C:\Users\user1\AppData\Local\Temp\magick-11331617V09dbGPIik' @ error/convert.c/ConvertImageCommand/3254.
convert: delegate failed `cmd.exe /c (pdfimages -all "%i" "%u" ) & (convert "%u-*.*" "miff:%o" )' @ error/delegate.c/InvokeDelegate/1845.
convert: unable to open module file 'C:\Program Files\ImageMagick-7.0.5-Q16\modules\coders\IM_MOD_RL_EXPDF_.dll': No such file or directory @ warning/module.c/GetMagickModulePath/680.
convert: unable to open file 'C:/Users/user1/AppData/Local/Temp/magick-113316VJ90QPEiEmqR': No such file or directory @ error/constitute.c/ReadImage/549.
convert: no images defined `info:' @ error/convert.c/ConvertImageCommand/3254.
Code: Select all
Image *image = PingImage(imageInfo, exception);
Re: Wrong image size when reading ImageInfo from PDF
Your version of pdfimages is old. It doesn't have the "-all" option. The version I have (from Cygwin) says:
Code: Select all
pdfimages version 0.30.0
Copyright 2005-2015 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011 Glyph & Cog, LLC
Usage: pdfimages [options] <PDF-file> <image-root>
-f <int> : first page to convert
-l <int> : last page to convert
-png : change the default output format to PNG
-tiff : change the default output format to TIFF
-j : write JPEG images as JPEG files
-jp2 : write JPEG2000 images as JP2 files
-jbig2 : write JBIG2 images as JBIG2 files
-ccitt : write CCITT images as CCITT files
-all : equivalent to -png -tiff -j -jp2 -jbig2 -ccitt
-list : print list of images instead of saving
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-p : include page numbers in output file names
-q : don't print any messages or errors
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
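Since the two builds differ in which options they accept, a program that drives pdfimages could inspect the usage text before choosing a flag. This is only a sketch with hypothetical function names: it assumes the usage text has already been captured into a string, and it falls back to "-j", which both listings above include. With "-j" alone, images that are not JPEG are not written in their native format.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if some line of the usage text starts with the option name,
   ignoring leading spaces or tabs, and 0 otherwise. */
int usage_lists_option(const char *usage, const char *opt)
{
    size_t len = strlen(opt);
    const char *line = usage;

    while (line != NULL && *line != '\0') {
        const char *p = line;
        while (*p == ' ' || *p == '\t')
            p++;
        /* Require a delimiter after the name so "-j" does not match "-jp2". */
        if (strncmp(p, opt, len) == 0 &&
            (p[len] == '\0' || p[len] == ':' || isspace((unsigned char)p[len])))
            return 1;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return 0;
}

/* Prefer "-all" when the installed pdfimages lists it, else use "-j". */
const char *pick_extract_flag(const char *usage)
{
    return usage_lists_option(usage, "-all") ? "-all" : "-j";
}
```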
snibgo's IM pages: im.snibgo.com